Use read1 instead of read to get magic number #7698
groutr wants to merge 1 commit into pydata:main from …
I think some backends rely on this magic number to determine the exact file format.
Agree, this seems a bit unsafe? https://stackoverflow.com/questions/57726771/what-the-difference-between-read-and-read1-in-python
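To make the concern concrete, here is a minimal sketch of how `read1` can return fewer bytes than requested. `ChunkyRaw` is a hypothetical raw stream invented for this illustration (it hands out at most 3 bytes per call, roughly like a slow pipe or socket); it is not part of xarray or of this PR.

```python
import io

class ChunkyRaw(io.RawIOBase):
    """Hypothetical raw stream that returns at most `chunk` bytes per readinto()."""

    def __init__(self, data, chunk=3):
        self._data = data
        self._pos = 0
        self._chunk = chunk

    def readable(self):
        return True

    def readinto(self, b):
        # Copy at most `chunk` bytes into the caller's buffer.
        n = min(len(b), self._chunk, len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n

data = b"\x89HDF\r\n\x1a\n" + b"payload"  # 8-byte HDF5 signature, then filler

# read() keeps calling the raw stream until it has 8 bytes (or hits EOF).
via_read = io.BufferedReader(ChunkyRaw(data)).read(8)

# read1() makes at most one raw call, so it may come back short.
via_read1 = io.BufferedReader(ChunkyRaw(data)).read1(8)

print(len(via_read), len(via_read1))
```

With a stream like this, a format check that compares the result against an 8-byte signature would fail for the `read1` result even though the file is valid.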
Agreed, and a reference to a pretty authoritative source: https://github.com/python/cpython/blob/3.11/Modules/_io/bufferedio.c#L915
It's confusing the method has a parameter called … One workaround is to use:

    import io
    import os

    def get_magic_number(filename_or_obj, count=8):
        if isinstance(filename_or_obj, (str, os.PathLike)):
            # os.O_BINARY only exists on Windows; fall back to 0 elsewhere.
            flags = os.O_RDONLY | getattr(os, "O_BINARY", 0)
            fd = os.open(filename_or_obj, flags)
            try:
                magic_number = os.read(fd, count)
            finally:
                # Close the descriptor even if the read fails.
                os.close(fd)
            if len(magic_number) != count:
                raise TypeError("Error reading magic number")
        elif isinstance(filename_or_obj, io.BufferedIOBase):
            if filename_or_obj.seekable():
                # Remember the caller's position and restore it afterwards.
                pos = filename_or_obj.tell()
                filename_or_obj.seek(0)
                magic_number = filename_or_obj.read(count)
                filename_or_obj.seek(pos)
            else:
                raise TypeError("File not seekable.")
        else:
            raise TypeError("Cannot read magic number.")
        return magic_number

On my laptop (w/ SSD) using …
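The file-object branch of the workaround above can be exercised on an in-memory stream. This is only a sketch: `peek_magic` is a hypothetical name used for illustration, and the `try`/`finally` is an addition so the position is restored even if the read raises.

```python
import io

def peek_magic(fileobj, count=8):
    """Hypothetical helper: read `count` bytes from the start, then restore position."""
    if not fileobj.seekable():
        raise TypeError("File not seekable.")
    pos = fileobj.tell()
    try:
        fileobj.seek(0)
        return fileobj.read(count)
    finally:
        # Put the stream back where the caller left it.
        fileobj.seek(pos)

stream = io.BytesIO(b"\x89HDF\r\n\x1a\n" + b"rest of file")
stream.seek(5)                 # pretend the caller has already read some bytes
magic = peek_magic(stream)
print(magic, stream.tell())
```

The caller's offset is unchanged afterwards, which is the property the `tell()`/`seek(pos)` pair is there to guarantee.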
I think this logic is done one level above in the call stack. But yes, maybe a different name for the argument would be better.
Not sure about the details here. I think it would be good to discuss in an issue before proceeding.
Addresses #7697.
I changed the isinstance check because neither read nor read1 is provided by IOBase. Only RawIOBase and BufferedIOBase provide read and read1, respectively.
I think that there is little benefit to using .tell(). I suggest the following:
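This is not the suggestion itself, only a quick check of the class-hierarchy claim in the PR description above. The sketch asks the abstract base classes in the standard `io` module which of `read` and `read1` they define; note that `BufferedIOBase` defines `read` as well as `read1`.

```python
import io

# Which abstract base classes define read() and read1()?
provides = {
    cls.__name__: (hasattr(cls, "read"), hasattr(cls, "read1"))
    for cls in (io.IOBase, io.RawIOBase, io.BufferedIOBase)
}
for name, (has_read, has_read1) in provides.items():
    print(f"{name}: read={has_read}, read1={has_read1}")

# BytesIO passes both isinstance checks, but only the BufferedIOBase
# check guarantees that read1() is available.
buf = io.BytesIO(b"\x89HDF\r\n\x1a\n")
print(isinstance(buf, io.IOBase), isinstance(buf, io.BufferedIOBase))
```

So an `isinstance(obj, io.IOBase)` check alone does not promise that calling `read` or `read1` on the object will work.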