We have had this `close()` call in `__del__()` of `AbstractBufferedFile` for years.

I am unsure whether this is a real concern, given our long history with this practice. But calling `close()` from `__del__()`, where it needs to access file systems or other resources, can cause problems in real-world situations.

The main concern is the timing of the `__del__()` call. This is frequently not under our control (some poorly behaved frameworks never close file objects but keep references to them), and it can happen at the very last moment of the Python interpreter's shutdown sequence. This is a problem particularly for async filesystems that use event loops.
Let me demonstrate a toy example.
```python
import asyncio

import fsspec.asyn
from fsspec.asyn import sync_wrapper
from fsspec.implementations.memory import MemoryFileSystem
from fsspec.spec import AbstractBufferedFile


class DummyAsyncFileSystem(MemoryFileSystem):
    def __init__(self, *args, **storage_options):
        super().__init__(*args, **storage_options)
        # fsspec.asyn.AsyncFileSystem does the same
        self.loop = fsspec.asyn.get_loop()


class DummyFile(AbstractBufferedFile):
    def __init__(self, fs, **kwargs):
        super().__init__(fs, **kwargs)
        # Many file class implementations reuse the same event loop used by the fs
        self.loop = fs.loop

    def _fetch_range(self, start, end):
        return []

    async def _flush(self, force=False):
        await asyncio.sleep(1)  # Dummy async operation

    # Typical idiom
    flush = sync_wrapper(_flush)


cache = set()  # Cause of the problem, but commonly exists in real world


def run():
    print("starting")
    cache.add(DummyFile(DummyAsyncFileSystem(), path="dummy.txt", mode="wb"))
    print("exiting")


if __name__ == "__main__":
    run()
```
This code just (1) creates a new dummy file instance and (2) puts it into `cache`. However, execution gets stuck when the interpreter exits.
To investigate why, add the following `print()` to `__del__()` of the file class.
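The exact diagnostic from the original post is not reproduced here; a standalone sketch of that kind of probe (the `LoopProbe` class is a hypothetical stand-in for the file object, not fsspec code) could look like this:

```python
import asyncio
import threading
import time

observed = []  # records (loop_running, thread_alive) seen inside __del__


class LoopProbe:
    """Hypothetical stand-in for the file object: holds the loop and its thread."""

    def __init__(self, loop, thread):
        self.loop = loop
        self.thread = thread

    def __del__(self):
        # The kind of diagnostic one could add to DummyFile.__del__()
        state = (self.loop.is_running(), self.thread.is_alive())
        observed.append(state)
        print("loop running:", state[0], "| thread alive:", state[1])


loop = asyncio.new_event_loop()
th = threading.Thread(target=loop.run_forever, daemon=True)
th.start()
while not loop.is_running():  # wait until run_forever has actually started
    time.sleep(0.01)

probe = LoopProbe(loop, th)
del probe  # CPython: __del__ runs immediately; both checks report True here

loop.call_soon_threadsafe(loop.stop)
th.join()
```

While the daemon thread is alive, both checks are `True`. In the deadlock scenario from the issue, `__del__()` fires during interpreter shutdown after the daemon thread has been killed, so `is_alive()` becomes `False` while the loop still reports itself as running.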
As you can see, the event loop is marked as "running," but the daemon thread hosting it has stopped. This means the file object was garbage-collected after the interpreter terminated the thread. `sync()` will never return, because the thread running the event loop is gone.
The root cause in this example is `cache.add()`, which creates a reference from the global `cache` object to the file object. We should not do this, but this kind of reference chain from global objects to file objects can arise accidentally in real-world situations, leading to unexpected deadlocks that are difficult to investigate.
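The hang itself can be reproduced without fsspec: schedule a callback onto a loop whose host thread has already exited, then block waiting for it, which mirrors the wait that fsspec's `sync()` performs (the exact internals of `sync()` are an assumption here; this is a minimal simulation):

```python
import asyncio
import threading

loop = asyncio.new_event_loop()
th = threading.Thread(target=loop.run_forever, daemon=True)
th.start()
while not loop.is_running():  # wait until run_forever has actually started
    pass

# Simulate interpreter shutdown: the loop's host thread goes away.
loop.call_soon_threadsafe(loop.stop)
th.join()

# sync()-style wait: schedule work and block on an event the callback would set.
done = threading.Event()
loop.call_soon_threadsafe(done.set)  # queued, but no thread will ever run it
completed = done.wait(timeout=1)     # without the timeout, this blocks forever
print("completed:", completed)
```

The queued callback is accepted without error (the loop is stopped, not closed), but it never runs, so any caller waiting on its result blocks indefinitely; only the timeout here keeps the demo from deadlocking.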
I have two proposals:

1. Remove the `close()` call from `AbstractBufferedFile.__del__()`. Instead, reimplement `__del__()` in each concrete file class where it is guaranteed that the class can execute `close()` at any moment in the Python interpreter's lifecycle.
2. For async filesystems, it would be better to use `atexit` to explicitly close the event loop before entering the shutdown sequence.
```
Exception ignored in: <function DummyFile.__del__ at 0x7f1dffcc2fc0>
Traceback (most recent call last):
  File "/home/.../test.py", line 35, in __del__
  File "/home/.../lib/python3.11/site-packages/fsspec/spec.py", line 2057, in __del__
  File "/home/.../lib/python3.11/site-packages/fsspec/spec.py", line 2035, in close
  File "/home/.../lib/python3.11/site-packages/fsspec/asyn.py", line 122, in wrapper
  File "/home/.../lib/python3.11/site-packages/fsspec/asyn.py", line 77, in sync
RuntimeError: Loop is not running
```
We immediately got an exception this time. But I believe this is much better than the deadlock.
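A minimal sketch of the `atexit` approach (this is not fsspec's actual code; the names `io_thread` and `shutdown_loop` are made up for illustration):

```python
import asyncio
import atexit
import threading

# The process owns one dedicated IO thread hosting the event loop.
loop = asyncio.new_event_loop()
io_thread = threading.Thread(target=loop.run_forever, daemon=True)
io_thread.start()


def shutdown_loop():
    if loop.is_closed():
        return  # idempotent: safe if called manually and again at exit
    loop.call_soon_threadsafe(loop.stop)
    io_thread.join()
    loop.close()


# Runs during normal interpreter exit, while the thread can still be joined.
# Any __del__ that fires later sees a closed loop and fails fast instead of
# blocking on a thread that no longer exists.
atexit.register(shutdown_loop)
```

The key property is ordering: `atexit` hooks run before the interpreter kills daemon threads, so the loop is stopped deterministically rather than being abandoned mid-run.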
https://github.com/fsspec/filesystem_spec/blob/2024.9.0/fsspec/spec.py#L2055-L2057