Is calling close() in AbstractBufferedFile.__del__() a good idea? #1685

Open
sugibuchi opened this issue Sep 21, 2024 · 0 comments

sugibuchi commented Sep 21, 2024

    def __del__(self):
        if not self.closed:
            self.close()

https://github.com/fsspec/filesystem_spec/blob/2024.9.0/fsspec/spec.py#L2055-L2057

We have had this close() call for years in __del__() of AbstractBufferedFile.

I am unsure whether this is a real concern, since we have a long history of this practice. But calling close(), which needs to access file systems or other resources, from __del__() can cause problems in real-world situations.

The main concern is the timing of the __del__() call, which is frequently not under our control (some badly behaved frameworks never close file objects but keep holding references to them). It can happen at the very last moment of the Python interpreter's shutdown sequence, which is a problem particularly for async filesystems that use event loops.

Let me demonstrate this with a toy example.

import asyncio

import fsspec.asyn
from fsspec.asyn import sync_wrapper
from fsspec.implementations.memory import MemoryFileSystem
from fsspec.spec import AbstractBufferedFile


class DummyAsyncFileSystem(MemoryFileSystem):
    def __init__(self, *args, **storage_options):
        super().__init__(*args, **storage_options)
        # fsspec.asyn.AsyncFileSystem does the same
        self.loop = fsspec.asyn.get_loop()


class DummyFile(AbstractBufferedFile):
    def __init__(self, fs, **kwargs):
        super().__init__(fs, **kwargs)
        # Many file class implementations reuse the same event loop used by the fs
        self.loop = fs.loop

    def _fetch_range(self, start, end):
        return []

    async def _flush(self, force=False):
        await asyncio.sleep(1)  # Dummy async operation

    # Typical idiom
    flush = sync_wrapper(_flush)

cache = set()  # Cause of the problem, but this kind of global cache commonly exists in real-world code

def run():
    print("starting")
    cache.add(DummyFile(DummyAsyncFileSystem(), path="dummy.txt", mode="wb"))
    print("exiting")

if __name__ == "__main__":
    run()

This code just (1) creates a new dummy file instance and (2) puts it into the cache. Nevertheless, the script gets stuck when exiting.

To investigate why, add the following print() calls to __del__() of the file class.

class DummyFile(AbstractBufferedFile):
    ...
    def __del__(self):
        print(fsspec.asyn.loop[0])
        print(fsspec.asyn.iothread[0])
        super().__del__()

Running the example again then prints the following before getting stuck:

starting
exiting
<_UnixSelectorEventLoop running=True closed=False debug=False> # Running... really?
<Thread(fsspecIO, stopped daemon 139767434610240)> # Stopped :(

As you can see, the event loop is marked as "running", but the daemon thread hosting it has already stopped. In other words, the file object was garbage-collected after the interpreter terminated the thread, so the sync() call triggered by close() never returns.
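
To see why that wait can never end, here is a minimal, self-contained sketch of the general "run a coroutine on a background event loop and block for the result" pattern (this is not fsspec's actual sync() implementation, just the same idea): once the thread hosting the loop is gone, submitted work is never executed, so the caller blocks forever. The timeout below exists only so the example terminates.

import asyncio
import concurrent.futures
import threading

loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


async def work():
    await asyncio.sleep(0.1)
    return "done"


# Normal case: the loop thread is alive, so the caller gets a result back.
future = asyncio.run_coroutine_threadsafe(work(), loop)
print(future.result())  # -> "done"

# Shutdown case: the thread hosting the loop is gone. Here we stop it
# cleanly; at interpreter exit the daemon thread is simply frozen, which is
# why the loop in the output above still reports running=True. Submitting
# work still "succeeds", but nothing will ever run it.
loop.call_soon_threadsafe(loop.stop)
thread.join()

future = asyncio.run_coroutine_threadsafe(work(), loop)
try:
    future.result(timeout=2)  # without a timeout, this would block forever
except concurrent.futures.TimeoutError:
    print("never completed: nothing is running the event loop anymore")
# (a "coroutine 'work' was never awaited" warning may appear at exit)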

The root cause in this example is cache.add(), which creates a reference from the global cache object to the file object. We should not do this, but this kind of reference chain from global objects to file objects can appear accidentally in real-world code, and it leads to unexpected deadlocks that are difficult to investigate.

I have two proposals:

  • Remove the close() call from AbstractBufferedFile.__del__(). Instead, reimplement it in __del__() of each concrete file class for which it is guaranteed that close() can execute at any moment in the Python interpreter's lifecycle (a sketch is given at the end of this comment).
  • For async filesystems, it would be better to use atexit to explicitly close the event loop before entering the shutdown sequence. For example:
import atexit

import fsspec.asyn


def _stop_fsspec_async_event_loop():
    loop = fsspec.asyn.loop[0]
    thread = fsspec.asyn.iothread[0]

    if loop:
        if loop.is_running():
            loop.call_soon_threadsafe(loop.stop)
            thread.join()
        if not loop.is_closed():
            loop.close()


atexit.register(_stop_fsspec_async_event_loop)

With this handler registered, running the toy example no longer deadlocks on exit; __del__() fails fast instead:
Exception ignored in: <function DummyFile.__del__ at 0x7f1dffcc2fc0>
Traceback (most recent call last):
  File "/home/.../test.py", line 35, in __del__
  File "/home/.../lib/python3.11/site-packages/fsspec/spec.py", line 2057, in __del__
  File "/home/.../lib/python3.11/site-packages/fsspec/spec.py", line 2035, in close
  File "/home/.../lib/python3.11/site-packages/fsspec/asyn.py", line 122, in wrapper
  File "/home/.../lib/python3.11/site-packages/fsspec/asyn.py", line 77, in sync
RuntimeError: Loop is not running

We immediately got an exception this time. But I believe this is much better than the deadlock.
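
As an illustration of the first proposal, here is a rough, hypothetical sketch (the class name and the guard condition are mine, not an existing fsspec API) of how a concrete file class could reintroduce close()-on-garbage-collection only when it is still safe to run:

import fsspec.asyn
from fsspec.spec import AbstractBufferedFile


class GuardedDummyFile(AbstractBufferedFile):
    # Hypothetical: a concrete file class that opts back into calling
    # close() from __del__(), but only while the daemon thread hosting
    # fsspec's shared event loop is still alive.

    def _fetch_range(self, start, end):
        return b""

    def __del__(self):
        thread = fsspec.asyn.iothread[0]
        if not self.closed and thread is not None and thread.is_alive():
            # The event loop still has a live thread to run the async
            # flush triggered by close().
            self.close()
        # Otherwise, skip close() rather than risk a deadlock at shutdown.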
