Description
Feature or enhancement
Proposal:
A previous issue, #81782, discussed the possibility of modifying shutil.make_archive() so that symbolic links could be followed when creating zip files. That issue states that symbolic links are not followed. My testing on Linux showed that os.walk(), used within the _make_zipfile() function, by default does follow symbolic links to files, but does not follow symbolic links to directories.
From my understanding of the documentation for os.walk(), I think the default behaviour is probably quite intentional: if a symbolic link to a directory points to a parent directory of the link's own location, os.walk() will become trapped in infinite recursion. But os.walk() does have a followlinks parameter that can be set to True to override the default behaviour and resolve symbolic links to directories.
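To make the os.walk() behaviour concrete, here is a minimal sketch, assuming a POSIX system where symlinks can be created. The helper names and the directory layout (root, outside, linked_dir) are made up for illustration.

```python
import os
import tempfile


def build_tree():
    """Create root/linked_dir as a symlink to a directory outside root."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "root")
    outside = os.path.join(base, "outside")
    os.makedirs(os.path.join(outside, "inner"))
    os.makedirs(root)
    os.symlink(outside, os.path.join(root, "linked_dir"))
    return root


def walk_dirs(root, followlinks=False):
    """Return the sorted relative paths of every directory os.walk() visits."""
    visited = []
    for dirpath, _dirnames, _filenames in os.walk(root, followlinks=followlinks):
        visited.append(os.path.relpath(dirpath, root))
    return sorted(visited)


root = build_tree()
print(walk_dirs(root))                    # default: the symlinked directory is not entered
print(walk_dirs(root, followlinks=True))  # the symlinked directory and its children are visited
```

Note that with followlinks=True nothing guards against a link that points back to one of its own parents, which is the recursion risk described above.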
I originally started looking into any of this because I was using shutil.make_archive() in a backup script to create a compressed tarball from a folder that contains symbolic links to many files and directories of interest, and the behaviour with symbolic links was not quite what I expected. The tarfile module has a dereference parameter available that can be set to True to override the default behaviour and resolve symbolic links to files and directories.
I also investigated the behaviour with hard links for completeness.
Default behaviour when creating a "zip" with shutil.make_archive():
- Hard links are resolved. Multiple hard links to the same file result in multiple copies of the file being added to the archive.
- Symbolic links to files are resolved.
- Symbolic links to directories are not resolved.
Modified behaviour when creating a "zip" with os.walk() set to follow links:
- Symbolic links to directories are resolved.
- The behaviour for hard links and symbolic links to files is unchanged.
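The default "zip" behaviour listed above can be checked with a short script. This is only a sketch, assuming a POSIX system with symlink support; the file names (real.txt, file_link, dir_link) and the helpers are hypothetical.

```python
import os
import shutil
import tempfile
import zipfile


def build_tree():
    """Create root/ with a real file, a file symlink and a directory symlink."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "root")
    outside = os.path.join(base, "outside")
    os.makedirs(os.path.join(outside, "sub"))
    os.makedirs(root)
    with open(os.path.join(outside, "data.txt"), "w") as f:
        f.write("payload")
    with open(os.path.join(outside, "sub", "nested.txt"), "w") as f:
        f.write("nested")
    with open(os.path.join(root, "real.txt"), "w") as f:
        f.write("real")
    os.symlink(os.path.join(outside, "data.txt"), os.path.join(root, "file_link"))
    os.symlink(os.path.join(outside, "sub"), os.path.join(root, "dir_link"))
    return root


def zip_contents(src_dir):
    """Archive src_dir with shutil.make_archive() and return {file name: bytes}."""
    out_base = os.path.join(tempfile.mkdtemp(), "backup")
    archive = shutil.make_archive(out_base, "zip", root_dir=src_dir)
    with zipfile.ZipFile(archive) as zf:
        # Skip directory entries, which carry no data.
        return {n: zf.read(n) for n in zf.namelist() if not n.endswith("/")}


contents = zip_contents(build_tree())
for name in sorted(contents):
    print(name, contents[name])
```

The file symlink is stored with the contents of its target, while nothing from inside the symlinked directory is archived.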
Default behaviour when creating compressed tarball:
- Hard links are resolved intelligently. That is to say that multiple hard links to a single file result in only one copy of the file being added to the archive, other links remain as hard links. When a tarball is extracted in its entirety, the hard links are resolved and multiple copies of the file are produced in the output.
- Symbolic links to files are preserved as symbolic links.
- Symbolic links to directories are preserved as symbolic links.
Modified behaviour when creating compressed tarball set to dereference:
- Hard links are resolved in the same way as for "zip": multiple hard links to a file result in multiple copies of the file in the archive.
- Symbolic links to files are resolved.
- Symbolic links to directories are resolved.
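The two tarball cases can be compared by inspecting member types directly with the tarfile module. This is a sketch under the assumption of a POSIX filesystem that supports both symlinks and hard links; the tree layout and helper names are invented for the example.

```python
import os
import tarfile
import tempfile


def build_tree():
    """Create root/ with two hard links to one file plus file and dir symlinks."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "root")
    outside = os.path.join(base, "outside")
    os.makedirs(os.path.join(outside, "sub"))
    os.makedirs(root)
    with open(os.path.join(outside, "data.txt"), "w") as f:
        f.write("payload")
    with open(os.path.join(outside, "sub", "nested.txt"), "w") as f:
        f.write("nested")
    with open(os.path.join(root, "hard_a.txt"), "w") as f:
        f.write("shared")
    os.link(os.path.join(root, "hard_a.txt"), os.path.join(root, "hard_b.txt"))
    os.symlink(os.path.join(outside, "data.txt"), os.path.join(root, "file_link"))
    os.symlink(os.path.join(outside, "sub"), os.path.join(root, "dir_link"))
    return root


def tar_member_kinds(src_dir, dereference=False):
    """Return {member name: kind} for a gzip tarball of src_dir."""
    archive = os.path.join(tempfile.mkdtemp(), "backup.tar.gz")
    with tarfile.open(archive, "w:gz", dereference=dereference) as tf:
        tf.add(src_dir, arcname="root")
    kinds = {}
    with tarfile.open(archive) as tf:
        for member in tf.getmembers():
            if member.issym():
                kinds[member.name] = "symlink"
            elif member.islnk():
                kinds[member.name] = "hardlink"
            elif member.isdir():
                kinds[member.name] = "dir"
            else:
                kinds[member.name] = "file"
    return kinds


root = build_tree()
default_kinds = tar_member_kinds(root)
deref_kinds = tar_member_kinds(root, dereference=True)
print(default_kinds)
print(deref_kinds)
```

tarfile adds directory entries in sorted order, so hard_a.txt is stored as the regular file and hard_b.txt as the hard link in the default case.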
My proposal is to include an optional follow_links parameter to the shutil.make_archive() function that is passed to the _make_zipfile() and _make_tarball() functions, defaulting to False to preserve the existing behaviour as the default.
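To illustrate how a single flag could drive both back ends, here is a rough sketch. It is not the code from my branch and not how shutil is implemented: make_archive_sketch is a hypothetical stand-in that simply forwards follow_links to os.walk(followlinks=...) for zip and to tarfile's dereference option for tarballs. It assumes the archive is written outside root_dir.

```python
import os
import tarfile
import tempfile
import zipfile


def make_archive_sketch(base_name, fmt, root_dir, follow_links=False):
    """Hypothetical illustration of a follow_links flag; returns the archive path."""
    if fmt == "zip":
        archive = base_name + ".zip"
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for dirpath, _dirnames, filenames in os.walk(
                root_dir, followlinks=follow_links
            ):
                for name in sorted(filenames):
                    path = os.path.join(dirpath, name)
                    zf.write(path, os.path.relpath(path, root_dir))
    elif fmt == "gztar":
        archive = base_name + ".tar.gz"
        with tarfile.open(archive, "w:gz", dereference=follow_links) as tf:
            for name in sorted(os.listdir(root_dir)):
                tf.add(os.path.join(root_dir, name), arcname=name)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")
    return archive


def build_tree():
    """Create root/dir_link pointing at a directory that holds nested.txt."""
    base = tempfile.mkdtemp()
    root = os.path.join(base, "root")
    target = os.path.join(base, "outside")
    os.makedirs(target)
    os.makedirs(root)
    with open(os.path.join(target, "nested.txt"), "w") as f:
        f.write("nested")
    os.symlink(target, os.path.join(root, "dir_link"))
    return root


def names(archive):
    """List member names of a zip or gzip tarball."""
    if archive.endswith(".zip"):
        with zipfile.ZipFile(archive) as zf:
            return zf.namelist()
    with tarfile.open(archive) as tf:
        return tf.getnames()


root = build_tree()
out = tempfile.mkdtemp()
for fmt in ("zip", "gztar"):
    for follow in (False, True):
        base = os.path.join(out, f"{fmt}_{follow}")
        print(fmt, follow, names(make_archive_sketch(base, fmt, root, follow)))
```

Defaulting follow_links to False keeps the existing output for both formats, which is the point of making the parameter opt-in.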
I am of course happy to discuss the proposal. Since the features exist in the underlying functions I feel it is reasonable to provide the option to the user, along with detailed documentation of what it will do. In #81782 a member of the development team said they felt it was a reasonable addition, and the issue remains open.
I have prepared a forked branch with commits for the following:
- Add a follow_links parameter to the functions in the shutil module, and update the docstrings to document the default and modified behaviour in detail.
- Add tests to test_shutil.py to check the behaviour of symlinks on systems where they are supported.
If it is felt that this would be a useful addition I can finalise things and submit a PR for review. Thank you for taking the time to read this.
Has this already been discussed elsewhere?
For completeness, existing APIs already have a follow_symlinks parameter: #81782 (comment).