Iterate large directories efficiently with Python.
python-getdents is a simple wrapper around the Linux system call getdents64 (see man getdents for details).
- Verify that the implementation works on platforms other than x86_64.
pip install getdents
python3 -m venv env
. env/bin/activate
pip install -e .[test]
pip install cibuildwheel
cibuildwheel --platform linux --output-dir wheelhouse
ulimit -v 33554432 && py.test tests/
Or
ulimit -v 33554432 && ./setup.py test
from getdents import getdents

for inode, type, name in getdents('/tmp', 32768):
    print(name)
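Each tuple carries a type field, so entries can be filtered without a stat call per file. The following is a minimal sketch, not part of the package: `regular_files` is a hypothetical helper, and the two `DT_*` values are copied from Linux `<dirent.h>` on the assumption that they match the constants of the same names exported by getdents. Some file systems report the type as unknown, so the sketch falls back to `os.lstat` in that case.

```python
import os
import stat

# Assumed to match the DT_* constants exported by getdents
# (values taken from Linux <dirent.h>).
DT_UNKNOWN = 0
DT_REG = 8


def regular_files(entries, directory):
    """Yield names of regular files from (inode, type, name) tuples.

    `entries` is any iterable of such tuples; `directory` is the path
    the entries were read from, used only for the lstat() fallback.
    """
    for inode, d_type, name in entries:
        if d_type == DT_REG:
            yield name
        elif d_type == DT_UNKNOWN:
            # The file system did not report a type; ask lstat() instead.
            mode = os.lstat(os.path.join(directory, name)).st_mode
            if stat.S_ISREG(mode):
                yield name
```

With the package installed, this could be used as `regular_files(getdents('/tmp', 32768), '/tmp')`.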
import os
from getdents import *

fd = os.open('/tmp', O_GETDENTS)

for inode, type, name in getdents_raw(fd, 2**20):
    print({
            DT_BLK:     'blockdev',
            DT_CHR:     'chardev ',
            DT_DIR:     'dir     ',
            DT_FIFO:    'pipe    ',
            DT_LNK:     'symlink ',
            DT_REG:     'file    ',
            DT_SOCK:    'socket  ',
            DT_UNKNOWN: 'unknown ',
        }[type], {
            True:  'd',
            False: ' ',
        }[inode == 0],
        name,
    )

os.close(fd)
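In the example above the descriptor is closed only if the loop finishes without an exception. A small context manager can guarantee the close. This is a sketch under assumptions: `open_directory` is a hypothetical helper, and its default flags (`os.O_RDONLY | os.O_DIRECTORY`) stand in for `O_GETDENTS`, whose exact value is not stated here.

```python
import contextlib
import os


@contextlib.contextmanager
def open_directory(path, flags=os.O_RDONLY | os.O_DIRECTORY):
    """Open a directory and close its descriptor on exit.

    The default flags are an assumption; pass O_GETDENTS from the
    getdents module to match the example above exactly.
    """
    fd = os.open(path, flags)
    try:
        yield fd
    finally:
        # Runs even if the body of the with-block raises.
        os.close(fd)
```

With the package installed, the loop becomes `with open_directory('/tmp', O_GETDENTS) as fd:` followed by the same `getdents_raw(fd, 2**20)` iteration.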
python-getdents [-h] [-b N] [-o NAME] PATH
Option | Description |
---|---|
-b N, --buffer-size N | Buffer size (in bytes) to allocate when iterating over a directory. The default is 32768, the same value glibc uses; you probably want to increase it. Try starting with 16777216 (16 MiB). Best performance is achieved when the buffer size is a multiple of the file system block size. |
-o NAME, --output-format NAME | Output format (for example, csv). |
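One way to act on the advice about matching the buffer to the file system block size is to round the requested size up to the next multiple of the block size reported by `os.statvfs`. This is a sketch, not part of the package: `aligned_buffer_size` is a hypothetical helper, and using `f_bsize` as "the file system block" is an assumption.

```python
import os


def aligned_buffer_size(path, requested=16 * 1024 * 1024):
    """Round `requested` up to a multiple of the block size at `path`."""
    block = os.statvfs(path).f_bsize
    # Ceiling division, then scale back up to bytes.
    return -(-requested // block) * block
```

The result can be passed as the value for `-b`, or as the second argument to `getdents`.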
Exit codes:
- 3 - Requested buffer is too large.
- 4 - PATH not found.
- 5 - PATH is not a directory.
- 6 - Not enough permissions to read contents of the PATH.
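When the command is driven from a script, the exit codes listed above can be turned into readable errors. The sketch below is illustrative only: `describe_exit_code` and `list_directory` are hypothetical helpers, the messages are copied from the list above, and treating 0 as success is an assumption based on the usual convention.

```python
import subprocess

# Messages copied from the exit code list above.
EXIT_CODE_MESSAGES = {
    3: "Requested buffer is too large",
    4: "PATH not found",
    5: "PATH is not a directory",
    6: "Not enough permissions to read contents of the PATH",
}


def describe_exit_code(code):
    """Return a readable message for an exit code."""
    if code == 0:
        return "success"  # assumed, by convention
    return EXIT_CODE_MESSAGES.get(code, f"unknown exit code {code}")


def list_directory(path, command="python-getdents"):
    """Run the command on `path` and return its output lines."""
    result = subprocess.run([command, path], capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(describe_exit_code(result.returncode))
    return result.stdout.splitlines()
```

Splitting on line breaks assumes that no file name contains a newline, which is not guaranteed on Linux.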
python-getdents /path/to/large/dir
python -m getdents /path/to/large/dir
python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv