This repository contains general Python utilities commonly used in the RXN universe.
For utilities related to chemistry, see our other repository rxn-chemutils.
This package is supported on all operating systems. It has been tested on the following systems:
- macOS: Big Sur (11.1)
- Linux: Ubuntu 18.04.4
A Python version of 3.6 or greater is recommended.
The package can be installed from PyPI:

pip install rxn-utils

For local development, the package can be installed with:

pip install -e ".[dev]"

The package provides the following file-related utilities:

- load_list_from_file: read a file into a list of strings.
- iterate_lines_from_file: same as load_list_from_file, but produces an iterator instead of a list. This can be much more memory-efficient for large files.
- dump_list_to_file and append_to_file: write an iterable of strings to a file (one per line).
- named_temporary_path and named_temporary_directory: provide a context with a file or directory that will be deleted when the context closes. Useful for unit tests.

>>> with named_temporary_path() as temporary_path:
...     # do something on the temporary path.
...     # The file or directory at that path will be deleted at the
...     # end of the context, except if delete=False.
...     pass
- ... and others.
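To illustrate the difference between loading a file into a list and iterating over its lines, here is a minimal sketch using only the standard library. The functions load_list_sketch, iterate_lines_sketch and dump_list_sketch are hypothetical stand-ins, not the rxn-utils implementations; in particular, stripping the trailing newline from each line is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Iterator, List, Union

PathLike = Union[str, Path]


def load_list_sketch(filename: PathLike) -> List[str]:
    """Hypothetical: read the whole file into memory, one string per line."""
    with open(filename, "rt") as f:
        # Assumption: trailing newlines are stripped from each line.
        return [line.rstrip("\n") for line in f]


def iterate_lines_sketch(filename: PathLike) -> Iterator[str]:
    """Hypothetical: yield lines lazily; only the current line is held in memory."""
    with open(filename, "rt") as f:
        for line in f:
            yield line.rstrip("\n")


def dump_list_sketch(
    values: Iterable[str], filename: PathLike, append: bool = False
) -> None:
    """Hypothetical: write one string per line; append=True adds to an existing file."""
    mode = "at" if append else "wt"
    with open(filename, mode) as f:
        for value in values:
            f.write(f"{value}\n")


# Usage: the temporary directory is removed when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.txt"
    dump_list_sketch(["a", "b"], path)
    dump_list_sketch(["c"], path, append=True)
    print(load_list_sketch(path))  # ['a', 'b', 'c']
    for line in iterate_lines_sketch(path):
        print(line)
```

The list version holds every line at once, whereas the generator version keeps only one line in memory, which is the trade-off described above for load_list_from_file and iterate_lines_from_file.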
For CSV files, the package offers the following:

- The function iterate_csv_column and the related executable rxn-extract-csv-column provide an easy way to extract a single column from a CSV file.
- The StreamingCsvEditor class allows for applying a series of operations to a CSV file without loading it fully into memory. It is used, for instance, in rxn-reaction-preprocessing. See a few examples in the unit tests.
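A minimal sketch of the streaming idea behind column extraction and CSV editing, using Python's csv module. The names iterate_column_sketch and add_column_sketch are hypothetical and do not reproduce the API of iterate_csv_column or StreamingCsvEditor; they only show how rows can be processed one at a time so that memory use does not grow with the file size.

```python
import csv
import io
from typing import Callable, Iterator, TextIO


def iterate_column_sketch(stream: TextIO, column: str) -> Iterator[str]:
    """Hypothetical: yield the values of one column, reading one row at a time."""
    for row in csv.DictReader(stream):
        yield row[column]


def add_column_sketch(
    src: TextIO,
    dst: TextIO,
    column: str,
    new_column: str,
    fn: Callable[[str], str],
) -> None:
    """Hypothetical: append a column computed from an existing one, row by row."""
    reader = csv.reader(src)
    writer = csv.writer(dst)
    header = next(reader)
    index = header.index(column)
    writer.writerow(header + [new_column])
    for row in reader:
        # Only the current row is in memory; it is written out immediately.
        writer.writerow(row + [fn(row[index])])


# Usage with in-memory streams; for real files, open them with newline="".
data = "id,smiles\n1,CCO\n2,C\n"
print(list(iterate_column_sketch(io.StringIO(data), "smiles")))  # ['CCO', 'C']

out = io.StringIO()
add_column_sketch(io.StringIO(data), out, "smiles", "length", lambda s: str(len(s)))
print(out.getvalue())
```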
For reproducible shuffling, or for shuffling two files of identical length so that the same permutation is obtained, one can use the stable_shuffle function.
The executable rxn-stable-shuffle is also provided for this purpose.
Both also work with CSV files if the appropriate flag is provided.
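One way to obtain a reproducible permutation that is identical for two files of the same length is to shuffle a list of indices with a seeded random generator, since the result then depends only on the seed and the length. The sketch below assumes this approach; stable_shuffle_sketch is a hypothetical helper working on in-memory lists, and it is not necessarily how stable_shuffle is implemented.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def stable_shuffle_sketch(items: Sequence[T], seed: int) -> List[T]:
    """Hypothetical: shuffle a copy of items; the permutation depends only on
    the seed and len(items), so equal-length inputs are permuted identically."""
    permutation = list(range(len(items)))
    # A dedicated Random instance leaves the global random state untouched.
    random.Random(seed).shuffle(permutation)
    return [items[i] for i in permutation]


sources = ["src 1", "src 2", "src 3", "src 4"]
targets = ["tgt 1", "tgt 2", "tgt 3", "tgt 4"]
shuffled_sources = stable_shuffle_sketch(sources, seed=42)
shuffled_targets = stable_shuffle_sketch(targets, seed=42)
# The pairs stay aligned after shuffling.
for src, tgt in zip(shuffled_sources, shuffled_targets):
    print(src, "->", tgt)
```

Because both calls use the same seed on sequences of equal length, line i of the shuffled sources still corresponds to line i of the shuffled targets (for a given Python version).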
For batching an iterable into lists of a specified size, chunker comes in handy.
It also does so in a memory-efficient way.
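A memory-efficient chunker can be built on itertools.islice, which pulls at most chunk_size items from the underlying iterator at a time. The hypothetical chunker_sketch below illustrates this idea and is not necessarily the library's implementation; the library's own usage example follows it.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def chunker_sketch(iterable: Iterable[T], chunk_size: int) -> Iterator[List[T]]:
    """Hypothetical: yield lists of at most chunk_size items, consuming the input lazily."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(iterable)
    while True:
        # islice takes at most chunk_size items without materializing the rest.
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


# Works on very long iterables, since only one chunk is in memory at a time.
first_chunk = next(chunker_sketch((i * i for i in range(10**12)), chunk_size=3))
print(first_chunk)  # [0, 1, 4]
```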
>>> from rxn.utilities.containers import chunker
>>> for chunk in chunker(range(1, 10), chunk_size=4):
... print(chunk)
[1, 2, 3, 4]
[5, 6, 7, 8]
[9]

remove_duplicates (or iterate_unique_values, its memory-efficient variant) removes duplicates from a container while preserving the order of first occurrence; optionally, a key callable determines which elements count as duplicates:
>>> from rxn.utilities.containers import remove_duplicates
>>> remove_duplicates([3, 6, 9, 2, 3, 1, 9])
[3, 6, 9, 2, 1]
>>> remove_duplicates(["ab", "cd", "efg", "hijk", "", "lmn"], key=lambda x: len(x))
['ab', 'efg', 'hijk', '']

Other utilities include:

- regex.py provides a few functions that make it easier to build regex strings (handling whether segments should be optional, capturing, etc.).
- RxnEnum, a custom, more general enum class.
- remove_prefix and remove_postfix, to strip a prefix or a postfix from a string.
- logging.py, for initializing loggers in a logging-compatible way.
- sandboxed_random_context and temporary_random_seed, to create a context with a specific random state that will not have side effects. Especially useful for unit tests.
- ... and others.
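The idea behind a side-effect-free random context can be sketched with random.getstate and random.setstate: save the global state, seed it, run the block, then restore the saved state. This hypothetical temporary_random_seed_sketch only covers the standard-library random module; whether the rxn-utils functions also handle other generators (for example NumPy's) is not covered here.

```python
import random
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def temporary_random_seed_sketch(seed: int) -> Iterator[None]:
    """Hypothetical: seed the global `random` module for the duration of the block,
    then restore whatever state it had before (standard library only)."""
    saved_state = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved_state)


random.seed(0)
state_before = random.getstate()
with temporary_random_seed_sketch(123):
    first = random.random()
with temporary_random_seed_sketch(123):
    second = random.random()

print(first == second)  # True: same seed, same draw
print(random.getstate() == state_before)  # True: no side effect outside the block
```

Restoring the state in a finally clause means that even a failing test leaves the global random state as it found it.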