This repository was archived by the owner on Apr 25, 2023. It is now read-only.

enhanced parallelization #33

@bryantChhun

Description


Problem

Each module (run_patch.py, run_segmentation.py, etc.) launches a process to parallelize across sites, but we often have a large dataset for a given site (high Z, T, or C count) and are not able to parallelize within a site, e.g. across individual timepoints.

Possible solutions

Each module currently uses its own worker class and the Python multiprocessing library to spawn new processes. If we switch to a process pool (either concurrent.futures.ProcessPoolExecutor or multiprocessing.Pool), we can pass a list of parameters to the executor, which will spawn and manage the worker processes on its own.
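A minimal sketch of what that could look like, assuming a hypothetical per-site worker function `process_site` in place of the current worker classes (the function name and parameter tuples are placeholders, not the actual module API):

```python
from concurrent.futures import ProcessPoolExecutor


def process_site(site_params):
    """Hypothetical per-task worker; would wrap the existing
    per-site processing done by run_patch.py / run_segmentation.py."""
    site_name, timepoint = site_params
    # ... actual per-site / per-timepoint work would go here ...
    return f"{site_name}_t{timepoint}"


def run_all(sites, max_workers=4):
    # The pool spawns worker processes and distributes the parameter
    # list across them; executor.map preserves input order.
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(process_site, sites))


if __name__ == "__main__":
    params = [("A1", 0), ("A1", 1), ("B2", 0)]
    print(run_all(params))
```

Because the task list is just parameter tuples, the same pool could cover both across-site and within-site (per-timepoint) parallelism by flattening the (site, timepoint) combinations into one list.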

Questions

This touches on two questions:

  • What data structure will we use? If we keep a single array (currently .npy), it needs to support concurrent writes. Alternatively, should patches be written to individual files?
  • At exactly what level do we parallelize the data? Should it be possible at the finest level (Z, T, C)?
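One way the two questions interact: writing each patch to its own file sidesteps concurrent writes to a single .npy array entirely, which in turn makes finest-level (T, Z, C) parallelism straightforward. A hedged sketch, where `extract_patch` and the output file layout are hypothetical stand-ins for the real patch logic:

```python
import itertools
import os
from concurrent.futures import ProcessPoolExecutor

import numpy as np


def extract_patch(task):
    """Hypothetical finest-level task: one (t, z, c) index triple.
    Writes its result to its own file, so no shared-array locking
    or concurrent-write support is needed."""
    t, z, c, out_dir = task
    patch = np.zeros((64, 64))  # placeholder for the real extraction
    path = os.path.join(out_dir, f"patch_t{t}_z{z}_c{c}.npy")
    np.save(path, patch)
    return path


def run(shape_tzc, out_dir, max_workers=4):
    os.makedirs(out_dir, exist_ok=True)
    # Flatten the full (T, Z, C) index space into independent tasks.
    tasks = [(t, z, c, out_dir)
             for t, z, c in itertools.product(*map(range, shape_tzc))]
    with ProcessPoolExecutor(max_workers=max_workers) as executor:
        return list(executor.map(extract_patch, tasks))
```

The trade-off is file count: per-patch files scale as T x Z x C per site, so downstream consumers would need a loader that reassembles them (or a format like zarr that supports chunked concurrent writes natively).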
