[FEA] Dask Array Support for rsc.pp.scrublet: A Straightforward implementation #388

Right now, rsc.pp.scrublet doesn't support Dask arrays, and (at least as far as I know) there is a relatively straightforward path to implementing that support.

Background:
1. Scrublet only needs to run within a sample or batch. Which batch each cell belongs to is passed to the function via 'batch_key'.
2. Batches typically contain fewer than 100,000 cells, and samples fewer than 10,000, so a single batch or sample fits within a typical GPU's memory.
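A rough back-of-the-envelope check of point 2. This sketch assumes, as a pessimistic stand-in, that scrublet's working matrix is a dense float32 array over a hypothetical 2,000 genes, and that twice as many simulated doublets as observed cells are added (sim_doublet_ratio=2.0, scanpy's default). Real usage depends on sparsity, gene selection and intermediate buffers, so treat the numbers as an order-of-magnitude estimate only.

```python
def scrublet_dense_gib(
    n_cells: int,
    n_genes: int,
    sim_doublet_ratio: float = 2.0,
    bytes_per_value: int = 4,
) -> float:
    """Rough GiB needed to hold observed + simulated cells as a dense matrix.

    Assumption: a dense float32 matrix (4 bytes per value) is a pessimistic
    stand-in for scrublet's real working set, which is usually sparse.
    """
    # Observed cells plus simulated doublets (sim_doublet_ratio per observed cell).
    n_rows = n_cells * (1 + sim_doublet_ratio)
    return n_rows * n_genes * bytes_per_value / 2**30


# Hypothetical gene count of 2,000 after feature selection.
print(f"batch  (100k cells): {scrublet_dense_gib(100_000, 2_000):.2f} GiB")
print(f"sample (10k cells):  {scrublet_dense_gib(10_000, 2_000):.2f} GiB")
```

This prints 2.24 GiB for a 100k-cell batch and 0.22 GiB for a 10k-cell sample, which is small compared with the memory of common data-center GPUs even after allowing for intermediates.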

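Implementation concept:
1. Check whether the AnnData object holds a Dask array. If so, require that a 'batch_key' be provided.
2. Rechunk the Dask array by 'batch_key' so that each batch becomes its own chunk (one Dask block per batch).
3. Run scrublet in memory on each batch, one chunk per GPU (.compute_chunk_sizes() may be needed first if subsetting by batch leaves the chunk sizes unknown).
4. Save the results in obs as usual.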
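A minimal sketch of steps 1 to 4, assuming the batch labels are available as an in-memory NumPy array. `scrublet_by_batch` and `score_fn` are hypothetical names: `score_fn` stands in for the in-memory scrublet call and is not part of the rapids-singlecell API. Rows are sorted by batch and rechunked so that each Dask block holds exactly one batch; `map_blocks` then scores each block independently, which could let a distributed scheduler (for example one GPU worker per block) run batches in parallel.

```python
import numpy as np
import dask.array as da


def scrublet_by_batch(X, batch_labels, score_fn):
    """Score each batch of a Dask array independently and return per-cell scores.

    Hypothetical sketch: `score_fn` stands in for an in-memory scrublet call.
    It receives one batch as an in-memory 2-D array (n_cells_in_batch, n_genes)
    and must return one score per cell as a 1-D array.
    """
    # Step 1: a Dask-backed matrix is not scored as a whole, so a batch key is required.
    if batch_labels is None:
        raise ValueError("batch_key is required when X is a Dask array")
    X = da.asarray(X)
    batch_labels = np.asarray(batch_labels)

    # Step 2: sort rows by batch, then rechunk so each block holds exactly one batch.
    order = np.argsort(batch_labels, kind="stable")
    _, counts = np.unique(batch_labels[order], return_counts=True)
    X_sorted = X[order].rechunk((tuple(counts.tolist()), -1))

    # Step 3: each block (= one batch) is materialized and scored on its own;
    # the scheduler is free to run these blocks in parallel.
    scores_sorted = X_sorted.map_blocks(
        score_fn,
        drop_axis=1,  # the gene axis disappears: one score per cell
        meta=np.array((), dtype=np.float64),
    )

    # Step 4: undo the sort so scores line up with the original rows (obs order).
    scores = np.empty(X.shape[0], dtype=np.float64)
    scores[order] = scores_sorted.compute()
    return scores
```

In rapids-singlecell terms, one would pass adata.X, the batch column of adata.obs converted to NumPy, and a wrapper around the in-memory scrublet call, then assign the result to an obs column (the column name is up to the implementation). Because the labels are an in-memory NumPy array, every chunk size stays known, so .compute_chunk_sizes() is not needed in this variant. The reorder via X[order] shuffles rows across chunks; if the data are already stored grouped by batch, that step is cheap.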
