Conversation

@gli527 (Contributor) commented on Jul 30, 2025

Initial addition of the DepMap dataset development for SPRAS so far; more changes to come.

…ript, and README file describing each folder and raw data download directions
@tristan-f-r added the "dataset" label (Mutating datasets in any way) on Jul 30, 2025
@tristan-f-r (Contributor) left a comment

Great to see another dataset! There seem to be some extra files left over in gene_index_mapping_attempt, and I would love to see this pipeline represented as a series of Python scripts instead of a Jupyter notebook, so that we can always reproduce it on CI and avoid the script decay issues we ran into with the HIV, yeast-osmotic-stress, and responsenet datasets.

We're working on this as well in #39 - we don't have any strong examples of reproducibility yet, unfortunately. #25 is the closest to this, but it's locked under a PR and isn't particularly strong in documentation.
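
As a rough illustration of the notebook-to-scripts idea (everything below is hypothetical: the file paths, column names, and threshold are placeholders, not files from this PR), each notebook step could become an importable function with a small CLI entry point that a CI job can invoke on a small test input:

```python
"""Hypothetical sketch: one DepMap processing step refactored from a notebook cell
into a script that CI can run. Paths, column names, and the threshold are placeholders."""
import argparse

import pandas as pd


def filter_dependency_scores(raw: pd.DataFrame, threshold: float) -> pd.DataFrame:
    """Keep gene/cell-line rows whose dependency score falls below the threshold."""
    return raw[raw["dependency_score"] < threshold]


def main() -> None:
    parser = argparse.ArgumentParser(description="Process a raw DepMap download.")
    parser.add_argument("--input", required=True, help="Raw DepMap CSV")
    parser.add_argument("--output", required=True, help="Filtered CSV to write")
    parser.add_argument("--threshold", type=float, default=-0.5)
    args = parser.parse_args()

    raw = pd.read_csv(args.input)
    filter_dependency_scores(raw, args.threshold).to_csv(args.output, index=False)


if __name__ == "__main__":
    main()
```

A CI workflow could then run such a script on a small checked-in test input and compare the output against an expected file, which is what would catch the script decay described above.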

@agitter (Collaborator) left a comment

> I would love to see this pipeline represented as a series of Python scripts instead of a Jupyter notebook, so that we can always reproduce it on CI

My guidance was to first focus on transferring private files to the public repo and documenting what they do (absolutely essential), and second to work on reproducibility. The first needs to happen before @gli527's summer project ends. The second will ideally make some progress before then as well, but if it doesn't, others on the SPRAS team can contribute to it afterward.

I'm leaving initial comments on the overall structure but haven't reviewed the notebook closely yet.

@gli527 (Contributor, Author) commented on Jul 30, 2025

I fixed some files and scripts based on the feedback, but I wasn't able to review all of my scripts and fix the automatic changes yet. I will keep adding updates; I just wanted to share my progress.

@agitter (Collaborator) commented on Aug 5, 2025

We are close to having an initial version ready to merge. My goal for the first pull request is to have a notebook and readme that documents everything that was done with this dataset as well as a preliminary script that conducts a parallel automated version of that analysis. The DepMap dataset is still a work in progress. Over the following months, one of us will continue to update the notebook to explore how to include additional cell lines and datasets from DepMap. Once we finalize those decisions, we can update the script accordingly. At the very end, we can decide how to deprecate or archive the notebook and have the script reproduce everything.

There are still a few comments to resolve about YAML formatting (let me know if that is tricky and I will fix it) and about raising errors in the script.

@tristan-f-r (Contributor) commented on Aug 5, 2025

The YAML changes can all be reverted: one looks a little nicer, but yamlfmt wasn't doing that automatically. (Perhaps it was VSCode's YAML formatter?)

It would be ideal if we could stop committing large processed files (I've already enabled squash merging only, as the git history was starting to climb into the megabytes). Since the scripts here do work, we can push some commits to Snakemakeify them and remove the processed files.

@agitter (Collaborator) commented on Aug 8, 2025

> It would be ideal if we could stop committing large processed files

We're lacking a working scratch space for new datasets. Once a dataset is stable, we don't need large processed files in the repo and they waste space. However, when a new dataset is being explored, I find it helpful to have intermediate outputs to understand the dataset and review code outputs. That could happen in yet another repo, but that risks scattering our work even further. I'm open to suggestions.

@tristan-f-r (Contributor) commented

Keeping intermediary files in PRs is perfectly fine since the intermediate commits don't end up in the main branch history. We can also keep working in PRs as long as anyone who wants to work on a dataset has write access to that branch (or opens a superseding PR, like with #25).

@agitter (Collaborator) commented on Aug 23, 2025

> Keeping intermediary files in PRs is perfectly fine since the intermediate commits don't end up in the main branch history.

Was that a configuration change made at some point during the summer? I see that the default is to squash and merge but don't remember that always being the case.

@tristan-f-r mentioned this pull request on Dec 26, 2025
@tristan-f-r (Contributor) left a comment

All processing happens automatically, and Gene IDs are preferred in mapping.
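
For readers outside the PR, a minimal sketch of what "preferring Gene IDs in mapping" could look like with pandas (the column names and example values are assumptions, not taken from this PR's actual files):

```python
"""Illustrative sketch: when a gene has both a symbol and a numeric Gene ID,
prefer the Gene ID and fall back to the symbol only when the ID is missing.
Column names and example values are hypothetical."""
import pandas as pd


def map_to_preferred_ids(genes: pd.DataFrame) -> pd.Series:
    """Return one identifier per row, preferring 'entrez_id' over 'symbol'."""
    preferred = genes["entrez_id"].astype("Int64").astype("string")
    return preferred.fillna(genes["symbol"])


if __name__ == "__main__":
    demo = pd.DataFrame(
        {"symbol": ["TP53", "BRAF", "NOVELGENE"], "entrez_id": [7157, 673, None]}
    )
    print(map_to_preferred_ids(demo))
```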

@tristan-f-r merged commit 487f68d into Reed-CompBio:main on Dec 30, 2025
3 checks passed