Merged
29 commits
a08a841
Initial addition of all processed data, example config file, local sc…
Jul 30, 2025
cc2309c
applied changes on documentation and files based on feedback
Jul 30, 2025
4b64c64
Update datasets/depmap/README.md
gli527 Jul 31, 2025
7373401
fix typos and formatting errors
Jul 31, 2025
04595de
fix typoes and formatting errors
Jul 31, 2025
1a59454
fix typoes and formatting errors
Jul 31, 2025
e7ac55f
added python scripts for cell line processing
Aug 1, 2025
e63e9aa
updated readme for scripts
Aug 1, 2025
35d1fb9
added scripts for Uniprot mapping, updated readme and config file
Aug 2, 2025
81925ea
added error handling for cell line processing
gli527 Aug 18, 2025
b332c0a
Revert yaml and lint script
agitter Aug 23, 2025
91c8e0a
Merge branch 'main' into pr/41
tristan-f-r Dec 26, 2025
482815c
chore: bump spras
tristan-f-r Dec 26, 2025
8dbdf2a
Merge branch 'depmap' of https://github.com/gli527/spras-benchmarking…
tristan-f-r Dec 26, 2025
3c98a5c
chore(spras): bump [again!]
tristan-f-r Dec 26, 2025
d788e2b
style: fmt
tristan-f-r Dec 26, 2025
8daf02e
Merge branch 'main' into pr/41
tristan-f-r Dec 26, 2025
e90f384
chore: set up fetch
tristan-f-r Dec 27, 2025
4650a28
fix: more path changes
tristan-f-r Dec 27, 2025
e1ff0e5
chore: drop output date from uniprot mapping
tristan-f-r Dec 27, 2025
5bec9c9
chore: apply suggestions
tristan-f-r Dec 27, 2025
e6f5d53
chore: correct file names, more docs
tristan-f-r Dec 29, 2025
ba75406
feat: map uniprot through gene ids when available
tristan-f-r Dec 30, 2025
b19d925
fix: merge cellline_fadu config with dmmm
tristan-f-r Dec 30, 2025
745e41c
style: fmt
tristan-f-r Dec 30, 2025
dd4707a
fix: clear up irefindex interactome dependency from hiv
tristan-f-r Dec 30, 2025
8987a88
chore: add depmap to run_snakemake.sh
tristan-f-r Dec 30, 2025
c877384
fix: commas for process input
tristan-f-r Dec 30, 2025
d946192
fix: dmmm prefix
tristan-f-r Dec 30, 2025
2 changes: 1 addition & 1 deletion .github/workflows/publish.yml
Original file line number Diff line number Diff line change
@@ -57,11 +57,11 @@ jobs:
- name: Run Snakemake workflow for DMMMs
shell: bash --login {0}
run: snakemake --cores 4 --configfile configs/dmmm.yaml --show-failed-logs -s spras/Snakefile
# - name: Run Snakemake workflow for PRAs
# shell: bash --login {0}
# run: snakemake --cores 1 --configfile configs/pra.yaml --show-failed-logs -s spras/Snakefile
- name: Setup PNPM
# TODO: re-enable PRAs once RN/synthetic data PRs are merged.
# - name: Run Snakemake workflow for PRAs
uses: pnpm/action-setup@v4
with:
version: 10
2 changes: 2 additions & 0 deletions datasets/depmap/.gitignore
@@ -0,0 +1,2 @@
raw
processed
28 changes: 20 additions & 8 deletions datasets/depmap/README.md
@@ -8,21 +8,29 @@ You can read more DepMap and the projects included here: https://www.broadinstit
You can visit the DepMap all data downloads portal at: https://depmap.org/portal/data_page/?tab=allData
Download the following datasets from the primary files section and move them to the raw folder. The dataset descriptions from the website are also included:

Currently used files:

- OmicsProfiles.csv: Omics metadata and ID mapping information for files indexed by Profile ID.This dataset is used for mapping cell line names to DepMap model IDs as a basis for data processing. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsProfiles.csv)
- CRISPRGeneDependency.csv: Gene dependency probability estimates for all models in the integrated gene effect.
- 'OmicsProfiles.csv': Omics metadata and ID mapping information for files indexed by Profile ID. This dataset is used to map cell line names to DepMap model IDs as a basis for data processing. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsProfiles.csv)
- 'CRISPRGeneDependency.csv': Gene dependency probability estimates for all models in the integrated gene effect.
This dataset is used to identify gold standard genes in each cell line. A dependency probability cutoff of 0.5 is currently used to select the genes with considerable impact on the cell line. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=CRISPRGeneDependency.csv)
- OmicsCNGeneWGS.csv: Gene-level copy number data inferred from WGS data only.Additional copy number datasets are available for download as part of the full DepMap Data Release.(file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsCNGeneWGS.csv)
- OmicsSomaticMutationsMatrixDamaging.csv: Genotyped matrix determining for each cell line whether each gene has at least one damaging mutation. A variant is considered a damaging mutation if LikelyLoF == True. (0 == no mutation; If there is one or more damaging mutations in the same gene for the same cell line, the allele frequencies are summed, and if the sum is greater than 0.95, a value of 2 is assigned and if not, a value of 1 is assigned.). This dataset is used to prepare the input prize file. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutationsMatrixDamaging.csv)
- OmicsExpressionProteinCodingGenesTPMLogp1.csv:Model-level TPMs derived from Salmon v1.10.0 (Patro et al 2017) Rows: Model IDs Columns: Gene names. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsExpressionProteinCodingGenesTPMLogp1.csv)
- 'OmicsSomaticMutationsMatrixDamaging.csv': Genotyped matrix indicating, for each cell line, whether each gene has at least one damaging mutation. A variant is considered a damaging mutation if LikelyLoF == True. A value of 0 means no mutation. If there are one or more damaging mutations in the same gene for the same cell line, the allele frequencies are summed; a value of 2 is assigned if the sum is greater than 0.95, and a value of 1 otherwise. This dataset is used to prepare the input prize file. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutationsMatrixDamaging.csv)
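
The 0/1/2 encoding described for 'OmicsSomaticMutationsMatrixDamaging.csv' can be sketched as a small function. This is only an illustration of the rule as quoted in the description above, not DepMap's own code; `encode_damaging` is a hypothetical helper name, and the input is assumed to be the list of allele frequencies of the damaging variants found in one gene for one cell line.

```python
def encode_damaging(allele_frequencies):
    """Return the matrix value (0, 1, or 2) for one gene in one cell line.

    allele_frequencies: allele frequencies of the damaging (LikelyLoF)
    variants in that gene; an empty list means no damaging mutation.
    """
    if not allele_frequencies:
        return 0  # no damaging mutation
    # One or more damaging mutations: sum the allele frequencies and
    # assign 2 only when the sum is strictly greater than 0.95.
    return 2 if sum(allele_frequencies) > 0.95 else 1


# Illustrative values only, not taken from DepMap data.
examples = {
    "none": encode_damaging([]),
    "single_low": encode_damaging([0.4]),
    "two_high": encode_damaging([0.5, 0.5]),
}
```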

Future extension files:

- 'OmicsExpressionProteinCodingGenesTPMLogp1.csv': Model-level TPMs derived from Salmon v1.10.0 (Patro et al. 2017). Rows: Model IDs. Columns: Gene names. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsExpressionProteinCodingGenesTPMLogp1.csv)
- 'OmicsCNGeneWGS.csv': Gene-level copy number data inferred from WGS data only. Additional copy number datasets are available for download as part of the full DepMap Data Release. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsCNGeneWGS.csv)


## scripts
Currently only the Jupyter notebook file I used to analyze dependency data and do the data processing locally to get the input prize file and gold standards. Should be reproducible for any cell line name, but is not yet organized or refined for GitHub.
Currently contains only the Jupyter notebook used to analyze the dependency data and process the data locally to produce the input prize file and gold standards. It should be reproducible for any cell line name, but is not yet organized or refined for GitHub.
'OmicsProfiles.csv' is used for mapping cell line names to DepMap model IDs.
'OmicsSomaticMutationsMatrixDamaging.csv' is used for preparing the prize input file.
'CRISPRGeneDependency.csv' is used for preparing the gold standard output.
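
The lookup and gold standard steps described above might look roughly like the following, using only the standard library. This is a sketch under stated assumptions, not the notebook's actual code: the column names `CellLineName` and `ModelID`, the toy model ID, the gene names, and the choice of `>=` for the 0.5 cutoff are all assumptions made for illustration.

```python
import csv
import io


def model_id_for(profiles_csv, cell_line_name):
    """Look up the model ID for a cell line name (column names are assumed)."""
    for row in csv.DictReader(profiles_csv):
        if row["CellLineName"].upper() == cell_line_name.upper():
            return row["ModelID"]
    raise KeyError(f"cell line not found: {cell_line_name}")


def gold_standard_genes(dependency_csv, model_id, cutoff=0.5):
    """Return genes whose dependency probability meets the cutoff for one model."""
    reader = csv.DictReader(dependency_csv)
    id_column = reader.fieldnames[0]  # assumed: the first column holds model IDs
    for row in reader:
        if row[id_column] == model_id:
            return [
                gene
                for gene, value in row.items()
                if gene != id_column and value and float(value) >= cutoff
            ]
    raise KeyError(f"model not found: {model_id}")


# Toy in-memory stand-ins for the two CSV files (made-up IDs and values).
profiles = io.StringIO("ProfileID,ModelID,CellLineName\nPR-TOY,ACH-TOY001,FADU\n")
dependency = io.StringIO("ModelID,GENEA,GENEB,GENEC\nACH-TOY001,0.9,0.2,0.5\n")

gold = gold_standard_genes(dependency, model_id_for(profiles, "FaDu"))
```

With real files, the `io.StringIO` objects would be replaced by file handles opened on the downloaded CSVs in the raw folder.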

## processed data
Files used for Uniprot ID mapping:
- Gene symbols parsed
- Gene symbols mapped to Uniprot IDs
- 'DamamingMutationsGeneSymbols_20250718.csv': Gene symbols parsed from gene columns in 'OmicsSomaticMutationsMatrixDamaging.csv' on the date described
- 'DamagingMutations_idMapping_20250718.tsv': Gene symbols from 'DamamingMutationsGeneSymbols_20250718.csv' mapped to Uniprot IDs using Uniprot Web Service on the date described
- A folder of processed data from an attempt to do UniProt mapping with the gene index numbers instead. The attempt stalled because of duplicate matches for the same gene number. A possible future step is to refer to the original mutations file (OmicsSomaticMutations.csv on DepMap, URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutations.csv) for gene numbers with duplicate matches, and resolve them to exact matches by checking where the mutation is located.
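
Parsing gene symbols out of the gene columns could be sketched as below. The header format `SYMBOL (numeric ID)` is an assumption about the matrix columns and is not stated in this README; `split_gene_column` is a hypothetical helper name.

```python
import re

# Assumed header shape: a gene symbol, a space, then a numeric ID in parentheses.
GENE_COLUMN = re.compile(r"^(?P<symbol>\S+) \((?P<gene_id>\d+)\)$")


def split_gene_column(header):
    """Split a column header into (symbol, gene_id).

    Headers that do not match the assumed shape are returned unchanged
    with gene_id set to None, so non-gene columns pass through safely.
    """
    text = header.strip()
    match = GENE_COLUMN.match(text)
    if match is None:
        return text, None
    return match["symbol"], match["gene_id"]


parsed = [split_gene_column(h) for h in ["TP53 (7157)", "ModelID"]]
```

Keeping the numeric ID next to the symbol makes it possible to fall back to an ID-based lookup when a symbol alone maps ambiguously.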

Started processing with the FADU cell line:
@@ -31,4 +39,8 @@ Started processing with the FADU cell line:

## config
Example config file used to get preliminary results with OmicsIntegrator1 and OmicsIntegrator2, following the EGFR dataset example. More parameters will be tested and the file updated.
The input edge file for the background network can be obtained from the SPRAS repo at 'input/phosphosite-irefindex13.0-uniprot.txt'.

## Release Citation
For DepMap Release data, including CRISPR Screens, PRISM Drug Screens, Copy Number, Mutation, Expression, and Fusions:
DepMap, Broad (2025). DepMap Public 25Q2. Dataset. depmap.org
23 changes: 10 additions & 13 deletions datasets/depmap/config/cellline_fadu.yaml
@@ -6,22 +6,21 @@ hash_length: 7
singularity: false

algorithms:
-
name: pathlinker
- name: pathlinker
params:
include: false
run1:
k:
- 10
- 20
-
name: omicsintegrator1
- name: omicsintegrator1
params:
include: true
run1:
b:
- 0.55
- 2
- 4
- 7
- 10
d:
- 10
@@ -31,11 +30,12 @@ algorithms:
- 0.01
w:
- 0.1
- 0.2
mu:
- 0.008
- 0.01
dummy_mode: ["terminal"]
-
name: omicsintegrator2
- name: omicsintegrator2
params:
include: true
run1:
@@ -48,8 +48,7 @@ algorithms:
- 2
g:
- 3
-
name: meo
- name: meo
params:
include: false
run1:
@@ -59,8 +58,7 @@ algorithms:
- 3
rand_restarts:
- 10
-
name: domino
- name: domino
params:
include: false
run1:
@@ -69,8 +67,7 @@ algorithms:
module_threshold:
- 0.05
datasets:
-
data_dir: input
- data_dir: input
edge_files:
- phosphosite-irefindex13.0-uniprot.txt
label: cellline
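
Each parameter under a `run1` block in this config is given as a list of values. Assuming every combination of those values is run (how SPRAS actually expands them is not shown in this diff), the resulting grid can be sketched with `itertools.product`. `expand_grid` is a hypothetical helper, and the values below are illustrative rather than a statement of the final config.

```python
from itertools import product


def expand_grid(params):
    """Expand {name: [values, ...]} into one dict per parameter combination."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Two parameters with two values each give four combinations.
combos = expand_grid({"w": [0.1, 0.2], "mu": [0.008, 0.01]})
```

Under this assumption, adding values to a list multiplies the number of runs, which is worth keeping in mind when widening a parameter sweep.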