Commit 04595de

LiLi
authored and committed
fix typoes and formatting errors
1 parent 7373401 commit 04595de

1 file changed

+26
-26
lines changed

datasets/depmap/README.md

Lines changed: 26 additions & 26 deletions
@@ -1,49 +1,49 @@
1-
# Cancer Dependency Map Dataset
1+
# Cancer Dependency Map Dataset
22

3-
This folder contains the processed data and the scripts for data analysis and preparation on datasets from The Cancer Dependency Map, an initiative led by the Broad Institute to provide large-scale omics data in identifying cancer dependencies/vulnerabilities.
3+
This folder contains the processed data and the scripts for data analysis and preparation on datasets from The Cancer Dependency Map, an initiative led by the Broad Institute to provide large-scale omics data in identifying cancer dependencies/vulnerabilities.
44

5-
You can read more about DepMap and the projects included here: https://www.broadinstitute.org/cancer/cancer-dependency-map
5+
You can read more about DepMap and the projects included here: https://www.broadinstitute.org/cancer/cancer-dependency-map
66

7-
## Raw Data
8-
You can visit the DepMap all data downloads portal at: https://depmap.org/portal/data_page/?tab=allData
9-
Download the following datasets under the primary files section of DepMap and move them to a directory named `raw` that you create. The dataset descriptions from the website are also included:
7+
## Raw Data
8+
You can visit the DepMap all data downloads portal at: https://depmap.org/portal/data_page/?tab=allData
9+
Download the following datasets under the primary files section of DepMap and move them to a directory named `raw` that you create. The dataset descriptions from the website are also included:
1010

1111
Currently used files:
1212

13-
- `OmicsProfiles.csv`: Omics metadata and ID mapping information for files indexed by Profile ID. This dataset is used for mapping cell line names to DepMap model IDs as a basis for data processing. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsProfiles.csv)
14-
- `CRISPRGeneDependency.csv`: Gene dependency probability estimates for all models in the integrated gene effect. This dataset is used to identify gold standard genes in each cell line, a dependency probability cutoff of 0.5 is currently used to get the genes with considerable impact on the cell line. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=CRISPRGeneDependency.csv)
15-
- `OmicsSomaticMutationsMatrixDamaging.csv`: Genotyped matrix determining for each cell line whether each gene has at least one damaging mutation. A variant is considered a damaging mutation if LikelyLoF == True. (0 == no mutation; If there is one or more damaging mutations in the same gene for the same cell line, the allele frequencies are summed, and if the sum is greater than 0.95, a value of 2 is assigned and if not, a value of 1 is assigned.). This dataset is used to prepare the input prize file. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutationsMatrixDamaging.csv)
13+
- `OmicsProfiles.csv`: Omics metadata and ID mapping information for files indexed by Profile ID. This dataset is used for mapping cell line names to DepMap model IDs as a basis for data processing. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsProfiles.csv)
14+
- `CRISPRGeneDependency.csv`: Gene dependency probability estimates for all models in the integrated gene effect. This dataset is used to identify gold standard genes in each cell line, a dependency probability cutoff of 0.5 is currently used to get the genes with considerable impact on the cell line. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=CRISPRGeneDependency.csv)
15+
- `OmicsSomaticMutationsMatrixDamaging.csv`: Genotyped matrix determining for each cell line whether each gene has at least one damaging mutation. A variant is considered a damaging mutation if LikelyLoF == True. (0 == no mutation; If there is one or more damaging mutations in the same gene for the same cell line, the allele frequencies are summed, and if the sum is greater than 0.95, a value of 2 is assigned and if not, a value of 1 is assigned.). This dataset is used to prepare the input prize file. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutationsMatrixDamaging.csv)
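
The 0.5 dependency probability cutoff described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy table, the `gold_standard_genes` helper, the model IDs, and the assumed layout (model IDs in the first column, one `SYMBOL (number)` column per gene) are all hypothetical, and whether the cutoff is inclusive is also an assumption.

```python
import csv
import io

# Toy stand-in for CRISPRGeneDependency.csv. The layout (model IDs in the
# first column, one "SYMBOL (number)" column per gene) is an assumption,
# and the gene names and model IDs are placeholders.
TOY_DEPENDENCY_CSV = """ModelID,GENE_A (1),GENE_B (2),GENE_C (3)
ACH-000001,0.91,0.12,0.50
ACH-000002,0.05,0.77,
"""


def gold_standard_genes(csv_text, model_id, cutoff=0.5):
    """Return the gene columns whose dependency probability meets the cutoff.

    Treating the cutoff as inclusive (>=) is an assumption.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    id_column = reader.fieldnames[0]
    for row in reader:
        if row[id_column] != model_id:
            continue
        selected = []
        for gene, value in row.items():
            # Skip the ID column and any missing probabilities.
            if gene == id_column or value is None or value.strip() == "":
                continue
            if float(value) >= cutoff:
                selected.append(gene)
        return selected
    raise KeyError(f"model ID not found: {model_id}")


print(gold_standard_genes(TOY_DEPENDENCY_CSV, "ACH-000001"))
```

In practice the model ID would first be looked up from the cell line name via `OmicsProfiles.csv`; that lookup is omitted here because its column names are not stated in this README.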
1616

17-
Future extension files:
17+
Future extension files:
1818

19-
- `OmicsExpressionProteinCodingGenesTPMLogp1.csv`: Model-level TPMs derived from Salmon v1.10.0 (Patro et al 2017) Rows: Model IDs Columns: Gene names. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsExpressionProteinCodingGenesTPMLogp1.csv)
20-
- `OmicsCNGeneWGS.csv`: Gene-level copy number data inferred from WGS data only. Additional copy number datasets are available for download as part of the full DepMap Data Release. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsCNGeneWGS.csv)
19+
- `OmicsExpressionProteinCodingGenesTPMLogp1.csv`: Model-level TPMs derived from Salmon v1.10.0 (Patro et al 2017) Rows: Model IDs Columns: Gene names. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsExpressionProteinCodingGenesTPMLogp1.csv)
20+
- `OmicsCNGeneWGS.csv`: Gene-level copy number data inferred from WGS data only. Additional copy number datasets are available for download as part of the full DepMap Data Release. (file URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsCNGeneWGS.csv)
2121

2222

2323
## Scripts
2424
Currently this contains only the Jupyter notebook used to analyze the dependency data and run the data processing locally to produce the input prize file and gold standards. It should be reproducible for any cell line name, but is not yet organized or refined for GitHub.
25-
- `OmicsProfiles.csv` used for mapping cell line names to DepMap model IDs.
26-
- `OmicsSomaticMutationsMatrixDamaging.csv` used for preparing prize input file.
25+
- `OmicsProfiles.csv` used for mapping cell line names to DepMap model IDs.
26+
- `OmicsSomaticMutationsMatrixDamaging.csv` used for preparing prize input file.
2727
- `CRISPRGeneDependency.csv` used for preparing gold standard output.
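
For reference when reading the damaging mutations matrix, the 0/1/2 encoding quoted in the Raw Data section can be restated as a small function. This only mirrors the dataset description; `damaging_matrix_value` is a hypothetical helper and is not part of DepMap or of the notebook.

```python
def damaging_matrix_value(allele_frequencies, threshold=0.95):
    """Encode one gene in one cell line following the dataset description.

    0 means no damaging mutation; otherwise the allele frequencies are
    summed and the value is 2 if the sum exceeds the threshold, else 1.
    """
    if not allele_frequencies:
        return 0
    return 2 if sum(allele_frequencies) > threshold else 1


for frequencies in ([], [0.4], [0.5, 0.5]):
    print(frequencies, damaging_matrix_value(frequencies))
```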
2828

29-
## Processed Data
30-
Files used for UniProt ID mapping:
29+
## Processed Data
30+
Files used for UniProt ID mapping:
3131
- `DamagingMutationsGeneSymbols_20250718.csv`: Gene symbols parsed from gene columns in `OmicsSomaticMutationsMatrixDamaging.csv` on the date described
3232
- `DamagingMutations_idMapping_20250718.tsv`: Gene symbols from `DamagingMutationsGeneSymbols_20250718.csv` mapped to UniProt IDs using UniProt Web Service on the date described
33-
- Folder of processed data for an attempt to do UniProt mapping with the gene index numbers instead, got stuck due to duplicate matches for the same gene number. A future step could be referring to the original mutations file (OmicsSomaticMutations.csv on DepMap, URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutations.csv) for gene numbers with duplicate matches and do exact matches by seeing where the mutation is located and get more accurate mappings. Contains preliminary processed data (all as of 07/24/2025):
33+
- Folder of processed data for an attempt to do UniProt mapping with the gene index numbers instead, got stuck due to duplicate matches for the same gene number. A future step could be referring to the original mutations file (OmicsSomaticMutations.csv on DepMap, URL: https://depmap.org/portal/data_page/?tab=allData&releasename=DepMap%20Public%2025Q2&filename=OmicsSomaticMutations.csv) for gene numbers with duplicate matches and do exact matches by seeing where the mutation is located and get more accurate mappings. Contains preliminary processed data (all as of 07/24/2025):
3434
- `gene_index_mapping_attempt\gene_numbers.txt`: Gene index numbers parsed from gene columns in `OmicsSomaticMutationsMatrixDamaging.csv`
35-
- `raw_uniprot_idmapping_2025_07_24.tsv`: Initial mapping results, contains both reviewed and unreviewed results, wasn't able to filter directly on UniProt Web Service due to volume
36-
- `reviewed_id_mapping_2025_07_24.tsv`: Filtered mapping results to only reviewed matches
37-
- `duplicated_mapping_entries.tsv`: Gene index numbers with duplicate matches
35+
- `raw_uniprot_idmapping_2025_07_24.tsv`: Initial mapping results, contains both reviewed and unreviewed results, wasn't able to filter directly on UniProt Web Service due to volume
36+
- `reviewed_id_mapping_2025_07_24.tsv`: Filtered mapping results to only reviewed matches
37+
- `duplicated_mapping_entries.tsv`: Gene index numbers with duplicate matches
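
The reviewed-only filtering and the duplicate detection described above could look roughly like this. It is a sketch under stated assumptions: the column names (`From`, `Entry`, `Reviewed`), the toy rows, and the `reviewed_and_duplicates` helper are hypothetical, and the identifiers are placeholders rather than real mapping results.

```python
import csv
import io
from collections import defaultdict

# Toy mapping table. The column names and all identifiers are assumptions
# made for illustration only.
TOY_MAPPING_TSV = (
    "From\tEntry\tReviewed\n"
    "101\tP00001\treviewed\n"
    "101\tA0A000\tunreviewed\n"
    "202\tP00002\treviewed\n"
    "202\tP00003\treviewed\n"
)


def reviewed_and_duplicates(tsv_text):
    """Keep reviewed matches, then split gene numbers into unique and duplicated."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    by_gene = defaultdict(list)
    for row in reader:
        if row["Reviewed"] == "reviewed":
            by_gene[row["From"]].append(row["Entry"])
    # Gene numbers with exactly one reviewed match map cleanly.
    unique = {gene: entries[0] for gene, entries in by_gene.items() if len(entries) == 1}
    # Gene numbers with several reviewed matches need manual resolution.
    duplicated = {gene: entries for gene, entries in by_gene.items() if len(entries) > 1}
    return unique, duplicated


unique, duplicated = reviewed_and_duplicates(TOY_MAPPING_TSV)
print(unique)
print(duplicated)
```

Gene numbers that land in `duplicated` are the ones that would need the follow-up against the original mutations file.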
3838

39-
Started processing with the FADU cell line:
40-
- Input prize file prepared from the damaging mutations dataset
39+
Started processing with the FADU cell line:
40+
- Input prize file prepared from the damaging mutations dataset
4141
- Gold standard file prepared from the CRISPR gene dependency dataset
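
One possible shape for the prize file step is sketched below. Several things here are assumptions rather than the notebook's method: that gene columns look like `SYMBOL (number)`, that the matrix value itself is used as the prize, that the output header is `NODEID` and `prize`, and that genes without a UniProt mapping are dropped. The helper names and identifiers are hypothetical.

```python
import re

# Assumed column label format: "SYMBOL (number)".
GENE_LABEL = re.compile(r"^(?P<symbol>\S+) \((?P<number>\d+)\)$")


def parse_gene_label(label):
    """Split a gene column label into its symbol and index number."""
    match = GENE_LABEL.match(label)
    if match is None:
        raise ValueError(f"unexpected gene label: {label!r}")
    return match["symbol"], int(match["number"])


def prize_lines(matrix_row, symbol_to_uniprot):
    """Build tab-separated prize lines for one cell line.

    Using the matrix value directly as the prize is an assumption.
    """
    lines = ["NODEID\tprize"]
    for label, value in matrix_row.items():
        if value <= 0:
            continue  # no damaging mutation in this gene
        symbol, _ = parse_gene_label(label)
        uniprot_id = symbol_to_uniprot.get(symbol)
        if uniprot_id is None:
            continue  # unmapped genes are skipped in this sketch
        lines.append(f"{uniprot_id}\t{value}")
    return lines


# Placeholder row and mapping, not real data.
toy_row = {"GENE_A (1)": 2, "GENE_B (2)": 0, "GENE_C (3)": 1, "GENE_D (4)": 1}
toy_mapping = {"GENE_A": "P00001", "GENE_C": "P00003"}
print("\n".join(prize_lines(toy_row, toy_mapping)))
```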
4242

43-
## Config
44-
Example config file used to get preliminary results on OmicsIntegrator1 and 2 following the EGFR dataset example. Will test out more parameters and update.
45-
The input edge file for the background network can be obtained from the SPRAS repo `input/phosphosite-irefindex13.0-uniprot.txt`
43+
## Config
44+
Example config file used to get preliminary results on OmicsIntegrator1 and 2 following the EGFR dataset example. Will test out more parameters and update.
45+
The input edge file for the background network can be obtained from the SPRAS repo `input/phosphosite-irefindex13.0-uniprot.txt`
4646

47-
## Release Citation
47+
## Release Citation
4848
For DepMap Release data, including CRISPR Screens, PRISM Drug Screens, Copy Number, Mutation, Expression, and Fusions:
4949
DepMap, Broad (2025). DepMap Public 25Q2. Dataset. depmap.org
