Commit 4018d0f

actually, lets trust the processed data
1 parent ad2cd4a commit 4018d0f

3 files changed: +19 -15 lines changed

datasets/egfr/README.md

Lines changed: 2 additions & 3 deletions
@@ -30,9 +30,8 @@ We want to generate the [egfr-prizes.txt](https://github.com/gitter-lab/tps/blob
 The script depends on three files, all inside the TPS [data/timeseries](https://github.com/koksal/tps/tree/bb58d6d89e24dbc39e976a02f1e31387dbe17dfb/data/timeseries)
 folder, which also depend on their own raw files.
 
-`firstfile` and `prevfile`, mapped to `p-values-first.tsv` and `p-values-prev.tsv` are processed from the raw data provided by first supplementary data in the TPS paper.
-
-We first get the raw data with `fetch-tps-data.py`, then TODO.
+`firstfile` and `prevfile`, mapped to `p-values-first.tsv` and `p-values-prev.tsv` are processed from the raw data provided by first supplementary data in the TPS paper. However, since the raw data inside the paper is unlikely
+to be updated with the same output format, we trust that the second supplementary data (or the processed data) is correct.
 
 `mapfile`, or `peptide-mapping.tsv`, is, as quoted by the TPS paper:
 > Obtained by mapping the UniProt accession number (e.g. P00533) to the UniProt ID (e.g. EGFR_HUMAN, also known as the UniProt entry name).
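
To illustrate the accession-to-ID mapping quoted above, here is a minimal sketch. It assumes a two-column, tab-separated layout (accession, then UniProt ID); the actual layout of `peptide-mapping.tsv` is not shown in this diff, and `load_accession_map` is a hypothetical helper, not part of the repository.

```python
import csv
import io


def load_accession_map(tsv_text: str) -> dict[str, str]:
    """Parse 'accession<TAB>UniProt ID' rows into a lookup dict.

    The two-column layout is an assumption made for illustration only.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # Skip blank or malformed rows that have fewer than two columns.
    return {row[0]: row[1] for row in reader if len(row) >= 2}


# The example pair comes from the quoted sentence in the README.
mapping = load_accession_map("P00533\tEGFR_HUMAN\n")
print(mapping["P00533"])  # EGFR_HUMAN
```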

datasets/egfr/Snakefile

Lines changed: 11 additions & 4 deletions
@@ -13,15 +13,22 @@ rule prizes:
 
 rule prizes_unsprased:
     input:
-        "raw/p-values-first.tsv",
-        "raw/p-values-prev.tsv",
+        "download/tps_input_pvals_first.tsv",
+        "download/tps_input_pvals_prev.tsv",
         "raw/peptide-mapping.tsv"
     output:
         # We haven't made the final SPRAS adjustments yet.
         "processed/egfr-prizes-unSPRASed.txt"
     shell:
         "uv run scripts/generate_prizes.py " \
-        "--firstfile=raw/p-values-first.tsv " \
-        "--prevfile=raw/p-values-prev.tsv " \
+        "--firstfile=download/tps_input_pvals_first.tsv " \
+        "--prevfile=download/tps_input_pvals_prev.tsv " \
         "--mapfile=raw/peptide-mapping.tsv " \
         "--outfile={output}"
+
+rule get_prize_files:
+    output:
+        "download/tps_input_pvals_first.tsv",
+        "download/tps_input_pvals_prev.tsv"
+    shell:
+        "uv run scripts/fetch-tps-data.py"
Lines changed: 6 additions & 8 deletions
@@ -1,9 +1,10 @@
 """
-Fetches the first supplementary data from the TPS paper:
-https://doi.org/10.1016/j.celrep.2018.08.085.
+Fetches the second supplementary data from the TPS paper:
+https://doi.org/10.1016/j.celrep.2018.08.085. We trust the processed data from the paper:
+see the README for motivation.
 
-This contains the raw data necessary to get `p-values-first.tsv` and `p-values-prev.tsv`,
-which are fed in to generate the prizes.
+The ZIP contains `p-values-first.tsv` and `p-values-prev.tsv`,
+which are fed in to generate the prizes file.
 """
 
 import requests
@@ -14,7 +15,7 @@
 
 current_directory = Path(os.path.dirname(os.path.realpath(__file__)))
 
-DOWNLOAD_URL = "https://ars.els-cdn.com/content/image/1-s2.0-S2211124718313895-mmc2.zip"
+DOWNLOAD_URL = "https://ars.els-cdn.com/content/image/1-s2.0-S2211124718313895-mmc3.zip"
 
 def main():
     # https://stackoverflow.com/a/14260592/7589775
@@ -25,8 +26,5 @@ def main():
     download_folder.mkdir(exist_ok=True)
     zipf.extractall(current_directory / '..' / 'download')
 
-    initial_xlsx = download_folder / 'initial.xlsx'
-    assert initial_xlsx.exists(), "initial.xlsx should be present from the DOWNLOAD_URL!"
-
 if __name__ == '__main__':
     main()
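
The script above downloads a ZIP and extracts it into a `download` folder. The extraction half of that pattern can be sketched as follows, using `io.BytesIO` and `zipfile` so that nothing is written to disk before extraction. The helper name `extract_zip_bytes` and the `expected` membership check are assumptions for illustration, not part of this commit.

```python
import io
import zipfile
from pathlib import Path


def extract_zip_bytes(data: bytes, dest: Path, expected: tuple[str, ...] = ()) -> list[str]:
    """Extract an in-memory ZIP archive into dest and return its member names.

    `expected` is a hypothetical safety check: names that must be present
    in the archive before anything is extracted.
    """
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(data)) as zipf:
        names = zipf.namelist()
        missing = [name for name in expected if name not in names]
        if missing:
            raise FileNotFoundError(f"archive is missing: {missing}")
        zipf.extractall(dest)
    return names
```

In a script like this one, `data` would presumably be the body of the HTTP response for `DOWNLOAD_URL`. Checking the member names up front is one way to fail early if the archive layout changes, in the spirit of the removed `initial.xlsx` assertion.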
