dataset: EGFR #43

tristan-f-r · 2025-08-23T03:13:42Z

I am having some trouble with replication. A few questions:

Is the script for generating phosphosite-irefindex13.0-uniprot.txt still around? I'm having trouble even coming close to reproducing that file. (Note that while iRefIndex is down, a lot of its data is still available on the Wayback Machine. This may be a good candidate for storing in OSDF, as mentioned in the HIV PR.)
I cannot replicate the rounding procedure for prizes; the differences almost look like floating-point errors. Were any special trunc function calls made for these prize files?
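As a small illustration of why a rounding procedure can be hard to replicate, the sketch below rounds the same toy value two ways. The value is made up and is not taken from the EGFR prize files, and nothing here is claimed to be what the TPS pipeline did; it only shows that binary floating-point rounding and decimal half-up rounding can disagree in the last digit.

```python
from decimal import Decimal, ROUND_HALF_UP

# Toy value for illustration only (not from the prize files).
value = 2.675

# A binary float cannot store 2.675 exactly; the stored value is slightly
# below 2.675, so rounding to two places goes down.
float_rounded = round(value, 2)

# Decimal arithmetic on the literal text rounds the half up instead.
decimal_rounded = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

print(float_rounded)    # 2.67
print(decimal_rounded)  # 2.68
```

If the original prizes were rounded by a tool with a different rounding mode or number type, discrepancies of this size would be expected.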

I'm currently reverse-engineering the peptide-mapping.tsv file.

for now

agitter · 2025-08-23T12:33:18Z

I think this pull request is out of scope for SPRAS. SPRAS can accept that the TPS paper did the processing it did of the original files and hosted those in its supplementary datasets or GitHub repository. Recreating those pipelines creates a lot of extra work for us.

Some of the TPS analysis was done in Scala, so that could explain floating-point differences.
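One way to check whether two prize files differ only by floating-point noise is a tolerance-based comparison. The following is a minimal sketch; the function name `prize_mismatches`, the node names, the prize values, and the tolerances are all hypothetical and chosen for illustration, not taken from TPS or SPRAS.

```python
import math

def prize_mismatches(original, reproduced, rel_tol=1e-9, abs_tol=1e-12):
    """Return {node: (original, reproduced)} for prizes that differ beyond tolerance.

    Nodes missing from either mapping are reported with None on that side.
    """
    mismatches = {}
    for node in sorted(original.keys() | reproduced.keys()):
        a = original.get(node)
        b = reproduced.get(node)
        if a is None or b is None or not math.isclose(a, b, rel_tol=rel_tol, abs_tol=abs_tol):
            mismatches[node] = (a, b)
    return mismatches

# Toy prizes (hypothetical node names and values).
original = {"NODE_A": 0.3, "NODE_B": 1.25, "NODE_C": 4.0}
reproduced = {"NODE_A": 0.1 + 0.2, "NODE_B": 1.26, "NODE_C": 4.0}

# NODE_A differs only by floating-point noise and is not reported.
print(prize_mismatches(original, reproduced))  # {'NODE_B': (1.25, 1.26)}
```

Anything still reported after such a check is more likely a difference in the processing steps than in the arithmetic.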

phosphosite-irefindex13.0-uniprot.txt was created ~12 years ago in the Fraenkel lab at MIT. That may indicate we should stop using it for SPRAS. I have archives of some (all?) of the scripts used for network processing at that time. I pushed them to a private GitHub repo for preservation. However, I don't want to make it public right now because I don't have permission from the authors and haven't tracked down licensing terms for all of the data files in that repo.

tristan-f-r · 2025-08-23T17:04:31Z

This being out of scope is what I suspected. I was hoping to get enough scripts to update the data using more recent data sources, but the phosphosite-irefindex PPI file would have been the most affected file.

agitter · 2025-08-23T18:05:24Z

I added a note in my lab's fork of the TPS repo, which has more recent activity than the upstream copy, about the origin of the network to help ensure I don't forget: gitter-lab/tps#9

agitter · 2025-10-10T14:15:46Z

@ntalluri had questions about the data normalization and statistical testing for the phosphoproteomics data in the EGFR dataset. I added scripts and details about that in gitter-lab/tps#10.

However, even with that as a reference, her attempt to reanalyze the data with Python in 2025 still gives different results than the original analysis in R in 2014. That may be expected due to differences in languages and statistical packages.

tristan-f-r · 2025-10-10T15:37:00Z

What are the differences? If they're small, it could be the floating-point error mentioned above.

agitter · 2025-10-10T16:01:43Z

Neha can correct me if needed, but my understanding is that it was more fundamental. The R version of the Tukey test was fitting an ANOVA model first and the Python version was not. The statistical test itself in the available packages was implemented differently.
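To illustrate why fitting an ANOVA model first can change the outcome of pairwise comparisons, the sketch below contrasts two variance estimates for the same pair of groups: the within-group mean square pooled across all groups (as a one-way ANOVA provides) and a variance pooled from only the two groups being compared. This is a toy example under stated assumptions: the numbers are made up, the function names are hypothetical, and it is not a reconstruction of either the 2014 R analysis or the 2025 Python reanalysis.

```python
from math import sqrt
from statistics import variance

def anova_mse(groups):
    """Within-group mean square from a one-way ANOVA over all groups."""
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_within = sum(len(g) - 1 for g in groups)
    return ss_within / df_within

def pair_pooled_variance(a, b):
    """Pooled variance using only the two groups being compared."""
    return ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)

def std_error(var, a, b):
    """Standard error of the difference in means for a given variance estimate."""
    return sqrt(var * (1 / len(a) + 1 / len(b)))

# Toy measurements for three conditions (made-up values).
t0 = [1.0, 1.2, 0.8]
t1 = [2.0, 2.1, 1.9]
t2 = [3.0, 4.0, 2.0]  # a noisy third group inflates the ANOVA estimate

se_anova = std_error(anova_mse([t0, t1, t2]), t0, t1)
se_pair = std_error(pair_pooled_variance(t0, t1), t0, t1)

print(f"SE with ANOVA-pooled variance: {se_anova:.3f}")
print(f"SE with pair-only variance:    {se_pair:.3f}")
```

With the same difference in means, the two standard errors give different test statistics, so two implementations that disagree on this step can reach different conclusions even without any floating-point issue.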

tristan-f-r added 2 commits August 22, 2025 20:09

dataset: egfr

1e3ad8e

docs: specifics on process_prizes

bad65b9

tristan-f-r added the dataset label ("Mutating datasets in any way.") Aug 23, 2025

tristan-f-r changed the title ~~dataset: egfr~~ dataset: EGFR Aug 23, 2025

tristan-f-r added 6 commits August 22, 2025 20:19

docs: clarify

f8360dd

fetch script for TPS

38aeae3

chore: drop first/prev files

ad2cd4a

for now

actually, lets trust the processed data

4018d0f

refactor: use same var

f5dc3b1

docs: more info

9ec720f

tristan-f-r closed this Aug 23, 2025
