dataset: EGFR #43
|
I think this pull request is out of scope for SPRAS. SPRAS can accept the TPS paper's processing of the original files as-is and use the processed versions hosted in its supplementary datasets or GitHub repository. Recreating those pipelines would create a lot of extra work for us. Some of the TPS analysis was done in Scala, which could explain floating-point differences.
|
|
This being out of scope is what I suspected. I was hoping to get enough scripts to update the data using more recent data sources, but the phosphosite-irefindex PPI would have been the most affected file. |
|
I added a note in my lab's fork of the TPS repo, which has more recent activity than the upstream copy, about the origin of the network to help ensure I don't forget: gitter-lab/tps#9 |
|
@ntalluri had questions about the data normalization and statistical testing for the phosphoproteomics data in the EGFR dataset. I added scripts and details about that in gitter-lab/tps#10. However, even with that as a reference, her attempt to reanalyze the data with Python in 2025 still gives different results than the original analysis in R in 2014. That may be expected due to differences in languages and statistical packages. |
|
What are the differences? If they're small, they could be the floating-point error mentioned above. |
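One way to tell floating-point noise apart from a substantive discrepancy is to compare the two sets of results under a tolerance. A minimal sketch, assuming both analyses can be exported as mappings from a peptide identifier to a numeric value; the identifiers, values, and the `compare_results` helper are hypothetical and not taken from the TPS data:

```python
import math


def compare_results(old, new, rel_tol=1e-9, abs_tol=1e-12):
    """Split keys into matching, differing, and present-on-one-side-only groups.

    `old` and `new` map an identifier to a float. Values count as matching
    when they agree within the given tolerances (see math.isclose).
    """
    matching, differing = [], []
    for key in sorted(old.keys() & new.keys()):
        if math.isclose(old[key], new[key], rel_tol=rel_tol, abs_tol=abs_tol):
            matching.append(key)
        else:
            differing.append(key)
    # Keys that appear in only one of the two result sets.
    missing = sorted(old.keys() ^ new.keys())
    return matching, differing, missing


# Hypothetical values for illustration only.
r_results = {"pepA": 0.1 + 0.2, "pepB": 0.0412, "pepC": 1.5}
py_results = {"pepA": 0.3, "pepB": 0.0731, "pepD": 2.0}

matching, differing, missing = compare_results(r_results, py_results)
print(matching, differing, missing)
```

Entries that land in `differing` under a tight tolerance are unlikely to be rounding artifacts and would need a closer look at the method itself.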
|
Neha can correct me if needed, but my understanding is that it was more fundamental. The R version of the Tukey test was fitting an ANOVA model first and the Python version was not. The statistical test itself in the available packages was implemented differently. |
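For reference, R's `TukeyHSD` takes a fitted `aov` model and uses its pooled residual variance for every pairwise comparison. Below is a minimal standard-library sketch of that two-step structure. It computes only the studentized range statistic, not p-values, which require the studentized range distribution. The group names and values are hypothetical, and this is an illustration rather than the TPS pipeline:

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def anova_mse(groups):
    """Residual mean square of a one-way ANOVA fit (pooled within-group variance)."""
    ss_within = sum(
        sum((x - mean(values)) ** 2 for x in values) for values in groups.values()
    )
    n_total = sum(len(values) for values in groups.values())
    df_within = n_total - len(groups)
    return ss_within / df_within, df_within


def tukey_q(groups):
    """Studentized range statistic for each pair, using the pooled ANOVA MSE."""
    mse, df_within = anova_mse(groups)
    stats = {}
    for a, b in combinations(sorted(groups), 2):
        n_a, n_b = len(groups[a]), len(groups[b])
        # Tukey-Kramer standard error; reduces to sqrt(mse / n) for equal sizes.
        se = sqrt(mse / 2 * (1 / n_a + 1 / n_b))
        stats[(a, b)] = abs(mean(groups[a]) - mean(groups[b])) / se
    return stats, df_within


# Hypothetical intensities at three time points, for illustration only.
groups = {"t0": [1.0, 2.0, 3.0], "t2": [2.0, 3.0, 4.0], "t4": [5.0, 6.0, 7.0]}
stats, df_within = tukey_q(groups)
for pair, q in stats.items():
    print(pair, round(q, 3))
```

An implementation that estimated the variance differently, for example from only the two groups being compared, could produce statistics that diverge by far more than rounding.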
I am having some trouble with replication. A few questions:
phosphosite-irefindex13.0-uniprot.txt still around? I'm having trouble even coming close to reproducing that file. (Note that while iRefIndex is down, a lot of data is still available on the Wayback Machine - this may be a good candidate for storing in OSDF as mentioned in the HIV PR.)
I'm currently reversing the peptide-mapping.tsv file.