Closed
43 commits
7cb68cd
Add R script to process data
oliverfanderson Oct 31, 2024
7165629
Update R script
oliverfanderson Nov 8, 2024
c417102
Preprocessed 14 pathways into S/T sets
oliverfanderson Nov 15, 2024
d512be8
Create node-prizes file from sources/targets
oliverfanderson Nov 20, 2024
0293ae4
Process datasets into Prizes with 100 as score
oliverfanderson Dec 4, 2024
01a0f07
added network thresholding feature and output images
ctrlaltaf Feb 24, 2025
626ae30
output images
ctrlaltaf Feb 27, 2025
e236451
optimized thresholding
ctrlaltaf Mar 31, 2025
6dc7e0a
added command line inputs
ctrlaltaf Mar 31, 2025
45b9c4e
added edge output file generation
ctrlaltaf Mar 31, 2025
614ac74
output files
ctrlaltaf Mar 31, 2025
0fc770c
added updated outputs
ctrlaltaf Mar 31, 2025
316b53b
new method
ctrlaltaf Apr 1, 2025
e983c86
added output to new method
ctrlaltaf Apr 1, 2025
f7a1961
namespace mapped outputs
ctrlaltaf Apr 2, 2025
a482e0f
refactored api calls
ctrlaltaf Apr 2, 2025
ce1b43e
updated output files
ctrlaltaf Apr 3, 2025
8be45da
update with overlap analytics
ntalluri Apr 5, 2025
12700b4
restructerd the files, updated the src files, adding a steps to proce…
ntalluri Apr 5, 2025
a1f78ae
Ignoring big files
ntalluri Apr 5, 2025
0f5bd1b
Remove attributes file
ntalluri Apr 5, 2025
bf13b4e
updating HumanInteractome script and added directions
ntalluri Apr 5, 2025
7cab836
add code to make pathways uniprot ids and spras compatible, update re…
ntalluri Apr 5, 2025
d287d3e
spras compatible pathway data
ntalluri Apr 5, 2025
3608e75
updated human interactome script to make the interactomes file
ntalluri Apr 5, 2025
79db1ff
updated spras compatible code location, removed files, and removed un…
ntalluri Apr 7, 2025
70c2f15
updated readme, gitignore, and src files
ntalluri Apr 7, 2025
848d948
updated the string-uniprot ids for the interactomes
ntalluri Apr 7, 2025
e57489a
renamed and moved files
ntalluri Apr 7, 2025
1940a1e
updated code to include unreviewed ids
ntalluri Apr 8, 2025
f15378e
updated code to deal with directionality
ntalluri Apr 8, 2025
086b668
remove unused code
ntalluri Apr 8, 2025
447561d
updated the pathway-data and src files to accomidate this
ntalluri Apr 8, 2025
63571f8
updated README with instructions on how to genereate synthethic netwo…
ntalluri Apr 8, 2025
4d2e889
switched the files for sources and targets
ntalluri Apr 9, 2025
93fe873
picked new pilot data using the ratios
ntalluri Apr 12, 2025
0cbc0df
added the sources and targets file origins
ntalluri Apr 12, 2025
ea3bef7
updated to get rid of extra step to get overlap analytics
ntalluri Apr 12, 2025
ce81316
update variable names
ntalluri Apr 14, 2025
db2e7a0
update the varibles names in the rscript
ntalluri Apr 14, 2025
7b12cf9
updated code to deal with duplicate edges
ntalluri Apr 14, 2025
2e9f47d
updated the prize values and the rank values
Apr 16, 2025
d882814
Merge branch 'Reed-CompBio:main' into synthetic_networks
ntalluri Apr 16, 2025
11 changes: 11 additions & 0 deletions .gitignore
@@ -160,3 +160,14 @@ cython_debug/
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
#.idea/
synthetic-data/human-interactome/9606.protein.links.full.v12.0.txt
synthetic-data/human-interactome/9606.protein.links.full.v12.0.txt.gz
synthetic-data/B_cell_activation/.Rhistory
synthetic-data/.DS_Store

.DS_Store
.RData
.Rhistory

synthetic-data/interactomes/
synthetic-data/spras-compatible-pathway-data/
2 changes: 1 addition & 1 deletion README.md
@@ -1,2 +1,2 @@
# spras-benchmarking
Benchmarking datasets for the [SPRAS](https://github.com/Reed-CompBio/spras) project
Benchmarking datasets for the [SPRAS](https://github.com/Reed-CompBio/spras) project
97 changes: 97 additions & 0 deletions synthetic-data/README.md
@@ -0,0 +1,97 @@
# Synthetic Data

> All commands should be run from the `synthetic-data/` root directory.

## Download STRING Human Interactome
1. Download the STRING *Homo sapiens* `9606.protein.links.full.v12.0.txt.gz` database file from [STRING](https://string-db.org/cgi/download?sessionId=bL9sRTdIaUEt&species_text=Homo+sapiens&settings_expanded=0&min_download_score=0&filter_redundant_pairs=0&delimiter_type=txt).
2. Move the downloaded file into the `human-interactome/` folder.
3. From the `synthetic-data/` directory, extract the file using:

```
gunzip human-interactome/9606.protein.links.full.v12.0.txt.gz
```
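
If `gunzip` is not available (for example on Windows), the same extraction can be done from Python. This is a minimal sketch using only the standard library; the function name `gunzip_file` is hypothetical and not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(gz_path: Path) -> Path:
    """Decompress gz_path next to itself and return the extracted file's path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream in chunks; the STRING file is large
    gz_path.unlink()  # gunzip removes the archive by default, so mirror that
    return out_path
```

Calling `gunzip_file(Path("human-interactome/9606.protein.links.full.v12.0.txt.gz"))` from the `synthetic-data/` directory should leave the `.txt` file in `human-interactome/`.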

## Download New PANTHER Pathways
1. Visit [Pathway Commons](https://www.pathwaycommons.org/).
2. Search for the desired pathway (e.g., "signaling") and filter the results by the **PANTHER pathway** data source.
Example: [Search for "Signaling" filtered by PANTHER pathway](https://apps.pathwaycommons.org/search?datasource=panther&q=Signaling&type=Pathway)
3. Click on the desired pathway and download the **Extended SIF** version of the pathway.
4. In the `pathway-data/` folder, create a new subfolder named after the pathway you downloaded.
5. Move the downloaded Extended SIF file to this new folder (as a `.txt` file). Rename the file to match the subfolder name exactly.
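
Steps 4 and 5 above can be scripted. The sketch below is one possible way to do it, assuming the Extended SIF file has already been downloaded; `install_pathway_file` is a hypothetical helper, not a script in this repository.

```python
import shutil
from pathlib import Path


def install_pathway_file(downloaded: Path, pathway_name: str,
                         pathway_root: Path = Path("pathway-data")) -> Path:
    """Move a downloaded Extended SIF file to <pathway_root>/<name>/<name>.txt."""
    target_dir = Path(pathway_root) / pathway_name
    target_dir.mkdir(parents=True, exist_ok=True)  # create the pathway subfolder
    target = target_dir / f"{pathway_name}.txt"    # file name matches the subfolder
    shutil.move(str(downloaded), str(target))
    return target
```

For example, `install_pathway_file(Path("download.sif"), "Wnt_signaling")` (with a hypothetical download name) would produce `pathway-data/Wnt_signaling/Wnt_signaling.txt`.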

## Sources and Targets

[Sources](https://www.pnas.org/doi/full/10.1073/pnas.1808790115) are receptors from the *in silico* human surfaceome.

[Targets](https://academic.oup.com/nar/article/51/D1/D39/6765312) are human transcription factors.

## Steps to Generate SPRAS-Compatible Pathways

### 1. Process PANTHER Pathways

1. Open `process_panther_pathway.R` and add the name of any new pathways to the `pathways` vector on **line 6**.
2. From the `synthetic-data/` root directory, run the command:
```
Rscript src/process_panther_pathway.R
```
3. This will create seven new files in each subfolder of the `pathway-data/` directory:
- `DEL-EDGES.txt`
- `DEL-NODES.txt`
- `EDGES.txt`
- `NODES.txt`
- `PRIZES-100.txt`
- `SOURCES.txt`
- `TARGETS.txt`
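
To confirm that the R script produced every file listed above, a quick check along these lines may help. It is a sketch only; `missing_outputs` is a hypothetical helper, and the file names are taken from the list above.

```python
from pathlib import Path

# The seven outputs listed in this README for each pathway subfolder.
EXPECTED_FILES = [
    "DEL-EDGES.txt", "DEL-NODES.txt", "EDGES.txt", "NODES.txt",
    "PRIZES-100.txt", "SOURCES.txt", "TARGETS.txt",
]


def missing_outputs(pathway_dir: Path) -> list[str]:
    """Return the expected output files that are absent from pathway_dir."""
    pathway_dir = Path(pathway_dir)
    return [name for name in EXPECTED_FILES if not (pathway_dir / name).is_file()]
```

An empty list from `missing_outputs(Path("pathway-data/Wnt_signaling"))` would mean all seven files are present.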

### 2. Convert Pathways to SPRAS-Compatible Format
1. In `SPRAS_compatible_files.py`, add the name of any new pathways to the `pathway_dirs` list on **line 8**.
2. From the `synthetic-data/` directory, run the command:
```
python src/SPRAS_compatible_files.py
```
3. This will create a new folder named `spras-compatible-pathway-data`, containing subfolders for each PANTHER pathway in SPRAS-compatible format.
Each subfolder will include the following three files:
- `<pathway_name>_gs_edges.txt`
- `<pathway_name>_gs_nodes.txt`
- `<pathway_name>_node_prizes.txt`

4. From the `synthetic-data/` directory, run the command:
```
python src/ratios.py
```
5. This will create a new file, `data_ratio.txt`, in `spras-compatible-pathway-data` that reports the ratios of edges to sources and targets.
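
The exact definition used by `ratios.py` is not shown in this README. As a rough illustration only, the sketch below assumes a ratio means the number of pathway edges divided by the number of sources, targets, or both; the function name and the keys of the returned dictionary are hypothetical.

```python
def edge_ratios(n_edges: int, n_sources: int, n_targets: int) -> dict[str, float]:
    """Return edge-to-source/target ratios under the assumed definition above."""
    def ratio(count: int) -> float:
        # Avoid dividing by zero when a pathway has no sources or targets.
        return n_edges / count if count else float("nan")

    return {
        "edges_per_source": ratio(n_sources),
        "edges_per_target": ratio(n_targets),
        "edges_per_source_or_target": ratio(n_sources + n_targets),
    }
```

With 20 edges, 4 sources, and 6 targets, this gives 5.0 edges per source and 2.0 edges per source or target.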

## Steps to Generate the Interactomes
### 1. Generate Threshold Interactomes
1. From the `synthetic-data/` directory, run the command:
```
python src/threshold_interactomes.py
```
2. This will create a new folder named `interactomes`, containing a subfolder called `uniprot-threshold-interactomes`.
The subfolder will include the following 12 files:
- 10 thresholded interactomes: `uniprot_human_interactome_<threshold>.txt` (thresholds range from 1 to 900)
- `proteins_missing_aliases.csv`: STRING IDs that are missing UniProt accession identifiers
- `removed_edges.txt`: All edges removed from the `uniprot_human_interactome_<threshold>.txt` files
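
The core idea of thresholding can be sketched as follows. This is not the code in `threshold_interactomes.py`, which also maps STRING IDs to UniProt accessions. It assumes the STRING links file is space-delimited with a header row containing `protein1`, `protein2`, and `combined_score`, and that an edge is kept when its score is greater than or equal to the threshold (the real script may use a strict comparison).

```python
import csv
from typing import Iterable, Iterator


def threshold_edges(lines: Iterable[str],
                    threshold: int) -> Iterator[tuple[str, str, int]]:
    """Yield (protein1, protein2, combined_score) for rows at or above threshold."""
    reader = csv.DictReader(lines, delimiter=" ")  # first line is the header
    for row in reader:
        score = int(row["combined_score"])
        if score >= threshold:  # assumption: the comparison is inclusive
            yield row["protein1"], row["protein2"], score
```

Passing an open file handle for `human-interactome/9606.protein.links.full.v12.0.txt` together with a threshold streams the retained edges without loading the whole file into memory.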

### 2. Generate Combined Interactomes (PANTHER Pathways and Threshold Interactomes)
1. In `combine.py`, set the `pathway_dirs` list on **line 11** to the pathways that should be included in the combined networks.
2. From the `synthetic-data/` directory, run the command:
```
python src/combine.py
```
3. This will create a new subfolder called `uniprot-combined-threshold-interactomes` in `interactomes`.
This subfolder will include 12 files:
- 10 threshold interactomes combined with the chosen pathways: `uniprot_combined_interactome_<threshold>.txt` (thresholds range from 1 to 900)
- `overlap_combined_info.csv`
- `overlap_info.csv`
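
Conceptually, combining a threshold interactome with pathway edges is a set union, and the overlap is the intersection. The sketch below illustrates that idea under the assumption that edges are undirected pairs of node IDs; the real `combine.py` may treat directionality and duplicates differently, and `combine_edges` is a hypothetical name.

```python
def combine_edges(interactome_edges, pathway_edges):
    """Return (combined, overlap) edge sets, treating edges as undirected."""
    def key(edge):
        return frozenset(edge)  # (a, b) and (b, a) collapse to the same edge

    interactome = {key(e) for e in interactome_edges}
    pathway = {key(e) for e in pathway_edges}
    # Union gives the combined network; intersection gives the shared edges.
    return interactome | pathway, interactome & pathway
```

The size of the overlap set relative to the pathway edge set is the kind of figure an overlap summary could report.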

# Pilot Data
For the pilot data, use the list `["Wnt_signaling", "JAK_STAT_signaling", "Interferon_gamma_signaling", "FGF_signaling", "Ras"]` in both:
- the list in `combine.py`
- the list in `overlap_analytics.py`

Make sure the same pathways, `["Wnt_signaling", "JAK_STAT_signaling", "Interferon_gamma_signaling", "FGF_signaling", "Ras"]`, are also added to:
- the `pathways` vector in `process_panther_pathway.R`
- the list in `SPRAS_compatible_files.py`

**Once you’ve updated the pathway lists in all relevant scripts, run all the steps above to generate the Pilot dataset.**
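
Because the same pathway list has to be kept in sync across several scripts, a small check like the following can catch a missed entry. It is a sketch; `PILOT_PATHWAYS` and `missing_pilot_pathways` are hypothetical names, not identifiers from the repository.

```python
# Pilot pathway names copied from this README.
PILOT_PATHWAYS = [
    "Wnt_signaling",
    "JAK_STAT_signaling",
    "Interferon_gamma_signaling",
    "FGF_signaling",
    "Ras",
]


def missing_pilot_pathways(configured):
    """Return the pilot pathways that are absent from a script's pathway list."""
    configured = set(configured)
    return [name for name in PILOT_PATHWAYS if name not in configured]
```

Running this against the list in each script should return an empty list before the pilot dataset is generated.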