docs: updating tutorial for COMBINE25 #421
Merged
Changes from 26 commits
36 commits
7e1e912
updated beginner
ntalluri 83da105
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri 96bc8b6
updating the introduction to better explain pathway reconstruction an…
ntalluri ed8d8dc
updating the install and config part
ntalluri c4239bd
fixing the first part of step 2 in beginner
ntalluri 7970872
add in warning about Docker images and containers
ntalluri d0f503a
add in one more command
ntalluri 52aa055
add in one more options to command
ntalluri 10a65f8
updated beginner's format and wording more
ntalluri 0c2bbe3
added dockerhub link to the algortihm images and add spras overview i…
ntalluri 25478b8
remove repeat of image
ntalluri 9285d8d
updating intermediate and adding a docker_troubleshooting section
ntalluri 59eb3bd
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri f611bd3
adding in tristan's suggestions
ntalluri f0859c8
update path to spras overview image
ntalluri f3e7f9e
fixed formatting
ntalluri 875057b
updated intermediate tutorial with data part, added images, updated b…
ntalluri 528af72
edit references and add a todo to the unsused rst
ntalluri 68fad39
removing plannign document
ntalluri 5f2cfe3
outline for advanced
ntalluri 2a3f332
added images for advanced
ntalluri 7af4237
Apply suggestions from code review
ntalluri 49cc49b
Merge branch 'tutorial' of github.com:ntalluri/spras into tutorial
ntalluri c2098df
added some periods
ntalluri 099f2c4
update intermediate with more images and information
ntalluri c46cafe
added the CHTC integration part
ntalluri 135dfbf
Apply suggestions from another code review
ntalluri 3043cbe
added alt text, updated erbb pathway, and made images larger
ntalluri bbe5e40
add back some parts of the intro
ntalluri 496c0af
make the pathway reconstruction algo tagline better
ntalluri 32c86d4
add in to the files paths and updated a comment
ntalluri b47be5f
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri 3610e0b
added in final suggestions from code review
ntalluri 4adaf38
more the docker trouble shooting to be a general trouble shooting page
ntalluri 58c4478
Delete docs/troubleshooting.rst
ntalluri 3b08ebd
Apply final suggestions from code review
ntalluri
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,31 +1,183 @@ | ||
| ################################### | ||
| Advanced Capabilities and Features | ||
| ====================================== | ||
| ################################### | ||
|
|
||
| More like these are all the things we can do with this, but will not be showing | ||
| Parameter tuning | ||
| ================ | ||
| Parameter tuning is the process of determining which parameter combinations should be explored for each algorithm for a given dataset. | ||
| Parameter tuning focuses on defining and refining the parameter search space. | ||
|
|
||
| - mention parameter tuning | ||
| - say that parameters are not preset and need to be tuned for each dataset | ||
| Each dataset has unique characteristics, so there are no preset parameter combinations to use; parameters must instead be tuned individually for each algorithm. | ||
|
||
| SPRAS provides a flexible framework for defining parameter grids for any algorithm on a given dataset. | ||
|
|
||
| Grid Search | ||
| ------------ | ||
|
|
||
| A grid search systematically tests different combinations of parameter values to see how each affects network reconstruction results. | ||
|
|
||
| In SPRAS, users can define parameter grids for each algorithm directly in the configuration file. | ||
| When executed, SPRAS automatically runs each algorithm across all parameter combinations and collects the resulting subnetworks. | ||
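As an illustration of what "all parameter combinations" means, the following minimal Python sketch expands a grid of candidate values into one setting per combination. It is not SPRAS code: the function name `expand_grid` and the parameter names `k` and `alpha` are hypothetical and chosen only for the example.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    names = sorted(grid)  # fixed order so results are reproducible
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical parameter names and values, for illustration only.
example_grid = {"k": [10, 100], "alpha": [0.1, 0.5, 0.9]}
combinations = expand_grid(example_grid)

for combination in combinations:
    print(combination)
```

The number of combinations is the product of the list lengths, so adding values to a grid multiplies the number of algorithm runs.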
|
|
||
| SPRAS will also support parameter refinement using graph topological heuristics. | ||
|
||
| These topological metrics help identify parameter regions that produce stable or biologically plausible output networks. | ||
|
||
| Based on these heuristics, SPRAS will generate new configuration files with refined parameter grids for each algorithm per dataset. | ||
|
|
||
| Users can further refine these grids by rerunning the updated configuration and adjusting the parameter ranges around the newly identified regions to find and fine-tune the most promising algorithm-specific outputs for a given dataset. | ||
|
|
||
| .. note:: | ||
|
|
||
| Some grid search features are still under development and will be added in future SPRAS releases. | ||
|
|
||
| Parameter selection | ||
| ------------------- | ||
|
|
||
| Parameter selection refers to the process of determining which parameter combinations should be used for evaluation and how to identify the “best” set of parameters per algorithm for a given dataset. | ||
|
||
|
|
||
| Parameter selection is handled in the evaluation code, which supports multiple parameter selection strategies. | ||
|
|
||
| Once the grid search is complete for each dataset, the user can enable evaluation (by setting ``include`` to true under ``evaluation``), which runs all of the parameter selection code. | ||
|
||
|
|
||
| PCA-based parameter selection | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The PCA-based approach identifies a representative parameter setting for each pathway reconstruction algorithm on a given dataset. | ||
| It selects the single parameter combination that best captures the central trend of an algorithm's reconstruction behavior. | ||
|
|
||
| .. image:: ../_static/images/pca-kde.png | ||
| :alt: description of the image | ||
| :width: 500 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| For each algorithm, all reconstructed subnetworks are projected into an algorithm-specific PCA space. | ||
|
||
| This projection summarizes how the algorithm's outputs vary across different parameter combinations, allowing patterns in the outputs to be visualized in a lower-dimensional space. | ||
|
|
||
| Within each PCA space, a kernel density estimate (KDE) is computed over the projected points to identify regions of high density. | ||
| The output closest to the highest KDE peak is selected as the most representative parameter setting, as it corresponds to the region where the algorithm most consistently produces similar subnetworks. | ||
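The selection step can be sketched in a few lines of Python. This is a simplified illustration, not the SPRAS implementation: it assumes the PCA coordinates have already been computed, uses a fixed-bandwidth Gaussian kernel, and approximates "closest to the highest KDE peak" by picking the projected point with the highest estimated density. The labels, coordinates, and function names are hypothetical.

```python
import math


def gaussian_density(point, points, bandwidth):
    """Unnormalized Gaussian kernel density estimate at `point`."""
    total = 0.0
    for other in points:
        sq_dist = sum((a - b) ** 2 for a, b in zip(point, other))
        total += math.exp(-sq_dist / (2 * bandwidth ** 2))
    return total / len(points)


def most_representative(projected, bandwidth=1.0):
    """Return the label whose projected point lies in the densest region."""
    points = list(projected.values())
    return max(
        projected,
        key=lambda label: gaussian_density(projected[label], points, bandwidth),
    )


# Hypothetical 2D PCA coordinates, one per parameter combination.
projected = {
    "params-a": (0.0, 0.0),
    "params-b": (0.3, 0.0),
    "params-c": (-0.2, 0.1),
    "params-d": (5.0, 5.0),  # an outlying output far from the cluster
}
best = most_representative(projected)
print(best)
```

The outlying point contributes almost nothing to the density around the cluster, so the selected setting comes from the region where outputs agree most.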
|
|
||
| Ensemble network-based parameter selection | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| The ensemble-based approach combines results from all parameter settings for each pathway reconstruction algorithm on a given dataset. | ||
| Instead of focusing on a single "best" parameter combination, it summarizes the algorithm's overall reconstruction behavior across parameters. | ||
|
|
||
| All reconstructed subnetworks are merged into algorithm-specific ensemble networks, where each edge weight reflects how frequently that interaction appears across the outputs. | ||
| Edges that occur more often are assigned higher weights, highlighting interactions that are most consistently recovered by the algorithm. | ||
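The edge weighting described above can be illustrated with a minimal Python sketch. It assumes each reconstructed subnetwork is given as a set of edges and that the weight is the fraction of subnetworks containing the edge. The function name `build_ensemble` and the example edges are hypothetical, and this is not the SPRAS implementation.

```python
from collections import Counter


def build_ensemble(pathways):
    """Map each edge to the fraction of pathways that contain it."""
    counts = Counter(edge for pathway in pathways for edge in set(pathway))
    return {edge: count / len(pathways) for edge, count in counts.items()}


# Hypothetical outputs of one algorithm under four parameter combinations.
pathways = [
    {("A", "B"), ("B", "C")},
    {("A", "B"), ("C", "D")},
    {("A", "B"), ("B", "C")},
    {("A", "B")},
]
ensemble = build_ensemble(pathways)

for edge, weight in sorted(ensemble.items(), key=lambda item: -item[1]):
    print(edge, weight)
```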
|
|
||
| These consensus networks help identify the core patterns of an algorithm's outputs without needing to choose a single parameter setting. | ||
|
|
||
| .. This approach is useful when users want to understand the overall stability of an algorithm's reconstructions or when no clear optimal parameter combination exists. | ||
| Ground truth-based evaluation without parameter selection | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The no-parameter-selection approach keeps all parameter combinations for each pathway reconstruction algorithm on a given dataset. | ||
| This approach can be useful for identifying patterns in algorithm performance without favoring any specific parameter setting. | ||
|
|
||
| Evaluation | ||
| ============ | ||
|
|
||
| In some cases, users may have a gold standard file that allows them to evaluate the quality of the reconstructed subnetworks generated by pathway reconstruction algorithms. | ||
|
|
||
| However, gold standards may not exist for certain types of experimental data where validated ground truth interactions or molecules are unavailable or incomplete. | ||
| For example, in emerging research areas or poorly characterized biological systems, interactions may not yet be experimentally verified or fully known, making it difficult to define a reliable reference network for evaluation. | ||
|
|
||
| Adding gold standard datasets and evaluation post-analysis to a configuration | ||
| ----------------------------------------------------------------------------- | ||
|
|
||
| In the configuration file, users can specify one or more gold standard datasets to evaluate the subnetworks reconstructed from each dataset. | ||
| When gold standards are provided and evaluation is enabled (include: true), SPRAS will automatically compare the reconstructed subnetworks for a specific dataset against the corresponding gold standards. | ||
|
|
||
| .. code-block:: yaml | ||
|     gold_standards: | ||
|       - | ||
|         label: gs1 | ||
|         node_files: ["gs_nodes0.txt", "gs_nodes1.txt"] | ||
|         data_dir: "input" | ||
|         dataset_labels: ["data0"] | ||
|       - | ||
|         label: gs2 | ||
|         edge_files: ["gs_edges0.txt"] | ||
|         data_dir: "input" | ||
|         dataset_labels: ["data0", "data1"] | ||
|     analysis: | ||
|       evaluation: | ||
|         include: true | ||
| A gold standard dataset must include the following types of keys and files: | ||
|
|
||
| - label: A name that uniquely identifies a gold standard dataset throughout the SPRAS workflow and outputs. | ||
| - node_files or edge_files: A list of node files or edge files. Only one of these keys can be defined per gold standard dataset. | ||
| - data_dir: The path of the directory where the gold standard dataset's input files are located. | ||
| - dataset_labels: A list of dataset labels indicating which datasets this gold standard dataset should be evaluated against. | ||
|
||
|
|
||
| When evaluation is enabled, SPRAS will automatically run its built-in evaluation analysis on each defined dataset-gold standard pair. | ||
| This evaluation computes metrics such as precision, recall, and precision-recall curves, depending on the parameter selection method used. | ||
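For readers unfamiliar with these metrics, the following minimal Python sketch computes precision and recall for one reconstructed network against a node-based gold standard. It uses the standard set-based definitions and is not SPRAS code. The node names and the function name `precision_recall` are hypothetical.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a predicted set against a gold set."""
    true_positives = len(predicted & gold)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard nodes and nodes of one reconstructed network.
gold_nodes = {"A", "B", "C", "D"}
predicted_nodes = {"A", "B", "X"}

precision, recall = precision_recall(predicted_nodes, gold_nodes)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of predicted nodes that are in the gold standard, and recall is the fraction of gold standard nodes that were recovered.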
|
|
||
| Evaluation can be run independently of any parameter selection method (ground truth-based evaluation without parameter selection) to directly inspect precision and recall for each reconstructed pathway from a given dataset. | ||
|
|
||
| .. image:: ../_static/images/pr-per-pathway-nodes.png | ||
| :alt: description of the image | ||
| :width: 400 | ||
|
||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| Ensemble-based parameter selection generates precision-recall curves by thresholding on the frequency of edges across an ensemble of reconstructed networks for an algorithm on a given dataset. | ||
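Thresholding on edge frequency can be sketched as follows. This Python example assumes the ensemble is a mapping from edge to frequency and that an edge counts as predicted when its frequency meets the threshold. It is an illustration under those assumptions rather than the SPRAS implementation, and the edges, frequencies, and function name `pr_curve` are hypothetical.

```python
def pr_curve(edge_frequency, gold_edges):
    """Return (threshold, precision, recall) tuples, strictest threshold first."""
    curve = []
    for threshold in sorted(set(edge_frequency.values()), reverse=True):
        # Keep only edges that appear at least this often in the ensemble.
        kept = {edge for edge, freq in edge_frequency.items() if freq >= threshold}
        true_positives = len(kept & gold_edges)
        precision = true_positives / len(kept)
        recall = true_positives / len(gold_edges) if gold_edges else 0.0
        curve.append((threshold, precision, recall))
    return curve


# Hypothetical ensemble edge frequencies and gold standard edges.
edge_frequency = {("A", "B"): 1.0, ("B", "C"): 0.5, ("C", "D"): 0.25}
gold_edges = {("A", "B"), ("B", "C"), ("D", "E")}

curve = pr_curve(edge_frequency, gold_edges)
for threshold, precision, recall in curve:
    print(threshold, round(precision, 2), round(recall, 2))
```

Lowering the threshold admits less frequent edges, which can raise recall while lowering precision.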
|
|
||
| .. image:: ../_static/images/pr-curve-ensemble-nodes-per-algorithm-nodes.png | ||
| :alt: description of the image | ||
| :width: 400 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| PCA-based parameter selection computes precision and recall for a single reconstructed network, selected using PCA from all reconstructed networks for an algorithm on a given dataset. | ||
|
|
||
| .. image:: ../_static/images/pr-pca-chosen-pathway-per-algorithm-nodes.png | ||
| :alt: description of the image | ||
| :width: 400 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| .. note:: | ||
| Evaluation will only execute if the ``include`` option for the ``ml`` analysis is also set to true, because the PCA parameter selection step depends on the PCA ML analysis. | ||
|
|
||
| .. note:: | ||
| To see evaluation in action, run SPRAS using the config.yaml or egfr.yaml configuration files. | ||
|
|
||
| CHTC integration | ||
|
||
| ================= | ||
|
|
||
| Running SPRAS locally can become slow and resource intensive, especially when running many algorithms, parameter combinations, or datasets simultaneously. | ||
|
|
||
| To address this, SPRAS supports integration with the Center for High-Throughput Computing (CHTC), allowing Snakemake jobs to be distributed and executed in parallel across available compute resources. | ||
|
||
|
|
||
| Anything not included in the config file | ||
| See :doc:`Running with HTCondor <../htcondor>` for more information on SPRAS's integration with HTCondor. | ||
|
|
||
| 1. Global Workflow Control | ||
|
|
||
| Sets options that apply to the entire workflow. | ||
| Ability to run with different container frameworks | ||
| --------------------------------------------------- | ||
|
|
||
| - Examples: the container framework (docker, singularity, dsub) and where to pull container images from | ||
| CHTC uses Apptainer to run containerized software in secure, high-performance environments. | ||
|
|
||
| running spras with multiple parameter combinations with multiple algorithms on multiple Datasets | ||
| - for the tutorial we are only doing one dataset | ||
| SPRAS accommodates this by allowing users to specify which container framework to use globally within their workflow configuration. | ||
|
|
||
| 4. Gold Standards | ||
| The global workflow control section in the configuration file allows a user to set which SPRAS-supported container framework to use: | ||
|
|
||
| Defines the input files SPRAS will use to evaluate output subnetworks | ||
| .. code-block:: yaml | ||
| A gold standard dataset is comprised of: | ||
| container_framework: docker | ||
| - a label: defines the name of the gold standard dataset | ||
| - node_file or edge_file: a list of either node files or edge files. Only one or the other can exist in a single dataset. At the moment only one edge or one node file can exist in one dataset | ||
| - data_dir: the path to where the input gold standard files live | ||
| - dataset_labels: a list of dataset labels that link each gold standard links to one or more datasets via the dataset labels | ||
| - the frameworks include Docker, Apptainer/Singularity, or dsub | ||
|
||