docs: updating tutorial for COMBINE25 #421
Merged
36 commits
7e1e912
updated beginner
ntalluri 83da105
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri 96bc8b6
updating the introduction to better explain pathway reconstruction an…
ntalluri ed8d8dc
updating the install and config part
ntalluri c4239bd
fixing the first part of step 2 in beginner
ntalluri 7970872
add in warning about Docker images and containers
ntalluri d0f503a
add in one more command
ntalluri 52aa055
add in one more options to command
ntalluri 10a65f8
updated beginner's format and wording more
ntalluri 0c2bbe3
added dockerhub link to the algortihm images and add spras overview i…
ntalluri 25478b8
remove repeat of image
ntalluri 9285d8d
updating intermediate and adding a docker_troubleshooting section
ntalluri 59eb3bd
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri f611bd3
adding in tristan's suggestions
ntalluri f0859c8
update path to spras overview image
ntalluri f3e7f9e
fixed formatting
ntalluri 875057b
updated intermediate tutorial with data part, added images, updated b…
ntalluri 528af72
edit references and add a todo to the unsused rst
ntalluri 68fad39
removing plannign document
ntalluri 5f2cfe3
outline for advanced
ntalluri 2a3f332
added images for advanced
ntalluri 7af4237
Apply suggestions from code review
ntalluri 49cc49b
Merge branch 'tutorial' of github.com:ntalluri/spras into tutorial
ntalluri c2098df
added some periods
ntalluri 099f2c4
update intermediate with more images and information
ntalluri c46cafe
added the CHTC integration part
ntalluri 135dfbf
Apply suggestions from another code review
ntalluri 3043cbe
added alt text, updated erbb pathway, and made images larger
ntalluri bbe5e40
add back some parts of the intro
ntalluri 496c0af
make the pathway reconstruction algo tagline better
ntalluri 32c86d4
add in to the files paths and updated a comment
ntalluri b47be5f
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri 3610e0b
added in final suggestions from code review
ntalluri 4adaf38
more the docker trouble shooting to be a general trouble shooting page
ntalluri 58c4478
Delete docs/troubleshooting.rst
ntalluri 3b08ebd
Apply final suggestions from code review
ntalluri
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,31 +1,182 @@ | ||
| ################################### | ||
| Advanced Capabilities and Features | ||
| ====================================== | ||
| ################################### | ||
|
|
||
| More like these are all the things we can do with this, but will not be showing | ||
| Parameter tuning | ||
| ================ | ||
| Parameter tuning is the process of determining which parameter combinations should be explored for each algorithm for a given dataset. | ||
| Parameter tuning focuses on defining and refining the parameter search space. | ||
|
|
||
| - mention parameter tuning | ||
| - say that parameters are not preset and need to be tuned for each dataset | ||
| Each dataset has unique characteristics, so there are no preset parameter combinations to use. | ||
| Instead, we recommend tuning parameters individually for each new dataset. | ||
| SPRAS provides a flexible framework for obtaining parameter grids for any algorithm on a given dataset. | ||
|
|
||
| CHTC integration | ||
| Grid Search | ||
| ------------ | ||
|
|
||
| Anything not included in the config file | ||
| A grid search systematically checks different combinations of parameter values to see how each affects network reconstruction results. | ||
|
|
||
| 1. Global Workflow Control | ||
| In SPRAS, users can define parameter grids for each algorithm directly in the configuration file. | ||
| When executed, SPRAS automatically runs each algorithm across all parameter combinations and collects the resulting subnetworks. | ||
|
|
||
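As a rough illustration of what "all parameter combinations" means in a grid search, the sketch below expands a grid into one dictionary per combination. It is not SPRAS code: the function name ``expand_grid`` and the parameter names ``k`` and ``alpha`` are hypothetical and chosen only for the example.

```python
from itertools import product


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    names = sorted(grid)  # fixed key order keeps the output reproducible
    value_lists = [grid[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical parameter names and values, for illustration only.
example_grid = {"k": [10, 100], "alpha": [0.1, 0.5, 0.9]}
combos = expand_grid(example_grid)
print(len(combos))  # number of combinations = 2 * 3
```

The number of runs grows multiplicatively with each added parameter, which is one reason to refine the grid around promising regions rather than widen every range at once.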
| Sets options that apply to the entire workflow. | ||
| SPRAS will also support parameter refinement using graph topological heuristics. | ||
| These topological metrics help identify parameter regions that produce biologically plausible output networks. | ||
| Based on these heuristics, SPRAS will generate new configuration files with refined parameter grids for each algorithm per dataset. | ||
|
|
||
| - Examples: the container framework (docker, singularity, dsub) and where to pull container images from | ||
| Users can further refine these grids by rerunning the updated configuration and adjusting the parameter ranges around the newly identified regions to find and fine-tune the most promising algorithm-specific outputs for a given dataset. | ||
|
|
||
| running spras with multiple parameter combinations with multiple algorithms on multiple Datasets | ||
| - for the tutorial we are only doing one dataset | ||
| .. note:: | ||
|
|
||
| 4. Gold Standards | ||
| Some grid search features are still under development and will be added in future SPRAS releases. | ||
|
|
||
| Defines the input files SPRAS will use to evaluate output subnetworks | ||
| Parameter selection | ||
| ------------------- | ||
|
|
||
| A gold standard dataset is comprised of: | ||
| Parameter selection refers to the process of determining which parameter combinations should be used for evaluation on a gold standard dataset. | ||
|
|
||
| - a label: defines the name of the gold standard dataset | ||
| - node_file or edge_file: a list of either node files or edge files. Only one or the other can exist in a single dataset. At the moment only one edge or one node file can exist in one dataset | ||
| - data_dir: the path to where the input gold standard files live | ||
| - dataset_labels: a list of dataset labels that link each gold standard links to one or more datasets via the dataset labels | ||
| Parameter selection is handled in the evaluation code, which supports multiple parameter selection strategies. | ||
| Once the grid search is complete for each dataset, the user can enable evaluation (by setting evaluation ``include: true``) and SPRAS will run all of the parameter selection code. | ||
|
|
||
| PCA-based parameter selection | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The PCA-based approach identifies a representative parameter setting for each pathway reconstruction algorithm on a given dataset. | ||
| It selects the single parameter combination that best captures the central trend of an algorithm's reconstruction behavior. | ||
|
|
||
| .. image:: ../_static/images/pca-kde.png | ||
| :alt: Principal component analysis visualization across pathway outputs with a kernel density estimate computed on top | ||
| :width: 600 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| For each algorithm, all reconstructed subnetworks are projected into an algorithm-specific 2D PCA space based on the set of edges produced by the respective parameter combinations for that algorithm. | ||
| This projection summarizes how the algorithm's outputs vary across different parameter combinations, allowing patterns in the outputs to be visualized in a lower-dimensional space. | ||
|
|
||
| Within each PCA space, a kernel density estimate (KDE) is computed over the projected points to identify regions of high density. | ||
| The output closest to the highest KDE peak is selected as the most representative parameter setting, as it corresponds to the region where the algorithm most consistently produces similar subnetworks. | ||
|
|
||
| Ensemble network-based parameter selection | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
| The ensemble-based approach combines results from all parameter settings for each pathway reconstruction algorithm on a given dataset. | ||
| Instead of focusing on a single "best" parameter combination, it summarizes the algorithm's overall reconstruction behavior across parameters. | ||
|
|
||
| All reconstructed subnetworks are merged into algorithm-specific ensemble networks, where each edge weight reflects how frequently that interaction appears across the outputs. | ||
| Edges that occur more often are assigned higher weights, highlighting interactions that are most consistently recovered by the algorithm. | ||
|
|
||
| These consensus networks help identify the core patterns and overall stability of an algorithm's outputs without needing to choose a single parameter setting (a clear optimal parameter combination may not exist). | ||
|
|
||
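The edge-frequency weighting described above can be sketched as follows. This is a minimal illustration, not the SPRAS implementation: it assumes each subnetwork is a list of undirected edges given as node-name pairs, and the function name ``ensemble_edge_frequencies`` is hypothetical.

```python
from collections import Counter


def ensemble_edge_frequencies(subnetworks):
    """Map each undirected edge to the fraction of subnetworks containing it."""
    if not subnetworks:
        return {}
    counts = Counter()
    for edges in subnetworks:
        # Treat (u, v) and (v, u) as the same edge and count it once per subnetwork.
        counts.update({frozenset(edge) for edge in edges})
    return {edge: n / len(subnetworks) for edge, n in counts.items()}


# Three hypothetical outputs of one algorithm under different parameters.
outputs = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B")],
]
freq = ensemble_edge_frequencies(outputs)
print(freq[frozenset(("A", "B"))])  # edge present in every output
```

Edges with a frequency near 1 are the ones recovered regardless of parameter choice, while low-frequency edges depend on specific settings.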
|
|
||
| Ground truth-based evaluation without parameter selection | ||
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
|
|
||
| The no-parameter-selection approach keeps all parameter combinations for each pathway reconstruction algorithm on a given dataset. | ||
| This approach can be useful for identifying patterns in algorithm performance without favoring any specific parameter setting. | ||
|
|
||
| Evaluation | ||
| ============ | ||
|
|
||
| In some cases, users may have a gold standard file that allows them to evaluate the quality of the reconstructed subnetworks generated by pathway reconstruction algorithms. | ||
|
|
||
| However, gold standards may not exist for certain types of experimental data where validated ground truth interactions or molecules are unavailable or incomplete. | ||
| For example, in emerging research areas or poorly characterized biological systems, interactions may not yet be experimentally verified or fully known, making it difficult to define a reliable reference network for evaluation. | ||
|
|
||
| Adding gold standard datasets and evaluation post-analysis to a configuration | ||
| ----------------------------------------------------------------------------- | ||
|
|
||
| In the configuration file, users can specify one or more gold standard datasets to evaluate the subnetworks reconstructed from each dataset. | ||
| When gold standards are provided and evaluation is enabled (``include: true``), SPRAS will automatically compare the reconstructed subnetworks for a specific dataset against the corresponding gold standards. | ||
|
|
||
| .. code-block:: yaml | ||
|
|
||
| gold_standards: | ||
| - | ||
| label: gs1 | ||
| node_files: ["gs_nodes0.txt", "gs_nodes1.txt"] | ||
| data_dir: "input" | ||
| dataset_labels: ["data0"] | ||
| - | ||
| label: gs2 | ||
| edge_files: ["gs_edges0.txt"] | ||
| data_dir: "input" | ||
| dataset_labels: ["data0", "data1"] | ||
|
|
||
| analysis: | ||
| evaluation: | ||
| include: true | ||
|
|
||
| A gold standard dataset must include the following types of keys and files: | ||
|
|
||
| - ``label``: a name that uniquely identifies a gold standard dataset throughout the SPRAS workflow and outputs. | ||
| - ``node_files`` or ``edge_files``: a list of node or edge files. Only one of these can be defined per gold standard dataset. | ||
| - ``data_dir``: The file path of the directory where the input gold standard dataset files are located. | ||
| - ``dataset_labels``: a list of dataset labels indicating which datasets this gold standard dataset should be evaluated against. | ||
|
|
||
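The key requirements listed above can be checked mechanically. The sketch below is an illustration only and is not how SPRAS validates its configuration: it assumes each gold standard entry has already been loaded into a Python dictionary, uses the key names from the YAML example (``node_files`` and ``edge_files``), and the function name ``check_gold_standard`` is hypothetical.

```python
def check_gold_standard(entry):
    """Return a list of problems found in one gold standard entry (empty if none)."""
    problems = []
    for key in ("label", "data_dir", "dataset_labels"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    # Exactly one of the two file lists may be present.
    if ("node_files" in entry) == ("edge_files" in entry):
        problems.append("define exactly one of node_files or edge_files")
    return problems


gs1 = {
    "label": "gs1",
    "node_files": ["gs_nodes0.txt", "gs_nodes1.txt"],
    "data_dir": "input",
    "dataset_labels": ["data0"],
}
# Hypothetical invalid entry: both file lists given, no dataset_labels.
bad = {
    "label": "gs3",
    "node_files": ["n.txt"],
    "edge_files": ["e.txt"],
    "data_dir": "input",
}
print(check_gold_standard(gs1))
print(check_gold_standard(bad))
```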
| When evaluation is enabled, SPRAS will automatically run its built-in evaluation analysis on each defined dataset-gold standard pair. | ||
| This evaluation computes metrics such as precision, recall, and precision-recall curves, depending on the parameter selection method used. | ||
|
|
||
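For reference, precision and recall against a gold standard reduce to simple set arithmetic. The sketch below uses the standard definitions and is not taken from the SPRAS evaluation code; the node names are hypothetical, and returning 0.0 for an empty set is an assumption made here to avoid division by zero.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted set against a gold standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical nodes from one reconstructed subnetwork and one gold standard.
p, r = precision_recall({"A", "B", "C", "D"}, {"B", "C", "E"})
print(p, r)
```

Precision is the fraction of predicted items that appear in the gold standard, and recall is the fraction of gold standard items that were predicted.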
| For each pathway, evaluation can be run independently of any parameter selection method (the ground truth-based evaluation without parameter selection approach) to directly inspect precision and recall for each reconstructed network from a given dataset. | ||
|
|
||
| .. image:: ../_static/images/pr-per-pathway-nodes.png | ||
| :alt: Precision and recall computed for each pathway and visualized on a scatter plot | ||
| :width: 600 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| Ensemble-based parameter selection generates precision-recall curves by thresholding on the frequency of edges across an ensemble of reconstructed networks for an algorithm on a given dataset. | ||
|
|
||
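Thresholding on frequency can be sketched as follows: each distinct frequency is used as a cutoff, the items at or above it are kept, and precision and recall are computed for that kept set. This is a simplified illustration under stated assumptions, not the SPRAS evaluation code. The function name ``pr_curve``, the string edge labels, and the frequency values are all hypothetical.

```python
def pr_curve(frequency, gold_items):
    """Return (threshold, precision, recall) for each distinct frequency threshold."""
    gold = set(gold_items)
    points = []
    # Sweep from the strictest threshold to the most permissive one.
    for threshold in sorted(set(frequency.values()), reverse=True):
        kept = {item for item, f in frequency.items() if f >= threshold}
        true_positives = len(kept & gold)
        precision = true_positives / len(kept)  # kept is never empty here
        recall = true_positives / len(gold) if gold else 0.0
        points.append((threshold, precision, recall))
    return points


# Hypothetical ensemble frequencies and gold standard.
freq = {"A-B": 1.0, "B-C": 0.5, "C-D": 0.25}
gold = {"A-B", "C-D", "D-E"}
curve = pr_curve(freq, gold)
for point in curve:
    print(point)
```

Lowering the threshold admits less consistent items, which can only keep or raise recall while precision may move in either direction.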
| .. image:: ../_static/images/pr-curve-ensemble-nodes-per-algorithm-nodes.png | ||
| :alt: Precision-recall curve computed for a single ensemble file / pathway and visualized as a curve | ||
| :width: 600 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| PCA-based parameter selection computes precision and recall for a single reconstructed network selected using PCA from all reconstructed networks for an algorithm on a given dataset. | ||
|
|
||
| .. image:: ../_static/images/pr-pca-chosen-pathway-per-algorithm-nodes.png | ||
| :alt: Precision and recall computed for each pathway chosen by the PCA-selection method and visualized on a scatter plot | ||
| :width: 600 | ||
| :align: center | ||
|
|
||
| .. raw:: html | ||
|
|
||
| <div style="margin:20px 0;"></div> | ||
|
|
||
| .. note:: | ||
| Evaluation will only execute if ml has ``include: true``, because the PCA parameter selection step depends on the PCA ML analysis. | ||
|
|
||
| .. note:: | ||
| To see evaluation in action, run SPRAS using the config.yaml or egfr.yaml configuration files. | ||
|
|
||
| HTCondor integration | ||
| ===================== | ||
|
|
||
| Running SPRAS locally can become slow and resource intensive, especially when running many algorithms, parameter combinations, or datasets simultaneously. | ||
|
|
||
| To address this, SPRAS supports an integration with `HTCondor <https://htcondor.org/>`__ (a high throughput computing system), allowing Snakemake jobs to be distributed in parallel and executed across available compute resources. | ||
|
|
||
| See :doc:`Running with HTCondor <../htcondor>` for more information on SPRAS's integration with HTCondor. | ||
|
|
||
|
|
||
| Ability to run with different container frameworks | ||
| --------------------------------------------------- | ||
|
|
||
| CHTC uses Apptainer to run containerized software in secure, high-performance environments. | ||
|
|
||
| SPRAS accommodates this by allowing users to specify which container framework to use globally within their workflow configuration. | ||
|
|
||
| The global workflow control section in the configuration file allows a user to set which SPRAS supported container framework to use: | ||
|
|
||
| .. code-block:: yaml | ||
|
|
||
| container_framework: docker | ||
|
|
||
| The frameworks include Docker, Apptainer/Singularity, and dsub. |