Merged
Commits
36 commits
7e1e912
updated beginner
ntalluri Oct 14, 2025
83da105
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri Oct 14, 2025
96bc8b6
updating the introduction to better explain pathway reconstruction an…
ntalluri Oct 14, 2025
ed8d8dc
updating the install and config part
ntalluri Oct 14, 2025
c4239bd
fixing the first part of step 2 in beginner
ntalluri Oct 14, 2025
7970872
add in warning about Docker images and containers
ntalluri Oct 14, 2025
d0f503a
add in one more command
ntalluri Oct 14, 2025
52aa055
add in one more options to command
ntalluri Oct 14, 2025
10a65f8
updated beginner's format and wording more
ntalluri Oct 14, 2025
0c2bbe3
added dockerhub link to the algortihm images and add spras overview i…
ntalluri Oct 14, 2025
25478b8
remove repeat of image
ntalluri Oct 15, 2025
9285d8d
updating intermediate and adding a docker_troubleshooting section
ntalluri Oct 15, 2025
59eb3bd
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri Oct 15, 2025
f611bd3
adding in tristan's suggestions
ntalluri Oct 15, 2025
f0859c8
update path to spras overview image
ntalluri Oct 15, 2025
f3e7f9e
fixed formatting
ntalluri Oct 16, 2025
875057b
updated intermediate tutorial with data part, added images, updated b…
ntalluri Oct 16, 2025
528af72
edit references and add a todo to the unsused rst
ntalluri Oct 16, 2025
68fad39
removing plannign document
ntalluri Oct 17, 2025
5f2cfe3
outline for advanced
ntalluri Oct 17, 2025
2a3f332
added images for advanced
ntalluri Oct 17, 2025
7af4237
Apply suggestions from code review
ntalluri Oct 17, 2025
49cc49b
Merge branch 'tutorial' of github.com:ntalluri/spras into tutorial
ntalluri Oct 17, 2025
c2098df
added some periods
ntalluri Oct 17, 2025
099f2c4
update intermediate with more images and information
ntalluri Oct 17, 2025
c46cafe
added the CHTC integration part
ntalluri Oct 17, 2025
135dfbf
Apply suggestions from another code review
ntalluri Oct 18, 2025
3043cbe
added alt text, updated erbb pathway, and made images larger
ntalluri Oct 19, 2025
bbe5e40
add back some parts of the intro
ntalluri Oct 19, 2025
496c0af
make the pathway reconstruction algo tagline better
ntalluri Oct 19, 2025
32c86d4
add in to the files paths and updated a comment
ntalluri Oct 19, 2025
b47be5f
Merge branch 'main' of github.com:ntalluri/spras into tutorial
ntalluri Oct 19, 2025
3610e0b
added in final suggestions from code review
ntalluri Oct 19, 2025
4adaf38
more the docker trouble shooting to be a general trouble shooting page
ntalluri Oct 19, 2025
58c4478
Delete docs/troubleshooting.rst
ntalluri Oct 21, 2025
3b08ebd
Apply final suggestions from code review
ntalluri Oct 21, 2025
7 changes: 4 additions & 3 deletions docs/_static/config/beginner.yaml
@@ -26,8 +26,9 @@ algorithms:
include: true
run1:
k: 1
# run2: # uncomment for step 3.2
# k: [10, 100] # uncomment for step 3.2

# run2: # uncomment for step 3.2
# k: [10, 100] # uncomment for step 3.2

# Here we specify which pathways to run and other file location information.
# Assume that if a dataset label does not change, the lists of associated input files do not change
@@ -45,7 +46,7 @@ reconstruction_settings:

# Set where everything is saved
locations:
reconstruction_dir: "output/basic"
reconstruction_dir: "output/beginner"

analysis:
# Create one summary per pathway file and a single summary table for all pathways for each dataset
Binary file added docs/_static/images/egf-interactome.png
Binary file added docs/_static/images/erbb-signaling-pathway.png
Binary file added docs/_static/images/pca-kde.png
Binary file added docs/_static/images/pr-per-pathway-nodes.png
187 changes: 169 additions & 18 deletions docs/tutorial/advanced.rst
@@ -1,31 +1,182 @@
###################################
Advanced Capabilities and Features
======================================
###################################

More like these are all the things we can do with this, but will not be showing
Parameter tuning
================
Parameter tuning is the process of determining which parameter combinations should be explored for each algorithm for a given dataset.
Parameter tuning focuses on defining and refining the parameter search space.

- mention parameter tuning
- say that parameters are not preset and need to be tuned for each dataset
Each dataset has unique characteristics, so there are no preset parameter combinations to use.
Instead, we recommend tuning parameters individually for each new dataset.
SPRAS provides a flexible framework for defining parameter grids for any algorithm on a given dataset.

CHTC integration
Grid Search
------------

Anything not included in the config file
A grid search systematically checks different combinations of parameter values to see how each affects network reconstruction results.

1. Global Workflow Control
In SPRAS, users can define parameter grids for each algorithm directly in the configuration file.
When executed, SPRAS automatically runs each algorithm across all parameter combinations and collects the resulting subnetworks.
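
As a rough illustration of what "all parameter combinations" means, the sketch below expands a grid of parameter lists into individual runs with a Cartesian product. The function name ``expand_grid`` and the parameters ``k`` and ``alpha`` are hypothetical and are not taken from the SPRAS code base; the way SPRAS itself expands a grid may differ.

```python
from itertools import product


def expand_grid(grid):
    """Expand {param: value or list of values} into one dict per combination."""
    names = list(grid)
    # Treat a scalar as a one-element list so every parameter is iterable.
    value_lists = [v if isinstance(v, list) else [v] for v in grid.values()]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Hypothetical grid: two values of k and two values of alpha give four runs.
runs = expand_grid({"k": [10, 100], "alpha": [0.1, 0.5]})
for run in runs:
    print(run)
```

The number of runs is the product of the list lengths, so grids grow quickly as parameters are added.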

Sets options that apply to the entire workflow.
SPRAS will also support parameter refinement using graph topological heuristics.
These topological metrics help identify parameter regions that produce biologically plausible output networks.
Based on these heuristics, SPRAS will generate new configuration files with refined parameter grids for each algorithm per dataset.

- Examples: the container framework (docker, singularity, dsub) and where to pull container images from
Users can further refine these grids by rerunning the updated configuration and adjusting the parameter ranges around the newly identified regions to find and fine-tune the most promising algorithm-specific outputs for a given dataset.

running spras with multiple parameter combinations with multiple algorithms on multiple Datasets
- for the tutorial we are only doing one dataset
.. note::

4. Gold Standards
    Some grid search features are still under development and will be added in future SPRAS releases.

Defines the input files SPRAS will use to evaluate output subnetworks
Parameter selection
-------------------

A gold standard dataset is comprised of:
Parameter selection refers to the process of determining which parameter combinations should be used for evaluation on a gold standard dataset.

- a label: defines the name of the gold standard dataset
- node_file or edge_file: a list of either node files or edge files. Only one or the other can exist in a single dataset. At the moment only one edge or one node file can exist in one dataset
- data_dir: the path to where the input gold standard files live
- dataset_labels: a list of dataset labels that link each gold standard links to one or more datasets via the dataset labels
Parameter selection is handled in the evaluation code, which supports multiple parameter selection strategies.
Once the grid search is complete for each dataset, the user can enable evaluation (by setting ``include: true`` under ``evaluation``), which runs all of the parameter selection code.

PCA-based parameter selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The PCA-based approach identifies a representative parameter setting for each pathway reconstruction algorithm on a given dataset.
It selects the single parameter combination that best captures the central trend of an algorithm's reconstruction behavior.

.. image:: ../_static/images/pca-kde.png
    :alt: Principal component analysis visualization across pathway outputs with a kernel density estimate computed on top
    :width: 600
    :align: center

.. raw:: html

    <div style="margin:20px 0;"></div>

For each algorithm, all reconstructed subnetworks are projected into an algorithm-specific 2D PCA space based on the set of edges produced by the respective parameter combinations for that algorithm.
This projection summarizes how the algorithm's outputs vary across different parameter combinations, allowing patterns in the outputs to be visualized in a lower-dimensional space.

Within each PCA space, a kernel density estimate (KDE) is computed over the projected points to identify regions of high density.
The output closest to the highest KDE peak is selected as the most representative parameter setting, as it corresponds to the region where the algorithm most consistently produces similar subnetworks.
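
The selection step can be sketched as follows, assuming the 2D PCA coordinates have already been computed. This is a simplified stand-in rather than the SPRAS implementation: it evaluates a Gaussian KDE only at the projected points and returns the densest one, instead of locating the KDE peak on a grid and finding the nearest output. The function name ``most_representative``, the fixed ``bandwidth``, and the coordinates are all hypothetical.

```python
import math


def most_representative(points, bandwidth=1.0):
    """Return the index of the point with the highest Gaussian KDE value."""
    def density(p):
        # Sum of isotropic Gaussian kernels centred on every projected point.
        return sum(
            math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) / (2 * bandwidth ** 2))
            for q in points
        )

    densities = [density(p) for p in points]
    return max(range(len(points)), key=densities.__getitem__)


# Hypothetical 2D PCA coordinates, one per parameter combination.
coords = [(0.0, 0.0), (0.4, 0.0), (-0.4, 0.0), (5.0, 5.0)]
best = most_representative(coords)
print(best)
```

Here the first three points form a cluster and the fourth is an outlier, so the point in the middle of the cluster is chosen.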

Ensemble network-based parameter selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The ensemble-based approach combines results from all parameter settings for each pathway reconstruction algorithm on a given dataset.
Instead of focusing on a single "best" parameter combination, it summarizes the algorithm's overall reconstruction behavior across parameters.

All reconstructed subnetworks are merged into algorithm-specific ensemble networks, where each edge weight reflects how frequently that interaction appears across the outputs.
Edges that occur more often are assigned higher weights, highlighting interactions that are most consistently recovered by the algorithm.
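
A minimal sketch of this merging step is shown below, assuming each reconstructed subnetwork is available as a set of edge tuples. The function name ``build_ensemble`` and the toy edges are hypothetical, and edges are treated as directed tuples for simplicity; SPRAS's own ensemble code may handle direction and file formats differently.

```python
from collections import Counter


def build_ensemble(pathways):
    """Map each edge to the fraction of pathways that contain it."""
    # set(pathway) guards against an edge being listed twice in one output.
    counts = Counter(edge for pathway in pathways for edge in set(pathway))
    return {edge: n / len(pathways) for edge, n in counts.items()}


# Hypothetical outputs of one algorithm under three parameter combinations.
outputs = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "D")},
]
ensemble = build_ensemble(outputs)
print(ensemble[("A", "B")])
```

An edge found in every output gets a weight of 1, while an edge found in only one of three outputs gets a weight of one third.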

These consensus networks help identify the core patterns and overall stability of an algorithm's outputs without needing to choose a single parameter setting, which is useful when no clearly optimal parameter combination exists.


Ground truth-based evaluation without parameter selection
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The no-parameter-selection approach evaluates every parameter combination for each pathway reconstruction algorithm on a given dataset.
This approach can be useful for identifying patterns in algorithm performance without favoring any specific parameter setting.

Evaluation
============

In some cases, users may have a gold standard file that allows them to evaluate the quality of the reconstructed subnetworks generated by pathway reconstruction algorithms.

However, gold standards may not exist for certain types of experimental data where validated ground truth interactions or molecules are unavailable or incomplete.
For example, in emerging research areas or poorly characterized biological systems, interactions may not yet be experimentally verified or fully known, making it difficult to define a reliable reference network for evaluation.

Adding gold standard datasets and evaluation post-analysis to a configuration
-----------------------------------------------------------------------------

In the configuration file, users can specify one or more gold standard datasets to evaluate the subnetworks reconstructed from each dataset.
When gold standards are provided and evaluation is enabled (``include: true``), SPRAS will automatically compare the reconstructed subnetworks for a specific dataset against the corresponding gold standards.

.. code-block:: yaml

    gold_standards:
      -
        label: gs1
        node_files: ["gs_nodes0.txt", "gs_nodes1.txt"]
        data_dir: "input"
        dataset_labels: ["data0"]
      -
        label: gs2
        edge_files: ["gs_edges0.txt"]
        data_dir: "input"
        dataset_labels: ["data0", "data1"]

    analysis:
      evaluation:
        include: true

A gold standard dataset must include the following types of keys and files:

- ``label``: A name that uniquely identifies a gold standard dataset throughout the SPRAS workflow and outputs.
- ``node_files`` or ``edge_files``: A list of node or edge files. Only one of these can be defined per gold standard dataset.
- ``data_dir``: The path of the directory where the input gold standard dataset files are located.
- ``dataset_labels``: A list of dataset labels indicating which datasets this gold standard dataset should be evaluated against.
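
The rules above can be expressed as a small check, sketched here under the assumption that each gold standard entry has been loaded into a Python dictionary. The key names follow the YAML example (``node_files``, ``edge_files``); the function ``check_gold_standard`` is hypothetical and is not SPRAS's own validation code.

```python
def check_gold_standard(entry):
    """Return a list of problems found in one gold standard entry."""
    problems = []
    for key in ("label", "data_dir", "dataset_labels"):
        if key not in entry:
            problems.append(f"missing key: {key}")
    # Exactly one of node_files / edge_files should be present.
    has_nodes = "node_files" in entry
    has_edges = "edge_files" in entry
    if has_nodes == has_edges:
        problems.append("define exactly one of node_files or edge_files")
    return problems


gs1 = {
    "label": "gs1",
    "node_files": ["gs_nodes0.txt", "gs_nodes1.txt"],
    "data_dir": "input",
    "dataset_labels": ["data0"],
}
print(check_gold_standard(gs1))
```

An entry that defines both file lists, or neither, is reported as a problem.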

When evaluation is enabled, SPRAS will automatically run its built-in evaluation analysis on each defined dataset-gold standard pair.
This evaluation computes metrics such as precision, recall, and precision-recall curves, depending on the parameter selection method used.

For each pathway, evaluation can also be run without any parameter selection method (the ground truth-based evaluation described above), directly reporting precision and recall for each reconstructed network from a given dataset.
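
For reference, precision and recall for one reconstructed network against a node gold standard can be computed as in the sketch below. The function name ``precision_recall`` and the node names are illustrative only, and returning 0.0 for an empty set is a convention assumed here, not necessarily the one SPRAS uses.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall of a predicted node set against a gold standard."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: fraction of predicted nodes that are in the gold standard.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of gold standard nodes that were predicted.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical node sets: three of four predictions are correct,
# and three of five gold standard nodes are recovered.
p, r = precision_recall(
    {"EGFR", "GRB2", "SOS1", "TP53"},
    {"EGFR", "GRB2", "SOS1", "KRAS", "RAF1"},
)
print(p, r)
```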

.. image:: ../_static/images/pr-per-pathway-nodes.png
    :alt: Precision and recall computed for each pathway and visualized on a scatter plot
    :width: 600
    :align: center

.. raw:: html

    <div style="margin:20px 0;"></div>

Ensemble-based parameter selection generates precision-recall curves by thresholding on the frequency of edges across an ensemble of reconstructed networks for an algorithm on a given dataset.
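
The thresholding idea can be sketched as follows, assuming the ensemble is a dictionary of edge frequencies and the gold standard is a non-empty set of edges. The function name ``pr_curve`` and the toy data are hypothetical; this is an illustration of the technique, not the SPRAS evaluation code.

```python
def pr_curve(edge_freq, gold_edges):
    """Return (threshold, precision, recall) for each distinct frequency threshold."""
    gold = set(gold_edges)
    if not gold:
        raise ValueError("gold standard must contain at least one edge")
    points = []
    for threshold in sorted(set(edge_freq.values()), reverse=True):
        # Keep edges that appear at least this often across the ensemble.
        kept = {edge for edge, freq in edge_freq.items() if freq >= threshold}
        true_positives = len(kept & gold)
        points.append((threshold, true_positives / len(kept), true_positives / len(gold)))
    return points


# Hypothetical ensemble frequencies and gold standard edges.
freq = {("A", "B"): 1.0, ("B", "C"): 0.5, ("C", "D"): 0.25}
gold = {("A", "B"), ("B", "C"), ("X", "Y")}
curve = pr_curve(freq, gold)
for point in curve:
    print(point)
```

Lowering the threshold admits less frequent edges, which can only keep recall the same or raise it, while precision may rise or fall.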

.. image:: ../_static/images/pr-curve-ensemble-nodes-per-algorithm-nodes.png
    :alt: Precision-recall curve computed for a single ensemble file / pathway and visualized as a curve
    :width: 600
    :align: center

.. raw:: html

    <div style="margin:20px 0;"></div>

PCA-based parameter selection computes precision and recall for the single reconstructed network selected using PCA from all reconstructed networks for an algorithm on a given dataset.

.. image:: ../_static/images/pr-pca-chosen-pathway-per-algorithm-nodes.png
    :alt: Precision and recall computed for each pathway chosen by the PCA-selection method and visualized on a scatter plot
    :width: 600
    :align: center

.. raw:: html

    <div style="margin:20px 0;"></div>

.. note::
    Evaluation will only execute if ``ml`` also has ``include: true``, because the PCA parameter selection step depends on the PCA ML analysis.

.. note::
    To see evaluation in action, run SPRAS using the ``config.yaml`` or ``egfr.yaml`` configuration files.

HTCondor integration
=====================

Running SPRAS locally can become slow and resource intensive, especially when running many algorithms, parameter combinations, or datasets simultaneously.

To address this, SPRAS supports an integration with `HTCondor <https://htcondor.org/>`__ (a high throughput computing system), allowing Snakemake jobs to be distributed in parallel and executed across available compute.

See :doc:`Running with HTCondor <../htcondor>` for more information on SPRAS's integration with HTCondor.


Ability to run with different container frameworks
---------------------------------------------------

The Center for High Throughput Computing (CHTC) uses Apptainer to run containerized software in secure, high-performance environments.

SPRAS accommodates this by allowing users to specify which container framework to use globally within their workflow configuration.

The global workflow control section in the configuration file allows a user to set which SPRAS supported container framework to use:

.. code-block:: yaml

    container_framework: docker

The supported frameworks are Docker, Apptainer/Singularity, and dsub.