- Introduction
- Conda Environment Setup
- Download Processed Dataset
- How to Use SynVerse
- Docker Setup Guide for Pretrained Models
- Publication
SynVerse is a framework with an encoder-decoder architecture. It incorporates diverse input features and a reasonable approximation of model architectures commonly employed by existing deep learning-based synergy prediction methods. It includes four data-splitting strategies and three ablation methods: module-based, feature shuffling, and a novel network-based approach to isolate factors influencing model performance.
If you haven't cloned the repository yet, run the following command to clone it and navigate to the SynVerse folder:
git clone https://github.com/Murali-group/SynVerse.git
cd SynVerse
Then, follow the steps below to set up the synverse environment with the required libraries using the provided synverse.yml file.
conda env create -f synverse.yml
To run the command, make sure Conda is installed. If not, install Anaconda or the lighter version, Miniconda.
After the environment is created, activate it using:
conda activate synverse
To verify that the environment and its dependencies are set up correctly, you can list the installed packages:
conda list
All datasets used in this study, including drug and cell line features and the preprocessed synergy dataset required to reproduce the results, are available in the Zenodo repository. Download and unzip the inputs.zip file, and place the inputs folder in the project directory, at the same level as the code/ folder.
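The placement described above can be checked with a few lines of Python before running anything. This is a minimal sketch, not part of SynVerse: the helper name `missing_dirs` is hypothetical, and it only assumes that `code/` and `inputs/` should sit side by side in the project directory.

```python
from pathlib import Path


def missing_dirs(project_root, required=("code", "inputs")):
    """Return the required sub-directories that are absent from project_root."""
    root = Path(project_root)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    # "." assumes this script is run from the SynVerse project directory.
    missing = missing_dirs(".")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Layout looks right: inputs/ sits next to code/.")
```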
SynVerse is configured using a YAML file (e.g., sample_config.yaml), which allows users to define the input features, model architecture, and evaluation strategies. Once a configuration file is prepared, SynVerse can be run in various modes to perform different tasks:
- To train and test a model:
python main.py --config config_files/sample_config.yaml --train_type 'regular'
- To perform feature-shuffling-based ablation study:
python main.py --config config_files/sample_config.yaml --train_type 'shuffle'
- To perform network-based ablation study:
python main.py --config config_files/sample_config.yaml --train_type 'rewire'
- To parse the output files and create plots showing RMSE and PCC score of the models:
python -m code.plots.results_plots --parse --config code/config_files/sample_config.yaml
python -m code.plots.results_plots --plot --config code/config_files/sample_config.yaml
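For readers unfamiliar with the two reported metrics, the sketch below computes them from scratch. It assumes that PCC stands for the Pearson correlation coefficient; it is not the plotting module's own code, and the function names are illustrative.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted scores."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def pearson(y_true, y_pred):
    """Pearson correlation coefficient (assumed meaning of PCC)."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 3.5, 4.5]  # constant offset of 0.5
print(rmse(observed, predicted))     # 0.5
print(pearson(observed, predicted))  # 1.0
```

The toy data illustrates why both metrics are reported: a constant offset leaves the correlation perfect while the RMSE is non-zero.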
This section describes each field in the YAML configuration file used by SynVerse.
The synergy score to predict (Options: 'S_mean_mean', 'synergy_loewe_mean').
Base directory where all input files are stored.
Each entry defines a path to a required input file.
| Key | Description |
|---|---|
| synergy_file | Contains synergy triplets. Required columns: drug_1_pid, drug_2_pid, cell_line_name, and S_mean_mean (or synergy_loewe_mean). |
| maccs_file | MACCS fingerprint file. Columns: pid, MACCS_0, ..., MACCS_166. |
| mfp_file | Morgan fingerprints. Columns: pid, Morgan_FP_0, ..., Morgan_FP_255. |
| ecfp_file | ECFP_4 fingerprints. Columns: pid, ECFP4_0, ..., ECFP4_1023. |
| smiles_file | SMILES strings. Columns: pid, smiles. |
| mol_graph_file | Pickle file with DeepChem-derived molecular graphs: {pid: graph}. |
| target_file | Drug target binary profile. Columns: pid and target names. |
| genex_file | Cell line gene expression. Columns: cell_line_name and gene names. |
| lincs | Landmark genes file for LINCS1000. |
| vocab_file | Vocabulary file to convert SMILES to tokens. |
| net_file | STRING network file (gzipped). |
| prot_info_file | STRING protein metadata file (gzipped). |
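Because every input file must carry specific columns, a quick header check can catch a malformed file early. The sketch below covers the columns listed for synergy_file. It is not SynVerse code: the tab delimiter and the `score_name` parameter name are assumptions, and the helper is illustrative only.

```python
import csv
import io

# Columns the table lists as required for synergy_file (besides the score).
REQUIRED = ("drug_1_pid", "drug_2_pid", "cell_line_name")


def missing_columns(handle, score_name="S_mean_mean", delimiter="\t"):
    """Return required columns missing from the header row of an open file.

    The delimiter is an assumption; adjust it to match the actual file.
    """
    header = next(csv.reader(handle, delimiter=delimiter), [])
    return [col for col in REQUIRED + (score_name,) if col not in header]


sample = io.StringIO(
    "drug_1_pid\tdrug_2_pid\tcell_line_name\tS_mean_mean\n"
    "1\t2\tA549\t3.2\n"
)
print(missing_columns(sample))  # []
```

With a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="")`.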
Describes drug-level features.
The features listed here determine which subset of triplets from synergy_file is used for training, validation, and testing. Example 1: If only SMILES-based features (MACCS, MFP, ECFP_4, mol_graph, smiles) appear in drug_features, the final synergy dataset contains only the triplets in which both drugs have SMILES available. Example 2: If drug_features lists both SMILES-based features and target, the final synergy dataset contains only the triplets in which both drugs have both SMILES and target data available.
- name: str: Feature name.
- preprocess: str: Preprocessing method.
- compress: bool: Use an autoencoder to reduce dimensions.
- norm: str: Normalization method.
- encoder: str: Feature-specific encoder.
- use: List of boolean values: Determines if the feature should be used in the model. In Feature Combination Control, we describe in detail how the use parameter, together with other feature-control settings, defines the specific feature combinations used during model training.
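The way the listed drug features restrict the synergy dataset can be sketched as a set intersection. This is an illustration of the rule described in the two examples, not the SynVerse implementation: the function name, the `(drug_1, drug_2, cell_line)` tuple layout, and the `feature_to_drugs` mapping are all hypothetical, and cell line features are ignored here.

```python
def filter_triplets(triplets, feature_to_drugs, selected_features):
    """Keep triplets whose two drugs both have every selected feature.

    triplets: iterable of (drug_1, drug_2, cell_line) tuples.
    feature_to_drugs: maps a feature name to the drugs that have it.
    """
    covered = None
    for feat in selected_features:
        drugs = set(feature_to_drugs[feat])
        covered = drugs if covered is None else covered & drugs
    if covered is None:  # no features selected: nothing to filter on
        return list(triplets)
    return [t for t in triplets if t[0] in covered and t[1] in covered]


triplets = [("A", "B", "c1"), ("A", "C", "c1")]
feature_to_drugs = {"smiles": {"A", "B", "C"}, "target": {"A", "B"}}

# Example 1: SMILES-based features only, so both triplets survive.
print(filter_triplets(triplets, feature_to_drugs, ["smiles"]))
# Example 2: SMILES and target, so drug C (no target data) drops out.
print(filter_triplets(triplets, feature_to_drugs, ["smiles", "target"]))
```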
Same structure as drug_features, but for cell lines.
- name: Model architecture (e.g., 'MLP').
- hp_range: Hyperparameter search space for tuning.
- hp: Default configuration.
List of encoder configs. Each contains:
- name: Encoder type (e.g., 'GCN', 'Transformer').
- hp_range: Hyperparameter search space for tuning.
- hp: Default configuration.
Dimension of hidden layers for the autoencoder.
Number of samples per batch during training.
Maximum training epochs.
- type: Splitting strategy to use (Options: random, leave_comb, leave_drug, leave_cell_line).
- test_frac: Test set size (fraction of total).
- val_frac: Validation set size (fraction of training set).
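Since test_frac is taken from the whole dataset while val_frac is taken from what remains for training, the resulting set sizes are easy to misjudge. The sketch below works through the arithmetic for the random strategy only. The function names and the rounding rule are assumptions, and the leave_* strategies, which hold out groups rather than individual triplets, are not shown.

```python
import random


def split_sizes(n_total, test_frac, val_frac):
    """Return (n_train, n_val, n_test); val_frac applies after the test split."""
    n_test = round(n_total * test_frac)
    n_rest = n_total - n_test
    n_val = round(n_rest * val_frac)
    return n_rest - n_val, n_val, n_test


def random_split(items, test_frac, val_frac, seed=0):
    """Shuffle the items and cut them into train, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    _, n_val, n_test = split_sizes(len(items), test_frac, val_frac)
    test = items[:n_test]
    val = items[n_test:n_test + n_val]
    train = items[n_test + n_val:]
    return train, val, test


# 1000 triplets: 200 go to test, then 25% of the remaining 800 go to validation.
print(split_sizes(1000, test_frac=0.2, val_frac=0.25))  # (600, 200, 200)
```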
Weights & Biases integration for experiment tracking.
- enabled: Enable logging.
- entity_name, token, project_name: W&B credentials.
- timezone, timezone_format: Time info formatting.
Minimum percentage of triplets required per cell line to be included in training.
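One plausible reading of this threshold is sketched below: a cell line is kept only if its triplets make up at least the given percentage of all triplets. That interpretation, the function name, and the tuple layout are assumptions made for illustration; the actual SynVerse behavior may differ.

```python
from collections import Counter


def keep_frequent_cell_lines(triplets, min_percent):
    """Drop triplets from cell lines below min_percent of all triplets.

    triplets: list of (drug_1, drug_2, cell_line) tuples (assumed layout).
    """
    counts = Counter(cell for _, _, cell in triplets)
    total = len(triplets)
    kept = {c for c, n in counts.items() if 100.0 * n / total >= min_percent}
    return [t for t in triplets if t[2] in kept]


data = [("d1", "d2", "A549")] * 9 + [("d1", "d3", "MCF7")]
filtered = keep_frequent_cell_lines(data, min_percent=20)
print(len(filtered))  # 9
```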
Defines the number of features in combinations.
| Field | Meaning |
|---|---|
| max_drug_feat | Maximum number of drug features per model. |
| min_drug_feat | Minimum number of drug features per model. |
| max_cell_feat | Maximum number of cell line features per model. |
| min_cell_feat | Minimum number of cell line features per model. |
SynVerse provides the flexibility to train models with any combination of drug and cell line features using a single configuration file. The feature selection process is governed by a few key parameters described below.
The parameters max_drug_feat and min_drug_feat define the maximum and minimum number of drug features that a model can include. Similarly, max_cell_feat and min_cell_feat control the number of cell line features.
As mentioned before, each feature also has a use parameter that specifies its inclusion rule:
use = [true]: the feature must always be included in all combinations.
use = [false]: the feature must always be excluded.
use = [true, false]: the feature may be included in some combinations and excluded in others.
For example, if a configuration file specifies max_drug_feat = 3, min_drug_feat = 2, and use = [false] for MACCS, the generated models will include combinations of two to three drug features, none of which can be MACCS. Thus, a model using both smiles and mol_graph is valid, while one using only smiles, only mol_graph, or MACCS with mol_graph is not.
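The rules above can be expressed as a short enumeration. This is a sketch of the described behavior rather than SynVerse's own code: the function name is hypothetical, and the example assumes that smiles and mol_graph are both configured with use = [true, false], which the text does not state.

```python
from itertools import combinations


def feature_combinations(use_flags, min_feat, max_feat):
    """Enumerate feature combinations allowed by the use flags and size limits.

    use_flags maps a feature name to its `use` list:
    [True] = always included, [False] = never, [True, False] = optional.
    """
    required = [f for f, use in use_flags.items() if use == [True]]
    optional = [f for f, use in use_flags.items() if True in use and False in use]
    combos = []
    for size in range(min_feat, max_feat + 1):
        n_optional = size - len(required)
        if n_optional < 0:
            continue  # more required features than this size allows
        for extra in combinations(optional, n_optional):
            combos.append(tuple(required) + extra)
    return combos


drug_use = {"smiles": [True, False], "mol_graph": [True, False], "MACCS": [False]}
print(feature_combinations(drug_use, min_feat=2, max_feat=3))
# [('smiles', 'mol_graph')]
```

Only one combination survives: MACCS is excluded outright, and with just two eligible features no three-feature model can be formed.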
Set to true to enable hyperparameter optimization.
Settings for Bayesian Optimization with BOHB.
- min_budget, max_budget: Resource limits per trial.
- n_iterations: Number of trials.
- run_id: BOHB session ID.
- server_type: 'local' or 'cluster'.
Network rewiring method to use when --train_type is set to 'rewire'.
Options: SM: Degree-preserving (Maslov-Sneppen), SA: Strength-preserving (Simulated Annealing).
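To make the degree-preserving option concrete, the sketch below shows the generic double-edge-swap idea behind Maslov-Sneppen rewiring on an unweighted, undirected simple graph. It is a textbook-style illustration, not SynVerse's implementation: the function name and parameters are hypothetical, edge weights are ignored, and the strength-preserving simulated annealing variant is not covered.

```python
import random


def degree_preserving_rewire(edges, n_swaps, seed=0, max_tries=None):
    """Randomize edges with double-edge swaps while keeping every node degree."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    max_tries = 100 * n_swaps if max_tries is None else max_tries
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick a random orientation for the second edge
        # Proposed swap: (a, b), (c, d) -> (a, d), (c, b)
        if a == d or c == b:
            continue  # would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in edge_set or new2 in edge_set or new1 == new2:
            continue  # would create a duplicate edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list


toy = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)]
print(degree_preserving_rewire(toy, n_swaps=5, seed=1))
```

Each accepted swap removes one edge and adds one edge at each of the four endpoints, so every node keeps its degree while the wiring pattern changes.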
output_dir: Output directory for results.
This guide provides detailed instructions for containerizing KPGT and MolE using Docker with GPU support.
- Hardware Requirements
  - NVIDIA GPU with Compute Capability ≥ 3.5
  - 16GB+ RAM
  - 50GB+ free disk space for Docker images and dependencies
- Software Requirements
  - For KPGT (CUDA 11.3)
    - OS: Linux (Ubuntu 20.04 recommended)
    - NVIDIA drivers: v465+ (CUDA 11.3 compatibility)
    - CUDA Toolkit: 11.3.1
    - Docker: Engine 20.10+ with NVIDIA Container Toolkit configured for CUDA 11.3
  - For MolE (CUDA 12.1)
    - OS: Linux (Ubuntu 20.04 recommended)
    - NVIDIA drivers: v525+ (CUDA 12.1 compatibility)
    - CUDA Toolkit: 12.1
    - Docker: Engine 20.10+ with NVIDIA Container Toolkit configured for CUDA 12.1
We have already created Dockerfiles for each project. To build the Docker images, simply run the provided script.
This script builds the kpgt:base and mole:base images from their respective Dockerfiles, which are located in the root directory of each project within the pretrain directory.
Tasnina, N., Haghani, M. and Murali, T.M., 2025. SynVerse: A Framework for Systematic Evaluation of Deep Learning Based Drug Synergy Prediction Models. bioRxiv, pp.2025-04. https://doi.org/10.1101/2025.04.30.651516
