SynVerse

Introduction

SynVerse is a framework with an encoder-decoder architecture. It incorporates diverse input features and a reasonable approximation of model architectures commonly employed by existing deep learning-based synergy prediction methods. It includes four data-splitting strategies and three ablation methods: module-based, feature shuffling, and a novel network-based approach to isolate factors influencing model performance.

Conda Environment Setup

If you haven't cloned the repository yet, run the following command to clone it and navigate to the SynVerse folder:

git clone https://github.com/Murali-group/SynVerse.git
cd SynVerse

Then, follow the steps below to set up the synverse environment with required libraries using the provided synverse.yml file.

conda env create -f synverse.yml

This command requires Conda. If it is not installed, install Anaconda or the lighter Miniconda first.

After the environment is created, activate it using:

conda activate synverse

To verify that the environment and its dependencies are set up correctly, you can list the installed packages:

conda list

Download Processed Dataset

All datasets used in this study, including drug and cell line features and the preprocessed synergy dataset required to reproduce the results, are available in the Zenodo repository. Download and unzip the inputs.zip file, and place the inputs folder in the project directory, at the same level as the code/ folder.
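A quick way to confirm the folder placement described above is a small layout check. This is a minimal sketch, not part of SynVerse: the helper name `missing_dirs` is hypothetical, and it only tests that `code/` and `inputs/` exist side by side in the project directory.

```python
from pathlib import Path


def missing_dirs(project_dir, required=("code", "inputs")):
    """Return the required sub-folders that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    # Run from the project directory (the folder that contains code/).
    missing = missing_dirs(".")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Layout looks correct: code/ and inputs/ are siblings.")
```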

How to Use SynVerse

SynVerse is configured using a YAML file (e.g., sample_config.yaml), which allows users to define the input features, model architecture, and evaluation strategies. Once a configuration file is prepared, SynVerse can be run in various modes to perform different tasks:

To train and test a model:

   python main.py --config config_files/sample_config.yaml --train_type 'regular'

To perform a feature-shuffling-based ablation study:

   python main.py --config config_files/sample_config.yaml --train_type 'shuffle'

To perform a network-based ablation study:

   python main.py --config config_files/sample_config.yaml --train_type 'rewire'

To parse the output files and create plots showing the RMSE and PCC scores of the models:

    python -m code.plots.results_plots --parse --config code/config_files/sample_config.yaml
    python -m code.plots.results_plots --plot --config code/config_files/sample_config.yaml
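To run several modes back to back, the commands above can be wrapped in a short driver script. The sketch below is an illustration only: `build_command` and `run_all` are hypothetical helpers, and the script assumes it is launched from the directory that contains `main.py`. The only flags it uses are `--config` and `--train_type`, as shown above.

```python
import subprocess
import sys

# The three values of --train_type documented above.
TRAIN_TYPES = ("regular", "shuffle", "rewire")


def build_command(config_path, train_type="regular"):
    """Build the argument list for one SynVerse run."""
    if train_type not in TRAIN_TYPES:
        raise ValueError(f"train_type must be one of {TRAIN_TYPES}, got {train_type!r}")
    return [sys.executable, "main.py", "--config", config_path, "--train_type", train_type]


def run_all(config_path, train_types=TRAIN_TYPES):
    """Run each mode in turn and stop at the first failure."""
    for train_type in train_types:
        subprocess.run(build_command(config_path, train_type), check=True)
```

For example, `run_all("config_files/sample_config.yaml", ("regular", "shuffle"))` would train and test a model and then run the feature-shuffling ablation with the same configuration.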

Configuration File

This section describes each field in the YAML configuration file used by SynVerse.

`score_name`

The synergy score to predict. Options: 'S_mean_mean', 'synergy_loewe_mean'.

`input_dir`

Base directory where all input files are stored.

`input_files`

Each entry defines a path to a required input file.

| Key | Description |
| --- | --- |
| `synergy_file` | Contains synergy triplets. Required columns: `drug_1_pid`, `drug_2_pid`, `cell_line_name`, and `S_mean_mean` (or `synergy_loewe_mean`). |
| `maccs_file` | MACCS fingerprint file. Columns: `pid`, `MACCS_0`, ..., `MACCS_166`. |
| `mfp_file` | Morgan fingerprints. Columns: `pid`, `Morgan_FP_0`, ..., `Morgan_FP_255`. |
| `ecfp_file` | ECFP_4 fingerprints. Columns: `pid`, `ECFP4_0`, ..., `ECFP4_1023`. |
| `smiles_file` | SMILES strings. Columns: `pid`, `smiles`. |
| `mol_graph_file` | Pickle file with DeepChem-derived molecular graphs: `{pid: graph}`. |
| `target_file` | Drug target binary profile. Columns: `pid` and target names. |
| `genex_file` | Cell line gene expression. Columns: `cell_line_name` and gene names. |
| `lincs` | Landmark genes file for LINCS1000. |
| `vocab_file` | Vocabulary file to convert SMILES strings to tokens. |
| `net_file` | STRING network file (gzipped). |
| `prot_info_file` | STRING protein metadata file (gzipped). |
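The column conventions listed above can be checked before a long run. The following is a sketch under stated assumptions rather than SynVerse's own validation: the helper names are hypothetical, and the tab delimiter in `read_header` is an assumption, since the file format is not specified here.

```python
import csv
import io

# Columns every synergy file needs in addition to the chosen score column.
SYNERGY_REQUIRED = {"drug_1_pid", "drug_2_pid", "cell_line_name"}


def expected_columns(prefix, n_bits, id_col="pid"):
    """Build the header of a fingerprint file, e.g. pid, MACCS_0, ..., MACCS_166."""
    return [id_col] + [f"{prefix}_{i}" for i in range(n_bits)]


def missing_synergy_columns(header, score_name="S_mean_mean"):
    """Return the required synergy columns that do not appear in header."""
    return sorted((SYNERGY_REQUIRED | {score_name}) - set(header))


def read_header(handle, delimiter="\t"):
    """Read the first row of an open text file (the delimiter is an assumption)."""
    return next(csv.reader(handle, delimiter=delimiter))


if __name__ == "__main__":
    sample = io.StringIO("drug_1_pid\tdrug_2_pid\tcell_line_name\tS_mean_mean\n")
    print(missing_synergy_columns(read_header(sample)))   # prints []
    print(len(expected_columns("MACCS", 167)))            # prints 168
```

The bit counts follow from the index ranges in the table: 167 for MACCS, 256 for Morgan, and 1024 for ECFP_4.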

`drug_features`

Describes drug-level features.

The features listed here determine which subset of triplets from synergy_file is used for training, validation, and testing.

Example 1: If drug_features contains only SMILES-based features (MACCS, MFP, ECFP_4, mol_graph, smiles), the final synergy dataset contains the triplets in which both drugs have SMILES available.

Example 2: If drug_features contains both SMILES-based features and target, the final synergy dataset contains the triplets in which both drugs have SMILES and target data available.
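The filtering rule in the two examples amounts to intersecting the sets of drugs covered by each selected feature. The sketch below illustrates that idea with plain Python sets. The function name, the tuple layout `(drug_1, drug_2, cell_line)`, and the toy data are assumptions for illustration, not SynVerse's internal code.

```python
def filter_triplets(triplets, feature_to_drugs):
    """Keep triplets whose two drugs are covered by every selected feature.

    triplets: iterable of (drug_1, drug_2, cell_line) tuples.
    feature_to_drugs: dict mapping a feature name to the drugs that have it.
    """
    if not feature_to_drugs:
        return list(triplets)
    # A drug qualifies only if every selected feature is available for it.
    covered = set.intersection(*(set(d) for d in feature_to_drugs.values()))
    return [t for t in triplets if t[0] in covered and t[1] in covered]


if __name__ == "__main__":
    triplets = [("A", "B", "c1"), ("A", "C", "c1"), ("B", "C", "c2")]
    smiles_only = {"smiles": {"A", "B", "C"}}
    smiles_and_target = {"smiles": {"A", "B", "C"}, "target": {"A", "B"}}
    print(len(filter_triplets(triplets, smiles_only)))       # prints 3
    print(filter_triplets(triplets, smiles_and_target))      # prints [('A', 'B', 'c1')]
```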

name: str: Feature name
preprocess: str: Preprocessing method
compress: bool: Use autoencoder to reduce dimensions.
norm: str: Normalization method
encoder: str: Feature-specific encoders
use: list of bool: Whether the feature is used in the model. The Feature Combination Control section describes in detail how use, together with the other feature-control settings, defines the feature combinations used during model training.

`cell_line_features`

Same structure as drug_features, but for cell lines.

`model_info`

`decoder`

name: Model architecture (e.g., 'MLP').
hp_range: Hyperparameter search space for tuning.
hp: Default configuration.

`drug_encoder`

List of encoder configs. Each contains:

name: Encoder type (e.g., 'GCN', 'Transformer').
hp_range: Hyperparameter search space for tuning.
hp: Default configuration.

`autoencoder_dims`

Dimension of hidden layers for the autoencoder.

`batch_size`

Number of samples per batch during training.

`epochs`

Maximum training epochs.

`splits`

type: Splitting strategy to use (Options: random, leave_comb, leave_drug, leave_cell_line).
test_frac: Test set size (fraction of total).
val_frac: Validation set size (fraction of training set).
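Because val_frac is a fraction of the training set rather than of the whole dataset, the two fractions compose instead of adding. The sketch below works through the arithmetic. It assumes the fractions apply to triplet counts, as in a random split, and that sizes are rounded to the nearest integer. Both are assumptions, and the leave_* strategies may partition differently.

```python
def split_sizes(n_total, test_frac, val_frac):
    """Return (n_train, n_val, n_test) for a dataset of n_total triplets."""
    n_test = round(n_total * test_frac)       # test set: fraction of the total
    n_trainval = n_total - n_test             # what remains for training
    n_val = round(n_trainval * val_frac)      # validation: fraction of the remainder
    return n_trainval - n_val, n_val, n_test


if __name__ == "__main__":
    # 1000 triplets, 20% held out for testing, 10% of the rest for validation.
    print(split_sizes(1000, 0.2, 0.1))   # prints (720, 80, 200)
```

With these settings the validation set is 8% of all triplets, not 10%.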

`wandb`

Weights & Biases integration for experiment tracking.

enabled: Enable logging.
entity_name, token, project_name: W&B credentials.
timezone, timezone_format: Time info formatting.

`abundance`

Minimum percentage of triplets required per cell line to be included in training.
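One way to read this threshold is as a filter on cell lines by their share of the dataset. The sketch below assumes the percentage is computed relative to all triplets and that the threshold is inclusive. Both points are assumptions about the intended semantics, and the function name is hypothetical.

```python
from collections import Counter


def abundant_cell_lines(triplets, abundance):
    """Return the cell lines that hold at least `abundance` percent of all triplets.

    triplets: iterable of (drug_1, drug_2, cell_line) tuples.
    abundance: threshold in percent, e.g. 20 for 20%.
    """
    counts = Counter(cell for _, _, cell in triplets)
    total = sum(counts.values())
    return {cell for cell, n in counts.items() if 100.0 * n / total >= abundance}


if __name__ == "__main__":
    toy = [("A", "B", "c1")] * 6 + [("A", "C", "c2")] * 3 + [("B", "C", "c3")]
    print(sorted(abundant_cell_lines(toy, 20)))   # prints ['c1', 'c2']
```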

Feature Combination Control

Defines the number of features in combinations.

| Field | Meaning |
| --- | --- |
| `max_drug_feat` | Maximum number of drug features per model. |
| `min_drug_feat` | Minimum number of drug features per model. |
| `max_cell_feat` | Maximum number of cell line features per model. |
| `min_cell_feat` | Minimum number of cell line features per model. |

SynVerse provides the flexibility to train models with any combination of drug and cell line features using a single configuration file. The feature selection process is governed by a few key parameters described below.

The parameters max_drug_feat and min_drug_feat define the maximum and minimum number of drug features that a model can include. Similarly, max_cell_feat and min_cell_feat control the number of cell line features.

As mentioned before, each feature also has a use parameter that specifies its inclusion rule:

use = [true]: the feature must always be included in all combinations.

use = [false]: the feature must always be excluded.

use = [true, false]: the feature may be included in some combinations and excluded in others.

For example, suppose a configuration file specifies max_drug_feat = 3, min_drug_feat = 2, and use = [false] for MACCS. The generated models will then include combinations of two to three drug features, none of which can be MACCS. Thus, a model using both smiles and mol_graph is valid, while one using only smiles, only mol_graph, or MACCS with mol_graph is not.
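The inclusion rules can be made concrete by enumerating the combinations they allow. The following is a minimal sketch of that enumeration, not SynVerse's implementation. The function name is hypothetical, and the example assumes that smiles and mol_graph are both configured with use = [true, false].

```python
from itertools import combinations


def feature_combinations(use_flags, min_feat, max_feat):
    """Enumerate feature sets allowed by the use flags and the size limits.

    use_flags maps a feature name to its use list:
    [True] = always included, [False] = never included,
    [True, False] = optional.
    """
    required = [name for name, use in use_flags.items() if set(use) == {True}]
    optional = [name for name, use in use_flags.items() if set(use) == {True, False}]
    combos = []
    for size in range(min_feat, max_feat + 1):
        n_extra = size - len(required)   # optional features needed to reach this size
        if n_extra < 0 or n_extra > len(optional):
            continue
        for extra in combinations(optional, n_extra):
            combos.append(tuple(required) + extra)
    return combos


if __name__ == "__main__":
    flags = {"MACCS": [False], "smiles": [True, False], "mol_graph": [True, False]}
    print(feature_combinations(flags, min_feat=2, max_feat=3))
    # prints [('smiles', 'mol_graph')]
```

Only one combination survives here: no three-feature set exists once MACCS is excluded, and single-feature sets fall below min_drug_feat.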

`hp_tune`

Set to true to enable hyperparameter optimization.

`bohb`

Settings for hyperparameter optimization with BOHB (Bayesian Optimization and Hyperband).

min_budget, max_budget: Resource limits per trial.
n_iterations: Number of trials.
run_id: BOHB session ID.
server_type: 'local' or 'cluster'.

`rewire_method`

Network rewiring method to use when --train_type is set to 'rewire'. Options: SM: degree-preserving (Maslov-Sneppen); SA: strength-preserving (simulated annealing).
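For readers unfamiliar with degree-preserving rewiring, the sketch below shows the generic Maslov-Sneppen edge-swap idea on an unweighted, undirected graph. It is a textbook-style illustration, not SynVerse's code: the function names and parameters are hypothetical, edge weights are ignored, and the input is assumed to contain no self-loops.

```python
import random
from collections import Counter


def degree_preserving_rewire(edges, n_swaps, seed=0, max_tries=10000):
    """Swap edge endpoints at random while keeping every node's degree fixed."""
    rng = random.Random(seed)
    edge_list = sorted({tuple(sorted(e)) for e in edges})   # de-duplicate
    if len(edge_list) < 2:
        return edge_list
    edge_set = set(edge_list)
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:          # pick one of the two possible pairings
            c, d = d, c
        if a == d or c == b:            # the swap would create a self-loop
            continue
        new_1 = tuple(sorted((a, d)))
        new_2 = tuple(sorted((c, b)))
        if new_1 in edge_set or new_2 in edge_set:   # would duplicate an edge
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new_1, new_2}
        edge_list[i], edge_list[j] = new_1, new_2
        done += 1
    return edge_list


def degrees(edges):
    """Count how many edges touch each node."""
    return Counter(node for edge in edges for node in edge)


if __name__ == "__main__":
    ring = [(i, (i + 1) % 10) for i in range(10)]
    rewired = degree_preserving_rewire(ring, n_swaps=20, seed=1)
    print(degrees(rewired) == degrees(ring))   # prints True
```

Each accepted swap replaces edges (a, b) and (c, d) with (a, d) and (c, b), so every node loses one edge and gains one. That is why the degree sequence is unchanged.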

`output_settings`

output_dir: Output directory for results.

Docker Setup Guide for Pretrained Models

This guide provides detailed instructions for containerizing KPGT and MolE using Docker with GPU support.

Prerequisites

Hardware Requirements
- NVIDIA GPU with Compute Capability ≥ 3.5
- At least 16 GB RAM
- At least 50 GB free disk space for Docker images and dependencies

Software Requirements
- For KPGT (CUDA 11.3)
  - OS: Linux (Ubuntu 20.04 recommended)
  - NVIDIA drivers: v465+ (CUDA 11.3 compatibility)
  - CUDA Toolkit: 11.3.1
  - Docker: Engine 20.10+ with NVIDIA Container Toolkit configured for CUDA 11.3
- For MolE (CUDA 12.1)
  - OS: Linux (Ubuntu 20.04 recommended)
  - NVIDIA drivers: v525+ (CUDA 12.1 compatibility)
  - CUDA Toolkit: 12.1
  - Docker: Engine 20.10+ with NVIDIA Container Toolkit configured for CUDA 12.1

Build the Docker Image

Dockerfiles are already provided for each project. To build the Docker images, run the provided script. It builds the kpgt:base and mole:base images from their respective Dockerfiles, located in the root directory of each project under the pretrain directory.

Publication

Tasnina, N., Haghani, M. and Murali, T.M., 2025. SynVerse: a modular framework for building and evaluating deep learning-based drug synergy prediction models. Briefings in Bioinformatics, 26(6), p.bbaf676. https://doi.org/10.1093/bib/bbaf676
