GRN Inference

Benchmarking GRN inference methods

Repository: openproblems-bio/task_grn_inference

Description

geneRNIB is a living benchmark platform for GRN inference. It provides curated datasets for GRN inference and evaluation, standardized evaluation protocols and metrics, computational infrastructure, and a dynamically updated leaderboard that tracks state-of-the-art methods. Newly submitted GRNs are evaluated in the cloud, assigned competition scores, and stored for future comparisons, so the leaderboard reflects new developments over time.

The platform supports the integration of new inference methods, datasets and protocols. When a new feature is added, previously evaluated GRNs are re-assessed, and the leaderboard is updated accordingly. The aim is to evaluate both the accuracy and completeness of inferred GRNs. It is designed for both single-modality and multi-omics GRN inference.

In the current version, geneRNIB includes 11 inference methods (both single-omics and multi-omics), 8 evaluation metrics, and 5 datasets (OPSCA, Nakatake, Norman, Adamson, and Replogle).

See our publication for details of the methods.

Installation

You need to have Docker, Java, and Viash installed. Follow these instructions to install the required dependencies.

Download resources

git clone [email protected]:openproblems-bio/task_grn_inference.git

cd task_grn_inference

To interact with the framework, first download the resources containing the necessary inference and evaluation datasets.

scripts/download_resources.sh

Run a GRN inference method

To infer a GRN for a given dataset (e.g. norman) using simple Pearson correlation:

viash run src/control_methods/pearson_corr/config.vsh.yaml -- \
    --rna resources/grn_benchmark/inference_data/norman_rna.h5ad \
    --prediction output/net.h5ad

Evaluate a GRN prediction

Once you have a prediction for a given dataset, use the following command to obtain evaluation scores.

scripts/single_grn_evaluation.sh output/net.h5ad norman

This writes the scores to output/test_run/scores.json.
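
For a quick look at the results, the scores file can be loaded with standard JSON tooling. A minimal sketch, assuming only that the file is valid JSON (its exact structure may vary between runs):

```python
import json

# Load the evaluation scores produced by the single-GRN evaluation script.
with open("output/test_run/scores.json") as f:
    scores = json.load(f)

# Pretty-print whatever metrics the run produced.
print(json.dumps(scores, indent=2))
```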

Add a method

To add a method to the repository, follow the instructions in the scripts/add_a_method.sh script.

Authors & contributors

| Name | Roles |
|------|-------|
| Jalil Nourisa | author |
| Robrecht Cannoodt | author |
| Antoine Passimier | contributor |
| Marco Stock | contributor |
| Christian Arnold | contributor |

API

flowchart TB
  file_atac_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-chromatin-accessibility-data'>chromatin accessibility data</a>")
  comp_method[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-method'>method</a>"/]
  file_prediction_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-grn-prediction'>GRN prediction</a>")
  comp_metric_regression[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-feature-based-metrics'>feature-based metrics</a>"/]
  comp_metric_ws[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-wasserstein-distance-metrics'>Wasserstein distance metrics</a>"/]
  comp_metric[/"<a href='https://github.com/openproblems-bio/task_grn_inference#component-type-metrics'>metrics</a>"/]
  file_score_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-score'>score</a>")
  file_evaluation_bulk_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--pseudo-bulk'>perturbation data (pseudo)bulk</a>")
  file_evaluation_sc_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-perturbation-data--sc-'>perturbation data (sc)</a>")
  file_rna_h5ad("<a href='https://github.com/openproblems-bio/task_grn_inference#file-format-gene-expression-data'>gene expression data</a>")
  file_atac_h5ad-.-comp_method
  comp_method-.->file_prediction_h5ad
  file_prediction_h5ad---comp_metric_regression
  file_prediction_h5ad---comp_metric_ws
  file_prediction_h5ad---comp_metric
  comp_metric_regression-->file_score_h5ad
  comp_metric_ws-->file_score_h5ad
  comp_metric-->file_score_h5ad
  file_evaluation_bulk_h5ad---comp_metric_regression
  file_evaluation_sc_h5ad---comp_metric_ws
  file_rna_h5ad---comp_method

File format: chromatin accessibility data

Chromatin accessibility data

Example file: resources_test/grn_benchmark/inference_data/op_atac.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'

Data structure:

| Slot | Type | Description |
|------|------|-------------|
| obs["cell_type"] | string | (Optional) The annotated cell type of each cell based on RNA expression. |
| obs["donor_id"] | string | (Optional) Donor id. |

Component type: method

A GRN inference method

Arguments:

| Name | Type | Description |
|------|------|-------------|
| --rna | file | RNA expression data. |
| --atac | file | (Optional) Chromatin accessibility data. |
| --prediction | file | (Optional, Output) File indicating the inferred GRN. |
| --tf_all | file | (Optional) NA. Default: resources_test/grn_benchmark/prior/tf_all.csv. |
| --max_n_links | integer | (Optional) NA. Default: 50000. |
| --num_workers | integer | (Optional) NA. Default: 4. |
| --temp_dir | string | (Optional) NA. Default: output/temdir. |
| --seed | integer | (Optional) NA. Default: 32. |
| --dataset_id | string | (Optional) NA. Default: op. |
| --method_id | string | (Optional) NA. Default: grnboost2. |
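
To make the argument list above concrete, here is a purely illustrative Python sketch of what a method component does with these inputs: it reads the RNA AnnData, restricts regulators to the TF list, builds a naive correlation network, and writes a prediction file with the slots described under "File format: GRN prediction" below. It is not the implementation of any benchmarked method; viash argument parsing is omitted, paths are hard-coded, and the TF list is assumed to be a headerless one-column CSV.

```python
import os

import anndata as ad
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the component arguments listed above.
rna_path = "resources_test/grn_benchmark/inference_data/op_rna.h5ad"
tf_all_path = "resources_test/grn_benchmark/prior/tf_all.csv"
prediction_path = "output/net.h5ad"
max_n_links = 50000

rna = ad.read_h5ad(rna_path)
X = rna.layers["X_norm"]                       # normalized expression layer
X = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
genes = np.asarray(rna.var_names)

# Assumption: tf_all.csv is a one-column, headerless list of TF names.
tfs = pd.read_csv(tf_all_path, header=None)[0].astype(str)
tf_idx = np.flatnonzero(pd.Index(genes).isin(tfs))

# Naive GRN: Pearson correlation between each TF and every other gene.
corr = np.corrcoef(X, rowvar=False)            # genes x genes
links = [
    (genes[i], genes[j], corr[i, j])
    for i in tf_idx
    for j in range(len(genes))
    if i != j
]
net = pd.DataFrame(links, columns=["source", "target", "weight"])
net = net.reindex(net["weight"].abs().sort_values(ascending=False).index)
net = net.head(max_n_links).reset_index(drop=True)

# Write the prediction in the documented format (uns: dataset_id, method_id, prediction).
os.makedirs(os.path.dirname(prediction_path), exist_ok=True)
out = ad.AnnData(uns={"dataset_id": "op", "method_id": "naive_corr_example", "prediction": net})
out.write_h5ad(prediction_path)
```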

File format: GRN prediction

File indicating the inferred GRN.

Example file: resources_test/grn_models/op/collectri.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'prediction'

Data structure:

| Slot | Type | Description |
|------|------|-------------|
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["method_id"] | string | A unique identifier for the inference method. |
| uns["prediction"] | object | Inferred GRNs in the format of source, target, weight. |

Component type: feature-based metrics

A regression metric to evaluate the performance of the inferred GRN

Arguments:

| Name | Type | Description |
|------|------|-------------|
| --prediction | file | File indicating the inferred GRN. |
| --score | file | (Output) File indicating the score of a metric. |
| --method_id | string | (Optional) NA. |
| --layer | string | (Optional) NA. Default: X_norm. |
| --max_n_links | integer | (Optional) NA. Default: 50000. |
| --verbose | integer | (Optional) NA. Default: 2. |
| --dataset_id | string | (Optional) NA. Default: op. |
| --num_workers | integer | (Optional) NA. Default: 4. |
| --apply_tf | boolean | (Optional) NA. Default: TRUE. |
| --apply_skeleton | boolean | (Optional) NA. Default: FALSE. |
| --evaluation_data | file | Perturbation dataset for benchmarking. |
| --tf_all | file | NA. |
| --reg_type | string | (Optional) NA. Default: ridge. |
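
To illustrate the input/output contract shared by the metric components, here is a purely illustrative sketch: it reads a prediction and a (pseudo)bulk evaluation dataset and writes a score file in the format described under "File format: score" below. The scoring logic is a trivial placeholder, not the regression metric used by the benchmark, and the paths are hypothetical.

```python
import anndata as ad
import numpy as np

# Hypothetical stand-ins for the component arguments listed above.
prediction_path = "output/net.h5ad"
evaluation_data_path = "resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad"
score_path = "output/score.h5ad"

pred = ad.read_h5ad(prediction_path)
evaluation = ad.read_h5ad(evaluation_data_path)

# Placeholder score: fraction of predicted target genes that are measured
# in the evaluation dataset (not a real benchmark metric).
net = pred.uns["prediction"]
coverage = float(np.mean(np.isin(net["target"].unique(), evaluation.var_names)))

# Write the score in the documented format (uns: dataset_id, method_id, metric_ids, metric_values).
score = ad.AnnData(uns={
    "dataset_id": pred.uns["dataset_id"],
    "method_id": pred.uns["method_id"],
    "metric_ids": ["target_coverage"],
    "metric_values": [coverage],
})
score.write_h5ad(score_path)
```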

Component type: Wasserstein distance metrics

A Wasserstein distance based metric to evaluate the performance of the inferred GRN

Arguments:

| Name | Type | Description |
|------|------|-------------|
| --prediction | file | File indicating the inferred GRN. |
| --score | file | (Output) File indicating the score of a metric. |
| --method_id | string | (Optional) NA. |
| --layer | string | (Optional) NA. Default: X_norm. |
| --max_n_links | integer | (Optional) NA. Default: 50000. |
| --verbose | integer | (Optional) NA. Default: 2. |
| --dataset_id | string | (Optional) NA. Default: op. |
| --num_workers | integer | (Optional) NA. Default: 4. |
| --apply_tf | boolean | (Optional) NA. Default: TRUE. |
| --apply_skeleton | boolean | (Optional) NA. Default: FALSE. |
| --evaluation_data_sc | file | Perturbation dataset for benchmarking (single cell). |

Component type: metrics

A metric to evaluate the performance of the inferred GRN

Arguments:

| Name | Type | Description |
|------|------|-------------|
| --prediction | file | File indicating the inferred GRN. |
| --score | file | (Output) File indicating the score of a metric. |
| --method_id | string | (Optional) NA. |
| --layer | string | (Optional) NA. Default: X_norm. |
| --max_n_links | integer | (Optional) NA. Default: 50000. |
| --verbose | integer | (Optional) NA. Default: 2. |
| --dataset_id | string | (Optional) NA. Default: op. |
| --num_workers | integer | (Optional) NA. Default: 4. |
| --apply_tf | boolean | (Optional) NA. Default: TRUE. |
| --apply_skeleton | boolean | (Optional) NA. Default: FALSE. |

File format: score

File indicating the score of a metric.

Example file: resources_test/scores/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Data structure:

| Slot | Type | Description |
|------|------|-------------|
| uns["dataset_id"] | string | A unique identifier for the dataset. |
| uns["method_id"] | string | A unique identifier for the method. |
| uns["metric_ids"] | string | One or more unique metric identifiers. |
| uns["metric_values"] | double | The metric values obtained for the given prediction. Must be of the same length as 'metric_ids'. |

File format: perturbation data (pseudo)bulk

Perturbation dataset for benchmarking

Example file: resources_test/grn_benchmark/evaluation_data/op_bulk.h5ad

Format:

AnnData object
 obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
 layers: 'X_norm'

Data structure:

| Slot | Type | Description |
|------|------|-------------|
| obs["cell_type"] | string | The annotated cell type of each cell based on RNA expression. |
| obs["perturbation"] | string | Name of the column containing perturbation names. |
| obs["donor_id"] | string | (Optional) Donor id. |
| obs["perturbation_type"] | string | (Optional) Name of the column indicating perturbation type. |
| layers["X_norm"] | double | Normalized values. |

File format: perturbation data (sc)

Perturbation dataset for benchmarking (single cell).

Example file: resources_test/grn_benchmark/evaluation_data/norman_sc.h5ad

Format:

AnnData object
 obs: 'cell_type', 'perturbation', 'donor_id', 'perturbation_type'
 layers: 'X_norm'

Data structure:

| Slot | Type | Description |
|------|------|-------------|
| obs["cell_type"] | string | The annotated cell type of each cell based on RNA expression. |
| obs["perturbation"] | string | Name of the column containing perturbation names. |
| obs["donor_id"] | string | (Optional) Donor id. |
| obs["perturbation_type"] | string | (Optional) Name of the column indicating perturbation type. |
| layers["X_norm"] | double | Normalized values. |

File format: gene expression data

RNA expression data.

Example file: resources_test/grn_benchmark/inference_data/op_rna.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'
 layers: 'counts', 'X_norm'

Data structure:

| Slot | Type | Description |
|------|------|-------------|
| obs["cell_type"] | string | (Optional) The annotated cell type of each cell based on RNA expression. |
| obs["donor_id"] | string | (Optional) Donor id. |
| layers["counts"] | double | (Optional) Counts matrix. |
| layers["X_norm"] | double | Normalized values. |