This repository contains raw analysis scripts and assets for annotation, Venn, heatmap, STRING, and machine learning (ML) workflows.

Prokash21/omicML_raw

omicML_GUI_Development

This repository contains raw analysis code and assets used for annotation, Venn diagrams, heatmaps, STRING network analyses, and machine learning (ML).

Repository overview

Top-level folders and their purpose:

  • annotation_volcano/ - R scripts and CSV outputs for annotation and volcano-plot workflows (biomaRt queries, annotation of differential expression results, filtering of up- and down-regulated genes).
  • heatmap/ - R scripts to extract log fold change (LFC) matrices and plot heatmaps.
  • ML_data_preparation/ - helper code that prepares features for the ML models.
  • ML_pipeline/ - scripts and notebooks used to train and evaluate models.
  • final_ML_framework/ - trained model artifacts and notebooks. Contains final_model.joblib.
  • venn/ - scripts and files to generate Venn diagrams for gene overlaps.
  • STRING/ and string_db/ - code and exported results for STRING network analysis.
  • other_virus/ - analyses for other viral datasets (if present).

There are also a number of CSV files (annotated results, filtered up-/down-regulated lists, etc.) produced by the workflows.
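The filtered up-/down-regulated lists are produced by the R scripts, and their exact thresholds and column names are not documented here. Purely as an illustration of the idea, the Python sketch below splits a differential-expression table into up- and down-regulated genes. The column names `gene`, `log2FoldChange`, and `padj`, the cutoffs, and the example rows are all assumptions (DESeq2-style conventions), not values taken from this repository.

```python
import csv
import io


def split_regulated(rows, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Split differential-expression rows into (up, down) gene lists.

    rows: iterable of dicts with the hypothetical keys
    'gene', 'log2FoldChange', and 'padj'.
    """
    up, down = [], []
    for row in rows:
        try:
            lfc = float(row["log2FoldChange"])
            padj = float(row["padj"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values such as "NA"
        if not (padj < padj_cutoff):
            continue  # not significant (this also drops NaN p-values)
        if lfc >= lfc_cutoff:
            up.append(row["gene"])
        elif lfc <= -lfc_cutoff:
            down.append(row["gene"])
    return up, down


# Made-up example table; a real run would read one of the annotated CSV files.
EXAMPLE_CSV = """gene,log2FoldChange,padj
GENE_A,2.5,0.001
GENE_B,-1.8,0.01
GENE_C,0.3,0.0001
GENE_D,3.0,0.2
GENE_E,NA,NA
"""

up_genes, down_genes = split_regulated(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
print("up:", up_genes)
print("down:", down_genes)
```

To apply this to a real file, pass `csv.DictReader(open(path, newline=""))` instead of the in-memory example and adjust the column names to match the CSV header.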

Prerequisites

  • R (recommended >= 4.0) with common packages used in the repository. Typical packages used across the R scripts include:

    • biomaRt, dplyr, readr, tidyr, ggplot2, EnhancedVolcano or custom volcano plotting code
    • pheatmap or ComplexHeatmap for heatmaps
    • VennDiagram for Venn diagrams
    • STRINGdb (if using STRING API within R)
  • Python (recommended 3.8+) for the ML code. Typical Python packages:

    • numpy, pandas, scikit-learn, joblib (for loading/saving models)
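Before running the ML code, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and note that scikit-learn is imported under the name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the Python packages listed above.
# scikit-learn is installed as "scikit-learn" but imported as "sklearn".
REQUIRED_MODULES = ["numpy", "pandas", "sklearn", "joblib"]


def missing_modules(names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```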

Environments

This repository includes basic environment files to support a reproducible setup:

  • requirements.txt - minimal Python requirements for the ML code.
  • environment.yml - Conda environment that installs Python (and optional R packages) and delegates Python dependencies to pip.
  • renv.lock - a minimal starter lock-like file listing R packages referenced by the R scripts.

Suggested steps (PowerShell):

# Using Conda (preferred if you use conda/miniconda/anaconda)
conda env create -f environment.yml
conda activate plagl1-analysis

# Or create a Python venv and install via pip
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

R users:

Open an R session in the repository root and run:

# install renv if you don't have it
install.packages('renv')
# restore packages declared in renv.lock (note: the included renv.lock is a minimal starter)
renv::restore()

For strict reproducibility, create full environment snapshots. For Python, pin versions in requirements.txt with pip freeze or export the Conda environment with conda env export. For R, run renv::snapshot() to write a complete renv.lock.
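As a lighter alternative to a full `pip freeze`, the sketch below pins only the packages the ML code is said to use, reading the installed versions through the standard-library `importlib.metadata` module (Python 3.8+). The helper name `pinned_lines` and the choice to emit a comment for missing packages are assumptions made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names (as used by pip), taken from the prerequisites list.
PACKAGES = ["numpy", "pandas", "scikit-learn", "joblib"]


def pinned_lines(packages):
    """Return 'name==version' lines for installed packages.

    Packages that are not installed yield a comment line instead,
    so the gap is visible in the output.
    """
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={version(name)}")
        except PackageNotFoundError:
            lines.append(f"# {name} is not installed in this environment")
    return lines


if __name__ == "__main__":
    # Review the output before copying it into requirements.txt.
    print("\n".join(pinned_lines(PACKAGES)))
```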

Quick start

Run R scripts from a terminal (PowerShell shown below). Adjust paths as needed.

PowerShell examples:

# Run an annotation step (example)
Rscript annotation_volcano/1_Biomart_init_m.R

# Produce extracted LFC or heatmap input
Rscript heatmap/1_Extract_LFC.R
Rscript heatmap/2_plot_heatmap.R
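The script names above carry numeric prefixes, which suggests a run order. As a rough sketch, the Python helper below sorts the numbered R scripts in a folder by that prefix and runs each with `Rscript` from the repository root. It assumes that `Rscript` is on the PATH and that every numbered script in a folder is meant to run unattended in sequence, which the repository does not state; the function names are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def numeric_prefix(path):
    """Return the leading integer of a name like '2_plot_heatmap.R', or None."""
    match = re.match(r"(\d+)_", Path(path).name)
    return int(match.group(1)) if match else None


def ordered_scripts(paths):
    """Keep numbered .R scripts and sort them numerically (so 10 follows 9)."""
    numbered = [
        Path(p)
        for p in paths
        if Path(p).suffix.lower() == ".r" and numeric_prefix(p) is not None
    ]
    return sorted(numbered, key=lambda p: (numeric_prefix(p), p.name))


def run_folder(folder, repo_root="."):
    """Run each numbered script in `folder` from the repository root.

    check=True stops at the first script that exits with an error.
    """
    root = Path(repo_root).resolve()
    for script in ordered_scripts((root / folder).glob("*.R")):
        command = ["Rscript", str(script.relative_to(root))]
        print("Running:", " ".join(command))
        subprocess.run(command, cwd=root, check=True)


if __name__ == "__main__":
    # Dry run: show the order without executing anything.
    for script in ordered_scripts(Path("heatmap").glob("*.R")):
        print(script)
```

Calling `run_folder("heatmap")` from the repository root would then execute the scripts in order; the dry run in the `__main__` block only lists them.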

For the ML model (Python): open a Python REPL or script and load the saved joblib model. Example (Python):

import joblib
model = joblib.load('final_ML_framework/final_model.joblib')
# Then call model.predict(X) on a prepared feature matrix X (pandas DataFrame or NumPy array)
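A common failure when reusing a saved model is passing a feature matrix with the wrong number of columns. The sketch below wraps the loading step with a simple shape check. It assumes the saved object is a scikit-learn estimator, which exposes `n_features_in_` after fitting; the check is skipped if that attribute is absent. The helper names are hypothetical, and the required feature order is not documented here, so matching column counts does not guarantee correct input. Because `joblib.load` unpickles the file, only load model files from a trusted source.

```python
from pathlib import Path

MODEL_PATH = Path("final_ML_framework/final_model.joblib")


def feature_count(X):
    """Number of columns in X (DataFrame, NumPy array, or list of rows)."""
    shape = getattr(X, "shape", None)
    if shape is not None:
        if len(shape) != 2:
            raise ValueError("X must be two-dimensional (samples x features)")
        return shape[1]
    return len(X[0]) if len(X) else 0


def check_features(model, X):
    """Raise ValueError if X does not match the model's expected feature count."""
    expected = getattr(model, "n_features_in_", None)  # assumption: scikit-learn estimator
    actual = feature_count(X)
    if expected is not None and expected != actual:
        raise ValueError(f"model expects {expected} features, got {actual}")
    return actual


def load_and_predict(X, model_path=MODEL_PATH):
    """Load the saved model, validate X, and return predictions."""
    import joblib  # imported here so the helpers above work without joblib installed

    model = joblib.load(model_path)
    check_features(model, X)
    return model.predict(X)
```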

Typical workflows

  • Annotation & volcano: run the scripts in annotation_volcano/ in order (1 -> 7) to annotate results and generate CSVs used downstream.
  • Heatmap: run the heatmap/ scripts to extract LFC matrices and produce publication-ready heatmaps.
  • Venn: use the scripts in venn/ to create overlap diagrams between gene lists produced by other steps.
  • STRING: prepare gene lists and run the code under STRING/ or string_db/ to query or visualize protein interaction networks.
  • ML: prepare features with the code in ML_data_preparation/, then train and evaluate models in ML_pipeline/. The trained artifact is in final_ML_framework/.
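For the Venn workflow, it may help to see what the overlap between gene lists amounts to numerically. The repository's own Venn scripts are written in R; the Python sketch below only illustrates the underlying set arithmetic, assigning each gene to the exact combination of lists that contain it. The list names and gene identifiers are made up.

```python
def venn_regions(gene_sets):
    """Map each combination of set names to the genes found in exactly those sets.

    gene_sets: dict of {set name: iterable of gene identifiers}.
    Returns a dict keyed by sorted tuples of set names.
    """
    membership = {}
    for name, genes in gene_sets.items():
        for gene in genes:
            membership.setdefault(gene, set()).add(name)

    regions = {}
    for gene, names in membership.items():
        regions.setdefault(tuple(sorted(names)), set()).add(gene)
    return regions


# Hypothetical gene lists standing in for the CSVs produced by earlier steps.
example_sets = {
    "A": {"G1", "G2", "G3"},
    "B": {"G2", "G3", "G4"},
    "C": {"G3", "G5"},
}

for combo, genes in sorted(venn_regions(example_sets).items()):
    print(" & ".join(combo), "->", sorted(genes))
```

Each key corresponds to one region of the diagram, so the size of each value is the count that would be drawn in that region.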

Notes and assumptions

  • The repository contains raw analysis scripts rather than a packaged pipeline. Many scripts expect input files to be present in the same folder or in sibling folders (see the CSV files committed alongside the scripts).
  • If a script fails due to a missing package, install it via install.packages('<pkg>') (R) or pip install <pkg> (Python).
  • Scripts may have hard-coded file paths or expect to be run from the repository root. If you get file-not-found errors, try running the script from the repository root or adjust paths.
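Since several scripts expect to be run from the repository root, one way to avoid file-not-found errors is to locate that root programmatically before launching anything. The sketch below walks upward from a starting directory until it finds one containing the marker folders. Using `annotation_volcano` and `heatmap` as markers is an assumption based on the folder list in the overview, and the function name is hypothetical.

```python
from pathlib import Path

# Folder names taken from the repository overview; any stable subset would do.
MARKER_DIRS = ("annotation_volcano", "heatmap")


def find_repo_root(start=None, markers=MARKER_DIRS):
    """Walk upward from `start` and return the first directory that
    contains all marker folders, or None if no such directory exists."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if all((candidate / marker).is_dir() for marker in markers):
            return candidate
    return None


if __name__ == "__main__":
    root = find_repo_root()
    if root is None:
        print("Repository root not found; change into the cloned repository first.")
    else:
        print("Repository root:", root)
```

The returned path can be passed as the working directory when launching a script, or used to build absolute paths to input CSV files.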

License and contact

This repository currently does not include an explicit license. If you intend to reuse or redistribute code, add a license file (LICENSE) or ask the repository owner for guidance.

For questions about the analyses or code, contact the repository owner or open an issue in the repository.
