omicML_GUI_Development

This repository contains raw analysis code and assets for annotation, Venn diagrams, heatmaps, STRING network analyses, and machine learning (ML).

Repository overview

Top-level folders and their purpose:

annotation_volcano/ - R scripts and CSV outputs for annotation and volcano-plot-related workflows (Biomart calls, annotation of differential expression results, filtering up/down regulated genes).
heatmap/ - R scripts to extract log fold change (LFC) matrices and plot heatmaps.
ML_data_preparation/ - helper code for preparing features for ML models.
ML_pipeline/ - scripts and notebooks used to train and evaluate models.
final_ML_framework/ - trained model artifacts and notebooks, including final_model.joblib.
venn/ - scripts and files to generate Venn diagrams for gene overlaps.
STRING/ and string_db/ - code and exported results for STRING network analysis.
other_virus/ - analyses for other viral datasets (if present).

There are also a number of CSV files (annotated results, filtered up-/down-regulated lists, etc.) produced by the workflows.

Prerequisites

R (recommended >= 4.0). Packages used across the R scripts typically include:
- biomaRt, dplyr, readr, tidyr, ggplot2, EnhancedVolcano or custom volcano plotting code
- pheatmap or ComplexHeatmap for heatmaps
- VennDiagram for Venn diagrams
- STRINGdb (if using STRING API within R)
Python (recommended 3.8+) for the ML code. Typical Python packages:
- numpy, pandas, scikit-learn, joblib (for loading/saving models)
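
Before running the ML code, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch using only the standard library; the helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of the repository. Note that the pip name and the import name differ for scikit-learn.

```python
import importlib.util

# Maps pip package name -> import name (assumed list, taken from the
# "Typical Python packages" line above; adjust to match requirements.txt).
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if importlib.util.find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; install with: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```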

Environments

This repository includes basic environment files to support a reproducible setup:

requirements.txt - minimal Python requirements for the ML code.
environment.yml - Conda environment that installs Python (and optional R packages) and delegates Python dependencies to pip.
renv.lock - a minimal starter lockfile listing the R packages referenced by the R scripts.

Suggested steps (PowerShell):

# Using Conda (preferred if you use conda/miniconda/anaconda)
conda env create -f environment.yml
conda activate plagl1-analysis

# Or create a Python venv and install via pip
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt

R users:

Open an R session in the repository root and run:

# install renv if you don't have it
install.packages('renv')
# restore packages declared in renv.lock (note: the included renv.lock is a minimal starter)
renv::restore()

For strict reproducibility, create full environment snapshots. For Python, pin versions in requirements.txt with pip freeze or use conda env export; for R, run renv::snapshot() to generate a complete renv.lock.

Quick start

Run R scripts from a terminal (PowerShell shown below). Adjust paths as needed.

PowerShell examples:

# Run an annotation step (example)
Rscript annotation_volcano/1_Biomart_init_m.R

# Produce extracted LFC or heatmap input
Rscript heatmap/1_Extract_LFC.R
Rscript heatmap/2_plot_heatmap.R

To use the ML model, load the saved joblib artifact from a Python REPL or script. Example (Python):

import joblib
model = joblib.load('final_ML_framework/final_model.joblib')
# then call model.predict(X) on a prepared feature matrix X (pandas DataFrame or NumPy array)
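
A common pitfall when calling model.predict(X) is passing columns in a different order, or with different names, than the model was trained on. The sketch below shows one way to guard against that. It assumes the saved object is a scikit-learn estimator that was fit on a pandas DataFrame, in which case it exposes `feature_names_in_`; that has not been verified for final_model.joblib. The function names and the input CSV are hypothetical.

```python
def align_features(columns, expected):
    """Return the expected column order, or raise if any column is absent."""
    missing = [c for c in expected if c not in columns]
    if missing:
        raise ValueError(f"Missing feature columns: {missing}")
    return list(expected)


def predict_from_csv(model_path, csv_path):
    """Load a joblib model and predict on a CSV of prepared features."""
    # Imported lazily so align_features can be used without these packages.
    import joblib
    import pandas as pd

    model = joblib.load(model_path)
    X = pd.read_csv(csv_path)
    # feature_names_in_ is only set when the estimator was fit on a DataFrame
    # (assumption: scikit-learn >= 1.0); otherwise X is passed through as-is.
    expected = getattr(model, "feature_names_in_", None)
    if expected is not None:
        X = X[align_features(list(X.columns), list(expected))]
    return model.predict(X)
```

Example call, with a hypothetical feature file: `predict_from_csv('final_ML_framework/final_model.joblib', 'features.csv')`.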

Typical workflows

Annotation & volcano: run the scripts in annotation_volcano/ in order (1 -> 7) to annotate results and generate CSVs used downstream.
Heatmap: run the heatmap/ scripts to extract LFC matrices and produce publication-ready heatmaps.
Venn: use the scripts in venn/ to create overlap diagrams between gene lists produced by other steps.
STRING: prepare gene lists and run the code under STRING/ or string_db/ to query or visualize protein interaction networks.
ML: prepare features with the code in ML_data_preparation/, then train and evaluate models in ML_pipeline/. The trained artifact is in final_ML_framework/.
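
Running numbered scripts in order can be automated. The following sketch assumes the scripts follow a `<number>_<name>.R` naming pattern, as in 1_Biomart_init_m.R and 2_plot_heatmap.R; whether every script in a folder follows it has not been checked. The helper names are illustrative, and `dry_run=True` only prints the commands so they can be reviewed first.

```python
import re
import subprocess
from pathlib import Path


def numeric_prefix(name):
    """Return the leading integer of names like '2_plot_heatmap.R', else None."""
    match = re.match(r"(\d+)_", name)
    return int(match.group(1)) if match else None


def ordered_scripts(names):
    """Keep numbered .R scripts and sort them numerically (so 10 follows 9)."""
    numbered = [(numeric_prefix(n), n) for n in names if n.endswith(".R")]
    return [n for _, n in sorted((k, n) for k, n in numbered if k is not None)]


def run_folder(folder, dry_run=True):
    """Print (and optionally run) Rscript commands for a folder, in order."""
    folder = Path(folder)
    for name in ordered_scripts(p.name for p in folder.iterdir()):
        cmd = ["Rscript", str(folder / name)]
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing script.
            subprocess.run(cmd, check=True)
```

For example, `run_folder("annotation_volcano")` lists the commands from the repository root, and `run_folder("annotation_volcano", dry_run=False)` executes them.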

Notes and assumptions

The repository contains raw analysis scripts rather than a packaged pipeline. Many scripts expect input files to be present in the same folder or in sibling folders (see the CSV files committed alongside the scripts).
If a script fails due to a missing package, install it via install.packages('<pkg>') (R) or pip install <pkg> (Python).
Scripts may have hard-coded file paths or expect to be run from the repository root. If you get file-not-found errors, try running the script from the repository root or adjust paths.
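
For the Python code, one way to avoid dependence on the current working directory is to locate the repository root and build paths from it. This is a minimal sketch, not something the repository provides: `find_repo_root` is a hypothetical helper, and it assumes the top-level README.md is a usable marker (a README.md in a subfolder would be found first).

```python
from pathlib import Path


def find_repo_root(start, marker="README.md"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    current = Path(start).resolve()
    for candidate in [current, *current.parents]:
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"No {marker} found at or above {start}")
```

A script could then refer to the model as `find_repo_root(__file__) / "final_ML_framework" / "final_model.joblib"` regardless of where it is launched from.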

License and contact

This repository currently does not include an explicit license. If you intend to reuse or redistribute code, add a license file (LICENSE) or ask the repository owner for guidance.

For questions about the analyses or code, contact the repository owner or open an issue in the repository.

Prokash21/omicML_raw
