This repository contains raw analysis code and assets used for annotation, Venn diagrams, heatmaps, STRING network analyses, and machine learning (ML).
Top-level folders and their purpose:
annotation_volcano/
- R scripts and CSV outputs for annotation and volcano-plot-related workflows (Biomart calls, annotation of differential expression results, filtering up/down regulated genes).
heatmap/
- R scripts to extract log fold change (LFC) matrices and plot heatmaps.
ML_data_preparation/
- (data prep) helper code for preparing features for ML models.
ML_pipeline/
- (pipeline) scripts and notebooks used to train and evaluate models.
final_ML_framework/
- trained model artifacts and notebooks. Contains final_model.joblib.
venn/
- scripts and files to generate Venn diagrams for gene overlaps.
STRING/ and string_db/
- code and exported results for STRING network analysis.
other_virus/
- analyses for other viral datasets (if present).
There are also a number of CSV files (annotated results, filtered up-/down-regulated lists, etc.) produced by the workflows.
- R (recommended >= 4.0) with common packages used in the repository. Typical packages used across the R scripts include:
  - biomaRt, dplyr, readr, tidyr, ggplot2
  - EnhancedVolcano or custom volcano plotting code
  - pheatmap or ComplexHeatmap for heatmaps
  - VennDiagram for Venn plots
  - STRINGdb (if using the STRING API within R)
- Python (recommended 3.8+) for the ML code. Typical Python packages:
  - numpy, pandas, scikit-learn
  - joblib (for loading/saving models)
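Before running the ML code, it can help to confirm that the Python packages listed above are importable. The following is a minimal sketch using only the standard library; the function name missing_packages and the REQUIRED mapping are illustrative and not part of the repository.

```python
from importlib.util import find_spec

# Mapping of pip package name -> import name (assumed from the list above).
# Note that scikit-learn is imported as "sklearn".
REQUIRED = {
    "numpy": "numpy",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, import_name in required.items()
        if find_spec(import_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install with: pip install " + " ".join(missing))
    else:
        print("All listed Python packages are importable.")
```

This only checks that each package can be located; it does not verify versions.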
This repository includes basic environment files to help with reproducible setup:
- requirements.txt: minimal Python requirements for the ML code.
- environment.yml: Conda environment that installs Python (and optional R packages) and delegates Python dependencies to pip.
- renv.lock: a minimal starter lock-like file listing R packages referenced by the R scripts.
Suggested steps (PowerShell):
# Using Conda (preferred if you use conda/miniconda/anaconda)
conda env create -f environment.yml
conda activate plagl1-analysis
# Or create a Python venv and install via pip
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt
R users:
Open an R session in the repository root and run:
# install renv if you don't have it
install.packages('renv')
# restore packages declared in renv.lock (note: the included renv.lock is a minimal starter)
renv::restore()
If you prefer strict reproducibility, create full environment snapshots: for Python, pin versions into requirements.txt with pip freeze or use conda env export; for R, run renv::snapshot() to create a canonical renv.lock.
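To see which entries in requirements.txt are not yet pinned to an exact version, a small check like the one below may be useful. It is a sketch under the assumption that the file uses plain "name==version" lines; the helper name unpinned_requirements is hypothetical, and the pattern does not cover every requirement syntax that pip accepts (URLs and environment markers, for example).

```python
import re
from pathlib import Path

# Matches a requirement pinned to an exact version, e.g. "pandas==1.5.3".
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\]]+\s*==\s*[^\s;]+")


def unpinned_requirements(text):
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options such as "-r"
            continue
        if not PINNED.match(line):
            unpinned.append(line)
    return unpinned


if __name__ == "__main__":
    req = Path("requirements.txt")
    if req.exists():
        for line in unpinned_requirements(req.read_text()):
            print("not pinned:", line)
```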
Run R scripts from a terminal (PowerShell shown below). Adjust paths as needed.
PowerShell examples:
# Run an annotation step (example)
Rscript annotation_volcano/1_Biomart_init_m.R
# Produce extracted LFC or heatmap input
Rscript heatmap/1_Extract_LFC.R
Rscript heatmap/2_plot_heatmap.R
For the ML model (Python): open a Python REPL or script and load the saved joblib model. Example (Python):
import joblib
model = joblib.load('final_ML_framework/final_model.joblib')
# then call model.predict(X) on prepared feature matrix X (pandas DataFrame / numpy array)
- Annotation & volcano: run the scripts in annotation_volcano/ in order (1 -> 7) to annotate results and generate CSVs used downstream.
- Heatmap: run the heatmap/ scripts to extract LFC matrices and produce publication-ready heatmaps.
- Venn: use the scripts in venn/ to create overlap diagrams between gene lists produced by other steps.
- STRING: prepare gene lists and run the code under STRING/ or string_db/ to query or visualize protein interaction networks.
- ML: data preparation followed by training and evaluation in ML_pipeline/. The trained artifact is in final_ML_framework/.
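Running numbered scripts in order can be sketched as follows. This assumes the scripts follow the "<number>_<name>.R" naming seen in the examples (such as 1_Biomart_init_m.R); the helper names ordered_r_scripts and rscript_commands are hypothetical. Sorting is numeric, so a script prefixed 10 would run after 2 rather than before it.

```python
import re
from pathlib import Path


def ordered_r_scripts(folder):
    """Return .R files in `folder` sorted by their leading integer prefix."""
    numbered = []
    for path in Path(folder).glob("*.R"):
        match = re.match(r"(\d+)_", path.name)
        if match:  # ignore scripts without a numeric prefix
            numbered.append((int(match.group(1)), path.name, path))
    return [path for _, _, path in sorted(numbered)]


def rscript_commands(folder):
    """Build one Rscript argument list per script, in run order."""
    return [["Rscript", path.as_posix()] for path in ordered_r_scripts(folder)]


if __name__ == "__main__":
    for cmd in rscript_commands("annotation_volcano"):
        # Printing only; pass cmd to subprocess.run(cmd, check=True) to execute.
        print(" ".join(cmd))
```

The main block only prints the commands, which makes it easy to review the order before executing anything.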
- The repository contains raw analysis scripts rather than a packaged pipeline. Many scripts expect input files to be present in the same folder or in sibling folders (see the CSV files committed alongside the scripts).
- If a script fails due to a missing package, install it via install.packages('<pkg>') (R) or pip install <pkg> (Python).
- Scripts may have hard-coded file paths or expect to be run from the repository root. If you get file-not-found errors, try running the script from the repository root or adjust paths.
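For Python code, one way to avoid depending on the current working directory is to resolve paths against the repository root. The sketch below locates the root by walking upward until it finds a known top-level folder; using annotation_volcano as the marker is an assumption, and find_repo_root and repo_path are illustrative names rather than existing helpers in this repository.

```python
from pathlib import Path


def find_repo_root(start, marker="annotation_volcano"):
    """Walk upward from `start` until a directory containing `marker` is found."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(f"No parent of {start} contains {marker!r}")


def repo_path(relative, start="."):
    """Resolve `relative` against the repository root, not the current directory."""
    return find_repo_root(start) / relative


if __name__ == "__main__":
    try:
        print(repo_path("final_ML_framework/final_model.joblib"))
    except FileNotFoundError as err:
        print(err)
```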
This repository currently does not include an explicit license. If you intend to reuse or redistribute code, add a license file (LICENSE) or ask the repository owner for guidance.
For questions about the analyses or code, contact the repository owner or open an issue in the repository.