Skip to content

cyanea-bio/labs

Repository files navigation

Cyanea Labs

Modular bioinformatics in Rust.
Sequences, alignments, structures, stats — native, WASM, Python.

Rust License

DocsArchitectureBuild GuideBindingsUsage GuideIssuesDiscussions


Cyanea Labs is a Cargo workspace of 13 crates covering the core primitives of computational biology — sequence analysis, alignment, genomic intervals, statistics, machine learning, cheminformatics, structural biology, and phylogenetics. Everything compiles to native, WebAssembly, and Python (via PyO3), with an Elixir NIF bridge for the Cyanea platform.

3,000+ tests. Zero unsafe. No heavyweight C/C++ dependencies in the core path.

Quick Start

# Check everything compiles
cargo check --workspace

# Run all tests (exclude cyanea-py unless Python env is configured)
cargo test --workspace --exclude cyanea-py

# Test a single crate
cargo test -p cyanea-seq

# Run benchmarks
cargo bench -p cyanea-align

From Rust

use cyanea_seq::DnaSequence;
use cyanea_align::needleman_wunsch;
use cyanea_stats::describe;

// Sequence analysis
let seq = DnaSequence::new("ATCGATCG").unwrap();
assert_eq!(seq.gc_content(), 0.5);

// Global alignment
let result = needleman_wunsch(b"ACGT", b"ACGTT", 1, -1, -2).unwrap();
println!("{}", result.cigar); // 4=1I

// Descriptive statistics
let stats = describe(&[1.0, 2.0, 3.0, 4.0, 5.0]).unwrap();
println!("mean={}, std={:.2}", stats.mean, stats.std_dev);

From Python

import cyanea

# Align two sequences
result = cyanea.smith_waterman("ACGTACGT", "CGTAC", 2, -1, -2)
print(result["score"], result["cigar"])

# Parse a VCF file
variants = cyanea.read_vcf("samples.vcf")

# PCA on a distance matrix
coords = cyanea.pca([[0,1,2],[1,0,3],[2,3,0]], n_components=2)

From JavaScript / TypeScript

import { align, seq, stats } from "@cyanea/bio";

const result = align.smithWaterman("ACGTACGT", "CGTAC", 2, -1, -2);
const gc = seq.gcContent("ATCGATCG");
const desc = stats.describe([1, 2, 3, 4, 5]);

Crates

Crate What it does Tests
cyanea-core Error types, traits, SHA-256, zstd/gzip, mmap, log-space probability, rank/select bitvectors, wavelet matrix, Fenwick tree 58
cyanea-seq DNA/RNA/protein sequences, FASTA/FASTQ, k-mers, 2-bit encoding, suffix array, FM-index, BWT, FMD-index, MinHash, pattern matching (KMP, Boyer-Moore, Myers bit-parallel), PSSM/motif scanning, ORF finder, codon tables, sequence masking, RNA secondary structure, protein properties, read simulation, de Bruijn graphs, assembly QC 474
cyanea-align Needleman-Wunsch, Smith-Waterman, semi-global, banded, MSA, seed-and-extend, minimizers, WFA, POA, LCSk++, pair HMM, profile HMM, X-drop/Z-drop, spliced alignment, CIGAR utilities, substitution matrices (BLOSUM/PAM), SIMD (NEON/SSE4.1/AVX2), GPU dispatch 321
cyanea-omics Genomic coordinates, interval sets/trees, genome arithmetic, expression matrices, sparse matrices, variants, gene annotations, coordinate liftover, AnnData/h5ad/zarr, variant annotation/VEP, CNV/CBS, methylation, spatial transcriptomics, single-cell (HVG, normalize, Leiden/Louvain, diffusion map, DPT, PAGA, markers, Harmony/ComBat/MNN) 434
cyanea-io CSV, VCF, BED, BEDPE, GFF3, GTF, SAM, BAM, CRAM, BCF, Parquet, BLAST, BLAST XML, MAF, GenBank, bigWig, Stockholm, Clustal, Phylip, EMBL, PIR, ABI, bedGraph, GFA, indexed BAM/VCF, BAM ops, VCF ops, variant calling, fetch clients 357
cyanea-stats Descriptive stats, correlation, hypothesis tests (t, chi-squared, Mann-Whitney, Fisher, KS), distributions, PCA, effect sizes, Bayesian conjugate priors, combinatorics, population genetics (Fst, Tajima's D, LD), differential expression, enrichment (GSEA, ORA), ordination (PCoA, NMDS), multivariate tests (PERMANOVA, ANOSIM), survival analysis, ecological diversity 384
cyanea-ml K-means, DBSCAN, hierarchical clustering, pairwise distances, KNN, PCA, t-SNE, UMAP, random forest, GBDT, feature selection, HMM, classification metrics, cross-validation 269
cyanea-chem SMILES/SDF V2000/V3000, SMARTS, molecular fingerprints (Morgan, MACCS), substructure search, stereochemistry, canonical SMILES, 200+ descriptors, drug-likeness (Lipinski, QED, PAINS), scaffolds (Murcko, MCS), 3D conformers (ETKDG), force fields (UFF, MMFF94), Gasteiger charges, chemical reactions (SMIRKS), standardization 200
cyanea-struct PDB/mmCIF parsing, 3D geometry, DSSP secondary structure, Kabsch superposition, contact maps, Ramachandran analysis 76
cyanea-phylo Newick/NEXUS parsing, distance matrices, UPGMA/NJ, Fitch/Sankoff parsimony, ML likelihood (GTR+G), bootstrap, tree search (NNI/SPR/TBR), model selection (AIC/BIC), protein models (LG/WAG/JTT), Bayesian MCMC, species tree (ASTRAL), UniFrac, simulation, consensus, dating, drawing 225
cyanea-gpu Backend trait with CPU, CUDA, Metal, and WebGPU implementations, GPU buffer management, k-mer counting, Smith-Waterman, MinHash, benchmarks 62
cyanea-wasm WebAssembly bindings via wasm-bindgen (seq, io, align, stats, ml, chem, struct_bio, phylo, omics, core) 223
cyanea-py Python bindings via PyO3 (seq, align, stats, ml, chem, struct_bio, phylo, io, omics, sc) with optional NumPy support

Architecture

See docs/ARCHITECTURE.md for the full architecture guide with ASCII diagrams, feature flag details, data flow pipelines, and platform support matrix.

cyanea-core (foundation)
├── cyanea-seq          Sequences, indexing, k-mers
├── cyanea-align        Pairwise & multiple alignment
├── cyanea-omics        Genomic intervals, matrices, variants
├── cyanea-io           File format I/O
├── cyanea-stats        Statistics & distributions
├── cyanea-ml           Machine learning & clustering
├── cyanea-chem         Chemical structure & fingerprints
├── cyanea-struct       Protein structure & geometry
├── cyanea-phylo        Phylogenetics (optional: cyanea-ml)
├── cyanea-gpu          GPU compute backends
├── cyanea-wasm         → WebAssembly (@cyanea/bio on npm)
└── cyanea-py           → Python (PyO3 + maturin)

Every domain crate depends only on cyanea-core. The binding crates (wasm, py) aggregate the domain crates to expose a unified API. The Elixir NIF bridge lives in a separate repo and depends on these crates via path.

Feature Flags

All domain crates default to std. Opt into additional capabilities:

Flag Scope What it enables
parallel align, ml, stats, chem, struct, phylo, io Rayon parallelism
simd align SIMD-accelerated alignment
cuda gpu, align CUDA GPU backend (requires CUDA toolkit)
metal gpu, align Metal GPU backend (macOS)
wgpu gpu WebGPU backend (cross-platform)
blas ml, stats BLAS-backed PCA
wfa align Wavefront alignment
minhash seq MinHash / FracMinHash sketching
sam io SAM format support
bam io BAM format (implies sam)
cram io CRAM format (implies sam)
parquet io Apache Parquet columnar format
h5ad omics HDF5-backed AnnData I/O
zarr omics Zarr v3 I/O
single-cell omics Single-cell analysis pipeline (Leiden, HVG, pseudotime, integration)
serde all Serialization support
wasm wasm wasm-bindgen annotations
numpy py NumPy array interop

Building Bindings

See docs/BUILDING.md for the complete build guide covering all targets, feature combinations, and troubleshooting.

WebAssembly

cargo check -p cyanea-wasm --features wasm
wasm-pack build cyanea-wasm --features wasm

Published as @cyanea/bio on npm. TypeScript types and a Web Worker API are included in cyanea-wasm/ts/.

Python

PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 cargo check -p cyanea-py
cd cyanea-py && PYO3_USE_ABI3_FORWARD_COMPATIBILITY=1 maturin develop --release

Elixir NIF

cargo check -p cyanea-native   # Check only (linking requires BEAM)
cd ../cyanea && mix compile     # Full build via Rustler

Cross-Crate Pipelines

See docs/GUIDE.md for complete cross-crate workflow examples: FASTQ→alignment→variants, single-cell analysis, cheminformatics, phylogenetics, population genetics, and microbiome analysis.

Contributing

# Full CI check
cargo fmt --all -- --check
cargo clippy --workspace --exclude cyanea-py
cargo test --workspace --exclude cyanea-py

Using Claude Code? See CLAUDE.md for project context, architecture, and coding conventions.

License

Apache-2.0

About

Cyanea Labs: Rust bioinformatics ecosystem

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages