Some julia code related to the BPRMeth package and paper.
On some benchmarks, it is hundreds of times faster than the original R package. Multithreading is not supported though because the speed boost is partly based on memory management.
Warning
Only used for study purposes. Expect nothing more. Should be considered as a proof of concept. For example it is not documented, and the API is not stable at all.
julia>] add https://github.com/arnaud-ma/BPRMeth.jlThe package comes with a small example dataset available in the data/ folder, which is a subset of the ENCODE K562 cell line. It contains the zipped methylation data in a bedRrbs format and the annotation in a gtf format.
using BPRMeth
using StableRngs; rng = StableRng(42) # used for reproducibility, optional
using Plots
met_file = "data/K562/seq_rep1.bedRrbs.gz"
anno_file = "data/K562/anno_rep1V3.gtf.gz"
regions = construct_regions(; met_file, anno_file, min_sites = 15) #! take some secondsTo have the same dataset as the R package:
using FileIO, BPRMeth
regions = load("regionsR.jld2", "regionsR")nb_basis = 5
model = bpr_model(nb_basis)
# or model = KGLM(nb_basis; kind=Binomial, link=ProbitLink, kernel = GaussianKernel, ridge_factor = 0.5, μ = LinRange(-1, 1, M + 2)[2:(end - 1)], lengthscale = sqrt(2) / M, intercept = true)
plot(region)
X = LinRange(-1, 1, 1000)
p = plot(X, BPRMeth.predict(model, X))
# fit one region (the one you want)
region = only([r for r in regions if r.region.gene_id == "ENSG00000182087"])
result = fit_mle(model, regions[1])
plot(region)
X = LinRange(-1, 1, 1000)
plot!(p, X, BPRMeth.predict(model, X))
# fit all regions at once
results = fit_mle(model, regions)Clustering results supports the Clustering.jl API.
nb_clusters = 3
cluster_result = fit_mle(nb_clusters, regions, model; rng = copy(rng), verbose = true)
# some visualization
p = plot()
X = LinRange(-1, 1, 1000)
ys = [BPRMeth.predict(model, X, result) for result in cluster_result.models]
p = plot(X, Ys, linewidth = 3)