AverageLoglikelihoodRatio.jl

Compare position frequency matrices (PFMs) using the Average Log-Likelihood Ratio (ALLR) from Wang and Stormo (2003). Computes a similarity score between two PFMs of possibly different widths, with optional p-values via permutation test.

Installation

using Pkg
Pkg.add(url="https://github.com/kchu25/AverageLoglikelihoodRatio.jl")

Usage

Single pair

using AverageLoglikelihoodRatio

# Each PFM is 4 x W (rows = A, C, G, T; columns = positions)
input = [0.90 0.05 0.05;
         0.05 0.85 0.05;
         0.025 0.05 0.85;
         0.025 0.05 0.05]

target = [0.25 0.88 0.06 0.06 0.25;
          0.25 0.04 0.84 0.09 0.25;
          0.25 0.04 0.05 0.78 0.25;
          0.25 0.04 0.05 0.07 0.25]

result = compare_pfms(input, target)
result.score              # best ALLR across all alignments
result.offset             # 0-indexed position of shorter PFM on longer PFM
result.pvalue             # permutation p-value (1000 permutations by default)
result.reverse_complement # true if the best match was on the reverse complement strand

Multiple inputs vs. multiple targets

inputs  = [pfm_a, pfm_b, pfm_c]
targets = [pfm_x, pfm_y]

# Returns a 3 x 2 Matrix{ALLRResult}
results = compare_pfms(inputs, targets)
results[2, 1]  # comparison of pfm_b vs pfm_x

Options

compare_pfms(input, target;
    background     = [0.25, 0.25, 0.25, 0.25],  # nucleotide background frequencies
    n_perm         = 1000,                        # number of permutations for p-value
    compute_pvalue = true,                        # set false to skip permutation test
    reverse_comp   = false,                       # also scan the reverse complement of the target
    rng            = Random.default_rng()         # RNG for reproducibility
)

API

Function	Description
`compare_pfms(input, target; ...)`	Compare two PFMs. Returns `ALLRResult`.
`compare_pfms(inputs, targets; ...)`	Compare all pairs. Returns `Matrix{ALLRResult}`.
`allr_column(c1, c2; background)`	ALLR between two length-4 frequency vectors.
`allr_matrix(pfm1, pfm2; background)`	Mean column-wise ALLR for equal-width PFMs.

How it works

The shorter PFM is slid across every valid offset of the longer PFM. At each offset, the mean ALLR across aligned columns is computed. The offset with the highest score is reported.

When reverse_comp=true, the same scan is repeated on the reverse complement of the target (rows reordered A↔T, C↔G; columns reversed). The strand with the higher score wins and is recorded in result.reverse_complement.

For the p-value, n_perm random PFMs are drawn from a Dirichlet distribution parameterized by the background frequencies. When reverse_comp=true, each random PFM is also scanned on both strands so the null distribution is calibrated consistently. The observed score is compared against this null:

P = (count(S_random >= S_observed) + 1) / (N_permutations + 1)

See latex/allr.tex for the full formulas.

Reference

Wang T, Stormo GD. "Combining phylogenetic data with co-regulated genes to identify regulatory motifs." Bioinformatics, 19(18):2369-2380, 2003. https://academic.oup.com/bioinformatics/article/19/18/2369/194379

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

AverageLoglikelihoodRatio.jl

Installation

Usage

Single pair

Multiple inputs vs. multiple targets

Options

API

How it works

Reference

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

AverageLoglikelihoodRatio.jl

Installation

Usage

Single pair

Multiple inputs vs. multiple targets

Options

API

How it works

Reference