microprediction/precise

precise

License: MIT · Python 3.9+

Online (incremental) covariance and correlation estimation: the streaming complement to sklearn.covariance, whose estimators are batch-only and have no partial_fit. Pure Python + numpy; no other required dependencies. Because no estimator wins everywhere, precise also assesses an estimate and recommends one for your data (see Assess & recommend).

📖 Docs: precise.microprediction.org

pip install precise

Use

from precise import EwaCovariance

est = EwaCovariance(r=0.05)
for y in stream:            # y is a 1d observation; pass a 2d array for a batch
    est.partial_fit(y)

est.covariance_             # (n, n) ndarray
est.correlation_            # unit-diagonal correlation
est.precision_             # inverse covariance
est.location_              # running mean
est.fit(X)                 # sklearn-style batch drop-in (X is 2d)

Every estimator is truly online: each observation costs a fixed amount of work and memory, independent of how many observations came before, with no growing buffers. State is a plain dict, so you can checkpoint mid-stream with get_state() and resume with set_state().
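
To make the "constant work, plain-dict state" idea concrete, here is a minimal numpy sketch of a Welford-style running covariance with checkpointing. The class `RunningCovariance` is hypothetical and written for illustration only; it is not the precise implementation, and the unbiased (n - 1) normalisation is an assumption.

```python
import numpy as np


def _copy_state(state):
    # Copy arrays so a snapshot is not mutated by later updates.
    return {k: (v.copy() if isinstance(v, np.ndarray) else v) for k, v in state.items()}


class RunningCovariance:
    """Toy Welford-style running covariance (hypothetical, illustration only)."""

    def __init__(self):
        self.state = {"n": 0, "mean": None, "m2": None}

    def partial_fit(self, y):
        y = np.asarray(y, dtype=float)
        s = self.state
        if s["n"] == 0:
            s["mean"] = np.zeros_like(y)
            s["m2"] = np.zeros((y.size, y.size))
        s["n"] += 1
        delta = y - s["mean"]                      # deviation from the old mean
        s["mean"] = s["mean"] + delta / s["n"]     # updated mean
        # Accumulate (old-mean deviation) x (new-mean deviation); O(d^2) per step.
        s["m2"] = s["m2"] + np.outer(delta, y - s["mean"])
        return self

    @property
    def covariance_(self):
        # Unbiased normalisation (an assumption); needs at least two observations.
        return self.state["m2"] / (self.state["n"] - 1)

    def get_state(self):
        return _copy_state(self.state)

    def set_state(self, state):
        self.state = _copy_state(state)
        return self


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

est = RunningCovariance()
for y in X[:100]:
    est.partial_fit(y)

snapshot = est.get_state()                         # checkpoint mid-stream
resumed = RunningCovariance().set_state(snapshot)  # restore into a fresh object
for y in X[100:]:
    resumed.partial_fit(y)
```

Resuming from the snapshot gives the same covariance as a single batch pass over all 200 rows, which is the property that makes mid-stream checkpointing safe.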

Estimators

Class | What it does
--- | ---
EmpiricalCovariance | running sample covariance (Welford)
EwaCovariance | exponentially weighted (recency-biased)
AdaptiveEwaCovariance | EWMA whose forgetting rate speeds up on regime change
LedoitWolfCovariance | online Ledoit-Wolf shrinkage towards a scaled identity
OASCovariance | online Oracle Approximating Shrinkage (often better-conditioned than LW)
ShrunkCovariance | fixed-intensity shrinkage to identity or a constant-correlation target
PartialMomentsCovariance | exponentially weighted partial-moment (semi-)covariance
HuberCovariance | online robust estimator that downweights outliers
TylerCovariance | recursive Tyler M-estimator; robust correlation/shape for elliptical data
GeodesicEwaCovariance | recency-weighted update along the affine-invariant SPD geodesic
DCCCovariance | dynamic conditional correlation; decouples volatility from correlation
FactorCovariance | online low-rank + diagonal (approximate factor model); O(d·k) per step

from precise import all_estimators, estimator_from_name
all_estimators()                          # the list of classes (a bake-off in one loop)
estimator_from_name("LedoitWolfCovariance")
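
As an illustration of what "exponentially weighted (recency-biased)" means, the sketch below implements one common form of the exponentially weighted mean/covariance recursion in plain numpy. The function `ewa_update` is hypothetical, and the exact recursion used by EwaCovariance may differ; only the role of the forgetting rate r is taken from the text above.

```python
import numpy as np


def ewa_update(mean, cov, y, r):
    """One exponentially weighted step; r in (0, 1) is the forgetting rate.

    Hypothetical helper, not part of precise.
    """
    y = np.asarray(y, dtype=float)
    delta = y - mean                                   # surprise relative to the old mean
    new_mean = mean + r * delta                        # move the mean a fraction r towards y
    new_cov = (1.0 - r) * (cov + r * np.outer(delta, delta))
    return new_mean, new_cov


# A single hand-checkable step.
step_mean, step_cov = ewa_update(np.zeros(2), np.zeros((2, 2)), [1.0, 0.0], r=0.5)

# A longer stream: constant work per observation, no buffer of past data.
rng = np.random.default_rng(1)
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
stream = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)

mean, cov = np.zeros(2), np.eye(2)
for y in stream:
    mean, cov = ewa_update(mean, cov, y, r=0.01)
```

A larger r forgets faster and tracks regime changes sooner, at the cost of a noisier estimate; starting from a positive definite matrix keeps every iterate symmetric and positive definite.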

Keyed / dynamic universes (river-style)

In streaming/finance settings observations arrive as dicts keyed by name, and the set of names can change over time. keyed(...) decorates any of the estimators above to consume keyed dicts (river-style update / learn_one) and emit keyed output:

from precise import keyed, EwaCovariance

d = keyed(EwaCovariance(r=0.05), dynamic=True)   # changing universe (DynamicUniverse)
d.update({"AAPL": 0.01, "MSFT": -0.02})
d.update({"MSFT": 0.00, "NVDA": 0.03})           # AAPL leaves, NVDA enters
d.covariance_                                     # dict-of-dicts over the live universe
d.to_frame()                                      # pandas DataFrame  (pip install precise[pandas])

k = keyed(EwaCovariance(r=0.05))                  # fixed universe, imputes missing keys (FixedUniverse)

dynamic=False (the default) gives a FixedUniverse (one wrapped estimator, missing keys imputed); dynamic=True gives a DynamicUniverse (one wrapped estimator per live key-set). Both work with any of the positional (array-based) estimators above; the adapter adds no covariance math of its own.
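
The fixed-universe idea can be sketched as a thin translation layer between keyed dicts and positional vectors. The class `ToyFixedUniverse` below is hypothetical and is not the FixedUniverse adapter in precise; in particular, imputing a missing key with its last seen value (0.0 before the first sighting) is an assumption made purely for illustration, since the text above does not say how imputation is done.

```python
import numpy as np


class ToyFixedUniverse:
    """Hypothetical adapter: keyed dicts in, positional vectors out, and back."""

    def __init__(self, keys):
        self.keys = list(keys)                        # fixed key order defines positions
        self.last = {k: 0.0 for k in self.keys}       # assumption: impute with last seen value

    def to_vector(self, obs):
        # Record the keys that arrived; missing keys keep their previous value.
        self.last.update({k: float(v) for k, v in obs.items() if k in self.last})
        return np.array([self.last[k] for k in self.keys])

    def to_keyed(self, matrix):
        # Turn an (n, n) array into a dict-of-dicts over the key order.
        return {
            a: {b: float(matrix[i, j]) for j, b in enumerate(self.keys)}
            for i, a in enumerate(self.keys)
        }


u = ToyFixedUniverse(["AAPL", "MSFT", "NVDA"])
v1 = u.to_vector({"AAPL": 0.01, "MSFT": -0.02})      # NVDA missing, imputed as 0.0
v2 = u.to_vector({"MSFT": 0.00, "NVDA": 0.03})       # AAPL missing, imputed as 0.01
keyed_matrix = u.to_keyed(np.outer(v2, v2))          # stand-in for an estimator's output
```

The wrapped estimator only ever sees fixed-length vectors, which is why an adapter of this kind needs no covariance math of its own.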

Composing volatility × correlation

H = D R D is a composition, not a fixed algorithm: D is the diagonal matrix of per-series volatilities and R is the correlation matrix. ConditionalCovariance lets you pick the per-series volatility model and the correlation estimator independently; DCCCovariance is just the EWMA/EWMA special case:

from precise import ConditionalCovariance, EwaCovariance, LedoitWolfCovariance

est = ConditionalCovariance(vol=EwaCovariance(r=0.02),       # any estimator, used per series in 1-D
                            corr=LedoitWolfCovariance(r=0.05))  # correlation from any estimator
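
The algebra behind H = D R D is small enough to check by hand. The sketch below uses plain numpy with two hypothetical helpers, `compose` and `decompose`, to show that volatilities and correlations can be produced separately and recombined; it does not reproduce how ConditionalCovariance is implemented.

```python
import numpy as np


def compose(vol, corr):
    """Build H = D R D from per-series volatilities and a correlation matrix."""
    vol = np.asarray(vol, dtype=float)
    return corr * np.outer(vol, vol)      # elementwise form of diag(vol) @ corr @ diag(vol)


def decompose(cov):
    """Split a covariance matrix back into volatilities and a correlation matrix."""
    vol = np.sqrt(np.diag(cov))
    return vol, cov / np.outer(vol, vol)


vol = np.array([0.2, 0.1, 0.3])           # e.g. from one univariate model per series
R = np.array([
    [1.0, 0.5, 0.2],
    [0.5, 1.0, -0.1],
    [0.2, -0.1, 1.0],
])                                        # e.g. from any correlation estimator
H = compose(vol, R)
vol_back, R_back = decompose(H)
```

Because the two factors are recovered exactly, either one can be swapped for a different model without touching the other.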

The volatility model can also be any univariate model from microprediction/skaters (Holt, Hosking, …) via from_skater — precise doesn't depend on it; the adapter is duck-typed:

import skaters
from precise import ConditionalCovariance, EwaCovariance, from_skater
est = ConditionalCovariance(vol=from_skater(skaters.holt), corr=EwaCovariance(r=0.05))

Assess & recommend

No estimator wins everywhere, so precise treats judging and choosing an estimate as first-class alongside producing one.

from precise import all_assessors, suggest

all_assessors()             # scoring rules: LogLikelihood, BlockPseudoLikelihood, SchurLikelihood,
                            # SteinLoss, FrobeniusToTruth, GMVVariance, ... (higher = better)
suggest(X, top=3)           # recommend estimator classes from observable features of X
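
To show what a "higher = better" scoring rule looks like, here is a minimal numpy sketch of an average Gaussian log-likelihood evaluated on observations. The function `gaussian_loglik` is hypothetical; the signature and behaviour of the LogLikelihood assessor in precise may differ.

```python
import numpy as np


def gaussian_loglik(cov, X, mean=None):
    """Average Gaussian log-density of the rows of X under N(mean, cov).

    Hypothetical helper, not part of precise. Higher is better.
    """
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    mu = np.zeros(p) if mean is None else np.asarray(mean, dtype=float)
    Z = X - mu
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("cov must be positive definite")
    # Squared Mahalanobis distance of each row, without forming the inverse.
    maha = np.einsum("ij,ij->i", Z, np.linalg.solve(cov, Z.T).T)
    return float(np.mean(-0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)))


rng = np.random.default_rng(2)
true_cov = np.array([[1.0, 0.6], [0.6, 2.0]])
X_test = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=2000)

score_true = gaussian_loglik(true_cov, X_test)
score_inflated = gaussian_loglik(10.0 * true_cov, X_test)   # badly mis-scaled estimate
```

A rule like this needs no ground-truth covariance, only observations, so it can be used to compare candidate estimators on data held out from the stream.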

suggest maps truth-free features of your data (p/n, effective rank, sphericity, condition number, off-diagonal mass, excess kurtosis) to an estimator, via a frozen, numpy-only decision tree. The Schur pseudo-likelihood — a one-parameter (γ) bridge between the full and block-diagonal Gaussian likelihoods — is both an assessor here and the subject of a working paper.
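
"Truth-free" here means computable from the data alone. The sketch below computes a few such features with numpy. The function `data_features` is hypothetical, and every definition in it (entropy-based effective rank, mean absolute off-diagonal correlation, pooled excess kurtosis) is an assumption chosen for illustration; the definitions used by suggest are not given in this README and may differ. It assumes at least two columns.

```python
import numpy as np


def data_features(X):
    """Observable features of an (n, p) data matrix with p >= 2 (illustrative definitions)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    S = np.cov(X, rowvar=False)
    eig = np.clip(np.linalg.eigvalsh(S), 0.0, None)    # ascending, clipped at zero

    # Effective rank as exp(entropy) of the normalised spectrum (assumed definition).
    w = eig / eig.sum()
    w = w[w > 0]
    effective_rank = float(np.exp(-(w * np.log(w)).sum()))

    # Mean absolute off-diagonal correlation (assumed definition of off-diagonal mass).
    d = np.diag(S)
    corr = S / np.sqrt(np.outer(d, d))
    off_diagonal_mass = float((np.abs(corr).sum() - p) / (p * (p - 1)))

    # Excess kurtosis pooled over standardised columns (zero for Gaussian data).
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    excess_kurtosis = float((Z ** 4).mean() - 3.0)

    condition_number = float(eig[-1] / eig[0]) if eig[0] > 0 else float("inf")
    return {
        "p_over_n": p / n,
        "effective_rank": effective_rank,
        "condition_number": condition_number,
        "off_diagonal_mass": off_diagonal_mass,
        "excess_kurtosis": excess_kurtosis,
    }


rng = np.random.default_rng(3)
features = data_features(rng.normal(size=(500, 4)))
```

Features like these can be fed to a fixed set of threshold rules, which is roughly what a frozen decision tree amounts to.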

Related

Migrating from precise < 1.0 (the functional "skater" API)? See MIGRATING.md.

Disclaimer

Not investment advice. Just code, subject to the MIT License.
