A robust Python repository for probabilistic Bitcoin (BTC-USD) price forecasting, focused on uncertainty quantification, rigorous backtesting, and automated optimization. The system is engineered for data science research, educational exploration, and developing advanced trading strategies — not for financial advice or live trading.
Disclaimer: This project is for research and educational purposes only. It is not investment advice. Financial markets are uncertain; past performance does not guarantee future results.
The system is built modularly, with clear divisions between key processes:
- Data Pipeline (`/data`): Ingests, validates, and preprocesses raw OHLCV (Open/High/Low/Close/Volume) data.
- Feature Engineering (`/features`): Constructs features including technical indicators, statistical metrics, volatility ratios, "halving" event counters, and custom “alpha” features with a scikit-learn pipeline.
- Modeling (`/models`): Implements several probabilistic models:
  - LightGBM Quantile Regression for fast, direct quantile forecasts.
  - Bayesian LSTM (with MC Dropout) for deep temporal uncertainty modeling (PyTorch).
  - Gaussian Process for Bayesian baselines.
  - Ensemble models combining various approaches by performance-weighted stacking.
- Training & Hyperparameter Tuning (`/train`, `/autotune`): Unified scripts for model training, leveraging Optuna for automatic hyperparameter search.
- Backtesting (`/backtest`): Rigorous walk-forward validation with trading simulation, including risk management logic.
- Evaluation (`/eval`): Tools for forecast quality measurement, such as CRPS, PICP, Winkler Score, calibration plots, and post-hoc recalibration.
- Deployment & Monitoring (`/deploy`): FastAPI serving for predictions and a performance drift monitor.
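To make the evaluation metrics named above concrete, here is a minimal NumPy sketch of PICP, the Winkler score, and a sample-based CRPS estimate. The function names (`picp`, `winkler_score`, `crps_from_samples`) are illustrative assumptions and are not taken from the repository's `/eval` module, which may compute these quantities differently.

```python
import numpy as np


def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: share of targets inside [lower, upper]."""
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    return float(np.mean((y >= lower) & (y <= upper)))


def winkler_score(y, lower, upper, alpha=0.1):
    """Mean Winkler score for a central (1 - alpha) interval; lower is better.

    Each interval is charged its width, plus (2 / alpha) times the distance
    by which the observation falls outside the interval.
    """
    y, lower, upper = (np.asarray(a, dtype=float) for a in (y, lower, upper))
    width = upper - lower
    miss_below = np.clip(lower - y, 0.0, None)
    miss_above = np.clip(y - upper, 0.0, None)
    return float(np.mean(width + (2.0 / alpha) * (miss_below + miss_above)))


def crps_from_samples(y, samples):
    """CRPS estimate for one observation y from forecast samples.

    Uses the identity CRPS = E|X - y| - 0.5 * E|X - X'|, with both
    expectations replaced by averages over the samples.
    """
    samples = np.asarray(samples, dtype=float)
    term_obs = np.mean(np.abs(samples - y))
    term_spread = 0.5 * np.mean(np.abs(samples[:, None] - samples[None, :]))
    return float(term_obs - term_spread)


# Toy data (hypothetical, for illustration only).
y_true = [1.0, 2.0, 3.0, 10.0]
q_low = [0.0, 0.0, 0.0, 0.0]
q_high = [2.0, 4.0, 6.0, 8.0]

print("PICP:", picp(y_true, q_low, q_high))
print("Winkler:", winkler_score(y_true, q_low, q_high, alpha=0.1))
print("CRPS:", crps_from_samples(1.0, [0.0, 2.0]))
```

A well-calibrated 90% interval should give a PICP close to 0.9; the Winkler score then rewards the narrowest intervals that still reach that coverage.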
- Python 3.9+
- Git
- Clone the repository:
      git clone <repository_url>
      cd btc-quant-prob
- (Recommended) Create and activate a virtual environment, then install dependencies:
      python -m venv venv
      source venv/bin/activate  # Windows: venv\Scripts\activate
      pip install -r requirements.txt
- Download Data: Get Bitcoin Historical Data from Kaggle and place `bitcoin_historical_data.csv` inside `btc-quant-prob/raw_data/`.
  Tip: For quick tests, a mock data generator is also included in the codebase.
Basic workflow:
- Activate your virtual environment:
      source venv/bin/activate
  You should see `(venv)` in your terminal prompt.
- Train a Model:
      python -m train.train --horizon 180 --model_name lgbm_quantile
  Trains a LightGBM quantile regression model for 180-day predictions, saving the trained model in `artifacts/`.
- Generate Walk-Forward Backtest Predictions:
      python -m backtest.walkforward --horizon 180 --model_name lgbm_quantile
  Stores predictions in `artifacts/backtest/`.
- Evaluate and Visualize: Open and execute the notebook `notebooks/02_traineval.ipynb` for analysis and visualization.
Summary Workflow:
- Training (`train.train`) → produces model
- Backtesting (`backtest.walkforward`) → produces historical results
- Notebook evaluation (`notebooks/02_traineval.ipynb`) → visualization and analysis
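The walk-forward backtesting step in the workflow above can be pictured with a small index generator. This is a sketch under stated assumptions, not the logic of `backtest.walkforward`: the function name `walk_forward_splits`, its parameters, and the expanding-window design are all hypothetical. The `horizon` gap is included because a 180-day-ahead target for a training row is only observed 180 days later, so rows too close to the test window would otherwise leak future information.

```python
import numpy as np


def walk_forward_splits(n_samples, initial_train, test_size, horizon=0):
    """Yield (train_idx, test_idx) pairs for an expanding-window walk-forward.

    n_samples:     total number of time-ordered rows
    initial_train: number of rows in the first training window
    test_size:     number of rows in each test window
    horizon:       gap (in rows) left between training end and test start
    """
    train_end = initial_train
    while train_end + horizon + test_size <= n_samples:
        train_idx = np.arange(0, train_end)
        test_start = train_end + horizon
        test_idx = np.arange(test_start, test_start + test_size)
        yield train_idx, test_idx
        train_end += test_size  # grow the training window by one test block


# Toy usage: 10 rows, 4 initial training rows, 2-row test blocks, 1-row gap.
for fold, (tr, te) in enumerate(walk_forward_splits(10, 4, 2, horizon=1)):
    print(f"fold {fold}: train rows 0..{tr[-1]}, test rows {te[0]}..{te[-1]}")
```

An expanding window uses all history available at each step; a fixed-length rolling window is an equally reasonable choice when older market regimes are thought to be less relevant.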
- Tune hyperparameters for each model (see `autotune/iterate.py`, using Optuna).
- Engineer new, original features ("alpha"). Ideas:
  - Short/long-term volatility ratios
  - Feature interactions (e.g., RSI × volatility)
  - Halving effects (`add_halving_features`)
- Try more models or ensembles (`models/gp_baseline.py`, `models/ensemble.py`).
- Refine backtester: Add stop-loss, volatility-based position sizing, and "high-conviction" trading signals in `backtest/simulate.py`.
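As a starting point for the feature ideas listed above, the sketch below computes a short/long volatility ratio and a days-since-halving counter with NumPy and the standard library. The helper names (`rolling_std`, `volatility_ratio`, `days_since_last_halving`) and the 7/30-day windows are assumptions for illustration; they are not the repository's `add_halving_features` implementation. The halving dates are given in UTC and are worth double-checking against a block explorer before use.

```python
from datetime import date

import numpy as np

# Bitcoin halving dates (UTC); verify before relying on them.
HALVING_DATES = [
    date(2012, 11, 28),
    date(2016, 7, 9),
    date(2020, 5, 11),
    date(2024, 4, 20),
]


def rolling_std(x, window):
    """Rolling sample standard deviation; the first window - 1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    for i in range(window - 1, len(x)):
        out[i] = np.std(x[i - window + 1 : i + 1], ddof=1)
    return out


def volatility_ratio(close, short=7, long=30):
    """Ratio of short-window to long-window volatility of daily log returns.

    Returns an array of length len(close) - 1 (one value per return).
    Values above 1 suggest volatility is picking up relative to recent history.
    """
    log_ret = np.diff(np.log(np.asarray(close, dtype=float)))
    with np.errstate(divide="ignore", invalid="ignore"):
        return rolling_std(log_ret, short) / rolling_std(log_ret, long)


def days_since_last_halving(d):
    """Days elapsed since the most recent halving, or None before the first one."""
    past = [h for h in HALVING_DATES if h <= d]
    return (d - past[-1]).days if past else None


# Toy usage on a synthetic random-walk price series (not real data).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, size=100)))
print(volatility_ratio(prices)[-3:])
print(days_since_last_halving(date(2020, 5, 12)))
```

Both features use only past and current observations at each row, which matters for walk-forward validation: a feature that peeks ahead would inflate backtest results.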
- Deepen knowledge in probability/statistics (CRPS, Bayesian methods), time-series analysis (ARIMA, GARCH), linear algebra/calculus.
- Understand financial market concepts: EMH, market structure, options/futures.
- Document experiments methodically (“50 features tested, feature X improved Sharpe by 0.2…”).
- Join Kaggle time series competitions.
- Read quant research papers and replicate simple ideas.
- Applied Optuna hyperparameter tuning (50+ trials). This achieved a low CRPS, but trading results were poor with the default median-based strategy (Sharpe < 0, max drawdown -100%).
- Added "halving" and new interaction features: the model's uncertainty forecasts improved (CRPS = 0.11).
- Tried high-conviction, tail-based, and contrarian trading strategies. Only contrarian logic yielded sustainable returns (Sharpe ≈ 2), but drawdown remains problematic during regime shifts.
- Discovered that feature-rich models can forecast uncertainty but not direction, which highlights the challenge of trading on predictive signals.
- Plan: add volatility-based position sizing and trend regime filters (e.g., SMA-200) for risk management.
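The plan above (volatility-based position sizing combined with an SMA-200 trend filter) could look roughly like the following. This is a hypothetical sketch, not code from `backtest/simulate.py`: the function names, the 2% daily volatility target, the 30-day volatility window, and the long-only, capped-leverage design are all assumptions chosen for illustration.

```python
import numpy as np


def sma(x, window):
    """Simple moving average; the first window - 1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    if len(x) >= window:
        csum = np.cumsum(np.insert(x, 0, 0.0))
        out[window - 1 :] = (csum[window:] - csum[:-window]) / window
    return out


def position_size(close, target_vol=0.02, vol_window=30,
                  trend_window=200, max_leverage=1.0):
    """Long-only position sizes in [0, max_leverage], one per day.

    size = min(target_vol / realized_vol, max_leverage) when the close is
    above its trend_window-day SMA, and 0 otherwise (or while warming up).
    The size for day t uses data up to day t's close, so it should be
    applied to the return of day t + 1 to avoid look-ahead bias.
    """
    close = np.asarray(close, dtype=float)
    # Simple daily returns, padded so ret[i] is the return realized on day i.
    ret = np.concatenate([[np.nan], np.diff(close) / close[:-1]])

    vol = np.full(close.shape, np.nan)
    for i in range(vol_window, len(close)):
        vol[i] = np.std(ret[i - vol_window + 1 : i + 1], ddof=1)

    trend = sma(close, trend_window)
    size = np.zeros(close.shape)
    tradable = np.isfinite(vol) & (vol > 0) & np.isfinite(trend) & (close > trend)
    size[tradable] = np.minimum(target_vol / vol[tradable], max_leverage)
    return size


# Toy usage on a synthetic series (not real data).
rng = np.random.default_rng(0)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.001, 0.03, size=400)))
sizes = position_size(prices)
print("share of days in the market:", float(np.mean(sizes > 0)))
```

Scaling exposure inversely to realized volatility shrinks positions when markets turn turbulent, and the trend filter keeps the strategy flat during extended declines; together they target the drawdown problem noted in the log, though whether they help here would need to be confirmed in the backtest.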
- mczielinski/bitcoin-historical-data (Kaggle)
- CRPS: Continuous Ranked Probability Score
- Quant blog reading: AQR, Renaissance, Two Sigma, etc.
Research ideas, feature pull requests, and reproducible experiment logs are welcome. Open issues, document your findings, and help make this project a clearer resource for quant-oriented probabilistic research!
MIT (c) 2024-present
Original author: jwjooth