A personal quantitative finance toolkit for analysing and optimising an Indian equity portfolio. Built around NSE data and the yfinance API, the project handles everything from raw data ingestion and cleaning through return computation, outlier treatment, and index benchmark construction — with portfolio optimisation on the roadmap.
Stocks/
├── data/
│ ├── raw/ # Raw CSV data files (gitignored)
│ └── processed/ # Processed / cleaned outputs
├── notebooks/
│ ├── 00_exploration.ipynb # Initial data exploration and rough experimentation
│ └── 01_data_cleaning.ipynb # Data cleaning walkthroughs
├── src/
│ ├── config.py # Centralised path config and index map
│ ├── scripts/
│ │ ├── fetch_tri_history.py # Auto-fetches TRI history from niftyindices.com
│ │ └── index_proxy.py # Index benchmark builder (Nifty 500 / Next 50 / Smallcap 250)
│ └── utils/
│ ├── __init__.py # Public API exports
│ ├── api_io.py # yfinance data fetching
│ ├── data_io_clean.py # Raw data loaders & cleaners
│ ├── dataset_builder.py # Portfolio, price history & index dataset builders
│ ├── features.py # Return & volatility computations
│ ├── preprocessing.py # General-purpose data cleaning helpers
│ └── universe.py # Security universe management
├── .pre-commit-config.yaml
└── pyproject.toml
Loaders for four raw data sources — NSE equity master, NSE ETF master, SGB (Sovereign Gold Bond) price history, and a broker-exported portfolio CSV. Each loader normalises column headers to snake_case, strips whitespace, parses multiple date formats robustly, and standardises security type labels (EQUITY STOCK → EQ, etc.).
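The cleaning steps shared by the loaders can be sketched roughly as follows. This is a minimal illustration, not the project's code: `snake_case_header`, `parse_mixed_dates`, `clean_frame`, the list of date formats, and the column names used in the usage note are all assumptions, and the label map contains only the one mapping named above.

```python
import re

import pandas as pd

# Only the mapping named in this README; the real loaders may standardise more labels.
SECURITY_TYPE_MAP = {"EQUITY STOCK": "EQ"}


def snake_case_header(name: str) -> str:
    """Trim a header, collapse non-alphanumeric runs to '_', and lower-case it."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_").lower()


def parse_mixed_dates(values, formats=("%d-%b-%Y", "%d/%m/%Y", "%Y-%m-%d")):
    """Try each (assumed) format in turn; entries that match none stay NaT."""
    parsed = pd.Series(pd.NaT, index=values.index, dtype="datetime64[ns]")
    for fmt in formats:
        attempt = pd.to_datetime(values, format=fmt, errors="coerce")
        parsed = parsed.fillna(attempt)  # fill only the entries still unparsed
    return parsed


def clean_frame(df, date_cols=(), type_col=None):
    """Normalise headers, strip string cells, parse dates, standardise type labels."""
    out = df.copy()
    out.columns = [snake_case_header(str(c)) for c in out.columns]
    for col in out.columns:
        out[col] = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
    for col in date_cols:
        out[col] = parse_mixed_dates(out[col])
    if type_col is not None:
        out[type_col] = out[type_col].replace(SECURITY_TYPE_MAP)
    return out
```

For example, `clean_frame(raw, date_cols=["date_of_listing"], type_col="security_type")` would turn a header such as " Security Type " into security_type before the label map is applied.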
Two yfinance wrappers for .NS-suffixed symbols:
- fetch_current_prices: fetches the latest closing price for a list of symbols using a 1-day download with threading.
- fetch_historical_prices_n_years: fetches n years of daily adjusted close history, forward-filling any gaps.
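A rough sketch of how the historical wrapper might be put together is shown below. The helper names (`add_ns_suffix`, `tidy_closes`, `fetch_history_sketch`) are hypothetical, and the real functions may differ in signature and return shape. With `auto_adjust=True`, yfinance reports adjusted prices in the Close column, which is what this sketch relies on.

```python
import pandas as pd


def add_ns_suffix(symbols):
    """Map bare NSE symbols to the '.NS' tickers that yfinance expects."""
    return [s if s.endswith(".NS") else f"{s}.NS" for s in symbols]


def tidy_closes(closes: pd.DataFrame) -> pd.DataFrame:
    """Sort by date, forward-fill gaps, and drop the '.NS' suffix from column names."""
    out = closes.sort_index().ffill()
    out.columns = [str(c).removesuffix(".NS") for c in out.columns]
    return out


def fetch_history_sketch(symbols, n_years=5):
    """Download n_years of daily adjusted closes (needs network access)."""
    import yfinance as yf  # imported lazily so the helpers above work offline

    start = pd.Timestamp.today().normalize() - pd.DateOffset(years=n_years)
    data = yf.download(
        add_ns_suffix(symbols),
        start=start,
        auto_adjust=True,  # 'Close' then holds adjusted prices
        threads=True,
        progress=False,
    )
    return tidy_closes(data["Close"])
```

Keeping the suffix handling and gap filling in small pure functions makes them easy to check without hitting the network.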
Automatically downloads Total Return Index (TRI) history from niftyindices.com for all indices defined in config.INDEX_MAP. Key details:
- Splits the requested date range (default: 10 years) into ≤364-day chunks to comply with the API's window limits.
- Seeds session cookies by first hitting the niftyindices historical-data page, then POSTs to the getTotalReturnIndexString endpoint for each chunk.
- Concatenates all chunks, normalises headers, strips whitespace, parses dates, deduplicates, sorts, and writes a clean CSV per index to data/processed/.
- Run it directly (python src/scripts/fetch_tri_history.py) to refresh all benchmark CSVs in one go.
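The date-range chunking described above could look something like this. It is a sketch only: `split_date_range` is a hypothetical name, and it assumes "≤364 days" means each chunk covers at most 364 calendar days counted inclusively.

```python
from datetime import date, timedelta


def split_date_range(start: date, end: date, max_days: int = 364):
    """Yield contiguous (chunk_start, chunk_end) pairs covering [start, end].

    Each chunk spans at most max_days calendar days, both ends included.
    """
    chunk_start = start
    while chunk_start <= end:
        chunk_end = min(chunk_start + timedelta(days=max_days - 1), end)
        yield chunk_start, chunk_end
        chunk_start = chunk_end + timedelta(days=1)


# Example: one request window per chunk over a two-year range.
for chunk_start, chunk_end in split_date_range(date(2020, 1, 1), date(2021, 12, 31)):
    print(chunk_start.isoformat(), "to", chunk_end.isoformat())
```

Each pair would then drive one POST, with the per-chunk results concatenated and deduplicated afterwards.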
- build_canonical_portfolio: merges the broker portfolio with the NSE master data on ISIN, attaches live prices via fetch_current_prices, handles SGB separately (yfinance does not cover it), and outputs a single standardised dataframe with columns: isin, symbol, security_type, sector, quantity, fetched_price, current_value, and weight.
- build_historical_price_dataset: joins the yfinance equity/ETF price history with the SGB CSV history into one aligned time-series dataframe.
- build_index_prices: loads a single pre-processed index price series from data/processed/ by name, returning a labelled pd.Series.
- build_index_price_dataset: combines all index price series defined in config.INDEX_MAP into one aligned pd.DataFrame.
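The merge-and-weight step behind the canonical portfolio can be illustrated with a small pandas sketch. It assumes the inputs are already cleaned, takes prices as a plain symbol-to-price mapping rather than calling yfinance, and leaves out the separate SGB handling; `canonical_portfolio_sketch` is a hypothetical name.

```python
import pandas as pd

CANONICAL_COLUMNS = [
    "isin", "symbol", "security_type", "sector",
    "quantity", "fetched_price", "current_value", "weight",
]


def canonical_portfolio_sketch(holdings, master, prices):
    """Join holdings to master data on ISIN, then derive value and weight.

    holdings: columns isin, quantity
    master:   columns isin, symbol, security_type, sector
    prices:   mapping of symbol -> latest price
    """
    merged = holdings.merge(master, on="isin", how="left")
    merged["fetched_price"] = merged["symbol"].map(prices)
    merged["current_value"] = merged["quantity"] * merged["fetched_price"]
    # Weight is each position's share of total portfolio value.
    merged["weight"] = merged["current_value"] / merged["current_value"].sum()
    return merged[CANONICAL_COLUMNS]
```

A left join keeps every broker holding even when the master list has no match, so missing metadata shows up as NaN instead of silently dropping a position.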
- Daily percentage returns (compute_daily_ret)
- Annualised mean return (compute_annualized_mean_ret, ×252)
- Annualised volatility per security (compute_individual_annualized_volatility, ×√252)
- MAD-based winsorization (winsorize_returns): clips extreme outliers beyond a configurable k robust-sigma threshold (default 8) per security independently, preserving genuine data while removing artefacts.
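The computations listed above can be sketched as follows. The function names are illustrative stand-ins, and the definition of "robust sigma" as 1.4826 × MAD (the usual scaling that matches the standard deviation for normal data) is an assumption about how the project defines it.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252


def daily_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Simple percentage returns; the first row has no prior price and is dropped."""
    return (prices / prices.shift(1) - 1).iloc[1:]


def annualised_mean(returns: pd.DataFrame) -> pd.Series:
    return returns.mean() * TRADING_DAYS


def annualised_volatility(returns: pd.DataFrame) -> pd.Series:
    return returns.std() * np.sqrt(TRADING_DAYS)


def winsorize_mad(returns: pd.DataFrame, k: float = 8.0) -> pd.DataFrame:
    """Clip each column to median ± k robust sigmas, computed per security."""
    med = returns.median()
    mad = (returns - med).abs().median()
    sigma = 1.4826 * mad  # assumed robust-sigma definition
    return returns.clip(lower=med - k * sigma, upper=med + k * sigma, axis=1)
```

Because the bounds come from the median and MAD, a single extreme print barely moves them, unlike mean and standard deviation based clipping. One edge case to be aware of in this sketch: a column whose MAD is zero would be clipped entirely to its median.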
- remove_securities: drops one or more symbols from both the canonical portfolio dataframe and the returns dataframe simultaneously, keeping them in sync. Accepts a single symbol string or a list.
- remove_securities_leq_weight_w: convenience wrapper that automatically identifies and removes all securities whose portfolio weight is at or below a threshold w (default 0.5%), useful for trimming negligible positions before optimisation.
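A minimal sketch of keeping the two dataframes in sync is given below. `drop_securities` and `trim_small_positions` are hypothetical names; the sketch assumes the returns dataframe has one column per symbol and that weights are stored as fractions (0.005 for 0.5%). It does not renormalise the remaining weights, since the source does not say whether the real helpers do.

```python
import pandas as pd


def drop_securities(portfolio, returns, symbols):
    """Remove symbols from the portfolio rows and the returns columns together."""
    if isinstance(symbols, str):
        symbols = [symbols]
    kept_portfolio = portfolio[~portfolio["symbol"].isin(symbols)].copy()
    present = [s for s in symbols if s in returns.columns]
    kept_returns = returns.drop(columns=present)
    return kept_portfolio, kept_returns


def trim_small_positions(portfolio, returns, w=0.005):
    """Drop every position whose weight is at or below w."""
    small = portfolio.loc[portfolio["weight"] <= w, "symbol"].tolist()
    return drop_securities(portfolio, returns, small)
```

Returning both frames from one call makes it hard to update one and forget the other.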
Processes historical total-return CSV files for NSE indices downloaded from NSE India. For each index, the script:
- Globs all matching raw CSV files and concatenates them.
- Normalises headers, strips whitespace, and parses dates (with debug logging for failed parses).
- Deduplicates and sorts by date, then writes a clean, date-indexed CSV to data/processed/.
Note: fetch_tri_history.py supersedes the manual CSV workflow handled by index_proxy.py for most use cases.
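The index_proxy.py steps above (glob, concatenate, normalise, parse, deduplicate, sort, write) might be sketched as below. The function name, the `date` column name, and the `date_format` parameter are assumptions; the actual raw file layout is not documented here.

```python
import logging
from pathlib import Path

import pandas as pd

logger = logging.getLogger(__name__)


def combine_index_csvs(raw_dir, pattern, out_path, date_col="date", date_format=None):
    """Merge all raw CSVs matching pattern into one clean, date-indexed CSV."""
    files = sorted(Path(raw_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files match {pattern!r} in {raw_dir}")
    combined = pd.concat((pd.read_csv(f) for f in files), ignore_index=True)
    combined.columns = [str(c).strip().lower().replace(" ", "_") for c in combined.columns]

    raw_dates = combined[date_col].astype(str).str.strip()
    combined[date_col] = pd.to_datetime(raw_dates, format=date_format, errors="coerce")
    failed = combined[date_col].isna()
    if failed.any():
        logger.debug("dropping %d rows with unparseable dates", int(failed.sum()))

    clean = (
        combined.loc[~failed]
        .drop_duplicates(subset=date_col)
        .sort_values(date_col)
        .set_index(date_col)
    )
    clean.to_csv(out_path)
    return clean
```

Deduplicating on the date column means overlapping downloads can be dropped into the raw folder without producing repeated rows in the output.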
Centralised path resolution for all raw and processed data files relative to the project root, so nothing is hardcoded in notebooks or scripts. Exposes:
- RAW_DATA_DIR, PROCESSED_DATA_DIR: base directory paths
- NSE_EQ_PATH, NSE_ETF_PATH, SGB_PATH, PORTFOLIO_PATH: individual raw file paths
- INDEX_MAP: mapping of index names to their processed CSV filenames, used by build_index_price_dataset and fetch_tri_history.py
Currently tracked indices:
| Index Name | Processed File |
|---|---|
| NIFTY 500 | NIFTY500.csv |
| NIFTY NEXT 50 | NEXT50.csv |
| NIFTY SMALLCAP 250 | SMALLCAP250.csv |
| NIFTY IT | NIFTYIT.csv |
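As an illustration of how the index map and path resolution fit together, a config along these lines would work. The dictionary values come from the table above; `processed_index_path` and the explicit `project_root` argument are hypothetical, and the real config.py presumably derives the root from its own file location.

```python
from pathlib import Path

# Index name -> processed CSV filename, as listed in the table above.
INDEX_MAP = {
    "NIFTY 500": "NIFTY500.csv",
    "NIFTY NEXT 50": "NEXT50.csv",
    "NIFTY SMALLCAP 250": "SMALLCAP250.csv",
    "NIFTY IT": "NIFTYIT.csv",
}


def processed_index_path(index_name: str, project_root) -> Path:
    """Resolve the processed CSV for an index relative to the project root."""
    return Path(project_root) / "data" / "processed" / INDEX_MAP[index_name]
```

Looking up an unknown index name raises a KeyError, which surfaces a typo immediately instead of producing a path to a file that does not exist.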
All utility functions are re-exported from src/utils for clean imports in notebooks:
from utils import (
# Preprocessing
to_snake_case, normalize_column_headers, strip_string_values, convert_dates, fill_with_proxy,
# Data IO + Cleaning
load_and_clean_nse_eq_master, load_and_clean_nse_etf_master, load_and_clean_sgb, load_and_clean_broker,
# Dataset building
build_canonical_portfolio, build_historical_price_dataset, build_index_price_dataset, build_index_prices,
# API IO
fetch_current_prices, fetch_historical_prices_n_years,
# Features
compute_daily_ret, compute_annualized_mean_ret, compute_individual_annualized_volatility, winsorize_returns,
# Universe
remove_securities, remove_securities_leq_weight_w,
)

| File | Location | Description |
|---|---|---|
| nse_eq_master.csv | data/raw/ | NSE equity master list (ISIN, symbol, listing date) |
| nse_etf_master.csv | data/raw/ | NSE ETF master list |
| SGBMAY28.csv | data/raw/ | Sovereign Gold Bond (May 2028) price history |
| current_portfolio.csv | data/raw/ | Broker-exported portfolio holdings |
| NIFTY500.csv, NEXT50.csv, SMALLCAP250.csv, NIFTYIT.csv | data/processed/ | Cleaned, deduplicated TRI index series |
Raw data files are excluded from version control via .gitignore. Processed index CSVs are auto-generated by fetch_tri_history.py.
# Install dependencies (requires Python 3.11+)
pip install -e .
# Install pre-commit hooks
pre-commit install
# Fetch/refresh all TRI benchmark data
python src/scripts/fetch_tri_history.py
- Mean-variance optimisation (maximum Sharpe, minimum volatility) using cvxpy or PyPortfolioOpt
- Target return / target risk constrained optimisation
- Weight bounds and sector concentration constraints
- Sharpe ratio, Sortino ratio, Calmar ratio
- Maximum drawdown and drawdown duration
- Portfolio Beta vs. Nifty 50
- Value at Risk (VaR) and Conditional VaR (CVaR)
- Rolling correlation heatmaps
- Sector-level and asset-class-level allocation breakdown
- Performance attribution vs. a benchmark (Nifty 500)
- Drift tracking and rebalancing triggers
- Historical backtest of current portfolio weights
- Rebalancing strategy simulation (monthly, quarterly)
- Comparison of buy-and-hold vs. rebalanced returns
- Mutual fund NAV support (via AMFI or MFApi)
- Auto-refresh of NSE master CSVs
- Scheduled price history updates
- Automated portfolio snapshot report (PDF/HTML)
- Interactive Plotly dashboard for portfolio drill-down
- Python — pandas, numpy, scipy, yfinance, cvxpy
- Visualisation — matplotlib, seaborn
- Notebooks — Jupyter
- Code Quality — pre-commit hooks (ruff / black)