126 changes: 126 additions & 0 deletions models/propermab_linear/README.md
@@ -0,0 +1,126 @@
# PROPERMAB Linear Baseline

Ridge regression model trained on PROPERMAB features.

## Description

This baseline uses 5 PROPERMAB descriptors to predict antibody developability properties:
- **hyd_patch_area_cdr**: Hydrophobic surface patch area in the CDRs
- **pos_patch_area**: Positively charged surface patch area
- **dipole_moment**: Dipole moment of the Fv domain
- **aromatic_asa**: Solvent-accessible surface area of aromatic residues
- **exposed_net_charge**: Net charge of solvent-exposed CDR atoms

A Ridge regression model is trained separately for each biophysical property using 5-fold cross-validation.
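
The per-property training scheme can be sketched as follows. This is a minimal illustration on synthetic data, not the baseline's actual code: the `alpha` value, the shuffled `KFold` split, and the two-property subset are assumptions made for the example (the real pipeline may use predefined folds from the training CSV).

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict

FEATURES = [
    "hyd_patch_area_cdr",
    "pos_patch_area",
    "dipole_moment",
    "aromatic_asa",
    "exposed_net_charge",
]
PROPERTIES = ["HIC", "Tm2"]  # subset of the predicted properties, for brevity

# Synthetic stand-in for the feature table and measured properties.
rng = np.random.default_rng(42)
X = rng.normal(size=(50, len(FEATURES)))
true_w = rng.normal(size=(len(FEATURES), len(PROPERTIES)))
Y = X @ true_w + 0.1 * rng.normal(size=(50, len(PROPERTIES)))

# Assumed split: 5 shuffled folds. Assumed regularization: alpha=1.0.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

models = {}    # final model per property, fit on all rows
cv_preds = {}  # out-of-fold predictions per property
for j, prop in enumerate(PROPERTIES):
    y = Y[:, j]
    cv_preds[prop] = cross_val_predict(Ridge(alpha=1.0), X, y, cv=cv)
    models[prop] = Ridge(alpha=1.0).fit(X, y)
```

Keeping one model per property means a missing measurement for one assay does not have to exclude an antibody from the others.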

## Requirements

- Pre-computed PROPERMAB training and test features in `feature_store_top5.csv`

## Installation

```bash
# From this directory
pixi install
```

## Usage

### CLI Interface

The baseline implements a standardized CLI that takes only the required arguments. PROPERMAB features are loaded from the CSV file `feature_store_top5.csv`.

#### Train Models

```bash
# From the baseline directory
pixi run python -m propermab_linear train \
  --data <path-to-training-csv> \
  --run-dir <directory-to-save-models> \
  [--seed 42]

# Example
pixi run python -m propermab_linear train \
  --data ../../data/GDPa1_v1.2_20250814.csv \
  --run-dir ./outputs/run_001
```

This will:
1. Load training data from `--data`
2. Load PROPERMAB features automatically from `feature_store_top5.csv`
3. Train Ridge models for each property using 5-fold cross-validation
4. Save trained models to `<run-dir>/models.pkl`
5. Save cross-validation predictions to `<run-dir>/cv_predictions.csv`

#### Generate Predictions

```bash
# From the baseline directory
pixi run python -m propermab_linear predict \
  --data <path-to-input-csv> \
  --run-dir <directory-with-trained-models> \
  --out-dir <directory-to-write-predictions>

# Example: CV predictions
pixi run python -m propermab_linear predict \
  --data ../../data/GDPa1_v1.2_20250814.csv \
  --run-dir ./outputs/run_001 \
  --out-dir ../../predictions/cv_run_001

# Example: Heldout predictions
pixi run python -m propermab_linear predict \
  --data ../../data/heldout-set-sequences.csv \
  --run-dir ./outputs/run_001 \
  --out-dir ../../predictions/heldout_run_001
```

Behavior:
- PROPERMAB features are loaded automatically from `feature_store_top5.csv`
- For **training data** (with a fold column): returns the CV predictions saved during training
- For **heldout data**: uses the final models trained on all data
- Writes predictions to `<out-dir>/predictions.csv`
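
The dispatch between CV predictions and final-model predictions could look roughly like the sketch below. It is an illustration only: the helper `select_predictions`, the exact column name `fold`, and the dict-based inputs are hypothetical and are not taken from the baseline's source.

```python
from typing import Callable


def select_predictions(
    rows: list[dict],
    cv_predictions: dict[str, float],
    predict_final: Callable[[dict], float],
) -> list[tuple[str, float]]:
    """Return (antibody_name, prediction) pairs for the input rows.

    Hypothetical helper: if every row carries a "fold" entry, the input is
    treated as training data and the stored out-of-fold predictions are
    reused; otherwise the final model (trained on all data) is applied.
    """
    has_fold = bool(rows) and all("fold" in row for row in rows)
    results = []
    for row in rows:
        name = row["antibody_name"]
        if has_fold:
            results.append((name, cv_predictions[name]))
        else:
            results.append((name, predict_final(row)))
    return results
```

Reusing the out-of-fold predictions for training antibodies avoids scoring a model on rows it was fit on.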

### Development

```bash
# Run tests (requires dev environment)
pixi run -e dev test

# Lint code (requires dev environment)
pixi run -e dev lint
```

## Implementation

This baseline implements the `BaseModel` interface from `abdev_core`:

```python
from pathlib import Path

import pandas as pd
from abdev_core import BaseModel, load_features


class TapLinearModel(BaseModel):
    def train(self, df: pd.DataFrame, run_dir: Path, *, seed: int) -> None:
        # Load features from centralized store
        tap_features = load_features("TAP", dataset="GDPa1")
        # Train models and generate CV predictions
        ...

    def predict(self, df: pd.DataFrame, run_dir: Path, out_dir: Path) -> None:
        # Load features from centralized store
        tap_features = load_features("TAP", dataset="heldout_test")
        # Generate predictions from trained models
        ...
```

Features are managed centrally by `abdev_core` - models simply import what they need. See the [abdev_core documentation](../../libs/abdev_core/README.md) for details.

## Output

Predictions are written to `<out-dir>/predictions.csv` with columns:
- `antibody_name`
- `vh_protein_sequence`, `vl_protein_sequence`
- Predicted values for: `HIC`, `Tm2`, `Titer`, `PR_CHO`, `AC-SINS_pH7.4`
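
A quick sanity check of the output header can be sketched with the standard library. The function name `missing_columns` is made up for this example; the expected column list is taken from the description above.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "antibody_name",
    "vh_protein_sequence",
    "vl_protein_sequence",
    "HIC",
    "Tm2",
    "Titer",
    "PR_CHO",
    "AC-SINS_pH7.4",
]


def missing_columns(csv_file) -> list[str]:
    """Return the expected columns that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    return [col for col in EXPECTED_COLUMNS if col not in header]


# Usage with an in-memory file; a real check would open
# <out-dir>/predictions.csv with newline="" instead.
sample = io.StringIO("antibody_name,HIC\nmab1,1.0\n")
print(missing_columns(sample))
```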

## Reference

PROPERMAB features from: Li B, et al. (2025). "PROPERMAB: an integrative framework for in silico prediction of antibody developability using machine learning." mAbs.
1,899 changes: 1,899 additions & 0 deletions models/propermab_linear/pixi.lock

Large diffs are not rendered by default.

31 changes: 31 additions & 0 deletions models/propermab_linear/pixi.toml
@@ -0,0 +1,31 @@
[workspace]
name = "tap-linear"
version = "0.1.0"
description = "TAP Linear baseline - Ridge regression on TAP features"
channels = ["conda-forge"]
platforms = ["linux-64", "osx-64", "osx-arm64"]

[dependencies]
python = "3.11.*"
numpy = ">=1.24"
pandas = ">=2.0"
scikit-learn = ">=1.3"
typer = ">=0.9"

[pypi-dependencies]
abdev-core = { path = "../../libs/abdev_core", editable = true }
tap-linear = { path = ".", editable = true }

[environments]
default = []
dev = ["dev"]

[feature.dev.dependencies]
pytest = ">=7.0"
ruff = ">=0.1"

[feature.dev.tasks]
# Development tasks only - orchestrator will call train/predict directly
lint = "ruff check src && ruff format --check src"
test = "pytest tests -v"

23 changes: 23 additions & 0 deletions models/propermab_linear/pyproject.toml
@@ -0,0 +1,23 @@
[build-system]
requires = ["setuptools>=64", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "tap-linear"
version = "0.1.0"
description = "TAP Linear baseline - Ridge regression on TAP features"
requires-python = ">=3.11"
dependencies = [
    "abdev-core",
    "pandas>=2.0",
    "numpy>=1.24",
    "scikit-learn>=1.3",
    "typer>=0.9.0",
]

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-dir]
"" = "src"

4 changes: 4 additions & 0 deletions models/propermab_linear/src/propermab_linear/__init__.py
@@ -0,0 +1,4 @@
"""TAP Linear baseline for antibody developability prediction."""

__version__ = "0.1.0"

7 changes: 7 additions & 0 deletions models/propermab_linear/src/propermab_linear/__main__.py
@@ -0,0 +1,7 @@
"""Entry point for model CLI."""

from .run import app

if __name__ == "__main__":
    app()
