ginkgobioworks · MichaelChungyoun · Dec 2, 2025
diff --git a/models/propermab_linear/README.md b/models/propermab_linear/README.md
@@ -0,0 +1,126 @@
+# PROPERMAB Linear Baseline
+
+Ridge regression model trained on PROPERMAB features.
+
+## Description
+
+This baseline uses five PROPERMAB descriptors to predict antibody developability properties:
+- **hyd_patch_area_cdr**: Area of hydrophobic surface patches in the CDRs
+- **pos_patch_area**: Area of positively charged surface patches
+- **dipole_moment**: Dipole moment of the Fv domain
+- **aromatic_asa**: Total solvent-accessible surface area of aromatic residues
+- **exposed_net_charge**: Net charge of solvent-exposed CDR atoms
+
+A Ridge regression model is trained separately for each biophysical property using 5-fold cross-validation.
+
+## Requirements
+
+- Pre-computed PROPERMAB training and test features in `feature_store_top5.csv`
+
+## Installation
+
+```bash
+# From this directory
+pixi install
+```
+
+## Usage
+
+### CLI Interface
+
+The baseline implements the standardized CLI with only the required arguments. PROPERMAB features are loaded from the CSV file `feature_store_top5.csv`.
+
+#### Train Models
+
+```bash
+# From the baseline directory
+pixi run python -m propermab_linear train \
+  --data <path-to-training-csv> \
+  --run-dir <directory-to-save-models> \
+  [--seed 42]
+
+# Example
+pixi run python -m propermab_linear train \
+  --data ../../data/GDPa1_v1.2_20250814.csv \
+  --run-dir ./outputs/run_001
+```
+
+This will:
+1. Load training data from `--data`
+2. Load PROPERMAB features automatically from `feature_store_top5.csv`
+3. Train Ridge models for each property using 5-fold cross-validation
+4. Save trained models to `<run-dir>/models.pkl`
+5. Save cross-validation predictions to `<run-dir>/cv_predictions.csv`
+
+#### Generate Predictions
+
+```bash
+# From the baseline directory
+pixi run python -m propermab_linear predict \
+  --data <path-to-input-csv> \
+  --run-dir <directory-with-trained-models> \
+  --out-dir <directory-to-write-predictions>
+
+# Example: CV predictions
+pixi run python -m propermab_linear predict \
+  --data ../../data/GDPa1_v1.2_20250814.csv \
+  --run-dir ./outputs/run_001 \
+  --out-dir ../../predictions/cv_run_001
+
+# Example: Heldout predictions
+pixi run python -m propermab_linear predict \
+  --data ../../data/heldout-set-sequences.csv \
+  --run-dir ./outputs/run_001 \
+  --out-dir ../../predictions/heldout_run_001
+```
+
+Behavior:
+- PROPERMAB features are loaded automatically from `feature_store_top5.csv`
+- For **training data** (with a fold column): uses the CV predictions generated during training
+- For **heldout data**: uses the final models trained on all data
+- Writes predictions to `<out-dir>/predictions.csv`
+
+### Development
+
+```bash
+# Run tests (requires dev environment)
+pixi run -e dev test
+
+# Lint code (requires dev environment)
+pixi run -e dev lint
+```
+
+## Implementation
+
+This baseline implements the `BaseModel` interface from `abdev_core`:
+
+```python
+from pathlib import Path
+
+import pandas as pd
+from abdev_core import BaseModel, load_features
+
+class TapLinearModel(BaseModel):
+    def train(self, df: pd.DataFrame, run_dir: Path, *, seed: int) -> None:
+        # Load features from centralized store
+        tap_features = load_features("TAP", dataset="GDPa1")
+        # Train models and generate CV predictions
+        ...
+
+    def predict(self, df: pd.DataFrame, run_dir: Path, out_dir: Path) -> None:
+        # Load features from centralized store
+        tap_features = load_features("TAP", dataset="heldout_test")
+        # Generate predictions from trained models
+        ...
+```
+
+Features are managed centrally by `abdev_core`; models simply import what they need. See the [abdev_core documentation](../../libs/abdev_core/README.md) for details.
+
+## Output
+
+Predictions are written to `<out-dir>/predictions.csv` with columns:
+- `antibody_name`
+- `vh_protein_sequence`, `vl_protein_sequence`
+- Predicted values for: `HIC`, `Tm2`, `Titer`, `PR_CHO`, `AC-SINS_pH7.4`
+
+## Reference
+
+PROPERMAB features from: Li B, et al. (2025). "PROPERMAB: an integrative framework for in silico prediction of antibody developability using machine learning." mAbs.
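Reviewer note: the README says one Ridge model is trained per property with 5-fold cross-validation, and that CV predictions are saved for the training set. The training code itself is not part of this diff, so the following is only a dependency-free sketch of that idea, not the baseline's implementation (which presumably uses scikit-learn, given the `scikit-learn>=1.3` dependency). The function names, the `alpha` value, and the toy data are all hypothetical. The intercept is left unpenalized, as scikit-learn's `Ridge` does by default.

```python
"""Sketch: ridge regression with out-of-fold predictions, standard library only."""


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Return (weights, intercept) minimizing ||y - Xw - b||^2 + alpha * ||w||^2."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centering X and y keeps the intercept out of the penalty.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    gram = [
        [sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
        for i in range(p)
    ]
    rhs = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(gram, rhs)
    b = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, b


def predict(model, X):
    """Apply a fitted (weights, intercept) pair to each feature row."""
    w, b = model
    return [sum(wj * xj for wj, xj in zip(w, row)) + b for row in X]


def out_of_fold_predictions(X, y, folds, alpha=1.0):
    """For each fold label, fit on the other folds and predict the held-out rows."""
    preds = [None] * len(y)
    for held_out in sorted(set(folds)):
        train_idx = [i for i, f in enumerate(folds) if f != held_out]
        test_idx = [i for i, f in enumerate(folds) if f == held_out]
        model = fit_ridge([X[i] for i in train_idx], [y[i] for i in train_idx], alpha)
        for i, p in zip(test_idx, predict(model, [X[i] for i in test_idx])):
            preds[i] = p
    return preds


# Toy data (hypothetical): 10 antibodies, 2 descriptors, 5 pre-assigned folds.
X = [[float(i), float((i * i) % 7)] for i in range(10)]
y = [2.0 * a - b + 3.0 for a, b in X]
folds = [i % 5 for i in range(10)]
cv_preds = out_of_fold_predictions(X, y, folds, alpha=1.0)
final_model = fit_ridge(X, y, alpha=1.0)  # refit on all rows for heldout data
```

In this sketch each row's CV prediction comes from a model that never saw that row, while `final_model` is fit on everything, which mirrors the README's split between CV predictions and final models.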
diff --git a/models/propermab_linear/pixi.lock b/models/propermab_linear/pixi.lock
diff --git a/models/propermab_linear/pixi.toml b/models/propermab_linear/pixi.toml
@@ -0,0 +1,31 @@
+[workspace]
+name = "tap-linear"
+version = "0.1.0"
+description = "PROPERMAB Linear baseline - Ridge regression on PROPERMAB features"
+channels = ["conda-forge"]
+platforms = ["linux-64", "osx-64", "osx-arm64"]
+
+[dependencies]
+python = "3.11.*"
+numpy = ">=1.24"
+pandas = ">=2.0"
+scikit-learn = ">=1.3"
+typer = ">=0.9"
+
+[pypi-dependencies]
+abdev-core = { path = "../../libs/abdev_core", editable = true }
+tap-linear = { path = ".", editable = true }
+
+[environments]
+default = []
+dev = ["dev"]
+
+[feature.dev.dependencies]
+pytest = ">=7.0"
+ruff = ">=0.1"
+
+[feature.dev.tasks]
+# Development tasks only - orchestrator will call train/predict directly
+lint = "ruff check src && ruff format --check src"
+test = "pytest tests -v"
+
diff --git a/models/propermab_linear/pyproject.toml b/models/propermab_linear/pyproject.toml
@@ -0,0 +1,23 @@
+[build-system]
+requires = ["setuptools>=64", "wheel"]
+build-backend = "setuptools.build_meta"
+
+[project]
+name = "tap-linear"
+version = "0.1.0"
+description = "PROPERMAB Linear baseline - Ridge regression on PROPERMAB features"
+requires-python = ">=3.11"
+dependencies = [
+    "abdev-core",
+    "pandas>=2.0",
+    "numpy>=1.24",
+    "scikit-learn>=1.3",
+    "typer>=0.9.0",
+]
+
+[tool.setuptools.packages.find]
+where = ["src"]
+
+[tool.setuptools.package-dir]
+"" = "src"
+
diff --git a/models/propermab_linear/src/propermab_linear/__init__.py b/models/propermab_linear/src/propermab_linear/__init__.py
@@ -0,0 +1,4 @@
+"""PROPERMAB Linear baseline for antibody developability prediction."""
+
+__version__ = "0.1.0"
+
diff --git a/models/propermab_linear/src/propermab_linear/__main__.py b/models/propermab_linear/src/propermab_linear/__main__.py
@@ -0,0 +1,7 @@
+"""Entry point for model CLI."""
+
+from .run import app
+
+if __name__ == "__main__":
+    app()
+
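Reviewer note: `run.py`, which `__main__.py` imports `app` from, is not included in this diff. The README describes `predict` as reusing stored CV predictions when the input has a fold column and applying the final models otherwise. A minimal sketch of that dispatch follows, assuming a column literally named `fold` and plain dict rows; the function name and both of those details are hypothetical, not taken from the package.

```python
def select_predictions(rows, cv_predictions, final_model):
    """Choose the prediction source for a batch of input rows.

    rows: list of dicts, each with an "antibody_name" key (hypothetical layout).
    cv_predictions: dict mapping antibody name to its stored out-of-fold prediction.
    final_model: callable mapping a row to a prediction (model fit on all data).
    """
    has_folds = bool(rows) and all("fold" in row for row in rows)
    if has_folds:
        # Training data: reuse the predictions saved during cross-validation.
        return {row["antibody_name"]: cv_predictions[row["antibody_name"]] for row in rows}
    # Heldout data: apply the model trained on the full training set.
    return {row["antibody_name"]: final_model(row) for row in rows}
```

Reusing the stored out-of-fold values for training rows avoids scoring antibodies with a model that was fit on them.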