@apathithyan
Contributor

Adds a structure-aware baseline using the SaProt protein language model with Foldseek 3Di structural tokens.

Similar to the ESM2_Ridge philosophy:

Extracts VH and VL embeddings separately and concatenates them, for both train and heldout data.
Uses Ridge regression to predict the properties.
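
A minimal sketch of the concatenate-then-regress idea described above. It uses a closed-form ridge solution in NumPy as a stand-in, so it is not the code in model.py; the function names, the embedding sizes, and the `alpha` value are all assumptions chosen for illustration.

```python
import numpy as np


def concat_chain_embeddings(vh, vl):
    """Join per-antibody VH and VL embeddings along the feature axis."""
    vh = np.asarray(vh, dtype=float)
    vl = np.asarray(vl, dtype=float)
    if vh.shape[0] != vl.shape[0]:
        raise ValueError("VH and VL must cover the same antibodies")
    return np.concatenate([vh, vl], axis=1)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center the data so the intercept is not shrunk by the penalty.
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    gram = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(gram, Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept


def predict_ridge(X, coef, intercept):
    """Apply a fitted ridge model to new feature rows."""
    return np.asarray(X, dtype=float) @ coef + intercept
```

The same `concat_chain_embeddings` step would be applied to the heldout embeddings before calling `predict_ridge`, so that train and heldout features share one layout.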

Note:

  • Added a fallback to SaProt's sequence-only mode, in case .pdb files are loaded incorrectly.
  • Added Torch support for Apple Silicon (Metal Performance Shaders)
  • Ensured coherence between sequences and structures for Train and Heldout
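
A rough sketch of how the sequence-only fallback could look at the input level. As I understand SaProt's input convention, each residue is written as an uppercase amino acid followed by a lowercase Foldseek 3Di token, and `#` stands in for a missing structure token. The helper name and the choice to fall back on a length mismatch are assumptions, not the PR's actual logic.

```python
def to_saprot_sequence(aa_seq, struct_3di=None):
    """Interleave amino acids with 3Di tokens, masking structure if unusable.

    aa_seq:     amino-acid sequence, e.g. "EVQ"
    struct_3di: Foldseek 3Di string of the same length, or None
    """
    aa_seq = aa_seq.upper()
    if struct_3di is None or len(struct_3di) != len(aa_seq):
        # Sequence-only mode: every structure position becomes "#".
        struct_tokens = "#" * len(aa_seq)
    else:
        struct_tokens = struct_3di.lower()
    return "".join(a + s for a, s in zip(aa_seq, struct_tokens))
```

Falling back per chain rather than per dataset keeps one bad .pdb file from disabling structure tokens for every other antibody.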

This corrects the previous erroneous logic in model.py by splitting the .pdb files into separate chains, so that SaProt processes the VH and VL sequences individually.
- Ensured that train and heldout sequences use their respective structures for embedding
- Added BioPython dependency to pixi.toml
- Updated README
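
For illustration, a dependency-free sketch of splitting one PDB file into per-chain records. The PR itself adds BioPython for this, and that code is not shown here; the sketch only relies on the fixed-column PDB format, where the chain ID sits in column 22 of ATOM/HETATM records. The function name is hypothetical, and which chain IDs correspond to VH and VL (often `H` and `L`) is an assumption that depends on the structure files.

```python
from collections import defaultdict


def split_pdb_by_chain(pdb_text):
    """Return {chain_id: pdb_text_for_that_chain} from a multi-chain PDB string."""
    chains = defaultdict(list)
    for line in pdb_text.splitlines():
        # Chain ID is column 22 (index 21) in ATOM/HETATM records.
        if line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains[line[21]].append(line)
    # Terminate each per-chain file with END so it parses on its own.
    return {cid: "\n".join(lines) + "\nEND\n" for cid, lines in chains.items()}
```

Each per-chain string could then be written to its own file and passed to Foldseek to obtain the 3Di tokens for that chain alone.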
@sritterginkgo
Collaborator

Hey, sorry for the delay. I needed to make some changes to your submission to fix some runtime errors that I was having. Could you take a look? I've merged it into main; it was branched from your repo.

… extracted MOE structures for SaProt embedding calculations.
@apathithyan
Contributor Author

Hey Seth, thanks for taking the time to review. Apologies for the delay on my end.

I've pushed an update that should now run without issues. It implements the following:

  1. Uses the MOE structures for both train and heldout data
  2. Extracts the VH structure and the VL structure for a given antibody
  3. Computes the SaProt embedding of VH from the VH sequence and VH structure, and likewise for VL
  4. Concatenates the two to get the final embedding

Implementation Note:

The model needs different structure directories for train and heldout data, but the predict() function doesn't receive this information from the existing framework. The current solution detects the dataset type by checking the dataframe length (246 = train, 80 = heldout).
This is admittedly a bit hacky, and I believe a cleaner solution is possible with changes to abdev-core.
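
To make the workaround concrete, here is a small sketch of the length-based detection. The row counts come from the note above; the function name and the directory paths are hypothetical placeholders, not the ones used in the PR.

```python
TRAIN_ROWS = 246    # dataframe length of the train set
HELDOUT_ROWS = 80   # dataframe length of the heldout set


def structure_dir_for(n_rows,
                      train_dir="structures/train",      # hypothetical path
                      heldout_dir="structures/heldout"): # hypothetical path
    """Pick the structure directory from the number of rows in the dataframe."""
    if n_rows == TRAIN_ROWS:
        return train_dir
    if n_rows == HELDOUT_ROWS:
        return heldout_dir
    # Fail loudly instead of silently pairing sequences with the wrong structures.
    raise ValueError(f"Cannot infer dataset type from {n_rows} rows")
```

Inside predict() this would be called as `structure_dir_for(len(df))`. Any filtered or cross-validation subset has a different length and would hit the error, which is why passing the dataset type explicitly through the framework would be more robust.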
