Commit fd24d72

Merge pull request #54 from RosettaCommons/fix/readmes
Fixes: Turn off prevalidation by default, update README with new files
2 parents e0d502e + 76f28cf commit fd24d72

8 files changed: +114 -121 lines changed
Lines changed: 9 additions & 8 deletions
@@ -1,10 +1,10 @@
-name: lint_trunk
+name: lint_production

 on:
   push:
-    branches: [main, trunk, wip/for-release, wip/remove-chemdata]
+    branches: [main, production, wip/for-release, wip/remove-chemdata]
   pull_request:
-    branches: [main, trunk, wip/for-release, wip/remove-chemdata]
+    branches: [main, production, wip/for-release, wip/remove-chemdata]
   pull_request_target:
     types: [ready_for_review]
   workflow_dispatch:
@@ -23,14 +23,15 @@ jobs:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
         with:
-          python-version: "3.11"
+          python-version: "3.12"
           cache: 'pip'
-      - name: Extract ruff version from requirements.txt
+      - name: Extract ruff version from pyproject.toml
         run: |
-          echo "RUFF_VERSION=$(grep 'ruff==' pyproject.toml | sed 's/.*ruff==\(.*\)/\1/')" >> $GITHUB_ENV
+          echo "RUFF_VERSION=$(grep 'ruff==' pyproject.toml \
+            | sed -E 's/.*ruff==([^",]+).*/\1/')" >> "$GITHUB_ENV"
       - name: Install ruff
         run: pip install ruff==${{ env.RUFF_VERSION }}
       - name: Ruff format
-        run: ruff format --diff src tests scripts notebooks
+        run: ruff format --diff src models tests
       - name: Ruff check
-        run: ruff check src tests scripts notebooks
+        run: ruff check src models tests
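
Review note: the new `sed -E` expression captures only the characters up to the first quote or comma after `ruff==`, whereas the old expression kept everything to the end of the line. The sketch below mirrors the new extraction in Python so the pattern can be checked locally. The function name `extract_ruff_version` and the version `0.8.3` are illustrative assumptions, not taken from the repository, and unlike `grep` the sketch returns only the first matching line.

```python
import re
from typing import Optional

# Same idea as: grep 'ruff==' pyproject.toml | sed -E 's/.*ruff==([^",]+).*/\1/'
_RUFF_PIN = re.compile(r'.*ruff==([^",]+)')


def extract_ruff_version(pyproject_text: str) -> Optional[str]:
    """Return the pinned ruff version from the first line containing 'ruff=='."""
    for line in pyproject_text.splitlines():
        if "ruff==" not in line:  # the grep step
            continue
        match = _RUFF_PIN.match(line)  # the sed step
        if match:
            return match.group(1)
    return None


if __name__ == "__main__":
    # Hypothetical pyproject.toml fragment; the pin value is a placeholder.
    sample = 'dev = [\n    "ruff==0.8.3",\n]'
    print(extract_ruff_version(sample))
```

With a quoted list entry such as `"ruff==0.8.3",`, the old pattern would have kept the trailing `",`, which would then be passed on to `pip install ruff==...`.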

models/rfd3/README.md

Lines changed: 11 additions & 4 deletions
@@ -25,15 +25,22 @@ This sets `FOUNDRY_CHECKPOINTS_DIR` and will in future look for checkpoints in t

 To run inference (with foundry installed in your environment, or RFD3 & Foundry src in PYTHONPATH):
 ```bash
-rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json skip_existing=False dump_trajectories=True
+rfd3 design out_dir=logs/inference_outs/demo/0 inputs=models/rfd3/docs/demo.json skip_existing=False dump_trajectories=True prevalidate_inputs=True
 ```

-Additional unecessary args here are added:
+Additional unnecessary args here are added:
 - Including dumping and aligning trajectory structures can be useful for debugging your setup or making cool gifs.
 - Printing the config and dumping trajectories are turned off by default, but turned on here for verbosity
-- Only `out_dir` and `inputs` are required
+- `prevalidate_inputs` will check that your inputs are valid before running inference. Helpful if your json has a number of different configs you want to debug / double check are valid before loading the checkpoints.
+- Only `out_dir` and `inputs` are required. The output directory will automatically be created.

-The output directory will automatically be created.
+
+There are various interesting ways you can use RFD3 beyond Atom14 design as it's trained on a large array of different tasks.
+For example, you can fix sequence and not structure (prediction-type task), fix the backbone and unfix the sequence (MPNN-type inverse folding) or unfix the sidechains only (PLACER/ChemNet-style):
+
+<p align="center">
+<img src="docs/.assets/conditioning.png" alt="Conditioning options for RFD3">
+</p>

 For full details on how to specify inputs, see the [input specification documentation](./docs/input.md). You can also see `foundry/models/rfd3/configs/inference_engine/rfdiffusion3.yaml` for even more options.

models/rfd3/configs/inference_engine/rfdiffusion3.yaml

Lines changed: 1 addition & 1 deletion
@@ -61,5 +61,5 @@ global_prefix: null
 dump_prediction_metadata_json: True
 dump_trajectories: False
 align_trajectory_structures: False
-prevalidate_inputs: True
+prevalidate_inputs: False
 low_memory_mode: False # False for standard mode, True for memory efficient tokenization mode
Four binary files (619 KB, 184 KB, 263 KB, 804 KB); previews not shown.

models/rfd3/docs/input.md

Lines changed: 93 additions & 108 deletions
@@ -1,46 +1,39 @@
 # RFdiffusion3 — Input specification (dialect **2**)

-> **TL;DR**
-> Inputs are now defined with a single `InputSpecification` class.
-> Selections like “what’s fixed?”, “what’s sequence-free?”, “which atoms are donors/acceptors?” are all expressed with the same **InputSelection** mini-language.
-> Everything is reproducibly logged back out alongside your generation.
-
 ---

-- [What changed (high level)](#what-changed-high-level)
+## Contents
 - [Quick start](#quick-start)
+- [InputSpecification fields](#inputspecification-fields)
 - [The `InputSelection` mini-language](#the-inputselection-mini-language)
-- [Full schema: `InputSpecification`](#full-schema-inputspecification)
-- [Common recipes (cookbook)](#common-recipes-cookbook)
+- [Unindexing specifics](#unindexing-specifics)
 - [Partial diffusion](#partial-diffusion)
-- [Symmetry](#symmetry)
-- [Origin (`ori_token`) and initialization](#origin-ori_token-and-initialization)
-- [Validation & error messages](#validation--error-messages)
-- [Metadata & logging](#metadata--logging)
-- [Legacy configs (dialect=1) & migration guide](#legacy-configs-dialect1--migration-guide)
-- [Multi-example files](#multi-example-files)
+- [Debugging recommendations](#debugging-recommendations)
 - [FAQ / gotchas](#faq--gotchas)

 ---

-## How it works (high level)
+## Quick start

-- **Unified selections.** All per-residue/atom choices now use **InputSelection**:
-  - You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
-  - Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
-- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples).
-  When using `unindex`, only **the atoms you mark as fixed** are carried over from the input.
-- **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
-- **Safer parsing.** You’ll now get early, informative errors if:
-  - You pass unknown keys,
-  - A selection doesn’t match any atoms,
-  - Indexed and unindexed motifs overlap,
-  - Mutually exclusive selections overlap (e.g., two RASA bins for the same atom).
-- **Backwards compatible.** Add `"dialect": 1` to keep your old configs running while you migrate. (Deprecated.)
+JSON inputs take the following top-level structure;
+```json
+{
+  "spec-1": { // First design configuration
+    "input": "<path/to/pdb>",
+    "contig": "50-80,/0,A1-100", // Diffuses length 50-80 monomer in chain A & selects indices A1 -> A100 in input pdb to have fixed coordinates and sequences
+    "select_unfixed_sequence": "A20-35", // Converts selected indices in input to have unfixed sequence (inputs become atom14).
+    "ligand": "HAX,OAA", // Selects ligands HAX and OAA based on res name in the input
+  },
+  "spec-2": {
+    // ... args for the second (independent) configuration for design.
+  }
+}
+```

----
+## InputSpecification fields
+
+Below is a table of all of the inputs that the `InputSpecification` accepts. Use these fields to describe what RFdiffusion3 should do with your inputs.

-## InputSpecification

 | Field | Type | Description |
 | -------------------------------------------------------------- | ----------------- | --------------------------------------------------------------------- |
@@ -67,118 +60,110 @@
 | `partial_t` | `float?` | Noise (Å) for partial diffusion, enables partial diffusion |


-## Quick start
-
-### Minimal JSON example
-
-```json
-{
-  "": {
-    "input": "path/to/template.pdb",
-    "contig": "A1-80",
-    "length": "150-180",
-    "select_fixed_atoms": true,
-    "select_unfixed_sequence": "A20-35",
-    "ligand": "HAX,OAA",
-    "dialect": 2
-  }
-}
-```
-### Mininmal YAML example
-```
-input: path/to/template.pdb
-contig: A1-80
-length: 150-180
-select_fixed_atoms: true
-select_unfixed_sequence: A20-35
-ligand: HAX,OAA
-dialect: 2
-
-```
-
-### Python API
-```
-from rfd3.inference.input_parsing import create_atom_array_from_design_specification
-
-atom_array, metadata = create_atom_array_from_design_specification(
-    input="path/to/template.pdb",
-    contig="A1-80",
-    length="150-180",
-    select_fixed_atoms=True,
-    select_unfixed_sequence="A20-35",
-    dialect=2,
-)
-```
+A few notes on the above:
+- **Unified selections.** All per-residue/atom choices now use **InputSelection**:
+  - You can pass `true`/`false`, a **contig string** (`"A1-10,B5-8"`), or a **dictionary** (`{"A1-10": "ALL", "B5": "N,CA,C,O"}`).
+  - Selection fields include: `select_fixed_atoms`, `select_unfixed_sequence`, `select_buried`, `select_partially_buried`, `select_exposed`, `select_hbond_donor`, `select_hbond_acceptor`, `select_hotspots`.
+- **Clearer unindexing.** For **unindexed** motifs you typically either fix `"ALL"` atoms or explicitly choose subsets such as `"TIP"`/`"BKBN"`/explicit atom lists via a **dictionary** (see examples).
+  When using `unindex`, only **the atoms you mark as fixed** are carried over from the input.
+- **Reproducibility.** The exact specification and the **sampled contig** are logged back into the output JSON. We also log useful counts (atoms, residues, chains).
+- **Safer parsing.** You’ll now get early, informative errors if:
+  - You pass unknown keys,
+  - A selection doesn’t match any atoms,
+  - Indexed and unindexed motifs overlap,
+  - Mutually exclusive selections overlap (e.g., two RASA bins for the same atom).
+- **Backwards compatible.** Add `"dialect": 1` to keep your old configs running while you migrate. (Deprecated.)

+---
 ## The InputSelection mini-language

-Fields which are specified as `InputSelection` are fields which can take either: `Bool, List, Dict`.
-Dictionaries are the most expressive and can also take special :
+Fields marked as `InputSelection` accept either a boolean, a contig-style string, or a dictionary. Dictionaries are the most expressive and can also use shorthand values like `ALL`, `TIP`, or `BKBN`:
 ```yaml
 select_fixed_atoms:
-  A1-2: BKBN
+  A1-2: BKBN # equivalent to 'N,CA,C,O'
   A3: N,CA,C,O,CB # specific atoms by atom name
   B5-7: ALL # Selects all atoms within B5,B6 and B7
-  B10: TIP # selects common tipatom for residue (constants.py)
+  B10: TIP # selects common tip atom for residue (constants.py)
   LIG: '' # selects no atoms (i.e. unfixes the atoms for ligands named `LIG`)
 ```

-[Diagram]
+<p align="center">
+<img src=".assets/input_selection.png" alt="InputSelection language for foundry">
+</p>
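
Review note: the shorthand values in the YAML example can be read as a small expansion step from a selection value to a list of atom names. The sketch below illustrates that reading only. `expand_atom_selection` is a hypothetical helper, not an rfd3 function, and the tip atoms must be supplied by the caller because the real per-residue table lives in `constants.py`, which this diff does not show.

```python
from typing import List, Sequence

# From the documented comment: BKBN is equivalent to 'N,CA,C,O'.
BACKBONE_ATOMS = ("N", "CA", "C", "O")


def expand_atom_selection(
    value: str,
    residue_atoms: Sequence[str],
    tip_atoms: Sequence[str] = (),
) -> List[str]:
    """Expand one dictionary value of an InputSelection into atom names.

    residue_atoms: every atom name present in the selected residue.
    tip_atoms: caller-supplied tip atoms for that residue (assumption).
    """
    if value == "ALL":
        return list(residue_atoms)
    if value == "BKBN":
        return list(BACKBONE_ATOMS)
    if value == "TIP":
        return list(tip_atoms)
    # An empty string selects no atoms; otherwise split explicit atom names.
    return [name.strip() for name in value.split(",") if name.strip()]


if __name__ == "__main__":
    atoms = ["N", "CA", "C", "O", "CB", "OG"]  # illustrative residue
    print(expand_atom_selection("BKBN", atoms))
    print(expand_atom_selection("N,CA,C,O,CB", atoms))
    print(expand_atom_selection("", atoms))
```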

 ## Unindexing specifics

 `unindex` marks motif tokens whose relative sequence placement is unknown to the model (useful for scaffolding around active sites, etc.).
 Use a string to list the unindexed components and where breaks occur.
 Use a dictionary if you want to fix specific atoms of those residues; atoms not fixed are not copied from the input (they will be diffused).
-Breaks between unindexed components follow the contig conventions you’re used to. For example:
-
-`"A244,A274,A320,A329,A375"`
+Breaks between unindexed components follow the contig conventions you’re used to. For example: `"A244,A274,A320,A329,A375"` lists multiple unindexed components; internal “breakpoints” are inferred and logged. (Offset syntax like A11-12 or A11,0,A12 still ties residues.)
+You can specify consecutive residues as e.g. `A11-12` (instead of `A11,A12`), this will tie the two components together in sequence (or at least it leaks to the model that residues are together in sequence).
+Similarly, you can specify manually any number of residues that offsets two components, e.g. `A11,3,A12` (3-residue separation), or `A11,0,A12` (0 sequence offset, equivalent to just `A11-12`).
+From our initial tests this only leads to a slight bias in the model, but newer models may show better adherence!
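
Review note: one way to read the `unindex` string rules in these added lines is as a list of residues, each carrying its sequence offset to the previous residue when that offset is leaked to the model. The sketch below implements only that reading. `parse_unindex` is a hypothetical function, the single-letter chain ID is an assumption, and the real parser in rfd3 may behave differently.

```python
import re
from typing import List, Optional, Tuple

# Assumes a single-letter chain ID followed by a residue number or range.
_RESIDUE = re.compile(r"^([A-Za-z])(\d+)(?:-(\d+))?$")


def parse_unindex(spec: str) -> List[Tuple[str, int, Optional[int]]]:
    """Return (chain, residue, offset_to_previous) triples.

    offset None -> relative placement unknown (a break between components)
    offset 0    -> immediately follows the previous residue in sequence
    offset n    -> n residues separate it from the previous residue
    """
    residues: List[Tuple[str, int, Optional[int]]] = []
    pending_offset: Optional[int] = None
    for token in (t.strip() for t in spec.split(",")):
        if token.isdigit():  # a bare number is an explicit offset
            pending_offset = int(token)
            continue
        match = _RESIDUE.match(token)
        if match is None:
            raise ValueError(f"Unrecognised unindex token: {token!r}")
        chain = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3) or start)
        for position, resid in enumerate(range(start, end + 1)):
            # Residues inside a range are tied together with offset 0.
            offset = pending_offset if position == 0 else 0
            residues.append((chain, resid, offset))
        pending_offset = None
    return residues


if __name__ == "__main__":
    print(parse_unindex("A11-12"))
    print(parse_unindex("A11,3,A12"))
    print(parse_unindex("A244,A274"))
```

Under this reading, `A11-12` and `A11,0,A12` parse to the same result, matching the equivalence stated in the text.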
+
+## Partial Diffusion
+To enable partial diffusion, you can pass `partial_t` with any example. This sets the *noise level* in *angstroms* for the sampler:
+- The `specification.partial_t` argument can be specified from JSON or the command line.
+- Partial diffusion will fix/unfix ligands and nucleic acids as normal, by default it will fix non-protein components and they must be specified explicitly.
+- By default, the ca-aligned `ca_rmsd_to_input` will be logged.
+- Currently, partial diffusion subsets the inference schedule based on the partial_t, so `inference_sampler.num_timesteps` will affect how many steps are used but it is not equal to the number of steps used.
+
+In the following example, RFD3 will noise out by 15 angstroms and constrain atoms of three residues. In this output one of the 8 diffusion outputs swapped its sequence index by one residue:
+```json
+{
+  "partial_diffusion": {
+    "input": "paper_examples/7v11.cif",
+    "ligand": "OQO",
+    "partial_t": 15.0,
+    "unindex": "A431,A572-573",
+    "select_fixed_atoms": {
+      "A431": "TIP",
+      "A572": "BKBN",
+      "A573": "BKBN"
+    }
+  }
+}
+```
+Below is an example of what the output should look like (diffusion outputs in teal, original native in navajo white):
+<p align="center">
+<img src=".assets/partial_diff.png" alt="Partial diffusion" width=650>
+</p>
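
Review note: the partial diffusion spec can also be generated from a script, which helps when sweeping `partial_t`. The sketch below writes the documented example to a JSON file and assembles the `rfd3 design` arguments in the form shown in the README hunk. The helper names and the output directory are illustrative assumptions, and nothing here invokes rfd3 itself.

```python
import json
import tempfile
from pathlib import Path
from typing import List

# The example spec from the documentation, keyed by configuration name.
PARTIAL_DIFFUSION_SPEC = {
    "partial_diffusion": {
        "input": "paper_examples/7v11.cif",
        "ligand": "OQO",
        "partial_t": 15.0,
        "unindex": "A431,A572-573",
        "select_fixed_atoms": {"A431": "TIP", "A572": "BKBN", "A573": "BKBN"},
    }
}


def write_inputs(spec: dict, path: Path) -> Path:
    """Write the spec as plain JSON (no comments) and return the path."""
    path.write_text(json.dumps(spec, indent=2))
    return path


def build_command(inputs_path: Path, out_dir: str) -> List[str]:
    """Assemble CLI arguments; only out_dir and inputs are required."""
    return ["rfd3", "design", f"out_dir={out_dir}", f"inputs={inputs_path}"]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        inputs = write_inputs(PARTIAL_DIFFUSION_SPEC, Path(tmp) / "partial.json")
        # Hypothetical output directory; pass the list to subprocess.run to execute.
        print(" ".join(build_command(inputs, "logs/inference_outs/partial/0")))
```

Python's `json` module rejects `//` comments like those in the quick-start example, so the generated file contains none.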

-lists multiple unindexed components; internal “breakpoints” are inferred and logged. (Offset syntax like A11-12 or A11,0,A12 still ties residues.)
+## Debugging recommendations
+- For unindexed scaffolding, you can use the option `cleanup_guideposts=False` to keep the models' outputs for the guideposts. The guideposts are saved as separate chains based on whether their relative indices were leaked to the model: e.g. for `unindex=A11-12,A22`, you should see `A11` and `A12` indexed together on one chain and `A22` on its own chain, indicating the model was provided with the fact that `A11` and `A12` are immediately next to one another in sequence but their distance to `A22` is unknown.
+- To see the full 14 diffused virtual atoms you can use `cleanup_virtual_atoms=False`. Default is to discard them for the sake of downstream processing.
+- To see the trajectories, you can use `dump_trajectories=True`. This can be useful if the outputs look strange but the config is correct, or if you want to make cool gifs of course! Trajectories do not have sequence labels and contain virtual atoms.

-# Appendix
 ## FAQ / gotchas
 <details>
-<summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>
-
-No. Defaults apply when input present.
+<details>
+<summary><b>Can I guide on secondary structure?</b></summary>
+Currently no - in future models we may do so, however, you can use `is_non_loopy: true` to make fewer loops. We find this produces a lot more helices and fewer loops (and less sheets).
 </details>

-<details>
 <summary><b>Do I need select_fixed_atoms & select_unfixed_sequence every time?</b></summary>
-
+
 No. Defaults apply when input present.
 </details>

 <details>
-<summary><b>What does "ALL" vs "TIP" in unindex mean?</b></summary>
-
-- **`ALL`** → copy full residue
-- **`TIP`** → fix only sidechain tip atoms
-</details>
-
-<details>
-<summary><b>Can selections overlap?</b></summary>
-
-Only certain ones (fixed vs unfixed) may; RASA & donor/acceptor cannot.
-</details>
-
-<details>
-<summary><b>How to fix backbone but redesign sidechains?</b></summary>
+<summary><b>Why "Input provided but unused"?</b></summary>

-`redesign_motif_sidechains: true`
+This indicates you gave an input pdb / cif (not `input: null`) but no contig, unindex, ligand or partial_t.
 </details>

 <details>
-<summary><b>Why "Input provided but unused"?</b></summary>
+<summary><b>What do the logged bfactors mean?</b></summary>

-You gave input but no contig, unindex, or partial_t.
+The sequence head from RFD3 logs its confidence for each token in the output structure, you can run `spectrum b` in `pymol` to see it. It usually doesn't mean anything but can give you some idea if the model has gone vastly distribution if the entropy is high (uncertain assignment of sequence).
 </details>
+</details>

-## Shorthand atoms for easy specification
-Keyword Expands to
-BKBN N, CA, C, O
-TIP Residue-specific “tip” atoms
-ALL All atoms of each residue
+Let us know if you have any additional questions, we'd be happy to answer them!

+## Further examples of InputSelection syntax

+Below is a reference for more examples of different ways you can specify inputs to select from your pdb in configs; we hope the community can find use in this flexible system for future models!
+<p align="center">
+<img src=".assets/input_selection_large.png" alt="Input selection syntax" width=650>
+</p>
