Merged
Changes from 13 commits
Commits
41 commits
f012b14
Align MHC typing column names across sdrf-terms.tsv, adoc templates, …
jonasscheid Feb 23, 2026
dcf0f6c
Merge pull request #801 from jonasscheid/fix/mhc-allele-typing-incons…
ypriverol Feb 26, 2026
cc490f2
Add PDF specification v1.1.0-dev
github-actions[bot] Feb 26, 2026
497f28b
Minor updates related to immunopeptidomics
ypriverol Feb 26, 2026
1aa7a46
small changes in the spec
ypriverol Feb 26, 2026
5ed203b
Merge branch 'dev' of https://github.com/bigbio/proteomics-sample-met…
ypriverol Feb 26, 2026
99cab9f
Add PDF specification v1.1.0-dev
github-actions[bot] Feb 26, 2026
d778705
Merge branch 'master' of https://github.com/bigbio/proteomics-sample-…
ypriverol Feb 27, 2026
f75edd1
Add PDF specification v1.1.0-dev
github-actions[bot] Feb 27, 2026
9cf6a6f
Minor changes to immunopeptidomics
ypriverol Feb 27, 2026
26692d6
Merge branch 'dev' of https://github.com/bigbio/proteomics-sample-met…
ypriverol Feb 27, 2026
61b7005
Add PDF specification v1.1.0-dev
github-actions[bot] Feb 27, 2026
90c5906
remove dda-acquisition in favor of ms-proteomics
ypriverol Feb 28, 2026
2d69de2
Add PDF specification v1.1.0-dev
github-actions[bot] Feb 28, 2026
0da5252
Major updates to sync with the current templates
ypriverol Mar 7, 2026
0cfab58
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 7, 2026
1b78f70
Merge branch 'master' of https://github.com/bigbio/proteomics-sample-…
ypriverol Mar 7, 2026
2e8d2b4
fix examples
ypriverol Mar 7, 2026
7f1e290
Merge branch 'dev' of https://github.com/bigbio/proteomics-sample-met…
ypriverol Mar 7, 2026
a1eaf60
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 7, 2026
8b8e736
Update some of the issues after new schemas
ypriverol Mar 8, 2026
42316e7
Merge branch 'dev' of https://github.com/bigbio/proteomics-sample-met…
ypriverol Mar 8, 2026
ee174b3
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 8, 2026
0f517a7
Updating the existing annotated projects
ypriverol Mar 8, 2026
32484e4
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 8, 2026
ee777b0
feat: add template inheritance resolver module
ypriverol Mar 8, 2026
cf933b7
feat: add CSS styles for template pages
ypriverol Mar 8, 2026
753b6c7
feat: add Jinja2 template for per-template HTML pages
ypriverol Mar 8, 2026
23af6d1
feat: add per-template HTML page generator
ypriverol Mar 8, 2026
b64adc5
feat: add auto-generated template section for index.html
ypriverol Mar 8, 2026
2e2dc7b
feat: update build pipeline for YAML-driven template pages
ypriverol Mar 8, 2026
719c11b
fix: improve template page rendering and index layout
ypriverol Mar 8, 2026
1e8c415
major changes for the web generation
ypriverol Mar 8, 2026
015769c
Merge branch 'dev' of https://github.com/bigbio/proteomics-sample-met…
ypriverol Mar 8, 2026
dbfb270
major changes to the web and clean specification
ypriverol Mar 8, 2026
921e1f5
Merge pull request #804 from ypriverol/feat/yaml-driven-template-docs
ypriverol Mar 8, 2026
d9c0a75
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 8, 2026
0fac979
minor changes
ypriverol Mar 8, 2026
28a49b4
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 8, 2026
5248c75
update examples
ypriverol Mar 9, 2026
5b1197b
Add PDF specification v1.1.0-dev
github-actions[bot] Mar 9, 2026
8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -44,6 +44,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Changed

- **Version bump to v1.1.0**.
- **BREAKING: `comment[technical replicate]`** changed from RECOMMENDED to REQUIRED in base template. All inheriting templates now require a technical replicate column. Existing SDRF files without this column will fail validation -- add the column with value `1` for single-run samples.
- **BREAKING: Immunopeptidomics field renames**: `characteristics[MHC class]` → `characteristics[MHC protein complex]` (GO:0042611), `characteristics[MHC allele]`/`characteristics[HLA typing]` → `characteristics[MHC typing]` (PRIDE:0000893), `characteristics[HLA typing method]` → `characteristics[MHC typing method]` (PRIDE:0000894). Values updated to use MRO/GO ontology terms. Existing datasets must update column names and values.
- **Organism templates**: `developmental stage` and `strain/breed` changed to REQUIRED in invertebrates, vertebrates, and plants. `sex` changed to RECOMMENDED in vertebrates. `growth conditions` and `treatment` changed to RECOMMENDED in plants.
- **Human template**: `age` and `sex` changed to REQUIRED. Age pattern now allows standalone month/week/day values (e.g. `6M`, `3W`, `14D`).
- **BioSample accession**: regex fixed from `^SAM[NED]A?\d+$` to `^SAM(N|EA|D)\d+$` to reject invalid prefixes.
- **Affinity proteomics**: version set to 1.0.0. `comment[instrument]` renamed to `comment[platform]` (REQUIRED); new `comment[instrument]` added as OPTIONAL for actual sequencer/reader. Sample type values normalized to use spaces (`sample control`, `negative control`, etc.).
- **Cell-lines**: added PATO ontology to disease field for `normal` (PATO:0000461). Added `characteristics[culture medium]` (RECOMMENDED) and `characteristics[storage temperature]` (RECOMMENDED).
- **MS-proteomics/DDA**: mass tolerance patterns updated to accept `not available`/`not applicable` when those flags are set. DDA mass tolerance kept as RECOMMENDED.
- **Specification restructured** with clearer organization: Quick Start → Validation → Specification Structure → Notational Conventions → Sample Metadata → Data File Metadata → Templates → Factor Values.
- **Column naming**: `fileformat` changed to `file_format` for consistency with underscore convention.
- **Ontology recommendations**: added NCIT and PRIDE to general purpose; added PATO for healthy samples (`normal` = PATO:0000461).
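
The breaking changes above imply a small migration for existing SDRF files. The following is a minimal sketch, not part of sdrf-pipelines: the names `RENAMES`, `migrate_header`, `OLD_BIOSAMPLE`, and `NEW_BIOSAMPLE` are hypothetical, exact (case-sensitive) header matching is an assumption, and only the column names and the two regexes are taken from the changelog entries. It renames the legacy immunopeptidomics headers, reports whether `comment[technical replicate]` still has to be added, and shows which accessions the corrected BioSample pattern now rejects.

```python
import re

# Header renames listed in the changelog (old name -> new name).
RENAMES = {
    "characteristics[MHC class]": "characteristics[MHC protein complex]",
    "characteristics[MHC allele]": "characteristics[MHC typing]",
    "characteristics[HLA typing]": "characteristics[MHC typing]",
    "characteristics[HLA typing method]": "characteristics[MHC typing method]",
}

# BioSample accession patterns quoted in the changelog: before and after the fix.
OLD_BIOSAMPLE = re.compile(r"^SAM[NED]A?\d+$")
NEW_BIOSAMPLE = re.compile(r"^SAM(N|EA|D)\d+$")


def migrate_header(columns):
    """Rename legacy columns; also report whether the replicate column is missing.

    Assumption: headers are matched exactly. Cell values are not touched here,
    even though the changelog says they must move to MRO/GO terms as well.
    """
    renamed = [RENAMES.get(name, name) for name in columns]
    needs_replicate = "comment[technical replicate]" not in renamed
    return renamed, needs_replicate


if __name__ == "__main__":
    header = ["source name", "characteristics[MHC class]", "characteristics[HLA typing]"]
    print(migrate_header(header))
    for accession in ("SAMN123", "SAMEA123", "SAMD123", "SAMNA123", "SAME123"):
        print(accession, bool(OLD_BIOSAMPLE.match(accession)), bool(NEW_BIOSAMPLE.match(accession)))
```

When `needs_replicate` is true, the changelog's advice applies: add the column with value `1` for single-run samples. A file that carried both `characteristics[MHC allele]` and `characteristics[HLA typing]` would end up with two `characteristics[MHC typing]` columns and needs manual review.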
7 changes: 2 additions & 5 deletions llms.txt
@@ -24,10 +24,9 @@
- sdrf-proteomics/templates/invertebrates/README.adoc - Invertebrates: Drosophila, C. elegans
- sdrf-proteomics/templates/plants/README.adoc - Plants: Arabidopsis, crops
- sdrf-proteomics/templates/cell-lines/README.adoc - Cell Lines: Cellosaurus integration
- sdrf-proteomics/templates/dda-acquisition/README.adoc - DDA Acquisition: dissociation method, collision energy
- sdrf-proteomics/templates/dia-acquisition/README.adoc - DIA Acquisition: scan windows, isolation width
- sdrf-proteomics/templates/single-cell/README.adoc - Single-Cell Proteomics: cell isolation, carrier proteome
- sdrf-proteomics/templates/immunopeptidomics/README.adoc - Immunopeptidomics: MHC class, HLA typing
- sdrf-proteomics/templates/immunopeptidomics/README.adoc - Immunopeptidomics: MHC protein complex, MHC typing
- sdrf-proteomics/templates/crosslinking/README.adoc - Crosslinking MS: crosslinker reagents
- sdrf-proteomics/templates/metaproteomics/README.adoc - Metaproteomics: environmental and microbiome samples
- sdrf-proteomics/templates/olink/README.adoc - Olink: proximity extension assays
@@ -54,15 +53,13 @@ Machine-readable YAML definitions used by sdrf-pipelines for validation. Each te
- sdrf-proteomics/sdrf-templates/plants/1.1.0/plants.sdrf.tsv - Plants example
- sdrf-proteomics/sdrf-templates/cell-lines/1.1.0/cell-lines.yaml - Cell Lines (experiment layer): Cellosaurus integration
- sdrf-proteomics/sdrf-templates/cell-lines/1.1.0/cell-lines.sdrf.tsv - Cell Lines example
- sdrf-proteomics/sdrf-templates/dda-acquisition/1.1.0/dda-acquisition.yaml - DDA Acquisition (experiment layer): dissociation method, collision energy
- sdrf-proteomics/sdrf-templates/dda-acquisition/1.1.0/dda-acquisition.sdrf.tsv - DDA example
- sdrf-proteomics/sdrf-templates/dia-acquisition/1.1.0/dia-acquisition.yaml - DIA Acquisition (experiment layer): scan windows, isolation width
- sdrf-proteomics/sdrf-templates/dia-acquisition/1.1.0/dia-acquisition.sdrf.tsv - DIA example
- sdrf-proteomics/sdrf-templates/crosslinking/1.1.0/crosslinking.yaml - Crosslinking MS (experiment layer): crosslinker reagents
- sdrf-proteomics/sdrf-templates/crosslinking/1.1.0/crosslinking.sdrf.tsv - Crosslinking example
- sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.yaml - Single-Cell (experiment layer): cell isolation, carrier proteome
- sdrf-proteomics/sdrf-templates/single-cell/1.0.0/single-cell.sdrf.tsv - Single-Cell example
- sdrf-proteomics/sdrf-templates/immunopeptidomics/1.0.0-dev/immunopeptidomics.yaml - Immunopeptidomics (experiment layer): MHC class, HLA typing
- sdrf-proteomics/sdrf-templates/immunopeptidomics/1.0.0-dev/immunopeptidomics.yaml - Immunopeptidomics (experiment layer): MHC protein complex, MHC typing
- sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.yaml - Metaproteomics (experiment layer): environmental and microbiome samples
- sdrf-proteomics/sdrf-templates/metaproteomics/1.0.0-dev/metaproteomics.sdrf.tsv - Metaproteomics example
- sdrf-proteomics/sdrf-templates/olink/1.0.0/olink.yaml - Olink (experiment layer): proximity extension assays
Binary file modified psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf
Binary file not shown.
@@ -79,7 +79,7 @@ Add extra columns for specific experimental workflows:
|DIA-specific parameters. link:templates/dia-acquisition/README.adoc[View template]

|**immunopeptidomics**
|HLA typing and related metadata. link:templates/immunopeptidomics/README.adoc[View template]
|MHC typing and related metadata. link:templates/immunopeptidomics/README.adoc[View template]

|**crosslinking**
|XL-MS specific columns. link:templates/crosslinking/README.adoc[View template]
@@ -29,7 +29,6 @@ base (construction artifact - not user-facing)
├── TECHNOLOGY LAYER (mutually exclusive)
│ ├── ms-proteomics # Mass spectrometry proteomics
│ │ └── EXPERIMENT LAYER
│ │ ├── dda-acquisition # Data-dependent acquisition
│ │ ├── dia-acquisition # Data-independent acquisition
│ │ ├── single-cell # Single-cell proteomics
│ │ ├── immunopeptidomics # MHC peptide analysis
@@ -70,7 +69,7 @@ cell-lines (EXPERIMENT layer - requires technology + sample)

| `experiment`
| Defines methodology-specific columns. Multiple can be combined if not mutually exclusive.
| dda-acquisition, single-cell, olink, cell-lines
| dia-acquisition, single-cell, olink, cell-lines
|===

=== Mutual Exclusivity
@@ -82,7 +81,6 @@ Some templates cannot be combined because they represent mutually exclusive conc
| Template A | Template B | Reason

| ms-proteomics | affinity-proteomics | Different technologies
| dda-acquisition | dia-acquisition | Different acquisition methods
| somascan | olink | Different affinity platforms
| human | vertebrates | Different organism types
| human | invertebrates | Different organism types
@@ -476,7 +474,7 @@ When multiple templates are combined:
**Example valid combination:**
[source]
----
ms-proteomics + human + dda-acquisition + cell-lines
ms-proteomics + human + dia-acquisition + cell-lines
↓ ↓ ↓ ↓
technology sample experiment experiment
----
@@ -576,10 +574,10 @@ parse_sdrf validate-sdrf --sdrf_file file.sdrf.tsv --template ms-proteomics
parse_sdrf validate-sdrf --sdrf_file file.sdrf.tsv \
--template ms-proteomics \
--template human \
--template dda-acquisition
--template dia-acquisition

# Check template compatibility
parse_sdrf check-templates --templates ms-proteomics,human,dda-acquisition
parse_sdrf check-templates --templates ms-proteomics,human,dia-acquisition
----

== References
42 changes: 36 additions & 6 deletions sdrf-proteomics/README.adoc
@@ -117,7 +117,7 @@ For the complete versioning strategy — including template versioning, ontology
[[reserved-words]]
=== Reserved words

There are general scenarios where cell values cannot be provided with actual data. The following reserved words MUST be used in these cases:
There are general scenarios where cell values cannot be provided with actual data. The following reserved words MUST be used in these cases. Reserved words MUST be all lowercase (e.g., `not available`, NOT `Not Available` or `Not available`):

- **not available**: In some cases, the column is mandatory in the format, but for some samples the corresponding value is unknown or could not be determined. In those cases, users SHOULD use *not available*.
- **not applicable**: In some cases, the column is mandatory, but for some samples the corresponding value or concept does not apply. In those cases, users SHOULD use *not applicable*.
@@ -258,9 +258,9 @@ The value for each property, (e.g. characteristics, comment, factor value) corre

- **Ontology url (Computer readable)**: Users can provide the corresponding URI (Uniform Resource Identifier) of the ontology/CV term as a value. This is recommended for enriched files where the user does not want to use intermediate tools to map from free text to ontology/CV terms.

- **Key=value representation (Human and Computer readable)**: The current representation aims to provide a mechanism to represent the complete information of the ontology/CV term including Accession, Name and other additional properties. In the key=value pair representation, the Value of the property is represented as an Object with multiple properties, where the key is one of the properties of the object and the value is the corresponding value for the particular key. An example of key value pairs is post-translational modification (see link:templates/ms-proteomics.html#_protein_modifications[Protein Modifications]):
- **Key=value representation (Human and Computer readable)**: The current representation aims to provide a mechanism to represent the complete information of the ontology/CV term including Accession, Name and other additional properties. In the key=value pair representation, the Value of the property is represented as an Object with multiple properties, where the key is one of the properties of the object and the value is the corresponding value for the particular key. The key order MUST be `NT` (name) first, followed by `AC` (accession), then any additional keys. An example of key value pairs is post-translational modification (see link:templates/ms-proteomics.html#_protein_modifications[Protein Modifications]):

NT=Glu->pyro-Glu;MT=fixed;PP=Anywhere;AC=Unimod:27;TA=E
NT=Glu->pyro-Glu;AC=Unimod:27;MT=fixed;PP=Anywhere;TA=E
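
The key-order rule can be checked mechanically. The sketch below is illustrative only: `parse_key_value` and `has_canonical_order` are hypothetical helper names, and it assumes that pairs are separated by `;`, that each pair is split at its first `=`, and that both `NT` and `AC` are present.

```python
def parse_key_value(cell):
    """Split 'K1=v1;K2=v2' into an ordered list of (key, value) pairs."""
    pairs = []
    for part in cell.split(";"):
        # Split at the first '=' only, so values such as 'Glu->pyro-Glu' stay intact.
        key, sep, value = part.partition("=")
        if not sep:
            raise ValueError(f"missing '=' in {part!r}")
        pairs.append((key.strip(), value.strip()))
    return pairs


def has_canonical_order(cell):
    """Return True when NT is the first key and AC is the second."""
    keys = [key for key, _ in parse_key_value(cell)]
    return keys[:2] == ["NT", "AC"]


if __name__ == "__main__":
    new_style = "NT=Glu->pyro-Glu;AC=Unimod:27;MT=fixed;PP=Anywhere;TA=E"
    old_style = "NT=Glu->pyro-Glu;MT=fixed;PP=Anywhere;AC=Unimod:27;TA=E"
    print(has_canonical_order(new_style))
    print(has_canonical_order(old_style))
```

The first call uses the updated example value and the second uses the previous ordering, in which `AC` appeared after `PP`.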

[[validation]]
== Validating SDRF Files
@@ -420,6 +420,8 @@ For detailed documentation of sample preparation and MS/MS fragmentation propert
- **Fractionation**: fractionation method (used with `comment[fraction identifier]`)
- **Fragmentation**: collision energy, dissociation method

NOTE: For HCD (Higher-energy C-trap Dissociation), the canonical accession is https://www.ebi.ac.uk/ols4/ontologies/ms/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMS_1000422[MS:1000422 - beam-type collision-induced dissociation]. Use `NT=beam-type collision-induced dissociation;AC=MS:1000422` or the short label `HCD`. Do not use PRIDE:0000590 or MS:1002481.
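
A curation script could flag the discouraged accessions named in the note. This is a rough sketch under stated assumptions: `check_dissociation_value` is a hypothetical helper, it uses plain substring matching on the cell text, and it is not how sdrf-pipelines implements the check.

```python
# Canonical value and discouraged accessions, both taken from the note above.
CANONICAL_HCD = "NT=beam-type collision-induced dissociation;AC=MS:1000422"
DISCOURAGED = ("PRIDE:0000590", "MS:1002481")


def check_dissociation_value(cell):
    """Return a hint when the cell uses a discouraged HCD accession, else None."""
    for accession in DISCOURAGED:
        if accession in cell:
            return f"replace {accession} with MS:1000422 ({CANONICAL_HCD})"
    return None


if __name__ == "__main__":
    print(check_dissociation_value("NT=HCD;AC=PRIDE:0000590"))
    print(check_dissociation_value(CANONICAL_HCD))
    print(check_dissociation_value("HCD"))
```

The short label `HCD` and the canonical key=value form both pass without a hint, matching the two spellings the note allows.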

[[data-acquisition-method]]
=== Proteomics data acquisition method

@@ -495,12 +497,41 @@ Templates follow a layered hierarchy:
|Organism-specific metadata

|**EXPERIMENT** (optional)
|link:templates/cell-lines/README.adoc[cell-lines], link:templates/crosslinking/README.adoc[crosslinking], link:templates/dda-acquisition/README.adoc[dda-acquisition], link:templates/dia-acquisition/README.adoc[dia-acquisition], link:templates/single-cell/README.adoc[single-cell], link:templates/immunopeptidomics/README.adoc[immunopeptidomics]
|link:templates/cell-lines/README.adoc[cell-lines], link:templates/crosslinking/README.adoc[crosslinking], link:templates/dia-acquisition/README.adoc[dia-acquisition], link:templates/single-cell/README.adoc[single-cell], link:templates/immunopeptidomics/README.adoc[immunopeptidomics]
|Methodology-specific columns
|===

Child templates inherit all columns from parents and may add new columns or strengthen requirements (e.g., `optional` → `required`).

=== Template Combination Rules

Some templates within the same layer are **mutually exclusive**. Within each group below you MUST choose at most one template; the Rule column states whether making a choice is REQUIRED, RECOMMENDED, or OPTIONAL:

|===
|Layer |Mutually Exclusive Templates |Rule

|TECHNOLOGY
|`ms-proteomics` vs `affinity-proteomics`
|Choose one (REQUIRED)

|SAMPLE
|`human` vs `vertebrates` vs `invertebrates` vs `plants`
|Choose one based on organism (RECOMMENDED)

|EXPERIMENT (affinity platform)
|`olink` vs `somascan`
|Choose one if using affinity-proteomics (OPTIONAL)
|===

Templates from different layers can be freely combined. Common valid combinations:

- `ms-proteomics` + `human` (human DDA proteomics)
- `ms-proteomics` + `human` + `dia-acquisition` (human DIA proteomics)
- `ms-proteomics` + `human` + `immunopeptidomics` (human immunopeptidomics)
- `ms-proteomics` + `vertebrates` + `cell-lines` (mouse cell line proteomics)
- `ms-proteomics` + `human` + `crosslinking` (human crosslinking MS)
- `affinity-proteomics` + `human` + `olink` (human Olink)
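
The combination rules above can be expressed as a small check. The following sketch is an illustration, not the `check-templates` implementation: `EXCLUSIVE_GROUPS` and `find_conflicts` are hypothetical names, and the groups are copied from the table. It reports every group with more than one selected template and flags a missing TECHNOLOGY choice, since that is the only REQUIRED one.

```python
# Mutually exclusive groups, copied from the Template Combination Rules table.
EXCLUSIVE_GROUPS = {
    "TECHNOLOGY": {"ms-proteomics", "affinity-proteomics"},
    "SAMPLE": {"human", "vertebrates", "invertebrates", "plants"},
    "EXPERIMENT (affinity platform)": {"olink", "somascan"},
}


def find_conflicts(templates):
    """Map each violated group to the offending templates.

    An empty list under "TECHNOLOGY" means no technology template was chosen.
    Templates outside the listed groups (e.g. dia-acquisition) are not restricted.
    """
    chosen = set(templates)
    conflicts = {}
    for layer, group in EXCLUSIVE_GROUPS.items():
        overlap = sorted(chosen & group)
        if len(overlap) > 1:
            conflicts[layer] = overlap
    if not chosen & EXCLUSIVE_GROUPS["TECHNOLOGY"]:
        conflicts["TECHNOLOGY"] = []
    return conflicts


if __name__ == "__main__":
    print(find_conflicts(["ms-proteomics", "human", "dia-acquisition"]))
    print(find_conflicts(["ms-proteomics", "human", "vertebrates"]))
    print(find_conflicts(["human", "olink"]))
```

The sketch does not verify cross-layer dependencies, such as `olink` only making sense together with `affinity-proteomics`; that would need the parent information from the template YAML files.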

=== Specifying Templates in SDRF Files

Declare templates using `comment[sdrf template]` columns. Only list leaf templates (parents are implied). When using multiple templates, add multiple columns with the same name. Two formats are supported:
@@ -540,11 +571,10 @@ sample_1 ... human v1.1.0 crosslinking v1.0.0

**Experiment-type templates**:

- link:templates/dda-acquisition/README.adoc[DDA Acquisition] - dissociation method, collision energy, modifications
- link:templates/dia-acquisition/README.adoc[DIA Acquisition] - scan windows, isolation width, spectral library
- link:templates/cell-lines/README.adoc[Cell Lines] - Cellosaurus integration
- link:templates/single-cell/README.adoc[Single-Cell] - cell isolation, carrier proteome
- link:templates/immunopeptidomics/README.adoc[Immunopeptidomics] - MHC class, HLA typing
- link:templates/immunopeptidomics/README.adoc[Immunopeptidomics] - MHC protein complex, MHC typing
- link:templates/crosslinking/README.adoc[Crosslinking MS] - crosslinker reagents
- link:templates/metaproteomics/README.adoc[Metaproteomics] - environmental sample type

12 changes: 3 additions & 9 deletions sdrf-proteomics/metadata-guidelines/sample-metadata.adoc
@@ -163,17 +163,12 @@ For model organisms, use specific stages when applicable: `embryonic day 14` (mo

For samples without disease, the terminology matters for standardization:

**`normal`** (PATO:0000461) - **Recommended**
**`normal`** (PATO:0000461) - **REQUIRED term for healthy samples**

- Standard term in pathology ("normal tissue" vs "tumor tissue")
- Well-defined ontology mapping to PATO:0000461
- Well-defined ontology mapping to https://www.ebi.ac.uk/ols4/ontologies/pato/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPATO_0000461[PATO:0000461]
- Widely used in existing proteomics datasets

**`healthy`** (SIO:001012) - **Accepted alternative**

- More intuitive for clinical/human samples
- Has valid ontology support (Semanticscience Integrated Ontology)
- Validators should accept both `normal` and `healthy`
- This is the single canonical term for healthy/control samples

[[WARNING]]
====
@@ -252,7 +247,6 @@ Both `healthy_1` and `adjacent_1` have `normal` as their disease state (no tumor
|Disease |SDRF Value |Ontology Link

|Healthy/no disease |normal |https://www.ebi.ac.uk/ols4/ontologies/pato/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FPATO_0000461[PATO:0000461]
|Healthy (alternative) |healthy |https://www.ebi.ac.uk/ols4/ontologies/sio/classes/http%253A%252F%252Fsemanticscience.org%252Fresource%252FSIO_001012[SIO:001012]
|Breast cancer |breast carcinoma |https://www.ebi.ac.uk/ols4/ontologies/mondo/classes/http%253A%252F%252Fpurl.obolibrary.org%252Fobo%252FMONDO_0004989[MONDO:0004989]
|===
