Merged
Changes from 12 commits
8 changes: 6 additions & 2 deletions .github/workflows/link-check.yml
@@ -36,8 +36,12 @@ jobs:
args: >-
--verbose
--no-progress
--timeout 20
--max-concurrency 10
--timeout 30
--max-concurrency 3
--max-retries 3
--retry-wait-time 10
'**/*.md'
'**/*.adoc'
fail: true
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
14 changes: 14 additions & 0 deletions .github/workflows/release-pdf.yml
@@ -36,6 +36,20 @@ jobs:
gem install asciidoctor-pdf
gem install rouge

- name: Setup Python
uses: actions/setup-python@v5
with:
python-version: '3.11'

- name: Install Python dependencies
run: pip install pyyaml

- name: Generate template definitions appendix
run: |
python scripts/generate_templates_appendix.py \
--templates-dir sdrf-proteomics/sdrf-templates \
--readme sdrf-proteomics/README.adoc

- name: Get version number
id: version
run: |
14 changes: 7 additions & 7 deletions README.md
@@ -1,12 +1,12 @@
# Proteomics Sample Metadata Format

[![Version](https://flat.badgen.net/static/sdrf-proteomics/1.0.1/orange)](CHANGELOG.md)
[![License](https://flat.badgen.net/github/license/bigbio/proteomics-sample-metadata)](https://github.com/bigbio/proteomics-sample-metadata/blob/master/LICENSE)
[![Open Issues](https://flat.badgen.net/github/open-issues/bigbio/proteomics-sample-metadata)](https://github.com/bigbio/proteomics-sample-metadata/issues)
[![Open PRs](https://flat.badgen.net/github/open-prs/bigbio/proteomics-sample-metadata)](https://github.com/bigbio/proteomics-sample-metadata/pulls)
![Contributors](https://flat.badgen.net/github/contributors/bigbio/proteomics-sample-metadata)
![Watchers](https://flat.badgen.net/github/watchers/bigbio/proteomics-sample-metadata)
![Stars](https://flat.badgen.net/github/stars/bigbio/proteomics-sample-metadata)
[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE)
Contributor

⚠️ Potential issue | 🟠 Major

Point the badge at the local LICENSE file.

Line 4 contains the exact URL that Lychee is failing on with HTTP 429 (Too Many Requests) responses. Using the repo-local file removes the external GitHub request and keeps the badge link on the current branch.

🔧 Suggested change
-[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE)
+[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](LICENSE)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` at line 4, the badge link currently points to the external GitHub
URL and should be changed to reference the repo-local LICENSE file; locate the
markdown line containing the license badge (the image alt "License" and the URL
"https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard")
and update the link target to "./LICENSE" (keep the image source as-is or
replace with a local/static image if desired) so the badge points at the local
LICENSE file instead of the external GitHub URL.

[![Open Issues](https://flat.badgen.net/github/open-issues/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/issues)
[![Open PRs](https://flat.badgen.net/github/open-prs/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/pulls)
![Contributors](https://flat.badgen.net/github/contributors/bigbio/proteomics-metadata-standard)
![Watchers](https://flat.badgen.net/github/watchers/bigbio/proteomics-metadata-standard)
![Stars](https://flat.badgen.net/github/stars/bigbio/proteomics-metadata-standard)
[![llms.txt](https://flat.badgen.net/static/llms.txt/available/blue)](llms.txt)

## Improving metadata annotation of Proteomics datasets
@@ -45,7 +45,7 @@ In the [annotated projects](https://github.com/bigbio/proteomics-metadata-standa
Annotate a dataset in 5 steps:

- Read the [SDRF-Proteomics specification](https://github.com/bigbio/proteomics-metadata-standard/tree/master/sdrf-proteomics).
- Depending on the type of dataset, choose the appropriate [sample template](https://github.com/bigbio/proteomics-sample-metadata/tree/master/sdrf-proteomics#sdrf-templates).
- Depending on the type of dataset, choose the appropriate [sample template](https://github.com/bigbio/proteomics-metadata-standard/tree/master/sdrf-proteomics#sdrf-templates).
- Annotate the corresponding ProteomeXchange PXD dataset following the guidelines.
- Validate your SDRF file:

60 changes: 60 additions & 0 deletions examples/PXD003572/PXD003572.sdrf.tsv


31 changes: 31 additions & 0 deletions examples/PXD005969/PXD005969.sdrf.tsv


75 changes: 75 additions & 0 deletions examples/PXD009712/PXD009712.sdrf.tsv


3 changes: 3 additions & 0 deletions examples/README.md
@@ -14,5 +14,8 @@ Curated examples of SDRF files covering different experiment types and organisms
| PXD012667 | DIA acquisition | Homo sapiens | label free | ms-proteomics, human, dia-acquisition | 49 |
| PXD019515 | Single-cell proteomics | Homo sapiens | label free | ms-proteomics, human, single-cell | 7 |
| PXD003791 | Metaproteomics, gut | human gut metagenome | label free | metaproteomics | 109 |
| PXD005969 | Metaproteomics, human gut extraction methods | human gut metagenome | label free | metaproteomics, human-gut | 30 |
| PXD003572 | Metaproteomics, soil (Mediterranean dryland) | soil metagenome | label free | metaproteomics, soil | 59 |
| PXD009712 | Metaproteomics, ocean (Pacific depth profiles) | marine metagenome | label free | metaproteomics, water | 74 |
| PXD006439 | Label-free, mouse | Mus musculus | label free | ms-proteomics, vertebrates | 68 |
| PXD013868 | Label-free, plant | Arabidopsis thaliana | label free | ms-proteomics, plants | 21 |
3 changes: 3 additions & 0 deletions llms.txt
@@ -91,6 +91,9 @@ Machine-readable YAML definitions used by sdrf-pipelines for validation. Each te
- examples/PXD012667/ - DIA acquisition, human
- examples/PXD019515/ - Single-cell proteomics, human
- examples/PXD003791/ - Metaproteomics, gut
- examples/PXD005969/ - Metaproteomics, human gut extraction methods
- examples/PXD003572/ - Metaproteomics, soil (Mediterranean dryland)
- examples/PXD009712/ - Metaproteomics, ocean (Pacific depth profiles)
- examples/PXD006439/ - Label-free, mouse
- examples/PXD013868/ - Label-free, plant (Arabidopsis)

Binary file modified psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf
Binary file not shown.
294 changes: 294 additions & 0 deletions scripts/generate_templates_appendix.py
@@ -0,0 +1,294 @@
#!/usr/bin/env python3
"""Generate AsciiDoc template definitions and inject into README.adoc.

Reads all YAML templates from sdrf-templates/ and injects a "Template Definitions"
section directly into README.adoc, before the "Intellectual Property Statement"
section. This keeps the PDF in sync with YAML templates without a separate file.

Usage:
    python scripts/generate_templates_appendix.py [--templates-dir PATH] [--readme PATH]
"""

from __future__ import annotations

import argparse
import re
import sys
from pathlib import Path
from typing import Any

# Add scripts dir to path so we can import resolve_templates
sys.path.insert(0, str(Path(__file__).parent))

from resolve_templates import load_manifest, load_template_yaml

# Marker used to identify the injected section
MARKER_START = "// AUTO-GENERATED: Template Definitions (do not edit below this line)"
MARKER_END = "// AUTO-GENERATED: End of Template Definitions"

# Injection point: insert before this heading
INJECT_BEFORE = "== Intellectual Property Statement"

# Ordered template groups for the appendix
TEMPLATE_ORDER: list[list[str]] = [
    # Infrastructure
    ["base", "sample-metadata"],
    # Technology
    ["ms-proteomics", "affinity-proteomics"],
    # Sample (organism)
    ["human", "vertebrates", "invertebrates", "plants"],
    # Sample (study type)
    ["clinical-metadata", "oncology-metadata"],
    # Experiment (MS)
    ["dia-acquisition", "single-cell", "immunopeptidomics", "crosslinking", "cell-lines"],
    # Experiment (affinity)
    ["olink", "somascan"],
    # Metaproteomics branch
    ["metaproteomics", "human-gut", "soil", "water"],
]


def _escape_adoc(text: str) -> str:
    """Escape special AsciiDoc characters in table cells."""
    return text.replace("|", "\\|")


def _summarize_validators(validators: list[dict[str, Any]]) -> str:
    """Produce a short human-readable summary of column validators."""
    if not validators:
        return ""

    parts: list[str] = []
    for v in validators:
        vname = v.get("validator_name", "")
        params = v.get("params", {})

        if vname == "ontology":
            ontologies = params.get("ontologies", [])
            parts.append(f"ontology: {', '.join(ontologies)}")
        elif vname == "pattern":
            desc = params.get("description", "")
            if desc:
                parts.append(f"pattern: {desc}")
            else:
                pat = params.get("pattern", "")
                parts.append(f"pattern: `{pat}`")
        elif vname == "values":
            values = params.get("values", [])
            if len(values) <= 5:
                parts.append(f"values: {', '.join(str(v) for v in values)}")
            else:
                shown = ", ".join(str(v) for v in values[:4])
                parts.append(f"values: {shown}, ...")
        elif vname == "number_with_unit":
            units = params.get("units", [])
            parts.append(f"number with unit ({', '.join(units)})")
        elif vname == "single_cardinality_validator":
            parts.append("single value only")
        elif vname == "accession":
            fmt = params.get("format", "")
            parts.append(f"accession: {fmt}")
Comment on lines +88 to +90
Contributor

⚠️ Potential issue | 🟡 Minor

Skip the empty accession suffix when format is missing.

When `params.format` is absent, this renders `accession:` with nothing after it. The generated appendix already shows that broken summary for `comment[metagenome accession]`.

🔧 Suggested change
         elif vname == "accession":
             fmt = params.get("format", "")
-            parts.append(f"accession: {fmt}")
+            parts.append(f"accession: {fmt}" if fmt else "accession")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/generate_templates_appendix.py` around lines 88 - 90, the template
generator currently appends "accession: {fmt}" even when params.get("format",
"") is empty, producing a bare "accession:" entry; update the branch handling
vname == "accession" (the block referencing params.get("format", "") and
parts.append) to only append the accession suffix when fmt is non-empty (e.g.,
check fmt truthiness before calling parts.append) so the empty accession suffix
is skipped when format is missing.

        elif vname == "mz_value":
            parts.append("m/z value")
        elif vname == "mz_range_interval":
            parts.append("m/z range interval")
        elif vname == "identifier":
            parts.append("identifier")
        else:
            parts.append(vname)

    return "; ".join(parts)

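The first review comment on the script can be checked in isolation. The sketch below copies only the `accession` branch into two hypothetical functions, one matching the current expression and one applying the suggested conditional, and prints their output for a validator that declares no `format`:

```python
from typing import Any


def accession_summary_current(params: dict[str, Any]) -> str:
    # Same expression as the current branch: always appends ": {fmt}".
    fmt = params.get("format", "")
    return f"accession: {fmt}"


def accession_summary_suggested(params: dict[str, Any]) -> str:
    # Suggested fix: drop the suffix when no format is declared.
    fmt = params.get("format", "")
    return f"accession: {fmt}" if fmt else "accession"


print(repr(accession_summary_current({})))    # 'accession: '
print(repr(accession_summary_suggested({})))  # 'accession'
```

Because `_summarize_validators` joins parts with `"; "`, the current form would also leave a dangling `accession: ;` when other validators follow.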

def _collect_examples(validators: list[dict[str, Any]]) -> str:
    """Collect example values from validators."""
    examples: list[str] = []
    for v in validators:
        params = v.get("params", {})
        for ex in params.get("examples", []):
            ex_str = str(ex)
            if ex_str not in examples:
                examples.append(ex_str)

    if not examples:
        return ""
    shown = examples[:4]
    result = ", ".join(shown)
    if len(examples) > 4:
        result += ", ..."
    return result


def _format_extends(extends: str | None) -> str:
    """Format the extends field, stripping version constraint."""
    if not extends:
        return "none"
    return extends.split("@")[0]


def generate_template_section(
    name: str,
    tpl: dict[str, Any],
    manifest_entry: dict[str, Any],
) -> str:
    """Generate AsciiDoc for a single template."""
    lines: list[str] = []

    # Heading
    lines.append(f"=== {name}")
    lines.append("")

    # Metadata line
    version = tpl.get("version", manifest_entry.get("latest", ""))
    layer = tpl.get("layer") or manifest_entry.get("layer") or "internal"
    extends = _format_extends(
        tpl.get("extends") or manifest_entry.get("extends")
    )
    usable_alone = tpl.get("usable_alone", manifest_entry.get("usable_alone", False))

    lines.append(
        f"**Version:** {version} | "
        f"**Layer:** {layer} | "
        f"**Extends:** {extends} | "
        f"**Usable alone:** {'Yes' if usable_alone else 'No'}"
    )
    lines.append("")

    # Description
    desc = tpl.get("description", "")
    if desc:
        lines.append(_escape_adoc(desc.strip()))
        lines.append("")

    # Columns table
    columns = tpl.get("columns", [])
    if not columns:
        lines.append("_No own columns defined (inherits all from parent)._")
        lines.append("")
        return "\n".join(lines)

    lines.append('[cols="2,1,3,2,2", options="header"]')
    lines.append("|===")
    lines.append("| Column Name | Req. | Description | Validators | Examples")
    lines.append("")

    for col in columns:
        col_name = col.get("name", "")
        requirement = col.get("requirement", "")
        col_desc = col.get("description", "")
        validators = col.get("validators", [])

        validator_summary = _summarize_validators(validators)
        examples = _collect_examples(validators)

        # If column is a minimal override (only name + requirement, no description),
        # note it as an override
        if not col_desc and requirement:
            col_desc = f"_(override: requirement set to {requirement})_"

Comment on lines +184 to +188
Contributor

⚠️ Potential issue | 🟠 Major

Resolve inherited metadata before rendering override-only columns.

This branch only handles requirement-only overrides. If a child template changes validators but inherits the requirement/description, the generated row ends up with empty Req./Description cells — that's already visible for plants -> characteristics[organism part] in the checked-in appendix. Merge the missing fields from the parent definition before writing the row.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/generate_templates_appendix.py` around lines 184 - 188, the current
branch only sets col_desc to an override note when requirement exists but leaves
other inherited fields blank; before emitting the row, resolve and merge missing
metadata from the parent definition so inherited
description/requirement/validators are filled in. Concretely, in the block
handling col_desc and requirement (variables col_desc and requirement) look up
the parent column metadata (e.g., parent_col or parent_template[column_name]),
and for any missing values assign parent_col.get('description') /
parent_col.get('requirement') / parent_col.get('validators') as appropriate,
then preserve the override marker (e.g., append or set the "_(override:
requirement set to X)_" text only when requirement truly differs) so the
rendered row contains merged parent fields plus a clear override note.

        lines.append(f"| `{_escape_adoc(col_name)}`")
        lines.append(f"| {requirement}")
        lines.append(f"| {_escape_adoc(col_desc)}")
        lines.append(f"| {_escape_adoc(validator_summary)}")
        lines.append(f"| {_escape_adoc(examples)}")
        lines.append("")

    lines.append("|===")
    lines.append("")

    return "\n".join(lines)

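The second review comment on the script describes filling override-only columns from the parent definition before rendering. A minimal sketch of that merge, assuming the parent template's columns have already been loaded into a dict keyed by column name (the script does not do this today). `merge_with_parent`, `INHERITABLE_KEYS`, and the ontology names in the usage data are hypothetical and illustrative only:

```python
from __future__ import annotations

from typing import Any

# Hypothetical: the column fields a child template may inherit from its parent.
INHERITABLE_KEYS = ("requirement", "description", "validators")


def merge_with_parent(
    col: dict[str, Any], parent_col: dict[str, Any] | None
) -> tuple[dict[str, Any], list[str]]:
    """Fill fields missing from a child column with the parent's values.

    Returns the merged column and the keys the child actually overrides
    (set in the child and different from the parent). Falsy child values
    (empty string, empty list) count as "not set" in this sketch.
    """
    if parent_col is None:
        return dict(col), []
    merged = dict(col)
    overridden: list[str] = []
    for key in INHERITABLE_KEYS:
        child_value = col.get(key)
        if not child_value:
            # Child is silent on this field: inherit it if the parent has it.
            if key in parent_col:
                merged[key] = parent_col[key]
        elif child_value != parent_col.get(key):
            overridden.append(key)
    return merged, overridden


parent = {
    "name": "characteristics[organism part]",
    "requirement": "required",
    "description": "Part of the organism",
    "validators": [{"validator_name": "ontology", "params": {"ontologies": ["uberon"]}}],
}
child = {
    "name": "characteristics[organism part]",
    "validators": [{"validator_name": "ontology", "params": {"ontologies": ["po"]}}],
}
merged, overridden = merge_with_parent(child, parent)
print(merged["requirement"], merged["description"], overridden)
# required Part of the organism ['validators']
```

With the `overridden` list, a caller could emit the `_(override: requirement set to X)_` note only when `"requirement"` is in it, which matches the reviewer's "only when requirement truly differs" condition.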

def generate_appendix(templates_dir: Path) -> str:
    """Generate the full AsciiDoc appendix content."""
    manifest = load_manifest(templates_dir)

    lines: list[str] = []
    lines.append(MARKER_START)
    lines.append("")
    lines.append("[[template-definitions]]")
    lines.append("== Template Definitions")
    lines.append("")
    lines.append(
        "This section provides the column definitions for each SDRF-Proteomics template. "
        "Each template shows only its *own* columns (not inherited ones). "
        'See the "Extends" field to identify which parent template\'s columns are also included.'
    )
    lines.append("")

    # Flatten ordered list, skipping templates not in manifest
    ordered_names: list[str] = []
    for group in TEMPLATE_ORDER:
        for name in group:
            if name in manifest:
                ordered_names.append(name)

    # Add any templates from manifest not in our explicit order
    for name in manifest:
        if name not in ordered_names:
            ordered_names.append(name)

    for name in ordered_names:
        entry = manifest[name]
        version = entry["latest"]
        tpl = load_template_yaml(templates_dir, name, version)
        section = generate_template_section(name, tpl, entry)
        lines.append(section)

    lines.append(MARKER_END)
    return "\n".join(lines)

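The ordering step in `generate_appendix` (explicit groups first, then any remaining manifest entries in manifest order) can be written as a small pure function, which makes it testable without YAML files. This is a sketch with a hypothetical name, `order_template_names`; unlike the current loop, it also guards against a name listed in two groups, which the first loop would append twice. `"new-template"` in the usage data is a made-up manifest entry:

```python
from __future__ import annotations

from collections.abc import Iterable


def order_template_names(
    groups: Iterable[Iterable[str]], available: Iterable[str]
) -> list[str]:
    """Flatten preferred groups, keep only available names, then append the rest.

    `available` plays the role of the manifest keys; its iteration order
    decides where templates missing from `groups` end up.
    """
    available_list = list(available)
    available_set = set(available_list)
    ordered: list[str] = []
    seen: set[str] = set()
    # Pass 1: explicit order, skipping names the manifest does not know.
    for group in groups:
        for name in group:
            if name in available_set and name not in seen:
                ordered.append(name)
                seen.add(name)
    # Pass 2: anything left over, in manifest order.
    for name in available_list:
        if name not in seen:
            ordered.append(name)
            seen.add(name)
    return ordered


groups = [["base", "sample-metadata"], ["ms-proteomics", "affinity-proteomics"]]
manifest_names = ["ms-proteomics", "new-template", "base"]
print(order_template_names(groups, manifest_names))
# ['base', 'ms-proteomics', 'new-template']
```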

def inject_into_readme(readme_path: Path, appendix_content: str) -> None:
    """Inject template definitions into README.adoc.

    If markers from a previous run exist, replace that section.
    Otherwise, insert before the 'Intellectual Property Statement' heading.
    """
    readme_text = readme_path.read_text()

    # Check if markers from a previous run exist
    if MARKER_START in readme_text:
        # Replace existing auto-generated section
        pattern = re.escape(MARKER_START) + r".*?" + re.escape(MARKER_END)
        readme_text = re.sub(pattern, appendix_content, readme_text, flags=re.DOTALL)
    else:
        # Insert before the injection point
        if INJECT_BEFORE not in readme_text:
            raise ValueError(
                f"Could not find '{INJECT_BEFORE}' in {readme_path}. "
                "Cannot determine where to inject template definitions."
            )
        readme_text = readme_text.replace(
            INJECT_BEFORE,
            appendix_content + "\n\n" + INJECT_BEFORE,
        )

    readme_path.write_text(readme_text)

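One hazard in `inject_into_readme` that the review does not raise: when the markers already exist, `appendix_content` is passed to `re.sub` as a replacement *template*, so backslash escapes in it are interpreted. The `pattern` branch of `_summarize_validators` renders raw regular expressions, and if any template's regex contains a sequence such as `\d`, `re.sub` would raise `re.error` (bad escape) on re-runs. The sketch below reproduces this with shortened stand-in markers (not the script's real ones) and shows the standard workaround of passing a function, whose return value is inserted literally:

```python
import re

# Shortened stand-ins for MARKER_START / MARKER_END.
MARKER_START = "// AUTO-GENERATED: start"
MARKER_END = "// AUTO-GENERATED: end"

readme = f"intro\n{MARKER_START}\nold\n{MARKER_END}\noutro\n"
# New section containing a rendered regex, like `pattern: `^\d+$``.
new_section = f"{MARKER_START}\npattern: `^\\d+$`\n{MARKER_END}"
pattern = re.escape(MARKER_START) + r".*?" + re.escape(MARKER_END)

# A string replacement is parsed as a template: "\d" is an invalid escape.
string_repl_failed = False
try:
    re.sub(pattern, new_section, readme, flags=re.DOTALL)
except re.error as exc:
    string_repl_failed = True
    print(f"string replacement failed: {exc}")

# A callable replacement is used verbatim, so backslashes survive untouched.
fixed = re.sub(pattern, lambda _m: new_section, readme, count=1, flags=re.DOTALL)
print("\\d" in fixed)  # True
```

The first-run path (`str.replace` before `INJECT_BEFORE`) is not affected, since `str.replace` does no escape processing.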

def main() -> None:
    parser = argparse.ArgumentParser(
        description="Generate and inject template definitions into README.adoc."
    )
    parser.add_argument(
        "--templates-dir",
        type=Path,
        default=Path(__file__).parent.parent.parent / "sdrf-templates",
        help="Path to sdrf-templates directory (default: ../../sdrf-templates)",
    )
Comment on lines +274 to +279
Contributor

⚠️ Potential issue | 🟠 Major

Make the default --templates-dir match the checked-in layout.

The workflow has to override this to sdrf-proteomics/sdrf-templates, so the documented no-arg invocation currently resolves outside the repository and misses templates.yaml. Defaulting to the in-repo path keeps the CLI usable for local generation too.

🔧 Suggested change
     parser.add_argument(
         "--templates-dir",
         type=Path,
-        default=Path(__file__).parent.parent.parent / "sdrf-templates",
-        help="Path to sdrf-templates directory (default: ../../sdrf-templates)",
+        default=Path(__file__).parent.parent / "sdrf-proteomics" / "sdrf-templates",
+        help="Path to sdrf-templates directory (default: sdrf-proteomics/sdrf-templates)",
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/generate_templates_appendix.py` around lines 274 - 279, the default
for the --templates-dir argument (set via parser.add_argument) points two levels
up from the script, which resolves outside the repository and misses
templates.yaml; change the default Path(__file__).parent.parent.parent /
"sdrf-templates" to the in-repo layout used by the project (e.g.,
Path(__file__).parent.parent / "sdrf-proteomics" / "sdrf-templates" or
equivalent repository-relative path that matches the checked-in
sdrf-proteomics/sdrf-templates layout) so that the no-arg CLI invocation finds
templates.yaml locally; update the default in the parser.add_argument call for
"--templates-dir" accordingly.

    parser.add_argument(
        "--readme",
        type=Path,
        default=Path(__file__).parent.parent / "sdrf-proteomics" / "README.adoc",
        help="Path to README.adoc to inject into",
    )
    args = parser.parse_args()

    appendix_content = generate_appendix(args.templates_dir)
    inject_into_readme(args.readme, appendix_content)
    print(f"Injected template definitions into {args.readme} ({len(appendix_content)} characters)")


if __name__ == "__main__":
    main()
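The last review comment turns on how `Path(__file__)` parents resolve. Assuming the script sits in `scripts/` at the repository root (as the workflow's `python scripts/generate_templates_appendix.py` call implies), this sketch evaluates both candidate defaults on a hypothetical checkout path. It uses `PurePosixPath`, so it runs without touching the filesystem:

```python
from pathlib import PurePosixPath

# Hypothetical checkout location; only the relative layout matters.
script = PurePosixPath("/work/repo/scripts/generate_templates_appendix.py")

# Current --templates-dir default: climbs three levels, past the repo root.
current_default = script.parent.parent.parent / "sdrf-templates"
# Reviewer's suggestion: two levels up is the repo root.
suggested_default = script.parent.parent / "sdrf-proteomics" / "sdrf-templates"
# Existing --readme default, for comparison.
readme_default = script.parent.parent / "sdrf-proteomics" / "README.adoc"

print(current_default)    # /work/sdrf-templates
print(suggested_default)  # /work/repo/sdrf-proteomics/sdrf-templates
print(readme_default)     # /work/repo/sdrf-proteomics/README.adoc
```

The `--readme` default already climbs only two levels, which is why it resolves inside the checkout while the current `--templates-dir` default lands one directory above it.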