Major refactoring to improve in examples for metaproteomics and in the PSI file generation #805

Merged: ypriverol merged 14 commits into master from dev on Mar 9, 2026

@ypriverol ypriverol commented Mar 9, 2026

This pull request introduces improvements to the CI workflows and updates repository references in the documentation. The most significant changes are enhancements to the link checking and PDF release workflows, and updates to ensure the documentation points to the correct repository.

CI Workflow Improvements:

  • .github/workflows/link-check.yml: Increased link checker timeout, reduced concurrency, added retry logic, and set the GITHUB_TOKEN environment variable to improve reliability of link checking.
  • .github/workflows/release-pdf.yml: Added steps to set up Python, install dependencies, and generate a template definitions appendix as part of the PDF release workflow.

Documentation Updates:

  • README.md: Updated all badge and link references from bigbio/proteomics-sample-metadata to bigbio/proteomics-metadata-standard for consistency with the current repository.
  • README.md: Updated the sample template link to point to the correct location in proteomics-metadata-standard.

This pull request updates repository references in README.md to point to the correct proteomics-metadata-standard repository, and improves the link-checking workflow configuration for more robust link validation. These changes help ensure documentation accuracy and improve CI reliability.

Repository reference updates:

  • Updated all badges and links in README.md to reference bigbio/proteomics-metadata-standard instead of bigbio/proteomics-sample-metadata, ensuring all project metadata and links are correct.
  • Fixed the sample template link in the dataset annotation steps to point to the correct repository and path.

Link-check workflow improvements:

  • Increased link-check timeout to 30 seconds, reduced max concurrency to 3, added retry options, and set the GITHUB_TOKEN environment variable for improved reliability in .github/workflows/link-check.yml.
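
To make the three knobs above concrete, here is a minimal Python sketch of what a per-request timeout, a retry budget, and a concurrency cap do together. It is not how the link checker used by the workflow is implemented. The `fetch` callable, the function names, and the retry count and wait are assumptions for illustration; only the 30-second timeout and the concurrency of 3 come from the description.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable

# Hypothetical interface: fetch(url, timeout) returns an HTTP status code
# and raises an OSError subclass (e.g. TimeoutError) on network failure.
Fetch = Callable[[str, float], int]


def check_link(fetch: Fetch, url: str, *, timeout: float = 30.0,
               max_retries: int = 2, wait: float = 1.0) -> bool:
    """Return True if the URL answers with a status below 400 within the retry budget."""
    for attempt in range(max_retries + 1):
        try:
            status = fetch(url, timeout)
        except OSError:
            status = None  # timeouts and connection errors are retried
        if status is not None and status < 400:
            return True
        if attempt < max_retries:
            time.sleep(wait)  # back off before retrying, e.g. after a 429
    return False


def check_all(fetch: Fetch, urls: Iterable[str], *, max_concurrency: int = 3,
              **options) -> Dict[str, bool]:
    """Check all URLs with at most max_concurrency requests in flight."""
    url_list = list(urls)
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(lambda u: check_link(fetch, u, **options), url_list))
    return dict(zip(url_list, results))
```

Lower concurrency plus a wait between retries reduces the chance of hitting rate limits (HTTP 429), at the cost of a slower run.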

Summary by CodeRabbit

  • New Features

    • Added three metaproteomics example datasets: human gut extraction methods, Mediterranean soil, and Pacific ocean depth samples
    • Auto-generated template definitions reference documentation now included
  • Documentation

    • Updated documentation references to reflect current project structure
    • Enhanced CI/CD workflow with improved timeout handling, retry controls, and Python environment setup

coderabbitai bot commented Mar 9, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 45d54460-6f91-477a-a147-a9a543ee20a4

📥 Commits

Reviewing files that changed from the base of the PR and between 7412e87 and 2c21c80.

⛔ Files ignored due to path filters (1)
  • psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf is excluded by !**/*.pdf
📒 Files selected for processing (1)
  • .github/workflows/release-pdf.yml
🚧 Files skipped from review as they are similar to previous changes (1)
  • .github/workflows/release-pdf.yml

📝 Walkthrough

Adds a Python tool to auto-generate a Template Definitions appendix and inject it into documentation, updates GitHub workflows (link-check and release-pdf) to run the generator and adjust link-check settings, replaces several project badge/URL references to proteomics-metadata-standard, and adds three new example SDRF entries across docs and site files.

Changes

| Cohort / File(s) | Summary |
|---|---|
| GitHub Workflows: .github/workflows/link-check.yml, .github/workflows/release-pdf.yml | Link-check: increased timeout, reduced concurrency, added retries and GITHUB_TOKEN. release-pdf: enable submodules and add Python setup, dependency install, and template-generation step before version extraction. |
| Template generation script: scripts/generate_templates_appendix.py | New script that reads sdrf-templates manifest/YAML, builds AsciiDoc template sections (metadata, validators, examples, columns), and injects the appendix into README.adoc between defined markers or before the IP statement. |
| Auto-generated docs: sdrf-proteomics/README.adoc | Large auto-generated "Template Definitions" appendix inserted (appears duplicated in diff); bounded by injection markers. |
| Project reference updates: README.md, sdrf-proteomics/VERSIONING.adoc | Replaced references and badge links from proteomics-sample-metadata to proteomics-metadata-standard and updated related URLs. |
| Examples & site: examples/README.md, llms.txt, site/index.html | Added three metaproteomics example entries (PXD005969, PXD003572, PXD009712) to examples list, examples README, and site index. |

Sequence Diagram

sequenceDiagram
    participant GH as GitHub Workflow (release-pdf)
    participant CH as Checkout (with submodules)
    participant PY as Python Script (generate_templates_appendix.py)
    participant TY as YAML Templates (sdrf-templates/)
    participant RD as README.adoc

    GH->>CH: checkout repository (with submodules)
    GH->>PY: run script (templates-dir, readme path)
    PY->>TY: read manifest and load YAML templates
    PY->>PY: generate AsciiDoc sections (metadata, validators, tables)
    PY->>RD: inject or replace appendix between markers
    RD-->>PY: write confirmation
    PY-->>GH: exit with status
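
The "inject or replace appendix between markers" step in the diagram can be sketched as follows. This is an illustration, not the code in scripts/generate_templates_appendix.py: the marker strings, the fallback heading text, and the function name are hypothetical. The sketch replaces an existing marked block rather than inserting a second copy, so rerunning it with the same input leaves the file unchanged.

```python
# Hypothetical marker lines; AsciiDoc treats lines starting with // as comments.
BEGIN = "// BEGIN TEMPLATE DEFINITIONS"
END = "// END TEMPLATE DEFINITIONS"


def inject_appendix(readme: str, appendix: str, fallback_heading: str) -> str:
    """Place the appendix between markers, replacing any previous copy."""
    block = f"{BEGIN}\n{appendix.strip()}\n{END}"
    start = readme.find(BEGIN)
    end = readme.find(END)
    if start != -1 and end != -1 and start < end:
        # Markers already present: swap the old block for the new one.
        return readme[:start] + block + readme[end + len(END):]
    anchor = readme.find(fallback_heading)
    if anchor != -1:
        # First run: insert just before the fallback heading (e.g. the IP statement).
        return readme[:anchor] + block + "\n\n" + readme[anchor:]
    # No markers and no heading: append at the end of the document.
    return readme.rstrip("\n") + "\n\n" + block + "\n"
```

Because the replace branch removes the old block, markers included, before writing the new one, repeated runs cannot accumulate duplicate appendices.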

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

Review effort 3/5

Suggested reviewers

  • nithujohn
  • noatgnu
  • levitsky
  • timosachsenberg

Poem

🐰 I hopped through YAML fields with cheer,
Built template pages for all to peer,
Three examples added, badges set anew,
Workflows call my script — docs bloom true,
A tidy hop, and the README’s full of dew ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Title check | ⚠️ Warning | The title claims 'major refactoring' but changes are primarily documentation updates, repository reference changes, and CI workflow adjustments. It mentions only metaproteomics examples and PSI file generation, missing significant work on CI improvements and documentation restructuring. | Revise the title to accurately reflect the main changes: e.g., 'Update repository references, improve CI workflows, and add metaproteomics examples' or 'Refactor documentation references and enhance CI reliability with template generation'. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 87.50% which is sufficient. The required threshold is 80.00%. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In @.github/workflows/release-pdf.yml:
- Around line 47-51: The workflow step that runs python
scripts/generate_templates_appendix.py fails on clean runners because the
sdrf-proteomics/sdrf-templates submodule is not checked out by
actions/checkout@v4; add a preceding step (before the "Generate template
definitions appendix" run) that initializes/updates git submodules (git
submodule init && git submodule update --recursive or equivalent action step) so
the sdrf-proteomics/sdrf-templates directory and its YAML templates are present
for generate_templates_appendix.py to consume.

In `@README.md`:
- Line 4: The badge link currently points to the external GitHub URL and should
be changed to reference the repo-local LICENSE file; locate the markdown line
containing the license badge (the image alt "License" and the URL
"https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard")
and update the link target to "./LICENSE" (keep the image source as-is or
replace with a local/static image if desired) so the badge points at the local
LICENSE file instead of the external GitHub URL.

In `@scripts/generate_templates_appendix.py`:
- Around line 184-188: The current branch only sets col_desc to an override note
when requirement exists but leaves other inherited fields blank; before emitting
the row, resolve and merge missing metadata from the parent definition so
inherited description/requirement/validators are filled in. Concretely, in the
block handling col_desc and requirement (variables col_desc and requirement)
look up the parent column metadata (e.g., parent_col or
parent_template[column_name]), and for any missing values assign
parent_col.get('description') / parent_col.get('requirement') /
parent_col.get('validators') as appropriate, then preserve the override marker
(e.g., append or set the "_(override: requirement set to X)_" text only when
requirement truly differs) so the rendered row contains merged parent fields
plus a clear override note.
- Around line 88-90: The template generator currently appends "accession: {fmt}"
even when params.get("format", "") is empty, producing a bare "accession:"
entry; update the branch handling vname == "accession" (the block referencing
params.get("format", "") and parts.append) to only append the accession suffix
when fmt is non-empty (e.g., check fmt truthiness before calling parts.append)
so the empty accession suffix is skipped when format is missing.
- Around line 274-279: The default for the --templates-dir argument (set via
parser.add_argument) points two levels up from the script, which resolves
outside the repository and misses templates.yaml; change the default
Path(__file__).parent.parent.parent / "sdrf-templates" to the in-repo layout
used by the project (e.g., Path(__file__).parent.parent / "sdrf-proteomics" /
"sdrf-templates" or equivalent repository-relative path that matches the
checked-in sdrf-proteomics/sdrf-templates layout) so that the no-arg CLI
invocation finds templates.yaml locally; update the default in the
parser.add_argument call for "--templates-dir" accordingly.

In `@sdrf-proteomics/VERSIONING.adoc`:
- Line 252: Replace the raw GitHub blob URL string
"https://github.com/bigbio/proteomics-metadata-standard/blob/master/CHANGELOG.md"
in VERSIONING.adoc (the example code block line that currently shows "See
CHANGELOG for what changed: ...") with a local reference such as "CHANGELOG.md"
so the sample output uses a local file path instead of an external blob URL;
update the text in that code block to read "See CHANGELOG for what changed:
CHANGELOG.md" (or similar) to avoid the flaky external link check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 88300af0-f197-41bc-9a77-b28987fbd8a7

📥 Commits

Reviewing files that changed from the base of the PR and between 04e1ed8 and 7412e87.

⛔ Files ignored due to path filters (4)
  • examples/PXD003572/PXD003572.sdrf.tsv is excluded by !**/*.tsv
  • examples/PXD005969/PXD005969.sdrf.tsv is excluded by !**/*.tsv
  • examples/PXD009712/PXD009712.sdrf.tsv is excluded by !**/*.tsv
  • psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdf is excluded by !**/*.pdf
📒 Files selected for processing (9)
  • .github/workflows/link-check.yml
  • .github/workflows/release-pdf.yml
  • README.md
  • examples/README.md
  • llms.txt
  • scripts/generate_templates_appendix.py
  • sdrf-proteomics/README.adoc
  • sdrf-proteomics/VERSIONING.adoc
  • site/index.html

![Contributors](https://flat.badgen.net/github/contributors/bigbio/proteomics-sample-metadata)
![Watchers](https://flat.badgen.net/github/watchers/bigbio/proteomics-sample-metadata)
![Stars](https://flat.badgen.net/github/stars/bigbio/proteomics-sample-metadata)
[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE)

⚠️ Potential issue | 🟠 Major

Point the badge at the local LICENSE file.

Line 4 is the exact URL Lychee is failing on with 429s. Using the repo-local file removes the external GitHub request and keeps the badge on the current branch.

🔧 Suggested change
-[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE)
+[![License](https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard)](LICENSE)

Comment on lines +88 to +90
        elif vname == "accession":
            fmt = params.get("format", "")
            parts.append(f"accession: {fmt}")

⚠️ Potential issue | 🟡 Minor

Skip the empty accession suffix when format is missing.

When params.format is absent, this renders accession: with nothing after it. The generated appendix already shows that broken summary for comment[metagenome accession].

🔧 Suggested change
         elif vname == "accession":
             fmt = params.get("format", "")
-            parts.append(f"accession: {fmt}")
+            parts.append(f"accession: {fmt}" if fmt else "accession")
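
As a quick check of the suggested behavior, the branch can be exercised in isolation. The sketch below is a standalone illustration: `summarize_validator` is a hypothetical helper name, not a function from the script, and the `"biosample"` format value is made up.

```python
def summarize_validator(vname: str, params: dict) -> str:
    """Render a one-line validator summary (hypothetical helper for illustration)."""
    if vname == "accession":
        fmt = params.get("format", "")
        # Only add the ": <format>" suffix when a format is actually configured.
        return f"accession: {fmt}" if fmt else "accession"
    return vname


print(summarize_validator("accession", {"format": "biosample"}))  # accession: biosample
print(summarize_validator("accession", {}))                       # accession
```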

Comment on lines +184 to +188
# If column is a minimal override (only name + requirement, no description),
# note it as an override
if not col_desc and requirement:
    col_desc = f"_(override: requirement set to {requirement})_"


⚠️ Potential issue | 🟠 Major

Resolve inherited metadata before rendering override-only columns.

This branch only handles requirement-only overrides. If a child template changes validators but inherits the requirement/description, the generated row ends up with empty Req./Description cells — that's already visible for plants -> characteristics[organism part] in the checked-in appendix. Merge the missing fields from the parent definition before writing the row.
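
One possible shape for that merge, as a sketch only: the dictionary layout, the field names, and `resolve_column` are assumptions about how column definitions might be represented, not the script's actual data model. Fields the child leaves empty are filled from the parent, and a field is reported as an override only when the child sets a value that differs from the parent's.

```python
from typing import Optional

# Assumed field names for a column definition.
FIELDS = ("description", "requirement", "validators")


def resolve_column(child: dict, parent: Optional[dict]) -> dict:
    """Fill fields the child leaves empty from the parent and list real overrides."""
    parent = parent or {}
    merged = dict(child)
    for field in FIELDS:
        if not merged.get(field):
            merged[field] = parent.get(field)
    # A field counts as overridden only if both sides define it and they differ.
    merged["overrides"] = [
        f for f in FIELDS
        if child.get(f) and parent.get(f) and child[f] != parent[f]
    ]
    return merged


# Hypothetical parent/child pair: the child changes validators only.
parent = {"name": "characteristics[organism part]", "description": "Organism part",
          "requirement": "required", "validators": ["ontology"]}
child = {"name": "characteristics[organism part]", "validators": ["ontology", "pattern"]}
row = resolve_column(child, parent)
print(row["requirement"], row["overrides"])  # required ['validators']
```

With a merged row in hand, the renderer can fill the Req. and Description cells and still append an override note for the fields listed in `overrides`.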

Comment on lines +274 to +279
    parser.add_argument(
        "--templates-dir",
        type=Path,
        default=Path(__file__).parent.parent.parent / "sdrf-templates",
        help="Path to sdrf-templates directory (default: ../../sdrf-templates)",
    )

⚠️ Potential issue | 🟠 Major

Make the default --templates-dir match the checked-in layout.

The workflow has to override this to sdrf-proteomics/sdrf-templates, so the documented no-arg invocation currently resolves outside the repository and misses templates.yaml. Defaulting to the in-repo path keeps the CLI usable for local generation too.

🔧 Suggested change
     parser.add_argument(
         "--templates-dir",
         type=Path,
-        default=Path(__file__).parent.parent.parent / "sdrf-templates",
-        help="Path to sdrf-templates directory (default: ../../sdrf-templates)",
+        default=Path(__file__).parent.parent / "sdrf-proteomics" / "sdrf-templates",
+        help="Path to sdrf-templates directory (default: sdrf-proteomics/sdrf-templates)",
     )
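
To see why the old default lands outside the repository, the two expressions can be evaluated against a sample checkout path. The `/work/...` location below is hypothetical, and `PurePosixPath` stands in for `Path(__file__)` so the snippet runs without touching the filesystem.

```python
from pathlib import PurePosixPath

# Hypothetical checkout location standing in for Path(__file__).
script = PurePosixPath(
    "/work/proteomics-metadata-standard/scripts/generate_templates_appendix.py"
)

# Three .parent hops: scripts/ -> repo root -> directory containing the repo.
old_default = script.parent.parent.parent / "sdrf-templates"
# Two hops reach the repo root, then descend into the checked-in layout.
new_default = script.parent.parent / "sdrf-proteomics" / "sdrf-templates"

print(old_default)  # /work/sdrf-templates
print(new_default)  # /work/proteomics-metadata-standard/sdrf-proteomics/sdrf-templates
```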

@ypriverol ypriverol merged commit f579b04 into master Mar 9, 2026
0 of 2 checks passed