Major refactoring to improve in examples for metaproteomics and in the PSI file generation #805
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdds a Python tool to auto-generate a Template Definitions appendix and inject it into documentation, updates GitHub workflows (link-check and release-pdf) to run the generator and adjust link-check settings, replaces several project badge/URL references to proteomics-metadata-standard, and adds three new example SDRF entries across docs and site files. Changes
Sequence DiagramsequenceDiagram
participant GH as GitHub Workflow (release-pdf)
participant CH as Checkout (with submodules)
participant PY as Python Script (generate_templates_appendix.py)
participant TY as YAML Templates (sdrf-templates/)
participant RD as README.adoc
GH->>CH: checkout repository (with submodules)
GH->>PY: run script (templates-dir, readme path)
PY->>TY: read manifest and load YAML templates
PY->>PY: generate AsciiDoc sections (metadata, validators, tables)
PY->>RD: inject or replace appendix between markers
RD-->>PY: write confirmation
PY-->>GH: exit with status
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/release-pdf.yml:
- Around line 47-51: The workflow step that runs python
scripts/generate_templates_appendix.py fails on clean runners because the
sdrf-proteomics/sdrf-templates submodule is not checked out by
actions/checkout@v4; add a preceding step (before the "Generate template
definitions appendix" run) that initializes/updates git submodules (git
submodule init && git submodule update --recursive or equivalent action step) so
the sdrf-proteomics/sdrf-templates directory and its YAML templates are present
for generate_templates_appendix.py to consume.
In `@README.md`:
- Line 4: The badge link currently points to the external GitHub URL and should
be changed to reference the repo-local LICENSE file; locate the markdown line
containing the license badge (the image alt "License" and the URL
"https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard")
and update the link target to "./LICENSE" (keep the image source as-is or
replace with a local/static image if desired) so the badge points at the local
LICENSE file instead of the external GitHub URL.
In `@scripts/generate_templates_appendix.py`:
- Around line 184-188: The current branch only sets col_desc to an override note
when requirement exists but leaves other inherited fields blank; before emitting
the row, resolve and merge missing metadata from the parent definition so
inherited description/requirement/validators are filled in. Concretely, in the
block handling col_desc and requirement (variables col_desc and requirement)
look up the parent column metadata (e.g., parent_col or
parent_template[column_name]), and for any missing values assign
parent_col.get('description') / parent_col.get('requirement') /
parent_col.get('validators') as appropriate, then preserve the override marker
(e.g., append or set the "_(override: requirement set to X)_" text only when
requirement truly differs) so the rendered row contains merged parent fields
plus a clear override note.
- Around line 88-90: The template generator currently appends "accession: {fmt}"
even when params.get("format", "") is empty, producing a bare "accession:"
entry; update the branch handling vname == "accession" (the block referencing
params.get("format", "") and parts.append) to only append the accession suffix
when fmt is non-empty (e.g., check fmt truthiness before calling parts.append)
so the empty accession suffix is skipped when format is missing.
- Around line 274-279: The default for the --templates-dir argument (set via
parser.add_argument) points two levels up from the script, which resolves
outside the repository and misses templates.yaml; change the default
Path(__file__).parent.parent.parent / "sdrf-templates" to the in-repo layout
used by the project (e.g., Path(__file__).parent.parent / "sdrf-proteomics" /
"sdrf-templates" or equivalent repository-relative path that matches the
checked-in sdrf-proteomics/sdrf-templates layout) so that the no-arg CLI
invocation finds templates.yaml locally; update the default in the
parser.add_argument call for "--templates-dir" accordingly.
In `@sdrf-proteomics/VERSIONING.adoc`:
- Line 252: Replace the raw GitHub blob URL string
"https://github.com/bigbio/proteomics-metadata-standard/blob/master/CHANGELOG.md"
in VERSIONING.adoc (the example code block line that currently shows "See
CHANGELOG for what changed: ...") with a local reference such as "CHANGELOG.md"
so the sample output uses a local file path instead of an external blob URL;
update the text in that code block to read "See CHANGELOG for what changed:
CHANGELOG.md" (or similar) to avoid the flaky external link check.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 88300af0-f197-41bc-9a77-b28987fbd8a7
⛔ Files ignored due to path filters (4)
examples/PXD003572/PXD003572.sdrf.tsvis excluded by!**/*.tsvexamples/PXD005969/PXD005969.sdrf.tsvis excluded by!**/*.tsvexamples/PXD009712/PXD009712.sdrf.tsvis excluded by!**/*.tsvpsi-document/sdrf-proteomics-specification-v1.1.0-dev.pdfis excluded by!**/*.pdf
📒 Files selected for processing (9)
.github/workflows/link-check.yml.github/workflows/release-pdf.ymlREADME.mdexamples/README.mdllms.txtscripts/generate_templates_appendix.pysdrf-proteomics/README.adocsdrf-proteomics/VERSIONING.adocsite/index.html
|  | ||
|  | ||
|  | ||
| [](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE) |
There was a problem hiding this comment.
Point the badge at the local LICENSE file.
Line 4 is the exact URL Lychee is failing on with 429s. Using the repo-local file removes the external GitHub request and keeps the badge on the current branch.
🔧 Suggested change
-[](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE)
+[](LICENSE)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| [](https://github.com/bigbio/proteomics-metadata-standard/blob/master/LICENSE) | |
| [](LICENSE) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` at line 4, The badge link currently points to the external GitHub
URL and should be changed to reference the repo-local LICENSE file; locate the
markdown line containing the license badge (the image alt "License" and the URL
"https://flat.badgen.net/github/license/bigbio/proteomics-metadata-standard")
and update the link target to "./LICENSE" (keep the image source as-is or
replace with a local/static image if desired) so the badge points at the local
LICENSE file instead of the external GitHub URL.
| elif vname == "accession": | ||
| fmt = params.get("format", "") | ||
| parts.append(f"accession: {fmt}") |
There was a problem hiding this comment.
Skip the empty accession suffix when format is missing.
When params.format is absent, this renders accession: with nothing after it. The generated appendix already shows that broken summary for comment[metagenome accession].
🔧 Suggested change
elif vname == "accession":
fmt = params.get("format", "")
- parts.append(f"accession: {fmt}")
+ parts.append(f"accession: {fmt}" if fmt else "accession")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| elif vname == "accession": | |
| fmt = params.get("format", "") | |
| parts.append(f"accession: {fmt}") | |
| elif vname == "accession": | |
| fmt = params.get("format", "") | |
| parts.append(f"accession: {fmt}" if fmt else "accession") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/generate_templates_appendix.py` around lines 88 - 90, The template
generator currently appends "accession: {fmt}" even when params.get("format",
"") is empty, producing a bare "accession:" entry; update the branch handling
vname == "accession" (the block referencing params.get("format", "") and
parts.append) to only append the accession suffix when fmt is non-empty (e.g.,
check fmt truthiness before calling parts.append) so the empty accession suffix
is skipped when format is missing.
| # If column is a minimal override (only name + requirement, no description), | ||
| # note it as an override | ||
| if not col_desc and requirement: | ||
| col_desc = f"_(override: requirement set to {requirement})_" | ||
|
|
There was a problem hiding this comment.
Resolve inherited metadata before rendering override-only columns.
This branch only handles requirement-only overrides. If a child template changes validators but inherits the requirement/description, the generated row ends up with empty Req./Description cells — that's already visible for plants -> characteristics[organism part] in the checked-in appendix. Merge the missing fields from the parent definition before writing the row.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/generate_templates_appendix.py` around lines 184 - 188, The current
branch only sets col_desc to an override note when requirement exists but leaves
other inherited fields blank; before emitting the row, resolve and merge missing
metadata from the parent definition so inherited
description/requirement/validators are filled in. Concretely, in the block
handling col_desc and requirement (variables col_desc and requirement) look up
the parent column metadata (e.g., parent_col or parent_template[column_name]),
and for any missing values assign parent_col.get('description') /
parent_col.get('requirement') / parent_col.get('validators') as appropriate,
then preserve the override marker (e.g., append or set the "_(override:
requirement set to X)_" text only when requirement truly differs) so the
rendered row contains merged parent fields plus a clear override note.
| parser.add_argument( | ||
| "--templates-dir", | ||
| type=Path, | ||
| default=Path(__file__).parent.parent.parent / "sdrf-templates", | ||
| help="Path to sdrf-templates directory (default: ../../sdrf-templates)", | ||
| ) |
There was a problem hiding this comment.
Make the default --templates-dir match the checked-in layout.
The workflow has to override this to sdrf-proteomics/sdrf-templates, so the documented no-arg invocation currently resolves outside the repository and misses templates.yaml. Defaulting to the in-repo path keeps the CLI usable for local generation too.
🔧 Suggested change
parser.add_argument(
"--templates-dir",
type=Path,
- default=Path(__file__).parent.parent.parent / "sdrf-templates",
- help="Path to sdrf-templates directory (default: ../../sdrf-templates)",
+ default=Path(__file__).parent.parent / "sdrf-proteomics" / "sdrf-templates",
+ help="Path to sdrf-templates directory (default: sdrf-proteomics/sdrf-templates)",
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/generate_templates_appendix.py` around lines 274 - 279, The default
for the --templates-dir argument (set via parser.add_argument) points two levels
up from the script, which resolves outside the repository and misses
templates.yaml; change the default Path(__file__).parent.parent.parent /
"sdrf-templates" to the in-repo layout used by the project (e.g.,
Path(__file__).parent.parent / "sdrf-proteomics" / "sdrf-templates" or
equivalent repository-relative path that matches the checked-in
sdrf-proteomics/sdrf-templates layout) so that the no-arg CLI invocation finds
templates.yaml locally; update the default in the parser.add_argument call for
"--templates-dir" accordingly.
This pull request introduces improvements to the CI workflows and updates repository references in the documentation. The most significant changes are enhancements to the link checking and PDF release workflows, and updates to ensure the documentation points to the correct repository.
CI Workflow Improvements:
.github/workflows/link-check.yml: Increased link checker timeout, reduced concurrency, added retry logic, and set theGITHUB_TOKENenvironment variable to improve reliability of link checking..github/workflows/release-pdf.yml: Added steps to set up Python, install dependencies, and generate a template definitions appendix as part of the PDF release workflow.Documentation Updates:
README.md: Updated all badge and link references frombigbio/proteomics-sample-metadatatobigbio/proteomics-metadata-standardfor consistency with the current repository.README.md: Updated the sample template link to point to the correct location inproteomics-metadata-standard.This pull request updates repository references in theREADME.mdto point to the correctproteomics-metadata-standardrepository, and improves the link-checking workflow configuration for more robust link validation. These changes help ensure documentation accuracy and improve CI reliability.Repository reference updates:
README.mdto referencebigbio/proteomics-metadata-standardinstead ofbigbio/proteomics-sample-metadata, ensuring all project metadata and links are correct.Link-check workflow improvements:
GITHUB_TOKENenvironment variable for improved reliability in.github/workflows/link-check.yml.Summary by CodeRabbit
New Features
Documentation