Conversation
📝 WalkthroughWalkthroughIntroduces a Template Builder site page and navigation links, restructures Quick Start into concept-led content, adds a static sdrf-builder.html, and makes the docs build header/banner injection dev-aware (inject-headers.py and add-dev-banner.sh) with multi-page banner/header processing. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
actor CI as CI/build script
participant BuildScript as scripts/build-docs.sh
participant Injector as scripts/inject-headers.py
participant Banner as scripts/add-dev-banner.sh
participant HTML as site/*.html
participant Output as OUTPUT_DIR
CI ->> BuildScript: run docs build
BuildScript ->> Injector: inject headers (--dev if IS_DEV)
Injector ->> HTML: read/modify header, rewrite version links (is_dev?)
Injector -->> BuildScript: headers injected
BuildScript ->> Banner: run add-dev-banner (process all .html)
Banner ->> HTML: inject dev banner/CSS where missing
Banner -->> Output: write modified HTML files to OUTPUT_DIR
BuildScript ->> Output: copy sdrf-builder.html and other assets
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (2)
site/quickstart.html (1)
275-277: Use one canonical label-free example across the docs.This section teaches
comment[label] = label free sample, while the site examples still uselabel free. Please align the wording so beginners do not end up copying two different values for the same case.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@site/quickstart.html` around lines 275 - 277, The doc uses two different canonical values for label-free samples ("label free sample" here vs "label free" elsewhere); standardize to the one used across site examples by updating the example value in this section so the text shows comment[label] = label free (or vice‑versa if your chosen canonical is "label free sample")—edit the paragraph containing the code snippet (the displayed comment[label] assignment) to match the project's canonical label string so all docs use the same term.scripts/inject-headers.py (1)
123-129: Avoid hardcoding another static-page list here.This list has to stay in sync with the copy list in
scripts/build-docs.shand with the hand-authored static pages. The newsdrf-builder.htmladdition already required touching both places in this PR. Scanning the copied top-level*.htmlfiles, or keeping the list in one location, would make future nav/version updates less fragile.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/inject-headers.py` around lines 123 - 129, The hardcoded static_page list is fragile; change the loop that currently iterates over ["index.html", "quickstart.html", ... "sdrf-builder.html"] to dynamically discover the top-level copied HTML files instead: use output_dir.glob("*.html") (or listdir and filter for names ending with ".html") and iterate those paths, calling rewrite_version_links(str(static_file), is_dev) for each existing file; update variables static_file and static_page usage accordingly and remove the hardcoded list so future additions don't require editing this script and build-docs.sh in two places.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/add-dev-banner.sh`:
- Around line 12-16: The script currently injects the shell variable OUTPUT_DIR
directly into the inline Python source (the python3 -c invocation) by assigning
it to the Python variable output_dir, which is unsafe; instead pass the path via
sys.argv or the environment: change the python3 -c invocation so the Python code
reads output_dir = sys.argv[1] (or os.environ['OUTPUT_DIR']) and call python3 -c
with the OUTPUT_DIR as a separate argument (or export OUTPUT_DIR in the shell
prior to invoking python), updating any uses of output_dir and keeping the
banner variable as-is (refer to output_dir, banner, and the python3 -c
invocation in the script).
In `@scripts/inject-headers.py`:
- Around line 75-81: The CLI currently assumes sys.argv[1] is always the output
directory which makes --dev order-sensitive; update main() to robustly parse
arguments by either using argparse or manually scanning sys.argv to set is_dev
(presence of '--dev') and assign output_dir to the first non-flag argument
instead of sys.argv[1]; modify the references to is_dev and output_dir
accordingly so commands like "inject-headers.py --dev docs" and
"inject-headers.py docs --dev" both work (refer to main, is_dev, output_dir, and
sys.argv in the diff).
In `@site/sdrf-builder.html`:
- Around line 44-46: The clickable toggles use divs with onclick which are not
keyboard accessible and lack ARIA state; replace each <div class="why-toggle"
onclick="toggleWhy('why-technology')"> (and the repeats for other why-* ids)
with <button type="button" class="why-toggle" aria-controls="why-technology"
aria-expanded="false"> and update the toggleWhy(id) function to accept the
button element or to query the button by [aria-controls="id"], toggle the
content element's "open" class and also set the button's aria-expanded to
"true"/"false" accordingly so assistive tech can read expanded state and
keyboard users can activate the control.
In `@site/sdrf-data.json`:
- Around line 4220-4255: For dataset PXD007160, normalize datasets[].diseases by
stripping trailing numeric suffixes (and any preceding space) from each disease
string (e.g., "control0" -> "control", "Alzheimer's disease 6" -> "Alzheimer's
disease", "Alzheimer's disease/Parkinson's disease6" -> "Alzheimer's
disease/Parkinson's disease"), then deduplicate the array so only canonical
disease names remain; update the diseases array in the PXD007160 entry
(function/field: datasets[].diseases) accordingly.
- Line 2929: The disease strings are not being normalized before JSON generation
(examples: "squamous cell carcinoma", extra quotes like "'Crohn''s Disease'",
misspellings like "medulallary carcinoma"), so update the generator that builds
the disease facet (the code path that extracts the disease field /
buildDiseaseFacets / generateFacets function) to canonicalize values: trim and
collapse whitespace, strip surrounding quotes/apostrophes, normalize internal
doubled quotes, apply a small spelling-correction or mapping table for known
common typos (e.g., "medulallary" -> "medullary"), and optionally run values
through a normalization helper (normalizeDiseaseLabel) before aggregating and
writing site/sdrf-data.json so variants merge into the canonical term.
---
Nitpick comments:
In `@scripts/inject-headers.py`:
- Around line 123-129: The hardcoded static_page list is fragile; change the
loop that currently iterates over ["index.html", "quickstart.html", ...
"sdrf-builder.html"] to dynamically discover the top-level copied HTML files
instead: use output_dir.glob("*.html") (or listdir and filter for names ending
with ".html") and iterate those paths, calling
rewrite_version_links(str(static_file), is_dev) for each existing file; update
variables static_file and static_page usage accordingly and remove the hardcoded
list so future additions don't require editing this script and build-docs.sh in
two places.
In `@site/quickstart.html`:
- Around line 275-277: The doc uses two different canonical values for
label-free samples ("label free sample" here vs "label free" elsewhere);
standardize to the one used across site examples by updating the example value
in this section so the text shows comment[label] = label free (or vice‑versa if
your chosen canonical is "label free sample")—edit the paragraph containing the
code snippet (the displayed comment[label] assignment) to match the project's
canonical label string so all docs use the same term.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 67ea9e56-9791-4941-b7ed-690331dada9f
⛔ Files ignored due to path filters (1)
psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdfis excluded by!**/*.pdf
📒 Files selected for processing (9)
scripts/add-dev-banner.shscripts/build-docs.shscripts/inject-headers.pysite/index.htmlsite/quickstart.htmlsite/sdrf-builder.htmlsite/sdrf-data.jsonsite/sdrf-explorer.htmlsite/sdrf-terms.html
| python3 -c " | ||
| import os, re | ||
|
|
||
| # Append dev banner CSS to stylesheet | ||
| cat >> "$OUTPUT_DIR/css/style.css" << 'EOF' | ||
| output_dir = '$OUTPUT_DIR' | ||
| banner = '<div class=\"dev-banner\">⚠️ Development Version - This documentation is from the dev branch and may contain unreleased changes. <a href=\"/\">View stable version</a></div>' |
There was a problem hiding this comment.
Don't splice OUTPUT_DIR into executable Python.
Line 15 turns the shell argument into part of the Python source. A valid path containing ' will break the script, and it also makes the injection step harder to reason about. Pass the directory via sys.argv or the environment instead.
Suggested change
-python3 -c "
-import os, re
+python3 - "$OUTPUT_DIR" <<'PY'
+import os, re, sys
-output_dir = '$OUTPUT_DIR'
+output_dir = sys.argv[1]
banner = '<div class=\"dev-banner\">⚠️ Development Version - This documentation is from the dev branch and may contain unreleased changes. <a href=\"/\">View stable version</a></div>'
@@
-"
+PY📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| python3 -c " | |
| import os, re | |
| # Append dev banner CSS to stylesheet | |
| cat >> "$OUTPUT_DIR/css/style.css" << 'EOF' | |
| output_dir = '$OUTPUT_DIR' | |
| banner = '<div class=\"dev-banner\">⚠️ Development Version - This documentation is from the dev branch and may contain unreleased changes. <a href=\"/\">View stable version</a></div>' | |
| python3 - "$OUTPUT_DIR" <<'PY' | |
| import os, re, sys | |
| output_dir = sys.argv[1] | |
| banner = '<div class=\"dev-banner\">⚠️ Development Version - This documentation is from the dev branch and may contain unreleased changes. <a href=\"/\">View stable version</a></div>' |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/add-dev-banner.sh` around lines 12 - 16, The script currently injects
the shell variable OUTPUT_DIR directly into the inline Python source (the
python3 -c invocation) by assigning it to the Python variable output_dir, which
is unsafe; instead pass the path via sys.argv or the environment: change the
python3 -c invocation so the Python code reads output_dir = sys.argv[1] (or
os.environ['OUTPUT_DIR']) and call python3 -c with the OUTPUT_DIR as a separate
argument (or export OUTPUT_DIR in the shell prior to invoking python), updating
any uses of output_dir and keeping the banner variable as-is (refer to
output_dir, banner, and the python3 -c invocation in the script).
| def main(): | ||
| if len(sys.argv) < 2: | ||
| print("Usage: python3 inject-headers.py <output_dir>") | ||
| print("Usage: python3 inject-headers.py <output_dir> [--dev]") | ||
| sys.exit(1) | ||
|
|
||
| is_dev = '--dev' in sys.argv | ||
| output_dir = Path(sys.argv[1]) |
There was a problem hiding this comment.
--dev is order-sensitive right now.
Line 81 always treats sys.argv[1] as the output directory, so python3 scripts/inject-headers.py --dev docs will try to operate on a directory literally named --dev. Please parse the CLI instead of relying on positional order.
Suggested change
+import argparse
import sys
@@
def main():
- if len(sys.argv) < 2:
- print("Usage: python3 inject-headers.py <output_dir> [--dev]")
- sys.exit(1)
-
- is_dev = '--dev' in sys.argv
- output_dir = Path(sys.argv[1])
+ parser = argparse.ArgumentParser()
+ parser.add_argument("output_dir")
+ parser.add_argument("--dev", action="store_true")
+ args = parser.parse_args()
+
+ is_dev = args.dev
+ output_dir = Path(args.output_dir)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/inject-headers.py` around lines 75 - 81, The CLI currently assumes
sys.argv[1] is always the output directory which makes --dev order-sensitive;
update main() to robustly parse arguments by either using argparse or manually
scanning sys.argv to set is_dev (presence of '--dev') and assign output_dir to
the first non-flag argument instead of sys.argv[1]; modify the references to
is_dev and output_dir accordingly so commands like "inject-headers.py --dev
docs" and "inject-headers.py docs --dev" both work (refer to main, is_dev,
output_dir, and sys.argv in the diff).
| <div class="why-toggle" onclick="toggleWhy('why-technology')"> | ||
| Why does this matter? ▾ | ||
| </div> |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Show the disclosure markup and the toggle implementation.
rg -n -C1 'why-toggle|why-content|toggleWhy' site/sdrf-builder.htmlRepository: bigbio/proteomics-sample-metadata
Length of output: 1597
🏁 Script executed:
# Get the complete toggleWhy function implementation and surrounding context
sed -n '195,220p' site/sdrf-builder.htmlRepository: bigbio/proteomics-sample-metadata
Length of output: 340
🏁 Script executed:
# Check if any ARIA attributes exist in the why-toggle divs
rg -n 'aria-' site/sdrf-builder.html | head -20Repository: bigbio/proteomics-sample-metadata
Length of output: 59
🏁 Script executed:
# Check the CSS classes being applied by toggleWhy function
rg -n 'classList|show|hidden' site/sdrf-builder.html | grep -E '(add|remove|toggle)'Repository: bigbio/proteomics-sample-metadata
Length of output: 125
Convert "Why does this matter?" toggles to accessible button elements.
The current <div onclick> pattern is keyboard-inaccessible and provides no expanded/collapsed state to assistive technology. Replace with <button type="button"> elements using aria-controls and aria-expanded attributes, and update toggleWhy() to toggle aria-expanded on the button as well as the open class on the content.
This pattern appears at lines 44–46, 66–68, 86–88, 106–108, and 166–168.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@site/sdrf-builder.html` around lines 44 - 46, The clickable toggles use divs
with onclick which are not keyboard accessible and lack ARIA state; replace each
<div class="why-toggle" onclick="toggleWhy('why-technology')"> (and the repeats
for other why-* ids) with <button type="button" class="why-toggle"
aria-controls="why-technology" aria-expanded="false"> and update the
toggleWhy(id) function to accept the button element or to query the button by
[aria-controls="id"], toggle the content element's "open" class and also set the
button's aria-expanded to "true"/"false" accordingly so assistive tech can read
expanded state and keyboard users can activate the control.
| "squamous cell carcinoma", | ||
| "control" | ||
| "control", | ||
| "squamous cell carcinoma" |
There was a problem hiding this comment.
Canonicalize these disease labels before regenerating the JSON.
This file aggregates diseases by raw string, so variants like squamous cell carcinoma, medulallary carcinoma, and "'Crohn''s Disease'" become separate facet values instead of merging with the intended term. Please normalize them in the generator so the explorer and statistics stay clean.
Also applies to: 4384-4384, 4935-4935
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@site/sdrf-data.json` at line 2929, The disease strings are not being
normalized before JSON generation (examples: "squamous cell carcinoma", extra
quotes like "'Crohn''s Disease'", misspellings like "medulallary carcinoma"), so
update the generator that builds the disease facet (the code path that extracts
the disease field / buildDiseaseFacets / generateFacets function) to
canonicalize values: trim and collapse whitespace, strip surrounding
quotes/apostrophes, normalize internal doubled quotes, apply a small
spelling-correction or mapping table for known common typos (e.g., "medulallary"
-> "medullary"), and optionally run values through a normalization helper
(normalizeDiseaseLabel) before aggregating and writing site/sdrf-data.json so
variants merge into the canonical term.
| "Parkinson's disease", | ||
| "Alzheimer's disease 6", | ||
| "Alzheimer's disease/Parkinson's disease6", | ||
| "Parkinson's disease9", | ||
| "Alzheimer's disease/Parkinson's disease4", | ||
| "Alzheimer's disease0", | ||
| "Alzheimer's disease/Parkinson's disease9", | ||
| "Parkinson's disease6", | ||
| "Parkinson's disease5", | ||
| "Alzheimer's disease 7", | ||
| "Parkinson's disease0", | ||
| "control", | ||
| "Alzheimer's disease 5", | ||
| "Parkinson's disease7", | ||
| "Parkinson's disease9", | ||
| "Alzheimer's disease/Parkinson's disease0", | ||
| "Alzheimer's disease 3", | ||
| "Alzheimer's disease/Parkinson's disease5", | ||
| "Alzheimer's disease/Parkinson's disease8", | ||
| "control", | ||
| "Alzheimer's disease 4", | ||
| "control7", | ||
| "control3", | ||
| "Alzheimer's disease/Parkinson's disease0", | ||
| "control4", | ||
| "Alzheimer's disease 5", | ||
| "control0", | ||
| "control5", | ||
| "Alzheimer's disease 6", | ||
| "Parkinson's disease6", | ||
| "Parkinson's disease4", | ||
| "control9", | ||
| "Alzheimer's disease/Parkinson's disease4", | ||
| "Parkinson's disease5", | ||
| "Alzheimer's disease/Parkinson's disease", | ||
| "Parkinson's disease0", | ||
| "Alzheimer's disease/Parkinson's disease8", | ||
| "Alzheimer's disease/Parkinson's disease3", | ||
| "Alzheimer's disease/Parkinson's disease7", | ||
| "control7", | ||
| "Alzheimer's disease", | ||
| "Parkinson's disease7", | ||
| "control8", | ||
| "Parkinson's disease8", | ||
| "Parkinson's disease3", | ||
| "control6", | ||
| "Alzheimer's disease 9", | ||
| "Alzheimer's disease 8", | ||
| "Parkinson's disease", | ||
| "Alzheimer's disease 3", | ||
| "control3", | ||
| "Parkinson's disease8", | ||
| "Alzheimer's disease/Parkinson's disease6", | ||
| "control8", | ||
| "Alzheimer's disease/Parkinson's disease3" | ||
| "control6", | ||
| "Alzheimer's disease/Parkinson's disease" |
There was a problem hiding this comment.
Strip the numeric suffixes from the PXD007160 disease values.
Labels like control0, Parkinson's disease9, and Alzheimer's disease/Parkinson's disease6 look like canonical disease names with sample/replicate suffixes appended. Leaving them in datasets[].diseases will create dozens of fake disease facets and prevent the aggregate disease counts from rolling up correctly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@site/sdrf-data.json` around lines 4220 - 4255, For dataset PXD007160,
normalize datasets[].diseases by stripping trailing numeric suffixes (and any
preceding space) from each disease string (e.g., "control0" -> "control",
"Alzheimer's disease 6" -> "Alzheimer's disease", "Alzheimer's
disease/Parkinson's disease6" -> "Alzheimer's disease/Parkinson's disease"),
then deduplicate the array so only canonical disease names remain; update the
diseases array in the PXD007160 entry (function/field: datasets[].diseases)
accordingly.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
llms.txt (1)
84-90:⚠️ Potential issue | 🟠 MajorRemove or fix broken external tool links.
Two of the external URLs are inaccessible:
https://lessdrf.streamlit.app/(lesSDRF)https://www.maxquant.org/(MaxQuant)These should be removed or replaced with current URLs, consistent with the recent fix for stale repository URLs mentioned in the changelog.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@llms.txt` around lines 84 - 90, Remove or update the two broken external tool links for lesSDRF and MaxQuant: locate the entries containing "https://lessdrf.streamlit.app/" (lesSDRF) and "https://www.maxquant.org/" (MaxQuant) and either remove those lines or replace them with current, working URLs for lesSDRF and MaxQuant; ensure the list formatting remains consistent with the other entries and reflect the same treatment you applied to stale repository URLs in the changelog.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@llms.txt`:
- Around line 84-90: Remove or update the two broken external tool links for
lesSDRF and MaxQuant: locate the entries containing
"https://lessdrf.streamlit.app/" (lesSDRF) and "https://www.maxquant.org/"
(MaxQuant) and either remove those lines or replace them with current, working
URLs for lesSDRF and MaxQuant; ensure the list formatting remains consistent
with the other entries and reflect the same treatment you applied to stale
repository URLs in the changelog.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7a03e3eb-a6aa-4cfe-878a-4e8466c09e93
⛔ Files ignored due to path filters (1)
psi-document/sdrf-proteomics-specification-v1.1.0-dev.pdfis excluded by!**/*.pdf
📒 Files selected for processing (2)
CHANGELOG.mdllms.txt
Summary by CodeRabbit
New Features
Documentation
Chores
Data