
Conversation


@ypriverol ypriverol commented Jan 28, 2026

Summary by CodeRabbit

  • New Features

    • Added SDRF metadata enrichment functionality to augment features with condition, biological replicate, and fraction information.
  • Improvements

    • Enhanced technical replicate extraction with robust multi-step fallback approach for handling various naming conventions.
    • Improved handling of wide-format parquet data with automatic unnesting of nested intensities format.



coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

The PR introduces SDRF metadata enrichment to the peptide normalization module. Changes include robust fallback logic for technical replicate extraction, a new enrich_with_sdrf method, DuckDB view restructuring for unnested intensities, integration of SDRF enrichment into normalization workflows, and comprehensive test suite updates for wide-format data handling.

Changes

Git Ignore (.gitignore)
Added test output directories and poetry.lock to ignored paths. No functional changes.

SDRF Enrichment & Feature Expansion (mokume/normalization/peptide.py)
Added new public method enrich_with_sdrf(sdrf_path: str) that loads SDRF metadata, derives condition/biological_replicate/fraction mappings, and registers them in DuckDB. Restructured view creation in Feature.__init__ to produce an unnested intensities view. Enhanced apply_initial_filtering with three-tier fallback logic for TECHREPLICATE extraction (last underscore segment → int conversion → sequential index mapping; see the sketch after this table). Updated get_unique_tec_reps with expanded docstring and multi-step parsing. Integrated SDRF enrichment call in peptide_normalization.

Test Suite Restructuring (tests/test_peptide_normalize.py)
Removed original test_feature_assembly and introduced comprehensive test class suite: TestFeatureWideFormat, TestPeptideNormalizationWideFormat, TestPeptideNormalizationWithFilterConfig. Added tests for wide-format parquet loading, SDRF enrichment, median maps, low-frequency peptide filtering, and normalization scenarios. Updated fixture paths to reference feature_wide.parquet and PXD020192.sdrf.tsv.
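
For illustration, here is a minimal sketch of the three-tier TECHREPLICATE fallback described above, assuming plain pandas; the PR's actual code appears in the review comments below.

import pandas as pd

def extract_tech_replicate(runs: pd.Series) -> pd.Series:
    """Three-tier fallback: underscore suffix -> direct int -> sequential index."""
    runs = runs.astype(str)
    try:
        if runs.str.contains("_").all():
            # Tier 1: take the last underscore-separated segment ("S1_Brain_2" -> 2)
            return runs.str.split("_").str.get(-1).astype(int)
        # Tier 2: the whole run name is numeric ("3" -> 3)
        return runs.astype(int)
    except (ValueError, TypeError):
        # Tier 3: arbitrary names map to a sequential 1..N index
        mapping = {run: i + 1 for i, run in enumerate(runs.unique())}
        return runs.map(mapping)

print(extract_tech_replicate(pd.Series(["S1_Brain_1", "S1_Brain_2"])).tolist())  # [1, 2]
print(extract_tech_replicate(pd.Series(["RunA", "RunB"])).tolist())              # [1, 2]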

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Feature as Feature Class
    participant DuckDB as DuckDB Engine
    participant SDRF as SDRF File
    participant View as View Layer

    User->>Feature: enrich_with_sdrf(sdrf_path)
    Feature->>SDRF: Load SDRF metadata
    SDRF-->>Feature: SDRF data
    Feature->>Feature: Derive condition/bio_rep/fraction mappings
    Feature->>DuckDB: Register SDRF mapping table
    DuckDB-->>Feature: Mapping registered
    Feature->>DuckDB: Create unnested intensities view
    DuckDB-->>Feature: Unnested view created
    Feature->>View: Recreate parquet_db view<br/>(join with SDRF)
    View-->>Feature: Enriched view ready
    Feature-->>User: SDRF enrichment complete
sequenceDiagram
    participant DataLoad as Data Load
    participant Feature as Feature.__init__
    participant DuckDB as DuckDB
    participant View as View Creation

    DataLoad->>Feature: Initialize with parquet
    Feature->>DuckDB: parquet_scan (create parquet_db_raw)
    DuckDB-->>View: Raw view available
    View->>DuckDB: UNNEST nested intensities
    DuckDB-->>Feature: parquet_db view (long-format)
    Note over Feature: Data reshaped:<br/>Nested → Unnested/Long-format
    Feature-->>DataLoad: Feature ready for processing
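Reading the two diagrams together, end-to-end usage might look like the following sketch. The Feature import path and constructor argument are assumptions; enrich_with_sdrf(sdrf_path: str) and the fixture names come from the summary above.

from mokume.normalization.peptide import Feature  # import path assumed

# __init__ scans the parquet and exposes an unnested, long-format parquet_db view
feature = Feature("tests/example/feature_wide.parquet")

# Joins condition / biological_replicate / fraction from the SDRF into the view
feature.enrich_with_sdrf("tests/example/PXD020192.sdrf.tsv")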

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 SDRF enrichment hops into view,
Unnesting data, making it new,
Three fallback paths when numbers stray,
Robust extraction saves the day!
Tests dance wide where features play. 🥕

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Title check: ❓ Inconclusive
The title 'Changes to support qpx' is vague and does not clearly convey the main changes in the pull request, which involve SDRF enrichment, wide-format data handling, and technical replicate extraction improvements.
Resolution: Use a more descriptive title that clearly indicates the primary change, such as 'Add SDRF enrichment support and wide-format data handling' or 'Support wide-format peptide normalization with SDRF metadata integration'.

✅ Passed checks (2 passed)

Description check: ✅ Passed. Check skipped: CodeRabbit's high-level summary is enabled.
Docstring coverage: ✅ Passed. Docstring coverage is 95.24%, which is sufficient; the required threshold is 80.00%.



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Fix all issues with AI agents
In `@mokume/normalization/peptide.py`:
- Around line 515-519: The CREATE VIEW call directly injects database_path into
the SQL string via format, which can break on quotes or permit SQL injection;
update the call that uses self.parquet_db.execute so the parquet_scan path is
passed as a bound parameter instead of string-formatted in the SQL (i.e. replace
the formatted "SELECT * FROM parquet_scan('{}')" usage with a parameter
placeholder and pass database_path as the parameter to execute). Ensure you
modify the code that constructs the SQL (the parquet_scan invocation inside the
CREATE VIEW statement) and run the query using the DB API's parameter binding
mechanism, then test with paths containing quotes and special chars to confirm
correct behavior.
- Around line 325-337: The code tries to use pandas .str methods on data_df[RUN]
without ensuring it's a string dtype, causing AttributeError for numeric RUNs
and bypassing the except fallback; update the block referencing data_df, RUN and
TECHREPLICATE to cast the RUN column to string first (e.g., use
data_df[RUN].astype(str) before .str.contains, .str.split, .str.get) so numeric
run names are handled correctly, keep the existing int casts/except fallback
logic intact; make the same change in the analogous block around the code that
handles lines 743-755.

In `@tests/test_peptide_normalize.py`:
- Around line 198-214: The test fails if the output parent directory doesn't
exist: before calling peptide_normalization(**args) (where args["output"] is
used and WriteCSVTask writes the file), ensure the parent directory of
Path(args["output"]) is created (e.g., out.parent.mkdir(parents=True,
exist_ok=True)) so the file can be written; apply this same fix to the other
similar test blocks in tests/test_peptide_normalize.py that construct out and
call peptide_normalization.
- Around line 116-117: The linter warnings come from unused variables in the
test loops and an unused assignment; in the loop iterating over med_map replace
the unused loop variable `sample` with `_`, similarly replace the unused
`condition` loop variable with `_`, and remove the unused `samples` assignment
(or stop assigning to it) so only used variables remain; update occurrences near
the med_map loop and the two other loops that reference `condition` and
`samples` in tests/test_peptide_normalize.py (keep the loop structure and
assertions intact).

Comment on lines +325 to +337
        # Try to extract technical replicate from run name
        try:
            if data_df[RUN].str.contains("_").all():
                # Get the last part after underscore (e.g., "S1_Brain_2" -> "2")
                last_parts = data_df[RUN].str.split("_").str.get(-1)
                data_df[TECHREPLICATE] = last_parts.astype("int")
            else:
                data_df[TECHREPLICATE] = data_df[RUN].astype("int")
        except (ValueError, TypeError):
            # Fall back to using run index
            unique_runs = data_df[RUN].unique().tolist()
            run_to_index = {run: i + 1 for i, run in enumerate(unique_runs)}
            data_df[TECHREPLICATE] = data_df[RUN].map(run_to_index)

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the peptide.py file
find . -type f -name "peptide.py" | head -20

Repository: bigbio/mokume

Length of output: 132


🏁 Script executed:

# Read the file around lines 325-337
cat -n mokume/normalization/peptide.py | sed -n '315,345p'

Repository: bigbio/mokume

Length of output: 1567


🏁 Script executed:

# Also read the file around lines 743-755
cat -n mokume/normalization/peptide.py | sed -n '735,765p'

Repository: bigbio/mokume

Length of output: 1664


🏁 Script executed:

# Check what RUN and TECHREPLICATE are (likely constants)
rg "^RUN\s*=" mokume/normalization/peptide.py | head -5
rg "^TECHREPLICATE\s*=" mokume/normalization/peptide.py | head -5

Repository: bigbio/mokume

Length of output: 39


🏁 Script executed:

# Search for constant definitions
rg "(RUN|TECHREPLICATE)" mokume/normalization/peptide.py | grep -E "(^[^:]*:\s*(RUN|TECHREPLICATE)\s*=)" | head -10

Repository: bigbio/mokume

Length of output: 39


🏁 Script executed:

# Check imports and constants at the top of the file
head -100 mokume/normalization/peptide.py | cat -n

Repository: bigbio/mokume

Length of output: 3749


🏁 Script executed:

# Let's create a test to understand the behavior
python3 << 'EOF'
import pandas as pd

# Test 1: str.contains() on numeric dtype
df_numeric = pd.DataFrame({'col': [1, 2, 3]})
try:
    result = df_numeric['col'].str.contains("_")
    print(f"Test 1 result: {result}")
except AttributeError as e:
    print(f"Test 1 - AttributeError: {e}")
except Exception as e:
    print(f"Test 1 - Other exception ({type(e).__name__}): {e}")

# Test 2: str.contains() on string dtype
df_string = pd.DataFrame({'col': ['A_1', 'B_2', 'C_3']})
try:
    result = df_string['col'].str.contains("_")
    print(f"Test 2 result:\n{result.tolist()}")
except Exception as e:
    print(f"Test 2 - Exception: {e}")

# Test 3: Conversion and then str.contains()
df_numeric = pd.DataFrame({'col': [1, 2, 3]})
try:
    result = df_numeric['col'].astype(str).str.contains("_")
    print(f"Test 3 result:\n{result.tolist()}")
except Exception as e:
    print(f"Test 3 - Exception: {e}")

# Test 4: What happens with mixed data (strings with numbers)
df_mixed = pd.DataFrame({'col': ['S1_Brain_2', 'S2_Brain_3', 123]})
try:
    result = df_mixed['col'].str.contains("_")
    print(f"Test 4 result:\n{result.tolist()}")
except Exception as e:
    print(f"Test 4 - Exception ({type(e).__name__}): {e}")
EOF

Repository: bigbio/mokume

Length of output: 143


🏁 Script executed:

# Simpler test with a timeout and smaller scope
timeout 30 python3 -c "
import pandas as pd
# Test what exception is raised on numeric dtype
df = pd.DataFrame({'col': [1, 2, 3]})
try:
    df['col'].str.contains('_')
except AttributeError:
    print('AttributeError')
except Exception as e:
    print(type(e).__name__)
"

Repository: bigbio/mokume

Length of output: 104


🏁 Script executed:

# Check what happens when RUN column contains numeric values
# Let's look at the data loading code to understand the dtype
rg -A 10 -B 5 "read_csv|read_parquet" mokume/normalization/peptide.py | head -50

Repository: bigbio/mokume

Length of output: 2091


🏁 Script executed:

# Look at error handling around the RUN field
rg -B 5 -A 5 "RUN\]" mokume/normalization/peptide.py | head -80

Repository: bigbio/mokume

Length of output: 1698


🏁 Script executed:

# Check how the RUN column is loaded/determined
rg -B 10 -A 10 "data_df\s*=.*read" mokume/normalization/peptide.py | head -60

Repository: bigbio/mokume

Length of output: 39


🏁 Script executed:

# Search for any type conversion or dtype specification for RUN
rg "RUN.*dtype|astype.*RUN" mokume/normalization/peptide.py

Repository: bigbio/mokume

Length of output: 39


🏁 Script executed:

# Check if there are any tests or examples showing RUN with numeric values
find . -name "*.py" -type f -exec rg -l "RUN.*=.*[0-9]" {} \; | head -10

Repository: bigbio/mokume

Length of output: 39


🏁 Script executed:

# Verify pandas behavior with simple inline Python
python3 << 'EOF'
import pandas as pd
import sys

# Test .str accessor on numeric dtype
try:
    df = pd.DataFrame({'col': [1, 2, 3]})
    result = df['col'].str.contains("_")
    print("No exception raised")
except AttributeError as e:
    print(f"AttributeError: {str(e)[:80]}")
except Exception as e:
    print(f"Other {type(e).__name__}: {str(e)[:80]}")

# Test conversion approach
try:
    df = pd.DataFrame({'col': [1, 2, 3]})
    result = df['col'].astype(str).str.contains("_")
    print(f"After astype(str): Success - {result.tolist()}")
except Exception as e:
    print(f"After astype(str): {type(e).__name__}")
EOF

Repository: bigbio/mokume

Length of output: 175


🏁 Script executed:

# Let's check the data flow more carefully - look at where data_df is created
rg -B 30 "data_df\s*=" mokume/normalization/peptide.py | grep -A 30 "def peptide_normalization"

Repository: bigbio/mokume

Length of output: 39


🏁 Script executed:

# Check function signature and early operations
sed -n '250,330p' mokume/normalization/peptide.py

Repository: bigbio/mokume

Length of output: 2915


🏁 Script executed:

# Look for any test files that might show what RUN values look like
find . -name "*test*.py" -type f | xargs rg "RUN" | head -20

Repository: bigbio/mokume

Length of output: 414


🏁 Script executed:

# Check if there's any documentation about the expected RUN format
rg "RUN" mokume/core/constants.py -B 3 -A 3

Repository: bigbio/mokume

Length of output: 580


🏁 Script executed:

# Let's look at the second location (743-755) more carefully
sed -n '730,760p' mokume/normalization/peptide.py

Repository: bigbio/mokume

Length of output: 1440


🏁 Script executed:

# Check if there are any integration tests or actual data examples
find . -name "*.parquet" -o -name "*.csv" | head -5

Repository: bigbio/mokume

Length of output: 228


🏁 Script executed:

# Verify pandas behavior documentation by looking at existing error handling
rg "AttributeError" mokume/ | head -10

Repository: bigbio/mokume

Length of output: 268


🏁 Script executed:

# Check what column type the parquet_db returns for "run"
rg "parquet_db.*sql" mokume/normalization/peptide.py -B 2 -A 5

Repository: bigbio/mokume

Length of output: 3497


Add string casting before .str accessor to handle numeric RUN dtypes.

If the RUN column is numeric, calling .str.contains("_") raises AttributeError instead of ValueError/TypeError, which bypasses the fallback logic. Cast to string first to ensure the code handles numeric runs as documented (case 2 in the docstring: "If run name is numeric, use it directly").

🛠️ Proposed fix
@@
-        if data_df[RUN].str.contains("_").all():
+        run_series = data_df[RUN].astype(str)
+        if run_series.str.contains("_").all():
             # Get the last part after underscore (e.g., "S1_Brain_2" -> "2")
-            last_parts = data_df[RUN].str.split("_").str.get(-1)
-            data_df[TECHREPLICATE] = last_parts.astype("int")
+            last_parts = run_series.str.split("_").str.get(-1)
+            data_df[TECHREPLICATE] = pd.to_numeric(last_parts, errors="raise").astype("int")
         else:
-            data_df[TECHREPLICATE] = data_df[RUN].astype("int")
+            data_df[TECHREPLICATE] = pd.to_numeric(data_df[RUN], errors="raise").astype("int")
@@
-            if unique["run"].str.contains("_").all():
+            run_series = unique["run"].astype(str)
+            if run_series.str.contains("_").all():
                 # Get the last part after splitting by underscore
-                last_parts = unique["run"].str.split("_").str.get(-1)
-                unique["run"] = last_parts.astype("int")
+                last_parts = run_series.str.split("_").str.get(-1)
+                unique["run"] = pd.to_numeric(last_parts, errors="raise").astype("int")
             else:
                 # Try to convert directly to int
-                unique["run"] = unique["run"].astype("int")
+                unique["run"] = pd.to_numeric(unique["run"], errors="raise").astype("int")

Also applies to: 743-755
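
The failure mode is easy to reproduce in isolation; here is a standalone demonstration of why the cast matters (not the PR's code):

import pandas as pd

runs = pd.Series([1, 2, 3])  # numeric run names

try:
    runs.str.contains("_")
except AttributeError as exc:
    # .str on a numeric Series raises AttributeError, which the
    # except (ValueError, TypeError) fallback never catches
    print(f"AttributeError: {exc}")

# Casting first keeps the check well-defined for any dtype
print(runs.astype(str).str.contains("_").all())  # False -> direct int conversion path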


Comment on lines +515 to +519
        # Create raw view from parquet
        self.parquet_db.execute(
            "CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan('{}')".format(
                database_path
            )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n mokume/normalization/peptide.py | sed -n '510,525p'

Repository: bigbio/mokume

Length of output: 813


🌐 Web query:

DuckDB parquet_scan parameterized queries bound parameters

💡 Result:

DuckDB supports bound parameters in prepared statements using ?, positional $1/$2/..., or named $param placeholders. You can use these placeholders in normal predicates and as arguments to table functions like parquet_scan(...) (e.g., binding the file path / glob at execution time). [1]

SQL (CLI / SQL API)

PREPARE q AS
SELECT *
FROM parquet_scan($1)
WHERE some_col >= $2;

EXECUTE q('data/**/*.parquet', 10);

DuckDB also supports named parameters in prepared statements. [1]

Python (DB-API)

DuckDB’s Python execute() accepts a parameters= argument and will run the statement as a prepared statement with bound values. [2][3]

Positional (? or $1):

import duckdb

con = duckdb.connect()

path = "data/**/*.parquet"
min_val = 10

df = con.execute(
    "SELECT * FROM parquet_scan(?) WHERE some_col >= ?",
    [path, min_val],
).fetch_df()

Named ($param):

df = con.execute(
    "SELECT * FROM parquet_scan($path) WHERE some_col >= $min_val",
    {"path": path, "min_val": min_val},
).fetch_df()

Important limitation

Bind parameters substitute values, not identifiers—so you can’t parameterize things like column names or ORDER BY <identifier> directly; for those you must construct SQL text (carefully) or use other approaches. [1]

Sources: DuckDB prepared statements syntax (SQL) [1]; DuckDB Python DB-API parameter binding [2]; DuckDB Python execute(query, parameters=None) reference [3].


Parameterize parquet_scan to avoid SQL injection and quoting issues.

Direct string formatting with a user-supplied path can break on quotes and enable SQL injection. Use bound parameters instead.

🔒 Proposed fix
-            self.parquet_db.execute(
-                "CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan('{}')".format(
-                    database_path
-                )
-            )
+            self.parquet_db.execute(
+                "CREATE VIEW parquet_db_raw AS SELECT * FROM parquet_scan(?)",
+                [database_path],
+            )
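If binding a parameter inside CREATE VIEW proves awkward in practice, another option is duckdb's Python relation API, which takes the path as a plain Python argument. A sketch (method availability depends on the duckdb version, and the path below is illustrative):

import duckdb

con = duckdb.connect()
database_path = "tests/example/feature_wide.parquet"  # illustrative path

# read_parquet receives the path outside any SQL string, so quoting and
# injection concerns never arise
rel = con.read_parquet(database_path)
rel.create_view("parquet_db_raw")

print(con.execute("SELECT COUNT(*) FROM parquet_db_raw").fetchone())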
🧰 Tools
🪛 Ruff (0.14.14)
517-519: Possible SQL injection vector through string-based query construction (S608)


Comment on lines +116 to +117
        for sample, factor in med_map.items():
            assert factor > 0

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

fd test_peptide_normalize.py

Repository: bigbio/mokume

Length of output: 88


🏁 Script executed:

#!/bin/bash
# Check if the file exists and show lines around 116-117, 152-153, 161-162
if [ -f tests/test_peptide_normalize.py ]; then
  echo "=== Lines 110-125 (around 116-117) ==="
  sed -n '110,125p' tests/test_peptide_normalize.py | cat -n
  echo ""
  echo "=== Lines 145-160 (around 152-153) ==="
  sed -n '145,160p' tests/test_peptide_normalize.py | cat -n
  echo ""
  echo "=== Lines 155-170 (around 161-162) ==="
  sed -n '155,170p' tests/test_peptide_normalize.py | cat -n
else
  echo "File not found"
fi

Repository: bigbio/mokume

Length of output: 2494


🏁 Script executed:

# Search for usage of variables 'sample', 'condition', 'samples' in the context of the flagged lines
cat tests/test_peptide_normalize.py | head -165 | tail -60

Repository: bigbio/mokume

Length of output: 2495


Remove unused variables that trigger linter warnings.

Three locations have loop or assignment variables that are never referenced:

  • Line 116: sample parameter is unused in the for loop
  • Line 152: condition parameter is unused in the for loop
  • Line 161: samples variable is assigned but never used afterward

Replace the unused loop parameters with _ and remove the unused assignment to keep code clean.

Changes needed
@@
-        for sample, factor in med_map.items():
+        for _, factor in med_map.items():
             assert factor > 0
@@
-        for condition, samples in med_map.items():
+        for _, samples in med_map.items():
             assert isinstance(samples, dict)
@@
-        samples = feature.get_unique_samples()
🧰 Tools
🪛 Ruff (0.14.14)
116-116: Loop control variable sample not used within loop body; rename unused sample to _sample (B007)


Comment on lines +198 to +214
"output": str(TESTS_DIR / "example" / "out" / "PXD020192-no-sdrf.csv"),
"skip_normalization": False,
"nmethod": "median",
"pnmethod": "globalMedian",
"log2": True,
"save_parquet": False,
}

out = Path(args["output"])
if out.exists():
out.unlink()

# Should complete without errors
peptide_normalization(**args)

# Output should exist
assert out.exists()

⚠️ Potential issue | 🟠 Major

Create the output directory before writing to avoid FileNotFoundError.
If tests/example/out doesn’t exist in a clean checkout, these tests will fail when WriteCSVTask tries to open the file.

🛠️ Proposed fix
@@
         out = Path(args["output"])
+        out.parent.mkdir(parents=True, exist_ok=True)
         if out.exists():
             out.unlink()
@@
         out = Path(args["output"])
+        out.parent.mkdir(parents=True, exist_ok=True)
         if out.exists():
             out.unlink()
@@
         out = Path(args["output"])
+        out.parent.mkdir(parents=True, exist_ok=True)
         if out.exists():
             out.unlink()

Also applies to: 230-247, 273-290
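
As a design alternative (a sketch, not part of this PR), pytest's tmp_path fixture sidesteps the missing-directory problem entirely by writing into a per-test temporary directory. The import path below is assumed, and the remaining input arguments are elided as in the snippet above:

from pathlib import Path

from mokume.normalization.peptide import peptide_normalization  # import path assumed


def test_normalization_writes_output(tmp_path: Path):
    # tmp_path already exists, so no mkdir/unlink bookkeeping is needed
    out = tmp_path / "PXD020192-no-sdrf.csv"
    args = {
        "output": str(out),
        "skip_normalization": False,
        "nmethod": "median",
        "pnmethod": "globalMedian",
        "log2": True,
        "save_parquet": False,
        # ...remaining input arguments as constructed in the original test
    }
    peptide_normalization(**args)
    assert out.exists()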


@ypriverol ypriverol merged commit 2cb9f40 into main Jan 28, 2026
3 checks passed