Skip to content

Conversation

@ShigrafS
Copy link

Fixes: #209
Original Issues: #135

Related Issues:

Overview

This pull request adds input validation for column indices in the cooler cload command to catch out-of-bounds values early and provide clear error messages. Previously, if a user specified a column index larger than the number of columns in the file, the program failed with an unhelpful ValueError: No objects to concatenate traceback. This update prevents that by checking the indices against the file’s actual column count, raising a detailed error if invalid indices are detected.

What Was Happening Before?

  • The Problem:
    When users passed column indices beyond the number of available columns (e.g., -c2 4 -p2 5 for a file with only 4 columns), pandas failed silently during concatenation, leading to a vague traceback.

  • Example Error:

    ValueError: No objects to concatenate  
    

    This error offered no guidance on the actual issue, making debugging frustrating.

What’s Changed?

This update introduces proactive validation of column indices and streamlines file handling for better error reporting and compatibility with standard input.

  1. Input Validation (validate_pairs_columns)

    • New function in cload.py to validate column indices against the actual number of columns in the input file.
    • Supports both file and stdin input: reads headers, counts columns, checks indices, and rewinds the stream for downstream use.
    • Raises a ValueError with a clear message if any index is out of bounds.
  2. Header Handling (get_header)

    • Rewritten to remove peek() dependency (which broke with StringIO during tests).
    • Uses readline() to handle headers, buffers remaining content, and returns a new stream if needed.
  3. Updated pairs Command

    • Replaced direct file/stdin handling with validate_pairs_columns, centralizing validation logic.
    • Removed redundant get_header calls and fixed potential double-read issues with stdin.
    • Added a safe kwargs = {} before pd.read_csv to prevent Click option leakage causing TypeError.

How It Works Now

  • Valid Case:

    cooler cload -c1 1 -p1 2 -c2 3 -p2 4 file.txt  
    

    ✅ Passes and proceeds with loading.

  • Invalid Column Index:

    cooler cload -c1 1 -p1 2 -c2 4 -p2 5 file.txt  
    

    ❌ Fails early with:

    ValueError: Invalid column index: 5. The input file has only 4 columns.  
    
  • Empty or Header-Only File:
    ❌ Fails with:

    ValueError: Input file is empty or contains only headers.  
    

Benefits

  • Early Error Detection: Invalid indices trigger an immediate, informative failure.
  • Better Stdin Support: Stream rewinding and StringIO improve compatibility for piped input and testing.
  • Cleaner Code: Consolidates file validation logic and removes redundant calls.

Testing

  • New Tests (7 cases in tests/test_cload.py)

    • Valid and invalid indices (file and stdin).
    • Header skipping and empty file handling.
    • Edge cases like extra fields and minimal input.
  • Coverage:

    • 69% for cload.py, with untested lines only in unrelated subcommands.
    • Ran with pytest and passed all tests:
      pytest tests/test_cload.py -v --no-cov  
      

Notes

  • Ready for review!
  • Coverage reports in htmlcov/ and coverage.xml.
  • Edge cases like mismatched field names or trailing tabs are logged for future improvements.

ShigrafS and others added 13 commits February 26, 2025 14:18
Co-authored-by: Nezar Abdennur <[email protected]>
- Added validate_pairs_columns to check column indices against file content.

- Updated get_header to use readline() instead of peek() for test compatibility.

- Modified pairs to use validate_pairs_columns stream, added kwargs = {}, removed duplicate call.
- Created 7 pytest cases in test_cload.py to test valid/invalid indices, stdin, headers, empty files, and extra fields.

- Ensures validate_pairs_columns and pairs handle file and stdin inputs correctly.
@vedatonuryilmaz vedatonuryilmaz requested a review from nvictus March 17, 2025 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants