Fix blacklist BED file parsing in cooler balance CLI (Fixes #196) #462
PR Description:
Fixes: #209
Original Issues: #196
Overview
This PR fixes issue [#196](#196) by improving how blacklist BED files are parsed in the `cooler balance` CLI. Previously, single-line BED files, files with metadata headers (`track=`), and empty blacklist files were not handled correctly, leading to parsing errors and crashes.

The key fixes include:
- Parsing blacklist files with `bioframe.read_table`.
- Skipping `track=` headers in blacklist files.
- Preventing `np.concatenate` errors on empty blacklists.
- Setting `dtype="float64"` for HDF5 weights to handle `NaN` values.

Changes Made
✅ Replaced custom blacklist parsing with `bioframe.read_table`
- Removes the reliance on `csv.Sniffer` assumptions.
- `bioframe.read_table` now ensures proper BED parsing.

✅ Added header skipping for `track=` metadata lines
- If the file begins with a line starting with `track=`, it is skipped before parsing.

✅ Handled empty blacklist files (`""`) gracefully
- Prevents `np.concatenate` errors by checking for empty results.

✅ Fixed indentation issue in HDF5 weight storage
- Ensures `create_dataset` operations remain within the `with h5py.File(...)` block.
- Prevents `KeyError` when writing to HDF5 files.

✅ Explicitly set `dtype="float64"` in HDF5 options
- Prevents `ValueError: cannot convert float NaN to integer`.

✅ Adjusted test parameters for better stability
- Updated `test_balancing_with_blacklist` parameters (`tol=0.1`, `max-iters=1000`, `nproc=1`).

Impact
- Single-line, empty, and header-bearing (`track=`) blacklist files are now handled without errors.
- Uses `bioframe` for BED file handling.

Closes:
Fixes #196
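
For reviewers who want to see the three blacklist cases side by side, here is a minimal sketch of the intended parsing behavior. It uses only the standard library instead of `bioframe.read_table`, and `read_blacklist` is a hypothetical helper written for illustration, not the code in this PR. It assumes whitespace-separated BED columns and treats any line starting with `track` as metadata.

```python
import io


def read_blacklist(handle):
    """Parse BED-like intervals from a text handle.

    Hypothetical helper for illustration only; returns a list of
    (chrom, start, end) tuples and an empty list for an empty file.
    """
    intervals = []
    for raw in handle:
        line = raw.strip()
        # Skip blank lines and metadata headers such as "track=...".
        if not line or line.startswith("track"):
            continue
        fields = line.split()
        if len(fields) < 3:
            raise ValueError(f"expected at least 3 BED columns, got: {line!r}")
        # BED coordinates are integers in columns 2 and 3.
        intervals.append((fields[0], int(fields[1]), int(fields[2])))
    return intervals


# Single-line file: one interval, no header.
single = read_blacklist(io.StringIO("chr1\t1000\t2000\n"))

# File with a metadata header: the header is skipped.
with_header = read_blacklist(
    io.StringIO("track=blacklist\nchr1\t1000\t2000\nchr2\t0\t500\n")
)

# Empty file: nothing to mask, so callers can skip any concatenation step.
empty = read_blacklist(io.StringIO(""))
```

Returning an empty list for an empty file lets the caller check `if not intervals` before combining masked bins, which is the kind of guard the `np.concatenate` fix describes.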