Add single precision hints to mapping and splitting stages#669
Add single precision hints to mapping and splitting stages#669olivialynn wants to merge 8 commits intomainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #669 +/- ##
==========================================
+ Coverage 93.72% 93.78% +0.06%
==========================================
Files 33 33
Lines 2039 2060 +21
==========================================
+ Hits 1911 1932 +21
Misses 128 128 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Pull request overview
This pull request adds warning functionality to detect when single-precision (float32) coordinates are used for ra/dec columns during the mapping and splitting stages of catalog import. This addresses issue #653, which requested better hints when coordinate precision issues lead to incorrect pixel assignments.
Changes:
- Added
_warn_if_not_double_precision_columnsfunction to check coordinate column precision for both pandas and pyarrow data types - Refactored pixel mapping logic into
_map_chunk_to_pixelshelper function - Integrated precision checking into
_iterate_input_fileto warn on first data chunk - Added unit tests for warning behavior in both map and split stages
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/hats_import/catalog/map_reduce.py | Added precision checking functions, refactored mapping logic, and integrated warnings into the iteration pipeline |
| src/hats_import/catalog/file_readers/input_reader.py | Updated docstring to clarify that read method can return both pandas DataFrames and pyarrow Tables |
| tests/hats_import/catalog/test_map_reduce.py | Added tests for single precision warnings in both mapping and splitting stages |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| first_iteration = False | ||
| for chunk_number, data in enumerate(file_reader.read(input_file, read_columns=read_columns)): | ||
| if use_healpix_29: | ||
| if isinstance(data, pd.DataFrame) and data.index.name == SPATIAL_INDEX_COLUMN: | ||
| mapped_pixels = spatial_index_to_healpix(data.index, target_order=highest_order) | ||
| else: | ||
| mapped_pixels = spatial_index_to_healpix( | ||
| data[SPATIAL_INDEX_COLUMN], target_order=highest_order | ||
| ) | ||
| else: | ||
| # Set up the pixel data | ||
| if isinstance(data, pd.DataFrame): | ||
| mapped_pixels = hp.radec2pix( | ||
| highest_order, | ||
| data[ra_column].to_numpy(copy=False, dtype=float), | ||
| data[dec_column].to_numpy(copy=False, dtype=float), | ||
| ) | ||
| else: | ||
| mapped_pixels = hp.radec2pix( | ||
| highest_order, | ||
| data[ra_column].to_numpy(), | ||
| data[dec_column].to_numpy(), | ||
| ) | ||
| if not first_iteration and not use_healpix_29: | ||
| # Only check for single-precision warning on the first chunk to | ||
| # avoid redundant warnings in large files. | ||
| _warn_if_not_double_precision_columns(data, [ra_column, dec_column]) | ||
| first_iteration = True |
There was a problem hiding this comment.
The variable first_iteration is initialized to False on line 113, but the condition on line 115 checks if not first_iteration. This means the condition will be True only on iterations after the first one (when first_iteration becomes True), which is the opposite of the intended behavior. The warning will never be raised on the first chunk as intended by the comment.
To fix this, either:
- Initialize
first_iteration = Trueand keep the condition asif not first_iteration and not use_healpix_29: - Or, change the condition to
if first_iteration and not use_healpix_29:and set it toFalseafter the check
There was a problem hiding this comment.
Hm. Almost, but this does bring my attention to the fact that I renamed this loop guard from the ai-suggested warned variable name, and now the current truth values don't really reflect the new name.
I'm switching this to start as True, then get switch to False. Actual functionality will remain unchanged.
Change Description
Closes #653
Solution Description
Code Quality