@bscisel bscisel commented May 30, 2025

No description provided.

bscisel added 6 commits May 20, 2025 19:22
- Updated `quality_stats.py` to support the FASTQ file format in the base sequence quality function.
- Refactored the scanning and frame functions to use the new UDAF for base sequence quality calculations.
- Implemented a new UDAF in `udaf.rs` to compute quality score statistics, including average, median, and quartiles.
- Modified `context.rs` to register the new UDAF for use in SQL queries.
- Adjusted `operation.rs` to execute the new SQL query for base sequence quality analysis.
- Added deregistration functionality for tables in `scan.rs`.
- Ensured compatibility with FASTQ format in input handling.
- Updated `base_sequence_quality` function to accept a quality scores column and output type.
- Introduced `BaseSequenceQualityProvider` and `BaseSequenceQualityExec` in Rust for efficient execution plans.
- Removed the custom UDAF for quality scores and replaced it with a DataFusion table provider.
- Simplified data handling by directly using DataFrames from DataFusion.
- Cleaned up unnecessary code and files related to UDAF implementation.
- Enhanced error handling and type checking for input data.
- Added `SequenceQualityHistogramProvider` and `SequenceQualityHistogramExec` to compute quality histograms from sequence data.
- Introduced `QuantileStatsTableProvider` and `QuantileStatsExec` for calculating quantile statistics based on histogram data.
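
For readers unfamiliar with the histogram-then-quantile approach named in the last two commits, here is a minimal Python sketch of the general idea. It is not the Rust code from this PR: the function names, the Phred+33 offset, and the nearest-rank quantile rule are all assumptions chosen for illustration, and the actual providers may bin, interpolate, or handle edge cases differently.

```python
from collections import Counter

# Assumption: quality strings use the Phred+33 (Sanger / Illumina 1.8+) encoding.
PHRED_OFFSET = 33


def quality_histograms(quality_strings):
    """Build one histogram (Counter of Phred scores) per read position."""
    histograms = []
    for qual in quality_strings:
        for pos, ch in enumerate(qual):
            if pos == len(histograms):
                # First read long enough to reach this position.
                histograms.append(Counter())
            histograms[pos][ord(ch) - PHRED_OFFSET] += 1
    return histograms


def quantile_from_histogram(hist, q):
    """Nearest-rank quantile: smallest score whose cumulative count reaches q * total."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    threshold = q * total
    cumulative = 0
    for score in sorted(hist):
        cumulative += hist[score]
        if cumulative >= threshold:
            return score
    return max(hist)


def position_stats(hist):
    """Average, quartiles, and median for one position, computed from its histogram."""
    total = sum(hist.values())
    return {
        "avg": sum(score * count for score, count in hist.items()) / total,
        "q1": quantile_from_histogram(hist, 0.25),
        "median": quantile_from_histogram(hist, 0.50),
        "q3": quantile_from_histogram(hist, 0.75),
    }


if __name__ == "__main__":
    # Hypothetical quality strings: 'I' decodes to Phred 40, '5' to Phred 20.
    reads = ["IIII", "I5I", "55"]
    for pos, hist in enumerate(quality_histograms(reads)):
        print(pos, position_stats(hist))
```

Splitting the work this way means the per-position histograms can be accumulated in a single pass over the reads, and the quantile step only needs the (small) histograms rather than the raw quality strings.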