Description
Is your feature request related to a problem or challenge?
Tracking ticket for next release, also a place to track desired inclusions
Previous release was #16235 (July 2025), so the next major release would be late Aug / early Sep 2025
Steps:
- Create a `branch-50` branch:
- Create blog post:
- Update version and changelog:
- Test with DataFusion Python: Prepare for DF50 datafusion-python#1231
- Test with DataFusion Comet chore: prepare for DataFusion 50 datafusion-comet#2286
- Test with delta.rs: feat: update to DataFusion 50, pyo3 24, pyo3-arrow 0.11 delta-io/delta-rs#3749
- Test with vortex: Bump DataFusion to 50 and arrow to 56 vortex-data/vortex#4577
- Test with iceberg-rust:
- Test with LakeSail:
- Write upgrade guide
- Test with parquet viewer
- Test with datafusion-materialized-views
- Voting Thread:
- Create ticket for next release:
TODOs
Prior release tickets:
Related
- NYC meetup: DISCUSSION: DataFusion Meetup in New York, NY, USA - Sep 15, 2025 #16265
- Boston meetup: DISCUSSION: DataFusion Meetup in Boston, USA - Nov 12, 2025 #16703
Changes to add to upgrade guide
Features to mention in the blog (if they make it)
- [EPIC] A collection of tickets for improving sorting larger than memory datasets / spilling sorts #15271
- feat: change Expr OuterReferenceColumn and Alias to Box type for reducing expr struct size #16771
- A complete solution for stable and safe sort with spill #14692 / feat: add multi level merge sort that will always fit in memory #15700 from @rluvaton
- [Epic] Enable parquet metadata cache by default #17000 @nuno-faria
- Add `SessionConfig` reference to `ScalarFunctionArgs` #13519 @Omega359
- feat: implement QUALIFY clause #16933 from @haohuaijin
- Add a fine grain memory usage tracking / break down of memory usage *within* each operator #16904 from @kosiew / @rluvaton
- Support `WHERE`, `ORDER BY`, `LIMIT`, `SELECT`, `EXTEND` pipe operators #17278 from @simonvandel
- Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) #7955 from @adriangb
Performance
- Rewrite Nested Loop Join executor for 5× speed and 1% memory usage #16996 from @2010YOUY01
- Push Dynamic Join Predicates into Scan ("Sideways Information Passing", etc) #7955 from @adriangb
- Only update TopK dynamic filters if the new ones are more selective #16433 from @adriangb
Bugs that need to be fixed
- Regression: projection pushdown doesn't work as expected in DF50 #17513
- Nested Loop Join: Performance Regression in DataFusion 50 for Suboptimal Join Orderings #17488
- Shared `DynamicFilterPhysicalExpr` causes recursive queries to fail #16998
- `CooperativeExec` incorrectly implements `maintains_input_order` #16994
- [Bug] Aggregate + TopK fails when asc = false #16837
- Dynamic Filter Pushdown causes JOIN to return incorrect results #17188
- regression: inlist deserialization error #17225
Bugs that would be good to fix / investigate
- application of simple optimizer rule produces incorrect results (DF 49 regression) #17510
- to_timestamp(double) gives different results depending on scalar/vectorized call context #16678
- Exponential planning time when window function is partitioned by multiple columns #17401
- Streaming Aggregate operator not being used in deduplication of pre-sorted Parquet files #16919
- Different result of double to timestamp(9) cast when source value is constant #16636
- Different result of decimal to timestamp cast when source value is constant #16531
- Physical plan pushdown for volatile predicates #16545
Community Wishlist
AdamGS