Releases: apache/datafusion-comet
0.2.0
DataFusion Comet 0.2.0 Changelog
This release consists of 87 commits from 14 contributors. See credits at the end of this changelog for more information.
Fixed bugs:
- fix: dictionary decimal vector optimization #705 (kazuyukitanimura)
- fix: Unsupported window expression should fall back to Spark #710 (viirya)
- fix: ReusedExchangeExec can be child operator of CometBroadcastExchangeExec #713 (viirya)
- fix: Fallback to Spark for window expression with range frame #719 (viirya)
- fix: Remove `skip.surefire.tests` mvn property #739 (wForget)
- fix: subquery execution under CometTakeOrderedAndProjectExec should not fail #748 (viirya)
- fix: skip negative scale checks for creating decimals #723 (kazuyukitanimura)
- fix: Fallback to Spark for unsupported partitioning #759 (viirya)
- fix: Unsupported types for SinglePartition should fallback to Spark #765 (viirya)
- fix: unwrap dictionaries in CreateNamedStruct #754 (andygrove)
- fix: Fallback to Spark for unsupported input besides ordering #768 (viirya)
- fix: Native window operator should be CometUnaryExec #774 (viirya)
- fix: Fallback to Spark when shuffling on struct with duplicate field name #776 (viirya)
- fix: withInfo was overwriting information in some cases #780 (andygrove)
- fix: Improve support for nested structs #800 (eejbyfeldt)
- fix: Sort on single struct should fallback to Spark #811 (viirya)
- fix: Check sort order of SortExec instead of child output #821 (viirya)
- fix: Fix panic in `avg` aggregate and disable `stddev` by default #819 (andygrove)
- fix: Support nested types in HashJoin #735 (eejbyfeldt)
Performance related:
- perf: Improve performance of CASE .. WHEN expressions #703 (andygrove)
- perf: Optimize IfExpr by delegating to CaseExpr #681 (andygrove)
- fix: optimize isNullAt #732 (kazuyukitanimura)
- perf: decimal decode improvements #727 (parthchandra)
- fix: Remove casting on decimals with a small precision to decimal256 #741 (kazuyukitanimura)
- fix: optimize some bit functions #718 (kazuyukitanimura)
- fix: Optimize getDecimal for small precision #758 (kazuyukitanimura)
- perf: add metrics to CopyExec and ScanExec #778 (andygrove)
- fix: Optimize decimal creation macros #764 (kazuyukitanimura)
- perf: Improve count aggregate performance #784 (andygrove)
- fix: Optimize read_side_padding #772 (kazuyukitanimura)
- perf: Remove some redundant copying of batches #816 (andygrove)
- perf: Remove redundant copying of batches after FilterExec #835 (andygrove)
- fix: Optimize CheckOverflow #852 (kazuyukitanimura)
- perf: Add benchmarks for Spark Scan + Comet Exec #863 (andygrove)
Implemented enhancements:
- feat: Add support for time-zone, 3 & 5 digit years: Cast from string to timestamp. #704 (akhilss99)
- feat: Support count AggregateUDF for window function #736 (huaxingao)
- feat: Implement basic version of RLIKE #734 (andygrove)
- feat: show executed native plan with metrics when in debug mode #746 (andygrove)
- feat: Add GetStructField expression #731 (Kimahriman)
- feat: Add config to enable native upper and lower string conversion #767 (andygrove)
- feat: Improve native explain #795 (andygrove)
- feat: Add support for null literal with struct type #797 (eejbyfeldt)
- feat: Optimize CreateNamedStruct preserve dictionaries #789 (eejbyfeldt)
- feat: `CreateArray` support #793 (Kimahriman)
- feat: Add native thread configs #828 (viirya)
- feat: Add specific configs for converting Spark Parquet and JSON data to Arrow #832 (andygrove)
- feat: Support sum in window function #802 (huaxingao)
- feat: Simplify configs for enabling/disabling operators #855 (andygrove)
- feat: Enable `clippy::clone_on_ref_ptr` on `proto` and `spark_exprs` crates #859 (comphead)
- feat: Enable `clippy::clone_on_ref_ptr` on `core` crate #860 (comphead)
- feat: Use CometPlugin as main entrypoint #853 (andygrove)
Documentation updates:
- doc: Update outdated spark.comet.columnar.shuffle.enabled configuration doc #738 (wForget)
- docs: Add explicit configs for enabling operators #801 (andygrove)
- doc: Document CometPlugin to start Comet in cluster mode #836 (comphead)
Other:
- chore: Make rust clippy happy #701 (Xuanwo)
- chore: Update version to 0.2.0 and add 0.1.0 changelog #696 (andygrove)
- chore: Use rust-toolchain.toml for better toolchain support #699 (Xuanwo)
- chore(native): Make sure all targets in workspace are covered by clippy #702 (Xuanwo)
- Apache DataFusion Comet Logo #697 (aocsa)
- chore: Add logo to rat exclude list #709 (andygrove)
- chore: Use new logo in README and website #724 (andygrove)
- build: Add Comet logo files into exclude list #726 (viirya)
- chore: Remove TPC-DS benchmark results #728 (andygrove)
- chore: make Cast's logic reusable for other projects #716 (Blizzara)
- chore: move scalar_funcs into spark-expr #712 (Blizzara)
- chore: Bump DataFusion to rev 35c2e7e #740 (andygrove)
- chore: add more aggregate functions to benchmark test #706 (huaxingao)
- chore: Add criterion benchmark for decimal_div #743 (andygrove)
- build: Re-enable TPCDS q72 for broadcast and hash join configs #781 (viirya)
- chore: bump DataFusion to rev f4e519f #783 (huaxingao)
- chore: Upgrade to DataFusion rev bddb641 and disable "skip partial aggregates" feature [#788](https://github.com/apa...
0.1.0
DataFusion Comet 0.1.0 Changelog
This release consists of 343 commits from 41 contributors. See credits at the end of this changelog for more information.
Implemented enhancements:
- feat: Add native shuffle and columnar shuffle #30 (viirya)
- feat: Support Emit::First for SumDecimalGroupsAccumulator #47 (viirya)
- feat: Nested map support for columnar shuffle #51 (viirya)
- feat: Support Count(Distinct) and similar aggregation functions #42 (huaxingao)
- feat: Upgrade to `jni-rs` 0.21 #50 (sunchao)
- feat: Handle exception thrown from native side #61 (sunchao)
- feat: Support InSet expression in Comet #59 (viirya)
- feat: Add `CometNativeException` for exceptions thrown from the native side #62 (sunchao)
- feat: Add cause to native exception #63 (viirya)
- feat: Pull based native execution #69 (viirya)
- feat: Add executeColumnarCollectIterator to CometExec to collect Comet operator result #71 (viirya)
- feat: Add CometBroadcastExchangeExec to support broadcasting the result of Comet native operator #80 (viirya)
- feat: Reduce memory consumption when writing sorted shuffle files #82 (sunchao)
- feat: Add struct/map as unsupported map key/value for columnar shuffle #84 (viirya)
- feat: Support multiple input sources for CometNativeExec #87 (viirya)
- feat: Date and timestamp trunc with format array #94 (parthchandra)
- feat: Support `First`/`Last` aggregate functions #97 (huaxingao)
- feat: Add support of TakeOrderedAndProjectExec in Comet #88 (viirya)
- feat: Support Binary in shuffle writer #106 (advancedxy)
- feat: Add license header by spotless:apply automatically #110 (advancedxy)
- feat: Add dictionary binary to shuffle writer #111 (viirya)
- feat: Minimize number of connections used by parallel reader #126 (parthchandra)
- feat: Support CollectLimit operator #100 (advancedxy)
- feat: Enable min/max for boolean type #165 (huaxingao)
- feat: Introduce `CometTaskMemoryManager` and native side memory pool #83 (sunchao)
- feat: Fix old style names #201 (comphead)
- feat: enable comet shuffle manager for comet shell #204 (zuston)
- feat: Support bitwise aggregate functions #197 (huaxingao)
- feat: Support BloomFilterMightContain expr #179 (advancedxy)
- feat: Support sort merge join #178 (viirya)
- feat: Support HashJoin operator #194 (viirya)
- feat: Remove use of nightly int_roundings feature #228 (psvri)
- feat: Support Broadcast HashJoin #211 (viirya)
- feat: Enable Comet broadcast by default #213 (viirya)
- feat: Add CometRowToColumnar operator #206 (advancedxy)
- feat: Document the class path / classloader issue with the shuffle manager #256 (holdenk)
- feat: Port Datafusion Covariance to Comet #234 (huaxingao)
- feat: Add manual test to calculate spark builtin functions coverage #263 (comphead)
- feat: Support ANSI mode in CAST from String to Bool #290 (andygrove)
- feat: Add extended explain info to Comet plan #255 (parthchandra)
- feat: Improve CometSortMergeJoin statistics #304 (planga82)
- feat: Add compatibility guide #316 (andygrove)
- feat: Improve CometHashJoin statistics #309 (planga82)
- feat: Support Variance #297 (huaxingao)
- feat: Support murmur3_hash and sha2 family hash functions #226 (advancedxy)
- feat: Disable cast string to timestamp by default #337 (andygrove)
- feat: Improve CometBroadcastHashJoin statistics #339 (planga82)
- feat: Implement Spark-compatible CAST from string to integral types #307 (andygrove)
- feat: Implement Spark-compatible CAST from string to timestamp types #335 (vaibhawvipul)
- feat: Implement Spark-compatible CAST float/double to string #346 (mattharder91)
- feat: Only allow incompatible cast expressions to run in comet if a config is enabled #362 (andygrove)
- feat: Implement Spark-compatible CAST between integer types #340 (ganeshkumar269)
- feat: Supports Stddev #348 (huaxingao)
- feat: Improve cast compatibility tests and docs #379 (andygrove)
- feat: Implement Spark-compatible CAST from non-integral numeric types to integral types #399 (rohitrastogi)
- feat: Implement Spark unhex #342 (tshauck)
- feat: Enable columnar shuffle by default #250 (viirya)
- feat: Implement Spark-compatible CAST from floating-point/double to decimal #384 (vaibhawvipul)
- feat: Add logging to explain reasons for Comet not being able to run a query stage natively #397 (andygrove)
- feat: Add support for TryCast expression in Spark 3.2 and 3.3 #416 (vaibhawvipul)
- feat: Supports UUID column #395 (huaxingao)
- feat: correlation support #456 (huaxingao)
- feat: Implement Spark-compatible CAST from String to Date #383 (vidyasankarv)
- feat: Add COMET_SHUFFLE_MODE config to control Comet shuffle mode #460 (viirya)
- feat: Add random row generator in data generator #451 (advancedxy)
- feat: Add xxhash64 function support #424 (advancedxy)
- feat: add hex scalar function #449 (tshauck)
- feat: Add "Comet Fuzz" fuzz-testing utility #472 (andygrove)
- feat: Use enum to represent CAST eval_mode in expr.proto #415 (prashantksharma)
- feat: Implement ANSI support for UnaryMinus #471 (vaibhawvipul)
- feat: Add specific fuzz tests for cast and try_cast and fix NPE found during fuzz testing #514 (andygrove)
- feat: Add fuzz testing for arithmetic expressions #519 (andygr...