feat: [datafusion-spark] Implement next_day function #16780

Draft · wants to merge 3 commits into main

Conversation

petern48

Which issue does this PR close?

Rationale for this change

See #16775

What changes are included in this PR?

Implement spark-compatible next_day function

Are these changes tested?

Yes, I added tests based on all of the links in the Spark Test Files README.md.

Are there any user-facing changes?

Yes, new function.

@github-actions github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and spark labels Jul 15, 2025
@petern48 petern48 changed the title from "feat: Implement next_day" to "feat: [datafusion-spark] Implement next_day function" Jul 15, 2025
@petern48 petern48 marked this pull request as ready for review July 15, 2025 04:55
impl SparkNextDay {
pub fn new() -> Self {
Self {
signature: Signature::user_defined(Volatility::Immutable),
Contributor

We can define a specific signature here, e.g. (Date32, Utf8/Utf8View/LargeUtf8). After that I think the implementation can be simplified:

  • No need to implement coerce_types(); there is code that handles type coercion automatically based on the signature.
  • We can assume the signature is valid inside invoke_with_args(), so there would be no need to check for invalid input (sanity checks like unreachable!() or returning internal errors for invalid input can still be applied). A sketch of such a signature follows below.
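As a rough illustration (not the reviewer's or the PR's code, and assuming the current datafusion_expr Signature/TypeSignature API), such a fixed signature could look like this:

    use arrow::datatypes::DataType;
    use datafusion_expr::{Signature, TypeSignature, Volatility};

    // Declare next_day as taking (Date32, string-like) argument pairs, so
    // invoke_with_args() can assume exactly these types; per the review comment
    // above, DataFusion's type handling is driven by the declared signature.
    fn next_day_signature() -> Signature {
        Signature::one_of(
            vec![
                TypeSignature::Exact(vec![DataType::Date32, DataType::Utf8]),
                TypeSignature::Exact(vec![DataType::Date32, DataType::Utf8View]),
                TypeSignature::Exact(vec![DataType::Date32, DataType::LargeUtf8]),
            ],
            Volatility::Immutable,
        )
    }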

@@ -23,5 +23,17 @@

## Original Query: SELECT next_day('2015-01-14', 'TU');
## PySpark 3.5.5 Result: {'next_day(2015-01-14, TU)': datetime.date(2015, 1, 20), 'typeof(next_day(2015-01-14, TU))': 'date', 'typeof(2015-01-14)': 'string', 'typeof(TU)': 'string'}
#query
#SELECT next_day('2015-01-14'::string, 'TU'::string);
query D
Contributor

I recommend adding tests for invalid inputs:

  1. 0 or >2 arguments
  2. Each argument can be a valid input, an invalid input of the correct type (like 2015-13-32), an invalid type, or NULL. We want to test different combinations to ensure that, for invalid inputs, the expected (and easy-to-understand) errors are returned instead of panicking.

Also, here we only checked ScalarValue inputs; let's also add tests for Array inputs.

@2010YOUY01 2010YOUY01 requested a review from Copilot July 15, 2025 06:43
@Copilot Copilot AI left a comment

Pull Request Overview

Add support for Spark’s next_day function in DataFusion by implementing the UDF and its tests, registering it in the datetime module, and adding chrono as a dependency.

  • Introduced SQLLogicTest cases for next_day
  • Implemented SparkNextDay UDF (scalar + array)
  • Registered the UDF in mod.rs and updated Cargo.toml

Reviewed Changes

Copilot reviewed 4 out of 5 changed files in this pull request and generated 1 comment.

File | Description
next_day.slt | Added functional tests for next_day with various inputs
next_day.rs | Full implementation of next_day UDF logic
mod.rs | Registered and exported next_day in datetime module
Cargo.toml | Added chrono as a workspace dependency

Comments suppressed due to low confidence (3)

datafusion/spark/src/function/datetime/next_day.rs:77

  • The code only handles Date32 inputs for the date argument, but the tests pass string dates. You need to add a branch to parse ScalarValue::Utf8/LargeUtf8 as ISO-8601 dates and convert them to Date32 before computing the next day (a sketch follows after the quoted line below).
            (ColumnarValue::Scalar(date), ColumnarValue::Scalar(day_of_week)) => {
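A minimal sketch of such a branch, assuming chrono is available; the helper name is hypothetical and not from the PR:

    use arrow::datatypes::Date32Type;
    use chrono::NaiveDate;
    use datafusion_common::ScalarValue;

    // Parse a string-typed scalar such as '2015-01-14' into a Date32 day count,
    // returning None for NULLs or strings that are not valid ISO-8601 dates.
    fn string_scalar_to_date32(value: &ScalarValue) -> Option<i32> {
        match value {
            ScalarValue::Utf8(Some(s))
            | ScalarValue::LargeUtf8(Some(s))
            | ScalarValue::Utf8View(Some(s)) => NaiveDate::parse_from_str(s, "%Y-%m-%d")
                .ok()
                .map(Date32Type::from_naive_date),
            _ => None,
        }
    }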

datafusion/sqllogictest/test_files/spark/datetime/next_day.slt:32

  • Consider adding tests for edge cases such as NULL inputs and invalid weekday strings to verify null propagation and error handling behavior.
SELECT next_day('2015-07-27'::string, 'Sun'::string);

datafusion/spark/Cargo.toml:40

  • The syntax for adding a workspace dependency differs from the other entries. Consider changing it to chrono = { workspace = true } to match them (chrono.workspace = true is valid TOML, but inconsistent with the rest of the file).
chrono.workspace = true

Comment on lines +219 to +221
fn spark_next_day(days: i32, day_of_week: &str) -> Option<i32> {
let date = Date32Type::to_naive_date(days);

Copilot AI Jul 15, 2025
[nitpick] The spark_next_day function recomputes trim().to_uppercase() and parses the weekday for each element in an array. You could pre-normalize and parse the target Weekday once outside loops for better performance on large arrays.

Suggested change:

-fn spark_next_day(days: i32, day_of_week: &str) -> Option<i32> {
-    let date = Date32Type::to_naive_date(days);
+fn spark_next_day_with_weekday(days: i32, day_of_week: Weekday) -> Option<i32> {
+    let date = Date32Type::to_naive_date(days);
+    Some(Date32Type::from_naive_date(
+        date + Duration::days(
+            (7 - date.weekday().days_since(day_of_week)) as i64,
+        ),
+    ))
+}
+fn normalize_and_parse_weekday(day_of_week: &str) -> Option<Weekday> {
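
The suggestion is cut off after the normalize_and_parse_weekday signature. A hypothetical completion (not part of the Copilot suggestion; it maps Spark's two-letter abbreviations by hand and assumes chrono's case-insensitive Weekday parser for longer names) could be:

    use chrono::Weekday;

    fn normalize_and_parse_weekday(day_of_week: &str) -> Option<Weekday> {
        let normalized = day_of_week.trim().to_uppercase();
        match normalized.as_str() {
            // Spark accepts two-letter abbreviations that chrono's parser does not.
            "MO" => Some(Weekday::Mon),
            "TU" => Some(Weekday::Tue),
            "WE" => Some(Weekday::Wed),
            "TH" => Some(Weekday::Thu),
            "FR" => Some(Weekday::Fri),
            "SA" => Some(Weekday::Sat),
            "SU" => Some(Weekday::Sun),
            // Fall back to chrono for three-letter and full names ("TUE", "TUESDAY", ...).
            other => other.parse::<Weekday>().ok(),
        }
    }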

export_functions!((
next_day,
"Returns the first date which is later than start_date and named as indicated. The function returns NULL if at least one of the input parameters is NULL. When both of the input parameters are not NULL and day_of_week is an invalid input, the function throws SparkIllegalArgumentException if spark.sql.ansi.enabled is set to true, otherwise NULL.",
Contributor

I think this needs to be adjusted. Rust does not have exceptions and ansi mode is not hooked up yet (might need something like #16661 for that to happen)

@petern48 petern48 marked this pull request as draft July 16, 2025 05:09
Labels
spark, sqllogictest (SQL Logic Tests (.slt))

Projects
None yet

Development

Successfully merging this pull request may close these issues:

[datafusion-spark] Implement Spark date function next_day

3 participants