-
Notifications
You must be signed in to change notification settings - Fork 28.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SPARK-51419][SQL] Get hours of TIME datatype #50355
Conversation
Hi @MaxGekk Could you please review this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you fix the output of the example:
scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show()
+------------------------------+
|minute(TIME '07:01:09.123123')|
+------------------------------+
| 7|
+------------------------------+
should be hour
not minute
.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Outdated
Show resolved
Hide resolved
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
Yes corrected |
Modified as per review feedback @MaxGekk , Please check it looks good |
ExpressionsSchemaSuite failled with below error
|
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Outdated
Show resolved
Hide resolved
sql/core/src/test/resources/sql-functions/sql-expression-schema.md
Outdated
Show resolved
Hide resolved
|
||
test("Hour with TIME type") { | ||
// A few test times in microseconds since midnight: | ||
// time in microseconds -> expected minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// time in microseconds -> expected minute | |
// time in microseconds -> expected hours |
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Show resolved
Hide resolved
...catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/TimeExpressionsSuite.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM except @MaxGekk 's comments.
…Builder in TimeExpressionSuite
…rExpressionBuilder" This reverts commit e24c07f.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm except of a couple comments.
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Outdated
Show resolved
Hide resolved
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/timeExpressions.scala
Outdated
Show resolved
Hide resolved
The test failure is not related to the changes, I believe:
+1, LGTM. Merging to master. |
@MaxGekk @HyukjinKwon @beliefer Thank you for patiently reviewing my PR |
### What changes were proposed in this pull request? This PR adds support for extracting the hour component from TIME (TimeType) values in Spark SQL. ``` scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show() +----------------------------+ |hour(TIME '07:01:09.123123')| +----------------------------+ | 7| +----------------------------+ scala> spark.sql("SELECT hour('2009-07-30 12:58:59')").show() +-------------------------+ |hour(2009-07-30 12:58:59)| +-------------------------+ | 12| +-------------------------+ ``` ### Why are the changes needed? Spark previously supported hour() for only TIMESTAMP type values. TIME support was missing, leading to implicit casting attempt to TIMESTAMP, which was incorrect. This PR ensures that `hour(TIME'HH:MM:SS.######')` behaves correctly without unnecessary type coercion. ### Does this PR introduce _any_ user-facing change? Yes - Before this PR, calling hour(TIME'HH:MM:SS.######') resulted in a type mismatch error or an implicit cast attempt to TIMESTAMP, which was incorrect. - With this PR, hour(TIME'HH:MM:SS.######') now works correctly for TIME values without implicit casting. - Users can now extract the hour component from TIME values natively. ### How was this patch tested? By running new tests: ```$ build/sbt "test:testOnly *TimeExpressionsSuite"``` ### Was this patch authored or co-authored using generative AI tooling? No Closes apache#50355 from senthh/getHour. Authored-by: senthh <[email protected]> Signed-off-by: Max Gekk <[email protected]>
Seq(child.dataType) | ||
) | ||
|
||
override def inputTypes: Seq[AbstractDataType] = Seq(TimeType()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@senthh Could you open a follow up PR and allow any valid precision of the TIME type other it fails now with the error:
spark-sql (default)> select hour(cast('12:30' as time(0)));
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "hour(CAST(12:30 AS TIME(0)))" due to data type mismatch: The first parameter requires the "TIME(6)" type, however "CAST(12:30 AS TIME(0))" has the type "TIME(0)". SQLSTATE: 42K09; line 1 pos 7;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MaxGekk Sure Sure
…sion of TIME type ### What changes were proposed in this pull request? This is followup PR of [SPARK-51419 ](#50355) ### Why are the changes needed? This Followup PR allows any precision in the range of [0,6] for the hour function ### Does this PR introduce _any_ user-facing change? Yes. This changes allows User to execute below query ``` spark.sql("select hour(cast('12:00:01.123' as time(3)))").show(false) ``` ### How was this patch tested? We tested by running sample query as below ``` spark.sql("select hour(cast('12:00:01.123' as time(3)))").show(false) ``` Output: ``` +-----------------------------------+ |hour(CAST(12:00:01.123 AS TIME(3)))| +-----------------------------------+ |12 | +-----------------------------------+ ``` ### Was this patch authored or co-authored using generative AI tooling? No Closes #50554 from senthh/SPARK-51419_followup. Authored-by: senthh <[email protected]> Signed-off-by: Max Gekk <[email protected]>
What changes were proposed in this pull request?
This PR adds support for extracting the hour component from TIME (TimeType) values in Spark SQL.
Why are the changes needed?
Spark previously supported hour() for only TIMESTAMP type values. TIME support was missing, leading to implicit casting attempt to TIMESTAMP, which was incorrect. This PR ensures that
hour(TIME'HH:MM:SS.######')
behaves correctly without unnecessary type coercion.Does this PR introduce any user-facing change?
Yes
How was this patch tested?
By running new tests:
$ build/sbt "test:testOnly *TimeExpressionsSuite"
Was this patch authored or co-authored using generative AI tooling?
No