[SPARK-51419][SQL] Get hours of TIME datatype #50355

Closed

senthh
Contributor

@senthh senthh commented Mar 23, 2025

What changes were proposed in this pull request?

This PR adds support for extracting the hour component from TIME (TimeType) values in Spark SQL.

scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show()
+----------------------------+
|hour(TIME '07:01:09.123123')|
+----------------------------+
|                           7|
+----------------------------+

scala> spark.sql("SELECT hour('2009-07-30 12:58:59')").show()
+-------------------------+
|hour(2009-07-30 12:58:59)|
+-------------------------+
|                       12|
+-------------------------+

Why are the changes needed?

Spark previously supported hour() only for TIMESTAMP values. TIME support was missing, which led to an attempted implicit cast to TIMESTAMP, and that was incorrect. This PR ensures that hour(TIME'HH:MM:SS.######') behaves correctly without unnecessary type coercion.

Does this PR introduce any user-facing change?

Yes

  • Before this PR, calling hour(TIME'HH:MM:SS.######') resulted in a type mismatch error or an implicit cast attempt to TIMESTAMP, which was incorrect.
  • With this PR, hour(TIME'HH:MM:SS.######') now works correctly for TIME values without implicit casting.
  • Users can now extract the hour component from TIME values natively.
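
For intuition about what extracting the hour "natively" means, here is a minimal plain-Scala sketch. It is not Spark's implementation: it only uses `java.time.LocalTime` to read the hour from a time-of-day value, with no date or time zone involved. The object and method names are hypothetical.

```scala
import java.time.LocalTime

// Illustrative only: this is not Spark code.
object HourOfTimeSketch {
  // Hypothetical helper: parse an ISO-8601 time of day and return its hour field (0 to 23).
  // No date or time zone is needed, unlike extracting the hour from a timestamp.
  def hourOf(time: String): Int = LocalTime.parse(time).getHour

  def main(args: Array[String]): Unit = {
    println(hourOf("07:01:09.123123")) // prints 7
    println(hourOf("12:58:59"))        // prints 12
  }
}
```

A timestamp, by contrast, carries a date and usually depends on a session time zone, which is why routing a TIME value through TIMESTAMP is the wrong path.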

How was this patch tested?

By running new tests:

$ build/sbt "test:testOnly *TimeExpressionsSuite"

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the SQL label Mar 23, 2025
Contributor Author

senthh commented Mar 23, 2025

Hi @MaxGekk

Could you please review this PR?

Member

@MaxGekk MaxGekk left a comment

Could you fix the output of the example:

scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show()
+------------------------------+
|minute(TIME '07:01:09.123123')|
+------------------------------+
|                             7|
+------------------------------+

should be hour not minute.

Contributor Author

senthh commented Mar 23, 2025

> Could you fix the output of the example:
>
> scala> spark.sql("SELECT hour(TIME'07:01:09.12312321231232');").show()
> +------------------------------+
> |minute(TIME '07:01:09.123123')|
> +------------------------------+
> |                             7|
> +------------------------------+
>
> should be hour not minute.

Yes, corrected.

@senthh senthh requested a review from MaxGekk March 23, 2025 11:44
Contributor Author

senthh commented Mar 23, 2025

@MaxGekk, modified as per the review feedback. Please check whether it looks good.

Contributor Author

senthh commented Mar 23, 2025

ExpressionsSchemaSuite failed with the error below:

 - Check schemas for expression examples *** FAILED *** (444 milliseconds)
[info]   "SELECT hour('20[09-07-30] 12:58:59')" did not equal "SELECT hour('20[18-02-14] 12:58:59')" SQL query did not match (ExpressionsSchemaSuite.scala:190)
[info]   Analysis:
[info]   "SELECT hour('20[09-07-30] 12:58:59')" -> "SELECT hour('20[18-02-14] 12:58:59')"

@senthh senthh requested a review from HyukjinKwon March 24, 2025 06:22

test("Hour with TIME type") {
// A few test times in microseconds since midnight:
// time in microseconds -> expected minute
Member

Suggested change
// time in microseconds -> expected minute
// time in microseconds -> expected hours
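
The mapping that this test comment describes can be sketched in plain Scala. This is a sketch under one assumption taken from the comment itself: a TIME value is held as microseconds since midnight. The object, constants, and helper names are hypothetical and are not Spark's.

```scala
// Illustrative only: assumes a TIME value is held as microseconds since midnight.
object TimeMicrosSketch {
  val MicrosPerSecond: Long = 1000000L
  val MicrosPerHour: Long = 60L * 60L * MicrosPerSecond
  val MicrosPerDay: Long = 24L * MicrosPerHour

  // Hypothetical helper: build a microsecond count from time-of-day fields.
  def toMicros(hour: Int, minute: Int, second: Int, micros: Int = 0): Long =
    hour * MicrosPerHour + minute * 60L * MicrosPerSecond + second * MicrosPerSecond + micros

  // Integer division by the number of microseconds in an hour yields the hour field.
  def hourOfMicros(micros: Long): Int = {
    require(micros >= 0 && micros < MicrosPerDay, s"out of range: $micros")
    (micros / MicrosPerHour).toInt
  }

  def main(args: Array[String]): Unit = {
    // time in microseconds -> expected hour
    val cases = Seq(
      toMicros(0, 0, 0) -> 0,
      toMicros(7, 1, 9, 123123) -> 7,
      toMicros(23, 59, 59, 999999) -> 23
    )
    cases.foreach { case (micros, expected) =>
      assert(hourOfMicros(micros) == expected)
    }
    println("all cases passed")
  }
}
```

The boundary cases (midnight and the last microsecond of the day) are the ones most worth covering in such a table.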

@senthh senthh requested a review from MaxGekk March 25, 2025 03:33
@MaxGekk MaxGekk changed the title from "[SPARK-51419][SQL] Get hour of TIME datatype" to "[SPARK-51419][SQL] Get hours of TIME datatype" Mar 25, 2025
Contributor

@beliefer beliefer left a comment

Overall, LGTM except for @MaxGekk's comments.

Member

@MaxGekk MaxGekk left a comment

LGTM except for a couple of comments.

@senthh senthh requested a review from MaxGekk March 26, 2025 02:01
Member

MaxGekk commented Mar 26, 2025

The test failure is not related to the changes, I believe:

[info] - SPARK-51097: Verify snapshot lag metrics are updated correctly with RocksDBStateStoreProvider (with changelog checkpointing) *** FAILED *** (11 seconds, 275 milliseconds)
[info]   Assert on query failed: Execute: The code passed to eventually never returned normally. Attempted 667 times over 10.010695936 seconds. Last failure message: instanceMetrics.forall(((x$2: (String, Long)) => x$2._2.==(2))) was false.

+1, LGTM. Merging to master.
Thank you, @senthh, and @HyukjinKwon and @beliefer for the review.

@MaxGekk MaxGekk closed this in 8a50b0f Mar 26, 2025
Contributor Author

senthh commented Mar 26, 2025

@MaxGekk @HyukjinKwon @beliefer Thank you for patiently reviewing my PR.

kazemaksOG pushed a commit to kazemaksOG/spark-custom-scheduler that referenced this pull request Mar 27, 2025

Closes apache#50355 from senthh/getHour.

Authored-by: senthh <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
Seq(child.dataType)
)

override def inputTypes: Seq[AbstractDataType] = Seq(TimeType())
Member

@senthh Could you open a follow-up PR and allow any valid precision of the TIME type? Otherwise, it currently fails with the error:

spark-sql (default)> select hour(cast('12:30' as time(0)));
[DATATYPE_MISMATCH.UNEXPECTED_INPUT_TYPE] Cannot resolve "hour(CAST(12:30 AS TIME(0)))" due to data type mismatch: The first parameter requires the "TIME(6)" type, however "CAST(12:30 AS TIME(0))" has the type "TIME(0)". SQLSTATE: 42K09; line 1 pos 7;
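
One way to read this error is as an exact-type check: the function declares a single expected type, and an argument of TIME(0) is not equal to TIME(6). The toy sketch below contrasts that strict check with a relaxed one that accepts any TIME precision. It assumes the check behaves like an equality comparison; the types and method names are hypothetical stand-ins, not Spark's classes or its actual type-checking code.

```scala
// Illustrative only: a toy model of precision-parameterized TIME types, not Spark's classes.
object TimePrecisionSketch {
  sealed trait DataT
  final case class TimeT(precision: Int) extends DataT {
    require(precision >= 0 && precision <= 6, s"invalid precision: $precision")
  }
  case object TimestampT extends DataT

  // Strict check: only the one expected type is accepted, so TIME(0) does not match TIME(6).
  def acceptsExact(expected: DataT, actual: DataT): Boolean = expected == actual

  // Relaxed check: any TIME(p) is accepted, whatever the precision.
  def acceptsAnyTime(actual: DataT): Boolean = actual match {
    case TimeT(_) => true
    case _        => false
  }

  def main(args: Array[String]): Unit = {
    println(acceptsExact(TimeT(6), TimeT(0))) // false: mirrors the mismatch in the error
    println(acceptsAnyTime(TimeT(0)))         // true
    println(acceptsAnyTime(TimestampT))       // false: still not a TIME value
  }
}
```

Under the relaxed check, the hour result is unaffected by precision, since precision only governs the fractional seconds.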

Contributor Author

@MaxGekk Sure Sure

MaxGekk pushed a commit that referenced this pull request Apr 11, 2025
…sion of TIME type

### What changes were proposed in this pull request?
This is a follow-up PR to [SPARK-51419](#50355).

### Why are the changes needed?
This follow-up PR allows the hour function to accept TIME values of any precision in the range [0, 6].

### Does this PR introduce _any_ user-facing change?
Yes. This change allows users to execute the query below:

```
spark.sql("select hour(cast('12:00:01.123' as time(3)))").show(false)
```

### How was this patch tested?
Tested by running the sample query below:

```
spark.sql("select hour(cast('12:00:01.123' as time(3)))").show(false)
```

Output:

```
+-----------------------------------+
|hour(CAST(12:00:01.123 AS TIME(3)))|
+-----------------------------------+
|12                                 |
+-----------------------------------+

```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #50554 from senthh/SPARK-51419_followup.

Authored-by: senthh <[email protected]>
Signed-off-by: Max Gekk <[email protected]>