-
Notifications
You must be signed in to change notification settings - Fork 28.8k
[SPARK-53491][SS] Fix exponential formatting of inputRowsPerSecond and processedRowsPerSecond in progress metrics JSON #52237
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…d processedRowsPerSecond in StreamProgressMetrics json
@anishshri-db could you please review this PR? Thanks! |
@@ -400,6 +400,35 @@ class StreamingQueryStatusAndProgressSuite extends StreamTest with Eventually { | |||
assert(data(0).getAs[Timestamp](0).equals(validValue)) | |||
} | |||
|
|||
test("SPARK-53491: `inputRowsPerSecond` and `processedRowsPerSecond` " + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: what is the reason to use "`"?
|
||
print(progress) | ||
|
||
assert(!(progress \ "inputRowsPerSecond").values.toString.contains("E")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: will it be better to use matchers instead of assert
?
… instead of assert
val df = inputData.toDF() | ||
val query = df.writeStream | ||
.format("memory") | ||
.queryName("TestFormatting") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: lets use a different name here ?
|
||
val progress = query.lastProgress.jsonValue | ||
|
||
(progress \ "inputRowsPerSecond").values.toString should not include "E" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this doing exactly ? maybe add some comments ?
@jayantdb - please look into CI failures here - https://github.com/jayantdb/spark/actions/runs/17472105263/job/49622983276 ? |
Can you also paste the new output for the progress metrics with your change ? |
@anishshri-db The CI pipeline is failing due to Scala linter with this message:
The reason seems to be due to the formatting in Upon running the following check, I can see 1000+ files are marked as unformatted:
For example:
I didn't touch any of these thousands of files, so I am unsure if I should do anything or not. Kindly check and advise. |
@anishshri-db , you can find the output of my code change at the comment in the JIRA: https://issues.apache.org/jira/browse/SPARK-53491 Pasting the output here as well for your reference:
|
def safeDecimalToJValue(value: Double): JValue = { | ||
if (value.isNaN || value.isInfinity) { | ||
JNothing | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: while not enforced, single line} else {
is more commonly used in Spark, AFAIK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 - can u pls fix the formatting here
/** Convert BigDecimal to JValue while handling empty or infinite values */ | ||
def safeDecimalToJValue(value: Double): JValue = { | ||
if (value.isNaN || value.isInfinity) { | ||
JNothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a corresponding test case for isNaN
?
@anishshri-db |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Would this break parsing for streaming query listeners that might be parsing these values ? cc - @HeartSaVioR to confirm |
...ore/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
Show resolved
Hide resolved
No @anishshri-db . It won't break anything. I have kept the core implementation of Accessing using something like |
…alidate accuracy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Could you please still update the PR description to contain the specific event before the fix vs after the fix?
@HeartSaVioR . |
Thanks! Merging to master. |
What changes were proposed in this pull request?
This PR fixes an issue where
inputRowsPerSecond
andprocessedRowsPerSecond
in streaming progress metrics JSONwere displayed in scientific notation (e.g.,
1.9871777605776876E8
). The fix uses safeDecimal
castingto ensure values are displayed in a more human-readable format.
Results
Before change
After changes
Why are the changes needed?
Improves the readability of Spark Structured Streaming progress metrics JSON.
Does this PR introduce any user-facing change?
No
How was this patch tested?
Run this Maven test:
Results:
Was this patch authored or co-authored using generative AI tooling?
No