Skip to content

Conversation

@jerrypeng
Copy link
Contributor

@jerrypeng jerrypeng commented Dec 11, 2025

What changes were proposed in this pull request?

Add RTM trigger to pyspark so that pyspark queries can run in RTM. Only stateless (without UDF) queries will be supported for now.

Also added support for spark connect since it fails a test if the method signatures do not match.

Why are the changes needed?

To support running RTM queries in pyspark

Does this PR introduce any user-facing change?

Yes, add RTM trigger to pyspark

How was this patch tested?

Add a simple test. I will add more tests in a subsequent PR.

Was this patch authored or co-authored using generative AI tooling?

No

@jerrypeng jerrypeng changed the title [WIP][SPARK-54660][SS] Add RTM trigger to python [SPARK-54660][SS] Add RTM trigger to python Dec 12, 2025
except Exception as e:
# This error is expected
self._assert_exception_tree_contains_msg(
e, "STREAMING_REAL_TIME_MODE.INPUT_STREAM_NOT_SUPPORTED"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this input stream is not supported by real time mode, can we possibly test supported source in Python test?

Copy link
Member

@viirya viirya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code change looks okay. Only wondering if we can add a supported case in Python test.

@jerrypeng
Copy link
Contributor Author

@viirya thank you for the review. The issue with that is currently no RTM supported sources can be used in pyspark. There is no memory source equivalent in python and using Kafka would require us to start a kafka cluster via docker and I had limited success with at via:

#53415

In my follow up PR to this I will probably convert the socket source to support RTM and use that to test.

@viirya viirya closed this in 7009fb5 Jan 9, 2026
@viirya
Copy link
Member

viirya commented Jan 9, 2026

Merged to master. Thanks @jerrypeng

Yicong-Huang pushed a commit to Yicong-Huang/spark that referenced this pull request Jan 9, 2026
### What changes were proposed in this pull request?

Add RTM trigger to pyspark so that pyspark queries can run in RTM.  Only stateless (without UDF) queries will be supported for now.

Also added support for spark connect since it fails a test if the method signatures do not match.

### Why are the changes needed?

To support running RTM queries in pyspark

### Does this PR introduce _any_ user-facing change?

Yes, add RTM trigger to pyspark

### How was this patch tested?

Add a simple test.  I will add more tests in a subsequent PR.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#53448 from jerrypeng/SPARK-53998-3.

Authored-by: Jerry Peng <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants