[SPARK-53490][CONNECT][SQL] Fix Protobuf conversion in observed metrics
### What changes were proposed in this pull request?
This PR fixes a critical issue in the protobuf conversion of observed metrics in Spark Connect, specifically when dealing with complex data types like structs, arrays, and maps. The main changes include:
1. **Modified the Observation class to store Row objects instead of Map[String, Any]**: Changed the internal promise type from `Promise[Map[String, Any]]` to `Promise[Row]` so that schema and type information survive protobuf serialization/deserialization (a sketch of this change follows the list).
2. **Enhanced protobuf conversion for complex types**:
- Added proper handling for struct types by creating `GenericRowWithSchema` objects instead of tuples
- Added support for map type conversion in `LiteralValueProtoConverter`
- Improved data type inference with a new `getDataType` method that properly handles all literal types
3. **Fixed observed metrics**: Modified the observed metrics processing to include data type information in the protobuf conversion, ensuring that complex types are properly serialized and deserialized.
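The shape of change 1 can be sketched as follows. This is a simplified illustration, not the actual class: the await logic is abbreviated, and deriving the public `Map` view from the stored `Row` via `getValuesMap` is an assumption for illustration.

```scala
import scala.concurrent.{Await, Promise}
import scala.concurrent.duration.Duration
import org.apache.spark.sql.Row

// Simplified sketch of the Observation change (not the actual source).
class Observation(val name: String) {
  // Before: Promise[Map[String, Any]] -- values were erased to Any, so
  // structs/arrays/maps lost their types before protobuf conversion.
  // After: the full Row (with its schema) is kept until a caller asks.
  private val promise = Promise[Row]()

  // The public API still returns a Map; it is derived from the Row only
  // at this point, so no type information is lost earlier in the pipeline.
  def get: Map[String, Any] = {
    val row = Await.result(promise.future, Duration.Inf)
    row.getValuesMap[Any](row.schema.fieldNames)
  }
}
```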
### Why are the changes needed?
The previous implementation had several issues:
1. **Data type loss**: Observed metrics lost their original data types during protobuf conversion, causing `SparkUnsupportedOperationException` failures like the one shown below
2. **Struct handling problems**: The conversion logic didn't properly handle Row objects and struct types
### Does this PR introduce _any_ user-facing change?
Yes, this PR fixes a bug that was preventing users from successfully using observed metrics with complex data types (structs, arrays, maps) in Spark Connect. Users can now:
- Use `struct()` expressions in observed metrics and receive properly typed `GenericRowWithSchema` objects
- Use `array()` expressions in observed metrics and receive properly typed arrays
- Use `map()` expressions in observed metrics and receive properly typed maps
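For example, the array and map cases now work end to end. The snippet below is a usage sketch (the variable names and the result renderings in the comments are illustrative, not output from this PR's tests):

```scala
import org.apache.spark.sql.Observation
import org.apache.spark.sql.functions._

// Assumes an active SparkSession named `spark`.
val arrayObs = Observation("array")
spark.range(10)
  .observe(arrayObs, array(min("id"), max("id")).as("bounds"))
  .collect()
arrayObs.get // e.g. Map("bounds" -> Seq(0, 9)), element type preserved

val mapObs = Observation("map")
spark.range(10)
  .observe(mapObs, map(lit("rows"), count(lit(1))).as("stats"))
  .collect()
mapObs.get // e.g. Map("stats" -> Map("rows" -> 10)), key/value types preserved
```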
Previously, the struct example below would fail:
```scala
val observation = Observation("struct")
spark
.range(10)
.observe(observation, struct(count(lit(1)).as("rows"), max("id").as("maxid")).as("struct"))
.collect()
observation.get
// Below is the error message:
"""
org.apache.spark.SparkUnsupportedOperationException: literal [10,9] not supported (yet).
org.apache.spark.sql.connect.common.LiteralValueProtoConverter$.toLiteralProtoBuilder(LiteralValueProtoConverter.scala:104)
org.apache.spark.sql.connect.common.LiteralValueProtoConverter$.toLiteralProto(LiteralValueProtoConverter.scala:203)
org.apache.spark.sql.connect.execution.SparkConnectPlanExecution$.$anonfun$createObservedMetricsResponse$2(SparkConnectPlanExecution.scala:571)
org.apache.spark.sql.connect.execution.SparkConnectPlanExecution$.$anonfun$createObservedMetricsResponse$2$adapted(SparkConnectPlanExecution.scala:570)
"""
```
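With this fix, the same program completes and the struct metric comes back as a schema-carrying Row. A hypothetical post-fix session (values follow from the query above; the rendering is illustrative):

```scala
import org.apache.spark.sql.Row

val metrics = observation.get            // Map("struct" -> [10,9])
val structRow = metrics("struct").asInstanceOf[Row]
structRow.getAs[Long]("rows")  // 10 -- by-name access works because the
structRow.getAs[Long]("maxid") // 9     returned Row carries its schema
```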
### How was this patch tested?
`build/sbt "connect-client-jvm/testOnly *ClientE2ETestSuite -- -z SPARK-53490"`
`build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"`
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 1.5.9
Closes #52236 from heyihong/SPARK-53490.
Authored-by: Yihong He <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>