The Confluent table description provider already looks at the schema ID encoded in each message to pick the right schema to decode it with. The problem, however, is that if a query touches messages written with three different schemas, there needs to be some common "super" schema to represent all of them as one table. This is a mismatch between the relational model and Kafka, where a single topic can contain disparate messages.
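For context, the Confluent wire format does not store a version number directly: each serialized message starts with a magic byte (0) followed by a 4-byte big-endian schema ID, which the Schema Registry maps to a specific subject and version. Below is a minimal Java sketch that reads that header. The ConfluentWireFormat class and its methods are hypothetical illustrations, not the connector's actual code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

/** Hypothetical helper for the Confluent wire format: [magic byte 0][4-byte schema ID][payload]. */
final class ConfluentWireFormat {
    static final byte MAGIC_BYTE = 0x0;
    static final int HEADER_LENGTH = 5;

    private ConfluentWireFormat() {}

    /** Returns the Schema Registry schema ID stored in bytes 1..4. */
    static int schemaId(byte[] message) {
        checkHeader(message);
        // ByteBuffer reads big-endian by default, matching the wire format
        return ByteBuffer.wrap(message, 1, 4).getInt();
    }

    /** Returns the bytes after the 5-byte header (for Avro, the binary-encoded record). */
    static byte[] payload(byte[] message) {
        checkHeader(message);
        return Arrays.copyOfRange(message, HEADER_LENGTH, message.length);
    }

    private static void checkHeader(byte[] message) {
        if (message.length < HEADER_LENGTH || message[0] != MAGIC_BYTE) {
            throw new IllegalArgumentException("not a Confluent-framed message");
        }
    }
}
```

Because the ID identifies the writer schema of each individual message, a topic with three schema versions yields three different IDs. This is why a single table-level schema still has to be chosen or derived.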
-
This would be useful to solve in general, but the solution you propose, with users specifying the schema version to use on read, isn't widely useful: in most cases the right version will change depending on the range of data being queried. A better fix would be a component that derives a common type for the "table" and casts/converts older messages to this common type when reading them.
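To make the idea of a common "table" type concrete, here is a deliberately simplified Java sketch. It assumes a schema can be modeled as a map from field name to type name, and the CommonSchema class is hypothetical, not an existing Trino or Avro API. The sketch takes the union of fields across all versions, rejects a field whose type differs between versions, and converts an older record by filling its missing fields with null. A real implementation would also need type-promotion rules (for example, Avro's int-to-long promotion) and handling for removed or renamed fields.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: a schema is modeled as field name -> type name. */
final class CommonSchema {

    private CommonSchema() {}

    /** Union of all fields across versions; a field must keep the same type in every version. */
    static Map<String, String> derive(List<Map<String, String>> versions) {
        Map<String, String> common = new LinkedHashMap<>();
        for (Map<String, String> schema : versions) {
            for (Map.Entry<String, String> field : schema.entrySet()) {
                String existing = common.putIfAbsent(field.getKey(), field.getValue());
                if (existing != null && !existing.equals(field.getValue())) {
                    throw new IllegalArgumentException("incompatible types for field "
                            + field.getKey() + ": " + existing + " vs " + field.getValue());
                }
            }
        }
        return common;
    }

    /** Converts a record written with any version to the common schema; absent fields become null. */
    static Map<String, Object> project(Map<String, Object> record, Map<String, String> common) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (String name : common.keySet()) {
            row.put(name, record.get(name)); // null when the older version lacks this field
        }
        return row;
    }
}
```

The table's column set then depends on every version present in the queried range, which matches the point that no single pinned version fits all queries.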
-
Hm, true. It works in our case because all our schema changes are fully transitive, so any schema version can read any message. Even if pinning a schema version can make certain messages unreadable, it lets us build Kafka-based ingestion pipelines that don't break on upstream changes.
-
We are using Confluent Schema Registry, and teams are allowed to make fully transitive schema changes without much alignment. The assumption is that consumers generally use a fixed schema version and can switch to a newer version later, on their own terms.
We are using Trino to read from Kafka and then write to Iceberg tables. Even fully transitive schema changes alter the type of the data we receive and lead to syntax errors.
I am adding this to our fork of the Kafka connector by allowing the value-subject parameter, which can be passed when selecting data from Kafka, to take the form value-subject=subject:version, where the version is optional.
This will allow us to pin a version and migrate to newer versions on our own terms after they have been released.
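Here is a minimal Java sketch of how the proposed value-subject=subject:version form could be parsed. It assumes the version is a positive integer, since Schema Registry versions start at 1. The SubjectVersion record is hypothetical and not part of the Trino Kafka connector. It splits at the last colon and treats the suffix as a version only when it is all digits. That way, subject names that themselves contain colons, such as context-qualified subjects in newer Schema Registry releases, are left intact.

```java
import java.util.Objects;
import java.util.OptionalInt;

/** Hypothetical value of a value-subject parameter written as subject[:version]. */
record SubjectVersion(String subject, OptionalInt version) {

    SubjectVersion {
        Objects.requireNonNull(subject, "subject is null");
        Objects.requireNonNull(version, "version is null");
        if (subject.isEmpty()) {
            throw new IllegalArgumentException("subject is empty");
        }
    }

    static SubjectVersion parse(String value) {
        int colon = value.lastIndexOf(':');
        if (colon >= 0) {
            String suffix = value.substring(colon + 1);
            // Only an all-digit suffix is a version; otherwise the colon belongs to the subject name
            if (!suffix.isEmpty() && suffix.chars().allMatch(c -> c >= '0' && c <= '9')) {
                int version = Integer.parseInt(suffix); // throws NumberFormatException on overflow
                if (version < 1) {
                    throw new IllegalArgumentException("schema versions start at 1: " + value);
                }
                return new SubjectVersion(value.substring(0, colon), OptionalInt.of(version));
            }
        }
        // No version given: the caller can keep its existing (unpinned) behavior
        return new SubjectVersion(value, OptionalInt.empty());
    }
}
```

With this approach, orders-value:3 pins version 3, while a plain orders-value leaves the version empty. Existing queries that pass only a subject would therefore keep working unchanged.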
I would like to get some opinions on this. Is it worth contributing back? Is the subject:version notation reasonable, or would you have chosen something different?