@nathaniel-d-ef nathaniel-d-ef commented Sep 17, 2025

Which issue does this PR close?

Rationale for this change

This introduces writer-side fingerprint prefix support, replacing the existing hard-coded Rabin approach with a configurable pattern that builds on the work done on the reader side. In addition to supporting SHA256 and MD5 (feature-flagged), we also cover compatibility with Confluent's wire format IDs.

What changes are included in this PR?

  • Replaced fixed Rabin fingerprinting with support for a configurable FingerprintAlgorithm in the schema and writer.
  • Removed deprecated methods and unnecessary variable assignments for single-object encoding.
  • Simplified prefix generation logic and encoding workflows.
  • Updated benchmarks and added unit tests to validate updated fingerprinting strategies.
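
For context on the Rabin default mentioned above: the Avro specification's single-object encoding starts each message with the marker bytes `0xC3 0x01`, followed by the 8-byte little-endian CRC-64-AVRO (Rabin) fingerprint of the schema's canonical form. The following is a minimal standalone sketch of that layout, not the code in this PR; `fp_table`, `rabin_fingerprint`, and `single_object_prefix` are hypothetical names used only for illustration.

```rust
/// CRC-64-AVRO fingerprint of the empty byte string, per the Avro specification.
const EMPTY: u64 = 0xc15d_213a_a4d7_a795;

/// Builds the 256-entry lookup table used by the byte-at-a-time fingerprint loop.
fn fp_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    for (i, slot) in table.iter_mut().enumerate() {
        let mut fp = i as u64;
        for _ in 0..8 {
            // Shift right; XOR in the polynomial only when the low bit was set.
            fp = (fp >> 1) ^ (EMPTY & (fp & 1).wrapping_neg());
        }
        *slot = fp;
    }
    table
}

/// Hypothetical helper: 64-bit Rabin fingerprint of canonical-form schema bytes.
fn rabin_fingerprint(canonical_schema: &[u8]) -> u64 {
    let table = fp_table();
    canonical_schema.iter().fold(EMPTY, |fp, &b| {
        (fp >> 8) ^ table[((fp ^ b as u64) & 0xff) as usize]
    })
}

/// Hypothetical helper: 10-byte prefix = 2 marker bytes + little-endian fingerprint.
fn single_object_prefix(fingerprint: u64) -> [u8; 10] {
    let mut prefix = [0u8; 10];
    prefix[0] = 0xC3;
    prefix[1] = 0x01;
    prefix[2..].copy_from_slice(&fingerprint.to_le_bytes());
    prefix
}
```

A caller would compute the fingerprint once per schema and reuse the resulting prefix for every record, since it depends only on the schema.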

Are these changes tested?

Yes, existing tests are all passing, and tests have been added to validate the prefix outputs. Benchmark results show no appreciable changes.

Are there any user-facing changes?

  • Crate is not yet public
  • Confluent users are expected to provide the schema store ID when registering a WriterBuilder
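
To illustrate why a schema store ID has to be supplied: in Confluent's wire format, each message begins with a zero magic byte followed by the 4-byte big-endian schema ID, so the prefix cannot be derived from the schema alone. A minimal sketch of that framing follows; `confluent_prefix` and `frame_record` are hypothetical helper names, not part of this crate's API.

```rust
/// Hypothetical helper: 5-byte Confluent wire-format prefix for a registry ID.
fn confluent_prefix(schema_id: u32) -> [u8; 5] {
    let mut prefix = [0u8; 5];
    // prefix[0] stays 0x00, the wire-format magic byte.
    prefix[1..].copy_from_slice(&schema_id.to_be_bytes());
    prefix
}

/// Hypothetical helper: prepends the prefix to an already-encoded Avro body.
fn frame_record(schema_id: u32, avro_body: &[u8]) -> Vec<u8> {
    let mut framed = Vec::with_capacity(5 + avro_body.len());
    framed.extend_from_slice(&confluent_prefix(schema_id));
    framed.extend_from_slice(avro_body);
    framed
}
```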

@github-actions (bot) added the `arrow` (Changes to the arrow crate) and `arrow-avro` (arrow-avro crate) labels on Sep 17, 2025
…iter

- Introduced `FingerprintStrategy` enum to customize fingerprinting methods, including Rabin, ConfluentSchemaId, MD5, and SHA256.
- Updated stream writer to handle per-record prefix generation based on the selected strategy.
- Added related unit tests for configurable fingerprint strategies.
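
As a rough illustration of per-record prefix generation, the sketch below writes the same prefix bytes ahead of every encoded record body. It assumes the prefix has already been computed for the selected strategy; `write_prefixed_records` is a hypothetical function, and the actual stream writer in this PR may be structured differently.

```rust
use std::io::{self, Write};

/// Hypothetical sketch: emit `prefix` followed by each record body, in order.
fn write_prefixed_records<W: Write>(
    out: &mut W,
    prefix: &[u8],
    bodies: &[&[u8]],
) -> io::Result<()> {
    for body in bodies {
        // The prefix depends only on the schema, so it is reused for every record.
        out.write_all(prefix)?;
        out.write_all(body)?;
    }
    Ok(())
}
```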

@jecsand838 jecsand838 left a comment

@nathaniel-d-ef Thanks for getting this draft up! It looks really good overall, I just had a few suggestions.

@nathaniel-d-ef

Thank you for the thorough review and recommendations @jecsand838 - I've refined the approach and this is now in a much better place.

@nathaniel-d-ef nathaniel-d-ef marked this pull request as ready for review September 18, 2025 19:17