Conversation

codephage2020
Contributor

@codephage2020 codephage2020 commented Aug 27, 2025

Which issue does this PR close?

Rationale for this change

cast_to_variant panics on Date64 / Timestamp values that cannot be converted to NaiveDate.

What changes are included in this PR?

  1. Add a new API:
    pub fn cast_to_variant_with_options(input: &dyn Array, strict: bool) -> Result<VariantArray, ArrowError>
  • strict = true: returns an error on conversion failure (default behavior)
  • strict = false: returns null for failed conversions
  2. Add tests for non-strict mode.
  3. Refactor: eliminate duplication in timestamp conversion using a macro.
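
A minimal sketch of the strict / non-strict behavior described above, written against plain slices so it runs without the arrow crates. The names `cast_millis`, `millis_to_days`, `CastError`, and the `MAX_ABS_DAYS` bound are hypothetical stand-ins for illustration, not the PR's actual code:

```rust
/// Hypothetical error type standing in for ArrowError.
#[derive(Debug, PartialEq)]
pub struct CastError(pub String);

/// Assumed, illustrative bound on representable days; the real limit
/// depends on the date library in use.
const MAX_ABS_DAYS: i64 = 100_000_000;

/// Convert milliseconds since the Unix epoch to whole days.
/// Returns None when the value falls outside the assumed range.
fn millis_to_days(ms: i64) -> Option<i64> {
    let days = ms.div_euclid(86_400_000);
    if days.abs() <= MAX_ABS_DAYS {
        Some(days)
    } else {
        None
    }
}

/// strict = true: the first failed conversion aborts with an error.
/// strict = false: a failed conversion becomes a null (None) entry.
pub fn cast_millis(
    input: &[Option<i64>],
    strict: bool,
) -> Result<Vec<Option<i64>>, CastError> {
    let mut out = Vec::with_capacity(input.len());
    for (i, v) in input.iter().enumerate() {
        match v {
            // Input nulls pass through unchanged in both modes.
            None => out.push(None),
            Some(ms) => match millis_to_days(*ms) {
                Some(d) => out.push(Some(d)),
                None if strict => {
                    return Err(CastError(format!(
                        "value {} at index {} is out of range",
                        ms, i
                    )))
                }
                None => out.push(None),
            },
        }
    }
    Ok(out)
}
```

In this sketch neither mode panics: strict mode surfaces the bad value as an `Err`, and non-strict mode keeps the output the same length as the input with a null in the failed slot.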

Are these changes tested?

Yes.

Are there any user-facing changes?

no.

@github-actions github-actions bot added the parquet-variant parquet-variant* crates label Aug 27, 2025
Signed-off-by: codephage2020 <[email protected]>
@codephage2020 codephage2020 marked this pull request as ready for review August 27, 2025 15:50
Contributor

@liamzwbao liamzwbao left a comment

Thanks for the fix! Left a few comments.

@codephage2020
Contributor Author

Thanks for your reviews @liamzwbao .

All feedback implemented - ready for re-review! 🔍

CC @alamb .

@codephage2020 codephage2020 changed the title [Variant]add strict mode to cast_to_variant [Variant] add strict mode to cast_to_variant Aug 28, 2025
Contributor

@scovich scovich left a comment

Generally LGTM, couple questions

/// * `strict` - If true, return error on conversion failure. If false, insert null for failed conversions.
pub fn cast_to_variant_with_options(
input: &dyn Array,
strict: bool,
Contributor

Is there any possibility/risk that we may need additional options in the future? If so, it might be better to pass a struct? Or maybe we just deal with that if/when it comes?

Contributor Author

@codephage2020 codephage2020 Aug 29, 2025

Yes, good catch. I have considered this approach; we can refactor this part in a follow-up PR.

Done.
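
A sketch of why an options struct leaves more room than a bare `bool` parameter. The `strict` field and its `true` default are assumptions taken from the PR description; `non_strict` and `describe` are hypothetical helpers added only for illustration:

```rust
/// Options struct in the spirit of the one added in this PR.
/// The single `strict` field is an assumption based on the PR description.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CastOptions {
    pub strict: bool,
}

impl Default for CastOptions {
    /// Strict mode is described as the default behavior.
    fn default() -> Self {
        Self { strict: true }
    }
}

/// Hypothetical helper: build non-strict options with struct update syntax,
/// so this call site keeps compiling if more fields are added later.
pub fn non_strict() -> CastOptions {
    CastOptions {
        strict: false,
        ..Default::default()
    }
}

/// Hypothetical helper showing how a callee might branch on the options.
pub fn describe(options: &CastOptions) -> &'static str {
    if options.strict {
        "error on failure"
    } else {
        "null on failure"
    }
}
```

With a `bool` parameter, every new knob changes the function signature; with a struct plus `Default`, callers that use `..Default::default()` are unaffected by added fields.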

Contributor

@liamzwbao liamzwbao left a comment

lgtm! it’d be good to address @scovich’s comments as well.

run_test_non_strict(
values,
vec![
None, // Invalid timestamp becomes null
Contributor

Just realized that for overflow values, we may need to convert them to Variant::Null instead of None, as we did for Decimal (see this comment for context). However, I don't think that belongs in this PR; we can discuss it and make them consistent later.

cc @alamb
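
To make the distinction in the comment above concrete, here is a sketch with stand-in types. `MiniVariant` and both functions are hypothetical and only model the two choices; they are not the parquet-variant API:

```rust
/// Hypothetical, much-reduced stand-in for a variant value.
#[derive(Debug, PartialEq)]
pub enum MiniVariant {
    Null,
    Int(i64),
}

/// Choice A: an overflowing value leaves the slot missing (None),
/// i.e. the array-level null that the non-strict tests in this PR expect.
pub fn overflow_as_missing(v: i128) -> Option<MiniVariant> {
    i64::try_from(v).ok().map(MiniVariant::Int)
}

/// Choice B: an overflowing value is present in the array,
/// but its content is an explicit variant-level null.
pub fn overflow_as_variant_null(v: i128) -> Option<MiniVariant> {
    Some(
        i64::try_from(v)
            .map(MiniVariant::Int)
            .unwrap_or(MiniVariant::Null),
    )
}
```

The two results differ only for failed conversions: choice A reports "no value here", while choice B reports "a value is here, and it is null", which is the consistency question the comment defers to a later discussion.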

fn convert_timestamp(
/// Options for controlling the behavior of `cast_to_variant_with_options`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct CastOptions {
Contributor

This looks quite similar to https://docs.rs/arrow/latest/arrow/compute/struct.CastOptions.html

However that seems to be defined in arrow-compute

Labels
parquet-variant parquet-variant* crates
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Variant] cast_to_variant will panic on certain Date64 or Timestamp Values values
4 participants