[Variant] writing a VariantArray to parquet panics #8296

@alamb

Describe the bug
As part of testing integration with the parquet crate in #8133, I found that trying to write a VariantArray directly to Parquet panics.

To Reproduce

 // Use the VariantArrayBuilder to build a VariantArray
 let mut builder = VariantArrayBuilder::new(3);
 // row 1: {"name": "Alice"}
 let mut variant_builder = builder.variant_builder();
 variant_builder.new_object().with_field("name", "Alice").finish()?;
 variant_builder.finish();
 let array = builder.build();

 // TODO support writing VariantArray directly
 // at the moment it panics when trying to downcast to a struct array
 let array: ArrayRef = Arc::new(array);

 // create a RecordBatch with the VariantArray
 let batch = RecordBatch::try_from_iter(vec![("data", array)])?;

 // write the RecordBatch to a Parquet file
 let file = std::fs::File::create("variant.parquet")?;
 let mut writer = ArrowWriter::try_new(file, batch.schema(), None)?;
 writer.write(&batch)?;
 writer.close()?;

This results in the following panic:

struct array
thread 'main' panicked at arrow-array/src/cast.rs:904:30:
struct array
stack backtrace:
   0: __rustc::rust_begin_unwind
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/std/src/panicking.rs:697:5
   1: core::panicking::panic_fmt
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/panicking.rs:75:14
   2: core::panicking::panic_display
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/panicking.rs:268:5
   3: core::option::expect_failed
             at /rustc/29483883eed69d5fb4db01964cdf2af4d86e9cb2/library/core/src/option.rs:2081:5
   4: core::option::Option<T>::expect
             at /Users/andrewlamb/.rustup/toolchains/1.89-aarch64-apple-darwin/lib/rustlib/src/rust/library/core/src/option.rs:960:21
   5: arrow_array::cast::AsArray::as_struct
             at /Users/andrewlamb/Software/arrow-rs/arrow-array/src/cast.rs:904:30
   6: parquet::arrow::arrow_writer::levels::LevelInfoBuilder::try_new
             at ./src/arrow/arrow_writer/levels.rs:162:35
   7: parquet::arrow::arrow_writer::levels::calculate_array_levels
             at ./src/arrow/arrow_writer/levels.rs:55:23
   8: parquet::arrow::arrow_writer::compute_leaves
             at ./src/arrow/arrow_writer/mod.rs:625:18
   9: parquet::arrow::arrow_writer::ArrowRowGroupWriter::write
             at ./src/arrow/arrow_writer/mod.rs:839:25
  10: parquet::arrow::arrow_writer::ArrowWriter<W>::write
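
Frames 4 and 5 of the backtrace show `AsArray::as_struct` calling `Option::expect`, and the panic message is the string `struct array`. The std-only sketch below models that failure mode. It is not the arrow-rs implementation: `Array`, `StructArray`, `VariantLike`, `as_struct_opt`, and `as_struct` are simplified stand-ins defined here for illustration, and it assumes the wrapper type exposes itself (rather than its inner struct array) through `as_any`.

```rust
use std::any::Any;

/// Stand-in for an array trait; only the `as_any` hook is modeled.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

/// Stand-in for a struct array.
struct StructArray {
    num_rows: usize,
}

/// Hypothetical stand-in for a variant array: it wraps a struct array
/// but is a distinct concrete type.
struct VariantLike {
    inner: StructArray,
}

impl Array for StructArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for VariantLike {
    // Assumption: the wrapper returns itself, not `self.inner`.
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Non-panicking downcast: `None` when the concrete type is not `StructArray`.
fn as_struct_opt(array: &dyn Array) -> Option<&StructArray> {
    array.as_any().downcast_ref::<StructArray>()
}

/// Panicking downcast, using the same message seen in the backtrace.
fn as_struct(array: &dyn Array) -> &StructArray {
    as_struct_opt(array).expect("struct array")
}

fn main() {
    let plain = StructArray { num_rows: 1 };
    let wrapped = VariantLike {
        inner: StructArray { num_rows: 1 },
    };

    // A real struct array downcasts successfully.
    assert!(as_struct_opt(&plain).is_some());
    // The wrapper does not, even though it contains a struct array.
    assert!(as_struct_opt(&wrapped).is_none());
    // Passing the inner struct array explicitly works.
    assert_eq!(as_struct(&wrapped.inner).num_rows, 1);
    // Calling `as_struct(&wrapped)` here would panic with "struct array".
}
```

Under these assumptions, a caller that checks the result of the non-panicking downcast can return an error or unwrap the inner array instead of aborting the write.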

Expected behavior
Writing a VariantArray directly with ArrowWriter should succeed rather than panic.
