@947132885

Description

Ray fails to merge blocks that contain struct fields when their scalar fields are unaligned, i.e. present in some blocks but missing from others.

Minimal code to reproduce:

```python
import pyarrow as pa
from ray.data._internal.arrow_ops.transform_pyarrow import concat

table1 = pa.table({
    "a": [1, 2, 3],
    "s": [
        {"x": 7},
        {"x": 8},
        {"x": 9},
    ],
})

table2 = pa.table({
    "b": [4, 5, 6],
})
concat([table1, table2])
```

Fix:

Create a null array for the missing column in `_align_struct_fields` instead of failing on an assertion.

@947132885 947132885 requested a review from a team as a code owner November 3, 2025 02:51
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses a bug in _align_struct_fields that caused a failure when processing blocks with missing scalar fields alongside struct fields. The fix correctly replaces an overly strict assertion with logic to pad missing columns with nulls. This change is correct, well-targeted, and resolves the issue described. The implementation is clean and follows standard practices for handling missing data in pyarrow tables.

@ray-gardener ray-gardener bot added data Ray Data-related issues community-contribution Contributed by the community labels Nov 3, 2025
@goutamvenkat-anyscale
Contributor

Thanks @947132885 for your contribution. Can you please add some tests for this change?

@947132885
Author

OK, I added a simple test.
