Skip to content

mavlink2-bridge: nullable StructArray on a *_cmd input panics the bridge node (bypasses the "reject, don't panic" decode contract) #2298

@phil-opp

Description

@phil-opp

Summary

A node publishing an Arrow StructArray that carries top-level nulls (a validity/null buffer with null_count > 0) to any of the bridge's command inputs (heartbeat_cmd, command_long_cmd, set_mode_cmd, rc_channels_override_cmd, set_position_target_global_int_cmd, set_position_target_local_ned_cmd) panics and crashes the bridge node, instead of being rejected with a BridgeError.

This directly contradicts the documented design contract for these inputs. From libraries/extensions/mavlink2-bridge/tests/arrow_malformed.rs:1-6:

command_long_cmd and the other writer inputs are public dora inputs — any node in the dataflow can publish to them. The decode path must reject malformed batches with BridgeError, not panic, otherwise the whole bridge node aborts on a single bad message.

and from arrow_convert.rs:58-74, which lists the malformed cases the decoder is supposed to defend against (wrong dtype, zero-row, multi-row, null-at-row-0).

Root cause

The real input path runs RecordBatch::from(&StructArray) before the careful per-column validation in from_record_batch ever executes.

binaries/mavlink2-bridge-node/src/main.rs:322-331:

fn decode_input<T: MavlinkArrow>(data: &ArrayRef, input: &str) -> Result<T> {
    let struct_array = data.as_struct_opt().ok_or_else(|| {
        eyre!(
            "expected StructArray for input '{input}', got {:?}",
            data.data_type()
        )
    })?;
    let batch = RecordBatch::from(struct_array);   // <-- panics on a nullable struct
    T::from_record_batch(&batch).with_context(|| format!("decoding {input} from incoming Arrow"))
}

In arrow-array 58.3.0 (the version pinned in Cargo.lock), impl From<&StructArray> for RecordBatch delegates to the owned impl, which asserts the struct has no top-level nulls:

assert_eq!(
    nulls.map(|n| n.null_count()).unwrap_or_default(),
    0,
    "Cannot convert nullable StructArray to RecordBatch, see StructArray documentation"
);

So a StructArray with a top-level validity bitmap and a null row triggers this assert_eq! and unwinds. The conversion happens before read_primitive's is_null(0) guard (arrow_convert.rs:97-101), so the existing null-handling is never reached for this shape of input.

Impact / reachability

decode_input is called from handle_input (main.rs:336), which is invoked directly in the main event loop at main.rs:521:

if let Err(e) = handle_input(conn.as_ref(), &header, &id, &array_ref) {
    tracing::error!("writer error on input '{id}': {e:#}");
}

Only Err is handled — a panic is not an Err. It unwinds out of handle_input → out of the main-thread event loop → the node process terminates. Any node in the dataflow that publishes a nullable struct (even accidentally — e.g. an array built with a validity buffer, or a sliced/filtered/concatenated struct that retained nulls) takes the bridge down. The whole point of the malformed-input hardening was to prevent exactly this.

The existing negative tests in arrow_malformed.rs (zero-row, multi-row, null-row, wrong-type) all construct a RecordBatch directly and call from_record_batch, so they bypass the RecordBatch::from(struct_array) step and don't catch this gap.

Suggested fix

Reject a nullable top-level struct in decode_input before the conversion, so it surfaces as a normal decode error:

let struct_array = data.as_struct_opt().ok_or_else(|| { /* ... */ })?;
if struct_array.null_count() > 0 {
    bail!(
        "input '{input}' is a nullable StructArray with {} null row(s); \
         expected exactly one non-null MAVLink message",
        struct_array.null_count()
    );
}
let batch = RecordBatch::from(struct_array);

(Equivalently, guard on struct_array.nulls().is_some().) A regression test should publish a StructArray built with a validity buffer / null entry through decode_input and assert it returns Err, not panic — exercising the same path the daemon-spawned node uses, rather than calling from_record_batch directly.


This issue was filed automatically by a scheduled, automated Claude code-review routine (no human has reviewed it yet). The analysis was verified against the pinned arrow-array 0.58.3 source; please sanity-check the reachability assumptions before acting.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions