Add ingestion_agent Spark source for the ingestion agent's read API by yyoli-db · Pull Request #184 · databrickslabs/lakeflow-community-connectors

yyoli-db · 2026-05-16T07:22:54Z

Introduces a new ingestion_agent Python Data Source format that dispatches on an operation option, returning DataFrames for list_objects, preview_table, get_object_metadata, validate_connection, and list_operations — the read-side surface specified in experimental/yong-li/docs/ingestion-agent-api-design.md.

Defaults derive from the existing LakeflowConnect surface, so every existing connector gains the agent API automatically. Connectors that need richer responses (hierarchical listing, source-native sampling, extra metadata, custom prefixed operations) implement the optional SupportsIngestionAgent mixin. The lakeflow_connect format is unchanged; ingestion_agent is registered alongside it.

Co-authored-by: Isaac
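The fallback described above can be sketched in plain Python, without Spark. This is a minimal illustration under stated assumptions, not the PR's implementation: `PlainConnector`, `HierarchicalConnector`, `run_operation`, the `list_tables` method name, the `parent` option, and the row shapes are all hypothetical; only the `operation` option, the `list_objects` operation name, and the `SupportsIngestionAgent` mixin name come from the description.

```python
from typing import Any, Dict, List


class SupportsIngestionAgent:
    """Marker mixin (hypothetical shape): sources override only the hooks they need."""


class PlainConnector:
    """Stand-in for an existing connector; list_tables is an assumed method name."""

    def list_tables(self) -> List[str]:
        return ["accounts", "orders"]


class HierarchicalConnector(PlainConnector, SupportsIngestionAgent):
    """Opts in to a richer list_objects response through the mixin."""

    def list_objects(self, options: Dict[str, str]) -> List[Dict[str, Any]]:
        # The "parent" option is invented for this sketch.
        parent = options.get("parent", "sales")
        return [{"name": t, "parent": parent} for t in self.list_tables()]


def run_operation(connector: Any, options: Dict[str, str]) -> List[Dict[str, Any]]:
    """Dispatch on the 'operation' option, preferring a mixin hook when present."""
    operation = options["operation"]
    if operation == "list_objects":
        has_hook = isinstance(connector, SupportsIngestionAgent) and hasattr(
            connector, "list_objects"
        )
        if has_hook:
            return connector.list_objects(options)
        # Default derived from the connector's existing surface.
        return [{"name": t} for t in connector.list_tables()]
    raise ValueError(f"unsupported operation: {operation}")
```

In this sketch a connector that never mentions the mixin still answers `list_objects`, which is the "gains the agent API automatically" behaviour the description claims; the real format would return a DataFrame rather than a list of dicts.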

Refactor source-specific operations into AgentOperation classes, so adding a new op is a class + one entry on a source's agent_operations() map; no framework edits are required. Mirrors the Scala AgentOperation design in databricks-eng/universe#1935135.

- Add AgentOperation ABC (name, description, kind, schema / resolve_schema, pull) to the interface module.
- Reframe the five built-ins as AgentOperation subclasses that consult the existing SupportsIngestionAgent per-method hooks before falling back to LakeflowConnect-derived defaults.
- Source-defined operations override built-ins with the same name; list_operations reports both built-ins and source plug-ins.
- Framework auto-appends _meta and converts pull() exceptions to an error row for kind=metadata; data-kind ops pass through with the source's natural schema.
- Relax data-column nullability for metadata-kind ops so the framework's _meta-only error row validates against the schema.

Co-authored-by: Isaac
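A rough sketch of how the pieces in this commit might fit together, written without Spark. The `AgentOperation` attribute names (`name`, `description`, `kind`, `pull`) follow the commit message; everything else is an assumption made for illustration: the `_meta` layout (`status`, `error`), the helper names `resolve_operations` and `run_agent_operation`, and the two example operations.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Iterator, List


class AgentOperation(ABC):
    """Simplified stand-in for the ABC; schema / resolve_schema are omitted here."""

    name: str = ""
    description: str = ""
    kind: str = "metadata"  # "metadata" or "data"

    @abstractmethod
    def pull(self, options: Dict[str, str]) -> Iterator[Dict[str, Any]]:
        ...


class DefaultListObjects(AgentOperation):
    name = "list_objects"
    description = "Built-in listing (hypothetical rows)"

    def pull(self, options):
        yield {"name": "accounts"}


class FailingListObjects(AgentOperation):
    name = "list_objects"
    description = "Source plug-in that fails, to show the error row"

    def pull(self, options):
        raise RuntimeError("source unavailable")
        yield  # unreachable; keeps this method a generator


def resolve_operations(
    builtins: List[AgentOperation], source_ops: Dict[str, AgentOperation]
) -> Dict[str, AgentOperation]:
    """Source-defined operations override built-ins with the same name."""
    ops = {op.name: op for op in builtins}
    ops.update(source_ops)
    return ops


def run_agent_operation(op: AgentOperation, options: Dict[str, str]) -> List[Dict[str, Any]]:
    """Append _meta for metadata-kind ops; pass data-kind rows through untouched."""
    if op.kind != "metadata":
        return list(op.pull(options))
    try:
        ok = {"status": "ok", "error": None}  # assumed _meta layout
        return [{**row, "_meta": ok} for row in op.pull(options)]
    except Exception as exc:
        # The error row carries only _meta, so data columns must be nullable.
        return [{"_meta": {"status": "error", "error": str(exc)}}]
```

The error row here has no data columns at all, which is why the last bullet relaxes data-column nullability for metadata-kind operations.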

Lower the cost of adding a source-specific ingestion-agent operation to a single decorated method. Compared to the AgentOperation subclass + agent_operations() map entry, the decorator path:

- drops the subclass wrapper: name/description/schema/kind go on the decorator, the method body *is* the pull;
- removes the agent_operations() override: the default implementation walks the class for @agent_operation-decorated attributes and wraps each one in an AgentOperation;
- still composes with the class-based path: override agent_operations() and call super() to mix in stateful subclass operations alongside decorated ones.

Class-based AgentOperation is kept for ops with state or dynamic schemas; the framework dispatches both paths identically.

Co-authored-by: Isaac
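The decorator path might look roughly like the following. This is a sketch under assumptions rather than the code in this PR: the `_agent_op` marker attribute, the dataclass form of `AgentOperation`, the schema strings, and the `CrmSource` example with its `crm_`-prefixed operation names are all hypothetical; the `@agent_operation` decorator, its name/description/schema/kind parameters, and the class-walking default for `agent_operations()` follow the commit message.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator


@dataclass
class AgentOperation:
    """Simplified, non-abstract stand-in for the class-based operation."""

    name: str
    description: str
    schema: str
    kind: str
    pull: Callable[[Dict[str, str]], Iterator[Dict[str, Any]]]


def agent_operation(name: str, description: str = "", schema: str = "", kind: str = "metadata"):
    """Tag a method with operation metadata; the method body is the pull."""

    def mark(func):
        func._agent_op = {
            "name": name, "description": description, "schema": schema, "kind": kind,
        }
        return func

    return mark


class SupportsIngestionAgent:
    def agent_operations(self) -> Dict[str, AgentOperation]:
        """Default: walk the class for decorated attributes and wrap each one."""
        ops: Dict[str, AgentOperation] = {}
        for attr in dir(type(self)):
            spec = getattr(getattr(type(self), attr), "_agent_op", None)
            if spec is not None:
                # Bind the method to this instance so pull() takes only options.
                ops[spec["name"]] = AgentOperation(pull=getattr(self, attr), **spec)
        return ops


class CrmSource(SupportsIngestionAgent):
    @agent_operation(
        name="crm_list_pipelines",
        description="List CRM pipelines",
        schema="id STRING, label STRING",
    )
    def list_pipelines(self, options):
        yield {"id": "p1", "label": "Sales"}

    def agent_operations(self):
        # Compose with the class-based path by extending the default map.
        ops = super().agent_operations()
        ops["crm_row_count"] = AgentOperation(
            name="crm_row_count", description="Row count", schema="count LONG",
            kind="metadata", pull=lambda options: iter([{"count": 0}]),
        )
        return ops
```

Calling `CrmSource().agent_operations()` in this sketch yields both the decorated and the explicitly constructed operation in one map, so a dispatcher would not need to know which path defined them.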

yyoli-db added 3 commits May 16, 2026 07:21

yyoli-db wants to merge 3 commits into master from yong-li_data/ingestion-agent-api
