Add ingestion_agent Spark source for the ingestion agent's read API #184

Open
yyoli-db wants to merge 3 commits into master from yong-li_data/ingestion-agent-api

@yyoli-db (Collaborator)

Introduces a new `ingestion_agent` Python Data Source format that dispatches on an `operation` option and returns DataFrames for `list_objects`, `preview_table`, `get_object_metadata`, `validate_connection`, and `list_operations`: the read-side surface specified in `experimental/yong-li/docs/ingestion-agent-api-design.md`.
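
One way to picture dispatch on the `operation` option is a name-to-handler table. This is an assumption-laden sketch, not the PR's implementation: rows are plain dicts rather than DataFrame rows, and `HANDLERS`, `dispatch`, and the handler bodies are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
# Hypothetical handler shape: receives the reader options, returns rows.
Handler = Callable[[Dict[str, str]], List[Row]]


def _list_objects(options: Dict[str, str]) -> List[Row]:
    # Placeholder rows; a real connector would query its source here.
    return [{"name": "users"}, {"name": "orders"}]


def _validate_connection(options: Dict[str, str]) -> List[Row]:
    return [{"ok": True}]


def _list_operations(options: Dict[str, str]) -> List[Row]:
    # HANDLERS is read at call time, so this includes list_operations itself.
    return [{"name": name} for name in sorted(HANDLERS)]


HANDLERS: Dict[str, Handler] = {
    "list_objects": _list_objects,
    "validate_connection": _validate_connection,
    "list_operations": _list_operations,
}


def dispatch(options: Dict[str, str]) -> List[Row]:
    """Route a read to the handler named by the `operation` option."""
    op = options.get("operation")
    if op is None:
        raise ValueError("missing required option 'operation'")
    handler = HANDLERS.get(op)
    if handler is None:
        raise ValueError(f"unknown operation {op!r}; expected one of {sorted(HANDLERS)}")
    return handler(options)


print(dispatch({"operation": "list_operations"}))
# [{'name': 'list_objects'}, {'name': 'list_operations'}, {'name': 'validate_connection'}]
```

In Spark, the equivalent call would presumably look like `spark.read.format("ingestion_agent").option("operation", "list_objects").load()`, plus whatever connection options the connector requires; the exact option set is not stated in this PR.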

Defaults are derived from the existing `LakeflowConnect` surface, so every existing connector gains the agent API automatically. Connectors that need richer responses (hierarchical listing, source-native sampling, extra metadata, custom prefixed operations) implement the optional `SupportsIngestionAgent` mixin. The `lakeflow_connect` format is unchanged; `ingestion_agent` is registered alongside it.
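
The default-then-override behavior can be sketched as a hook check: if the connector mixes in `SupportsIngestionAgent` and its hook returns something, use that; otherwise derive the answer from the base connector methods. The signatures below (`list_tables()`, `read_table(table_name)`, and hooks returning `None` to mean "not implemented") are simplified assumptions, not the real `LakeflowConnect` interface.

```python
import itertools
from typing import Dict, Iterator, List, Optional

Row = Dict[str, object]


class LakeflowConnect:
    """Hypothetical, simplified stand-in for the existing connector interface."""

    def list_tables(self) -> List[str]:
        raise NotImplementedError

    def read_table(self, table_name: str) -> Iterator[Row]:
        raise NotImplementedError


class SupportsIngestionAgent:
    """Optional mixin; by assumption, a hook returning None means 'use the default'."""

    def list_objects(self) -> Optional[List[Row]]:
        return None

    def preview_table(self, table_name: str, limit: int) -> Optional[List[Row]]:
        return None


def list_objects(conn: LakeflowConnect) -> List[Row]:
    if isinstance(conn, SupportsIngestionAgent):
        rows = conn.list_objects()
        if rows is not None:
            return rows
    # Default: a flat listing derived from list_tables().
    return [{"name": t, "type": "table"} for t in conn.list_tables()]


def preview_table(conn: LakeflowConnect, table_name: str, limit: int = 10) -> List[Row]:
    if isinstance(conn, SupportsIngestionAgent):
        rows = conn.preview_table(table_name, limit)
        if rows is not None:
            return rows
    # Default: take the first `limit` records; islice stops reading early.
    return list(itertools.islice(conn.read_table(table_name), limit))


class PlainConnector(LakeflowConnect):
    def list_tables(self) -> List[str]:
        return ["users"]

    def read_table(self, table_name: str) -> Iterator[Row]:
        return ({"id": i} for i in range(100))


class RichConnector(PlainConnector, SupportsIngestionAgent):
    def list_objects(self) -> Optional[List[Row]]:
        # Hierarchical listing: a schema plus the table it contains.
        return [{"name": "public", "type": "schema"},
                {"name": "public.users", "type": "table"}]
```

`PlainConnector` gets both operations for free, while `RichConnector` overrides only `list_objects` and still falls back to the default preview. Slicing `read_table` is the naive default; a connector-specific `preview_table` hook is where source-native sampling would go.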

Co-authored-by: Isaac

yyoli-db added 3 commits May 16, 2026 07:21
Introduces a new `ingestion_agent` Python Data Source format that
dispatches on an `operation` option, returning DataFrames for
list_objects, preview_table, get_object_metadata, validate_connection,
and list_operations — the read-side surface specified in
`experimental/yong-li/docs/ingestion-agent-api-design.md`.

Defaults derive from the existing LakeflowConnect surface, so every
existing connector gains the agent API automatically. Connectors that
need richer responses (hierarchical listing, source-native sampling,
extra metadata, custom prefixed operations) implement the optional
`SupportsIngestionAgent` mixin. The `lakeflow_connect` format is
unchanged; `ingestion_agent` is registered alongside it.

Co-authored-by: Isaac

Refactor source-specific operations into AgentOperation classes, so
adding a new op is a class + one entry on a source's
agent_operations() map — no framework edits required. Mirrors the
Scala AgentOperation design in databricks-eng/universe#1935135.
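
The "one class plus one map entry" registration, and built-ins being shadowed by same-named source operations, can be sketched with a plain dict merge. Everything here (`BUILT_INS`, `resolve_operations`, the `s3_` prefix, and a reduced `AgentOperation` with only `name`, `description`, and `pull`) is hypothetical and omits `kind` and schema handling.

```python
from abc import ABC, abstractmethod
from typing import Dict, List

Row = Dict[str, object]


class AgentOperation(ABC):
    """Reduced, hypothetical AgentOperation: name, description, pull()."""

    name: str = ""
    description: str = ""

    @abstractmethod
    def pull(self, options: Dict[str, str]) -> List[Row]:
        ...


class BuiltinListObjects(AgentOperation):
    name = "list_objects"
    description = "Flat listing derived from the connector's tables"

    def pull(self, options):
        return [{"name": "users"}]


BUILT_INS: Dict[str, AgentOperation] = {"list_objects": BuiltinListObjects()}


class ListBuckets(AgentOperation):
    # Hypothetical source-specific op; the prefix keeps it apart from built-ins.
    name = "s3_list_buckets"
    description = "List buckets visible to the configured credentials"

    def pull(self, options):
        return [{"bucket": "raw"}, {"bucket": "curated"}]


class HierarchicalListObjects(AgentOperation):
    name = "list_objects"  # same name as the built-in, so it wins
    description = "Prefix-aware listing"

    def pull(self, options):
        return [{"name": "raw/"}, {"name": "raw/2024/"}]


class MySource:
    def agent_operations(self) -> Dict[str, AgentOperation]:
        # Adding an op = one class above + one entry in this map.
        ops = [ListBuckets(), HierarchicalListObjects()]
        return {op.name: op for op in ops}


def resolve_operations(source) -> Dict[str, AgentOperation]:
    # Right-hand entries win, so source ops shadow same-named built-ins.
    return {**BUILT_INS, **source.agent_operations()}


def list_operations(ops: Dict[str, AgentOperation]) -> List[Row]:
    # Reports built-ins and source plug-ins together.
    return [{"name": n, "description": op.description} for n, op in sorted(ops.items())]
```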

- Add AgentOperation ABC (name, description, kind, schema /
  resolve_schema, pull) to the interface module.
- Reframe the five built-ins as AgentOperation subclasses that
  consult the existing SupportsIngestionAgent per-method hooks
  before falling back to LakeflowConnect-derived defaults.
- Source-defined operations override built-ins with the same name;
  list_operations reports both built-ins and source plug-ins.
- Framework auto-appends _meta and converts pull() exceptions to
  an error row for kind=metadata; data-kind ops pass through with
  the source's natural schema.
- Relax data-column nullability for metadata-kind ops so the
  framework's _meta-only error row validates against the schema.

Co-authored-by: Isaac
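
The `_meta` column and error-row behavior for `kind=metadata` can be illustrated with plain dicts. This is a sketch under assumptions: the `Field` record, the `run`, `effective_schema`, and `validates` helpers, and the contents of `_meta` (`None` on success, a status/message dict on failure) are hypothetical, since the commit does not spell them out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Field:
    """Hypothetical stand-in for a schema field."""
    name: str
    nullable: bool


def effective_schema(fields: List[Field], kind: str) -> List[Field]:
    if kind != "metadata":
        return fields  # data-kind ops keep the source's natural schema
    # Relax nullability so a _meta-only error row still validates,
    # then append the framework-owned _meta column.
    return [Field(f.name, True) for f in fields] + [Field("_meta", True)]


def run(pull: Callable[[], List[Row]], fields: List[Field], kind: str) -> List[Row]:
    if kind != "metadata":
        return pull()  # data-kind: exceptions propagate unchanged
    try:
        rows = pull()
    except Exception as exc:
        # Single error row: all data columns null, details in _meta.
        error_row: Row = {f.name: None for f in fields}
        error_row["_meta"] = {"status": "error", "message": str(exc)}
        return [error_row]
    # Success rows get _meta appended (None here, by assumption).
    return [{**row, "_meta": None} for row in rows]


def validates(row: Row, schema: List[Field]) -> bool:
    # A null value is only allowed in a nullable field.
    return all(row.get(f.name) is not None or f.nullable for f in schema)


def failing_pull() -> List[Row]:
    raise RuntimeError("auth failed")


declared = [Field("name", nullable=False)]
error_rows = run(failing_pull, declared, "metadata")
print(validates(error_rows[0], effective_schema(declared, "metadata")))  # True
print(validates(error_rows[0], declared + [Field("_meta", True)]))  # False
```

The second check shows why the relaxation is needed: once a data column is declared non-nullable, an all-null error row would otherwise fail validation against the operation's own schema.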

Lower the cost of adding a source-specific ingestion-agent operation
to a single decorated method. Compared to the AgentOperation subclass
+ agent_operations() map entry, the decorator path:

- drops the subclass wrapper — name/description/schema/kind go on the
  decorator, the method body *is* the pull;
- removes the agent_operations() override — the default implementation
  walks the class for @agent_operation-decorated attributes and wraps
  each one in an AgentOperation;
- still composes with the class-based path: override
  agent_operations() and call super() to mix in stateful subclass
  operations alongside decorated ones.

Class-based AgentOperation is kept for ops with state or dynamic
schemas; the framework dispatches both paths identically.

Co-authored-by: Isaac
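
A decorator of this kind only needs to tag the function and let the default `agent_operations()` collect the tags. The sketch assumes a simplified `AgentOperation` wrapper and a decorator signature of `agent_operation(name, description, kind)`; the real decorator also carries a schema, which is omitted here, and all names are illustrative.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Pull = Callable[[Dict[str, str]], List[Row]]


class AgentOperation:
    """Simplified, hypothetical wrapper that the framework dispatches on."""

    def __init__(self, name: str, description: str, kind: str, pull: Pull):
        self.name = name
        self.description = description
        self.kind = kind
        self._pull = pull

    def pull(self, options: Dict[str, str]) -> List[Row]:
        return self._pull(options)


def agent_operation(name: str, description: str = "", kind: str = "metadata"):
    """Hypothetical decorator: tags the method; schema is omitted here."""
    def tag(fn):
        fn._agent_op = {"name": name, "description": description, "kind": kind}
        return fn
    return tag


class SupportsIngestionAgent:
    def agent_operations(self) -> Dict[str, AgentOperation]:
        ops: Dict[str, AgentOperation] = {}
        # Walk the class and its bases for tagged methods.
        for attr in dir(type(self)):
            meta = getattr(getattr(type(self), attr), "_agent_op", None)
            if meta is not None:
                # The bound method body *is* the pull.
                ops[meta["name"]] = AgentOperation(pull=getattr(self, attr), **meta)
        return ops


class PullCounter(AgentOperation):
    """Class-based op with state, which a plain decorated method can't hold."""

    def __init__(self):
        super().__init__("s3_pull_count", "Times this op has run", "metadata", self._count)
        self.calls = 0

    def _count(self, options: Dict[str, str]) -> List[Row]:
        self.calls += 1
        return [{"calls": self.calls}]


class MySource(SupportsIngestionAgent):
    def __init__(self):
        self._counter = PullCounter()

    @agent_operation("s3_list_buckets", "List buckets visible to the credentials")
    def list_buckets(self, options: Dict[str, str]) -> List[Row]:
        return [{"bucket": "raw"}, {"bucket": "curated"}]

    def agent_operations(self) -> Dict[str, AgentOperation]:
        # Compose: decorated ops from super() plus a stateful class-based op.
        ops = super().agent_operations()
        ops[self._counter.name] = self._counter
        return ops
```

Because both paths end up as `AgentOperation` instances in the same map, the dispatcher does not need to know which path produced a given operation.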