Lakeflow Community Connectors

Lakeflow community connectors are built on top of the Spark Python Data Source API and Spark Declarative Pipelines (SDP). They let users ingest data from a variety of source systems.

Note: Lakeflow community connectors provide access to additional data sources beyond Databricks managed connectors. They are maintained by community contributors and are not subject to official Databricks SLAs, certifications, or guaranteed compatibility.

Interface and Testing

Two implementation approaches (see interface README):

LakeflowConnect (recommended): Define methods for listing tables, returning schemas, and reading records. Shared libraries handle the Spark Python Data Source (PDS) integration, streaming, and offset management automatically. Note: All built-in AI-assisted development workflows (commands, skills, agents) apply only to this approach.

Direct Python Data Source API: Full control over partitioning, schemas, and read logic; requires implementing the Spark API contract manually.
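
To make the shape of the recommended approach concrete, here is a minimal sketch of a connector that lists tables, returns a schema, and reads records incrementally from an offset. The class name, method names, signatures, and offset format are all assumptions chosen for illustration; the interface README defines the actual contract. The data is held in memory only so the example runs on its own.

```python
from typing import Any, Dict, Iterator, List, Tuple

Record = Dict[str, Any]
Offset = Dict[str, Any]


class InMemoryConnector:
    """Hypothetical connector; names are illustrative, not the real interface."""

    def __init__(self, options: Dict[str, str]) -> None:
        # A real connector would use options (e.g. credentials) to build an API client.
        self.options = options
        self._data: Dict[str, List[Record]] = {
            "users": [
                {"id": 1, "name": "ada", "updated_at": 10},
                {"id": 2, "name": "lin", "updated_at": 20},
                {"id": 3, "name": "sam", "updated_at": 30},
            ]
        }

    def list_tables(self) -> List[str]:
        """Return the names of the tables this source exposes."""
        return sorted(self._data)

    def get_table_schema(self, table: str) -> Dict[str, str]:
        """Return column names mapped to type names for one table."""
        if table not in self._data:
            raise ValueError(f"unknown table: {table}")
        return {"id": "bigint", "name": "string", "updated_at": "bigint"}

    def read_table(
        self, table: str, start_offset: Offset
    ) -> Tuple[Iterator[Record], Offset]:
        """Return records newer than start_offset, plus the offset to resume from."""
        cursor = start_offset.get("updated_at", 0)
        rows = [r for r in self._data[table] if r["updated_at"] > cursor]
        rows.sort(key=lambda r: r["updated_at"])
        next_cursor = rows[-1]["updated_at"] if rows else cursor
        return iter(rows), {"updated_at": next_cursor}


connector = InMemoryConnector({})
rows, offset = connector.read_table("users", {})
first_batch = list(rows)
# A second read from the returned offset yields nothing new.
rows, offset_after = connector.read_table("users", offset)
second_batch = list(rows)
```

Returning the next offset together with the records is one way to let a shared layer persist progress between runs without the connector tracking state itself.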

Tests run against live source environments (no mocks):

Generic test suite — End-to-end validation using real credentials
Unit tests — For complex connector-specific logic
Write-back testing (recommended) — Write data, read it back, verify incremental reads and deletes
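
The write-back flow (write, read back, then verify incremental reads and deletes) can be sketched as follows. FakeSource is a hypothetical in-memory stand-in used only so the example runs by itself; the project's tests run against live source systems instead, and the method names here are assumptions rather than the repository's test harness API.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


class FakeSource:
    """In-memory stand-in for a live source system (illustration only)."""

    def __init__(self) -> None:
        self._rows: Dict[int, Record] = {}
        self._deleted: List[Tuple[int, int]] = []  # (record id, version at delete)
        self._version = 0

    def write(self, record_id: int, payload: str) -> None:
        self._version += 1
        self._rows[record_id] = {
            "id": record_id,
            "payload": payload,
            "version": self._version,
        }

    def delete(self, record_id: int) -> None:
        self._version += 1
        del self._rows[record_id]
        self._deleted.append((record_id, self._version))

    def read_changes(self, since: int) -> Tuple[List[Record], List[int], int]:
        """Return (upserts, deleted ids, new cursor) for changes after `since`."""
        upserts = sorted(
            (r for r in self._rows.values() if r["version"] > since),
            key=lambda r: r["version"],
        )
        deletes = [rid for rid, version in self._deleted if version > since]
        return upserts, deletes, self._version


def run_write_back_check(source: FakeSource) -> int:
    # 1. Write data and read it back.
    source.write(1, "a")
    source.write(2, "b")
    upserts, deletes, cursor = source.read_changes(since=0)
    assert [r["id"] for r in upserts] == [1, 2]
    assert deletes == []

    # 2. An incremental read returns only what changed after the cursor.
    source.write(3, "c")
    upserts, deletes, cursor = source.read_changes(since=cursor)
    assert [r["id"] for r in upserts] == [3]
    assert deletes == []

    # 3. A delete shows up as a delete, not as an upsert.
    source.delete(2)
    upserts, deletes, cursor = source.read_changes(since=cursor)
    assert upserts == []
    assert deletes == [2]
    return cursor


final_cursor = run_write_back_check(FakeSource())
```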

Develop a New Connector

Build connectors with AI-assisted workflows using Claude Code or Cursor. All commands and skills are defined under .claude/ and auto-discovered by both tools.

git clone https://github.com/databrickslabs/lakeflow-community-connectors.git

One-Command Agent

A single command orchestrates the entire workflow — API research through deployment:

/create-connector <source_name> [tables=t1,t2,...] [doc=<url_or_path>]

The agent pauses once to collect credentials for source authentication.

Step-by-Step Skills

For more control, run each step individually. Replace {source} with your connector name.

Step	Command
1. Research source READ APIs	`/research-source-api for {source}`
2. Collect credentials	`/authenticate-source for {source}`
3. Implement connector	`/implement-connector for {source}`
4. Run tests and fix failures	`/test-and-fix-connector for {source}`
5a. Generate documentation	`/create-connector-document for {source}`
5b. Finalize connector spec	`/generate-connector-spec for {source}`
6. Build and deploy	`/deploy-connector for {source}`

Optional: Write-Back Testing — Run between steps 4 and 5. Skip for read-only sources or when writes are expensive/risky.

Step	Command
Research write APIs	`/research-write-api-of-source for {source}`
Implement write-back tests	`/write-back-testing for {source}`

Deploy and Run

Each connector runs as a configurable SDP pipeline. Define a pipeline spec to configure tables and destinations.

Databricks UI — Click "+New" > "Add or upload data" > select the source under "Community connectors". For custom connectors from your own repo, select "+ Add Community Connector".
CLI tool — Run /deploy-connector in Cursor or Claude Code for guided deployment, or use the CLI directly. See tools/community_connector.
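
As a rough illustration of what a pipeline spec that configures tables and destinations might carry, the sketch below resolves a small spec into fully qualified destination names. Every field name here (source, destination, catalog, schema, tables, destination_table) is an assumption made for the example and is not the repository's actual spec format; consult the repository for the real schema.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class TableMapping:
    source_table: str
    destination: str  # catalog.schema.table


def parse_spec(spec: Dict[str, Any]) -> List[TableMapping]:
    """Resolve a hypothetical pipeline spec into source-to-destination mappings."""
    if not spec.get("source"):
        raise ValueError("spec needs a 'source' name")
    tables = spec.get("tables")
    if not tables:
        raise ValueError("spec needs at least one table")
    dest = spec["destination"]

    mappings = []
    for entry in tables:
        # The destination table defaults to the source table name.
        table = entry.get("destination_table", entry["name"])
        mappings.append(
            TableMapping(entry["name"], f"{dest['catalog']}.{dest['schema']}.{table}")
        )
    return mappings


# Hypothetical spec: field names are illustrative only.
example_spec = {
    "source": "github",
    "destination": {"catalog": "main", "schema": "raw_github"},
    "tables": [
        {"name": "issues"},
        {"name": "pull_requests", "destination_table": "prs"},
    ],
}

mappings = parse_spec(example_spec)
```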

Project Structure

lakeflow-community-connectors/
|
|___ src/databricks/labs/community_connector/   # Core modules
|       |___ interface/          # The interface each source connector needs to implement 
|       |___ sources/            # Source connectors
|       |       |___ github/     
|       |       |___ zendesk/
|       |       |___ stripe/
|       |       |___ ...         # Each connector: Python code, docs, spec, etc.
|       |___ sparkpds/           # PySpark Data Source implementation and registry
|       |___ libs/               # Shared utilities (spec parsing, data types, module loading)
|       |___ pipeline/           # SDP ingestion orchestration
|
|___ tests/                      # Test suites
|       |___ unit/
|               |___ sources/    # Per-connector tests + generic test harness
|               |___ libs/       # Shared library tests
|               |___ pipeline/   # Pipeline tests
|
|___ tools/                      # Build and deployment tooling
|       |___ community_connector/  # CLI tool for workspace setup and deployment
|       |___ scripts/              # Build scripts (e.g., merge_python_source.py)
|
|___ templates/                  # Templates and guide for AI-assisted development
|
|___ .claude/                    # AI-assisted development (auto-discovered by Claude Code and Cursor)
        |___ skills/             # Skill files for each workflow step
        |___ agents/             # Subagents for different development phases
        |___ commands/           # Slash commands (e.g., /create-connector)
