This guide covers deploying OntoBricks both locally for development and to Databricks Apps for production use, including the optional MCP server for Databricks Playground integration and instructions for deploying to a new workspace.
Deployment uses Databricks Asset Bundles (DAB) — a declarative, repeatable approach that deploys both apps in a single command from databricks.yml at the project root.
Architecture at a glance:
┌──────────────────────┐ ┌──────────────────────┐
│ ontobricks-XXX │ │ mcp-ontobricks │
│ (Databricks App) │◄─────│ (Databricks App) │
│ │ REST │ │
│ Web UI + REST API │ API │ MCP Server for │
│ Knowledge Graph │ │ Databricks │
│ │ │ Playground │
└──────────┬───────────┘ └──────────────────────┘
│
├──────────────────┐
▼ ▼
┌───────────────────────────────────┐
│ SQL Warehouse (Delta backend) │
│ + LadybugDB (embedded graph) │
└───────────────────────────────────┘
│
▼
┌───────────────┐
│ MLflow │
│ Tracking │
│ (Experiments) │
└───────────────┘
Key files:
| File | Purpose |
|---|---|
| `databricks.yml` | DAB bundle definition — apps, permissions, targets |
| `app.yaml` | Main app runtime config — command, env vars, resource declarations |
| `src/mcp-server/app.yaml` | MCP server runtime config |
| `.databricksignore` | Excludes non-runtime files from the bundle sync |
| `scripts/deploy.sh` | Convenience wrapper around DAB commands |
- Python 3.10 or higher
- uv package manager (recommended) or pip
- Git
- Access to a Databricks workspace
- Databricks CLI installed and authenticated
# Clone the repository
git clone <repository-url>
cd OntoBricks
# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Create virtual environment and install dependencies
uv venv
source .venv/bin/activate
uv sync

Create a .env file from the example:

cp .env.example .env

Edit .env with your values:
# Required
DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-personal-access-token
DATABRICKS_SQL_WAREHOUSE_ID=your-warehouse-id
# Optional
DATABRICKS_CATALOG=main
DATABRICKS_SCHEMA=default
# MLflow — persist agent traces to your workspace (recommended)
MLFLOW_TRACKING_URI=databricks

Start the app:

scripts/start.sh
# Or directly: uv run python run.py
# Open http://localhost:8000

Run the tests:

uv run pytest                    # all tests
uv run pytest --cov=back --cov=front --cov=shared --cov=api --cov=agents --cov-report=html # with coverage
uv run pytest tests/e2e/ -v      # end-to-end tests

Deployment uses Databricks Asset Bundles to deploy both the main app and the MCP server declaratively from a single databricks.yml at the project root.
Naming in this guide: ontobricks-XXX stands for whatever main app name you set under resources.apps.ontobricks_dev_app in databricks.yml (replace it in every command and UI path). The MCP app name in this bundle is typically mcp-ontobricks.
| Requirement | Details |
|---|---|
| Databricks CLI | >= 0.250.0 — check with databricks -v |
| Authenticated CLI | databricks auth login --host https://<workspace> |
| SQL Warehouse | A running SQL Warehouse in the workspace |
| Apps feature | Databricks Apps must be enabled on the workspace |
| Unity Catalog | A catalog, schema, and volume for the project registry |
| UC grants for the app SP | The app runs as a service principal. See §3 Unity Catalog Permissions for the Service Principal for the exact grants required on the registry catalog/schema, the registry volume, and your source tables. |
databricks auth login --host https://<workspace>.cloud.databricks.com
# Verify
databricks current-user me

Edit the workspace-specific values in app.yaml (main app) and src/mcp-server/app.yaml (MCP server):
| Variable | File | Description | How to find it |
|---|---|---|---|
| `DATABRICKS_SQL_WAREHOUSE_ID_DEFAULT` | `app.yaml` | Fallback SQL Warehouse ID | SQL Warehouses > select warehouse > Connection details |
| `DATABRICKS_TRIPLESTORE_TABLE` | `app.yaml` | Default triple store table | Choose or create a catalog.schema.table for triple storage |
| `ONTOBRICKS_URL` | `src/mcp-server/app.yaml` | Main app URL | Set after first deploy — `databricks apps get <main-app-name>` (must match `resources.apps.ontobricks_dev_app.name` in your `databricks.yml`; this guide uses the placeholder `ontobricks-XXX`) |
Note — the deployed app name no longer needs an explicit `ONTOBRICKS_APP_NAME` env var. The runtime auto-detects it from the Databricks-Apps-injected `DATABRICKS_APP_NAME` (your bundle's main app `name`, written as `ontobricks-XXX` in this guide, or e.g. `ontobricks` in another deployment). Set `ONTOBRICKS_APP_NAME` only as an explicit override (e.g. in `.env` for local development).
The REGISTRY_CATALOG, REGISTRY_SCHEMA, and REGISTRY_VOLUME static variables are only needed for local development. In a deployed app the volume resource binding injects the registry path automatically.
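For local development, a minimal sketch of the corresponding `.env` entries — the variable names come from app.yaml below; the values are placeholders, so substitute your own catalog, schema, and volume name:

```bash
# Local development only — no volume resource binding is available,
# so the registry location comes from these static variables.
REGISTRY_CATALOG=main
REGISTRY_SCHEMA=ontobricks
REGISTRY_VOLUME=OntoBricksRegistry
```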
Update the default variable values to match your workspace:
variables:
warehouse_id:
default: "<your-warehouse-id>"
registry_catalog:
default: "<your-catalog>"
registry_schema:
default: "<your-schema>"
registry_volume:
default: "registry" # volume name segment; full UC path is catalog.schema.volumeLakebase (only if you use target dev-lakebase) — the bundle also defines:
| Variable | Purpose |
|---|---|
| `lakebase_project` | Autoscaling project id (segment after `projects/`). |
| `lakebase_branch` | Branch id (e.g. `production`). |
| `lakebase_database_resource_segment` | Must be the `db-…` suffix from the Postgres API `name` field — not the Postgres `datname` / `status.postgres_database` string. Resolve it with `databricks postgres list-databases "projects/<project>/branches/<branch>" -o json` (see comments in `databricks.yml` for jq / python3 one-liners). |
| `lakebase_registry_schema` | OntoBricks registry schema inside Postgres; mirror the same value as `LAKEBASE_SCHEMA` in `app.yaml` and `src/mcp-server/app.yaml`. |
Do not set lakebase_database_resource_segment to the schema name ontobricks_registry unless you intentionally bind a dedicated Postgres database whose resource id is that string (rare). This repository’s Lakebase binding targets a dedicated Postgres database whose datname is ontobricks_registry, with registry tables in the schema ontobricks_registry. Older single-DB layouts used databricks_postgres as datname with the same schema name inside it — use list-databases to see which db-… row matches your bind.
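A sketch of resolving that segment from the CLI, assuming the JSON shape described above (a `name` field that ends in the `db-…` segment and a `status.postgres_database` field holding the Postgres datname) — adjust the filter if your output differs:

```bash
# Print the db-… resource segment for the database whose Postgres datname
# is ontobricks_registry (assumed layout — see databricks.yml comments).
databricks postgres list-databases "projects/<project>/branches/<branch>" -o json \
  | python3 -c '
import sys, json
data = json.load(sys.stdin)
rows = data if isinstance(data, list) else data.get("databases", [])
for row in rows:
    if row.get("status", {}).get("postgres_database") == "ontobricks_registry":
        print(row["name"].rsplit("/", 1)[-1])   # the db-… segment
'
```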
Update the permissions section to grant CAN_MANAGE to the deploying user:
permissions:
- level: CAN_MANAGE
user_name: <your-email>
- level: CAN_USE
group_name: users

Then validate the bundle:

# Volume-only target (no Lakebase postgres resource in the bundle)
databricks bundle validate -t dev
# Default sandbox with Lakebase Autoscaling postgres binding
databricks bundle validate -t dev-lakebase

This checks the bundle configuration without deploying. Fix any errors before proceeding.
This repository’s scripts/deploy.sh defaults to dev-lakebase (Volume + Lakebase postgres binding). Use -t dev for Volume-only (no postgres resource).
# Deploy both apps (main + MCP) with Lakebase binding — recommended
databricks bundle deploy -t dev-lakebase
# Or use the convenience script (validate + deploy + start main app)
scripts/deploy.sh
# Volume-only registry backend (no Lakebase bind)
scripts/deploy.sh -t dev

If the apps already exist in the workspace (e.g., from a previous manual deploy), bind bundle resources to them once (resource keys and app names must match databricks.yml):
databricks bundle deployment bind ontobricks_dev_app ontobricks-XXX -t dev-lakebase --auto-approve
databricks bundle deployment bind mcp_ontobricks_app mcp-ontobricks -t dev-lakebase --auto-approve
databricks bundle deploy -t dev-lakebase

Or run scripts/deploy.sh --bind -t dev-lakebase to bind the main app only (see the script header for the MCP bind if you extend the script).
Databricks does not auto-grant the app service principal USAGE on your registry schema. After the app can reach Lakebase, run scripts/bootstrap-lakebase-perms.sh as a human user that owns the schema (or has GRANT OPTION).
Script defaults use Postgres database ontobricks_registry (dedicated datname matching the bundle bind) and schema ontobricks_registry:
scripts/bootstrap-lakebase-perms.sh \
-i "<lakebase_project>" \
-d ontobricks_registry \
-s ontobricks_registry \
-a ontobricks-XXX \
-a mcp-ontobricks

If your Lakebase instance still uses the shared default database databricks_postgres with the registry schema ontobricks_registry inside it, pass -d databricks_postgres.
Defaults inside the script are aligned with databricks.yml (ontobricks-app, ontobricks_registry database, both apps); override -i / -d / -s / -a when your workspace differs. The script is idempotent.
# Start the main UI app (resource key ontobricks_dev_app)
databricks bundle run ontobricks_dev_app -t dev-lakebase
# Start the MCP server (resource key mcp_ontobricks_app)
databricks bundle run mcp_ontobricks_app -t dev-lakebase

scripts/deploy.sh (without --no-run) starts only the main app (ontobricks-XXX). Start the MCP app with databricks bundle run mcp_ontobricks_app -t dev-lakebase, or use make deploy and then run the MCP command separately.
After the first deployment, bind the app resources in the Databricks workspace UI:
- Go to Compute > Apps and open your main app (placeholder `ontobricks-XXX` — same string as `resources.apps.ontobricks_dev_app.name` in `databricks.yml`)
- Click Resources
- Bind `sql-warehouse` to a running SQL Warehouse
- Bind `volume` to the Unity Catalog Volume for the project registry (e.g., `your_catalog.your_schema.registry` — match the `registry_*` variables in `databricks.yml`)
- Repeat for `mcp-ontobricks` (same warehouse and volume)
- For `dev-lakebase`, confirm the postgres resource is bound to the Lakebase database you configured (no extra UI step if deploy succeeded)
- Verify both apps show status Running
Note: Resource bindings persist across redeployments — you only need to do this once per workspace. Once the `sql-warehouse` and `volume` resources are bound, the corresponding controls in the Settings page are locked. To change them, update the resource bindings in the Apps UI and restart the app.
If the volume is empty (first deployment):
- Open the app URL
- Go to Settings > Registry
- Click Initialize to bootstrap the registry
# Check app status (names must match databricks.yml resources.apps.*.name)
databricks apps get ontobricks-XXX
databricks apps get mcp-ontobricks
# Or via bundle
databricks bundle summary -t dev-lakebase

The convenience script wraps databricks bundle validate, databricks bundle deploy, an optional bundle deployment bind, bundle run (main app only), bootstrap-app-permissions.sh, and bootstrap-lakebase-perms.sh (when the target is dev-lakebase):
scripts/deploy.sh # default: -t dev-lakebase — validate + deploy + run ontobricks-XXX
scripts/deploy.sh -t dev # Volume-only target (no Lakebase postgres bind)
scripts/deploy.sh -t dev-lakebase # explicit Lakebase target (same as default)
scripts/deploy.sh --no-run # deploy artifacts without starting the app
scripts/deploy.sh --bind         # bind main app resource key → existing app name, then deploy

The app.yaml file controls the Databricks App runtime. Here is every variable explained:
# Command to start the app — uv resolves dependencies from pyproject.toml
command:
- "uv"
- "run"
- "python"
- "run.py"
env:
# ── SQL Warehouse ──────────────────────────────────────────────
# Injected from the sql-warehouse resource binding (configured in the Apps UI)
- name: DATABRICKS_SQL_WAREHOUSE_ID
valueFrom: sql-warehouse
# Static fallback warehouse ID for MCP / session-less API calls
# (when no resource binding is available)
- name: DATABRICKS_SQL_WAREHOUSE_ID_DEFAULT
value: "<your-warehouse-id>"
# ── Unity Catalog defaults ─────────────────────────────────────
- name: DATABRICKS_CATALOG
value: "main"
- name: DATABRICKS_SCHEMA
value: "default"
# ── Triple store fallback ──────────────────────────────────────
# Fully-qualified Delta triple store table used when no project session
# is active (e.g. MCP API calls). Format: catalog.schema.table
- name: DATABRICKS_TRIPLESTORE_TABLE
value: "<catalog>.<schema>.<table>"
# ── Project Registry ───────────────────────────────────────────
# Injected from the volume resource binding.
# The Databricks Apps runtime sets this to /Volumes/<catalog>/<schema>/<volume>.
# When present, it overrides the three static REGISTRY_* variables below.
- name: REGISTRY_VOLUME_PATH
valueFrom: volume
# Static fallbacks — used for local development and MCP when no
# volume resource is bound.
- name: REGISTRY_CATALOG
value: "<your-catalog>"
- name: REGISTRY_SCHEMA
value: "<your-schema>"
- name: REGISTRY_VOLUME
value: "OntoBricksRegistry"
# ── Permission Management ──────────────────────────────────────
# The deployed app name is auto-detected from the Databricks-Apps-
# injected DATABRICKS_APP_NAME env var (matches the resource name in
# databricks.yml automatically). Set ONTOBRICKS_APP_NAME only as an
# explicit override (e.g. in .env for local development).
# ── MLflow ─────────────────────────────────────────────────────
# Persist agent traces to the workspace tracking server
- name: MLFLOW_TRACKING_URI
value: "databricks"
# ── Resources ──────────────────────────────────────────────────
# Configure these resources in the Databricks Apps UI after deployment.
# Once bound, the corresponding Settings UI controls are locked (read-only).
resources:
- name: sql-warehouse
description: "SQL Warehouse for executing queries and metadata operations"
sql_warehouse:
permission: CAN_USE
- name: volume
description: "Unity Catalog Volume for the OntoBricks project registry"
volume:
permission: CAN_READ_WRITE

When OntoBricks detects that it is running as a Databricks App with resource bindings, the Settings page automatically locks the affected controls:
| Resource | What is locked | How to change |
|---|---|---|
| `sql-warehouse` | SQL Warehouse dropdown + refresh button | Rebind the sql-warehouse resource in Compute > Apps > Resources |
| `volume` | Registry Change button | Rebind the volume resource in Compute > Apps > Resources |
The Initialize button remains available when the volume resource is bound but the registry has not been initialized yet (no .registry marker). This allows first-time setup without unlocking the UI.
In local development mode (no Databricks App resources), all Settings controls remain fully editable.
Databricks Apps run as a service principal (SP) created automatically when the app is deployed. The app's workspace bindings (sql-warehouse, volume) only cover the compute and the registry volume — they do not grant Unity Catalog privileges on the catalogs, schemas, or tables OntoBricks reads from and writes to. These grants are your responsibility and must be done once per workspace.
You can look up the app's SP ID with:
databricks apps get ontobricks-XXX -o json \
| python3 -c "import sys,json; d=json.load(sys.stdin); print(d.get('service_principal_client_id'))"

Throughout this guide, replace ontobricks-XXX with your workspace's actual main app name from databricks.yml.
Throughout this section that identifier is written as <app-sp> (it is a UUID / client-id, not an email).
Privilege mental model. In Unity Catalog, to reference any object you need
`USE CATALOG` on its catalog and `USE SCHEMA` on its schema. To read table/view data you need `SELECT`. To create new tables or views in a schema you need schema-level `CREATE TABLE` / `CREATE VIEW`. To modify or replace an object that already exists (including `CREATE OR REPLACE` over an existing object) you must own it or hold `MANAGE` on it. Volumes are separate: files under `/Volumes/...` need `READ VOLUME` and `WRITE VOLUME`. `ALL PRIVILEGES` is a convenient shorthand during onboarding but you should narrow it down before production.
The main app's service principal performs the following operations at runtime. Every one of them needs a UC grant.
| # | Operation | Target | UC privilege required |
|---|---|---|---|
| 1 | SHOW CATALOGS, SHOW SCHEMAS, SHOW TABLES, DESCRIBE, SHOW VOLUMES, information_schema.tables lookups (Data Source picker) | Source catalogs + registry catalog | USE CATALOG + USE SCHEMA + SELECT on browsed tables |
| 2 | SELECT on source tables referenced by R2RML sql_query entries (VIEW creation + build) | Each source table/view | SELECT |
| 3 | CREATE OR REPLACE VIEW <registry_catalog>.<registry_schema>.triplestore_<domain>_V<n> (Digital Twin Sync) | Registry schema | Schema CREATE VIEW. If an object with the same name already exists from a previous build, additionally MANAGE on it or SP ownership. |
| 4 | SELECT subject, predicate, object FROM <triplestore VIEW> (SPARQL + Ladybug population) | The triplestore VIEW | SELECT (inherited by the SP as owner once it created the VIEW in step 3). |
| 5 | CREATE OR REPLACE TABLE <registry_catalog>.<registry_schema>._ob_snapshot_<domain>_v<n> AS SELECT ... (incremental mode) | Registry schema | Schema CREATE TABLE (+ MANAGE / ownership if the snapshot already exists from a prior run). |
| 6 | SELECT / DROP TABLE IF EXISTS on the snapshot table (incremental diff and cleanup) | Snapshot table | Inherited via ownership once the SP created it. |
| 7 | CREATE TABLE IF NOT EXISTS <table>(subject STRING, predicate STRING, object STRING) USING DELTA, DELETE FROM, INSERT INTO on the optional DATABRICKS_TRIPLESTORE_TABLE fallback (reasoning materialisation, MCP session-less calls) | Fallback triple-store table | Schema CREATE TABLE. If the table pre-exists, MODIFY to DELETE/INSERT + SELECT. |
| 8 | File I/O under /Volumes/<registry_catalog>/<registry_schema>/<registry_volume>/ (projects, domains, history log, LadybugDB archives) | Registry volume | READ VOLUME + WRITE VOLUME. |
| 9 | POST /api/2.1/unity-catalog/volumes — only triggered from Settings → Registry → Initialize when the volume does not yet exist | Registry schema | Schema CREATE VOLUME (skip if you create the volume manually up front). |
All the above run through the SQL Warehouse bound to the app (sql-warehouse resource) on behalf of the app SP. The CAN_USE grant on the warehouse covers compute access; data access is controlled by UC.
Run these SQL statements once as a workspace admin (or anyone with MANAGE on the catalog/schema), substituting your bundle values from databricks.yml:
-- The catalog the registry lives in
GRANT USE CATALOG ON CATALOG `<registry_catalog>` TO `<app-sp>`;
-- The schema that holds the triplestore VIEW, snapshot tables, and the registry Volume
GRANT USE SCHEMA ON SCHEMA `<registry_catalog>`.`<registry_schema>` TO `<app-sp>`;
GRANT CREATE TABLE ON SCHEMA `<registry_catalog>`.`<registry_schema>` TO `<app-sp>`;
GRANT CREATE VIEW ON SCHEMA `<registry_catalog>`.`<registry_schema>` TO `<app-sp>`;
-- Registry Volume (files: projects, domains, history, Ladybug archives).
-- The `volume` resource binding only grants compute reach-through; UC ACLs still apply.
GRANT READ VOLUME ON VOLUME `<registry_catalog>`.`<registry_schema>`.`<registry_volume>` TO `<app-sp>`;
GRANT WRITE VOLUME ON VOLUME `<registry_catalog>`.`<registry_schema>`.`<registry_volume>` TO `<app-sp>`;
-- Optional — only if you want the Settings UI to be able to create the
-- registry volume itself on first run (otherwise create it once as admin).
GRANT CREATE VOLUME ON SCHEMA `<registry_catalog>`.`<registry_schema>` TO `<app-sp>`;

Alternatively — shorter, broader, and fine for a dev workspace:
GRANT USE CATALOG ON CATALOG `<registry_catalog>` TO `<app-sp>`;
GRANT ALL PRIVILEGES ON SCHEMA `<registry_catalog>`.`<registry_schema>` TO `<app-sp>`;

Prefer the explicit list above for a production workspace because ALL PRIVILEGES also implies MANAGE, which is broader than what the app needs day to day.
For every table or view referenced in an R2RML mapping (i.e. anything that appears in the Data Sources tab of a domain), the app SP needs to read the data:
GRANT USE CATALOG ON CATALOG `<source_catalog>` TO `<app-sp>`;
GRANT USE SCHEMA ON SCHEMA `<source_catalog>`.`<source_schema>` TO `<app-sp>`;
-- Per-object is tightest:
GRANT SELECT ON TABLE `<source_catalog>`.`<source_schema>`.`<table>` TO `<app-sp>`;
-- Or grant schema-wide if you trust the SP with the whole schema:
-- GRANT SELECT ON SCHEMA `<source_catalog>`.`<source_schema>` TO `<app-sp>`;

If any mapping sql_query joins tables from multiple schemas or catalogs, repeat the USE CATALOG / USE SCHEMA / SELECT chain for each. The VIEW creation in step 3 will fail with TABLE_OR_VIEW_NOT_FOUND or PERMISSION_DENIED if one is missing.
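Where many source tables are involved, a small sketch that only prints the per-table grants can save typing — the table names here are hypothetical, and `<app-sp>` is the same placeholder used throughout this section; paste the output into a SQL editor to run it:

```bash
# Emit one GRANT per source table referenced by the R2RML mappings.
APP_SP="<app-sp>"
for t in catalog_a.sales.orders catalog_a.sales.customers catalog_b.ref.sites; do
  printf 'GRANT SELECT ON TABLE %s TO `%s`;\n' "$t" "$APP_SP"
done
```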
You do not need:
- `MODIFY` on source tables — the app only reads them.
- `MANAGE` on the registry schema — `CREATE TABLE` + `CREATE VIEW` are sufficient because the SP automatically owns the objects it creates.
- `MANAGE` on the registry catalog — not required unless you want the SP to `DROP CATALOG` / `ALTER CATALOG`, which OntoBricks never does.
If a build previously materialised the triplestore as a TABLE (legacy code path or a manual CTAS in SQL Editor), the next Sync will try to run CREATE OR REPLACE VIEW over the same name. Databricks refuses this because:
- `CREATE OR REPLACE VIEW` cannot convert an existing TABLE to a VIEW, even for the owner.
- Even if it could, the SP would need `MANAGE` on that pre-existing object.
The user-facing error surfaces as:
Sync failed: Failed to create VIEW: PERMISSION_DENIED: User does not have MANAGE on Table '<cat>.<schema>.triplestore_<domain>_v<n>'.
Fix (once): drop the stale object yourself, then re-run Sync:
DROP TABLE IF EXISTS `<registry_catalog>`.`<registry_schema>`.`triplestore_<domain>_v<n>`;

Once dropped, the app SP recreates the object as a VIEW and owns it from then on.
If databricks apps get reports the SP client-id as None (app not fully provisioned), bind it by email-alias instead. Apps expose a dedicated SP user with the form <app-name>@<account-id>.iam.databricks.com, visible under Admin → Service principals after the first deploy. You can grant against that principal with the same SQL, substituting its identifier in the TO clause.
Once grants are in place, you can verify from any SQL editor or notebook authenticated as the app SP (or by running a build in the app and watching the logs):
-- Should succeed (list registry schema objects)
SHOW TABLES IN `<registry_catalog>`.`<registry_schema>`;
SHOW VOLUMES IN `<registry_catalog>`.`<registry_schema>`;
-- Should return the current SP's privileges on the schema
SHOW GRANTS `<app-sp>` ON SCHEMA `<registry_catalog>`.`<registry_schema>`;

If a user reports a sync failure, the app now surfaces the underlying Databricks error in the task status (PERMISSION_DENIED, TABLE_OR_VIEW_NOT_FOUND, unresolved column, etc.). The full traceback is always in the app logs.
OntoBricks includes a built-in permission system that controls who can access the app and what they can do. Permissions are managed in Settings > Permissions and are only active when running as a Databricks App (local development has no restrictions).
| Role | Access |
|---|---|
| Admin | Full access + can manage the permission list. Determined by CAN_MANAGE on the Databricks App. |
| Editor | Full access to all features (read + write). |
| Viewer | Read-only access (cannot create, edit, or delete). |
| No role | Blocked entirely (redirect to Access Denied page). |
When no permissions are configured yet, only users with CAN_MANAGE on the Databricks App have access. Everyone else is blocked until an admin adds them via the Permissions tab.
At runtime, the app checks whether the logged-in user has CAN_MANAGE on the Databricks App by calling the Permissions API. The check uses the user's own OAuth token (forwarded by the Databricks Apps proxy via x-forwarded-access-token). This means:
- Admin detection uses the user's OAuth token when the Apps runtime forwards one.
- If the user's forwarded token is not available or lacks scope for `/api/2.0/permissions/apps/*`, the app falls back to the SDK (service principal) and then a REST call with the SP token.
- Those fallbacks require the app's service principal to have at least `CAN_VIEW_PERMISSIONS` on its own app, which is not granted automatically when the app is created.
Databricks Apps do not grant the freshly-created service principal any permission on the app it runs. Until that is fixed, list_app_principals returns 403, the middleware cannot resolve any user (including CAN_MANAGE deployers), and the access-denied page appears on the very first request.
make deploy (via scripts/deploy.sh) runs the fix automatically. For manual deploys:
make bootstrap-perms
# equivalent to: scripts/bootstrap-app-permissions.sh ontobricks-XXX mcp-ontobricks

The script is idempotent. It discovers each app's service principal via databricks apps get and grants it CAN_MANAGE on itself. If the first post-deploy page load shows a "First-deploy bootstrap required" banner, that is exactly the situation — run the command and reload.
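If you prefer to do it by hand, the rough shape of what the script automates — a sketch, assuming the standard Apps permissions payload and that `databricks apps get` already reports the SP client-id:

```bash
# Look up the app's service principal and grant it CAN_MANAGE on the app itself.
APP=ontobricks-XXX   # your main app name from databricks.yml
SP=$(databricks apps get "$APP" -o json \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['service_principal_client_id'])")
databricks apps update-permissions "$APP" --json "{
  \"access_control_list\": [
    { \"service_principal_name\": \"$SP\", \"permission_level\": \"CAN_MANAGE\" }
  ]
}"
```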
- Ensure you have CAN_MANAGE on the sandbox app `ontobricks-XXX` in the Databricks UI (Compute > Apps > ontobricks-XXX > Permissions), or on your production app name if it differs
- Open the app and go to Settings > Permissions
- Click Add to grant access to workspace users or groups
- Assign each principal a Viewer or Editor role
- Users not in the list are blocked from accessing the app
If a user reports "Access Denied", hit the diagnostic endpoint (accessible even when blocked):
https://<app-url>/settings/permissions/diag
This returns:
- The user's email and forwarded token status
- SDK (SP token) check result
- User-token check result
- Which principals have `CAN_MANAGE`
- The cached admin decision and its age
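To get the exact diag URL for your deployment, a quick sketch (the `url` extraction mirrors the command used elsewhere in this guide) — open the printed URL in a browser as the affected user:

```bash
# Build the diagnostic URL from the deployed app record.
APP_URL=$(databricks apps get ontobricks-XXX -o json \
  | python3 -c "import sys,json; print(json.load(sys.stdin)['url'])")
echo "${APP_URL}/settings/permissions/diag"
```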
When moving OntoBricks to a different Databricks workspace, follow these steps.
# Option A: Set as default
databricks auth login --host https://<new-workspace>.cloud.databricks.com
# Option B: Use a named profile
databricks auth login --host https://<new-workspace>.cloud.databricks.com --profile new-ws

Verify:
databricks current-user me
# Or with a profile: databricks current-user me --profile new-ws

The new workspace needs a catalog/schema where OntoBricks can store projects and triple stores:
# Create a schema for OntoBricks (adjust catalog name)
databricks sql query "CREATE SCHEMA IF NOT EXISTS main.ontobricks"
# Create the registry volume
databricks sql query "CREATE VOLUME IF NOT EXISTS main.ontobricks.OntoBricksRegistry"databricks.yml — update the variable defaults:
variables:
warehouse_id:
default: "<new-warehouse-id>"
registry_catalog:
default: "<new-catalog>"
registry_schema:
default: "<new-schema>"
registry_volume:
default: "registry"
# If you use dev-lakebase — set Lakebase project/branch and the db-… segment from list-databases
lakebase_project:
default: "<lakebase-project-id>"
lakebase_branch:
default: "production"
lakebase_database_resource_segment:
default: "<db-… from databricks postgres list-databases … -o json>"
lakebase_registry_schema:
default: "ontobricks_registry"Update the permissions section with the deploying user's email.
app.yaml — update workspace-specific values:
- name: DATABRICKS_SQL_WAREHOUSE_ID_DEFAULT
value: "<new-warehouse-id>"
- name: DATABRICKS_TRIPLESTORE_TABLE
value: "<catalog>.<schema>.<triplestore_table>"src/mcp-server/app.yaml — update the main app URL (after first deploy):
- name: ONTOBRICKS_URL
value: "https://<new-ontobricks-app-url>"# Validate first (use the same target you will deploy)
databricks bundle validate -t dev-lakebase
# Deploy and start main app (Lakebase target); start MCP separately if needed
scripts/deploy.sh -t dev-lakebase
databricks bundle run mcp_ontobricks_app -t dev-lakebase

- Go to Compute > Apps > ontobricks-XXX > Resources (or your main app name)
- Bind `sql-warehouse` to a running SQL Warehouse
- Bind `volume` to the registry UC Volume
- Repeat for `mcp-ontobricks`
- Open the app URL
- Go to Settings > Registry > Initialize (if the volume is empty)
- Verify both apps are Running: `databricks apps get ontobricks-XXX` and `databricks apps get mcp-ontobricks`
After the main app is deployed and running:
# Get the main app URL
databricks apps get ontobricks-XXX -o json | python3 -c "import sys,json; print(json.load(sys.stdin)['url'])"

Update ONTOBRICKS_URL in src/mcp-server/app.yaml with this URL, then redeploy the bundle and restart the MCP app:
databricks bundle deploy -t dev-lakebase
databricks bundle run mcp_ontobricks_app -t dev-lakebase

Migration checklist:

[ ] 1. databricks auth login --host https://<new-workspace>
[ ] 2. Verify: databricks current-user me
[ ] 3. Create Unity Catalog resources (schema, volume)
[ ] 4. Update databricks.yml variables (warehouse_id, registry_*, and if using dev-lakebase: lakebase_*)
[ ] 5. Update databricks.yml permissions (your email)
[ ] 6. Update app.yaml (DATABRICKS_SQL_WAREHOUSE_ID_DEFAULT, DATABRICKS_TRIPLESTORE_TABLE, LAKEBASE_SCHEMA)
[ ] 7. databricks bundle validate -t dev-lakebase
[ ] 8. scripts/deploy.sh -t dev-lakebase # or: databricks bundle deploy -t dev-lakebase
[ ] 9. Bind sql-warehouse and volume resources in the Apps UI (both apps: ontobricks-XXX, mcp-ontobricks)
[ ] 10. If dev-lakebase: scripts/bootstrap-lakebase-perms.sh (see §2 Step 5b)
[ ] 11. Grant UC privileges to each app's service principal (see §3):
registry USE CATALOG/USE SCHEMA/CREATE TABLE/CREATE VIEW +
volume READ/WRITE + source-table SELECT
[ ] 12. Initialize registry (Settings > Registry > Initialize)
[ ] 13. Verify both apps are RUNNING
[ ] 14. Update ONTOBRICKS_URL in src/mcp-server/app.yaml with the main app URL
[ ] 15. databricks bundle deploy -t dev-lakebase && databricks bundle run mcp_ontobricks_app -t dev-lakebase
[ ] 16. Verify MCP appears in Databricks Playground
OntoBricks supports two triple store backends. Choose one in your project settings.
Delta uses a Databricks SQL Warehouse to store triples in a Delta table. On Databricks Apps, the app's service principal authenticates via OAuth automatically — the only requirement is the SQL Warehouse resource declared in app.yaml (already configured).
LadybugDB is an embedded graph database that stores data locally at /tmp. When the project has an ontology loaded, LadybugDB uses a true graph model (OWL classes become node tables, object properties become relationship tables). Graph data is automatically archived to the registry UC Volume when saving a project and restored on load.
The MCP server (mcp-ontobricks) is a separate Databricks App that exposes OntoBricks knowledge-graph tools to the Databricks Playground. It must have a name starting with mcp- to be discoverable.
| Requirement | Details |
|---|---|
| Main app deployed | Your main app (ontobricks-XXX in examples — the name under ontobricks_dev_app in databricks.yml) must be deployed and running first |
| Databricks CLI | Authenticated to the same workspace |
| Playground access | Databricks Playground must be enabled |
The MCP server is deployed alongside the main app by the same databricks.yml bundle:
# Deploy both app definitions (single bundle deploy)
databricks bundle deploy -t dev-lakebase
# Or the convenience script (starts main app only — start MCP separately)
scripts/deploy.sh -t dev-lakebase
# After the main app is up, start the MCP app if needed
databricks bundle run mcp_ontobricks_app -t dev-lakebase

The MCP server's src/mcp-server/app.yaml:

command:
- "uv"
- "run"
- "mcp-ontobricks"
env:
- name: ONTOBRICKS_URL
value: "https://<your-ontobricks-app-url>"
- name: DATABRICKS_SQL_WAREHOUSE_ID
valueFrom: sql-warehouse
- name: REGISTRY_VOLUME_PATH
valueFrom: volume
resources:
- name: sql-warehouse
sql_warehouse:
permission: CAN_USE
- name: volume
volume:
permission: CAN_READ_WRITE

Important: Update `ONTOBRICKS_URL` to match your main app's URL before deploying. Find it with: `databricks apps get ontobricks-XXX -o json | python3 -c "import sys,json; print(json.load(sys.stdin)['url'])"`
After the first deployment, bind the MCP server's resources:
- Go to Compute > Apps > mcp-ontobricks > Resources
- Bind `sql-warehouse` to the same SQL Warehouse used by the main app
- Bind `volume` to the same registry UC Volume
The MCP server calls the main app's REST API using its service principal's OAuth token. The users group should already have CAN_USE on the main app (set in databricks.yml). If it doesn't:
databricks apps update-permissions ontobricks-XXX --json '{
"access_control_list": [
{ "group_name": "users", "permission_level": "CAN_USE" }
]
}'

Use your real main app name everywhere ontobricks-XXX appears (from databricks.yml).
- Go to your Databricks workspace
- Navigate to Playground
- `mcp-ontobricks` appears in the MCP Servers list (apps starting with `mcp-` are shown automatically)
- Select it to use OntoBricks knowledge-graph tools in conversations
| Tool | Description |
|---|---|
| `list_projects` | List all projects (knowledge graphs) in the registry with names and descriptions |
| `select_project` | Activate a project by name — subsequent queries operate on its triple store |
| `list_entity_types` | Human-readable overview of the selected project's knowledge graph (entity types, counts, predicates) |
| `describe_entity` | Search by name/type and get a full-text description with attributes, relationships, and BFS traversal |
| `get_status` | Compact diagnostic: project, backend, table, data availability, triple count |
For LLM clients like Cursor or Claude Desktop (stdio transport):
cd mcp-server
uv run python -c \
"from server.app import create_mcp_server; create_mcp_server('standalone').run(transport='stdio')"Or via the convenience wrapper at the project root:
python src/mcp-server/mcp_server.py # stdio (default)
python src/mcp-server/mcp_server.py --http   # streamable-http on port 9100

Override the target URL:

ONTOBRICKS_URL=http://your-host:8000 python src/mcp-server/mcp_server.py

Cursor (.cursor/mcp.json):
{
"mcpServers": {
"ontobricks": {
"command": "uv",
"args": ["run", "python", "-c",
"from server.app import create_mcp_server; create_mcp_server('standalone').run(transport='stdio')"
],
"cwd": "/path/to/OntoBricks/mcp-server"
}
}
}

Claude Desktop (claude_desktop_config.json):
{
"mcpServers": {
"ontobricks": {
"command": "uv",
"args": ["run", "python", "-c",
"from server.app import create_mcp_server; create_mcp_server('standalone').run(transport='stdio')"
],
"cwd": "/path/to/OntoBricks/mcp-server",
"env": { "ONTOBRICKS_URL": "http://localhost:8000" }
}
}
}

OntoBricks agents are instrumented with MLflow tracing. When deployed to Databricks, traces are persisted to the workspace tracking server.
- `MLFLOW_TRACKING_URI=databricks` is set in `app.yaml`
- Application startup in `src/shared/fastapi/main.py` calls `setup_tracing()`, which creates the `/Shared/ontobricks-agents` experiment
- Every agent call (OWL Generator, Auto-Mapping, Auto Icon Assign, Ontology Assistant) produces a span tree:
AGENT (run_agent)
├── LLM (_call_llm) — endpoint, tokens, latency
├── TOOL (tool:get_metadata) — arguments, result
├── LLM (_call_llm) — next iteration
├── TOOL (tool:execute_sql) — SQL query, result
└── ...
- In your Databricks workspace, go to Machine Learning > Experiments
- Open `/Shared/ontobricks-agents`
- Click any run, then the Traces tab
| Variable | Default | Description |
|---|---|---|
| `MLFLOW_TRACKING_URI` | (none) | Set to `databricks` for persistent traces |
| `ONTOBRICKS_MLFLOW_EXPERIMENT` | `ontobricks-agents` | Experiment name (auto-prefixed with `/Shared/` on Databricks) |
Tracing degrades gracefully: if MLflow is not configured, agents run normally without traces.
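For local runs, a minimal sketch of the matching `.env` entries — both variable names come from the table above, and the experiment override is optional:

```bash
# Persist agent traces from a local run to the workspace tracking server.
MLFLOW_TRACKING_URI=databricks
# Optional — defaults to ontobricks-agents (prefixed with /Shared/ on Databricks).
ONTOBRICKS_MLFLOW_EXPERIMENT=ontobricks-agents
```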
OntoBricks/
├── databricks.yml # Bundle definition (apps, permissions, targets)
├── .databricksignore # Excludes non-runtime files from sync
├── app.yaml # Main app runtime config
├── src/mcp-server/
│ └── app.yaml # MCP server runtime config
└── docs/dab-reference.md
└── README.md # DAB-specific documentation
| Target | Mode | Description |
|---|---|---|
| `dev` | development | Volume-only registry backend — no Apps postgres resource. |
| `dev-lakebase` | development | Same apps plus Lakebase Autoscaling postgres binding (default for make deploy / scripts/deploy.sh). |
databricks bundle deploy -t dev-lakebase
databricks bundle run ontobricks_dev_app -t dev-lakebase
databricks bundle run mcp_ontobricks_app -t dev-lakebase

Override defaults with --var flags or in a target-specific variables: block. Authoritative defaults live in databricks.yml; the table below summarizes intent:
| Variable | Description |
|---|---|
| `warehouse_id` | SQL Warehouse ID for the sql-warehouse resource. |
| `registry_catalog` / `registry_schema` / `registry_volume` | UC triple for the volume resource (catalog.schema.volume name). |
| `lakebase_project` / `lakebase_branch` / `lakebase_database_resource_segment` | Lakebase Autoscaling binding (final segment must be `db-…` from list-databases). |
| `lakebase_registry_schema` | Postgres schema for the registry (mirror `LAKEBASE_SCHEMA` in `app.yaml`). |
databricks bundle deploy -t dev-lakebase --var warehouse_id=abc123def456

Make targets:

make deploy            # scripts/deploy.sh -t dev-lakebase
make deploy-volume # Volume-only: scripts/deploy.sh -t dev
make deploy-no-run # Deploy without bundle run
make bundle-validate # databricks bundle validate -t dev-lakebase
make bundle-summary # databricks bundle summary -t dev-lakebase
make bootstrap-perms # app SP CAN_MANAGE on self (see scripts/bootstrap-app-permissions.sh)
make bootstrap-lakebase  # Lakebase schema grants (see scripts/bootstrap-lakebase-perms.sh)

The .databricksignore at the project root excludes non-runtime files (tests, docs, data, IDE config, etc.) from the main app sync. The MCP server has its own source_code_path pointing directly to src/mcp-server/.
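A hypothetical excerpt of what such an ignore file typically contains — check the real `.databricksignore` in the repository for the authoritative list:

```
# Non-runtime paths excluded from the main app's bundle sync (illustrative only)
tests/
docs/
data/
.venv/
.vscode/
__pycache__/
```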
If the apps already exist in the workspace from a previous deployment:
databricks bundle deployment bind ontobricks_dev_app ontobricks-XXX -t dev-lakebase --auto-approve
databricks bundle deployment bind mcp_ontobricks_app mcp-ontobricks -t dev-lakebase --auto-approve
databricks bundle deploy -t dev-lakebase

Use this checklist when deploying OntoBricks from scratch on any workspace:
[ ] 1. Databricks CLI installed (>= 0.250.0) and authenticated
databricks auth login --host https://<workspace>
databricks current-user me
[ ] 2. SQL Warehouse created and running
[ ] 3. Unity Catalog resources available:
[ ] A catalog you can use (e.g., main or your personal catalog)
[ ] A schema within that catalog (e.g., ontobricks)
[ ] A Volume for the project registry (e.g., OntoBricksRegistry)
[ ] 4. Update databricks.yml:
[ ] Variable defaults (warehouse_id, registry_*, and if dev-lakebase: lakebase_*)
[ ] Permissions (your email with CAN_MANAGE)
[ ] 5. Update app.yaml:
[ ] DATABRICKS_SQL_WAREHOUSE_ID_DEFAULT
[ ] DATABRICKS_TRIPLESTORE_TABLE
[ ] LAKEBASE_SCHEMA (must match lakebase_registry_schema when using Lakebase)
[ ] (local dev only) REGISTRY_CATALOG / REGISTRY_SCHEMA / REGISTRY_VOLUME
[ ] 6. Validate: databricks bundle validate -t dev-lakebase
[ ] 7. Deploy: scripts/deploy.sh -t dev-lakebase (or databricks bundle deploy -t dev-lakebase)
[ ] 8. Bind sql-warehouse resource in the Apps UI (main + MCP apps)
[ ] 9. Bind volume resource to the registry UC Volume (both apps)
[ ] 10. If dev-lakebase: run scripts/bootstrap-lakebase-perms.sh (see §2 Step 5b)
[ ] 11. Grant Unity Catalog privileges to the app service principal (see §3):
[ ] Registry catalog: USE CATALOG
[ ] Registry schema : USE SCHEMA + CREATE TABLE + CREATE VIEW
[ ] Registry volume : READ VOLUME + WRITE VOLUME
[ ] Each source catalog/schema referenced in R2RML mappings:
USE CATALOG + USE SCHEMA + SELECT (per table or schema-wide)
[ ] 12. make bootstrap-perms (or scripts/bootstrap-app-permissions.sh ontobricks-XXX mcp-ontobricks)
[ ] 13. Verify main app is RUNNING:
databricks apps get ontobricks-XXX
[ ] 14. Initialize registry if the volume is empty:
Open app → Settings → Registry → Initialize
[ ] 15. (If using MCP) Update ONTOBRICKS_URL in src/mcp-server/app.yaml:
databricks apps get ontobricks-XXX -o json | python3 -c "import sys,json; print(json.load(sys.stdin)['url'])"
[ ] 16. (If using MCP) Redeploy and start MCP:
databricks bundle deploy -t dev-lakebase && databricks bundle run mcp_ontobricks_app -t dev-lakebase
[ ] 17. (If using MCP) Bind MCP resources (same warehouse + volume)
[ ] 18. (If using MCP) Verify in Databricks Playground
# Check app status (use your bundle’s app names)
databricks apps get ontobricks-XXX
databricks apps get mcp-ontobricks
# Check recent deployment status
databricks apps list-deployments ontobricks-XXX

If a user sees "Access Denied":

- Hit the diagnostic endpoint: `https://<app-url>/settings/permissions/diag`
- Check `user_token_present` — should be `true` in Databricks App mode
- Check `user_token_can_manage` — should list your email
- If `email_is_manager` is `false`, verify your email matches the CAN_MANAGE grant exactly
- Check `admin_cache` — if stale, restart the app to clear it
The agents need OAuth credentials to call the Foundation Model API. In a Databricks App, these are resolved automatically via the service principal. If you see this error:
- Verify the app is running as a Databricks App (not locally)
- Check that the LLM endpoint is configured in the project settings
- Review app logs for OAuth token resolution errors
- Verify the SQL Warehouse is running and the resource binding is correct
- Check the SP has correct permissions on catalogs/schemas — see §3 Unity Catalog Permissions for the Service Principal
- Review app logs in the Databricks Apps UI
If Sync fails with a message like
Sync failed: Failed to create VIEW: PERMISSION_DENIED:
User does not have MANAGE on Table '<cat>.<schema>.triplestore_<domain>_v<n>'.
then an object with the same name already exists in UC as a TABLE — typically left behind by a legacy materialised build — and CREATE OR REPLACE VIEW cannot overwrite it. Drop the stale object once as an admin and retry:
DROP TABLE IF EXISTS `<cat>`.`<schema>`.`triplestore_<domain>_v<n>`;

If the failure is a plain PERMISSION_DENIED without a pre-existing object, the app SP is missing a UC grant — see §3.2 for the registry schema and §3.3 for source tables.
The app SP lacks USE CATALOG / USE SCHEMA / SELECT on one of the source tables referenced in an R2RML mapping. Identify which source from the error message and grant per §3.3.
All internal navigation links must include a trailing slash (e.g., /dtwin/ not /dtwin). FastAPI's redirect_slashes generates localhost:8000 redirects behind the Databricks proxy.
Stale __pycache__ or old directories in the workspace can cause import conflicts:
# Remove stale workspace files if needed
databricks workspace delete /Users/<you>/ontobricks/<stale-dir> --recursive

The MCP app's SP needs CAN_USE permission on the main app. The users group should already have this via databricks.yml.
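A quick way to inspect the grant — a sketch, assuming your CLI version exposes the apps permissions subcommands:

```bash
# List who holds which permission level on the main app; look for the
# users group (or the MCP app's SP) with CAN_USE.
databricks apps get-permissions ontobricks-XXX
```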
- Verify `MLFLOW_TRACKING_URI=databricks` is set in `app.yaml`
- Check the app logs for `MLflow tracing enabled — experiment='/Shared/ontobricks-agents'`
- Run an agent call (e.g., generate an ontology) to create the first trace
- The experiment is created on first use, not at startup
- Never commit `.env` files or secrets to Git
- Use Databricks Secrets for sensitive data (passwords, tokens)
- Use service principals for production (automatically handled by Databricks Apps)
- Use an appropriately sized SQL Warehouse (enable auto-stop to save costs)
- App logs: Available in the Databricks Apps console
- Log level: Configure via the `LOG_LEVEL` environment variable (`DEBUG`, `INFO`, `WARNING`, `ERROR`)
- Structured JSON logs: Set `LOG_FORMAT=json` to emit one JSON object per log line — ideal for log aggregation and search
- Request timing: Every non-static request is logged with method, path, status code, and duration in milliseconds
- Thread pool: Tune concurrent blocking work via `ONTOBRICKS_THREAD_POOL_SIZE` (default `20`)
- Health checks: `GET /health` runs a comprehensive readiness probe (`/tmp` + session/log dirs, Databricks auth, SQL warehouse, registry UC volume read+write, registry catalog/schema view DDL, Lakebase USAGE) and returns `{status, summary:{ok,warnings,errors}, checks:[…]}`. The endpoint always returns HTTP 200 — external probes should look at the top-level `status` and `summary.errors` fields rather than the HTTP code, so a single flickering dependency does not pull the app out of rotation. A minimal probe is sketched after this list.
- Agent traces: View under Machine Learning > Experiments > `/Shared/ontobricks-agents` — each agent call shows a span tree with inputs, outputs, latency, and token usage
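A minimal local readiness probe, assuming the development server from the local setup section is running on port 8000 (for a deployed app, call the same path on the app URL with the appropriate auth):

```bash
# Pretty-print the /health payload; inspect status and summary.errors,
# not the HTTP code (the endpoint always returns 200).
curl -s http://localhost:8000/health | python3 -m json.tool
```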
# Pull latest code and redeploy
git pull origin main
scripts/deploy.sh -t dev-lakebase
databricks bundle run mcp_ontobricks_app -t dev-lakebase