Problem Statement
Deploying Ontos as a Databricks App via databricks bundle deploy does not work reliably. The bundle configuration (src/databricks.yaml), runtime configuration (src/app.yaml), and documentation (README, CONFIGURING.md, pyproject.toml) are inconsistent with each other in multiple ways:
- Missing resources: The bundle only provisions sql-warehouse, but the app requires database (Lakebase), volume, and optionally serving-endpoint (LLM). Deployment succeeds but the app crashes at startup because valueFrom references in app.yaml resolve to nothing.
- Entry point mismatch: databricks.yaml uses uvicorn src.app:app (which fails without --app-dir backend), while app.yaml uses python backend/src/app.py. The bundle's app_config variable overrides app.yaml, creating a conflict.
- Config duality: The bundle defines a full app_config variable (command + env vars) that overrides app.yaml at deploy time, but it's incomplete (missing PYTHONPATH, PGSCHEMA, APP_ADMIN_DEFAULT_GROUPS, etc.). Neither file is authoritative.
- Wrong variable names in docs/scripts: README says --var="catalog=app_data" but the actual variable is catalog_name. Same issue in the pyproject.toml deploy script. The README also shows a separate databricks apps deploy command that's redundant with bundle deploy.
- Dead configuration: FRONTEND_STATIC_DIR is set in app.yaml but never read by the backend (which hardcodes the static path).
- Local dev friction: README tells users to manually mkdir backend/static with no explanation of why.
- Lakebase chicken-and-egg for DAB deploys: The DAB database resource requires an opaque auto-generated ID path (e.g., projects/.../databases/db-8uv1-...), not a human-readable instance name. This ID is only available after the Lakebase instance is created, creating a chicken-and-egg problem for DAB deploys. Marketplace installs don't have this issue (users select the instance via UI). The backend has a LAKEBASE_INSTANCE_NAME config field but get_lakebase_instance_name() in database.py does not use it as a fallback when the app resource lookup returns None.
Solution
Establish a clear config responsibility split and fix all inconsistencies:
- databricks.yaml = infrastructure authority (resources, targets, permissions, variable definitions)
- app.yaml = runtime authority (command, env vars, resource references via valueFrom)
- manifest.yaml = resource contract with platform (already correct, no changes)
Remove the bundle's app_config override entirely. Add missing resources to the bundle (volume, serving-endpoint). For the database resource, use the manifest.yaml spec for Marketplace installs but support a LAKEBASE_INSTANCE_NAME env var fallback for DAB deploys where the opaque ID isn't available. Use per-target config overrides in databricks.yaml only for environment-varying env vars. Fix all documentation to match.
Add a config consistency validation script to catch drift between these files.
Database Strategy: Two Deployment Paths
| Deployment Mode | Database Config | How It Works |
|---|---|---|
| Marketplace install | User selects Lakebase instance in UI | Platform creates database resource with opaque ID → get_lakebase_instance_name() reads it via ws_client.apps.get() → works automatically |
| DAB deploy | Cannot declare database resource (opaque ID unknown) | Set LAKEBASE_INSTANCE_NAME env var in app.yaml/bundle targets → get_lakebase_instance_name() falls back to this → SDK resolves host + credentials |
Backend change needed: get_lakebase_instance_name() must check settings.LAKEBASE_INSTANCE_NAME as a fallback when the app resource lookup returns None.
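The proposed fallback can be sketched as follows. This is illustrative, not the actual database.py code: Settings and the resource-lookup helper are simplified stand-ins for the real backend types.

```python
# Sketch of the proposed fallback order; stand-ins replace real backend types.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    LAKEBASE_INSTANCE_NAME: str = ""


def lookup_app_database_resource(app_name: str) -> Optional[str]:
    """Stand-in for reading the database resource via ws_client.apps.get()."""
    return None  # Simulates a DAB deploy: no database resource attached.


def get_lakebase_instance_name(app_name: str, settings: Settings) -> Optional[str]:
    # Marketplace path: resource injected by the platform.
    instance_name = lookup_app_database_resource(app_name)
    if instance_name:
        return instance_name
    # DAB path: fall back to the env-var-backed setting.
    if settings.LAKEBASE_INSTANCE_NAME:
        return settings.LAKEBASE_INSTANCE_NAME
    return None


print(get_lakebase_instance_name("ontos", Settings(LAKEBASE_INSTANCE_NAME="my-lakebase")))
# → my-lakebase (no resource attached, env var set)
```

The key property: the attached resource always wins, so Marketplace installs behave exactly as today.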
User Stories
- As a developer, I want databricks bundle deploy -t dev to provision all required resources (SQL Warehouse, Volume) so that the app starts without missing resource errors.
- As a developer, I want databricks bundle deploy -t prod to provision all resources including the LLM serving endpoint so that AI features work in production.
- As a developer, I want to deploy without an LLM endpoint by setting LLM_ENABLED=False so that I can run the app in environments without a serving endpoint.
- As a developer, I want the README deployment instructions to use correct variable names (catalog_name, schema_name) so that copy-pasting commands actually works.
- As a developer, I want a single deploy command (databricks bundle deploy -t <target>) without needing a separate databricks apps deploy so that the deployment process is simple and predictable.
- As a developer, I want the bundle to automatically build the frontend during deployment (via npm run build auto-detection) so that I don't need manual build steps.
- As a developer, I want app.yaml to be the single source of truth for runtime configuration so that I only need to look in one place for command and env vars.
- As a developer, I want environment-specific values (DATABRICKS_CATALOG, PGSCHEMA) to be set per-target in the bundle so that dev and prod deployments use different catalogs/schemas automatically.
- As a developer, I want to run hatch -e dev run deploy-and-run with correct variable names so that the convenience script actually works.
- As a developer, I want the local development Quick Start in the README to explain the static directory requirement and ideally auto-create it so that onboarding is smooth.
- As a developer, I want a validation script that checks config consistency between databricks.yaml, app.yaml, and manifest.yaml so that drift is caught before deployment.
- As an operator, I want the CONFIGURING.md Lakebase deploy section to reference databricks bundle deploy (not databricks apps deploy) so that documentation matches the actual workflow.
- As a developer, I want dead configuration (FRONTEND_STATIC_DIR) removed from app.yaml so that the config is clean and doesn't mislead.
- As a developer deploying via DAB, I want to set LAKEBASE_INSTANCE_NAME as an env var so that I don't need the opaque database ID that's only available after Lakebase instance creation.
- As a Marketplace user installing Ontos, I want to select my existing Lakebase instance from the UI and have the app connect to it automatically without any manual configuration.
- As a developer, I want get_lakebase_instance_name() to fall back to the LAKEBASE_INSTANCE_NAME env var when no database resource is attached to the app, so that both DAB and Marketplace deployment paths work.
Implementation Decisions
Config Architecture
- Remove the app_config variable and config: ${var.app_config} from databricks.yaml. The bundle should not define command or env vars — that is app.yaml's job.
- Keep app.yaml entry point as python backend/src/app.py (uses the if __name__ block which calls uvicorn.run() internally). This is the current working pattern.
- Add per-target config.env overrides in databricks.yaml targets for environment-varying values only: DATABRICKS_CATALOG, DATABRICKS_SCHEMA, PGSCHEMA.
- Verify DAB merge behavior: does target-level config.env merge with app.yaml env by name, or replace entirely? This determines whether overrides work or need the full env list.
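A per-target override could look like the fragment below. This is an illustrative sketch, not the project's actual databricks.yaml: the target names, catalog values, and app key are assumptions, and it only behaves as intended if target-level config.env merges with app.yaml env by name (the open question noted above).

```yaml
# Hypothetical per-target env overrides; assumes merge-by-name semantics.
targets:
  dev:
    resources:
      apps:
        ontos:
          config:
            env:
              - name: DATABRICKS_CATALOG
                value: dev_catalog          # placeholder
              - name: PGSCHEMA
                value: ontos_dev            # placeholder
  prod:
    resources:
      apps:
        ontos:
          config:
            env:
              - name: DATABRICKS_CATALOG
                value: prod_catalog         # placeholder
              - name: PGSCHEMA
                value: ontos_prod           # placeholder
```

If DAB turns out to replace the env list wholesale, each target would instead need the full env list duplicated here.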
Resources in databricks.yaml
- Required resources (in base config): sql-warehouse, volume
- Optional resource (in base config with default): serving-endpoint with default name databricks-meta-llama-3-3-70b-instruct. Targets without LLM can override LLM_ENABLED=False in their config.env.
- Database resource NOT declared in bundle — DAB cannot reference Lakebase by name, only by opaque ID. Instead, pass LAKEBASE_INSTANCE_NAME as an env var in bundle targets.
- The database resource spec remains in manifest.yaml for Marketplace installs where users select the instance via UI.
- New variables: serving_endpoint_name (default: databricks-meta-llama-3-3-70b-instruct), lakebase_instance_name (no default — must be set per target).
- Resource names must match manifest.yaml names exactly: sql-warehouse, serving-endpoint, volume.
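The bundle additions above might be shaped roughly as follows. This is a sketch only: the variable names come from this PRD, but the app resource field names (sql_warehouse, serving_endpoint, uc_securable) and references are assumptions that must be verified against the current DAB app schema before use.

```yaml
# Hypothetical databricks.yaml fragment; verify field names against the DAB schema.
variables:
  serving_endpoint_name:
    default: databricks-meta-llama-3-3-70b-instruct
  lakebase_instance_name:
    description: Lakebase instance name, must be set per target (DAB path)

resources:
  apps:
    ontos:
      name: ontos
      source_code_path: ./src              # placeholder path
      resources:
        - name: sql-warehouse              # must match manifest.yaml spec name
          sql_warehouse:
            id: ${var.warehouse_id}        # placeholder variable
            permission: CAN_USE
        - name: volume                     # must match manifest.yaml spec name
          uc_securable:
            securable_type: VOLUME
            securable_full_name: ${var.catalog_name}.${var.schema_name}.app_volume
            permission: READ_VOLUME
        - name: serving-endpoint           # must match manifest.yaml spec name
          serving_endpoint:
            name: ${var.serving_endpoint_name}
            permission: CAN_QUERY
```

Note there is deliberately no database entry here; LAKEBASE_INSTANCE_NAME is passed via target-level env instead.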
Database Connectivity: Dual-Path Support
The backend must support two database connection paths:
- Marketplace path (resource injection): get_lakebase_instance_name() reads the database resource's instance_name via ws_client.apps.get(app_name) → resolves host + credentials via SDK. This already works.
- DAB path (env var fallback): When no database resource exists on the app, get_lakebase_instance_name() falls back to settings.LAKEBASE_INSTANCE_NAME. The env var is set per-target in databricks.yaml.
Backend code change in src/backend/src/common/database.py:
- Modify get_lakebase_instance_name() (line 67) to check settings.LAKEBASE_INSTANCE_NAME as fallback when the app resource lookup returns None.
- This also requires get_lakebase_instance_name() to accept settings as a parameter (or access it globally).
- Additionally, PGHOST must be resolvable without the database resource. Currently get_db_url() requires settings.PGHOST — for the DAB path, the host must be derived from the instance name via SDK (e.g., ws_client.database.endpoints.get() or similar).
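The host-resolution step for the DAB path could be sketched as below. Everything here is a stand-in: the real SDK call (something like a get-database-instance method returning an object with a read/write DNS field) still needs to be verified, so a stub client is used in place of the SDK.

```python
# Sketch of DAB-path PGHOST resolution; StubDatabaseAPI stands in for the
# (unverified) SDK database API, and the DNS format is invented for illustration.
from dataclasses import dataclass


@dataclass
class DatabaseInstance:
    name: str
    read_write_dns: str


class StubDatabaseAPI:
    """Stand-in for the SDK's Lakebase/database API."""
    def get_database_instance(self, name: str) -> DatabaseInstance:
        return DatabaseInstance(name=name, read_write_dns=f"{name}.database.example.cloud")


def resolve_pghost(instance_name: str, database_api: StubDatabaseAPI) -> str:
    # DAB path: derive PGHOST from the instance name instead of settings.PGHOST.
    instance = database_api.get_database_instance(name=instance_name)
    return instance.read_write_dns


print(resolve_pghost("my-lakebase", StubDatabaseAPI()))
```

If the SDK cannot resolve the host from the instance name alone, PGHOST would need to become another required per-target env var.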
app.yaml Cleanup
- Remove FRONTEND_STATIC_DIR (dead env var — backend hardcodes Path(__file__).parent.parent / "static").
- Add DATABRICKS_CATALOG and DATABRICKS_SCHEMA with sensible defaults (will be overridden by bundle targets).
- Add LAKEBASE_INSTANCE_NAME with empty default (set per-target in bundle, or auto-resolved via database resource for Marketplace).
- Keep all other env vars as-is (PYTHONPATH, PGSCHEMA, APP_ADMIN_DEFAULT_GROUPS, etc.).
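After cleanup, app.yaml might look roughly like this. The values shown are placeholders, not the project's actual defaults, and the env list is abbreviated to the entries this PRD discusses.

```yaml
# Illustrative app.yaml after cleanup; values are placeholders.
command: ["python", "backend/src/app.py"]
env:
  - name: PYTHONPATH
    value: backend
  - name: DATABRICKS_CATALOG
    value: main                    # overridden per target by the bundle
  - name: DATABRICKS_SCHEMA
    value: ontos                   # overridden per target by the bundle
  - name: PGSCHEMA
    value: ontos                   # overridden per target by the bundle
  - name: LAKEBASE_INSTANCE_NAME
    value: ""                      # DAB: set per target; Marketplace: resolved via resource
  - name: DATABRICKS_WAREHOUSE_ID
    valueFrom: sql-warehouse       # must match the resource name in manifest.yaml
  # ...remaining env vars (APP_ADMIN_DEFAULT_GROUPS, etc.) unchanged;
  # FRONTEND_STATIC_DIR removed.
```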
Documentation Fixes
- README: Fix variable names in deploy command, remove databricks apps deploy, add target flags, add note about auto frontend build, improve Quick Start local dev instructions.
- CONFIGURING.md: Replace databricks apps deploy <app-name> with databricks bundle deploy -t prod, align the app.yaml example, document the two database connectivity paths (Marketplace vs DAB).
- pyproject.toml: Fix deploy script variable names (catalog → catalog_name, schema → schema_name), fix app name (app_ontos → ontos), add target flag.
Config Consistency Validation
- Add a script (e.g., src/scripts/validate_config.py) that:
  - Parses databricks.yaml, app.yaml, and manifest.yaml
  - Checks that every valueFrom reference in app.yaml has a matching resource name in both databricks.yaml and manifest.yaml
  - Checks that databricks.yaml resource names match manifest.yaml resource spec names
  - Optionally runs databricks bundle validate if CLI is available
  - Can be run in CI or manually before deploy.
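The core of that check can be sketched as pure functions over parsed YAML. The real script would yaml.safe_load the three files; here parsed contents are inlined as dicts, and the resource shapes are assumptions about the actual files (the DAB path intentionally tolerates the database resource existing only in manifest.yaml).

```python
# Minimal sketch of the consistency check; file shapes are assumptions.

def value_from_refs(app_yaml: dict) -> set:
    """Collect every valueFrom target referenced in app.yaml's env list."""
    return {e["valueFrom"] for e in app_yaml.get("env", []) if "valueFrom" in e}


def check_consistency(app_yaml: dict, bundle_resources: set, manifest_resources: set) -> list:
    """Return a list of human-readable drift errors (empty means consistent)."""
    errors = []
    for ref in sorted(value_from_refs(app_yaml)):
        if ref not in bundle_resources and ref not in manifest_resources:
            errors.append(f"app.yaml valueFrom '{ref}' has no matching resource")
    for name in sorted(bundle_resources - manifest_resources):
        errors.append(f"bundle resource '{name}' missing from manifest.yaml")
    return errors


# Inlined stand-ins for the parsed files:
app_yaml = {"env": [{"name": "DATABRICKS_WAREHOUSE_ID", "valueFrom": "sql-warehouse"},
                    {"name": "PGSCHEMA", "value": "ontos"}]}
bundle = {"sql-warehouse", "volume"}
manifest = {"sql-warehouse", "volume", "serving-endpoint", "database"}

print(check_consistency(app_yaml, bundle, manifest))  # → [] (consistent)
```

In the real script, a non-empty error list would print each message and exit non-zero so CI fails on drift.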
Testing Decisions
Good tests for this work verify external behavior (does the config parse correctly, are resources consistent, does the fallback logic work) not implementation details (specific YAML formatting).
What to test
- get_lakebase_instance_name() fallback logic: Unit test that when app resource lookup returns None, the function falls back to settings.LAKEBASE_INSTANCE_NAME. Test both paths: resource found (Marketplace) and resource not found + env var set (DAB).
- Config consistency check (src/scripts/validate_config.py): A Python script that parses all three YAML files and asserts:
  - Every valueFrom reference in app.yaml has a matching resource in databricks.yaml or manifest.yaml
  - Every resource in databricks.yaml has a matching spec in manifest.yaml
  - Resource names are consistent across all files
- Bundle validation: Run databricks bundle validate -t dev and databricks bundle validate -t prod to check YAML syntax and variable resolution.
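The fallback unit tests could be shaped like this. It assumes the resource lookup can be injected (or monkeypatched in the real suite); the one-line stand-in below mirrors the proposed resolution order in place of the real backend import.

```python
# Sketch of the two fallback tests; the function body is a stand-in, not the
# real src/backend/src/common/database.py implementation.
from typing import Callable, Optional


def get_lakebase_instance_name(lookup: Callable[[], Optional[str]],
                               env_fallback: str) -> Optional[str]:
    """Stand-in mirroring the proposed order: attached resource, then env var."""
    return lookup() or (env_fallback or None)


def test_marketplace_path_uses_attached_resource():
    # Resource found: the env var must be ignored.
    assert get_lakebase_instance_name(lambda: "attached-instance", "ignored") == "attached-instance"


def test_dab_path_falls_back_to_env_var():
    # No resource attached: fall back to LAKEBASE_INSTANCE_NAME.
    assert get_lakebase_instance_name(lambda: None, "my-lakebase") == "my-lakebase"


test_marketplace_path_uses_attached_resource()
test_dab_path_falls_back_to_env_var()
print("both paths ok")
```

In the real suite these would live alongside the existing pytest tests and patch the ws_client.apps.get() lookup instead of injecting a lambda.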
Prior art
- The project has existing Python test infrastructure via pytest in src/backend/src/tests/
- The validation script follows the pattern of src/scripts/build_static.sh — a standalone utility script in the scripts directory
Out of Scope
- Lakebase provisioning automation: The PRD does not cover auto-creating the Lakebase database instance. Users still need to create it manually per CONFIGURING.md.
- CI/CD pipeline setup: No GitHub Actions or CI pipeline changes. The validation script is meant to be run manually or integrated into CI separately.
- Frontend build pipeline changes: The build_static.sh and vite.config.ts are correct and unchanged.
- app.yaml variable interpolation: app.yaml does not support DAB variables. Environment-specific values must come from bundle target overrides.
- Lobbying Databricks for name-based database references in DAB: The root cause of the chicken-and-egg problem is a platform limitation. This PRD works around it.
Further Notes
- The DAB config.env merge behavior (merge-by-name vs full-replace) is a critical unknown. If it replaces the entire env list, the per-target override approach won't work and we'll need to duplicate all env vars in the bundle config. This must be verified against Databricks documentation or by testing before implementation.
- The manifest.yaml and databricks.yaml serve complementary roles: manifest defines what resource types the app can consume (platform contract), while the bundle provisions specific instances of those resources. Both are needed.
- The serving-endpoint default name (databricks-meta-llama-3-3-70b-instruct) should be validated against what's actually available in the target workspace. Consider making it a required variable with no default to force explicit configuration.
- The Lakebase chicken-and-egg problem was reported by a DAB user. Their workaround was to skip the database resource entirely and use SDK-based auth. Our approach is similar but cleaner: keep the database spec in manifest.yaml for Marketplace, use the LAKEBASE_INSTANCE_NAME env var as fallback for DAB.
- PGHOST resolution for the DAB path needs investigation: can the SDK resolve the Lakebase host from the instance name alone, or do we need additional config? The workflows already use generate_database_credential() with instance names, so this pattern is proven.
config.envmerge behavior (merge-by-name vs full-replace) is a critical unknown. If it replaces the entire env list, the per-target override approach won't work and we'll need to duplicate all env vars in the bundle config. This must be verified against Databricks documentation or by testing before implementation.manifest.yamlanddatabricks.yamlserve complementary roles: manifest defines what resource types the app can consume (platform contract), while the bundle provisions specific instances of those resources. Both are needed.databricks-meta-llama-3-3-70b-instruct) should be validated against what's actually available in the target workspace. Consider making it a required variable with no default to force explicit configuration.databasespec inmanifest.yamlfor Marketplace, useLAKEBASE_INSTANCE_NAMEenv var as fallback for DAB.PGHOSTresolution for the DAB path needs investigation: can the SDK resolve the Lakebase host from the instance name alone, or do we need additional config? The workflows already usegenerate_database_credential()with instance names, so this pattern is proven.