
fix: align Databricks Bundle deployment config, resources, and documentation #188

@larsgeorge-db

Description


Problem Statement

Deploying Ontos as a Databricks App via databricks bundle deploy does not work reliably. The bundle configuration (src/databricks.yaml), runtime configuration (src/app.yaml), and documentation (README, CONFIGURING.md, pyproject.toml) are inconsistent with each other in multiple ways:

  1. Missing resources: The bundle only provisions sql-warehouse, but the app requires database (Lakebase), volume, and optionally serving-endpoint (LLM). Deployment succeeds but the app crashes at startup because valueFrom references in app.yaml resolve to nothing.

  2. Entry point mismatch: databricks.yaml uses uvicorn src.app:app (which fails without --app-dir backend), while app.yaml uses python backend/src/app.py. The bundle's app_config variable overrides app.yaml, creating a conflict.

  3. Config duality: The bundle defines a full app_config variable (command + env vars) that overrides app.yaml at deploy time, but it's incomplete (missing PYTHONPATH, PGSCHEMA, APP_ADMIN_DEFAULT_GROUPS, etc.). Neither file is authoritative.

  4. Wrong variable names in docs/scripts: README says --var="catalog=app_data" but the actual variable is catalog_name. Same issue in pyproject.toml deploy script. The README also shows a separate databricks apps deploy command that's redundant with bundle deploy.

  5. Dead configuration: FRONTEND_STATIC_DIR is set in app.yaml but never read by the backend (which hardcodes the static path).

  6. Local dev friction: README tells users to manually mkdir backend/static with no explanation of why.

  7. Lakebase chicken-and-egg for DAB deploys: The DAB database resource requires an opaque auto-generated ID path (e.g., projects/.../databases/db-8uv1-...), not a human-readable instance name. This ID is only available after the Lakebase instance is created, creating a chicken-and-egg problem for DAB deploys. Marketplace installs don't have this issue (users select the instance via UI). The backend has a LAKEBASE_INSTANCE_NAME config field but get_lakebase_instance_name() in database.py does not use it as a fallback when the app resource lookup returns None.

Solution

Establish a clear config responsibility split and fix all inconsistencies:

  • databricks.yaml = infrastructure authority (resources, targets, permissions, variable definitions)
  • app.yaml = runtime authority (command, env vars, resource references via valueFrom)
  • manifest.yaml = resource contract with platform (already correct, no changes)

Remove the bundle's app_config override entirely. Add missing resources to the bundle (volume, serving-endpoint). For the database resource, use the manifest.yaml spec for Marketplace installs but support a LAKEBASE_INSTANCE_NAME env var fallback for DAB deploys where the opaque ID isn't available. Use per-target config overrides in databricks.yaml only for environment-varying env vars. Fix all documentation to match.

Add a config consistency validation script to catch drift between these files.

Database Strategy: Two Deployment Paths

| Deployment Mode | Database Config | How It Works |
| --- | --- | --- |
| Marketplace install | User selects Lakebase instance in UI | Platform creates database resource with opaque ID → get_lakebase_instance_name() reads it via ws_client.apps.get() → works automatically |
| DAB deploy | Cannot declare database resource (opaque ID unknown) | Set LAKEBASE_INSTANCE_NAME env var in app.yaml/bundle targets → get_lakebase_instance_name() falls back to this → SDK resolves host + credentials |

Backend change needed: get_lakebase_instance_name() must check settings.LAKEBASE_INSTANCE_NAME as a fallback when the app resource lookup returns None.
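The fallback order described above can be sketched as follows. This is a minimal sketch, not the actual implementation in database.py; the parameter names and the shape of the SDK's app resource objects are assumptions:

```python
from typing import Optional

def get_lakebase_instance_name(ws_client, app_name: str, settings) -> Optional[str]:
    """Resolve the Lakebase instance name from the app's database resource
    (Marketplace path), falling back to LAKEBASE_INSTANCE_NAME (DAB path)."""
    # Marketplace path: the platform injects a database resource on the app.
    app = ws_client.apps.get(app_name)
    for resource in (app.resources or []):
        database = getattr(resource, "database", None)
        if database is not None and database.instance_name:
            return database.instance_name
    # DAB path: no database resource attached; use the env-var fallback.
    return settings.LAKEBASE_INSTANCE_NAME or None
```

Both deployment paths funnel through the same function: the Marketplace path short-circuits in the loop, and the DAB path falls through to the setting.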

User Stories

  1. As a developer, I want databricks bundle deploy -t dev to provision all required resources (SQL Warehouse, Volume) so that the app starts without missing resource errors.
  2. As a developer, I want databricks bundle deploy -t prod to provision all resources including the LLM serving endpoint so that AI features work in production.
  3. As a developer, I want to deploy without an LLM endpoint by setting LLM_ENABLED=False so that I can run the app in environments without a serving endpoint.
  4. As a developer, I want the README deployment instructions to use correct variable names (catalog_name, schema_name) so that copy-pasting commands actually works.
  5. As a developer, I want a single deploy command (databricks bundle deploy -t <target>) without needing a separate databricks apps deploy so that the deployment process is simple and predictable.
  6. As a developer, I want the bundle to automatically build the frontend during deployment (via npm run build auto-detection) so that I don't need manual build steps.
  7. As a developer, I want app.yaml to be the single source of truth for runtime configuration so that I only need to look in one place for command and env vars.
  8. As a developer, I want environment-specific values (DATABRICKS_CATALOG, PGSCHEMA) to be set per-target in the bundle so that dev and prod deployments use different catalogs/schemas automatically.
  9. As a developer, I want to run hatch -e dev run deploy-and-run with correct variable names so that the convenience script actually works.
  10. As a developer, I want the local development Quick Start in the README to explain the static directory requirement and ideally auto-create it so that onboarding is smooth.
  11. As a developer, I want a validation script that checks config consistency between databricks.yaml, app.yaml, and manifest.yaml so that drift is caught before deployment.
  12. As an operator, I want the CONFIGURING.md Lakebase deploy section to reference databricks bundle deploy (not databricks apps deploy) so that documentation matches the actual workflow.
  13. As a developer, I want dead configuration (FRONTEND_STATIC_DIR) removed from app.yaml so that the config is clean and doesn't mislead.
  14. As a developer deploying via DAB, I want to set LAKEBASE_INSTANCE_NAME as an env var so that I don't need the opaque database ID that's only available after Lakebase instance creation.
  15. As a Marketplace user installing Ontos, I want to select my existing Lakebase instance from the UI and have the app connect to it automatically without any manual configuration.
  16. As a developer, I want get_lakebase_instance_name() to fall back to the LAKEBASE_INSTANCE_NAME env var when no database resource is attached to the app, so that both DAB and Marketplace deployment paths work.

Implementation Decisions

Config Architecture

  • Remove the app_config variable and config: ${var.app_config} from databricks.yaml. The bundle should not define command or env vars — that is app.yaml's job.
  • Keep app.yaml entry point as python backend/src/app.py (uses the if __name__ block which calls uvicorn.run() internally). This is the current working pattern.
  • Add per-target config.env overrides in databricks.yaml targets for environment-varying values only: DATABRICKS_CATALOG, DATABRICKS_SCHEMA, PGSCHEMA.
  • Verify DAB merge behavior: does target-level config.env merge with app.yaml env by name, or replace entirely? This determines whether overrides work or need the full env list.
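A per-target override might look like the fragment below. This is a sketch only: the app key (ontos) and catalog/schema values are illustrative, and it assumes config.env merges by name, which is exactly the open question in the last bullet:

```yaml
# databricks.yaml (sketch) -- per-target env overrides only.
targets:
  dev:
    resources:
      apps:
        ontos:
          config:
            env:
              - name: DATABRICKS_CATALOG
                value: dev_catalog        # illustrative value
              - name: PGSCHEMA
                value: ontos_dev          # illustrative value
  prod:
    resources:
      apps:
        ontos:
          config:
            env:
              - name: DATABRICKS_CATALOG
                value: prod_catalog       # illustrative value
              - name: PGSCHEMA
                value: ontos              # illustrative value
```

If merge behavior turns out to be full-replace, this block would instead need to carry the complete env list from app.yaml.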

Resources in databricks.yaml

  • Required resources (in base config): sql-warehouse, volume
  • Optional resource (in base config with default): serving-endpoint with default name databricks-meta-llama-3-3-70b-instruct. Targets without LLM can override LLM_ENABLED=False in their config.env.
  • Database resource NOT declared in bundle — DAB cannot reference Lakebase by name, only by opaque ID. Instead, pass LAKEBASE_INSTANCE_NAME as an env var in bundle targets.
  • The database resource spec remains in manifest.yaml for Marketplace installs where users select the instance via UI.
  • New variables: serving_endpoint_name (default: databricks-meta-llama-3-3-70b-instruct), lakebase_instance_name (no default — must be set per target).
  • Resource names must match manifest.yaml names exactly: sql-warehouse, serving-endpoint, volume.
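The bullets above could translate into something like the following bundle fragment. The exact DAB app-resource schema should be verified against the Databricks documentation; ${var.warehouse_id} and the app key ontos are illustrative assumptions:

```yaml
# databricks.yaml (sketch) -- new variables and app resource wiring.
variables:
  serving_endpoint_name:
    default: databricks-meta-llama-3-3-70b-instruct
  lakebase_instance_name:
    description: "Set per target -- no default"

resources:
  apps:
    ontos:
      name: ontos
      resources:
        # Names must match manifest.yaml exactly.
        - name: sql-warehouse
          sql_warehouse:
            id: ${var.warehouse_id}       # hypothetical variable
            permission: CAN_USE
        - name: serving-endpoint
          serving_endpoint:
            name: ${var.serving_endpoint_name}
            permission: CAN_QUERY
        # volume entry analogous; no database resource here (DAB
        # cannot reference Lakebase by name, only by opaque ID).
```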

Database Connectivity: Dual-Path Support

The backend must support two database connection paths:

  1. Marketplace path (resource injection): get_lakebase_instance_name() reads the database resource's instance_name via ws_client.apps.get(app_name) → resolves host + credentials via SDK. This already works.

  2. DAB path (env var fallback): When no database resource exists on the app, get_lakebase_instance_name() falls back to settings.LAKEBASE_INSTANCE_NAME. The env var is set per-target in databricks.yaml.

Backend code change in src/backend/src/common/database.py:

  • Modify get_lakebase_instance_name() (line 67) to check settings.LAKEBASE_INSTANCE_NAME as fallback when app resource lookup returns None.
  • This also requires get_lakebase_instance_name() to accept settings as a parameter (or access it globally).
  • Additionally, PGHOST must be resolvable without the database resource. Currently get_db_url() requires settings.PGHOST — for the DAB path, the host must be derived from the instance name via SDK (e.g., ws_client.database.endpoints.get() or similar).
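The PGHOST resolution for the DAB path could be sketched as below. The SDK call and its read_write_dns field are assumptions to be verified against the installed databricks-sdk version, as the last bullet notes:

```python
def resolve_pg_host(ws_client, settings) -> str:
    """Return the Postgres host for the Lakebase connection.

    Marketplace path: PGHOST is already set (resource injection).
    DAB path: derive the host from LAKEBASE_INSTANCE_NAME via the SDK.
    """
    if settings.PGHOST:
        return settings.PGHOST
    instance_name = settings.LAKEBASE_INSTANCE_NAME
    if not instance_name:
        raise RuntimeError("Neither PGHOST nor LAKEBASE_INSTANCE_NAME is set")
    # Assumption: the SDK can look up an instance by name and expose
    # its read/write DNS endpoint.
    instance = ws_client.database.get_database_instance(name=instance_name)
    return instance.read_write_dns
```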

app.yaml Cleanup

  • Remove FRONTEND_STATIC_DIR (dead env var — backend hardcodes Path(__file__).parent.parent / "static").
  • Add DATABRICKS_CATALOG and DATABRICKS_SCHEMA with sensible defaults (will be overridden by bundle targets).
  • Add LAKEBASE_INSTANCE_NAME with empty default (set per-target in bundle, or auto-resolved via database resource for Marketplace).
  • Keep all other env vars as-is (PYTHONPATH, PGSCHEMA, APP_ADMIN_DEFAULT_GROUPS, etc.).
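After cleanup, app.yaml might look like this sketch. The env var names shown with valueFrom and the default values are illustrative, not the app's full list:

```yaml
# app.yaml (sketch) -- runtime authority after cleanup.
command: ["python", "backend/src/app.py"]
env:
  - name: PYTHONPATH
    value: "backend"
  - name: DATABRICKS_CATALOG
    value: "app_data"            # overridden per target by the bundle
  - name: DATABRICKS_SCHEMA
    value: "ontos"               # overridden per target by the bundle
  - name: LAKEBASE_INSTANCE_NAME
    value: ""                    # DAB targets set this; Marketplace
                                 # resolves via the database resource
  - name: DATABRICKS_WAREHOUSE_ID # hypothetical var name
    valueFrom: "sql-warehouse"   # resource reference
  # ... remaining env vars unchanged (PGSCHEMA, APP_ADMIN_DEFAULT_GROUPS, etc.)
  # FRONTEND_STATIC_DIR removed (dead config)
```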

Documentation Fixes

  • README: Fix variable names in deploy command, remove databricks apps deploy, add target flags, add note about auto frontend build, improve Quick Start local dev instructions.
  • CONFIGURING.md: Replace databricks apps deploy <app-name> with databricks bundle deploy -t prod, align app.yaml example, document the two database connectivity paths (Marketplace vs DAB).
  • pyproject.toml: Fix deploy script variable names (catalog → catalog_name, schema → schema_name), fix app name (app_ontos → ontos), add target flag.
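The before/after for the deploy command (catalog and schema values illustrative):

```
# Before (broken): "catalog" is not a defined bundle variable,
# and a second `databricks apps deploy` step was documented.
databricks bundle deploy --var="catalog=app_data"

# After: correct variable names plus an explicit target;
# no separate `databricks apps deploy` step is needed.
databricks bundle deploy -t dev --var="catalog_name=app_data" --var="schema_name=ontos"
```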

Config Consistency Validation

  • Add a script (e.g., src/scripts/validate_config.py) that:
    1. Parses databricks.yaml, app.yaml, and manifest.yaml
    2. Checks that every valueFrom reference in app.yaml has a matching resource name in both databricks.yaml and manifest.yaml
    3. Checks that databricks.yaml resource names match manifest.yaml resource spec names
    4. Optionally runs databricks bundle validate if CLI is available
  • Can be run in CI or manually before deploy.
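The core cross-check (step 2) fits in a few lines. This sketch assumes the three files are already parsed into dicts (e.g., with PyYAML) and that app.yaml's env entries carry valueFrom keys as described above; the actual key paths inside databricks.yaml and manifest.yaml may differ:

```python
def valuefrom_refs(app_cfg: dict) -> set:
    """Collect every valueFrom resource reference from app.yaml's env list."""
    return {e["valueFrom"] for e in app_cfg.get("env", []) if "valueFrom" in e}

def missing_refs(app_cfg: dict, bundle_names: set, manifest_names: set) -> set:
    """Return valueFrom references that lack a matching resource name in
    either databricks.yaml or manifest.yaml (both must declare it)."""
    refs = valuefrom_refs(app_cfg)
    return {r for r in refs if r not in bundle_names or r not in manifest_names}
```

A wrapper script would load the YAML files, build the two name sets, and exit non-zero when missing_refs() is non-empty, making it straightforward to wire into CI later.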

Testing Decisions

Good tests for this work verify external behavior (does the config parse correctly, are resources consistent, does the fallback logic work) not implementation details (specific YAML formatting).

What to test

  1. get_lakebase_instance_name() fallback logic: Unit test that when app resource lookup returns None, the function falls back to settings.LAKEBASE_INSTANCE_NAME. Test both paths: resource found (Marketplace) and resource not found + env var set (DAB).

  2. Config consistency check (src/scripts/validate_config.py): A Python script that parses all three YAML files and asserts:

    • Every valueFrom reference in app.yaml has a matching resource in databricks.yaml or manifest.yaml
    • Every resource in databricks.yaml has a matching spec in manifest.yaml
    • Resource names are consistent across all files
  3. Bundle validation: Run databricks bundle validate -t dev and databricks bundle validate -t prod to check YAML syntax and variable resolution.
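The fallback unit test (item 1) might look like the sketch below, with stub objects standing in for the SDK client and settings. The resolution helper is a stand-in for get_lakebase_instance_name()'s logic; a real test would exercise the actual function in database.py:

```python
class StubDatabase:
    def __init__(self, instance_name):
        self.instance_name = instance_name

class StubResource:
    def __init__(self, database=None):
        self.database = database

class StubApp:
    def __init__(self, resources):
        self.resources = resources

def resolve_instance_name(app, settings_instance_name):
    """Stand-in for get_lakebase_instance_name()'s resolution order."""
    for res in (app.resources or []):
        if res.database is not None and res.database.instance_name:
            return res.database.instance_name
    return settings_instance_name or None

def test_marketplace_path():
    # Resource found: the opaque-ID instance name wins over the env var.
    app = StubApp([StubResource(StubDatabase("projects/p/databases/db-1"))])
    assert resolve_instance_name(app, "ignored") == "projects/p/databases/db-1"

def test_dab_fallback():
    # No resource attached: fall back to LAKEBASE_INSTANCE_NAME.
    app = StubApp([])
    assert resolve_instance_name(app, "my-lakebase") == "my-lakebase"
```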

Prior art

  • The project has existing Python test infrastructure via pytest in src/backend/src/tests/
  • The validation script follows the pattern of src/scripts/build_static.sh — a standalone utility script in the scripts directory

Out of Scope

  • Lakebase provisioning automation: The PRD does not cover auto-creating the Lakebase database instance. Users still need to create it manually per CONFIGURING.md.
  • CI/CD pipeline setup: No GitHub Actions or CI pipeline changes. The validation script is meant to be run manually or integrated into CI separately.
  • Frontend build pipeline changes: The build_static.sh and vite.config.ts are correct and unchanged.
  • app.yaml variable interpolation: app.yaml does not support DAB variables. Environment-specific values must come from bundle target overrides.
  • Lobbying Databricks for name-based database references in DAB: The root cause of the chicken-and-egg problem is a platform limitation. This PRD works around it.

Further Notes

  • The DAB config.env merge behavior (merge-by-name vs full-replace) is a critical unknown. If it replaces the entire env list, the per-target override approach won't work and we'll need to duplicate all env vars in the bundle config. This must be verified against Databricks documentation or by testing before implementation.
  • The manifest.yaml and databricks.yaml serve complementary roles: manifest defines what resource types the app can consume (platform contract), while the bundle provisions specific instances of those resources. Both are needed.
  • The serving-endpoint default name (databricks-meta-llama-3-3-70b-instruct) should be validated against what's actually available in the target workspace. Consider making it a required variable with no default to force explicit configuration.
  • The Lakebase chicken-and-egg problem was reported by a DAB user. Their workaround was to skip the database resource entirely and use SDK-based auth. Our approach is similar but cleaner: keep the database spec in manifest.yaml for Marketplace, use LAKEBASE_INSTANCE_NAME env var as fallback for DAB.
  • PGHOST resolution for the DAB path needs investigation: can the SDK resolve the Lakebase host from the instance name alone, or do we need additional config? The workflows already use generate_database_credential() with instance names, so this pattern is proven.

Metadata

Labels: bug (Something isn't working), documentation (Improvements or additions to documentation)