feat: Enable fine-grained permissions in Lakekeeper catalog #224

martyngigg · 2026-02-09T16:55:56Z

Summary

Introduce OpenFGA as an authorization backend for Lakekeeper. This enables fine-grained permissions on the Lakekeeper warehouses. When Lakekeeper is bootstrapped, the following happens:

- a Lakekeeper admin user, supplied as an argument to the Ansible playbook, is defined
- the Trino service account is given read access to the warehouses.

The admin user must use the Lakekeeper UI or REST API to define other permissions. Further work will define more granular permissions in #135 and related issues.

Refs #211

Summary by CodeRabbit

Release Notes

- New Features
  - Added authorization service integration with warehouse-level permission management for service accounts.
  - Enhanced warehouse bootstrap workflow with improved credential handling.
- Documentation
  - Updated deployment guide to include initial administrator configuration step.
- Chores
  - Updated service versions (Lakekeeper to v0.11.1, OpenFGA to v1.11.2).
  - Restructured deployment configuration for improved infrastructure management.

martyngigg · 2026-02-10T08:28:46Z

@coderabbitai full review

coderabbitai · 2026-02-10T08:28:55Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-02-10T08:29:12Z

Walkthrough

The pull request integrates OpenFGA as an authorization service, refactors Lakekeeper's bootstrap process to use OAuth2 sessions, updates Keycloak configuration paths, reorganises Ansible playbook tasks into modular files, and adds comprehensive database configurations across development and QA environments. The Lakekeeper bootstrap script is significantly refactored to replace token-based authentication with session-based OAuth2 workflows.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Documentation & Deployment Instructions: `docs-devel/deployment/index.md` | Added CLI variable for admin email in Ansible deployment; clarified first-time deployment requirement for the new variable. |
| Keycloak Configuration: `infra/ansible/group_vars/keycloak.yml`, `infra/ansible/roles/keycloak/tasks/setup-realm.yml` | Swapped public Keycloak client from `lakekeeper-ui` to `lakekeeper-api`; added task to grant "view-users" role to machine service account. |
| OpenFGA Integration: `infra/ansible/roles/openfga/defaults/main.yml`, `infra/ansible/roles/openfga/tasks/main.yml`, `infra/ansible/inventories/dev/group_vars/datastore.yml` | Introduced new OpenFGA role with Docker container management, migrations, and caching configuration; added `openfga` database to PostgreSQL provisioning list. |
| Keycloak Base Path & Catalog URI Updates: `infra/ansible/group_vars/all/all.yml`, `infra/ansible/roles/elt/templates/secrets/envvars.j2`, `infra/ansible/roles/trino/tasks/main.yml` | Changed Keycloak base path from `/authn` to `/auth`; renamed `lakekeeper_catalog.uri` to `lakekeeper_catalog.catalog_uri` with new `management_uri` field. |
| Lakekeeper Configuration Variables: `infra/ansible/inventories/dev/group_vars/lakekeeper.yml`, `infra/ansible/inventories/qa/group_vars/lakekeeper.yml` | Added OpenFGA database connection variables and Lakekeeper metadatabase encryption/bootstrap credentials sourced from Vault; expanded configuration for both environments. |
| Lakekeeper Bootstrap Script Refactoring: `infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py` | Replaced direct token-based Keycloak integration with OAuth2Session for authentication; refactored CLI arguments and added helper functions for logging, OAuth2 configuration, and OIDC user ID derivation. |
| Lakekeeper Playbook Refactoring: `infra/ansible/roles/lakekeeper/tasks/main.yml`, `infra/ansible/roles/lakekeeper/tasks/migrate.yml`, `infra/ansible/roles/lakekeeper/tasks/start.yml`, `infra/ansible/roles/lakekeeper/tasks/bootstrap.yml` | Reorganised Lakekeeper deployment into modular task files; introduced shared environment variable structure; automated warehouse bootstrap via Docker container with template-driven JSON configuration. |
| Lakekeeper Configuration & Defaults: `infra/ansible/roles/lakekeeper/defaults/main.yml`, `infra/ansible/roles/lakekeeper/templates/bootstrap-warehouse.json.j2` | Updated Lakekeeper image to v0.11.1; added log level configuration; added permissions field to bootstrap warehouse template. |
| Database & Infrastructure: `infra/ansible/roles/postgres/tasks/main.yml` | Tightened PostgreSQL data directory permissions; added ownership specifications for secrets directory. |
| Ansible Playbook Structure: `infra/ansible/site.yml` | Converted role declarations from nested mappings to scalar entries; added explicit tags to multiple roles for selective execution. |
| Local Development: `infra/local/docker-compose.yml`, `infra/local/keycloak/bootstrap.sh`, `infra/local/warehouses/lakekeeper/bootstrap-dev_isis_*.json` | Updated OpenFGA image to v1.11.2; refactored bootstrap script CLI arguments; simplified Keycloak user creation; added permissions for `service-account-trino` in warehouse bootstrap configurations. |

Poem

🐰 Hopping through configs with OAuth's grace,
OpenFGA now guards the Lakekeeper space,
Modular tasks make deployments neat,
From tokens to sessions—the refactor's complete!
Permissions and realms in harmony dance,
Rabbit bids new architecture a chance. ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.29% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarises the main objective of the pull request: enabling fine-grained permissions in the Lakekeeper catalog through OpenFGA integration. |

coderabbitai

Actionable comments posted: 12

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py (1)
71-89: ⚠️ Potential issue | 🟠 Major

Missing error handling on bootstrap POST.

The bootstrap method (line 80) calls self._auth_session.request("POST", ...) but never checks the response status. If bootstrapping fails server-side, the script will log "Server bootstrapped successfully." regardless.
Proposed fix
         LOGGER.info("Bootstrapping server.")
-        self._auth_session.request(
+        response = self._auth_session.request(
             "POST",
             self.management_url + "/bootstrap",
             json={
                 "accept-terms-of-use": True,
                 "is-operator": True,
             },
         )
+        response.raise_for_status()
 
         LOGGER.info("Server bootstrapped successfully.")

🤖 Fix all issues with AI agents

In `@infra/ansible/group_vars/all/all.yml`:
- Around line 36-37: The new management_uri variable is defined but unused;
either remove it or wire it into the bootstrap/playbook and template that expect
the management endpoint. Fix option 1: remove management_uri from
group_vars/all/all.yml if not needed. Fix option 2: replace hardcoded local
management address in the bootstrap playbook and any tasks/templates that
reference the management endpoint with the management_uri variable (match how
catalog_uri is used), e.g. update infra/ansible/roles/trino/tasks/main.yml,
infra/ansible/roles/elt/templates/secrets/envvars.j2 and the bootstrap playbook
to consume management_uri instead of a hardcoded URL so the variable is actually
referenced. Ensure references use the exact variable name management_uri and
follow the same templating pattern as catalog_uri.
- Line 23: Documentation references to legacy Keycloak paths are stale: update
any mentions of "/authn" and "/iceberg" in the deployment docs to match the
deployed configuration using keycloak_base_path (/auth). Search the docs for the
strings "/authn" and "/iceberg" and replace them with "/auth" (or otherwise
align the text to reference keycloak_base_path) and ensure any examples or
generated keycloak_url values reflect the current keycloak_base_path setting.

In `@infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py`:
- Around line 91-93: The is_bootstrapped method currently assumes a successful
/info response; update it to handle non-2xx responses and missing keys by first
validating the HTTP response (e.g., check response.ok or call
response.raise_for_status()) and then safely reading the JSON with
response.json().get("bootstrapped", False); if parsing fails or the key is
absent, return False or raise a clear exception. Use the existing _auth_session,
management_url and the "/info" endpoint in this logic so callers of
is_bootstrapped receive a deterministic boolean or a clear error.
- Around line 57-69: In assign_grants, the HTTP GET and POST responses are not
checked for errors—call raise_for_status() on the GET response returned by
self._auth_session.get(...) before accessing response.json()["assignments"], and
call raise_for_status() on the POST response returned by
self._auth_session.post(...) (or otherwise check its status) so failures surface
consistently with other methods like rename_default_project/create_warehouse;
update the code around the response variable and the
self._auth_session.post(...) call to perform these checks and log or re-raise
errors as appropriate.
- Around line 123-131: The get_warehouse_id method should validate the GET
response before using response.json()["warehouses"]; update get_warehouse_id to
check response.ok (or status_code) and handle non-2xx by logging or raising,
wrap response.json() parsing in a try/except to catch JSONDecodeError, and
verify the parsed payload contains a "warehouses" list before iterating; use
self._auth_session, self.management_url and the "/warehouse" path to locate the
call, and return None (or propagate a clear error) when the response is invalid
or missing the expected structure.
- Around line 95-109: In rename_default_project, avoid calling response.json()
before verifying the response status: first call response.raise_for_status() on
the Response returned by self._auth_session.get(self.management_url +
"/project"), then (and only then) call response.json() for logging/inspection;
update uses of response.json() in the LOGGER.debug and the subsequent
project-name check to occur after the raise_for_status() call, and ensure the
POST to self.management_url + "/project/rename" still calls
response.raise_for_status() afterwards as currently implemented.
- Around line 310-322: Guard against a missing filename and missing
"permissions" key: first validate warehouse_json_file (ensure it's not
None/empty and exists) before calling open(warehouse_json_file) and fail with a
clear error via click or LOGGER if it's not provided; then replace
warehouse_json.pop("permissions") with warehouse_json.pop("permissions", {}) so
absence yields an empty dict, and keep the subsequent
server.create_warehouse(warehouse_json) and the grant loop (using oidc_user_id
and server.assign_grants) unchanged so no KeyError occurs when permissions are
absent.
- Around line 110-121: The provision_user method is defined but never invoked;
either remove it or call it from the bootstrap flow—locate the provision_user
function and the main() entrypoint and, if provisioning is intended, invoke
self.provision_user() (or provision_user() as appropriate) after the
authentication/session is established and before exiting, ensuring you handle
exceptions similarly to the existing try/except around requests; if provisioning
is not needed, delete the provision_user method to remove dead code.
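
The response-handling points above share one pattern: check the HTTP status first, then read the JSON defensively. A minimal sketch of that pattern follows. It is not the script's actual code: `FakeResponse` is a hypothetical stand-in for a requests-style response, `find_warehouse_id` is an illustrative name, and the `name` and `id` keys inside each warehouse entry are assumptions about the payload shape.

```python
from typing import Any, Optional


class FakeResponse:
    """Hypothetical stand-in for a requests-style response, for illustration only."""

    def __init__(self, status_code: int, payload: Any) -> None:
        self.status_code = status_code
        self._payload = payload

    def raise_for_status(self) -> None:
        # Raise on 4xx/5xx, as a requests-style response would.
        if self.status_code >= 400:
            raise RuntimeError(f"HTTP {self.status_code}")

    def json(self) -> Any:
        return self._payload


def is_bootstrapped(response: Any) -> bool:
    """Return the 'bootstrapped' flag, treating a missing key as False."""
    response.raise_for_status()  # fail loudly before touching the body
    return bool(response.json().get("bootstrapped", False))


def find_warehouse_id(response: Any, name: str) -> Optional[str]:
    """Return the id of the named warehouse, or None if it is not listed."""
    response.raise_for_status()
    # 'name' and 'id' are assumed key names for each warehouse entry.
    for warehouse in response.json().get("warehouses", []):
        if warehouse.get("name") == name:
            return warehouse.get("id")
    return None
```

With this shape, callers get either a deterministic value or an exception, and never a `KeyError` from a partially valid payload.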

In `@infra/ansible/roles/lakekeeper/tasks/bootstrap.yml`:
- Line 51: The restart_policy value is written as unquoted no which YAML parses
as boolean false; update the restart_policy assignment (restart_policy:) to pass
a string by quoting the value (e.g., restart_policy: "no" or 'no') so the
Ansible module receives the literal string instead of a boolean.
- Around line 53-56: The env block under the community.docker.docker_container
task is using shell-style "KEY=val" strings (and a bare "#" line) which is
invalid for the module; change the env parameter to a proper YAML mapping of
key: value pairs (e.g., UV_LINK_MODE: "copy", UV_PROJECT_ENVIRONMENT:
"/opt/uv-venv"), remove the stray "#" line, and ensure the mapping is indented
under the env key in the same task so the module receives a dict not a list of
strings.

In `@infra/ansible/roles/openfga/defaults/main.yml`:
- Line 10: The default variable openfga_database_options currently contains the
insecure value "?sslmode=disable"; remove that insecure default from the role
defaults (set it to empty/null) and instead add explicit overrides in the QA/dev
inventory group_vars to set a secure option (e.g., "?sslmode=require"), and
ensure any production inventory also provides an explicit secure override;
update usages of openfga_database_options (where it's referenced) to tolerate an
empty/default value if not set.

In `@infra/ansible/roles/openfga/tasks/main.yml`:
- Around line 39-40: The OPENFGA_CHECK_QUERY_CACHE_ENABLED env var is
incorrectly wired to openfga_check_iterator_cache_enabled; update the assignment
so OPENFGA_CHECK_QUERY_CACHE_ENABLED uses the openfga_check_query_cache_enabled
variable instead (keeping the existing templating/string conversion style),
leaving OPENFGA_CHECK_ITERATOR_CACHE_ENABLED unchanged; verify you reference
openfga_check_query_cache_enabled exactly and preserve surrounding formatting.

🧹 Nitpick comments (11)

infra/ansible/roles/openfga/tasks/main.yml (2)
45-46: Healthcheck port is hardcoded but a variable exists.

The healthcheck uses -addr=localhost:8081 while openfga_grpc_port is defined and used for OPENFGA_GRPC_ADDR on line 35. If the port default changes, the healthcheck will break silently.
Proposed fix
-      test: ["CMD", "/usr/local/bin/grpc_health_probe", "-addr=localhost:8081"]
+      test: ["CMD", "/usr/local/bin/grpc_health_probe", "-addr=localhost:{{ openfga_grpc_port }}"]
15-15: Unquoted no is parsed as boolean false in YAML.

restart_policy: no should be restart_policy: "no" to ensure it's treated as the string "no" rather than boolean false. Ansible may tolerate this, but it's a known YAML gotcha.
Proposed fix
-    restart_policy: no
+    restart_policy: "no"
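
To illustrate why quoting matters, the sketch below models how a YAML 1.1 loader such as PyYAML (which Ansible relies on) treats plain scalars. It is a simplified model, not a YAML parser: `resolve_plain_scalar` is a hypothetical helper, and the two sets list the boolean literals that PyYAML resolves implicitly.

```python
# Plain (unquoted) scalars that PyYAML resolves to booleans under YAML 1.1.
YAML11_TRUE = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_plain_scalar(text: str, quoted: bool = False):
    """Simplified model: quoted scalars stay strings, some plain ones become bools."""
    if quoted:
        return text
    if text in YAML11_TRUE:
        return True
    if text in YAML11_FALSE:
        return False
    return text


print(resolve_plain_scalar("no"))               # unquoted: becomes a boolean
print(resolve_plain_scalar("no", quoted=True))  # quoted: stays the string "no"
print(resolve_plain_scalar("always"))           # not a boolean literal: stays a string
```

Only the quoted form reaches the module as the literal string `no`.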
infra/ansible/inventories/dev/group_vars/lakekeeper.yml (1)

2-6: Minor inconsistency: openfga_database_port is hardcoded in dev but vault-sourced in QA.

In the dev inventory this is 5432 (line 3), while the QA inventory uses {{ vault_db_port }}. This is likely intentional for local dev simplicity but worth noting — if the dev Postgres port ever changes, this will need a manual update.

infra/ansible/roles/lakekeeper/defaults/main.yml (1)

3-5: Default log level of ERROR may be too restrictive.

Setting the default to ERROR suppresses all WARN-level messages (e.g. deprecation notices, slow operations, configuration drift). Consider WARN as a safer default that still keeps logs quiet but surfaces actionable information. The bootstrap log level at INFO is fine.
infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py (4)
133-157: id shadows the Python built-in.

Line 139 uses id as a variable name, which shadows the built-in id() function. While unlikely to cause a bug here, it's a code-smell that linters will flag.
Proposed fix — rename to `warehouse_id`
     def create_warehouse(self, warehouse_config: Dict[str, Any]) -> str:
         """Create a warehouse in the server with the config.
 
         If the bucket does not exist it is created.
         """
         name = warehouse_config["warehouse-name"]
-        id = self.get_warehouse_id(name)
-        if id is not None:
+        warehouse_id = self.get_warehouse_id(name)
+        if warehouse_id is not None:
             LOGGER.info(
                 f"Warehouse '{name}' already exists. Skipping warehouse creation."
             )
-            return id
+            return warehouse_id
1-9: Dependency version range for authlib is very broad.

authlib>=1.6,<=2 includes a major version boundary (1.x → 2.x). If authlib 2.0 introduces breaking changes, this script could fail. Consider tightening the upper bound to <2.
Proposed fix
-#     "authlib>=1.6,<=2",
+#     "authlib>=1.6,<2",
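
The difference between the two upper bounds can be checked with a small comparison. The sketch below is a deliberate simplification: it handles only dotted numeric release versions and ignores the rest of PEP 440 (pre-releases, wildcards, and so on). `parse_version` and `in_range` are hypothetical helpers, not part of any packaging library.

```python
from typing import Tuple


def parse_version(text: str, width: int = 3) -> Tuple[int, ...]:
    """Parse 'X.Y.Z' into a zero-padded tuple so that '2' compares equal to '2.0.0'."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def in_range(version: str, lower: str, upper: str, upper_inclusive: bool) -> bool:
    """Check lower <= version <= upper (inclusive) or lower <= version < upper."""
    candidate = parse_version(version)
    if candidate < parse_version(lower):
        return False
    if upper_inclusive:
        return candidate <= parse_version(upper)
    return candidate < parse_version(upper)


# ">=1.6,<=2" admits the 2.0.0 release, while ">=1.6,<2" excludes it.
print(in_range("2.0.0", "1.6", "2", upper_inclusive=True))
print(in_range("2.0.0", "1.6", "2", upper_inclusive=False))
```

Under this model the inclusive bound lets exactly the `2.0.0` release through, which is the release most likely to carry breaking changes.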
220-235: Add explicit grant_type="client_credentials" for clarity.

Whilst Authlib's OAuth2Session.fetch_access_token() automatically infers client_credentials as the grant type when neither an authorization code nor user credentials are provided, the Authlib documentation recommends explicitly setting this parameter to avoid ambiguity.
Recommended explicit call
     # Cache an access token
-    client.fetch_access_token()
+    client.fetch_access_token(grant_type="client_credentials")
     return client
293-298: Consider using auto_refresh_token parameter instead of manually calling refresh_token().

Whilst connection is a documented public property of KeycloakAdmin, manually calling kcadm.connection.refresh_token() is not the recommended pattern. The python-keycloak library provides auto_refresh_token parameter for this purpose—pass it when creating the KeycloakAdmin instance:
Suggested approach
kcadm = KeycloakAdmin(
    server_url=keycloak_url.rstrip("/") + "/",
    username=keycloak_admin_credentials.left,
    password=keycloak_admin_credentials.right,
    auto_refresh_token=["get", "post", "put", "delete"],
)
kcadm.change_current_realm(keycloak_user_realm)
This enables automatic token refresh on HTTP calls, eliminating the need for manual refresh before switching realms.
infra/ansible/roles/lakekeeper/tasks/bootstrap.yml (2)

39-42: Credentials are exposed on the container command line.

--keycloak-admin-credentials and --bootstrap-credentials are passed as CLI arguments, making them visible in docker inspect and process listings. Since the container is ephemeral (cleanup: true, detach: false), the window is small, but consider passing these via environment variables instead for defence in depth.
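
One way to act on this is to have the script read the credentials from its environment rather than from `argv`. The sketch below assumes a `username:password` encoding and uses hypothetical names throughout (`Credentials`, `credentials_from_env`, and the `KEYCLOAK_ADMIN_CREDENTIALS` variable); the real script's credential format and option parsing may differ.

```python
import os
from typing import Mapping, NamedTuple, Optional


class Credentials(NamedTuple):
    username: str
    password: str


def credentials_from_env(
    name: str, env: Optional[Mapping[str, str]] = None
) -> Credentials:
    """Read 'username:password' from an environment variable (assumed format)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if not raw:
        raise ValueError(f"Environment variable {name} is not set")
    # Split on the first colon only, so passwords may themselves contain colons.
    username, separator, password = raw.partition(":")
    if not separator or not username or not password:
        raise ValueError(f"{name} must have the form 'username:password'")
    return Credentials(username, password)


# Usage with an explicit mapping; in the container this would be os.environ.
example_env = {"KEYCLOAK_ADMIN_CREDENTIALS": "admin:s3cret"}
print(credentials_from_env("KEYCLOAK_ADMIN_CREDENTIALS", example_env).username)
```

Values passed this way stay out of process listings, although container environment variables are still visible in `docker inspect`, so this narrows the exposure rather than removing it.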

16-29: Hardcoded permissions for Trino service account.

warehouse_permissions is defined inline with only "service-account-trino": ["select"]. This is consistent with the PR objectives, but consider extracting this to a default variable (e.g. in defaults/main.yml) so it can be overridden per-environment without modifying the task file.
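
For context, the bootstrap flow separates the `permissions` mapping from the rest of the warehouse JSON before creating the warehouse and assigning grants. A minimal sketch of that split follows, assuming the mapping has the principal-to-grants shape quoted above; `split_permissions` is a hypothetical helper and the inline JSON is illustrative data, not a real warehouse definition.

```python
import json
from typing import Any, Dict, List, Tuple


def split_permissions(
    warehouse_json: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, List[str]]]:
    """Separate the warehouse config from its optional 'permissions' mapping."""
    config = dict(warehouse_json)  # shallow copy so the caller's dict is untouched
    permissions = config.pop("permissions", {})  # a missing key yields no grants
    return config, permissions


raw = json.loads(
    '{"warehouse-name": "example",'
    ' "permissions": {"service-account-trino": ["select"]}}'
)
config, permissions = split_permissions(raw)
for principal, grants in permissions.items():
    print(principal, grants)
```

Keeping the mapping in a variable rather than inline would let each environment supply a different set of principals without changing this logic.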

infra/ansible/site.yml (1)

5-8: base and traefik roles in the traefik play are untagged, unlike other plays.

Other plays tag their primary roles (e.g. tags: [postgres], tags: [lakekeeper]). The traefik play omits tags for both base and traefik. This is a minor inconsistency — if the intent is to allow --tags traefik for selective execution, you'd need a tag here too.

martyngigg added 11 commits February 5, 2026 17:04

Fix postgres file permissions

80e5174

Deploy OpenFGA and enable it in Lakekeeper

fdc11fc

Split up lakekeeper tasks for clarity

e8c2bd0

Clarify Lakekeeper Keycloak client_id

e48f9bf

Set Lakekeeper server admin if supplied.

aad3b2a

Used to enable human admin user access.

Put Keycloak on /auth

96a4c78

Fix realm token caching

4b7ece4

Add tags to playbook

09ec221

Lakekeeper ERROR log level by default

64930ca

Create Lakekeeper warehouses.

630a6b4

Switch back the Lakekeeper bootstrap Python script

3cfa432

It's easier to maintain and show what steps are required.

martyngigg changed the title ~~feat: Lakekeeper openfga~~ feat: Enable fine-grained permissions in Lakekeeper catalog Feb 9, 2026

coderabbitai bot reviewed Feb 10, 2026

martyngigg added 6 commits February 10, 2026 08:40

Fix docs for Keycloak path

2f65e86

Remove unused ansible variable

bd93c96

Add error handling to Lakekeeper grant assignment

36e0c66

Further review fixes

f6c1bb9

Further review fixes

836c191

Upgrade Lakekeeper

0d4fd54

martyngigg merged commit 396b2ea into main Feb 10, 2026
2 checks passed

martyngigg deleted the lakekeeper-openfga branch February 10, 2026 09:11
