-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Enable fine-grained permissions in Lakekeeper catalog #224
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Used to enable human admin user access.
It's easier to maintain and show what steps are required.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughThe pull request integrates OpenFGA as an authorization service, refactors Lakekeeper's bootstrap process to use OAuth2 sessions, updates Keycloak configuration paths, reorganises Ansible playbook tasks into modular files, and adds comprehensive database configurations across development and QA environments. The Lakekeeper bootstrap script is significantly refactored to replace token-based authentication with session-based OAuth2 workflows. Changes
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 12
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py (1)
71-89:⚠️ Potential issue | 🟠 MajorMissing error handling on bootstrap POST.
The
bootstrapmethod (line 80) callsself._auth_session.request("POST", ...)but never checks the response status. If bootstrapping fails server-side, the script will log "Server bootstrapped successfully." regardless.Proposed fix
LOGGER.info("Bootstrapping server.") - self._auth_session.request( + response = self._auth_session.request( "POST", self.management_url + "/bootstrap", json={ "accept-terms-of-use": True, "is-operator": True, }, ) + response.raise_for_status() LOGGER.info("Server bootstrapped successfully.")
🤖 Fix all issues with AI agents
In `@infra/ansible/group_vars/all/all.yml`:
- Around line 36-37: The new management_uri variable is defined but unused;
either remove it or wire it into the bootstrap/playbook and template that expect
the management endpoint. Fix option 1: remove management_uri from
group_vars/all/all.yml if not needed. Fix option 2: replace hardcoded local
management address in the bootstrap playbook and any tasks/templates that
reference the management endpoint with the management_uri variable (match how
catalog_uri is used), e.g. update infra/ansible/roles/trino/tasks/main.yml,
infra/ansible/roles/elt/templates/secrets/envvars.j2 and the bootstrap playbook
to consume management_uri instead of a hardcoded URL so the variable is actually
referenced. Ensure references use the exact variable name management_uri and
follow the same templating pattern as catalog_uri.
- Line 23: Documentation references to legacy Keycloak paths are stale: update
any mentions of "/authn" and "/iceberg" in the deployment docs to match the
deployed configuration using keycloak_base_path (/auth). Search the docs for the
strings "/authn" and "/iceberg" and replace them with "/auth" (or otherwise
align the text to reference keycloak_base_path) and ensure any examples or
generated keycloak_url values reflect the current keycloak_base_path setting.
In `@infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py`:
- Around line 91-93: The is_bootstrapped method currently assumes a successful
/info response; update it to handle non-2xx responses and missing keys by first
validating the HTTP response (e.g., check response.ok or call
response.raise_for_status()) and then safely reading the JSON with
response.json().get("bootstrapped", False); if parsing fails or the key is
absent, return False or raise a clear exception. Use the existing _auth_session,
management_url and the "/info" endpoint in this logic so callers of
is_bootstrapped receive a deterministic boolean or a clear error.
- Around line 57-69: In assign_grants, the HTTP GET and POST responses are not
checked for errors—call raise_for_status() on the GET response returned by
self._auth_session.get(...) before accessing response.json()["assignments"], and
call raise_for_status() on the POST response returned by
self._auth_session.post(...) (or otherwise check its status) so failures surface
consistently with other methods like rename_default_project/create_warehouse;
update the code around the response variable and the
self._auth_session.post(...) call to perform these checks and log or re-raise
errors as appropriate.
- Around line 123-131: The get_warehouse_id method should validate the GET
response before using response.json()["warehouses"]; update get_warehouse_id to
check response.ok (or status_code) and handle non-2xx by logging or raising,
wrap response.json() parsing in a try/except to catch JSONDecodeError, and
verify the parsed payload contains a "warehouses" list before iterating; use
self._auth_session, self.management_url and the "/warehouse" path to locate the
call, and return None (or propagate a clear error) when the response is invalid
or missing the expected structure.
- Around line 95-109: In rename_default_project, avoid calling response.json()
before verifying the response status: first call response.raise_for_status() on
the Response returned by self._auth_session.get(self.management_url +
"/project"), then (and only then) call response.json() for logging/inspection;
update uses of response.json() in the LOGGER.debug and the subsequent
project-name check to occur after the raise_for_status() call, and ensure the
POST to self.management_url + "/project/rename" still calls
response.raise_for_status() afterwards as currently implemented.
- Around line 310-322: Guard against a missing filename and missing
"permissions" key: first validate warehouse_json_file (ensure it's not
None/empty and exists) before calling open(warehouse_json_file) and fail with a
clear error via click or LOGGER if it's not provided; then replace
warehouse_json.pop("permissions") with warehouse_json.pop("permissions", {}) so
absence yields an empty dict, and keep the subsequent
server.create_warehouse(warehouse_json) and the grant loop (using oidc_user_id
and server.assign_grants) unchanged so no KeyError occurs when permissions are
absent.
- Around line 110-121: The provision_user method is defined but never invoked;
either remove it or call it from the bootstrap flow—locate the provision_user
function and the main() entrypoint and, if provisioning is intended, invoke
self.provision_user() (or provision_user() as appropriate) after the
authentication/session is established and before exiting, ensuring you handle
exceptions similarly to the existing try/except around requests; if provisioning
is not needed, delete the provision_user method to remove dead code.
In `@infra/ansible/roles/lakekeeper/tasks/bootstrap.yml`:
- Line 51: The restart_policy value is written as unquoted no which YAML parses
as boolean false; update the restart_policy assignment (restart_policy:) to pass
a string by quoting the value (e.g., restart_policy: "no" or 'no') so the
Ansible module receives the literal string instead of a boolean.
- Around line 53-56: The env block under the community.docker.docker_container
task is using shell-style "KEY=val" strings (and a bare "#" line) which is
invalid for the module; change the env parameter to a proper YAML mapping of
key: value pairs (e.g., UV_LINK_MODE: "copy", UV_PROJECT_ENVIRONMENT:
"/opt/uv-venv"), remove the stray "#" line, and ensure the mapping is indented
under the env key in the same task so the module receives a dict not a list of
strings.
In `@infra/ansible/roles/openfga/defaults/main.yml`:
- Line 10: The default variable openfga_database_options currently contains the
insecure value "?sslmode=disable"; remove that insecure default from the role
defaults (set it to empty/null) and instead add explicit overrides in the QA/dev
inventory group_vars to set a secure option (e.g., "?sslmode=require"), and
ensure any production inventory also provides an explicit secure override;
update usages of openfga_database_options (where it's referenced) to tolerate an
empty/default value if not set.
In `@infra/ansible/roles/openfga/tasks/main.yml`:
- Around line 39-40: The OPENFGA_CHECK_QUERY_CACHE_ENABLED env var is
incorrectly wired to openfga_check_iterator_cache_enabled; update the assignment
so OPENFGA_CHECK_QUERY_CACHE_ENABLED uses the openfga_check_query_cache_enabled
variable instead (keeping the existing templating/string conversion style),
leaving OPENFGA_CHECK_ITERATOR_CACHE_ENABLED unchanged; verify you reference
openfga_check_query_cache_enabled exactly and preserve surrounding formatting.
🧹 Nitpick comments (11)
infra/ansible/roles/openfga/tasks/main.yml (2)
45-46: Healthcheck port is hardcoded but a variable exists.The healthcheck uses
-addr=localhost:8081whileopenfga_grpc_portis defined and used forOPENFGA_GRPC_ADDRon line 35. If the port default changes, the healthcheck will break silently.Proposed fix
- test: ["CMD", "/usr/local/bin/grpc_health_probe", "-addr=localhost:8081"] + test: ["CMD", "/usr/local/bin/grpc_health_probe", "-addr=localhost:{{ openfga_grpc_port }}"]
15-15: Unquotednois parsed as booleanfalsein YAML.
restart_policy: noshould berestart_policy: "no"to ensure it's treated as the string"no"rather than booleanfalse. Ansible may tolerate this, but it's a known YAML gotcha.Proposed fix
- restart_policy: no + restart_policy: "no"infra/ansible/inventories/dev/group_vars/lakekeeper.yml (1)
2-6: Minor inconsistency:openfga_database_portis hardcoded in dev but vault-sourced in QA.In the dev inventory this is
5432(line 3), while the QA inventory uses{{ vault_db_port }}. This is likely intentional for local dev simplicity but worth noting — if the dev Postgres port ever changes, this will need a manual update.infra/ansible/roles/lakekeeper/defaults/main.yml (1)
3-5: Default log level of ERROR may be too restrictive.Setting the default to
ERRORsuppresses allWARN-level messages (e.g. deprecation notices, slow operations, configuration drift). ConsiderWARNas a safer default that still keeps logs quiet but surfaces actionable information. The bootstrap log level atINFOis fine.infra/ansible/roles/lakekeeper/files/bootstrap-warehouse.py (4)
133-157:idshadows the Python built-in.Line 139 uses
idas a variable name, which shadows the built-inid()function. While unlikely to cause a bug here, it's a code-smell that linters will flag.Proposed fix — rename to `warehouse_id`
def create_warehouse(self, warehouse_config: Dict[str, Any]) -> str: """Create a warehouse in the server with the config. If the bucket does not exist it is created. """ name = warehouse_config["warehouse-name"] - id = self.get_warehouse_id(name) - if id is not None: + warehouse_id = self.get_warehouse_id(name) + if warehouse_id is not None: LOGGER.info( f"Warehouse '{name}' already exists. Skipping warehouse creation." ) - return id + return warehouse_id
1-9: Dependency version range forauthlibis very broad.
authlib>=1.6,<=2includes a major version boundary (1.x → 2.x). If authlib 2.0 introduces breaking changes, this script could fail. Consider tightening the upper bound to<2.Proposed fix
-# "authlib>=1.6,<=2", +# "authlib>=1.6,<2",
220-235: Add explicitgrant_type="client_credentials"for clarity.Whilst Authlib's
OAuth2Session.fetch_access_token()automatically infersclient_credentialsas the grant type when neither an authorization code nor user credentials are provided, the Authlib documentation recommends explicitly setting this parameter to avoid ambiguity.Recommended explicit call
# Cache an access token - client.fetch_access_token() + client.fetch_access_token(grant_type="client_credentials") return client
293-298: Consider usingauto_refresh_tokenparameter instead of manually callingrefresh_token().Whilst
connectionis a documented public property ofKeycloakAdmin, manually callingkcadm.connection.refresh_token()is not the recommended pattern. The python-keycloak library providesauto_refresh_tokenparameter for this purpose—pass it when creating theKeycloakAdmininstance:Suggested approach
kcadm = KeycloakAdmin( server_url=keycloak_url.rstrip("/") + "/", username=keycloak_admin_credentials.left, password=keycloak_admin_credentials.right, auto_refresh_token=["get", "post", "put", "delete"], ) kcadm.change_current_realm(keycloak_user_realm)This enables automatic token refresh on HTTP calls, eliminating the need for manual refresh before switching realms.
infra/ansible/roles/lakekeeper/tasks/bootstrap.yml (2)
39-42: Credentials are exposed on the container command line.
--keycloak-admin-credentialsand--bootstrap-credentialsare passed as CLI arguments, making them visible indocker inspectand process listings. Since the container is ephemeral (cleanup: true,detach: false), the window is small, but consider passing these via environment variables instead for defence in depth.
16-29: Hardcoded permissions for Trino service account.
warehouse_permissionsis defined inline with only"service-account-trino": ["select"]. This is consistent with the PR objectives, but consider extracting this to a default variable (e.g. indefaults/main.yml) so it can be overridden per-environment without modifying the task file.infra/ansible/site.yml (1)
5-8:baseandtraefikroles in the traefik play are untagged, unlike other plays.Other plays tag their primary roles (e.g.
tags: [postgres],tags: [lakekeeper]). The traefik play omits tags for bothbaseandtraefik. This is a minor inconsistency — if the intent is to allow--tags traefikfor selective execution, you'd need a tag here too.
Summary
Introduce the OpenFGA as an authorization backend for Lakekeeper. This enables fine-grained permissions with the Lakekeeper warehouses. On bootstrapping Lakekeeper the following happens:
The Admin user must use the Lakekeeper UI or REST API to define other permissions. Further work will define more granular permissions in #135 and related issues.
Refs #211
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores