Fix quickstart doc with docker compose #1610

Open
wants to merge 6 commits into
base: main
2 changes: 2 additions & 0 deletions getting-started/assets/getting-started.env
@@ -0,0 +1,2 @@
CLIENT_ID=root
CLIENT_SECRET=s3cr3t
@@ -32,7 +32,7 @@ services:
volumes:
# Bind local conf file to a convenient location in the container
- type: bind
source: ../assets/postgres/postgresql.conf
source: ${ASSETS_PATH}/postgres/postgresql.conf
target: /etc/postgresql/postgresql.conf
command:
- "postgres"
6 changes: 5 additions & 1 deletion getting-started/eclipselink/README.md
@@ -37,7 +37,11 @@ This example requires `jq` to be installed on your machine.
2. Start the docker compose group by running the following command from the root of the repository:

```shell
docker compose -f getting-started/eclipselink/docker-compose-bootstrap-db.yml -f getting-started/assets/postgres/docker-compose-postgres.yml -f getting-started/eclipselink/docker-compose.yml up
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose --env-file getting-started/assets/getting-started.env \
-f getting-started/assets/postgres/docker-compose-postgres.yml \
-f getting-started/eclipselink/docker-compose-bootstrap-db.yml \
-f getting-started/eclipselink/docker-compose.yml up
```

3. Using spark-sql: attach to the running spark-sql container:
4 changes: 2 additions & 2 deletions getting-started/eclipselink/docker-compose-bootstrap-db.yml
@@ -25,11 +25,11 @@ services:
polaris.persistence.type: eclipse-link
polaris.persistence.eclipselink.configuration-file: /deployments/config/eclipselink/persistence.xml
volumes:
- ../assets/eclipselink/:/deployments/config/eclipselink
- ${ASSETS_PATH}/eclipselink/:/deployments/config/eclipselink
command:
- "bootstrap"
- "--realm=POLARIS"
- "--credential=POLARIS,root,s3cr3t"
- "--credential=POLARIS,${CLIENT_ID},${CLIENT_SECRET}"
polaris:
depends_on:
polaris-bootstrap:
10 changes: 5 additions & 5 deletions getting-started/eclipselink/docker-compose.yml
@@ -38,7 +38,7 @@ services:
quarkus.otel.sdk.disabled: "true"
POLARIS_BOOTSTRAP_CREDENTIALS: POLARIS,${CLIENT_ID},${CLIENT_SECRET}
volumes:
- ../assets/eclipselink/:/deployments/config/eclipselink
- ${ASSETS_PATH}/eclipselink/:/deployments/config/eclipselink
healthcheck:
test: ["CMD", "curl", "http://localhost:8182/q/health"]
interval: 2s
@@ -58,7 +58,7 @@ services:
- CLIENT_ID=${CLIENT_ID}
- CLIENT_SECRET=${CLIENT_SECRET}
volumes:
- ../assets/polaris/:/polaris
- ${ASSETS_PATH}/polaris/:/polaris
entrypoint: '/bin/sh -c "chmod +x /polaris/create-catalog.sh && /polaris/create-catalog.sh"'

spark-sql:
@@ -95,11 +95,11 @@ services:
polaris-setup:
condition: service_completed_successfully
environment:
- CLIENT_ID=${CLIENT_ID}
- CLIENT_SECRET=${CLIENT_SECRET}
- CLIENT_ID=${USER_CLIENT_ID}
- CLIENT_SECRET=${USER_CLIENT_SECRET}
stdin_open: true
tty: true
ports:
- "8080:8080"
volumes:
- ../assets/trino-config/catalog:/etc/trino/catalog
- ${ASSETS_PATH}/trino-config/catalog:/etc/trino/catalog
57 changes: 37 additions & 20 deletions site/content/in-dev/unreleased/getting-started/quickstart.md
@@ -24,10 +24,10 @@ weight: 200

Polaris can be deployed via a docker image or as a standalone process. Before starting, be sure that you've satisfied the relevant prerequisites detailed in the previous page.

## Docker Image

To start using Polaris in Docker, build and launch Polaris, which is packaged with a Postgres instance, Apache Spark, and Trino.
## Common Setup
Before running Polaris, ensure you have completed the following setup steps:

1. **Build Polaris**
```shell
cd ~/polaris
./gradlew \
@@ -36,9 +36,38 @@ cd ~/polaris
:polaris-quarkus-admin:assemble --rerun \
-Dquarkus.container-image.tag=postgres-latest \
-Dquarkus.container-image.build=true
docker compose -f getting-started/eclipselink/docker-compose-postgres.yml -f getting-started/eclipselink/docker-compose-bootstrap-db.yml -f getting-started/eclipselink/docker-compose.yml up
```
- **For standalone**: Omit the `-Dquarkus.container-image.tag` and `-Dquarkus.container-image.build` options if you do not need to build a Docker image.

2. **Set the Assets Path**
```shell
export ASSETS_PATH=$(pwd)/getting-started/assets/
```

Collaborator

Can we move this line to the top of the file with the exporting of the CLIENT_ID/SECRET, so that it only needs to be run once?

Contributor Author

I'm not sure I follow. This is the beginning of the quickstart page (https://polaris.apache.org/in-dev/unreleased/getting-started/quickstart/#docker-image). This is only needed in the context of docker compose.

I think the export of CLIENT_ID/CLIENT_SECRET is invalid: in the current state (without the env file), this won't even be able to start.

If I understand correctly, we should consider moving the export of CLIENT_ID/CLIENT_SECRET to this section, as the current docker compose file has no credential, so it will try to set an empty string for the root credential (as well as the username, which is invalid).

The export is only needed if the user doesn't want to use the env file (as the env file will load the credential in the updated command). Let me know what you think.

Collaborator (@adnanhemani, May 20, 2025)

Sorry, this comment is in conjunction with the suggestion from the overall review's comment. We should do the following:

  1. Move the export of CLIENT_ID/CLIENT_SECRET to the top of the file. The Docker Compose files will be able to take in the environment variables set in bash (i.e. keep one reference at the top of the Quickstart page and one at the top of each of the cloud deployment pages).
  2. Remove all references to setting the CLIENT_ID/CLIENT_SECRET elsewhere.
  3. Add the export of ASSETS_PATH to these references to setting CLIENT_ID/CLIENT_SECRET.

Does this make sense?

Contributor Author

I think I got what you mean. I have made some changes to this doc as a refactor. Please review.
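
Because `ASSETS_PATH` is derived from `$(pwd)`, the export only yields a usable path when it is run from the repository root. As a minimal sketch, a pre-flight check along the following lines could catch a wrong path before `docker compose` tries to bind-mount it. The function name `check_assets_path` is hypothetical; the subdirectory names are taken from the volume mounts in the compose files touched by this change.

```shell
# Hypothetical helper (not part of the repository): verify that the assets
# directory contains the subdirectories the compose files mount.
check_assets_path() {
  # Use the first argument if given, otherwise fall back to ASSETS_PATH.
  assets="${1:-${ASSETS_PATH:-}}"
  if [ -z "$assets" ]; then
    echo "ASSETS_PATH is not set" >&2
    return 1
  fi
  # Directory names assumed from the ${ASSETS_PATH}/... mounts in this PR.
  for sub in postgres eclipselink polaris trino-config/catalog; do
    if [ ! -d "$assets/$sub" ]; then
      echo "missing directory: $assets/$sub" >&2
      return 1
    fi
  done
  return 0
}
```

Calling `check_assets_path` with no argument after the export would return non-zero if the command was run from the wrong directory. The trailing slash in the exported value only produces a harmless double slash in the joined paths.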

3. **Set Authentication Credentials**

Polaris supports multiple authentication methods, including the use of `CLIENT_ID` and `CLIENT_SECRET` environment variables. If you choose to use these credentials, you can set them as follows:

```shell
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
```
- **For Docker**: These variables are configured in the `getting-started.env` file. To use custom values, export them as shown above and remove the `--env-file` option from the `docker compose` command.

Collaborator

I think this introduces additional mental complexity. Why don't we just require everyone to export these variables and then remove the .env file itself?

Collaborator

Additionally, we need to do the same thing to the Cloud Providers pages. It should be a simple copy-paste change to all 3 pages.

Contributor Author

I think the env file helps. Currently we have auth hard-coded in a couple of places, such as the auth info for Postgres. Later on, if we want to integrate with additional services (e.g. minio/keycloak), we will need to do the same with the current route. Introducing an env file is helpful for having a centralized location for config/auth info, in my opinion, and it keeps the docker compose files clean. It is not a must for this PR, but I do think it will avoid many of those issues in the long run.

- **For Standalone**: These variables are used for interacting with the Polaris CLI or other tools.

## Running Polaris with Docker

To start using Polaris in Docker, launch Polaris, which is packaged with a Postgres instance, Apache Spark, and Trino:

```shell
docker compose -p polaris --env-file getting-started/assets/getting-started.env \
-f getting-started/assets/postgres/docker-compose-postgres.yml \
-f getting-started/eclipselink/docker-compose-bootstrap-db.yml \
-f getting-started/eclipselink/docker-compose.yml up
```

By default, this command uses the `getting-started.env` file to configure environment variables, including `CLIENT_ID` and `CLIENT_SECRET`. If you want to use custom authentication credentials, refer to the [Common Setup](#common-setup) section.
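
Note that `--env-file` makes the values available to `docker compose` only; it does not export them into the shell that ran the command. If later steps on the host expect `CLIENT_ID` and `CLIENT_SECRET` in the environment, one option is to load the same file into the current shell. The sketch below assumes the env file contains only plain `KEY=VALUE` lines, as `getting-started.env` does in this change; `load_env_file` is a hypothetical name.

```shell
# Hypothetical helper: export every KEY=VALUE assignment from an env file
# into the current shell. Assumes the file is valid shell (simple assignments).
load_env_file() {
  env_file="$1"
  if [ ! -f "$env_file" ]; then
    echo "env file not found: $env_file" >&2
    return 1
  fi
  set -a            # auto-export every variable assigned while sourcing
  . "$env_file"
  set +a            # restore the default (no auto-export)
}
```

For example, `load_env_file getting-started/assets/getting-started.env` run from the repository root would export both credentials for subsequent commands.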

You should see output for some time as Polaris, Spark, and Trino build and start up. Eventually the output stops, and the last lines are Spark logs resembling the following:

```
@@ -48,24 +77,17 @@ spark-sql-1 | 25/04/04 05:39:38 WARN SparkSQLCLIDriver: WARNING: Direct
spark-sql-1 | 25/04/04 05:39:39 WARN RESTSessionCatalog: Iceberg REST client is missing the OAuth2 server URI configuration and defaults to http://polaris:8181/api/catalogv1/oauth/tokens. This automatic fallback will be removed in a future Iceberg release.It is recommended to configure the OAuth2 endpoint using the 'oauth2-server-uri' property to be prepared. This warning will disappear if the OAuth2 endpoint is explicitly configured. See https://github.com/apache/iceberg/issues/10537
```

Finally, set the following static credentials for interacting with the Polaris server in the following exercises:

```shell
export CLIENT_ID=root
export CLIENT_SECRET=s3cr3t
```

The Docker image pre-configures a sample catalog called `quickstart_catalog` that uses a local file system.

## Running Polaris as a Standalone Process

You can also start Polaris through Gradle (packaged within the Polaris repository):

1. **Start the Server**

Run the following command to start Polaris:

```shell
cd ~/polaris
# Build the server
./gradlew clean :polaris-quarkus-server:assemble :polaris-quarkus-server:quarkusAppPartsBuild --rerun
# Start the server
./gradlew run
```

@@ -83,11 +105,6 @@ When using a Gradle-launched Polaris instance in this tutorial, we'll launch an
For more information on how to configure Polaris for production usage, see the [docs]({{% relref "../configuring-polaris-for-production" %}}).

When Polaris is run using the `./gradlew run` command, the root principal credentials are `root` and `secret` for the `CLIENT_ID` and `CLIENT_SECRET`, respectively.
You can also set these credentials as environment variables for use with the Polaris CLI:
```shell
export CLIENT_ID=root
export CLIENT_SECRET=secret
```

### Installing Apache Spark and Trino Locally for Testing

16 changes: 11 additions & 5 deletions site/content/in-dev/unreleased/getting-started/using-polaris.md
@@ -21,12 +21,16 @@ Title: Using Polaris
type: docs
weight: 400
---

## Setup

Define your `CLIENT_ID` & `CLIENT_SECRET` and export them for future use.

```shell
export CLIENT_ID=YOUR_CLIENT_ID
export CLIENT_SECRET=YOUR_CLIENT_SECRET
```
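
Since every later command on this page relies on these two variables, it may help to fail early when either one is missing. The guard below is a sketch only; `require_credentials` is a hypothetical name and not part of the Polaris tooling.

```shell
# Hypothetical guard: return non-zero with a clear message when a credential
# variable is unset or empty.
require_credentials() {
  for name in CLIENT_ID CLIENT_SECRET; do
    # Indirect lookup of the variable named by $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set; export it before continuing" >&2
      return 1
    fi
  done
  return 0
}
```

Running `require_credentials` before the catalog commands would stop with a message instead of sending empty credentials to the server.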

## Defining a Catalog

In Polaris, the [catalog]({{% relref "../entities#catalog" %}}) is the top-level entity that objects like [tables]({{% relref "../entities#table" %}}) and [views]({{% relref "../entities#view" %}}) are organized under. With a Polaris service running, you can create a catalog like so:
@@ -167,7 +171,6 @@ bin/spark-sql \
--conf spark.sql.catalog.quickstart_catalog.client.region=us-west-2
```


Similar to the CLI commands above, this configures Spark to use the Polaris running at `localhost:8181`. If your Polaris server is running elsewhere, be sure to update the configuration appropriately.

Finally, note that we include the `iceberg-aws-bundle` package here. If your table is using a different filesystem, be sure to include the appropriate dependency.
@@ -176,7 +179,9 @@ Finally, note that we include the `iceberg-aws-bundle` package here. If your tab

Refresh the Docker container with the user's credentials:
```shell
docker compose -f getting-started/eclipselink/docker-compose.yml up -d
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f spark-sql
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps spark-sql
```
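
The same stop, remove, and recreate sequence is repeated for each service that needs new credentials. As a sketch, a small helper could print the three commands for a given service so they can be reviewed before running. The function name `refresh_service_cmds` is hypothetical; the project name and compose file path are copied from the commands above.

```shell
# Hypothetical helper: print the stop / rm / up sequence for one compose service.
# It only prints the commands; nothing is executed here.
refresh_service_cmds() {
  service="$1"
  compose="docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml"
  printf '%s\n' \
    "$compose stop $service" \
    "$compose rm -f $service" \
    "$compose up -d --no-deps $service"
}
```

For example, `refresh_service_cmds spark-sql` prints the three commands shown above, and piping that output to `sh -e` would run them in order, stopping at the first failure.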

Attach to the running spark-sql container:
@@ -237,14 +242,15 @@ org.apache.iceberg.exceptions.ForbiddenException: Forbidden: Principal 'quicksta
Refresh the Docker container with the user's credentials:

```shell
docker compose -f getting-started/eclipselink/docker-compose.yml down trino
docker compose -f getting-started/eclipselink/docker-compose.yml up -d
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml stop trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml rm -f trino
docker compose -p polaris -f getting-started/eclipselink/docker-compose.yml up -d --no-deps trino
```

Attach to the running Trino container:

```shell
docker exec -it eclipselink-trino-1 trino
docker exec -it $(docker ps -q --filter name=trino) trino
```
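
The `name` filter of `docker ps` matches on part of a container name, so `--filter name=trino` could return zero or several IDs if other containers are running. A hedged sketch of a guard that only passes an ID through when exactly one container matched is shown below; `single_container_id` is a hypothetical helper.

```shell
# Hypothetical helper: read container IDs (one per line) from stdin and print
# the ID only when exactly one was supplied.
single_container_id() {
  ids=$(cat)
  # Count non-blank lines; grep exits non-zero on zero matches, hence "|| true".
  count=$(printf '%s' "$ids" | grep -c '[^[:space:]]' || true)
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one matching container, found $count" >&2
    return 1
  fi
  printf '%s\n' "$ids"
}
```

It could be combined with the command above as `docker exec -it "$(docker ps -q --filter name=trino | single_container_id)" trino`.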

You may not see Trino's prompt immediately; press ENTER to see it. A few commands that you can try: