Fix quickstart doc with docker compose #1610

MonkeyCanCode · 2025-05-19T05:38:31Z

This PR fixes couple issues found while trying to go through quickstart guide:

Doc site/content/in-dev/unreleased/getting-started/quickstart.md has an invalid ref to getting-started/eclipselink/docker-compose-postgres.yml. This is updated to getting-started/assets/postgres/docker-compose-postgres.yml (this is correct in getting-started/eclipselink/README.md but not in the public page)
Due to now we have 3 docker compose and 1 in a diff path, running this command can cause failure as they 2 of them have binds from local to container but from diff paths (it will create empty dirs on the wrong path). To fix this, I introduced ASSETS_PATH to point to the absolute path of assets directory which is being used for binding.
Doc getting-started/eclipselink/docker-compose-bootstrap-db.yml has a hard-coded value of default username/password which is no longer valid after recent change in Use env var in spark container #1522
quickstart page (https://polaris.apache.org/in-dev/unreleased/getting-started/quickstart/#docker-image) ask user to build the image then run docker compose and asks user to do expose CLIENT_ID and CLIENT_SECRET. Due to change in item 3, that is no longer valid as default value are not longer available. This will cause failure as well. To fix this, I add .env file to contain the original default value. This will continue to allow user to set them to diff by remove env loading for docker compose.

pingtimeout

+1

This PR raises a question about how we structure the docker-compose commands. I understand the desire to reduce the amount of duplicated yaml across docker-compose files. However, for the sake of being beginner-friendly, I would rather have everything contained in a single docker-compose file, even if it slightly increases the maintenance effort (think upgrading the Postgres version). Wdyt?

getting-started/assets/.env

dimas-b · 2025-05-19T19:33:18Z

I would rather have everything contained in a single docker-compose file

I support that. I even suggested that in #1470 (comment) :)

adnanhemani · 2025-05-19T21:24:35Z

While there are some good call-outs in this PR of things we should change, there are also some regressions. Please do not merge this PR - give me until EOD to get comments in.

adnanhemani

@MonkeyCanCode - lots of great catches here. #1522 introduced a major new change in Quickstart where we are no longer relying on default credential values and must define them up front - and we did not change the website to fully reflect that. For allowing this change and not recognizing the regression to the documentation, I apologize.

We had discussed an ".env" file in this thread but ultimately decided against it: #1522 (comment). I still agree with the comments posted there. I think that if we move this section into the top of the Quickstart page and then remove the later instructions to export the same environment variables, it would be best. We should do the same for the Cloud Provider Deployment instructions and the READMEs in the Getting Started folder.

100% agreed on the ASSETS_PATH change and the correcting of the YAML file locations.

adnanhemani · 2025-05-20T01:35:25Z

getting-started/eclipselink/docker-compose.yml

@@ -82,7 +82,7 @@ services:
      --conf, "spark.sql.catalog.quickstart_catalog.type=rest",
      --conf, "spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog",
      --conf, "spark.sql.catalog.quickstart_catalog.uri=http://polaris:8181/api/catalog",
-      --conf, "spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET}",
+      --conf, "spark.sql.catalog.quickstart_catalog.credential=${CLIENT_ID}:${CLIENT_SECRET}",


This is a regression, it should remain USER_CLIENT_ID/SECRET - these are user credentials rather than the admin credentials in CLIENT_ID/SECRET

I did noticed this one last night and assuming there is intensional. However, going through the quickstart guide, this is supposed to be the credential created at the end of the polaris CLI. But USER_CLIENT_ID and USER_CLIENT_SECRET is never defined in this context I believed. As an end-user tries to bring up setup via docker compose, there is no relation to the polaris CLI.

Now assuming we want to keep it this way, an end-user can't really define this one until polaris CLI returns the value back (which needed to bring up docker compose first, run polaris CLI for entities creation, then restart spark-sql to have this be effective). This restart workflow is really odd imo and we should replace it with other more automated workflow.

If this is causing regression, I can change it back. But I don't think we ask user to follow the restart workflow as it is really tedious imo.

Let me know what you think.

Take a look at this: https://polaris.apache.org/in-dev/unreleased/getting-started/using-polaris/#connecting-with-spark

We do ask users to refresh their Docker containers once they export the required variables.

You've correctly pointed out the chicken-and-the-egg problem we have here, but unfortunately this is how we have solved it so far - asking the user for one line ran is not completely unreasonable and we have no better known solution without breaking the Docker Compose file down even further :(

But there are couple problem with both existed route as well as the above, let me share my observations:

On terminal 1, we will run the following:

export ASSETS_PATH=$(pwd)/getting-started/assets/ docker compose --env-file getting-started/assets/getting-started.env \ -f getting-started/assets/postgres/docker-compose-postgres.yml \ -f getting-started/eclipselink/docker-compose-bootstrap-db.yml \ -f getting-started/eclipselink/docker-compose.yml up

Due to USER_CLIENT_ID and USER_CLIENT_SECRET are not yet defined at this stage, the spark-sql will fail due to fail auth (as both will resolve to empty string). So the container is in stop state.

Now on terminal 2, we do the needed work to bootstrap polaris entities and have a different principal created and run the following to set values for environment variables:

export USER_CLIENT_ID=xxxxx export USER_CLIENT_SECRET=xxxx

and try to start the failed containers here with current command in the doc:

docker compose -f getting-started/eclipselink/docker-compose.yml up -d

This will actually crashed the first setup as two docker compose command on diff terminal/session will create two diff projects and they are trying to bind to the same ports.

Now to workaround the problem, I added -p xxx to specify project so two commands from diff sessions will be dealing with same resources. However, this doesn't solve the problem of created container doesn't reload (which back to my original point where this is not really a regression as it never works before). To have containers reload the env, we will need to stop/remove the container then recreate it. By doing so, it will then try to load envs from the client's terminal.

adnanhemani · 2025-05-20T01:36:09Z

site/content/in-dev/unreleased/getting-started/quickstart.md

@@ -36,7 +36,8 @@ cd ~/polaris
  :polaris-quarkus-admin:assemble --rerun \
  -Dquarkus.container-image.tag=postgres-latest \
  -Dquarkus.container-image.build=true
-docker compose -f getting-started/eclipselink/docker-compose-postgres.yml -f getting-started/eclipselink/docker-compose-bootstrap-db.yml -f getting-started/eclipselink/docker-compose.yml up
+export ASSETS_PATH=$(pwd)/getting-started/assets/


Can we move this line to the top of the file with the exporting of the CLIENT_ID/SECRET, so that it only needs to be run once?

Not sure if I followed, this is the beginning of the quickstart page (https://polaris.apache.org/in-dev/unreleased/getting-started/quickstart/#docker-image). This is only needed for the context of docker compose.

The export of CLIENT_ID/CLIENT_SECRET is invalid I think as the current state (without the env) file, this won't even be able to start.

If I understand correctly, we should consider move export of CLIENT_ID/CLIENT_SECRET to this section (as the current docker compose file has no credential, so it will try to set empty string for root credential (as well as username, which is in-valid).

The export is only needed if user doesn't want to use env file (as env file will load the credential in the updated command). Let me know what you think.

Sorry, this comment is in conjunction with the suggestion from the overall review's comment. We should do the following:

Move the export of CLIENT_ID/CLIENT_SECRET to the top of the file - the Docker Compose files will be able to intake the environment variables set in bash. (i.e. keep one reference at the top of the Quickstart page and one at the top of each of the cloud deployment pages)

Remove all references to setting the CLIENT_ID/CLIENT_SECRET elsewhere.

Add the export ASSETS_PATH to these references to setting CLIENT_ID/CLIENT_SECRET

Does this make sense?

I think I got what you mean. I had made some changes to this doc for refactor. Please review.

adnanhemani · 2025-05-20T01:38:53Z

This PR raises a question about how we structure the docker-compose commands. I understand the desire to reduce the amount of duplicated yaml across docker-compose files. However, for the sake of being beginner-friendly, I would rather have everything contained in a single docker-compose file, even if it slightly increases the maintenance effort (think upgrading the Postgres version). Wdyt?

I disagree on this - if you read my reply to the comment that @dimas-b linked (#1470 (comment)), there's the reason why the Dockerfiles need to stay piecewise (tldr - it doesn't have to do with duplicated YAML but with the ability to reload services when required). If you have a better way to solve that issue, I'm open to it!

MonkeyCanCode · 2025-05-21T06:04:09Z

I would rather have everything contained in a single docker-compose file

I support that. I even suggested that in #1470 (comment) :)

If we all want to do it in a single docker compose file, I can take care this in the next PR. Both route works and there are ways around to make them work. More toward to a preference things (e.g. a single giant compose file may be hard to read and maybe we will have other service such as minio which we don't want to start out of box...but an end-user can decide if minio should be spin up by adding additional compose file to the command).

Fix quickstart doc with docker compose

c125b2b

MonkeyCanCode requested review from adutra, ashvina, dennishuo, dimas-b, eric-maynard, jackye1995, jbonofre, vvcephei, collado-mike and snazy as code owners May 19, 2025 05:38

github-project-automation bot added this to Basic Kanban Board May 19, 2025

MonkeyCanCode requested review from RussellSpitzer, takidau, flyrain, ebyhr, HonahX, singhpk234 and pingtimeout as code owners May 19, 2025 05:38

github-project-automation bot moved this to PRs In Progress in Basic Kanban Board May 19, 2025

pingtimeout previously approved these changes May 19, 2025

View reviewed changes

github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board May 19, 2025

dimas-b reviewed May 19, 2025

View reviewed changes

getting-started/assets/.env Outdated Show resolved Hide resolved

This was referenced May 19, 2025

fix(quickstart): fixed quickstart jdbc to include client id and secrets (#1613) #1614

Open

Quickstart getting-started/jdbc is missing Client Id and Client Secret #1613

Open

adnanhemani suggested changes May 20, 2025

View reviewed changes

github-project-automation bot moved this from Ready to merge to PRs In Progress in Basic Kanban Board May 20, 2025

Fix quickstart doc with docker compose

6a4be53

MonkeyCanCode dismissed pingtimeout’s stale review via 6a4be53 May 20, 2025 03:34

MonkeyCanCode added 4 commits May 19, 2025 22:35

Fix quickstart doc with docker compose

f776509

Merge branch 'main' into fix_quickstart_doc

aa8dc9f

Fix quickstart doc with docker compose

1c4a0b8

Fix quickstart doc with docker compose

37b35eb

MonkeyCanCode requested a review from ajantha-bhat as a code owner May 21, 2025 05:59

MonkeyCanCode requested review from adnanhemani, dimas-b and pingtimeout May 21, 2025 06:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix quickstart doc with docker compose #1610

Fix quickstart doc with docker compose #1610

MonkeyCanCode commented May 19, 2025 •

edited

Loading

pingtimeout left a comment

dimas-b commented May 19, 2025

adnanhemani commented May 19, 2025

adnanhemani left a comment

adnanhemani May 20, 2025

MonkeyCanCode May 20, 2025 •

edited

Loading

adnanhemani May 20, 2025

MonkeyCanCode May 21, 2025

adnanhemani May 20, 2025

MonkeyCanCode May 20, 2025

adnanhemani May 20, 2025 •

edited

Loading

MonkeyCanCode May 21, 2025

adnanhemani commented May 20, 2025

MonkeyCanCode commented May 21, 2025

Fix quickstart doc with docker compose #1610

Are you sure you want to change the base?

Fix quickstart doc with docker compose #1610

Conversation

MonkeyCanCode commented May 19, 2025 • edited Loading

pingtimeout left a comment

Choose a reason for hiding this comment

dimas-b commented May 19, 2025

adnanhemani commented May 19, 2025

adnanhemani left a comment

Choose a reason for hiding this comment

adnanhemani May 20, 2025

Choose a reason for hiding this comment

MonkeyCanCode May 20, 2025 • edited Loading

Choose a reason for hiding this comment

adnanhemani May 20, 2025

Choose a reason for hiding this comment

MonkeyCanCode May 21, 2025

Choose a reason for hiding this comment

adnanhemani May 20, 2025

Choose a reason for hiding this comment

MonkeyCanCode May 20, 2025

Choose a reason for hiding this comment

adnanhemani May 20, 2025 • edited Loading

Choose a reason for hiding this comment

MonkeyCanCode May 21, 2025

Choose a reason for hiding this comment

adnanhemani commented May 20, 2025

MonkeyCanCode commented May 21, 2025

MonkeyCanCode commented May 19, 2025 •

edited

Loading

MonkeyCanCode May 20, 2025 •

edited

Loading

adnanhemani May 20, 2025 •

edited

Loading