-
Notifications
You must be signed in to change notification settings - Fork 239
Fix quickstart doc with docker compose #1610
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
This PR raises a question about how we structure the docker-compose commands. I understand the desire to reduce the amount of duplicated yaml across docker-compose files. However, for the sake of being beginner-friendly, I would rather have everything contained in a single docker-compose file, even if it slightly increases the maintenance effort (think upgrading the Postgres version). Wdyt?
I support that. I even suggested that in #1470 (comment) :) |
While there are some good call-outs in this PR of things we should change, there are also some regressions. Please do not merge this PR - give me until EOD to get comments in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@MonkeyCanCode - lots of great catches here. #1522 introduced a major new change in Quickstart where we are no longer relying on default credential values and must define them up front - and we did not change the website to fully reflect that. For allowing this change and not recognizing the regression to the documentation, I apologize.
We had discussed an ".env" file in this thread but ultimately decided against it: #1522 (comment). I still agree with the comments posted there. I think that if we move this section into the top of the Quickstart page and then remove the later instructions to export the same environment variables, it would be best. We should do the same for the Cloud Provider Deployment instructions and the READMEs in the Getting Started folder.
100% agreed on the ASSETS_PATH change and the correcting of the YAML file locations.
@@ -82,7 +82,7 @@ services: | |||
--conf, "spark.sql.catalog.quickstart_catalog.type=rest", | |||
--conf, "spark.sql.catalog.quickstart_catalog.warehouse=quickstart_catalog", | |||
--conf, "spark.sql.catalog.quickstart_catalog.uri=http://polaris:8181/api/catalog", | |||
--conf, "spark.sql.catalog.quickstart_catalog.credential=${USER_CLIENT_ID}:${USER_CLIENT_SECRET}", | |||
--conf, "spark.sql.catalog.quickstart_catalog.credential=${CLIENT_ID}:${CLIENT_SECRET}", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a regression, it should remain USER_CLIENT_ID/SECRET - these are user credentials rather than the admin credentials in CLIENT_ID/SECRET
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did noticed this one last night and assuming there is intensional. However, going through the quickstart guide, this is supposed to be the credential created at the end of the polaris CLI. But USER_CLIENT_ID
and USER_CLIENT_SECRET
is never defined in this context I believed. As an end-user tries to bring up setup via docker compose, there is no relation to the polaris CLI.
Now assuming we want to keep it this way, an end-user can't really define this one until polaris CLI returns the value back (which needed to bring up docker compose first, run polaris CLI for entities creation, then restart spark-sql to have this be effective). This restart workflow is really odd imo and we should replace it with other more automated workflow.
If this is causing regression, I can change it back. But I don't think we ask user to follow the restart workflow as it is really tedious imo.
Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at this: https://polaris.apache.org/in-dev/unreleased/getting-started/using-polaris/#connecting-with-spark
We do ask users to refresh their Docker containers once they export the required variables.
You've correctly pointed out the chicken-and-the-egg problem we have here, but unfortunately this is how we have solved it so far - asking the user for one line ran is not completely unreasonable and we have no better known solution without breaking the Docker Compose file down even further :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But there are couple problem with both existed route as well as the above, let me share my observations:
On terminal 1, we will run the following:
export ASSETS_PATH=$(pwd)/getting-started/assets/
docker compose --env-file getting-started/assets/getting-started.env \
-f getting-started/assets/postgres/docker-compose-postgres.yml \
-f getting-started/eclipselink/docker-compose-bootstrap-db.yml \
-f getting-started/eclipselink/docker-compose.yml up
Due to USER_CLIENT_ID
and USER_CLIENT_SECRET
are not yet defined at this stage, the spark-sql will fail due to fail auth (as both will resolve to empty string). So the container is in stop state.
Now on terminal 2, we do the needed work to bootstrap polaris entities and have a different principal created and run the following to set values for environment variables:
export USER_CLIENT_ID=xxxxx
export USER_CLIENT_SECRET=xxxx
and try to start the failed containers here with current command in the doc:
docker compose -f getting-started/eclipselink/docker-compose.yml up -d
This will actually crashed the first setup as two docker compose command on diff terminal/session will create two diff projects and they are trying to bind to the same ports.
Now to workaround the problem, I added -p xxx
to specify project so two commands from diff sessions will be dealing with same resources. However, this doesn't solve the problem of created container doesn't reload (which back to my original point where this is not really a regression as it never works before). To have containers reload the env, we will need to stop/remove the container then recreate it. By doing so, it will then try to load envs from the client's terminal.
@@ -36,7 +36,8 @@ cd ~/polaris | |||
:polaris-quarkus-admin:assemble --rerun \ | |||
-Dquarkus.container-image.tag=postgres-latest \ | |||
-Dquarkus.container-image.build=true | |||
docker compose -f getting-started/eclipselink/docker-compose-postgres.yml -f getting-started/eclipselink/docker-compose-bootstrap-db.yml -f getting-started/eclipselink/docker-compose.yml up | |||
export ASSETS_PATH=$(pwd)/getting-started/assets/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we move this line to the top of the file with the exporting of the CLIENT_ID/SECRET, so that it only needs to be run once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I followed, this is the beginning of the quickstart page (https://polaris.apache.org/in-dev/unreleased/getting-started/quickstart/#docker-image). This is only needed for the context of docker compose.
The export of CLIENT_ID/CLIENT_SECRET is invalid I think as the current state (without the env) file, this won't even be able to start.
If I understand correctly, we should consider move export of CLIENT_ID/CLIENT_SECRET to this section (as the current docker compose file has no credential, so it will try to set empty string for root credential (as well as username, which is in-valid).
The export is only needed if user doesn't want to use env file (as env file will load the credential in the updated command). Let me know what you think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, this comment is in conjunction with the suggestion from the overall review's comment. We should do the following:
- Move the export of CLIENT_ID/CLIENT_SECRET to the top of the file - the Docker Compose files will be able to intake the environment variables set in bash. (i.e. keep one reference at the top of the Quickstart page and one at the top of each of the cloud deployment pages)
- Remove all references to setting the CLIENT_ID/CLIENT_SECRET elsewhere.
- Add the export ASSETS_PATH to these references to setting CLIENT_ID/CLIENT_SECRET
Does this make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I got what you mean. I had made some changes to this doc for refactor. Please review.
I disagree on this - if you read my reply to the comment that @dimas-b linked (#1470 (comment)), there's the reason why the Dockerfiles need to stay piecewise (tldr - it doesn't have to do with duplicated YAML but with the ability to reload services when required). If you have a better way to solve that issue, I'm open to it! |
If we all want to do it in a single docker compose file, I can take care this in the next PR. Both route works and there are ways around to make them work. More toward to a preference things (e.g. a single giant compose file may be hard to read and maybe we will have other service such as minio which we don't want to start out of box...but an end-user can decide if minio should be spin up by adding additional compose file to the command). |
This PR fixes couple issues found while trying to go through quickstart guide:
site/content/in-dev/unreleased/getting-started/quickstart.md
has an invalid ref togetting-started/eclipselink/docker-compose-postgres.yml
. This is updated togetting-started/assets/postgres/docker-compose-postgres.yml
(this is correct ingetting-started/eclipselink/README.md
but not in the public page)ASSETS_PATH
to point to the absolute path of assets directory which is being used for binding.getting-started/eclipselink/docker-compose-bootstrap-db.yml
has a hard-coded value of default username/password which is no longer valid after recent change in Use env var in spark container #1522CLIENT_ID
andCLIENT_SECRET
. Due to change in item 3, that is no longer valid as default value are not longer available. This will cause failure as well. To fix this, I add.env
file to contain the original default value. This will continue to allow user to set them to diff by remove env loading for docker compose.