chronograf can crash when using Docker bind mounts #781
Open
Paraphraser wants to merge 1 commit into influxdata:master from Paraphraser:20250121-entrypoint-master
Assume a Docker "bind mount" is used to map Chronograf's persistent store. Examples:

* a `docker run` command:

  ```
  $ docker run -v ./chronograf:/var/lib/chronograf chronograf
  ```

* these lines in a `docker compose` service definition:

  ```
  volumes:
    - ./chronograf:/var/lib/chronograf
  ```

Prior to starting the container, Docker tries to ensure that the *external* path to the persistent store exists via the equivalent of:

```
$ sudo mkdir -p ./chronograf
```

The practical result is that any path component that didn't exist beforehand is created and owned by root.

Make two assumptions (typical "first launch" conditions):

1. That `./chronograf` did not exist so Docker has just created the `chronograf` folder with root ownership; and
2. That Docker launches the container as root (the default).

In the absence of passing `CHRONOGRAF_AS_ROOT`, the first-time user is then in the situation where:

1. the persistent store is owned by root;
2. the container launches as root but downgrades its privileges to user `chronograf` (userID 999);
3. the executable is then unable to write into its persistent store. It crashes with the error message:

   ```
   time="«timestamp»" level=error msg="Unable to create bolt clientUnable to open boltdb; is there a chronograf already running? open /var/lib/chronograf/chronograf-v1.db: permission denied"
   ```

4. Depending on how the container was launched, it then either halts or goes into a restart loop (eg if `restart: unless-stopped`).

Currently, there are two solutions to this:

1. The user passes the `CHRONOGRAF_AS_ROOT` environment variable with the value `true`; or
2. The user manually adjusts ownership on the persistent store:

   ```
   $ sudo chown -R 999:999 ./chronograf
   ```

Option 1 defeats the purpose of running with reduced privileges. Option 2 isn't documented so it is an example of "hidden knowledge".
The user has to:

* recognise that the service is not running (which is not always immediately obvious to inexperienced users);
* know to consult `docker logs -f chronograf` (the `-f` being particularly important if the container is in a restart loop);
* be able to interpret the error message correctly (ie that "permission denied" is the critical element);
* realise that changing ownership on the persistent store is the correct response; and
* know to use userID 999 in the `chown` (or `100:101` for the alpine container).

It would be preferable if the container handled these situations correctly for itself, which is the main goal of this Pull Request.

This problem does not occur if a *named volume mount* is used rather than a *bind mount*. That is because of the "copy" step whereby Docker recursively copies the internal path to the external path before the Unix-bind-mount association is formed. The last path component of the volume mount (ie the `_data` folder) is then owned by userID 999. Even if `CHRONOGRAF_AS_ROOT` is `true`, root can still write into that folder.

If the container is launched *without* an explicit volume mapping, a new *anonymous volume mount* is created each time the container is recreated, but it otherwise behaves the same as a *named volume mount*. This is a side-effect of the Dockerfile declaration:

```
VOLUME /var/lib/chronograf
```

> Removing the `VOLUME` statement would avoid this side-effect. In that case, `/var/lib/chronograf` would only exist inside the container while it was running and would not persist. Neither would there be a steady accumulation of unused anonymous volume mounts.

Although the default for Docker is to launch the container as root, it is also possible to use either the `-u` option (`docker run`) or `user:` clause (`docker compose`) to have Docker launch the container as some other user.
In this situation, with the exception of userID 999, the container will lack the privileges to write to `/var/lib/chronograf` so it will abort with the permission error mentioned above, and the user will also have to know which userID to employ to set up the persistent store.

This Pull Request tries to deal with that possibility by writing a hint into the log. For example, if the container is launched as userID 1000 but doesn't have write permission for `/var/lib/chronograf`, the user would see:

```
You need to change ownership on chronograf's persistent store. Run:
  sudo chown -R 1000:1000 /path/to/persistent/store
```

Signed-off-by: Phill Kelley <[email protected]>
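The hint behaviour described above could be sketched roughly as follows. This is a minimal illustration and not the code in this pull request: the function names `ownership_hint` and `check_store` are hypothetical, and both the message text and its two-line layout are assumptions based on the example above.

```shell
#!/bin/sh
# Hypothetical sketch; function names are illustrative, not taken from the PR.

# Print the ownership hint for the given user ID ($1) and group ID ($2).
ownership_hint() {
    printf '%s\n' "You need to change ownership on chronograf's persistent store. Run:"
    printf '  sudo chown -R %s:%s /path/to/persistent/store\n' "$1" "$2"
}

# Return 0 if the store directory ($1) is writable by the current user.
# Otherwise write the hint to stderr (so it lands in the container log)
# and return 1.
check_store() {
    store="$1"
    if [ -w "$store" ]; then
        return 0
    fi
    ownership_hint "$(id -u)" "$(id -g)" >&2
    return 1
}
```

An entry point might call `check_store /var/lib/chronograf || exit 1` before starting the `chronograf` process, so that a container launched with `-u 1000` fails with an actionable message rather than only the bolt "permission denied" error.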
Paraphraser added a commit to Paraphraser/IOTstack that referenced this pull request Mar 5, 2025
[PR 781](influxdata/influxdata-docker#781) was submitted on 2025-01-21 but it has now been over 40 days without any response. It isn't clear whether it is simply taking the time it needs to take, or if this is a signal that it will never be processed.

The basic problem occurs with Docker "bind mounts", which are the convention for IOTstack containers. If Chronograf launches from a clean slate, Docker will create `./volumes/chronograf` with root ownership. Although the container *launches* as root, it does not take the opportunity to enforce its ownership conventions prior to downgrading its privileges to those of (internal) user `chronograf` (ID=999). The result is that the container can't write to its persistent store, crashes and goes into a restart loop.

This PR provides an augmented entry point script which sets ownership correctly prior to launching the `chronograf` process.

This PR applies the patch for IOTstack users via a local Dockerfile. It can be unwound if/when PR781 is processed.

Signed-off-by: Phill Kelley <[email protected]>
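The approach described in this commit message (fix ownership while still root, then drop privileges) might look something like the sketch below. It is an illustration under stated assumptions rather than the actual entry point script: the function `plan_startup` is hypothetical, and it only prints the steps an entry point would take so the decision logic can be read and exercised without root. The `chronograf` user, the `/var/lib/chronograf` path and the `CHRONOGRAF_AS_ROOT` variable come from the pull request description; the exact commands are assumptions.

```shell
#!/bin/sh
# Hypothetical sketch; prints the planned steps instead of executing them.

# plan_startup UID AS_ROOT
#   UID      numeric user ID the container was launched with
#   AS_ROOT  value of CHRONOGRAF_AS_ROOT ("true" keeps root privileges)
plan_startup() {
    uid="$1"
    as_root="$2"
    store="/var/lib/chronograf"
    if [ "$uid" -eq 0 ] && [ "$as_root" != "true" ]; then
        # Still root: enforce ownership before privileges are dropped,
        # so a root-owned bind mount becomes writable by chronograf.
        echo "chown -R chronograf:chronograf $store"
        echo "drop privileges to user chronograf"
    fi
    echo "exec chronograf"
}
```

For instance, `plan_startup "$(id -u)" "${CHRONOGRAF_AS_ROOT:-false}"` lists the ownership step only when the container starts as root without `CHRONOGRAF_AS_ROOT=true`; in every other case it goes straight to launching the process.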