
Commit 7542c87 (2 parents: 678393e + ea86ae9)

Merge pull request #3405 from apostasie/documentation

Draft design document for store

File tree: 6 files changed (+169, -6 lines)


.github/workflows/test-canary.yml (+1, -1)

```diff
@@ -112,5 +112,5 @@ jobs:
 ctrdVersion: ${{ env.CONTAINERD_VERSION }}
 run: powershell hack/configure-windows-ci.ps1
 - name: "Run integration tests"
-# See https://github.com/containerd/nerdctl/blob/main/docs/testing.md#about-parallelization
+# See https://github.com/containerd/nerdctl/blob/main/docs/dev/testing.md#about-parallelization
 run: go test -p 1 -v ./cmd/nerdctl/...
```

.github/workflows/test-kube.yml (+1, -1)

```diff
@@ -22,7 +22,7 @@ jobs:
 with:
 fetch-depth: 1
 - name: "Run Kubernetes integration tests"
-# See https://github.com/containerd/nerdctl/blob/main/docs/testing.md#about-parallelization
+# See https://github.com/containerd/nerdctl/blob/main/docs/dev/testing.md#about-parallelization
 run: |
 ./hack/build-integration-kubernetes.sh
 sudo ./_output/nerdctl exec nerdctl-test-control-plane bash -c -- 'export TMPDIR="$HOME"/tmp; mkdir -p "$TMPDIR"; cd /nerdctl-source; /usr/local/go/bin/go test -p 1 ./cmd/nerdctl/... -test.only-kubernetes'
```

.github/workflows/test.yml (+3, -3)

```diff
@@ -290,15 +290,15 @@ jobs:
 timeout_minutes: 30
 max_attempts: 2
 retry_on: error
-# See https://github.com/containerd/nerdctl/blob/main/docs/testing.md#about-parallelization
+# See https://github.com/containerd/nerdctl/blob/main/docs/dev/testing.md#about-parallelization
 command: go test -p 1 -timeout 20m -v -exec sudo ./cmd/nerdctl/... -args -test.target=docker -test.allow-kill-daemon
 - name: "Ensure that the IPv6 integration test suite is compatible with Docker"
 uses: nick-fields/retry@v3
 with:
 timeout_minutes: 30
 max_attempts: 2
 retry_on: error
-# See https://github.com/containerd/nerdctl/blob/main/docs/testing.md#about-parallelization
+# See https://github.com/containerd/nerdctl/blob/main/docs/dev/testing.md#about-parallelization
 command: go test -p 1 -timeout 20m -v -exec sudo ./cmd/nerdctl/... -args -test.target=docker -test.allow-kill-daemon -test.only-ipv6

 test-integration-windows:
@@ -332,7 +332,7 @@ jobs:
 run: powershell hack/configure-windows-ci.ps1
 # TODO: Run unit tests
 - name: "Run integration tests"
-# See https://github.com/containerd/nerdctl/blob/main/docs/testing.md#about-parallelization
+# See https://github.com/containerd/nerdctl/blob/main/docs/dev/testing.md#about-parallelization
 run: go test -p 1 -v ./cmd/nerdctl/...

 test-integration-freebsd:
```

README.md (+1, -1)

```diff
@@ -253,7 +253,7 @@ Using `go install github.com/containerd/nerdctl/v2/cmd/nerdctl` is possible, but

 ### Testing

-See [testing nerdctl](docs/testing.md).
+See [testing nerdctl](docs/dev/testing.md).

 ### Contributing to nerdctl

```

docs/dev/store.md (new file, +163 lines)

# About `pkg/store`

## TL;DR

You _may_ want to read this if you are developing something in nerdctl that involves storing persistent information.

If a "store" already exists in the codebase (e.g. volumestore, namestore, etc.) that provides the methods you need,
you are fine and should just stick to that.

On the other hand, if you are curious, or if what you want to write is "new", then _you should_ have a look at this document:
it provides extended information about how we manage persistent data storage, especially with regard to concurrency
and atomicity.

## Motivation

The core of nerdctl aims at keeping its dependencies as lightweight as possible.
For that reason, nerdctl does not use a database to store persistent information, but instead uses the filesystem,
under a variety of directories.

That "information" is typically volume metadata, container lifecycle info, the "name store" (which ensures no two
containers can have the same name), etc.

However, storing data on the filesystem in a reliable way comes with challenges:
- incomplete writes may happen (because of a system restart, or an application crash), leaving important structured files
in a broken state
- concurrent writes, or reading while writing, would obviously be a problem as well, be it across goroutines, between
concurrent executions of the nerdctl binary, or inside a third-party application that concurrently accesses resources

The `pkg/store` package provides a "storage" abstraction that takes care of these issues, generally providing
guarantees that concurrent operations can be performed safely, and that writes are "atomic", ensuring we never brick
user installs.

For details about how, and what is done, read on.

## The problem with writing a file

A write may very well be interrupted.

While reading the resulting mangled file will typically break `json.Unmarshal` for example, and while we should still
handle such cases gracefully and provide meaningful information to the user about which file is damaged (which could be due
to the user manually modifying them), using "atomic" writes will (almost always, see below) prevent this from happening
on our part.

An "atomic" write is usually performed by first writing data to a temporary file, and then, only if the write operation
succeeded, moving that temporary file to its final destination.

The `rename` syscall (see https://man7.org/linux/man-pages/man2/rename.2.html) is indeed "atomic"
(it either fully succeeds or fails), providing said guarantee that you end up with a complete file containing the entirety
of what was meant to be written.
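To make this concrete, here is a minimal sketch of a rename-based atomic write in Go (illustrative only, not the actual `pkg/store` implementation):

```golang
package example

import (
	"os"
	"path/filepath"
)

// atomicWrite writes data to a temporary file located in the same directory
// as path, then renames it into place, so readers never observe a
// half-written file.
func atomicWrite(path string, data []byte) error {
	// The temporary file must live on the same filesystem as the target,
	// otherwise the final rename is not atomic.
	tmp, err := os.CreateTemp(filepath.Dir(path), ".tmp-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // best-effort cleanup; a no-op once renamed

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	// Flush to disk before renaming, so a crash does not leave an empty target.
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}
```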

This is an "almost always", as an _operating system crash_ MAY break that promise (this is highly dependent on specifics
that are out of scope here, and that nerdctl has no control over).
Though, a crashing operating system is (hopefully) a sufficiently rare event that we can consider we "always" have atomic writes.

There is one caveat with "rename-based atomic writes" though: if you mount the file itself inside a container,
an atomic write will not work as you expect, as the inode will (obviously) change when you modify the file,
and these changes will not be propagated inside the container.

This caveat is the reason why `hostsstore` does NOT use an atomic write to update the `hosts` file, but a traditional write.

## Concurrency between goroutines

This is a (simple) well-known problem. Just use a mutex to prevent concurrent modifications of the same object.
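A bare-bones illustration of the pattern (hypothetical types, not actual nerdctl code):

```golang
package example

import "sync"

// memStore is a toy in-memory store guarded by a mutex.
type memStore struct {
	mu   sync.Mutex
	data map[string][]byte
}

func newMemStore() *memStore {
	return &memStore{data: map[string][]byte{}}
}

// Set and Get both take the mutex, so concurrent goroutines cannot corrupt
// the map or observe partial updates.
func (s *memStore) Set(key string, value []byte) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
}

func (s *memStore) Get(key string) ([]byte, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	v, ok := s.data[key]
	return v, ok
}
```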

Note that this is not much of a problem right now in nerdctl itself - but it might be in third-party applications using
our codebase.

This is just generally good hygiene when building concurrency-safe packages.

## Concurrency between distinct binary invocations

This is much more of a problem.

There are many good reasons and real-life scenarios where concurrent binary execution may happen.
A third-party deployment tool (similar to Terraform, for example) may batch a bunch of operations to be performed
to achieve a desired infrastructure state, and call many `nerdctl` invocations in parallel to achieve that.
This is also commonplace in testing (subpackages).
And of course, a third-party tool that is long-running and allows parallel execution, leveraging the nerdctl codebase
as a library, may certainly produce these circumstances.

The known answer to that problem is to use a filesystem lock (or flock).

Concretely, the first process will "lock" a directory. All other processes trying to do the same will then be put
in a queue and wait for the prior lock to be released before they can "lock" themselves, in turn.
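On Unix-like platforms, the underlying mechanism looks roughly like the following sketch (illustrative only, not the actual `pkg/store` implementation, and ignoring the platform-specific concerns discussed below):

```golang
package example

import (
	"fmt"
	"os"
	"syscall"
)

// withDirLock takes an exclusive flock on dir, runs fn, then releases the lock.
// Unix-only sketch: syscall.Flock is not available on Windows.
func withDirLock(dir string, fn func() error) error {
	d, err := os.Open(dir)
	if err != nil {
		return err
	}
	defer d.Close()

	// Blocks until the exclusive lock is acquired; concurrent callers queue up here.
	if err := syscall.Flock(int(d.Fd()), syscall.LOCK_EX); err != nil {
		return fmt.Errorf("failed to lock %q: %w", dir, err)
	}
	// The lock is also dropped when the file descriptor is closed.
	defer syscall.Flock(int(d.Fd()), syscall.LOCK_UN)

	return fn()
}
```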

Filesystem locking comes with its own set of challenges:
- implementation is somewhat low-level (the golang core library keeps its implementation internal, and you have to
reimplement your own with platform-dependent APIs and syscalls)
- it is tricky to get right - there are reasons why golang core does not make it public
- locking "scope" should be chosen carefully: having ONE global lock for everything will definitely hurt performance badly,
as you will basically make everything "sequential", effectively destroying some of the benefits of parallelizing code
in the first place...

## Lock design...

While it is tempting to just provide self-locking, individual methods as an API (`Get`, `Set`), this is not the right
answer.

Imagine a context where consuming code would first like to check if something exists, then later on create it if it does not:
```golang
if !store.Exists("something") {
	// do things
	// ...
	// Now, create
	store.Set([]byte("data"), "something")
}
```

You do have two methods (`Exists` and `Set`) that _may individually_ guarantee they are the sole user of that resource,
but a concurrent change _in between_ these two calls may very well (and _will_) happen and change the state of the world.

Effectively, in that case, `Set` might overwrite changes made by another goroutine or concurrent execution, possibly
wreaking havoc in another process.

_When_ to lock, and _for how long_, is a decision that only the embedding code can make.
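Instead, the consuming code should hold the lock across the whole check-and-create sequence, along these lines (a sketch only: `WithLock` is an illustrative name for "run this function while holding the store's lock", not necessarily the exact `pkg/store` API):

```golang
// Sketch only: WithLock is an illustrative name, not necessarily the exact pkg/store API.
err := store.WithLock(func() error {
	if store.Exists("something") {
		// Someone else created it in the meantime; nothing to do.
		return nil
	}
	// do things
	// ...
	return store.Set([]byte("data"), "something")
})
```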

A good example is container creation.
It may require the creation of several different volumes.
In that case, you want to lock at the start of the container creation process, and only release the lock when you are fully
done creating the container - not just when done creating a volume (nor even when done creating all volumes).

## ... while safeguarding the developer

nerdctl still provides some safeguards for the developer.

Any store method that DOES require locking will fail loudly if it does not detect a lock.

This is obviously not bullet-proof.
For example, the lock may belong to another goroutine instead of the one we are in (and we cannot detect that).
But this is still better than nothing, and will help developers make sure they **do** lock.

## Using the `store` API to implement your own storage

While - as mentioned above - the store does _not_ lock on its own, specific "store implementations" may, and should,
provide higher-level methods that best fit their data-model usage, and that **do** lock on their own.

For example, the namestore (which is the simplest store) provides three simple methods:
- Acquire
- Release
- Rename

Users of the `namestore` do not have to bother with locking. These methods are safe to use concurrently.
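A rough usage sketch (the signatures below are assumptions made for illustration, not a reference for the actual `pkg/namestore` API):

```golang
// Rough usage sketch; signatures are assumptions, not the real pkg/namestore API.
names, err := namestore.New(dataStore, namespace)
if err != nil {
	return err
}
// Reserve the container name; this fails if another container already holds it.
if err := names.Acquire("mycontainer", containerID); err != nil {
	return err
}
// ... later, when the container is removed, give the name back:
if err := names.Release("mycontainer", containerID); err != nil {
	return err
}
```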

This is a good example of how to leverage core store primitives to implement a developer-friendly, safe storage for
"something" (in that case "names").

Finally, note an important point - mentioned above: locking should be scoped to the smallest possible "segment" of sub-directories.
Specifically, any store should lock only - at most - resources under the _namespace_ being manipulated.

For example, a container lifecycle storage should not lock out any other container, but only its own private directory.

## Scope, ambitions and future

`pkg/store` has no ambition whatsoever to be a generic solution, usable outside nerdctl.

It is solely designed to fit nerdctl's needs, and if it were to be made usable standalone, it would probably have to be modified
extensively, which is clearly out of scope here.

Furthermore, there are already much more advanced generic solutions out there that you should use instead for outside-of-nerdctl projects.

As for the future, one nice thing we should consider is implementing read-only locks in addition to the exclusive write locks
we currently use.
The net benefit would be a performance boost in certain contexts (massively parallel, mostly-read environments).
File renamed without changes.
