# Auditing the Dockerfile

Because of the nature of the GitHub Actions cache, and the time it takes to build the Dockerfile for testing, it is desirable
to be able to audit what is going on there.

This document provides a few pointers on how to do that, and some results as of 2025-02-26 (run inside lima, nerdctl main,
on a MacBook Pro M1).

## Intercept network traffic

### On macOS

Use Charles:
- start SSL proxying
- enable SOCKS proxy
- export the root certificate

### On Linux

Left as an exercise to the reader.

### If using lima

- restart your lima instance with `HTTP_PROXY=http://X.Y.Z.W:8888 HTTPS_PROXY=socks5://X.Y.Z.W:8888 limactl start instance`,
where `X.Y.Z.W` is the local (non-localhost) IP of the machine running the Charles proxy

### On the host where you are running containerd

- copy the root certificate exported above into `/usr/local/share/ca-certificates/charles-ssl-proxying-certificate.crt`
- update your host: `sudo update-ca-certificates`
- now copy the root certificate again into your current nerdctl clone

### Hack the Dockerfile to insert our certificate

Add the following stages in the Dockerfile:
```dockerfile
FROM --platform=$BUILDPLATFORM golang:${GO_VERSION}-bookworm AS hack-build-base-debian
RUN apt-get update -qq; apt-get -qq install ca-certificates
COPY charles-ssl-proxying-certificate.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates

FROM --platform=$BUILDPLATFORM golang:${GO_VERSION}-alpine AS hack-build-base
RUN apk add --no-cache ca-certificates
COPY charles-ssl-proxying-certificate.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates

FROM ubuntu:${UBUNTU_VERSION} AS hack-base
RUN apt-get update -qq; apt-get -qq install ca-certificates
COPY charles-ssl-proxying-certificate.crt /usr/local/share/ca-certificates/
RUN update-ca-certificates
```

Then replace any later "FROM" with our modified bases:
```
golang:${GO_VERSION}-bookworm => hack-build-base-debian
golang:${GO_VERSION}-alpine   => hack-build-base
ubuntu:${UBUNTU_VERSION}      => hack-base
```

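That replacement can be scripted. The sketch below assumes the hack stages were prepended as shown above; `hack_dockerfile` and `Dockerfile.hacked` are hypothetical names. Lines that define the `hack-*` stages themselves (which legitimately reference the original images) are left untouched:

```shell
# Sketch: rewrite FROM references to the original images so they point at the
# hacked base stages instead. Any line naming an "AS hack-" stage is skipped,
# so the hack stage definitions keep their original base images.
hack_dockerfile() {
  sed \
    -e '/AS hack-/!s|golang:[$][{]GO_VERSION[}]-bookworm|hack-build-base-debian|' \
    -e '/AS hack-/!s|golang:[$][{]GO_VERSION[}]-alpine|hack-build-base|' \
    -e '/AS hack-/!s|ubuntu:[$][{]UBUNTU_VERSION[}]|hack-base|' \
    "$1"
}
# usage: hack_dockerfile Dockerfile > Dockerfile.hacked
```
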
## Mimicking what the CI is doing

A quick helper:

```bash
run(){
  local no_cache="${1:-}"
  local platform="${2:-arm64}"
  local dockerfile="${3:-Dockerfile}"
  local target="${4:-test-integration}"

  local cache_shard="$CONTAINERD_VERSION"-"$platform"
  local shard="$cache_shard"-"$target"-"$UBUNTU_VERSION"-"$no_cache"-"$dockerfile"

  local cache_location=$HOME/bk-cache-"$cache_shard"
  local destination=$HOME/bk-output-"$shard"
  local logs="$HOME"/bk-debug-"$shard"

  if [ "$no_cache" != "" ]; then
    nerdctl system prune -af
    nerdctl builder prune -af
    rm -Rf "$cache_location"
  fi

  nerdctl build \
    --build-arg UBUNTU_VERSION="$UBUNTU_VERSION" \
    --build-arg CONTAINERD_VERSION="$CONTAINERD_VERSION" \
    --platform="$platform" \
    --output type=tar,dest="$destination" \
    --progress plain \
    --build-arg HTTP_PROXY="$HTTP_PROXY" \
    --build-arg HTTPS_PROXY="$HTTPS_PROXY" \
    --cache-to type=local,dest="$cache_location",mode=max \
    --cache-from type=local,src="$cache_location" \
    --target "$target" \
    -f "$dockerfile" . 2>&1 | tee "$logs"
}
```

And here is what the CI is doing:

```bash
ci_run(){
  local no_cache="${1:-}"
  export UBUNTU_VERSION=24.04

  CONTAINERD_VERSION=v1.6.36 run "$no_cache" arm64 Dockerfile.origin build-dependencies
  UBUNTU_VERSION=20.04 CONTAINERD_VERSION=v1.6.36 run "" arm64 Dockerfile.origin test-integration

  CONTAINERD_VERSION=v1.7.25 run "$no_cache" arm64 Dockerfile.origin build-dependencies
  UBUNTU_VERSION=22.04 CONTAINERD_VERSION=v1.7.25 run "" arm64 Dockerfile.origin test-integration

  CONTAINERD_VERSION=v2.0.3 run "$no_cache" arm64 Dockerfile.origin build-dependencies
  UBUNTU_VERSION=24.04 CONTAINERD_VERSION=v2.0.3 run "" arm64 Dockerfile.origin test-integration

  CONTAINERD_VERSION=v2.0.3 run "$no_cache" amd64 Dockerfile.origin build-dependencies
  UBUNTU_VERSION=24.04 CONTAINERD_VERSION=v2.0.3 run "" amd64 Dockerfile.origin test-integration
}

# To simulate what happens when there is no cache, go with:
ci_run no_cache

# Once you have a cached run, you can simulate what happens with cache:
# first modify something in the nerdctl tree, then run it
touch mimick_nerdctl_change
ci_run
```

## Analyzing results

### Network

#### Full CI run, cold cache (the first three pipelines, and part of the fourth)

The following numbers are based on the above script, with a cold cache.

Unfortunately, Go segfaulted during the last run (the cross-build targeting amd64), so these numbers should be taken
as (slightly) underestimated.

Total number of requests: 7190

Total network duration: 13 minutes 11 seconds

Outbound: 1.31MB

Inbound: 5202MB

Breakdown per domain:

| Destination                                  | # requests        | transferred | duration        |
|----------------------------------------------|-------------------|-------------|-----------------|
| https://registry-1.docker.io                 | 123 (2 failed)    | 1.22MB      | 26s             |
| https://production.cloudflare.docker.com     | 60                | 1242.41MB   | 2m6s            |
| http://deb.debian.org                        | 207               | 107.14MB    | 13s             |
| https://github.com                           | 105               | 977.88MB    | 1m25s           |
| https://proxy.golang.org                     | 5343 (57 failed)  | 753.69MB    | 4m8s            |
| https://objects.githubusercontent.com        | 42                | 900.22MB    | 50s             |
| https://raw.githubusercontent.com            | 8                 | 92KB        | 2s              |
| https://storage.googleapis.com               | 19 (3 failed)     | 537.21MB    | 35s             |
| https://ghcr.io                              | 65                | 588.68KB    | 13s             |
| https://auth.docker.io                       | 10                | 259KB       | 5s              |
| https://pkg-containers.githubusercontent.com | 48                | 183.63MB    | 20s             |
| http://ports.ubuntu.com                      | 300               | 165.36MB    | 1m55s           |
| https://golang.org                           | 4                 | 228.93KB    | <1s             |
| https://go.dev                               | 4                 | 95.51KB     | <1s             |
| https://dl.google.com                        | 4                 | 271.42MB    | 11s             |
| https://sum.golang.org                       | 746               | 3.89MB      | 17s             |
| http://security.ubuntu.com                   | 7                 | 2.70MB      | 3s              |
| http://archive.ubuntu.com                    | 95                | 55.95MB     | 19s             |
|                                              | -                 | -           | -               |
| Total                                        | 7190              | 5203MB      | 13 mins 11 secs |

#### Full CI run, warm cache (only the first three pipelines)

| Destination                              | # requests       | transferred | duration       |
|------------------------------------------|------------------|-------------|----------------|
| https://registry-1.docker.io             | 25               | 537KB       | 14s            |
| https://production.cloudflare.docker.com | 2                | 25MB        | 1s             |
| https://github.com                       | 7 (1 failed)     | 105KB       | 2s             |
| https://proxy.golang.org                 | 930 (11 failed)  | 150MB       | 37s            |
| https://objects.githubusercontent.com    | 4                | 86MB        | 4s             |
| https://storage.googleapis.com           | 3                | 112MB       | 6s             |
| https://auth.docker.io                   | 1                | 26KB        | <1s            |
| http://ports.ubuntu.com                  | 133              | 67MB        | 50s            |
| https://golang.org                       | 2                | 114KB       | <1s            |
| https://go.dev                           | 2                | 45KB        | <1s            |
| https://dl.google.com                    | 2                | 134MB       | 5s             |
| https://sum.golang.org                   | 484              | 3MB         | 11s            |
|                                          | -                | -           | -              |
| Total                                    | 1595 (12 failed) | 579MB       | 2 mins 10 secs |

#### Analysis

##### Docker Hub

Images from Docker Hub are clearly a source of concern (made even worse by the strict limits Docker Hub applies to the
number of requests permitted).

When the cache is cold, this is about 1GB per run, for 200 requests and 3 minutes.

Actions:
- [ ] reduce the number of images
  - we currently use 2 golang images, which does not make sense
- [ ] reduce the round trips
  - there is no reason why any of the images should be queried more than once per build
- [ ] move away from the Hub golang image, and instead use a raw distro + golang download
  - the Hub golang image is a source of pain and issues (its diverging version scheme forces ugly shell contortions, and
delays in availability create broken situations)
  - we are already downloading the go release tarball anyhow, so this is just wasted bandwidth with no added value

Success criteria:
- on a cold cache, reduce the total number of requests against Docker properties by 50% or more
- on a cold cache, cut the data transfer and time in half

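The "raw distro + golang download" action could look roughly like the sketch below. This is not the final recipe: the base image choice, the `ARG` wiring, and the absence of checksum verification are all assumptions to be refined.

```dockerfile
# Sketch: replace the Hub golang image with a plain distro plus the official
# Go tarball from dl.google.com (which the build already downloads anyway).
FROM debian:bookworm AS build-base
ARG GO_VERSION
ARG TARGETARCH
RUN apt-get update -qq && apt-get install -qq --no-install-recommends ca-certificates curl
# Real usage should verify the tarball checksum before extracting.
RUN curl -fsSL "https://dl.google.com/go/go${GO_VERSION}.linux-${TARGETARCH}.tar.gz" \
      | tar -C /usr/local -xz
ENV PATH=/usr/local/go/bin:$PATH
```

This removes one image pull entirely, and makes the Go version a single explicit pin instead of depending on Hub image availability.
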
##### Distro packages

On a WARM cache, close to 1 minute is spent fetching Ubuntu packages.
This should not happen: distro downloads should always be cached.

On a cold cache, distro package downloads take close to 3 minutes.
Very likely there is stage duplication that could be reduced, and some of that time could be cut out.

Actions:
- [ ] ensure distro package downloading is staged in a way we can cache it
- [ ] review stages to reduce package installation duplication

Success criteria:
- [ ] 0 package installations on a warm cache
- [ ] cut cold cache package install time by 50% (XXX not realistic?)

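One way to stage package downloading so it is cacheable is a BuildKit cache mount for apt, as documented upstream. A minimal sketch (the installed package is just an example):

```dockerfile
# Sketch: keep the apt download cache in a BuildKit cache mount, so a warm
# builder never re-downloads .deb files. The stock images ship a docker-clean
# hook that deletes downloaded packages; disable it so the cache survives.
FROM ubuntu:24.04 AS base
RUN rm -f /etc/apt/apt.conf.d/docker-clean \
    && echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' > /etc/apt/apt.conf.d/keep-cache
RUN --mount=type=cache,target=/var/cache/apt,sharing=locked \
    --mount=type=cache,target=/var/lib/apt,sharing=locked \
    apt-get update -qq && apt-get install -qq --no-install-recommends git
```
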

##### GitHub repositories

Clones from GitHub clock in at 1GB on a cold cache.
containerd alone accounts for more than half of it (at 160MB+, x4).

Fortunately, on a warm cache this traffic is practically non-existent.

Still, 1GB of cloning on a cold cache is excessive.

Actions:
- [ ] shallow clone

Success criteria:
- [ ] reduce network traffic from cloning by 80%

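Since the builds check out pinned tags, a shallow, single-branch clone is enough. A sketch (`clone_pinned` is a hypothetical helper name; the containerd URL and tag in the usage line are examples):

```shell
# Sketch: fetch only the single commit behind a pinned tag, instead of the
# full history. git supports this natively via --depth and --branch.
clone_pinned() {
  # $1 = repository URL, $2 = tag or branch, $3 = destination directory
  git clone --depth 1 --branch "$2" "$1" "$3"
}
# usage: clone_pinned https://github.com/containerd/containerd.git v2.0.3 containerd
```
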
##### Go modules

At 750+MB and over 4 minutes, this is the number one speed bottleneck on a cold cache.

On a warm cache, it is still over 150MB and 30+ seconds.

In and of itself, this is hard to reduce, as we need these modules.

Actions:
- [ ] we could cache the module download location to reduce round-trips on modules that are shared across
different projects
- [ ] we are likely downloading nerdctl modules six times (once per architecture during the build phase, then once per
ubuntu version and architecture during the test runs - which is not even accounted for in the audit above); it should
only happen twice (once per architecture)

Success criteria:
- [ ] achieve a 20% reduction of total time spent downloading go modules

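Caching the module download location across stages and builds can be done with a BuildKit cache mount on `GOMODCACHE`. A sketch (stage name is illustrative):

```dockerfile
# Sketch: share the Go module cache (/go/pkg/mod by default in the golang
# images) across stages and builds, so each module is fetched from
# proxy.golang.org at most once per builder.
ARG GO_VERSION
FROM golang:${GO_VERSION}-alpine AS mod-cache-demo
WORKDIR /src
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod \
    go mod download
```
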
##### Other downloads

1. At 500MB+ and 30 seconds, storage.googleapis.com is serving a SINGLE go module that gets special treatment: klauspost/compress.
This module is very small, but ships along with a very large `testdata` folder.
The fact that nerdctl downloads its modules multiple times further compounds the effect.

2. the golang archive is downloaded multiple times - it should be downloaded only once per run, and only on a cold cache

3. some of the binary releases we are retrieving are also being retrieved with a warm cache, and they are generally quite large.
We could consider building certain things from source instead, and in all cases ensure that we only download on a cold cache.

Success criteria:
- [ ] 0 static downloads on a warm cache
- [ ] cut extra downloads by 20%

#### Duration

Unscientific numbers, per pipeline:

dependencies, no cache:
- 224 seconds total
- 53 seconds exporting cache

dependencies, with cache:
- 12 seconds

test-integration, no cache:
- 282 seconds

#### Caching

Number of layers in cache:
```
after dependencies stage: 78
intermediate size: 1.5G
after test-integration stage: 118
total size: 2.8G
```

## Generic considerations

### Caching compression

This is obviously heavily dependent on the runner properties.

With a local cache, on high-performance IO (laptop SSD), zstd is considerably better (about twice as fast).

With GHA, the impact is minimal, since network IO is heavily dominant, but zstd still has the upper
hand with regard to cache size.

### Output

Loading the image into the Docker store comes at a somewhat significant cost.
It is quite possible that a significant performance boost could be achieved by using
the BuildKit containerd worker and nerdctl instead.