Problem
Integration tests in CI are taking significantly longer than expected. The longest job (Domain & Network Tests) takes 37 minutes, and the total aggregate CI time across all 4 integration test jobs is ~106 minutes.
Data from run 23313635067 on main:
| Job | Test Count | Wall Clock | Avg per Test |
|---|---|---|---|
| Domain & Network Tests | 77 | 35m 47s | ~28s |
| Container & Ops Tests | 66 | 29m 8s | ~26s |
| Protocol & Security Tests | 58 | 27m 27s | ~28s |
| API Proxy Tests | 36 | 13m 55s | ~23s |
| Total | 237 | 106m 17s | ~27s |
Chroot tests add another 4 jobs (~14min critical path, ~24min aggregate):
| Job | Test Count | Wall Clock |
|---|---|---|
| Chroot Edge Cases | 29 | ~7m |
| Chroot Package Managers | 26 | ~10m (blocked on Languages) |
| Chroot Languages | 20 | ~4m |
| Chroot /proc | 7 | ~3m |
Root Causes
1. docker compose down uses default 10s grace period (~40 min wasted)
In src/docker-manager.ts:1831, stopContainers() calls:
await execa('docker', ['compose', 'down', '-v'], { cwd: workDir });
This uses Docker Compose's default 10s stop timeout per container. Each AWF invocation has 2-3 containers (squid, agent, sometimes api-proxy). Since containers exit cleanly when the command finishes, Docker sends SIGTERM, waits 10s, then sends SIGKILL, but the 10s wait is unnecessary because the process is already gone.
Impact: ~10s per test * 237 tests = ~39 minutes of pure shutdown waiting across all jobs. Adding -t 1 or -t 2 could save ~30+ minutes of aggregate CI time.
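The impact estimate can be broken down per job. This is a minimal sketch, assuming every test pays one full grace period during `docker compose down`; the function name is illustrative, and the test counts are the ones from the table above.

```typescript
// Test counts per integration job, from run 23313635067.
const testsPerJob: Record<string, number> = {
  "Domain & Network Tests": 77,
  "Container & Ops Tests": 66,
  "Protocol & Security Tests": 58,
  "API Proxy Tests": 36,
};

// Assumption: each test waits out one full stop timeout on shutdown.
function shutdownWaitMinutes(tests: number, graceSeconds: number): number {
  return (tests * graceSeconds) / 60;
}

const totalTests = Object.values(testsPerJob).reduce((a, b) => a + b, 0);

for (const [job, tests] of Object.entries(testsPerJob)) {
  console.log(`${job}: ~${shutdownWaitMinutes(tests, 10).toFixed(1)} min`);
}
console.log(`Total: ~${shutdownWaitMinutes(totalTests, 10).toFixed(1)} min`);
```

Changing the second argument to 1 or 2 shows how much of that wait would remain with a shorter timeout.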
2. Every test spins up a full AWF instance (no container reuse)
Each test() call in the integration suite invokes runner.runWithSudo(), which runs the full AWF lifecycle:
- Generate configs (squid.conf, docker-compose.yml)
- docker compose up (create network, start squid, healthcheck, start agent)
- Run command
- docker compose down -v (stop all containers, remove volumes/network)
This means 237 full Docker Compose cycles. The overhead per test (~25-28s) is dominated by:
- Container creation and startup: ~5-8s
- Squid healthcheck wait: ~2-3s
- Container shutdown (10s grace): ~10s
- Network create/destroy: ~2-3s
3. Docker container builds repeated in every job (~8 min wasted)
Each of the 4 integration test jobs and 4 chroot test jobs independently runs:
docker build -t ghcr.io/github/gh-aw-firewall/squid:latest containers/squid/
docker build -t ghcr.io/github/gh-aw-firewall/agent:latest containers/agent/
This takes ~70s per job. With 8 jobs, that's ~9 minutes of build time, of which ~8 minutes (7 of the 8 builds) is redundant. Using a shared build job with Docker layer caching (actions/cache for Docker) could reduce this to ~70s total.
4. Chroot Package Managers job unnecessarily blocked on Languages
In test-chroot.yml:133:
needs: test-chroot-languages # Run after language tests pass
This adds ~4 minutes to the chroot critical path. Package Manager tests install their own language runtimes independently (Ruby, Rust via setup actions), so this dependency appears unnecessary.
5. Tests run serially within each job (maxWorkers: 1)
In tests/setup/jest.integration.config.js:23:
maxWorkers: 1, // Run tests serially to avoid Docker conflicts
This is necessary because AWF uses fixed container names (awf-squid, awf-agent) and a fixed network (awf-net at 172.30.0.0/24), so parallel tests would conflict. However, this means all tests within a job run sequentially.
6. Unbalanced job sizes
Domain & Network Tests has 77 tests (37 min) while API Proxy Tests has 36 tests (14 min). The wall-clock CI time is gated by the slowest job.
7. 27 tests not run in CI at all
These test files are not matched by any CI job's --testPathPatterns:
- gh-host-injection.test.ts (8 tests)
- ghes-auto-populate.test.ts (9 tests)
- skip-pull.test.ts (3 tests)
- workdir-tmpfs-hiding.test.ts (7 tests)
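A gap like this could be caught automatically by comparing the test files on disk against every job's path patterns. The following is a minimal sketch: `findUnmatchedTests` is a hypothetical helper, the patterns shown are illustrative rather than the real workflow values, and the `.test.ts` names for the matched files are assumed.

```typescript
// Returns the test files that are not matched by any job's path patterns.
// Patterns are treated as regular expressions tested against the file path.
function findUnmatchedTests(
  testFiles: string[],
  jobPatterns: Record<string, string[]>,
): string[] {
  const regexes: RegExp[] = [];
  for (const patterns of Object.values(jobPatterns)) {
    for (const pattern of patterns) {
      regexes.push(new RegExp(pattern));
    }
  }
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Illustrative inputs only: the real patterns live in the workflow files.
const exampleJobPatterns: Record<string, string[]> = {
  "Domain & Network Tests": ["blocked-domains", "dns-servers"],
};
const exampleFiles = [
  "blocked-domains.test.ts",
  "dns-servers.test.ts",
  "gh-host-injection.test.ts",
  "skip-pull.test.ts",
];

const unmatched = findUnmatchedTests(exampleFiles, exampleJobPatterns);
console.log(unmatched);
```

Run as a CI check that fails when the result is non-empty, this would flag new test files that no job picks up.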
Recommendations (prioritized by impact)
P0: Add -t 1 to docker compose down (~30 min saved aggregate)
Change src/docker-manager.ts:1831 from:
await execa('docker', ['compose', 'down', '-v'], ...);
to:
await execa('docker', ['compose', 'down', '-v', '-t', '1'], ...);
Since containers' main processes have already exited by the time down runs, a 1s grace period is more than enough. This alone could reduce each test by ~8-9s.
Estimated saving: ~8s * 237 tests = ~32 min aggregate, ~10 min off the critical path (Domain & Network).
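If the timeout should stay tunable rather than hard-coded, the argument list could be built by a small helper. This is a sketch only: `composeDownArgs` and `DEFAULT_STOP_TIMEOUT_SECONDS` are hypothetical names, not part of the AWF codebase.

```typescript
// Hypothetical helper; not part of the AWF codebase.
const DEFAULT_STOP_TIMEOUT_SECONDS = 1;

function composeDownArgs(
  timeoutSeconds: number = DEFAULT_STOP_TIMEOUT_SECONDS,
): string[] {
  if (!Number.isInteger(timeoutSeconds) || timeoutSeconds < 0) {
    throw new RangeError(
      `stop timeout must be a non-negative integer, got ${timeoutSeconds}`,
    );
  }
  // -v removes volumes; -t sets the per-container stop timeout in seconds.
  return ["compose", "down", "-v", "-t", String(timeoutSeconds)];
}

console.log(["docker", ...composeDownArgs()].join(" "));
// prints "docker compose down -v -t 1"
```

In stopContainers() the result would be passed straight through, for example execa('docker', composeDownArgs(), { cwd: workDir }).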
P1: Rebalance test jobs to reduce wall-clock time
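Split Domain & Network Tests (77 tests, 37 min) into two jobs of ~38 tests each. This would cut that job's wall clock from 37 min to ~20 min. The overall critical path only drops as far if the next-longest job (Container & Ops Tests, 29m 8s) is shortened as well, for example by P0.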
Suggested split:
- Domain Tests: blocked-domains, empty-domains, wildcard-patterns, dns-servers (43 tests)
- Network Tests: ipv6, localhost-access, network-security (34 tests)
Estimated saving: ~17 min off critical path.
P2: Cache Docker builds across jobs
Use a shared build step or actions/cache for Docker layers to avoid rebuilding containers 8 times.
Estimated saving: ~8 min aggregate (7 redundant builds * 70s each).
P3: Batch tests that share configuration
Many tests use identical AWF configurations (e.g., allowDomains: ['github.com']). Running multiple assertions inside a single AWF invocation via bash -c "test1 && test2 && ..." could dramatically reduce container cycles. This requires refactoring tests but would have the largest impact.
Estimated saving: Could reduce 237 container cycles to ~50-80, saving 60-70% of aggregate time.
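The batching idea can be sketched as grouping test cases by configuration and joining each group's commands into one invocation. The `TestCase` shape, the helper names, and the example commands below are assumptions for illustration; the real suite's types and commands may differ.

```typescript
// Hypothetical shape of an integration test case.
interface TestCase {
  name: string;
  allowDomains: string[];
  command: string;
}

// Groups test cases whose allowDomains are identical (order-insensitive),
// so each group can share a single AWF container cycle.
function groupByConfig(cases: TestCase[]): Map<string, TestCase[]> {
  const groups = new Map<string, TestCase[]>();
  for (const c of cases) {
    const key = c.allowDomains.slice().sort().join(",");
    const group = groups.get(key) ?? [];
    group.push(c);
    groups.set(key, group);
  }
  return groups;
}

// Joins a group's commands for `bash -c`, stopping at the first failure.
function batchCommand(group: TestCase[]): string {
  return group.map((c) => `( ${c.command} )`).join(" && ");
}

const exampleCases: TestCase[] = [
  { name: "fetches home page", allowDomains: ["github.com"], command: "curl -fsS https://github.com" },
  { name: "fetches robots.txt", allowDomains: ["github.com"], command: "curl -fsS https://github.com/robots.txt" },
  { name: "fetches API root", allowDomains: ["github.com", "api.github.com"], command: "curl -fsS https://api.github.com" },
];

const groups = groupByConfig(exampleCases);
console.log(`${exampleCases.length} tests -> ${groups.size} container cycles`);
console.log(batchCommand(groups.get("github.com") ?? []));
```

One trade-off of chaining with && is that the first failing command hides the results of the ones after it, so per-assertion reporting would need extra work.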
P4: Remove needs: test-chroot-languages from Package Managers job
The Package Managers job installs its own language runtimes via setup actions and doesn't depend on Languages job output.
Estimated saving: ~4 min off chroot critical path.
P5: Add missing tests to CI jobs
Add gh-host-injection, ghes-auto-populate, skip-pull, and workdir-tmpfs-hiding to appropriate CI jobs (e.g., Container & Ops Tests).
Impact: Not a performance improvement but ensures all 27 tests are actually run.
Summary
| Recommendation | Effort | Aggregate Saving | Critical Path Saving |
|---|---|---|---|
| P0: -t 1 on docker compose down | Trivial (1-line) | ~32 min | ~10 min |
| P1: Rebalance test jobs | Low | ~0 (parallel) | ~17 min |
| P2: Cache Docker builds | Medium | ~8 min | ~0 (parallel) |
| P3: Batch tests with shared config | High | ~60-70 min | ~20+ min |
| P4: Remove chroot job dependency | Trivial | ~4 min | ~4 min |
| P5: Add missing tests | Low | N/A | N/A |
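Applying P0 + P1 + P4 (all low effort) would bring each half of the split Domain & Network job down to ~10-15 min. The integration critical path would then be set by Container & Ops Tests, which by the figures above still takes ~20 min after P0 (29m 8s minus ~8s * 66 tests), down from 37 min today.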
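Critical-path estimates like this can be sanity-checked by projecting per-job wall clock from the table in the Problem section. This is a rough sketch, assuming P0 saves a uniform ~8s per test and that the two halves of the split keep the parent job's per-test average; the helper names are illustrative.

```typescript
interface Job {
  name: string;
  tests: number;
  wallClockSeconds: number;
}

// Figures from run 23313635067.
const jobs: Job[] = [
  { name: "Domain & Network Tests", tests: 77, wallClockSeconds: 35 * 60 + 47 },
  { name: "Container & Ops Tests", tests: 66, wallClockSeconds: 29 * 60 + 8 },
  { name: "Protocol & Security Tests", tests: 58, wallClockSeconds: 27 * 60 + 27 },
  { name: "API Proxy Tests", tests: 36, wallClockSeconds: 13 * 60 + 55 },
];

// P0: assume every test gets `savedPerTest` seconds faster.
function applyP0(job: Job, savedPerTest: number): Job {
  return { ...job, wallClockSeconds: job.wallClockSeconds - job.tests * savedPerTest };
}

// P1: split a job by test count, assuming the per-test average is unchanged.
function splitJob(job: Job, parts: { name: string; tests: number }[]): Job[] {
  const perTest = job.wallClockSeconds / job.tests;
  return parts.map((p) => ({ ...p, wallClockSeconds: p.tests * perTest }));
}

// The critical path is the slowest of the parallel jobs.
function criticalPath(candidates: Job[]): Job {
  return candidates.reduce((worst, j) =>
    j.wallClockSeconds > worst.wallClockSeconds ? j : worst,
  );
}

const afterP0 = jobs.map((j) => applyP0(j, 8));
const [domainNetwork, ...rest] = afterP0;
const afterP0AndP1 = [
  ...splitJob(domainNetwork, [
    { name: "Domain Tests", tests: 43 },
    { name: "Network Tests", tests: 34 },
  ]),
  ...rest,
];

const slowest = criticalPath(afterP0AndP1);
console.log(`${slowest.name}: ~${(slowest.wallClockSeconds / 60).toFixed(1)} min`);
```

The wall-clock figures include per-job setup such as the Docker builds, so the projection is approximate; its main use is showing which job to target next.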