perf: integration tests taking 37+ minutes in CI (Domain & Network job) #1376

@Mossaka

Problem

Integration tests in CI are taking significantly longer than expected. The longest job (Domain & Network Tests) takes 37 minutes, and the total aggregate CI time across all 4 integration test jobs is ~106 minutes.

Data from run 23313635067 on main:

| Job | Test Count | Wall Clock | Avg per Test |
|---|---|---|---|
| Domain & Network Tests | 77 | 35m 47s | ~28s |
| Container & Ops Tests | 66 | 29m 8s | ~26s |
| Protocol & Security Tests | 58 | 27m 27s | ~28s |
| API Proxy Tests | 36 | 13m 55s | ~23s |
| Total | 237 | 106m 17s | ~27s |
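
The per-test averages can be reproduced from the wall-clock figures. The following is a minimal sketch for checking the arithmetic; the `JobTiming` type and helper names are illustrative and not part of the repository.

```typescript
// Illustrative only: reproduces the "Avg per Test" column from the run data.
interface JobTiming {
  job: string;
  tests: number;
  wallClockSeconds: number;
}

// Convert a "35m 47s" style figure to seconds.
function toSeconds(minutes: number, seconds: number): number {
  return minutes * 60 + seconds;
}

const jobTimings: JobTiming[] = [
  { job: "Domain & Network Tests", tests: 77, wallClockSeconds: toSeconds(35, 47) },
  { job: "Container & Ops Tests", tests: 66, wallClockSeconds: toSeconds(29, 8) },
  { job: "Protocol & Security Tests", tests: 58, wallClockSeconds: toSeconds(27, 27) },
  { job: "API Proxy Tests", tests: 36, wallClockSeconds: toSeconds(13, 55) },
];

function avgSecondsPerTest(t: JobTiming): number {
  return t.wallClockSeconds / t.tests;
}

const totalTests = jobTimings.reduce((n, t) => n + t.tests, 0);
const totalSeconds = jobTimings.reduce((n, t) => n + t.wallClockSeconds, 0);

for (const t of jobTimings) {
  console.log(`${t.job}: ~${Math.round(avgSecondsPerTest(t))}s per test`);
}
console.log(`Total: ${totalTests} tests, ${Math.floor(totalSeconds / 60)}m ${totalSeconds % 60}s`);
```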

Chroot tests add another 4 jobs (~14min critical path, ~24min aggregate):

| Job | Test Count | Wall Clock |
|---|---|---|
| Chroot Edge Cases | 29 | ~7m |
| Chroot Package Managers | 26 | ~10m (blocked on Languages) |
| Chroot Languages | 20 | ~4m |
| Chroot /proc | 7 | ~3m |

Root Causes

1. docker compose down uses default 10s grace period (~40 min wasted)

In src/docker-manager.ts:1831, stopContainers() calls:

await execa('docker', ['compose', 'down', '-v'], { cwd: workDir });

This uses Docker Compose's default 10s stop timeout per container. Each AWF invocation has 2-3 containers (squid, agent, sometimes api-proxy). On stop, Docker sends SIGTERM, waits up to 10s, then sends SIGKILL. Because the command has already finished by the time down runs, there is nothing left to wait for, so the 10s grace period is pure overhead.

Impact: ~10s per test * 237 tests = ~39 minutes of pure shutdown waiting across all jobs. Adding -t 1 or -t 2 could save ~30+ minutes of aggregate CI time.

2. Every test spins up a full AWF instance (no container reuse)

Each test() call in the integration suite invokes runner.runWithSudo(), which runs the full AWF lifecycle:

  1. Generate configs (squid.conf, docker-compose.yml)
  2. docker compose up (create network, start squid, healthcheck, start agent)
  3. Run command
  4. docker compose down -v (stop all containers, remove volumes/network)

This means 237 full Docker Compose cycles. The overhead per test (~25-28s) is dominated by:

  • Container creation and startup: ~5-8s
  • Squid healthcheck wait: ~2-3s
  • Container shutdown (10s grace): ~10s
  • Network create/destroy: ~2-3s
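
Summing the itemized ranges gives 19-24s against the ~25-28s observed per test, with shutdown as the largest single item. A minimal sketch of that sum follows; the type and variable names are illustrative, and the figures are the estimates listed above rather than new measurements.

```typescript
// Illustrative only: sums the per-test overhead estimates listed above.
interface OverheadItem {
  step: string;
  lowSeconds: number;
  highSeconds: number;
}

const overheadItems: OverheadItem[] = [
  { step: "container creation and startup", lowSeconds: 5, highSeconds: 8 },
  { step: "squid healthcheck wait", lowSeconds: 2, highSeconds: 3 },
  { step: "container shutdown (10s grace)", lowSeconds: 10, highSeconds: 10 },
  { step: "network create/destroy", lowSeconds: 2, highSeconds: 3 },
];

// Add up the low and high ends of every range separately.
function sumOverhead(items: OverheadItem[]): { lowSeconds: number; highSeconds: number } {
  return items.reduce(
    (acc, item) => ({
      lowSeconds: acc.lowSeconds + item.lowSeconds,
      highSeconds: acc.highSeconds + item.highSeconds,
    }),
    { lowSeconds: 0, highSeconds: 0 },
  );
}

const itemizedTotal = sumOverhead(overheadItems);
console.log(`Itemized overhead: ${itemizedTotal.lowSeconds}-${itemizedTotal.highSeconds}s per test`);
```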

3. Docker container builds repeated in every job (~8 min wasted)

Each of the 4 integration test jobs and 4 chroot test jobs independently runs:

docker build -t ghcr.io/github/gh-aw-firewall/squid:latest containers/squid/
docker build -t ghcr.io/github/gh-aw-firewall/agent:latest containers/agent/

This takes ~70s per job. With 8 jobs, that's ~9 minutes of build time in total, of which ~8 minutes (7 builds * 70s) is redundant. Using a shared build job with Docker layer caching (actions/cache for Docker) could reduce this to ~70s total.

4. Chroot Package Managers job unnecessarily blocked on Languages

In test-chroot.yml:133:

needs: test-chroot-languages  # Run after language tests pass

This adds ~4 minutes to the chroot critical path. Package Manager tests install their own language runtimes independently (Ruby, Rust via setup actions), so this dependency appears unnecessary.

5. Tests run serially within each job (maxWorkers: 1)

In tests/setup/jest.integration.config.js:23:

maxWorkers: 1, // Run tests serially to avoid Docker conflicts

This is necessary because AWF uses fixed container names (awf-squid, awf-agent) and a fixed network (awf-net at 172.30.0.0/24), so parallel tests would conflict. However, this means all tests within a job run sequentially.

6. Unbalanced job sizes

Domain & Network Tests has 77 tests (37 min) while API Proxy Tests has 36 tests (14 min). The wall-clock CI time is gated by the slowest job.

7. 27 tests not run in CI at all

These test files are not matched by any CI job's --testPathPatterns:

  • gh-host-injection.test.ts (8 tests)
  • ghes-auto-populate.test.ts (9 tests)
  • skip-pull.test.ts (3 tests)
  • workdir-tmpfs-hiding.test.ts (7 tests)
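
A check like the following could catch unmatched files automatically. This is a minimal sketch, assuming the job patterns are treated as regular expressions matched against each file path (as Jest does for path patterns); the helper name, the `tests/integration/` paths, and the example patterns are placeholders, not the real workflow values.

```typescript
// Hypothetical helper: lists test files that no CI job's pattern matches.
function findUnmatchedTestFiles(testFiles: string[], jobPatterns: string[]): string[] {
  const regexes = jobPatterns.map((p) => new RegExp(p));
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Placeholder inputs for illustration; real values would come from the
// workflow files and a directory listing.
const exampleFiles = [
  "tests/integration/blocked-domains.test.ts",
  "tests/integration/skip-pull.test.ts",
  "tests/integration/gh-host-injection.test.ts",
];
const examplePatterns = ["blocked-domains", "wildcard-patterns"];

console.log(findUnmatchedTestFiles(exampleFiles, examplePatterns));
```

Running such a check as a CI step and failing when the result is non-empty would keep new test files from silently falling outside every job.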

Recommendations (prioritized by impact)

P0: Add -t 1 to docker compose down (~30 min saved aggregate)

Change src/docker-manager.ts:1831 from:

await execa('docker', ['compose', 'down', '-v'], ...);

to:

await execa('docker', ['compose', 'down', '-v', '-t', '1'], ...);

Since containers' main processes have already exited by the time down runs, a 1s grace period is more than enough. This alone could reduce each test by ~8-9s.

Estimated saving: ~8s * 237 tests = ~32 min aggregate, ~10 min off the critical path (Domain & Network).
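
One way to make the timeout explicit and adjustable is to build the argument list in a small helper. This is a sketch only: `composeDownArgs` and `estimatedSavingMinutes` are hypothetical names, not existing functions in src/docker-manager.ts, and the saving figures are the estimates from this issue.

```typescript
// Hypothetical helper, not the existing stopContainers() implementation.
// Builds the argument list for `docker compose down -v [-t <seconds>]`.
function composeDownArgs(stopTimeoutSeconds?: number): string[] {
  const args = ["compose", "down", "-v"];
  if (stopTimeoutSeconds !== undefined) {
    if (!Number.isInteger(stopTimeoutSeconds) || stopTimeoutSeconds < 0) {
      throw new RangeError("stopTimeoutSeconds must be a non-negative integer");
    }
    args.push("-t", String(stopTimeoutSeconds));
  }
  return args;
}

// Aggregate saving estimate: seconds saved per test times number of tests.
function estimatedSavingMinutes(secondsSavedPerTest: number, testCount: number): number {
  return (secondsSavedPerTest * testCount) / 60;
}

console.log(composeDownArgs(1).join(" "));
console.log(`aggregate: ~${Math.round(estimatedSavingMinutes(8, 237))} min`);
console.log(`Domain & Network: ~${Math.round(estimatedSavingMinutes(8, 77))} min`);
```

The call site would then pass `composeDownArgs(1)` as the second argument to execa in place of the literal array.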

P1: Rebalance test jobs to reduce wall-clock time

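Split Domain & Network Tests (77 tests, 37 min) into two jobs of roughly 35-45 tests each. This would reduce that job's run time from 37 min to ~20 min. On its own, the overall critical path would then be gated by Container & Ops Tests (~29 min), so this change pays off most when combined with P0.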

Suggested split:

  • Domain Tests: blocked-domains, empty-domains, wildcard-patterns, dns-servers (43 tests)
  • Network Tests: ipv6, localhost-access, network-security (34 tests)

Estimated saving: ~17 min off the Domain & Network job's run time.
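
Because the jobs run in parallel, wall-clock time is the maximum over all jobs, so the effect of a split should be checked against the other jobs as well. A minimal sketch, assuming each half keeps the ~28s per-test average and the other jobs are unchanged; the type and function names are illustrative.

```typescript
// Illustrative only: wall-clock CI time is the maximum over parallel jobs.
interface JobEstimate {
  label: string;
  minutes: number;
}

function criticalPathJob(jobs: JobEstimate[]): JobEstimate {
  if (jobs.length === 0) throw new Error("no jobs given");
  return jobs.reduce((slowest, j) => (j.minutes > slowest.minutes ? j : slowest));
}

// Domain & Network per-test average from the run data (assumed to carry over).
const SECONDS_PER_TEST = 28;

const afterSplit: JobEstimate[] = [
  { label: "Domain Tests", minutes: (43 * SECONDS_PER_TEST) / 60 },
  { label: "Network Tests", minutes: (34 * SECONDS_PER_TEST) / 60 },
  { label: "Container & Ops Tests", minutes: 29 + 8 / 60 },
  { label: "Protocol & Security Tests", minutes: 27 + 27 / 60 },
  { label: "API Proxy Tests", minutes: 13 + 55 / 60 },
];

const slowestAfterSplit = criticalPathJob(afterSplit);
console.log(`${slowestAfterSplit.label}: ~${Math.round(slowestAfterSplit.minutes)} min`);
```

Under these assumptions the two new jobs take about 20 and 16 minutes, and Container & Ops Tests becomes the slowest job at about 29 minutes.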

P2: Cache Docker builds across jobs

Use a shared build step or actions/cache for Docker layers to avoid rebuilding containers 8 times.

Estimated saving: ~8 min aggregate (7 redundant builds * 70s each).

P3: Batch tests that share configuration

Many tests use identical AWF configurations (e.g., allowDomains: ['github.com']). Running multiple assertions inside a single AWF invocation via bash -c "test1 && test2 && ..." could dramatically reduce container cycles. This requires refactoring tests but would have the largest impact.

Estimated saving: Could reduce 237 container cycles to ~50-80, saving 60-70% of aggregate time.
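
The batching idea can be sketched as grouping test cases by configuration and joining each group's commands into one shell string. Everything here is hypothetical: the `BatchableCase` shape, the helper names, and the sample curl commands are assumptions for illustration, not the suite's actual structure.

```typescript
// Hypothetical shape of a test case; field names are assumptions.
interface BatchableCase {
  label: string;
  allowDomains: string[];
  command: string;
}

// Group cases whose AWF configuration (here: the allowed domains) is identical.
function groupByConfig(cases: BatchableCase[]): Map<string, BatchableCase[]> {
  const groups = new Map<string, BatchableCase[]>();
  for (const c of cases) {
    const key = c.allowDomains.slice().sort().join(",");
    const group = groups.get(key) ?? [];
    group.push(c);
    groups.set(key, group);
  }
  return groups;
}

// Join commands for a single `bash -c` invocation; each runs in a subshell,
// and `&&` stops the chain at the first failing command.
function toBatchedCommand(cases: BatchableCase[]): string {
  return cases.map((c) => `( ${c.command} )`).join(" && ");
}

const sampleCases: BatchableCase[] = [
  { label: "allows github", allowDomains: ["github.com"], command: "curl -fsS https://github.com" },
  { label: "blocks example", allowDomains: ["github.com"], command: "! curl -fsS https://example.com" },
  { label: "allows api", allowDomains: ["api.github.com"], command: "curl -fsS https://api.github.com" },
];

const sampleGroups = groupByConfig(sampleCases);
console.log(`${sampleCases.length} cases -> ${sampleGroups.size} AWF invocations`);
```

One trade-off of chaining with `&&` is that a failure no longer maps directly to a single test() call, so a batched design would likely need per-command markers in the output to attribute failures.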

P4: Remove needs: test-chroot-languages from Package Managers job

The Package Managers job installs its own language runtimes via setup actions and doesn't depend on Languages job output.

Estimated saving: ~4 min off chroot critical path.

P5: Add missing tests to CI jobs

Add gh-host-injection, ghes-auto-populate, skip-pull, and workdir-tmpfs-hiding to appropriate CI jobs (e.g., Container & Ops Tests).

Impact: Not a performance improvement but ensures all 27 tests are actually run.

Summary

| Recommendation | Effort | Aggregate Saving | Critical Path Saving |
|---|---|---|---|
| P0: -t 1 on docker compose down | Trivial (1-line) | ~32 min | ~10 min |
| P1: Rebalance test jobs | Low | ~0 (parallel) | ~17 min (Domain & Network job) |
| P2: Cache Docker builds | Medium | ~8 min | ~0 (parallel) |
| P3: Batch tests with shared config | High | ~60-70 min | ~20+ min |
| P4: Remove chroot job dependency | Trivial | ~0 (ordering only) | ~4 min (chroot) |
| P5: Add missing tests | Low | N/A | N/A |

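Applying P0 + P1 + P4 (all low effort) would reduce the Domain & Network run time from 37 min to ~10-15 min. At ~8s saved per test, Container & Ops Tests (66 tests, 29 min today) would still take roughly 20 min, so it may need a similar split to bring the overall critical path down to that range.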
