Conversation
- Adds is_finalized to instances table
- Implements transaction-safe reservePort
- Tracks IsFinalized and launchSemaphore in Manager
- Marks instances finalized via batch query
- Allows up to 2 simultaneous launches; testing shows no improvement beyond 2, probably because Docker's global locks on network provisioning limit the gains to a single pipeline stage
- Removes the heavy global portLock bottleneck
- Implements an NCPU-sized semaphore restricting Docker load
- Pre-claims open ports via reservePort before starting containers
- Adds aggressive 5-minute GC pruning for crashed unfinalized instances
- Adds a 50-goroutine parallel reservePort checkout suite
- Refactors profile and DB integration tests, removing getFreePort
- Updates mock DB queries to prevent SQLite foreign key errors
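The launch semaphore described above can be sketched with Go's idiomatic counting-semaphore pattern, a buffered channel whose capacity bounds concurrency. This is a minimal standalone sketch (the function and variable names are illustrative, not cmgr's actual API):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// run pushes `jobs` goroutines through a counting semaphore of capacity
// maxConcurrent and returns the highest concurrency actually observed.
// A buffered channel is the standard Go counting semaphore.
func run(maxConcurrent, jobs int) int64 {
	sem := make(chan struct{}, maxConcurrent)
	var inFlight, maxSeen int64
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sem <- struct{}{}        // acquire a launch slot (blocks when full)
			defer func() { <-sem }() // release the slot
			n := atomic.AddInt64(&inFlight, 1)
			for {
				m := atomic.LoadInt64(&maxSeen)
				if n <= m || atomic.CompareAndSwapInt64(&maxSeen, m, n) {
					break
				}
			}
			atomic.AddInt64(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&maxSeen)
}

func main() {
	// With capacity 2, observed concurrency never exceeds 2.
	fmt.Println("max observed concurrency:", run(2, 10))
}
```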
Pull request overview
This PR increases instance launch throughput by allowing limited parallel container startups and moving host-port allocation to an atomic, DB-backed reservation scheme to avoid global serialization.
Changes:
- Replace the port-assignment mutex approach with (a) atomic port reservation in SQLite and (b) a launch semaphore to allow up to 2 concurrent container launches.
- Extend the SQLite schema with `instances.is_finalized` plus several indexes, and add GC for stale unfinalized instances.
- Add new tests/benchmarks around port reservation/bitset logic and build-instance lookup batching.
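The key property of the DB-backed reservation scheme is that a port is handed out at most once, even under concurrent callers. The PR enforces this with an atomic insert inside SQLite; the following in-memory sketch (a mutex-guarded set, with illustrative names) models the same claim-or-fail semantics:

```go
package main

import (
	"fmt"
	"sync"
)

// reserver models the reservation invariant: tryReserve succeeds for a
// given port exactly once. cmgr enforces this in SQLite rather than in
// memory; the mutex here stands in for the database's atomicity.
type reserver struct {
	mu   sync.Mutex
	used map[int]bool
}

func (r *reserver) tryReserve(port int) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.used[port] {
		return false
	}
	r.used[port] = true
	return true
}

func main() {
	r := &reserver{used: make(map[int]bool)}
	var wg sync.WaitGroup
	wins := make(chan int, 50)
	// 50 concurrent claimants race for the same port; exactly one wins.
	for i := 0; i < 50; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if r.tryReserve(8080) {
				wins <- 1
			}
		}()
	}
	wg.Wait()
	close(wins)
	fmt.Println("winners:", len(wins))
}
```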
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| README.md | Documents CMGR_CONCURRENT_LAUNCHES configuration. |
| cmgr/types.go | Adds launchSemaphore, switches prune gate to atomic.Int64, adds IsFinalized to instance metadata. |
| cmgr/prune_test.go | Updates tests for the new atomic.Int64 API. |
| cmgr/profile_test_timestamps.go | Adds timestamp helpers (currently not test-only due to filename). |
| cmgr/profile_common_test.go | Adds profiling test/bench harness for launch preparation vs DB size. |
| cmgr/docker.go | Adds docker ping retries, initializes launch semaphore, and uses pre-reserved host ports during container create. |
| cmgr/database.go | Adds is_finalized, multiple indexes, _busy_timeout/_synchronous, and a usedPortBitset() helper. |
| cmgr/database_operations_test.go | Adds tests for usedPortBitset, reservePort, and lookupBuildInstances. |
| cmgr/database_instances.go | Implements reservePort, updates finalizeInstance, and adds lookupBuildInstances. |
| cmgr/concurrency_test.go | Adds concurrent port reservation test. |
| cmgr/api.go | Reserves ports earlier in newInstance, batches build instance lookup, updates prune gate usage, and GC’s unfinalized instances. |
| cmd/cmgrd/main.go | Documents CMGR_CONCURRENT_LAUNCHES and CMGR_PRUNE_AGE. |
| cmd/cmgr/main.go | Documents CMGR_CONCURRENT_LAUNCHES. |
Comments suppressed due to low confidence (1)
cmgr/api.go:283
- On error paths after `openInstance(iMeta)`, the function returns without cleaning up the newly-created instance row (and, with the new logic, any reserved `portAssignments`). For example, if `GetChallengeMetadata` or `startNetwork` fails, this leaks an unfinalized instance until the later GC runs, temporarily consuming ports and DB rows. Consider calling `m.stopInstance(iMeta)` (or at least `removeInstanceMetadata`) before returning on these errors to release reservations immediately.
```go
m.checkPrune()
cMeta, err := m.GetChallengeMetadata(build.Challenge)
if err != nil {
	return 0, err
}
if m.portLow != 0 {
	revPortMap, err := m.getReversePortMap(build.Challenge)
	if err != nil {
		return 0, err
	}
	for _, image := range build.Images {
		if image.Host == "builder" {
			continue
		}
		for _, portStr := range image.Ports {
			portName := revPortMap[portStr]
			hostPort, err := m.reservePort(iMeta.Id, portName)
			if err != nil {
				m.stopInstance(iMeta)
				return 0, err
			}
			iMeta.Ports[portName] = hostPort
		}
	}
}
err = m.startNetwork(iMeta, cMeta.ChallengeOptions.NetworkOptions)
if err != nil {
	return 0, err
}
```
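The cleanup the review asks for is usually expressed in Go with a deferred error check, so every later error return releases the instance automatically instead of each `return` needing its own cleanup call. A minimal runnable sketch of the idiom, with stub types standing in for cmgr's `Manager` and instance metadata (names are illustrative):

```go
package main

import (
	"errors"
	"fmt"
)

// Stub types standing in for cmgr's Manager and InstanceMetadata.
type instance struct{ stopped bool }

type manager struct{ lastStopped *instance }

func (m *manager) stopInstance(i *instance) {
	i.stopped = true // releases port reservations and the DB row in the real code
	m.lastStopped = i
}

// newInstance shows the defer-based cleanup idiom: once the instance
// exists, any error return tears it down before propagating the error.
func (m *manager) newInstance(failNetwork bool) (*instance, error) {
	inst := &instance{}
	committed := false
	defer func() {
		if !committed {
			m.stopInstance(inst)
		}
	}()
	if failNetwork { // stands in for startNetwork (or any later step) failing
		return nil, errors.New("startNetwork failed")
	}
	committed = true
	return inst, nil
}

func main() {
	m := &manager{}
	_, err := m.newInstance(true)
	fmt.Println("error path stopped instance:",
		err != nil && m.lastStopped != nil && m.lastStopped.stopped)
}
```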
```go
// Set port range to test bitset optimization (10000-63000)
// This ensures getFreePort uses the bitset logic instead of ephemeral ports
mgr.portLow = 10000
mgr.portHigh = 63000
```
The setup sets mgr.portLow/portHigh to enable port-range logic, but the instance population loop relies on finalizeInstance() persisting InstanceMetadata.Ports. With the new logic, finalizeInstance() only inserts portAssignments when portLow==0, so these preloaded instances won’t actually create portAssignments rows, skewing the DB-load/profile results. Consider either reserving ports via reservePort() during setup or temporarily disabling port reservation for the population phase (as other tests do) so portAssignments reflect reality.
```go
// Output to CSV for easy comparison
csvFile := fmt.Sprintf("profile_results_%d.csv", instanceCount)
f, err := os.Create(csvFile)
if err != nil {
	t.Logf("Warning: could not create CSV file: %s", err)
} else {
	defer f.Close()
	fmt.Fprintf(f, "metric,duration_ms\n")
	fmt.Fprintf(f, "lookupBuildMetadata,%f\n", step1Duration.Seconds()*1000)
	fmt.Fprintf(f, "openInstance,%f\n", step2Duration.Seconds()*1000)
	fmt.Fprintf(f, "GetChallengeMetadata,%f\n", step3Duration.Seconds()*1000)
	fmt.Fprintf(f, "getReversePortMap,%f\n", step4Duration.Seconds()*1000)
	fmt.Fprintf(f, "reservePort_total,%f\n", step5Duration.Seconds()*1000)
	fmt.Fprintf(f, "total,%f\n", totalDuration.Seconds()*1000)
	t.Logf("\nResults written to %s", csvFile)
}
```
This test writes CSV files (e.g. profile_results_*.csv) into the current working directory. That can pollute the repo checkout and/or fail in read-only CI environments. Prefer writing to t.TempDir() (or skipping file output entirely) so artifacts are isolated and automatically cleaned up.
```go
numPorts := m.portHigh - m.portLow + 1

for attempt := 0; attempt < 50; attempt++ {
	// Get currently used ports to find candidates
	bitset, err := m.usedPortBitset()
	if err != nil {
		return 0, err
	}

	port := rand.Intn(numPorts) + m.portLow
	var candidate int
```
reservePort uses the package-global math/rand (rand.Intn) which is never seeded anywhere in this repo (no rand.Seed found). That makes port selection deterministic across process starts, increasing collision probability under concurrent launches and making behavior harder to reason about. Prefer using a properly-seeded RNG here (e.g., seed the global source once at startup, or use a per-Manager RNG with appropriate locking).
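The per-Manager RNG option can be sketched as below. Note that `math/rand.Rand` is not safe for concurrent use, so the lock is required; also, since Go 1.20 the package-global source is automatically seeded, so this mainly matters on older toolchains or when reproducible control over the source is wanted. Names here are illustrative, not cmgr's actual API:

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
	"time"
)

// portPicker owns a private, explicitly-seeded RNG. The mutex is required
// because *rand.Rand is not safe for concurrent use.
type portPicker struct {
	mu        sync.Mutex
	rng       *rand.Rand
	low, high int
}

func newPortPicker(low, high int) *portPicker {
	return &portPicker{
		rng:  rand.New(rand.NewSource(time.Now().UnixNano())),
		low:  low,
		high: high,
	}
}

// candidate returns a uniformly random port in [low, high].
func (p *portPicker) candidate() int {
	p.mu.Lock()
	defer p.mu.Unlock()
	return p.rng.Intn(p.high-p.low+1) + p.low
}

func main() {
	p := newPortPicker(10000, 63000)
	port := p.candidate()
	fmt.Println("candidate in range:", port >= 10000 && port <= 63000)
}
```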
```go
// Need a dummy instance ID from setup
var instId InstanceId
mgr.db.QueryRow("SELECT id FROM instances LIMIT 1").Scan(&instId)

port, err := mgr.reservePort(instId, "test")
if err != nil {
	t.Fatalf("reservePort failed: %s", err)
}
```
These tests use QueryRow(...).Scan(&instId) without checking the returned error. If the query returns no rows (or the scan fails), instId will remain zero and the test may exercise reservePort with an unintended instance id. Please capture and assert the Scan error.
@copilot apply changes based on the comments in this thread
…seeding
Agent-Logs-Url: https://github.com/picoCTF/cmgr/sessions/72ed8087-14fc-417e-a0a0-db0ef7df5918
Co-authored-by: hi-liang <3301067+hi-liang@users.noreply.github.com>
All changes from the review have been applied in commit 7e3dfad:
Replace the port-assignment mutex with a semaphore so we can launch 2 instances at a time. This gives us some minor pipelining gains with the Docker daemon (see #41).