Feat launch concurrency by hi-liang · Pull Request #44 · picoCTF/cmgr

hi-liang · 2026-03-13T19:00:19Z

Replace the port assignment mutex with a semaphore so we can launch 2 instances at a time. this gives us some minor pipelining gains with docker daemon (see #41)

- Adds is_finalized to instances table - Implements transaction-safe reservePort - Tracks IsFinalized and launchSemaphore in Manager - Marks instances finalized via batch query

- Allow up to 2 simultaneous launches, as testing shows no improvement beyond 2, probably due to docker global locks on network provisioning allowing improvement only on 1 pipeline stage - Removes heavy global portLock bottleneck - Implements NCPU-sized semaphore restricting docker load - Pre-claims open ports via reservePort before starting containers - Adds aggressive 5-min GC pruning for crashed unfinalized instances

- Adds 50-routines parallel checkout reservePort suite - Refactors profile and db integration tests removing getFreePort - Updates mock DB queries preventing SQLite foreign key errors

Copilot

Pull request overview

This PR increases instance launch throughput by allowing limited parallel container startups and moving host-port allocation to an atomic, DB-backed reservation scheme to avoid global serialization.

Changes:

Replace the port-assignment mutex approach with (a) atomic port reservation in SQLite and (b) a launch semaphore to allow up to 2 concurrent container launches.
Extend the SQLite schema with instances.is_finalized plus several indexes, and add GC for stale unfinalized instances.
Add new tests/benchmarks around port reservation/bitset logic and build-instance lookup batching.

Reviewed changes

Copilot reviewed 13 out of 13 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
README.md	Documents `CMGR_CONCURRENT_LAUNCHES` configuration.
cmgr/types.go	Adds `launchSemaphore`, switches prune gate to `atomic.Int64`, adds `IsFinalized` to instance metadata.
cmgr/prune_test.go	Updates tests for the new `atomic.Int64` API.
cmgr/profile_test_timestamps.go	Adds timestamp helpers (currently not test-only due to filename).
cmgr/profile_common_test.go	Adds profiling test/bench harness for launch preparation vs DB size.
cmgr/docker.go	Adds docker ping retries, initializes launch semaphore, and uses pre-reserved host ports during container create.
cmgr/database.go	Adds `is_finalized`, multiple indexes, `_busy_timeout`/`_synchronous`, and a `usedPortBitset()` helper.
cmgr/database_operations_test.go	Adds tests for `usedPortBitset`, `reservePort`, and `lookupBuildInstances`.
cmgr/database_instances.go	Implements `reservePort`, updates `finalizeInstance`, and adds `lookupBuildInstances`.
cmgr/concurrency_test.go	Adds concurrent port reservation test.
cmgr/api.go	Reserves ports earlier in `newInstance`, batches build instance lookup, updates prune gate usage, and GC’s unfinalized instances.
cmd/cmgrd/main.go	Documents `CMGR_CONCURRENT_LAUNCHES` and `CMGR_PRUNE_AGE`.
cmd/cmgr/main.go	Documents `CMGR_CONCURRENT_LAUNCHES`.

Comments suppressed due to low confidence (1)

cmgr/api.go:283

On error paths after openInstance(iMeta), the function returns without cleaning up the newly-created instance row (and, with the new logic, any reserved portAssignments). For example, if GetChallengeMetadata or startNetwork fails, this leaks an unfinalized instance until the later GC runs, temporarily consuming ports and DB rows. Consider calling m.stopInstance(iMeta) (or at least removeInstanceMetadata) before returning on these errors to release reservations immediately.

	m.checkPrune()

	cMeta, err := m.GetChallengeMetadata(build.Challenge)
	if err != nil {
		return 0, err
	}

	if m.portLow != 0 {
		revPortMap, err := m.getReversePortMap(build.Challenge)
		if err != nil {
			return 0, err
		}

		for _, image := range build.Images {
			if image.Host == "builder" {
				continue
			}
			for _, portStr := range image.Ports {
				portName := revPortMap[portStr]
				hostPort, err := m.reservePort(iMeta.Id, portName)
				if err != nil {
					m.stopInstance(iMeta)
					return 0, err
				}
				iMeta.Ports[portName] = hostPort
			}
		}
	}

	err = m.startNetwork(iMeta, cMeta.ChallengeOptions.NetworkOptions)
	if err != nil {
		return 0, err
	}

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-01T15:48:48Z

+	// Set port range to test bitset optimization (10000-63000)
+	// This ensures getFreePort uses the bitset logic instead of ephemeral ports
+	mgr.portLow = 10000
+	mgr.portHigh = 63000
+


The setup sets mgr.portLow/portHigh to enable port-range logic, but the instance population loop relies on finalizeInstance() persisting InstanceMetadata.Ports. With the new logic, finalizeInstance() only inserts portAssignments when portLow==0, so these preloaded instances won’t actually create portAssignments rows, skewing the DB-load/profile results. Consider either reserving ports via reservePort() during setup or temporarily disabling port reservation for the population phase (as other tests do) so portAssignments reflect reality.

Suggested change

// Set port range to test bitset optimization (10000-63000)

// This ensures getFreePort uses the bitset logic instead of ephemeral ports

mgr.portLow = 10000

mgr.portHigh = 63000

Copilot · 2026-04-01T15:48:48Z

+	// Output to CSV for easy comparison
+	csvFile := fmt.Sprintf("profile_results_%d.csv", instanceCount)
+	f, err := os.Create(csvFile)
+	if err != nil {
+		t.Logf("Warning: could not create CSV file: %s", err)
+	} else {
+		defer f.Close()
+		fmt.Fprintf(f, "metric,duration_ms\n")
+		fmt.Fprintf(f, "lookupBuildMetadata,%f\n", step1Duration.Seconds()*1000)
+		fmt.Fprintf(f, "openInstance,%f\n", step2Duration.Seconds()*1000)
+		fmt.Fprintf(f, "GetChallengeMetadata,%f\n", step3Duration.Seconds()*1000)
+		fmt.Fprintf(f, "getReversePortMap,%f\n", step4Duration.Seconds()*1000)
+		fmt.Fprintf(f, "reservePort_total,%f\n", step5Duration.Seconds()*1000)
+		fmt.Fprintf(f, "total,%f\n", totalDuration.Seconds()*1000)
+		t.Logf("\nResults written to %s", csvFile)
+	}


This test writes CSV files (e.g. profile_results_*.csv) into the current working directory. That can pollute the repo checkout and/or fail in read-only CI environments. Prefer writing to t.TempDir() (or skipping file output entirely) so artifacts are isolated and automatically cleaned up.

Copilot · 2026-04-01T15:48:49Z

+	numPorts := m.portHigh - m.portLow + 1
+
+	for attempt := 0; attempt < 50; attempt++ {
+		// Get currently used ports to find candidates
+		bitset, err := m.usedPortBitset()
+		if err != nil {
+			return 0, err
+		}
+
+		port := rand.Intn(numPorts) + m.portLow
+		var candidate int


reservePort uses the package-global math/rand (rand.Intn) which is never seeded anywhere in this repo (no rand.Seed found). That makes port selection deterministic across process starts, increasing collision probability under concurrent launches and making behavior harder to reason about. Prefer using a properly-seeded RNG here (e.g., seed the global source once at startup, or use a per-Manager RNG with appropriate locking).

Copilot · 2026-04-01T15:48:50Z

+	// Need a dummy instance ID from setup
+	var instId InstanceId
+	mgr.db.QueryRow("SELECT id FROM instances LIMIT 1").Scan(&instId)
+
+	port, err := mgr.reservePort(instId, "test")
+	if err != nil {
+		t.Fatalf("reservePort failed: %s", err)
+	}


These tests use QueryRow(...).Scan(&instId) without checking the returned error. If the query returns no rows (or the scan fails), instId will remain zero and the test may exercise reservePort with an unintended instance id. Please capture and assert the Scan error.

hi-liang · 2026-04-01T15:56:46Z

@copilot apply changes based on the comments in this thread

…seeding Agent-Logs-Url: https://github.com/picoCTF/cmgr/sessions/72ed8087-14fc-417e-a0a0-db0ef7df5918 Co-authored-by: hi-liang <3301067+hi-liang@users.noreply.github.com>

Copilot · 2026-04-01T16:04:35Z

@copilot apply changes based on the comments in this thread

All changes from the review have been applied in commit 7e3dfad:

profile_test_timestamps.go → renamed to profile_test_timestamps_test.go so it's excluded from non-test builds
TestProfileInstanceLaunchWithDBLoad now skips unless CMGR_TEST_PROFILE=1 is set (opt-in instead of just testing.Short())
CSV output now writes to t.TempDir() instead of the current working directory
reservePort rand seeding: moved the per-Manager RNG init into initDatabase (so it's always seeded even in tests that bypass NewManager), added a randMu sync.Mutex to Manager for safe concurrent access, and changed reservePort to use m.rand.Intn under that mutex instead of the package-global rand.Intn
Error checks added for addChallenges, openBuild, and openInstance in concurrency_test.go and database_operations_test.go
QueryRow(...).Scan errors are now checked and fatal in both places in database_operations_test.go

hi-liang added 4 commits March 12, 2026 00:48

feat(db): atomic port reservations via SQLite

2432258

- Adds is_finalized to instances table - Implements transaction-safe reservePort - Tracks IsFinalized and launchSemaphore in Manager - Marks instances finalized via batch query

test(concurrency): validate atomic port reservations

c5a3f49

- Adds 50-routines parallel checkout reservePort suite - Refactors profile and db integration tests removing getFreePort - Updates mock DB queries preventing SQLite foreign key errors

add missing test util

3a9c895

hi-liang requested a review from Copilot April 1, 2026 15:39

Copilot started reviewing on behalf of hi-liang April 1, 2026 15:39 View session

Copilot AI reviewed Apr 1, 2026

View reviewed changes

Copilot started work on behalf of hi-liang April 1, 2026 15:56 View session

fix: address PR review feedback - test isolation, error checks, rand …

7e3dfad

…seeding Agent-Logs-Url: https://github.com/picoCTF/cmgr/sessions/72ed8087-14fc-417e-a0a0-db0ef7df5918 Co-authored-by: hi-liang <3301067+hi-liang@users.noreply.github.com>

Copilot finished work on behalf of hi-liang April 1, 2026 16:05

hi-liang and others added 8 commits April 1, 2026 12:14

Merge branch 'master' into feat-launch-concurrency

3401dac

fix: merge conflicts, unused imports

de2fdde

test: add more concurrency checks

ea7e154

style: go fmt

3f3af60

Cleanup on newInstance failure

3d233b7

reuse port map

84e00c4

query used ports once before search

d75a674

test: consolidate timestamp wrappers

7681ff2

hi-liang merged commit 8d250ff into master Apr 7, 2026
1 check passed

hi-liang deleted the feat-launch-concurrency branch April 7, 2026 22:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat launch concurrency#44

Feat launch concurrency#44
hi-liang merged 13 commits intomasterfrom
feat-launch-concurrency

hi-liang commented Mar 13, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 1, 2026

Uh oh!

hi-liang commented Apr 1, 2026

Uh oh!

Copilot AI commented Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

hi-liang commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

Uh oh!

hi-liang commented Apr 1, 2026

Uh oh!

Copilot AI commented Apr 1, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hi-liang commented Mar 13, 2026 •

edited

Loading