feat: introduce SNYK_REQUEST_CONCURRENCY for dependency request parallelism #6756
Conversation
Add a tunable concurrency knob for in-flight dependency-test/dependency-monitor HTTP requests, default 10 (raised from the prior hard-coded 5), clamped to [1, 50]. Override via the SNYK_REQUEST_CONCURRENCY environment variable. Apply the new helper at the existing pMap call site in sendAndParseResults (run-test.ts).

The default bump is effectively a no-op for non-container test workloads (single-project tests produce one payload; --all-projects rarely produces more than the prior 5-payload ceiling), but materially improves wall-clock time for container tests, which produce one ScanResult per directory containing dependencies. A follow-up PR will adopt the same helper in the container monitor path, which is currently fully sequential.
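For orientation, here is a minimal sketch of the pattern described above: an env-var-driven concurrency value, clamped to [1, 50], passed to `pMap` at the request fan-out. The helper name and payload shape are illustrative, not the actual implementation; the default of 10 reflects this initial revision (it is restored to 5 later in the review thread).

```ts
import pMap from 'p-map';

// Illustrative sketch only (names and shapes are not the actual implementation):
// read an env-var override, clamp it to [1, 50], and use it as the pMap
// concurrency at the request fan-out call site.
function readRequestConcurrency(defaultConcurrency = 10): number {
  const raw = process.env.SNYK_REQUEST_CONCURRENCY;
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  if (Number.isNaN(parsed)) {
    return defaultConcurrency;
  }
  return Math.min(50, Math.max(1, parsed));
}

// Hypothetical fan-out resembling the sendAndParseResults pattern: one HTTP
// request per payload, with at most `concurrency` requests in flight at once.
async function sendAll<T, R>(
  payloads: T[],
  send: (payload: T) => Promise<R>,
): Promise<R[]> {
  return pMap(payloads, send, { concurrency: readRequestConcurrency() });
}
```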
Benchmark:

| Configuration | Mean wall-clock | vs baseline |
|---|---|---|
| baseline (c=5) | 68.04 s ± 1.72 s | 1.00× |
| PR A default (c=10) | 40.96 s ± 0.15 s | 1.66× faster (−40%) |
| PR A override (c=20) | 25.13 s ± 0.48 s | 2.71× faster (−63%) |
Each config: 3 runs after 1 warmup; image pre-pulled. Min/max within ±2% of mean across all configs — variance is tight.
Standard deviation drops with higher concurrency (1.72s → 0.15s → 0.48s) because the wait-time component shrinks and the scan becomes more CPU-bound on the (much steadier) JAR-extraction work in snyk-docker-plugin.
```ts
export const RETRY_ATTEMPTS = 3;
export const RETRY_DELAY = 500;

// New constant under review in this thread.
const DEFAULT_REQUEST_CONCURRENCY = 10;
```
Thinking: changing a default without visibility into the behaviour makes me wonder if we should be more careful; we currently don't have any metrics on CPU consumption. The impact is quite large, as it affects all SCA and Container scans.
I don't think there'll be significant CPU impact. Node executes JS on a single thread, and the code path this PR touches spends nearly all its time waiting on HTTP responses, not in JS execution.
On the "affects all SCA and Container scans" concern, I agree. A few things that make me comfortable with the bump:
- the headroom before throttling appears large (I put more discussion here). The api-gateway rate limit on `/v1/test*` and `/v1/monitor*` is keyed per `principal_id` (per-token), default high bucket: 200 req/s burst, 2000/min, 60000/hour. Even at the override ceiling (50), a single CLI scan stays at ≤25% of the per-second cap (see the rough arithmetic sketch after this list).
- 5 is a conservatively low bound and 10 is still quite modest as a default.
- OS flows team reviewed and approved the PR 😄
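A rough back-of-the-envelope for that ≤25% figure, not taken from the PR: it holds if each request takes at least about a second, so the in-flight concurrency bounds the request rate.

```ts
// Back-of-the-envelope only; the 1-second latency figure is an assumption,
// not something measured in this PR. Faster responses would raise the rate.
const perSecondCap = 200;        // /v1/test* burst limit per token (from the thread)
const overrideCeiling = 50;      // maximum allowed request concurrency
const minResponseLatencySec = 1; // assumed lower bound on response time

// With N requests in flight and each taking at least L seconds, the client
// can issue at most N / L requests per second.
const peakReqPerSec = overrideCeiling / minResponseLatencySec; // 50
const shareOfCap = peakReqPerSec / perSecondCap;               // 0.25
console.log(`~${peakReqPerSec} req/s peak, ${shareOfCap * 100}% of the burst cap`);
```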
Other options:
- we could land a 5 to 8 bump first and revisit after we have telemetry?
- add a feature-flag-style override/escape hatch to bump it back to 5
- any other ideas?
What do you think?
Makes sense for a single CLI run. A lot of our users actually run the CLI highly concurrently and already experience rate limiting. In combination, in the best case the increased request parallelism and the rate limiting might just cancel out; in the worst case the rate limiting will eventually exceed the retry limit and fail the CLI.
Address review feedback on the test/monitor request-concurrency knob:

- Restore the default to 5 (the prior hard-coded value), so the env-var introduction is purely a configurability change. The default-bump question can be revisited separately once we have telemetry, per Peter's review.
- Make the GAF configuration the single source of truth for the user-facing SNYK_REQUEST_CONCURRENCY value: register a new cliv2.ConfigKeyRequestConcurrency key, with snyk_request_concurrency as an alternative key (so the env var feeds the config). The Go side forwards the resolved value to the legacy CLI process via the internal SNYK_INTERNAL_REQUEST_CONCURRENCY env var. The TS helper now reads that internal env var instead of the user-facing one, leaving the public configuration surface owned by Go (and reachable in the future from config files / flags without further TS changes). A hedged sketch of this helper follows below.
- Add the new internal env var to the legacy-CLI env blacklist so a user can't bypass the Go config by setting it directly.
- A new env var (not the existing MAX_THREADS) keeps HTTP request concurrency separate from the CPU-bound thread pool, per F2F.
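As a sketch of what the TS side of this contract might look like (the actual getRequestConcurrency lives in src/lib/snyk-test/common.ts; this body is a plausible reading of the description above, not the real code):

```ts
// Sketch only: not the actual src/lib/snyk-test/common.ts code. Default 5,
// clamp to [1, 50], reading the internal env var that the Go wrapper sets
// from the user-facing SNYK_REQUEST_CONCURRENCY configuration value.
const DEFAULT_REQUEST_CONCURRENCY = 5;
const MIN_REQUEST_CONCURRENCY = 1;
const MAX_REQUEST_CONCURRENCY = 50;

export function getRequestConcurrency(): number {
  const raw = process.env.SNYK_INTERNAL_REQUEST_CONCURRENCY;
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  if (Number.isNaN(parsed)) {
    return DEFAULT_REQUEST_CONCURRENCY;
  }
  return Math.min(
    MAX_REQUEST_CONCURRENCY,
    Math.max(MIN_REQUEST_CONCURRENCY, parsed),
  );
}
```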
Add unit coverage for fillEnvironmentFromConfig's handling of the new ConfigKeyRequestConcurrency: forwards a user-set value to the legacy CLI as SNYK_INTERNAL_REQUEST_CONCURRENCY, omits the internal env when unset, and strips a user-provided internal env so Go remains the source of truth.
Under main.go's WithSupportedEnvVarPrefixes setup (the production config), GAF's IsSet does not pre-bind env vars for alternative keys — only get() does. As a result, config.IsSet(ConfigKeyRequestConcurrency) returned false even when SNYK_REQUEST_CONCURRENCY was set, so the internal env var was never forwarded to the legacy CLI process and the TS code always saw the default concurrency. Switch to GetString and check non-empty: GetString goes through GAF's get(), which binds the alt key before reading. The original unit test passed only because it used WithAutomaticEnv, which bypasses bindEnv entirely and so masked the production behavior. Update the test to construct the config the way main.go does (with WithSupportedEnvVarPrefixes), so the regression is caught next time.
What does this PR do?

Adds a tunable concurrency knob for in-flight dependency-test / dependency-monitor HTTP requests, configurable via the user-facing `SNYK_REQUEST_CONCURRENCY` env var (default 5, clamped to `[1, 50]`).

The default is unchanged from the prior hard-coded `MAX_CONCURRENCY = 5` — this PR is purely a configurability change. Bumping the default can be revisited separately once we have telemetry on CPU and rate-limit impact (per the review thread).

The user-facing env var is owned by the Go CLI's GAF configuration, with the resolved value forwarded to the legacy CLI via the internal `SNYK_INTERNAL_REQUEST_CONCURRENCY` env var. This keeps Go as the single source of truth for application configuration (env vars, future config files / flags, etc.), per Peter's review feedback.

Wiring

- `cliv2.ConfigKeyRequestConcurrency` (`internal_request_concurrency`) — new GAF config key.
- `main.go` registers `snyk_request_concurrency` as an alternative key, so the env var feeds the config.
- `fillEnvironmentFromConfig` writes the resolved value to `SNYK_INTERNAL_REQUEST_CONCURRENCY` for the legacy CLI process.
- `getRequestConcurrency()` in `src/lib/snyk-test/common.ts` reads `SNYK_INTERNAL_REQUEST_CONCURRENCY` (the internal contract from Go), defaults to 5, clamps to `[1, 50]`.
- It replaces the `MAX_CONCURRENCY = 5` constant at the existing `pMap` call site in `sendAndParseResults` (`src/lib/snyk-test/run-test.ts`).
- A follow-up PR (#6757) adopts the same helper in `src/lib/ecosystems/monitor.ts:monitorDependencies`, which is currently fully sequential.

Why?
`snyk container test` produces one `ScanResult` per directory of dependencies in the image (`lib/analyzer/applications/java.ts:groupJarFingerprintsByPath` in `snyk-docker-plugin`). For Java-heavy images this can be hundreds of ScanResults, each becoming a separate `POST /test-dependencies` request. With the prior hard-coded concurrency cap of 5, the request fan-out is the dominant wall-clock cost — and there was no escape hatch.

This PR adds the escape hatch without changing default behavior:

- `--all-projects` runs are unaffected: they default to the same throughput as today, but can opt into higher concurrency via the env var.

Benchmark — `quay.io/wildfly/wildfly:34.0.1.Final-jdk21` (512 ScanResults)

`hyperfine --warmup 1 --runs 3` against a locally-built PR binary, varying only `SNYK_REQUEST_CONCURRENCY`. The `c=5` row is the default and reproduces the prior hard-coded behavior.

| Configuration | Mean wall-clock | vs baseline |
|---|---|---|
| `c=5` (default) | 68.04 s ± 1.72 s | 1.00× |
| `c=10` | 40.96 s ± 0.15 s | 1.66× faster (−40%) |
| `c=20` | 25.13 s ± 0.48 s | 2.71× faster (−63%) |

Min/max within ±2% of mean across all configs — variance is tight. Standard deviation drops with higher concurrency (1.72s → 0.15s → 0.48s) because the wait-time component shrinks and the scan becomes more CPU-bound on the (much steadier) JAR-extraction work in `snyk-docker-plugin`.

(I re-validated end-to-end on the latest revision with 2 hyperfine runs per config; min(c=20) = 53 s < min(c=5) = 64 s confirms the env var continues to flow through Go → TS as expected.)
Where should the reviewer start?

- `cliv2/internal/cliv2/cliv2.go` — new exported `ConfigKeyRequestConcurrency`; `fillEnvironmentFromConfig` forwarding; blacklist entry.
- `cliv2/internal/constants/constants.go` — `SNYK_INTERNAL_REQUEST_CONCURRENCY_ENV` constant.
- `cliv2/cmd/cliv2/main.go` — `AddAlternativeKeys` registration for the user-facing env var name.
- `src/lib/snyk-test/common.ts` — `getRequestConcurrency()` helper.
- `src/lib/snyk-test/run-test.ts` — single call-site swap.
- `cliv2/internal/cliv2/cliv2_test.go` — `Test_PrepareV1EnvironmentVariables_RequestConcurrency` (3 sub-cases: forwarded-when-set, omitted-when-unset, user-set-internal-var-stripped-and-reapplied).
- `test/jest/unit/lib/snyk-test/common.spec.ts` — 9 unit tests for the helper covering default, override, clamping, and invalid-input cases (a hedged sketch of this kind of test follows this list).
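For illustration only, a hedged sketch of the kinds of cases that spec describes (default, override, clamping, invalid input), assuming the helper reads SNYK_INTERNAL_REQUEST_CONCURRENCY as outlined above; it is not the actual spec file:

```ts
// Illustrative only: not the actual contents of
// test/jest/unit/lib/snyk-test/common.spec.ts (import path is illustrative too).
import { getRequestConcurrency } from '../../../../../src/lib/snyk-test/common';

const ENV = 'SNYK_INTERNAL_REQUEST_CONCURRENCY';

describe('getRequestConcurrency', () => {
  afterEach(() => {
    delete process.env[ENV];
  });

  it('defaults to 5 when the internal env var is unset', () => {
    expect(getRequestConcurrency()).toBe(5);
  });

  it('honours an in-range override', () => {
    process.env[ENV] = '20';
    expect(getRequestConcurrency()).toBe(20);
  });

  it('clamps values outside [1, 50]', () => {
    process.env[ENV] = '500';
    expect(getRequestConcurrency()).toBe(50);
    process.env[ENV] = '0';
    expect(getRequestConcurrency()).toBe(1);
  });

  it('falls back to the default on non-numeric input', () => {
    process.env[ENV] = 'not-a-number';
    expect(getRequestConcurrency()).toBe(5);
  });
});
```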
How should this be manually tested?

(Not via `node bin/snyk` directly — the Go wrapper is what reads `SNYK_REQUEST_CONCURRENCY` and forwards it as `SNYK_INTERNAL_REQUEST_CONCURRENCY`.)

Risk assessment
Low. No default behavior change. The override is bounded to `[1, 50]`. The Go-side wiring is additive: the legacy CLI's env handling stays the same except for the new internal var (covered by the new Go test).

Background

Supersedes #6747 (which targeted the wrong code path — `src/lib/ecosystems/test.ts:testDependencies` is unreachable from `snyk container test`; `getEcosystemForTest` returns null for docker, so container test goes through `runTest`/`pMap` instead).