Interprocedural taint analysis for Spring Boot applications, built on Tai-e. Detects multi-layer data-flow vulnerabilities that conventional tools such as SonarQube cannot reach.
Detects 12 vulnerability classes across 7 frameworks, including cross-layer, reactive, cross-service, and cross-request stored injection — 40 of 41 vulnerable benchmark cases with 0 false positives (the near-miss layer catches the last one and flags attempted-but-incorrect sanitization). Ships as a CLI, a self-contained jar, a Docker image, and a GitHub Action with SARIF 2.1 output.
Consider this seemingly harmless Spring Boot code:
// Controller
@GetMapping("/users")
public List<User> search(@RequestParam String name) {
return userService.search(name);
}
// Service
public List<User> search(String name) {
String filtered = nameFilter(name); // looks like sanitization, but isn't
return userRepo.findByName(filtered);
}
// Repository
public List<User> findByName(String name) {
return jdbc.query(
"SELECT * FROM users WHERE name = '" + name + "'", // 🚨 SQL Injection
mapper
);
}The value comes from @RequestParam, crosses the service and repository layers, and reaches a SQL query without sanitization. A trivial payload like name = ' OR '1'='1 exposes the whole table.
SonarQube does not detect this path. It only flags the case where the sink is in the same method as the source. The real vulnerability lives in flows that cross multiple layers — and that is exactly where this project operates.
Taint analysis tracks the flow of untrusted data through a system using three concepts:
[SOURCE] ──► data flow ──► [SANITIZER?] ──► [SINK]
│
if absent → alert
- Source — where external data enters:
@RequestParam,@RequestBody,@KafkaListener - Sanitizer — what cleans the data:
HtmlUtils.htmlEscape(), parameterized queries,@Valid - Sink — where dangerous data is consumed:
JdbcTemplate.execute(),Runtime.exec(),response.write()
If data flows from a source to a sink without passing through a sanitizer → potential vulnerability.
The analysis is interprocedural: it tracks data across methods, classes, and abstraction layers — not just within a single function.
This project does not replace SonarQube. They serve different purposes:
| Tool | Purpose | Interprocedural taint |
|---|---|---|
| SonarQube | General quality + bugs + simple vulnerabilities | ❌ |
| Semgrep OSS | Static code patterns | ❌ |
| Semgrep Pro | Interprocedural taint | ✅ — but paid |
| Checkmarx / Veracode | Full enterprise SAST | ✅ — but expensive |
| Spring Taint Analyzer | Interprocedural taint for Spring Boot | ✅ — free |
Where it differs in practice — the Spring-specific capabilities that depend on real interprocedural taint:
| Capability | Spring Taint | SonarQube (free) | Semgrep OSS |
|---|---|---|---|
| Interprocedural taint (across methods/layers) | ✅ | ❌ | ❌ |
@KafkaListener / @FeignClient as sources |
✅ | ❌ | ❌ |
MultipartFile / @MatrixVariable as sources |
✅ | ❌ | ❌ |
| Conditional sanitizers | ✅ | ❌ | ❌ |
| Cross-request stored injection | ✅ | ❌ | ❌ |
WebFlux / Reactor (Mono / Flux) |
✅ | ❌ | ❌ |
| JPQL / template / JNDI / XXE injection | ✅ | ❌ | ❌ |
Spring Security & application.yml misconfig |
✅ | ❌ | ❌ |
| Near-miss sanitizer detection (wrong/insufficient sanitization) | ✅ | ❌ | ❌ |
| Autofix — applies the fix (parameterized query / output escaping) | ✅ | ❌ | ❌ |
| Per-finding confidence score | ✅ | ❌ | ❌ |
| Diff mode + baseline for pull requests | ✅ | ❌ | partial |
| SARIF 2.1 output | ✅ | ✅ | ✅ |
| Free / open source | ✅ | ✅ | ✅ |
Semgrep Pro has interprocedural taint but is a paid product (~$35/dev/month). Spring Taint Analyzer delivers the Spring Boot–focused equivalent, free.
Expected use in a CI pipeline:
- sonarqube scan # general quality, code smells, coverage
- spring-taint scan # deep data-flow vulnerabilitiesValue proposition in one sentence: you already use SonarQube — this project detects what it cannot see.
Built on Tai-e (Nanjing University, ISSTA 2023), a modern static-analysis framework for Java. Tai-e solves the hard parts — call-graph construction, context-sensitive pointer analysis, and interprocedural IFDS taint propagation. Our work is the Spring layer on top of it.
Spring Boot project
│ compile (Maven / Gradle)
▼
Bytecode (.class / JAR) ← analysis runs here, not on source
│
▼
Tai-e: Call Graph + Pointer Analysis
│
▼
IFDS Taint Propagation
│
▼
Spring Source/Sink Config ← our differentiator
(@RequestParam, @KafkaListener,
JdbcTemplate, Runtime.exec…)
│
▼
SARIF 2.1 report
(terminal / GitHub / GitLab / VS Code)
Operating on bytecode (not source) gives precise inheritance/generics resolution, analysis of third-party dependencies without source, and independence from any IDE or build system.
spring-taint/
├── config/
│ └── spring-taint.yml # default Spring sources/sinks/sanitizers (Tai-e format)
├── docs/
│ └── design/ # technical scope & design notes
├── spring-taint-engine/ # analyzer: CLI, config loader, Tai-e adapter, SARIF reporter
└── spring-taint-benchmark/ # intentionally vulnerable Spring Boot cases + ground truth
Like FlowDroid's DroidBench, this repo ships a benchmark of intentionally vulnerable (and intentionally safe) Spring Boot cases. Every advertised detection is validated against it before release.
The benchmark has 44 cases (41 vulnerable, 3 safe) across SQL and JPQL injection
(direct, through-service, four-layer, via Kafka and RabbitMQ, reactive R2DBC), reflected,
conditional-sanitizer and cross-request stored XSS, SSRF, SpEL, JNDI, XXE,
template injection (SSTI), log injection, path traversal, command injection, and
open redirect — with sources from Spring (@RequestParam, @PathVariable,
@RequestBody, @RequestHeader, @MatrixVariable, MultipartFile),
@KafkaListener, @RabbitListener, JAX-RS (@QueryParam), @Repository reads,
@FeignClient results, @Scheduled jobs and @Transactional write-then-read,
plus taint flowing through Optional / CompletableFuture wrappers. Ground truth is
in expected.yml.
Current engine result: 40 of 41 vulnerable cases detected by the taint engine alone,
0 false positives on the 3 safe cases; the near-miss layer (--src) catches the
remaining wrong-context flow (41) and explains the rest. Full table: benchmark README.
Per-rule reference: docs/rules.md.
Positive cases measure recall; safe cases measure precision.
Beyond the synthetic benchmark, the analyzer is run against real OSS apps, across both
Spring Boot 3 (jakarta) and Spring Boot 2 (javax):
- spring-petclinic (clean, Boot 4.0) — engaged correctly (9 entry points, 12 sources) and reported 0 false positives, plus one legitimate config finding the project itself flags as production-unsafe.
- spring-petclinic-rest (clean, larger — ~126 classes, real
JdbcTemplateusage) — 0 false positives at scale (31 entry points, 46 sources; analysis ~0.2s), with the DTO/entity modelling active (1964 bean-accessor + 4 bean-copy transfers). - mall-tiny (clean, Boot 2.7 / MyBatis) — a real app that calls
BeanUtils.copyPropertiesand Lombok@Builder; the bean-copy transfer fires and still reports 0 false positives (82 classes, 33 entry points, 54 sources). - sql-injection-web (vulnerable) — found the cross-layer SQL injection (controller → repository, two files) at 99% confidence and generated the fix.
- Contrast vulnerable-spring-boot-application (vulnerable, Boot 2 /
javax, value via a@RequestParam Map) — found the cross-layer SQL injection at 99%.
See docs/validation.md.
The benchmark proves recall on synthetic cases; these are public CVEs of the same bug classes in the wild. The analyzer reasons over application bytecode and reports the interprocedural source-to-sink form of each flow — so it catches these patterns when the vulnerable call lives in the analyzed code, not only inside a third-party library.
| Class | Detector | Representative public CVE | The data flow |
|---|---|---|---|
| SQL injection (CWE-89) | sql-injection |
CVE-2020-5427 / CVE-2020-5428 — Spring Cloud Data Flow / Task | a request-controlled sort column is concatenated into the task-execution query |
| SQL injection (CWE-89) | sql-injection |
CVE-2016-6652 — Spring Data JPA | a Sort value from the request reaches the generated SQL (blind SQLi) |
| SQL injection (CWE-89) | sql-injection |
CVE-2024-54762 — RuoYi (Spring Boot admin) | an authenticated request parameter reaches a query without sanitization |
| SpEL injection (CWE-917) | spel-injection |
CVE-2018-1273 — Spring Data Commons | a crafted request payload property path is evaluated as a SpEL expression |
Each is a request value flowing across methods into a sink with no sanitizer in between — exactly the shape this tool tracks. (For a CVE whose vulnerable call sits in a framework rather than the app, the analyzer reports it only when that call is reached from the analyzed code.)
- Phase 1 — Spring MVC (MVP): SQL injection, XSS, path traversal, command injection, SSRF, SpEL injection, open redirect. Sources:
@RequestParam,@PathVariable,@RequestBody,@RequestHeader,@CookieValue,@ModelAttribute, servlet API. Exit criterion: detect every benchmark case with zero false negatives and precision > 80%. - Phase 2 — gaps left open by existing OSS tools (done):
@KafkaListenerand@RabbitListeneras sources, conditional sanitizers, custom method sanitizers, cross-request stored injection, WebFlux / async (Mono/Fluxas transparent taint wrappers). - Phase 3 — multi-framework & robustness (done): JAX-RS / Quarkus and Micronaut sources; JNDI / XXE / template / JPQL / log-injection sinks;
@FeignClient,@Scheduledand@Transactionalsources; configuration and misconfiguration audits. - Phase 4 — roadmap: gRPC sources, an IntelliJ plugin, and publishing the image to GHCR.
The full technical scope lives in docs/design/spring-taint-scope.md.
The commands (the runnable java -jar …/spring-taint-all.jar form is in
Building; scan needs the target's dependency classpath via --libs):
# Basic scan
spring-taint scan target/classes --libs "<dependency classpath>"
# Custom configuration (merged onto the built-in rules)
spring-taint scan target/classes --libs "…" --config spring-taint.yml
# SARIF output (GitHub Advanced Security, GitLab SAST, VS Code)
spring-taint scan target/classes --libs "…" --output results.sarif
# Filter by severity, show the full trace
spring-taint scan target/classes --libs "…" --severity critical,high --verbose
# Only findings touching files changed vs a base ref (fast PR scans)
spring-taint scan target/classes --libs "…" --diff origin/main
# Near-miss notes + suggested parameterized-query fixes (needs the sources)
spring-taint scan target/classes --libs "…" --src src/main/java --suggest-fixes
# Apply the high-confidence fixes to the source
spring-taint scan target/classes --libs "…" --src src/main/java --fix
# Adopt on a legacy codebase: record today's findings, then fail only on NEW ones
spring-taint scan target/classes --libs "…" --baseline spring-taint-baseline.txtExample output (every taint finding carries a confidence score):
[CRITICAL] sql-injection (confidence: 95%)
Source: UserController.java:28 - search() - tainted parameter
Flow: UserController.search() -> UserService.search() -> UserRepository.query()
Sink: UserRepository.java:27 - sink: query()
Sanitizer: none detected
Teams can add their own rules in Tai-e's YAML format:
sources:
- { kind: call, method: "<com.myapp.LegacyInput: java.lang.String readUserData()>", index: result }
- { kind: param, method: "<com.myapp.EventHandler: void onEvent(java.lang.String)>", index: 0 }
sinks:
- { method: "<com.myapp.LegacyDao: void rawExecute(java.lang.String)>", index: 0 }
sanitizers:
- { method: "<com.myapp.Validator: java.lang.String sanitize(java.lang.String)>", index: 0 }Requires JDK 17+ and Maven.
mvn -q clean package # build engine + benchmark
mvn -q -pl spring-taint-benchmark package # compile the benchmark cases onlyBuild the self-contained jar and scan compiled classes. Pass the target's
dependency classpath with --libs so framework types like JdbcTemplate resolve
(the taint config is bundled, so --config is optional):
java -jar spring-taint-engine/target/spring-taint-all.jar \
scan target/classes --libs "$(... your dependency classpath ...)" \
--output results.sarifRun the analyzer process on a JDK 17 runtime. The application under analysis can be compiled with a newer JDK — recompiling the benchmark to Java 21 bytecode (major 65) and scanning it yields identical results. The JDK 17 requirement is on the analyzer's own process (Tai-e's invokedynamic handling trips on the JDK 21 runtime library), not on the bytecode version of the code being scanned.
A custom --config is merged onto the built-in rules (use --no-default-config
to replace them instead).
Three pattern-based scans (any JDK, no taint engine) complement the taint analysis:
# Hardcoded secrets in bytecode — secret-named constants, known key formats
# (AWS, GitHub, …), and @Value defaults
java -jar …/spring-taint-all.jar secrets target/classes
# Insecure settings in application*.yml / .properties — hardcoded secrets,
# disabled TLS, Security auto-config excluded, Actuator "*", H2 console
java -jar …/spring-taint-all.jar config src/main/resources
# Insecure Spring code in bytecode — csrf()/frameOptions().disable(),
# @CrossOrigin("*"), insecure cookies, sensitive data logged
java -jar …/spring-taint-all.jar misconfig target/classesRun the analyzer in CI and upload findings to GitHub code scanning. The action runs as a Docker container on JDK 17; give it the compiled classes and the dependency classpath:
- uses: actions/checkout@v4
- uses: actions/setup-java@v4
with: { distribution: temurin, java-version: '17' }
- run: mvn -B -ntp package -DskipTests
- id: cp
run: echo "value=$(mvn -q dependency:build-classpath -Dmdep.outputFilterFile=/dev/stdout)" >> "$GITHUB_OUTPUT"
- name: Spring Taint Analysis
uses: GabrielBBaldez/spring-taint@main
with:
path: target/classes
libs: ${{ steps.cp.outputs.value }}
output: results.sarif
severity: critical,high
- uses: github/codeql-action/upload-sarif@v3
if: always()
with:
sarif_file: results.sarifSee action.yml for all inputs. This repo also scans its own
benchmark on every push — see .github/workflows/ci.yml.
examples/pr-security.yml is a copy-paste workflow for
your own project: on every PR it scans only the changed code (--diff), uploads
SARIF so findings appear inline in the PR (GitHub code scanning), and posts the
suggested fixes (parameterized queries / output escaping) as a PR comment. Pair
it with --baseline to gate only on newly introduced issues.
A web console (React + Vite + TypeScript) visualizes the SARIF output: severity
breakdown, findings by rule, and the full source → sink taint flow for each
finding. Drop a .sarif file to load your own report. See
dashboard/.
cd dashboard && npm install && npm run dev # → http://localhost:4321- Scope and positioning
- Engine choice (Tai-e)
- Gap mapping vs. competitors
- Project scaffold (Maven multi-module, CLI skeleton, config loader, SARIF model)
- Initial benchmark: SQL injection (direct / through-service / via-Kafka / safe), reflected XSS, path traversal, command injection
- Engine: Tai-e IFDS wired end-to-end on the benchmark
- Spring source layer: annotation → Tai-e param-source generation
- Functional CLI with SARIF output
- precision/recall on the current benchmark — 30/30 vulnerable cases detected, 0 false positives across SQL / JPQL injection (direct / through-service / four-layer / via-Kafka / reactive R2DBC), reflected / conditional / cross-request stored XSS, SSRF, SpEL, JNDI, XXE, template injection (SSTI), log injection, path traversal, command injection, open redirect; multi-framework sources (Spring MVC/WebFlux, Kafka, JAX-RS/Quarkus, Micronaut,
@Repositoryreads,@FeignClient,@Scheduled) - Phase 2 differentiators:
@KafkaListenersource, conditional sanitizers, stored / second-order injection - GitHub Action (Docker) + self-contained jar + CI workflow
- Web dashboard (React + Vite) for SARIF reports
- Hardcoded-secrets scanner (
secretscommand); mergeable--config - Robustness pass: JNDI / XXE / log / template / JPQL sinks, file-upload and
@MatrixVariablesources,Optional/CompletableFuturetaint transfers, framework-internal sink filtering - Configuration & misconfiguration audits:
config(insecureapplication.yml/.properties) andmisconfig(CSRF/clickjacking disabled, CORS*, insecure cookies, sensitive data logged) - Adoption: per-finding confidence score (console + SARIF),
scan --diff <ref>for fast pull-request scans, inline// spring-taint: suppresscomments (--src/suppressions), andvalidate-configto catch typo'd custom rules - Advanced sources:
@FeignClientresults (cross-service),@Scheduledjobs as entry points, and@Transactionalwrite-then-read stored injection - Near-miss sanitizers (
--src): flags insufficient (quote-stripping), blacklist, discarded-result, and wrong-context sanitization — the "I'm sure this is safe" class of bug - Autofix (
--suggest-fixes/--fix): rewrites a concatenated SQL query into a parameterized one; verified end-to-end (applying the fixes drops the benchmark's SQL findings 15 → 1 and the patched code compiles) - Unit-test suite for the scanners (18 tests, with regression coverage); the bytecode scanners (
secrets/misconfig) read any-JDK class files (ASM 9.7) - Autofix covers XSS (wrap in
HtmlUtils.htmlEscape) as well as SQL; baseline mode (--baseline) to adopt on a legacy codebase and gate CI on new findings only
Static analysis has inherent limits. For this project:
- Java reflection (
Class.forName(),Method.invoke()) can break the flow - Spring dynamic proxies (AOP / CGLib) introduce indirection that may break the call graph
- Entity / DTO field tracking — sources are
String-only (a@Repository/@FeignClientreturning a DTO whose getter is later read is not followed), a deliberate precision-over-recall choice - Complex lambdas / method references — partial coverage via Tai-e
- The taint analysis process runs on JDK 17 — it analyzes applications built with newer JDKs fine (verified on Java 21 bytecode); the JDK 17 requirement is the tool's own runtime, not a limit on the scanned code
Each release documents its limitations explicitly, alongside the test cases that exercise them.
Contributions are welcome — see CONTRIBUTING.md for the dev setup, how to add a benchmark case, and the PR checklist. Please also read the Code of Conduct. To report a security issue, see SECURITY.md.
Built on Tai-e (Nanjing University), which provides the call-graph construction, pointer analysis, and IFDS taint propagation. Tai-e is licensed under LGPL-3.0; this project depends on it as a library.
MIT © Gabriel Baldez.
