Date: 2026-03-23
Scope: Security, performance, code quality, dashboard UX
Codebase: Phase 10 (Hardened Cgroup Agents)
- Security Vulnerabilities
- Performance Issues
- Code Quality
- Dashboard UX & Usability
- Missing Features
- Recommendations
Files: guardian-ebpf/src/main.rs:413-421, 533-536
The tracepoint->PENDING_DENY->LSM pattern has a race on multi-core systems:
- CPU 1: Tracepoint inserts pid_tgid into PENDING_DENY
- CPU 2: Same process, different openat call, also inserts (overwrites)
- CPU 1: LSM hook fires, consumes (deletes) the entry
- CPU 2: LSM hook fires, entry is gone, so access is ALLOWED despite policy deny
BPF HashMap insert is not atomic with respect to the tracepoint→LSM sequence. High-frequency file access on multi-core systems can slip through.
Impact: Enforcement bypass under load. Single-threaded workloads unaffected.
Files: guardian-ebpf/src/main.rs:64, 138, 155, 180-190
All PENDING maps (PENDING_DENY, PENDING_EXEC_DENY, PENDING_NET_DENY, etc.) are
sized at max_entries=4096. Insertion failure is silently dropped:

    let _ = PENDING_DENY.insert(&pid_tgid, &1u8, 0); // error ignored

An attacker spawning 4096+ concurrent processes can fill the map. Process 4097's tracepoint inserts nothing, LSM finds nothing, access is allowed.
Impact: A local attacker with fork access can bypass enforcement.
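The fail-closed behaviour this finding calls for can be sketched in plain userspace Rust. This is only a model under stated assumptions: `BoundedMap`, `MapFull`, and `Enforcer` are hypothetical names standing in for the BPF map and the two hooks, and a real eBPF program would have to keep the overflow flag in a BPF map or global rather than a struct field.

```rust
use std::collections::HashMap;

/// Error returned when the map has no free slot (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub struct MapFull;

/// Userspace stand-in for a fixed-size BPF hash map (hypothetical type).
pub struct BoundedMap {
    entries: HashMap<u64, u8>,
    max_entries: usize,
}

impl BoundedMap {
    pub fn new(max_entries: usize) -> Self {
        Self { entries: HashMap::new(), max_entries }
    }

    /// Rejects new keys once the map is full; existing keys can be overwritten.
    pub fn insert(&mut self, key: u64, value: u8) -> Result<(), MapFull> {
        let is_new = !self.entries.contains_key(&key);
        if is_new && self.entries.len() >= self.max_entries {
            return Err(MapFull);
        }
        self.entries.insert(key, value);
        Ok(())
    }

    pub fn remove(&mut self, key: u64) -> Option<u8> {
        self.entries.remove(&key)
    }
}

/// State shared by the tracepoint and LSM sides (hypothetical type).
pub struct Enforcer {
    pending_deny: BoundedMap,
    /// Set when a deny marker could not be recorded.
    overflowed: bool,
}

impl Enforcer {
    pub fn new(max_entries: usize) -> Self {
        Self { pending_deny: BoundedMap::new(max_entries), overflowed: false }
    }

    /// Tracepoint side: record a pending deny and remember any failure
    /// instead of discarding it with `let _ =`.
    pub fn on_tracepoint_deny(&mut self, pid_tgid: u64) {
        if self.pending_deny.insert(pid_tgid, 1).is_err() {
            self.overflowed = true;
        }
    }

    /// LSM side: deny when a marker exists, or when a marker may have been lost.
    pub fn lsm_should_deny(&mut self, pid_tgid: u64) -> bool {
        let had_entry = self.pending_deny.remove(pid_tgid).is_some();
        had_entry || self.overflowed
    }
}
```

A single global flag trades availability for safety: once set, every access is denied until something clears it. A real implementation would likely need a reset path or a finer-grained marker.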
File: guardian/src/ipc.rs:1147-1162
Grant accumulation limits are checked AFTER the grant is already sent to the agent via oneshot channel. The code warns but doesn't block:

    if accumulated > max_total {
        warn!("Grant accumulation exceeded...");
        // Note: We still allow it but log the warning. The grant was already
        // communicated to the agent via oneshot. Future enhancement: block before
        // sending the decision.
    }

Impact: Agents can stack unlimited grants by requesting different resources.
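One way to order the check before the send is sketched below. It is an illustration, not the code in ipc.rs: `Decision` and `decide_and_send` are hypothetical names, a `std::sync::mpsc` channel stands in for the oneshot channel, and `accumulated` is assumed to already include the grant being requested.

```rust
use std::sync::mpsc::{SendError, Sender};

/// Decision communicated to the agent (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Decision {
    Granted,
    DeniedLimitExceeded,
}

/// Evaluate the accumulation limit first, then notify the agent.
/// The comparison mirrors the `accumulated > max_total` check quoted above.
pub fn decide_and_send(
    accumulated: u32,
    max_total: u32,
    reply: &Sender<Decision>,
) -> Result<Decision, SendError<Decision>> {
    let decision = if accumulated > max_total {
        Decision::DeniedLimitExceeded
    } else {
        Decision::Granted
    };
    // The send happens only after the limit check, so an over-limit
    // grant is never communicated as approved.
    reply.send(decision)?;
    Ok(decision)
}
```

With this ordering the warning in the original branch becomes a denial, and the "future enhancement" comment is no longer needed.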
Files: guardian-ebpf/src/main.rs:918, guardian/src/main.rs:1498
IPv4 addresses are read with from_ne_bytes() (native endian) but sockaddr_in
stores them in network byte order (big-endian). On x86_64 (little-endian),
addresses are reversed: 192.168.1.1 becomes 1.1.168.192 in logs and policy
evaluation.
Impact: Network policies may fail to match addresses. Logs show wrong IPs.
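The byte-order effect can be reproduced with the standard library alone. The sketch assumes the four address bytes are later turned into an address with `Ipv4Addr::from(u32)`; how the Guardian code does that step is not shown in this review. `from_le_bytes()` is used to show what `from_ne_bytes()` yields on a little-endian host, so the result is the same on any machine.

```rust
use std::net::Ipv4Addr;

/// Decode the four bytes of sockaddr_in.sin_addr as they sit in memory
/// (network byte order, i.e. big-endian).
pub fn decode_sin_addr(raw: [u8; 4]) -> Ipv4Addr {
    Ipv4Addr::from(u32::from_be_bytes(raw))
}

/// What a native-endian read produces on a little-endian host such as x86_64.
pub fn decode_as_little_endian(raw: [u8; 4]) -> Ipv4Addr {
    Ipv4Addr::from(u32::from_le_bytes(raw))
}
```

For the in-memory bytes `[192, 168, 1, 1]`, the first function returns 192.168.1.1 and the second returns 1.1.168.192, which matches the reversed addresses described above.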
File: guardian-ebpf/src/main.rs:413-421
eBPF evaluates policy at syscall ENTRY (tracepoint) but the file operation completes at syscall EXIT. Between tracepoint and kernel file open, the path can be swapped (symlink race):
- openat("/tmp/safe"): tracepoint allows
- Attacker replaces /tmp/safe with symlink to /etc/shadow
- Kernel follows symlink, opens /etc/shadow
- LSM finds no PENDING_DENY entry, so access is allowed
Mitigation: Landlock resolves this for cgroup agents (inode-level). Comm-based agents remain vulnerable.
File: guardian-launch/src/main.rs:520
Landlock grants read access to all of /etc. If eBPF fails to load or the LSM
hook doesn't fire, /etc/shadow is readable via Landlock alone. The eBPF deny
rules are the only protection for sensitive /etc/* files.
Impact: Defense-in-depth weakened. Landlock should be the primary enforcement
for cgroup agents, but it's permissive for /etc.
File: guardian-ebpf/src/main.rs:593-619
When a known dynamic linker is detected, eBPF reads argv[1] to find the real
binary. Between checking DYNAMIC_LINKERS and reading argv[1], a concurrent
thread can modify argv[1] in memory.
File: guardian-launch/src/main.rs:155-158
If drop_privileges() fails, the agent continues running as root:

    Err(e) => {
        warn!("Failed to drop privileges: {} (continuing as root)", e);
        false
    }

Impact: Agent runs as root with Landlock, defeating the principle of least privilege.
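A fail-closed variant of this error path might look like the sketch below. `PrivilegeState` and `require_privilege_drop` are hypothetical names, the result of drop_privileges() is modelled as `Result<(), String>`, and the `allow_root` parameter is an assumed explicit opt-in, not an existing option.

```rust
/// Outcome of the launch-time privilege check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum PrivilegeState {
    Dropped,
    RootExplicitlyAllowed,
}

/// Turn a failed privilege drop into a hard error unless the operator
/// explicitly opted in to running as root.
pub fn require_privilege_drop(
    drop_result: Result<(), String>,
    allow_root: bool,
) -> Result<PrivilegeState, String> {
    match drop_result {
        Ok(()) => Ok(PrivilegeState::Dropped),
        Err(e) if allow_root => {
            eprintln!("warning: privilege drop failed ({e}); continuing as root by request");
            Ok(PrivilegeState::RootExplicitlyAllowed)
        }
        Err(e) => Err(format!("refusing to start agent as root: {e}")),
    }
}
```

The caller would propagate the `Err` and exit before the agent is executed, so a silent fall-through to root is no longer possible.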
File: guardian-launch/src/main.rs:472-479
If agent config has file_access.default = "allow", Landlock returns Ok(())
silently with only a WARN log. Agent has no inode-level enforcement.
File: guardian/src/ipc.rs:164-182
Socket is removed, then rebound, then chmod'd — three non-atomic steps. An
attacker with write access to /run/ could race the socket creation. Partially
mitigated by peer credential check (requires UID 0).
File: guardian-ebpf/src/main.rs:320-345
Network enforcement checks only destination port, not IP address. Cannot block specific destinations while allowing others on the same port.
CLAUDE.md Known Limitation #18
sendto() without prior connect() bypasses both eBPF and Landlock network
enforcement. DNS exfiltration via UDP is unmonitored.
File: guardian/src/main.rs:1206-1209
Lost events are only logged and counted. In enforce mode, an attacker can flood events to fill perf buffers, causing legitimate denied accesses to go unlogged.
File: guardian/src/ipc.rs:1400-1405
Failed BPF map removals for expired grants are logged at DEBUG level. Leaked entries accumulate, potentially exhausting map capacity.
Files: All template files
No CSRF tokens on any state-changing forms (POST/PUT/DELETE). The Bearer auth token provides weak CSRF protection but proper tokens should be used.
File: guardian-common/src/lib.rs:388-406
recv_message() calls read_exact() with no timeout. A malicious client can
send the length prefix but never the body, blocking the handler forever.
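A bounded read could be sketched as follows, assuming a blocking `std::os::unix::net::UnixStream` and a 4-byte big-endian length prefix. Both are assumptions: the review does not show the wire format or whether the real recv_message() is async. `recv_message_with_timeout` and `MAX_MESSAGE_LEN` are hypothetical names.

```rust
use std::io::{self, Read};
use std::os::unix::net::UnixStream;
use std::time::Duration;

/// Upper bound on a message body; an assumed value for this sketch.
const MAX_MESSAGE_LEN: usize = 64 * 1024;

/// Read one length-prefixed message, giving up if the peer stalls.
pub fn recv_message_with_timeout(
    stream: &mut UnixStream,
    timeout: Duration,
) -> io::Result<Vec<u8>> {
    // Applies to every subsequent read on this socket.
    stream.set_read_timeout(Some(timeout))?;

    // Assumed framing: 4-byte big-endian length, then the body.
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf)?;
    let len = u32::from_be_bytes(len_buf) as usize;
    if len > MAX_MESSAGE_LEN {
        return Err(io::Error::new(io::ErrorKind::InvalidData, "message too large"));
    }

    // A client that never sends the body now causes an error, not a hang.
    let mut body = vec![0u8; len];
    stream.read_exact(&mut body)?;
    Ok(body)
}
```

The socket timeout applies to each individual read, so a client that trickles one byte at a time can still hold the handler longer than the nominal timeout. An overall deadline would be needed to close that gap. On Unix a timed-out read surfaces as `WouldBlock` or `TimedOut`.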
File: guardian/src/main.rs:1330-1340
find_agent_for_event() doesn't indicate which identification method (cgroup,
TGID, comm) actually matched. Audit trail lacks this information.
File: guardian/src/main.rs:1368
find_agent_for_event() iterates all agents linearly for every event. With 50
agents at 1000 events/sec = 50K comparisons/sec. Should use a HashMap cache.
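The suggested cache might be shaped like the sketch below. `AgentConfig` here is a two-field placeholder, not the real struct, and `AgentIndex` is a hypothetical name. Only the comm-based lookup is covered; cgroup and TGID matching would need their own indexes.

```rust
use std::collections::HashMap;

/// Placeholder for the real agent configuration (fields are assumptions).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct AgentConfig {
    pub name: String,
    pub comm: String,
}

/// Index from comm to agent, rebuilt whenever the config is reloaded.
pub struct AgentIndex {
    agents: Vec<AgentConfig>,
    by_comm: HashMap<String, usize>,
}

impl AgentIndex {
    /// Build the index once; if two agents share a comm, the later one wins.
    pub fn rebuild(agents: Vec<AgentConfig>) -> Self {
        let by_comm = agents
            .iter()
            .enumerate()
            .map(|(i, a)| (a.comm.clone(), i))
            .collect();
        Self { agents, by_comm }
    }

    /// Average O(1) lookup instead of a scan over every agent per event.
    pub fn find_by_comm(&self, comm: &str) -> Option<&AgentConfig> {
        self.by_comm.get(comm).map(|&i| &self.agents[i])
    }
}
```

Rebuilding on SIGHUP reload keeps the index consistent with the configuration without any per-event bookkeeping.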
File: guardian/src/ipc.rs:491-545
handle_register() holds the IPC state mutex while performing BPF map inserts
(syscalls). Each grant takes 2-5ms. With 10 concurrent permission requests,
IPC latency adds up to 20-50ms.
File: guardian/src/dashboard/db.rs:135-156
rusqlite is synchronous. Each event insert acquires a std::sync::Mutex and
performs blocking I/O inside a tokio task. At >1000 events/sec, this causes
tokio executor starvation.
File: guardian/src/dashboard/db.rs:135-156
Each event = 1 INSERT statement. Batching 100 events per transaction would be 10x faster (fewer fsync calls).
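The buffering half of this idea can be sketched without a database dependency. `EventBatcher` is a hypothetical helper; the flush callback is where a single BEGIN/COMMIT transaction wrapping all buffered inserts would go in the real writer.

```rust
/// Collects events and hands them to `flush_fn` in batches (hypothetical helper).
pub struct EventBatcher<T, F: FnMut(&[T])> {
    buf: Vec<T>,
    batch_size: usize,
    flush_fn: F,
}

impl<T, F: FnMut(&[T])> EventBatcher<T, F> {
    pub fn new(batch_size: usize, flush_fn: F) -> Self {
        Self { buf: Vec::with_capacity(batch_size), batch_size, flush_fn }
    }

    /// Buffer one event and flush once the batch is full.
    pub fn push(&mut self, event: T) {
        self.buf.push(event);
        if self.buf.len() >= self.batch_size {
            self.flush();
        }
    }

    /// Write out whatever is buffered, e.g. inside one transaction.
    pub fn flush(&mut self) {
        if self.buf.is_empty() {
            return;
        }
        (self.flush_fn)(&self.buf);
        self.buf.clear();
    }
}
```

A size-only trigger can leave events sitting in the buffer during quiet periods, so a real writer would probably also call `flush()` on a timer and at shutdown.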
File: guardian/src/main.rs:487
Alert mpsc channel capacity is 4096. If webhook/email dispatch takes >4 seconds under sustained load, events drop silently.
File: guardian/src/permissions.rs:25-37
AgentRateLimit.recently_denied_resources is a HashMap that grows unbounded.
Cleanup happens only on queries, not on a TTL basis. Long-running daemons
accumulate memory.
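TTL-based cleanup of such a map could look like the sketch below. `RecentDenials` is a hypothetical stand-in for the recently_denied_resources field, and the current time is passed in explicitly so the purge can be driven by a periodic task and tested deterministically.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Tracks recently denied resources with a time-to-live (hypothetical type).
pub struct RecentDenials {
    ttl: Duration,
    denied_at: HashMap<String, Instant>,
}

impl RecentDenials {
    pub fn new(ttl: Duration) -> Self {
        Self { ttl, denied_at: HashMap::new() }
    }

    /// Remember when a resource was last denied.
    pub fn record(&mut self, resource: &str, now: Instant) {
        self.denied_at.insert(resource.to_string(), now);
    }

    /// Drop entries older than the TTL; intended to run on a timer,
    /// independent of whether anyone queries the map.
    pub fn purge_expired(&mut self, now: Instant) {
        let ttl = self.ttl;
        self.denied_at
            .retain(|_, at| now.saturating_duration_since(*at) < ttl);
    }

    pub fn was_recently_denied(&self, resource: &str) -> bool {
        self.denied_at.contains_key(resource)
    }

    pub fn tracked_count(&self) -> usize {
        self.denied_at.len()
    }
}
```

Calling `purge_expired` from a fixed-interval task bounds the map to the resources denied within one TTL window, regardless of query traffic.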
File: guardian/src/permissions.rs:196-203
GrantAccumulator.grants stores grant history per (agent, resource) with no
expiry mechanism. Memory grows indefinitely.
File: guardian/src/main.rs:430
Broadcast channel capacity is 1024. SSE clients and the DB writer lag above 10K events/sec, and lagged subscribers skip events.
File: guardian/src/main.rs:659-681
SIGHUP reload acquires IPC mutex during TOML parsing. Fast (<10ms) but blocks concurrent IPC requests.
File: guardian-common/src/lib.rs:7
Paths >256 bytes are truncated and denied. Node.js and Java paths often exceed this limit, causing false denials.
- Excellent inline comments explaining eBPF concepts, Rust patterns, and security rationale throughout the codebase
- Comprehensive input validation in IPC handler (null bytes, path traversal, size limits): ipc.rs:277-349
- Correct privilege dropping with verification (paranoia check): guardian-launch/src/main.rs:372-381
- Graceful LSM fallback: optional features don't crash the daemon
- Defense-in-depth design — 6 independent security layers
- Good error handling with anyhow::Context for actionable error messages
- Compile-time template checking via askama (catches typos at build time)
- Single binary deployment with rust-embed for static files
File: guardian/src/dashboard/routes/api.rs
Contains agent CRUD, policy updates, alert config, permission approval, config serialization, and metrics. Should be split into sub-modules.
Multiple locations where errors are dropped with let _ or logged at too low a level:
- guardian-ebpf/src/main.rs:421: let _ = PENDING_DENY.insert(...) (critical)
- guardian/src/ipc.rs:1400: debug!("Failed to remove...") (should be warn)
- guardian/src/main.rs:831: let _ = deny_exact.insert(...) (policy rule lost)
- 25+ unit tests for config, permissions, main
- Zero integration tests for IPC server, dashboard API, eBPF enforcement
- No load/stress tests
- No eBPF bytecode validation tests
- app.js is empty (only comments)
- log_level field in GlobalConfig triggers dead_code warning (acknowledged)
- Some tokio features in Cargo.toml are unused
File: guardian/src/main.rs:710-716
Tasks are aborted on Ctrl+C. No drain period for in-flight IPC requests or pending permission decisions.
| Feature | Grade | Notes |
|---|---|---|
| Navigation | A- | Intuitive sidebar with icons+labels, active page highlight |
| Permission banners | A | Visible on all pages, risk colors, wait timers, type-to-confirm |
| Events page | B+ | Live/history tabs, filtering by severity/action/agent/path |
| Agent management | B+ | Cgroup vs comm badges, Tier 1/Tier 2 indicators, stop/grant |
| Policy editor | B | Per-agent editing, collapsible help panel |
| Mobile responsive | B | Hamburger menu, grid collapses, tables scroll |
| Color scheme | A- | Dark theme, WCAG AA contrast, semantic badge colors |
| SSE live updates | B+ | Single shared connection, connection status indicator |
Users must discover help hints (? icons) scattered across pages. No centralized
reference for glob patterns, risk levels, security tiers, or command syntax.
Cannot export events, audit trail, or policies as CSV/JSON. Required for SIEM integration and compliance audits.
Users can enter invalid patterns in policy editor. Invalid patterns fail silently at enforcement time with no UI feedback.
Cannot validate webhook URLs, Slack tokens, or SMTP credentials before saving. Users discover failures only when a real alert fires.
CLAUDE.md notes alerting config changes require daemon restart, but the UI doesn't warn users after saving alert configuration.
SQLite stores full permission audit history, but no UI exists to view it. The
/api/permissions/audit API endpoint exists but has no frontend.
No CPU/memory/PID usage graphs for cgroup agents. Admins can't see which agent is consuming the most resources.
No ? for help, j/k for navigation, or g <page> for quick jumps. Power
users expect these.
- Max 500 live events (older silently dropped)
- No time-range picker for history
- No CSV/JSON export
- Long paths truncated with ellipsis (hover shows full, but not copy-friendly)
Policy editor textareas (16 rows) dominate small screens. Grant popover uses absolute positioning that breaks on mobile.
Risk-colored banners (green/amber/red) lack pattern or icon differentiation. ~8% of males have color vision deficiency.
| Feature | Priority | Notes |
|---|---|---|
| IP/domain-based network policy | HIGH | Currently port-only; can't block specific destinations |
| DNS monitoring | HIGH | DNS resolution unmonitored; domain-based policy impossible |
| UDP enforcement | HIGH | sendto() without connect() bypasses all enforcement |
| Content hashing | MEDIUM | No integrity verification of accessed files |
| io_uring file operations | MEDIUM | Seccomp blocks io_uring setup, but if bypassed, all ops invisible |
| mmap_file LSM hook | LOW | Memory-mapped file access not monitored |
| Anomaly ML detection | LOW | Current rule-based anomaly detection is basic |
| Feature | Priority | Notes |
|---|---|---|
| Event export (CSV/JSON) | HIGH | Required for SIEM/compliance |
| Permission audit UI | HIGH | Backend exists, no frontend |
| Alert test button | MEDIUM | Validate webhook/Slack/email before saving |
| Policy diff/rollback | MEDIUM | No change history for policy edits |
| Agent resource graphs | MEDIUM | CPU/memory/PID usage per cgroup |
| Bulk operations | LOW | Apply same policy to multiple agents |
| Keyboard shortcuts | LOW | Power user productivity |
- Fix IPv4 byte order: Change from_ne_bytes() to from_be_bytes() in guardian-ebpf/src/main.rs:918. One-line fix, critical impact.
- Don't ignore PENDING map insert errors: In eBPF, if insert fails, set a flag that the LSM hook checks to deny by default (fail-closed for map full).
- Enforce grant accumulation before approval: Move the limit check before the oneshot send in ipc.rs.
- Cache agent lookups: HashMap<comm, AgentConfig> rebuilt on config reload. Eliminates O(N) per-event lookup.
- Batch SQLite inserts: Buffer 100 events, wrap in BEGIN/COMMIT transaction.
- Add CSRF tokens: Synchronizer token pattern for all dashboard forms.
- Make privilege drop mandatory: Fail if cannot drop to non-root, unless explicit --allow-root flag.
- Refine Landlock /etc paths: Move from blanket /etc to specific files, or accept the tradeoff and document it.
- Add permission audit UI: Surface the SQLite audit trail in dashboard.
- Add event export: CSV/JSON download buttons on events page.
- Add alert test buttons: Validate webhook/email before saving.
- Implement IPC socket timeouts: 5-second read timeout in recv_message().
- Add integration tests: IPC server, dashboard API, enforcement flow.
- Split api.rs: Break into agents.rs, policies.rs, alerts.rs, permissions.rs.
- UDP enforcement: Hook sendto() and sendmsg() syscalls.
- Domain-based network policy: DNS interception or transparent proxy.
- Async SQLite: Migrate from rusqlite to sqlx for non-blocking writes.
- Dynamic BPF map resizing: Scale policy maps based on agent count.
- Per-agent resource dashboards: cgroup memory/CPU/PID graphs.
| Dimension | Current Limit | Bottleneck |
|---|---|---|
| Concurrent agents | 30-50 | Agent lookup O(N) per event |
| Policy rules per map | 1024 | BPF map max_entries |
| PENDING map entries | 4096 | Enforcement bypass when full |
| Events/sec (enforce) | 500 | Alert dispatch throughput |
| Events/sec (monitor) | 10K+ | Perf buffer capacity |
| Filename length | 256 bytes | BPF stack limit |
| Dashboard SSE clients | ~200 | Tokio task overhead |
| SQLite write throughput | ~100/sec | Sync mutex + no batching |
| Layer | Bypass Difficulty | Known Gaps |
|---|---|---|
| Landlock | Very Hard (kernel inode) | /etc overly permissive; no deny rules |
| Seccomp | Very Hard (irremovable) | x86_64 only; new syscalls need updates |
| eBPF LSM | Hard (kernel hooks) | TOCTOU, map overflow, multi-core race |
| Cgroup | Very Hard (kernel) | Requires root to escape (blocked by NNP) |
| PR_SET_NO_NEW_PRIVS | Very Hard (kernel) | None known |
| Privilege dropping | N/A (preventive) | Non-fatal failure path |
| Rate limiting | Easy (userspace) | Only affects permission requests |
| Risk classification | N/A (advisory) | Hardcoded patterns |
Guardian Shell has a strong security architecture with 6 independent enforcement layers. The eBPF+Landlock+seccomp+cgroup combination provides kernel-level enforcement that is fundamentally more secure than application-layer sandboxes (as demonstrated by the Snowflake Cortex incident).
The most critical issues are:
- PENDING map overflow — allows enforcement bypass under load
- Multi-core race condition — allows enforcement bypass on SMP systems
- IPv4 byte order bug — network policies don't work correctly
- Grant accumulation not enforced — acknowledged in code as "future work"
Performance is adequate for typical deployments (10-20 agents, <500 events/sec in enforce mode). The main bottlenecks are agent lookup O(N) and synchronous SQLite in async context.
The dashboard is usable and well-designed for its target audience (security admins). Key gaps are missing data export, permission audit UI, and CSRF protection. The permission request flow is particularly well-executed with risk-based friction and mandatory wait timers.