Guardian Shell — Comprehensive Code Analysis

Date: 2026-03-23 Scope: Security, performance, code quality, dashboard UX Codebase: Phase 10 (Hardened Cgroup Agents)

Security Vulnerabilities
Performance Issues
Code Quality
Dashboard UX & Usability
Missing Features
Recommendations

1. Security Vulnerabilities

CRITICAL

1.1 PENDING_DENY Map Race Condition (Multi-Core Bypass)

Files: guardian-ebpf/src/main.rs:413-421, 533-536

The tracepoint->PENDING_DENY->LSM pattern has a race on multi-core systems:

CPU 1: Tracepoint inserts pid_tgid into PENDING_DENY
CPU 2: Same process, different openat call, also inserts (overwrites)
CPU 1: LSM hook fires, consumes (deletes) the entry
CPU 2: LSM hook fires, entry is gone — access ALLOWED despite policy deny

BPF HashMap insert is not atomic with respect to the tracepoint→LSM sequence. High-frequency file access on multi-core systems can slip through.

Impact: Enforcement bypass under load. Single-threaded workloads unaffected.

1.2 PENDING Map Overflow = Silent Enforcement Bypass

Files: guardian-ebpf/src/main.rs:64, 138, 155, 180-190

All PENDING maps (PENDING_DENY, PENDING_EXEC_DENY, PENDING_NET_DENY, etc.) are sized at max_entries=4096. Insertion failure is silently dropped:

let _ = PENDING_DENY.insert(&pid_tgid, &1u8, 0);  // error ignored

An attacker spawning 4096+ concurrent processes can fill the map. Process 4097's tracepoint inserts nothing, LSM finds nothing, access is allowed.

Impact: A local attacker with fork access can bypass enforcement.

1.3 Grant Accumulation Limit Not Enforced

File: guardian/src/ipc.rs:1147-1162

Grant accumulation limits are checked AFTER the grant is already sent to the agent via oneshot channel. The code warns but doesn't block:

if accumulated > max_total {
    warn!("Grant accumulation exceeded...");
    // Note: We still allow it but log the warning. The grant was already
    // communicated to the agent via oneshot. Future enhancement: block before
    // sending the decision.
}

Impact: Agents can stack unlimited grants by requesting different resources.

1.4 IPv4 Address Byte Order Bug

Files: guardian-ebpf/src/main.rs:918, guardian/src/main.rs:1498

IPv4 addresses are read with from_ne_bytes() (native endian) but sockaddr_in stores them in network byte order (big-endian). On x86_64 (little-endian), addresses are reversed: 192.168.1.1 becomes 1.1.168.192 in logs and policy evaluation.

Impact: Network policies may fail to match addresses. Logs show wrong IPs.

HIGH

1.5 TOCTOU in Path-Based Policy Evaluation

File: guardian-ebpf/src/main.rs:413-421

eBPF evaluates policy at syscall ENTRY (tracepoint) but the file operation completes at syscall EXIT. Between tracepoint and kernel file open, the path can be swapped (symlink race):

openat("/tmp/safe") — tracepoint allows
Attacker replaces /tmp/safe with symlink to /etc/shadow
Kernel follows symlink, opens /etc/shadow
LSM finds no PENDING_DENY entry — access allowed

Mitigation: Landlock resolves this for cgroup agents (inode-level). Comm-based agents remain vulnerable.

1.6 Landlock /etc Overly Permissive

File: guardian-launch/src/main.rs:520

Landlock grants read access to all of /etc. If eBPF fails to load or the LSM hook doesn't fire, /etc/shadow is readable via Landlock alone. The eBPF deny rules are the only protection for sensitive /etc/* files.

Impact: Defense-in-depth weakened. Landlock should be the primary enforcement for cgroup agents, but it's permissive for /etc.

1.7 Dynamic Linker Detection TOCTOU

File: guardian-ebpf/src/main.rs:593-619

When a known dynamic linker is detected, eBPF reads argv[1] to find the real binary. Between checking DYNAMIC_LINKERS and reading argv[1], a concurrent thread can modify argv[1] in memory.

1.8 Privilege Drop Failure Non-Fatal

File: guardian-launch/src/main.rs:155-158

If drop_privileges() fails, the agent continues running as root:

Err(e) => {
    warn!("Failed to drop privileges: {} (continuing as root)", e);
    false
}

Impact: Agent runs as root with Landlock, defeating principle of least privilege.

1.9 Landlock Default-Allow Silently Skipped

File: guardian-launch/src/main.rs:472-479

If agent config has file_access.default = "allow", Landlock returns Ok(()) silently with only a WARN log. Agent has no inode-level enforcement.

1.10 Socket File Creation TOCTOU

File: guardian/src/ipc.rs:164-182

Socket is removed, then rebound, then chmod'd — three non-atomic steps. An attacker with write access to /run/ could race the socket creation. Partially mitigated by peer credential check (requires UID 0).

MEDIUM

1.11 Network Policy: Port-Only, No Address Filtering

File: guardian-ebpf/src/main.rs:320-345

Network enforcement checks only destination port, not IP address. Cannot block specific destinations while allowing others on the same port.

1.12 UDP Not Enforced

CLAUDE.md Known Limitation #18

sendto() without prior connect() bypasses both eBPF and Landlock network enforcement. DNS exfiltration via UDP is unmonitored.

1.13 Perf Buffer Overflow = Silent Event Loss

File: guardian/src/main.rs:1206-1209

Lost events are only logged and counted. In enforce mode, an attacker can flood events to fill perf buffers, causing legitimate denied accesses to go unlogged.

1.14 BPF Grant Removal Silent Failure

File: guardian/src/ipc.rs:1400-1405

Failed BPF map removals for expired grants are logged at DEBUG level. Leaked entries accumulate, potentially exhausting map capacity.

1.15 No CSRF Protection on Dashboard

All template files

No CSRF tokens on any state-changing forms (POST/PUT/DELETE). The Bearer auth token provides weak CSRF protection but proper tokens should be used.

LOW

1.16 IPC Socket Read Has No Timeout

File: guardian-common/src/lib.rs:388-406

recv_message() calls read_exact() with no timeout. A malicious client can send the length prefix but never the body, blocking the handler forever.

1.17 Event Identity Method Not Tracked

File: guardian/src/main.rs:1330-1340

find_agent_for_event() doesn't indicate which identification method (cgroup, TGID, comm) actually matched. Audit trail lacks this information.

2. Performance Issues

HIGH

2.1 Agent Lookup O(N) Per Event

File: guardian/src/main.rs:1368

find_agent_for_event() iterates all agents linearly for every event. With 50 agents at 1000 events/sec = 50K comparisons/sec. Should use a HashMap cache.

2.2 IPC Mutex Held During BPF Map Operations

File: guardian/src/ipc.rs:491-545

handle_register() holds the IPC state mutex while performing BPF map inserts (syscalls). Each grant takes 2-5ms. With 10 concurrent permission requests, IPC latency adds up to 20-50ms.

2.3 SQLite Blocking in Async Context

File: guardian/src/dashboard/db.rs:135-156

rusqlite is synchronous. Each event insert acquires a std::sync::Mutex and performs blocking I/O inside a tokio task. At >1000 events/sec, this causes tokio executor starvation.

MEDIUM

2.4 No SQLite Transaction Batching

File: guardian/src/dashboard/db.rs:135-156

Each event = 1 INSERT statement. Batching 100 events per transaction would be 10x faster (fewer fsync calls).

2.5 Alert Channel Saturation

File: guardian/src/main.rs:487

Alert mpsc channel capacity is 4096. If webhook/email dispatch takes >4 seconds under sustained load, events drop silently.

2.6 Unbounded Memory Growth in Rate Limiter

File: guardian/src/permissions.rs:25-37

AgentRateLimit.recently_denied_resources is a HashMap that grows unbounded. Cleanup happens only on queries, not on a TTL basis. Long-running daemons accumulate memory.

2.7 Grant Accumulator Has No Cleanup

File: guardian/src/permissions.rs:196-203

GrantAccumulator.grants stores grant history per (agent, resource) with no expiry mechanism. Memory grows indefinitely.

2.8 Broadcast Channel Lag Under Event Storms

File: guardian/src/main.rs:430

Broadcast channel capacity 1024. SSE clients and DB writer lag under >10K events/sec. Lagged subscribers skip events.

LOW

2.9 Config Reload Blocks IPC

File: guardian/src/main.rs:659-681

SIGHUP reload acquires IPC mutex during TOML parsing. Fast (<10ms) but blocks concurrent IPC requests.

2.10 MAX_FILENAME_LEN = 256 Bytes

File: guardian-common/src/lib.rs:7

Paths >256 bytes are truncated and denied. Node.js and Java paths often exceed this limit, causing false denials.

3. Code Quality

Strengths

Excellent inline comments explaining eBPF concepts, Rust patterns, and security rationale throughout the codebase
Comprehensive input validation in IPC handler (null bytes, path traversal, size limits) — ipc.rs:277-349
Correct privilege dropping with verification (paranoia check) — guardian-launch/src/main.rs:372-381
Graceful LSM fallback — optional features don't crash the daemon
Defense-in-depth design — 6 independent security layers
Good error handling with anyhow::Context for actionable error messages
Compile-time template checking via askama (catches typos at build time)
Single binary deployment with rust-embed for static files

Issues

3.1 api.rs Is Too Large (~1100 lines)

File: guardian/src/dashboard/routes/api.rs

Contains agent CRUD, policy updates, alert config, permission approval, config serialization, and metrics. Should be split into sub-modules.

3.2 Silent Error Swallowing

Multiple locations where errors are dropped with let _:

guardian-ebpf/src/main.rs:421: let _ = PENDING_DENY.insert(...) — critical
guardian/src/ipc.rs:1400: debug!("Failed to remove...") — should be warn
guardian/src/main.rs:831: let _ = deny_exact.insert(...) — policy rule lost

3.3 No Integration Tests

25+ unit tests for config, permissions, main
Zero integration tests for IPC server, dashboard API, eBPF enforcement
No load/stress tests
No eBPF bytecode validation tests

3.4 Unused Code

app.js is empty (only comments)
log_level field in GlobalConfig triggers dead_code warning (acknowledged)
Some tokio features in Cargo.toml are unused

3.5 No Graceful Shutdown Coordination

File: guardian/src/main.rs:710-716

Tasks are aborted on Ctrl+C. No drain period for in-flight IPC requests or pending permission decisions.

4. Dashboard UX & Usability

What Works Well

Feature	Grade	Notes
Navigation	A-	Intuitive sidebar with icons+labels, active page highlight
Permission banners	A	Visible on all pages, risk colors, wait timers, type-to-confirm
Events page	B+	Live/history tabs, filtering by severity/action/agent/path
Agent management	B+	Cgroup vs comm badges, Tier 1/Tier 2 indicators, stop/grant
Policy editor	B	Per-agent editing, collapsible help panel
Mobile responsive	B	Hamburger menu, grid collapses, tables scroll
Color scheme	A-	Dark theme, WCAG AA contrast, semantic badge colors
SSE live updates	B+	Single shared connection, connection status indicator

What Needs Improvement

4.1 No Central Help/Documentation Page

Users must discover help hints (? icons) scattered across pages. No centralized reference for glob patterns, risk levels, security tiers, or command syntax.

4.2 No Data Export

Cannot export events, audit trail, or policies as CSV/JSON. Required for SIEM integration and compliance audits.

4.3 No Glob Pattern Validation

Users can enter invalid patterns in policy editor. Invalid patterns fail silently at enforcement time with no UI feedback.

4.4 No Alert Test Button

Cannot validate webhook URLs, Slack tokens, or SMTP credentials before saving. Users discover failures only when a real alert fires.

4.5 No "Restart Required" Warning

CLAUDE.md notes alerting config changes require daemon restart, but the UI doesn't warn users after saving alert configuration.

4.6 Permission Audit Trail Not Surfaced

SQLite stores full permission audit history, but no UI exists to view it. The /api/permissions/audit API endpoint exists but has no frontend.

4.7 No Agent Resource Monitoring

No CPU/memory/PID usage graphs for cgroup agents. Admins can't see which agent is consuming the most resources.

4.8 No Keyboard Shortcuts

No ? for help, j/k for navigation, or g <page> for quick jumps. Power users expect these.

4.9 Events Page Limitations

Max 500 live events (older silently dropped)
No time-range picker for history
No CSV/JSON export
Long paths truncated with ellipsis (hover shows full, but not copy-friendly)

4.10 Mobile Form Sizing

Policy editor textareas (16 rows) dominate small screens. Grant popover uses absolute positioning that breaks on mobile.

4.11 Colorblind Accessibility

Risk-colored banners (green/amber/red) lack pattern or icon differentiation. ~8% of males have color vision deficiency.

5. Missing Features

Security

Feature	Priority	Notes
IP/domain-based network policy	HIGH	Currently port-only; can't block specific destinations
DNS monitoring	HIGH	DNS resolution unmonitored; domain-based policy impossible
UDP enforcement	HIGH	`sendto()` without `connect()` bypasses all enforcement
Content hashing	MEDIUM	No integrity verification of accessed files
io_uring file operations	MEDIUM	Seccomp blocks io_uring setup, but if bypassed, all ops invisible
mmap_file LSM hook	LOW	Memory-mapped file access not monitored
Anomaly ML detection	LOW	Current rule-based anomaly detection is basic

Operations

Feature	Priority	Notes
Event export (CSV/JSON)	HIGH	Required for SIEM/compliance
Permission audit UI	HIGH	Backend exists, no frontend
Alert test button	MEDIUM	Validate webhook/Slack/email before saving
Policy diff/rollback	MEDIUM	No change history for policy edits
Agent resource graphs	MEDIUM	CPU/memory/PID usage per cgroup
Bulk operations	LOW	Apply same policy to multiple agents
Keyboard shortcuts	LOW	Power user productivity

6. Recommendations

Immediate (P0 — Fix Now)

Fix IPv4 byte order: Change from_ne_bytes() to from_be_bytes() in guardian-ebpf/src/main.rs:918. One-line fix, critical impact.
Don't ignore PENDING map insert errors: In eBPF, if insert fails, set a flag that the LSM hook checks to deny by default (fail-closed for map full).
Enforce grant accumulation before approval: Move the limit check before the oneshot send in ipc.rs.

Short-Term (P1 — This Sprint)

Cache agent lookups: HashMap<comm, AgentConfig> rebuilt on config reload. Eliminates O(N) per-event lookup.
Batch SQLite inserts: Buffer 100 events, wrap in BEGIN/COMMIT transaction.
Add CSRF tokens: Synchronizer token pattern for all dashboard forms.
Make privilege drop mandatory: Fail if cannot drop to non-root, unless explicit --allow-root flag.
Refine Landlock /etc paths: Move from blanket /etc to specific files, or accept the tradeoff and document it.

Medium-Term (P2 — Next Release)

Add permission audit UI: Surface the SQLite audit trail in dashboard.
Add event export: CSV/JSON download buttons on events page.
Add alert test buttons: Validate webhook/email before saving.
Implement IPC socket timeouts: 5-second read timeout in recv_message().
Add integration tests: IPC server, dashboard API, enforcement flow.
Split api.rs: Break into agents.rs, policies.rs, alerts.rs, permissions.rs.

Long-Term (P3 — Roadmap)

UDP enforcement: Hook sendto() and sendmsg() syscalls.
Domain-based network policy: DNS interception or transparent proxy.
Async SQLite: Migrate from rusqlite to sqlx for non-blocking writes.
Dynamic BPF map resizing: Scale policy maps based on agent count.
Per-agent resource dashboards: cgroup memory/CPU/PID graphs.

Scalability Limits

Dimension	Current Limit	Bottleneck
Concurrent agents	30-50	Agent lookup O(N) per event
Policy rules per map	1024	BPF map max_entries
PENDING map entries	4096	Enforcement bypass when full
Events/sec (enforce)	500	Alert dispatch throughput
Events/sec (monitor)	10K+	Perf buffer capacity
Filename length	256 bytes	BPF stack limit
Dashboard SSE clients	~200	Tokio task overhead
SQLite write throughput	~100/sec	Sync mutex + no batching

Security Layer Effectiveness

Layer	Bypass Difficulty	Known Gaps
Landlock	Very Hard (kernel inode)	`/etc` overly permissive; no deny rules
Seccomp	Very Hard (irremovable)	x86_64 only; new syscalls need updates
eBPF LSM	Hard (kernel hooks)	TOCTOU, map overflow, multi-core race
Cgroup	Very Hard (kernel)	Requires root to escape (blocked by NNP)
PR_SET_NO_NEW_PRIVS	Very Hard (kernel)	None known
Privilege dropping	N/A (preventive)	Non-fatal failure path
Rate limiting	Easy (userspace)	Only affects permission requests
Risk classification	N/A (advisory)	Hardcoded patterns

Conclusion

Guardian Shell has a strong security architecture with 6 independent enforcement layers. The eBPF+Landlock+seccomp+cgroup combination provides kernel-level enforcement that is fundamentally more secure than application-layer sandboxes (as demonstrated by the Snowflake Cortex incident).

The most critical issues are:

PENDING map overflow — allows enforcement bypass under load
Multi-core race condition — allows enforcement bypass on SMP systems
IPv4 byte order bug — network policies don't work correctly
Grant accumulation not enforced — acknowledged in code as "future work"

Performance is adequate for typical deployments (10-20 agents, <500 events/sec in enforce mode). The main bottlenecks are agent lookup O(N) and synchronous SQLite in async context.

The dashboard is usable and well-designed for its target audience (security admins). Key gaps are missing data export, permission audit UI, and CSRF protection. The permission request flow is particularly well-executed with risk-based friction and mandatory wait timers.

FilesExpand file tree

comprehensive-code-analysis.md

Latest commit

History

comprehensive-code-analysis.md

File metadata and controls

Guardian Shell — Comprehensive Code Analysis

Table of Contents

1. Security Vulnerabilities

CRITICAL

1.1 PENDING_DENY Map Race Condition (Multi-Core Bypass)

1.2 PENDING Map Overflow = Silent Enforcement Bypass

1.3 Grant Accumulation Limit Not Enforced

1.4 IPv4 Address Byte Order Bug

HIGH

1.5 TOCTOU in Path-Based Policy Evaluation

1.6 Landlock /etc Overly Permissive

1.7 Dynamic Linker Detection TOCTOU

1.8 Privilege Drop Failure Non-Fatal

1.9 Landlock Default-Allow Silently Skipped

1.10 Socket File Creation TOCTOU

MEDIUM

1.11 Network Policy: Port-Only, No Address Filtering

1.12 UDP Not Enforced

1.13 Perf Buffer Overflow = Silent Event Loss

1.14 BPF Grant Removal Silent Failure

1.15 No CSRF Protection on Dashboard

LOW

1.16 IPC Socket Read Has No Timeout

1.17 Event Identity Method Not Tracked

2. Performance Issues

HIGH

2.1 Agent Lookup O(N) Per Event

2.2 IPC Mutex Held During BPF Map Operations

2.3 SQLite Blocking in Async Context

MEDIUM

2.4 No SQLite Transaction Batching

2.5 Alert Channel Saturation

2.6 Unbounded Memory Growth in Rate Limiter

2.7 Grant Accumulator Has No Cleanup

2.8 Broadcast Channel Lag Under Event Storms

LOW

2.9 Config Reload Blocks IPC

2.10 MAX_FILENAME_LEN = 256 Bytes

3. Code Quality

Strengths

Issues

3.1 api.rs Is Too Large (~1100 lines)

3.2 Silent Error Swallowing

3.3 No Integration Tests

3.4 Unused Code

3.5 No Graceful Shutdown Coordination

4. Dashboard UX & Usability

What Works Well

What Needs Improvement

4.1 No Central Help/Documentation Page

4.2 No Data Export

4.3 No Glob Pattern Validation

4.4 No Alert Test Button

4.5 No "Restart Required" Warning

4.6 Permission Audit Trail Not Surfaced

4.7 No Agent Resource Monitoring

4.8 No Keyboard Shortcuts

4.9 Events Page Limitations

4.10 Mobile Form Sizing

4.11 Colorblind Accessibility

5. Missing Features

Security

Operations

6. Recommendations

Immediate (P0 — Fix Now)

Short-Term (P1 — This Sprint)

Medium-Term (P2 — Next Release)

Long-Term (P3 — Roadmap)

Scalability Limits

Security Layer Effectiveness

Conclusion