Python-driven iptables/ipset firewalling sandbox by struanb · Pull Request #35 · newsnowlabs/dockside

struanb · 2026-03-20T23:38:40Z

Summary

Replace the shell script firewall daemon (dockside-network-firewall.sh) with a new Python 3.6+ daemon that provides the same functionality with improved atomicity, configurability, and maintainability.

Key Changes

New Python daemon (dockside-network-firewall.py): Single-file implementation that replaces the bash script with:
- Configuration loaded from two JSON files (network-config.json and firewall-config.json) instead of hardcoded shell functions
- Atomic iptables rule application via iptables-restore --noflush with no open-firewall window
- Background ipset DNS refresh loop with stale-IP TTL management
- Optional Unix domain socket for dynamic runtime configuration updates
- systemd Type=notify service integration with SIGUSR1 config reload support
- Zero-disruption restart capability (rules left in place on shutdown)
- Explicit --teardown mode for complete cleanup when needed
Configuration files:
- network-config.json: Docker network definitions (subnet, gateway IP/MAC)
- firewall-config.json: Per-network firewall policies (egress rules, NAT rules, ipsets)
- Both files are admin-editable and support dynamic reload via socket or SIGUSR1
Systemd service unit update: Changed from Type=oneshot to Type=notify to support the new daemon model with proper readiness signaling
Planning document (PLAN-firewall-daemon-python.md): Comprehensive design specification covering architecture, chain structure, atomic apply strategy, socket protocol, and operational modes

Implementation Details

Atomic safety: All iptables state (filter + nat tables) is rebuilt in memory and applied in a single iptables-restore --noflush call, eliminating the open-firewall window present in the bash implementation
Stale IP handling: ipsets maintain a separate --seen set with TTL to gracefully handle DNS resolution changes and transient failures
Thread-safe: Background refresh loop and socket handlers use a single lock to serialize all iptables/ipset mutations
Backward compatible: Supports all existing network and firewall configurations; can run in one-shot mode (no daemon) for compatibility
Extensible: Socket protocol allows external tools to trigger reloads, apply new configs, or query daemon status

The daemon maintains iptables rules across restarts, enabling zero-disruption systemctl restart operations while still providing explicit cleanup via --teardown when needed.

https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

- Type=notify so ExecStartPost (docker compose up) only fires after setup() completes and networks/iptables chains are fully built - ExecStart now runs --daemon (setup + ipset refresh loop) - ExecStartPost runs docker compose up -d once readiness is signalled - ExecReload uses --refresh (lightweight DNS refresh, no chain teardown) - Add Restart=on-failure/RestartSec=5, TimeoutStartSec=300, journal logging - Remove RemainAfterExit (not needed for long-running Type=notify service) - Add systemd-notify --ready call in --daemon case after setup() completes https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

Drafts a plan to replace dockside-network-firewall.sh with a Python 3.6+ daemon that uses iptables-restore for atomic chain updates, a DOCKSIDE-DISPATCH indirection layer to eliminate the open-firewall window on refresh, two-layer FORWARD DROP safety, optional Unix socket for dynamic config, and admin-editable JSON config files as a static alternative to the management socket. https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

- Two separate config files confirmed: iptables rules and Docker networks are independent entities that may exist without each other - Socket path made fully configurable (--socket PATH / env var); no hardcoded default; path choice depends on systemd vs docker compose launch context - Remove config-file watching (inotify and mtime polling): config changes take effect on daemon restart or via socket reload action; immediate file-watching is too risky a pattern - Confirm atomic nat+filter replacement via iptables-restore; document the primary NAT use-case (redirecting container traffic to sandboxed clones) - Change shutdown model: SIGTERM leaves iptables/ipset state in place so systemctl restart is zero-disruption; add explicit --teardown mode for permanent removal; ExecStop no longer calls teardown - ExecReload now sends SIGUSR1 to daemon rather than spawning a child process https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

Implements the design documented in PLAN-firewall-daemon-python.md: - dockside-network-firewall.py: single-file Python 3.6+ daemon (stdlib only) - Atomic iptables apply via iptables-restore --noflush covering both filter and nat tables in one kernel transaction; no open-firewall window - DOCKSIDE-DISPATCH chain in DOCKER-USER; per-network ING/OUT chains for ingress control and egress policy enforcement - Gateway exemption rules prevent safety-net DROPs from blocking the gateway's own new outbound connections - Leaves all iptables/ipset state in place on SIGTERM; next startup applies config atomically over inherited state (zero-disruption systemctl restart) - SIGUSR1 triggers config reload + ipset refresh (maps to ExecReload=) - Optional Unix domain socket (--socket PATH) for reload/apply/refresh/status - --teardown mode for explicit full cleanup - IpsetManager: stale-TTL logic matching the bash implementation; uses per-set --seen companion (hash:ip,timeout) to track last-seen timestamps - install/etc/dockside/network-config.json: Docker network definitions (dockside, ds-prod, ds-clone, ds-priv, ds-claude) matching current setup() - install/etc/dockside/firewall-config.json: per-network egress policies and NAT rules matching current setup(); ds-clone reroute_mysql expressed as a nat + host-resolved egress allow; ds-claude restricted to claude-allowlist ipset - dockside.service: ExecStart updated to .py --daemon; ExecReload changed from spawning a child process to kill -USR1 $MAINPID https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

Add a new `manageUsers` permission (on by default for admin role, deniable via 0) gating all user/role CRUD operations. Perl API (App.pm + new User/Manage.pm): - GET /users, GET /users/<name>: list/get users; ?sensitive=1 includes gh_token and private SSH keypairs - POST /users/create, /users/<name>/update, /users/<name>/remove: full user CRUD; writes users.json and passwd atomically via cacheReadWrite, then calls Data::load() to hot-reload in-memory state - GET /roles, GET /roles/<name>: list/get roles - POST /roles/create, /roles/<name>/update, /roles/<name>/remove: full role CRUD; writes roles.json atomically, reloads roles.json+users.json - removeRole rejects deletion if any user still holds the role - removeUser prevents self-removal - _apply_args_to_record: dot-notation key merging and _unset deletion - _sanitise_user_record: strips gh_token and private keypairs unless ?sensitive=1 Python CLI (dockside_cli.py): - `dockside user list/get/create/edit/remove` and `dockside role list/get/create/edit/remove` subcommand groups - --email, --role, --name, --user-password, --gh-token scalar flags - --permissions, --resources, --ssh top-level JSON flags - --set KEY=VALUE for dot-notation nested paths (repeatable) - --unset KEY to delete nested fields (repeatable) - --from-json FILE|- to load a base record from JSON; flags/--set override - --sensitive on list/get to include private keys and gh_token - _encode_params refactored from **kwargs to dict argument to support dotted keys Example configs: - roles.json: add commented example for denying manageUsers on all admins - users.json: fix permission denial examples to use 0 (not 1), add manageUsers denial comment, add name field, improve developer example https://claude.ai/code/session_01BnV2F3EVVhqLwS6b5BmxtJ

@filename

…, else string Adds @filename syntax to --set, following the curl/httpie convention. Files with a .json extension are JSON-decoded; all other files are read as a plain string, making it straightforward to import public/private keys or other text blobs directly. https://claude.ai/code/session_01BnV2F3EVVhqLwS6b5BmxtJ

…p.md docs/README.md — major rewrite: - Add shields.io badges (GitHub stars, Docker Pulls, licence, AI-ready) - Add "Why Dockside?" comparison table vs Codespaces, Gitpod, Coder - Add devtainer/devcontainer terminology clarification note - Rewrite intro with devcontainer-per-branch + AI-session framing - Add dedicated "AI-assisted development" section (Claude Code, Codex, Copilot; per-session isolation; coming-soon per-network firewall) - Core features: AI-ready devcontainers bullet; fix broken OpenVSCode and Theia markdown links; devcontainer terminology throughout - Benefits: expand "Work from anywhere" (VS Code/JetBrains/SSH/terminal); add AI coding agent benefit; replace stale PM bullet - Advanced features: merge RunCVM into runtimes line with purpose annotations; link to new runtimes.md; expand firewall bullet for AI - Quick Start: restructure to "Launch locally" (steps 1–3) + steps 4–5 guiding first devcontainer launch and config setup; add "Launch on a public domain" as the team path; extract self-signed, self-supplied, GCP, and Terraform options to new advanced-launch-options.md - Add Setup section: config location, auto-reload, profiles/users/SSH walkthrough with anchor links into setup.md and extensions/ssh.md - Add Usage section: 5-bullet workflow guide (launch, IDE, inbound SSH, outbound SSH for git, sharing/access control) - Reorder sections: Setup before Security and Upgrading - Roadmap: replace vague open-ended copy with near-term priorities (per-network firewall management, Terraform) - Case study: rewrite with social proof framing - Annotate 2021 video walkthrough as outdated UI docs/advanced-launch-options.md (new): - Self-signed SSL, self-supplied SSL, GCP Deployment Manager (marked deprecated), Terraform placeholder docs/extensions/runtimes.md (new): - Intro distinguishing two runtime modes: devcontainer runtime vs Dockside-itself runtime; both apply to Sysbox and RunCVM; gVisor is devcontainer-only - Sysbox: both modes + trade-off comparison table - RunCVM: both modes, requirements (/dev/kvm, amd64), KVM/AI use cases; references 92-dockside-runcvm.json example profile - gVisor: sandboxing, AI agent isolation use case, compatibility caveat docs/extensions.md: - Replace separate Sysbox extension entries with single runtimes.md link docs/extensions/ssh.md — full restructure: - Add top-level intro linking inbound server and outbound agent sections - "Integrated SSH server support": rename bullet from "auto-generates" to "manages authorized_keys"; add "Configuring user public keys" subsection with JSON example and tmpfs profile requirement; tighten Notes (first note directs to users.json; second links to agent section) - "Local SSH agent support and automatic keypair provision": new intro sentence; "Configuring keypairs for outbound SSH" subsection with JSON example; remove old volume-mount-based approach; expand to 3 security notes (agent-sharing risk, users.json visibility, manual ssh-add path) - Fix incomplete stub "See " left in "Adding SSH keys to a user's profile" docs/setup.md: - Add note that available choices are always subject to the user record - Profile properties table: networks/runtimes changed from mandatory to optional (default ["*"]); mountIDE description corrected (default true, not false); escape pipe chars in command example - Add "Access Control Model" subsection: profile auth array vs active access mode, per-mode access table, viewer vs developer role capabilities, router visibility in list/get responses, IDE and SSH router restrictions https://claude.ai/code/session_01Cz1yNYuLBuADxd7woZ3vSS

…ier CLI key management Replace the ssh.authorized_keys array with a named hash ssh.publicKeys, allowing individual public keys to be added/removed via the CLI independently: dockside user edit alice --set ssh.publicKeys.laptop="ssh-rsa AAAA..." dockside user edit alice --set ssh.publicKeys.laptop= # remove Changes: - Bump User CURRENT_VERSION to 2 - Add versionUpgrade v1→v2: migrates existing authorized_keys arrays to publicKeys hashes (naming migrated keys key1, key2, ...); cleans up old field - Update authorized_keys() accessor to return [values %{ssh.publicKeys}] - Update example users.json and docs to use publicKeys Reservation.pm and launch.sh are unchanged — the accessor still returns an arrayref. https://claude.ai/code/session_01BnV2F3EVVhqLwS6b5BmxtJ

sensitive fields Allows callers to include SSH private keys and gh_token in the response after editing a user, consistent with user get and user list. Replace sensitive fields (ssh keypair private keys and gh_token) with '<redacted>' in non-sensitive API responses. This makes it clear the field exists and has a value, while still protecting the actual content. Use --sensitive to retrieve real values. https://claude.ai/code/session_01BnV2F3EVVhqLwS6b5BmxtJ https://claude.ai/code/session_01BnV2F3EVVhqLwS6b5BmxtJ

- CLI: add --host-header / DOCKSIDE_HOST_HEADER flag so tests can connect to https://localhost with a correct Host header (needed for local/harness modes) - t/integration/config/users.json: four test users (admin, testdev1, testdev2, testviewer) with committed test-only Ed25519 SSH keypairs embedded in the authorized_keys and keypairs fields - t/integration/config/roles.json: role definitions (admin/developer/user) matching the roles referenced in users.json - t/integration/config/ssh/: committed test-only Ed25519 keypairs for testdev1 and testdev2 (safe to commit; used only against ephemeral test instances) - t/integration/lib/dockside_test.py: DocksideClient (CLI subprocess + cookie jar), TestCase base (can_modify_networks, assertions), TestRunner (TAP output, signal/atexit cleanup) - t/integration/lib/run_tests_main.py: entry point; discovers and runs test modules; passes mode/credentials/network-modify flags to runner - Tests: 01_auth Authentication and permission checks 02_lifecycle_alpine Full create/start/stop/remove (alpine) 03_lifecycle_debian Full create/start/stop/remove (debian) 04_access_and_http MERGED: visibility, router filtering, HTTP proxy codes; testdev1 is owner, testdev2 is added/removed as developer 05_edit Edit metadata; viewer cannot edit; developer can 06_git_profile Git URL / branch / PR creation options 07_ide IDE access: viewer denied, developer allowed, removed developer denied 08_network Network assignment; harness-only (or ALLOW_NETWORK_MODIFY=1): create/attach/detach unique test networks 09_ssh Inbound SSH via wstunnel ProxyCommand; outbound via integrated ssh-agent; uses committed test keypairs - t/integration/harness.sh: pulls image, starts Dockside container, injects passwords, exports env vars, registers cleanup trap - t/integration/run_tests.sh: mode detection, connection parameters per mode, delegates to run_tests_main.py; DOCKSIDE_TEST_ALLOW_NETWORK_MODIFY override - test.sh: add opt-in 'integration' check category - docs/setup.md: add comprehensive Access Control Model section covering profile auth array vs meta.access; viewer/developer distinctions; router filtering; IDE/SSH router restrictions https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

…r new integration tests - shellcheck: add t/integration/run_tests.sh and t/integration/harness.sh - json: add t/integration/config/users.json and t/integration/config/roles.json - python (new check): py_compile syntax-check all .py files under t/integration/ and cli/ https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

_run(): place subcommand before global flags (--server, --output, etc.) so argparse routes to the correct subparser. These flags are only registered on subparsers, not the top-level parser; passing them first caused argparse to see the server URL as the COMMAND positional. _register_cleanup(): after emergency cleanup, restore SIG_DFL and re-raise the signal so the process actually terminates on Ctrl-C rather than silently continuing. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

Bug 1: run_tests_main.py now exits 0 when called with --cleanup, preventing the bash EXIT/INT trap from re-running the entire test suite after completion or on Ctrl-C. Bug 2: Add _UnavailableClient placeholder class. Non-admin users (dev1, dev2, viewer) are now validated at startup via list_containers(); if auth fails, their client is replaced with _UnavailableClient so that any test accessing the unavailable role raises SkipTest and shows "ok N - ... # SKIP ..." instead of "not ok" with a confusing auth error message. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

Profile names must be the config filename without .json extension, not the human-readable name property from the JSON: 'Stock Image - Alpine Linux' → '10-alpine' 'Stock Image - Debian' → '11-debian' 'Stock Image - NGINX' → '30-nginx' 'Git Repo (Branch/PR)' → '03-git-repo' Each test file now has a top-level PROFILE_NAME (or PROFILE_ALPINE / PROFILE_NGINX for 04_access_and_http.py) constant so the profile can be updated in one place. Add DOCKSIDE_TEST_DEBUG=1 support to DocksideClient._run: when set, each CLI invocation's command, return code, stdout, and stderr are printed to stderr before parsing. Also wrap json.loads so that a JSONDecodeError now includes the raw stdout in the error message, making mismatched responses immediately visible without needing the debug flag. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

The $logger closure passed to Expect->log_file() in run_pty() was unconditionally printing every chunk of docker output to STDOUT via `print $chunk`. In the forked child process that runs `docker create`, STDOUT is the inherited HTTP response pipe, so docker's progress output (e.g. the leading `.`) was being written into the response body before the parent serialised the JSON, causing JSON parse errors in the CLI. The docker output is already written to the logfile at `print $fh $_` so no information is lost. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

All print() calls in wait_for() were going to stdout (the default), causing progress dots and their trailing newlines to be prepended to the JSON output when the CLI was invoked with --output json and capture_output was used. This produced stdout like '.\n{...json...}' which failed JSON parsing. Add file=sys.stderr to every print() call inside wait_for() so progress output is cleanly separated from the machine-readable JSON on stdout. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

… per-test teardown Two root causes made tests 04-06 fail: 1. TestRunner._run_class creates a fresh TestCase instance per method and calls setUp/tearDown around each one. LifecycleAlpineTests.setUp() registered the container for cleanup on every call, so tearDown would stop+remove the container after every test — including test_01_create. Tests 02/03 passed by chance (async --no-wait removal still visible), but by test_04 the docker container was gone and docker start exited 1. Each failing test then fired its own tearDown stop+remove pair, producing the triple stop/remove pattern visible in the server log. 2. test_01_create produces a running container (status=1). test_04_start called docker start on an already-running container, which exits 1 on this docker/containerd version. Fix: - Add setUpClass/tearDownClass support to TestRunner._run_class. Clients are injected as class attributes before the method loop. setUpClass is called once (failure marks all methods not ok). tearDownClass is called once after all methods and is included in _emergency_cleanup. - LifecycleAlpineTests: drop register_cleanup from setUp (no per-test cleanup); add tearDownClass that stop+removes the container once. test_04_start now stops first (container is running after create) then starts, actually exercising the start path. Remove now-unnecessary _cleanup_names.clear() from test_06_remove. - Remove the unused _name() classmethod. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

remove() with --no-wait returns immediately; the reservation stays visible in the list for up to ~30s while dockside's async docker rm completes. test_06_remove checked the list immediately after the call and always saw the container still there. Fix: - Add wait/timeout params to DocksideClient.remove() (default wait=True). When wait=True, omits --no-wait and passes --timeout to the CLI, which polls until the reservation is fully gone or status <= -2 with no containerId — consistent with start/stop behaviour. - Update tearDown (base class) and tearDownClass (LifecycleAlpineTests) to pass wait=False so cleanup paths stay non-blocking. - test_06_remove now calls remove() with the default wait=True, so assert_not_in fires only after removal is confirmed. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

Reverting remove() default to wait=False (--no-wait) so test_06_remove stays fast. Rather than blocking up to 30s for the reservation to vanish, check its state immediately: (a) absent from the list – remove already propagated, or (b) present with status <= -2 and no containerId – the docker container is gone and dockside is holding the reservation in prelaunch state briefly before purging it. This mirrors the same 'gone' condition used by wait_for() in the CLI, without the polling delay. https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

…oved' The docker container is removed asynchronously; immediately after --no-wait the status is still 0 (exited) and containerId is retained even at status -3. The previous 'status <= -2 and not containerId' condition was therefore always false. - STATUS_LABELS: add -3 → 'removed' so `dockside get` shows a human-readable label instead of the raw integer - wait_for('gone'): fix condition to status <= -3 (drop the incorrect containerId check) - test_06_remove: poll up to 10s (0.5s interval) for absent or status <= -3, keeping typical runtime well under 2s https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

Add docstrings and inline comments to the top-level constants, subprocess helper, systemd notify helper, and EgressRule data class. Explains allow_fail semantics, IPSET_STALE_TTL / IPSET_REFRESH_INTERVAL purpose, sd_notify usage, and per-field meaning of EgressRule attributes. No functional changes.

Add docstrings and inline comments to the four data classes (NatRule, NetworkSpec, Config and its two factory classmethods) and the three standalone helpers (_subnet_to_gateway, _resolve_hostname, _resolve_hostname_all). Explains field semantics, property derivations, the managed-network concept, the two-pass config merge, the .1 gateway convention, AF_INET IPv4 restriction, and the seen-set deduplication pattern. No functional changes.

Add docstrings and inline comments to all three manager classes: DockerNetworkManager: explains bridge-name pinning, reset semantics and the allow_fail probe for network existence. IpsetManager: documents the two-set design (live set + seen-set), the delete-then-add TTL-reset pattern, the safety guard against empty resolution, and the ipset list parser. IptablesManager: explains the FORWARD=DROP baseline, DOCKER-USER/-C probe idiom, the --noflush atomic-replace strategy, _build_restore_input structure (chain decls, flushes, rules, COMMIT), DISPATCH ING vs OUT routing, gateway exemptions, safety-net DROP, ING lateral-movement prevention, egress allow/drop rule generation (tcp-reset, targeted vs terminal drop, multiport), NAT PREROUTING jump management, teardown order, and _list_dockside_chains -N line parsing. No functional changes.

Add docstrings and inline comments to the final four components: ManagementSocket: explains the AF_UNIX stream protocol (newline-terminated JSON), 0o660 permissions, daemon thread model, 1 MiB runaway-client guard, accept-loop exit via OSError on socket close, and best-effort error reply. FirewallDaemon: documents thread model (main, ipset-refresh, mgmt-socket, config-reload), _lock serialisation, SIGUSR1 off-load pattern, _apply_full ordering (sysctl → docker → FORWARD DROP → DISPATCH → ipsets → iptables), _refresh_loop interruptible sleep via _stop.wait(timeout=…), _reload error-isolation, _handle_request actions (reload/apply/refresh/status), _get_status live-ipset snapshot, and intentional rule preservation on stop. _socket_query: explains newline framing and the finally-close guarantee. main: documents all four operating modes (--status, --teardown, --daemon, one-shot), RESET=1 semantics, config-tolerant teardown, and the env-var default for --socket. No functional changes.

…oint upgrade - Dockerfile: build /opt/dockside content at /opt/dockside.img (staging area) rather than directly in the named volume path. A final RUN moves the populated /opt/dockside tree to /opt/dockside.img, leaving /opt/dockside empty for the named volume. This allows the entrypoint to bring the volume up to date on every container start, enabling single-step upgrades. - Dockerfile (system stage): convert system/latest/ from a plain directory to a versioned directory named after DOCKSIDE_VERSION (e.g. system/3.2.0/), with system/latest as a symlink — matching the pattern used for IDEs. DOCKSIDE_VERSION is a new build ARG auto-detected from git tags in build.sh. - entrypoint.sh: at container start: - sync new subdirs of /opt/dockside.img/{ide/{theia,openvscode},system} to /opt/dockside (first moving any preexisting non-symlink 'latest' folder to 'latest.orig' for safe-keeping); - installing the new 'latest' symlink; - update /opt/dockside/bin/launch.sh and /opt/dockside/launch.sh. - launch.sh: new version resolution logic: - For system, use IDE_PATH, if provided; - Look for the 'current' or 'latest' symlink, if it resolves to a directory, in that order; - Look for a suitable subdir of the relevant folder; - Resolve symlinks before use, so pinning running devtainers to a specific versioned directory and making them unaffected by a subsequent symlink updates during upgrade. https://claude.ai/code/session_011v8HBRuJBorZTbQLXY7xKA

- Recommend docker-compose.yml - Add named volume mounted at /opt/dockside - Add Dockside systemd unit

Egress rules were being appended to chain prefix (e.g. DOCKSIDE-MYNET) instead of the OUT chain (e.g. DOCKSIDE-MYNET-OUT), which iptables-restore would reject since no such chain is declared. Fix by passing f"{p}-OUT".

Drop rules previously only accepted a raw 'cidr' field. They now support the same destination selectors as allow rules: - to: cidr / raw cidr field → -d <cidr> - to: ip → -d <ip> - to: host → resolved at apply-time, skipped on failure - to: ipset → -m set --match-set <name> dst Both the TCP REJECT and the non-TCP DROP lines use the new dst, and the targeted-vs-terminal logic (all-ctstates vs NEW-only) is driven by whether a destination filter is present, same as before.

… egress Rule 5 (`-i <dev> -m conntrack --ctstate NEW -j DROP`) was unreachable on the first pass for any packet — rules 1–4 exhaustively dispatch all ds-prod traffic to ING, OUT, or RETURN for gateway packets. The only way rule 5 was ever reached was on the *return* from DS-PROD-OUT, at which point it dropped every NEW connection that the OUT chain had just allowed. The stale-bridge rationale was speculative and inapplicable since the ruleset is installed atomically via iptables-restore. https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

- README.md: full documentation of CLI modes, config file schema (network-config.json and firewall-config.json), systemd integration, management socket protocol with request/response examples for all four actions (status, reload, apply, refresh), partial-config-update pattern, ipset refresh mechanism, environment variables, iptables rule architecture, and teardown - config/examples/network-config.json: annotated example showing the unmanaged dockside network plus prod, clone, and priv networks - config/examples/firewall-config.json: annotated example with ipsets, egress rules for prod (full internet) and priv (LAN-blocked, allowlisted), and a clone network with DNAT redirect for a test database https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

…-networks-jev8A

…sistence Addresses all previously identified lifecycle gaps in the firewall daemon: Two-phase apply (reload, apply, and all new partial-update actions) Phase 1 (atomic): ensure_networks → ensure_ipsets → iptables-restore Phase 2 (diff-driven): compare old_config vs new_config; flush and delete orphaned chains for removed networks; deregister and destroy removed ipsets. Ordering invariant: Phase 1 always completes before Phase 2 deletes anything, so DOCKSIDE-DISPATCH never references non-existent chains or ipsets. Container safety _network_has_containers() parses docker network inspect JSON to check the Containers key. If containers are connected, chain deletion is deferred (logged as a warning). Ipsets referenced by deferred networks are also kept. The firewall is immediately correct regardless: DOCKSIDE-DISPATCH is rebuilt without the removed network in Phase 1. New IpsetManager methods deregister(name): removes entry from _sets so the refresh loop stops processing it (previously orphaned entries were refreshed indefinitely). registered_names(): exposes _sets keys for the reconcile action. New IptablesManager method remove_network_chains(spec): flushes and deletes the -ING, -OUT (filter) and -NAT (nat) chains for one removed network, removing the PREROUTING jump before deleting the NAT chain (iptables rejects deleting referenced chains). Serialisation round-trip (EgressRule/NatRule/NetworkSpec/Config) to_dict(), to_net_dict(), to_fw_dict(), to_net_data(), to_fw_data() allow Config objects to be written back to network-config.json and firewall-config.json without losing structure. Atomic disk persistence _save_config(config): writes to a tempfile sibling then os.replace() for atomic rename(2) so on-disk files are never partially written. New socket actions set-network: upsert a network (topology + firewall rules); saves to disk. remove-network: remove a network; cleanup with container safety; saves to disk. set-ipset: upsert an ipset definition; saves to disk. remove-ipset: remove an ipset (rejected if still referenced by egress rules). reconcile: enumerate kernel chains/ipsets/docker-networks vs in-memory config; clean up orphaned objects; report docker orphans. Updated existing actions reload: now calls _two_phase_apply (was purely additive). apply: now calls _two_phase_apply and saves to disk (was purely additive). https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

gateway_ip served two purposes: Docker network gateway (conventionally .1) and dockside container source IP (.2). This conflation forced operators to set gateway_ip to .2, silently overriding Docker's gateway address. gateway_mac was similarly misnamed — it identifies the dockside container's interface, not the gateway. Changes: - Add dockside_ip field: source IP of the dockside container on the network, used for DOCKSIDE-DISPATCH rule matching (! -s / -s). - Rename gateway_mac → dockside_mac throughout: the MAC address of the dockside container's interface (--mac-source matching). - gateway_ip is now exclusively used for docker network creation (docker network create --gateway); defaults to .1 when absent. - NetworkSpec.managed now depends on dockside_ip/dockside_mac (not gateway_ip) since gateway_ip alone does not require firewall chains. - Config files updated: gateway_ip/.2 entries removed (Docker gateway now defaults to .1); gateway_mac → dockside_mac; dockside_ip added. - Either dockside_ip or dockside_mac (or both) may be configured; _dispatch_out_match already handles all three cases correctly. https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

The targeted-CIDR drop rule was emitting a bare `-j DROP` with no conntrack state filter, which killed ESTABLISHED/RELATED packets for flows that earlier RETURN rules had already permitted (e.g. a DNAT-redirected MySQL connection allowed on port 13306 before a `{ "action": "drop", "cidr": "192.168.0.0/16" }` rule). The rest of the OUT chain is designed around the invariant that all rules only act on NEW connections; ESTABLISHED/RELATED packets fall through to Docker's FORWARD ESTABLISHED ACCEPT rule. Align the CIDR drop with that invariant by adding `--ctstate NEW` to both the TCP REJECT and the non-TCP DROP lines, regardless of whether a destination filter is present. https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

1. Remove redundant DISPATCH exemption rules (section 3b) The dockside container's MAC/IP were already excluded from the OUT chain jump by _dispatch_out_match (!--mac-source / !-s filters), so the subsequent MAC/IP-matched RETURN rules in section 3b were unreachable. Remove them. 2. Add iptables comments to DOCKSIDE-DISPATCH rules All three DISPATCH rule types now carry a -m comment --comment "..." annotation so they are self-describing in iptables -L output: - intra-network ING jump: "intra:<dev>" - egress OUT jump: "egress:<dev>" - terminal pass-through: "pass-through" 3. Transfer firewall-config "comment" fields to iptables comments EgressRule now parses the optional "comment" key from each rule's JSON object and emits it as -m comment --comment "..." on every iptables rule that rule generates, making per-rule intent visible in iptables -L alongside the match criteria. https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

- DISPATCH ING jump: "ingress to devcontainers from dockside container" - DISPATCH OUT jump: "egress from devcontainers" - ING RETURN (MAC): "Allow ingress from dockside container" - ING RETURN (IP): "Allow ingress from dockside container" - ING DROP: "Drop all other ingress to devcontainers" https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

- docker-compose.yml: main dockside service with four networks (dockside, ds-priv, ds-clone, ds-prod) using MAC 02:00:00:00:00:02 - docker-compose-with-firewall-sidecar.yml: same plus dockside-iptables sidecar that runs the firewall daemon via nsenter into host netns - network-config.json: updated subnets to /24, removed dockside_ip from the first three networks (MAC-based matching only), updated MACs to 02:00:00:00:00:02; ds-prod retains dockside_ip for IP-based matching https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

…cket - Remove unnecessary /data volume mount - Remove depends_on: dockside — firewall daemon manages iptables independently and doesn't require dockside to be running first - Add /var/run/docker.sock so the daemon can run docker network inspect/create via DockerNetworkManager.ensure_networks() https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

Issue 1 — macOS Docker Desktop FORWARD policy: - Rename ensure_forward_drop() → ensure_forward_policy(forward_policy) which now detects the current FORWARD policy before acting. On Linux, Docker pre-sets FORWARD to DROP so the method is a no-op. On macOS Docker Desktop (FORWARD starts as ACCEPT), it sets DROP and returns True to signal that the conntrack workaround is needed. - Add ensure_conntrack_accept(): inserts iptables -I DOCKER-USER 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT at position 1 of DOCKER-USER, i.e. before the -j DOCKSIDE-DISPATCH jump, so that docker exec sessions are not interrupted by the FORWARD DROP policy. The rule is idempotent (guarded by -C) and is removed by teardown(). - Add --forward-policy={drop,allow-accept} CLI option. 'drop' (default): always enforce FORWARD DROP, applying the conntrack workaround when changing from ACCEPT. 'allow-accept': leave an existing ACCEPT policy unchanged (FAOD: never actively sets the policy to ACCEPT). - Thread forward_policy through FirewallDaemon.__init__ and _apply_full. Issue 2 — docker socket unavailable when networks pre-exist: - Make docker network create non-fatal (allow_fail=True). If creation fails (e.g. socket not bind-mounted on macOS), a warning is logged and the daemon proceeds to apply iptables rules. Rules reference bridge interface names and take effect automatically once the network is created externally. No security risk: a container cannot join a non-existent network. https://claude.ai/code/session_01BFN7j9ppVBz6cr8zGfxeKi

…auth bypass Issue 1 (High) — strict input validation before iptables-restore: - Add validator functions (_val_identifier, _val_iface, _val_ip, _val_cidr, _val_mac, _val_proto, _val_port, _val_comment, _val_icmp_type, _val_host_entry) using allow-lists and stdlib ipaddress module. - Call validators in EgressRule, NatRule, and NetworkSpec constructors, and in Config.from_dicts() for ipset names and host entries. - Any invalid value raises ValueError before kernel mutation is attempted. Issue 2 (Medium/High) — management socket peer credential verification: - Read SO_PEERCRED on each accepted connection (pid, uid, gid). - Require UID 0 (root) for all mutating actions (apply, reload, set-network, remove-network, set-ipset, remove-ipset, reconcile). - Read-only actions (status, refresh) remain open to any group member. - Log peer pid/uid/gid and action for every request (audit trail). https://claude.ai/code/session_016QwTdAvrMgYdfJuycgobHc

…w-firewall-fixes-YgXxW

- Lines of JSON may have `//` comments appended, but such comments may not include a `"`

…-networks-jev8A

- Lines of JSON may have `//` comments appended, but such comments may not include a `"`

…snowlabs/dockside into claude/compare-docker-networks-jev8A

…-networks-jev8A

claude and others added 30 commits March 20, 2026 01:38

[Misc] Update .gitignore to ignore Python __pycache__ and .pyc files

108a05b

https://claude.ai/code/session_0159qacE1enywepN5NTcqtuf

[Docs] Remove outdated runtimes/sysbox.md

6b2c2a6

[Docs] Include more CLI examples in ssh.md

3a9160f

[Docs] Update the 'Why Dockside?' to fix inaccuracies

10de3ac

[Misc] .gitignore .claude/*.local.*

910acdb

chore: ignore Python __pycache__ directories and .pyc files

9e07143

struanb and others added 26 commits March 22, 2026 15:51

[API] Fix uninitialised value warning

53a6004

[Build] Remove redundant comment, diagnostics

d165b12

[Build] Ensure clean git repo, using alpine/git stage

9ccc6ed

Fix: Profile networks no longer mandatory

fc5401b

[Docs] Update install instructions

f55427b

- Recommend docker-compose.yml - Add named volume mounted at /opt/dockside - Add Dockside systemd unit

fix: pass correct chain name to _egress_to_iptables

6018aba

Egress rules were being appended to chain prefix (e.g. DOCKSIDE-MYNET) instead of the OUT chain (e.g. DOCKSIDE-MYNET-OUT), which iptables-restore would reject since no such chain is declared. Fix by passing f"{p}-OUT".

Merge remote-tracking branch 'origin/main' into claude/compare-docker…

db7741c

…-networks-jev8A

Add security review for dockside network firewall daemon

f01f679

[Launch] Fix .gitconfig updates to use git config --replace-all

826ed61

Merge remote-tracking branch 'origin/main' into claude/security-revie…

38cb0ce

…w-firewall-fixes-YgXxW

[Config] Fix unparseable users.json

bb40df6

- Lines of JSON may have `//` comments appended, but such comments may not include a `"`

Merge remote-tracking branch 'origin/main' into claude/compare-docker…

5dd46d8

…-networks-jev8A

[Fix] Revert two unneeded lines from commit efd705e

fccd149

struanb changed the title ~~Replace bash firewall daemon with Python 3.6+ implementation~~ Python-driven iptables/ipset firewalling sandbox Apr 2, 2026

struanb added 3 commits April 2, 2026 21:10

[Config] Fix unparseable users.json

7abbb05

- Lines of JSON may have `//` comments appended, but such comments may not include a `"`

Merge branch 'claude/compare-docker-networks-jev8A' of github.com:new…

f726ae1

…snowlabs/dockside into claude/compare-docker-networks-jev8A

Merge remote-tracking branch 'origin/main' into claude/compare-docker…

10b31be

…-networks-jev8A

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Python-driven iptables/ipset firewalling sandbox#35

Python-driven iptables/ipset firewalling sandbox#35
struanb wants to merge 59 commits into
devel-install-20260319-1from
claude/compare-docker-networks-jev8A

struanb commented Mar 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

struanb commented Mar 20, 2026

Summary

Key Changes

Implementation Details

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants