Commit 340fc26

docs(decisions): add architectural decision records structure
Create a structured decision records system to document important technical choices across multiple domains (DevOps, Network, Consensus, etc.). This implements a modified MADR template approach for preserving context, trade-offs, and reasoning behind significant architectural decisions.
1 parent 29ed501 commit 340fc26

File tree

5 files changed

+288
-0
lines changed

docs/decisions/README.md

+22
@@ -0,0 +1,22 @@
1+
# Decision Log
2+
3+
We capture important decisions with [architectural decision records](https://adr.github.io/).
4+
5+
These records capture the context, trade-offs, and reasoning behind choices made at our community and technical crossroads. Our goal is to preserve an understanding of how the project has grown, and to capture enough insight to revisit previous decisions effectively.
6+
7+
Get started by creating a new decision record from the template:
8+
9+
```sh
10+
cp template.md NNNN-title-with-dashes.md
11+
```
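
As a rough illustration, the file name could be derived from a sequence number and a title with a small helper. This is only a sketch: `adr_filename` is a hypothetical function, and the zero-padded four-digit prefix is an assumption about what `NNNN` stands for.

```sh
#!/bin/sh
# Hypothetical helper: build "NNNN-title-with-dashes.md" from a number and a title.
# Assumes NNNN is a zero-padded sequence number; pass it without leading zeros.
adr_filename() {
    number="$1"
    title="$2"
    # Lowercase the title, collapse runs of non-alphanumerics into single dashes,
    # then trim a leading or trailing dash.
    slug="$(printf '%s' "${title}" \
        | tr '[:upper:]' '[:lower:]' \
        | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//')"
    printf '%04d-%s.md\n' "${number}" "${slug}"
}

# Prints: 0003-use-gosu-for-privilege-dropping.md
adr_filename 3 "Use gosu for Privilege Dropping"
```

The result can then be passed to `cp`, for example `cp template.md "$(adr_filename 3 "Use gosu for Privilege Dropping")"`.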
12+
13+
For more of the rationale behind this approach, see [Michael Nygard's article](http://thinkrelevance.com/blog/2011/11/15/documenting-architecture-decisions).
14+
15+
We've adopted the MADR [ADR template](https://adr.github.io/madr/), which is a bit more verbose than Nygard's original template. We may simplify it in the future.
16+
17+
## Evolving Decisions
18+
19+
Many decisions build on each other, a driver of iterative change and messiness
20+
in software. By laying out the "story arc" of a particular system within the
21+
application, we hope future maintainers will be able to identify how to rewind
22+
decisions when refactoring the application becomes necessary.
@@ -0,0 +1,51 @@
1+
---
2+
status: accepted
3+
date: 2025-02-28
4+
story: Appropriate UID/GID values for container users
5+
---
6+
7+
# Use High UID/GID Values for Container Users
8+
9+
## Context & Problem Statement
10+
11+
Docker containers share the host's user namespace by default, so a UID/GID inside a container refers to the same numeric ID on the host. If a container UID/GID overlaps with a privileged host account, an exploited container escape vulnerability could lead to privilege escalation. Low UIDs (especially in the system user range of 100-999) are particularly risky because they often map to system accounts on the host.
12+
13+
Our previous approach used UID/GID 101 with the `--system` flag for user creation, which falls within the system user range and could potentially overlap with critical system users on the host.
14+
15+
## Priorities & Constraints
16+
17+
* Enhance security by reducing the risk of container user namespace overlaps
18+
* Avoid warnings during container build related to system user ranges
19+
* Maintain compatibility with common Docker practices
20+
* Prevent potential privilege escalation in case of container escape
21+
22+
## Considered Options
23+
24+
* Option 1: Keep using low UID/GID (101) with `--system` flag
25+
* Option 2: Use unprivileged UID/GID (1000+) without `--system` flag
26+
* Option 3: Use high UID/GID (10000+) without `--system` flag
27+
28+
## Decision Outcome
29+
30+
Chosen option: [Option 3: Use high UID/GID (10000+) without `--system` flag]
31+
32+
We decided to:
33+
34+
1. Change the default UID/GID from 101 to 10001
35+
2. Remove the `--system` flag from user/group creation commands
36+
3. Document the security rationale for these changes
37+
38+
This approach significantly reduces the risk of UID/GID collision with host system users while avoiding build-time warnings related to system user ranges. Using a very high UID/GID (10001) provides an additional security boundary in containers where user namespaces are shared with the host.
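
The changes above could look roughly like the following in an image build step. This is a sketch, not the project's actual build logic: the `APP_USER`, `APP_UID`, and `APP_GID` variables and the `zebra` user name are assumptions made for illustration, and the user-creation commands are printed rather than executed so the snippet can be tried without root. The boundaries in `classify_uid` follow the ranges named in this record, with everything below 1000 treated as reserved for system accounts.

```sh
#!/bin/sh
# Hypothetical defaults; only the value 10001 comes from this decision record.
APP_USER="${APP_USER:-zebra}"
APP_UID="${APP_UID:-10001}"
APP_GID="${APP_GID:-10001}"

# Classify a numeric UID/GID using the ranges discussed in this record.
classify_uid() {
    if [ "$1" -lt 1000 ]; then
        echo "system"    # Option 1 territory (e.g. 101)
    elif [ "$1" -lt 10000 ]; then
        echo "regular"   # Option 2 territory (1000+)
    else
        echo "high"      # Option 3 territory (10000+)
    fi
}

# Print (do not run) group and user creation commands without the --system flag.
print_user_setup() {
    echo "groupadd --gid ${APP_GID} ${APP_USER}"
    echo "useradd --uid ${APP_UID} --gid ${APP_GID} --create-home ${APP_USER}"
}

classify_uid "${APP_UID}"
print_user_setup
```

Printing the commands keeps the example side-effect free; in a real build step they would be run directly, as root, inside the image.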
39+
40+
### Expected Consequences
41+
42+
* Improved security posture by reducing the risk of container escapes leading to privilege escalation
43+
* Elimination of build-time warnings related to system user UID/GID ranges
44+
* Consistency with industry best practices for container security
45+
* No functional impact on container operation, as the internal user permissions remain the same
46+
47+
## More Information
48+
49+
* [NGINX Docker User ID Issue](https://github.com/nginxinc/docker-nginx/issues/490) - Demonstrates the risks of using UID 101 which overlaps with `systemd-network` user on Debian systems
50+
* [.NET Docker Issue on System Users](https://github.com/dotnet/dotnet-docker/issues/4624) - Details the problems with using `--system` flag and the SYS_UID_MAX warnings
51+
* [Docker Security Best Practices](https://docs.docker.com/develop/security-best-practices/) - General security recommendations for Docker containers
@@ -0,0 +1,51 @@
1+
---
2+
status: accepted
3+
date: 2025-02-28
4+
story: Volumes permissions and privilege management in container entrypoint
5+
---
6+
7+
# Use gosu for Privilege Dropping in Entrypoint
8+
9+
## Context & Problem Statement
10+
11+
Running containerized applications as the root user is a security risk. If an attacker compromises the application, they gain root access within the container, potentially facilitating a container escape. However, some operations during container startup, such as creating directories or modifying file permissions in locations not owned by the application user, require root privileges. We need a way to perform these initial setup tasks as root, but then switch to a non-privileged user *before* executing the main application (`zebrad`). Using `USER` in the Dockerfile is insufficient because it applies to the entire runtime, and we need to change permissions *after* volumes are mounted.
12+
13+
## Priorities & Constraints
14+
15+
* Minimize the security risk by running the main application (`zebrad`) as a non-privileged user.
16+
* Allow initial setup tasks (file/directory creation, permission changes) that require root privileges.
17+
* Maintain a clean and efficient entrypoint script.
18+
* Avoid complex signal handling and TTY issues associated with `su` and `sudo`.
19+
* Ensure 1:1 parity with Docker's `--user` flag behavior.
20+
21+
## Considered Options
22+
23+
* Option 1: Use `USER` directive in Dockerfile.
24+
* Option 2: Use `su` within the entrypoint script.
25+
* Option 3: Use `sudo` within the entrypoint script.
26+
* Option 4: Use `gosu` within the entrypoint script.
27+
* Option 5: Use `chroot --userspec`
28+
* Option 6: Use `setpriv`
29+
30+
## Decision Outcome
31+
32+
Chosen option: [Option 4: Use `gosu` within the entrypoint script]
33+
34+
We chose to use `gosu` because it provides a simple and secure way to drop privileges from root to a non-privileged user *after* performing necessary setup tasks. `gosu` avoids the TTY and signal-handling complexities of `su` and `sudo`. It's designed specifically for this use case (dropping privileges in container entrypoints) and leverages the same underlying mechanisms as Docker itself for user/group handling, ensuring consistent behavior.
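
A minimal sketch of the pattern described above, assuming a hypothetical `RUN_AS` user and `DATA_DIR` volume path (neither is taken from the real entrypoint): setup runs only when the script starts as root, and `exec gosu` then replaces the shell with the application running as the non-privileged user. When the container is already started as a non-root user, the setup and `gosu` steps are skipped.

```sh
#!/bin/sh
set -e

# Hypothetical names and paths, for illustration only.
RUN_AS="${RUN_AS:-zebra}"
DATA_DIR="${DATA_DIR:-/data}"

# Succeeds only when the given UID is root, i.e. when privileges must be dropped.
needs_privilege_drop() {
    [ "$1" -eq 0 ]
}

# Root-only setup: make the mounted volume writable by the application user.
prepare_volume() {
    mkdir -p "${DATA_DIR}"
    chown -R "${RUN_AS}:${RUN_AS}" "${DATA_DIR}"
}

# Entrypoint body: fix permissions as root, then hand over to the application.
main() {
    if needs_privilege_drop "$(id -u)"; then
        prepare_volume
        # exec replaces this shell, so the application receives signals directly.
        exec gosu "${RUN_AS}" "$@"
    fi
    # Already non-root (for example, started with --user): run the command as-is.
    exec "$@"
}
```

An actual entrypoint script would end with a call such as `main "$@"`, so that the container command (for example `zebrad`) is passed through unchanged.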
35+
36+
### Expected Consequences
37+
38+
* Improved security by running `zebrad` as a non-privileged user.
39+
* Simplified entrypoint script compared to using `su` or `sudo`.
40+
* Avoidance of TTY and signal-handling issues.
41+
* Consistent behavior with Docker's `--user` flag.
42+
* No negative impact on functionality, as initial setup tasks can still be performed.
43+
44+
## More Information
45+
46+
* [gosu GitHub repository](https://github.com/tianon/gosu#why) - Explains the rationale behind `gosu` and its advantages over `su` and `sudo`.
47+
* [gosu usage warning](https://github.com/tianon/gosu#warning) - Highlights the core use case (stepping down from root) and potential vulnerabilities in other scenarios.
48+
* Alternatives considered:
49+
  * `chroot --userspec`: While functional, it's less common and less directly suited to this specific task than `gosu`.
50+
  * `setpriv`: A viable alternative, but `gosu` is already well-established in our workflow and offers the desired functionality with a smaller footprint than a full `util-linux` installation.
51+
  * `su-exec`: Another minimal alternative, but it has known parser bugs that could lead to unexpected root execution.
@@ -0,0 +1,115 @@
1+
---
2+
status: proposed
3+
date: 2025-02-28
4+
story: Standardize filesystem hierarchy for Zebra deployments
5+
---
6+
7+
# Standardize Filesystem Hierarchy: FHS vs. XDG
8+
9+
## Context & Problem Statement
10+
11+
Zebra currently has inconsistencies in its filesystem layout, particularly regarding where configuration, data, cache files, and binaries are stored. We need a standardized approach compatible with:
12+
13+
1. Traditional Linux systems.
14+
2. Containerized deployments (Docker).
15+
3. Cloud environments with stricter filesystem restrictions (e.g., Google's Container-Optimized OS).
16+
17+
We previously considered using the Filesystem Hierarchy Standard (FHS) exclusively ([Issue #3432](https://github.com/ZcashFoundation/zebra/issues/3432)). However, recent changes introduced the XDG Base Directory Specification, which offers a user-centric approach. We need to decide whether to:
18+
19+
* Adhere to FHS.
20+
* Adopt XDG Base Directory Specification.
21+
* Use a hybrid approach, leveraging the strengths of both.
22+
23+
The choice impacts how we structure our Docker images, where configuration files are located, and how users interact with Zebra in different environments.
24+
25+
## Priorities & Constraints
26+
27+
* **Security:** Minimize the risk of privilege escalation by adhering to least-privilege principles.
28+
* **Maintainability:** Ensure a clear and consistent filesystem layout that is easy to understand and maintain.
29+
* **Compatibility:** Work seamlessly across various Linux distributions, Docker, and cloud environments (particularly those with restricted filesystems like Google's Container-Optimized OS).
30+
* **User Experience:** Provide a predictable and user-friendly experience for locating configuration and data files.
31+
* **Flexibility:** Allow users to override default locations via environment variables where appropriate.
32+
* **Avoid Breaking Changes:** Minimize disruption to existing users and deployments, if possible.
33+
34+
## Considered Options
35+
36+
### Option 1: FHS
37+
38+
* Configuration: `/etc/zebrad/`
39+
* Data: `/var/lib/zebrad/`
40+
* Cache: `/var/cache/zebrad/`
41+
* Logs: `/var/log/zebrad/`
42+
* Binary: `/opt/zebra/bin/zebrad` or `/usr/local/bin/zebrad`
43+
44+
### Option 2: XDG Base Directory Specification
45+
46+
* Configuration: `$HOME/.config/zebrad/`
47+
* Data: `$HOME/.local/share/zebrad/`
48+
* Cache: `$HOME/.cache/zebrad/`
49+
* State: `$HOME/.local/state/zebrad/`
50+
* Binary: `$HOME/.local/bin/zebrad` or `/usr/local/bin/zebrad`
51+
52+
### Option 3: Hybrid Approach (FHS for System-Wide, XDG for User-Specific)
53+
54+
* System-wide configuration: `/etc/zebrad/`
55+
* User-specific configuration: `$XDG_CONFIG_HOME/zebrad/`
56+
* System-wide data (read-only, shared): `/usr/share/zebrad/` (e.g., checkpoints)
57+
* User-specific data: `$XDG_DATA_HOME/zebrad/`
58+
* Cache: `$XDG_CACHE_HOME/zebrad/`
59+
* State: `$XDG_STATE_HOME/zebrad/`
60+
* Runtime: `$XDG_RUNTIME_DIR/zebrad/`
61+
* Binary: `/opt/zebra/bin/zebrad` (system-wide) or `$HOME/.local/bin/zebrad` (user-specific)
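
To make the hybrid layout more concrete, the sketch below resolves the XDG directories with the fallback defaults defined by the XDG Base Directory Specification, then looks for a configuration file in the user-specific location before the system-wide one. The precedence order, the `zebrad.toml` file name, and the `SYSTEM_CONFIG_DIR` override are assumptions made for illustration; this record does not settle them.

```sh
#!/bin/sh
# XDG directories, falling back to the defaults from the XDG specification
# when the corresponding environment variable is unset or empty.
xdg_config_dir() { echo "${XDG_CONFIG_HOME:-${HOME}/.config}/zebrad"; }
xdg_data_dir()   { echo "${XDG_DATA_HOME:-${HOME}/.local/share}/zebrad"; }
xdg_cache_dir()  { echo "${XDG_CACHE_HOME:-${HOME}/.cache}/zebrad"; }
xdg_state_dir()  { echo "${XDG_STATE_HOME:-${HOME}/.local/state}/zebrad"; }

# Assumed precedence: user-specific configuration first, then system-wide.
# SYSTEM_CONFIG_DIR is a hypothetical override, useful mainly for testing.
find_config() {
    for dir in "$(xdg_config_dir)" "${SYSTEM_CONFIG_DIR:-/etc/zebrad}"; do
        if [ -f "${dir}/zebrad.toml" ]; then
            echo "${dir}/zebrad.toml"
            return 0
        fi
    done
    return 1
}

find_config || echo "no zebrad.toml found in user or system locations"
```

`XDG_RUNTIME_DIR` is left out on purpose: the specification defines no fallback default for it, so any use would need its own handling.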
62+
63+
## Pros and Cons of the Options
64+
65+
### FHS
66+
67+
* **Pros:**
68+
  * Traditional and well-understood by system administrators.
69+
  * Clear separation of configuration, data, cache, and binaries.
70+
  * Suitable for packaged software installations.
71+
72+
* **Cons:**
73+
  * Less user-friendly; requires root access to modify configuration.
74+
  * Can conflict with stricter cloud environments restricting writes to `/etc` and `/var`.
75+
  * Doesn't handle multi-user scenarios as gracefully as XDG.
76+
77+
### XDG Base Directory Specification
78+
79+
* **Pros:**
80+
  * User-centric: configuration and data stored in user-writable locations.
81+
  * Better suited for containerized and cloud environments.
82+
  * Handles multi-user scenarios gracefully.
83+
  * Clear separation of configuration, data, cache, and state.
84+
85+
* **Cons:**
86+
  * Less traditional; might be unfamiliar to some system administrators.
87+
  * Requires environment variables to be set correctly.
88+
  * Binary placement less standardized.
89+
90+
### Hybrid Approach (FHS for System-Wide, XDG for User-Specific)
91+
92+
* **Pros:**
93+
  * Combines strengths of FHS and XDG.
94+
  * Allows system-wide defaults while prioritizing user-specific configurations.
95+
  * Flexible and adaptable to different deployment scenarios.
96+
  * Clear binary placement in `/opt`.
97+
98+
* **Cons:**
99+
  * More complex than either FHS or XDG alone.
100+
  * Requires careful consideration of precedence rules.
101+
102+
## Decision Outcome
103+
104+
Pending
105+
106+
### Expected Consequences
107+
108+
Pending
109+
110+
## More Information
111+
112+
* [Filesystem Hierarchy Standard (FHS) v3.0](https://refspecs.linuxfoundation.org/FHS_3.0/fhs-3.0.html)
113+
* [XDG Base Directory Specification](https://specifications.freedesktop.org/basedir-spec/latest/)
114+
* [Zebra Issue #3432: Use the Filesystem Hierarchy Standard (FHS) for deployments and artifacts](https://github.com/ZcashFoundation/zebra/issues/3432)
115+
* [Google Container-Optimized OS: Working with the File System](https://cloud.google.com/container-optimized-os/docs/concepts/disks-and-filesystem#working_with_the_file_system)

docs/decisions/template.md

+49
@@ -0,0 +1,49 @@
1+
---
2+
# status and date are the only required elements. Feel free to remove the rest.
3+
status: {[proposed | rejected | accepted | deprecated | … | superseded by [ADR-NAME](adr-file-name.md)]}
4+
date: {YYYY-MM-DD when the decision was last updated}
5+
builds-on: {[Short Title](NNNN-short-title.md)}
6+
story: {description or link to contextual issue}
7+
---
8+
9+
# {short title of solved problem and solution}
10+
11+
## Context and Problem Statement
12+
13+
{2-3 sentences explaining the problem and the forces influencing the decision.}
14+
<!-- The language in this section is value-neutral. It is simply describing facts. -->
15+
16+
## Priorities & Constraints <!-- optional -->
17+
18+
* {List of concerns or constraints}
19+
* {Factors influencing the decision}
20+
21+
## Considered Options
22+
23+
* Option 1: Thing
24+
* Option 2: Another
25+
26+
### Pros and Cons of the Options <!-- optional -->
27+
28+
#### Option 1: {Brief description}
29+
30+
* Good, because {reason}
31+
* Bad, because {reason}
32+
33+
## Decision Outcome
34+
35+
Chosen option: [Option 1: Thing]
36+
37+
{Clearly state the chosen option and provide justification. Reference the "Pros and Cons of the Options" section above if applicable.}
38+
39+
### Expected Consequences <!-- optional -->
40+
41+
* List of outcomes resulting from this decision
42+
<!-- Positive, negative, and/or neutral consequences, as long as they affect the team and project in the future. -->
43+
44+
## More Information <!-- optional -->
45+
46+
<!-- * Resources reviewed as part of making this decision -->
47+
<!-- * Links to any supporting documents or resources -->
48+
<!-- * Related PRs -->
49+
<!-- * Related User Journeys -->
