feat: add file-based config and web api workflow #392

Open — Emptytao wants to merge 1 commit into usestrix:main from Emptytao:codex/api-config-cleanup-zh-docs

Conversation

@Emptytao
No description provided.

@Emptytao (Author) commented:

yes

greptile-apps bot (Contributor) commented Mar 25, 2026

Greptile Summary

This PR introduces two major capabilities: a file-based structured config system (replacing env-var-only configuration with a Pydantic-validated JSON config file at ~/.strix/config.json) and a Web API workflow (a FastAPI server with async SSE streaming, file-backed task persistence, and a subprocess-based worker model for running scans). The config refactor is well-engineered with legacy compatibility, deep-merge logic, and validation. The API layer is architecturally clean and the SSE streaming implementation is correct.

Key concerns found:

  • Arbitrary server-side file read via instruction_file (strix/api/task_manager.py:63-73): the API accepts a server filesystem path, reads it, and returns its content in the task record. Since api_auth_token defaults to null (no auth), this is an unauthenticated file read vulnerability — any client that can reach the API can exfiltrate arbitrary files (e.g. /etc/passwd, private keys). At minimum, instruction_file should be dropped from the API surface or restricted to a designated directory.
  • GET /api/v1/tasks/{id} returns full ScanTaskResult instead of just the task record, making it an expensive and confusing duplicate of /result.
  • sandbox_mode hardcoded to False in worker.py, silently ignoring the operator's config setting.
  • The concurrent-task limit check in create_task is unsynchronized, so simultaneous requests can race past the limit.
  • finished_at can be None on completed tasks when scan_state lacks an end_time field.

Confidence Score: 2/5

  • Not safe to merge until the instruction_file arbitrary file read is addressed — this is an unauthenticated data exfiltration path when auth token is not configured.
  • The config refactor and overall API structure are solid, but the instruction_file field in ScanTaskRequest creates a server-side arbitrary file read that is exposed with no authentication by default. This is a production security issue that needs to be resolved before the API is deployable.
  • Pay close attention to strix/api/task_manager.py (instruction_file read + race condition) and strix/api/server.py (GET /tasks/{id} response shape).

Important Files Changed

  • strix/api/task_manager.py: Core task orchestration layer — spawns worker subprocesses and manages task lifecycle. Contains a P0 security issue: the instruction_file field allows reading arbitrary server-side files, plus an unsynchronized concurrent-task limit check that can be raced.
  • strix/api/server.py: FastAPI app with SSE streaming and Bearer-token auth. GET /api/v1/tasks/{id} returns a full ScanTaskResult (scan_state + artifacts) rather than just the task record, making it redundant with the /result endpoint and expensive on every status poll.
  • strix/api/worker.py: Subprocess entry point that runs the scan and writes task state. Hardcodes sandbox_mode=False, silently ignoring the operator's runtime.sandbox_mode config setting.
  • strix/api/task_store.py: File-based task persistence with process-exit polling. refresh() can leave finished_at=None on a COMPLETED task when scan_state["run_metadata"]["end_time"] is absent.
  • strix/api/models.py: Pydantic models for tasks and requests. Well structured, with extra="forbid" on requests and a normalize_record validator syncing finished_at/completed_at. The instruction_file field is the surface for the file-read issue in task_manager.
  • strix/config/config.py: Refactored config layer moving from env-var-only to a structured JSON file with Pydantic models. Well designed, with legacy env compatibility, deep-merge logic, and file validation.
  • strix/scan/service.py: New scan service module refactored out of interface/main.py, providing shared prepare_scan / execute_prepared_scan logic used by both the CLI and the API worker. Clean extraction with no issues found.
  • tests/api/test_server.py: Integration tests for all REST endpoints and the SSE stream using a FakeTaskManager. Good coverage of the happy path; FakeTaskManager.get_task returns None instead of raising KeyError like the real implementation, so the KeyError branch in _get_task_or_404 is untested.
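The deep-merge behavior credited to the config refactor above can be sketched roughly as follows. This is a hypothetical standalone sketch, not the PR's actual code; the real merge semantics for lists and null values are not visible in this review.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base: nested dicts merge key-by-key,
    while scalars and lists in override replace the base value."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


# Defaults overlaid with values read from ~/.strix/config.json (hypothetical keys):
defaults = {"runtime": {"sandbox_mode": False, "timeout": 30}}
file_config = {"runtime": {"sandbox_mode": True}}
merged = deep_merge(defaults, file_config)
# → {'runtime': {'sandbox_mode': True, 'timeout': 30}}
```

The point of the recursion is that a config file setting one nested key does not clobber sibling defaults in the same section.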
Reviews (1): Last reviewed commit: "feat: add file-based config and web api ..."

Comment on lines +63 to +73
```python
instruction = request.instruction
if request.instruction_file:
    instruction_path = Path(request.instruction_file).expanduser().resolve()
    try:
        instruction = instruction_path.read_text(encoding="utf-8").strip()
    except OSError as exc:
        raise ValueError(
            f"Failed to read instruction file '{instruction_path}': {exc}"
        ) from exc
    if not instruction:
        raise ValueError(f"Instruction file '{instruction_path}' is empty")
```

**P0: `instruction_file` enables arbitrary server-side file read**

When `instruction_file` is present in the request, the API server reads that path directly from its own filesystem and injects the content into the scan as the instruction. Because `api_auth_token` defaults to `null` (no authentication required out of the box), any client that can reach the API can exfiltrate arbitrary files:

```
POST /api/v1/tasks
{"targets": ["https://example.com"], "instruction_file": "/etc/passwd"}
```

The file content is then stored in the task record (and served back via the task result endpoint), making this an unauthenticated arbitrary file read.

Even when an auth token is set, a compromised token grants full read access to the server's filesystem — a significant privilege escalation beyond the intended "run a scan" capability.

Consider removing `instruction_file` from the API surface entirely, or restricting it to a configurable allow-list directory (e.g. only paths inside a configured `instructions_dir`).
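One way the allow-list restriction could look, as a sketch only: `instructions_dir` is a hypothetical config value, and the real fix may simply drop the field from the API surface instead.

```python
from pathlib import Path


def resolve_instruction_file(raw_path: str, instructions_dir: Path) -> Path:
    """Resolve raw_path and reject anything that escapes instructions_dir
    (via '..' segments or an absolute path)."""
    base = instructions_dir.resolve()
    candidate = (base / raw_path).resolve()
    if not candidate.is_relative_to(base):  # Path.is_relative_to: Python 3.9+
        raise ValueError(f"instruction_file must resolve inside {base}")
    return candidate
```

With this guard, `resolve_instruction_file("../../etc/passwd", Path("/srv/strix/instructions"))` raises, while plain filenames under the allow-listed directory pass through. Resolving *before* the containment check is what defeats `..` traversal.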


Comment on lines +92 to +95
```python
@app.get("/api/v1/tasks/{task_id}", dependencies=[Depends(require_auth)])
async def get_task(task_id: str) -> dict[str, object]:
    result = _get_result_or_404(task_id)
    return result.model_dump(mode="python")
```

**P2: `GET /tasks/{id}` returns full `ScanTaskResult`, not just the task record**

`_get_result_or_404` returns a `ScanTaskResult` (containing `task`, `scan_state`, and `artifacts`), and the response is `result.model_dump(mode="python")`. That means this endpoint returns the same payload as `GET /api/v1/tasks/{task_id}/result`:

```python
# GET /api/v1/tasks/{task_id}         →  {"task": {...}, "scan_state": {...}, "artifacts": [...]}
# GET /api/v1/tasks/{task_id}/result  →  {"task": {...}, "scan_state": {...}, "artifacts": [...]}
```

Callers expecting `GET /tasks/{id}` to return a lightweight status object (like the records returned by `GET /tasks`) will be surprised by the heavier payload. A typical REST convention here is to return just the `ScanTaskRecord`. Returning the full result can also cause unnecessary I/O (globbing all artifact paths) on every status poll.

Consider returning only the task record at this endpoint:

```suggestion
    @app.get("/api/v1/tasks/{task_id}", dependencies=[Depends(require_auth)])
    async def get_task(task_id: str) -> dict[str, object]:
        record = _get_task_or_404(task_id)
        return {"task": record.model_dump(mode="python")}
```

Comment on lines +47 to +57
```python
max_concurrent_tasks = Config.get_int("api_max_concurrent_tasks") or 1
active_tasks = [
    task
    for task in self.list_tasks()
    if task.status in {TaskStatus.QUEUED, TaskStatus.RUNNING, TaskStatus.CANCELLING}
]
if len(active_tasks) >= max_concurrent_tasks:
    raise ValueError(
        "Maximum concurrent task limit reached. "
        f"Current limit: {max_concurrent_tasks}"
    )
```

**P2: Race condition in concurrent-task limit check**

`list_tasks()` plus the `len(active_tasks) >= max_concurrent_tasks` check is not atomic. If two requests arrive simultaneously, both can read `len(active_tasks) == 0` (with the default `max_concurrent_tasks=1`) and both will pass the guard, resulting in two tasks being started when only one should be allowed.

Consider using a threading lock or an asyncio lock around the check-and-create step, or using an in-process counter instead of the disk-based list.
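A minimal sketch of the lock-plus-counter approach. `TaskLimiter` is a hypothetical helper, and it assumes the check runs inside the FastAPI event loop; the real `create_task` would call `acquire_slot` before spawning the worker and `release_slot` when the task finishes or fails.

```python
import asyncio


class TaskLimiter:
    """Serialize the check-and-increment step so the limit cannot be raced."""

    def __init__(self, max_concurrent: int = 1) -> None:
        self._lock = asyncio.Lock()
        self._active = 0
        self.max_concurrent = max_concurrent

    async def acquire_slot(self) -> None:
        # The lock makes check + increment atomic across concurrent requests.
        async with self._lock:
            if self._active >= self.max_concurrent:
                raise ValueError(
                    "Maximum concurrent task limit reached. "
                    f"Current limit: {self.max_concurrent}"
                )
            self._active += 1

    async def release_slot(self) -> None:
        async with self._lock:
            self._active = max(0, self._active - 1)
```

An in-process counter only works for a single server process; with multiple API workers, the disk-based task list would need file locking instead.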


Comment on lines +97 to +101
```python
scan_state = self.load_scan_state(record.task_id)
if scan_state and (scan_state.get("run_metadata") or {}).get("status") == "completed":
    record.status = TaskStatus.COMPLETED
    record.finished_at = (scan_state.get("run_metadata") or {}).get("end_time")
    return self.save(record)
```

**P2: `finished_at` may be `None` for completed tasks**

When `refresh()` promotes a task to `COMPLETED` via the scan-state path, `finished_at` is read from `scan_state["run_metadata"]["end_time"]`, which can be absent:

```python
record.finished_at = (scan_state.get("run_metadata") or {}).get("end_time")  # → None if missing
```

A `ScanTaskRecord` with `status=COMPLETED` and `finished_at=None` will produce inconsistent API responses (e.g. the `task.finished` SSE event carries a record without a finish timestamp). The `ScanTaskRecord.normalize_record` validator also syncs `completed_at ↔ finished_at`, so both fields will end up `None`.

Consider falling back to the current time when `end_time` is unavailable:

```suggestion
            record.finished_at = (scan_state.get("run_metadata") or {}).get("end_time") or utc_now_iso()
```
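The `utc_now_iso()` helper assumed by that suggestion is not shown in this review; a plausible sketch of it, together with the fallback pattern:

```python
from datetime import datetime, timezone


def utc_now_iso() -> str:
    """Current time as an ISO-8601 UTC string (ends with '+00:00')."""
    return datetime.now(timezone.utc).isoformat()


# Fallback pattern from the suggested change, with end_time missing:
run_metadata: dict = {}
finished_at = run_metadata.get("end_time") or utc_now_iso()
```

Using `or` rather than a `None` check also covers an empty-string `end_time`, which is the behavior the suggestion implies.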

Comment on lines +57 to +60
```python
configure_runtime_context(
    sandbox_mode=False,
    caido_api_token=Config.get_str("caido_api_token"),
)
```

**P2: `sandbox_mode` hardcoded to `False`, ignoring config**

`configure_runtime_context` is called with `sandbox_mode=False` unconditionally, which means the `runtime.sandbox_mode` setting from the loaded config file is silently ignored for all API-triggered scans. If an operator sets `sandbox_mode: true` in their config expecting isolated execution, this worker will override that intent without any warning.

Consider reading the value from `Config.get_bool("strix_sandbox_mode")` so the worker respects the operator's configuration, consistent with how other settings are read from the config.
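A sketch of the suggested fix, using a minimal stand-in for the real `Config` layer and `configure_runtime_context` (the actual coercion rules of `Config.get_bool` are an assumption here):

```python
class Config:
    """Stand-in for the real config layer; values would come from
    ~/.strix/config.json in the actual implementation."""

    _values = {"strix_sandbox_mode": "true", "caido_api_token": None}

    @classmethod
    def get_bool(cls, key: str) -> bool:
        # Assumed coercion: common truthy spellings map to True, everything else False.
        return str(cls._values.get(key, "")).strip().lower() in {"1", "true", "yes"}

    @classmethod
    def get_str(cls, key: str):
        return cls._values.get(key)


def configure_runtime_context(sandbox_mode: bool, caido_api_token) -> dict:
    """Placeholder for the real strix runtime hook."""
    return {"sandbox_mode": sandbox_mode, "caido_api_token": caido_api_token}


# Respect the operator's config instead of hardcoding sandbox_mode=False:
context = configure_runtime_context(
    sandbox_mode=Config.get_bool("strix_sandbox_mode"),
    caido_api_token=Config.get_str("caido_api_token"),
)
```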

