Commit 0372115

Merge pull request #186 from longieirl/fix/sec-cve-2026-33845
fix(sec): suppress CVE-2026-33845, bump to v0.1.6, track CLAUDE.md
2 parents 9a7a949 + ec21ac3 commit 0372115

7 files changed

Lines changed: 217 additions & 9 deletions

.gitignore

Lines changed: 1 addition & 2 deletions
@@ -206,8 +206,7 @@ site/
 # ==============================================================================

 .claude/
-CLAUDE.md
-!resources/**/CLAUDE.md
+!CLAUDE.md
 .pr_template_content.md
 HANDOFF.md
 MEMORY.md

.trivyignore

Lines changed: 9 additions & 3 deletions
@@ -1,7 +1,13 @@
 # Trivy vulnerability ignore list
 # Format: CVE-YYYY-NNNNN [reason]

-# No ignored vulnerabilities - all critical and high severity issues resolved
-# Last review: 2026-03-02
-# Next review: 2026-04-01
+# CVE-2026-33845: libgnutls30t64 — GnuTLS DoS via DTLS zero-length fragment.
+# No fixed version available in Debian 13 as of 2026-05-08; apt-get upgrade cannot
+# resolve this. The application processes PDF files locally and never initiates or
+# handles DTLS traffic, so this code path is unreachable at runtime.
+# Re-evaluate when a Debian patch is released.
+CVE-2026-33845
+
+# Last review: 2026-05-08
+# Next review: 2026-06-08

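The `.trivyignore` change relies on a review date that lives only in a comment, so nothing enforces it automatically. The following is a minimal sketch of how a reminder check could work, assuming the `# Next review: YYYY-MM-DD` comment convention used in this file. The helper names `parse_trivyignore` and `review_overdue` are hypothetical and are not part of this repository or of Trivy.

```python
from __future__ import annotations

import re
from datetime import date

# Active entries are bare CVE IDs at the start of a line; comment lines start with "#".
CVE_RE = re.compile(r"^(CVE-\d{4}-\d+)\b")
# Assumption: the review date follows this repo's "# Next review: YYYY-MM-DD" comment style.
REVIEW_RE = re.compile(r"^#\s*Next review:\s*(\d{4}-\d{2}-\d{2})")


def parse_trivyignore(text: str) -> tuple[list[str], date | None]:
    """Return the suppressed CVE IDs and the next review date, if one is present."""
    cves: list[str] = []
    next_review: date | None = None
    for raw in text.splitlines():
        line = raw.strip()
        cve_match = CVE_RE.match(line)
        if cve_match:
            cves.append(cve_match.group(1))
            continue
        review_match = REVIEW_RE.match(line)
        if review_match:
            next_review = date.fromisoformat(review_match.group(1))
    return cves, next_review


def review_overdue(text: str, today: date) -> bool:
    """True when CVEs are suppressed and the review date is missing or in the past."""
    cves, next_review = parse_trivyignore(text)
    return bool(cves) and (next_review is None or today > next_review)
```

A check like this could run in a scheduled job and fail once the review date has passed, which would keep the suppression from outliving its justification.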

CHANGELOG.md

Lines changed: 7 additions & 0 deletions
@@ -9,6 +9,13 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ---

+## [0.1.6] — 2026-05-08
+
+### Security
+- **CVE-2026-33845** (`#184`) — `libgnutls30t64` (GnuTLS DoS via DTLS zero-length fragment). No fixed version is available in Debian 13; `apt-get upgrade -y` cannot resolve this. The application processes PDF files locally and never handles DTLS traffic, so the vulnerable code path is unreachable at runtime. Added to `.trivyignore` with justification and a 2026-06-08 review date. Will be removed once Debian ships a patched package.
+
+---
+
 ## [0.1.5] — 2026-05-04

 ### Security

CLAUDE.md

Lines changed: 196 additions & 0 deletions
@@ -0,0 +1,196 @@
+# CLAUDE.md — Bank Statements Processor
+
+## Repo Overview
+
+This is the **free-tier** open-source repo. The private `bankstatements-premium` repo holds the paid-tier Docker image published to GHCR. Do not conflate the two.
+
+- **Local Docker image name:** `bankstatementsprocessor` (built from `Dockerfile`)
+- **Production image:** `ghcr.io/longieirl/bankstatements-premium:latest` (private repo only)
+- Legitimate references to `ghcr.io/longieirl/bankstatements` belong only in `.env.remote`, `Makefile docker-push`, and `.github/workflows/`.
+
+Current version: **0.1.6**
+
+---
+
+## Package Layout
+
+```
+packages/
+  parser-core/ bankstatements-core (PyPI) — PDF extraction, services, templates
+  parser-free/ bankstatements-free (free-tier CLI) — thin wrapper around parser-core
+templates/ shared bank template JSON files
+custom_templates/ user-overridable templates
+```
+
+**Source of truth for Docker:** `packages/parser-core/` and `packages/parser-free/`.
+`src/` at the repo root is a mirror/symlink for local test running only — never edit it.
+
+Real source: `packages/parser-core/src/bankstatements_core/`
+
+### Module structure (`bankstatements_core`)
+
+```
+adapters/ pdfplumber adapter
+analysis/ bbox utils, column/table analysis, template generator
+builders/ ProcessorBuilder
+commands/ CLI commands (analyze-pdf, init)
+config/ AppConfig, ProcessorConfig, EnvironmentParser
+domain/ models, protocols, services, converters, currency
+extraction/ PDFExtractor, IBANExtractor, RowBuilder, WordUtils
+facades/ ProcessingFacade
+patterns/ factories, repositories, strategies
+services/ all business logic services
+templates/ bank JSON templates + detectors
+entitlements.py
+processor.py
+pdf_table_extractor.py # legacy shim — delegates to extraction/, treat as deprecated
+```
+
+---
+
+## Development Setup
+
+```bash
+pip install -e packages/parser-core[dev,test]
+pip install -e packages/parser-free[test]
+```
+
+---
+
+## Running Tests
+
+```bash
+# parser-core (run from repo root)
+pytest packages/parser-core/tests/ --cov=bankstatements_core --cov-fail-under=91
+
+# parser-free
+pytest packages/parser-free/tests/
+
+# integration (run from repo root)
+python -m pytest packages/parser-core/tests/integration/ -m integration --no-cov
+
+# re-baseline integration snapshot
+pytest packages/parser-core/tests/integration/ -m integration --snapshot-update --no-cov
+
+# parallel (faster)
+pytest packages/parser-core/tests/ -n auto
+```
+
+Tests default to `not integration` — run integration tests explicitly with `-m integration`.
+Coverage minimum: **91%** on `bankstatements-core`.
+
+---
+
+## Linting & Formatting
+
+Run these together before every push (CI checks all four):
+
+```bash
+black packages/parser-core/src packages/parser-core/tests
+isort packages/parser-core/src packages/parser-core/tests
+ruff check packages/parser-core/src packages/parser-core/tests
+mypy packages/parser-core/src
+```
+
+For `parser-free`, run isort **from within `packages/parser-free/`** — CI sort order differs from root.
+
+**Black gotcha:** Black collapses multi-line `raise`/`return` onto one line if it fits in 88 chars. Always write them as single lines:
+- `raise ValueError(f"...")` not a multi-line form
+- `raise TypeError(f"...")` not a multi-line form
+
+**Logging:** use `%`-formatting, not f-strings — enforced by ruff rule G004.
+
+---
+
+## Make Targets
+
+```bash
+make docker-local # build from source + run
+make docker-remote # pull production image + run
+make docker-build # build only
+make docker-integration # snapshot-based Docker integration test
+make docker-scan-trivy # trivy HIGH/CRITICAL scan
+make docker-secure-run # network-isolated (GDPR mode)
+```
+
+---
+
+## Version Bumping
+
+Three files must always match — CI compares them and fails on mismatch:
+
+1. `packages/parser-core/pyproject.toml`: `version = "x.y.z"`
+2. `packages/parser-core/src/bankstatements_core/__version__.py`
+3. `packages/parser-free/pyproject.toml`: `version = "x.y.z"`
+
+```bash
+make version-bump-patch # bump x.x.N
+make version-bump-minor # bump x.N.0
+make version-bump-major # bump N.0.0
+```
+
+---
+
+## Creating Pull Requests
+
+**Never push directly to `main`.** Always create a feature branch, push the branch, and open a PR. Branch protection requires CI to pass before merge.
+
+```bash
+git checkout -b <branch-name>
+git push -u origin <branch-name>
+```
+
+Always use `.github/PULL_REQUEST_TEMPLATE.md`. Pass `--assignee @me` on `gh pr create`; `gh pr edit` lacks the required token scope.
+
+```bash
+gh pr create --assignee @me --title "..." --body "$(cat <<'EOF'
+...populated template...
+EOF
+)"
+```
+
+---
+
+## Key Architecture Notes
+
+- `ExtractionResult.card_number: str | None`: `None` = bank statement, string = credit card (last-4 suffix)
+- `BankTemplate.column_aliases` — renames template keys to canonical column names; `RowPostProcessor._apply_column_aliases()` is the sole owner
+- `CCGroupingService` in `services/card_grouping.py` — groups CC results by last-4 card suffix
+- `processor.run()` splits on `card_number is None`: bank → `group_by_iban`, CC → `group_by_card`
+- `PDFProcessingOrchestrator.process_all_pdfs()` returns `tuple[list[ExtractionResult], int, int]`, i.e. `(results, pdf_count, pages_read)`
+- `ServiceRegistry.from_config(ProcessorConfig, Entitlements)` is the primary factory
+- Credit card support is **paid tier only** via `require_iban=False` in `Entitlements.paid_tier()`
+- Service layer uses `list[Transaction]` throughout — no dict round-trips internally; conversion at output boundary via `transactions_to_dicts()`
+- Architecture test (`test_architecture.py`) enforces module placement and bans circular imports
+
+---
+
+## CI Workflows
+
+| Workflow | File | Trigger |
+|---|---|---|
+| Main CI | `ci.yml` | push/PR to main |
+| Release (root) | `release.yml` | tag push |
+| Release (core) | `release-core.yml` | tag push |
+| Security scan | `security-scan.yml` | schedule + push |
+| Boundary check | `boundary-check.yml` | push/PR |
+| PR labeler | `pr-labeler.yml` | PR open/sync |
+
+CI enforces: ruff, black, mypy, pylint design gates (Xenon), bandit, pip-audit, trivy (0 critical), coverage ≥ 91%.
+
+**Security:** workflows use quoted shell variables and avoid `${{ github.* }}` interpolation directly in `run:` steps to prevent shell injection (hardened in PRs #168–#171). Production image runs `apt-get upgrade -y` on every build to pull latest Debian patches.
+
+---
+
+## Open Issues
+
+- **#59** — Docker integration CI job (blocked — needs fake PDFs; local tooling done in PR #70)
+
+---
+
+## Gitignored Files (never commit)
+
+- `HANDOFF.md`, `MEMORY.md`
+- `.env.local` (may contain tokens)
+- `logs/processing_activity.jsonl`
+- `input/`, `output/` contents
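Two of the notes in the new `CLAUDE.md` are easier to follow with a small example: the split on `card_number is None` and the `%`-style logging rule (ruff G004). The sketch below is illustrative only. The `ExtractionResult` dataclass is a hypothetical stand-in with just two fields, and `split_results` is an invented helper, not the real `processor.run()`.

```python
from __future__ import annotations

import logging
from dataclasses import dataclass

logger = logging.getLogger(__name__)


@dataclass
class ExtractionResult:
    """Hypothetical stand-in; the real class in bankstatements_core has more fields."""

    source: str
    card_number: str | None = None  # None = bank statement, str = last-4 card suffix


def split_results(
    results: list[ExtractionResult],
) -> tuple[list[ExtractionResult], list[ExtractionResult]]:
    """Partition results into (bank statements, credit-card statements)."""
    bank = [r for r in results if r.card_number is None]
    cards = [r for r in results if r.card_number is not None]
    # Arguments are passed separately so logging formats them lazily.
    # An f-string in this call is what ruff rule G004 flags.
    logger.info("bank=%d credit_card=%d", len(bank), len(cards))
    return bank, cards
```

In the described design, the bank list would then go to IBAN grouping and the card list to card grouping. Those services are not reproduced here.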

packages/parser-core/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "bankstatements-core"
-version = "0.1.5"
+version = "0.1.6"
 description = "Core PDF bank statement parsing library"
 readme = "README.md"
 requires-python = ">=3.11"

packages/parser-core/src/bankstatements_core/__version__.py

Lines changed: 2 additions & 2 deletions
@@ -2,5 +2,5 @@

 from __future__ import annotations

-__version__ = "0.1.5"
-__version_info__ = (0, 1, 5)
+__version__ = "0.1.6"
+__version_info__ = (0, 1, 6)

packages/parser-free/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "bankstatements-free"
-version = "0.1.5"
+version = "0.1.6"
 description = "Free-tier CLI for bankstatements-core PDF bank statement processor"
 readme = "README.md"
 requires-python = ">=3.11"
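This commit bumps the same version string in three files, and `CLAUDE.md` states that CI fails when they disagree. The actual CI check is not part of this diff, so the following is only a rough sketch of what such a comparison could look like. The function names and regular expressions are assumptions made for illustration.

```python
from __future__ import annotations

import re

# Matches a top-level `version = "x.y.z"` line in a pyproject.toml.
PYPROJECT_RE = re.compile(r'^version\s*=\s*"([^"]+)"', re.MULTILINE)
# Matches `__version__ = "x.y.z"`; it does not match `__version_info__`.
DUNDER_RE = re.compile(r'^__version__\s*=\s*"([^"]+)"', re.MULTILINE)


def extract_version(text: str, pattern: re.Pattern[str]) -> str:
    """Return the first version string the pattern finds, or raise if none is present."""
    match = pattern.search(text)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def versions_match(core_toml: str, version_py: str, free_toml: str) -> bool:
    """True when all three files declare exactly the same version."""
    found = {
        extract_version(core_toml, PYPROJECT_RE),
        extract_version(version_py, DUNDER_RE),
        extract_version(free_toml, PYPROJECT_RE),
    }
    return len(found) == 1
```

The functions take file contents rather than paths so the comparison can be exercised without touching the repository. A caller would read the three files listed in the Version Bumping section and pass their text in.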
