[Stack 19/27] Fix D8: match Clojure repful classification (rat > rdt)#2451
[Stack 19/27] Fix D8: match Clojure repful classification (rat > rdt)#2451jucor wants to merge 1 commit into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR simplifies the repful ("representative for agree or disagree") classification in finalize_cmt_stats to match Clojure's repness.clj:175-177 logic. The old 3-branch conditional (pa > 0.5 AND ra > 1.0 → agree, pd > 0.5 AND rd > 1.0 → disagree, fallback to higher metric) is replaced with the simpler rat > rdt → agree, else disagree.
Changes:
- Replaced both scalar (
finalize_cmt_stats) and vectorized (compute_group_comment_stats_df) repful classification withrat > rdtcomparison - Expanded D8 tests from 2 to 6 formula tests (including edge cases: equal, both negative, both zero), removed
xfailmarkers for now-passing tests - Re-recorded golden snapshots for affected datasets to reflect repful direction changes
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| delphi/polismath/pca_kmeans_rep/repness.py | Simplified repful classification in both scalar and vectorized paths to rat > rdt |
| delphi/tests/test_discrepancy_fixes.py | Added 4 new edge-case tests, removed xfail from D8 formula tests, updated xfail reason on blob test |
| delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md | Added PR 7 / Session 10 journal entry documenting the D8 fix |
| delphi/docs/PLAN_DISCREPANCY_FIXES.md | Marked D8 as DONE |
| delphi/real_data/r6vbnhffkxbd7ifmfbdrd-vw/golden_snapshot.json | Re-recorded snapshot with updated repness values and repful directions |
| delphi/real_data/r4tykwac8thvzv35jrn53-biodiversity/golden_snapshot.json | Re-recorded snapshot with updated repness values |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
| D8: Python uses if pa > 0.5 AND ra > 1.0 → 'agree'; elif pd > 0.5 AND rd > 1.0 → 'disagree' | ||
| Clojure uses simple rat > rdt → 'agree'; else → 'disagree' | ||
| Clojure uses simple rat > rdt → 'agree'; else → 'disagree' (repness.clj:175-177) |
1d18f1b to
45d6c60
Compare
db0fe14 to
2a625de
Compare
45d6c60 to
5b57f8a
Compare
39fc7ef to
c0684d4
Compare
5b57f8a to
7705349
Compare
c0684d4 to
42d9f25
Compare
Delphi Coverage Report
|
42d9f25 to
b68c7e5
Compare
1a0f157 to
a8428d5
Compare
b68c7e5 to
c2e521d
Compare
a8428d5 to
d9ed377
Compare
c2e521d to
b8a8e08
Compare
d9ed377 to
9f20b50
Compare
b8a8e08 to
0154ce7
Compare
9f20b50 to
65f136d
Compare
0154ce7 to
a313f1c
Compare
65f136d to
a0d8710
Compare
a313f1c to
6a2ac55
Compare
0297ca2 to
c0f1f0f
Compare
7c92111 to
f2c2965
Compare
4ebd5ab to
baceacd
Compare
f2c2965 to
e1392d1
Compare
74b31de to
c4f5811
Compare
799a9c4 to
9a1b3b3
Compare
c4f5811 to
2960412
Compare
9a1b3b3 to
cee0f53
Compare
44b04ae to
b24d69b
Compare
3511e00 to
e209f37
Compare
b24d69b to
abfbacb
Compare
e209f37 to
4df6e36
Compare
abfbacb to
3847f76
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| "n_participants_in_csv": 69, | ||
| "fixed_timestamp": 1700000000000, | ||
| "recorded_at": "2026-03-27T01:51:24.692321" | ||
| "recorded_at": "2026-03-27T01:51:39.156540" | ||
| }, |
There was a problem hiding this comment.
This snapshot update appears to only change recorded_at (and also math_tick elsewhere), which are timestamp-based and not used for regression comparisons (the comparer ignores math_tick and doesn’t compare metadata). If there are no substantive stage-output diffs, consider reverting to avoid noisy churn or making the recorder write stable values for these fields.
| "n_participants_in_csv": 536, | ||
| "fixed_timestamp": 1700000000000, | ||
| "recorded_at": "2026-03-27T01:51:23.001794" | ||
| "recorded_at": "2026-03-27T01:51:37.771402" | ||
| }, |
There was a problem hiding this comment.
This snapshot change is only updating recorded_at (and math_tick elsewhere), which are timestamp-based and not part of regression comparisons (math_tick is ignored and metadata isn’t compared). If there are no actual stage-output changes for this dataset, please revert these timestamp-only edits to keep diffs meaningful.
| - `test_discrepancy_fixes.py`: Expanded `TestD8FinalizeStats` from 2 to 7 tests (5 formula + | ||
| 1 blob xfail + edge cases for equal/negative/zero rat/rdt) |
There was a problem hiding this comment.
The journal entry says TestD8FinalizeStats expanded “from 2 to 7 tests”, but the PR description states 6 tests. Please reconcile the counts (and ideally list the exact test names) so the journal accurately reflects the change set.
| - `test_discrepancy_fixes.py`: Expanded `TestD8FinalizeStats` from 2 to 7 tests (5 formula + | |
| 1 blob xfail + edge cases for equal/negative/zero rat/rdt) | |
| - `test_discrepancy_fixes.py`: Expanded `TestD8FinalizeStats` to 6 tests covering the formula, | |
| the blob xfail, and edge cases for equal/negative/zero `rat`/`rdt` values |
3847f76 to
f7062f8
Compare
4df6e36 to
9bb9604
Compare
Documents D5-D8 review findings, blob injection tests, CI fixes, k-divergence discovery, stack reordering, and next steps. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
f7062f8 to
f7329b4
Compare
|
Superseded by spr-managed PR stack. See the new stack starting at #2508. |
Summary
Simplifies the repful ("representative for agree or disagree?") classification
to match Clojure's
finalize-cmt-stats(repness.clj:175-177).Before (Python): 3-branch conditional:
pa > 0.5 AND ra > 1.0→ agreepd > 0.5 AND rd > 1.0→ disagreeAfter (Clojure):
rat > rdt→ agree, else disagree.The old thresholds were redundant —
ratandrdt(two-proportion z-scores)already encode whether the group's agree/disagree rate is significantly higher
than other groups. The simple comparison is both correct and clearer.
Changes
repness.py:finalize_cmt_stats()— 3-branch logic →rat > rdtrepness.py: Vectorized —np.selectwith conditions →np.where(rat > rdt)test_discrepancy_fixes.py: Expanded from 2 to 6 tests (including edge cases:equal rat/rdt, both negative, both zero)
Test plan
🤖 Generated with Claude Code