Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)#2519
Open
jucor wants to merge 1 commit into
This was referenced Mar 30, 2026
cd39374 to
a387b9e
Compare
24de40d to
add1343
Compare
a387b9e to
956e3a8
Compare
ballPointPenguin
approved these changes
Apr 26, 2026
Contributor
Pull request overview
Aligns Delphi’s Python representativeness pipeline with the Clojure implementation by replacing the one-proportion z-test with Clojure’s Wilson-score-like (+1 pseudocount) prop_test(succ, n) formula, and updates callers/tests accordingly to keep Python↔Clojure parity work unblocked.
Changes:
- Replace `prop_test(p, n, p0)` with `prop_test(succ, n)` using Clojure's `2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)` formula, and update call sites to pass raw counts.
- Update vectorized proportion test to accept success counts and match the same formula (with `n=0` handled as "no data" → 0.0).
- Expand/adjust discrepancy + unit tests and update parity plan/journal docs to reflect D5 completion.
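A minimal sketch of the formula named in the list above, written directly from `2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)`. It is not the code in `repness.py`: the `_sketch` function names are hypothetical, and applying the `n=0` guard only in the vectorized variant is an assumption based on the wording of the overview.

```python
import math

import numpy as np


def prop_test_sketch(succ: float, n: float) -> float:
    """Clojure-style proportion test with a +1 pseudocount (illustrative only)."""
    return 2.0 * math.sqrt(n + 1.0) * ((succ + 1.0) / (n + 1.0) - 0.5)


def prop_test_vectorized_sketch(succ, n) -> np.ndarray:
    """Same formula over arrays of raw counts; n == 0 is treated as 'no data'."""
    succ = np.asarray(succ, dtype=float)
    n = np.asarray(n, dtype=float)
    # n + 1 is always >= 1, so the division and the square root are safe.
    z = 2.0 * np.sqrt(n + 1.0) * ((succ + 1.0) / (n + 1.0) - 0.5)
    # Assumption: groups with no votes yield 0.0 rather than the raw formula value.
    return np.where(n == 0, 0.0, z)


scores = prop_test_vectorized_sketch([70, 10, 0], [100, 50, 0])
```

Both functions take raw counts rather than a precomputed proportion, which is the signature change the overview describes.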
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `delphi/polismath/pca_kmeans_rep/repness.py` | Replaces scalar/vectorized `prop_test` with the Clojure formula and updates internal call sites to pass raw counts. |
| `delphi/tests/test_discrepancy_fixes.py` | Removes D5 xfail, adds multiple formula test cases + a blob-injection check against Clojure blob p-test values. |
| `delphi/tests/test_repness_unit.py` | Updates unit tests to use the new `prop_test(succ, n)` / vectorized signature and expected formula outputs. |
| `delphi/tests/test_old_format_repness.py` | Updates old-format wrapper tests for the new `prop_test` signature/formula. |
| `delphi/docs/PLAN_DISCREPANCY_FIXES.md` | Marks D5 as DONE and adds a PR cross-reference entry. |
| `delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md` | Adds/updates the D5 TDD journal entry describing the change and its rationale. |
| Returns: | ||
| Z-score | ||
| Z-score (positive means succ/n > 0.5) |
Comment on lines +48 to +51
| # 70 successes out of 100: 2*sqrt(101)*((71/101)-0.5) = ~4.08 | ||
| assert np.isclose(prop_test(70, 100), | ||
| 2 * math.sqrt(101) * (71/101 - 0.5), atol=0.01) | ||
| # 10 successes out of 50: 2*sqrt(51)*((11/51)-0.5) = ~-4.06 |
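The expected values in these assertions can be checked by hand, because the formula simplifies algebraically to `(2*succ - n + 1) / sqrt(n+1)`. The sketch below evaluates both forms for the two input pairs shown; the helper names `clojure_style_z` and `simplified_z` are hypothetical and do not come from the PR.

```python
import math


def clojure_style_z(succ: int, n: int) -> float:
    """Formula as quoted in the PR: 2*sqrt(n+1)*((succ+1)/(n+1) - 0.5)."""
    return 2 * math.sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)


def simplified_z(succ: int, n: int) -> float:
    """Algebraically equivalent form: (2*succ - n + 1) / sqrt(n+1)."""
    return (2 * succ - n + 1) / math.sqrt(n + 1)


for succ, n in [(70, 100), (10, 50)]:
    print(succ, n, round(clojure_style_z(succ, n), 2), round(simplified_z(succ, n), 2))
# prints: 70 100 4.08 4.08
#         10 50 -4.06 -4.06
```

For 70 out of 100 this is 41/sqrt(101), about 4.08, and for 10 out of 50 it is -29/sqrt(51), about -4.06.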
…eudocount)

## Summary

Replace Python's standard one-proportion z-test `prop_test(p, n, p0)` with Clojure's Wilson-score-like formula `prop_test(succ, n)` from `stats.clj:10-15`:

```
2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)
```

The Clojure formula has a built-in +1 pseudocount (Laplace smoothing / Beta(1,1) prior) that regularizes extreme values for small Polis groups. This is separate from the `PSEUDO_COUNT=2.0` used for `pa`/`pd` estimation (Beta(2,2) prior):

- `pa = (na + 1) / (ns + 2)`: Beta(2,2) prior for probability estimation
- `pat = 2 * sqrt(ns+1) * ((na+1)/(ns+1) - 0.5)`: Beta(1,1) prior for significance testing

**What changed in the output**: `pat`, `pdt` values (proportion test z-scores), and downstream `agree_metric` / `disagree_metric` values. The z-scores are now slightly different due to `sqrt(n+1)` vs `sqrt(n)` and `(succ+1)/(n+1)` vs `(na+1)/(n+2)` denominators.

## Changes

- `repness.py`: `prop_test(p, n, p0)` → `prop_test(succ, n)` with Clojure formula
- `repness.py`: `prop_test_vectorized(p, n, p0)` → `prop_test_vectorized(succ, n)`
- `repness.py`: Callers updated to pass raw counts `(na, ns)` instead of `(pa, ns, 0.5)`
- `test_discrepancy_fixes.py`: Removed xfail from D5 formula test, added 8 test cases + edge case
- `test_repness_unit.py`, `test_old_format_repness.py`: Updated for new signature
- Golden snapshots re-recorded for all datasets

## Test plan

- [x] D5 formula tests pass (8 input pairs + edge cases)
- [x] D5 Clojure blob consistency check passes (all datasets)
- [x] Full test suite passes (public + private, 19/19 regression tests)
- [x] Only pre-existing failure: pakistan-incremental D2 (unrelated)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

## Squashed commits

- RED: add D5 blob injection test (prop_test vs Clojure p-test values)
- Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)
- Update plan and journal: mark D5 as done
- Plan: add D5 PR number and stack position to cross-reference

commit-id:48b77ba3
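To illustrate how the two formulas can diverge for a small group, here is a hedged sketch. It assumes the previous `prop_test(p, n, p0)` was the textbook one-proportion z-test `(p - p0) / sqrt(p0 * (1 - p0) / n)`, which the description implies but does not show; `textbook_z` and `clojure_style_z` are hypothetical names, and the counts `na = 3`, `ns = 3` are made up for the example.

```python
import math


def textbook_z(p: float, n: float, p0: float = 0.5) -> float:
    """Textbook one-proportion z-test (assumed shape of the old prop_test)."""
    return (p - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def clojure_style_z(succ: float, n: float) -> float:
    """Formula quoted in the PR, taking raw counts instead of a proportion."""
    return 2.0 * math.sqrt(n + 1.0) * ((succ + 1.0) / (n + 1.0) - 0.5)


# Hypothetical small group: 3 agree votes out of 3 votes seen.
na, ns = 3, 3
pa = (na + 1) / (ns + 2)           # smoothed probability estimate, as in the PR
old_pat = textbook_z(pa, ns)       # old call shape: (pa, ns, 0.5)
new_pat = clojure_style_z(na, ns)  # new call shape: (na, ns)

print(f"pa={pa:.2f} old_pat={old_pat:.3f} new_pat={new_pat:.3f}")
```

Under these assumptions the old path yields roughly 1.04 and the new one exactly 2.0, so the gap can be noticeable at very small `ns`; it shrinks as `ns` grows.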
Delphi Coverage Report