[Stack 16/27] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) by jucor · Pull Request #2448 · compdemocracy/polis

jucor · 2026-03-16T15:30:44Z

Summary

Stacked on #2446 (Fix D9: z-score thresholds from two-tailed to one-tailed). Please review and merge #2446 first.
Next in stack: #2449 (Fix D6: match Clojure two-proportion test formula (+1 pseudocount))

Replace Python's standard one-proportion z-test prop_test(p, n, p0) with
Clojure's Wilson-score-like formula prop_test(succ, n) from stats.clj:10-15:

2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)

The Clojure formula has a built-in +1 pseudocount (Laplace smoothing / Beta(1,1)
prior) that regularizes extreme values for small Polis groups. This is separate
from the PSEUDO_COUNT=2.0 used for pa/pd estimation (Beta(2,2) prior):

pa = (na + 1) / (ns + 2) — Beta(2,2) prior for probability estimation
pat = 2 * sqrt(ns+1) * ((na+1)/(ns+1) - 0.5) — Beta(1,1) prior for significance testing

What changed in the output: pat, pdt values (proportion test z-scores),
and downstream agree_metric / disagree_metric values. The z-scores are
now slightly different for two reasons: the scale factor is sqrt(n+1)
instead of sqrt(n), and the tested proportion is (succ+1)/(n+1) instead
of the smoothed estimate (na+1)/(ns+2).
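
As a rough illustration of the two formulas above, here is a minimal sketch that puts the Clojure-style test next to a textbook one-proportion z-test. The name `old_prop_test` and its exact form are assumptions made for comparison only (the pre-change code in repness.py is not reproduced here), and `pa_estimate` is a hypothetical helper for the (na + 1) / (ns + 2) estimate.

```python
import math


def prop_test(succ: float, n: float) -> float:
    """Clojure-style proportion test with a built-in +1 pseudocount."""
    return 2 * math.sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)


def pa_estimate(na: float, ns: float) -> float:
    """Smoothed agree probability, (na + 1) / (ns + 2) (hypothetical helper)."""
    return (na + 1) / (ns + 2)


def old_prop_test(p: float, n: float, p0: float = 0.5) -> float:
    """Textbook one-proportion z-test (assumed form of the old function)."""
    return (p - p0) / math.sqrt(p0 * (1 - p0) / n)


# Small group: 7 agrees out of 10 votes seen.
na, ns = 7, 10
z_old = old_prop_test(pa_estimate(na, ns), ns)  # old call style: (pa, ns, 0.5)
z_new = prop_test(na, ns)                       # new call style: raw counts
print(f"old={z_old:.4f} new={z_new:.4f}")
```

Note that the new formula stays finite at n = 0, whereas the textbook form sketched here divides by zero when n = 0.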

Changes

repness.py: prop_test(p, n, p0) → prop_test(succ, n) with Clojure formula
repness.py: prop_test_vectorized(p, n, p0) → prop_test_vectorized(succ, n)
repness.py: Callers updated to pass raw counts (na, ns) instead of (pa, ns, 0.5)
test_discrepancy_fixes.py: Removed xfail from D5 formula test, added 8 test cases + edge case
test_repness_unit.py, test_old_format_repness.py: Updated for new signature
Golden snapshots re-recorded for all datasets
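
For readers who want to see the shape of the vectorized change, a minimal NumPy sketch follows. It assumes `prop_test_vectorized` simply applies the same formula element-wise; the real function in repness.py may handle dtypes, NaNs, or pandas inputs differently.

```python
import numpy as np


def prop_test_vectorized(succ, n):
    """Element-wise 2 * sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)."""
    succ = np.asarray(succ, dtype=float)
    n = np.asarray(n, dtype=float)
    return 2.0 * np.sqrt(n + 1.0) * ((succ + 1.0) / (n + 1.0) - 0.5)


# Raw counts are passed directly, mirroring the (na, ns) call style above.
na = np.array([70, 10, 0])
ns = np.array([100, 50, 0])
pat = prop_test_vectorized(na, ns)
```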

Test plan

D5 formula tests pass (8 input pairs + edge cases)
D5 Clojure blob consistency check passes (all datasets)
Full test suite passes (public + private, 19/19 regression tests)
Only pre-existing failure: pakistan-incremental D2 (unrelated)

🤖 Generated with Claude Code

Copilot

Pull request overview

Updates Delphi’s representativeness (“repness”) proportion-test implementation to match the Clojure reference, aligning downstream pat/pdt (and derived agree/disagree metrics) with the parity plan.

Changes:

Replaced the Python one-proportion z-test with Clojure’s Wilson-score-like prop_test(succ, n) formula (with +1 pseudocount) and updated vectorized equivalent.
Updated repness callers to pass raw counts (na/nd, ns) rather than smoothed proportions.
Refreshed and expanded tests (including removing the D5 xfail) and updated docs/journal plan status.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 2 comments.

File	Description
`delphi/polismath/pca_kmeans_rep/repness.py`	Implements new `prop_test(succ, n)` + vectorized formula and updates callers to use raw counts.
`delphi/tests/test_discrepancy_fixes.py`	Enables D5 parity assertions (removes xfail) and expands coverage across multiple cases/edges.
`delphi/tests/test_repness_unit.py`	Updates unit + vectorized tests for the new prop-test signature/formula.
`delphi/tests/test_old_format_repness.py`	Updates backwards-compatible interface tests for the new prop-test signature.
`delphi/docs/PLAN_DISCREPANCY_FIXES.md`	Marks D5 / PR 4 as DONE in the discrepancy plan table.
`delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md`	Adds PR4 journal entry describing the D5 change, rationale, and validation steps.

+
    Returns:
-        Z-score
+        Z-score (positive means succ/n > 0.5)


+        # 70 successes out of 100: 2*sqrt(101)*((71/101)-0.5) = ~4.08
+        assert np.isclose(prop_test(70, 100),
+                          2 * math.sqrt(101) * (71/101 - 0.5), atol=0.01)
+        # 10 successes out of 50: 2*sqrt(51)*((11/51)-0.5) = ~-4.06


github-actions · 2026-03-30T13:26:28Z

Delphi Coverage Report

File	Stmts	Miss	Cover
__init__.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/__init__.py	1	0	100%
components/config.py	165	133	19%
conversation/__init__.py	2	0	100%
conversation/conversation.py	1107	320	71%
conversation/manager.py	131	42	68%
database/__init__.py	1	0	100%
database/dynamodb.py	387	234	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/__init__.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	297	38	87%
regression/__init__.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	135	27	80%
regression/recorder.py	36	27	25%
regression/utils.py	138	87	37%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	53	53%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/__init__.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/__init__.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/__init__.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/__init__.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/__init__.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	518	11%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10770	7613	29%

Copilot

Pull request overview

Copilot reviewed 6 out of 8 changed files in this pull request and generated 1 comment.

Copilot · 2026-03-30T16:29:39Z

+
    Returns:
-        Z-score
+        Z-score (positive means succ/n > 0.5)


The return-value description is slightly inaccurate with the +1 pseudocount: the sign is determined by (succ+1)/(n+1) relative to 0.5 (so e.g. succ==n/2 yields a positive value for n>0). Consider rewording to avoid implying it’s based on the raw succ/n proportion.

Suggested change

-        Z-score (positive means succ/n > 0.5)
+        Z-score (sign determined by (succ + 1) / (n + 1) relative to 0.5; positive when (succ + 1) / (n + 1) > 0.5)
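
To make the sign behaviour described in this comment concrete, here is a small sketch. It assumes `prop_test` is exactly the formula from the PR description and shows that an even split (succ == n/2) lands on the positive side, while the zero crossing sits at succ == (n - 1) / 2.

```python
import math


def prop_test(succ: float, n: float) -> float:
    """Formula from the PR description: 2 * sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)."""
    return 2 * math.sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)


even_split = prop_test(5, 10)    # raw proportion succ/n is exactly 0.5
just_below = prop_test(4, 10)    # raw proportion succ/n is 0.4
zero_point = prop_test(4.5, 10)  # (succ + 1) / (n + 1) is exactly 0.5

print(even_split > 0, just_below < 0, zero_point)
```

Algebraically, succ == n/2 gives 1 / sqrt(n + 1), which is positive for every n >= 0, so a docstring phrased in terms of succ/n > 0.5 would mislabel that case.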

Extracts n-success and n-trials from every repness entry in the Clojure math blob and feeds them to Python's prop_test(). Compares output to the blob's p-test value — the ground truth oracle. Fails because Python's current prop_test uses the old formula (standard z-test) which produces different values than Clojure's Wilson-score-like formula with built-in +1 pseudocount. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
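
The consistency check described in this commit message could look roughly like the sketch below. The field names n-success, n-trials and p-test come from the message itself; the surrounding blob layout (a "repness" mapping from group id to a list of entries) and the function name `find_prop_test_mismatches` are assumptions for illustration, not the actual test code.

```python
import math


def prop_test(succ: float, n: float) -> float:
    """Formula from the PR description."""
    return 2 * math.sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)


def find_prop_test_mismatches(blob: dict, abs_tol: float = 1e-6) -> list:
    """Return (group_id, entry, actual) for entries whose p-test disagrees."""
    mismatches = []
    # Assumed layout: blob["repness"] maps group id -> list of entry dicts.
    for group_id, entries in blob["repness"].items():
        for entry in entries:
            actual = prop_test(entry["n-success"], entry["n-trials"])
            if not math.isclose(actual, entry["p-test"], abs_tol=abs_tol):
                mismatches.append((group_id, entry, actual))
    return mismatches


# Toy blob whose p-test values follow the Clojure formula in closed form.
toy_blob = {
    "repness": {
        "0": [{"n-success": 70, "n-trials": 100, "p-test": 41 / math.sqrt(101)}],
        "1": [{"n-success": 10, "n-trials": 50, "p-test": -29 / math.sqrt(51)}],
    }
}
mismatches = find_prop_test_mismatches(toy_blob)
```

An empty result means every entry agrees with the oracle; any non-empty result points at the specific group and entry that diverged.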

…eudocount)

Replace Python's standard z-test prop_test(p, n, p0) with Clojure's formula prop_test(succ, n) = 2*sqrt(n+1)*((succ+1)/(n+1) - 0.5).

The Clojure formula (stats.clj:10-15) has a built-in +1 pseudocount (Laplace smoothing / Beta(1,1) prior) that regularizes extreme values for small Polis groups. This is separate from the PSEUDO_COUNT=2.0 used for pa/pd estimation (Beta(2,2) prior).

Changes:
- prop_test: signature (p, n, p0) → (succ, n), Clojure formula
- prop_test_vectorized: same signature change
- comment_stats / compute_group_comment_stats_df: pass raw counts (na, ns) / (nd, ns) instead of (pa, ns, 0.5)
- Tests updated for new signature and expected values
- Golden snapshots re-recorded (pat/pdt/metric values changed)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jucor · 2026-03-30T22:54:41Z

Superseded by spr-managed PR stack. See the new stack starting at #2508.

jucor requested a review from Copilot March 16, 2026 15:31

Copilot started reviewing on behalf of jucor March 16, 2026 15:31

Copilot AI reviewed Mar 16, 2026

jucor changed the title ~~Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 14/15] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 16, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from db36889 to 69350d5 Compare March 16, 2026 16:04

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 25a1a19 to 3232cc6 Compare March 16, 2026 16:05

jucor changed the title ~~[Stack 14/15] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 14/16] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 16, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from 69350d5 to 382de2f Compare March 16, 2026 18:06

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 3232cc6 to 82f1048 Compare March 16, 2026 18:06

jucor changed the title ~~[Stack 14/16] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 14/17] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 16, 2026

jucor marked this pull request as draft March 17, 2026 10:35

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 82f1048 to 1dbd17f Compare March 17, 2026 16:10

jucor changed the title ~~[Stack 14/17] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 14/24] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 17, 2026

jucor changed the title ~~[Stack 14/24] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 14/25] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 17, 2026

This was referenced Mar 17, 2026

[Stack 15/27] Fix D9: z-score thresholds from two-tailed to one-tailed #2446

Closed

[Stack 17/27] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) #2449

Closed

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 6cb475f to fe09dd8 Compare March 18, 2026 19:06

jucor force-pushed the jc/clj-parity-d9-fix branch from f8a7007 to 19cce44 Compare March 19, 2026 10:03

jucor force-pushed the jc/clj-parity-d5-prop-test branch from fe09dd8 to fe2b127 Compare March 19, 2026 10:04

jucor force-pushed the jc/clj-parity-d9-fix branch from 19cce44 to bf2dd99 Compare March 19, 2026 10:43

jucor force-pushed the jc/clj-parity-d5-prop-test branch from fe2b127 to 0a3752c Compare March 19, 2026 10:43

jucor changed the title ~~[Stack 14/25] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 13/24] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 19, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from bf2dd99 to f8c5793 Compare March 19, 2026 12:31

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 0a3752c to a511b52 Compare March 19, 2026 12:31

jucor force-pushed the jc/clj-parity-d9-fix branch from f8c5793 to 7f733c1 Compare March 19, 2026 14:52

jucor force-pushed the jc/clj-parity-d5-prop-test branch from a511b52 to 4b6c485 Compare March 19, 2026 14:52

jucor changed the title ~~[Stack 13/24] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 14/25] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 19, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from 7f733c1 to c920b61 Compare March 23, 2026 15:11

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 4b6c485 to 140f87f Compare March 23, 2026 15:13

jucor force-pushed the jc/clj-parity-d9-fix branch from c920b61 to e538293 Compare March 23, 2026 15:41

jucor force-pushed the jc/clj-parity-d9-fix branch 2 times, most recently from 34fc9ce to ee798a6 Compare March 27, 2026 01:15

jucor force-pushed the jc/clj-parity-d5-prop-test branch 2 times, most recently from 6e59c6c to c8a91ac Compare March 27, 2026 01:53

jucor force-pushed the jc/clj-parity-d9-fix branch from ee798a6 to 6e54a9c Compare March 27, 2026 01:53

jucor force-pushed the jc/clj-parity-d5-prop-test branch from c8a91ac to f41dfb8 Compare March 27, 2026 02:10

jucor force-pushed the jc/clj-parity-d9-fix branch from 09747ea to 8d94246 Compare March 27, 2026 10:41

jucor force-pushed the jc/clj-parity-d5-prop-test branch from f41dfb8 to 3526ab6 Compare March 27, 2026 10:41

jucor changed the title ~~[Stack 14/25] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 15/26] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 30, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from 8d94246 to 9397ddf Compare March 30, 2026 12:48

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 3526ab6 to 27da50e Compare March 30, 2026 12:48

jucor changed the title ~~[Stack 15/26] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)~~ [Stack 16/27] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) Mar 30, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from 9397ddf to e96a1f7 Compare March 30, 2026 12:54

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 27da50e to 7d16f0e Compare March 30, 2026 12:54

jucor requested a review from Copilot March 30, 2026 16:25

Copilot started reviewing on behalf of jucor March 30, 2026 16:26

Copilot AI reviewed Mar 30, 2026

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 7d16f0e to 618a693 Compare March 30, 2026 16:49

jucor force-pushed the jc/clj-parity-d9-fix branch from e96a1f7 to 574c169 Compare March 30, 2026 16:49

jucor and others added 4 commits March 30, 2026 18:04

Update plan and journal: mark D5 as done

15d994b

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Plan: add D5 PR number and stack position to cross-reference

de83253

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 618a693 to de83253 Compare March 30, 2026 17:05

jucor force-pushed the jc/clj-parity-d9-fix branch from 574c169 to b64cae8 Compare March 30, 2026 17:05

This was referenced Mar 30, 2026

IGNORE -- crash from spr #2500

Closed

IGNORE -- crash from spr #2502

Closed

jucor closed this Mar 30, 2026

jucor wants to merge 4 commits into jc/clj-parity-d9-fix from jc/clj-parity-d5-prop-test
