[Stack 17/27] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) by jucor · Pull Request #2449 · compdemocracy/polis

jucor · 2026-03-16T15:48:34Z

Summary

Stacked on #2448 (Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)). Please review and merge #2448 first.
Next in stack: #2450 (Fix D7: match Clojure repness metric formula (product of 4 signed values))

The Python two_prop_test used a standard two-proportion z-test with no pseudocounts,
while Clojure's stats/two-prop-test (stats.clj:18-33) adds +1 to all four inputs
(succ-in, succ-out, pop-in, pop-out) via (map inc ...) before computing
the pooled z-test. This Laplace smoothing regularizes z-scores for small group sizes,
which are common in Polis conversations.
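To make the small-group effect concrete, here is a minimal sketch that compares a textbook pooled z-test with the +1 variant described above. The function names `z_standard` and `z_smoothed` and the example counts are illustrative assumptions, not code from this PR.

```python
import math


def z_standard(succ_in, succ_out, pop_in, pop_out):
    """Textbook pooled two-proportion z-test on raw counts (no smoothing)."""
    pi1 = succ_in / pop_in
    pi2 = succ_out / pop_out
    pi_hat = (succ_in + succ_out) / (pop_in + pop_out)
    se = math.sqrt(pi_hat * (1 - pi_hat) * (1 / pop_in + 1 / pop_out))
    return (pi1 - pi2) / se


def z_smoothed(succ_in, succ_out, pop_in, pop_out):
    """Same test after adding 1 to all four counts (mirrors Clojure's `map inc`)."""
    s1, s2, p1, p2 = (x + 1 for x in (succ_in, succ_out, pop_in, pop_out))
    pi_hat = (s1 + s2) / (p1 + p2)
    if pi_hat == 1.0:
        return 0.0  # degenerate case, se would be 0
    se = math.sqrt(pi_hat * (1 - pi_hat) * (1 / p1 + 1 / p2))
    return (s1 / p1 - s2 / p2) / se


# Hypothetical counts: 1 of 3 in-group agrees vs. 50 of 100 others,
# then the same proportions at 100x the sample size.
small = (1, 50, 3, 100)
large = (100, 5000, 300, 10000)

for label, counts in (("small", small), ("large", large)):
    print(label, z_standard(*counts), z_smoothed(*counts))
```

In this sketch the two versions nearly agree on the large sample, while on the 3-person group the pseudocounts change the z-score substantially.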

Changes

Signature change: two_prop_test(p1, n1, p2, n2) (proportions) → two_prop_test(succ_in, succ_out, pop_in, pop_out) (raw counts)
Formula: standard pooled z-test on pseudocount-adjusted values. With s1 = succ_in+1, s2 = succ_out+1, p1 = pop_in+1, p2 = pop_out+1: pi1 = s1/p1, pi2 = s2/p2, pi_hat = (s1+s2)/(p1+p2)
Callers updated: both the scalar caller (add_comparative_stats) and the vectorized caller (compute_group_comment_stats_df) now pass raw counts, matching Clojure's (stats/two-prop-test (:na in-stats) (sum :na rest-stats) (:ns in-stats) (sum :ns rest-stats)) (repness.clj:97-100)
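The caller-side change can be sketched as follows: for one comment, the in-group counts come from the group itself and the out-group counts are summed over every other group, mirroring the Clojure call quoted above. The helper name `counts_for_group`, the dict layout, and the `"nd"` key are assumptions made for illustration; only `na` and `ns` appear in the source.

```python
def counts_for_group(group_stats, gid, success_key="na"):
    """Build (succ_in, succ_out, pop_in, pop_out) for one group and one comment.

    `group_stats` is a hypothetical mapping {group_id: {"na": ..., "nd": ..., "ns": ...}}
    where "na" is the agree count and "ns" the number of votes seen.
    """
    in_stats = group_stats[gid]
    rest = [stats for g, stats in group_stats.items() if g != gid]

    succ_in = in_stats[success_key]
    succ_out = sum(stats[success_key] for stats in rest)  # (sum :na rest-stats)
    pop_in = in_stats["ns"]
    pop_out = sum(stats["ns"] for stats in rest)          # (sum :ns rest-stats)
    return succ_in, succ_out, pop_in, pop_out


# Made-up per-group counts for a single comment.
stats = {
    0: {"na": 4, "nd": 1, "ns": 5},
    1: {"na": 2, "nd": 6, "ns": 8},
    2: {"na": 1, "nd": 1, "ns": 2},
}
agree_counts = counts_for_group(stats, 0)
disagree_counts = counts_for_group(stats, 0, success_key="nd")
```

The resulting tuple is in the same argument order as the new two_prop_test signature, so it can be unpacked straight into the call.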

Affected output fields

rat (agree representativeness test z-score)
rdt (disagree representativeness test z-score)
agree_metric, disagree_metric (downstream of rat/rdt)

Test plan

Targeted D6 tests pass (formula, edge cases, regularization effect)
Full test suite passes (excluding DynamoDB/MinIO tests)
Private dataset tests pass (--include-local)
Golden snapshots re-recorded for all 7 datasets

🤖 Generated with Claude Code

Copilot

Pull request overview

Aligns Delphi’s Python representativeness scoring with the Clojure implementation for discrepancy D6 by changing the two-proportion z-test to use Laplace-style +1 pseudocounts on raw count inputs.

Changes:

Changed two_prop_test / two_prop_test_vectorized to accept raw counts and apply +1 pseudocount to all four inputs (Clojure parity).
Updated key call sites (add_comparative_stats, compute_group_comment_stats_df) to pass counts instead of proportions.
Updated/expanded unit and discrepancy tests and re-recorded golden snapshots reflecting new rat/rdt-derived outputs.

Reviewed changes

Copilot reviewed 6 out of 8 changed files in this pull request and generated 3 comments.

File	Description
`delphi/polismath/pca_kmeans_rep/repness.py`	Reworks scalar + vectorized two-prop test API/formula and updates callers to pass raw counts.
`delphi/tests/test_repness_unit.py`	Updates unit tests for the new two-prop test signature and expected values.
`delphi/tests/test_old_format_repness.py`	Updates old-format compatibility tests for new two-prop test signature.
`delphi/tests/test_discrepancy_fixes.py`	Rewrites D6 parity tests with a reference implementation and adds additional coverage.
`delphi/real_data/r4tykwac8thvzv35jrn53-biodiversity/golden_snapshot.json`	Refreshes golden snapshot outputs impacted by the new z-score computation.
`delphi/docs/PLAN_DISCREPANCY_FIXES.md`	Marks D6 as completed and adds PR mapping entry.
`delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md`	Documents D6 work, rationale, and test outcomes.

+    p1 = pop_in + 1
+    p2 = pop_out + 1
+
+    pi1 = s1 / p1
+    pi2 = s2 / p2
+    pi_hat = (s1 + s2) / (p1 + p2)
+
+    if pi_hat == 1.0:
+        # Clojure note (stats.clj:26-27): "this isn't quite right... could
+        # actually solve this using limits" — returning 0 for now, matching Clojure.
+        return 0.0
+
+    se = math.sqrt(pi_hat * (1 - pi_hat) * (1/p1 + 1/p2))


+    # Add +1 pseudocount to all four inputs (Clojure: map inc)
+    s1 = succ_in + 1
+    s2 = succ_out + 1
+    p1 = pop_in + 1
+    p2 = pop_out + 1

-    # Standard error
-    se = np.sqrt(p_pooled * (1 - p_pooled) * (1/n1 + 1/n2))
+    pi1 = s1 / p1
+    pi2 = s2 / p2
+    pi_hat = (s1 + s2) / (p1 + p2)

-    # Z-score calculation
-    z = (p1 - p2) / se
+    se = np.sqrt(pi_hat * (1 - pi_hat) * (1/p1 + 1/p2))


+        # With small n, the +1 pseudocount has a large effect
+        # succ=1, pop=1 → without pseudocount: p=1.0 (extreme)
+        # With pseudocount: (1+1)/(1+1) = 1.0, but denominator also shifts


github-actions · 2026-03-30T13:56:05Z

Delphi Coverage Report

File	Stmts	Miss	Cover
init.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/init.py	1	0	100%
components/config.py	165	133	19%
conversation/init.py	2	0	100%
conversation/conversation.py	1107	320	71%
conversation/manager.py	131	42	68%
database/init.py	1	0	100%
database/dynamodb.py	387	234	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/init.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	312	34	89%
regression/init.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	135	27	80%
regression/recorder.py	36	27	25%
regression/utils.py	138	87	37%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	53	53%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/init.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/init.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/init.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/init.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/init.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	518	11%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10785	7609	29%

Copilot

Pull request overview

Copilot reviewed 6 out of 8 changed files in this pull request and generated 4 comments.

Copilot · 2026-03-30T16:29:35Z

+    if pop_in == 0 or pop_out == 0:
        return 0.0


two_prop_test() claims to match Clojure's stats/two-prop-test, but it returns 0.0 when pop_in==0 or pop_out==0. Clojure does not special-case zero populations; it increments pop-in/pop-out (so division-by-zero is avoided) and can yield a non-zero z-score when one side has no votes. Either remove this early return for true parity, or update the docstring/expected behavior to explicitly document this intentional deviation (and consider validating succ_* <= pop_* instead).
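The divergence described in this comment can be illustrated with a small sketch. The function `two_prop_test_sketch` and its `guard_zero_pop` flag are hypothetical and exist only to show both behaviours side by side; this is not the PR's implementation.

```python
import math


def two_prop_test_sketch(succ_in, succ_out, pop_in, pop_out, guard_zero_pop=False):
    """Pooled z-test with +1 pseudocounts; optionally short-circuit on empty groups."""
    if guard_zero_pop and (pop_in == 0 or pop_out == 0):
        return 0.0  # the early return the review comment questions

    # After the +1 adjustment every denominator is at least 1,
    # so an empty group cannot cause a division by zero.
    s1, s2, p1, p2 = (x + 1 for x in (succ_in, succ_out, pop_in, pop_out))
    pi_hat = (s1 + s2) / (p1 + p2)
    if pi_hat == 1.0:
        return 0.0
    se = math.sqrt(pi_hat * (1 - pi_hat) * (1 / p1 + 1 / p2))
    return (s1 / p1 - s2 / p2) / se


# Empty in-group (0 votes seen) against 5 agrees out of 100 in the other groups.
z_parity = two_prop_test_sketch(0, 5, 0, 100)
z_guarded = two_prop_test_sketch(0, 5, 0, 100, guard_zero_pop=True)
print(z_parity, z_guarded)
```

Without the guard the smoothed statistic is well defined and non-zero for an empty group, which is the behaviour the comment attributes to the Clojure version.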

Copilot · 2026-03-30T16:29:36Z

+    # Handle edge cases: pop_in=0 or pop_out=0 → 0, pi_hat=1 → 0
+    z = z.where((pop_in > 0) & (pop_out > 0), 0.0)
    z = z.fillna(0.0)
    z = z.replace([np.inf, -np.inf], 0.0)


two_prop_test_vectorized() zeroes results when pop_in==0 or pop_out==0, which diverges from the referenced Clojure implementation (it applies +1 pseudocounts and still computes the statistic). If the goal is Clojure parity, drop this mask and rely on the +1 adjustment plus inf/NaN handling; if the goal is to treat no-data rows as 0, please document that deviation clearly (and align scalar/vectorized behavior + tests accordingly).

Suggested change

-    # Handle edge cases: pop_in=0 or pop_out=0 → 0, pi_hat=1 → 0
-    z = z.where((pop_in > 0) & (pop_out > 0), 0.0)
-    z = z.fillna(0.0)
-    z = z.replace([np.inf, -np.inf], 0.0)
+    # Handle edge cases: pi_hat=0 or 1 → se=0 → inf/NaN; map these to 0
+    z = z.replace([np.inf, -np.inf], 0.0)
+    z = z.fillna(0.0)
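A vectorized version without the zero-population mask might look like the sketch below, which relies only on the +1 adjustment plus inf/NaN cleanup. The function name `two_prop_test_vectorized_sketch` and the sample Series are assumptions, and the inputs are assumed to be pandas Series of non-negative counts.

```python
import numpy as np
import pandas as pd


def two_prop_test_vectorized_sketch(succ_in, succ_out, pop_in, pop_out):
    """Element-wise pooled z-test with +1 pseudocounts on pandas Series of counts."""
    s1 = succ_in + 1
    s2 = succ_out + 1
    p1 = pop_in + 1
    p2 = pop_out + 1

    pi1 = s1 / p1
    pi2 = s2 / p2
    pi_hat = (s1 + s2) / (p1 + p2)

    se = np.sqrt(pi_hat * (1 - pi_hat) * (1 / p1 + 1 / p2))
    z = (pi1 - pi2) / se

    # pi_hat == 1 gives se == 0, so the division yields inf or NaN; map both to 0.
    z = z.replace([np.inf, -np.inf], 0.0)
    z = z.fillna(0.0)
    return z


# Rows: an ordinary case, an empty in-group, and an all-success case (pi_hat == 1).
z = two_prop_test_vectorized_sketch(
    succ_in=pd.Series([1, 0, 2]),
    succ_out=pd.Series([50, 5, 7]),
    pop_in=pd.Series([3, 0, 2]),
    pop_out=pd.Series([100, 100, 7]),
)
print(z)
```

Under these assumptions the empty in-group row keeps a non-zero score, while the all-success row is mapped to 0 by the NaN handling rather than by an explicit mask.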

Copilot · 2026-03-30T16:29:36Z

+        # Edge cases: pop=0 → 0
+        assert two_prop_test(5, 5, 0, 100) == 0.0
+        assert two_prop_test(5, 5, 100, 0) == 0.0


The “pop=0 → 0” edge case here uses inconsistent raw counts (succ_in=5 even though pop_in=0). In real usage succ_* should never exceed pop_* (and succ_in must be 0 if pop_in is 0), so this test isn’t exercising a realistic boundary. Consider changing these to consistent inputs (e.g., succ_in=0 when pop_in=0) or explicitly testing/expecting input validation behavior (raise or return 0 for invalid succ>pop).

Suggested change

-        # Edge cases: pop=0 → 0
-        assert two_prop_test(5, 5, 0, 100) == 0.0
-        assert two_prop_test(5, 5, 100, 0) == 0.0
+        # Edge cases: pop=0 → 0 (use consistent counts: succ_* must be 0 when pop_* is 0)
+        assert two_prop_test(0, 5, 0, 100) == 0.0
+        assert two_prop_test(5, 0, 100, 0) == 0.0
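If the alternative route of explicit input validation were taken, a check along these lines could reject impossible counts before the test runs. The helper `validate_counts` is hypothetical and not part of the PR; raising `ValueError` is one possible choice among those the comment mentions.

```python
def validate_counts(succ_in, succ_out, pop_in, pop_out):
    """Raise ValueError if any count is negative or successes exceed the population."""
    for side, succ, pop in (("in", succ_in, pop_in), ("out", succ_out, pop_out)):
        if succ < 0 or pop < 0:
            raise ValueError(f"negative count on the {side!r} side: succ={succ}, pop={pop}")
        if succ > pop:
            raise ValueError(f"succ_{side}={succ} exceeds pop_{side}={pop}")


# Consistent counts pass silently; the inputs from the original test do not.
validate_counts(0, 5, 0, 100)
try:
    validate_counts(5, 5, 0, 100)
except ValueError as err:
    print("rejected:", err)
```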

Copilot · 2026-03-30T16:29:36Z

+        assert two_prop_test(5, 5, 0, 100) == 0.0
+        assert two_prop_test(5, 5, 100, 0) == 0.0


Same issue as in test_repness_unit.py: these edge cases use impossible raw counts (succ_in=5 with pop_in=0). Since the function now takes raw counts, the tests should either use consistent counts (succ==0 when pop==0) or assert a defined behavior for invalid succ>pop inputs.

Suggested change

-        assert two_prop_test(5, 5, 0, 100) == 0.0
-        assert two_prop_test(5, 5, 100, 0) == 0.0
+        assert two_prop_test(0, 5, 0, 100) == 0.0
+        assert two_prop_test(5, 0, 100, 0) == 0.0

Also add D4 blob injection test (p-success pseudocount formula)

D6: Reconstructs group-vs-other vote counts from group-votes blob data, feeds them to two_prop_test(), and compares the result to the blob's repness-test. Fails because the old two_prop_test expects proportions, not raw counts.

D4: Verifies that (n_success+1)/(n_trials+2) matches the blob's p-success. Already passes (PSEUDO_COUNT=2.0 was fixed in an earlier PR).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace standard two-proportion z-test with Clojure's version that adds +1 pseudocount to all four inputs (stats.clj:18-33). This Laplace smoothing regularizes z-scores for small group sizes common in Polis.

Signature change: two_prop_test(p1, n1, p2, n2) taking proportions → two_prop_test(succ_in, succ_out, pop_in, pop_out) taking raw counts. Updated both scalar and vectorized versions plus all callers.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jucor · 2026-03-30T22:54:43Z

Superseded by spr-managed PR stack. See the new stack starting at #2508.

jucor requested a review from Copilot March 16, 2026 15:49

Copilot started reviewing on behalf of jucor March 16, 2026 15:49 View session

Copilot AI reviewed Mar 16, 2026

jucor changed the title ~~Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 15/15] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 16, 2026

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 25a1a19 to 3232cc6 Compare March 16, 2026 16:05

jucor closed this Mar 16, 2026

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from d11cc4c to 3232cc6 Compare March 16, 2026 16:05

jucor reopened this Mar 16, 2026

jucor changed the title ~~[Stack 15/15] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 15/16] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 16, 2026

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 3232cc6 to 82f1048 Compare March 16, 2026 18:06

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 5fd6423 to 1763fbd Compare March 16, 2026 18:08

jucor changed the title ~~[Stack 15/16] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 15/17] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 16, 2026

jucor marked this pull request as draft March 17, 2026 10:35

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 1763fbd to 67298ed Compare March 17, 2026 16:10

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 82f1048 to 1dbd17f Compare March 17, 2026 16:10

jucor changed the title ~~[Stack 15/17] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 15/24] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 17, 2026

jucor changed the title ~~[Stack 15/24] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 15/25] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 17, 2026

This was referenced Mar 17, 2026

[Stack 16/27] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) #2448

Closed

[Stack 18/27] Fix D7: match Clojure repness metric formula (product of 4 signed values) #2450

Closed

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 67298ed to de4485d Compare March 18, 2026 18:50

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 6cb475f to fe09dd8 Compare March 18, 2026 19:06

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from de4485d to cb7496c Compare March 18, 2026 19:06

jucor force-pushed the jc/clj-parity-d5-prop-test branch from fe09dd8 to fe2b127 Compare March 19, 2026 10:04

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from cb7496c to 68242c4 Compare March 19, 2026 10:08

jucor force-pushed the jc/clj-parity-d5-prop-test branch from fe2b127 to 0a3752c Compare March 19, 2026 10:43

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 68242c4 to 96347d5 Compare March 19, 2026 10:44

jucor changed the title ~~[Stack 15/25] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 14/24] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 19, 2026

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 0a3752c to a511b52 Compare March 19, 2026 12:31

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 96347d5 to 42aee66 Compare March 19, 2026 12:32

jucor force-pushed the jc/clj-parity-d5-prop-test branch from a511b52 to 4b6c485 Compare March 19, 2026 14:52

jucor force-pushed the jc/clj-parity-d5-prop-test branch from d0956ba to c8e60c0 Compare March 24, 2026 10:27

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from d8c8881 to 90c75e9 Compare March 24, 2026 10:27

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 867fcbe to e046d53 Compare March 24, 2026 11:08

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from d6b65aa to 7f94b38 Compare March 24, 2026 11:08

jucor force-pushed the jc/clj-parity-d5-prop-test branch from e046d53 to e50a3d8 Compare March 26, 2026 21:24

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 7f94b38 to 9867450 Compare March 26, 2026 21:24

jucor force-pushed the jc/clj-parity-d5-prop-test branch from e50a3d8 to 6e59c6c Compare March 27, 2026 01:15

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch 2 times, most recently from 1308d91 to d4b8ef6 Compare March 27, 2026 01:53

jucor force-pushed the jc/clj-parity-d5-prop-test branch 2 times, most recently from c8a91ac to f41dfb8 Compare March 27, 2026 02:10

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch 2 times, most recently from 795026c to b1eec11 Compare March 27, 2026 10:41

jucor force-pushed the jc/clj-parity-d5-prop-test branch from f41dfb8 to 3526ab6 Compare March 27, 2026 10:41

jucor changed the title ~~[Stack 15/25] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 16/26] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 30, 2026

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 3526ab6 to 27da50e Compare March 30, 2026 12:48

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from b1eec11 to 109e60a Compare March 30, 2026 12:48

jucor changed the title ~~[Stack 16/26] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 17/27] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 30, 2026

jucor force-pushed the jc/clj-parity-d6-two-prop-test branch from 109e60a to d1605f1 Compare March 30, 2026 12:54

jucor requested a review from Copilot March 30, 2026 16:25

Copilot started reviewing on behalf of jucor March 30, 2026 16:26 View session

Copilot AI reviewed Mar 30, 2026

jucor force-pushed the jc/clj-parity-d5-prop-test branch from 7d16f0e to 618a693 Compare March 30, 2026 16:49

jucor and others added 3 commits March 30, 2026 18:04

Plan: add D6 PR number and stack position to cross-reference

5f67ffe

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

This was referenced Mar 30, 2026

IGNORE -- crash from spr #2501

Closed

IGNORE -- crash from spr #2503

Closed

jucor wants to merge 3 commits into jc/clj-parity-d5-prop-test from jc/clj-parity-d6-two-prop-test
