Fix D6: match Clojure two-proportion test formula (+1 pseudocount) by jucor · Pull Request #2520 · compdemocracy/polis

jucor · 2026-03-30T22:25:19Z

Summary

The Python two_prop_test used a standard two-proportion z-test with no pseudocounts,
while Clojure's stats/two-prop-test (stats.clj:18-33) adds +1 to all four inputs
(succ-in, succ-out, pop-in, pop-out) via (map inc ...) before computing
the pooled z-test. This Laplace smoothing regularizes z-scores for small group sizes,
which are common in Polis conversations.

Changes

Signature change: two_prop_test(p1, n1, p2, n2) (proportions) →
two_prop_test(succ_in, succ_out, pop_in, pop_out) (raw counts)
Formula: Standard pooled z-test on pseudocount-adjusted values:
pi1 = (succ_in+1)/(pop_in+1), pi_hat = (s1+s2)/(p1+p2)
Callers updated: Both scalar (add_comparative_stats) and vectorized
(compute_group_comment_stats_df) now pass raw counts matching Clojure's
(stats/two-prop-test (:na in-stats) (sum :na rest-stats) (:ns in-stats) (sum :ns rest-stats))
(repness.clj:97-100)

Affected output fields

rat (agree representativeness test z-score)
rdt (disagree representativeness test z-score)
agree_metric, disagree_metric (downstream of rat/rdt)

Test plan

Targeted D6 tests pass (formula, edge cases, regularization effect)
Full test suite passes (excluding DynamoDB/MinIO tests)
Private dataset tests pass (--include-local)
Golden snapshots re-recorded for all 7 datasets

🤖 Generated with Claude Code

Squashed commits

RED: add D6 blob injection test (two_prop_test vs Clojure repness-test)
Fix D6: match Clojure two-proportion test formula (+1 pseudocount)
Plan: add D6 PR number and stack position to cross-reference

commit-id:23c03d70

Stack:

⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

Copilot

Pull request overview

This PR updates Delphi’s representativeness z-score computation to match the Clojure stats/two-prop-test behavior by switching two_prop_test (and its vectorized variant) from proportion inputs to raw count inputs and applying a +1 pseudocount to all four parameters.

Changes:

Changed two_prop_test / two_prop_test_vectorized signatures to accept raw counts (succ_in, succ_out, pop_in, pop_out) and implemented the +1 pseudocount pooled z-test formula.
Updated scalar and vectorized call sites to pass raw counts (na/nd/ns and “other” counts) instead of proportions.
Updated and expanded unit + discrepancy tests and documentation to reflect the new signature and intended Clojure parity.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
`delphi/polismath/pca_kmeans_rep/repness.py`	Implements the new count-based two-proportion test (scalar + vectorized) and updates callers.
`delphi/tests/test_repness_unit.py`	Updates unit tests for the new `two_prop_test` / vectorized signature and expectations.
`delphi/tests/test_old_format_repness.py`	Updates old-format API tests for the new `two_prop_test` signature.
`delphi/tests/test_discrepancy_fixes.py`	Rewrites D6 tests to assert parity with a Clojure-reference implementation and adds blob-injection coverage.
`delphi/docs/PLAN_DISCREPANCY_FIXES.md`	Marks D6 as done in the discrepancy plan.
`delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md`	Adds a journal entry documenting the D6 parity work and test updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

+    if pop_in == 0 or pop_out == 0:
        return 0.0


+    # Handle edge cases: pop_in=0 or pop_out=0 → 0, pi_hat=1 → 0
+    z = z.where((pop_in > 0) & (pop_out > 0), 0.0)


+        # Edge cases: pop=0 → 0
+        assert two_prop_test(5, 5, 0, 100) == 0.0
+        assert two_prop_test(5, 5, 100, 0) == 0.0


+        # Edge cases: pop=0 → 0
+        assert two_prop_test(5, 5, 0, 100) == 0.0
+        assert two_prop_test(5, 5, 100, 0) == 0.0




+        """Edge cases: n=0 should return 0, pi_hat=1 should return 0."""
+        # n=0 cases
+        check.equal(two_prop_test(5, 5, 0, 10), 0.0)
+        check.equal(two_prop_test(5, 5, 10, 0), 0.0)
+        # Both zero
+        check.equal(two_prop_test(0, 0, 0, 0), 0.0)
+


## Summary The Python `two_prop_test` used a standard two-proportion z-test with no pseudocounts, while Clojure's `stats/two-prop-test` (stats.clj:18-33) adds +1 to all four inputs (`succ-in`, `succ-out`, `pop-in`, `pop-out`) via `(map inc ...)` before computing the pooled z-test. This Laplace smoothing regularizes z-scores for small group sizes, which are common in Polis conversations. ## Changes - **Signature change**: `two_prop_test(p1, n1, p2, n2)` (proportions) → `two_prop_test(succ_in, succ_out, pop_in, pop_out)` (raw counts) - **Formula**: Standard pooled z-test on pseudocount-adjusted values: `pi1 = (succ_in+1)/(pop_in+1)`, `pi_hat = (s1+s2)/(p1+p2)` - **Callers updated**: Both scalar (`add_comparative_stats`) and vectorized (`compute_group_comment_stats_df`) now pass raw counts matching Clojure's `(stats/two-prop-test (:na in-stats) (sum :na rest-stats) (:ns in-stats) (sum :ns rest-stats))` (repness.clj:97-100) ## Affected output fields - `rat` (agree representativeness test z-score) - `rdt` (disagree representativeness test z-score) - `agree_metric`, `disagree_metric` (downstream of rat/rdt) ## Test plan - [x] Targeted D6 tests pass (formula, edge cases, regularization effect) - [x] Full test suite passes (excluding DynamoDB/MinIO tests) - [x] Private dataset tests pass (--include-local) - [x] Golden snapshots re-recorded for all 7 datasets 🤖 Generated with [Claude Code](https://claude.com/claude-code) ## Squashed commits - RED: add D6 blob injection test (two_prop_test vs Clojure repness-test) - Fix D6: match Clojure two-proportion test formula (+1 pseudocount) - Plan: add D6 PR number and stack position to cross-reference commit-id:23c03d70

github-actions · 2026-05-19T22:21:38Z

Delphi Coverage Report

File	Stmts	Miss	Cover
init.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/init.py	1	0	100%
components/config.py	165	133	19%
conversation/init.py	2	0	100%
conversation/conversation.py	1107	320	71%
conversation/manager.py	131	42	68%
database/init.py	1	0	100%
database/dynamodb.py	387	234	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/init.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	312	34	89%
regression/init.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	135	27	80%
regression/recorder.py	36	27	25%
regression/utils.py	138	94	32%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	53	53%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/init.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/init.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/init.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/init.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/init.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	518	11%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10785	7616	29%

jucor changed the title ~~Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ [Stack 13/17] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 30, 2026

jucor force-pushed the spr/edge/48b77ba3 branch from f593bae to cd39374 Compare March 30, 2026 22:39

jucor force-pushed the spr/edge/23c03d70 branch 3 times, most recently from be320d5 to c30ab91 Compare March 31, 2026 00:35

ballPointPenguin approved these changes Apr 26, 2026

View reviewed changes

jucor requested a review from Copilot May 19, 2026 21:44

Copilot started reviewing on behalf of jucor May 19, 2026 21:44 View session

Copilot AI reviewed May 19, 2026

View reviewed changes

jucor changed the title ~~[Stack 13/17] Fix D6: match Clojure two-proportion test formula (+1 pseudocount)~~ Fix D6: match Clojure two-proportion test formula (+1 pseudocount) May 19, 2026

jucor force-pushed the spr/edge/23c03d70 branch from c30ab91 to 5aa8b27 Compare May 19, 2026 22:09

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix D6: match Clojure two-proportion test formula (+1 pseudocount)#2520

Fix D6: match Clojure two-proportion test formula (+1 pseudocount)#2520
jucor wants to merge 1 commit into
spr/edge/48b77ba3from
spr/edge/23c03d70

jucor commented Mar 30, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

github-actions Bot commented May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

		# Handle edge cases: pop_in=0 or pop_out=0 → 0, pi_hat=1 → 0
		z = z.where((pop_in > 0) & (pop_out > 0), 0.0)

Conversation

jucor commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Affected output fields

Test plan

Squashed commits

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

github-actions Bot commented May 19, 2026

Delphi Coverage Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jucor commented Mar 30, 2026 •

edited

Loading