Skip to content

Fix D6: match Clojure two-proportion test formula (+1 pseudocount)#2520

Open
jucor wants to merge 1 commit into
spr/edge/48b77ba3from
spr/edge/23c03d70
Open

Fix D6: match Clojure two-proportion test formula (+1 pseudocount)#2520
jucor wants to merge 1 commit into
spr/edge/48b77ba3from
spr/edge/23c03d70

Conversation

@jucor
Copy link
Copy Markdown
Collaborator

@jucor jucor commented Mar 30, 2026

Summary

The Python two_prop_test used a standard two-proportion z-test with no pseudocounts,
while Clojure's stats/two-prop-test (stats.clj:18-33) adds +1 to all four inputs
(succ-in, succ-out, pop-in, pop-out) via (map inc ...) before computing
the pooled z-test. This Laplace smoothing regularizes z-scores for small group sizes,
which are common in Polis conversations.

Changes

  • Signature change: two_prop_test(p1, n1, p2, n2) (proportions) →
    two_prop_test(succ_in, succ_out, pop_in, pop_out) (raw counts)
  • Formula: Standard pooled z-test on pseudocount-adjusted values:
    pi1 = (succ_in+1)/(pop_in+1), pi_hat = (s1+s2)/(p1+p2)
  • Callers updated: Both scalar (add_comparative_stats) and vectorized
    (compute_group_comment_stats_df) now pass raw counts matching Clojure's
    (stats/two-prop-test (:na in-stats) (sum :na rest-stats) (:ns in-stats) (sum :ns rest-stats))
    (repness.clj:97-100)

Affected output fields

  • rat (agree representativeness test z-score)
  • rdt (disagree representativeness test z-score)
  • agree_metric, disagree_metric (downstream of rat/rdt)

Test plan

  • Targeted D6 tests pass (formula, edge cases, regularization effect)
  • Full test suite passes (excluding DynamoDB/MinIO tests)
  • Private dataset tests pass (--include-local)
  • Golden snapshots re-recorded for all 7 datasets

🤖 Generated with Claude Code

Squashed commits

  • RED: add D6 blob injection test (two_prop_test vs Clojure repness-test)
  • Fix D6: match Clojure two-proportion test formula (+1 pseudocount)
  • Plan: add D6 PR number and stack position to cross-reference

commit-id:23c03d70


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

@jucor jucor changed the title Fix D6: match Clojure two-proportion test formula (+1 pseudocount) [Stack 13/17] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Mar 30, 2026
@jucor jucor force-pushed the spr/edge/48b77ba3 branch from f593bae to cd39374 Compare March 30, 2026 22:39
@jucor jucor force-pushed the spr/edge/23c03d70 branch 3 times, most recently from be320d5 to c30ab91 Compare March 31, 2026 00:35
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates Delphi’s representativeness z-score computation to match the Clojure stats/two-prop-test behavior by switching two_prop_test (and its vectorized variant) from proportion inputs to raw count inputs and applying a +1 pseudocount to all four parameters.

Changes:

  • Changed two_prop_test / two_prop_test_vectorized signatures to accept raw counts (succ_in, succ_out, pop_in, pop_out) and implemented the +1 pseudocount pooled z-test formula.
  • Updated scalar and vectorized call sites to pass raw counts (na/nd/ns and “other” counts) instead of proportions.
  • Updated and expanded unit + discrepancy tests and documentation to reflect the new signature and intended Clojure parity.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
delphi/polismath/pca_kmeans_rep/repness.py Implements the new count-based two-proportion test (scalar + vectorized) and updates callers.
delphi/tests/test_repness_unit.py Updates unit tests for the new two_prop_test / vectorized signature and expectations.
delphi/tests/test_old_format_repness.py Updates old-format API tests for the new two_prop_test signature.
delphi/tests/test_discrepancy_fixes.py Rewrites D6 tests to assert parity with a Clojure-reference implementation and adds blob-injection coverage.
delphi/docs/PLAN_DISCREPANCY_FIXES.md Marks D6 as done in the discrepancy plan.
delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md Adds a journal entry documenting the D6 parity work and test updates.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +121 to 122
if pop_in == 0 or pop_out == 0:
return 0.0
Comment on lines +546 to +547
# Handle edge cases: pop_in=0 or pop_out=0 → 0, pi_hat=1 → 0
z = z.where((pop_in > 0) & (pop_out > 0), 0.0)
Comment on lines +73 to +75
# Edge cases: pop=0 → 0
assert two_prop_test(5, 5, 0, 100) == 0.0
assert two_prop_test(5, 5, 100, 0) == 0.0
Comment on lines +67 to 71
# Edge cases: pop=0 → 0
assert two_prop_test(5, 5, 0, 100) == 0.0
assert two_prop_test(5, 5, 100, 0) == 0.0


Comment on lines +835 to +841
"""Edge cases: n=0 should return 0, pi_hat=1 should return 0."""
# n=0 cases
check.equal(two_prop_test(5, 5, 0, 10), 0.0)
check.equal(two_prop_test(5, 5, 10, 0), 0.0)
# Both zero
check.equal(two_prop_test(0, 0, 0, 0), 0.0)

## Summary


The Python `two_prop_test` used a standard two-proportion z-test with no pseudocounts,
while Clojure's `stats/two-prop-test` (stats.clj:18-33) adds +1 to all four inputs
(`succ-in`, `succ-out`, `pop-in`, `pop-out`) via `(map inc ...)` before computing
the pooled z-test. This Laplace smoothing regularizes z-scores for small group sizes,
which are common in Polis conversations.

## Changes
- **Signature change**: `two_prop_test(p1, n1, p2, n2)` (proportions) →
  `two_prop_test(succ_in, succ_out, pop_in, pop_out)` (raw counts)
- **Formula**: Standard pooled z-test on pseudocount-adjusted values:
  `pi1 = (succ_in+1)/(pop_in+1)`, `pi_hat = (s1+s2)/(p1+p2)`
- **Callers updated**: Both scalar (`add_comparative_stats`) and vectorized
  (`compute_group_comment_stats_df`) now pass raw counts matching Clojure's
  `(stats/two-prop-test (:na in-stats) (sum :na rest-stats) (:ns in-stats) (sum :ns rest-stats))`
  (repness.clj:97-100)

## Affected output fields
- `rat` (agree representativeness test z-score)
- `rdt` (disagree representativeness test z-score)
- `agree_metric`, `disagree_metric` (downstream of rat/rdt)

## Test plan
- [x] Targeted D6 tests pass (formula, edge cases, regularization effect)
- [x] Full test suite passes (excluding DynamoDB/MinIO tests)
- [x] Private dataset tests pass (--include-local)
- [x] Golden snapshots re-recorded for all 7 datasets

🤖 Generated with [Claude Code](https://claude.com/claude-code)


## Squashed commits

- RED: add D6 blob injection test (two_prop_test vs Clojure repness-test)
- Fix D6: match Clojure two-proportion test formula (+1 pseudocount)
- Plan: add D6 PR number and stack position to cross-reference

commit-id:23c03d70
@jucor jucor changed the title [Stack 13/17] Fix D6: match Clojure two-proportion test formula (+1 pseudocount) Fix D6: match Clojure two-proportion test formula (+1 pseudocount) May 19, 2026
@jucor jucor force-pushed the spr/edge/23c03d70 branch from c30ab91 to 5aa8b27 Compare May 19, 2026 22:09
@github-actions
Copy link
Copy Markdown

Delphi Coverage Report

File Stmts Miss Cover
init.py 2 0 100%
benchmarks/bench_pca.py 76 76 0%
benchmarks/bench_repness.py 81 81 0%
benchmarks/bench_update_votes.py 38 38 0%
benchmarks/benchmark_utils.py 34 34 0%
components/init.py 1 0 100%
components/config.py 165 133 19%
conversation/init.py 2 0 100%
conversation/conversation.py 1107 320 71%
conversation/manager.py 131 42 68%
database/init.py 1 0 100%
database/dynamodb.py 387 234 40%
database/postgres.py 305 205 33%
pca_kmeans_rep/init.py 5 0 100%
pca_kmeans_rep/clusters.py 257 22 91%
pca_kmeans_rep/corr.py 98 17 83%
pca_kmeans_rep/pca.py 52 16 69%
pca_kmeans_rep/repness.py 312 34 89%
regression/init.py 4 0 100%
regression/clojure_comparer.py 188 17 91%
regression/comparer.py 887 720 19%
regression/datasets.py 135 27 80%
regression/recorder.py 36 27 25%
regression/utils.py 138 94 32%
run_math_pipeline.py 260 114 56%
umap_narrative/500_generate_embedding_umap_cluster.py 210 109 48%
umap_narrative/501_calculate_comment_extremity.py 112 53 53%
umap_narrative/502_calculate_priorities.py 135 135 0%
umap_narrative/700_datamapplot_for_layer.py 502 502 0%
umap_narrative/701_static_datamapplot_for_layer.py 310 310 0%
umap_narrative/702_consensus_divisive_datamapplot.py 432 432 0%
umap_narrative/801_narrative_report_batch.py 785 785 0%
umap_narrative/802_process_batch_results.py 265 265 0%
umap_narrative/803_check_batch_status.py 175 175 0%
umap_narrative/llm_factory_constructor/init.py 2 2 0%
umap_narrative/llm_factory_constructor/model_provider.py 157 157 0%
umap_narrative/polismath_commentgraph/init.py 1 0 100%
umap_narrative/polismath_commentgraph/cli.py 270 270 0%
umap_narrative/polismath_commentgraph/core/init.py 3 3 0%
umap_narrative/polismath_commentgraph/core/clustering.py 108 108 0%
umap_narrative/polismath_commentgraph/core/embedding.py 104 104 0%
umap_narrative/polismath_commentgraph/lambda_handler.py 219 219 0%
umap_narrative/polismath_commentgraph/schemas/init.py 2 0 100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py 160 9 94%
umap_narrative/polismath_commentgraph/tests/conftest.py 17 17 0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py 74 74 0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py 55 55 0%
umap_narrative/polismath_commentgraph/tests/test_storage.py 87 87 0%
umap_narrative/polismath_commentgraph/utils/init.py 3 0 100%
umap_narrative/polismath_commentgraph/utils/converter.py 283 237 16%
umap_narrative/polismath_commentgraph/utils/group_data.py 354 336 5%
umap_narrative/polismath_commentgraph/utils/storage.py 584 518 11%
umap_narrative/reset_conversation.py 159 50 69%
umap_narrative/run_pipeline.py 453 312 31%
utils/general.py 62 41 34%
Total 10785 7616 29%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants