Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) #2519

Open
jucor wants to merge 1 commit into spr/edge/0194003d from spr/edge/48b77ba3

@jucor (Collaborator) commented Mar 30, 2026

Summary

Replace Python's standard one-proportion z-test prop_test(p, n, p0) with
Clojure's Wilson-score-like formula prop_test(succ, n) from stats.clj:10-15:

2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5)

The Clojure formula has a built-in +1 pseudocount (Laplace smoothing / Beta(1,1)
prior) that regularizes extreme values for small Polis groups. This is separate
from the PSEUDO_COUNT=2.0 used for pa/pd estimation (Beta(2,2) prior):

  • pa = (na + 1) / (ns + 2) — Beta(2,2) prior for probability estimation
  • pat = 2 * sqrt(ns+1) * ((na+1)/(ns+1) - 0.5) — Beta(1,1) prior for significance testing
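
To make the two estimators concrete, here is a minimal Python sketch of the formulas listed above, computed from raw counts. The names `smoothed_proportion` and `prop_test_sketch` are illustrative and are not taken from `repness.py`; the actual implementation may differ in details such as input validation.

```python
import math


def smoothed_proportion(succ: int, n: int) -> float:
    """Smoothed estimate (succ + 1) / (n + 2), as in the `pa` bullet above."""
    return (succ + 1) / (n + 2)


def prop_test_sketch(succ: int, n: int) -> float:
    """Score 2 * sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5), as in the `pat` bullet."""
    return 2.0 * math.sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)


# Hypothetical group: 7 agrees out of 10 votes seen.
na, ns = 7, 10
pa = smoothed_proportion(na, ns)   # (7 + 1) / (10 + 2)
pat = prop_test_sketch(na, ns)     # 2 * sqrt(11) * (8/11 - 0.5)
print(f"pa={pa:.4f}  pat={pat:.4f}")
```

Both functions take the same raw counts, which mirrors the change in call sites from `(pa, ns, 0.5)` to `(na, ns)`.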

What changed in the output: pat, pdt values (proportion test z-scores),
and downstream agree_metric / disagree_metric values. The z-scores now
differ slightly because the new formula scales by sqrt(n+1) instead of
sqrt(n) and uses the proportion (succ+1)/(n+1) instead of the smoothed
estimate pa = (na+1)/(ns+2).
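
As a rough illustration of the size of the change, the sketch below compares the two scores on a few count pairs. It assumes the previous `prop_test(p, n, p0)` was the textbook z-test `(p - p0) / sqrt(p0 * (1 - p0) / n)` called with `p = pa` and `p0 = 0.5`; that implementation is not shown in this description, so `z_test_old` is a hypothetical stand-in.

```python
import math


def z_test_old(p: float, n: int, p0: float = 0.5) -> float:
    """Assumed previous behaviour: textbook one-proportion z-test."""
    return (p - p0) / math.sqrt(p0 * (1.0 - p0) / n)


def prop_test_new(succ: int, n: int) -> float:
    """Formula from this PR: 2 * sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)."""
    return 2.0 * math.sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5)


rows = []
for na, ns in [(3, 4), (7, 10), (70, 100), (700, 1000)]:
    pa = (na + 1) / (ns + 2)      # smoothed estimate that was passed to the old test
    old = z_test_old(pa, ns)
    new = prop_test_new(na, ns)
    rows.append((na, ns, old, new))
    print(f"na={na:4d} ns={ns:5d}  old={old:7.4f}  new={new:7.4f}  diff={new - old:6.4f}")
```

For these majority-agree inputs the new score is larger than the assumed old one, and the gap narrows as `ns` grows, which is consistent with the pseudocount mattering most for small groups.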

Changes

  • repness.py: prop_test(p, n, p0) → prop_test(succ, n) with Clojure formula
  • repness.py: prop_test_vectorized(p, n, p0) → prop_test_vectorized(succ, n)
  • repness.py: Callers updated to pass raw counts (na, ns) instead of (pa, ns, 0.5)
  • test_discrepancy_fixes.py: Removed xfail from D5 formula test, added 8 test cases + edge case
  • test_repness_unit.py, test_old_format_repness.py: Updated for new signature
  • Golden snapshots re-recorded for all datasets

Test plan

  • D5 formula tests pass (8 input pairs + edge cases)
  • D5 Clojure blob consistency check passes (all datasets)
  • Full test suite passes (public + private, 19/19 regression tests)
  • Only pre-existing failure: pakistan-incremental D2 (unrelated)

🤖 Generated with Claude Code (https://claude.com/claude-code)

Squashed commits

  • RED: add D5 blob injection test (prop_test vs Clojure p-test values)
  • Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)
  • Update plan and journal: mark D5 as done
  • Plan: add D5 PR number and stack position to cross-reference

commit-id:48b77ba3


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

@jucor changed the title from "Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)" to "[Stack 12/17] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)" on Mar 30, 2026
@jucor force-pushed the spr/edge/48b77ba3 branch 2 times, most recently from cd39374 to a387b9e, on March 30, 2026 22:47
@jucor force-pushed the spr/edge/0194003d branch from 24de40d to add1343 on March 31, 2026 00:35
@jucor force-pushed the spr/edge/48b77ba3 branch from a387b9e to 956e3a8 on March 31, 2026 00:35
Copilot AI (Contributor) left a comment

Pull request overview

Aligns Delphi’s Python representativeness pipeline with the Clojure implementation by replacing the one-proportion z-test with Clojure’s Wilson-score-like (+1 pseudocount) prop_test(succ, n) formula, and updates callers/tests accordingly to keep Python↔Clojure parity work unblocked.

Changes:

  • Replace prop_test(p, n, p0) with prop_test(succ, n) using Clojure’s 2 * sqrt(n+1) * ((succ+1)/(n+1) - 0.5) formula, and update call sites to pass raw counts.
  • Update vectorized proportion test to accept success counts and match the same formula (with n=0 handled as “no data” → 0.0).
  • Expand/adjust discrepancy + unit tests and update parity plan/journal docs to reflect D5 completion.
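
A vectorized version consistent with the bullets above might look like the following NumPy sketch. The name `prop_test_vectorized_sketch` is illustrative, and mapping `n == 0` to `0.0` follows the "no data" behaviour described in this overview rather than the actual code in `repness.py`.

```python
import numpy as np


def prop_test_vectorized_sketch(succ, n):
    """Element-wise 2 * sqrt(n + 1) * ((succ + 1) / (n + 1) - 0.5), with n == 0 -> 0.0."""
    succ = np.asarray(succ, dtype=float)
    n = np.asarray(n, dtype=float)
    # n + 1 >= 1 for non-negative counts, so the division is always safe.
    z = 2.0 * np.sqrt(n + 1.0) * ((succ + 1.0) / (n + 1.0) - 0.5)
    # Without this guard the raw formula yields 1.0 for succ = 0, n = 0.
    return np.where(n > 0, z, 0.0)


scores = prop_test_vectorized_sketch([70, 10, 0], [100, 50, 0])
print(np.round(scores, 4))
```

The explicit guard matters because the pseudocount alone would otherwise report a positive score for a group that has not voted at all.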

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| delphi/polismath/pca_kmeans_rep/repness.py | Replaces scalar/vectorized prop_test with the Clojure formula and updates internal call sites to pass raw counts. |
| delphi/tests/test_discrepancy_fixes.py | Removes D5 xfail, adds multiple formula test cases + a blob-injection check against Clojure blob p-test values. |
| delphi/tests/test_repness_unit.py | Updates unit tests to use the new prop_test(succ, n) / vectorized signature and expected formula outputs. |
| delphi/tests/test_old_format_repness.py | Updates old-format wrapper tests for the new prop_test signature/formula. |
| delphi/docs/PLAN_DISCREPANCY_FIXES.md | Marks D5 as DONE and adds a PR cross-reference entry. |
| delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md | Adds/updates the D5 TDD journal entry describing the change and its rationale. |



  Returns:
-     Z-score
+     Z-score (positive means succ/n > 0.5)
Comment on lines +48 to +51
# 70 successes out of 100: 2*sqrt(101)*((71/101)-0.5) ≈ 4.08
assert np.isclose(prop_test(70, 100),
                  2 * math.sqrt(101) * (71/101 - 0.5), atol=0.01)
# 10 successes out of 50: 2*sqrt(51)*((11/51)-0.5) ≈ -4.06
@jucor changed the title from "[Stack 12/17] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)" to "Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount)" on May 19, 2026
@jucor force-pushed the spr/edge/0194003d branch from add1343 to a244e09 on May 19, 2026 22:09
@jucor force-pushed the spr/edge/48b77ba3 branch from 956e3a8 to 664d6dc on May 19, 2026 22:09
@github-actions

Delphi Coverage Report

| File | Stmts | Miss | Cover |
| --- | ---: | ---: | ---: |
| `__init__.py` | 2 | 0 | 100% |
| `benchmarks/bench_pca.py` | 76 | 76 | 0% |
| `benchmarks/bench_repness.py` | 81 | 81 | 0% |
| `benchmarks/bench_update_votes.py` | 38 | 38 | 0% |
| `benchmarks/benchmark_utils.py` | 34 | 34 | 0% |
| `components/__init__.py` | 1 | 0 | 100% |
| `components/config.py` | 165 | 133 | 19% |
| `conversation/__init__.py` | 2 | 0 | 100% |
| `conversation/conversation.py` | 1107 | 320 | 71% |
| `conversation/manager.py` | 131 | 42 | 68% |
| `database/__init__.py` | 1 | 0 | 100% |
| `database/dynamodb.py` | 387 | 234 | 40% |
| `database/postgres.py` | 305 | 205 | 33% |
| `pca_kmeans_rep/__init__.py` | 5 | 0 | 100% |
| `pca_kmeans_rep/clusters.py` | 257 | 22 | 91% |
| `pca_kmeans_rep/corr.py` | 98 | 17 | 83% |
| `pca_kmeans_rep/pca.py` | 52 | 16 | 69% |
| `pca_kmeans_rep/repness.py` | 297 | 38 | 87% |
| `regression/__init__.py` | 4 | 0 | 100% |
| `regression/clojure_comparer.py` | 188 | 17 | 91% |
| `regression/comparer.py` | 887 | 720 | 19% |
| `regression/datasets.py` | 135 | 27 | 80% |
| `regression/recorder.py` | 36 | 27 | 25% |
| `regression/utils.py` | 138 | 94 | 32% |
| `run_math_pipeline.py` | 260 | 114 | 56% |
| `umap_narrative/500_generate_embedding_umap_cluster.py` | 210 | 109 | 48% |
| `umap_narrative/501_calculate_comment_extremity.py` | 112 | 53 | 53% |
| `umap_narrative/502_calculate_priorities.py` | 135 | 135 | 0% |
| `umap_narrative/700_datamapplot_for_layer.py` | 502 | 502 | 0% |
| `umap_narrative/701_static_datamapplot_for_layer.py` | 310 | 310 | 0% |
| `umap_narrative/702_consensus_divisive_datamapplot.py` | 432 | 432 | 0% |
| `umap_narrative/801_narrative_report_batch.py` | 785 | 785 | 0% |
| `umap_narrative/802_process_batch_results.py` | 265 | 265 | 0% |
| `umap_narrative/803_check_batch_status.py` | 175 | 175 | 0% |
| `umap_narrative/llm_factory_constructor/__init__.py` | 2 | 2 | 0% |
| `umap_narrative/llm_factory_constructor/model_provider.py` | 157 | 157 | 0% |
| `umap_narrative/polismath_commentgraph/__init__.py` | 1 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/cli.py` | 270 | 270 | 0% |
| `umap_narrative/polismath_commentgraph/core/__init__.py` | 3 | 3 | 0% |
| `umap_narrative/polismath_commentgraph/core/clustering.py` | 108 | 108 | 0% |
| `umap_narrative/polismath_commentgraph/core/embedding.py` | 104 | 104 | 0% |
| `umap_narrative/polismath_commentgraph/lambda_handler.py` | 219 | 219 | 0% |
| `umap_narrative/polismath_commentgraph/schemas/__init__.py` | 2 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/schemas/dynamo_models.py` | 160 | 9 | 94% |
| `umap_narrative/polismath_commentgraph/tests/conftest.py` | 17 | 17 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_clustering.py` | 74 | 74 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_embedding.py` | 55 | 55 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_storage.py` | 87 | 87 | 0% |
| `umap_narrative/polismath_commentgraph/utils/__init__.py` | 3 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/utils/converter.py` | 283 | 237 | 16% |
| `umap_narrative/polismath_commentgraph/utils/group_data.py` | 354 | 336 | 5% |
| `umap_narrative/polismath_commentgraph/utils/storage.py` | 584 | 518 | 11% |
| `umap_narrative/reset_conversation.py` | 159 | 50 | 69% |
| `umap_narrative/run_pipeline.py` | 453 | 312 | 31% |
| `utils/general.py` | 62 | 41 | 34% |
| Total | 10770 | 7620 | 29% |
