[Stack 15/27] Fix D9: z-score thresholds from two-tailed to one-tailed by jucor · Pull Request #2446 · compdemocracy/polis

jucor · 2026-03-13T14:10:33Z

Summary

Stacked on #2443 (Fix test DB connection: use DATABASE_URL with dotenv). Please review and merge #2443 first.
Next in stack: #2448 (Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount))

Fix D9: change z-score significance thresholds from two-tailed to one-tailed, matching Clojure's stats.clj
Z_90: 1.645 → 1.2816, Z_95: 1.96 → 1.6449
Also resolves an internal inconsistency — Python's own stats.py already used the correct one-tailed values

Why one-tailed?

The proportion tests in Polis check whether a comment's agree (or disagree) rate is significantly above 0.5, which is a directional hypothesis. One-tailed is correct because we only care about one direction at a time. The two-tailed cutoffs were about 28% higher for Z_90 (1.645 vs 1.2816) and about 19% higher for Z_95 (1.96 vs 1.6449), so fewer comments passed significance.
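
For reference, the cutoffs quoted above can be reproduced from the standard normal quantile function. This is a minimal sketch using only Python's standard library; the helper name `z_cutoff` is hypothetical and not part of Delphi.

```python
from statistics import NormalDist


def z_cutoff(confidence: float, tails: int = 1) -> float:
    """Critical z value for a given confidence level.

    tails=1 puts all of alpha in the upper tail (directional test);
    tails=2 splits alpha evenly between both tails.
    Hypothetical helper, for illustration only.
    """
    alpha = 1.0 - confidence
    return NormalDist().inv_cdf(1.0 - alpha / tails)


for confidence in (0.90, 0.95):
    one = z_cutoff(confidence, tails=1)
    two = z_cutoff(confidence, tails=2)
    print(f"{confidence:.0%}: one-tailed {one:.4f}, two-tailed {two:.4f}")
```

Note that the one-tailed 95% cutoff coincides with the old two-tailed 90% cutoff (both are the 0.95 quantile), so after this change `Z_95` takes roughly the value `Z_90` had before.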

Test plan

TDD: removed xfail from 3 D9 tests, confirmed red (3 failures), applied fix, confirmed green
Discrepancy tests: 63 passed, 6 skipped, 50 xfailed (all 7 datasets including private)
Regression tests: 19 passed (all 7 datasets, golden snapshots re-recorded)
Repness unit tests: 36 passed (boundary values updated to match new thresholds)
4 pre-existing failures unrelated to D9 (PCA incremental blobs, DB-dependent tests)

🤖 Generated with Claude Code

Copilot

Pull request overview

Updates Delphi’s repness significance thresholds to use one-tailed z-score cutoffs (aligning with the Clojure implementation and existing stats.py expectations), and refreshes affected tests and golden snapshots.

Changes:

Update Z_90/Z_95 constants in repness.py to one-tailed thresholds (1.2816 / 1.6449).
Un-xfail and adjust unit/discrepancy tests to assert the new thresholds (including boundary assertions).
Re-record regression golden snapshot data reflecting the new repness behavior.

Reviewed changes

Copilot reviewed 19 out of 21 changed files in this pull request and generated 2 comments.


File	Description
`delphi/polismath/pca_kmeans_rep/repness.py`	Switch Z-score thresholds to one-tailed constants and update inline documentation.
`delphi/tests/test_repness_unit.py`	Update z-score significance unit tests for the new thresholds.
`delphi/tests/test_old_format_repness.py`	Update backwards-compat repness tests for the new thresholds.
`delphi/tests/test_discrepancy_fixes.py`	Remove D9 `xfail` markers now that thresholds match expected values.
`delphi/tests/simplified_repness_test.py`	Update the script constant to the new Z_90 value.
`delphi/real_data/r6vbnhffkxbd7ifmfbdrd-vw/golden_snapshot.json`	Update golden snapshot outputs after repness threshold change.
`delphi/docs/PLAN_DISCREPANCY_FIXES.md`	Add task parallelization notes related to discrepancy fix sequencing.


github-actions · 2026-03-30T13:27:39Z

Delphi Coverage Report

File	Stmts	Miss	Cover
init.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/init.py	1	0	100%
components/config.py	165	133	19%
conversation/init.py	2	0	100%
conversation/conversation.py	1107	320	71%
conversation/manager.py	131	42	68%
database/init.py	1	0	100%
database/dynamodb.py	387	234	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/init.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	297	43	86%
regression/init.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	135	27	80%
regression/recorder.py	36	27	25%
regression/utils.py	138	87	37%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	53	53%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/init.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/init.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/init.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/init.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/init.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	518	11%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10770	7618	29%

Copilot

Pull request overview

Copilot reviewed 10 out of 11 changed files in this pull request and generated 4 comments.


Copilot · 2026-03-30T16:29:24Z

    Returns:
        True if significant at 90% confidence
    """
-    return abs(z) >= Z_90
+    return z > Z_90



The significance helpers were switched to one-tailed semantics (z > Z_90), but Z_90 is still set to 1.645 (two-tailed 90%). To match Clojure’s z-sig-90? (> 1.2816), update the constant (and its comment) to 1.2816; otherwise the new one-tailed check remains overly conservative and D9 parity tests can’t pass once enabled.

Copilot · 2026-03-30T16:29:24Z

    Returns:
        True if significant at 95% confidence
    """
-    return abs(z) >= Z_95
+    return z > Z_95



Same issue as Z_90: z_score_sig_95 now uses one-tailed strict z > Z_95, but Z_95 is still 1.96 (two-tailed 95%). Clojure’s z-sig-95? uses 1.6449; update Z_95 accordingly so 95% gating matches the reference implementation.
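
To illustrate why both the comparison and the constant matter, here is a minimal sketch contrasting the old and new semantics. The names `z_score_sig_90` / `z_score_sig_95` and the `z > Z` comparison come from the diff above; `old_z_score_sig_90` is a hypothetical stand-in for the previous `abs(z) >= Z_90` behaviour with the two-tailed constant, and the function bodies are an assumption rather than a copy of `repness.py`.

```python
# One-tailed cutoffs, as quoted from Clojure's z-sig-90? / z-sig-95?
Z_90 = 1.2816
Z_95 = 1.6449


def z_score_sig_90(z: float) -> bool:
    """One-tailed, strict: significant only if z exceeds the cutoff."""
    return z > Z_90


def z_score_sig_95(z: float) -> bool:
    """One-tailed, strict: significant only if z exceeds the cutoff."""
    return z > Z_95


def old_z_score_sig_90(z: float, threshold: float = 1.645) -> bool:
    """Hypothetical stand-in for the previous two-tailed, inclusive check."""
    return abs(z) >= threshold


for z in (1.5, -2.0, 1.2816):
    print(f"z={z:+.4f}  new={z_score_sig_90(z)}  old={old_z_score_sig_90(z)}")
```

A z of 1.5 passes only the new check, a z of -2.0 passes only the old one (the sign is no longer discarded), and a z exactly at the cutoff fails the new strict comparison.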

Copilot · 2026-03-30T16:29:25Z

@@ -562,6 +564,20 @@ def test_z95_matches_clojure(self):
        check.almost_equal(Z_95, 1.6449, abs=0.001,
                            msg=f"Z_95 should be 1.6449 (one-tailed), got {Z_95}")


The D9 threshold value assertions are still marked @pytest.mark.xfail, which means this PR’s stated purpose (changing Z_90/Z_95 values) isn’t actually enforced by CI. Once the constants are updated, these xfails should be removed so regressions in the threshold values will fail the suite.
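
As a rough sketch of what the enforced version could look like once the marker is dropped: the tests below carry no `xfail` decorator, so a wrong constant fails the run instead of being reported as an expected failure. The constants are defined inline here as stand-ins for the import from `repness.py`, `test_z90_matches_clojure` is an assumed name mirroring the `test_z95_matches_clojure` shown in the diff, and plain `assert` with `math.isclose` replaces the `check.almost_equal` helper used in the real suite.

```python
import math

# Stand-ins for `from polismath.pca_kmeans_rep.repness import Z_90, Z_95`
# (inlined so the sketch runs on its own).
Z_90 = 1.2816
Z_95 = 1.6449


def test_z90_matches_clojure():
    # No xfail marker: a regression in the constant fails the suite.
    assert math.isclose(Z_90, 1.2816, abs_tol=0.001), (
        f"Z_90 should be 1.2816 (one-tailed), got {Z_90}"
    )


def test_z95_matches_clojure():
    assert math.isclose(Z_95, 1.6449, abs_tol=0.001), (
        f"Z_95 should be 1.6449 (one-tailed), got {Z_95}"
    )


# Direct calls so the sketch can be run without a test runner.
test_z90_matches_clojure()
test_z95_matches_clojure()
```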

Copilot · 2026-03-30T16:29:25Z

-z_score = two_prop_test(70, 100, 50, 100)  # 70/100 vs 50/100
-print(f"Comparison Z-score: {z_score}")
-```
+Statistical helpers (`z_sig_90`, `z_sig_95`, `prop_test`, `two_prop_test`) are


This doc references z_sig_90/z_sig_95, but repness.py defines z_score_sig_90/z_score_sig_95 (and stats.py has been removed). Update the names here (or add documented aliases) so the docs reflect the actual public API.

Suggested change

Statistical helpers (`z_sig_90`, `z_sig_95`, `prop_test`, `two_prop_test`) are

Statistical helpers (`z_score_sig_90`, `z_score_sig_95`, `prop_test`, `two_prop_test`) are

Map dependency graph and file boundaries for D5-D12. Two parallel tracks possible: repness formulas (D5→D6→D7→D8→D10→D11) and conversation/PCA (D3→D15→D12), with D1/D1b after both. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ead stats.py repness.py: change z_score_sig_90/95 from abs(z) >= threshold to z > threshold, matching Clojure's (> z-val 1.2816). Also fix inline significance filters in select_rep_comments_df to use > without .abs(). Remove stats.py and test_stats.py — unused dead code from the original AI-generated Clojure port. repness.py defines its own z-sig functions and never imported stats.py. Update usage_examples.md to point to repness.py instead of deleted stats.py. Add D9 blob comparison tests (significance sets and z-values, xfail until D5/D6), D5/D6/D7/D8 blob comparison tests with shared_count guards, tid type fixes, and D9 unit tests for strict > and one-tailed semantics. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
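
The inline filter change described in this commit can be sketched on a toy DataFrame. This is an illustration only: the column names `tid` and `z_agree` and the data are hypothetical, and the real `select_rep_comments_df` is not reproduced here.

```python
import pandas as pd

Z_90 = 1.2816  # one-tailed cutoff

# Hypothetical per-comment z-scores; column names are assumptions.
df = pd.DataFrame({
    "tid": [1, 2, 3, 4],
    "z_agree": [2.10, -2.10, 1.2816, 0.50],
})

# Previous style: the sign is discarded and the boundary is inclusive.
two_tailed = df[df["z_agree"].abs() >= Z_90]

# New style: directional and strict, matching Clojure's (> z-val 1.2816).
one_tailed = df[df["z_agree"] > Z_90]

print("abs() >= :", two_tailed["tid"].tolist())
print(">        :", one_tailed["tid"].tolist())
```

With `.abs()` removed, a strongly negative z-score no longer counts as significant for the same direction, and a value exactly at the cutoff is excluded.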

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Require side-by-side Clojure/Python verification for every formula change
- Exhaustive RED phase: boundary conditions, edge cases, missing test audit
- Double-check array shapes, indices, aggregation axes after implementation
- Allow skipping very large private datasets during iteration, only run as final validation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

PR 14 (vectorized code refactor) is now a prerequisite for all formula fix PRs, not a post-parity cleanup. It branches off jc/clj-parity-d9-fix (Stack 13) to make the vectorized production path readable and testable against Clojure blob values. Remaining dead code cleanup split to PR 14b. Handoff doc at delphi/docs/HANDOFF_PR14_VECTORIZED_REFACTOR.md. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The D9 z-score threshold change (two-tailed → one-tailed) combined with upstream fixes cascaded into this branch changed the repness output enough to invalidate the vw golden snapshot. Biodiversity re-recorded too for consistency. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jucor · 2026-03-30T22:54:40Z

Superseded by spr-managed PR stack. See the new stack starting at #2508.

jucor changed the base branch from jc/vectorize-participant-info to jc/fix-test-db-connection March 13, 2026 14:12

jucor force-pushed the jc/clj-parity-d9-fix branch from 0f32d4c to abfeacc Compare March 13, 2026 14:13

jucor changed the title ~~Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/13] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 13, 2026

jucor requested a review from Copilot March 13, 2026 14:14

Copilot started reviewing on behalf of jucor March 13, 2026 14:15 View session

jucor force-pushed the jc/fix-test-db-connection branch from e2f39bb to 89297cf Compare March 13, 2026 14:17

Copilot AI reviewed Mar 13, 2026


Comment thread delphi/polismath/pca_kmeans_rep/repness.py Outdated

Comment thread delphi/docs/PLAN_DISCREPANCY_FIXES.md Outdated

jucor force-pushed the jc/fix-test-db-connection branch from 89297cf to 403be8d Compare March 13, 2026 16:07

jucor force-pushed the jc/clj-parity-d9-fix branch from 9149f7f to e3c5893 Compare March 13, 2026 16:07

jucor mentioned this pull request Mar 13, 2026

[Stack 14/27] Fix test DB connection: use DATABASE_URL with dotenv #2443

Closed

3 tasks

jucor changed the title ~~[Stack 13/13] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/15] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 16, 2026

jucor force-pushed the jc/fix-test-db-connection branch from 403be8d to 464da16 Compare March 16, 2026 16:04

jucor force-pushed the jc/clj-parity-d9-fix branch from db36889 to 69350d5 Compare March 16, 2026 16:04

jucor changed the title ~~[Stack 13/15] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/16] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 16, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from 69350d5 to 382de2f Compare March 16, 2026 18:06

jucor changed the title ~~[Stack 13/16] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/17] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 16, 2026

jucor changed the title ~~[Stack 13/17] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/24] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 17, 2026

jucor changed the title ~~[Stack 13/24] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/25] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 17, 2026

jucor mentioned this pull request Mar 17, 2026

[Stack 16/27] Fix D5: match Clojure prop_test formula (Wilson-score-like with +1 pseudocount) #2448

Closed

4 tasks

jucor force-pushed the jc/clj-parity-d9-fix branch from f8a7007 to 19cce44 Compare March 19, 2026 10:03

jucor force-pushed the jc/fix-test-db-connection branch from 6b81d8e to 6da6172 Compare March 19, 2026 10:43

jucor force-pushed the jc/clj-parity-d9-fix branch from 19cce44 to bf2dd99 Compare March 19, 2026 10:43

jucor changed the title ~~[Stack 13/25] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 12/24] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 19, 2026

jucor force-pushed the jc/fix-test-db-connection branch from 6da6172 to 0ef54ca Compare March 19, 2026 12:31

jucor force-pushed the jc/clj-parity-d9-fix branch from bf2dd99 to f8c5793 Compare March 19, 2026 12:31

jucor force-pushed the jc/fix-test-db-connection branch from 0ef54ca to b62c0cd Compare March 19, 2026 14:52

jucor force-pushed the jc/clj-parity-d9-fix branch from f8c5793 to 7f733c1 Compare March 19, 2026 14:52

jucor changed the title ~~[Stack 12/24] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 13/25] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 19, 2026

jucor force-pushed the jc/fix-test-db-connection branch from b62c0cd to 83bfe23 Compare March 23, 2026 15:09

jucor force-pushed the jc/clj-parity-d9-fix branch from 7f733c1 to c920b61 Compare March 23, 2026 15:11

jucor force-pushed the jc/fix-test-db-connection branch from 397f00d to e668601 Compare March 27, 2026 01:15

jucor force-pushed the jc/clj-parity-d9-fix branch from ee798a6 to 6e54a9c Compare March 27, 2026 01:53

jucor force-pushed the jc/fix-test-db-connection branch from e668601 to 4ea1333 Compare March 27, 2026 02:10

jucor force-pushed the jc/clj-parity-d9-fix branch 2 times, most recently from 09747ea to 8d94246 Compare March 27, 2026 10:41

jucor force-pushed the jc/fix-test-db-connection branch from 4ea1333 to 917cba8 Compare March 27, 2026 10:41

jucor changed the title ~~[Stack 13/25] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 14/26] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 30, 2026

jucor force-pushed the jc/fix-test-db-connection branch from 917cba8 to e45a120 Compare March 30, 2026 12:48

jucor force-pushed the jc/clj-parity-d9-fix branch from 8d94246 to 9397ddf Compare March 30, 2026 12:48

jucor changed the title ~~[Stack 14/26] Fix D9: z-score thresholds from two-tailed to one-tailed~~ [Stack 15/27] Fix D9: z-score thresholds from two-tailed to one-tailed Mar 30, 2026

jucor force-pushed the jc/clj-parity-d9-fix branch from 9397ddf to e96a1f7 Compare March 30, 2026 12:54

jucor requested a review from Copilot March 30, 2026 16:25

Copilot started reviewing on behalf of jucor March 30, 2026 16:26 View session

Copilot AI reviewed Mar 30, 2026


jucor force-pushed the jc/fix-test-db-connection branch from b68bd5b to 583f955 Compare March 30, 2026 16:49

jucor force-pushed the jc/clj-parity-d9-fix branch from e96a1f7 to 574c169 Compare March 30, 2026 16:49

jucor and others added 7 commits March 30, 2026 18:04

Re-record vw golden snapshot after D9 z-sig semantics change

0549d8a

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Update plan: mark D9 as done, note stats.py removal for next PR

67044fc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

jucor force-pushed the jc/clj-parity-d9-fix branch from 574c169 to b64cae8 Compare March 30, 2026 17:05

jucor force-pushed the jc/fix-test-db-connection branch from 583f955 to 380b00d Compare March 30, 2026 17:05

This was referenced Mar 30, 2026

IGNORE -- crash from spr #2499

Closed

IGNORE -- crash from spr #2501

Closed

jucor closed this Mar 30, 2026

jucor wants to merge 7 commits into jc/fix-test-db-connection from jc/clj-parity-d9-fix


2 participants
