Fix D2: in-conv participant threshold + D2c vote count source by jucor · Pull Request #2513 · compdemocracy/polis

jucor · 2026-03-30T22:25:11Z

Summary

Fixes the in-conv participant threshold (D2), vote count source (D2c), and base-cluster sort order (D2b) to match Clojure. Adds monotonicity guard tests (D2d).

D2: In-conv threshold

Before: threshold = 7 + sqrt(n_cmts) * 0.1 — increasingly restrictive for larger conversations (e.g., 8.8 for biodiversity's 314 comments)
After: threshold = min(7, n_cmts) — matches Clojure exactly
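The figures quoted above can be checked by comparing the two formulas directly. This is a minimal sketch: the function names `old_threshold` and `new_threshold` are illustrative and do not appear in the PR.

```python
import math


def old_threshold(n_cmts: int) -> float:
    """Pre-fix formula: grows with the square root of the comment count."""
    return 7 + math.sqrt(n_cmts) * 0.1


def new_threshold(n_cmts: int) -> int:
    """Post-fix formula: capped at 7, lower only for very small conversations."""
    return min(7, n_cmts)


if __name__ == "__main__":
    # 314 is the biodiversity comment count quoted in the description.
    for n in (3, 7, 314):
        print(n, round(old_threshold(n), 1), new_threshold(n))
```

For 314 comments the old formula rounds to 8.8 while the new one stays at 7, which is consistent with more participants qualifying after the fix.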

D2b: Base-cluster sort order (from Copilot review)

Before: Base clusters sorted by size (descending) with IDs reassigned — changes encounter order of centers fed into group-level k-means
After: Keep k-means ID order, matching Clojure's (sort-by :id ...)
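To illustrate why the ordering matters, the sketch below contrasts the two orderings on a toy list of base clusters. Only the `id` key is taken from the diff; the `members` field, the helper names, and the sample data are assumptions made for illustration.

```python
# Hypothetical base clusters in the order k-means happened to emit them.
base_clusters = [
    {"id": 2, "members": [10, 11, 12, 13]},
    {"id": 0, "members": [1]},
    {"id": 1, "members": [4, 5]},
]


def order_by_size_and_reassign(clusters):
    """Old behaviour as described: sort by size (descending), then renumber."""
    ordered = sorted(clusters, key=lambda c: len(c["members"]), reverse=True)
    return [{**c, "id": new_id} for new_id, c in enumerate(ordered)]


def order_by_id(clusters):
    """New behaviour as described: keep the k-means IDs and sort by them."""
    return sorted(clusters, key=lambda c: c["id"])


if __name__ == "__main__":
    print([c["members"] for c in order_by_size_and_reassign(base_clusters)])
    print([c["members"] for c in order_by_id(base_clusters)])
```

The two helpers hand the same clusters to the next stage in a different order, so any initialization that picks the first k distinct centers would start from different points.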

D2c: Vote count source (raw vs filtered matrix)

Before: _compute_user_vote_counts and n_cmts used self.rating_mat (filtered — moderated-out comment columns removed). A participant who voted on 8 comments could drop to 5 visible votes after 3 comments were moderated-out, falling below threshold.
After: Both use self.raw_rating_mat (includes all votes, even on moderated-out comments), matching Clojure's user-vote-counts (conversation.clj:217-225) which reads from raw-rating-mat.
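The scenario in the "Before" line can be reproduced with plain dictionaries. This is a sketch under stated assumptions: the data layout, the helper names `vote_counts` and `in_conv`, and the `>=` comparison against the threshold are illustrative and are not taken from `conversation.py`.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical layout: participant id -> {comment id -> vote value}.
Votes = Dict[str, Dict[int, int]]


def vote_counts(votes: Votes, hidden: FrozenSet[int] = frozenset()) -> Dict[str, int]:
    """Count each participant's votes, skipping comments listed in `hidden`."""
    return {
        pid: sum(1 for cid in by_comment if cid not in hidden)
        for pid, by_comment in votes.items()
    }


def in_conv(counts: Dict[str, int], n_cmts: int) -> Set[str]:
    """Participants whose vote count reaches min(7, n_cmts) (assumed >=)."""
    threshold = min(7, n_cmts)
    return {pid for pid, n in counts.items() if n >= threshold}


raw_votes: Votes = {
    "p1": {cid: 1 for cid in range(8)},    # voted on comments 0..7
    "p2": {cid: -1 for cid in range(10)},  # voted on comments 0..9
}
moderated_out = frozenset({0, 1, 2})
n_cmts_raw = 10

# Filtered view: moderated-out comments are dropped from counts and n_cmts.
filtered = in_conv(vote_counts(raw_votes, moderated_out), n_cmts_raw - len(moderated_out))
# Raw view: every vote and every comment counts.
raw = in_conv(vote_counts(raw_votes), n_cmts_raw)

if __name__ == "__main__":
    print(sorted(filtered), sorted(raw))
```

Here `p1` has 8 raw votes but only 5 visible ones, so the filtered view drops `p1` while the raw view keeps both participants.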

D2d: In-conv monotonicity (design decision)

Python does full recompute from raw_rating_mat every time, so monotonicity ("once in, always in") is guaranteed without persistence — votes are immutable in PostgreSQL, so a participant's count never decreases. This is strictly better than Clojure's approach (which persists in-conv to math_main because it uses delta vote processing).

5 guard tests (T1-T5) document this invariant and warn that switching to delta processing would require persisting in-conv to DynamoDB (ref: #2358).
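The invariant can be sketched as a full recompute over an append-only vote log. Everything here is hypothetical (the log format, `in_conv_from_log`, `snapshots`, the sample batches), and the example assumes the conversation already has at least 7 comments, so the threshold stays fixed at 7 throughout.

```python
def in_conv_from_log(vote_log):
    """Full recompute: derive the in-conv set from the entire vote log."""
    voted = {}
    comments = set()
    for pid, cid in vote_log:
        voted.setdefault(pid, set()).add(cid)
        comments.add(cid)
    threshold = min(7, len(comments))
    return {pid for pid, cids in voted.items() if len(cids) >= threshold}


def snapshots(batches):
    """Append each batch to the log and recompute in-conv from scratch."""
    log = []
    result = []
    for batch in batches:
        log.extend(batch)  # votes are only ever appended, never removed
        result.append(in_conv_from_log(log))
    return result


batches = [
    [("a", c) for c in range(7)] + [("b", c) for c in range(4)],
    [("b", c) for c in range(4, 8)],
    [("c", 0)],
]

if __name__ == "__main__":
    for snap in snapshots(batches):
        print(sorted(snap))
```

Because per-participant counts can only grow and the threshold is constant in this setting, each snapshot is a superset of the previous one without storing any in-conv state between runs.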

Impact

biodiversity: 428 → 441 in-conv participants (now matches Clojure)
Verified on 4 datasets with complete Clojure cold-start blobs

Incremental vs cold-start blob testing

D2 tests run against both cold-start and incremental Clojure blobs (infrastructure from #2420):

Cold-start blobs are computed in one pass on the full dataset. The in-conv threshold min(7, n_cmts) is evaluated once with the final n_cmts. Python matches these exactly.
Incremental blobs were built progressively as votes trickled in over the conversation's lifetime. The threshold was evaluated at each iteration with a smaller n_cmts, admitting a few extra participants during earlier iterations. The difference is tiny (1–2 participants).
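The gap between the two blob types can be illustrated with a toy model. This sketch assumes that the persisted in-conv set behaves like a running union across iterations and that admission uses `>=`; the function names and the sample data are hypothetical.

```python
def threshold(n_cmts):
    return min(7, n_cmts)


def cold_start(vote_counts, n_cmts):
    """Evaluate the threshold once, against the final comment count."""
    t = threshold(n_cmts)
    return {pid for pid, n in vote_counts.items() if n >= t}


def incremental(iterations):
    """Evaluate the threshold at every iteration; admission is sticky."""
    admitted = set()
    for vote_counts, n_cmts in iterations:
        admitted |= cold_start(vote_counts, n_cmts)
    return admitted


# Each entry is (vote counts so far, number of comments so far).
iterations = [
    ({"early": 3}, 3),              # 3 comments so far: threshold is 3
    ({"early": 3, "late": 9}, 12),  # 12 comments: threshold is 7
]

if __name__ == "__main__":
    final_counts, final_n_cmts = iterations[-1]
    print(sorted(cold_start(final_counts, final_n_cmts)))
    print(sorted(incremental(iterations)))
```

The participant `early` is admitted while the threshold is still 3 and stays in under the incremental model, whereas a single pass over the final data excludes them.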

D2 tests on incremental blobs are currently xfailed with an explanatory comment. Matching incremental behaviour exactly would require simulating the progressive threshold — tracked as future work under Replay Infrastructure.

Test results

253 passed, 5 skipped, 36 xfailed (0 failures)

Test plan

D2 tests pass on all datasets with complete Clojure cold-start blobs
D2c: 3 synthetic tests verify vote counts include moderated-out votes, n_cmts includes moderated-out comments, participants stay in-conv after moderation
D2d: 5 monotonicity tests (basic across updates, survives moderation, worker restart + moderation, restart without new votes, mixed participants)
D2 tests xfail on incremental blobs (with explanatory comments)
Full test suite: 253 passed, 0 failures
Golden snapshots re-recorded for affected datasets

🤖 Generated with Claude Code

Squashed commits

Fix D2: in-conv threshold min(7, n_cmts) to match Clojure
Skip D2 tests on datasets with incomplete Clojure blobs
Address Copilot review: fix base-cluster sort order (D2b) and stale comment
Add PR 1 test results to journal
Plan: add D2c (vote count source) and D2d (in-conv monotonicity) to fix plan
Journal: add session 3 findings (D2c vote count source, D2d monotonicity)
Re-record golden snapshots and remove passing xfail markers
xfail D2 in-conv tests on incremental blobs
Journal: add session 4, update plan with D2 incremental in Replay PR B
Fix D2c: use raw_rating_mat for vote counts and n_cmts threshold

commit-id:c0a682ec

Stack:

⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

Copilot

Pull request overview

Aligns Delphi’s “in-conv” participant filtering with the legacy Clojure math pipeline by correcting the vote threshold formula, ensuring vote counts come from the raw (unfiltered) vote matrix, and preserving base-cluster ID/encounter order. Adds targeted discrepancy + monotonicity guard tests and updates supporting documentation/journal entries.

Changes:

Update in-conv threshold to min(7, n_comments) and compute both n_cmts + per-user vote counts from raw_rating_mat (so moderated-out comment votes still count).
Preserve base-cluster ordering by k-means ID (avoid size-sorting + ID reassignment).
Add D2c/D2d synthetic + monotonicity guard tests; update plan/journal docs and a related test comment.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

File	Description
`delphi/polismath/conversation/conversation.py`	Adjusts base-cluster ordering and switches in-conv vote counting / threshold inputs to `raw_rating_mat`.
`delphi/tests/test_discrepancy_fixes.py`	Removes prior D2 xfails for cold-start blobs, adds D2c tests for raw-vs-filtered vote counting, and adds D2d monotonicity guard tests.
`delphi/tests/test_conversation.py`	Updates an in-test comment to reflect the new threshold definition.
`delphi/docs/PLAN_DISCREPANCY_FIXES.md`	Marks D2/D2b/D2c/D2d as done and documents incremental-blob deferral rationale.
`delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md`	Adds detailed journal entries describing the D2-related investigations and fixes.

+                0: list(range(10)),    # 10 raw votes → in-conv
+                1: list(range(5, 11)), # 6 raw votes (only 1 moderated-out) → NOT in-conv


+        # Keep base clusters in k-means ID order (matching Clojure's sort-by :id)
+        # Do NOT sort by size or reassign IDs — that would change the encounter
+        # order of centers used in group clustering's first-k-distinct initialization.
+        base_clusters.sort(key=lambda c: c['id'])


github-actions · 2026-05-19T22:27:07Z

Delphi Coverage Report

File	Stmts	Miss	Cover
__init__.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/__init__.py	1	0	100%
components/config.py	165	133	19%
conversation/__init__.py	2	0	100%
conversation/conversation.py	1117	328	71%
conversation/manager.py	131	42	68%
database/__init__.py	1	0	100%
database/dynamodb.py	387	234	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/__init__.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	361	47	87%
pca_kmeans_rep/stats.py	107	22	79%
regression/__init__.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	135	27	80%
regression/recorder.py	36	27	25%
regression/utils.py	137	118	14%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	54	52%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/__init__.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/__init__.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/__init__.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/__init__.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/__init__.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	477	18%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10950	7643	30%

jucor changed the title ~~Fix D2: in-conv participant threshold + D2c vote count source~~ [Stack 6/17] Fix D2: in-conv participant threshold + D2c vote count source Mar 30, 2026

jucor force-pushed the spr/edge/c0a682ec branch 3 times, most recently from 02284b0 to 7f20a34 Compare March 31, 2026 00:35

ballPointPenguin approved these changes Apr 26, 2026

jucor requested a review from Copilot May 19, 2026 21:43

Copilot started reviewing on behalf of jucor May 19, 2026 21:44

Copilot AI reviewed May 19, 2026

jucor changed the title ~~[Stack 6/17] Fix D2: in-conv participant threshold + D2c vote count source~~ Fix D2: in-conv participant threshold + D2c vote count source May 19, 2026

jucor force-pushed the spr/edge/c0a682ec branch from 7f20a34 to 9fbda43 Compare May 19, 2026 22:09

Fix D2: in-conv participant threshold + D2c vote count source #2513
jucor wants to merge 1 commit into spr/edge/bdc830db from spr/edge/c0a682ec