[Stack 19/27] Fix D8: match Clojure repful classification (rat > rdt)#2451

Closed
jucor wants to merge 1 commit into jc/clj-parity-d7-repness-metric from jc/clj-parity-d8-finalize-stats

Conversation

jucor (Collaborator) commented Mar 16, 2026

Summary

Stacked on #2450 (Fix D7: match Clojure repness metric formula (product of 4 signed values)). Please review and merge #2450 first.
Next in stack: #2452 (Fix D15: match Clojure moderation handling (zero out columns, don't remove))

Simplifies the repful ("representative for agree or disagree?") classification
to match Clojure's finalize-cmt-stats (repness.clj:175-177).

Before (Python): 3-branch conditional:

  1. pa > 0.5 AND ra > 1.0 → agree
  2. pd > 0.5 AND rd > 1.0 → disagree
  3. Fallback: whichever metric is higher

After (Clojure): rat > rdt → agree, else disagree.

The old thresholds were redundant: rat and rdt (two-proportion z-scores)
already encode whether the group's agree or disagree rate is significantly higher
than that of the other groups. The simple comparison is both correct and clearer.
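
As a rough illustration of where the two rules can part ways, here is a minimal side-by-side sketch. The function names `repful_old` and `repful_new` and the sample values are hypothetical, and the old fallback is assumed to compare `ra` against `rd`; the actual code in repness.py may differ.

```python
def repful_old(pa, pd, ra, rd):
    """Old Python rule as described above (three branches)."""
    if pa > 0.5 and ra > 1.0:
        return "agree"
    if pd > 0.5 and rd > 1.0:
        return "disagree"
    # Assumption: the fallback compares the two representativeness ratios.
    return "agree" if ra >= rd else "disagree"


def repful_new(rat, rdt):
    """Clojure-parity rule: 'agree' only when rat is strictly greater than rdt."""
    return "agree" if rat > rdt else "disagree"


# Hypothetical values: the agree thresholds are met (pa > 0.5, ra > 1.0),
# but the disagree z-score is the larger of the two.
stats = {"pa": 0.6, "pd": 0.4, "ra": 1.2, "rd": 8.0, "rat": 0.5, "rdt": 3.0}

old = repful_old(stats["pa"], stats["pd"], stats["ra"], stats["rd"])
new = repful_new(stats["rat"], stats["rdt"])
print(old, new)  # agree disagree
```

Because the comparison is strict, ties (including the both-zero case) resolve to "disagree" under the new rule.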

Changes

  • repness.py: finalize_cmt_stats() — 3-branch logic → rat > rdt
  • repness.py: Vectorized — np.select with conditions → np.where(rat > rdt)
  • test_discrepancy_fixes.py: Expanded from 2 to 6 tests (including edge cases:
    equal rat/rdt, both negative, both zero)
  • Golden snapshots re-recorded (repful direction changes for some comments)
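
The vectorized form of the same rule might look like the sketch below. The arrays are made-up sample data, not values from the datasets, and the real `compute_group_comment_stats_df` works on DataFrame columns rather than bare arrays.

```python
import numpy as np

# Hypothetical z-scores for five (group, comment) pairs.
rat = np.array([2.0, -1.0, 0.0, 1.5, -0.5])
rdt = np.array([1.0, -2.0, 0.0, 1.5, 3.0])

# One condition replaces the previous multi-branch selection.
repful = np.where(rat > rdt, "agree", "disagree")
print(repful.tolist())
# ['agree', 'agree', 'disagree', 'disagree', 'disagree']

# The vectorized result should agree with the scalar rule element by element.
expected = ["agree" if a > d else "disagree" for a, d in zip(rat, rdt)]
```

Keeping a scalar cross-check like `expected` around is one cheap way to guard against the two code paths drifting apart.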

Test plan

  • 6 targeted D8 tests pass (rat>rdt, rat<rdt, equal, both negative, both zero, old-vs-new divergence case)
  • Full test suite passes (excluding DynamoDB/MinIO tests)
  • Private dataset tests pass (--include-local)
  • Golden snapshots re-recorded for all 7 datasets
  • 19/19 regression tests pass

🤖 Generated with Claude Code

@jucor jucor requested a review from Copilot March 16, 2026 18:25
@jucor jucor changed the title from "Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 17/17] Fix D8: match Clojure repful classification (rat > rdt)" Mar 16, 2026
Copilot AI (Contributor) left a comment

Pull request overview

This PR simplifies the repful ("representative for agree or disagree") classification in finalize_cmt_stats to match Clojure's repness.clj:175-177 logic. The old 3-branch conditional (pa > 0.5 AND ra > 1.0 → agree, pd > 0.5 AND rd > 1.0 → disagree, fallback to higher metric) is replaced with the simpler rat > rdt → agree, else disagree.

Changes:

  • Replaced both scalar (finalize_cmt_stats) and vectorized (compute_group_comment_stats_df) repful classification with rat > rdt comparison
  • Expanded D8 tests from 2 to 6 formula tests (including edge cases: equal, both negative, both zero), removed xfail markers for now-passing tests
  • Re-recorded golden snapshots for affected datasets to reflect repful direction changes

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| delphi/polismath/pca_kmeans_rep/repness.py | Simplified repful classification in both scalar and vectorized paths to rat > rdt |
| delphi/tests/test_discrepancy_fixes.py | Added 4 new edge-case tests, removed xfail from D8 formula tests, updated xfail reason on blob test |
| delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md | Added PR 7 / Session 10 journal entry documenting the D8 fix |
| delphi/docs/PLAN_DISCREPANCY_FIXES.md | Marked D8 as DONE |
| delphi/real_data/r6vbnhffkxbd7ifmfbdrd-vw/golden_snapshot.json | Re-recorded snapshot with updated repness values and repful directions |
| delphi/real_data/r4tykwac8thvzv35jrn53-biodiversity/golden_snapshot.json | Re-recorded snapshot with updated repness values |

Comment thread delphi/tests/test_discrepancy_fixes.py Outdated
Comment on lines +1064 to +1066
  """
  D8: Python uses if pa > 0.5 AND ra > 1.0 → 'agree'; elif pd > 0.5 AND rd > 1.0 → 'disagree'
- Clojure uses simple rat > rdt → 'agree'; else → 'disagree'
+ Clojure uses simple rat > rdt → 'agree'; else → 'disagree' (repness.clj:175-177)
@jucor jucor marked this pull request as draft March 17, 2026 10:35
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 1d18f1b to 45d6c60 Compare March 17, 2026 16:10
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from db0fe14 to 2a625de Compare March 17, 2026 16:10
@jucor jucor changed the title from "[Stack 17/17] Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 17/24] Fix D8: match Clojure repful classification (rat > rdt)" Mar 17, 2026
@jucor jucor changed the title from "[Stack 17/24] Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 17/25] Fix D8: match Clojure repful classification (rat > rdt)" Mar 17, 2026
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 45d6c60 to 5b57f8a Compare March 18, 2026 18:50
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from 39fc7ef to c0684d4 Compare March 18, 2026 19:02
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 5b57f8a to 7705349 Compare March 18, 2026 19:06
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from c0684d4 to 42d9f25 Compare March 18, 2026 19:07
@github-actions

Delphi Coverage Report

| File | Stmts | Miss | Cover |
| --- | ---: | ---: | ---: |
| __init__.py | 3 | 0 | 100% |
| __main__.py | 55 | 55 | 0% |
| benchmarks/bench_pca.py | 76 | 76 | 0% |
| benchmarks/bench_repness.py | 81 | 81 | 0% |
| benchmarks/bench_update_votes.py | 38 | 38 | 0% |
| benchmarks/benchmark_utils.py | 34 | 34 | 0% |
| components/__init__.py | 2 | 0 | 100% |
| components/config.py | 165 | 133 | 19% |
| components/server.py | 116 | 72 | 38% |
| conversation/__init__.py | 2 | 0 | 100% |
| conversation/conversation.py | 1108 | 320 | 71% |
| conversation/manager.py | 131 | 42 | 68% |
| database/__init__.py | 1 | 0 | 100% |
| database/dynamodb.py | 387 | 234 | 40% |
| database/postgres.py | 306 | 205 | 33% |
| pca_kmeans_rep/__init__.py | 5 | 0 | 100% |
| pca_kmeans_rep/clusters.py | 265 | 22 | 92% |
| pca_kmeans_rep/corr.py | 98 | 17 | 83% |
| pca_kmeans_rep/pca.py | 50 | 15 | 70% |
| pca_kmeans_rep/repness.py | 305 | 35 | 89% |
| poller.py | 224 | 188 | 16% |
| regression/__init__.py | 5 | 0 | 100% |
| regression/clojure_comparer.py | 182 | 83 | 54% |
| regression/comparer.py | 887 | 473 | 47% |
| regression/datasets.py | 135 | 27 | 80% |
| regression/recorder.py | 36 | 27 | 25% |
| regression/utils.py | 138 | 52 | 62% |
| run_math_pipeline.py | 260 | 114 | 56% |
| system.py | 85 | 55 | 35% |
| umap_narrative/500_generate_embedding_umap_cluster.py | 210 | 109 | 48% |
| umap_narrative/501_calculate_comment_extremity.py | 112 | 54 | 52% |
| umap_narrative/502_calculate_priorities.py | 135 | 135 | 0% |
| umap_narrative/700_datamapplot_for_layer.py | 502 | 502 | 0% |
| umap_narrative/701_static_datamapplot_for_layer.py | 310 | 310 | 0% |
| umap_narrative/702_consensus_divisive_datamapplot.py | 432 | 432 | 0% |
| umap_narrative/801_narrative_report_batch.py | 787 | 787 | 0% |
| umap_narrative/802_process_batch_results.py | 265 | 265 | 0% |
| umap_narrative/803_check_batch_status.py | 175 | 175 | 0% |
| umap_narrative/llm_factory_constructor/__init__.py | 2 | 2 | 0% |
| umap_narrative/llm_factory_constructor/model_provider.py | 157 | 157 | 0% |
| umap_narrative/polismath_commentgraph/__init__.py | 1 | 0 | 100% |
| umap_narrative/polismath_commentgraph/cli.py | 270 | 270 | 0% |
| umap_narrative/polismath_commentgraph/core/__init__.py | 3 | 3 | 0% |
| umap_narrative/polismath_commentgraph/core/clustering.py | 110 | 110 | 0% |
| umap_narrative/polismath_commentgraph/core/embedding.py | 104 | 104 | 0% |
| umap_narrative/polismath_commentgraph/lambda_handler.py | 219 | 219 | 0% |
| umap_narrative/polismath_commentgraph/schemas/__init__.py | 2 | 0 | 100% |
| umap_narrative/polismath_commentgraph/schemas/dynamo_models.py | 160 | 9 | 94% |
| umap_narrative/polismath_commentgraph/tests/conftest.py | 17 | 17 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_clustering.py | 74 | 74 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_embedding.py | 55 | 55 | 0% |
| umap_narrative/polismath_commentgraph/tests/test_storage.py | 87 | 87 | 0% |
| umap_narrative/polismath_commentgraph/utils/__init__.py | 3 | 0 | 100% |
| umap_narrative/polismath_commentgraph/utils/converter.py | 283 | 237 | 16% |
| umap_narrative/polismath_commentgraph/utils/group_data.py | 354 | 336 | 5% |
| umap_narrative/polismath_commentgraph/utils/storage.py | 585 | 477 | 18% |
| umap_narrative/reset_conversation.py | 159 | 50 | 69% |
| umap_narrative/run_pipeline.py | 453 | 312 | 31% |
| utils/general.py | 63 | 41 | 35% |
| Total | 11269 | 7727 | 31% |

@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from 42d9f25 to b68c7e5 Compare March 19, 2026 10:23
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch 2 times, most recently from 1a0f157 to a8428d5 Compare March 19, 2026 10:46
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from b68c7e5 to c2e521d Compare March 19, 2026 10:46
@jucor jucor changed the title from "[Stack 17/25] Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 16/24] Fix D8: match Clojure repful classification (rat > rdt)" Mar 19, 2026
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from a8428d5 to d9ed377 Compare March 19, 2026 12:32
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from c2e521d to b8a8e08 Compare March 19, 2026 12:32
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from d9ed377 to 9f20b50 Compare March 19, 2026 14:52
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from b8a8e08 to 0154ce7 Compare March 19, 2026 14:52
@jucor jucor changed the title from "[Stack 16/24] Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 17/25] Fix D8: match Clojure repful classification (rat > rdt)" Mar 19, 2026
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 9f20b50 to 65f136d Compare March 23, 2026 15:33
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from 0154ce7 to a313f1c Compare March 23, 2026 15:33
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 65f136d to a0d8710 Compare March 23, 2026 15:41
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from a313f1c to 6a2ac55 Compare March 23, 2026 15:41
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from 0297ca2 to c0f1f0f Compare March 24, 2026 10:28
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 7c92111 to f2c2965 Compare March 24, 2026 11:13
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch 2 times, most recently from 4ebd5ab to baceacd Compare March 24, 2026 11:45
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from f2c2965 to e1392d1 Compare March 26, 2026 21:24
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch 2 times, most recently from 74b31de to c4f5811 Compare March 27, 2026 01:15
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch 2 times, most recently from 799a9c4 to 9a1b3b3 Compare March 27, 2026 01:53
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from c4f5811 to 2960412 Compare March 27, 2026 01:53
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 9a1b3b3 to cee0f53 Compare March 27, 2026 02:10
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch 2 times, most recently from 44b04ae to b24d69b Compare March 27, 2026 10:41
@jucor jucor changed the title from "[Stack 17/25] Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 18/26] Fix D8: match Clojure repful classification (rat > rdt)" Mar 30, 2026
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 3511e00 to e209f37 Compare March 30, 2026 12:48
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from b24d69b to abfbacb Compare March 30, 2026 12:48
@jucor jucor changed the title from "[Stack 18/26] Fix D8: match Clojure repful classification (rat > rdt)" to "[Stack 19/27] Fix D8: match Clojure repful classification (rat > rdt)" Mar 30, 2026
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from e209f37 to 4df6e36 Compare March 30, 2026 12:54
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from abfbacb to 3847f76 Compare March 30, 2026 12:54
@jucor jucor requested a review from Copilot March 30, 2026 16:25
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.


Comment on lines 9 to 12
  "n_participants_in_csv": 69,
  "fixed_timestamp": 1700000000000,
- "recorded_at": "2026-03-27T01:51:24.692321"
+ "recorded_at": "2026-03-27T01:51:39.156540"
  },
Copilot AI Mar 30, 2026

This snapshot update appears to only change recorded_at (and also math_tick elsewhere), which are timestamp-based and not used for regression comparisons (the comparer ignores math_tick and doesn’t compare metadata). If there are no substantive stage-output diffs, consider reverting to avoid noisy churn or making the recorder write stable values for these fields.
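
One way to detect such timestamp-only churn before committing a re-recorded snapshot is sketched below. The helper names and the `metadata` wrapper key are assumptions made for illustration; only `recorded_at` and `math_tick` are named in the comment above, and the real snapshot layout and recorder may differ.

```python
# Fields the review comment describes as timestamp-based.
VOLATILE_KEYS = {"recorded_at", "math_tick"}


def strip_volatile(node):
    """Return a copy of a JSON-like structure without the volatile keys."""
    if isinstance(node, dict):
        return {k: strip_volatile(v) for k, v in node.items() if k not in VOLATILE_KEYS}
    if isinstance(node, list):
        return [strip_volatile(v) for v in node]
    return node


def is_timestamp_only_change(old_snapshot, new_snapshot):
    """True when the snapshots differ, but only in the volatile fields."""
    return (
        old_snapshot != new_snapshot
        and strip_volatile(old_snapshot) == strip_volatile(new_snapshot)
    )


# Hypothetical snapshots mirroring the diff above ("metadata" is an assumed key).
old = {"metadata": {"n_participants_in_csv": 69,
                    "fixed_timestamp": 1700000000000,
                    "recorded_at": "2026-03-27T01:51:24.692321"}}
new = {"metadata": {"n_participants_in_csv": 69,
                    "fixed_timestamp": 1700000000000,
                    "recorded_at": "2026-03-27T01:51:39.156540"}}

print(is_timestamp_only_change(old, new))  # True
```

A recorder could run a check of this kind and skip rewriting the file when it returns True, which would keep snapshot diffs limited to substantive changes.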

Copilot uses AI. Check for mistakes.
Comment on lines 9 to 12
  "n_participants_in_csv": 536,
  "fixed_timestamp": 1700000000000,
- "recorded_at": "2026-03-27T01:51:23.001794"
+ "recorded_at": "2026-03-27T01:51:37.771402"
  },
Copilot AI Mar 30, 2026

This snapshot change is only updating recorded_at (and math_tick elsewhere), which are timestamp-based and not part of regression comparisons (math_tick is ignored and metadata isn’t compared). If there are no actual stage-output changes for this dataset, please revert these timestamp-only edits to keep diffs meaningful.

Comment on lines +570 to +571
- `test_discrepancy_fixes.py`: Expanded `TestD8FinalizeStats` from 2 to 7 tests (5 formula +
1 blob xfail + edge cases for equal/negative/zero rat/rdt)
Copilot AI Mar 30, 2026

The journal entry says TestD8FinalizeStats expanded “from 2 to 7 tests”, but the PR description states 6 tests. Please reconcile the counts (and ideally list the exact test names) so the journal accurately reflects the change set.

Suggested change
- - `test_discrepancy_fixes.py`: Expanded `TestD8FinalizeStats` from 2 to 7 tests (5 formula +
-   1 blob xfail + edge cases for equal/negative/zero rat/rdt)
+ - `test_discrepancy_fixes.py`: Expanded `TestD8FinalizeStats` to 6 tests covering the formula,
+   the blob xfail, and edge cases for equal/negative/zero `rat`/`rdt` values

@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from 3847f76 to f7062f8 Compare March 30, 2026 16:49
@jucor jucor force-pushed the jc/clj-parity-d7-repness-metric branch from 4df6e36 to 9bb9604 Compare March 30, 2026 16:49
Documents D5-D8 review findings, blob injection tests, CI fixes,
k-divergence discovery, stack reordering, and next steps.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@jucor jucor force-pushed the jc/clj-parity-d8-finalize-stats branch from f7062f8 to f7329b4 Compare March 30, 2026 17:05
This was referenced Mar 30, 2026
jucor (Collaborator, Author) commented Mar 30, 2026

Superseded by spr-managed PR stack. See the new stack starting at #2508.

@jucor jucor closed this Mar 30, 2026