Speed up regression tests by jucor · Pull Request #2515 · compdemocracy/polis

jucor · 2026-03-30T22:25:14Z

Summary

- Default `benchmark=False` in `compare_with_golden()`. Benchmark mode ran the pipeline 3x for timing statistics, which is unnecessary for correctness checks. The `regression_comparer.py` script already had `--benchmark` as opt-in, so this aligns the default.
- Add a `skip_intermediate_stages` parameter to `compute_all_stages()`. `test_conversation_regression` now skips stages 1-4 (empty, load-only, PCA-only, PCA+clustering) since it only checks `overall_match`. `test_conversation_stages_individually` still runs all stages for granular failure detection.
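
For readers without the diff at hand, the stage-skipping idea can be sketched roughly as follows. This is an illustration only, not the code in `delphi/polismath/regression/utils.py`: the stage tuple layout, `make_stage`, and `build_stages` are hypothetical helpers, and only the names `compute_all_stages` and `skip_intermediate_stages` come from the PR description.

```python
from typing import Callable, Dict, List, Tuple

# (stage name, compute function, is_intermediate): a hypothetical layout,
# not the structure used in delphi/polismath/regression/utils.py.
Stage = Tuple[str, Callable[[], dict], bool]


def make_stage(name: str, calls: List[str]) -> Callable[[], dict]:
    """Return a fake stage that records its name when it runs."""
    def run() -> dict:
        calls.append(name)
        return {"stage": name}
    return run


def build_stages(calls: List[str]) -> List[Stage]:
    """Stages 1-4 are intermediate; the last two are always computed."""
    names = [
        ("empty", True),
        ("load_only", True),
        ("pca_only", True),
        ("pca_clustering", True),
        ("full_recompute", False),
        ("export", False),
    ]
    return [(name, make_stage(name, calls), inter) for name, inter in names]


def compute_all_stages(
    stages: List[Stage], skip_intermediate_stages: bool = False
) -> Dict[str, dict]:
    """Run every stage, or only the non-intermediate ones when the flag is set."""
    results: Dict[str, dict] = {}
    for name, compute, is_intermediate in stages:
        if skip_intermediate_stages and is_intermediate:
            continue  # not needed when the caller only checks overall_match
        results[name] = compute()
    return results


calls_all: List[str] = []
compute_all_stages(build_stages(calls_all))

calls_fast: List[str] = []
fast_results = compute_all_stages(build_stages(calls_fast), skip_intermediate_stages=True)

print(len(calls_all), calls_fast)
```

With the flag left at its default, all six fake stages run, which is what a stage-by-stage test would want; with the flag set, only the final recompute and export run.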

Measured speedup on one of the large private test conversations

| Test | Before | After | Speedup |
|------|--------|-------|---------|
| `test_conversation_regression` | 317s | 23s | 13.9x |
| `test_conversation_stages_individually` | 60s | 32s | 1.9x |

The regression test's ~14x speedup comes from two combined effects: no longer running the pipeline 3x (benchmark), and skipping 4 redundant intermediate stages.
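
As a quick sanity check, the speedup column can be recomputed from the rounded times in the table above. From whole-second values the first ratio comes out near 13.8x rather than 13.9x; the reported figure was presumably derived from unrounded timings, which is an assumption since the raw measurements are not shown.

```python
# Rounded wall-clock times in seconds (before, after), copied from the table.
timings = {
    "test_conversation_regression": (317, 23),
    "test_conversation_stages_individually": (60, 32),
}

# Speedup is simply the ratio of the old runtime to the new one.
speedups = {name: before / after for name, (before, after) in timings.items()}

for name, ratio in speedups.items():
    print(f"{name}: {ratio:.1f}x")
```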

Test plan

- [x] All 9 public regression tests pass (vw + biodiversity)
- [x] Private dataset tests pass (`--include-local`)
- [x] Timing verified on large private dataset

🤖 Generated with Claude Code

Squashed commits

- Address Copilot review: fix stale terminology, hardcoded blob_type, and synthetic test tid range
- Speed up regression tests: disable benchmark, skip intermediate stages

commit-id:f39f3218

Stack:

⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

Copilot

Pull request overview

This PR speeds up Delphi’s golden-snapshot regression tests by reducing unnecessary computation during comparisons, while preserving the more granular stage-by-stage test for debugging.

Changes:

- Switch `ConversationComparer.compare_with_golden()` to default `benchmark=False` so correctness checks don’t run the pipeline multiple times for timing stats.
- Add `skip_intermediate_stages` to `compute_all_stages()` / `compare_with_golden()` to optionally skip stages 1–4 and only compute the full recompute + export.
- Update `test_conversation_regression` to use `skip_intermediate_stages=True` since it only asserts `overall_match`.
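
The opt-in benchmark behaviour might look something like the sketch below. It is a simplified stand-in, not `ConversationComparer.compare_with_golden()` itself: the free-function signature, the `run_pipeline` callable, the report keys other than `overall_match`, and the `BENCHMARK_RUNS` constant are all assumptions made for illustration. The run count of 3 is taken from the PR description.

```python
import statistics
import time
from typing import Callable, Dict

BENCHMARK_RUNS = 3  # per the PR description, benchmark mode ran the pipeline 3x


def compare_with_golden(
    run_pipeline: Callable[[], dict], golden: dict, benchmark: bool = False
) -> Dict[str, object]:
    """Hypothetical stand-in: run the pipeline and compare against a golden snapshot."""
    runs = BENCHMARK_RUNS if benchmark else 1
    durations = []
    output = None
    for _ in range(runs):
        start = time.perf_counter()
        output = run_pipeline()
        durations.append(time.perf_counter() - start)

    report: Dict[str, object] = {"overall_match": output == golden, "runs": runs}
    if benchmark:
        # Timing statistics are only collected when explicitly requested.
        report["mean_seconds"] = statistics.mean(durations)
        report["stdev_seconds"] = statistics.stdev(durations)
    return report


calls = {"n": 0}


def fake_pipeline() -> dict:
    """Toy pipeline that counts how often it is invoked."""
    calls["n"] += 1
    return {"clusters": 2}


golden = {"clusters": 2}
quick = compare_with_golden(fake_pipeline, golden)
timed = compare_with_golden(fake_pipeline, golden, benchmark=True)
print(quick["runs"], timed["runs"], calls["n"])
```

A correctness-only caller pays for a single pipeline run, while a caller that wants timing statistics passes `benchmark=True` explicitly.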

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 1 comment.

| File | Description |
|------|-------------|
| `delphi/tests/test_regression.py` | Makes the main regression test skip intermediate stages to reduce runtime. |
| `delphi/polismath/regression/utils.py` | Adds `skip_intermediate_stages` support in stage computation and benchmarks. |
| `delphi/polismath/regression/comparer.py` | Changes default benchmarking behavior and plumbs through stage-skipping option. |

github-actions · 2026-05-19T22:30:05Z

Delphi Coverage Report

| File | Stmts | Miss | Cover |
|------|-------|------|-------|
| `__init__.py` | 2 | 0 | 100% |
| `benchmarks/bench_pca.py` | 76 | 76 | 0% |
| `benchmarks/bench_repness.py` | 81 | 81 | 0% |
| `benchmarks/bench_update_votes.py` | 38 | 38 | 0% |
| `benchmarks/benchmark_utils.py` | 34 | 34 | 0% |
| `components/__init__.py` | 1 | 0 | 100% |
| `components/config.py` | 165 | 133 | 19% |
| `conversation/__init__.py` | 2 | 0 | 100% |
| `conversation/conversation.py` | 1117 | 328 | 71% |
| `conversation/manager.py` | 131 | 42 | 68% |
| `database/__init__.py` | 1 | 0 | 100% |
| `database/dynamodb.py` | 387 | 234 | 40% |
| `database/postgres.py` | 305 | 205 | 33% |
| `pca_kmeans_rep/__init__.py` | 5 | 0 | 100% |
| `pca_kmeans_rep/clusters.py` | 257 | 22 | 91% |
| `pca_kmeans_rep/corr.py` | 98 | 17 | 83% |
| `pca_kmeans_rep/pca.py` | 52 | 16 | 69% |
| `pca_kmeans_rep/repness.py` | 361 | 51 | 86% |
| `pca_kmeans_rep/stats.py` | 107 | 22 | 79% |
| `regression/__init__.py` | 4 | 0 | 100% |
| `regression/clojure_comparer.py` | 188 | 17 | 91% |
| `regression/comparer.py` | 887 | 720 | 19% |
| `regression/datasets.py` | 135 | 27 | 80% |
| `regression/recorder.py` | 36 | 27 | 25% |
| `regression/utils.py` | 138 | 119 | 14% |
| `run_math_pipeline.py` | 260 | 114 | 56% |
| `umap_narrative/500_generate_embedding_umap_cluster.py` | 210 | 109 | 48% |
| `umap_narrative/501_calculate_comment_extremity.py` | 112 | 54 | 52% |
| `umap_narrative/502_calculate_priorities.py` | 135 | 135 | 0% |
| `umap_narrative/700_datamapplot_for_layer.py` | 502 | 502 | 0% |
| `umap_narrative/701_static_datamapplot_for_layer.py` | 310 | 310 | 0% |
| `umap_narrative/702_consensus_divisive_datamapplot.py` | 432 | 432 | 0% |
| `umap_narrative/801_narrative_report_batch.py` | 785 | 785 | 0% |
| `umap_narrative/802_process_batch_results.py` | 265 | 265 | 0% |
| `umap_narrative/803_check_batch_status.py` | 175 | 175 | 0% |
| `umap_narrative/llm_factory_constructor/__init__.py` | 2 | 2 | 0% |
| `umap_narrative/llm_factory_constructor/model_provider.py` | 157 | 157 | 0% |
| `umap_narrative/polismath_commentgraph/__init__.py` | 1 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/cli.py` | 270 | 270 | 0% |
| `umap_narrative/polismath_commentgraph/core/__init__.py` | 3 | 3 | 0% |
| `umap_narrative/polismath_commentgraph/core/clustering.py` | 108 | 108 | 0% |
| `umap_narrative/polismath_commentgraph/core/embedding.py` | 104 | 104 | 0% |
| `umap_narrative/polismath_commentgraph/lambda_handler.py` | 219 | 219 | 0% |
| `umap_narrative/polismath_commentgraph/schemas/__init__.py` | 2 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/schemas/dynamo_models.py` | 160 | 9 | 94% |
| `umap_narrative/polismath_commentgraph/tests/conftest.py` | 17 | 17 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_clustering.py` | 74 | 74 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_embedding.py` | 55 | 55 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_storage.py` | 87 | 87 | 0% |
| `umap_narrative/polismath_commentgraph/utils/__init__.py` | 3 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/utils/converter.py` | 283 | 237 | 16% |
| `umap_narrative/polismath_commentgraph/utils/group_data.py` | 354 | 336 | 5% |
| `umap_narrative/polismath_commentgraph/utils/storage.py` | 584 | 477 | 18% |
| `umap_narrative/reset_conversation.py` | 159 | 50 | 69% |
| `umap_narrative/run_pipeline.py` | 453 | 312 | 31% |
| `utils/general.py` | 62 | 41 | 34% |
| Total | 10951 | 7648 | 30% |

jucor changed the title ~~Speed up regression tests~~ [Stack 8/17] Speed up regression tests Mar 30, 2026

jucor force-pushed the spr/edge/f39f3218 branch from 591e196 to 510205d on March 30, 2026 22:39

jucor force-pushed the spr/edge/6ae3ee43 branch from 4ad6046 to 603f0ac on March 30, 2026 22:47

jucor force-pushed the spr/edge/f39f3218 branch 2 times, most recently from b1ebec4 to 8637399 on March 31, 2026 00:35

jucor force-pushed the spr/edge/6ae3ee43 branch from 603f0ac to b9dcc89 on March 31, 2026 00:35

ballPointPenguin approved these changes Apr 26, 2026

jucor requested a review from Copilot May 19, 2026 21:43

Copilot started reviewing on behalf of jucor May 19, 2026 21:44

Copilot AI reviewed May 19, 2026

Comment thread delphi/polismath/regression/comparer.py

jucor changed the title ~~[Stack 8/17] Speed up regression tests~~ Speed up regression tests May 19, 2026

jucor force-pushed the spr/edge/f39f3218 branch from 8637399 to daad2ff on May 19, 2026 22:09

jucor wants to merge 1 commit into spr/edge/6ae3ee43 from spr/edge/f39f3218

3 participants
