[Stack 6/27] Add SKIP_GOLDEN env var to disable golden snapshot tests by jucor · Pull Request #2482 · compdemocracy/polis

jucor · 2026-03-30T11:07:35Z

Summary

Stacked on #2485 (Cold-start Clojure math blob generation and cluster visualization). Please review and merge #2485 first.
Next in stack: #2484 (Speed up CI: replace pip with uv pip in Dockerfile (~2x faster installs))

Add SKIP_GOLDEN=1 environment variable to disable golden snapshot regression tests.

During stacked PR development, golden snapshots become stale as computation changes cascade through the stack. Rather than re-recording snapshots at every rebase (which causes conflict cascades in jj/git), we skip them until the stack is merged into edge.

Changes

test_regression.py: Add @_skip_golden decorator to test_conversation_regression and test_conversation_stages_individually — the only two tests that compare against golden snapshots. Other dataset-using tests (Clojure comparison, smoke tests) are unaffected.
python-ci.yml: Set SKIP_GOLDEN=1 in CI so the stacked PRs don't fail on stale snapshots.
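
A minimal sketch of what such a shared marker could look like, assuming it is a plain `pytest.mark.skipif` evaluated at import time. Only `_skip_golden`, `SKIP_GOLDEN`, and the test name come from the description above; the helper `skip_golden_requested` and the reason string are illustrative.

```python
import os

import pytest


def skip_golden_requested(environ=os.environ) -> bool:
    """Return True when SKIP_GOLDEN is set to "1" (the value used in this PR)."""
    return environ.get("SKIP_GOLDEN") == "1"


# Hypothetical reconstruction of the shared marker: the condition is read
# once, when the test module is imported.
_skip_golden = pytest.mark.skipif(
    skip_golden_requested(),
    reason="SKIP_GOLDEN=1: golden snapshot comparison disabled",
)


@_skip_golden
def test_conversation_regression():
    # Placeholder body; the real test compares against golden snapshots.
    pass
```

Because the condition is evaluated at import time, the variable has to be set before pytest starts, as in the Usage commands below.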

Usage

SKIP_GOLDEN=1 pytest tests/          # skip golden snapshot tests
pytest tests/                         # run everything (default)

Test plan

SKIP_GOLDEN=1 pytest tests/test_regression.py -v: 4 skipped, 5 passed
pytest tests/test_regression.py -v: all 9 collected (golden tests run normally)

Copilot

Pull request overview

Adds an opt-out switch for golden snapshot regression tests to make stacked PR development smoother when golden snapshots become stale across rebases.

Changes:

Introduces a SKIP_GOLDEN=1 env var gate via a shared @_skip_golden marker for the two golden snapshot tests.
Sets SKIP_GOLDEN=1 in the Python CI workflow run command to skip golden snapshot tests in CI.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
`delphi/tests/test_regression.py`	Adds a `skipif` marker controlled by `SKIP_GOLDEN` and applies it to the two golden snapshot tests.
`.github/workflows/python-ci.yml`	Exports `SKIP_GOLDEN=1` into the container test run environment.

Copilot · 2026-03-30T11:12:42Z

          -e POSTGRES_HOST=postgres \
          -e POSTGRES_PASSWORD=PdwPNS2mDN73Vfbc \
          -e POSTGRES_DB=polis-test \
+          -e SKIP_GOLDEN=1 \
          delphi \


SKIP_GOLDEN=1 is being set unconditionally for the workflow, including push runs to edge/stable. That means golden snapshot regression coverage will effectively be disabled on the mainline branches too. Consider scoping this env var to only pull_request events / stacked-PR branches, or add a separate job (e.g., scheduled or push-to-edge) that runs without SKIP_GOLDEN so golden regressions are still caught in CI.
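
One way to act on this suggestion from the test side, sketched under assumptions: the skip is honored only outside of `push` runs. `GITHUB_EVENT_NAME` is a default variable in GitHub Actions, but this sketch assumes it would also be forwarded into the test container (for example with another `-e` flag), which the workflow above does not currently do. The function name `should_skip_golden` is hypothetical.

```python
import os
from typing import Mapping


def should_skip_golden(env: Mapping[str, str] = os.environ) -> bool:
    """Honor SKIP_GOLDEN=1 locally and on pull_request runs, never on push runs."""
    if env.get("SKIP_GOLDEN") != "1":
        return False
    # GITHUB_EVENT_NAME is absent on a developer machine, so a local
    # SKIP_GOLDEN=1 is always honored.
    event = env.get("GITHUB_EVENT_NAME")
    return event is None or event == "pull_request"
```

With this gate, a push to edge or stable would still run the golden tests even if the workflow sets `SKIP_GOLDEN=1` unconditionally.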

Copilot

Pull request overview

Copilot reviewed 8 out of 10 changed files in this pull request and generated 7 comments.

Copilot · 2026-03-30T13:58:13Z

+#!/usr/bin/env python3
+"""
+Visualize and compare clustering results between different math blob sources.
+
+This script generates side-by-side and overlay visualizations comparing:
+- Comparison A: Golden snapshot (Python) vs Cold-start Clojure blob
+- Comparison B: Cold-start Clojure blob vs Regular Clojure blob
+
+The visualizations focus on the final group clustering layer, showing convex hulls
+around base cluster centers with comprehensive comparison metrics.
+"""


The PR title/description focus on adding SKIP_GOLDEN to disable golden snapshot tests, but this PR also introduces substantial new functionality (cold-start blob generation, visualization scripts, dataset selection changes, dependency additions, and extensive documentation). Please either (a) update the PR description/title to reflect the broader scope, or (b) split the additional tooling/dataset changes into separate PR(s) to keep the stack reviewable.

Copilot · 2026-03-30T13:58:13Z


-def get_dataset_files(name: str) -> Dict[str, str]:
-    """Get file paths for a dataset."""
+def get_dataset_files(name: str, prefer_cold_start: bool = True) -> Dict[str, str]:


When prefer_cold_start=False, math_blob_path is forced to the original blob even if the original file does not exist and only the cold-start blob exists. That creates a guaranteed file-not-found path for a dataset that _check_files would still report as has_math_blob=True. Suggested fix: choose the original blob only if it exists; otherwise fall back to the cold-start blob if present (and consider raising a clear error if neither exists).

Copilot · 2026-03-30T13:58:14Z

+    if prefer_cold_start and cold_start_blob.exists():
+        math_blob_path = str(cold_start_blob)
+    else:
+        math_blob_path = str(original_blob)


When prefer_cold_start=False, math_blob_path is forced to the original blob even if the original file does not exist and only the cold-start blob exists. That creates a guaranteed file-not-found path for a dataset that _check_files would still report as has_math_blob=True. Suggested fix: choose the original blob only if it exists; otherwise fall back to the cold-start blob if present (and consider raising a clear error if neither exists).

Suggested change

-    if prefer_cold_start and cold_start_blob.exists():
-        math_blob_path = str(cold_start_blob)
-    else:
-        math_blob_path = str(original_blob)
+    if prefer_cold_start:
+        if cold_start_blob.exists():
+            math_blob_path = str(cold_start_blob)
+        elif original_blob.exists():
+            math_blob_path = str(original_blob)
+        else:
+            raise FileNotFoundError(
+                f"No math blob found for dataset {name} ({rid}) in {info.path}"
+            )
+    else:
+        if original_blob.exists():
+            math_blob_path = str(original_blob)
+        elif cold_start_blob.exists():
+            math_blob_path = str(cold_start_blob)
+        else:
+            raise FileNotFoundError(
+                f"No math blob found for dataset {name} ({rid}) in {info.path}"
+            )
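
The two branches of the suggestion differ only in which blob is tried first, so the same fallback could be written as an ordered candidate list. This is a sketch, not the code in the PR: `resolve_math_blob` and its `dataset_label` parameter are hypothetical names, and the error message is simplified.

```python
from pathlib import Path


def resolve_math_blob(
    original_blob: Path,
    cold_start_blob: Path,
    prefer_cold_start: bool = True,
    dataset_label: str = "<unknown dataset>",
) -> str:
    """Return the first existing blob path, trying the preferred one first."""
    if prefer_cold_start:
        candidates = [cold_start_blob, original_blob]
    else:
        candidates = [original_blob, cold_start_blob]
    for candidate in candidates:
        if candidate.exists():
            return str(candidate)
    # Neither file exists: fail with a clear message instead of returning
    # a path that is known to be missing.
    raise FileNotFoundError(f"No math blob found for dataset {dataset_label}")
```

The preference then only changes the order of the list, and the "neither exists" error is raised in a single place.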

Copilot · 2026-03-30T13:58:14Z

+def get_base_cluster_positions(
+    group_cluster: Dict,
+    base_clusters: List[Dict]
+) -> np.ndarray:


id_to_cluster is rebuilt on every call to get_base_cluster_positions, and this function is called once per group in plot_group_clusters_with_hulls. For large datasets this becomes an avoidable O(G*N) cost. Prefer building the {id: cluster} map once (e.g., in plot_group_clusters_with_hulls) and passing it in, or caching it alongside base_clusters.

Copilot · 2026-03-30T13:58:14Z

+    # Build ID to cluster mapping for efficiency
+    id_to_cluster = {bc['id']: bc for bc in base_clusters}
+


id_to_cluster is rebuilt on every call to get_base_cluster_positions, and this function is called once per group in plot_group_clusters_with_hulls. For large datasets this becomes an avoidable O(G*N) cost. Prefer building the {id: cluster} map once (e.g., in plot_group_clusters_with_hulls) and passing it in, or caching it alongside base_clusters.

Suggested change

-    # Build ID to cluster mapping for efficiency
-    id_to_cluster = {bc['id']: bc for bc in base_clusters}
+    # Cache ID-to-cluster mappings per base_clusters list to avoid
+    # rebuilding the dictionary on every call.
+    cache = getattr(get_base_cluster_positions, "_id_to_cluster_cache", None)
+    if cache is None:
+        cache = {}
+        setattr(get_base_cluster_positions, "_id_to_cluster_cache", cache)
+    base_clusters_key = id(base_clusters)
+    id_to_cluster = cache.get(base_clusters_key)
+    if id_to_cluster is None:
+        id_to_cluster = {bc['id']: bc for bc in base_clusters}
+        cache[base_clusters_key] = id_to_cluster
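
The comment's first option, building the map once in the caller and passing it in, avoids a cache keyed on `id(base_clusters)`; such a key can be reused by a different list after the original is garbage-collected. A minimal sketch of that option follows. The signature deliberately differs from the one in the PR (it takes the prebuilt index and returns plain tuples rather than an `np.ndarray`), and the `members` and `center` keys are assumed for illustration; only the `id` key appears in the diff.

```python
from typing import Any, Dict, List, Sequence, Tuple


def build_id_index(base_clusters: Sequence[Dict[str, Any]]) -> Dict[Any, Dict[str, Any]]:
    """Build the {id: cluster} map once, e.g. at the top of the plotting function."""
    return {bc["id"]: bc for bc in base_clusters}


def get_base_cluster_positions(
    group_cluster: Dict[str, Any],
    id_to_cluster: Dict[Any, Dict[str, Any]],
) -> List[Tuple[float, ...]]:
    """Look up the centers of a group's base clusters in a prebuilt index."""
    positions = []
    # "members" and "center" are hypothetical key names.
    for member_id in group_cluster["members"]:
        cluster = id_to_cluster.get(member_id)
        if cluster is not None:  # ignore ids with no matching base cluster
            positions.append(tuple(cluster["center"]))
    return positions
```

A caller would build the index once before looping over groups, so the total cost is one pass over the base clusters plus one lookup per member.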

Copilot · 2026-03-30T13:58:15Z

+from polismath.regression import (
+    discover_datasets,
+    list_available_datasets,
+    get_dataset_files,
+    get_dataset_info
+)


discover_datasets is imported but not used anywhere in this file (the script uses list_available_datasets / get_dataset_info instead). Removing the unused import will reduce lint noise and keep dependencies clearer.

Copilot · 2026-03-30T13:58:15Z

+        exit(1)
+    else:
+        click.echo("\n✓ All datasets visualized successfully!")
+        exit(0)


exit() is a helper added by the site module for interactive sessions, and it is less explicit in scripts. Prefer sys.exit(1) / sys.exit(0) (and import sys) for clearer, standard command-line behavior.
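
A small sketch of the suggested pattern, with hypothetical names (`exit_code_for`, `finish`) and `print` standing in for the script's `click.echo` calls:

```python
import sys
from typing import Sequence


def exit_code_for(failed_datasets: Sequence[str]) -> int:
    """Map the list of failed datasets to a process exit status."""
    return 1 if failed_datasets else 0


def finish(failed_datasets: Sequence[str]) -> None:
    """Report the outcome and terminate with an explicit status code."""
    if failed_datasets:
        print(f"Failed datasets: {', '.join(failed_datasets)}")
    else:
        print("All datasets visualized successfully!")
    # sys.exit raises SystemExit, which also works when the site module
    # (and therefore the exit() helper) is unavailable.
    sys.exit(exit_code_for(failed_datasets))
```

Keeping the status computation in its own function makes it testable without terminating the interpreter.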

github-actions · 2026-03-30T16:59:45Z

Delphi Coverage Report

File	Stmts	Miss	Cover
__init__.py	2	0	100%
benchmarks/bench_pca.py	76	76	0%
benchmarks/bench_repness.py	81	81	0%
benchmarks/bench_update_votes.py	38	38	0%
benchmarks/benchmark_utils.py	34	34	0%
components/__init__.py	1	0	100%
components/config.py	165	133	19%
conversation/__init__.py	2	0	100%
conversation/conversation.py	1118	336	70%
conversation/manager.py	131	42	68%
database/__init__.py	1	0	100%
database/dynamodb.py	387	233	40%
database/postgres.py	305	205	33%
pca_kmeans_rep/__init__.py	5	0	100%
pca_kmeans_rep/clusters.py	257	22	91%
pca_kmeans_rep/corr.py	98	17	83%
pca_kmeans_rep/pca.py	52	16	69%
pca_kmeans_rep/repness.py	361	48	87%
pca_kmeans_rep/stats.py	107	22	79%
regression/__init__.py	4	0	100%
regression/clojure_comparer.py	188	17	91%
regression/comparer.py	887	720	19%
regression/datasets.py	103	22	79%
regression/recorder.py	36	27	25%
regression/utils.py	137	118	14%
run_math_pipeline.py	260	114	56%
umap_narrative/500_generate_embedding_umap_cluster.py	210	109	48%
umap_narrative/501_calculate_comment_extremity.py	112	54	52%
umap_narrative/502_calculate_priorities.py	135	135	0%
umap_narrative/700_datamapplot_for_layer.py	502	502	0%
umap_narrative/701_static_datamapplot_for_layer.py	310	310	0%
umap_narrative/702_consensus_divisive_datamapplot.py	432	432	0%
umap_narrative/801_narrative_report_batch.py	785	785	0%
umap_narrative/802_process_batch_results.py	265	265	0%
umap_narrative/803_check_batch_status.py	175	175	0%
umap_narrative/llm_factory_constructor/__init__.py	2	2	0%
umap_narrative/llm_factory_constructor/model_provider.py	157	157	0%
umap_narrative/polismath_commentgraph/__init__.py	1	0	100%
umap_narrative/polismath_commentgraph/cli.py	270	270	0%
umap_narrative/polismath_commentgraph/core/__init__.py	3	3	0%
umap_narrative/polismath_commentgraph/core/clustering.py	108	108	0%
umap_narrative/polismath_commentgraph/core/embedding.py	104	104	0%
umap_narrative/polismath_commentgraph/lambda_handler.py	219	219	0%
umap_narrative/polismath_commentgraph/schemas/__init__.py	2	0	100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py	160	9	94%
umap_narrative/polismath_commentgraph/tests/conftest.py	17	17	0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py	74	74	0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py	55	55	0%
umap_narrative/polismath_commentgraph/tests/test_storage.py	87	87	0%
umap_narrative/polismath_commentgraph/utils/__init__.py	3	0	100%
umap_narrative/polismath_commentgraph/utils/converter.py	283	237	16%
umap_narrative/polismath_commentgraph/utils/group_data.py	354	336	5%
umap_narrative/polismath_commentgraph/utils/storage.py	584	477	18%
umap_narrative/reset_conversation.py	159	50	69%
umap_narrative/run_pipeline.py	453	312	31%
utils/general.py	62	41	34%
Total	10919	7646	30%

jucor · 2026-03-30T22:54:25Z

Superseded by spr-managed PR stack. See the new stack starting at #2508.

jucor changed the title ~~[Stack 5/25] Add SKIP_GOLDEN env var to disable golden snapshot tests~~ [Stack 5/26] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026

jucor mentioned this pull request Mar 30, 2026

[Stack 7/27] Cold-start Clojure math blob generation and cluster visualization #2481

Merged

2 tasks

jucor requested a review from Copilot March 30, 2026 11:09

Copilot started reviewing on behalf of jucor March 30, 2026 11:10

Copilot AI reviewed Mar 30, 2026

jucor changed the title ~~[Stack 5/26] Add SKIP_GOLDEN env var to disable golden snapshot tests~~ [Stack 5/27] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026

jucor mentioned this pull request Mar 30, 2026

[Stack 7/27] Speed up CI: replace pip with uv pip in Dockerfile (~2x faster installs) #2484

Closed

4 tasks

jucor requested review from ballPointPenguin and Copilot March 30, 2026 12:49

jucor force-pushed the jc/disable-snapshot-tests branch from 856b14a to 509a831 Compare March 30, 2026 12:54

jucor mentioned this pull request Mar 30, 2026

[Stack 5/27] Cold-start Clojure math blob generation and cluster visualization #2485

Merged

2 tasks

jucor changed the base branch from edge to jc/cold-start-tooling March 30, 2026 13:11

jucor changed the title ~~[Stack 5/27] Add SKIP_GOLDEN env var to disable golden snapshot tests~~ [Stack 6/27] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026

Base automatically changed from jc/cold-start-tooling to edge March 30, 2026 13:24

Add SKIP_GOLDEN env var to disable golden snapshot regression tests

ec4c93e

Copilot AI reviewed Mar 30, 2026

jucor force-pushed the jc/disable-snapshot-tests branch from 509a831 to ec4c93e Compare March 30, 2026 16:49

jucor mentioned this pull request Mar 30, 2026

IGNORE -- crash from spr #2492

Closed

4 tasks

jucor closed this Mar 30, 2026
