[Stack 6/27] Add SKIP_GOLDEN env var to disable golden snapshot tests #2482

Closed
jucor wants to merge 1 commit into edge from jc/disable-snapshot-tests

Conversation

Collaborator

@jucor jucor commented Mar 30, 2026

Summary

Stacked on #2485 (Cold-start Clojure math blob generation and cluster visualization). Please review and merge #2485 first.
Next in stack: #2484 (Speed up CI: replace pip with uv pip in Dockerfile (~2x faster installs))

Add SKIP_GOLDEN=1 environment variable to disable golden snapshot regression tests.

During stacked PR development, golden snapshots become stale as computation changes cascade through the stack. Rather than re-recording snapshots at every rebase (which causes conflict cascades in jj/git), we skip them until the stack is merged into edge.

Changes

  • test_regression.py: Add @_skip_golden decorator to test_conversation_regression and test_conversation_stages_individually — the only two tests that compare against golden snapshots. Other dataset-using tests (Clojure comparison, smoke tests) are unaffected.
  • python-ci.yml: Set SKIP_GOLDEN=1 in CI so the stacked PRs don't fail on stale snapshots.

Usage

SKIP_GOLDEN=1 pytest tests/          # skip golden snapshot tests
pytest tests/                         # run everything (default)

Test plan

  • SKIP_GOLDEN=1 pytest tests/test_regression.py -v: 4 skipped, 5 passed
  • pytest tests/test_regression.py -v: all 9 collected (golden tests run normally)

@jucor jucor changed the title [Stack 5/25] Add SKIP_GOLDEN env var to disable golden snapshot tests [Stack 5/26] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026
@jucor jucor requested a review from Copilot March 30, 2026 11:09
Contributor

Copilot AI left a comment

Pull request overview

Adds an opt-out switch for golden snapshot regression tests to make stacked PR development smoother when golden snapshots become stale across rebases.

Changes:

  • Introduces a SKIP_GOLDEN=1 env var gate via a shared @_skip_golden marker for the two golden snapshot tests.
  • Sets SKIP_GOLDEN=1 in the Python CI workflow run command to skip golden snapshot tests in CI.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

  • delphi/tests/test_regression.py: Adds a skipif marker controlled by SKIP_GOLDEN and applies it to the two golden snapshot tests.
  • .github/workflows/python-ci.yml: Exports SKIP_GOLDEN=1 into the container test run environment.

Comment on lines 93 to 97
-e POSTGRES_HOST=postgres \
-e POSTGRES_PASSWORD=PdwPNS2mDN73Vfbc \
-e POSTGRES_DB=polis-test \
-e SKIP_GOLDEN=1 \
delphi \

Copilot AI Mar 30, 2026

SKIP_GOLDEN=1 is being set unconditionally for the workflow, including push runs to edge/stable. That means golden snapshot regression coverage will effectively be disabled on the mainline branches too. Consider scoping this env var to only pull_request events / stacked-PR branches, or add a separate job (e.g., scheduled or push-to-edge) that runs without SKIP_GOLDEN so golden regressions are still caught in CI.
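
One way to express that scoping on the test side, sketched under assumptions: the skip is honored everywhere except CI runs triggered by a push. `GITHUB_EVENT_NAME` is a default GitHub Actions variable, but since the tests here run inside a container it would have to be forwarded with an extra `-e` flag; the function name and the policy itself are hypothetical, not part of this PR.

```python
import os
from typing import Mapping


def should_skip_golden(environ: Mapping[str, str] = os.environ) -> bool:
    """Honor SKIP_GOLDEN=1 except on CI push runs (assumed policy)."""
    if environ.get("SKIP_GOLDEN") != "1":
        return False
    # GITHUB_EVENT_NAME is set by GitHub Actions and is absent on a
    # developer machine, where the opt-out should still work.
    event = environ.get("GITHUB_EVENT_NAME")
    return event != "push"
```

With this policy, a push to edge or stable still runs the golden comparisons even if the workflow exports SKIP_GOLDEN=1, while pull_request runs and local runs can opt out.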

Copilot uses AI. Check for mistakes.
@jucor jucor changed the title [Stack 5/26] Add SKIP_GOLDEN env var to disable golden snapshot tests [Stack 5/27] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026
@jucor jucor force-pushed the jc/disable-snapshot-tests branch from 856b14a to 509a831 Compare March 30, 2026 12:54
@jucor jucor changed the base branch from edge to jc/cold-start-tooling March 30, 2026 13:11
@jucor jucor changed the title [Stack 5/27] Add SKIP_GOLDEN env var to disable golden snapshot tests [Stack 6/27] Add SKIP_GOLDEN env var to disable golden snapshot tests Mar 30, 2026
Base automatically changed from jc/cold-start-tooling to edge March 30, 2026 13:24
Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 8 out of 10 changed files in this pull request and generated 7 comments.


Comment on lines +1 to +11
#!/usr/bin/env python3
"""
Visualize and compare clustering results between different math blob sources.

This script generates side-by-side and overlay visualizations comparing:
- Comparison A: Golden snapshot (Python) vs Cold-start Clojure blob
- Comparison B: Cold-start Clojure blob vs Regular Clojure blob

The visualizations focus on the final group clustering layer, showing convex hulls
around base cluster centers with comprehensive comparison metrics.
"""

Copilot AI Mar 30, 2026

The PR title/description focus on adding SKIP_GOLDEN to disable golden snapshot tests, but this PR also introduces substantial new functionality (cold-start blob generation, visualization scripts, dataset selection changes, dependency additions, and extensive documentation). Please either (a) update the PR description/title to reflect the broader scope, or (b) split the additional tooling/dataset changes into separate PR(s) to keep the stack reviewable.


def get_dataset_files(name: str) -> Dict[str, str]:
"""Get file paths for a dataset."""
def get_dataset_files(name: str, prefer_cold_start: bool = True) -> Dict[str, str]:

Copilot AI Mar 30, 2026

When prefer_cold_start=False, math_blob_path is forced to the original blob even if the original file does not exist and only the cold-start blob exists. That creates a guaranteed file-not-found path for a dataset that _check_files would still report as has_math_blob=True. Suggested fix: choose the original blob only if it exists; otherwise fall back to the cold-start blob if present (and consider raising a clear error if neither exists).

Comment on lines +181 to +184
if prefer_cold_start and cold_start_blob.exists():
    math_blob_path = str(cold_start_blob)
else:
    math_blob_path = str(original_blob)

Copilot AI Mar 30, 2026

When prefer_cold_start=False, math_blob_path is forced to the original blob even if the original file does not exist and only the cold-start blob exists. That creates a guaranteed file-not-found path for a dataset that _check_files would still report as has_math_blob=True. Suggested fix: choose the original blob only if it exists; otherwise fall back to the cold-start blob if present (and consider raising a clear error if neither exists).

Suggested change
- if prefer_cold_start and cold_start_blob.exists():
-     math_blob_path = str(cold_start_blob)
- else:
-     math_blob_path = str(original_blob)
+ if prefer_cold_start:
+     if cold_start_blob.exists():
+         math_blob_path = str(cold_start_blob)
+     elif original_blob.exists():
+         math_blob_path = str(original_blob)
+     else:
+         raise FileNotFoundError(
+             f"No math blob found for dataset {name} ({rid}) in {info.path}"
+         )
+ else:
+     if original_blob.exists():
+         math_blob_path = str(original_blob)
+     elif cold_start_blob.exists():
+         math_blob_path = str(cold_start_blob)
+     else:
+         raise FileNotFoundError(
+             f"No math blob found for dataset {name} ({rid}) in {info.path}"
+         )
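
The two branches in the suggestion differ only in the order in which the candidates are tried, so the same fallback can be sketched with a single "first existing path" helper. The function names `first_existing` and `choose_math_blob` are illustrative and not taken from the PR; the error message is simplified because `name`, `rid`, and `info` are not available in this standalone form.

```python
from pathlib import Path
from typing import Iterable


def first_existing(candidates: Iterable[Path], description: str) -> str:
    """Return the first candidate that exists on disk, as a string."""
    for path in candidates:
        if path.exists():
            return str(path)
    raise FileNotFoundError(f"No {description} found")


def choose_math_blob(
    cold_start_blob: Path,
    original_blob: Path,
    prefer_cold_start: bool = True,
) -> str:
    """Try the preferred blob first, then fall back to the other one."""
    if prefer_cold_start:
        order = (cold_start_blob, original_blob)
    else:
        order = (original_blob, cold_start_blob)
    return first_existing(order, "math blob")
```

The preference flag then only controls ordering, and the missing-file error is raised in one place.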

Comment on lines +259 to +262
def get_base_cluster_positions(
    group_cluster: Dict,
    base_clusters: List[Dict]
) -> np.ndarray:

Copilot AI Mar 30, 2026

id_to_cluster is rebuilt on every call to get_base_cluster_positions, and this function is called once per group in plot_group_clusters_with_hulls. For large datasets this becomes an avoidable O(G*N) cost. Prefer building the {id: cluster} map once (e.g., in plot_group_clusters_with_hulls) and passing it in, or caching it alongside base_clusters.

Comment on lines +275 to +277
# Build ID to cluster mapping for efficiency
id_to_cluster = {bc['id']: bc for bc in base_clusters}


Copilot AI Mar 30, 2026

id_to_cluster is rebuilt on every call to get_base_cluster_positions, and this function is called once per group in plot_group_clusters_with_hulls. For large datasets this becomes an avoidable O(G*N) cost. Prefer building the {id: cluster} map once (e.g., in plot_group_clusters_with_hulls) and passing it in, or caching it alongside base_clusters.

Suggested change
- # Build ID to cluster mapping for efficiency
- id_to_cluster = {bc['id']: bc for bc in base_clusters}
+ # Cache ID-to-cluster mappings per base_clusters list to avoid
+ # rebuilding the dictionary on every call.
+ cache = getattr(get_base_cluster_positions, "_id_to_cluster_cache", None)
+ if cache is None:
+     cache = {}
+     setattr(get_base_cluster_positions, "_id_to_cluster_cache", cache)
+ base_clusters_key = id(base_clusters)
+ id_to_cluster = cache.get(base_clusters_key)
+ if id_to_cluster is None:
+     id_to_cluster = {bc['id']: bc for bc in base_clusters}
+     cache[base_clusters_key] = id_to_cluster
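
The first option named in the comment, building the map once in the caller and passing it in, can be sketched as follows. It also sidesteps a weakness of keying a cache on `id(base_clusters)`: an id can be reused once the original list has been garbage collected, and the cache would not notice the list being mutated. The keys `members` and `center` and the plain-tuple return type are assumptions for illustration; the original function returns an `np.ndarray` and its data layout is not shown in this PR.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def build_id_index(base_clusters: List[Dict]) -> Dict[int, Dict]:
    """Build the {id: cluster} map once, in the caller."""
    return {bc["id"]: bc for bc in base_clusters}


def get_base_cluster_positions(
    group_cluster: Dict,
    id_to_cluster: Dict[int, Dict],
) -> List[Point]:
    """Look up the centers of a group's base clusters.

    'members' and 'center' are assumed key names, not confirmed by the PR.
    """
    return [
        tuple(id_to_cluster[member]["center"])
        for member in group_cluster["members"]
        if member in id_to_cluster
    ]


# Usage: one index shared by every group, so the cost is O(N + total members).
base_clusters = [
    {"id": 0, "center": [0.0, 1.0]},
    {"id": 1, "center": [2.0, 3.0]},
    {"id": 2, "center": [4.0, 5.0]},
]
groups = [{"members": [0, 2]}, {"members": [1]}]
index = build_id_index(base_clusters)
positions = [get_base_cluster_positions(g, index) for g in groups]
```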

Comment on lines +26 to +31
from polismath.regression import (
    discover_datasets,
    list_available_datasets,
    get_dataset_files,
    get_dataset_info
)

Copilot AI Mar 30, 2026

discover_datasets is imported but not used anywhere in this file (the script uses list_available_datasets / get_dataset_info instead). Removing the unused import will reduce lint noise and keep dependencies clearer.

Comment on lines +883 to +886
    exit(1)
else:
    click.echo("\n✓ All datasets visualized successfully!")
    exit(0)

Copilot AI Mar 30, 2026

Using exit() is intended for interactive sessions and can be less explicit/consistent in scripts. Prefer sys.exit(1) / sys.exit(0) (and import sys) for clearer, standard command-line behavior.
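
A small sketch of the suggested pattern, using only the standard library (the original script uses click for output). The builtin `exit` is installed by the `site` module and is missing when Python is started with `-S`, whereas `sys.exit` is always available. The function names and messages below are illustrative.

```python
import sys
from typing import Sequence


def exit_code(failed: Sequence[str]) -> int:
    """Map the list of failed datasets to a process exit status."""
    return 1 if failed else 0


def finish(failed: Sequence[str]) -> None:
    """Report the outcome and terminate with an explicit status."""
    if failed:
        print(
            f"{len(failed)} dataset(s) failed: {', '.join(failed)}",
            file=sys.stderr,
        )
    else:
        print("All datasets visualized successfully!")
    sys.exit(exit_code(failed))
```

Keeping the status computation in a separate function makes it testable without terminating the interpreter.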

@jucor jucor force-pushed the jc/disable-snapshot-tests branch from 509a831 to ec4c93e Compare March 30, 2026 16:49
@github-actions

Delphi Coverage Report

File Stmts Miss Cover
__init__.py 2 0 100%
benchmarks/bench_pca.py 76 76 0%
benchmarks/bench_repness.py 81 81 0%
benchmarks/bench_update_votes.py 38 38 0%
benchmarks/benchmark_utils.py 34 34 0%
components/__init__.py 1 0 100%
components/config.py 165 133 19%
conversation/__init__.py 2 0 100%
conversation/conversation.py 1118 336 70%
conversation/manager.py 131 42 68%
database/__init__.py 1 0 100%
database/dynamodb.py 387 233 40%
database/postgres.py 305 205 33%
pca_kmeans_rep/__init__.py 5 0 100%
pca_kmeans_rep/clusters.py 257 22 91%
pca_kmeans_rep/corr.py 98 17 83%
pca_kmeans_rep/pca.py 52 16 69%
pca_kmeans_rep/repness.py 361 48 87%
pca_kmeans_rep/stats.py 107 22 79%
regression/__init__.py 4 0 100%
regression/clojure_comparer.py 188 17 91%
regression/comparer.py 887 720 19%
regression/datasets.py 103 22 79%
regression/recorder.py 36 27 25%
regression/utils.py 137 118 14%
run_math_pipeline.py 260 114 56%
umap_narrative/500_generate_embedding_umap_cluster.py 210 109 48%
umap_narrative/501_calculate_comment_extremity.py 112 54 52%
umap_narrative/502_calculate_priorities.py 135 135 0%
umap_narrative/700_datamapplot_for_layer.py 502 502 0%
umap_narrative/701_static_datamapplot_for_layer.py 310 310 0%
umap_narrative/702_consensus_divisive_datamapplot.py 432 432 0%
umap_narrative/801_narrative_report_batch.py 785 785 0%
umap_narrative/802_process_batch_results.py 265 265 0%
umap_narrative/803_check_batch_status.py 175 175 0%
umap_narrative/llm_factory_constructor/__init__.py 2 2 0%
umap_narrative/llm_factory_constructor/model_provider.py 157 157 0%
umap_narrative/polismath_commentgraph/__init__.py 1 0 100%
umap_narrative/polismath_commentgraph/cli.py 270 270 0%
umap_narrative/polismath_commentgraph/core/__init__.py 3 3 0%
umap_narrative/polismath_commentgraph/core/clustering.py 108 108 0%
umap_narrative/polismath_commentgraph/core/embedding.py 104 104 0%
umap_narrative/polismath_commentgraph/lambda_handler.py 219 219 0%
umap_narrative/polismath_commentgraph/schemas/__init__.py 2 0 100%
umap_narrative/polismath_commentgraph/schemas/dynamo_models.py 160 9 94%
umap_narrative/polismath_commentgraph/tests/conftest.py 17 17 0%
umap_narrative/polismath_commentgraph/tests/test_clustering.py 74 74 0%
umap_narrative/polismath_commentgraph/tests/test_embedding.py 55 55 0%
umap_narrative/polismath_commentgraph/tests/test_storage.py 87 87 0%
umap_narrative/polismath_commentgraph/utils/__init__.py 3 0 100%
umap_narrative/polismath_commentgraph/utils/converter.py 283 237 16%
umap_narrative/polismath_commentgraph/utils/group_data.py 354 336 5%
umap_narrative/polismath_commentgraph/utils/storage.py 584 477 18%
umap_narrative/reset_conversation.py 159 50 69%
umap_narrative/run_pipeline.py 453 312 31%
utils/general.py 62 41 34%
Total 10919 7646 30%

@jucor jucor mentioned this pull request Mar 30, 2026
4 tasks
Collaborator Author

jucor commented Mar 30, 2026

Superseded by spr-managed PR stack. See the new stack starting at #2508.

@jucor jucor closed this Mar 30, 2026