Per-discrepancy test infrastructure #2512

Open
jucor wants to merge 1 commit into spr/edge/d2f65026 from spr/edge/bdc830db
@jucor jucor commented Mar 30, 2026

Summary

Per-discrepancy test infrastructure for TDD fixing of Python-Clojure differences.

Changes

  • Add per-discrepancy test markers and parametrized test infrastructure
  • Cold-start recorder: coordinate parallel runs with marker file, auto-pause math workers
  • Update journal with xpassed test breakdown across all datasets
  • Address Copilot review: remove unused import, fix script issues
  • Add naming convention documentation
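
As a rough illustration of the first bullet, per-discrepancy parametrization with expected failures could look like the sketch below. This is not the code in `test_discrepancy_fixes.py`: `DATASET_BLOBS`, `KNOWN_FAILURES`, `discrepancy_params`, `compute_pair`, and the discrepancy name `d1_example` are all hypothetical names chosen for the example. Only `pytest.param`, `pytest.mark.xfail`, and `pytest.mark.parametrize` are real pytest APIs.

```python
import pytest

# Hypothetical dataset+blob IDs; the real suite discovers these from the datasets on disk.
DATASET_BLOBS = ["small-incremental", "small-cold_start"]

# Hypothetical registry: discrepancy name -> IDs where parity is not reached yet.
KNOWN_FAILURES = {"d1_example": {"small-cold_start"}}


def discrepancy_params(name):
    """Build one pytest param per dataset+blob ID, marking known gaps as xfail."""
    params = []
    for blob_id in DATASET_BLOBS:
        marks = []
        if blob_id in KNOWN_FAILURES.get(name, set()):
            # strict=False: an unexpected pass is reported as "xpassed", not as a failure.
            marks.append(pytest.mark.xfail(reason=f"{name} not fixed yet", strict=False))
        params.append(pytest.param(blob_id, marks=marks, id=blob_id))
    return params


def compute_pair(blob_id):
    """Stand-in for computing the same quantity with both implementations."""
    return (len(blob_id), len(blob_id))


@pytest.mark.parametrize("blob_id", discrepancy_params("d1_example"))
def test_d1_example(blob_id):
    python_value, clojure_value = compute_pair(blob_id)
    assert python_value == clojure_value
```

With non-strict `xfail`, a test that starts passing on some dataset shows up in the `xpassed` count, which is one way a TDD workflow can track fixes landing dataset by dataset.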

Test plan

  • 223 passed, 4 skipped, 22 xfailed, 7 xpassed, 0 failures

🤖 Generated with Claude Code

Squashed commits

  • Address Copilot review feedback and fix pre-existing test failures
  • Add per-discrepancy test infrastructure and fix journal
  • Fix cold-start recorder: auto-pause math workers and clean up containers
  • Cold-start recorder: coordinate parallel runs with marker file
  • Address Copilot review: remove unused import, fix script issues
  • Update journal: xpassed test breakdown with all 7 datasets
  • Add naming convention
  • Test both incremental and cold-start Clojure blobs
  • Add repness blob comparison tests and fix tid type mismatches
  • Address Copilot review on PR #2420: fix stale references and clean up imports

commit-id:bdc830db


Stack:


⚠️ Part of a stack created by spr. Do not merge manually using the UI - doing so may have unexpected results.

Copilot AI left a comment

Pull request overview

Adds per-discrepancy (Python vs Clojure) test scaffolding and supporting utilities to enable TDD-driven parity fixes, including dataset+blob variant parametrization and cold-start blob generation improvements.

Changes:

  • Introduces test_discrepancy_fixes.py with per-discrepancy, dataset+blob-parametrized tests (mostly xfail-targeted for staged parity work).
  • Adds blob-variant discovery (incremental vs cold_start) and updates legacy regression/comparison tests to run against specific blob variants.
  • Improves cold-start blob generation script coordination (parallel runs, container cleanup) and adjusts pytest execution defaults to better leverage new caching.
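
To make the blob-variant idea concrete, here is a minimal sketch of discovering which variants exist for each dataset and turning them into composite dataset+blob IDs. It is not the PR's `get_blob_variants()`: the function names, the `math_blob_<type>.json` file naming, and the `name-blob` ID format are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

BLOB_TYPES = ("incremental", "cold_start")


def find_blob_variants(dataset_dir: Path) -> list[str]:
    """Return the blob types that have a file in dataset_dir (assumed file naming)."""
    return [b for b in BLOB_TYPES if (dataset_dir / f"math_blob_{b}.json").is_file()]


def composite_ids(datasets: dict[str, Path]) -> list[str]:
    """Build 'dataset-blob' IDs, one per blob variant that actually exists."""
    return [
        f"{name}-{blob}"
        for name, directory in sorted(datasets.items())
        for blob in find_blob_variants(directory)
    ]


# Demo on a throwaway directory tree: 'a' has both variants, 'b' only one.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name, blobs in {"a": BLOB_TYPES, "b": ("incremental",)}.items():
        (root / name).mkdir()
        for blob in blobs:
            (root / name / f"math_blob_{blob}.json").write_text("{}")
    ids = composite_ids({"a": root / "a", "b": root / "b"})
```

Only variants present on disk produce test IDs, so a dataset without a cold-start blob simply contributes fewer parametrized cases.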

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| `delphi/tests/test_legacy_repness_comparison.py` | Update legacy repness comparison tests to use composite dataset+blob IDs and load the matching Clojure blob. |
| `delphi/tests/test_legacy_clojure_regression.py` | Parametrize legacy Clojure regression tests over dataset+blob variants and reuse cached Conversations. |
| `delphi/tests/test_discrepancy_fixes.py` | New per-discrepancy parity test suite with dataset+blob parametrization and helper utilities. |
| `delphi/tests/conftest.py` | Adds session-scoped Conversation cache, blob-aware dataset parametrization, and dataset-grouped test reordering. |
| `delphi/scripts/generate_cold_start_clojure.py` | Adds coordination for parallel cold-start runs (marker file + pause/unpause logic), plus improved DB-copy mechanics and container cleanup. |
| `delphi/scripts/clojure_comparer.py` | Switch votes loader to non-test utility and update messaging. |
| `delphi/pyproject.toml` | Default pytest opts to sequential execution to exploit the shared Conversation cache. |
| `delphi/polismath/regression/datasets.py` | Adds `blob_type` selection to `get_dataset_files()` and introduces `get_blob_variants()`. |
| `delphi/polismath/regression/__init__.py` | Exports `get_blob_variants`. |
| `delphi/docs/PLAN_DISCREPANCY_FIXES.md` | Documents PR naming convention. |
| `delphi/docs/CLJ-PARITY-FIXES-JOURNAL.md` | Adds the parity-fix work journal. |
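
The dataset-grouped test reordering mentioned for `conftest.py` can be sketched with pytest's `pytest_collection_modifyitems` hook. This is an illustrative version, not the PR's implementation: the parameter name `dataset_blob` and the `dataset-blob` ID format are assumptions, and the demo uses fake items in place of real pytest items.

```python
from types import SimpleNamespace


def dataset_of(item):
    """Return the dataset part of a composite 'dataset-blob' parameter, or '' if absent."""
    callspec = getattr(item, "callspec", None)
    if callspec is None:
        return ""  # non-parametrized tests sort first
    blob_id = callspec.params.get("dataset_blob", "")
    return blob_id.rsplit("-", 1)[0]


def group_by_dataset(items):
    """Stable sort: tests keep their relative order within each dataset."""
    return sorted(items, key=dataset_of)


def pytest_collection_modifyitems(session, config, items):
    """pytest hook: reorder collected tests in place so each dataset runs contiguously."""
    items[:] = group_by_dataset(items)


# Demo with stand-in items (real pytest items expose .callspec for parametrized tests).
def fake_item(name, blob_id):
    return SimpleNamespace(name=name, callspec=SimpleNamespace(params={"dataset_blob": blob_id}))


collected = [
    fake_item("t1", "vw-incremental"),
    fake_item("t2", "biodiversity-incremental"),
    fake_item("t3", "vw-cold_start"),
]
ordered_names = [item.name for item in group_by_dataset(collected)]
```

Grouping by dataset means a cached object for one dataset is fully used before the next dataset is touched, which fits the sequential default noted for `pyproject.toml`.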


Comment on lines +90 to +94
subprocess.run(['docker', 'pause', container], capture_output=True)

# Create marker so the last coldstart run knows to unpause
PAUSE_MARKER.touch()
click.echo(f" ✓ Paused {len(workers)} container(s)")
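
For readers following the marker-file coordination, here is a minimal sketch of one way parallel runs might share a pause marker. It is a variation, not the script's logic: it claims the marker before pausing (the excerpt above pauses first), replaces the `docker` calls with injected callables, and assumes the caller can tell whether other runs are still active. The names `pause_once`, `unpause_if_last`, and `other_runs_active` are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def pause_once(marker: Path, pause: Callable[[], None]) -> bool:
    """Pause workers only if no other run holds the marker. Returns True if this call paused."""
    try:
        # exist_ok=False creates the file exclusively and raises if it already exists.
        marker.touch(exist_ok=False)
    except FileExistsError:
        return False
    pause()
    return True


def unpause_if_last(marker: Path, other_runs_active: bool, unpause: Callable[[], None]) -> bool:
    """Unpause and remove the marker only when this is the last run still active."""
    if other_runs_active or not marker.exists():
        return False
    unpause()
    marker.unlink()
    return True


# Demo: two runs start, the first one finishes early, the second one finishes last.
events = []
with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "workers_paused.marker"
    first = pause_once(marker, lambda: events.append("pause"))
    second = pause_once(marker, lambda: events.append("pause"))
    early = unpause_if_last(marker, True, lambda: events.append("unpause"))
    last = unpause_if_last(marker, False, lambda: events.append("unpause"))
    marker_left = marker.exists()
```

Claiming the marker first avoids two runs both issuing a pause, at the cost of leaving a stale marker behind if the pause itself fails; a real script would need to handle that case.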
Comment thread delphi/tests/conftest.py
'comments': comments,
}

return deepcopy(_SESSION_CONV_CACHE[dataset_name])
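
The `deepcopy` on return is what keeps a shared cache safe when tests mutate what they receive. A minimal sketch of the pattern, using a plain dict as a stand-in for a `Conversation` and hypothetical names (`_SESSION_CACHE`, `build_conversation`, `get_conversation`):

```python
from copy import deepcopy

_SESSION_CACHE = {}
build_calls = []  # records how often the expensive build runs


def build_conversation(dataset_name):
    """Stand-in for the expensive construction of a conversation object."""
    build_calls.append(dataset_name)
    return {"name": dataset_name, "votes": [1, -1, 0]}


def get_conversation(dataset_name):
    """Build once per dataset, then hand out independent copies."""
    if dataset_name not in _SESSION_CACHE:
        _SESSION_CACHE[dataset_name] = build_conversation(dataset_name)
    return deepcopy(_SESSION_CACHE[dataset_name])


first = get_conversation("demo")
first["votes"].append(1)  # mutating the copy must not leak into the cache
second = get_conversation("demo")
```

The trade-off is that every lookup pays for a copy, which is usually far cheaper than rebuilding but not free for large objects.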
Comment on lines +31 to +35
import numpy as np
import pytest
import pytest_check as check

from polismath.conversation.conversation import Conversation
Comment thread delphi/tests/test_discrepancy_fixes.py Outdated
@jucor jucor changed the title from "[Stack 5/17] Per-discrepancy test infrastructure" to "Per-discrepancy test infrastructure" on May 19, 2026
@jucor jucor force-pushed the spr/edge/d2f65026 branch from 53d3dce to c964a2c on May 19, 2026 22:09
@jucor jucor force-pushed the spr/edge/bdc830db branch from d06ba7f to 3a73a89 on May 19, 2026 22:09
@github-actions
Delphi Coverage Report

| File | Stmts | Miss | Cover |
| --- | ---: | ---: | ---: |
| `__init__.py` | 2 | 0 | 100% |
| `benchmarks/bench_pca.py` | 76 | 76 | 0% |
| `benchmarks/bench_repness.py` | 81 | 81 | 0% |
| `benchmarks/bench_update_votes.py` | 38 | 38 | 0% |
| `benchmarks/benchmark_utils.py` | 34 | 34 | 0% |
| `components/__init__.py` | 1 | 0 | 100% |
| `components/config.py` | 165 | 133 | 19% |
| `conversation/__init__.py` | 2 | 0 | 100% |
| `conversation/conversation.py` | 1118 | 336 | 70% |
| `conversation/manager.py` | 131 | 42 | 68% |
| `database/__init__.py` | 1 | 0 | 100% |
| `database/dynamodb.py` | 387 | 233 | 40% |
| `database/postgres.py` | 305 | 205 | 33% |
| `pca_kmeans_rep/__init__.py` | 5 | 0 | 100% |
| `pca_kmeans_rep/clusters.py` | 257 | 22 | 91% |
| `pca_kmeans_rep/corr.py` | 98 | 17 | 83% |
| `pca_kmeans_rep/pca.py` | 52 | 16 | 69% |
| `pca_kmeans_rep/repness.py` | 361 | 48 | 87% |
| `pca_kmeans_rep/stats.py` | 107 | 22 | 79% |
| `regression/__init__.py` | 4 | 0 | 100% |
| `regression/clojure_comparer.py` | 188 | 17 | 91% |
| `regression/comparer.py` | 887 | 720 | 19% |
| `regression/datasets.py` | 135 | 27 | 80% |
| `regression/recorder.py` | 36 | 27 | 25% |
| `regression/utils.py` | 137 | 118 | 14% |
| `run_math_pipeline.py` | 260 | 114 | 56% |
| `umap_narrative/500_generate_embedding_umap_cluster.py` | 210 | 109 | 48% |
| `umap_narrative/501_calculate_comment_extremity.py` | 112 | 54 | 52% |
| `umap_narrative/502_calculate_priorities.py` | 135 | 135 | 0% |
| `umap_narrative/700_datamapplot_for_layer.py` | 502 | 502 | 0% |
| `umap_narrative/701_static_datamapplot_for_layer.py` | 310 | 310 | 0% |
| `umap_narrative/702_consensus_divisive_datamapplot.py` | 432 | 432 | 0% |
| `umap_narrative/801_narrative_report_batch.py` | 785 | 785 | 0% |
| `umap_narrative/802_process_batch_results.py` | 265 | 265 | 0% |
| `umap_narrative/803_check_batch_status.py` | 175 | 175 | 0% |
| `umap_narrative/llm_factory_constructor/__init__.py` | 2 | 2 | 0% |
| `umap_narrative/llm_factory_constructor/model_provider.py` | 157 | 157 | 0% |
| `umap_narrative/polismath_commentgraph/__init__.py` | 1 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/cli.py` | 270 | 270 | 0% |
| `umap_narrative/polismath_commentgraph/core/__init__.py` | 3 | 3 | 0% |
| `umap_narrative/polismath_commentgraph/core/clustering.py` | 108 | 108 | 0% |
| `umap_narrative/polismath_commentgraph/core/embedding.py` | 104 | 104 | 0% |
| `umap_narrative/polismath_commentgraph/lambda_handler.py` | 219 | 219 | 0% |
| `umap_narrative/polismath_commentgraph/schemas/__init__.py` | 2 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/schemas/dynamo_models.py` | 160 | 9 | 94% |
| `umap_narrative/polismath_commentgraph/tests/conftest.py` | 17 | 17 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_clustering.py` | 74 | 74 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_embedding.py` | 55 | 55 | 0% |
| `umap_narrative/polismath_commentgraph/tests/test_storage.py` | 87 | 87 | 0% |
| `umap_narrative/polismath_commentgraph/utils/__init__.py` | 3 | 0 | 100% |
| `umap_narrative/polismath_commentgraph/utils/converter.py` | 283 | 237 | 16% |
| `umap_narrative/polismath_commentgraph/utils/group_data.py` | 354 | 336 | 5% |
| `umap_narrative/polismath_commentgraph/utils/storage.py` | 584 | 477 | 18% |
| `umap_narrative/reset_conversation.py` | 159 | 50 | 69% |
| `umap_narrative/run_pipeline.py` | 453 | 312 | 31% |
| `utils/general.py` | 62 | 41 | 34% |
| Total | 10951 | 7651 | 30% |
