Skip to content

Add trans-spec generator (replaces make_yaml)#299

Open
amc-corey-cox wants to merge 6 commits intomainfrom
make-yaml-refactored
Open

Add trans-spec generator (replaces make_yaml)#299
amc-corey-cox wants to merge 6 commits intomainfrom
make-yaml-refactored

Conversation

@amc-corey-cox
Copy link
Copy Markdown
Collaborator

Summary

  • Refactors the YAML generation script from RTIInternational/NHLBI-BDC-DMC-HV into src/dm_bip/trans_spec_gen/
  • Adds dm-bip generate-trans-specs CLI command (Typer) for generating LinkML-Map transformation specification YAML from a metadata CSV
  • Uses the latest upstream Jinja2 template (PR #542) with single-quoted expressions for valid YAML output, conditional visit/age handling
  • Renames the placeholder make_yaml/ module to trans_spec_gen/
  • Includes detailed module README documenting usage, input CSV format, and output structure

Test plan

  • 16 unit tests covering all template branches (unit_match, unit_convert, unit_expr, unit_casestmt), visit/age conditionals, file output, and YAML validity
  • Full test suite passes (58/58)
  • Lint clean
  • CLI tested end-to-end (uv run dm-bip generate-trans-specs --help and with sample data)

Closes #175

Clean rewrite of DMCYAML_07_GenerateYAML_forPy.py as a single
parameterized function. Saved here while the scaffold branch
keeps the original for collaborative walkthrough.
- Pull in updated yaml_measobs.j2 from RTIInternational/NHLBI-BDC-DMC-HV
  PR #542: single-quoted expressions (valid YAML), conditional visit/age
  handling, renamed unit_casestmt_custom field
- Add fillna(0) to match upstream script's NaN handling
- Update sample CSV with new columns (has_visit, has_visit_expr,
  associatedvisit_expr, has_age, unit_casestmt_custom)
- Expand tests from 6 to 16: YAML validity, all 4 unit branches,
  visit/age conditionals, common field mapping
- Rename src/dm_bip/make_yaml/ to src/dm_bip/trans_spec_gen/
- Add `dm-bip generate-trans-specs` Typer subcommand with --input,
  --output, --cohort, --entity, --template options
- Rewrite module README with usage, input CSV format, output structure
- Add trans_spec_gen to top-level README repo structure
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Apr 6, 2026

Codecov Report

❌ Patch coverage is 70.27027% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 66.28%. Comparing base (06313b6) to head (6887289).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/dm_bip/cli.py 0.00% 11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #299      +/-   ##
==========================================
+ Coverage   65.63%   66.28%   +0.64%     
==========================================
  Files           5        6       +1     
  Lines         227      264      +37     
==========================================
+ Hits          149      175      +26     
- Misses         78       89      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR brings the trans-spec (LinkML-Map transformation spec) YAML generation workflow into dm-bip by adding a dedicated trans_spec_gen module and a dm-bip generate-trans-specs Typer CLI command, backed by a Jinja2 template and unit tests.

Changes:

  • Added dm_bip.trans_spec_gen module (generator, template, and module-level README) to generate per-variable YAML trans-spec files from a metadata CSV.
  • Added dm-bip generate-trans-specs CLI command to run the generator.
  • Added unit tests + synthetic CSV fixture to validate output structure and key template branches; removed the old make_yaml placeholder README.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
src/dm_bip/trans_spec_gen/generate_trans_specs.py Implements CSV→Jinja2→YAML file generation.
src/dm_bip/trans_spec_gen/templates/yaml_measobs.j2 Provides the MeasurementObservation YAML template with unit/visit/age branching.
src/dm_bip/trans_spec_gen/README.md Documents CLI usage, CSV schema, and output layout.
src/dm_bip/cli.py Adds the generate-trans-specs CLI command wiring.
tests/unit/test_generate_trans_specs.py Adds unit tests covering template branches and file output.
tests/input/make_yaml/shortdata_sample.csv Adds a synthetic CSV fixture for tests.
src/dm_bip/make_yaml/README.md Removes the deprecated/placeholder module README.
README.md Updates repo structure listing to include trans_spec_gen/.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/dm_bip/trans_spec_gen/templates/yaml_measobs.j2
Comment thread src/dm_bip/trans_spec_gen/generate_trans_specs.py
Comment thread tests/unit/test_generate_trans_specs.py Outdated
Comment thread tests/unit/test_generate_trans_specs.py
Comment thread tests/unit/test_generate_trans_specs.py Outdated
Comment thread src/dm_bip/trans_spec_gen/templates/yaml_measobs.j2 Outdated
- Fix missing closing paren in uuid5 visit expression template
- Fix missing unit derivation in unit_expr template branch
- Sanitize bdchm_varname with Path.name to prevent path traversal
- Fix stale module docstring in tests
- Tighten unit_expr and visit_expr test assertions
Copy link
Copy Markdown
Collaborator

@Sigfried Sigfried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're not going to include the rest of the Stata pipeline here? When we convert the Stata code to Python, it would probably be good not to have to go back and forth between repos.


## Input CSV format

The metadata CSV must contain these columns:
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It currently expects these specific columns because it only generates specs for MeasurementObservation, right? Should we make notes here or anywhere about how this will be changing when we generalize to other domains?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Incorporate Stata YAML authoring tool

4 participants