feat: add MSSQL/Fabric support to data-parity skill#705
feat: add MSSQL/Fabric support to data-parity skill#705suryaiyer95 wants to merge 12 commits intofeat/data-parity-skill-improvementsfrom
Conversation
There was a problem hiding this comment.
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Tip: disable this comment in your organization's Code Review settings.
|
Important Review skippedAuto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
5 issues found across 30 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/opencode/src/altimate/native/connections/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/data-diff.ts:266">
P2: Missing single-quote escaping on `partitionValue` in the date-mode branches. The categorical mode (6 lines above) escapes with `.replace(/'/g, "''")`, but none of the date-mode `switch` cases do. Apply the same escaping for consistency and defense in depth.</violation>
<violation number="2" location="packages/opencode/src/altimate/native/connections/data-diff.ts:489">
P2: `partitionColumn` is not identifier-quoted in `buildPartitionDiscoverySQL`, but it is quoted in `buildPartitionWhereClause`. If the column name is a reserved word (e.g. `date`, `order`), the discovery query will produce a syntax error. Quote the column consistently between both functions.</violation>
</file>
<file name="packages/opencode/src/altimate/native/connections/registry.ts">
<violation number="1" location="packages/opencode/src/altimate/native/connections/registry.ts:125">
P2: `fabric` is missing from the `PASSWORD_DRIVERS` set. Since it maps to the same sqlserver driver as `sqlserver`/`mssql` (both present in the set), `fabric` should also be included to get the same friendly error when `password` is not a string.</violation>
</file>
<file name="packages/opencode/src/altimate/tools/data-diff.ts">
<violation number="1" location="packages/opencode/src/altimate/tools/data-diff.ts:206">
P2: When `d.values` is nullish, `d.values?.join(" | ")` evaluates to `undefined`, which the template literal coerces to the string `"undefined"`. The output would read e.g. `[source only] undefined`. Use a fallback to produce a sensible message.</violation>
</file>
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:148">
P1: `flattenRow` flattens all array values, but only the empty-string key (`""`) holds mssql's merged unnamed columns. A legitimate array column value (e.g. from JSON aggregation) will be incorrectly spread, corrupting the row data and misaligning columns. Restrict flattening to the `""` key only.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
Bun's runtime never fires native addon async callbacks, so the async `new duckdb.Database(path, opts, callback)` form would hit the 2-second timeout fallback on every connection attempt. Switch to the synchronous constructor form `new duckdb.Database(path)` / `new duckdb.Database(path, opts)` which throws on error and completes immediately in both Node and bun runtimes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The async callback form with 2s fallback was already working correctly at e3df5a4. The timeout was caused by a missing duckdb .node binary, not a bun incompatibility. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add `warehouseTypeToDialect()` mapping: sqlserver→tsql, mssql→tsql, fabric→fabric, postgresql→postgres, mariadb→mysql. Fixes critical serde mismatch where Rust engine rejects raw warehouse type names. - Update both `resolveDialect()` functions to use the mapping - Add MSSQL/Fabric cases to `dateTruncExpr()` — DATETRUNC(DAY, col) - Add locale-safe date literal casting via CONVERT(DATE, ..., 23) - Register `fabric` in DRIVER_MAP (reuses sqlserver TDS driver) - Add `fabric` normalize aliases in normalize.ts - Add 15 SQL Server driver unit tests (TOP injection, truncation, schema introspection, connection lifecycle, result format) - Add 9 dialect mapping unit tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Support all 7 Azure AD / Entra ID auth types in `sqlserver.ts`: `azure-active-directory-password`, `access-token`, `service-principal-secret`, `msi-vm`, `msi-app-service`, `azure-active-directory-default`, `token-credential` - Force TLS encryption for all Azure AD connections - Dynamic import of `@azure/identity` for `DefaultAzureCredential` - Add normalize aliases for Azure AD config fields (`authentication`, `azure_tenant_id`, `azure_client_id`, `azure_client_secret`, `access_token`) - Add `fabric: SQLSERVER_ALIASES` to DRIVER_ALIASES - Add 10 Azure AD unit tests covering all auth flows, encryption, and `DefaultAzureCredential` with managed identity Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LL.md - Add SQL Server / Fabric schema inspection query in Step 2 - Add "SQL Server and Microsoft Fabric" section with: - Supported configurations table (sqlserver, mssql, fabric) - Fabric connection guide with Azure AD auth types - Algorithm behavior notes (joindiff vs hashdiff selection) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rscore column filter
- **Azure AD auth**: Pass `azure-active-directory-*` types directly to tedious
instead of constructing `DefaultAzureCredential` ourselves. Tedious imports
`@azure/identity` internally and creates credentials — avoids bun CJS/ESM
`isTokenCredential` boundary issue that caused "not an instance of the token
credential class" errors.
- **Auth shorthands**: Map `CLI`, `default`, `password`, `service-principal`,
`msi`, `managed-identity` to their full tedious type names.
- **Column filter**: Remove `_.startsWith("_")` filter from `execute()` result
columns — it stripped legitimate aliases like `_p` used by partition discovery,
causing partitioned diffs to return empty results.
- **Tests**: Remove `@azure/identity` mock (no longer imported by driver),
update auth assertions, add shorthand mapping tests, fix column filter test.
- **Verified**: All 97 driver tests pass. Full data-diff pipeline tested against
real MSSQL server (profile, joindiff, auto, where_clause, partitioned).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…lattening - Upgrade `mssql` from v11 to v12 (`tedious` 18 → 19) - Use explicit `ConnectionPool` instead of global `mssql.connect()` to isolate multiple simultaneous connections - Flatten unnamed column arrays — `mssql` merges unnamed columns (e.g. `SELECT COUNT(*), SUM(...)`) into a single array under the empty-string key; restore positional column values - Proper column name resolution: compare `namedKeys.length` against flattened row length, fall back to synthetic `col_0`, `col_1`, etc. - Update test mock to export `ConnectionPool` class and `createMockPool` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions
Use ternary expressions (`x ? {...} : {}`) instead of short-circuit
(`x && {...}`) to avoid spreading a boolean value.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
403477e to
811c2be
Compare
|
@suryaiyer95 can you address code review comments? |
- P1: restrict `flattenRow` to only spread the empty-string key (`""`) where mssql merges unnamed columns, preserving legitimate array values - P2: escape single quotes in `partitionValue` for date-mode branches in `buildPartitionWhereClause` (categorical mode already escaped) - P2: add `fabric` to `PASSWORD_DRIVERS` set in registry for consistent password validation alongside `sqlserver`/`mssql` - P2: fallback to `"(no values)"` when `d.values` is nullish to prevent template literal coercing `undefined` to the string `"undefined"` Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- sqlserver-unit: 3 tests for unnamed column flattening — verifies only the empty-string key is spread, legitimate named arrays are preserved - driver-normalize: fabric type uses SQLSERVER_ALIASES (server → host, trustServerCertificate → trust_server_certificate) - connections: fabric type is recognized in DRIVER_MAP and listed correctly Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add "Minimum Version Requirements" table to SKILL.md covering SQL Server 2022+, mssql v12, and @azure/identity v4 with rationale for each - Document auth shorthands (CLI, default, password, service-principal, msi) - Move @azure/identity from dependencies to optional peerDependencies so it is NOT installed by default — only required for Azure AD auth - Add runtime check in sqlserver driver: if Azure AD auth type is requested but @azure/identity is missing, throw a clear install instruction error Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…solution - For `azure-active-directory-default` (CLI/default auth), acquire token ourselves instead of delegating to tedious's internal `@azure/identity` - Strategy: try `DefaultAzureCredential` first, fall back to `az` CLI subprocess - Bypasses Bun resolving `@azure/identity` to browser bundle where `DefaultAzureCredential` is a non-functional stub - Also bypasses CJS/ESM `isTokenCredential` boundary mismatch - All 31 driver unit tests pass, verified against real Fabric endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="packages/drivers/src/sqlserver.ts">
<violation number="1" location="packages/drivers/src/sqlserver.ts:90">
P2: Avoid `execSync` in the async connection path; it can block the event loop for up to the CLI timeout.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review, or fix all with cubic.
| const { execSync } = await import("node:child_process") | ||
| const json = execSync( | ||
| "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv", | ||
| { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] }, | ||
| ).trim() | ||
| if (json) token = json |
There was a problem hiding this comment.
P2: Avoid execSync in the async connection path; it can block the event loop for up to the CLI timeout.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/drivers/src/sqlserver.ts, line 90:
<comment>Avoid `execSync` in the async connection path; it can block the event loop for up to the CLI timeout.</comment>
<file context>
@@ -54,24 +54,60 @@ export async function connect(config: ConnectionConfig): Promise<Connector> {
+ // Attempt 2: Azure CLI subprocess (universal fallback)
+ if (!token) {
+ try {
+ const { execSync } = await import("node:child_process")
+ const json = execSync(
+ "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv",
</file context>
| const { execSync } = await import("node:child_process") | |
| const json = execSync( | |
| "az account get-access-token --resource https://database.windows.net/ --query accessToken -o tsv", | |
| { encoding: "utf-8", timeout: 15000, stdio: ["pipe", "pipe", "pipe"] }, | |
| ).trim() | |
| if (json) token = json | |
| const { execFile } = await import("node:child_process") | |
| const cliToken = await new Promise<string>((resolve, reject) => { | |
| execFile( | |
| "az", | |
| ["account", "get-access-token", "--resource", "https://database.windows.net/", "--query", "accessToken", "-o", "tsv"], | |
| { timeout: 15000 }, | |
| (error, stdout) => { | |
| if (error) return reject(error) | |
| resolve((stdout ?? "").toString().trim()) | |
| }, | |
| ) | |
| }) | |
| if (cliToken) token = cliToken |
Summary
connect/execute/listSchemas/listTables/describeTable/closewith T-SQLTOPinjection,sys.*catalog queries, and row flattening for unnamed columnsdefault,password,access-token,service-principal-secret,msi-vm,msi-app-service) with shorthand aliases (cli,default,password,service-principal,msi)sqlserver/mssql→tsql,fabric→fabricwithDATETRUNC()andCONVERT(DATE, ..., 23)for date partitioningConnectionPoolisolation for concurrent connections, unnamed-column array flattening, synthetic column name fallbackKey files
packages/drivers/src/sqlserver.ts,packages/drivers/test/sqlserver-unit.test.tspackages/opencode/src/altimate/native/connections/data-diff.tspackages/opencode/src/altimate/tools/data-diff.ts.opencode/skills/data-parity/SKILL.mdTest plan
bun test packages/drivers/test/sqlserver-unit.test.ts— 28 tests passbun test packages/opencode/test/altimate/data-diff-dialect.test.ts— 9 tests pass🤖 Generated with Claude Code
Summary by cubic
Adds end-to-end MSSQL and Microsoft Fabric support to the data-parity skill with Azure AD auth, Rust dialect mapping (
tsql/fabric), and an upgradedmssqlv12 driver. Fordefault/CLIAzure AD, the driver now acquires tokens via@azure/identityor falls back to Azure CLI to avoid Bun resolution issues; docs note minimum versions and@azure/identityas optional for Azure AD.New Features
data_diff(auto, joindiff, hashdiff, profile, cascade) with date partitioning viaDATETRUNC()andCONVERT(DATE, ..., 23); mapssqlserver/mssql→tsql,fabric→fabric; improved query detection (keyword + whitespace).default/CLI,password,service-principal,msi), TLS enforced for AAD; fordefault/CLI, get an access token using@azure/identityorazCLI fallback; explicitConnectionPoolisolation; T‑SQLTOPinjection withExecuteOptions.noLimit; robust result shaping for unnamed columns and synthetic column names when needed.fabricto reuse the SQL Server driver; add normalize aliases forauthenticationand Azure fields;mssqlupgraded to v12; SKILL docs include minimum versions and a Fabric connection guide;@azure/identitylisted as an optional peer dependency (only needed for Azure AD).Bug Fixes
noLimitbypasses injectedTOP; stable SQL-vs-table detection.CONVERT(DATE, ..., 23); show "(no values)" when diff rows have no values; addedfabricto password-driver validation.Written for commit e40a24b. Summary will update on new commits.