
Conversation

nj1973 (Contributor) commented Oct 13, 2025

Description of changes

Use connection names in the result handler config instead of the actual connection content. This prevents credentials from being written to config files and is also a cleaner approach.

Issues to be closed

Closes #1608

Checklist

  • I have followed the CONTRIBUTING Guide.
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated any relevant documentation to reflect my changes, if applicable
  • I have added unit and/or integration tests relevant to my change as needed
  • I have already checked locally that all unit tests and linting are passing (use the tests/local_check.sh script)
  • I have manually executed end-to-end testing (E2E) with the affected databases/engines

nj1973 marked this pull request as ready for review October 14, 2025 08:51

nj1973 commented Oct 14, 2025

/gcbrun

@sundar-mudupalli-work

Hi,

I am not clear on how to define and use the Postgres results table. Can you provide the DDL?

Here is what I tried and the error I got:

data-validation validate row -tc postgres -sc bq -tbls=data_validator_issues.issue1610=public.dvt_core_types -pk id -hash='col_datetime' -rh postgres.public.results
...
File "/home/user/professional-services-data-validator/env/lib/python3.12/site-packages/data_validation/result_handlers/postgres.py", line 162, in _insert_postgres
    _ = clients.get_ibis_table(self._client, schema_name, table_name)
...
    raise NotImplementedError(
NotImplementedError: Could not find signature for to_sqla_type: <PGDialect_psycopg2, NoneType>

Here is the results table definition I used; maybe it is not correct:

CREATE TYPE PUBLIC.err_record AS ( code bigint, message text, details text );

CREATE TYPE PUBLIC.label_rec AS ( KEY text, VALUE text );

CREATE TABLE PUBLIC.results
  (
     run_id             TEXT,
     validation_name    TEXT,
     validation_type    TEXT,
     start_time         TIMESTAMP,
     end_time           TIMESTAMP,
     source_table_name  TEXT,
     target_table_name  TEXT,
     source_column_name TEXT,
     target_column_name TEXT,
     aggregation_type   TEXT,
     group_by_columns   TEXT,
     primary_keys       TEXT,
     num_random_rows    BIGINT,
     source_agg_value   TEXT,
     target_agg_value   TEXT,
     difference         DOUBLE PRECISION,
     pct_difference     DOUBLE PRECISION,
     pct_threshold      DOUBLE PRECISION,
     validation_status  TEXT,
     labels             LABEL_REC[],
     configuration_json TEXT,
     error_result       ERR_RECORD
  );


nj1973 commented Oct 16, 2025

You don't need to pre-create the table. The docs only ask you to create a schema:

https://github.com/GoogleCloudPlatform/professional-services-data-validator/blob/develop/docs/installation.md#postgresql-1
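
If it helps, the prerequisite really is just the schema. A minimal sketch, assuming the same schema name used in the command below and that the connecting account is allowed to create objects in it (the GRANT and the dvt_user role name are hypothetical, only needed if a separate account runs DVT):

CREATE SCHEMA pso_data_validator_results;

-- Optional: allow a dedicated (hypothetical) DVT account to create and use
-- objects in the results schema.
-- GRANT USAGE, CREATE ON SCHEMA pso_data_validator_results TO dvt_user;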

I tested it and it works for me.

postgres=# \d pso_data_validator_results.results
Did not find any relation named "pso_data_validator_results.results".
data-validation validate row -sc ora_local -tc pg_local -tbls=pso_data_validator.dvt_core_types  --hash="*" -rh=pg_rh.pso_data_validator_results.results --labels=tag1=neil-dvt1

After running the command, the table exists and holds the results:

postgres=# \d pso_data_validator_results.results
                   Table "pso_data_validator_results.results"
       Column       |           Type            | Collation | Nullable | Default
--------------------+---------------------------+-----------+----------+---------
 validation_name    | character varying(3000)   |           | not null |
 validation_type    | character varying(3000)   |           | not null |
 aggregation_type   | character varying(3000)   |           |          |
 source_table_name  | character varying(3000)   |           |          |
 source_column_name | character varying(3000)   |           |          |
 source_agg_value   | character varying(3000)   |           |          |
 target_table_name  | character varying(3000)   |           |          |
 target_column_name | character varying(3000)   |           |          |
 target_agg_value   | character varying(3000)   |           |          |
 group_by_columns   | character varying(3000)   |           |          |
 primary_keys       | character varying(3000)   |           |          |
 num_random_rows    | bigint                    |           |          |
 difference         | double precision          |           |          |
 pct_difference     | double precision          |           |          |
 pct_threshold      | double precision          |           |          |
 validation_status  | character varying(3000)   |           | not null |
 run_id             | character varying(3000)   |           | not null |
 labels             | character varying(3000)[] |           |          |
 start_time         | timestamp with time zone  |           | not null |
 end_time           | timestamp with time zone  |           | not null |

postgres=# select validation_name,labels from pso_data_validator_results.results;
 validation_name |       labels
-----------------+--------------------
 hash__all       | {'tag1=neil-dvt1'}
 hash__all       | {'tag1=neil-dvt1'}
 hash__all       | {'tag1=neil-dvt1'}
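
For anyone who prefers to create the table up front rather than letting DVT create it, here is a DDL sketch reconstructed from the \d output above. It assumes the same schema name and mirrors the structure DVT auto-created in this test; the DDL DVT itself issues may differ slightly.

CREATE TABLE pso_data_validator_results.results
  (
     validation_name    VARCHAR(3000) NOT NULL,
     validation_type    VARCHAR(3000) NOT NULL,
     aggregation_type   VARCHAR(3000),
     source_table_name  VARCHAR(3000),
     source_column_name VARCHAR(3000),
     source_agg_value   VARCHAR(3000),
     target_table_name  VARCHAR(3000),
     target_column_name VARCHAR(3000),
     target_agg_value   VARCHAR(3000),
     group_by_columns   VARCHAR(3000),
     primary_keys       VARCHAR(3000),
     num_random_rows    BIGINT,
     difference         DOUBLE PRECISION,
     pct_difference     DOUBLE PRECISION,
     pct_threshold      DOUBLE PRECISION,
     validation_status  VARCHAR(3000) NOT NULL,
     run_id             VARCHAR(3000) NOT NULL,
     labels             VARCHAR(3000)[],
     start_time         TIMESTAMP WITH TIME ZONE NOT NULL,
     end_time           TIMESTAMP WITH TIME ZONE NOT NULL
  );

And a simple follow-up query to list failing validations (assuming DVT writes 'success'/'fail' values into validation_status):

SELECT run_id, validation_name, validation_status
FROM   pso_data_validator_results.results
WHERE  validation_status = 'fail';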


sundar-mudupalli-work commented Oct 17, 2025

Neil,

Yes, it does work. However, we need some doc changes:

  1. The validation reports section should clearly mention that, for the Postgres result handler, if the table does not exist then the account must have CREATE TABLE privileges in order to create the table and write the results. The docs already say that you need to create a table for BigQuery and provide the schema for the results table. We should also provide the schema in case the customer wants to create the table before using DVT.
  2. I am not clear on the right schema. This line says the schema has a column called error_result, but I don't see that in the Postgres table. Does this JSON schema need to be updated?
  3. Part of the problem is that we are using two different mechanisms to write results: one for Postgres and one for BigQuery. BigQuery has a limitation on the number of mutations per table per day, so for customers running a large number of Cloud Run jobs, using pandas_gbq.to_gbq (which is equivalent to <dataframe>.to_sql) will not work.

Given that there is a customer waiting and the remaining items are only doc changes, I have approved the PR.

Thank you for the excellent work.

Sundar Mudupalli

sundar-mudupalli-work left a comment


Please change the documentation as requested.


nj1973 commented Oct 17, 2025

Please change the documentation as requested.

For item 1 in your list, I have added a link from the validation report section to the existing result handler setup documentation instead of duplicating the setup notes.

Item 2: I believe the error_result field is part of the BigQuery streaming schema and is not applicable to a PostgreSQL table.

Item 3: Yes, it is unfortunate that we can't reliably use load jobs for BigQuery (which would be a big performance improvement, see #1572). I've added a small note to the installation doc.

nj1973 merged commit ef3df2e into develop Oct 17, 2025
4 of 5 checks passed
nj1973 deleted the 1608-result-handler-parameters-in-config-files branch October 17, 2025 08:22

Successfully merging this pull request may close these issues.

Specifying CONNECTION_NAME.DATASET.TABLE for result handler results in parameters including passwords in config files
