|
| 1 | +--- |
| 2 | +sidebar_position: 1 |
| 3 | +--- |
| 4 | + |
| 5 | +# Information Schema Functions |
| 6 | + |
| 7 | +The `infoschema` functions make BigQuery [INFORMATION SCHEMA](https://cloud.google.com/bigquery/docs/information-schema-intro) metadata accessible via parameterised table functions, without requiring hard-coding of region, project, dataset or table identifiers. This enables them to be used in programmatic metadata-driven use-cases such as data profiling, monitoring and templated dashboard development. |
| 8 | + |
| 9 | +Functions are available in all Google Cloud regions. Note that in the examples below, the `my_region` placeholder needs to be replaced with a region or multi-region identifier from any [BigQuery region](https://cloud.google.com/bigquery/docs/locations#regions), replacing any dash ('-') with an underscore ('_'). |
| 10 | + |
| 11 | +All columns are retained from the associated `INFORMATION_SCHEMA` view, with an additional `id` added as the first column based on the metadata granularity of the response (i.e. `dataset_id` or `table_id`) for ease of subsequent manipulation and integration. Data types and structures are corrected where necessary, and column names for projects, datasets and tables are standardised to `project_id`, `dataset_name` and `table_name`. |
| 12 | + |
| 13 | +- `dataset_id` = `project_id.dataset_name` |
| 14 | +- `project_id` = `project_id.dataset_name.table_name`. |
| 15 | + |
| 16 | +Additional fields are added in the following scenarios: |
| 17 | + |
| 18 | +- Unix timestamp fields have a `TIMESTAMP` equivalent field added. |
| 19 | +- Bytes values have a GiB field added (divided by $1024^3$). |
| 20 | +- Labels are converted to a JSON object, which can be queried directly using the label key: `SELECT JSON_VALUE(label_column_name, '$label_key') AS label key`. |
| 21 | + |
| 22 | +## Function Reference |
| 23 | + |
| 24 | +### Query Options |
| 25 | +Where functions include the JSON argument `query_options`, the following options can be set: |
| 26 | + |
| 27 | +JSON Path | Data Type | Description |
| 28 | +--- | --- | --- |
| 29 | +`where_clause` | `STRING` | Optional WHERE clause to filter response rows |
| 30 | +`except_columns` | `ARRAY<STRING>` | Column values to exclude from the reponse |
| 31 | + |
| 32 | +For example, the following `query_options` structure would exclude the `project_number` column and filter rows for where the `table_schema` (`dataset_name`) begins with "00": |
| 33 | + |
| 34 | +```sql |
| 35 | +SET query_options = JSON """ |
| 36 | +{ |
| 37 | + "where_clause": "WHERE table_schema LIKE '00%'", |
| 38 | + "except_columns": ["project_number"] |
| 39 | +} |
| 40 | +"""; |
| 41 | +``` |
| 42 | + |
| 43 | +Note that the column _will_ still be present in the output from the table function, however it will contain null values. |
| 44 | + |
| 45 | +### infoschema.datasets |
| 46 | +This is a functional implementation of the [INFORMATION_SCHEMA.SCHEMATA](https://cloud.google.com/bigquery/docs/information-schema-datasets-schemata) view. |
| 47 | + |
| 48 | +Argument | Data Type | Description |
| 49 | +--- | --- | --- |
| 50 | +`region` | `STRING` | The region from which to retrieve dataset-level metadata |
| 51 | + |
| 52 | +```sql |
| 53 | +DECLARE region, datasets_query STRING; |
| 54 | +DECLARE datasets_json JSON; |
| 55 | + |
| 56 | +SET region = "my_region" ; |
| 57 | + |
| 58 | +SET datasets_query = (SELECT `datatovalue-tools.my_region`.datasets_query(region)); |
| 59 | +EXECUTE IMMEDIATE (datasets_query) INTO datasets_json; |
| 60 | +SELECT * FROM `datatovalue-tools.my_region`.datasets(datasets_json); |
| 61 | +``` |
| 62 | + |
| 63 | +### infoschema.tables |
| 64 | +This is a functional implementation of the [INFORMATION_SCHEMA.TABLES](https://cloud.google.com/bigquery/docs/information-schema-tables) view. |
| 65 | + |
| 66 | +Argument | Data Type | Description |
| 67 | +--- | --- | --- |
| 68 | +`dataset_ids` | `ARRAY<STRING>` | The datasets from which to retrieve table-level metadata. |
| 69 | + |
| 70 | +```sql |
| 71 | +DECLARE dataset_ids ARRAY<STRING>; |
| 72 | +DECLARE tables_query STRING; |
| 73 | +DECLARE tables_json JSON; |
| 74 | + |
| 75 | +SET dataset_ids = ["dataset_id_a", "dataset_id_b", ... ]; |
| 76 | + |
| 77 | +SET tables_query = (SELECT `datatovalue-tools.my_region`.tables_query(dataset_ids)); |
| 78 | +EXECUTE IMMEDIATE (tables_query) INTO tables_json; |
| 79 | +SELECT * FROM `datatovalue-tools.my_region`.tables(tables_json); |
| 80 | +``` |
| 81 | + |
| 82 | +### infoschema.table_metadata |
| 83 | +This is a functional implementation of the BigQuery `__TABLES__` metadata view. |
| 84 | + |
| 85 | +Argument | Data Type | Description |
| 86 | +--- | --- | --- |
| 87 | +`dataset_ids` | `ARRAY<STRING>` | The datasets from which to retrieve table-level metadata. |
| 88 | + |
| 89 | +```sql |
| 90 | +DECLARE dataset_ids ARRAY<STRING>; |
| 91 | +DECLARE table_metadata_query STRING; |
| 92 | +DECLARE table_metadata_json JSON; |
| 93 | + |
| 94 | +SET dataset_ids = ["dataset_id_a", "dataset_id_b", ... ]; |
| 95 | + |
| 96 | +SET table_metadata_query = (SELECT `datatovalue-tools.my_region`.table_metadata_query(dataset_ids)); |
| 97 | +EXECUTE IMMEDIATE (table_metadata_query) INTO table_metadata_json; |
| 98 | +SELECT * FROM `datatovalue-tools.my_region`.table_metadata(table_metadata_json); |
| 99 | +``` |
| 100 | + |
| 101 | +### infoschema.table_options |
| 102 | +This is a functional implementation of the [INFORMATION_SCHEMA.TABLE_OPTIONS](https://cloud.google.com/bigquery/docs/information-schema-table-options) view. The reponse is restructured from key-value pairs into columns for ease of subsequent manipulation. |
| 103 | + |
| 104 | +Argument | Data Type | Description |
| 105 | +--- | --- | --- |
| 106 | +`dataset_ids` | `ARRAY<STRING>` | The datasets from which to retrieve table-level metadata. |
| 107 | + |
| 108 | +```sql |
| 109 | +DECLARE dataset_ids ARRAY<STRING>; |
| 110 | +DECLARE table_options_query STRING; |
| 111 | +DECLARE table_options_json JSON; |
| 112 | + |
| 113 | +SET dataset_ids = ["dataset_id_a", "dataset_id_b", ... ]; |
| 114 | + |
| 115 | +SET table_options_query = (SELECT `datatovalue-tools.my_region`.table_options_query(dataset_ids)); |
| 116 | +EXECUTE IMMEDIATE (table_options_query) INTO table_options_json; |
| 117 | +SELECT * FROM `datatovalue-tools.my_region`.table_options(table_options_json); |
| 118 | +``` |
| 119 | + |
| 120 | +### infoschema.table_storage |
| 121 | +This is a functional implementation of the [INFORMATION_SCHEMA.TABLE_STORAGE](https://cloud.google.com/bigquery/docs/information-schema-table-storage) view. |
| 122 | + |
| 123 | +Argument | Data Type | Description |
| 124 | +--- | --- | --- |
| 125 | +`project_id` | `STRING` | The project from which to retrieve table-level storage metadata. |
| 126 | +`query_options` | `JSON` | The region from which to retrieve table-level storage metadata. |
| 127 | + |
| 128 | +```sql |
| 129 | +DECLARE region, project_id, table_storage_query STRING; |
| 130 | +DECLARE table_storage_json JSON; |
| 131 | + |
| 132 | +SET project_id = "my_project_id"; |
| 133 | +SET region = "my_region"; |
| 134 | + |
| 135 | +SET table_storage_query = (SELECT `datatovalue-tools.my_region`.table_storage_query(project_id, region)); |
| 136 | +EXECUTE IMMEDIATE (table_storage_query) INTO table_storage_json; |
| 137 | +SELECT * FROM `datatovalue-tools.my_region`.table_storage(table_storage_json); |
| 138 | +``` |
| 139 | + |
| 140 | +### infoschema.partitions |
| 141 | +This is a functional implementation of the [INFORMATION_SCHEMA.PARTITIONS](https://cloud.google.com/bigquery/docs/information-schema-partitions) view. |
| 142 | + |
| 143 | +Argument | Data Type | Description |
| 144 | +--- | --- | --- |
| 145 | +`table_ids` | `ARRAY<STRING>` | The tables from which to retrieve partition-level metadata. |
| 146 | + |
| 147 | +```sql |
| 148 | +DECLARE table_ids ARRAY<STRING>; |
| 149 | +DECLARE partitions_query STRING; |
| 150 | +DECLARE partitions_json JSON; |
| 151 | + |
| 152 | +SET table_ids = ["table_id_a", "table_id_b", ... ]; |
| 153 | + |
| 154 | +SET partitions_query = (SELECT `datatovalue-tools.my_region`.partitions_query(table_ids)); |
| 155 | +EXECUTE IMMEDIATE (partitions_query) INTO partitions_json; |
| 156 | +SELECT * FROM `datatovalue-tools.my_region`.partitions(partitions_json); |
| 157 | +``` |
| 158 | + |
| 159 | +### infoschema.columns |
| 160 | +This is a functional implementation of the [INFORMATION_SCHEMA.COLUMNS](https://cloud.google.com/bigquery/docs/information-schema-columns) view. |
| 161 | + |
| 162 | +Argument | Data Type | Description |
| 163 | +--- | --- | --- |
| 164 | +`dataset_ids` | `ARRAY<STRING>` | The datasets from which to retrieve column-level metadata. |
| 165 | + |
| 166 | +```sql |
| 167 | +DECLARE dataset_ids ARRAY<STRING>; |
| 168 | +DECLARE columns_query STRING; |
| 169 | +DECLARE columns_json JSON; |
| 170 | + |
| 171 | +SET dataset_ids = ["dataset_id_a", "dataset_id_b", ... ]; |
| 172 | + |
| 173 | +SET columns_query = (SELECT `datatovalue-tools.my_region`.columns_query(dataset_ids)); |
| 174 | +EXECUTE IMMEDIATE (columns_query) INTO columns_json; |
| 175 | +SELECT * FROM `datatovalue-tools.my_region`.columns(columns_json); |
| 176 | +``` |
| 177 | + |
| 178 | +### infoschema.column_field_paths |
| 179 | +This is a functional implementation of the [INFORMATION_SCHEMA.COLUMN_FIELD_PATHS](https://cloud.google.com/bigquery/docs/information-schema-columns) view. |
| 180 | + |
| 181 | +Argument | Data Type | Description |
| 182 | +--- | --- | --- |
| 183 | +`dataset_ids` | `ARRAY<STRING>` | The datasets from which to retrieve column-level metadata. |
| 184 | + |
| 185 | +```sql |
| 186 | +DECLARE dataset_ids ARRAY<STRING>; |
| 187 | +DECLARE column_field_paths_query STRING; |
| 188 | +DECLARE column_field_paths_json JSON; |
| 189 | + |
| 190 | +SET dataset_ids = ["dataset_id_a", "dataset_id_b", ... ]; |
| 191 | + |
| 192 | +SET column_field_paths_query = (SELECT `datatovalue-tools.my_region`.column_field_paths_query(dataset_ids)); |
| 193 | +EXECUTE IMMEDIATE (column_field_paths_query) INTO column_field_paths_json; |
| 194 | +SELECT * FROM `datatovalue-tools.my_region`.column_field_paths(column_field_paths_json); |
| 195 | +``` |
| 196 | + |
| 197 | + |
| 198 | + |
0 commit comments