I'm writing my data file with pyarrow and adding metadata with:

schema = pa.Schema.from_pandas(df).with_metadata(
    {"updated": datetime.utcnow().isoformat() + "Z"},
)
table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, output)

Is there any API for accessing that metadata? If there is, I couldn't find it.

Edit: here's a Python script to create a parquet file, reload it, and print out its schema:

from datetime import datetime
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq
d = {'col1': [1, 2]}
df = pd.DataFrame(data=d)
schema = pa.Schema.from_pandas(df).with_metadata(
    {"updated": datetime.utcnow().isoformat() + "Z"},
)
table = pa.Table.from_pandas(df, schema=schema)
pq.write_table(table, "data/test.parquet")
t = pq.read_table("data/test.parquet")
print(t.schema)

When I query its metadata via:

>>> await query(conn, `SELECT * FROM parquet_metadata('http://devd.io:8000/data/test.parquet')`)
[
  {
    "file_name": "http://devd.io:8000/data/test.parquet",
    "row_group_id": 0,
    "row_group_num_rows": 2,
    "row_group_num_columns": 1,
    "row_group_bytes": 100,
    "column_id": 0,
    "file_offset": 108,
    "num_values": 2,
    "path_in_schema": "col1",
    "type": "INT64",
    "stats_min": "1",
    "stats_max": "2",
    "stats_null_count": 0,
    "stats_distinct_count": null,
    "stats_min_value": "1",
    "stats_max_value": "2",
    "compression": "SNAPPY",
    "encodings": "PLAIN_DICTIONARY, PLAIN, RLE",
    "index_page_offset": 0,
    "dictionary_page_offset": 4,
    "data_page_offset": 36,
    "total_compressed_size": 104,
    "total_uncompressed_size": 100
  }
]

OK, it seems like this might need to be filed as a bug against duckdb itself? Here's the duckdb shell:

Filed as duckdb/duckdb#2534
| There's a   | 

Yes, this is an issue for main duckdb. Thanks for posting a summary there, @llimllib.