Add Parquet export support to OtelTracesSqlEngine #43
AstraBert
left a comment
The change is OK, but its usefulness is not clear to me: the to_parquet method is not used anywhere in the Streamlit application. I imagined you wanted to use it to download the observability data, but as it stands it is just an additional method with no direct value for the user.
# Add date column for partitioning if needed
if partition_cols and "date" in partition_cols:
    df["date"] = pd.to_datetime(df["start_time"], unit="us").dt.date
Why is adding the "date" column needed? Can't we just convert the start_time one to datetime?
Plus, there is no validation of the partition columns, so they could include columns that are not in the DataFrame.
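One way the validation asked for above could look, as a minimal sketch rather than the PR's actual code. The helper name `validate_partition_cols` and the `DERIVED_COLS` set are hypothetical; the only assumption taken from the diff is that "date" is derived from start_time by the exporter and so does not have to exist in the DataFrame beforehand.

```python
from typing import Iterable, List, Optional

# Hypothetical: columns the exporter derives itself before writing
# (the diff above derives "date" from start_time).
DERIVED_COLS = {"date"}


def validate_partition_cols(
    columns: Iterable[str], partition_cols: Optional[List[str]]
) -> List[str]:
    """Return the partition columns if all are usable, else raise ValueError."""
    if not partition_cols:
        # No partitioning requested: nothing to validate.
        return []
    available = set(columns) | DERIVED_COLS
    missing = [col for col in partition_cols if col not in available]
    if missing:
        raise ValueError(
            f"Unknown partition columns: {missing}. "
            f"Available: {sorted(available)}"
        )
    return list(partition_cols)
```

Inside to_parquet this would be called with df.columns before the date column is added, so a typo in partition_cols fails early with a readable message instead of surfacing as an error from the Parquet writer.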
@@ -0,0 +1,32 @@
import pandas as pd
The PR description says there are 10+ test cases, but here I only see one: is there another test file you did not commit?
Yeah, it will be in the next patch.
I wanted to get your opinion on the approach before adding it to the Streamlit application. If the overall approach looks good, I will add the rest of the changes in the next patch.

The approach looks good, feel free to add it to Streamlit

@AstraBert Please take another look.

Hi @AstraBert, just wanted to follow up on this PR. I addressed the earlier feedback and updated the patch. Please let me know if any additional changes are needed; I am happy to make further updates. Thanks again for the earlier review!





Summary
Adds a to_parquet() method to OtelTracesSqlEngine for exporting trace data in Apache Parquet format, enabling efficient storage and high-performance analytics.
Motivation
Currently, trace data can only be exported to SQL databases or kept in memory as pandas DataFrames. For long-term storage, archival, and sharing trace datasets, a performant columnar file format is needed.
Changes
to_parquet() method with: