Add Parquet export support to OtelTracesSqlEngine #43

Merged
AstraBert merged 4 commits into run-llama:main from nishanthp:add_support_for_parquet on Mar 2, 2026

@nishanthp
Contributor

@nishanthp commented Feb 8, 2026

Summary

Adds a to_parquet() method to OtelTracesSqlEngine for exporting trace data in Apache Parquet format, enabling efficient storage and high-performance analytics.

Motivation

Currently, trace data can only be exported to SQL databases or kept in memory as pandas DataFrames. For long-term storage, archival, and sharing trace datasets, a performant columnar file format is needed.

Changes

  • Added to_parquet() method with:
    • Configurable compression algorithms (snappy, gzip, brotli, lz4, zstd)
    • Optional partitioning support for efficient filtering
    • Automatic date column extraction for time-based partitioning
  • Comprehensive test suite with test cases covering:
    • Basic export functionality
    • Partitioning by service and date
    • Data integrity validation
    • Edge cases (empty dataframes, file size efficiency)
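As a rough illustration of how the options listed above could be combined, here is a minimal sketch. It is not the code from this PR: `ParquetExportOptions` and `to_pandas_kwargs` are hypothetical names, and the column name `service_name` in the usage note is an assumption. The only real API it targets is `pandas.DataFrame.to_parquet`, which accepts `compression`, `index`, and `partition_cols`.

```python
from dataclasses import dataclass

# Codecs named in the PR description. Whether each one works at runtime
# depends on how the installed Parquet engine was built.
SUPPORTED_COMPRESSION = ("snappy", "gzip", "brotli", "lz4", "zstd")


@dataclass(frozen=True)
class ParquetExportOptions:
    """Hypothetical container for the export options (not part of the PR)."""

    compression: str = "snappy"
    partition_cols: tuple = ()

    def __post_init__(self):
        # Fail early on an unknown codec instead of deep inside the engine.
        if self.compression not in SUPPORTED_COMPRESSION:
            raise ValueError(
                f"Unsupported compression {self.compression!r}; "
                f"expected one of {', '.join(SUPPORTED_COMPRESSION)}"
            )

    def to_pandas_kwargs(self):
        """Build keyword arguments for pandas.DataFrame.to_parquet."""
        kwargs = {"compression": self.compression, "index": False}
        # Only pass partition_cols when partitioning was requested, so the
        # default case still writes a single file.
        if self.partition_cols:
            kwargs["partition_cols"] = list(self.partition_cols)
        return kwargs
```

Assuming a DataFrame `df` of traces, a partitioned export could then look like `df.to_parquet("traces_parquet", **ParquetExportOptions("zstd", ("service_name", "date")).to_pandas_kwargs())`. When `partition_cols` is set, pandas treats the path as a directory for the partitioned dataset.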

Member

@AstraBert left a comment

The change is OK, but its usefulness is not clear to me: the to_parquet method is not used anywhere in the Streamlit application. I assumed you wanted to use it to download the observability data, but as it stands it is just an additional method with no direct value for the user.

Comment thread src/notebookllama/instrumentation.py Outdated
Comment on lines +168 to +170
# Add date column for partitioning if needed
if partition_cols and "date" in partition_cols:
    df["date"] = pd.to_datetime(df["start_time"], unit="us").dt.date
Member

Why do we need to add a "date" column? Can't we just convert the start_time column to datetime?

Member

Also, the partition columns are not validated, so they could include columns that are not in the dataframe.
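To make the two review points above concrete, here is a small standard-library sketch. It is an illustration under assumptions, not the fix that was merged: `start_time_to_date` and `validate_partition_cols` are hypothetical helpers, and `start_time` is assumed to be a Unix timestamp in microseconds, as the `unit="us"` in the snippet suggests. On the first question, one likely reason for a separate day-level column is that Hive-style partitioning creates one directory per distinct value, so partitioning on a full-resolution timestamp would yield roughly one partition per span.

```python
from datetime import date, datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def start_time_to_date(start_time_us: int) -> date:
    """Truncate a microsecond Unix timestamp to its UTC calendar date."""
    return (EPOCH + timedelta(microseconds=start_time_us)).date()


def validate_partition_cols(partition_cols, available_cols, derived_cols=("date",)):
    """Return the partition columns, or raise if any is unknown.

    A column is accepted if it already exists in the data or if it is one
    the exporter is assumed to derive itself (here only "date").
    """
    known = set(available_cols) | set(derived_cols)
    missing = [col for col in partition_cols if col not in known]
    if missing:
        raise ValueError(f"Unknown partition columns: {missing}")
    return list(partition_cols)
```

Calling `validate_partition_cols(partition_cols, df.columns)` before writing would turn a misspelled column into an immediate, readable error rather than a failure inside the Parquet engine.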

@@ -0,0 +1,32 @@
import pandas as pd
Member

The PR description says there are 10+ test cases, but here I only see one: is there another test file you did not commit?

Contributor Author

Yeah, it will be in the next patch.

@nishanthp
Contributor Author

The change is ok, but one thing that is not super clear to me is the usefulness: the to_parquet method is not used within the Streamlit application: I imagined that you wanted to use it to download the observability data, but in this way it's just an additional method with no direct value whatsoever for the user

I wanted to get your opinion on the approach before adding it to the Streamlit application.

If the overall approach looks good, I will add the rest of the changes in the next patch.

@AstraBert
Member

The approach looks good, feel free to add it to Streamlit

@nishanthp
Contributor Author

Local Testing Snapshots
[Screenshots: four images captured 2026-02-16 between 12:35 PM and 12:39 PM]

@nishanthp
Contributor Author

[Screenshot: captured 2026-02-16 at 2:02 PM]

Ran the tests locally.

@nishanthp
Contributor Author

@AstraBert Please take another look.

@nishanthp
Contributor Author

Hi @AstraBert,

Just wanted to follow up on this PR. I addressed the earlier feedback and updated the patch.

Please let me know if any additional changes are needed; I am happy to make further updates.

Thanks again for the earlier review!

@AstraBert AstraBert merged commit 849e221 into run-llama:main Mar 2, 2026
3 checks passed