
Support Unified Tracing with LangfuseConnector Across Main and Sub-Pipelines in Haystack #1605

Closed
immortal3 opened this issue Apr 4, 2025 · 8 comments · Fixed by #1624
Labels
feature request Ideas to improve an integration P1

Comments

@immortal3

immortal3 commented Apr 4, 2025

Please provide guidance or documentation on how to configure Haystack pipelines with LangfuseConnector to ensure a single, unified trace across both main pipelines and sub-pipelines. If this functionality is not currently supported, I would like to propose it as a feature request to enhance traceability in complex pipeline setups.


Description:
When using Haystack pipelines that include sub-pipelines alongside the LangfuseConnector, it's currently unclear how to maintain a consistent trace context throughout the entire execution flow.

At present, it appears that each pipeline (main and sub-pipelines) may generate separate traces, which makes it difficult to monitor or debug the full journey of a request from the top-level pipeline into its nested components in Langfuse.


Reproducible Example:

Here's a simplified example to demonstrate the issue. We have a main pipeline and two sub-pipelines. A LangfuseConnector is included in the main pipeline for tracing purposes.

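```python
from typing import List

from haystack import Document, Pipeline, component
from haystack.components.builders import PromptBuilder
from haystack.components.embedders import SentenceTransformersTextEmbedder
from haystack.components.generators import OpenAIGenerator
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack_integrations.components.connectors.langfuse import LangfuseConnector


# Sub-pipeline 1: Embedding and Retrieval, wrapped in a custom component
@component
class RetrievalPipeline:
    def __init__(self, document_store: InMemoryDocumentStore):
        self.pipeline = Pipeline()
        self.pipeline.add_component("text_embedder", SentenceTransformersTextEmbedder())
        self.pipeline.add_component("retriever", InMemoryEmbeddingRetriever(document_store=document_store))
        self.pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

    @component.output_types(documents=List[Document])
    def run(self, query: str):
        # Runs the nested pipeline; this is the run whose trace should join the main one
        result = self.pipeline.run({"text_embedder": {"text": query}})
        return {"documents": result["retriever"]["documents"]}


# Sub-pipeline 2: Answer Generation, wrapped in a custom component
@component
class GenerationPipeline:
    def __init__(self):
        template = (
            "Answer the question using the context.\n"
            "{% for doc in documents %}{{ doc.content }}\n{% endfor %}"
            "Question: {{ query }}"
        )
        self.pipeline = Pipeline()
        self.pipeline.add_component("prompt_builder", PromptBuilder(template=template))
        # OpenAIGenerator reads the API key from the OPENAI_API_KEY env var by default
        self.pipeline.add_component("generator", OpenAIGenerator())
        self.pipeline.connect("prompt_builder.prompt", "generator.prompt")

    @component.output_types(replies=List[str])
    def run(self, query: str, documents: List[Document]):
        result = self.pipeline.run({"prompt_builder": {"query": query, "documents": documents}})
        return {"replies": result["generator"]["replies"]}


# Main pipeline
document_store = InMemoryDocumentStore(embedding_similarity_function="cosine")

main_pipeline = Pipeline()
# LangfuseConnector expects LANGFUSE_SECRET_KEY and LANGFUSE_PUBLIC_KEY in the environment
main_pipeline.add_component("langfuse", LangfuseConnector("Main pipeline"))
main_pipeline.add_component("retrieval", RetrievalPipeline(document_store=document_store))
main_pipeline.add_component("generation", GenerationPipeline())

main_pipeline.connect("retrieval.documents", "generation.documents")

query = "What is Haystack?"
result = main_pipeline.run({"retrieval": {"query": query}, "generation": {"query": query}})
```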

Expected Behavior:
We would expect a single trace in Langfuse that includes the full execution of the main pipeline along with its sub-pipelines (retrieval and generation steps). This would provide an end-to-end view of the request lifecycle in a unified format.


Additional Notes:
It seems there is a span_handler parameter available in the LangfuseConnector, and we wonder if customizing it could allow propagation of trace context across sub-pipelines. If there are any recommended approaches or workarounds using this, documentation or examples would be greatly appreciated.


Feature Request (if not currently supported):
If unified trace context propagation is not currently supported out of the box, please consider adding this functionality. It would significantly improve the developer experience and observability for users building modular, nested pipelines in Haystack.


@immortal3 immortal3 added the feature request Ideas to improve an integration label Apr 4, 2025
@sjrl
Contributor

sjrl commented Apr 4, 2025

Hey @immortal3, thanks for raising this!

One request:

Would it be possible for you to try using the `OpenTelemetryTracer` (`from haystack.tracing.opentelemetry import OpenTelemetryTracer`) to see if that works? I'm trying to understand whether the workaround shown here also applies in your case. We ran into this issue when trying to capture the traces of Haystack pipelines executed from within `SuperComponent`, which is our built-in component that lets you wrap a complete pipeline and use it like a single component.

@immortal3
Author

@sjrl I tried `OpenTelemetryTracer` with the Langfuse endpoint, as mentioned in the thread above, but it still does the same thing (multiple traces). On top of that, it also logs other operations such as ES, which we don't want since pricing depends on the number of events.

To give you more context, we already use LangSmith, and with `Pipeline` everything works quite well, including the sub-pipeline setup mentioned above: traces show up as a single waterfall.

Since we want to optimize latency, we tried switching to `AsyncPipeline` and tried Langfuse, since it's officially documented in Haystack. So both of these issues are intertwined: #1604.

For `AsyncPipeline`, it would be great if we could get LangSmith working, because UI-wise Langfuse is missing quite a lot. We are directly using LangSmith's `@traceable` decorator on top of our `Component.run` methods (https://docs.smith.langchain.com/reference/python/run_helpers/langsmith.run_helpers.traceable).

If you're willing to help with LangSmith, should I create a new issue for it?
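Since part of the motivation here is moving to `AsyncPipeline`, one detail worth noting is that context-variable-based propagation also survives concurrent execution: `asyncio` tasks run in a copy of the context that was current when they were created, so sub-runs started with `asyncio.gather` still see the parent span. A minimal stdlib sketch; the span names and the `sub_pipeline`/`main_pipeline` coroutines are hypothetical stand-ins, not Haystack, Langfuse, or LangSmith APIs:

```python
from __future__ import annotations

import asyncio
from contextvars import ContextVar
from typing import List, Optional, Tuple

# Hypothetical: the name of the currently active span in this context
current_span: ContextVar[Optional[str]] = ContextVar("current_span", default=None)
# (span name, parent span name) pairs recorded by the sub-runs
recorded: List[Tuple[str, Optional[str]]] = []


async def sub_pipeline(name: str) -> None:
    # Each task runs in a copy of the context from when it was created,
    # so it still sees the span set by the main pipeline
    parent = current_span.get()
    token = current_span.set(name)
    try:
        await asyncio.sleep(0)  # stand-in for awaiting component work
        recorded.append((name, parent))
    finally:
        current_span.reset(token)


async def main_pipeline() -> None:
    token = current_span.set("main")
    try:
        # Both sub-runs execute concurrently as separate tasks
        await asyncio.gather(sub_pipeline("retrieval"), sub_pipeline("generation"))
    finally:
        current_span.reset(token)


asyncio.run(main_pipeline())
print(sorted(recorded))
# prints: [('generation', 'main'), ('retrieval', 'main')]
```

Because each task mutates only its own copy of the context, the concurrent sub-runs do not overwrite each other's active span, while both still report `main` as their parent.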

@julian-risch julian-risch added the P1 label Apr 4, 2025
@julian-risch
Member

(Related to deepset-ai/haystack-experimental#217.)

@sjrl
Contributor

sjrl commented Apr 8, 2025

Hey @immortal3, I've opened PR #1624, which contains the fix for this issue. Once it's merged and released, our Langfuse integration will properly collect the traces of sub-pipelines under the main one.

@immortal3
Author

@sjrl After the release, do we have to use `SuperComponent` to wrap our pipelines to get a single trace, or will our current setup work?

@sjrl
Copy link
Contributor

sjrl commented Apr 8, 2025

@immortal3 No, your current setup should also work. As you can see, the test I made here doesn't use super components, just custom components that wrap a pipeline.

@sjrl
Contributor

sjrl commented Apr 11, 2025

Hey @immortal3, this has been merged and is available in `langfuse-haystack==0.10.1` here.

@immortal3
Author

Great. Thanks for quickly jumping in and fixing it. 🙌

3 participants