---
id: braintrust
title: Braintrust Integration
sidebar_label: Braintrust Integration
toc_max_heading_level: 2
keywords:
  - ai
  - agents
  - braintrust
  - observability
  - tracing
  - prompts
tags:
  - Braintrust
  - Python SDK
  - Temporal SDKs
description:
  Add LLM observability and prompt management to Python Workflows using the Temporal Python SDK and Braintrust.
---

Temporal's integration with [Braintrust](https://braintrust.dev) gives you full observability into your AI agent
Workflows: tracing for every LLM call, prompt management without code deploys, and cost tracking across models.

When building AI agents with Temporal, you get durable execution: automatic retries, state persistence, and the ability
to recover from failures mid-Workflow. Braintrust adds the observability layer: see exactly what your agents are doing,
iterate on prompts in a UI, and measure whether changes improve outcomes.

The integration connects these capabilities with minimal code changes. Every Workflow and Activity becomes a span in
Braintrust, and every LLM call is traced with inputs, outputs, token counts, and latency.

:::info

The Temporal Python SDK integration with Braintrust is currently in
[Public Preview](/evaluate/development-production-features/release-stages#public-preview). Refer to the
[Temporal product release stages guide](/evaluate/development-production-features/release-stages) for more information.

:::

All code snippets in this guide are taken from the
[deep research sample](https://github.com/braintrustdata/braintrust-cookbook/blob/main/examples/TemporalDeepResearch/TemporalDeepResearch.mdx).
Refer to the sample for the complete code, which you can run locally.

## Prerequisites

- This guide assumes you are already familiar with Braintrust. If you aren't, refer to the
  [Braintrust documentation](https://www.braintrust.dev/docs) for more details.
- If you are new to Temporal, we recommend reading [Understanding Temporal](/evaluate/understanding-temporal) or taking
  the [Temporal 101](https://learn.temporal.io/courses/temporal_101/) course.
- Ensure you have set up your local development environment by following the
  [Set up your local development environment](/develop/python/core-application) guide. When you're done, leave the
  Temporal Development Server running if you want to test your code locally.

## Configure Workers to use Braintrust

Workers execute the code that defines your Workflows and Activities. To trace Workflow and Activity execution in
Braintrust, add the `BraintrustPlugin` to your Worker.

Follow the steps below to configure your Worker.

1. Install the Braintrust SDK with Temporal support.

   ```bash
   uv pip install "braintrust[temporal]"
   ```

2. Initialize the Braintrust logger before creating your Temporal Client or Worker so that spans are properly
   connected.

   ```python
   import os

   from braintrust import init_logger

   # Initialize BEFORE creating the Temporal Client or Worker
   init_logger(project=os.environ.get("BRAINTRUST_PROJECT", "my-project"))
   ```

3. Add the `BraintrustPlugin` to your Worker.

   ```python
   from braintrust.contrib.temporal import BraintrustPlugin
   from temporalio.worker import Worker

   worker = Worker(
       client,
       task_queue="my-task-queue",
       workflows=[MyWorkflow],
       activities=[my_activity],
       plugins=[BraintrustPlugin()],  # Add this line
   )
   ```

4. Add the plugin to your Temporal Client as well. This enables span context propagation, linking client code to the
   Workflows it starts.

   ```python
   from braintrust.contrib.temporal import BraintrustPlugin
   from temporalio.client import Client

   client = await Client.connect(
       "localhost:7233",
       plugins=[BraintrustPlugin()],
   )
   ```

5. Run the Worker. Ensure the Worker process has access to your Braintrust API key through the `BRAINTRUST_API_KEY`
   environment variable.

   ```bash
   export BRAINTRUST_API_KEY="your-api-key"
   python worker.py
   ```

   :::tip

   You only need to provide API credentials to the Worker process. The client application that starts Workflow
   Executions doesn't need the Braintrust API key.

   :::
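
Putting these steps together, a complete `worker.py` might look like the sketch below. The module path `my_app`, the
Workflow `MyWorkflow`, the Activity `my_activity`, and the task queue name are placeholders for your own definitions,
not part of the sample.

```python
import asyncio
import os

from braintrust import init_logger
from braintrust.contrib.temporal import BraintrustPlugin
from temporalio.client import Client
from temporalio.worker import Worker

from my_app import MyWorkflow, my_activity  # placeholders for your own code


async def main() -> None:
    # Initialize the Braintrust logger before creating the Client or Worker
    init_logger(project=os.environ.get("BRAINTRUST_PROJECT", "my-project"))

    # The plugin goes on both the Client and the Worker
    client = await Client.connect("localhost:7233", plugins=[BraintrustPlugin()])

    worker = Worker(
        client,
        task_queue="my-task-queue",
        workflows=[MyWorkflow],
        activities=[my_activity],
        plugins=[BraintrustPlugin()],
    )
    await worker.run()


if __name__ == "__main__":
    asyncio.run(main())
```

Running this file requires a reachable Temporal Service (for example, the local Development Server) and the
`BRAINTRUST_API_KEY` environment variable.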

## Trace LLM calls with wrap_openai

The simplest way to trace LLM calls is to wrap your OpenAI client. Every call through the wrapped client automatically
creates a span in Braintrust with inputs, outputs, token counts, and latency.

```python
from braintrust import wrap_openai
from openai import AsyncOpenAI

# Wrap the client: all calls are now traced.
# max_retries=0 because Temporal handles retries.
client = wrap_openai(AsyncOpenAI(max_retries=0))
```

Use this client in your Activities:

```python
from braintrust import wrap_openai
from openai import AsyncOpenAI
from temporalio import activity


@activity.defn
async def invoke_model(prompt: str) -> str:
    client = wrap_openai(AsyncOpenAI(max_retries=0))

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    )

    return response.choices[0].message.content
```

After running a Workflow, you'll see a trace hierarchy like the following in Braintrust:

```
my-workflow-request (client span)
└── temporal.workflow.MyWorkflow
    └── temporal.activity.invoke_model
        └── Chat Completion (gpt-4o)
```

## Add custom spans for application context

Add your own spans to capture business-level context such as user queries, Workflow inputs, and final outputs. In the
example below, `client` is the Temporal Client configured with the `BraintrustPlugin`.

```python
import uuid

from braintrust import start_span


async def run_research(query: str):
    with start_span(name="research-request", type="task") as span:
        span.log(input={"query": query})

        result = await client.execute_workflow(
            ResearchWorkflow.run,
            query,
            id=f"research-{uuid.uuid4()}",
            task_queue="research-task-queue",
        )

        span.log(output={"result": result})
        return result
```

## Manage prompts with load_prompt

Braintrust lets you manage prompts in a UI and deploy changes without code deploys. The workflow is:

1. **Develop** prompts in code and review the results in Braintrust traces.
2. **Create** a prompt in the Braintrust UI from your best version.
3. **Evaluate** different versions using Braintrust's eval tools.
4. **Deploy** by pointing your code at the Braintrust prompt.
5. **Iterate** in the UI; changes go live without code deploys.

To load a prompt from Braintrust in your Activity:

```python
import os

import braintrust
from braintrust import wrap_openai
from openai import AsyncOpenAI
from temporalio import activity


@activity.defn
async def invoke_model(prompt_slug: str, user_input: str) -> str:
    # Load the prompt from Braintrust
    prompt = braintrust.load_prompt(
        project=os.environ.get("BRAINTRUST_PROJECT", "my-project"),
        slug=prompt_slug,
    )

    # build() returns the full prompt configuration
    built = prompt.build()

    # Extract the system message
    system_content = None
    for msg in built.get("messages", []):
        if msg.get("role") == "system":
            system_content = msg["content"]
            break

    client = wrap_openai(AsyncOpenAI(max_retries=0))

    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_content},
            {"role": "user", "content": user_input},
        ],
    )

    return response.choices[0].message.content
```

:::tip

Provide a fallback prompt in your code for resilience. If Braintrust is unavailable, your Workflow continues with the
hardcoded prompt. Here `extract_system_message` is a small helper that pulls the system message out of the built
prompt configuration.

```python
DEFAULT_SYSTEM_PROMPT = "You are a helpful assistant."

try:
    prompt = braintrust.load_prompt(project="my-project", slug="my-prompt")
    system_content = extract_system_message(prompt.build())
except Exception as e:
    activity.logger.warning(f"Failed to load prompt: {e}. Using fallback.")
    system_content = DEFAULT_SYSTEM_PROMPT
```

:::
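
The `extract_system_message` helper used in the tip above is not part of the Braintrust SDK. A minimal sketch, assuming
`prompt.build()` returns a dict with a `messages` list shaped like the one shown earlier, might look like this:

```python
def extract_system_message(built, default=None):
    """Return the first system message from a built prompt configuration.

    Assumes the shape produced by prompt.build(): a dict containing a
    "messages" list of {"role": ..., "content": ...} entries. Returns
    the default when no system message is present.
    """
    for msg in built.get("messages", []):
        if msg.get("role") == "system":
            return msg["content"]
    return default
```

For example, `extract_system_message({"messages": [{"role": "system", "content": "Be brief."}]})` returns
`"Be brief."`, and an empty configuration falls through to the default.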

## Example: Deep Research Agent

The [deep research sample](https://github.com/braintrustdata/braintrust-cookbook/blob/main/examples/TemporalDeepResearch/TemporalDeepResearch.mdx) demonstrates a complete AI
agent that:

- Plans research strategies
- Generates search queries
- Executes web searches in parallel
- Synthesizes findings into comprehensive reports

The sample shows all of the integration patterns: the wrapped OpenAI client, the `BraintrustPlugin` on the Worker and
Client, custom spans, and prompt management with `load_prompt()`.

To run the sample:

```bash
# Terminal 1: Start the Temporal Development Server
temporal server start-dev

# Terminal 2: Start the Worker
export BRAINTRUST_API_KEY="your-api-key"
export OPENAI_API_KEY="your-api-key"
export BRAINTRUST_PROJECT="deep-research"
uv run python -m worker

# Terminal 3: Run a research query
uv run python -m start_workflow "What are the latest advances in quantum computing?"
```