Description
What would you like to happen?
Summary
This PR adds support for Triton Inference Server in Apache Beam’s RunInference transform by implementing a TritonModelHandler class.
What does this PR do?
• Implements a TritonModelHandler that extends ModelHandler[str, PredictionResult, Model] (sketched after this list)
• Enables inference on text data using Triton Inference Server models
• Supports batch processing of text strings through the Beam pipeline
• Handles model loading, initialization, and inference execution against the Triton server
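
For reference, here is a minimal sketch of what such a handler could look like. It is not the PR's actual code: it assumes the in-process tritonserver Python API (whose signatures vary across releases), and the tensor names `text_input`/`text_output` plus the string-array conversion are placeholders that would have to match the model's actual configuration.

```python
import json
from typing import Any, Dict, Iterable, Optional, Sequence

import numpy as np
import tritonserver
from apache_beam.ml.inference.base import ModelHandler, PredictionResult


class TritonModelHandler(ModelHandler[str, PredictionResult, tritonserver.Model]):
  """Runs inference on batches of text strings against a Triton model."""

  def __init__(self, model_repository_path: str, model_name: str):
    self._model_repository_path = model_repository_path
    self._model_name = model_name

  def load_model(self) -> tritonserver.Model:
    # Start an in-process Triton server pointed at the model repository.
    # (Assumption: in-process tritonserver API; details vary by release.)
    server = tritonserver.Server(model_repository=self._model_repository_path)
    server.start(wait_until_ready=True)
    return server.model(self._model_name)

  def run_inference(
      self,
      batch: Sequence[str],
      model: tritonserver.Model,
      inference_args: Optional[Dict[str, Any]] = None,
  ) -> Iterable[PredictionResult]:
    # "text_input"/"text_output" are hypothetical tensor names; they must
    # match the model's config.pbtxt.
    inputs = {"text_input": np.array(batch, dtype=np.object_)}
    for response in model.infer(inputs=inputs, **(inference_args or {})):
      # Assumption: the model emits one JSON string per input element.
      outputs = response.outputs["text_output"].to_string_array()
      for example, raw in zip(batch, outputs):
        yield PredictionResult(example=example, inference=json.loads(raw))
```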
Key Features
• Model Loading: Initializes a Triton server with a configurable model repository and model name
• Batch Inference: Processes sequences of text strings as batched requests (see the batching sketch after this list)
• Result Handling: Parses JSON responses from Triton and returns structured PredictionResult objects
• Flexible Configuration: Supports custom inference arguments
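
On the batching point: Beam model handlers can tune how RunInference groups elements by overriding batch_elements_kwargs, which feeds Beam's internal BatchElements transform. A hedged sketch of how the handler could cap Triton request sizes; the value 32 is illustrative, not something this PR specifies:

```python
  # Inside TritonModelHandler: cap the size of each Triton request.
  # (32 is an arbitrary illustrative value, not a PR default.)
  def batch_elements_kwargs(self) -> Dict[str, Any]:
    return {"max_batch_size": 32}
```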
Use Case
This handler lets users leverage Triton Inference Server's optimized inference capabilities within Apache Beam pipelines; it is particularly useful for the following (a usage sketch follows this list):
• Text classification tasks
• Document processing pipelines
• Real-time and batch ML inference workloads
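
To illustrate the intended usage, a minimal end-to-end sketch under the assumptions above; the repository path, model name, and input strings are placeholders:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

# Placeholder values; substitute a real model repository and model name.
with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | "CreateTexts" >> beam.Create(["first document", "second document"])
      | "TritonInference" >> RunInference(
          TritonModelHandler(
              model_repository_path="/models",
              model_name="text_classifier"))
      | "PrintResults" >> beam.Map(print))
```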
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner