
[Feature Request]: Add Triton Inference Server support for RunInference transform #36368

@SaiShashank12

Description

What would you like to happen?

Summary

This PR adds support for Triton Inference Server in Apache Beam’s RunInference transform by implementing a TritonModelHandler class.
What does this PR do?
• Implements TritonModelHandler that extends ModelHandler[str, PredictionResult, Model]
• Enables inference on text data using Triton Inference Server models
• Supports batch processing of text strings through the Beam pipeline
• Handles model loading, initialization, and inference execution with Triton server
Key Features
• Model Loading: Initializes Triton server with configurable model repository and model name
• Batch Inference: Processes sequences of text strings efficiently
• Result Handling: Parses JSON responses from Triton and returns structured PredictionResult objects
• Flexible Configuration: Supports custom inference arguments (a minimal handler sketch follows this list)
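For concreteness, here is a minimal sketch of what such a handler could look like. This is an illustration under stated assumptions, not the PR's implementation: it assumes the handler talks to an already-running Triton server over HTTP via the tritonclient package (the PR may instead start an in-process server from a model repository), and the tensor names (TEXT, OUTPUT) and constructor parameters are hypothetical placeholders.

```python
# Illustrative sketch only; the class, parameter, and tensor names are
# assumptions, not the PR's actual API.
from typing import Any, Dict, Iterable, Optional, Sequence

import numpy as np
import tritonclient.http as httpclient

from apache_beam.ml.inference.base import ModelHandler, PredictionResult


class TritonModelHandler(ModelHandler[str, PredictionResult,
                                      httpclient.InferenceServerClient]):
  """Runs text inference against a running Triton Inference Server."""

  def __init__(self, url: str, model_name: str,
               input_name: str = 'TEXT',      # hypothetical tensor names;
               output_name: str = 'OUTPUT'):  # they depend on the model config
    self._url = url
    self._model_name = model_name
    self._input_name = input_name
    self._output_name = output_name

  def load_model(self) -> httpclient.InferenceServerClient:
    # One client per worker, pointing at an already-running Triton server.
    return httpclient.InferenceServerClient(url=self._url)

  def run_inference(
      self,
      batch: Sequence[str],
      model: httpclient.InferenceServerClient,
      inference_args: Optional[Dict[str, Any]] = None
  ) -> Iterable[PredictionResult]:
    # Pack the batch of strings into a single BYTES input tensor.
    data = np.array([[s.encode('utf-8')] for s in batch], dtype=object)
    infer_input = httpclient.InferInput(self._input_name, list(data.shape),
                                        'BYTES')
    infer_input.set_data_from_numpy(data)

    # Forward any extra keyword arguments to the Triton client call.
    response = model.infer(model_name=self._model_name, inputs=[infer_input],
                           **(inference_args or {}))
    outputs = response.as_numpy(self._output_name)
    return [
        PredictionResult(example=text, inference=output)
        for text, output in zip(batch, outputs)
    ]
```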
Use Case
This handler allows users to leverage Triton Inference Server’s optimized inference capabilities within Apache Beam pipelines. It is particularly useful for:
• Text classification tasks
• Document processing pipelines
• Real-time and batch ML inference workloads (see the pipeline sketch after this list)
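As a usage sketch (again based on the illustrative handler above; the server URL and the model name 'text_classifier' are placeholders), the handler would plug into a pipeline through RunInference like any other model handler:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference

with beam.Pipeline() as pipeline:
  _ = (
      pipeline
      | 'CreateText' >> beam.Create(['a sentence to classify',
                                     'another document'])
      | 'TritonInference' >> RunInference(
          TritonModelHandler(url='localhost:8000',
                             model_name='text_classifier'))
      | 'PrintResults' >> beam.Map(print))
```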

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
