Add composite metrics for Kubernetes inference gateway metrics protocol #725
BenjaminBraunDev wants to merge 6 commits into triton-inference-server:main
Conversation
Tested these changes manually, both by launching an HTTP server and hitting
indrajit96 left a comment
Can we make corresponding additions to the documentation here:
https://github.com/triton-inference-server/tensorrtllm_backend?tab=readme-ov-file#triton-metrics
@kaiyux Could you advise what the approach would be for external contributions here?
Since we have not switched to GitHub development for this repo yet, we'll need someone to integrate the changes into the internal repo, merge and publish them, and then credit the contributor. cc @juney-nvidia @schetlur-nv for vis.
Thanks @kaiyux! I can help with integrating into the internal repo once the changes are finalized. What steps need to be taken to properly credit the contributor?
We do something like the following, so that the contributor will be marked as "Co-authored". Feel free to let me know when it gets merged and I can do it.
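For reference, GitHub marks a commit as co-authored when the commit message ends with a `Co-authored-by:` trailer. A hedged sketch of what that message might look like here (the email address is illustrative, not the contributor's real one):

```
Add composite metrics for kubernetes inference gateway metrics protocol

Co-authored-by: BenjaminBraunDev <benjamin@example.com>
```

The trailer must sit on its own line at the end of the message, preceded by a blank line, for GitHub to attribute the commit to both authors.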

In order to integrate Triton Inference Server (specifically the TensorRT-LLM backend) with the Gateway API Inference Extension, it must adhere to Gateway's Model Server Protocol. This protocol requires the model server to publish the following Prometheus metrics under a consistent set of families and labels:
Currently, the TensorRT-LLM backend pipes the following TensorRT-LLM batch manager statistics through as Prometheus metrics:
These are realized as the following Prometheus metrics:
These existing metrics are sufficient to compose the Gateway metrics once we add the following new metrics:
and add these to the existing Prometheus metrics:
These can then be mapped directly to the metrics in the Gateway protocol, allowing integration with Gateway's Endpoint Picker for load balancing.
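To make the "composite metric" idea concrete, here is a minimal sketch of how a gateway-style utilization fraction could be derived from raw batch-manager block counts and rendered in the Prometheus text exposition format. The metric family and label names below are illustrative assumptions, not the PR's exact identifiers:

```python
def kv_cache_utilization(used_blocks: int, max_blocks: int) -> float:
    """Composite metric: fraction of KV cache blocks currently in use.

    Derived from two raw batch-manager statistics rather than reported
    directly by TensorRT-LLM; guards against division by zero.
    """
    return used_blocks / max_blocks if max_blocks else 0.0


def render_metric(family: str, labels: dict, value: float) -> str:
    """Render one Prometheus text-format sample: family{k="v",...} value."""
    label_str = ",".join(f'{k}="{v}"' for k, v in labels.items())
    return f"{family}{{{label_str}}} {value}"


# Hypothetical usage with 30 of 120 KV cache blocks in use:
frac = kv_cache_utilization(30, 120)
print(render_metric("kv_cache_block_metrics", {"type": "fraction"}, frac))
# prints: kv_cache_block_metrics{type="fraction"} 0.25
```

Publishing the fraction alongside the raw used/max counts lets the Endpoint Picker scrape a single gauge for load-balancing decisions instead of recomputing the ratio per scrape.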