Problem
- Model publishers may not always assign the right Model Architecture tags.
- We want to detect when a model is a TensorRT Engine and can be run via the TRT-LLM inference runtime.
isTensorrtModel Rules
- At least 1 file ending in `.engine`.
- (Optional) At least 1 file named `config.json`. Caveat: By design, model builders can actually rename this file.
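
A minimal TypeScript sketch of these rules (the `isTensorrtModel` name is from this issue; the directory-scanning approach and the `modelDir` parameter are assumptions for illustration):

```ts
import { readdir } from "node:fs/promises";
import * as path from "node:path";

// Sketch of the detection rules above. `modelDir` is assumed to be a
// local directory containing the downloaded model files.
export async function isTensorrtModel(modelDir: string): Promise<boolean> {
  const files = await readdir(modelDir);

  // Required: at least one file ending in `.engine`.
  // `config.json` is only an optional hint (builders can rename it),
  // so it is deliberately not part of this check.
  return files.some((f) => path.extname(f) === ".engine");
}
```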
Engine compatibility rules
For context, TensorRT models are specific to:
- **GPU architecture**, i.e. models compiled for Ada will only run on Ada
- **TRT-LLM release**, i.e. models compiled on release version v0.9.0 will need to run on v0.9.0
- **OS** (optional), though as of v0.9.0, models are cross-OS compatible. We're still testing as it could be flaky.
- **n GPUs**, i.e. GPU topology. This can actually be detected by counting the # of engine files.
Unfortunately, afaik `config.json` and other metadata files do not track the hardware/build-time configurations once the models are built, so model authors will have to specify this info.
^ We'll update this info as it changes, and as we learn more 😄 .
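
Since the built engine doesn't carry this info, one option is a small author-supplied compatibility descriptor that we validate at load time. A sketch, assuming a hypothetical schema (none of these fields exist in TensorRT's own `config.json`):

```ts
// Hypothetical author-supplied compatibility metadata, shipped alongside
// the engine files. Field names are illustrative, not an existing format.
interface TensorrtCompatibility {
  gpuArch: "ampere" | "ada" | "hopper"; // engines only run on the arch they were built for
  trtLlmVersion: string;                // e.g. "0.9.0"; must match the runtime release
  os?: "linux" | "windows";             // optional; cross-OS as of v0.9.0, but still flaky
  numGpus: number;                      // GPU topology the engine was built for
}

// The one field we can cross-check today: topology, via the rule above
// that the # of engine files reflects the # of GPUs.
function matchesTopology(meta: TensorrtCompatibility, engineFileCount: number): boolean {
  return meta.numGpus === engineFileCount;
}
```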
Naming
- TensorRT weights can be `.plan` or `.onnx`
- TensorRT weights that run in TensorRT-LLM are in `.engine`
- So we may need to be specific across the various TRT formats, i.e. `isTensorrtEngine` vs `isTensorrtPlan`?
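
If we do split by format, a simple extension-based discriminator could look like this (the flag names follow the suggestion above; the function itself is illustrative):

```ts
import { readdir } from "node:fs/promises";

// Illustrative format check, one flag per TRT artifact type.
export async function classifyTensorrtFormat(modelDir: string) {
  const files = await readdir(modelDir);
  const hasExt = (ext: string) => files.some((f) => f.endsWith(ext));
  return {
    isTensorrtEngine: hasExt(".engine"), // runnable via TensorRT-LLM
    isTensorrtPlan: hasExt(".plan"),     // plain TensorRT plan weights
    isOnnx: hasExt(".onnx"),             // ONNX weights (pre-build)
  };
}
```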