
feat: simple heuristic for isTensorrtEngine #690

Open
@freelerobot

Description


Problem

  • Model publishers may not always assign the right Model Architecture tags.
  • We want to detect when a model is a TensorRT engine and can be run via the TRT-LLM inference runtime.

isTensorrtModel Rules

  1. At least one file ending in .engine (sketched below).
  2. (Optional) At least one file named config.json. Caveat: by design, model builders can rename this file.
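
A minimal sketch of the check described above, in TypeScript. The function name and the `filePaths` parameter (assumed to be the list of paths inside the model folder) are illustrative, not an existing API:

```typescript
// Minimal sketch: detect a TensorRT-LLM engine from a model's file list.
// `isTensorrtEngine` and `filePaths` are illustrative names.
export function isTensorrtEngine(filePaths: string[]): boolean {
  // Rule 1 (required): at least one file ending in `.engine`.
  const hasEngineFile = filePaths.some((p) => p.endsWith('.engine'));

  // Rule 2 (optional signal only): a `config.json` adds confidence, but
  // model builders can rename it, so its absence is not disqualifying.
  // const hasConfig = filePaths.some((p) => p.endsWith('config.json'));

  return hasEngineFile;
}
```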

Engine compatibility rules

For context, TensorRT models are specific to:

  1. GPU architecture, e.g. models compiled for Ada will only run on Ada.
  2. TRT-LLM release, e.g. models compiled with release v0.9.0 must run on v0.9.0.
  3. OS (optionally): as of v0.9.0, models are cross-OS compatible, though we're still testing this since it could be flaky.
  4. Number of GPUs, i.e. GPU topology. This can be detected by counting the engine files (see the sketch after this list).
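
For rule 4, a rough sketch of inferring GPU count from the file list, assuming one engine file per GPU rank (typical for multi-GPU TRT-LLM builds; the function name is illustrative):

```typescript
// Rough sketch: infer GPU topology from the number of engine files,
// assuming one `.engine` file per rank in multi-GPU builds.
export function inferGpuCount(filePaths: string[]): number {
  return filePaths.filter((p) => p.endsWith('.engine')).length;
}
```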

Unfortunately, as far as we know, config.json and other metadata files do not record the hardware/build-time configuration once a model is built, so model authors will have to specify this info themselves.

We'll update the above as it changes and as we learn more 😄.

Naming

  • TensorRT weights can be .plan or .onnx files.
  • TensorRT weights that run in TensorRT-LLM are .engine files.
  • So we may need to be specific across the various TRT formats, e.g. isTensorrtEngine vs. isTensorrtPlan? (See the sketch after this list.)
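
If we do end up distinguishing formats, a naive extension-based classifier could look like this (the type and function names are hypothetical):

```typescript
// Hypothetical helper: tell the TRT formats apart by file extension.
type TrtFormat = 'engine' | 'plan' | 'onnx' | 'unknown';

export function classifyTrtFile(path: string): TrtFormat {
  if (path.endsWith('.engine')) return 'engine'; // TensorRT-LLM engine
  if (path.endsWith('.plan')) return 'plan'; // serialized TensorRT plan
  if (path.endsWith('.onnx')) return 'onnx'; // ONNX model (pre-compilation)
  return 'unknown';
}
```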
