Add trt decoder #307
base: main
Conversation
Force-pushed from bd14f16 to c34be87
- Add trt_decoder class implementing TensorRT-accelerated inference
- Support both ONNX model loading and pre-built engine loading
- Include precision configuration (fp16, bf16, int8, fp8, tf32, best)
- Add hardware platform detection for capability-based precision selection
- Implement CUDA memory management and stream-based execution
- Add Python utility script for ONNX to TensorRT engine conversion
- Update CMakeLists.txt to build TensorRT decoder plugin
- Add comprehensive parameter validation and error handling
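As context for the precision options listed above, here is a minimal sketch of how such precision names are commonly mapped onto TensorRT builder flags. This mapping is illustrative rather than the plugin's actual code, and flag availability depends on the TensorRT version and GPU:

```python
import tensorrt as trt

# Illustrative mapping from precision strings to TensorRT builder flags.
# Not the plugin's actual implementation; some flags (e.g. FP8, BF16)
# require recent TensorRT versions and capable hardware.
PRECISION_FLAGS = {
    "fp16": [trt.BuilderFlag.FP16],
    "bf16": [trt.BuilderFlag.BF16],
    "int8": [trt.BuilderFlag.INT8],
    "fp8": [trt.BuilderFlag.FP8],
    "tf32": [trt.BuilderFlag.TF32],
    # "best" here simply enables several precisions and lets TensorRT choose.
    "best": [trt.BuilderFlag.FP16, trt.BuilderFlag.TF32],
}

def apply_precision(config: trt.IBuilderConfig, precision: str) -> None:
    for flag in PRECISION_FLAGS[precision]:
        config.set_flag(flag)
```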
Force-pushed from c34be87 to 9e97e26
Signed-off-by: Scott Thornton <[email protected]>
libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py
import tensorrt as trt

def build_engine(onnx_file,
Is this file exposed as part of the wheel such that regular users will be able to use this file?
I think I would rather this file be in the docs as an example of how to convert an ONNX file to a TRT engine.
> I think I would rather this file be in the docs as an example of how to convert an ONNX file to a TRT engine.
Is there a reason we can't just expose this utility function in our wheel? Assuming it's possible, if we are going to reference it in our docs, it would be better to just use it from the wheel rather than ask users to copy/paste code.
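For context, the conversion such a script performs typically looks like the following minimal sketch using the public TensorRT Python API (the file names and the FP16 choice are illustrative, not the script's actual contents):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit-batch network (the default in TRT 10)
parser = trt.OnnxParser(network, logger)

# Parse the ONNX model into the TensorRT network definition.
with open("decoder_model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("Failed to parse ONNX model")

# Build and serialize the engine; FP16 is enabled here as an example precision.
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
engine_bytes = builder.build_serialized_network(network, config)
with open("decoder_model.trt", "wb") as f:
    f.write(bytes(engine_bytes))
```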
Signed-off-by: Scott Thornton <[email protected]>
…trix) Signed-off-by: Scott Thornton <[email protected]>
…ecoder model, added to unittest Signed-off-by: Scott Thornton <[email protected]>
Signed-off-by: Scott Thornton <[email protected]>
I, Scott Thornton <[email protected]>, hereby add my Signed-off-by to this commit: 9e97e26
Signed-off-by: Scott Thornton <[email protected]>
/ok to test fb16b36
@wsttiger, there was an error processing your request: See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/
/ok to test c9e563f
/ok to test 42c2b32
/ok to test 5ad505b
/ok to test 2d08b88
/ok to test 4defcfd
/ok to test 62cdbac
/ok to test d4e79a9
/ok to test d8489f7
/ok to test 6ba9191
…eels yaml and script Signed-off-by: Scott Thornton <[email protected]>
/ok to test eea3198
/ok to test 33359f5
wget https://developer.download.nvidia.com/compute/tensorrt/10.13.3/local_installers/nv-tensorrt-local-repo-ubuntu2404-10.13.3-cuda-12.9_1.0-1_amd64.deb
dpkg -i nv-tensorrt-local-repo-ubuntu2404-10.13.3-cuda-12.9_1.0-1_amd64.deb
cp /var/nv-tensorrt-local-repo-ubuntu2404-10.13.3-cuda-12.9/nv-tensorrt-local-4B177B4F-keyring.gpg /usr/share/keyrings/
apt update
apt install -y tensorrt-dev
IIRC, as written right now, this installs a mix of CUDA 13 and CUDA 12 stuff into the dev image. This might work better.
# Generate a pin preferences file to specify the desired CUDA version for tensorrt.
# The "cache search" will make it propagate to all of tensorrt's dependencies.
apt-cache search tensorrt | awk '{print "Package: "$1"\nPin: version *+cuda12.9\nPin-Priority: 1001\n"}' | tee /etc/apt/preferences.d/tensorrt-cuda12.9.pref > /dev/null
apt update
apt install tensorrt tensorrt-dev
For CUDA 13, it would need to be updated to cuda13.0 instead of cuda12.9.
In actuality, having the `tensorrt` package and the `tensorrt-dev` package is a bit redundant. It turns out the only thing you really need is the `tensorrt-dev` package for our dev image, and I suspect that means the `tensorrt-lib` package (which is much smaller) is the only thing needed for our released Docker image. (And hopefully pre-existing Python packages cover the Python environment.)

In other words, I think we can simply do this:
apt-cache search tensorrt | awk '{print "Package: "$1"\nPin: version *+cuda12.9\nPin-Priority: 1001\n"}' | tee /etc/apt/preferences.d/tensorrt-cuda12.9.pref > /dev/null
apt update
apt install tensorrt-dev
IS_ARM = _is_arm_architecture()

# Test inputs - 100 test cases with 24 detectors each
TEST_INPUTS = [[
These test cases span 500 lines, and it looks like many of them are redundant and/or contain only a single non-zero syndrome. If those are the intended test vectors, would it be possible to collapse these into something like "initialize with all 0's", and then just set the 1's where you want them?
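Something along these lines, for instance (a sketch; the case and detector indices below are made up):

```python
NUM_CASES = 100
NUM_DETECTORS = 24

# Only the non-zero entries need to be spelled out.
# Maps test-case index -> detector indices that fire (values are hypothetical).
NONZERO_SYNDROMES = {
    0: [3],
    1: [7, 12],
    2: [23],
}

# Initialize with all 0's, then set the 1's where you want them.
TEST_INPUTS = [[0] * NUM_DETECTORS for _ in range(NUM_CASES)]
for case, detectors in NONZERO_SYNDROMES.items():
    for d in detectors:
        TEST_INPUTS[case][d] = 1
```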
add_dependencies(CUDAQXQECUnitTests test_qec)
gtest_discover_tests(test_qec)

# TensorRT decoder test is only built for x86 architectures
Now that we support CUDA 13, we should be able to do x86 and ARM for that CUDA version.
/ok to test 5e38f6a
Add TensorRT Decoder Plugin for Quantum Error Correction

Overview

This PR introduces a new TensorRT-based decoder plugin for quantum error correction, leveraging NVIDIA TensorRT for accelerated neural network inference in QEC applications.

Key Features

Technical Implementation

- `trt_decoder` implementing the `decoder` interface with TensorRT backend

Files Added/Modified

- `libs/qec/include/cudaq/qec/trt_decoder_internal.h` - Internal API declarations
- `libs/qec/lib/decoders/plugins/trt_decoder/trt_decoder.cpp` - Main decoder implementation
- `libs/qec/lib/decoders/plugins/trt_decoder/CMakeLists.txt` - Plugin build configuration
- `libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py` - Python utility
- `libs/qec/unittests/test_trt_decoder.cpp` - Comprehensive unit tests

Testing
Usage Example
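A minimal sketch of how the decoder could be used from Python, assuming the plugin registers under the name `trt_decoder` and accepts an ONNX model path at construction (the parameter name `onnx_load_path`, the matrix contents, and the file names here are illustrative, not confirmed API):

```python
import numpy as np
import cudaq_qec as qec

# Parity-check matrix of the code being decoded (contents illustrative).
H = np.array([[1, 0, 1, 1, 0, 0, 0],
              [0, 1, 1, 0, 1, 0, 0],
              [0, 0, 0, 1, 1, 1, 1]], dtype=np.uint8)

# "onnx_load_path" is a hypothetical parameter name; see the plugin
# documentation for the exact construction options.
decoder = qec.get_decoder("trt_decoder", H, onnx_load_path="decoder_model.onnx")

# Decode a single (here all-zero) syndrome.
syndrome = [0.0] * H.shape[0]
result = decoder.decode(syndrome)
print(result.converged, result.result)
```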
Dependencies
Performance Benefits
This implementation provides a production-ready TensorRT decoder plugin that can significantly accelerate quantum error correction workflows while maintaining compatibility with the existing CUDA-Q QEC framework.