Conversation

wsttiger (Collaborator)

Add TensorRT Decoder Plugin for Quantum Error Correction

Overview

This PR introduces a new TensorRT-based decoder plugin for quantum error correction, leveraging NVIDIA TensorRT for accelerated neural network inference in QEC applications.

Key Features

  • TensorRT Integration: Full TensorRT runtime integration with support for both ONNX model loading and pre-built engine loading
  • Flexible Precision Support: Configurable precision modes (fp16, bf16, int8, fp8, tf32, best) with automatic hardware capability detection
  • Memory Management: Efficient CUDA memory allocation and stream-based execution
  • Parameter Validation: Comprehensive input validation with clear error messages (a sketch of such a check follows this list)
  • Python Utilities: ONNX to TensorRT engine conversion script for model preprocessing
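
As a rough illustration of the parameter validation described above, the sketch below shows one way such a check could look. It is an assumption about the shape of the code rather than the PR's actual implementation; the grounded pieces are the cudaqx::heterogeneous_map type and the onnx_load_path / engine_load_path keys used in the usage example further down.

#include <stdexcept>
// Assumed include path for cudaqx::heterogeneous_map.
#include "cuda-qx/core/heterogeneous_map.h"

// Hypothetical helper: require exactly one model source to be specified
// (assumes heterogeneous_map exposes a contains() lookup).
void validate_trt_params(const cudaqx::heterogeneous_map &params) {
  bool has_onnx = params.contains("onnx_load_path");
  bool has_engine = params.contains("engine_load_path");
  if (has_onnx == has_engine)
    throw std::runtime_error(
        "trt_decoder: provide exactly one of 'onnx_load_path' or "
        "'engine_load_path'");
}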

Technical Implementation

  • Core Decoder Class: trt_decoder implementing the decoder interface with TensorRT backend
  • Hardware Detection: Automatic GPU capability detection for optimal precision selection (see the sketch after this list)
  • Error Handling: Robust error handling with graceful fallbacks and informative error messages
  • Plugin Architecture: CMake-based plugin system with conditional TensorRT linking
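
To make the capability-based precision selection concrete, here is a minimal sketch built on the CUDA runtime API. The compute-capability thresholds and the function name are illustrative assumptions, not the plugin's actual logic.

#include <cuda_runtime_api.h>
#include <stdexcept>
#include <string>

// Illustrative only: map the device's compute capability to a default
// precision string (the SM thresholds here are assumptions).
std::string pick_default_precision(int device = 0) {
  cudaDeviceProp prop{};
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
    throw std::runtime_error("Failed to query CUDA device properties");
  int sm = prop.major * 10 + prop.minor;
  if (sm >= 89) return "fp8";  // e.g. Ada/Hopper and newer
  if (sm >= 80) return "bf16"; // e.g. Ampere and newer
  return "fp16";
}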

Files Added/Modified

  • libs/qec/include/cudaq/qec/trt_decoder_internal.h - Internal API declarations
  • libs/qec/lib/decoders/plugins/trt_decoder/trt_decoder.cpp - Main decoder implementation
  • libs/qec/lib/decoders/plugins/trt_decoder/CMakeLists.txt - Plugin build configuration
  • libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py - Python utility
  • libs/qec/unittests/test_trt_decoder.cpp - Comprehensive unit tests
  • Updated CMakeLists.txt files for integration

Testing

  • ✅ All 8 unit tests passing
  • Parameter validation tests (a hypothetical example follows this list)
  • File loading utility tests
  • Edge case handling tests
  • Error condition testing
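
For a sense of what the parameter validation tests could look like, here is a hypothetical GoogleTest case. The test name, the type and shape of H, and the expectation that construction throws are assumptions; the grounded parts are the trt_decoder(H, params) constructor and the parameter keys from the usage example below.

#include <gtest/gtest.h>

// Hypothetical test: constructing the decoder without a model source should fail.
TEST(TrtDecoderValidation, RejectsMissingModelPath) {
  cudaqx::tensor<uint8_t> H({3, 7}); // toy parity-check matrix (shape ctor assumed)
  cudaqx::heterogeneous_map params;  // neither onnx_load_path nor engine_load_path
  EXPECT_ANY_THROW(trt_decoder(H, params));
}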

Usage Example

// Load from ONNX model
cudaqx::heterogeneous_map params;
params.insert("onnx_load_path", "model.onnx");
params.insert("precision", "fp16");
auto decoder = std::make_unique<trt_decoder>(H, params);

// Or load pre-built engine
params.clear();
params.insert("engine_load_path", "model.trt");
auto decoder = std::make_unique<trt_decoder>(H, params);
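
Continuing the example, the decoder can then be queried like any other decoder in the framework. The snippet below is a sketch: it assumes the standard decoder interface in which decode() takes a floating-point syndrome vector and returns a result carrying a convergence flag and soft error estimates, and num_detectors stands in for the syndrome length implied by H.

// Run inference on a syndrome (interface details assumed as noted above).
std::vector<double> syndrome(num_detectors, 0.0); // num_detectors = rows of H
syndrome[3] = 1.0;                                // mark one fired detector
auto result = decoder->decode(syndrome);
if (result.converged) {
  // result.result holds the decoder's per-bit error estimates.
}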

Dependencies

  • TensorRT 10.13.3.9+
  • CUDA 12.0+
  • NVIDIA GPU with appropriate compute capability

Performance Benefits

  • GPU-accelerated inference for QEC decoding
  • Optimized precision selection based on hardware capabilities
  • Efficient memory usage with CUDA streams
  • Reduced latency compared to CPU-based decoders

This implementation provides a production-ready TensorRT decoder plugin that can significantly accelerate quantum error correction workflows while maintaining compatibility with the existing CUDA-Q QEC framework.

copy-pr-bot bot commented Sep 29, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

- Add trt_decoder class implementing TensorRT-accelerated inference
- Support both ONNX model loading and pre-built engine loading
- Include precision configuration (fp16, bf16, int8, fp8, tf32, best)
- Add hardware platform detection for capability-based precision selection
- Implement CUDA memory management and stream-based execution
- Add Python utility script for ONNX to TensorRT engine conversion
- Update CMakeLists.txt to build TensorRT decoder plugin
- Add comprehensive parameter validation and error handling
Signed-off-by: Scott Thornton <[email protected]>
Comment on libs/qec/python/cudaq_qec/plugins/tensorrt_utils/build_engine_from_onnx.py:

import tensorrt as trt


def build_engine(onnx_file,
Collaborator

Is this file exposed as part of the wheel such that regular users will be able to use this file?

Collaborator Author

I think I would rather this file be in the docs as an example of how to convert an ONNX file to a TRT engine.

Collaborator

> I think I would rather this file be in the docs as an example of how to convert an ONNX file to a TRT engine.

Is there a reason we can't just expose this utility function in our wheel? Assuming it's possible, if we are going to reference it in our docs, it would be better to just use it from the wheel rather than ask users to copy/paste code.

I, Scott Thornton <[email protected]>, hereby add my Signed-off-by to this commit: 9e97e26

Signed-off-by: Scott Thornton <[email protected]>
@wsttiger (Collaborator Author)

/ok to test fb16b36

copy-pr-bot bot commented Oct 16, 2025

> /ok to test fb16b36

@wsttiger, there was an error processing your request: E2

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/2/

@wsttiger (Collaborator Author)

/ok to test c9e563f

@wsttiger (Collaborator Author)

/ok to test 42c2b32

@wsttiger (Collaborator Author)

/ok to test 5ad505b

Signed-off-by: Scott Thornton <[email protected]>
@wsttiger (Collaborator Author)

/ok to test 2d08b88

Signed-off-by: Scott Thornton <[email protected]>
@wsttiger (Collaborator Author)

/ok to test 4defcfd

@wsttiger (Collaborator Author)

/ok to test 62cdbac

@wsttiger (Collaborator Author)

/ok to test d4e79a9

@wsttiger (Collaborator Author)

/ok to test d8489f7

@wsttiger (Collaborator Author)

/ok to test 6ba9191

@wsttiger (Collaborator Author)

/ok to test eea3198

@wsttiger (Collaborator Author)

/ok to test 33359f5

Comment on lines +72 to +76
wget https://developer.download.nvidia.com/compute/tensorrt/10.13.3/local_installers/nv-tensorrt-local-repo-ubuntu2404-10.13.3-cuda-12.9_1.0-1_amd64.deb
dpkg -i nv-tensorrt-local-repo-ubuntu2404-10.13.3-cuda-12.9_1.0-1_amd64.deb
cp /var/nv-tensorrt-local-repo-ubuntu2404-10.13.3-cuda-12.9/nv-tensorrt-local-4B177B4F-keyring.gpg /usr/share/keyrings/
apt update
apt install -y tensorrt-dev
Collaborator

IIRC, as written right now, this installs a mix of CUDA 13 and CUDA 12 stuff into the dev image. This might work better.

# Generate a pin preferences file to specify the desired CUDA version for tensorrt.
# The "cache search" will make it propagate to all of tensorrt's depdendencies.
apt-cache search tensorrt | awk '{print "Package: "$1"\nPin: version *+cuda12.9\nPin-Priority: 1001\n"}' | tee /etc/apt/preferences.d/tensorrt-cuda12.9.pref > /dev/null
apt update
apt install tensorrt tensorrt-dev

For CUDA 13, it would need to be updated to cuda13.0 instead of cuda12.9.

Collaborator

In actuality, installing both the tensorrt package and the tensorrt-dev package is a bit redundant. The tensorrt-dev package is the only thing our dev image really needs, and I suspect the tensorrt-lib package (which is much smaller) is the only thing needed for our released Docker image. (And hopefully pre-existing Python packages cover the Python environment.)

In other words, I think we can simply do this:

apt-cache search tensorrt | awk '{print "Package: "$1"\nPin: version *+cuda12.9\nPin-Priority: 1001\n"}' | tee /etc/apt/preferences.d/tensorrt-cuda12.9.pref > /dev/null
apt update
apt install tensorrt-dev

IS_ARM = _is_arm_architecture()

# Test inputs - 100 test cases with 24 detectors each
TEST_INPUTS = [[
Collaborator

These test cases span 500 lines, and it looks like many of them are redundant and/or contain only a single non-zero syndrome bit. If those are the intended test vectors, would it be possible to collapse them into something like "initialize with all 0's, and then just set the 1's where you want them"?
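
A sketch of the collapse being suggested, written in C++ for consistency with the other examples in this thread (the detector indices are hypothetical; only the 24-detector width comes from the excerpt above):

#include <cstdint>
#include <vector>

// Build sparse test syndromes: start from all zeros, then set the 1's.
std::vector<std::vector<std::uint8_t>> make_test_inputs() {
  std::vector<std::vector<std::uint8_t>> inputs;
  for (int fired : {0, 5, 17, 23}) {           // hypothetical detector indices
    std::vector<std::uint8_t> syndrome(24, 0); // 24 detectors, all zero
    syndrome[fired] = 1;                       // single non-zero entry
    inputs.push_back(std::move(syndrome));
  }
  return inputs;
}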

add_dependencies(CUDAQXQECUnitTests test_qec)
gtest_discover_tests(test_qec)

# TensorRT decoder test is only built for x86 architectures
Collaborator

Now that we support CUDA 13, we should be able to do x86 and ARM for that CUDA version.

@wsttiger (Collaborator Author)

/ok to test 5e38f6a
