Segmentation fault (core dumped) with TensorRT 10.3 when running execute_async_v3 on an H20 GPU #4395
Comments
Thanks for your reply. Following the code you provided, I updated my code as follows:

import os
from collections import OrderedDict  # keep the order of the tensors implicitly
from pathlib import Path

import numpy as np
import tensorrt as trt
from cuda import cudart

# yapf: disable

trt_file = Path("model.plan")
audio = np.arange(1 * 256 * 128, dtype=np.float32).reshape(1, 256, 128)  # inference input data
trunc_start = np.array([2], dtype=np.int64)
offset = np.array([126], dtype=np.int64)
att_cache = np.arange(32 * 20 * 128 * 128, dtype=np.float32).reshape(32, 20, 128, 128)
frame_cache = np.arange(4 * 128, dtype=np.float32).reshape(1, 4, 128)

def run():
    # create Logger, available levels: VERBOSE, INFO, WARNING, ERROR, INTERNAL_ERROR
    logger = trt.Logger(trt.Logger.ERROR)

    if trt_file.exists():  # load the engine from file and skip the building process if it exists
        with open(trt_file, "rb") as f:
            engine_bytes = f.read()
        if engine_bytes is None:
            print("Fail getting serialized engine")
            return
        print("Succeed getting serialized engine")
    else:  # build a serialized network from scratch
        builder = trt.Builder(logger)                    # create Builder
        config = builder.create_builder_config()         # create BuilderConfig to set attributes of the network
        network = builder.create_network()               # create Network
        profile = builder.create_optimization_profile()  # create OptimizationProfile for Dynamic-Shape mode
        # set workspace for the building process (all GPU memory is used by default)
        config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)

        # set the input tensor of the network
        # (the original snippet referenced an undefined `input_tensor_name`; a literal placeholder is used here)
        input_tensor = network.add_input("inputT0", trt.float32, [-1, -1, -1])
        # set the dynamic shape range of the input tensor
        profile.set_shape(input_tensor.name, [1, 1, 1], [3, 4, 5], [6, 8, 10])
        # add the OptimizationProfile to the BuilderConfig
        config.add_optimization_profile(profile)

        # this simple network contains only an Identity layer, whose output is exactly equal to its input
        identity_layer = network.add_identity(input_tensor)
        network.mark_output(identity_layer.get_output(0))  # mark the tensor as output

        # create a serialized network from the network
        engine_bytes = builder.build_serialized_network(network, config)
        if engine_bytes is None:
            print("Fail building engine")
            return
        print("Succeed building engine")

        # save the serialized network as a binary file
        with open(trt_file, "wb") as f:
            f.write(engine_bytes)
            print(f"Succeed saving engine ({trt_file})")

    # create the inference engine
    engine = trt.Runtime(logger).deserialize_cuda_engine(engine_bytes)
    if engine is None:
        print("Fail getting engine for inference")
        return
    print("Succeed getting engine for inference")

    # create an Execution Context from the engine (analogous to a GPU context, or a CPU process)
    context = engine.create_execution_context()
    tensor_name_list = [engine.get_tensor_name(i) for i in range(engine.num_io_tensors)]

    # set the runtime shapes of the input tensors (Dynamic-Shape mode)
    context.set_input_shape("audio", audio.shape)
    context.set_input_shape("trunc_start", trunc_start.shape)
    context.set_input_shape("offset", offset.shape)
    context.set_input_shape("att_cache", att_cache.shape)
    context.set_input_shape("frame_cache", frame_cache.shape)

    # print information about the input / output tensors
    for name in tensor_name_list:
        mode = engine.get_tensor_mode(name)
        data_type = engine.get_tensor_dtype(name)
        buildtime_shape = engine.get_tensor_shape(name)
        runtime_shape = context.get_tensor_shape(name)
        print(f"{'Input ' if mode == trt.TensorIOMode.INPUT else 'Output'}->{data_type}, {buildtime_shape}, {runtime_shape}, {name}")

    # prepare the memory buffers on host and device
    buffer = OrderedDict()
    for name in tensor_name_list:
        data_type = engine.get_tensor_dtype(name)
        runtime_shape = context.get_tensor_shape(name)
        n_byte = trt.volume(runtime_shape) * np.dtype(trt.nptype(data_type)).itemsize
        host_buffer = np.empty(runtime_shape, dtype=trt.nptype(data_type))
        device_buffer = cudart.cudaMalloc(n_byte)[1]
        buffer[name] = [host_buffer, device_buffer, n_byte]

    import pdb; pdb.set_trace()  # debugging breakpoint left in for inspection

    # set the runtime data, MUST use np.ascontiguousarray, it is a SERIOUS lesson
    buffer["audio"][0] = np.ascontiguousarray(audio)
    buffer["trunc_start"][0] = np.ascontiguousarray(trunc_start)
    buffer["offset"][0] = np.ascontiguousarray(offset)
    buffer["att_cache"][0] = np.ascontiguousarray(att_cache)
    buffer["frame_cache"][0] = np.ascontiguousarray(frame_cache)

    # bind the address of each device buffer to the context
    for name in tensor_name_list:
        context.set_tensor_address(name, buffer[name][1])

    # copy input data from host to device
    for name in tensor_name_list:
        if engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT:
            cudart.cudaMemcpy(buffer[name][1], buffer[name][0].ctypes.data,
                              buffer[name][2], cudart.cudaMemcpyKind.cudaMemcpyHostToDevice)

    # do the inference computation on the default CUDA stream
    context.execute_async_v3(0)

    # copy output data from device to host
    for name in tensor_name_list:
        if engine.get_tensor_mode(name) == trt.TensorIOMode.OUTPUT:
            cudart.cudaMemcpy(buffer[name][0].ctypes.data, buffer[name][1],
                              buffer[name][2], cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost)

    for name in tensor_name_list:
        print(name)
        print(buffer[name][0])

    # free the GPU memory buffers after all work
    for _, device_buffer, _ in buffer.values():
        cudart.cudaFree(device_buffer)

if __name__ == "__main__":
    os.system("rm -rf *.trt")
    run()  # build a TensorRT engine and do inference
    run()  # load the TensorRT engine and do inference
    print("Finish")

but the …
Use a common ONNX model such as resnet50.onnx, then build a plan and run your script against it, to check whether it passes or not.
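For reference, a minimal sketch of building such a plan from an ONNX file with the TensorRT Python API (the file names here are assumptions, chosen to match the script above) might look like:

import tensorrt as trt

logger = trt.Logger(trt.Logger.ERROR)
builder = trt.Builder(logger)
network = builder.create_network()
parser = trt.OnnxParser(network, logger)
with open("resnet50.onnx", "rb") as f:  # assumed input file name
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise SystemExit("Failed to parse the ONNX model")
config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
engine_bytes = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:  # plan file consumed by the script above
    f.write(engine_bytes)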
I tried writing a very simple 2-layer LSTM model and converted it to a TRT engine in the same way, and it worked fine. Of course, I could also try ResNet50.
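The exact LSTM model is not shown in the thread; a rough sketch of what such a 2-layer LSTM exported to ONNX might look like (the class, file name, and shapes are assumptions) is:

import torch
import torch.nn as nn

class TinyLSTM(nn.Module):  # hypothetical 2-layer LSTM test model
    def __init__(self, in_dim=128, hidden=128):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, num_layers=2, batch_first=True)

    def forward(self, x):
        y, _ = self.lstm(x)
        return y

model = TinyLSTM().eval()
dummy = torch.randn(1, 256, 128)  # (batch, time, feature), assumed shape
torch.onnx.export(model, dummy, "lstm.onnx",
                  input_names=["audio"], output_names=["output"],
                  dynamic_axes={"audio": {1: "time"}})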
Description
I used the following commands to convert an ONNX model to a TRT engine, where the input.onnx file is the original model:
Then I tried to perform inference with TensorRT, but encountered a “Segmentation fault (core dumped)” error. My model information and code are below:
TRT Engine
My Code
Environment
TensorRT Version: 10.3
NVIDIA GPU: H20
NVIDIA Driver Version: 535.161.08
CUDA Version: 12.4
CUDNN Version: 8.9
Operating System:
Python Version (if applicable): 3.10.4
Tensorflow Version (if applicable):
PyTorch Version (if applicable): 2.3.0
Baremetal or Container (if so, version):
Relevant Files
Model link:
Steps To Reproduce
Commands or scripts:
Have you tried the latest release?:
Can this model run on other frameworks? For example run ONNX model with ONNXRuntime (polygraphy run <model.onnx> --onnxrt):
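For completeness, a quick way to sanity-check the ONNX model outside TensorRT is to run it with ONNXRuntime directly; a sketch is below (the input names and shapes mirror the engine's I/O from the script above and are assumptions):

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("input.onnx", providers=["CPUExecutionProvider"])
feeds = {
    "audio": np.zeros((1, 256, 128), dtype=np.float32),
    "trunc_start": np.array([2], dtype=np.int64),
    "offset": np.array([126], dtype=np.int64),
    "att_cache": np.zeros((32, 20, 128, 128), dtype=np.float32),
    "frame_cache": np.zeros((1, 4, 128), dtype=np.float32),
}
outputs = sess.run(None, feeds)
print([o.shape for o in outputs])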