Before submitting an issue, please make sure it hasn't been already addressed by searching through the existing and past issues.
Describe the bug
- I used TensorRT 8.6.1 to convert an existing ONNX model with Q/DQ nodes (exported with the Model Optimizer Toolkit) into an engine to run on an Orin-equipped platform. However, the inference results of the engine were not as good as expected: they showed a large discrepancy compared with the original Q/DQ ONNX model. For convenience, I also tried TRT 8.6.1 and TRT 10.11 on my local x86-64 workstation to convert the ONNX model into an engine, to verify whether I could reproduce the same phenomenon. Unfortunately, yes. I calculated the cosine similarity between the outputs of the Q/DQ ONNX model and the engines generated with TRT 8.6 and TRT 10.11, respectively.
As depicted above, we can indeed see a large discrepancy.
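For reference, this is roughly how I compute the cosine similarity between the two outputs (a minimal sketch; `cosine_similarity` is my own helper, applied to the flattened output tensors of the ONNX model and the engine):

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Flatten both output tensors and compute their cosine similarity."""
    a = a.reshape(-1).astype(np.float64)
    b = b.reshape(-1).astype(np.float64)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sanity check: identical outputs give similarity ~1.0
x = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(x, x))
```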
Steps/Code to reproduce bug
- convert command:
trtexec --onnx=quant0206.onnx --saveEngine=quant_0206.engine --dumpProfile=true --best --verbose=true
Expected behavior
Who can help?
- I would really appreciate it if anyone could help solve this problem, or offer some instructive advice on how to approach it. My guess is that the discrepancy is introduced during the ONNX-to-TRT conversion, where precision is lost in some operators.
System information
- Container used (if applicable): nvcr.io/nvidia/tensorrt-llm/release:1.0.0
- OS (e.g., Ubuntu 22.04, CentOS 7, Windows 10): Ubuntu 20.04
- CPU architecture (x86_64, aarch64): x86_64
- GPU name (e.g. H100, A100, L40S): RTX 3060
- GPU memory size: 12 GB
- Number of GPUs: 1
- Library versions (if applicable):
- Python: 3.9
- ModelOpt version or commit hash: 0.40.0
- CUDA: 11.4 (nvcc release 11.4, V11.4.48, build cuda_11.4.r11.4/compiler.30033411_0)
- PyTorch: 2.0
- Transformers: not used
- TensorRT-LLM: not used
- ONNXRuntime: 1.19.2-gpu
- TensorRT: 8.6 and 10.11 (both tested)
- Any other details that may help: