BF16 inaccuracy of TensorRT 10.14.1 at Slice nodes #4718

@dangtruong-shopee

Description

Environment

TensorRT Version: 10.14.1

NVIDIA GPU: A30

NVIDIA Driver Version: 580.126.09

CUDA Version: 13.0

CUDNN Version: 9.8.0

Operating System:

Python Version (if applicable): 3.12.3

Tensorflow Version (if applicable):

PyTorch Version (if applicable):

Baremetal or Container (if so, version):

Relevant Files

Google Drive link: https://drive.google.com/drive/u/0/folders/1GbuWrsyY5gzFnzgd6ZbcPg6PTHXZloXt

Steps To Reproduce

Run the following command to build and run the full graph with TensorRT in bf16, with every tensor marked as an output. The resulting file, 10_14_all_outputs.json, is already included in the Google Drive folder for reference.
polygraphy run graph_final_baked.onnx --trt --bf16 --trt-outputs mark all --load-inputs layerwise_inputs.json --save-outputs 10_14_all_outputs.json

In 10_14_all_outputs.json, check the tensors llm_lt_click_reqcate1_items_input/slice_tile_num:0 and llm_lt_click_pcate_items_input/slice_tile_num:0. Both are reported as 0.

Taken from 10_14_all_outputs.json for your reference:

llm_lt_click_reqcate1_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0, p90=0, p95=0, p99=0
llm_lt_click_pcate_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=0, std-dev=0, var=0, median=0, min=0 at (0, 0), max=0 at (0, 0), avg-magnitude=0, p90=0, p95=0, p99=0

The correct values of these tensors are not 0. I have extracted the relevant nodes into a standalone subgraph, slice_subgraph.onnx. Strangely, the issue does not occur when this subgraph is run in isolation. Run the following command to verify:
polygraphy run slice_subgraph.onnx --trt --bf16 --load-inputs layerwise_inputs.json --save-outputs slice_trt_outputs.json 2>/dev/null && polygraphy inspect data slice_trt_outputs.json --show-values

Outputs of running slice_subgraph.onnx:

llm_lt_click_pcate_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=23, std-dev=0, var=0, median=23, min=23 at (0, 0), max=23 at (0, 0), avg-magnitude=23, p90=23, p95=23, p99=23
    [[23.]]
llm_lt_click_reqcate1_items_input/slice_tile_num:0 [dtype=float32, shape=(1, 1)] | Stats: mean=34, std-dev=0, var=0, median=34, min=34 at (0, 0), max=34 at (0, 0), avg-magnitude=34, p90=34, p95=34, p99=34
    [[34.]]
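
For anyone who wants to script the comparison between the two runs, here is a minimal sketch in plain Python. It assumes the outputs of each run have already been loaded into dictionaries that map tensor names to values (for example from the two saved JSON files via Polygraphy's Python API, which is not shown here). The helper names `to_flat_list` and `find_mismatches` and the tolerance `atol` are hypothetical and exist only for illustration. The sample values are the ones reported above.

```python
def to_flat_list(value):
    """Flatten a scalar, nested list, or array-like (anything with .tolist()) into floats."""
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, (list, tuple)):
        flat = []
        for item in value:
            flat.extend(to_flat_list(item))
        return flat
    return [float(value)]


def find_mismatches(full_graph, subgraph, atol=1e-3):
    """Return {name: (full_graph_values, subgraph_values)} for tensors that differ.

    Only tensors present in both dictionaries are compared. `atol` is an
    arbitrary illustrative tolerance, not a recommended bf16 threshold.
    """
    mismatches = {}
    for name in sorted(set(full_graph) & set(subgraph)):
        a = to_flat_list(full_graph[name])
        b = to_flat_list(subgraph[name])
        if len(a) != len(b) or any(abs(x - y) > atol for x, y in zip(a, b)):
            mismatches[name] = (a, b)
    return mismatches


# Values copied from the two runs described in this issue.
full_graph = {
    "llm_lt_click_reqcate1_items_input/slice_tile_num:0": [[0.0]],
    "llm_lt_click_pcate_items_input/slice_tile_num:0": [[0.0]],
}
subgraph = {
    "llm_lt_click_reqcate1_items_input/slice_tile_num:0": [[34.0]],
    "llm_lt_click_pcate_items_input/slice_tile_num:0": [[23.0]],
}

mismatches = find_mismatches(full_graph, subgraph)
for name, (got, expected) in mismatches.items():
    print(f"{name}: full graph={got}, subgraph={expected}")
```

With the values above, both slice_tile_num tensors are flagged, since the full-graph run reports 0 while the isolated subgraph reports 34 and 23.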

Please help me check this issue. It is affecting the accuracy of our bf16 engine and we would like to find a solution for it. Thank you in advance.

Metadata

Labels

Module:Accuracy (Output mismatch between TensorRT and other frameworks)