[ONNX] Fix nncf.errors.ValidationError: There is no tensor with the name (#3988)
Conversation
```python
try:
    weight_tensor = get_tensor(model, weight_name)
    data_type = weight_tensor.data_type
except nncf.ValidationError:
    data_type = node_with_weight.layer_attributes.weight_attrs[weight_port_id]["dtype"]
```
Can we just obtain the data type through `node_with_weight.layer_attributes.weight_attrs[weight_port_id]["dtype"]`, without the try-except?
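A minimal sketch of this suggestion, using mock stand-ins for the real NNCF node and attribute types (the mock classes and the `get_weight_dtype` helper here are illustrative, not NNCF API):

```python
# Sketch of reading the dtype straight from the graph's layer attributes,
# with no initializer lookup and no try-except fallback. MockLayerAttributes
# and MockNode are hypothetical stand-ins for the real NNCF classes.
from dataclasses import dataclass, field


@dataclass
class MockLayerAttributes:
    weight_attrs: dict = field(default_factory=dict)


@dataclass
class MockNode:
    layer_attributes: MockLayerAttributes


def get_weight_dtype(node_with_weight, weight_port_id):
    # The dtype was recorded at graph-building time, so no tensor lookup is needed.
    return node_with_weight.layer_attributes.weight_attrs[weight_port_id]["dtype"]


node = MockNode(MockLayerAttributes({0: {"name": "w", "dtype": 1}}))  # 1 == TensorProto.FLOAT
print(get_weight_dtype(node, 0))  # -> 1
```

This avoids the exception-driven control flow entirely, at the cost of trusting that `weight_attrs` was populated correctly during graph building.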
What is the reason that the tensor with `weight_name` does not exist in the model?
Is the graph built incorrectly, or was the model updated?
Let me explain the problem. We have the following subgraph in the model:
```mermaid
flowchart TD
    A(Weights) --> B(Transpose)
    A --> C(MatMul)
    B --> D(Gather)
    I(input_ids) --> D
    D --> S(...)
    S --> C
```
As we can see, the Gather and MatMul operations share the same weight, and there is a Transpose applied to the weight tensor before the Gather operation.
We have not encountered such a subgraph before. Previously, we have only seen models with shared weights where the weight was consumed directly by the operations that use it, with no intermediate operations between the weight tensor and those operations. So we assumed that no operations are applied to the weight tensor before it is consumed. As a result, we cannot retrieve the weight tensor for the Gather operation, because the recorded tensor name is incorrect: it refers to the Transpose output rather than the stored constant.
Of course, this should be fixed during NNCF graph building, and we should somehow handle such cases. However, it is currently not clear how to do this properly. For example, there is still a question about what the weight shape should be: is it the shape of the static tensor (as it is stored), or the shape expected by the operation at its weight port?
So this PR is not intended to resolve the issue and should be considered a workaround to unblock the user. With this fix, NNCF no longer crashes, and we are able to compress the model.
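The failing lookup described above can be reproduced with a minimal stand-in (plain Python, not real NNCF/ONNX code): the Gather node's weight edge is the Transpose *output*, so looking that name up among the model's initializers fails. The edge names `Weights` and `Transpose_out` are illustrative assumptions.

```python
# Stored constant tensors of the model (only "Weights" is an initializer).
initializers = {"Weights": [[1.0, 2.0], [3.0, 4.0]]}

# Edge names as the graph builder records them (assumed layout):
matmul_weight_edge = "Weights"        # MatMul consumes the initializer directly
gather_weight_edge = "Transpose_out"  # Gather sees the Transpose output instead


def get_tensor(name):
    # Simplified analogue of the lookup that raises nncf.ValidationError.
    if name not in initializers:
        raise ValueError(f"There is no tensor with the name {name}")
    return initializers[name]


print(get_tensor(matmul_weight_edge))  # works: the initializer is found
try:
    get_tensor(gather_weight_edge)
except ValueError as e:
    print(e)  # fails: There is no tensor with the name Transpose_out
```

This is exactly the crash the try-except workaround papers over: the MatMul branch resolves, the Gather branch does not.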
@anzr299 Thanks for the questions. Yes, I think we can do that.
Force-pushed from 57dec34 to 129a81b.
```python
weight_name = node_with_weight.layer_attributes.weight_attrs[weight_port_id]["name"]
weight_tensor = get_tensor(model, weight_name)
return ONNX_DTYPE_TO_NNCF_DTYPE[weight_tensor.data_type]
data_type = node_with_weight.layer_attributes.weight_attrs[weight_port_id]["dtype"]
```
Looks like the problem is that we use `"name"` from `weight_attrs`, which is defined as `weight_edge_name` in the graph builder. That means it is the previous node in the constant subgraph, and in this case it is not a constant node.
If the weight subgraph contains any node like Transpose/Reshape/DQ, using `layer_attributes` can produce incorrect data.
In the graph builder, the `get_tensor_edge_name` function is used to get the weight tensor; it searches for constant data in the subgraph. Maybe we need to use this function or a similar one.
I'm not sure that it works as expected.
Also, please add tests for this case.
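A hedged sketch of this idea: instead of trusting the recorded edge name, walk back through pass-through ops until an actual initializer is reached. The `producers` mapping, the pass-through op list, and `find_constant_name` are illustrative assumptions, not the real `get_tensor_edge_name` implementation.

```python
# Ops assumed (for illustration) to sit between a constant and its consumer
# without introducing a new data source.
PASSTHROUGH_OPS = {"Transpose", "Reshape", "DequantizeLinear"}


def find_constant_name(edge_name, initializers, producers):
    """Walk back from an edge name to the initializer that feeds it.

    `initializers` is the set of constant tensor names; `producers` maps an
    output edge name to (op_type, input_edge_name) of the node producing it.
    """
    while edge_name not in initializers:
        op_type, input_name = producers.get(edge_name, (None, None))
        if op_type not in PASSTHROUGH_OPS:
            return None  # no constant found behind this edge
        edge_name = input_name
    return edge_name


initializers = {"Weights"}
producers = {"Transpose_out": ("Transpose", "Weights")}
print(find_constant_name("Transpose_out", initializers, producers))  # -> Weights
print(find_constant_name("Weights", initializers, producers))        # -> Weights
```

With such a traversal, the Gather branch from the example subgraph would resolve to the same `Weights` initializer as the MatMul branch, though the shape question (stored shape vs. the shape expected at the weight port) would still need to be answered separately.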
Current changes are covered by the tests/onnx/test_layer_attributes.py::test_layer_attributes test. All other comments were discussed offline.
Changes
Added a fix to prevent crashes caused by intermediate operations between the weight tensor and the operations that consume it.
Reason for changes
Related tickets
Tests