Conversation
> - If the model is provided in the PEtab SciML :ref:`NN model YAML format <NN_YAML>`,
>   datasets must follow the PyTorch dimension ordering. For example, if the first
>   layer is ``Conv2d``, the input should be in ``(C, H, W)`` format.
> - For NN models in other framework-specific formats, input datasets must follow
>   the dimension ordering of the respective framework.
Currently, we only support `perm := row` and `perm := column` in the array file, so we could use that language here to be specific that we only support the two most popular options. I am not aware of another option, but I guess other options possibly exist...
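As a sketch of what `perm := row` vs `perm := column` would mean — assuming they correspond to C-order (row-major) and Fortran-order (column-major) flattening, which is my reading of the array-file spec, not something stated in this thread:

```python
import numpy as np

# A 2x3 matrix whose flattened representation depends on the chosen order.
a = np.array([[1, 2, 3],
              [4, 5, 6]])

row_major = a.flatten(order="C").tolist()     # perm := row  -> C order
column_major = a.flatten(order="F").tolist()  # perm := column -> Fortran order
```

The two flattenings interleave the same values differently, which is exactly why the array file needs to record which one was used.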
The tricky thing is that permutation is not everything. For example, in Julia an RGB image is encoded as `zeros(10, 10, 3)` (note that the depth is encoded last), while in PyTorch we have `torch.zeros(3, 10, 10)`. So the indexing convention also differs, which I try to clarify here.
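To make the point concrete, here is the channels-last vs channels-first difference sketched with NumPy standing in for both frameworks (the image sizes are arbitrary):

```python
import numpy as np

# Julia-style RGB image: depth (channels) encoded last, as in zeros(10, 10, 3).
julia_image = np.zeros((10, 10, 3))
julia_image[0, 0, :] = [1, 2, 3]  # RGB value of the top-left pixel

# PyTorch-style: channels first, as in torch.zeros(3, 10, 10).
# A permutation of the axes converts between the two conventions.
pytorch_image = np.transpose(julia_image, (2, 0, 1))
```

The data is identical; only the axis order differs — which is why knowing the permutation of the stored bytes alone is not enough without also knowing which axis means what.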
Ah right... then this is a blind spot for the format and could be a warning box etc.?
It is not a blind spot, right? As long as you know the ML model format, the rest can be inferred.
Hm, the issue I see is that we chose to solve an equivalent issue by providing `perm` in array files, but then chose not to solve this other equivalent issue (we chose not to provide dimension ordering). If `perm` removes a blind spot, then there is a blind spot for dimension ordering, right?
This is a fair point, providing dimension ordering would be more consistent. Maybe more suitable for a separate issue (definitely worth a discussion)?
I'm fine with keeping it flexible/framework-specific and requiring frameworks to specify that in their own documentation -- just that we can make a note/warning of this issue maybe here ( https://petab-sciml--62.org.readthedocs.build/62/format.html#dealing-with-arrays ) and reference it whenever it comes up (array file, mapping table).
Also fine with providing dimension ordering if someone wants to figure that out 😅
> Also fine with providing dimension ordering if someone wants to figure that out 😅
I definitely do not want to figure this out 😅, I think a warning in mentioned section sounds the best!
> For a NN model with ID ``nnId``, an input reference has the form
> ``nnId.inputs[<inputArgumentIndex>][<inputIndex>]``:
We expect input IDs in the array file, but `inputArgumentIndex` here. The input IDs in the array file are not PEtab IDs, right? This might be a source of confusion.
Actually, regardless of the input data, `inputArgumentIndex` is required, as even with array input we can have multiple input arguments to the forward function. In the case of an array file, the index is omitted.
Why is `inputArgumentIndex` required? The NN ID + input ID combination specifies a unique input, no?
Just to clarify, I'm fine with requiring the index here if it makes life easier for devs. In the array files, input ID is sufficient. Here, NN ID + input IDs would be sufficient. Instead, we currently have NN ID + input index, which is fine but then a bit inconsistent?
We do not have NN ID + input ID anywhere else?
In any case, index->ID mapping in mapping tables should already be in the test suite, so I don't think additional tests would be required for this smaller change?
Actually not in the test suite currently.
Fine for me to leave that out and only support the index notation. I think this solution should mean that NN model YAML input IDs are also mapped via index in the mapping table before they can be used in the array file, for consistency (for the reason above). I think this isn't a big issue, since I guess users will use tools to generate the mapping file from the NN model YAML anyway.
I think always supporting the indexing notation would be the easiest way to stay consistent. As for YAML input ids, it makes sense to allow them to be valid ids, but if a user wants to enter elements (e.g., `inputId[0]`), this still would need to be done in the mapping table, while the id itself would be possible to directly assign in the hybridization table for mapping to the array file.
> As for YAML input ids, it makes sense to allow them to be valid ids
👍
> if a user wants to enter elements (e.g., `inputId[0]`), this still would need to be done in the mapping table
Agreed, indexing should only be possible in the mapping table, as is the case for parameters.
> the id itself would be possible to directly assign in the hybridization table for mapping to array file
👍
I think we agree and I am flexible about the outcome. Here's one option, what do you think?
- input IDs in array files must be valid PEtab IDs
- NN model YAML input IDs are valid PEtab IDs without needing the mapping table
- currently not allowed (might be allowed later):
  `nnId.inputs[<inputId>][<inputIndex>]` and `<inputId>[<inputIndex>]`
> currently not allowed to reduce test suite: `nnId.inputs[<inputId>][<inputIndex>]` and `<inputId>[<inputIndex>]`
Not sure I understand this, otherwise I agree!
Ok, now everything looks good to me :)
I will update the PR with everything discussed!
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
> tensor created as ``zeros(3, 3, 2)``. To enable exchange, we therefore
> recommend that importers provide a utility to export NN parameters to the
> PEtab SciML array format (PyTorch conventions) and document the dimension
> ordering used when exporting arrays.
I think so far we haven't assumed that the array format follows PyTorch conventions, right? Hence the `perm` parameter in the array file. If the arrays in the array file follow PyTorch dimension ordering, why not also assume that matrices are stored in row-major format like PyTorch?
We could support storing the dimension ordering for every array in the array file. A simpler alternative: add a global variable to the array file like `pytorch_dimension_ordering` [bool] -- when it's true we can port between frameworks easily; when it's false, we can't...
Then the message here would be more like: "For portability, store arrays in array files according to PyTorch dimension ordering, and set the `pytorch_dimension_ordering` flag to `True`. Otherwise, it's difficult to use the stored arrays in different frameworks."
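A minimal sketch of what such an array file could look like with the proposed flag — all key names here are hypothetical and not part of the spec, and the actual container format (HDF5 etc.) is left aside:

```python
import numpy as np

# Hypothetical in-memory view of an array file carrying the proposed
# global pytorch_dimension_ordering flag alongside the stored arrays.
array_file = {
    "pytorch_dimension_ordering": True,
    "arrays": {
        # Channels-first (C, H, W), i.e. PyTorch ordering, since the flag is set.
        "input0": np.zeros((3, 10, 10)),
    },
}

def can_port_directly(f):
    """An importer can only reinterpret axes safely when the flag is set."""
    return bool(f["pytorch_dimension_ordering"])
```

When the flag is false, an importer would have to fall back to framework-specific documentation to recover the axis semantics.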
I like this idea quite a lot, and think it would definitely clarify things!
Not sure it fits this PR though, how about a separate PR (would you have time to draft it?)
Yes, makes sense. I'll add `pytorch_dimension_ordering` to the package/spec/docs after this PR.
It has been a long time since the spec was first written, and changes have been made since. With this PR I aim to remove old, no-longer-relevant parts of the spec, as well as improve overall readability.