Skip to content

Improve readability format.rst#62

Merged
sebapersson merged 5 commits intomainfrom
readabillity_format
Feb 20, 2026
Merged

Improve readability format.rst#62
sebapersson merged 5 commits intomainfrom
readabillity_format

Conversation

@sebapersson
Copy link
Copy Markdown
Collaborator

It has been a long time since the spec was first written, moreover, changes have been made. With this PR I aim to remove old non-relevant parts of the spec, as well as improve overall readability.

Copy link
Copy Markdown
Member

@dilpath dilpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

Comment thread doc/format.rst
Comment on lines +161 to +165
- If the model is provided in the PEtab SciML :ref:`NN model YAML format <NN_YAML>`,
datasets must follow the PyTorch dimension ordering. For example, if the first
layer is ``Conv2d``, the input should be in ``(C, W, H)`` format.
- For NN models in other framework-specific formats, input datasets must follow
the dimension ordering of the respective framework.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, we only support perm:=row and perm:=column in the array file, so we could use that langauge here to be specific that we only support the two most popular options. I am not aware of another option but I guess other options possibly exist...

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tricky thing is that permutation is not everything. For example in Julia a RGB image is encoded as zeros(10, 10, 3) (note that the depth is encoded last), while in PyTorch we have torch.zeros(3, 10, 10). So also indexing convention differs which I try to clarify here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right... then this is a blind spot for the format and could be a warning box etc.?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not a blind-spot right, as long as you know the ML model format, the rest can be inferred?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, the issue I see is that we choose to solve an equivalent issue by providing perm in array files, but then choose to not solve this other equivalent issue (we choose to not provide dimension ordering). If perm removes a blind spot, then there is a blind spot for dimension ordering, right?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a fair point, providing dimension ordering would be more consistent. Maybe more suitable for a separate issue (definitely worth a discussion)?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with keeping it flexible/framework-specific and requiring frameworks to specify that in their own documentation -- just that we can make a note/warning of this issue maybe here ( https://petab-sciml--62.org.readthedocs.build/62/format.html#dealing-with-arrays ) and reference it whenever it comes up (array file, mapping table).

Also fine with providing dimension ordering if someone wants to figure that out 😅

Copy link
Copy Markdown
Collaborator Author

@sebapersson sebapersson Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also fine with providing dimension ordering if someone wants to figure that out 😅

I definitely do not want to figure this out 😅, I think a warning in mentioned section sounds the best!

Comment thread doc/format.rst
Comment thread doc/format.rst
Comment on lines +238 to +239
For a NN model with ID ``nnId``, an input reference has the form
``nnId.inputs[<inputArgumentIndex>][<inputIndex>]``:
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We expect input IDs in the array file, but inputArgumentIndex here. The input IDs in the array file are not PEtab IDs, right? This might be a source of confusion.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, regardless of input data inputArgumentIndex is required, as even with array input we can have multiple input arguments to the forward function. In case of array file, the index is omitted.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is inputArgumentIndex required? NN ID + input ID combination specifies a unique input, or?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to clarify, I'm fine with requiring the index here if it makes life easier for devs. In the array files, input ID is sufficient. Here, NN ID + input IDs would be sufficient. Instead, we currently have NN ID + input index, which is fine but then a bit inconsistent?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not have NN ID + input ID anywhere else?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, index->ID mapping in mapping tables should already be in the test suite, so I don't think additional tests would be required for this smaller change?

Actually not in the test-suite currently.

Fine for me to leave that out and only support the index notation. I think this solution should mean that NN model YAML input IDs are also mapped via index in the mapping table, before they can be used in the array file, for consistency (for the reason above). I think this isn't a big issue since I guess users will use tools to generate the mapping file from the NN model YAML anyway. In any case, index->ID mapping in mapping tables should already be in the test suite, so I don't think ad

I think maybe always supporting the indexing notation would be the easiest to stay consistent. As for YAML input ids, it makes sense to allow them to be valid ids, but if a user want to enter elements (e.g., inputId[0]), this still would need to be done in the mapping table, while the id itself would be possible to directly assign in the hybridization table for mapping to array file.

Copy link
Copy Markdown
Member

@dilpath dilpath Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for YAML input ids, it makes sense to allow them to be valid ids

👍

if a user want to enter elements (e.g., inputId[0]), this still would need to be done in the mapping table

Agreed, indexing should only be possible in the mapping table, as is the case for parameters.

the id itself would be possible to directly assign in the hybridization table for mapping to array file

👍

I think we agree and I am flexible about the outcome. Here's one option, what do you think?

  • input IDs in array files must be valid PEtab IDs
  • NN model YAML input IDs are valid PEtab IDs without needing the mapping table
  • currently not allowed (might be allowed later): nnId.inputs[<inputId>][<inputIndex>] and <inputId>[<inputIndex>]

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently not allowed to reduce test suite: nnId.inputs[][] and []

Not sure I understand this, otherwise I agree!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, updated now

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, now everything looks good to me :)

I will update the PR with everything discussed!

Comment thread doc/format.rst Outdated
Comment thread doc/format.rst
Comment thread doc/format.rst Outdated
Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
Copy link
Copy Markdown
Member

@dilpath dilpath left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

Comment thread doc/format.rst Outdated
Comment thread doc/format.rst
Comment on lines +545 to +548
tensor created as ``zeros(3, 3, 2)``. To enable exchange, we therefore
recommend that importers provide a utility to export NN parameters to the
PEtab SciML array format (PyTorch conventions) and document the dimension
ordering used when exporting arrays.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so far we haven't assumed that the array format follows PyTorch conventions right? Hence the perm parameter in the array file. If the arrays in the array file follow PyTorch dimension ordering, why not also assume that matrices are stored in row-major format like PyTorch?

We could support storage of the dimension ordering for every array in the array file. A simpler alternative: add a global variable to the array file like pytorch_dimension_ordering [bool] -- when it's true we can port between frameworks easily, when it's false then we can't...

Then the message here would be more like "For portability, store arrays in array files according to pytorch dimension ordering, and set the pytorch_dimension_ordering flag to True. Otherwise, it's difficult to use the stored arrays in different frameworks.".

Copy link
Copy Markdown
Collaborator Author

@sebapersson sebapersson Feb 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this idea quite a lot, and think it would definitely clarify things!

Not sure it fits this PR though, how about a separate PR (would you have time to draft it?)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes makes sense, I'll add pytorch_dimension_ordering to the package/spec/docs after this PR

Co-authored-by: Dilan Pathirana <59329744+dilpath@users.noreply.github.com>
@sebapersson sebapersson merged commit 095ed5e into main Feb 20, 2026
1 check passed
@sebapersson sebapersson deleted the readabillity_format branch February 20, 2026 17:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants