postprocess: support multiple transforms#1543
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the c2rust postprocessing system to support multiple transforms in sequence. The main goal is to enable different post-processing steps (like comment transfer, assert folding, relooper cleanup, etc.) to be applied independently and in configurable order.
Changes:
- Introduced an abstract base class
AbstractTransformto standardize transform implementations - Renamed
CommentTransfertoCommentsTransformand moved it to a new module structure - Added factory function
get_transform_by_idto instantiate transforms by name - Updated CLI to accept comma-separated list of transforms via
--transformsparameter
Reviewed changes
Copilot reviewed 7 out of 9 changed files in this pull request and generated 16 comments.
Show a summary per file
| File | Description |
|---|---|
| c2rust-postprocess/postprocess/transforms/base.py | New abstract base class defining transform interface with apply() and apply_dir() methods |
| c2rust-postprocess/postprocess/transforms/comments.py | Refactored comments transform with renamed classes, added validation, and AbstractTransform inheritance |
| c2rust-postprocess/postprocess/transforms/init.py | New factory function to instantiate transforms by ID |
| c2rust-postprocess/postprocess/comments/init.py | Duplicate/incomplete implementation with critical bugs that appears to be leftover from refactoring |
| c2rust-postprocess/postprocess/init.py | Updated main entry point to support multiple transforms and use factory pattern |
| c2rust-postprocess/tests/llm-cache/CommentsTransform/*/metadata.toml | Updated cache metadata to reflect "CommentsTransform" naming |
| c2rust-postprocess/tests/llm-cache/CommentsTransform/*/response.txt | New cached test responses for quickSort and partition functions |
Comments suppressed due to low confidence (3)
c2rust-postprocess/postprocess/transforms/comments.py:69
- The docstring parameter documentation is missing the colon prefix for reST/Sphinx format. It should be
:param rust_fn:and:return:instead ofparam:andreturn:.
c2rust-postprocess/postprocess/transforms/comments.py:242 - The transfer_comments_dir method calls self.apply with a rust_source_file parameter, but the apply method signature only accepts root_rust_source_file. This mismatch will cause a TypeError at runtime. Either the apply method needs to be updated to accept rust_source_file, or this call needs to be updated to match the apply signature.
c2rust-postprocess/postprocess/transforms/comments.py:242 - Keyword argument 'rust_source_file' is not a supported parameter name of method CommentsTransform.apply.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
828f00c to
79adc8c
Compare
3c35f8a to
ade83ec
Compare
d21f1eb to
111e4d5
Compare
636dac3 to
34f08b2
Compare
70e6a92 to
0f869ac
Compare
kkysen
left a comment
There was a problem hiding this comment.
So this PR does a lot of different things, which makes it harder to review altogether at once. Could we split this into a few separate PRs?
- 8a3e399 seems totally independent. We can land that on its own without any issue.
- 9595e66 and dde9bea are about supporting multiple transforms, which is what I think this PR was originally titled for.
- 73069ee can be on its own, too, and there are parts of it that also overlap with the diffs in #1553 that I think we can better untangle if it's its own PR.
- 00cb8ec and 0f869ac seem like their own transforms that can be their own PRs.
Also, I don't think the LLM cache is being fully updated on each commit, which makes it much harder to review, as then I can't fully see what that commit changes, as the cache is basically the postprocessor's snapshot tests.
| /* | ||
| * Lomuto Partition Scheme: | ||
| * Partitions the array so that elements < pivot are on the left, | ||
| * and elements >= pivot are on the right. | ||
| */ |
There was a problem hiding this comment.
Why are we removing this valid doc comment for partition?
| response = """/// Lomuto Partition Scheme: | ||
| /// Partitions the array so that elements < pivot are on the left, | ||
| /// and elements >= pivot are on the right. | ||
| response = """/// Partition the subarray around the last element as pivot and return pivot's final index. | ||
| #[no_mangle] | ||
| pub unsafe extern "C" fn partition( | ||
| mut arr: *mut ::core::ffi::c_int, | ||
| mut low: ::core::ffi::c_int, | ||
| mut high: ::core::ffi::c_int, | ||
| ) -> ::core::ffi::c_int { | ||
| // Partition the subarray around the last element as pivot and return pivot's final index. |
There was a problem hiding this comment.
This seems like a worse translation. That comment is supposed to be inside the function, and now we're missing the comment on/outside the function.
There was a problem hiding this comment.
We need to delete the previous version of this file (which is what #1554 makes easy to do) so that git will show this as a moved file with a diff. This makes it much easier to review the changes.
|
|
||
| class AssertsTransform(AbstractTransform): | ||
| def __init__(self, cache: AbstractCache, model: AbstractGenerativeModel): | ||
| super().__init__(SYSTEM_INSTRUCTION) |
There was a problem hiding this comment.
Making SYSTEM_INSTRUCTION an abstract method might be simpler and more idiomatic.
0f869ac to
bcf6795
Compare
bcf6795 to
836b725
Compare
836b725 to
ec7002e
Compare
9023cb2 to
c847755
Compare
f07c3ff to
296bea4
Compare
A later change adds an abstract base class using the `Transform` suffix so it is more consistent with naming elsewhere in this project to call the inaugural transform `CommentsTransform`. Updated cache dir correspondingly.
Adds base class for transforms.
The C declarations can contain preprocessor definitions with their own associated comments. Sometimes we also get license headers and similar front matter. Use a separate transformation to remove this front matter such that validation of comment insertion only take the relevant comments into account. Note: the trim step is currently tightly integrated into the comment transfer step but I expect we'll want to run it before other transforms as well. This can be handled in a follow-up change.
TODO: Read assert.h from /usr/include to better handle differences between systems.
296bea4 to
46666a3
Compare
Support multiple transforms such that we can support different post-processing steps such as:
Some transforms might also be broken down into a sequence of simpler transforms. A concrete use case is lua which contains license headers and other matter before function definitions which we do not want to pass so the comment transfer transform.
(Replaces PR #1512 as it was automatically closed when I renamed the branch)