Skip to content

Comments

Add CeedOperatorCompositeSetSequential and fix edge cases in CUDA gen at points assembly#1930

Merged
jeremylt merged 4 commits intomainfrom
zach/add-sequential-and-at-points-fixes
Feb 20, 2026
Merged

Add CeedOperatorCompositeSetSequential and fix edge cases in CUDA gen at points assembly#1930
jeremylt merged 4 commits intomainfrom
zach/add-sequential-and-at-points-fixes

Conversation

@zatkins-dev
Copy link
Collaborator

@zatkins-dev zatkins-dev commented Feb 18, 2026

These are changes needed to support functionality in Ratel (https://gitlab.com/micromorph/ratel/-/merge_requests/1213).

First, I need to be able to turn off the parallel sub-operator functionality.

The particular use case is that I need to project a quantity from different mesh elements to the same set of points. The points here are "shared" between the two sides - they have the same quadrature weights and areas, etc. I then need to use both of those at-points vectors as inputs to the "real" QFunctions which use each of the interpolated values to compute the physics at quadrature points. These "real" QFunctions then interpolate back to each of the two sets of mesh elements. That's a use-case that's fundamentally incompatible with the way that our at points operators work, which group points by which element they are inside of. We strictly need 2 "setup" operators (one for each set of mesh elements) that just interp to points, then 2 "output" operators (again, one for each set of mesh elements) to compute the physics and interp-transpose back to the mesh nodes. There is never a situation where I would only run the "setup" or "output" -- they are explicitly coupled. But, the math is perfectly well-defined if that sequence is allowed.

Second, the linear assemble diagonal (and maybe also the normal assembly) were trying to write to passive outputs (particularly with CeedEvalNone), which is incorrect. This fixes that by adding an additional flag to the QFunction builder function.

Specifically, previously the point loop would have something like this:

        // EvalMode: none
        const CeedInt comp_stride_out_1 = 1;
        WritePoint<num_comp_out_1, comp_stride_out_1, max_num_points>(data, elem, i, points.num_per_elem[elem], indices.outputs[1], r_s_out_1, d_out_1);

Here, d_out_1 is a passive (CeedEvalNone`) output. We should instead just not include inactive outputs in the QFunction build routine.

@jrwrigh
Copy link
Collaborator

jrwrigh commented Feb 18, 2026

First, I need to be able to turn off the parallel sub-operator functionality.

Could you not implement that via calls to separate CeedOperators? I'm not sure how much I like adding the backdoor here (though admittedly it's just a gut feeling).

I guess part of the dislike is that now the order of CeedOperator additions gives an implicit notion of ordering. Granted, CeedQFunctionAddInput() already has that implicit notion of ordering, so it's not unprecedented.

@jeremylt
Copy link
Member

Can you outline for posterity here why non-sequential is important? Also probably should be in the docs why you might need to do this

@jeremylt
Copy link
Member

Codecov is correct - this should be tested, if we can cook a representative reason why we would want this

@zatkins-dev
Copy link
Collaborator Author

First, I need to be able to turn off the parallel sub-operator functionality.

Could you not implement that via calls to separate CeedOperators? I'm not sure how much I like adding the backdoor here (though admittedly it's just a gut feeling).

I guess part of the dislike is that now the order of CeedOperator additions gives an implicit notion of ordering. Granted, CeedQFunctionAddInput() already has that implicit notion of ordering, so it's not unprecedented.

That causes issues when, for example, I need to run one suboperator first in a MatCeed MatMult. In order to do that with separate operators, I would have to overload the MatCeed functions, when all I really need is to know that the suboperators will execute in order.

@zatkins-dev
Copy link
Collaborator Author

The particular use case is that I need to project two quantities from different mesh elements to the same set of points. I then need to use both of those as inputs to the "real" qfunctions -- one that interpolates the computed output to each of the two sets of mesh nodes. That's a use-case that's fundamentally incompatible with the way that our at points operators work -- it requires 2 "setup" operators (one for each set of mesh elements) that just interp to points, then 2 "output" operators (again, one for each set of mesh elements) to compute the physics and interp-transpose back to the mesh nodes. But, the math is perfectly well-defined if that sequence is allowed.

@zatkins-dev
Copy link
Collaborator Author

Codecov is correct - this should be tested, if we can cook a representative reason why we would want this

Yeah, I didn't add tests yet. I think that splitting the input and output parts of the operator like described above (but with only one mesh element set) would be a fine test.

@zatkins-dev zatkins-dev force-pushed the zach/add-sequential-and-at-points-fixes branch from 059c0f3 to 4543c65 Compare February 19, 2026 16:55
@jeremylt
Copy link
Member

Was that ref bug present in HIP or Sycl?

@zatkins-dev zatkins-dev force-pushed the zach/add-sequential-and-at-points-fixes branch from f237a56 to dad8f31 Compare February 19, 2026 18:14
@zatkins-dev
Copy link
Collaborator Author

Was that ref bug present in HIP or Sycl?

good question -- there were actually a few more cases I missed (none in Sycl though, I don't think it ever got the field ordering optimization)

@zatkins-dev zatkins-dev force-pushed the zach/add-sequential-and-at-points-fixes branch 2 times, most recently from a366d8a to b681510 Compare February 19, 2026 19:01
@zatkins-dev
Copy link
Collaborator Author

Hmm, I'm realizing that this doesn't fix all my issues, especially when assembling. I have some additional ideas I'm workshopping

@zatkins-dev
Copy link
Collaborator Author

I still think that this MR is useful, but I think I need a more fundamental solution long-term. See #1931 for my best idea.

@zatkins-dev zatkins-dev force-pushed the zach/add-sequential-and-at-points-fixes branch from b681510 to 11a8069 Compare February 19, 2026 23:21
@jeremylt jeremylt merged commit a49e5d5 into main Feb 20, 2026
31 checks passed
@jeremylt jeremylt deleted the zach/add-sequential-and-at-points-fixes branch February 20, 2026 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants