tensor: allow compute to bypass caching mechanism #408
rohany wants to merge 1 commit into tensor-compiler:master

Conversation
This commit adds a boolean parameter to the `TensorBase::compute` method that allows callers to bypass the compute caching mechanism. This is useful when benchmarking the generated kernels.
cc @stephenchouca this PR makes it easier to write benchmarks, so I'd like to merge it (and also into the
Is this change really necessary? As an example, if one wants to benchmark SpMV multiple times, they can already do something like this with the current API:

```cpp
for (int t = 0; t < trials; t++) {
  Tensor<double> y({A.getDimension(0)}, dv);
  y(i) = A(i,j) * x(j);
  y.compile();
  auto begin = std::chrono::high_resolution_clock::now();
  y.assemble();
  y.compute();
  auto end = std::chrono::high_resolution_clock::now();
  auto diff = std::chrono::duration<double,std::milli>(end - begin).count();
}
```

The compute kernel would be run in every iteration since `y` is redeclared in each iteration. More fundamentally though, I wonder if perhaps we should be discouraging people from benchmarking TACO using the lazy evaluation API. As things are now, it can be difficult for users to figure out what's actually being timed, since there's so much happening in the background. For instance, if
Probably not. I understand that you can benchmark it this way, but it becomes cumbersome and slower to run than simply doing the computation directly in the benchmark loop.
I agree. I don't have suggestions as of now, but it probably shouldn't be how the CLI does it (manually lowering and then calling the packed functions).