- Add infini_train_add_test CMake macro for simplified test registration
- Integrate gtest_discover_tests for automatic test case discovery
- Refactor all test directories to use the unified macro (autograd, optimizer, hook, slow, lora)
- Reduce test CMakeLists.txt code by 68%
- Add LoRA tests (12 test cases)
- Delete TEST_REPORT.md
- Test labels: cpu/cuda/distributed/slow for flexible test execution
- Add shared test_macros.cmake in tests/common/

BREAKING CHANGE: Test registration now uses a macro instead of manual add_test()

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
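The unified macro might look roughly like the sketch below. This is an assumption about its shape, not the actual contents of tests/common/test_macros.cmake; in particular, the `infini_train` link target name is hypothetical.

```cmake
# Hypothetical sketch of the unified registration macro; the real
# infini_train_add_test may take extra arguments (e.g. labels, deps).
include(GoogleTest)

macro(infini_train_add_test TARGET)
    # Remaining arguments are the test sources.
    add_executable(${TARGET} ${ARGN})
    target_link_libraries(${TARGET} PRIVATE GTest::gtest_main infini_train)
    # Replaces manual add_test(): every TEST()/TEST_F() case is discovered
    # automatically, and labels let `ctest -L cpu` (or cuda/distributed/slow)
    # select subsets of the suite.
    gtest_discover_tests(${TARGET} PROPERTIES LABELS "cpu")
endmacro()

# Usage: one line per test binary instead of a block of add_test() calls.
infini_train_add_test(test_autograd test_autograd.cc)
```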
…d signed change

- Group results into improvements / regressions / normal sections
- Only regressions cause exit code 1; improvements print but pass
- Show signed percentage (+/-) instead of absolute error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… paths

Cast backward gradients to fp32 for bf16 compute in matmul, linear, and outer ops to preserve accumulation precision. Add vectorized no-broadcast fast paths for elementwise forward/backward kernels, skip the unnecessary Fill(0) when cuBLAS beta=0 fully overwrites the output, and cast saved tensors to the forward compute dtype in SetupContext.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ous guards

- Add FIXME in Linear::SetupContext and Matmul::SetupContext noting that an extra cast is performed because autocast runs before autograd; compute_dtype should come from autocast, not from the output tensor dtype.
- Add IsContiguous() to the Tensor class and guard both fast paths in elementwise.cu (forward and backward) so non-contiguous tensors fall back to the broadcast path until proper stride tracking is added.
- Replace the silent dtype cast in AccumulateGrad with a WARNING log; the grad is now used as-is when a dtype mismatch is detected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Print baseline/test paths at the start of output and update argument help text. In compare_tps, flip signed_change to (test-baseline)/baseline so positive means test is faster and negative means regression.
…all cudaMallocAsync
…memory

Add needs_input_grad_ tracking in the autograd Function to skip unnecessary gradient allocation and computation for frozen (requires_grad=false) parameters. For LoRA fine-tuning, this avoids allocating grad_weight tensors for all frozen base-model weights, reducing peak GPU memory from ~10.7 GB to ~7.7 GB. Also consolidate LinearBackward's loose parameters into LinearMeta and LinearGradFlags structs for clarity.
Simplifies the Linear autograd function by removing the intermediate LinearMeta struct and passing parameters directly to kernel implementations.
…needed

Previously, saved_tensors_ was set twice: first with cast tensors for both input and weight, then immediately overwritten by the needs_input_grad-conditional version without casting. As a result, saved tensors were never cast to compute_dtype, causing dtype mismatches in backward.
Replace std::random_device with 42 + omp_get_thread_num() to ensure reproducible LoRA initialization across runs.
Replace TEST_F with TEST_P across all test suites so each suite runs on both CPU and CUDA without duplicating test logic.

- Adds InfiniTrainTestP, TensorTestBaseP, AutogradTestBaseP, and DistributedInfiniTrainTestP base classes with automatic CUDA/NCCL skip guards.
- Introduces INFINI_TRAIN_REGISTER_TEST* C++ macros and an infini_train_add_test_suite CMake macro to eliminate repetitive INSTANTIATE_TEST_SUITE_P / infini_train_add_test boilerplate.
- Removes the deprecated test/, slow/, and split optimizer test files; consolidates optimizer tests into a single binary with creation + step suites.
No description provided.