
CTest #143

Closed

chen2021673 wants to merge 13 commits into master from CTest

Conversation

@chen2021673
Contributor

No description provided.

luoyueyuguang and others added 13 commits March 25, 2026 11:34
- Add infini_train_add_test CMake macro for simplified test registration
- Integrate gtest_discover_tests for automatic test case discovery
- Refactor all test directories to use unified macro (autograd, optimizer, hook, slow, lora)
- Reduce test CMakeLists.txt code by 68%
- Add LoRA tests (12 test cases)
- Delete TEST_REPORT.md
- Test labels: cpu/cuda/distributed/slow for flexible test execution
- Add shared test_macros.cmake in tests/common/

BREAKING CHANGE: Test registration now uses macro instead of manual add_test()

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
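A unified registration macro of the kind described above might look roughly like the following. This is a hypothetical sketch, not the actual contents of `tests/common/test_macros.cmake`; the macro signature, keyword names, and link targets are assumptions:

```cmake
# Hypothetical sketch of tests/common/test_macros.cmake.
# Requires: include(GoogleTest) for gtest_discover_tests.
macro(infini_train_add_test TEST_NAME)
    cmake_parse_arguments(ARG "" "" "SOURCES;LABELS" ${ARGN})
    add_executable(${TEST_NAME} ${ARG_SOURCES})
    target_link_libraries(${TEST_NAME} PRIVATE infini_train GTest::gtest_main)
    # Let CTest enumerate individual gtest cases instead of one manual add_test()
    gtest_discover_tests(${TEST_NAME} PROPERTIES LABELS "${ARG_LABELS}")
endmacro()

# Example registration with a label for filtered runs (ctest -L cuda):
# infini_train_add_test(test_lora SOURCES test_lora.cc LABELS "cuda")
```

With per-test labels, `ctest -L cpu` or `ctest -LE slow` can select subsets without touching any CMakeLists.txt, which is what the cpu/cuda/distributed/slow labels above enable.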
…d signed change

- Group results into improvements / regressions / normal sections
- Only regressions cause exit code 1; improvements print but pass
- Show signed percentage (+/-) instead of absolute error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… paths

Cast backward gradients to fp32 for bf16 compute in matmul, linear, and
outer ops to preserve accumulation precision.
Add vectorized no-broadcast fast paths for elementwise forward/backward
kernels, skip unnecessary Fill(0) when cuBLAS beta=0 fully overwrites
output, and cast saved tensors to forward compute dtype in SetupContext.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ous guards

- Add FIXME in Linear::SetupContext and Matmul::SetupContext noting that an
  extra cast is performed because autocast runs before autograd; compute_dtype
  should come from autocast, not from output tensor dtype.
- Add IsContiguous() to Tensor class and guard both fast paths in
  elementwise.cu (forward and backward) so non-contiguous tensors fall back to
  the broadcast path until proper stride tracking is added.
- Replace silent dtype cast in AccumulateGrad with a WARNING log; grad is now
  used as-is when dtype mismatch is detected.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Print baseline/test paths at the start of output and update argument
help text. In compare_tps, flip signed_change to (test-baseline)/baseline
so positive means test is faster and negative means regression.
…memory

Add needs_input_grad_ tracking in autograd Function to skip unnecessary
gradient allocation and computation for frozen (requires_grad=false)
parameters. For LoRA fine-tuning, this avoids allocating grad_weight
tensors for all frozen base model weights, reducing peak GPU memory
from ~10.7GB to ~7.7GB.

Also consolidate LinearBackward loose params into LinearMeta and
LinearGradFlags structs for clarity.
Simplifies the Linear autograd function by removing the intermediate LinearMeta struct and passing parameters directly to kernel implementations.
…needed

Previously, saved_tensors_ was set twice: first with cast tensors for
both input and weight, then immediately overwritten with the
needs_input_grad-conditional version without casting. This meant saved
tensors were never cast to compute_dtype, causing dtype mismatches in
backward.
Replace std::random_device with 42 + omp_get_thread_num() to ensure reproducible LoRA initialization across runs.
Replace TEST_F with TEST_P across all test suites so each suite runs on
both CPU and CUDA without duplicating test logic. Adds InfiniTrainTestP,
TensorTestBaseP, AutogradTestBaseP, and DistributedInfiniTrainTestP base
classes with automatic CUDA/NCCL skip guards. Introduces
INFINI_TRAIN_REGISTER_TEST* C++ macros and infini_train_add_test_suite
CMake macro to eliminate repetitive INSTANTIATE_TEST_SUITE_P /
infini_train_add_test boilerplate. Removes deprecated test/, slow/, and
split optimizer test files; consolidates optimizer tests into a single
binary with creation + step suites.
chen2021673 changed the title from C test to CTest on Apr 13, 2026