A curated collection of tools, frameworks, and resources for AI-driven automated model training — letting AI agents autonomously run experiments, fine-tune models, optimize hyperparameters, and evolve themselves.
Inspired by Karpathy's AutoResearch, HuggingFace Skills, and the broader AutoML movement.
The paradigm is shifting: instead of manually tuning models, we now have tools that let AI agents design experiments, modify training code, evaluate results, and iterate autonomously — while you sleep.
This repository collects the best open-source tools and frameworks that make this possible across the full training lifecycle.
- Autonomous Experiment / Research Frameworks
- Agent-Driven Training Skills (HuggingFace Ecosystem)
- LLM Fine-Tuning Frameworks
- RL Alignment Training Frameworks (RLHF / GRPO)
- Automated Hyperparameter Optimization / AutoML
- Self-Evolving / Self-Play Training
- Lightweight Pretraining Frameworks
- Experiment Tracking & Orchestration
- Benchmarks & Evaluation
- Coding Agents (for Training Script Development)
- Recommended Stacks
Core idea: AI agents autonomously design experiments, modify training code, evaluate results, and iterate. You sleep, AI experiments.
| Project | Description | Key Highlight |
|---|---|---|
| AutoResearch | AI agent runs autonomous ML experiments in a loop | 630 lines of Python, ~100 experiments overnight, 11% efficiency gain on GPT-2 training |
| AI Scientist v2 | Fully automated scientific discovery with agentic tree search | Hypothesis → Experiment → Paper, no human templates needed |
| auto-ml-agent | LLM-orchestrated autonomous ML pipeline | End-to-end: data preprocessing → model deployment, multi-agent architecture |
| MLAgentBench | Benchmark for evaluating AI agents on ML experimentation | 13 end-to-end ML tasks from CIFAR-10 to BabyLM |
| AutoAgent | Zero-code LLM agent framework with self-play customization | Create agents via natural language, iterative self-improvement |
| ShinkaEvolve | LLM-as-mutation-operator program evolution framework | Evolves programs for scientific discovery |
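The loop these frameworks share (propose a config, run it, score it, mutate the best so far) fits in a few lines of plain Python. The sketch below is a toy: `run_experiment` is a hypothetical stand-in for launching a real training run, and the greedy hill-climb is a deliberately simple baseline where a real agent would reason over the full history.

```python
import math
import random

def run_experiment(config, rng):
    """Hypothetical stand-in for a training run: returns a validation loss.
    Here, a noisy quadratic with its optimum at lr = 1e-3."""
    return (math.log10(config["lr"]) + 3) ** 2 + rng.uniform(0, 0.05)

def mutate(config, rng):
    """Perturb the best config so the agent explores nearby settings."""
    return {"lr": config["lr"] * 10 ** rng.uniform(-0.5, 0.5)}

rng = random.Random(0)
best = {"lr": 1e-1}
best_loss = run_experiment(best, rng)
history = [(best, best_loss)]

for _ in range(100):                      # "~100 experiments overnight"
    candidate = mutate(best, rng)
    loss = run_experiment(candidate, rng)
    history.append((candidate, loss))
    if loss < best_loss:                  # greedy acceptance; real agents
        best, best_loss = candidate, loss # also analyze failed runs
```

Frameworks in the table above differ mainly in what replaces each piece: an LLM instead of random mutation, real training jobs instead of the toy objective, and richer memory than a single best-so-far.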
"Vibe Training" — use natural language to drive the full model training lifecycle through coding agents.
| Project | Description | Key Highlight |
|---|---|---|
| HuggingFace Skills | Standardized ML skill packages for coding agents | 12 skills: model training (SFT/DPO/GRPO), vision training, experiment tracking, evaluation, dataset management |
| HuggingFace AutoTrain | No-code training platform | Upload data → auto model selection → training → evaluation → Hub publishing |
HF Skills covers:
- `hugging-face-model-trainer` — Fine-tune LLMs with TRL (SFT, DPO, GRPO), 0.5B to 70B parameters
- `hugging-face-vision-trainer` — Train object detection & image classification (RTDETRv2, YOLOS, ViT)
- `hugging-face-jobs` — Run compute jobs on HF infrastructure with cost estimation
- `hugging-face-trackio` — ML experiment tracking with real-time metrics
- `hugging-face-evaluation` — Model evaluation with lighteval
- `hugging-face-datasets` — Dataset creation and management
- Compatible with: Claude Code, OpenAI Codex, Google Gemini CLI, Cursor
The training engines. Upper-level agents (AutoResearch, HF Skills) ultimately call these frameworks to execute training.
| Project | Description | Key Highlight |
|---|---|---|
| Unsloth | Ultra-efficient LLM fine-tuning & RL | 2x faster, 70% less VRAM; custom CUDA kernels; MoE 12x faster; MCP Server available |
| Axolotl | Flexible, production-ready fine-tuning | YAML-driven; v0.8.x: QAT, sequence parallelism, GRPO, full RLHF pipeline |
| LlamaFactory | Unified fine-tuning with Web UI | LlamaBoard browser UI; 100+ models; SFT/RLHF/DPO/PPO |
| TRL | HuggingFace's RL training library | SFT, DPO, GRPO, PPO, KTO, ORPO; deep Transformers/PEFT integration |
| torchtune | PyTorch-native fine-tuning | No extra abstractions; multi-node support (Feb 2025) |
| NeMo AutoModel | NVIDIA's DTensor-native training library | Day-0 HuggingFace support; single-to-multi-node scaling |
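As a concrete taste of the YAML-driven style, a minimal Axolotl-like QLoRA fine-tuning config might look as follows. Exact keys vary by Axolotl version, and the model/dataset ids are illustrative placeholders; treat this as a sketch, not a copy-paste config.

```yaml
base_model: meta-llama/Llama-3.1-8B     # any Hugging Face model id
datasets:
  - path: tatsu-lab/alpaca              # example instruction dataset
    type: alpaca
adapter: qlora                          # 4-bit LoRA fine-tuning
lora_r: 16
lora_alpha: 32
micro_batch_size: 2
gradient_accumulation_steps: 8
num_epochs: 3
learning_rate: 2e-4
output_dir: ./outputs/llama3-alpaca-qlora
```

This declarative shape is what makes these frameworks agent-friendly: an LLM can propose, diff, and rerun a config file far more reliably than it can edit an imperative training script.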
2025-2026 trend: GRPO (Group Relative Policy Optimization) is replacing PPO as the default alignment method — no critic model needed, simpler and more stable.
| Project | Description | Key Highlight |
|---|---|---|
| OpenRLHF | High-performance RLHF framework on Ray + vLLM | 70B+ full tuning; PPO/DAPO/REINFORCE++; async agent RLHF; MARTI fork for multi-agent RL |
| rLLM | Post-training RL framework for language agents | Custom agents + environments → RL training → deployment; rLLM-FinQA-4B beats Qwen3-235B |
| LlamaGym | Online RL fine-tuning for LLM agents | Define agent → create LLM → write RL loop |
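The "no critic model" part of GRPO is easy to see in code: instead of a learned value function, each sampled completion's reward is normalized against the other completions in its group. A minimal sketch of that advantage computation (the reward values are made up for illustration):

```python
import math

def grpo_advantages(rewards, eps=1e-8):
    """Group-relative advantages: z-score each reward within its group.
    This replaces the critic/value model that PPO trains alongside the policy."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    return [(r - mean) / (math.sqrt(var) + eps) for r in rewards]

# One prompt, a group of 4 sampled completions scored by a reward model:
advs = grpo_advantages([0.1, 0.4, 0.4, 0.9])
```

Because the baseline comes from the group itself, there is no second network to train or keep in sync, which is where the "simpler and more stable" claim comes from.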
| Project | Description | Key Highlight |
|---|---|---|
| AgentHPO | LLM-driven hyperparameter optimization | Matches/surpasses human best trials on 12 ML tasks with explainable results |
| Optuna | Industry-standard HPO framework | Bayesian search, pruning, distributed execution, visualization dashboard |
| Microsoft NNI | Full AutoML toolkit | Neural Architecture Search + HPO + model compression + feature engineering |
| W&B Sweeps | Automated hyperparameter search + tracking | Bayesian/Grid/Random search; Hyperband early stopping; cross-machine parallelism |
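To make the mechanics concrete, here is what these frameworks automate, reduced to bare random search in plain Python. The objective function is a hypothetical stand-in for "train a model, return validation loss"; tools like Optuna add smarter samplers (TPE/Bayesian), pruning of bad trials, and distributed execution on top of this skeleton.

```python
import math
import random

def objective(lr, batch_size):
    """Hypothetical validation-loss surface; a real objective would
    launch a training run and return its validation metric."""
    return (math.log10(lr) + 3) ** 2 + 0.01 * abs(batch_size - 64)

rng = random.Random(42)
trials = []
for _ in range(50):
    lr = 10 ** rng.uniform(-5, -1)          # log-uniform sample over [1e-5, 1e-1]
    batch_size = rng.choice([16, 32, 64, 128])
    trials.append(((lr, batch_size), objective(lr, batch_size)))

(best_lr, best_bs), best_loss = min(trials, key=lambda t: t[1])
```

LLM-driven HPO tools like AgentHPO keep the same trial loop but let a language model choose the next configuration and explain why.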
Core idea: Models generate their own training data to train themselves, reducing dependence on human annotations.
| Project | Description | Key Highlight |
|---|---|---|
| SPIN | Self-Play Fine-Tuning | Model plays against its previous iterations; outperforms DPO + GPT-4 preference data without extra annotations |
| SPPO | Self-Play Preference Optimization | Iterative policy updates approximating Nash equilibrium with convergence guarantees |
| Multi-Agent Evolve | One LLM plays Proposer + Solver + Judge roles | Verified improvements on math, coding, reasoning with Qwen2.5-3B |
| Multiagent Finetuning | Multi-agent society from same base model | Multi-agent iteration keeps improving where single-model self-training plateaus |
| CORY | Cooperative multi-agent RL fine-tuning | Pioneer + Observer dual-agent paradigm (NeurIPS 2024) |
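The shared pattern across this section is an outer loop where the model trains against data produced by its own previous iteration. The toy sketch below shows only that loop structure: every function is a hypothetical stand-in, and a single number stands in for model quality (1.0 representing the quality of the human reference data).

```python
import random
from statistics import mean

def generate_from(model_quality, rng, n=32):
    """Stand-in for sampling n responses from the previous model iteration."""
    return [model_quality + rng.uniform(-0.1, 0.1) for _ in range(n)]

def spin_update(model_quality, opponent_samples):
    """Stand-in for one SPIN-style round: train the current model to prefer
    reference (human) data over its predecessor's samples."""
    gap = 1.0 - mean(opponent_samples)
    return model_quality + 0.5 * gap     # close part of the remaining gap

rng = random.Random(0)
quality, trajectory = 0.2, [0.2]
for _ in range(5):                       # SPIN runs a handful of iterations
    samples = generate_from(quality, rng)   # opponent = previous iteration
    quality = spin_update(quality, samples)
    trajectory.append(quality)
```

The multi-agent variants in the table (Multi-Agent Evolve, Multiagent Finetuning, CORY) change who the opponent is, which is what keeps improvement going after single-model self-play plateaus.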
Pair these with autonomous experiment frameworks — fast, small-scale training is the foundation for autonomous experimentation.
| Project | Description | Key Highlight |
|---|---|---|
| nanochat | Minimal LLM training harness (AutoResearch's engine) | Single GPU; tokenization → pretrain → finetune → eval → chat; GPT-2 for ~$48 |
| Nanotron | Minimal 3D-parallel LLM pretraining | Data + Tensor + Pipeline parallelism; scales from experiments to production |
| Project | Description | Key Highlight |
|---|---|---|
| Weights & Biases | Experiment tracking + sweeps + model registry | Industry standard; integrates with all major frameworks |
| MLflow 3.0 | Open-source experiment tracking + model serving | Self-hosted; nested experiments; model registry |
| HF Trackio | Lightweight experiment tracking in HF ecosystem | Deep integration with HF Skills; agents can read metrics and make decisions |
| Benchmark | Description | Key Highlight |
|---|---|---|
| MLE-bench | 75 Kaggle ML engineering competition tasks | Evaluates AI agents on real ML engineering: training, data prep, experiments |
| MLAgentBench | 13 end-to-end ML experimentation tasks | Stanford SNAP; Claude v3 Opus best at 37.5% |
| MLRC-Bench | ML Research Competition challenges | Tests novel methodology development |
| LiveCodeBench | Contamination-free coding benchmark | Fresh problems from LeetCode/AtCoder/Codeforces |
These agents don't train models directly, but can write and debug training code, completing the automation loop when paired with HF Skills.
| Project | Description | Key Highlight |
|---|---|---|
| Aider | Terminal AI pair programming | Git integration; supports Claude/GPT/DeepSeek/local models |
| OpenHands | AI-driven software development (open-source Devin) | Autonomous code editing + execution + debugging; MIT license |
| SWE-agent | Autonomous GitHub issue fixer | SWE-bench open-source SOTA (NeurIPS 2024) |
HuggingFace Skills + Claude Code + Unsloth + W&B
Natural language → Claude Code orchestrates → HF Skills calls Unsloth for training → W&B tracks experiments.
AutoResearch + nanochat (single GPU)
Start before bed, wake up to ~100 autonomous experiment results.
Axolotl / LlamaFactory + OpenRLHF + Optuna + MLflow
YAML-configured training + automated HPO + full experiment tracking.
- AutoResearch Paradigm: Karpathy proved "AI autonomously doing ML research" works with just 630 lines of code
- "Vibe Training": HF Skills enables natural-language-driven model training lifecycle
- GRPO > PPO: DeepSeek's GRPO is becoming the default alignment method (no critic model, simpler, more stable)
- Self-Play Breakthrough: Multi-agent self-evolution (SPIN, Multi-Agent Evolve) overcomes single-model self-training plateaus
- MCP Standardization: Model Context Protocol adopted by OpenAI/Google/Microsoft as the "USB-C for AI agents"
- Single-GPU Research: Unsloth + nanochat + AutoResearch enables individual developers to do serious LLM research
Contributions are welcome! Please open an issue or submit a PR if you know of tools that fit this collection.
Criteria for inclusion:
- Must be directly usable for automated model training workflows
- Preference for open-source projects with active maintenance
- Focus on tools that leverage AI/LLMs to automate the training process itself
This curated list is released under CC0 1.0.
Compiled March 2026. Project statuses may change — check individual GitHub repos for the latest.