runhaoli-creator/README.md

Runhao Li

Building intelligent multi-agent LLM systems

LLM Agent Engineer · MS CS @ University of Southern California

Email LinkedIn GitHub


About

I design and post-train multi-agent LLM systems — from learned inter-agent communication protocols to data-centric alignment pipelines to production vLLM serving stacks. My focus is on agent teams that are reliable, interpretable, and cheap to deploy end-to-end.

Current interests: multi-agent orchestration · post-training (DPO / KTO / GRPO) · vLLM multi-LoRA serving · agent evaluation harnesses · learned latent communication protocols.


Featured Projects

latent-agent-team  ·  Budgeted Multi-Agent Communication

Five-agent team (Planner · Retriever · Browser · Verifier · Memory) that replaces natural-language inter-agent messages with learned latent channels — continuous embeddings or VQ codes with an adaptive bitrate scheduler.

Results · Mind2Web 81.5% ElemAcc · WebShop 72.4% SR · AgentBench 66.8% SR
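As a rough illustration of the VQ-code idea, the sketch below maps each latent chunk to its nearest codebook index and stops once a bit budget is spent. It is not the repository's implementation: `nearest_code`, `encode_message`, the 2-D codebook, and the fixed `max_bits` cutoff (a stand-in for the adaptive bitrate scheduler) are all assumptions made for the example.

```python
import math
from typing import List, Sequence

def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to vec (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(vec, codebook[i]))

def encode_message(chunks, codebook, max_bits: int) -> List[int]:
    """Quantize chunks in order until the bit budget is exhausted."""
    # Each code index costs ceil(log2(K)) bits for a codebook of size K.
    bits_per_code = max(1, math.ceil(math.log2(len(codebook))))
    n_codes = min(len(chunks), max_bits // bits_per_code)
    return [nearest_code(c, codebook) for c in chunks[:n_codes]]

# Hypothetical 4-entry codebook (2 bits per code) and three latent chunks.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chunks = [(0.9, 0.1), (0.1, 0.8), (0.7, 0.9)]
codes = encode_message(chunks, codebook, max_bits=4)  # budget fits two codes
```

With a 4-bit budget only the first two chunks are sent, which is the trade-off a bitrate scheduler would tune per message.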

acm-icl  ·  Autonomy-Calibrated Multi-Agent In-Context Learning

Four-stage inference pipeline (Solver → Skeptic → Verifier → Calibrated Judge) with DD-CoT structured reasoning and per-peer reliability scoring for epistemic robustness under adversarial peer pressure.

Results · 73.9% average across 5 peer-pressure benchmarks · +13.7 pp over strongest multi-agent-debate baseline (MAD)
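One simple way to picture per-peer reliability scoring is a reliability-weighted vote, sketched below. The function name, the score range, and the aggregation rule are assumptions for illustration; the actual judge in acm-icl may combine evidence differently.

```python
from collections import defaultdict

def weighted_verdict(votes):
    """votes: list of (answer, reliability in [0, 1]).

    Returns the answer whose supporters have the highest total reliability.
    """
    totals = defaultdict(float)
    for answer, reliability in votes:
        totals[answer] += reliability
    return max(totals, key=totals.get)

# Three low-reliability peers push "B"; one high-reliability peer says "A".
votes = [("B", 0.2), ("B", 0.2), ("B", 0.2), ("A", 0.9)]
verdict = weighted_verdict(votes)
```

A plain majority vote would pick "B" here; weighting by reliability lets a single trusted peer outweigh coordinated pressure.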

dmapo  ·  Data-centric Multi-Agent Preference Optimization

Six-stage data-centric alignment pipeline — prompts → on-policy generation → three-judge multi-agent scoring (Qwen3-8B) → process critic → confidence gating → KTO. Unified trainer supporting DPO / KTO / ORPO / SimPO / SFT.

Results · Mistral-7B on only 1,871 gated examples (3.45% accept rate) beats every baseline trained on 10–20k — MT-Bench 7.62 · AlpacaEval 96.3% · win-rate 85.3% vs. 68.2% best baseline
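The confidence-gating step can be sketched as a filter over multi-judge scores: keep an example only when the judges rate it highly and agree with each other. The thresholds, the 0 to 10 score scale, and the `gate` helper are hypothetical, not values taken from dmapo.

```python
from statistics import mean

def gate(scores, min_mean=8.0, max_spread=1.0):
    """Accept an example only if judges rate it highly and agree."""
    return mean(scores) >= min_mean and (max(scores) - min(scores)) <= max_spread

# Hypothetical scores from three judges per candidate response.
candidates = [
    {"id": "a", "scores": [9.0, 8.5, 9.0]},  # high and consistent
    {"id": "b", "scores": [9.5, 5.0, 9.0]},  # judges disagree
    {"id": "c", "scores": [6.0, 6.5, 6.0]},  # consistent but low
]
kept = [c["id"] for c in candidates if gate(c["scores"])]
accept_rate = len(kept) / len(candidates)
```

If the 3.45% accept rate is measured over all scored candidates, 1,871 accepted examples would correspond to roughly 54,000 candidates (1,871 / 0.0345). Because KTO trains on unpaired desirable/undesirable labels, a gate like this can feed it directly without constructing preference pairs.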


More Agent Research

| Repo | Summary |
| --- | --- |
| updr-reasoning | Uncertainty-Prompted Debate and Repair: adaptive-compute multi-persona reasoning with uncertainty-gated self-repair |
| RAMTL | Role-Adaptive Multi-Tool Learning: single-backbone multi-role agent framework for tool use and function calling |
| DEAMS | Decentralized Epistemic Alignment for Multimodal Swarms: MA-GRPO across heterogeneous Qwen-VL / InternVL agents |
| PAGC | Partner-Adaptive Grounded Communication: cooperative MARL with emergent text-grounded communication |
| KTM-WM | Training-free kernel-trick world models for LLM agent planning (beam / MPC / CEM planners) |
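For readers unfamiliar with CEM planning, here is a minimal sketch of the cross-entropy method searching over action sequences against a world model. The world model is a toy stand-in (`x' = x + a`), not the kernel-trick model from KTM-WM, and every name and hyperparameter below is an assumption.

```python
import random
from statistics import mean, pstdev

def rollout_cost(x0, actions, goal):
    """Toy world model: x' = x + a. Cost is the summed squared distance to goal."""
    x, cost = x0, 0.0
    for a in actions:
        x = x + a
        cost += (x - goal) ** 2
    return cost

def cem_plan(x0, goal, horizon=3, samples=200, elites=20, iters=30, seed=0):
    """Cross-entropy method: sample plans, keep the best, refit the Gaussian."""
    rng = random.Random(seed)
    mu = [0.0] * horizon
    sigma = [2.0] * horizon
    for _ in range(iters):
        population = [
            [rng.gauss(m, s) for m, s in zip(mu, sigma)] for _ in range(samples)
        ]
        population.sort(key=lambda acts: rollout_cost(x0, acts, goal))
        top = population[:elites]
        # Refit per-step mean and std to the elite plans (floor avoids collapse to 0).
        mu = [mean(col) for col in zip(*top)]
        sigma = [max(pstdev(col), 1e-3) for col in zip(*top)]
    return mu

plan = cem_plan(x0=0.0, goal=3.0)
final_state = sum(plan)  # x0 is 0.0, so the final state is the sum of the actions
```

In an MPC loop only the first action of `plan` would be executed before replanning from the new state.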

Tech Stack

LLM / Agent Frameworks

LangChain LangGraph HuggingFace TRL PEFT MCP Function Calling Outlines

Models

Qwen Llama Mistral Gemma Phi Qwen-VL

Retrieval / RAG

FAISS BGE BM25 Reranker HyDE Vector DB

Training / Post-Training

PyTorch QLoRA DPO KTO GRPO DeepSpeed Accelerate FSDP

Deployment / Serving

vLLM FastAPI Docker AWS SageMaker Redis

Evaluation / MLOps

wandb MLflow LLM-as-Judge pytest

General

Python Bash Git Linux Hydra


GitHub Stats

Runhao's GitHub stats

Top Languages

GitHub Streak


Open to full-time LLM Agent Engineer roles · 2026

runhaoli@usc.edu  ·  LinkedIn  ·  GitHub

Popular repositories

  1. PHINO (Public)

    Python 1

  2. dmapo (Public)

    Direct multi-agent policy optimization: unified DPO/KTO/ORPO/SimPO framework.

    Python 1

  3. DiffArena (Public)

    Python

  4. Research_agent (Public)

    TeX

  5. acm-icl (Public)

    Autonomy-calibrated multi-agent in-context learning with vLLM multi-LoRA serving.

    Python

  6. latent-agent-team (Public)

    Budgeted latent communication for multi-agent LLM teams.

    Python