LLM Agent Engineer · MS CS @ University of Southern California
I design and post-train multi-agent LLM systems — from learned inter-agent communication protocols to data-centric alignment pipelines to production vLLM serving stacks. My focus is on agent teams that are reliable, interpretable, and cheap to deploy end-to-end.
Current interests: multi-agent orchestration · post-training (DPO / KTO / GRPO) · vLLM multi-LoRA serving · agent evaluation harnesses · learned latent communication protocols.
latent-agent-team · Budgeted Multi-Agent Communication
Five-agent team (Planner · Retriever · Browser · Verifier · Memory) that replaces natural-language inter-agent messages with learned latent channels — continuous embeddings or VQ codes with an adaptive bitrate scheduler.
Results · Mind2Web 81.5% ElemAcc · WebShop 72.4% SR · AgentBench 66.8% SR
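As a rough illustration of the VQ-code channel with a bit budget, here is a minimal sketch. It is not the repository's code: the nearest-neighbour quantizer, the `encode_message` helper, the toy codebook, and the fixed `bit_budget` (standing in for the adaptive bitrate scheduler) are all assumptions made for the example.

```python
import math

def quantize(vec, codebook):
    """Return the index of the nearest codebook entry (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: dist2(vec, codebook[i]))

def encode_message(chunks, codebook, bit_budget):
    """Quantize as many chunks as fit in bit_budget (hypothetical helper).

    Each code costs ceil(log2(K)) bits for a codebook of size K; chunks
    beyond the budget are dropped.
    """
    bits_per_code = max(1, math.ceil(math.log2(len(codebook))))
    max_codes = bit_budget // bits_per_code
    return [quantize(c, codebook) for c in chunks[:max_codes]]

# Toy 2-D codebook with 4 entries, so each code costs 2 bits.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
chunks = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]

# A 4-bit budget leaves room for two codes; the third chunk is not sent.
codes = encode_message(chunks, codebook, bit_budget=4)
print(codes)
```

In a learned system the codebook would be trained and the budget would vary per message; this sketch only shows the discrete-code and bit-cost bookkeeping.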
acm-icl · Autonomy-Calibrated Multi-Agent In-Context Learning
Four-stage inference pipeline (Solver → Skeptic → Verifier → Calibrated Judge) with DD-CoT structured reasoning and per-peer reliability scoring for epistemic robustness under adversarial peer pressure.
Results · 73.9% average across 5 peer-pressure benchmarks · +13.7 pp over the strongest multi-agent-debate baseline (MAD)
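One way to picture per-peer reliability scoring is a reliability-weighted vote, sketched below. This is an assumed simplification, not the acm-icl pipeline itself: the `weighted_vote` function, the peer names, and the reliability values are hypothetical.

```python
from collections import defaultdict

def weighted_vote(answers, reliability):
    """Aggregate peer answers, weighting each vote by that peer's reliability.

    answers: mapping of peer name to its answer.
    reliability: mapping of peer name to a weight in [0, 1] (assumed given).
    Returns the winning answer and its share of the total weight.
    """
    scores = defaultdict(float)
    for peer, answer in answers.items():
        scores[answer] += reliability.get(peer, 0.0)
    best = max(scores, key=scores.get)
    total = sum(scores.values())
    confidence = scores[best] / total if total > 0 else 0.0
    return best, confidence

# Three low-reliability peers push "A"; one high-reliability solver says "B".
answers = {"solver": "B", "peer_1": "A", "peer_2": "A", "peer_3": "A"}
reliability = {"solver": 0.9, "peer_1": 0.2, "peer_2": 0.2, "peer_3": 0.2}

best, conf = weighted_vote(answers, reliability)
print(best, round(conf, 2))
```

A plain majority vote would pick "A" here; weighting by reliability lets one well-calibrated agent withstand a larger but less reliable group.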
dmapo · Data-centric Multi-Agent Preference Optimization
Six-stage data-centric alignment pipeline — prompts → on-policy generation → three-judge multi-agent scoring (Qwen3-8B) → process critic → confidence gating → KTO. Unified trainer supporting DPO / KTO / ORPO / SimPO / SFT.
Results · Mistral-7B trained on only 1,871 gated examples (3.45% accept rate) beats every baseline trained on 10–20k examples: MT-Bench 7.62 · AlpacaEval 96.3% · win-rate 85.3% vs. 68.2% for the best baseline
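The confidence-gating step can be sketched as a filter over multi-judge scores that emits unpaired desirable / undesirable labels, which is the kind of signal KTO trains on. Everything specific here is an assumption for illustration: the 1-10 score scale, the thresholds, the `gate` function, and the output field names.

```python
from statistics import mean, pstdev

def gate(examples, min_margin=2.0, max_spread=1.0, midpoint=5.0):
    """Keep only examples where the judges agree and the verdict is clear.

    An example passes when the judges' score spread (population std. dev.)
    is at most max_spread and the mean is at least min_margin away from
    the midpoint. Survivors get a binary label.
    """
    kept = []
    for ex in examples:
        avg = mean(ex["scores"])
        spread = pstdev(ex["scores"])
        if spread <= max_spread and abs(avg - midpoint) >= min_margin:
            kept.append({
                "prompt": ex["prompt"],
                "completion": ex["completion"],
                "label": avg > midpoint,  # True = desirable
            })
    return kept

examples = [
    {"prompt": "p1", "completion": "c1", "scores": [9, 8, 9]},  # clear accept
    {"prompt": "p2", "completion": "c2", "scores": [5, 6, 4]},  # ambiguous mean
    {"prompt": "p3", "completion": "c3", "scores": [2, 9, 5]},  # judges disagree
    {"prompt": "p4", "completion": "c4", "scores": [1, 2, 1]},  # clear reject
]

kept = gate(examples)
print([(ex["prompt"], ex["label"]) for ex in kept])
```

Strict gates like this trade dataset size for label quality, in line with the low accept rate reported above.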
| Repo | Summary |
|---|---|
| updr-reasoning | Uncertainty-Prompted Debate and Repair — adaptive-compute multi-persona reasoning with uncertainty-gated self-repair |
| RAMTL | Role-Adaptive Multi-Tool Learning — single-backbone multi-role agent framework for tool use and function calling |
| DEAMS | Decentralized Epistemic Alignment for Multimodal Swarms — MA-GRPO across heterogeneous Qwen-VL / InternVL agents |
| PAGC | Partner-Adaptive Grounded Communication — cooperative MARL with emergent text-grounded communication |
| KTM-WM | Training-free kernel-trick world models for LLM agent planning (beam / MPC / CEM planners) |
LLM / Agent Frameworks
Models
Retrieval / RAG
Training / Post-Training
Deployment / Serving
Evaluation / MLOps
General
Open to full-time LLM Agent Engineer roles · 2026