news | Yixiao HUANG

Jun 12, 2026	Introducing Agent’s Last Exam, a large-scale benchmark evaluating AI agents on long-horizon, economically valuable professional tasks. Check out our findings on the importance of the harness versus the model, and how we confirm the downgrade of Claude Fable 5.
Jun 12, 2026	Two papers have been accepted to ICML 2026: "Multi-Objective Learning for Diffusion Models: A Statistical Theory under Semi-Supervised Learning" and "Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge" (Spotlight).
Jun 12, 2026	A new preprint on implicit Chain-of-Thought is out: "Transformers Provably Learn to Internalize Chain-of-Thought".
Oct 31, 2025	I’m looking for a research internship in Summer 2026. I’d be happy to connect if you’re interested in my research!
Sep 18, 2025	Three papers have been accepted to NeurIPS 2025: "OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models" (D&B Track); "Generalization or Hallucination? Understanding Out-of-Context Reasoning in Transformers"; and "Understanding and Improving Fast Adversarial Training against $\ell_0$ Bounded Perturbations".