news

Jun 12, 2026 Introducing Agent’s Last Exam, a large-scale benchmark evaluating AI agents on long-horizon, economically valuable professional tasks. Check out our findings on the importance of the harness versus the model, and how we confirm the downgrade of Claude Fable 5.
Jun 12, 2026 Two papers have been accepted to ICML 2026:
Jun 12, 2026 A new preprint on implicit Chain-of-Thought is out!
Oct 31, 2025 I’m looking for a research internship in Summer 2026. I’d be happy to connect if you’re interested in my research!
Sep 18, 2025 Three papers have been accepted to NeurIPS 2025: