news
| Jun 12, 2026 | Introducing Agent’s Last Exam, a large-scale benchmark that evaluates AI agents on long-horizon, economically valuable professional tasks. See what we find about the importance of the harness versus the model, and how we confirm the downgrade of Claude Fable 5. |
|---|---|
| Jun 12, 2026 | Two papers have been accepted to ICML 2026: |
| Jun 12, 2026 | A new preprint on implicit Chain-of-Thought is out! |
| Oct 31, 2025 | I’m looking for a research internship in Summer 2026. I’d be happy to connect if you’re interested in my research! |
| Sep 18, 2025 | Three papers have been accepted to NeurIPS 2025: |