Yixiao HUANG

Announcement_11

June 12, 2026

Introducing Agent’s Last Exam, a large-scale benchmark that evaluates AI agents on long-horizon, economically valuable professional tasks. Check out our findings on the importance of the harness versus the model, and how we confirm the downgrade of Claude Fable 5.

© Copyright 2026 Yixiao HUANG. Powered by Jekyll with al-folio theme. Last updated: June 12, 2026.