Frontier AI Models Struggle With Humanity’s Last Exam Benchmark
The field of artificial intelligence (AI) has been evolving rapidly, with new advances arriving almost daily. Yet even as these developments continue to impress, frontier AI models have run into significant difficulty on an ambitious new benchmark: “Humanity’s Last Exam” (HLE). Developed by the Center for AI Safety and Scale AI, the benchmark tests AI systems across a diverse range of academic disciplines to gauge how well they handle complex, specialized tasks. Even the most advanced models have struggled to make meaningful progress on it, and the results have raised questions …