The Arc Prize Foundation, a nonprofit co-founded by prominent AI researcher François Chollet, announced in a blog post on Monday that it has created a new, challenging test to measure the general ...
The most sophisticated AI models in existence today have scored poorly on a new benchmark designed to measure their progress towards artificial general intelligence (AGI) – and brute-force computing ...
Kolena, a startup building tools to test, benchmark and validate the performance of AI models, today announced that it raised $15 million in a funding round led by Lobby Capital with participation ...
OpenAI just released the full version of its new o1 model, and it's dangerously committed to lying. Apollo Research tested six frontier models for "in-context scheming": a model's ability to take ...
We all know AI isn't perfect. Is it safe to involve large language models when the stakes are this high? At Black Hat, a team from MITRE explains how they're stress-testing today's top LLMs. When the ...
We propose a Vuong-type model-selection test for models defined by conditional moment restrictions. The moment restrictions that define the models can be standard equality restrictions that ...