Techmeme: Artificial Analysis announces AA-Omniscience, a benchmark for knowledge and hallucination across 40+ topics; Claude 4.1 Opus takes first place in its key metric. From the announcement: "Announcing AA-Omniscience, our new benchmark for knowledge and hallucination across >40 topics, where all but three models are more likely to hallucinate than give a correct answer."
AA-Omniscience: Evaluating Cross-Domain Knowledge Reliability. We introduce AA-Omniscience, a benchmark designed to measure both factual recall and knowledge calibration across 6,000 questions. Questions are derived from authoritative academic and industry sources and cover 42 economically relevant topics within six domains.
New benchmark reveals major weakness in leading AI models. Anthropic’s Claude 4.1 Opus ranked highest, demonstrating a stronger balance of accuracy and truthfulness than competitors such as xAI’s Grok 4 and OpenAI’s GPT-5.1. According to Artificial Analysis …
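The "key metric" referenced above combines accuracy with a hallucination penalty, which is how most models can land below zero. Here is a minimal sketch of such a scoring rule, assuming +1 for a correct answer, -1 for a hallucinated (incorrect) one, and 0 for an abstention, scaled to a -100 to 100 range; the function name and exact weights are illustrative assumptions, not Artificial Analysis's published formula:

```python
from typing import Iterable

def omniscience_index(results: Iterable[str]) -> float:
    """Hypothetical scoring rule: +1 for a correct answer, -1 for an
    incorrect (hallucinated) answer, 0 for an abstention, scaled to
    [-100, 100]. Assumed for illustration only."""
    scores = {"correct": 1, "incorrect": -1, "abstain": 0}
    results = list(results)
    return 100 * sum(scores[r] for r in results) / len(results)

# A model that hallucinates more often than it answers correctly
# scores below zero under this rule.
print(omniscience_index(["correct", "incorrect", "incorrect", "abstain"]))  # -25.0
```

Under a rule like this, abstaining is strictly better than guessing wrong, which is what makes the metric a measure of calibration as well as recall.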