Artificial Analysis Intelligence Index

A composite benchmark score (0–100) from Artificial Analysis that combines reasoning, knowledge, math and science evals into one comparable intelligence number.

The Intelligence Index is an independent composite score published by Artificial Analysis. It blends multiple hard evaluations, including GPQA Diamond (graduate-level science), Humanity's Last Exam, instruction following, and math/coding tasks, into a single number that tracks general capability.

Because every model is run through the same harness, scores are directly comparable across labs. That makes the index more robust than any single benchmark and harder to game than self-reported scores.

As a rough guide on today's scale: 50+ is frontier-class, 35–50 is strong production quality, 20–35 is capable for routine tasks, and below 20 suits narrow or high-volume budget work.
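
The bands above can be turned into a simple lookup. The following is a minimal sketch, not anything published by Artificial Analysis: the function name `tier_for_score` is hypothetical, and because the stated ranges share their endpoints (35 and 50 each appear in two bands), the sketch assumes a boundary score belongs to the higher band.

```python
def tier_for_score(score: float) -> str:
    """Map a 0-100 Intelligence Index score to the rough tiers in the text.

    Assumption: a score exactly on a boundary (20, 35, 50) is placed in the
    higher band, since the source ranges overlap at their endpoints.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 50:
        return "frontier-class"
    if score >= 35:
        return "strong production quality"
    if score >= 20:
        return "capable for routine tasks"
    return "narrow or high-volume budget work"


if __name__ == "__main__":
    # Illustrative scores only, not real model results.
    for example in (12, 27.5, 42, 61):
        print(example, "->", tier_for_score(example))
```

Since the guide is tied to "today's scale", the thresholds would likely need revisiting as models improve, so they are best treated as adjustable rather than fixed.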
