modelgrep

Qwen: Qwen3 235B A22B Thinking 2507 vs xAI: Grok 4.20

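Qwen: Qwen3 235B A22B Thinking 2507 wins on more metrics (5 of 9), but the right pick depends on what you optimize for. It is far cheaper per token and has lower latency, while xAI: Grok 4.20 offers a much larger context window and vision support. See the breakdown below.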

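| Metric | Qwen: Qwen3 235B A22B Thinking 2507 | xAI: Grok 4.20 |
| --- | --- | --- |
| Intelligence Index | 29.5 | 29.7 |
| Coding Index | 23.2 | 25.4 |
| GPQA Diamond | 79% | 79% |
| Design Arena Elo | 1097 | - |
| Speed (tokens/sec) | 72 | 81 |
| Latency | 508 ms | 701 ms |
| Input price (USD per 1M tokens) | $0.100 | $1.25 |
| Output price (USD per 1M tokens) | $0.100 | $2.50 |
| Context window (tokens) | 262K | 2M |
| Capabilities | Reasoning, Tools, JSON | Reasoning, Tools, JSON, Vision |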
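
To make the price gap concrete, the listed input and output prices can be turned into a per-request cost. The following is a minimal sketch, not an official pricing calculator: the `PRICES` dictionary and the `request_cost` helper are illustrative names, and the workload of 10,000 input tokens and 2,000 output tokens is an assumed example, not a figure from this page.

```python
# Prices in USD per million tokens, copied from the comparison above.
PRICES = {
    "Qwen3 235B A22B Thinking 2507": {"input": 0.100, "output": 0.100},
    "Grok 4.20": {"input": 1.25, "output": 2.50},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload (an assumption): 10,000 input and 2,000 output tokens.
    for model in PRICES:
        cost = request_cost(model, 10_000, 2_000)
        print(f"{model}: ${cost:.4f} per request")
```

Under that assumed workload the sketch gives roughly $0.0012 per request for Qwen3 235B A22B Thinking 2507 and $0.0175 for Grok 4.20. Actual bills depend on how each provider counts tokens, which this page does not specify.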