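Qwen: Qwen3 235B A22B Thinking 2507 leads on 5 of the 9 scored metrics and xAI: Grok 4.20 on the other 4, but the right pick depends on what you optimize for. Qwen is far cheaper and has lower latency; Grok scores slightly higher on the Intelligence and Coding indices, generates tokens faster, and has a much larger context window. Two of Qwen's wins are soft: GPQA Diamond is a tie at the displayed precision, and Design Arena Elo has no Grok score to compare against. See the breakdown below.
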
| Metric | Qwen: Qwen3 235B A22B Thinking 2507 | xAI: Grok 4.20 |
|---|---|---|
| Intelligence Index | 29.5 | 29.7✓ |
| Coding Index | 23.2 | 25.4✓ |
| GPQA Diamond | 79%✓ | 79% |
| Design Arena Elo | 1097✓ | — |
| Speed (tokens/sec) | 72 | 81✓ |
| Latency | 508 ms✓ | 701 ms |
| Input price (USD per 1M tokens) | $0.10✓ | $1.25 |
| Output price (USD per 1M tokens) | $0.10✓ | $2.50 |
| Context window | 262K | 2M✓ |
| Capabilities | Reasoning, Tools, JSON | Reasoning, Tools, JSON, Vision |
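
To see what the price gap means in practice, the per-million-token prices in the table can be turned into a rough cost estimate. The sketch below is illustrative only: the `PRICES` dictionary and `estimate_cost` helper are hypothetical names, the workload of 10M input and 2M output tokens is an assumed example, and the prices are simply copied from the table rather than fetched from any provider API.

```python
# Prices in USD per 1M tokens, copied from the comparison table.
# The dictionary and function names are illustrative, not part of any provider API.
PRICES = {
    "Qwen3 235B A22B Thinking 2507": {"input": 0.10, "output": 0.10},
    "Grok 4.20": {"input": 1.25, "output": 2.50},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for a given token volume."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed example workload: 10M input tokens and 2M output tokens.
    for model in PRICES:
        cost = estimate_cost(model, 10_000_000, 2_000_000)
        print(f"{model}: ${cost:.2f}")
```

Under this assumed workload the estimate comes to about $1.20 for Qwen and $17.50 for Grok. Actual spend depends on how many tokens each model really consumes for the same task, which the table does not report.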