StepFun: Step 3.7 Flash wins on more metrics (5 of 9), but the right pick depends on what you optimize for — see the breakdown below.
| Metric | OpenAI: GPT-5.1-Codex | StepFun: Step 3.7 Flash |
|---|---|---|
| Intelligence Index | 43.1✓ | 42.6 |
| Coding Index | 36.6 | 37.1✓ |
| GPQA Diamond | 86%✓ | 81% |
| Design Arena Elo | 1218 | 1232✓ |
| Speed (tokens/sec) | 94✓ | 74 |
| Latency | 3.5s | 1.8s✓ |
| Input price /M | $1.25 | $0.200✓ |
| Output price /M | $10.00 | $1.15✓ |
| Context window | 400K✓ | 256K |
| Capabilities | ReasoningToolsJSONVision | ReasoningToolsJSONVision |