modelgrep

OpenAI: GPT-5.1-Codex vs StepFun: Step 3.7 Flash

StepFun: Step 3.7 Flash wins on more metrics (5 of 9), but the right pick depends on what you optimize for — see the breakdown below.

MetricOpenAI: GPT-5.1-CodexStepFun: Step 3.7 Flash
Intelligence Index43.142.6
Coding Index36.637.1
GPQA Diamond86%81%
Design Arena Elo12181232
Speed (tokens/sec)9474
Latency3.5s1.8s
Input price /M$1.25$0.200
Output price /M$10.00$1.15
Context window400K256K
CapabilitiesReasoningToolsJSONVisionReasoningToolsJSONVision