Mistral Medium 3.5 is the best Mistral model for coding, scoring 35.4 on the Artificial Analysis Coding Index, which spans benchmarks like SWE-bench and SciCode. Devstral 2 2512 (23.7) and Mistral Large 3 2512 (22.7) round out the top three.
This page ranks AI models by the Artificial Analysis Coding Index, which measures real-world software engineering ability across benchmarks like SWE-bench, SciCode and terminal tasks. It highlights the best LLMs for code generation, debugging and agentic development.
Devstral 2 2512 (23.7) is the closest alternative to Mistral Medium 3.5 on this metric, followed by Mistral Large 3 2512 (22.7). See the full ranking above for the tradeoffs.
modelgrep tracks 19 Mistral models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Mistral Medium 3.5. Nine of the 19 qualify for this ranking.