modelgrep

Cheapest LLMs

Quick answer · Updated June 2026

The cheapest LLM is Ling-2.6-flash at $0.010 per million input tokens. Granite 4.0 Micro ($0.017) and Llama 3.1 8B Instruct ($0.020, tied with Mistral Nemo) round out the top three.

$0.010 Input /M
26.2 Intelligence
153 t/s Speed
262K Context

AI models ranked by input token price, cheapest first. The list covers the most affordable large language model APIs, from budget open-weight models to discounted frontier models.

  1. ling-2.6-flash
    Tools, JSON · 26.2 intel · 153 t/s · 871ms ttft
    $0.010 Input /M
  2. granite-4.0-h-micro
    7.7 intel · 29 t/s · 454ms ttft
    $0.017 Input /M
  3. llama-3.1-8b-instruct
    Tools, JSON · 11.8 intel · 147 t/s · 141ms ttft
    $0.020 Input /M
  4. mistral-nemo
    Tools, JSON · 76 t/s · 272ms ttft · 131K ctx
    $0.020 Input /M
  5. llama-3.2-1b-instruct
    6.3 intel · 83 t/s · 312ms ttft
    $0.027 Input /M
  6. gpt-oss-20b
    Reasoning, Tools, JSON · 24.5 intel · 348 t/s · 235ms ttft
    $0.029 Input /M
  7. lfm-2-24b-a2b
    10.5 intel · 52 t/s · 215ms ttft
    $0.030 Input /M
  8. nova-micro-v1
    Tools · 10.3 intel · 73 t/s · 293ms ttft
    $0.035 Input /M
  9. command-r7b-12-2024
    JSON · 55 t/s · 239ms ttft · 128K ctx
    $0.037 Input /M
  10. gpt-oss-120b
    Reasoning, Tools, JSON · 24.5 intel · 450 t/s · 181ms ttft
    $0.039 Input /M
  11. qwen-2.5-7b-instruct
    79 t/s · 384ms ttft · 131K ctx
    $0.040 Input /M
  12. l3-lunaris-8b
    JSON · 70 t/s · 147ms ttft · 8K ctx
    $0.040 Input /M
  13. trinity-mini
    Reasoning, Tools, JSON · 42 t/s · 694ms ttft · 131K ctx
    $0.045 Input /M
  14. qwen3-30b-a3b-instruct-2507
    Tools, JSON · 15.0 intel · 69 t/s · 242ms ttft
    $0.048 Input /M
  15. granite-4.1-8b
    Tools, JSON · 12.4 intel · 73 t/s · 197ms ttft
    $0.050 Input /M
  16. nemotron-3-nano-30b-a3b
    Reasoning, Tools, JSON · 24.3 intel · 141 t/s · 434ms ttft
    $0.050 Input /M
  17. gpt-5-nano
    Reasoning, Tools, JSON, +1 · 25.9 intel · 100 t/s · 2.1s ttft
    $0.050 Input /M
  18. qwen3-8b
    Reasoning, Tools, JSON · 10.6 intel · 4 t/s · 615ms ttft
    $0.050 Input /M
  19. gemma-3-4b-it
    JSON, Vision · 6.3 intel · 20 t/s · 529ms ttft
    $0.050 Input /M
  20. gemma-3-12b-it
    Tools, JSON, Vision · 8.8 intel · 32 t/s · 538ms ttft
    $0.050 Input /M
  21. mistral-small-24b-instruct-2501
    JSON · 38 t/s · 322ms ttft · 33K ctx
    $0.050 Input /M
  22. llama-3.2-3b-instruct
    72 t/s · 262ms ttft · 131K ctx
    $0.051 Input /M
  23. gemma-4-26b-a4b-it
    Reasoning, Tools, JSON, +1 · 31.2 intel · 56 t/s · 366ms ttft
    $0.060 Input /M
  24. glm-4.7-flash
    Reasoning, Tools, JSON · 30.1 intel · 39 t/s · 343ms ttft
    $0.060 Input /M
  25. gemma-3n-e4b-it
    24 t/s · 383ms ttft · 33K ctx
    $0.060 Input /M
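
The ranking above amounts to a sort on input price, optionally narrowed by capability badges. The following is a minimal sketch of that idea using a few rows copied from the list. The `Model` dataclass and the `cheapest` helper are hypothetical names chosen for illustration and are not part of any modelgrep API; breaking price ties by name is likewise an arbitrary assumption, since the page does not state its tie-break rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    input_price: float  # USD per million input tokens
    badges: frozenset = frozenset()  # capability badges shown in the list


# A handful of rows copied from the ranking, deliberately out of order.
MODELS = [
    Model("gpt-oss-20b", 0.029, frozenset({"Reasoning", "Tools", "JSON"})),
    Model("ling-2.6-flash", 0.010, frozenset({"Tools", "JSON"})),
    Model("granite-4.0-h-micro", 0.017),
    Model("llama-3.1-8b-instruct", 0.020, frozenset({"Tools", "JSON"})),
    Model("command-r7b-12-2024", 0.037, frozenset({"JSON"})),
]


def cheapest(models, required=frozenset()):
    """Return models carrying every required badge, cheapest first.

    Ties on price are broken by name (an assumption made for this sketch).
    """
    matching = [m for m in models if set(required) <= m.badges]
    return sorted(matching, key=lambda m: (m.input_price, m.name))


if __name__ == "__main__":
    for m in cheapest(MODELS, required={"Tools", "JSON"}):
        print(f"{m.name}: ${m.input_price:.3f} /M")
```

Filtering before sorting matters when a required capability rules out the cheapest entries: granite-4.0-h-micro, for example, lists no badges and drops out as soon as tool calling is required.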

Frequently asked

What is the cheapest LLM?

The cheapest LLM is Ling-2.6-flash at $0.010 per million input tokens. Granite 4.0 Micro ($0.017) and Llama 3.1 8B Instruct ($0.020, tied with Mistral Nemo) round out the top three.
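
To put per-million-token prices in concrete terms, here is a minimal sketch that converts a listed input price into the cost of a workload. The `input_cost_usd` helper and the 500-million-token workload are illustrative assumptions. Output tokens are ignored, since this ranking lists input prices only and output is typically billed at a separate rate.

```python
from decimal import Decimal

# Input prices in USD per million tokens, copied from the ranking.
INPUT_PRICE_PER_M = {
    "ling-2.6-flash": Decimal("0.010"),
    "granite-4.0-h-micro": Decimal("0.017"),
    "llama-3.1-8b-instruct": Decimal("0.020"),
}


def input_cost_usd(model: str, input_tokens: int) -> Decimal:
    """Cost of the input tokens only (hypothetical helper for this sketch)."""
    return INPUT_PRICE_PER_M[model] * Decimal(input_tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    workload = 500_000_000  # assumed workload: 500 million input tokens
    for model in INPUT_PRICE_PER_M:
        print(f"{model}: ${input_cost_usd(model, workload):.2f}")
```

At this scale the gap between first and third place is a few dollars, so other columns in the ranking (intelligence, speed, latency) may matter more than the price difference itself.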

What's a good alternative to Ling-2.6-flash?

Granite 4.0 Micro ($0.017) is the closest alternative on input price, followed by Llama 3.1 8B Instruct and Mistral Nemo (both $0.020). See the full ranking above for the tradeoffs in intelligence, speed, and latency.
