modelgrep
Explainer

LLM Pricing Explained: Understanding Token Costs

How pricing works and strategies to reduce your API costs.

LLM APIs charge by the token, but what does that actually mean for your costs? Here's how pricing works and how to estimate what you'll pay.

What's a Token?

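A token is roughly 4 characters or 0.75 words in English. The sentence "Hello, how are you today?" is about 7 tokens. Exact counts depend on the model's tokenizer. Code typically uses more tokens than prose of the same length because punctuation, symbols, and indentation often become separate tokens.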

Rule of thumb: 1,000 tokens ≈ 750 words ≈ 1-2 pages of text.
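The rule of thumb above can be turned into a quick estimator. This is a minimal sketch that only applies the 4-characters and 0.75-words heuristics from the text; the function names are illustrative, and a real tokenizer will give different counts, especially for code or non-English text.

```python
import math

CHARS_PER_TOKEN = 4      # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text


def estimate_tokens_from_chars(text: str) -> int:
    """Estimate tokens from character length (about 4 characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count (about 0.75 words per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_chars("Hello, how are you today?"))  # 7
print(estimate_tokens_from_words(750))                          # 1000
```

Treat the result as a budgeting estimate only; for billing-accurate numbers, use the token counts your provider reports.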

Input vs Output Pricing

Most models charge differently for input (your prompt) and output (the response):

  • Input tokens: What you send (system prompt, user message, context)
  • Output tokens: What the model generates

Output tokens are typically 2-5x more expensive than input tokens. Token for token, a long response costs more than an equally long prompt, so output length is often the first thing to control.

Example Cost Calculation

For a model priced at $1/M input, $3/M output:

  • You send 500 tokens (input): $0.0005
  • Model responds with 200 tokens (output): $0.0006
  • Total cost per request: $0.0011

At 10,000 requests per day, that's $11/day or ~$330/month.
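The same arithmetic as a small function, in case you want to plug in your own numbers. This is a sketch: the function name is illustrative, prices are in dollars per million tokens as in the example, and the monthly figure assumes a 30-day month.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Cost in dollars of one request; prices are dollars per million tokens."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost + output_cost


per_request = request_cost(500, 200, input_price_per_m=1.0, output_price_per_m=3.0)
per_day = per_request * 10_000   # 10,000 requests per day
per_month = per_day * 30         # assumes a 30-day month

print(f"per request: ${per_request:.4f}")  # per request: $0.0011
print(f"per day:     ${per_day:.2f}")      # per day:     $11.00
print(f"per month:   ${per_month:.2f}")    # per month:   $330.00
```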

Cost Reduction Strategies

1. Shorten Your Prompts

Every token in your system prompt is charged on every request. Trim unnecessary instructions. Be concise.

2. Limit Output Length

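Set a maximum output length (often a parameter named max_tokens) to prevent runaway responses. If you need 100 words, don't allow 1,000. The cap truncates the response rather than making the model more concise, so also ask for a short answer in the prompt.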
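One way to pick a cap is to convert the word count you need into tokens and add some headroom. The sketch below assumes the 0.75 words-per-token rule of thumb and a 25% headroom factor; both the helper names and the headroom value are illustrative choices, not provider recommendations.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb for English text


def max_tokens_for_words(target_words: int, headroom: float = 1.25) -> int:
    """Suggest an output cap for a target word count, with headroom
    so that normal responses are not cut off mid-sentence."""
    return math.ceil(target_words / WORDS_PER_TOKEN * headroom)


def worst_case_output_cost(max_tokens: int, output_price_per_m: float) -> float:
    """Upper bound on output cost per request, in dollars, for a given cap."""
    return max_tokens * output_price_per_m / 1_000_000


cap = max_tokens_for_words(100)
print(cap)                                # 167
print(worst_case_output_cost(cap, 3.0))   # bound at $3/M output
print(worst_case_output_cost(1000, 3.0))  # bound if 1,000 tokens are allowed
```

The cap gives you a hard ceiling on output spend per request, which makes monthly budgets easier to reason about.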

3. Use Cheaper Models for Simple Tasks

Route simple queries to smaller, cheaper models. Save expensive models for complex reasoning.
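A router can be as simple as a few heuristics in front of your API call. The sketch below is one possible approach, not a standard technique: the model names, the keyword list, and the length threshold are all hypothetical placeholders you would tune against your own traffic.

```python
CHEAP_MODEL = "small-model"      # hypothetical model name
EXPENSIVE_MODEL = "large-model"  # hypothetical model name

# Hypothetical hints that a query needs multi-step reasoning.
COMPLEX_HINTS = ("explain why", "step by step", "prove", "debug", "compare")


def choose_model(query: str, max_simple_chars: int = 200) -> str:
    """Return the model to use: long queries or queries containing a
    'complex' hint go to the expensive model, everything else to the cheap one."""
    text = query.lower()
    if len(text) > max_simple_chars:
        return EXPENSIVE_MODEL
    if any(hint in text for hint in COMPLEX_HINTS):
        return EXPENSIVE_MODEL
    return CHEAP_MODEL


print(choose_model("What is the capital of France?"))       # small-model
print(choose_model("Debug this stack trace step by step"))  # large-model
```

Keyword rules like these misroute some queries, so it is worth logging which model handled each request and spot-checking quality.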

4. Cache Common Responses

If users ask similar questions, cache responses instead of re-querying the API.
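A minimal sketch of response caching, assuming exact matches after light normalization (lowercasing and collapsing whitespace). The class name and the call_model callable are hypothetical stand-ins for your own API wrapper; matching questions that are merely similar in meaning would need something more involved, which is not shown here.

```python
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Lowercase and collapse whitespace so trivially different
    phrasings map to the same cache key."""
    return " ".join(question.lower().split())


class ResponseCache:
    """In-memory cache placed in front of a model call."""

    def __init__(self, call_model: Callable[[str], str]) -> None:
        self._call_model = call_model      # hypothetical API wrapper
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def ask(self, question: str) -> str:
        key = normalize(question)
        if key in self._store:
            self.hits += 1                 # served from cache, no API cost
            return self._store[key]
        self.misses += 1                   # paid request
        answer = self._call_model(question)
        self._store[key] = answer
        return answer
```

For example, wrapping a function with ResponseCache(my_api_call) and calling ask("What is a token?") followed by ask("  what is a TOKEN? ") results in a single paid request. In production you would likely add expiry and a size limit.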

Hidden Costs to Watch

  • System prompts multiply. A 500-token system prompt sent 10,000 times = 5M input tokens.
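  • Context accumulation. Chat apps that send the full history resend every earlier turn with each new message, so input tokens grow with every turn and long conversations get expensive fast. Consider summarizing older turns.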
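  • Retries. Every retry resends the full prompt, so you pay for those input tokens again. Whether the failed attempt itself is billed depends on the provider and on how the request failed.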
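To see how context accumulation adds up, the sketch below totals the input tokens for one conversation in which every request sends the system prompt plus the full history. It assumes fixed token counts per user message and per reply, which is a simplification; the function name and the example numbers are illustrative.

```python
def conversation_input_tokens(turns: int, system_tokens: int,
                              user_tokens: int, reply_tokens: int) -> int:
    """Total input tokens billed across a conversation when each
    request resends the system prompt and the entire history."""
    total = 0
    history = 0
    for _ in range(turns):
        history += user_tokens             # new user message joins the history
        total += system_tokens + history   # the whole history is sent as input
        history += reply_tokens            # the reply joins the history too
    return total


with_history = conversation_input_tokens(10, system_tokens=500,
                                         user_tokens=50, reply_tokens=200)
without_history = 10 * (500 + 50)  # each request sends only system prompt + message

print(with_history)     # 16750
print(without_history)  # 5500
```

In this example, ten turns with full history cost about three times the input tokens of ten independent requests, and the gap widens as conversations get longer.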