qwen/qwen3-vl-8b-instruct
Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...
| Provider | In $/M | Out $/M | Context | Uptime |
|---|---|---|---|---|
| Novitafp8 | $0.080 | $0.500 | 131K | 98.2% |
| AtlasCloudfp8 | $0.080 | $0.500 | 128K | 98.8% |
| Alibaba | $0.117 | $0.455 | 131K | 99.7% |
| Parasailbf16 | $0.250 | $0.750 | 262K | 99.7% |
Qwen3 VL 8B Instruct costs $0.080 per million input tokens and $0.500 per million output tokens via OpenRouter, making it 38th cheapest of 298 paid models.
Qwen3 VL 8B Instruct scores 14.3 on the Artificial Analysis Intelligence Index, ranking 150th of 178 benchmarked models, with a GPQA Diamond score of 43%.
Qwen3 VL 8B Instruct generates around 54 tokens per second with 462ms time-to-first-token (p50), the 154th fastest tracked model.
Qwen3 VL 8B Instruct supports a 256K-token context window and can output up to 33K tokens. It accepts image, text input.