qwen/qwen3-vl-30b-a3b-instruct
Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding of images and videos. The Instruct variant is optimized for instruction following on general multimodal tasks. It excels in perception...
| Provider | Quantization | Input ($/M tokens) | Output ($/M tokens) | Context | Uptime |
|---|---|---|---|---|---|
| Alibaba | not listed | $0.130 | $0.520 | 131K | 100% |
| AtlasCloud | fp8 | $0.150 | $0.600 | 128K | 98.7% |
| DeepInfra | fp8 | $0.150 | $0.600 | 262K | 100% |
| Novita | bf16 | $0.200 | $0.700 | 131K | 97.1% |
| Phala | not listed | $0.200 | $0.700 | 128K | 94.9% |
| SiliconFlow | fp8 | $0.290 | $1.00 | 262K | 96.6% |
Qwen3 VL 30B A3B Instruct costs as little as $0.130 per million input tokens and $0.520 per million output tokens via OpenRouter (the Alibaba rates in the table), making it the 70th cheapest of 298 paid models.
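Because prices are quoted per million tokens, a workload's cost is (input tokens × input price + output tokens × output price) / 1,000,000. The sketch below applies that to every provider in the table and picks the cheapest. It is illustrative only: the prices are copied from the table and will drift, the workload size is a made-up example, and the helper names (`request_cost`, `cheapest`) are hypothetical. It models nothing beyond the two per-token prices shown.

```python
# Prices copied from the provider table: (input $/M tokens, output $/M tokens).
# Illustrative snapshot only; real prices change over time.
PROVIDERS = {
    "Alibaba": (0.130, 0.520),
    "AtlasCloud": (0.150, 0.600),
    "DeepInfra": (0.150, 0.600),
    "Novita": (0.200, 0.700),
    "Phala": (0.200, 0.700),
    "SiliconFlow": (0.290, 1.00),
}


def request_cost(input_tokens: int, output_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in USD: scale each token count by its per-million-token price."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> tuple[str, float]:
    """Return (provider, cost) with the lowest cost for this workload.

    Ties go to whichever provider appears first in PROVIDERS.
    """
    costs = {
        name: request_cost(input_tokens, output_tokens, inp, out)
        for name, (inp, out) in PROVIDERS.items()
    }
    best = min(costs, key=costs.get)
    return best, costs[best]


if __name__ == "__main__":
    # Hypothetical monthly workload: 10M input tokens, 2M output tokens.
    for name, (inp, out) in PROVIDERS.items():
        print(f"{name:12s} ${request_cost(10_000_000, 2_000_000, inp, out):.2f}")
    print(cheapest(10_000_000, 2_000_000))
```

For that example workload the Alibaba rates work out to about $2.34, versus about $4.90 at the SiliconFlow rates.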
Qwen3 VL 30B A3B Instruct scores 16.0 on the Artificial Analysis Intelligence Index, ranking 138th of 179 benchmarked models, with a GPQA Diamond score of 70%.
Qwen3 VL 30B A3B Instruct generates around 46 tokens per second with a median (p50) time-to-first-token of 361 ms, making it the 180th fastest tracked model.
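The two speed figures combine into a rough end-to-end estimate: wait for the first token, then stream the remaining output at the quoted throughput. A minimal sketch, assuming throughput stays constant for the whole response (real serving conditions vary, and these are median figures); `estimated_response_time` is a hypothetical helper, not part of any API.

```python
def estimated_response_time(output_tokens: int,
                            ttft_s: float = 0.361,
                            tokens_per_s: float = 46.0) -> float:
    """Rough seconds until the last token arrives.

    Assumes a fixed time-to-first-token followed by constant-rate streaming.
    Defaults are the p50 figures quoted above; override them for other providers.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft_s + output_tokens / tokens_per_s


if __name__ == "__main__":
    # A 460-token answer takes roughly 0.361 s + 10 s under these assumptions.
    print(f"{estimated_response_time(460):.3f} s")
```

At these rates a maximal 33K-token output would take on the order of 12 minutes, which is worth keeping in mind when setting client timeouts.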
Qwen3 VL 30B A3B Instruct supports a context window of up to 262K tokens (some providers in the table cap it at 128K or 131K) and can output up to 33K tokens. It accepts text and image input.
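Before sending a request it can help to check that it fits these limits. A minimal sketch, assuming (as is usual for decoder-only models) that prompt tokens and generated tokens share the same context window; the exact token counts behind "262K" and "33K" are approximated as round numbers, and the helper name is hypothetical. Counting how many tokens an image consumes is not modeled here.

```python
CONTEXT_WINDOW = 262_000  # approximation of the quoted "262K"; exact value assumed
MAX_OUTPUT = 33_000       # approximation of the quoted "33K"; exact value assumed


def fits_in_context(prompt_tokens: int, max_output_tokens: int,
                    context_window: int = CONTEXT_WINDOW,
                    output_cap: int = MAX_OUTPUT) -> bool:
    """True if the output request is within the cap and prompt + output fit the window.

    Pass a smaller context_window (e.g. 128_000) for providers that cap it lower.
    """
    if prompt_tokens < 0 or max_output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return (max_output_tokens <= output_cap
            and prompt_tokens + max_output_tokens <= context_window)


if __name__ == "__main__":
    print(fits_in_context(200_000, 30_000))                          # full window
    print(fits_in_context(200_000, 30_000, context_window=128_000))  # 128K provider
```

Taking the window as a parameter rather than a constant reflects the table: the same request can fit at one provider and be rejected at another.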