Together AI
Cheap, fast inference for open models — Llama 3.3 70B at $0.88 per million tokens
About
Together AI is a managed inference platform for open-source models — Llama, Mistral, Qwen, DeepSeek, FLUX — at a fraction of OpenAI/Anthropic prices. It offers pay-as-you-go per-token billing, dedicated GPU deployments from $3.99/hr, and fine-tuning from $0.48 per million tokens. The breadth of the model catalog is the differentiator: Together typically supports new open-model releases on day one, where competitors lag.
Best for: Builders running open-source LLMs in production who don't want to manage GPUs. Pick Together AI when you want a wide model catalog at predictable per-token prices, with dedicated GPU as an option once volume justifies it.
Pricing
| Tier | Monthly | Annual /mo | Billing | Notes |
|---|---|---|---|---|
| Serverless inference | — | — | usage | Per-token pricing across the model catalog; Llama 3.3 70B: $0.88/M tokens; vision and audio model pricing; image generation per-image; Batch API ~50% cheaper; OpenAI-compatible endpoints · Pay-as-you-go. Most builders start here. |
| Dedicated inference | — | — | usage | Single-tenant GPU deployments; $3.99-$9.95/hr (H100, H200, B200); reserved capacity for 6+ days at discounted rates · Hourly billing. Move here when serverless rate limits become a constraint. |
| GPU clusters | — | — | usage | On-demand: $3.49-$7.49/hr; reserved 6+ days: $2.55-$7.15/hr; bare-metal training and inference · For teams running training or large-scale custom deployments. |
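A quick way to sanity-check the serverless-to-dedicated switch is to find the sustained throughput at which an hourly GPU matches per-token spend. A minimal sketch using the figures above (the break-even framing is an illustration, not Together's own guidance):

```python
def breakeven_tokens_per_hour(gpu_rate_per_hr: float, price_per_m_tokens: float) -> float:
    """Tokens per hour at which a dedicated GPU costs the same as serverless per-token billing."""
    return gpu_rate_per_hr / price_per_m_tokens * 1_000_000

# Llama 3.3 70B serverless at $0.88/M tokens vs. an H100 at $3.99/hr
tokens = breakeven_tokens_per_hour(3.99, 0.88)
print(f"break-even at ~{tokens / 1e6:.2f}M tokens/hour")  # ~4.53M tokens/hour
```

Below roughly 4.5M tokens/hour of sustained load, serverless stays cheaper; above it, the dedicated hourly rate starts to win (ignoring rate limits and utilization gaps, which usually push the real break-even higher).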
Key features
- OpenAI-compatible API for drop-in replacement
- Wide model catalog (Llama, Mistral, Qwen, DeepSeek, FLUX)
- Dedicated inference at $3.99-$9.95/hr
- Fine-tuning from $0.48 per million tokens
- Image generation from $0.0019/image
- Audio TTS and Whisper transcription
- Batch API at ~50% off serverless prices
- On-demand GPU clusters from $3.49/hr
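Because the endpoints are OpenAI-compatible, a chat request is the familiar JSON shape pointed at Together's base URL. A stdlib-only sketch of the request shape (the model id is illustrative, and the request is built but not sent):

```python
import json
import urllib.request

BASE_URL = "https://api.together.xyz/v1"  # OpenAI-compatible base URL

payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # illustrative model id
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
}

req = urllib.request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer $TOGETHER_API_KEY",  # substitute a real key
        "Content-Type": "application/json",
    },
)
# urllib.request.urlopen(req) would return the usual OpenAI-style
# {"choices": [{"message": {...}}]} response body.
```

In practice the same swap works with the official OpenAI SDK by setting its base URL to Together's endpoint, which is what "drop-in replacement" means here.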
Trust & compliance
- Stage range: —
- Founded: 2022
- Status: active
- SOC 2: unknown
- GDPR: unknown
- Data residency: unknown
- External rating: —
- Last verified: May 2026
Related tools in AI agent infrastructure
- Ollama (3.6): The easiest way to run open language models locally
- Pinecone (3.4): Reference vector database for RAG and semantic search — Starter tier is free up to 2GB
- Hugging Face (3.3): The model hub the open-source AI ecosystem runs on — free Spaces, $9 PRO, $20/user Team
- Replicate (3.2): Run, fine-tune, and deploy AI models with one line of code
Pairs well with
- ChatGPT (LLM chat): General-purpose AI assistant from OpenAI
- Cursor (coding): AI-first code editor
- ManyChat (support): AI chat automation for Messenger/IG/SMS