Together AI
Cheap, fast inference for open models — Llama 3.3 70B at $0.88 per million tokens
Overall score
About
Together AI is a managed inference platform for open-source models — Llama, Mistral, Qwen, DeepSeek, FLUX — at a fraction of OpenAI/Anthropic prices. Pay-as-you-go per-token billing, dedicated GPU deployments from $3.99/hr, and fine-tuning from $0.48 per million tokens. The catalog of supported models is the differentiator: Together stays current on day-one drops where competitors lag.
Best for: Builders running open-source LLMs in production who don't want to manage GPUs. Pick Together AI when you want a wide model catalog at predictable per-token prices, with dedicated GPU as an option once volume justifies it.
Pricing
Serverless inference
- Monthly
- n/a
- Annual /mo
- n/a
- Billing
- usage
- Notes
- Per-token pricing across model catalog;Llama 3.3 70B: $0.88/M tokens;Vision and audio model pricing;Image generation per-image;Batch API ~50% cheaper;OpenAI-compatible endpoints · Pay-as-you-go. Most builders start here.
Dedicated inference
- Monthly
- n/a
- Annual /mo
- n/a
- Billing
- usage
- Notes
- Single-tenant GPU deployments;$3.99-$9.95/hr (H100, H200, B200);Reserved capacity for 6+ days at discounted rates · Hourly billing. Move here when serverless rate-limits become a constraint.
GPU clusters
- Monthly
- n/a
- Annual /mo
- n/a
- Billing
- usage
- Notes
- On-demand: $3.49-$7.49/hr;Reserved 6+ days: $2.55-$7.15/hr;Bare-metal training and inference · For teams running training or large-scale custom deployments.
| Tier | Monthly | Annual /mo | Billing | Notes |
|---|---|---|---|---|
| Serverless inference | n/a | n/a | usage | Per-token pricing across model catalog;Llama 3.3 70B: $0.88/M tokens;Vision and audio model pricing;Image generation per-image;Batch API ~50% cheaper;OpenAI-compatible endpoints · Pay-as-you-go. Most builders start here. |
| Dedicated inference | n/a | n/a | usage | Single-tenant GPU deployments;$3.99-$9.95/hr (H100, H200, B200);Reserved capacity for 6+ days at discounted rates · Hourly billing. Move here when serverless rate-limits become a constraint. |
| GPU clusters | n/a | n/a | usage | On-demand: $3.49-$7.49/hr;Reserved 6+ days: $2.55-$7.15/hr;Bare-metal training and inference · For teams running training or large-scale custom deployments. |
Startup offer
Included in hub programs
Key features
- OpenAI-compatible API for drop-in replacement
- Wide model catalog (Llama, Mistral, Qwen, DeepSeek, FLUX)
- Dedicated inference at $3.99-$9.95/hr
- Fine-tuning from $0.48 per million tokens
- Image generation from $0.0019/image
- Audio TTS and Whisper transcription
- Batch API at ~50% off serverless prices
- On-demand GPU clusters from $3.49/hr
Integrations
Trust & compliance
- Stage range
- MVP → Growth
- Founded
- 2022
- Status
- active
- SOC 2
- unknown
- GDPR
- unknown
- Data residency
- unknown
- External rating
- n/a
- Last verified
- Jun 2026
Reviews
Be the first to share your experience.