Together AI

Cheap, fast inference for open models — Llama 3.3 70B at $0.88 per million tokens

together.ai ↗

API

Visit site

Overall score

2.8/ 5

SME Fit3/5

usage-metered pricing + free trial

JTBD4/5

solid named JTBD

Integration3/5

API

Trust3/5

growing, founded 2022

Quality1/5

no public rating

Compliance2/5

compliance unknown

About

Together AI is a managed inference platform for open-source models — Llama, Mistral, Qwen, DeepSeek, FLUX — at a fraction of OpenAI/Anthropic prices. Pay-as-you-go per-token billing, dedicated GPU deployments from $3.99/hr, and fine-tuning from $0.48 per million tokens. The catalog of supported models is the differentiator: Together stays current on day-one drops where competitors lag.

Best for: Builders running open-source LLMs in production who don't want to manage GPUs. Pick Together AI when you want a wide model catalog at predictable per-token prices, with dedicated GPU as an option once volume justifies it.

Pricing

Tier	Monthly	Annual /mo	Billing	Notes
Serverless inference	—	—	usage	Per-token pricing across model catalog;Llama 3.3 70B: $0.88/M tokens;Vision and audio model pricing;Image generation per-image;Batch API ~50% cheaper;OpenAI-compatible endpoints · Pay-as-you-go. Most builders start here.
Dedicated inference	—	—	usage	Single-tenant GPU deployments;$3.99-$9.95/hr (H100, H200, B200);Reserved capacity for 6+ days at discounted rates · Hourly billing. Move here when serverless rate-limits become a constraint.
GPU clusters	—	—	usage	On-demand: $3.49-$7.49/hr;Reserved 6+ days: $2.55-$7.15/hr;Bare-metal training and inference · For teams running training or large-scale custom deployments.

Key features

OpenAI-compatible API for drop-in replacement
Wide model catalog (Llama, Mistral, Qwen, DeepSeek, FLUX)
Dedicated inference at $3.99-$9.95/hr
Fine-tuning from $0.48 per million tokens
Image generation from $0.0019/image
Audio TTS and Whisper transcription
Batch API at ~50% off serverless prices
On-demand GPU clusters from $3.49/hr

Integrations

OpenAI-compatible APILangChainLlamaIndexHeliconeOpenRouter

Trust & compliance

Stage range: —
Founded: 2022
Status: active
SOC 2: unknown
GDPR: unknown
Data residency: unknown
External rating: —
Last verified: May 2026

Reviews

Be the first to share your experience.

Related tools in agent_infra

See all ai agent infrastructure →

Ollama3.6
The easiest way to run open language models locally
Pinecone3.4
Reference vector database for RAG and semantic search — Starter tier is free up to 2GB
Hugging Face3.3
The model hub the open-source AI ecosystem runs on — free Spaces, $9 PRO, $20/user Team
Replicate3.2
Run, fine-tune, and deploy AI models with one line of code

Pairs well with

ChatGPTllm_chat
General-purpose AI assistant from OpenAI
Cursorcoding
AI-first code editor
ManyChatsupport
AI chat automation for Messenger/IG/SMS

Compared to other tools

Groq vs Together AI