← Back to all tools

Together AI

Cheap, fast inference for open models — Llama 3.3 70B at $0.88 per million tokens

API
Visit site

Overall score

2.8/ 5
SME Fit3/5usage-metered pricing + free trial
JTBD4/5solid named JTBD
Integration3/5API
Trust3/5growing, founded 2022
Quality1/5no public rating
Compliance2/5compliance unknown

About

Together AI is a managed inference platform for open-source models — Llama, Mistral, Qwen, DeepSeek, FLUX — at a fraction of OpenAI/Anthropic prices. Pay-as-you-go per-token billing, dedicated GPU deployments from $3.99/hr, and fine-tuning from $0.48 per million tokens. The catalog of supported models is the differentiator: Together stays current on day-one drops where competitors lag.

Best for: Builders running open-source LLMs in production who don't want to manage GPUs. Pick Together AI when you want a wide model catalog at predictable per-token prices, with dedicated GPU as an option once volume justifies it.

Pricing

TierMonthlyAnnual /moBillingNotes
Serverless inferenceusagePer-token pricing across model catalog;Llama 3.3 70B: $0.88/M tokens;Vision and audio model pricing;Image generation per-image;Batch API ~50% cheaper;OpenAI-compatible endpoints · Pay-as-you-go. Most builders start here.
Dedicated inferenceusageSingle-tenant GPU deployments;$3.99-$9.95/hr (H100, H200, B200);Reserved capacity for 6+ days at discounted rates · Hourly billing. Move here when serverless rate-limits become a constraint.
GPU clustersusageOn-demand: $3.49-$7.49/hr;Reserved 6+ days: $2.55-$7.15/hr;Bare-metal training and inference · For teams running training or large-scale custom deployments.

Key features

  • OpenAI-compatible API for drop-in replacement
  • Wide model catalog (Llama, Mistral, Qwen, DeepSeek, FLUX)
  • Dedicated inference at $3.99-$9.95/hr
  • Fine-tuning from $0.48 per million tokens
  • Image generation from $0.0019/image
  • Audio TTS and Whisper transcription
  • Batch API at ~50% off serverless prices
  • On-demand GPU clusters from $3.49/hr

Integrations

OpenAI-compatible APILangChainLlamaIndexHeliconeOpenRouter

Trust & compliance

Stage range
Founded
2022
Status
active
SOC 2
unknown
GDPR
unknown
Data residency
unknown
External rating
Last verified
May 2026

Reviews

Be the first to share your experience.

Related tools in agent_infra

See all ai agent infrastructure
  • Ollama3.6

    The easiest way to run open language models locally

  • Pinecone3.4

    Reference vector database for RAG and semantic search — Starter tier is free up to 2GB

  • Hugging Face3.3

    The model hub the open-source AI ecosystem runs on — free Spaces, $9 PRO, $20/user Team

  • Replicate3.2

    Run, fine-tune, and deploy AI models with one line of code

Pairs well with

Compared to other tools