Skip to content
Back to all tools

Together AI

Cheap, fast inference for open models — Llama 3.3 70B at $0.88 per million tokens

API
Jack Phillips
Audited by Jack Phillips · Updated June 2026
Visit site

Overall score

2.8/ 5
SME fit3/5
usage-metered pricing + free trial
JTBD4/5
solid named JTBD
Integration3/5
API
Trust3/5
growing, founded 2022
Quality1/5
no public rating
Compliance2/5
compliance unknown

About

Together AI is a managed inference platform for open-source models — Llama, Mistral, Qwen, DeepSeek, FLUX — at a fraction of OpenAI/Anthropic prices. Pay-as-you-go per-token billing, dedicated GPU deployments from $3.99/hr, and fine-tuning from $0.48 per million tokens. The catalog of supported models is the differentiator: Together stays current on day-one drops where competitors lag.

Best for: Builders running open-source LLMs in production who don't want to manage GPUs. Pick Together AI when you want a wide model catalog at predictable per-token prices, with dedicated GPU as an option once volume justifies it.

Pricing

  • Serverless inference

    Monthly
    n/a
    Annual /mo
    n/a
    Billing
    usage
    Notes
    Per-token pricing across model catalog;Llama 3.3 70B: $0.88/M tokens;Vision and audio model pricing;Image generation per-image;Batch API ~50% cheaper;OpenAI-compatible endpoints · Pay-as-you-go. Most builders start here.
  • Dedicated inference

    Monthly
    n/a
    Annual /mo
    n/a
    Billing
    usage
    Notes
    Single-tenant GPU deployments;$3.99-$9.95/hr (H100, H200, B200);Reserved capacity for 6+ days at discounted rates · Hourly billing. Move here when serverless rate-limits become a constraint.
  • GPU clusters

    Monthly
    n/a
    Annual /mo
    n/a
    Billing
    usage
    Notes
    On-demand: $3.49-$7.49/hr;Reserved 6+ days: $2.55-$7.15/hr;Bare-metal training and inference · For teams running training or large-scale custom deployments.

Startup offer

Included in hub programs

Key features

  • OpenAI-compatible API for drop-in replacement
  • Wide model catalog (Llama, Mistral, Qwen, DeepSeek, FLUX)
  • Dedicated inference at $3.99-$9.95/hr
  • Fine-tuning from $0.48 per million tokens
  • Image generation from $0.0019/image
  • Audio TTS and Whisper transcription
  • Batch API at ~50% off serverless prices
  • On-demand GPU clusters from $3.49/hr

Integrations

OpenAI-compatible APILangChainLlamaIndexHeliconeOpenRouter

Trust & compliance

Stage range
MVP → Growth
Founded
2022
Status
active
SOC 2
unknown
GDPR
unknown
Data residency
unknown
External rating
n/a
Last verified
Jun 2026

Reviews

Be the first to share your experience.

Pairs well with

Compared to other tools