Groq
Sub-second LPU inference — Llama 3.1 8B at 840 tokens/sec for $0.05/M input
Overall score
About
Groq runs language models on its own LPU (Language Processing Unit) hardware, optimized for inference speed. Throughput is the differentiator: Llama 3.1 8B at 840 tokens/sec, GPT OSS 20B at 1,000 tokens/sec — multiples faster than GPU-based competitors. Pricing is per-token PAYG at the cheapest end of the inference market.
Best for: Real-time AI experiences where latency matters — voice agents, streaming chat, autocomplete. The free API key + cheap per-token rates make Groq the default speed-comparison benchmark before considering Together AI or OpenAI.
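The "sub-second" claim is easy to sanity-check against the quoted decode rates. A minimal sketch, using only the throughput figures above and ignoring network latency and time-to-first-token (which add real-world overhead):

```python
# Quoted decode throughputs in tokens/sec, taken from the figures above.
RATES = {"llama-3.1-8b-instant": 840, "gpt-oss-20b": 1000}

def completion_seconds(model: str, output_tokens: int) -> float:
    """Seconds to stream `output_tokens` at the model's quoted decode rate."""
    return output_tokens / RATES[model]

# A 500-token reply on the 8B model streams in roughly 0.6 s:
print(round(completion_seconds("llama-3.1-8b-instant", 500), 2))  # → 0.6
```

At these rates, completions under ~800 tokens finish inside one second on the 8B model, which is where the real-time use cases (voice, autocomplete) live.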
Pricing
| Tier | Monthly | Annual /mo | Billing | Notes |
|---|---|---|---|---|
| Pay-as-you-go | Free | Free | usage | Free API key on signup; Llama 3.1 8B Instant: $0.05/M input, $0.08/M output; GPT OSS 20B: $0.075/M input, $0.30/M output; GPT OSS 120B: $0.15/M input, $0.60/M output; Llama 3.3 70B: $0.59/M input, $0.79/M output; OpenAI-compatible API · Linear pricing across the catalog. No platform fee. |
| Enterprise | — | — | flat | Enterprise-only models (Minimax M2.5, Qwen3-VL 32B); On-premises deployments; Custom SLAs; Volume pricing · Contact sales — required for on-prem and gated models. |
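Because the API is OpenAI-compatible, a chat completion is the standard request body POSTed to Groq's base URL. A sketch of the request shape — the model id matches the pricing table, but treat the exact endpoint path and parameters as assumptions to verify against Groq's API docs:

```python
import json

# OpenAI-compatible base URL (verify against Groq's documentation).
GROQ_BASE = "https://api.groq.com/openai/v1"

def chat_request(model: str, user_msg: str, max_tokens: int = 256) -> dict:
    """Build a standard OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_msg}],
        "max_tokens": max_tokens,
        "stream": True,  # stream so the high decode rate shows up as low latency
    }

payload = chat_request("llama-3.1-8b-instant", "Say hello in five words.")
# POST this to {GROQ_BASE}/chat/completions with
# header "Authorization: Bearer $GROQ_API_KEY".
print(json.dumps(payload, indent=2))
```

Existing OpenAI SDK clients can typically be pointed at this endpoint by swapping the base URL and API key, which is what makes side-by-side speed comparisons cheap to run.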
Key features
- Custom LPU hardware (vs GPU competitors)
- Llama 3.1 8B: 840 tokens/sec at $0.05/M input
- Free API key on signup, with no stated usage cap
- OpenAI-compatible endpoints
- Wide model catalog (Llama, GPT OSS, Qwen, DeepSeek)
- Sub-second response times
- Linear, predictable token pricing
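Linear pricing makes cost estimation a one-line calculation. A sketch using the PAYG rates from the pricing table above (dollars per million tokens):

```python
# ($/M input, $/M output) per model, from the PAYG pricing table above.
PRICES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "gpt-oss-20b": (0.075, 0.30),
    "gpt-oss-120b": (0.15, 0.60),
    "llama-3.3-70b": (0.59, 0.79),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request under linear per-token pricing."""
    in_rate, out_rate = PRICES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000

# One million chat turns of ~1k tokens in / 500 out on the 8B model:
print(round(1_000_000 * request_cost("llama-3.1-8b-instant", 1000, 500), 2))  # → 90.0
```

No platform fee and no tiers means this extrapolates directly: a million typical chat turns on the 8B model lands around $90.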
Integrations
Trust & compliance
- Stage range: —
- Founded: 2016
- Status: active
- SOC 2: unknown
- GDPR: unknown
- Data residency: unknown
- External rating: —
- Last verified: May 2026
Reviews
Related tools in agent_infra
See all AI agent infrastructure →
- Ollama (3.6)
  The easiest way to run open language models locally
- Pinecone (3.4)
  Reference vector database for RAG and semantic search; Starter tier is free up to 2GB
- Hugging Face (3.3)
  The model hub the open-source AI ecosystem runs on; free Spaces, $9 PRO, $20/user Team
- Replicate (3.2)
  Run, fine-tune, and deploy AI models with one line of code
Pairs well with
- ChatGPT (llm_chat)
  General-purpose AI assistant from OpenAI
- Cursor (coding)
  AI-first code editor
- ManyChat (support)
  AI chat automation for Messenger/IG/SMS