Groq
Sub-second LPU inference — Llama 3.1 8B at 840 tokens/sec for $0.05/M input
Overall score
About
Groq runs language models on its own LPU (Language Processing Unit) hardware, optimized for inference speed. Throughput is the differentiator: Llama 3.1 8B at 840 tokens/sec, GPT OSS 20B at 1,000 tokens/sec — multiples faster than GPU-based competitors. Pricing is per-token PAYG at the cheapest end of the inference market.
Best for: Real-time AI experiences where latency matters — voice agents, streaming chat, autocomplete. The free API key + cheap per-token rates make Groq the default speed-comparison benchmark before considering Together AI or OpenAI.
Pricing
Pay-as-you-go
- Monthly
- Free
- Annual /mo
- Free
- Billing
- usage
- Notes
- Free API key on signup;Llama 3.1 8B Instant: $0.05/M input, $0.08/M output;GPT OSS 20B: $0.075/M input, $0.30/M output;GPT OSS 120B: $0.15/M input, $0.60/M output;Llama 3.3 70B: $0.59/M input, $0.79/M output;OpenAI-compatible API · Linear pricing across the catalog. No platform fee.
Enterprise
- Monthly
- n/a
- Annual /mo
- n/a
- Billing
- flat
- Notes
- Enterprise-only models (Minimax M2.5, Qwen3-VL 32B);On-premises deployments;Custom SLAs;Volume pricing · Contact sales — required for on-prem and gated models.
| Tier | Monthly | Annual /mo | Billing | Notes |
|---|---|---|---|---|
| Pay-as-you-go | Free | Free | usage | Free API key on signup;Llama 3.1 8B Instant: $0.05/M input, $0.08/M output;GPT OSS 20B: $0.075/M input, $0.30/M output;GPT OSS 120B: $0.15/M input, $0.60/M output;Llama 3.3 70B: $0.59/M input, $0.79/M output;OpenAI-compatible API · Linear pricing across the catalog. No platform fee. |
| Enterprise | n/a | n/a | flat | Enterprise-only models (Minimax M2.5, Qwen3-VL 32B);On-premises deployments;Custom SLAs;Volume pricing · Contact sales — required for on-prem and gated models. |
Key features
- Custom LPU hardware (vs GPU competitors)
- Llama 3.1 8B: 840 tokens/sec at $0.05/M input
- Free API key with no specific limit on signup
- OpenAI-compatible endpoints
- Wide model catalog (Llama, GPT OSS, Qwen, DeepSeek)
- Sub-second response times
- Linear, predictable token pricing
Integrations
Trust & compliance
- Stage range
- Solopreneur → Seed
- Founded
- 2016
- Status
- active
- SOC 2
- unknown
- GDPR
- unknown
- Data residency
- unknown
- External rating
- n/a
- Last verified
- Jun 2026
Reviews
Be the first to share your experience.
Related tools in Agent infrastructure
Pairs well with
People who've discussed Groq
See all people →Curated mentions from podcasts, posts, and public stacks. Editorial coverage; not endorsements.
Chamath Palihapitiya
Founder & CEO; Co-host All-In
Chamath Palihapitiya's Social Capital made one of the early bets on Groq, the LPU inference startup. They put $10M in April 2017 (seed) and another $52.3M in September 2018 as part of a $60M convertible note. At Groq's $20B valuation (Nvidia headline financing in 2026), the position is worth billions and is one of Social Capital's signature deep-tech wins. Chamath consistently champions Groq across All-In segments and X posts.
Daniel Gross
Investor; Partner
Daniel Gross is a public backer of Groq, the Language Processing Unit inference company shipping sub-second Llama at scale. He's named the position on his personal site's portfolio and through NFDG's broader infra thesis: own the AI inference layer rather than the model layer.