
Groq

Sub-second LPU inference — Llama 3.1 8B at 840 tokens/sec for $0.05/M input

API · Free tier

Overall score

3.1/5
SME Fit 3/5 · usage-metered pricing + free tier
JTBD 4/5 · solid named JTBD
Integration 3/5 · API
Trust 5/5 · mature, founded 2016
Quality 1/5 · no public rating
Compliance 2/5 · compliance unknown

About

Groq runs language models on its own LPU (Language Processing Unit) hardware, optimized for inference speed. Throughput is the differentiator: Llama 3.1 8B at 840 tokens/sec, GPT OSS 20B at 1,000 tokens/sec — multiples faster than GPU-based competitors. Pricing is per-token PAYG at the cheapest end of the inference market.

Best for: Real-time AI experiences where latency matters — voice agents, streaming chat, autocomplete. The free API key + cheap per-token rates make Groq the default speed-comparison benchmark before considering Together AI or OpenAI.
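Because Groq exposes an OpenAI-compatible API, calling it is just a matter of pointing a standard chat-completions request at its base URL. A minimal sketch, assuming the base URL (`https://api.groq.com/openai/v1`) and model id (`llama-3.1-8b-instant`) as published in Groq's docs — verify both against the current documentation before relying on them:

```python
import json
import urllib.request

# Assumed Groq OpenAI-compatible base URL (check Groq's docs).
GROQ_BASE = "https://api.groq.com/openai/v1"

def build_chat_request(prompt: str, model: str = "llama-3.1-8b-instant") -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,  # stream tokens for low perceived latency
    }

def send(payload: dict, api_key: str) -> bytes:
    """POST the payload to Groq's chat-completions endpoint."""
    req = urllib.request.Request(
        f"{GROQ_BASE}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if __name__ == "__main__":
    payload = build_chat_request("Say hello in one word.")
    # send(payload, "YOUR_GROQ_API_KEY")  # uncomment with a real key
    print(payload["model"])
```

Swapping in Groq this way is also how the speed comparison against Together AI or OpenAI is typically run: same payload, different base URL and key.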

Pricing

Pay-as-you-go · Free · usage-billed
  • Free API key on signup
  • Llama 3.1 8B Instant: $0.05/M input, $0.08/M output
  • GPT OSS 20B: $0.075/M input, $0.30/M output
  • GPT OSS 120B: $0.15/M input, $0.60/M output
  • Llama 3.3 70B: $0.59/M input, $0.79/M output
  • OpenAI-compatible API · Linear pricing across the catalog. No platform fee.

Enterprise · flat · contact sales
  • Enterprise-only models (Minimax M2.5, Qwen3-VL 32B)
  • On-premises deployments
  • Custom SLAs
  • Volume pricing · Contact sales (required for on-prem and gated models)
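Because the pricing is linear per token with no platform fee, per-request cost is simple arithmetic. A sketch using the rates from the table above (the dictionary keys are informal labels for the table rows, not necessarily Groq's official model ids):

```python
# USD per million tokens (input, output), from the PAYG table above.
RATES = {
    "llama-3.1-8b-instant": (0.05, 0.08),
    "gpt-oss-20b": (0.075, 0.30),
    "gpt-oss-120b": (0.15, 0.60),
    "llama-3.3-70b": (0.59, 0.79),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Linear pricing: tokens / 1M * rate, input and output billed separately."""
    rate_in, rate_out = RATES[model]
    return input_tokens / 1e6 * rate_in + output_tokens / 1e6 * rate_out

# One million chats of ~500 input + 200 output tokens on Llama 3.1 8B:
total = cost_usd("llama-3.1-8b-instant", 500, 200) * 1_000_000
print(f"${total:.2f}")  # $41.00
```

The same 1M-chat workload on Llama 3.3 70B would run about $453, which is why the 8B model is the usual starting point for latency-sensitive, high-volume use.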

Key features

  • Custom LPU hardware (vs GPU competitors)
  • Llama 3.1 8B: 840 tokens/sec at $0.05/M input
  • Free API key on signup (limits not specified)
  • OpenAI-compatible endpoints
  • Wide model catalog (Llama, GPT OSS, Qwen, DeepSeek)
  • Sub-second response times
  • Linear, predictable token pricing
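The quoted throughput figures make the "sub-second" claim easy to sanity-check: a typical ~200-token chat reply at 840 tokens/sec streams out in under a quarter of a second. A back-of-envelope sketch (this ignores network latency and time-to-first-token, so real end-to-end latency will be somewhat higher):

```python
def generation_time_s(tokens: int, tokens_per_sec: float) -> float:
    """Seconds to stream `tokens` at a steady decode rate."""
    return tokens / tokens_per_sec

print(round(generation_time_s(200, 840), 3))   # Llama 3.1 8B: 0.238
print(round(generation_time_s(200, 1000), 3))  # GPT OSS 20B: 0.2
```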

Integrations

OpenAI-compatible API · LangChain · LlamaIndex · Helicone · OpenRouter

Trust & compliance

Founded: 2016
Status: active
SOC 2: unknown
GDPR: unknown
Data residency: unknown
External rating: none (no public rating)
Last verified: May 2026


Related tools in AI agent infrastructure
  • Ollama (3.6)

    The easiest way to run open language models locally

  • Pinecone (3.4)

    Reference vector database for RAG and semantic search — Starter tier is free up to 2GB

  • Hugging Face (3.3)

    The model hub the open-source AI ecosystem runs on — free Spaces, $9 PRO, $20/user Team

  • Replicate (3.2)

    Run, fine-tune, and deploy AI models with one line of code

Pairs well with