
# Awesome Free LLM APIs


LLM APIs with permanent free tiers for text inference.



## Contents

- [Provider APIs](#provider-apis)
- [Inference providers](#inference-providers)
- [Contributing](#contributing)
- [Glossary](#glossary)
- [Notes](#notes)

## Provider APIs

APIs run by the companies that train or fine-tune the models themselves.

### Cohere 🇨🇦

Free "Trial" API key, no credit card. 1,000 API calls/month. Non-commercial use only.

Base URL: `https://api.cohere.com/v2`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Command A (111B) | 256K | 4K | Text | 20 RPM |
| Command R+ | 128K | 4K | Text | 20 RPM |
| Command R | 128K | 4K | Text | 20 RPM |
| Command R7B | 128K | 4K | Text | 20 RPM |
| Embed 4 | — | — | Embeddings (Text + Image) | 2,000 inputs/min |
| Rerank 3.5 | — | — | Reranking | 10 RPM |
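Unlike most entries below, Cohere's v2 API uses its own request schema rather than the OpenAI one. A minimal sketch of building a chat request with only the standard library; the `/chat` path and the `CO_API_KEY` variable name are assumptions, so check Cohere's docs before relying on them:

```python
import json
import os
import urllib.request

BASE_URL = "https://api.cohere.com/v2"

def build_cohere_chat(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Cohere v2 chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('CO_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_cohere_chat("command-r", "Name one Canadian city.")
# with urllib.request.urlopen(req) as resp:  # uncomment with a valid key
#     print(json.load(resp))
```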

### Google Gemini 🇺🇸

Free tier unavailable in EU/UK/Switzerland. Free-tier prompts may be used by Google to improve products.[^1]

Base URL: `https://generativelanguage.googleapis.com/v1beta`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash | 1M | 65K | Text + Image + Audio + Video | 10 RPM, 250 RPD |
| Gemini 2.5 Flash-Lite | 1M | 65K | Text + Image + Audio + Video | 15 RPM, 1,000 RPD |
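Gemini's native endpoint wraps prompts in a `contents`/`parts` structure instead of OpenAI-style messages. A sketch of building a `generateContent` request; the `GEMINI_API_KEY` variable name is an assumption:

```python
import json
import os
import urllib.request

BASE_URL = "https://generativelanguage.googleapis.com/v1beta"

def build_gemini_request(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Gemini generateContent request."""
    payload = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{BASE_URL}/models/{model}:generateContent",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "x-goog-api-key": os.environ.get("GEMINI_API_KEY", ""),
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_gemini_request("gemini-2.5-flash", "Summarize RPM vs RPD in one line.")
```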

### Mistral AI 🇫🇷

Free "Experiment" plan, no credit card. ~1B tokens/month.

Base URL: `https://api.mistral.ai/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Mistral Small 4 | 256K | 256K | Text + Image + Code | ~1 RPS, 500K TPM |
| Mistral Medium 3 | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Mistral Large 3 | 256K | 256K | Text | ~1 RPS, 500K TPM |
| Mistral Nemo (12B) | 128K | 128K | Text | ~1 RPS, 500K TPM |
| Codestral | 256K | 256K | Code | ~1 RPS, 500K TPM |
| Pixtral Large | 128K | 128K | Text + Image | ~1 RPS, 500K TPM |

### Z AI (Zhipu AI) 🇨🇳

Permanently free models, no credit card required.

Base URL: `https://open.bigmodel.cn/api/paas/v4`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| GLM-4.7-Flash | 200K | 128K | Text | 1 concurrent request |
| GLM-4.5-Flash | 128K | ~8K | Text | 1 concurrent request |
| GLM-4.6V-Flash | 128K | ~4K | Text + Image | 1 concurrent request |

## Inference providers

Third-party platforms that host open-weight models from various sources.

### Cerebras 🇺🇸

Free tier, no credit card. Ultra-fast inference (~2,600 tok/s). 1M tokens/day cap.

Base URL: `https://api.cerebras.ai/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| llama3.1-8b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| gpt-oss-120b | 128K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| qwen-3-235b-a22b-instruct-2507 | 131K (8K on free) | 8K | Text | 30 RPM, 14,400 RPD, 1M TPD |
| zai-glm-4.7 | 128K (8K on free) | 8K | Text | 10 RPM, 100 RPD, 1M TPD |

### Cloudflare Workers AI 🇺🇸

10,000 Neurons/day free. 50+ models available on the free tier.

Base URL: `https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| @cf/meta/llama-3.3-70b-instruct-fp8-fast | 131K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/meta/llama-3.1-8b-instruct-fp8-fast | 131K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/meta/llama-3.2-11b-vision-instruct | 131K | Shared w/ context | Text + Vision | 10K neurons/day (shared) |
| @cf/meta/llama-4-scout-17b-16e-instruct | Up to 10M | Shared w/ context | Multimodal | 10K neurons/day (shared) |
| @cf/mistralai/mistral-small-3.1-24b-instruct | 128K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/google/gemma-4-26b-a4b-it | 256K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/qwen/qwq-32b | 32K | Shared w/ context | Text | 10K neurons/day (shared) |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 32K | Shared w/ context | Text | 10K neurons/day (shared) |
| + 42 more models | Varies | Varies | Text, Image, Audio, Embeddings | 10K neurons/day (shared) |
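Workers AI scopes requests per account and per model, so the URL itself carries both. A sketch of assembling the endpoint; the account ID and token values are placeholders:

```python
import json
import urllib.request

BASE = "https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run"

def build_workers_ai_request(account_id: str, model: str, prompt: str, token: str):
    """Assemble the per-account, per-model Workers AI endpoint and request."""
    url = BASE.format(account_id=account_id) + "/" + model
    payload = {"messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_workers_ai_request(
    "0123abcd", "@cf/meta/llama-3.1-8b-instruct-fp8-fast", "Hello", "CF_TOKEN"
)
```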

### GitHub Models 🇺🇸

Free prototyping for all GitHub users. 45+ models. Per-request limits (8K in / 4K out).

Base URL: `https://models.inference.ai.azure.com`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| gpt-4.1 | 1M | 32K | Text | 10 RPM, 50 RPD |
| gpt-4.1-mini | 1M | 32K | Text | 15 RPM, 150 RPD |
| gpt-4o | 128K | 16K | Text + Vision | 10 RPM, 50 RPD |
| o3-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| o4-mini | 200K | 100K | Text (reasoning) | 10 RPM, 50 RPD |
| Llama-4-Scout-17B-16E | 512K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| Llama-4-Maverick-17B-128E | 256K | ~4K | Text + Vision | 10 RPM, 50 RPD |
| Meta-Llama-3.3-70B | 131K | ~4K | Text | 15 RPM, 150 RPD |
| DeepSeek-R1 | 64K | 8K | Text (reasoning) | 15 RPM, 150 RPD |
| Mistral-Small-3.1 | 128K | ~4K | Text + Vision | 15 RPM, 150 RPD |
| + 35 more models | Varies | Varies | Text / Image | Varies by tier |

### Groq 🇺🇸

Free tier, no credit card. Ultra-fast LPU inference.[^2]

Base URL: `https://api.groq.com/openai/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| llama-3.3-70b-versatile | 131K | 32K | Text | 30 RPM, 14,400 RPD |
| llama-3.1-8b-instant | 131K | 131K | Text | 30 RPM, 14,400 RPD |
| llama-4-scout-17b-16e-instruct | 131K | 8K | Text + Vision | 30 RPM, 14,400 RPD |
| llama-4-maverick-17b-128e-instruct | 131K | 8K | Text + Vision | 15 RPM, 500 RPD |
| qwen3-32b | 131K | 131K | Text | 30 RPM, 14,400 RPD |
| gpt-oss-120b | 131K | 32K | Text | 30 RPM, 14,400 RPD |
| kimi-k2-instruct | 262K | 262K | Text | 30 RPM, 14,400 RPD |
| deepseek-r1-distill-70b | 131K | 8K | Text | 30 RPM, 14,400 RPD |
| whisper-large-v3 | — | — | Audio → Text | 20 RPM, 2,000 RPD |
| whisper-large-v3-turbo | — | — | Audio → Text | 20 RPM, 2,000 RPD |

### Hugging Face 🇺🇸

Free Serverless Inference API.

Base URL: `https://api-inference.huggingface.co/models`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Meta-Llama-3.1-8B-Instruct | 128K | ~4K | Text | ~1,000 RPD |
| Mistral-7B-Instruct-v0.3 | 32K | ~4K | Text | ~1,000 RPD |
| Mixtral-8x7B-Instruct-v0.1 | 32K | ~4K | Text | ~1,000 RPD |
| Phi-3.5-mini-instruct | 128K | ~4K | Text | ~1,000 RPD |
| Qwen2.5-7B-Instruct | 131K | ~4K | Text | ~1,000 RPD |

### Kilo Code 🇺🇸

Free models, no credit card required. The `kilo-auto/free` auto-router routes to `minimax/minimax-m2.5:free` (80%) and `stepfun/step-3.5-flash:free` (20%).[^3]

Base URL: `https://api.kilo.ai/api/gateway`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| bytedance-seed/dola-seed-2.0-pro:free | — | — | Text | ~200 req/hr |
| x-ai/grok-code-fast-1:optimized:free | — | — | Text (code) | ~200 req/hr |
| nvidia/nemotron-3-super-120b-a12b:free | 262K | 32K | Text | ~200 req/hr |
| arcee-ai/trinity-large-thinking:free | — | — | Text (reasoning) | ~200 req/hr |
| openrouter/free | Varies | Varies | Text | ~200 req/hr |

### LLM7.io 🇬🇧

Zero-friction API gateway. No registration needed for basic access. 30+ models.

Base URL: `https://api.llm7.io/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| deepseek-r1-0528 | — | — | Text (reasoning) | 30 RPM (120 with token) |
| deepseek-v3-0324 | — | — | Text | 30 RPM (120 with token) |
| gemini-2.5-flash-lite | — | — | Text + Vision | 30 RPM (120 with token) |
| gpt-4o-mini | — | — | Text + Vision | 30 RPM (120 with token) |
| mistral-small-3.1-24b | 32K | — | Text | 30 RPM (120 with token) |
| qwen2.5-coder-32b | — | — | Text (code) | 30 RPM (120 with token) |
| + ~24 more models | Varies | Varies | Text | 30 RPM (120 with token) |

### NVIDIA NIM 🇺🇸

Free with NVIDIA Developer Program membership. 100+ models. No daily token cap.

Base URL: `https://integrate.api.nvidia.com/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| deepseek-ai/deepseek-r1 | 128K | ~163K | Text (reasoning) | ~40 RPM |
| nvidia/llama-3.1-nemotron-ultra-253b-v1 | 128K | 4K | Text | ~40 RPM |
| nvidia/nemotron-3-super-120b-a12b | 262K | 262K | Text | ~40 RPM |
| nvidia/nemotron-3-nano-30b-a3b | 128K | 32K | Text | ~40 RPM |
| meta/llama-3.1-405b-instruct | 128K | 4K | Text | ~40 RPM |
| qwen/qwen2.5-72b-instruct | 128K | 8K | Text | ~40 RPM |
| google/gemma-4-31b | 128K | 8K | Text | ~40 RPM |
| mistralai/mistral-large-2-instruct | 128K | 4K | Text | ~40 RPM |
| nvidia/nemotron-nano-2-vl | 128K | 8K | Vision + Text + Video | ~40 RPM |
| minimax/minimax-m2.7 | 128K | 8K | Text | ~40 RPM |
| + 90 more models | Varies | Varies | Text, Image, Video, Speech, Embeddings | ~40 RPM |

### Ollama Cloud 🇺🇸

Free tier with loosely defined usage limits. 400+ models from the Ollama library. Not OpenAI SDK-compatible; uses the Ollama API.[^4]

Base URL: `https://api.ollama.com`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| llama3.1:cloud | 128K | Model-dependent | Text | Session/weekly limits (unpublished) |
| deepseek-r1:cloud | 128K | Model-dependent | Text (reasoning) | Session/weekly limits (unpublished) |
| qwen2.5:cloud | 128K | Model-dependent | Text | Session/weekly limits (unpublished) |
| gemma2:cloud | 8K | Model-dependent | Text | Session/weekly limits (unpublished) |
| mistral:cloud | 32K | Model-dependent | Text | Session/weekly limits (unpublished) |
| + 400 more models | Varies | Varies | Text | Session/weekly limits (unpublished) |
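Since Ollama Cloud speaks the Ollama API rather than the OpenAI one, requests use a different path and body shape. A sketch assuming the standard Ollama `/api/chat` route and an `OLLAMA_API_KEY` variable (both assumptions; check Ollama's docs):

```python
import json
import os
import urllib.request

BASE_URL = "https://api.ollama.com"

def build_ollama_chat(model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an Ollama-style chat request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
    }
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OLLAMA_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_ollama_chat("llama3.1:cloud", "Hello")
```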

### OpenRouter 🇺🇸

35+ free models (marked with the `:free` suffix). OpenAI SDK-compatible.[^5]

Base URL: `https://openrouter.ai/api/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| deepseek/deepseek-r1-0528:free | 163K | ~163K | Text (reasoning) | 20 RPM, 200 RPD |
| deepseek/deepseek-chat-v3-0324:free | 163K | 163K | Text | 20 RPM, 200 RPD |
| qwen/qwen3.6-plus:free | 1M | 65K | Text | 20 RPM, 200 RPD |
| qwen/qwen3-coder-480b-a35b:free | 262K | ~32K | Text | 20 RPM, 200 RPD |
| meta-llama/llama-4-scout:free | 10M | 16K | Multimodal | 20 RPM, 200 RPD |
| meta-llama/llama-4-maverick:free | 1M | 16K | Multimodal | 20 RPM, 200 RPD |
| meta-llama/llama-3.3-70b-instruct:free | 65K | ~16K | Text | 20 RPM, 200 RPD |
| google/gemma-4-31b-it:free | 256K | ~8K | Multimodal | 20 RPM, 200 RPD |
| nvidia/nemotron-3-super-120b-a12b:free | 1M | ~32K | Text | 20 RPM, 200 RPD |
| openai/gpt-oss-120b:free | 131K | 131K | Text | 20 RPM, 200 RPD |
| minimax/minimax-m2.5:free | 196K | 8K | Text | 20 RPM, 200 RPD |
| mistralai/devstral-2512:free | 256K | ~32K | Text | 20 RPM, 200 RPD |
| + ~23 more free models | Varies | Varies | Text / Image | 20 RPM, 200 RPD |
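OpenRouter's fallback routing lets one request name several models in priority order, so a rate-limited `:free` model can fail over to another. A sketch of such a request body; the `models` field follows OpenRouter's fallback convention, but verify against their current docs before depending on it:

```python
import json
import os
import urllib.request

BASE_URL = "https://openrouter.ai/api/v1"

def build_fallback_request(models, prompt):
    """Chat request listing several models, tried in priority order."""
    payload = {
        "models": models,  # first available model answers; others are fallbacks
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('OPENROUTER_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_fallback_request(
    ["deepseek/deepseek-chat-v3-0324:free", "meta-llama/llama-3.3-70b-instruct:free"],
    "Hello",
)
```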

### SiliconFlow 🇨🇳

Free tier with 14 CNY signup credits. Permanently free models available.

Base URL: `https://api.siliconflow.cn/v1`

| Model Name | Context | Max Output | Modality | Rate Limit |
| --- | --- | --- | --- | --- |
| Qwen/Qwen3-8B | 131K | 131K | Text | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-R1-0528-Qwen3-8B | ~33K | 16K | Text (reasoning) | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 131K | Configurable | Text (reasoning) | 1,000 RPM, 50K TPM |
| THUDM/glm-4-9b-chat | 32K | 32K | Text | 1,000 RPM, 50K TPM |
| THUDM/GLM-4.1V-9B-Thinking | 66K | 66K | Vision + Text | 1,000 RPM, 50K TPM |
| deepseek-ai/DeepSeek-OCR | — | 8K | Vision (OCR) | 1,000 RPM, 50K TPM |
| + embedding/speech models | Varies | Varies | Embeddings, Speech | 1,000 RPM, 50K TPM |

## Contributing

Know a free tier that's missing? Open a PR. Include the provider, endpoint, rate limits (link to their docs), and a few notable models. Trial credits and time-limited promos don't count.

## Glossary

| Abbreviation | Meaning |
| --- | --- |
| RPM | Requests per minute |
| RPD | Requests per day |
| TPM | Tokens per minute |
| TPD | Tokens per day |
| RPS | Requests per second |
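Every tier above enforces some RPM/RPD cap, so client code should expect HTTP 429 responses. A minimal exponential-backoff sketch; the exception class here is a stand-in for whatever your HTTP client raises on 429:

```python
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 from any provider."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Retry `call`, doubling the wait after each rate-limit error."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

With a 20 RPM limit, a base delay of a few seconds usually clears the rate window before the retries run out.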

## Notes

- All endpoints are OpenAI SDK-compatible unless noted.
- Each link points to the provider's API key page.
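Because most of these endpoints share the OpenAI wire format, a single helper can target any of them by swapping the base URL. A sketch using three base URLs from the sections above:

```python
import json
import urllib.request

# Base URLs copied from the sections above; any OpenAI-compatible provider works.
PROVIDERS = {
    "mistral": "https://api.mistral.ai/v1",
    "groq": "https://api.groq.com/openai/v1",
    "openrouter": "https://openrouter.ai/api/v1",
}

def build_chat_completion(provider, model, prompt, api_key):
    """Build an OpenAI-style /chat/completions request for a compatible provider."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{PROVIDERS[provider]}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_completion("groq", "llama-3.1-8b-instant", "Hello", "KEY")
```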

## Footnotes

[^1]: Free tier not available in the EU, UK, or Switzerland (available regions).

[^2]: Groq rate limits vary by model. Llama 4 Maverick is limited to 500 RPD; most other models get 14,400 RPD (rate limits).

[^3]: The Kilo Code free model list may change over time. nvidia/nemotron-3-super-120b-a12b:free is for trial use only; prompts are logged by NVIDIA.

[^4]: Ollama Cloud measures usage by GPU time, not tokens or requests. The free tier is described as "light usage", with session limits resetting every 5 hours and weekly limits every 7 days. Pro (50x more) and Max (250x more) plans are available.

[^5]: Free models default to 200 RPD. A one-time purchase of $10+ in credits unlocks 1,000 RPD for free models. OpenRouter also offers a Free Models Router (openrouter/free) and model fallbacks for chaining models in priority order.