
LLMCache - API Caching Proxy

Never pay for the same LLM call twice.

A high-performance API caching proxy built with FastAPI and SQLAlchemy that provides intelligent caching for HTTP requests. By caching API responses and serving them again when the same request comes in, it cuts costs and improves response times. It is optimized for Large Language Model (LLM) APIs and other usage-based services, where every avoided call is money saved.

Quick start

Set up the env file. You can use the .env.example file as a template:

API_SECRET_TOKEN=my-secret-token
DATABASE_URL=postgresql://postgres:postgres@postgres:5432/proxy_cache

Start LLMCache locally with Docker Compose:

docker compose -f docker-compose.local.yml up -d
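Once the containers are up, you can sanity-check the proxy by sending a request through it directly. A minimal sketch with httpx; it assumes the proxy appends the request path to x-proxy-base-url and forwards the Authorization header to the upstream API:

import httpx

# Lists OpenAI models through the proxy; repeating the same request
# should be answered from the cache.
response = httpx.get(
    "http://localhost:8001/models",
    headers={
        "x-proxy-auth": "my-secret-token",
        "x-proxy-base-url": "https://api.openai.com/v1",
        "Authorization": "Bearer <your-openai-key>",
    },
)
print(response.status_code)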

You can now use it with any API you need to cache, including the most common LLM client libraries. Example for the OpenAI Python SDK:

import openai

# Route OpenAI calls through LLMCache: x-proxy-auth must match API_SECRET_TOKEN,
# and x-proxy-base-url is the upstream API the proxy forwards to.
openai.api_base = "http://localhost:8001"
openai.default_headers = {
    "x-proxy-auth": "my-secret-token",
    "x-proxy-base-url": "https://api.openai.com/v1",
}
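If you use version 1.x of the openai package, the same headers can go on the client constructor instead of the module-level globals. A minimal sketch, assuming OPENAI_API_KEY is set in your environment and using a placeholder model name:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8001",
    default_headers={
        "x-proxy-auth": "my-secret-token",
        "x-proxy-base-url": "https://api.openai.com/v1",
    },
)

# Repeating this exact call should be answered from the cache.
response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you normally call
    messages=[{"role": "user", "content": "Hello"}],
)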

Example for the Anthropic Python SDK:

import anthropic
import httpx

client = anthropic.Anthropic(
    base_url="http://localhost:8001",
    # Attach the proxy headers to every request made by the SDK.
    http_client=httpx.Client(
        headers={
            "x-proxy-auth": "my-secret-token",
            "x-proxy-base-url": "https://api.anthropic.com",
        },
    ),
)
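A quick way to see the cache in action (a sketch with a placeholder model name; the second identical call should be served from the cache instead of hitting the Anthropic API):

# Assumes ANTHROPIC_API_KEY is set in your environment.
message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=100,
    messages=[{"role": "user", "content": "Hello"}],
)
print(message.content)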

How It Works

  1. Incoming requests are authenticated using the X-Proxy-Auth header
  2. A unique cache key is generated based on (see the sketch after this list):
    • HTTP method
    • Path
    • Headers (filtered)
    • Query parameters/POST data
  3. If a valid cached response exists, it's returned immediately
  4. If no cache exists or it's expired, the request is forwarded to the target API
  5. The response is cached for future use
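To make the caching step concrete, here is a minimal sketch of the cache-key idea in Python. The function name and the exact set of filtered headers are hypothetical, not the project's actual implementation:

import hashlib
import json

def cache_key(method: str, path: str, headers: dict, body: bytes) -> str:
    # Drop headers that change on every request or must never influence caching.
    ignored = {"x-proxy-auth", "authorization", "user-agent", "content-length"}
    filtered = {k.lower(): v for k, v in headers.items() if k.lower() not in ignored}
    canonical = json.dumps(
        {
            "method": method,
            "path": path,
            "headers": filtered,
            "body": body.decode("utf-8", "replace"),
        },
        sort_keys=True,
    )
    # Requests that produce the same digest are treated as the same call.
    return hashlib.sha256(canonical.encode()).hexdigest()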

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Testing

To test a local version of LLMCache, use the following command:

docker compose -f docker-compose.test.yml up -d

For the tests to run properly, you will also need to set the additional environment variables listed in the .env.example file.

License

This project is licensed under the MIT License - see the LICENSE file for details.
