Run the llama.cpp server for any model (here, an embedding model):

```bash
llama-server --hf-repo ggml-org/bge-small-en-v1.5-Q8_0-GGUF \
  --hf-file bge-small-en-v1.5-q8_0.gguf -c 2048 --embeddings --port 9997
```

Cerb connects using the standard OpenAI API endpoints.
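Because llama-server exposes OpenAI-compatible routes, any client that speaks the OpenAI API can talk to it directly. A minimal sketch with curl, assuming the server above is running locally on port 9997 (the `model` value is illustrative; llama-server serves whichever model it was launched with):

```bash
# Query the OpenAI-compatible /v1/embeddings endpoint exposed by llama-server.
# Assumes the server launched above is listening on localhost:9997.
curl http://localhost:9997/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input": "hello world", "model": "bge-small-en-v1.5"}'
```

The response is a standard OpenAI embeddings payload (`data[0].embedding` holds the vector), which is why Cerb can consume it without any llama.cpp-specific glue.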