`gguf` is a Bash script for managing and interacting with large language models via llama.cpp. It provides a set of commands for downloading, running, and chatting with AI models, and it maintains a local database of model information.
- Download models from Hugging Face
- Run models and start server instances
- Interactive chat sessions with models
- Manage a local database of model information
- Generate embeddings
- Tokenize and detokenize text
- Monitor running model servers
- Fetch recent and trending GGUF models from Hugging Face
Before you begin, ensure you have the following installed:
- `llama-server` command (macOS: `brew install llama.cpp`)
- `huggingface-cli` command (macOS: `brew install huggingface-cli`)
- `sqlite3` (usually pre-installed on macOS)
- `jq` (macOS: `brew install jq`)
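Before installing, you can sanity-check that the prerequisites are on your `PATH`. This optional snippet is not part of the script itself, just a quick local check:

```shell
# Report which required commands are available (informational only).
for cmd in llama-server huggingface-cli sqlite3 jq; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "ok: $cmd"
  else
    echo "missing: $cmd"
  fi
done
```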
- Clone this repository:

  ```sh
  git clone https://github.com/garyblankenship/gguf.git
  ```

- Make the script executable:

  ```sh
  chmod +x gguf.sh
  ```

- Optionally, add the script to your PATH for easier access.
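One way to put the script on your `PATH` is to export the clone directory from your shell profile. The profile file (`~/.zshrc`) and clone location (`~/gguf`) below are assumptions; adjust them to your shell and setup:

```shell
# Append the clone directory to PATH via the shell profile.
# ~/.zshrc and ~/gguf are assumptions; adjust to your shell and clone path.
echo 'export PATH="$PATH:$HOME/gguf"' >> ~/.zshrc
```

Open a new shell (or `source ~/.zshrc`) for the change to take effect; you may also want to rename or symlink `gguf.sh` to `gguf` so the commands below work as shown.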
Here are some common commands:

```sh
# Download a new model
gguf pull bartowski/Qwen2.5-Math-1.5B-Instruct-GGUF

# List all models
gguf ls

# Start a chat session with a model
gguf chat model-slug

# Generate embeddings
gguf embed model-slug "Your text here"

# Check server health
gguf health

# Show running processes
gguf ps

# Get recent GGUF models from Hugging Face
gguf recent

# Get trending GGUF models from Hugging Face
gguf trending
```

For a full list of commands, run:

```sh
gguf
```

For help with a specific command, use:

```sh
gguf <command> --help
```

For more detailed information about each command and its options, refer to the inline comments in the script or use the `--help` option with any command.
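These commands also compose in your own scripts. As an illustrative sketch only: the helper name below is hypothetical, the repo is the one from the example above, and the assumption that `gguf ls` prints one model slug per line has not been verified against the script:

```shell
# Hypothetical helper: pull a model only if its slug is not already listed.
# Assumes `gguf ls` prints one model slug per line (unverified assumption).
pull_if_missing() {
  repo="$1"
  slug="${repo##*/}"   # strip the owner prefix from owner/repo
  if gguf ls | grep -qF "$slug"; then
    echo "already present: $slug"
  else
    gguf pull "$repo"
  fi
}
```

Called as `pull_if_missing bartowski/Qwen2.5-Math-1.5B-Instruct-GGUF`, it skips the download when the model is already in the local database.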
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
This project is licensed under the MIT License - see the LICENSE file for details.
- llama.cpp for the underlying model server
- Hugging Face for hosting the models
