ConsumerBench is a comprehensive benchmarking framework that evaluates the runtime performance of user-defined GenAI applications under realistic conditions on end-user devices.
```bash
# Clone the repository
git clone https://github.com/your-org/ConsumerBench.git
cd ConsumerBench

# Set up the environment
conda create -n consumerbench python=3.10
conda activate consumerbench
pip install -r requirements.txt
```

1. Follow the setup instructions in `applications/`.
2. Add your own `.yml` workflow in `configs/`.
3. Run the benchmark:

```bash
python src/scripts/run_consumerbench.py --config <path-to-config>
```
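As a sketch of what a workflow config might contain (the field names below are illustrative assumptions, not ConsumerBench's authoritative schema; use the shipped examples in `configs/` as the real reference):

```yaml
# Hypothetical workflow sketch -- field names are assumptions;
# see the examples in configs/ for the actual schema.
workflow:
  - app: Chatbot        # one of the applications under applications/
    device: gpu         # "gpu" or "cpu"
    requests: 50        # number of requests to issue (illustrative)
```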
The benchmark has been tested on the following hardware:

- Setup 1:
  - CPU: Intel(R) Xeon(R) Gold 6126 @ 2.60 GHz
  - CPU cores: 12
  - GPU: NVIDIA RTX 6000
  - System memory: 32 GB

- Setup 2:
  - MacBook Pro (M1)
  - Unified memory: 32 GB
```
ConsumerBench/
├── src/                    # Source code
├── inference_backends/     # Inference backends
├── models/                 # GenAI models
├── applications/           # Applications
├── configs/                # Example user configurations & workflows
└── scripts/                # Result processing and plotting scripts
```
**Chatbot**: Text-to-text generation for chat and Q&A:
- Local backend mimicking the OpenAI API
- Powered by llama.cpp for efficient CPU-GPU co-execution
- Located in `applications/Chatbot`

**DeepResearch**: Agent-based reasoning for complex fact gathering:
- Built on the open-deep-research framework
- Served via LiteLLM
- Located in `applications/DeepResearch`

**ImageGen**: Text-to-image generation optimized for edge devices:
- Uses stable-diffusion-webui in API mode
- Located in `applications/ImageGen`

**LiveCaptions**: Audio-to-text transcription for real-time and offline use:
- Whisper-based backend over HTTP
- Located in `applications/LiveCaptions`
Run the script:

```bash
./scripts/run_benchmark.sh configs/workflow_imagegen.yml 0
```

This script collects:
- GPU metrics: compute/memory bandwidth (via DCGM)
- CPU utilization: via the `stat` utility
- CPU memory bandwidth: via the `pcm-memory` utility
- GPU power: via the NVML utility
- CPU power: via the RAPL utility

Results are saved in the `results/` directory with timestamps, and PDF plots are generated automatically.
To modify Service Level Objectives (SLOs), edit the corresponding script:
- Chatbot: `scripts/result_processing/parse-results-chatbot-log.py`
- DeepResearch: `scripts/result_processing/parse-results-deepresearch-log.py`
- ImageGen: `scripts/result_processing/parse-results-imagegen-log.py`
- LiveCaptions: `scripts/result_processing/parse-results-whisper-log.py`
| Application | Config |
|---|---|
| Chatbot | `configs/workflow_chatbot.yml` |
| LiveCaptions | `configs/workflow_live_captions.yml` |
| ImageGen | `configs/workflow_imagegen.yml` |
CPU-only: Change `device` from `"gpu"` to `"cpu"` in the configs.
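As a minimal sketch, the relevant line in a workflow config changes as follows (the `device` field and its values are named in the docs; the surrounding config structure is assumed):

```yaml
# Switch the benchmark to CPU-only execution
device: "cpu"   # default configs use "gpu"
```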
- Greedy allocation: `configs/workflow_chatbot_imagegen_live_captions.yml`
- GPU partitioning: `configs/workflow_chatbot_imagegen_live_captions_mps.yml`
- Config: `configs/workflow_chatbot_deep_research.yml`
- Edit `example_workflow/llamacpp_server.sh` to add `-c 128000 -nkvo` for Chatbot-KVCache-CPU
- Greedy allocation: `configs/workflow_content_creation.yml`
- GPU partitioning: `configs/workflow_content_creation_mps.yml`