Bachelor thesis project: deploying open-source LLMs with Ollama on AWS cloud infrastructure with GPU support.
Author: Stepan Konecny
Client -> API Gateway (REST, API key auth) -> VPC Link -> NLB -> EC2 (g5.xlarge, NVIDIA GPU) -> Ollama (Docker)
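A client call follows the path above: the request authenticates at API Gateway with an API key and is proxied through the VPC Link and NLB to Ollama's standard `/api/generate` endpoint. A minimal sketch (assuming the gateway stage proxies paths straight through to Ollama; adjust the route and model name to your deployment):

```python
"""Minimal sketch of calling the deployed Ollama API through API Gateway."""
import json
import sys
import urllib.request


def build_request(api_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming /api/generate request with API-key auth."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/api/generate",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "x-api-key": api_key,  # API Gateway usage-plan key
        },
        method="POST",
    )


def generate(api_url: str, api_key: str, model: str, prompt: str) -> str:
    """Send the prompt and return the model's full response text."""
    with urllib.request.urlopen(build_request(api_url, api_key, model, prompt)) as resp:
        return json.load(resp)["response"]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: python client_sketch.py <API_URL> <API_KEY>
    print(generate(sys.argv[1], sys.argv[2], "llama3", "Why is the sky blue?"))
```

The repo's `examples/client.py` provides the full client; this sketch only illustrates the request shape.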
```
ollama-aws/
├── terraform/               # Infrastructure as Code (Terraform)
│   ├── main.tf              # VPC, EC2 (g5.xlarge), NLB, API Gateway, security groups
│   ├── variables.tf         # Configurable parameters (region, AMI, SSH key)
│   └── outputs.tf           # Terraform outputs (instance IP, API URL, API key)
├── scripts/
│   ├── setup-gpu.sh         # EC2 user_data - installs NVIDIA drivers, Docker, Ollama
│   └── deploy-ollama.sh     # Manual redeployment script for the Ollama container
├── examples/
│   ├── client.py            # Python client - generate, chat, list models
│   └── client.js            # JavaScript/Node.js client - same functionality
├── tests/
│   ├── test_api.py          # Functional tests - connectivity, auth, generation, error handling
│   └── benchmark.py         # Performance benchmark - latency, TTFT, throughput
└── docker-compose.yml       # Docker Compose config for Ollama with GPU passthrough
```
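The GPU passthrough in `docker-compose.yml` typically takes this shape (a sketch, not necessarily the repo's exact file; image tag, volume name, and restart policy are assumptions):

```yaml
services:
  ollama:
    image: ollama/ollama:latest
    ports:
      - "11434:11434"               # Ollama's default API port, targeted by the NLB
    volumes:
      - ollama-data:/root/.ollama   # persist downloaded models across restarts
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all            # expose the A10G GPU to the container
              capabilities: [gpu]
    restart: unless-stopped

volumes:
  ollama-data:
```

The `devices` reservation requires the NVIDIA Container Toolkit on the host, which `scripts/setup-gpu.sh` installs.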
| Resource | Details |
|---|---|
| VPC | 10.0.0.0/16, public + private subnet |
| EC2 | g5.xlarge (NVIDIA A10G GPU, 24GB VRAM), Ubuntu 24.04, 100GB gp3 |
| NLB | Internal Network Load Balancer, TCP forwarding to port 11434 |
| API Gateway | REST API with API key authentication, VPC Link to NLB |
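The EC2 row of the table maps to a Terraform resource along these lines (a sketch; variable and resource names are assumptions, not the repo's actual `main.tf`):

```hcl
resource "aws_instance" "ollama" {
  ami           = var.ami_id       # Ubuntu 24.04 AMI for the chosen region
  instance_type = "g5.xlarge"      # 1x NVIDIA A10G, 24 GB VRAM
  subnet_id     = aws_subnet.private.id
  key_name      = var.ssh_key_name
  user_data     = file("${path.module}/../scripts/setup-gpu.sh")

  root_block_device {
    volume_size = 100              # GB
    volume_type = "gp3"
  }
}
```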
```sh
# Deploy infrastructure
cd terraform
terraform init
terraform apply

# Test API
python tests/test_api.py <API_URL> <API_KEY>

# Run benchmark
python tests/benchmark.py <API_URL> <API_KEY>
```
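The benchmark's core metrics can be sketched as follows (hypothetical helpers, not the repo's `benchmark.py`): TTFT is the delay until the first streamed chunk arrives, and throughput is tokens produced per second of generation time after that first chunk.

```python
"""Sketch of streaming-benchmark metrics against Ollama's /api/generate."""
import json
import time
import urllib.request


def summarize(tokens: int, ttft_s: float, total_s: float) -> dict:
    """Derive throughput from token count, time-to-first-token, and total latency."""
    gen_time = total_s - ttft_s
    return {
        "ttft_s": ttft_s,
        "latency_s": total_s,
        "tokens_per_s": tokens / gen_time if gen_time > 0 else 0.0,
    }


def measure(api_url: str, api_key: str, model: str, prompt: str) -> dict:
    """Stream /api/generate, timing the first and last chunks."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": True}).encode()
    req = urllib.request.Request(
        f"{api_url.rstrip('/')}/api/generate",
        data=payload,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )
    start = time.perf_counter()
    ttft = None
    tokens = 0
    with urllib.request.urlopen(req) as resp:
        for line in resp:  # Ollama streams one JSON object per line
            if ttft is None:
                ttft = time.perf_counter() - start
            if not json.loads(line).get("done"):
                tokens += 1  # approximation: one streamed chunk per token
    return summarize(tokens, ttft or 0.0, time.perf_counter() - start)
```

Measuring per-chunk timestamps like this is what lets the benchmark separate queueing/network latency (TTFT) from raw generation speed (tokens/s).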