Closed
73 commits
b14bac9
docs: add comprehensive NetBird with Caddy deployment automation guide
onelrian Jan 29, 2026
6b3b163
fix: h've all vault on the ansible script
onelrian Jan 29, 2026
885e625
fix: solve issue with workflow syntax
onelrian Jan 29, 2026
5d28ea6
fix: resolving syntax issue in workflow
onelrian Jan 29, 2026
db5e226
fix: resolved ansible configuration issue, github workflow issue
onelrian Jan 29, 2026
4b938db
fix: resolved ansible configuration issue, github workflow issue
onelrian Jan 29, 2026
763ddc1
fix: resolved ansible configuration issue, github workflow issue
onelrian Jan 29, 2026
2a7c580
fix: resolved workflow issues
onelrian Jan 29, 2026
5e2d4e3
fix: resolved workflow trigger issue
onelrian Jan 29, 2026
ee121dd
fix: resolved ansible script issue
onelrian Jan 29, 2026
f4878f8
fix: resolved the json issue
Donemmanuelo Jan 30, 2026
cf3dcfa
fix: robust JSON loading and default SSH user
Donemmanuelo Jan 30, 2026
6ff34e8
fix: robust JSON loading and default SSH user
Donemmanuelo Jan 30, 2026
142990b
fix: automatically install docker on target host
Donemmanuelo Jan 30, 2026
0dca6d4
fix: add retries for apt operations to handle lock contention
Donemmanuelo Jan 30, 2026
c35aaf0
fix: keycloak configuration issue
Donemmanuelo Jan 30, 2026
0a7990a
fix: add a new secrets
Donemmanuelo Jan 30, 2026
e3d5d98
fix: handshake issue
Donemmanuelo Jan 30, 2026
e63a76f
fix: the secret access issue
Donemmanuelo Jan 30, 2026
dc3f5e8
fix: resolved workflow and script issues
Donemmanuelo Jan 30, 2026
1892681
fix: added missen secret
Donemmanuelo Jan 30, 2026
fd9bf09
fix: resolved secret issues
Donemmanuelo Jan 30, 2026
d1a9d3c
fix: resolved management container issue
Donemmanuelo Jan 30, 2026
2242233
fix: resolved secrets issue with netbird_turn
Donemmanuelo Jan 30, 2026
d86780d
fix: resolved secret issues in the managemnt container
Donemmanuelo Jan 30, 2026
5d71241
fix: resolved secret issue
Donemmanuelo Jan 30, 2026
fa2eefb
fix: resolved the redirection issue
Donemmanuelo Jan 30, 2026
f678c2e
fix: automated the creation of default user
Donemmanuelo Jan 30, 2026
cbf7780
fix: resolved netbird authentication issue
Donemmanuelo Jan 30, 2026
3848100
feat: added cleanup functionality to the ansible script
Donemmanuelo Jan 31, 2026
0301f23
fix: resolved the cleanup options
Donemmanuelo Jan 31, 2026
275dc0a
fix: updated cleanup script through pipeline
Donemmanuelo Jan 31, 2026
9fa611c
fix: resolved the redirection issue
Donemmanuelo Jan 31, 2026
012a0f2
fix: added placeholder for custom password
Donemmanuelo Jan 31, 2026
9d7d700
fix: added placeholder for custom password
Donemmanuelo Jan 31, 2026
56962f9
fix: refined the documentation to reflect the poroject status
Donemmanuelo Jan 31, 2026
1f8c1ef
fix: resolved the cleanup script issue
Donemmanuelo Jan 31, 2026
565adc2
fix: resolved the nonetype issue
Donemmanuelo Jan 31, 2026
b11f1b0
fix: resolved issue with cleanup script
Donemmanuelo Jan 31, 2026
24f1a5a
fix: resolved the IAM policy for ssm
Donemmanuelo Feb 1, 2026
d5cbab9
fix: resolved issue related to types
Donemmanuelo Feb 2, 2026
dccbb5f
fix: resolved connection issue
Donemmanuelo Feb 2, 2026
3f00be0
fix: resolved token iinvalid issue
Donemmanuelo Feb 2, 2026
30a0949
fix: resolved token iinvalid issue
Donemmanuelo Feb 2, 2026
2c18e0c
fix: resolved missin plugin path issue
Donemmanuelo Feb 2, 2026
e33b764
fix: resolved the cleanup script issue
Donemmanuelo Feb 2, 2026
cf48848
fix: resolved the orchestration flexibility issue
Donemmanuelo Feb 3, 2026
bb5b9c6
fix: resolved the ssm issue
Donemmanuelo Feb 3, 2026
92e6a13
fix: resolved the NoneType issue
Donemmanuelo Feb 3, 2026
ee0d6ac
fix: added a bucket for the ssm plugin
Donemmanuelo Feb 3, 2026
e034e3c
fix: resolved nonetype issue
Donemmanuelo Feb 3, 2026
b2a088a
fix: resolved the credential issue
Donemmanuelo Feb 3, 2026
da8acae
fix: resolved the aws credential issue
Donemmanuelo Feb 3, 2026
675c0bc
fix: resolved ansible lint issue
Donemmanuelo Feb 3, 2026
9cbc313
fix: resolved lint issue
Donemmanuelo Feb 3, 2026
8b14d2f
fix: resolved linting issues
Donemmanuelo Feb 3, 2026
4eeab5b
fix: resolved the credential issue
Donemmanuelo Feb 3, 2026
23dd973
fix: resolved the credential issue
Donemmanuelo Feb 3, 2026
ff522ba
fix: resolved the container network issue
Donemmanuelo Feb 3, 2026
aa1f4da
fix: resolved issue with the cleanup script
Donemmanuelo Feb 3, 2026
d8aca91
fix: resolved the token issue
Donemmanuelo Feb 3, 2026
cb4eeae
fix: resolved the token issue
Donemmanuelo Feb 3, 2026
b323df1
fix: resolved cleanup issue
Donemmanuelo Feb 3, 2026
4f03d9f
fix: resolved secret issue
Donemmanuelo Feb 3, 2026
2c8a9f1
fix: the issue with the idp
Donemmanuelo Feb 3, 2026
6a0f41b
fix: resolved login issue
Donemmanuelo Feb 3, 2026
f40a5d7
docs: enhance infrastructure automation and lifecycle documentation
Donemmanuelo Feb 4, 2026
335e277
fix: resolved the parsing issue
Donemmanuelo Feb 4, 2026
a757d10
fix: resolved the parsing issues
Donemmanuelo Feb 4, 2026
d9a99ab
feat(infra): fix AWS SSM connection and enhance deployment docs
Donemmanuelo Feb 4, 2026
28f4e12
feat(infra): fix AWS SSM connection and enhance deployment docs
Donemmanuelo Feb 4, 2026
81187fa
fix: resolved nonetype issue
Donemmanuelo Feb 4, 2026
ae6f78c
fix: resolved cleanup noetype issue
Donemmanuelo Feb 4, 2026
403 changes: 403 additions & 0 deletions .github/workflows/ansible-deploy.yml

Large diffs are not rendered by default.

4 changes: 4 additions & 0 deletions .gitignore
@@ -53,3 +53,7 @@ monitor-netbird/kubernetes/helm/monitoring-stack/charts/_loki/
.env
.quodo
.vivus

.vault_pass
**/vault_pass
**/.vault_pass
63 changes: 45 additions & 18 deletions README.md
@@ -1,33 +1,60 @@
# NetBird Infrastructure
# NetBird Infrastructure Lifecycle Automation

## Overview
Production-grade Infrastructure-as-Code (IaC) for NetBird deployments, featuring automated lifecycle management, Identity Provider (IdP) orchestration, and secure reverse proxy configuration.

Production-grade infrastructure automation for NetBird deployments.
## 🚀 Key Features

This repository provides comprehensive deployment solutions for NetBird with Caddy setup as reverse proxy.
- **Full Lifecycle Automation**: One-click deployment and complete environment destruction (Cleanup) via Ansible or GitHub Actions.
- **Keycloak IdP Orchestration**: Automatic configuration of Keycloak realms, OIDC clients (Web & Management), API scopes, and protocol mappers.
- **Secure Reverse Proxy**: Integrated Caddy setup with automatic Let's Encrypt TLS.
- **Flexible Provisioning**: Support for SSH Remote hosts and AWS SSM (Systems Manager).
- **Security Hardening**: Automated secret generation, PKCE enforcement, and secure redirect policies.

## Deployment Options
## 🛠 Deployment Options

### 1. Automated Deployment (Recommended)
### 1. GitHub Actions (Production-Ready CI/CD)
The recommended way to manage your NetBird infrastructure. Supports manual triggers and automated push-to-deploy.

Use Ansible to automatically provision NetBird with Caddy, Keycloak, and all required configurations.
- **Action**: `Ansible Deployment`
- **Features**:
  - Toggle between `deploy` and `cleanup`.
  - Targeted deployment to `ssh_remote` or `aws_ssm`.
  - Automatic Keycloak setup using repository secrets.
- **Documentation**: [CI/CD Automation Guide](./docs/automation-guide.md)

- **Guide**: [Ansible Deployment Guide](infrastructure/ansible/README.md)
### 2. Ansible (Self-Hosted Orchestration)
Idempotent and self-healing deployment script for manual execution.

### 2. Quickstart (Test/Dev)
- **Guide**: [Ansible Deployment Guide](./infrastructure/ansible/README.md)
- **Features**: Automatic secret generation and Keycloak API integration.

Quickly bootstrap a full stack including Zitadel IdP using the setup script.
### 3. Quickstart / Legacy Options
For testing or specific manual configurations.

- **Guide**: [Quickstart with Zitadel](infrastructure/scripts/README.md)
- **Zitadel Quickstart**: [Setup Guide](./infrastructure/scripts/README.md)
- **Manual Caddy**: [Manual Deployment](./docs/caddy-deployment.md)

### 3. Manual Deployment
## 🔐 Custom Credentials

Manually deploy NetBird with Caddy reverse proxy on a single host.
You can easily customize the initial admin access by setting these variables (GitHub Secrets or manual inputs):

- **Guide**: [Manual Caddy Deployment Guide](docs/caddy-deployment.md)
- `KEYCLOAK_ADMIN_USER_SECRET`: Admin username (defaults to `admin`).
- `KEYCLOAK_ADMIN_PASSWORD_SECRET`: Custom password for the dashboard user. If left empty, a secure random password will be generated and displayed in the GitHub Actions deployment logs.

## Support and Contributions
*Note: If you update the password secret after deployment, the automation will automatically sync the new password to Keycloak on the next run.*

- Documentation: [docs/](docs/)
- Issues: GitHub issue tracker
- NetBird: [Official documentation](https://docs.netbird.io/)
## 🧹 Cleanup and Reset

The project includes a robust cleanup routine that performs a total reset:
- Stops all containers and **removes all Docker volumes** (including persistent data).
- Deletes the entire Keycloak Realm and associated clients.
- Removes all configuration directories (`/opt/netbird`).
- Resets the `key-netbird` Docker network.

You can trigger this by running the workflow with the `cleanup` action.

## 📚 Documentation

- [**Automation Guide**](./docs/automation-guide.md): Deep dive into the CI/CD and Ansible lifecycle.
- [**Ansible README**](./infrastructure/ansible/README.md): Variable definitions and local usage.
- [**Official NetBird Docs**](https://docs.netbird.io/): NetBird configuration and architecture.
154 changes: 154 additions & 0 deletions docs/automation-guide.md
@@ -0,0 +1,154 @@
# NetBird Automation & Lifecycle Management Guide

This guide provides detailed documentation for the infrastructure automation tools provided in this repository, focusing on Ansible and GitHub Actions.

## Table of Contents
1. [Architecture Overview](#architecture-overview)
2. [Automation Components](#automation-components)
3. [Ansible Playbook Details](#ansible-playbook-details)
4. [GitHub Actions Pipeline](#github-actions-pipeline)
5. [Keycloak Automation](#keycloak-automation)
6. [Peer Provisioning](#peer-provisioning)
7. [Cleanup and Reset](#cleanup-and-reset)
8. [Troubleshooting](#troubleshooting)

---

## Architecture Overview

The automation stack deploys NetBird in a production-grade Infrastructure-as-Code (IaC) configuration using:
- **Caddy**: As a reverse proxy with automatic TLS (Let's Encrypt).
- **Keycloak**: As the Identity Provider (OIDC).
- **Docker Compose**: Orchestrates NetBird services (Management, Signal, Relay, Dashboard).
- **Coturn**: Handles STUN/TURN for NAT traversal.

## Automation Components

- **Ansible Playbook** (`infrastructure/ansible/playbook.yaml`): The core orchestrator that prepares the system, configures Keycloak, generates service configurations, and starts the stack.
- **Keycloak Setup Script** (`infrastructure/scripts/keycloak-setup.sh`): A bash utility used by both the playbook and the CI/CD pipeline to automate Keycloak realm and client creation.
- **GitHub Workflow** (`.github/workflows/ansible-deploy.yml`): Provides a "Push-to-Deploy" experience with support for manual triggers and environment cleanup.

## Ansible Playbook Details

The playbook is designed to be **idempotent** and **self-healing**, and it supports both **SSH Remote** and **AWS SSM** connections.

### Key Features:
- **Automatic Secret Generation**: If service secrets are not provided in `vars.yml`, they are generated automatically as secure random strings and Base64-encoded.
- **Dynamic Keycloak Configuration**: The playbook detects whether Keycloak needs to be configured and automatically creates the Realm, Clients (Web and Management), Protocol Mappers, and a default Admin user.
- **Security Hardening**: Enforces PKCE, secure redirect policies, and audience validation.
- **Tag-based Execution**:
  - `config`: Only update configuration files and restart services.
  - `cleanup`: Remove the entire deployment and clean up Keycloak.
  - `debug`: Show non-sensitive debug information.

### Usage:
```bash
# Local Deployment
ansible-playbook -i inventory.yaml playbook.yaml

# Remote SSH Deployment
ansible-playbook -i inventory.yaml playbook.yaml --ask-become-pass
```
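
The tags listed under Key Features can be combined with the same command line. The wrapper below is only a sketch: `run_playbook` and the `DRY_RUN` switch are hypothetical names introduced for illustration, and the inventory and playbook file names are taken from the usage above.

```bash
# Hypothetical wrapper: maps a mode name to the playbook tags described above.
# DRY_RUN=1 (the default in this sketch) prints the command instead of running it.
run_playbook() {
  local mode="${1:-deploy}"
  set -- ansible-playbook -i inventory.yaml playbook.yaml
  case "$mode" in
    deploy) ;;                                   # full run, no tag filter
    config|cleanup|debug) set -- "$@" --tags "$mode" ;;
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `run_playbook config` prints the command that would only refresh configuration files, and `DRY_RUN=0 run_playbook config` executes it.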

## GitHub Actions Pipeline

The CI/CD pipeline supports two main deployment targets: **SSH Remote** and **AWS SSM**.

### Action Inputs:
- `action`: Choose between `deploy` (default) or `cleanup`.
- `deployment_target`: Choose between `ssh_remote` or `aws_ssm`. Defaults to `vars.DEPLOY_TARGET` or `ssh_remote`.
- `netbird_domain`: Override the default NetBird domain.
- `keycloak_url`: Override the Keycloak auth URL.
- `admin_password`: Set a custom password for the default admin user.
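
As a rough illustration of how these inputs resolve, the sketch below applies the precedence "explicit input, then repository variable, then built-in default" and prints a matching GitHub CLI dispatch command. The helper names are hypothetical, and reading `DEPLOY_ACTION` and `DEPLOY_TARGET` from the shell environment is an assumption made for the example; the real workflow evaluates `vars` itself.

```bash
# first_non_empty A B C...: print the first argument that is not empty.
first_non_empty() {
  local value
  for value in "$@"; do
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  return 1
}

# dispatch_command [ACTION] [TARGET]: print a `gh workflow run` command.
# Precedence (assumed): explicit argument > repository variable > default.
dispatch_command() {
  local action target
  action="$(first_non_empty "${1:-}" "${DEPLOY_ACTION:-}" deploy)"
  target="$(first_non_empty "${2:-}" "${DEPLOY_TARGET:-}" ssh_remote)"
  printf 'gh workflow run ansible-deploy.yml -f action=%s -f deployment_target=%s\n' \
    "$action" "$target"
}
```

Printing the command rather than executing it keeps the sketch safe to try; the output can be pasted into a terminal where `gh` is authenticated against the repository.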

### Secrets Management:
The pipeline pulls secrets from GitHub Repository Secrets and passes them securely to Ansible. If `KEYCLOAK_URL` is provided, the pipeline first runs the automated setup script to ensure the IdP is ready.

**Zero-Config Security:**
If service secrets (Management, Relay, TURN, Datastore) are left empty in GitHub Secrets or `vars.yml`, the automation will:
1. **Automatically generate** cryptographically strong random keys.
2. **Sanitize** them to meet NetBird's requirements (e.g., Base64 encoding).
3. **Persist** them to the target server.
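
The three steps above can be sketched in a few lines of shell. This is not the playbook's actual implementation: the function names and the per-secret file layout are assumptions made for illustration, and the sketch only shows the "use what was provided, otherwise generate once and reuse" behaviour.

```bash
# generate_secret: 32 random bytes, Base64-encoded (44 characters).
generate_secret() {
  head -c 32 /dev/urandom | base64
}

# resolve_secret FILE [PROVIDED]: prefer a provided value, then a previously
# persisted one, and only generate (and persist) a new secret as a last resort.
resolve_secret() {
  local file="$1" provided="${2:-}"
  if [ -n "$provided" ]; then
    printf '%s\n' "$provided"
  elif [ -s "$file" ]; then
    cat "$file"
  else
    # The subshell keeps the restrictive umask from leaking to the caller.
    (umask 077; generate_secret > "$file")
    cat "$file"
  fi
}
```

Calling `resolve_secret` twice with the same file and no provided value returns the same secret both times, which is what keeps repeated runs from rotating keys unintentionally.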

### AWS SSM Requirements:
When deploying to AWS via SSM, the following additional configuration is required:
1. **S3 Bucket**: A private S3 bucket is used for staging files. Set the `AWS_S3_BUCKET` secret in GitHub.
2. **IAM Instance Profile**: The EC2 instance must have an IAM role with `AmazonSSMManagedInstanceCore` and `s3:GetObject`/`s3:PutObject` permissions on the staging bucket.
3. **IAM User/Role for GitHub**: The credentials used by the GitHub Action must have `ssm:StartSession`, `ssm:SendCommand`, and `s3:PutObject` permissions.

### Repository Defaults:
You can set repository-wide defaults for the workflow inputs using GitHub Variables (`vars`):
- `DEPLOY_ACTION`: Default action (deploy/cleanup).
- `DEPLOY_TARGET`: Default target (ssh_remote/aws_ssm).

## Keycloak Automation

The automation ensures that:
1. A **NetBird Realm** is created.
2. A **Public Client** (`netbird-client`) is configured with correct Redirect URIs and Web Origins.
3. A **Confidential Client** (`netbird-management`) is created for API access.
4. **Protocol Mappers** for `audience` and `groups` are added to the tokens.
5. The **`api` client scope** is automatically created and assigned to ensure dashboard access.
6. **Logout Redirects** are configured to prevent redirect loops during session expiration.
7. A **Default Admin User** is provisioned for immediate access.
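
To make the public client in step 2 more concrete, the sketch below prints a client payload in the shape of Keycloak's `ClientRepresentation`, including the PKCE attribute. The function name is hypothetical and the redirect URI and web origin values are illustrative assumptions; the setup script in this repository may build its request differently.

```bash
# public_client_payload DOMAIN: print a JSON body for the public dashboard client.
# Field names follow Keycloak's ClientRepresentation; values are illustrative.
public_client_payload() {
  local domain="$1"
  cat <<EOF
{
  "clientId": "netbird-client",
  "publicClient": true,
  "standardFlowEnabled": true,
  "redirectUris": ["https://${domain}/*"],
  "webOrigins": ["https://${domain}"],
  "attributes": { "pkce.code.challenge.method": "S256" }
}
EOF
}
```

A payload like this would typically be sent with an admin bearer token to the Admin REST API endpoint `POST /admin/realms/{realm}/clients`.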

### Custom Credentials:
You can customize the default user and password by setting the following variables in GitHub Secrets or as manual inputs:
- `KEYCLOAK_ADMIN_USER_SECRET`: Custom username (defaults to `admin`).
- `KEYCLOAK_ADMIN_PASSWORD_SECRET`: Custom password. If left empty, a secure random password will be generated and printed in the GitHub Actions logs.

**Note**: If you change the password in your configuration after a deployment, the automation will update the password for the existing user in Keycloak on the next run.

## Peer Provisioning

Beyond deploying the NetBird server, this project includes automation to provision users and their devices (peers) using the NetBird Management API.

### Key Features:
- **Automatic Group & Policy Creation**: Ensures all peers belonging to a specific user are placed in a common group with an access policy that allows them to communicate with each other.
- **Dynamic Setup Keys**: Generates reusable setup keys on-the-fly to authorize new peers.
- **Multi-Platform Support**: Installs the NetBird agent on Debian/Ubuntu, RHEL/CentOS, and Docker-enabled systems.
- **Local & Remote Execution**: Can be run against remote fleets or to provision the local machine as a peer.

### Usage:
For detailed instructions, see the [Peer Provisioning Guide](../infrastructure/ansible/PROVISIONING.md).

```bash
export NETBIRD_API_TOKEN="your_token"
ansible-playbook -i inventory.yaml provision_peers.yaml -e "target_user=john-doe"
```

## Cleanup and Reset

The cleanup routine is designed for a total environment reset.

### What is removed:
- All Docker containers and volumes associated with the project (using `remove_volumes: true`).
- The `/opt/netbird` deployment directory.
- The custom Docker network `key-netbird`.
- The entire **Keycloak Realm** created for NetBird.

### How to trigger:
**Via CLI:**
```bash
ansible-playbook -i inventory.yaml playbook.yaml --tags cleanup
```

**Via GitHub Actions:**
Run the `Ansible Deployment` workflow manually and select the `cleanup` action.
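
After a cleanup run, a quick check on the target host can confirm that nothing was left behind. The following is a minimal sketch with a hypothetical `verify_cleanup` helper; it only covers the deployment directory and the Docker network, and it skips the network check when the Docker CLI is not installed.

```bash
# verify_cleanup [DIR] [NETWORK]: return 0 when no trace is left, 1 otherwise.
verify_cleanup() {
  local dir="${1:-/opt/netbird}" network="${2:-key-netbird}" leftovers=0
  if [ -e "$dir" ]; then
    echo "still present: $dir" >&2
    leftovers=1
  fi
  # Only query Docker when the CLI is available on this host.
  if command -v docker >/dev/null 2>&1 &&
     docker network inspect "$network" >/dev/null 2>&1; then
    echo "network still present: $network" >&2
    leftovers=1
  fi
  return "$leftovers"
}
```

Running `verify_cleanup && echo "clean"` on the host prints `clean` only when both checks pass. The Keycloak realm is not covered here and would need to be checked in the Keycloak admin console.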

## Troubleshooting

### 'nb_management_secret' is undefined
This usually occurs if the sanitization block was skipped. Ensure you are using the latest version of the playbook where the variable finalization block is marked with `tags: [always]`.

### Keycloak 'invalid_uri' or 'invalid_audience'
These are typically caused by missing Protocol Mappers or incorrect Redirect URIs. The automated setup script resolves these by explicitly adding the `oidc-audience-mapper` and `oidc-group-membership-mapper`.

### AWS SSM 'NoneType' Error
If the pipeline fails with `expected string or bytes-like object, got 'NoneType'` during Gathering Facts:
1. The playbook now defaults to `gather_facts: false` and uses an explicit setup task to mitigate this.
2. Ensure the **AWS Session Manager Plugin** is installed (handled by the workflow).
3. Verify that the GitHub Runner's IAM user has **`ssm:StartSession`** permissions for the target instance.
4. Check that the target EC2 instance has the **SSM Agent** installed and an IAM Role with the `AmazonSSMManagedInstanceCore` policy.
5. The connection now uses `ansible_aws_ssm_shell: bash` and `ansible_shell_type: sh` for maximum compatibility.
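
Checks 2 and 4 can be run by hand before re-triggering the pipeline. The helpers below are a sketch with hypothetical function names; they assume the AWS CLI is configured with the same credentials the workflow uses, and the instance ID in the usage line is a placeholder.

```bash
# require_cmd NAME...: fail if any of the named tools is missing from PATH.
require_cmd() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# ssm_ping_status INSTANCE_ID: print the SSM agent's PingStatus for an instance.
ssm_ping_status() {
  aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$1" \
    --query 'InstanceInformationList[0].PingStatus' \
    --output text
}
```

Usage: `require_cmd aws session-manager-plugin && ssm_ping_status i-0123456789abcdef0`. A status other than `Online` suggests the agent or the instance role needs attention before Ansible can connect.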

### Docker Network Conflicts
If the `key-netbird` network already exists with a different driver, the playbook might fail. The cleanup routine will remove it, allowing for a fresh start.
104 changes: 104 additions & 0 deletions infrastructure/ansible/PROVISIONING.md
@@ -0,0 +1,104 @@
# NetBird Peer & User Provisioning Automation

This playbook automates the process of creating NetBird groups, policies, and setup keys, and installing the NetBird agent on target machines (Peers).

## Features

- **Automated API Interaction**: Creates a NetBird group and a "self-communication" policy for a specific user.
- **Keycloak Integration**: Automatically creates the user in your self-hosted Keycloak IdP if they don't exist.
- **Dynamic Setup Keys**: Generates a reusable setup key associated with the user's group.
- **Multi-OS Support**: Installs NetBird on Debian/Ubuntu and RHEL/CentOS/Fedora systems.
- **Docker Support**: Can deploy the NetBird agent as a Docker container.
- **Local & Remote**: Can be run against remote servers via SSH or against the local machine.

## Prerequisites

1. **NetBird API Token**: A Personal Access Token (PAT) from your NetBird dashboard (Settings > Tokens).
2. **Management URL**: The URL of your NetBird management service (e.g., `https://netbird.example.com`).
3. **Ansible Collections**:
   ```bash
   ansible-galaxy collection install community.docker community.general community.crypto
   ```
4. **Ansible**: Installed on your control node.

## Environment Variables

- `NETBIRD_API_TOKEN`: **(Required)** Your NetBird Personal Access Token.
- `NETBIRD_MANAGEMENT_URL`: (Optional) Your NetBird Management URL. Defaults to `https://{{ netbird_domain }}`.

---

## Deployment Scenarios

### Scenario A: Provision Local Machine (Localhost)

If you want to install and register NetBird on the **same machine** where you are running Ansible:

```bash
export NETBIRD_API_TOKEN="your_token_here"

# Run against localhost
ansible-playbook provision_peers.yaml \
  -i localhost, \
  -e "target_user=my-local-user" \
  -e "netbird_domain=netbird.example.com" \
  --ask-become-pass
```

### Scenario B: Provision Remote Servers (SSH)

If you want to install NetBird on one or more **remote servers**:

1. **Define your Inventory** (`inventory.yaml`):
   ```yaml
   all:
     hosts:
       peer-1: { ansible_host: 1.2.3.4, ansible_user: ubuntu }
       peer-2: { ansible_host: 5.6.7.8, ansible_user: centos }
   ```

2. **Run the Playbook**:
   ```bash
   export NETBIRD_API_TOKEN="your_token_here"

   # Ansible will automatically handle the local API part and the remote peer part
   ansible-playbook -i inventory.yaml provision_peers.yaml \
     -e "target_user=john-doe" \
     -e "netbird_domain=netbird.example.com"
   ```


## Keycloak Integration (Self-Hosted)

If you are using the self-hosted stack provided in this repository, you can also automate user creation in Keycloak.

Pass the Keycloak admin credentials to the playbook:

```bash
ansible-playbook -i inventory.yaml provision_peers.yaml \
  -e "target_user=new-user" \
  -e "netbird_domain=netbird.example.com" \
  -e "keycloak_admin_password=your_keycloak_admin_pass"
```

**Variables for Keycloak customization:**
- `keycloak_admin_user`: Defaults to `admin`.
- `keycloak_admin_password`: Your Keycloak admin password.
- `netbird_realm`: Defaults to `netbird`.
- `netbird_user_password`: Initial password for the new user (defaults to `Netbird123!`).

---

## How it works

1. **Play 1: API Provisioning (Local)**: The first part of the playbook always runs on `localhost` to ensure the necessary groups and policies exist in your NetBird Management service (and Keycloak if configured) and to generate a setup key.
2. **Play 2: Peer Installation**:
   - Installs the NetBird agent on target hosts using the native package manager (`apt` or `yum/dnf`) or Docker.
   - Runs `netbird up` with the generated setup key retrieved from the local play.
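
The setup-key part of Play 1 corresponds roughly to one call against the NetBird Management API. The sketch below is not taken from the playbook: the function names are hypothetical, the expiry and usage-limit values are illustrative, and the inputs are assumed to contain no characters that need JSON escaping.

```bash
# setup_key_payload NAME GROUP_ID: JSON body for a reusable setup key that
# automatically places new peers into the given group.
setup_key_payload() {
  local name="$1" group_id="$2"
  cat <<EOF
{
  "name": "${name}",
  "type": "reusable",
  "expires_in": 86400,
  "usage_limit": 0,
  "auto_groups": ["${group_id}"]
}
EOF
}

# create_setup_key NAME GROUP_ID: POST the payload to the Management API.
# Reads NETBIRD_MANAGEMENT_URL and NETBIRD_API_TOKEN from the environment.
create_setup_key() {
  curl --fail --silent --show-error \
    -X POST "${NETBIRD_MANAGEMENT_URL}/api/setup-keys" \
    -H "Authorization: Token ${NETBIRD_API_TOKEN}" \
    -H "Content-Type: application/json" \
    -d "$(setup_key_payload "$1" "$2")"
}
```

The key returned in the response is what a peer would then pass to `netbird up --setup-key <key>` in Play 2.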

## Troubleshooting

- **API Authentication**: Ensure `NETBIRD_API_TOKEN` is valid and has sufficient permissions.
- **Sudo Access**: When running locally or remotely without a root user, use `--ask-become-pass` (or `-K`) to provide the sudo password.
- **Docker**: If `use_docker=true`, ensure Docker is installed and the `community.docker` Ansible collection is available.
- **Localhost in Inventory**: If you get errors about `localhost` not being found, ensure it's either in your inventory file or that your Ansible configuration allows the implicit `localhost`.