
Conversation

josecelano
Member

Phase 1: Configuration Management System

This PR implements the first phase of the 12-Factor App refactoring for the Torrust Tracker Demo, focusing on configuration management and environment-based deployment.

🎯 Objectives Completed

  • Environment-based configuration using .env files and templates
  • Configuration templates with envsubst for dynamic generation
  • Automated configuration scripts with validation and colored output
  • Docker Compose integration with generated environment files
  • Makefile automation for configuration and deployment workflows
  • MySQL database migration from SQLite for production readiness

🏗️ Architecture Changes

Configuration Structure

infrastructure/config/
├── environments/
│   ├── local.env          # Local development configuration
│   └── production.env     # Production deployment configuration
└── templates/
    ├── tracker.toml.tpl         # Torrust Tracker configuration template
    ├── prometheus.yml.tpl       # Prometheus monitoring template
    └── docker-compose.env.tpl   # Docker Compose environment template

New Automation Scripts

  • configure-env.sh - Processes templates and generates configuration files
  • validate-config.sh - Validates generated configurations for syntax and completeness
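
Conceptually, the template-processing step works like the sketch below (a minimal illustration; the output directory and file layout are assumptions, and the real script adds validation and colored logging):

#!/bin/bash
# Minimal sketch of the configure-env.sh flow (illustrative, not the full script)
set -euo pipefail

ENVIRONMENT="${1:-local}"                                    # "local" or "production"
ENV_FILE="infrastructure/config/environments/${ENVIRONMENT}.env"
TEMPLATE_DIR="infrastructure/config/templates"
OUTPUT_DIR="application/storage/config"                      # assumed output location

# Export every variable from the environment file so envsubst can see it
set -a
# shellcheck disable=SC1090
source "${ENV_FILE}"
set +a

mkdir -p "${OUTPUT_DIR}"
for template in "${TEMPLATE_DIR}"/*.tpl; do
  output="${OUTPUT_DIR}/$(basename "${template}" .tpl)"
  envsubst < "${template}" > "${output}"
  echo "Generated ${output}"
done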

Enhanced Makefile Targets

  • configure-local / configure-production - Generate environment-specific configs
  • validate-config / validate-config-production - Validate configurations
  • deploy-local / deploy-production - Full deployment workflows
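
A rough sketch of how these targets can be wired together (target names are from this PR; the recipes are illustrative assumptions, not the exact Makefile contents):

configure-local:
	./infrastructure/scripts/configure-env.sh local

validate-config:
	./infrastructure/scripts/validate-config.sh local

deploy-local: configure-local validate-config
	# Assumed deployment step: bring up the Docker Compose stack with the generated .env
	cd application && docker compose --env-file .env up -d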

🔧 Key Features

1. Template-Based Configuration

All configuration files are now generated from templates using environment variables:

  • No more hardcoded values in configuration files
  • Environment-specific settings (logging levels, database URLs, etc.)
  • Secrets management through environment variables
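
To illustrate the idea (the variable names below are examples, not necessarily the exact ones used in the real templates):

# A template fragment like this...
cat > /tmp/tracker.toml.tpl <<'EOF'
[core.database]
driver = "${TORRUST_TRACKER_DATABASE_DRIVER}"
path = "${TORRUST_TRACKER_DATABASE_URL}"
EOF

# ...is rendered with the values from the selected environment file:
export TORRUST_TRACKER_DATABASE_DRIVER="mysql"
export TORRUST_TRACKER_DATABASE_URL="mysql://user:pass@mysql:3306/torrust_tracker"
envsubst < /tmp/tracker.toml.tpl > /tmp/tracker.toml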

2. Environment Separation

  • Local: Debug logging, direct access, SQLite/MySQL support
  • Production: Info logging, reverse proxy mode, MySQL database, SSL ready

3. Validation & Quality Assurance

  • TOML and YAML syntax validation
  • Environment-specific configuration checks
  • Template variable substitution verification
  • Linting integration with project standards
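
The checks boil down to helpers along these lines (an illustrative sketch assuming PyYAML and Python 3.11+ with tomllib are available; the real validate-config.sh adds environment-specific checks and colored output):

# Basic TOML syntax check for a generated file
validate_toml() {
  python3 -c "import sys, tomllib; tomllib.load(open(sys.argv[1], 'rb'))" "$1"
}

# YAML syntax check; application/storage/ is excluded from yamllint, so fall back to PyYAML there
validate_yaml() {
  local file="$1"
  if [[ "${file}" == application/storage/* ]]; then
    python3 -c "import sys, yaml; yaml.safe_load(open(sys.argv[1]))" "${file}"
  else
    yamllint "${file}"
  fi
}

# Fail if any template variable survived substitution in a generated file
check_substitution() {
  ! grep -E '\$\{[A-Z_]+\}' "$1"
}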

4. Docker Compose Integration

  • Generated .env files for Docker Compose
  • Consistent variable naming across all services
  • Environment-specific service configurations

🚀 Usage

Quick Start - Local Development

make configure-local    # Generate local configurations
make validate-config    # Validate configurations
make deploy-local       # Deploy with local settings

Production Deployment

# Set required secrets as environment variables
export TORRUST_PROD_DATABASE_URL="mysql://user:pass@host:3306/db"
export TORRUST_PROD_API_TOKEN="your-api-token"
# ... other production secrets

make configure-production  # Generate production configurations
make validate-config-production  # Validate configurations
make deploy-production    # Deploy to production

🛡️ Security Improvements

  • No hardcoded secrets - All sensitive values use environment variables
  • Environment isolation - Clear separation between local and production configs
  • Validation safeguards - Prevent deployment with misconfigured settings

📋 Breaking Changes

  • Configuration files are now generated and should not be edited directly
  • .env files must be generated using the new scripts
  • Database configuration now defaults to MySQL for both environments

🧪 Testing

All changes have been validated with:

  • Linting - yamllint, shellcheck, markdownlint
  • Configuration validation - Both local and production environments
  • Template processing - All templates generate correctly
  • Docker Compose integration - Services start with generated configs

📚 Documentation

Updated documentation includes:

  • Phase 1 implementation guide with detailed migration steps
  • Configuration management workflow documentation
  • Makefile target reference and usage examples

🔄 Next Steps (Future PRs)

  • Phase 2: Infrastructure as Code enhancements
  • Phase 3: Service isolation and containerization improvements
  • Phase 4: Monitoring and observability integrations

Ready for Review: This PR represents a complete Phase 1 implementation of the 12-Factor App methodology. All existing functionality is preserved while adding robust configuration management capabilities.

…on management

- Add environment-based configuration system with local and production environments
- Create configuration templates for tracker.toml and prometheus.yml with variable substitution
- Implement configure-env.sh script for processing environment files and generating configs
- Add validate-config.sh script for comprehensive configuration validation
- Support colored logging output consistent with existing project scripts
- Include comprehensive validation for TOML/YAML syntax and environment-specific settings
- Remove staging environment references (only local and production needed)
- Update yamllint configuration to exclude application/storage directory from linting
- Update documentation and migration guide to reflect current implementation

This establishes the foundation for environment-based configuration management
as outlined in the 12-Factor App methodology.
- Fix validate-config.sh to skip yamllint for files in application/storage/
- Use basic Python YAML validation for ignored directories instead
- Fix grep pattern for template variable validation (avoid curly brace errors)
- Add docker-compose.env.tpl template for Docker Compose environment
- Update environment files with all required variables for production
- Ensure validation works correctly for both local and production environments

All linting checks now pass without warnings while respecting project ignore rules.
@josecelano josecelano self-assigned this Jul 8, 2025
@josecelano josecelano requested a review from da2ce7 July 8, 2025 17:54
@josecelano josecelano linked an issue Jul 8, 2025 that may be closed by this pull request
- Remove LOG_LEVEL, ON_REVERSE_PROXY, and TRACKER_PRIVATE from environment files
- Use sensible defaults in tracker.toml.tpl that work across environments
- Update validation script to reflect unified configuration approach
- Add production.env.tpl template and remove production.env from tracking
- Ensure production.env with secrets is properly ignored
- All environments now use: private=true, on_reverse_proxy=true, external_ip=0.0.0.0
- Administrators can modify config after deployment if needed

This follows 12-Factor App methodology by using environment variables
only for secrets and essential template generation values.
- Fix environment variable names in compose.yaml:
  * TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__PATH (was URL)
  * TORRUST_TRACKER_CONFIG_OVERRIDE_HTTP_API__ACCESS_TOKENS__ADMIN
- Change database config from 'url' to 'path' in tracker.toml.tpl
- Add comprehensive TOML configuration example in template header
- Replace JSON example with TOML format for consistency
- Update documentation with proper Figment override patterns
- All changes follow torrust-tracker-configuration conventions

These changes ensure the tracker receives correct environment variable
names following the Figment configuration override pattern with
double underscores (__) separating nested sections.
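
For reference, the override pattern maps environment variable names to nested TOML keys roughly as follows (the values are placeholders):

export TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__DRIVER="mysql"
export TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__PATH="mysql://user:pass@mysql:3306/torrust_tracker"
export TORRUST_TRACKER_CONFIG_OVERRIDE_HTTP_API__ACCESS_TOKENS__ADMIN="MyAccessToken"

# The variables above override the corresponding tracker.toml sections:
#   [core.database]
#   driver = "mysql"
#   path = "mysql://user:pass@mysql:3306/torrust_tracker"
#
#   [http_api.access_tokens]
#   admin = "MyAccessToken"
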
- Add security warning to prometheus.yml.tpl about plain text token storage
- Document Prometheus limitation with runtime environment variable substitution
- Include TODO items for researching safer secret injection methods:
  * Prometheus file_sd_configs with dynamic token refresh
  * External authentication proxy (oauth2-proxy, etc.)
  * Vault integration or secret management solutions
  * Init containers to generate configs with short-lived tokens

- Add security documentation to .env.production about plain text secrets
- Explain runtime secret injection alternatives for Docker Compose
- Provide practical examples for secure deployment workflows
- Mention Docker secrets and external secret management options

These changes improve security awareness and provide clear paths
for implementing enhanced secret management in production deployments.
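
For illustration, the kind of scrape configuration this warning refers to looks roughly like the fragment below (job name, port, and token variable are placeholders; the actual template may differ):

# WARNING: envsubst renders the token below in plain text inside the generated prometheus.yml
scrape_configs:
  - job_name: "torrust-tracker-stats"
    scrape_interval: 15s
    metrics_path: /api/v1/stats
    params:
      token: ["${TRACKER_ADMIN_TOKEN}"]   # placeholder variable name
    static_configs:
      - targets: ["tracker:1212"]
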
… new config workflow

- Add [PROJECT_ROOT], [TERRAFORM_DIR], [VM_REMOTE] indicators to all commands
- Document new template-based configuration system (Step 1.7)
- Add 'Working Directory Confusion' as most common troubleshooting issue
- Include configure-local and validate-config workflow steps
- Enhance directory context warnings and error recovery guidance
- Update summary to reflect new configuration workflow timing
- Fix markdown formatting and lint issues

Resolves the directory confusion issue encountered during testing, where
make commands failed when run from the wrong directory (infrastructure/terraform
instead of the project root).
… IP detection issues

- Add Step 2.4 'Refresh OpenTofu State' in main workflow to prevent 'No IP assigned yet' issues
- Update workflow summary to include state refresh step
- Add lesson learned about OpenTofu state synchronization
- This prevents the common issue where libvirt provider state becomes stale after cloud-init
- Resolves the exact issue encountered during integration testing session
- Add TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__DRIVER to compose.yaml
- Update docker-compose.env.tpl to include tracker database environment variables
- Update tracker.toml.tpl comments to reflect both driver and path env overrides
- Update tracker.toml.tpl example to show mysql instead of sqlite3

This fixes the issue where the tracker container was using the sqlite3 driver
with a MySQL connection string, causing database connection errors.

Changes:
- application/compose.yaml: Add mysql driver environment variable
- infrastructure/config/templates/docker-compose.env.tpl: Add tracker DB config vars
- infrastructure/config/templates/tracker.toml.tpl: Update comments and examples
- infrastructure/tests/test-integration.sh: Previous improvements to use local repo
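
Conceptually, the relevant compose.yaml fragment looks like this (the service name and the right-hand variable names sourced from the generated .env are illustrative):

services:
  tracker:
    environment:
      - TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__DRIVER=${TORRUST_TRACKER_DATABASE_DRIVER}
      - TORRUST_TRACKER_CONFIG_OVERRIDE_CORE__DATABASE__PATH=${TORRUST_TRACKER_DATABASE_URL}
      - TORRUST_TRACKER_CONFIG_OVERRIDE_HTTP_API__ACCESS_TOKENS__ADMIN=${TORRUST_TRACKER_API_ADMIN_TOKEN}
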
@josecelano josecelano removed a link to an issue Jul 9, 2025
…e and proper testing procedures

- Add critical network architecture understanding (double virtualization)
- Document proper API testing procedures with authentication tokens
- Clarify port access rules (nginx proxy vs internal Docker ports)
- Add comprehensive endpoint testing examples with jq
- Improve manual cleanup instructions for libvirt volumes
- Document common testing mistakes and correct procedures
- Add troubleshooting for volume conflicts and deployment issues
- Include advanced testing commands and monitoring verification
- Update timing expectations and success indicators
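
As an example of the documented testing pattern (endpoint paths and variable names are illustrative; see the guide for the exact commands):

# Health check goes through the nginx proxy (port 80), not the internal Docker ports
curl -s "http://${VM_IP}/api/health_check" | jq

# The stats API requires the admin access token configured for the environment
curl -s "http://${VM_IP}/api/v1/stats?token=${TORRUST_TRACKER_API_ADMIN_TOKEN}" | jq
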
… and fixes

- Add detailed external smoke testing section using official Torrust client tools
- Document proper API authentication and nginx proxy usage
- Add troubleshooting for common issues (firewall, proxy headers, tracker visibility)
- Include performance validation and response time expectations
- Update cleanup procedures and resource management
- Add comprehensive summary of successful test results
- Fix tracker configuration: set private=false for public tracker operation
- Document critical configuration fixes and lessons learned
- Improve working directory context and command execution guidance
@da2ce7

da2ce7 commented Jul 9, 2025

@josecelano
Member Author

> @josecelano I did open a Pull Request on this branch:
>
> Good Work so far! :)

Hi @da2ce7, thanks! I will review and reply to your issue and PR properly; this is just my quick first reply.

All your proposals look excellent, and I agree that they would take the repo to the next level. I have made some decisions on my side that have not been shared publicly in the repo yet, and they explain why the project is in its current state.

The initial objectives for this project were:

  • Migrate the tracker demo to a new provider (cheaper and hopefully more performant)
  • Split the demo into two parts: the tracker demo and the index demo. This repository is only for the tracker.

Then I decided to take the opportunity to also:

  • Create a more declarative installation:
    • Increasing the percentage of automated tasks in the tracker installation from 5% to 90%.
    • Improving the documentation of the tracker's infrastructure requirements.

At the same time, I decided to:

  • Learn more about how to use AI models to increase productivity.

I have not written a single line of code directly, but I have learned a lot about providing context and guidance to the model.

That was the initial state.

On the other hand, I don't have strong system administration skills, so I decided to let the model propose the initial architecture. I assumed the model would be better suited to building a quick solution using the most common technologies for this type of work, and I didn't want to spend too much time on this repo since it's not one of the highest priorities on the Torrust roadmap.

There are several things I don't like and would do differently if I were building this on my own. For example:

  • I don't like makefiles, or at least not the current Makefile code. I would prefer a command hierarchy (subcommands), cleaner code, and tests.
  • I would do it in small increments, starting from the infrastructure requirements.
  • I would try to add unit tests for each step: creating the base VM, installing dependencies, etc. Right now there are only high-level tests, and the most useful one has to be run manually with the AI (docs/guides/integration-testing-guide.md).

However, I decided to continue in this way because:

  • It gives us a working product without spending too much time, and it's valuable for sysadmins who want to install the tracker. It serves as "live" documentation.
  • I thought we could use it as a base implementation, and that it would be easier to instruct the AI model to adapt this example to a different tool than to start with another, less commonly used solution, which I guess would require much more context and guidance. But I'm learning in the process.

Later on, since I was moving quickly, I decided to refactor a little to at least apply the 12-Factor methodology to the current solution. I'm in that process now, and I think it will take a couple more days to finish. In fact, I asked you whether I should continue with this refactor, as I thought it might be totally out of the initial scope.

Regarding your proposals, if we want to continue improving the repo, I suggest reviewing and estimating them. One benefit of sticking with more popular, well-known tools is that an experienced sysadmin would find the setup easier to work with. However, given that we can now use AI, a more reliable and robust solution may be the better alternative.

In your proposal, I missed two tools:

A friend of mine (a sysadmin) is a big fan of them, and I tried to use them in another project (without AI) with no success; it was hard to write Nushell scripts.

…approach

- Refactored setup_torrust_tracker function to use git archive for clean file copying
- Updated installation script to require .env file (12-factor principle)
- Removed fallback to .env.production in favor of infrastructure-generated configs
- Added proper error handling and validation in installation process
- Updated integration testing guide to reflect new workflow
- Added deprecation notice to .env.production file
- Created refactoring documentation with migration notes

This change ensures integration tests use the same configuration approach as
production deployments and follows 12-factor principles strictly.
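
The git archive approach conceptually works as follows (remote user, host, and paths are illustrative):

# Export a clean snapshot of the tracked files (no .git directory, no untracked local artifacts)
git archive --format=tar HEAD | ssh "torrust@${VM_IP}" "mkdir -p ~/torrust-tracker-demo && tar -xf - -C ~/torrust-tracker-demo"

# The generated .env is not tracked, so it is copied separately
scp .env "torrust@${VM_IP}:~/torrust-tracker-demo/application/.env"
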
- Fixed integration test script git archive extraction logic
- Added .env file transfer to VM in integration test setup
- Updated API endpoint testing to use correct authentication tokens
- Fixed test endpoints to use nginx proxy (port 80) instead of internal ports
- Updated integration testing guide with successful test results and critical details
- Added authentication requirements and correct API usage examples
- Documented network architecture and port access patterns

Integration tests now pass completely:
✅ VM deployment and SSH access
✅ Docker services (MySQL, Tracker, Prometheus, Grafana, Nginx)
✅ API endpoints with authentication (health check, stats API)
✅ UDP tracker ports (6868, 6969) listening
✅ Monitoring services health checks

This completes the 12-factor app refactoring with working end-to-end validation.
…figuration management

- Move nginx.conf template from application/share/container/default/config/ to infrastructure/config/templates/nginx.conf.tpl
- Update install.sh to only verify nginx.conf exists (not create it)
- Update infrastructure/scripts/configure-env.sh to generate nginx.conf from template
- Add application/storage cleanup to clean-and-fix target with user confirmation
- Improve separation of concerns: infrastructure manages templates, application verifies files exist
- Follow 12-factor principles: configuration is generated by infrastructure, not application

This ensures the nginx.conf file is always generated with the latest configuration
and integration tests work properly with clean application/storage directories.
- Remove application/.env.production (now in infrastructure/config/environments/)
- Remove application/share/container/default/config/prometheus.yml (now generated by infrastructure)
- Remove application/share/container/default/config/tracker.prod.container.sqlite3.toml (now generated by infrastructure)

This completes the 12-factor app refactoring by moving configuration generation
to the infrastructure layer, making the application layer purely focused on
runtime deployment.
- Add comprehensive comments explaining envsubst variable escaping
- Fix proxy_set_header directives using ${DOLLAR} escaping for nginx variables
- Add DOLLAR=$ environment variable for template processing
- Add TODO comments for HTTPS configuration section fixes
- Resolves nginx proxy configuration errors during service startup
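
The escaping works because envsubst expands ${DOLLAR} to a literal dollar sign, leaving nginx runtime variables intact. A minimal, self-contained illustration (directive values are examples, not the full template):

export DOLLAR='$'
export DOMAIN_NAME='tracker.example.com'   # placeholder value

envsubst <<'EOF'
server {
    server_name ${DOMAIN_NAME};                           # substituted by envsubst
    location / {
        proxy_pass http://tracker:1212;
        proxy_set_header Host ${DOLLAR}host;               # rendered as $host for nginx
        proxy_set_header X-Real-IP ${DOLLAR}remote_addr;   # rendered as $remote_addr
    }
}
EOF
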
…ng guide

- Add Step 1.8: Clean Application Storage (Optional but Recommended)
- Include destructive operation warnings for database, SSL certs, config files
- Provide both interactive and non-interactive cleanup approaches
- Add verification steps to confirm complete cleanup
- Prevents issues from corrupted data, expired certificates, and stale configs
…tructions

- Add .github/prompts/run-integration-testing-guide.prompt.md
- Provide clear 5-phase execution workflow for contributors
- Include critical rules, destructive operation warnings, and success criteria
- Add troubleshooting guidance and common issue solutions
- Document SSH key configuration requirements and cloud-init timing
- Specify concrete deliverables and documentation requirements
- Link to integration testing guide and smoke testing guide
- Improve clarity, reduce ambiguity, and add comprehensive context
@josecelano
Member Author

ACK 77c3fa3

@josecelano josecelano marked this pull request as ready for review July 10, 2025 07:33
@josecelano josecelano merged commit 2a4ddf3 into main Jul 10, 2025
1 check passed