Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,5 @@ deno.lock
/project.inlang/settings.json
/src/lib/generated
/src/temp
/l10n-cage

# Environment-specific cache files
/.inlang-settings-cache.json
326 changes: 326 additions & 0 deletions L10N.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,326 @@
# Localization (L10n) Developer Guide

This guide covers the complete l10n system for developers working on the PauseAI website.

## Overview

The l10n system automatically generates locale-appropriate content using LLMs. It operates in different modes based on your environment, tries to stay dormant in default development, and provides safety mechanisms to prevent accidental overwrites of production content.

Lots of detail here, but main message is that unless you are developing the system itself, you don't have to dig into them. Let our CI/CD feed the l10ns and let you preview them if necessary.

### Key Concepts

- **Locales**: Specific language/region combinations (e.g., `en`, `de`, `nl`)
- **L10n Cage**: Git repository storing all generated l10n content (Cache Adopting Git Engine)
- **Modes**: Different operating modes based on environment and configuration
- **Branch Safety**: Prevents local development from writing to production cache

## L10n Modes

The system automatically determines its operating mode:

### 1. **en-only** Mode

- **When**: Only English locale configured (`PARAGLIDE_LOCALES=en` or undefined on a dev machine)
- **Behavior**: No l10ns bothered, maximum speed
- **Use case**: Default for local development

### 2. **dry-run** Mode

- **When**: Multiple locales are configured, but either the LLM API key isn't given or is suppressed using the `--dry-run` flag
- **Behavior**: Uses l10ns already captured in the cage, no LLM calls
- **Use case**: Work with existing l10ns without generating new content

### 3. **perform** Mode

- **When**: Multiple locales + valid API key + proper branch
- **Behavior**: Prompts LLMs to capture new l10ns into the cage, locks it by committing, ships it by pushing
- **Use case**: Generating new/updated l10n content

## CLI Usage

### Automatic L10n Integration

Running servers or builds automatically invokes l10n operations:

```bash
# Development server - runs l10n in dry-run mode only
pnpm dev

# Production build - runs l10n in whatever mode is configured
pnpm build
```

### Manual L10n Commands

```bash
# Default mode (usually en-only for local dev)
pnpm l10n

# Dry-run mode (use cached content only)
pnpm l10n --dry-run

# Verbose output for debugging
pnpm l10n --verbose

# Force retranslation of specific files
pnpm l10n --force "*.md"
pnpm l10n --force "2024-*.md"
pnpm l10n --force "faq.md" "proposal.md"
```

### Force Mode Patterns

Force mode supports glob patterns for selective retranslation:

- `*.md` - All markdown files
- `2024-*.md` - Files starting with "2024-"
- `**/important.md` - Files named "important.md" anywhere
- `{faq,proposal}.md` - Specific files using brace expansion

## Environment Configuration

### Locale Selection

```bash
# English only (default for local development)
PARAGLIDE_LOCALES=en

# Specific locales
PARAGLIDE_LOCALES=en,nl,de

# All available locales (default in CI)
PARAGLIDE_LOCALES=all

# Exclude specific locales
PARAGLIDE_LOCALES=-fr,es # All except French and Spanish
```

### L10n Capture

```bash
# Enable LLM to be prompted to find new l10ns (required for perform mode)
L10N_OPENROUTER_API_KEY="your-api-key-here"

# Override which l10n cage branch to use (optional)
L10N_BRANCH="my-feature-branch"
```

The `L10N_BRANCH` override might be useful when you want to:

- Use a different cage branch than your current website branch
- Test l10n content from another branch
- Work around branch naming conflicts

## Branch Safety System

The system prevents accidental writes to the production cache:

### Local Development Rules

- **Cannot write to main branch** of the l10n cage from local development
- **Must work on feature branches** in the website repository if new l10ns are to be captured

### CI/CD Exception

- **CI environments can write to main** for production deployments
- **Branch detection** uses environment variables and Git state to separate distinct work

### Override Options

If you need to work around branch safety:

1. **Use a feature branch** in the website repository (recommended):

```bash
git checkout -b my-l10n-work
pnpm l10n # This creates/uses matching l10n cage branch
```

2. **Set explicit branch** in environment:
```bash
L10N_BRANCH="my-branch" pnpm l10n
```

## L10n Cage Architecture

### How the Cage Works

The l10n cage is a Cache Adopting Git's Engine: a [repository](https://github.com/PauseAI/paraglide) that gets cloned locally as `l10n-cage/`.

The cage:

- **Holds captured l10n content** - in dry-run mode, you access cached l10ns previously captured
- **Captures new l10n content** - in perform mode, LLMs are prompted to find new l10ns that are captured in the cage
- **Locks newly captured l10ns by committing** - to cache the work and prevent reperforming it
- **Ships the captured l10ns to safety by pushing** - makes it safe to e.g. clean out the cage

This provides:

- **Version control** of all l10n decisions and changes
- **Branch isolation** between different feature development
- **Audit trail** of what was generated when and why
- **Cost efficiency** by avoiding duplicate LLM calls

### Cage Structure

```
l10n-cage/
├── json/ # Aggregated short messages
│ ├── de.json
│ ├── nl.json
│ └── ...
└── md/ # Localized markdown pages
├── de/
│ ├── faq.md
│ ├── proposal.md
│ └── ...
├── nl/
└── ...
```

### Branch Strategy

- **main branch**: Production l10n content in the cage
- **Feature branches**: L10n operations create matching cage branches (website `my-feature` → cage `my-feature`)
- **Branch naming**: See [`l10nCageBranch()` function](./scripts/l10n/branch-safety.ts) for complete logic
- **Promotion**: You can use standard git merge commands to promote l10n commits between cage branches

## Development Workflow

### Day-to-Day Content Changes

**Most developers never need to run l10n locally** - the CI/CD system handles it automatically:

1. **Edit content** in English using Decap CMS or direct file editing
2. **Create pull request** from your website branch or fork - this triggers l10n work on a matching cage branch
3. **CMS users**: The CMS creates pull requests automatically, which also trigger l10n generation
4. **L10n is generated automatically** in CI/CD on the appropriate cage branch
5. **Preview l10n results** in staging before merging

**Current limitation**: While main branch is locked to en-only locale, target your pull requests to `l10-preview` or similar branches that have all locales enabled in order to have l10ns automatically captured.

### L10n Development

When working on l10n system improvements:

1. **Set up environment**:

```bash
cp template.env .env
# Add your L10N_OPENROUTER_API_KEY
```

2. **Work on a feature branch** in the website repository:

```bash
git checkout -b improve-german-l10n
```

3. **Test changes**:

```bash
# Preview/estimate the pending l10n-capturing work
pnpm l10n --dry-run --verbose

# Perform the captures
PARAGLIDE_LOCALES=en,de pnpm l10n
```

4. **Force recapturing particular l10ns**:
```bash
pnpm l10n --force "specific-file.md"
```

### Reusing L10n Content Between Branches

If you want to use l10n content you already captured in another cage branch without regenerating it (to avoid cost or non-determinism), you can promote commits between l10n cage branches:

```bash
# Switch to the l10n cage directory
cd l10n-cage

# Merge commits from another branch
git checkout my-current-branch
git merge other-feature-branch

# Return to website directory
cd ..
```

This lets you reuse expensive LLM-generated content across different feature branches.

### Adding New Locales

1. **Add locale to configuration**:

```javascript
// project.inlang/default-settings.js
locales: ['en', 'nl', 'de', 'fr'] // Add 'fr'
```

2. **Estimate work locally**:

```bash
PARAGLIDE_LOCALES=en,fr pnpm l10n --dry-run --verbose
```

3. You could **perform it locally** or **through a pull request**

## Troubleshooting

### Common Issues

**"Cannot write to main branch"**

- Solution: Work on a feature branch, hide your API key, or use `--dry-run`

**"API key too short"**

- Solution: Set a valid `L10N_OPENROUTER_API_KEY` (10+ characters)

**"Git push authentication failed"**

- Solution: Set up GitHub Personal Access Token or use `--dry-run`

**Missing l10n files**

- Solution: Run `pnpm l10n --dry-run` to ensure cache is set up

### Debug Mode

For detailed debugging:

```bash
pnpm l10n --dry-run --verbose
```

This shows:

- Mode determination reasoning
- Cache setup details
- File processing information
- Branch and environment status, and estimated costs

## Production Considerations

### CI/CD Integration

The l10n system is designed to run automatically in CI/CD:

- **Defaults to all locales** in CI environments without overrides
- **Uses production API keys** from environment
- **Writes to appropriate cache branches** [based on deployment context](./scripts/l10n/branch-safety.ts#:~:text=function%20l10nCageBranch)
- **Integrates with build process** via `pnpm build`

### Performance

- **Cache-first approach**: Avoids duplicate LLM calls
- **Selective regeneration**: Only processes changed content
- **Branch isolation**: Prevents conflicts between parallel development

### Cost Management

- **Git-based caching** eliminates duplicate API calls
- **Force mode** allows targeted regeneration when needed
- **Dry-run mode** enables testing without API costs
Loading