diff --git a/.gitignore b/.gitignore index f03ac6ece..7f6aa6fac 100644 --- a/.gitignore +++ b/.gitignore @@ -23,6 +23,5 @@ deno.lock /project.inlang/settings.json /src/lib/generated /src/temp +/l10n-cage -# Environment-specific cache files -/.inlang-settings-cache.json diff --git a/L10N.md b/L10N.md new file mode 100644 index 000000000..2cc04f57d --- /dev/null +++ b/L10N.md @@ -0,0 +1,326 @@ +# Localization (L10n) Developer Guide + +This guide covers the complete l10n system for developers working on the PauseAI website. + +## Overview + +The l10n system automatically generates locale-appropriate content using LLMs. It operates in different modes based on your environment, tries to stay dormant in default development, and provides safety mechanisms to prevent accidental overwrites of production content. + +There is a lot of detail here, but the main message is that, unless you are developing the system itself, you don't have to dig into it. Let our CI/CD feed the l10ns and let you preview them if necessary. + +### Key Concepts + +- **Locales**: Specific language/region combinations (e.g., `en`, `de`, `nl`) +- **L10n Cage**: Git repository storing all generated l10n content (Cache Adopting Git Engine) +- **Modes**: Different operating modes based on environment and configuration +- **Branch Safety**: Prevents local development from writing to production cache + +## L10n Modes + +The system automatically determines its operating mode: + +### 1. **en-only** Mode + +- **When**: Only English locale configured (`PARAGLIDE_LOCALES=en` or undefined on a dev machine) +- **Behavior**: No l10n work is performed, for maximum speed +- **Use case**: Default for local development + +### 2. **dry-run** Mode + +- **When**: Multiple locales are configured, but the LLM API key is either not set or suppressed using the `--dry-run` flag +- **Behavior**: Uses l10ns already captured in the cage, no LLM calls +- **Use case**: Work with existing l10ns without generating new content + +### 3. 
**perform** Mode + +- **When**: Multiple locales + valid API key + proper branch +- **Behavior**: Prompts LLMs to capture new l10ns into the cage, locks it by committing, ships it by pushing +- **Use case**: Generating new/updated l10n content + +## CLI Usage + +### Automatic L10n Integration + +Running servers or builds automatically invokes l10n operations: + +```bash +# Development server - runs l10n in dry-run mode only +pnpm dev + +# Production build - runs l10n in whatever mode is configured +pnpm build +``` + +### Manual L10n Commands + +```bash +# Default mode (usually en-only for local dev) +pnpm l10n + +# Dry-run mode (use cached content only) +pnpm l10n --dry-run + +# Verbose output for debugging +pnpm l10n --verbose + +# Force retranslation of specific files +pnpm l10n --force "*.md" +pnpm l10n --force "2024-*.md" +pnpm l10n --force "faq.md" "proposal.md" +``` + +### Force Mode Patterns + +Force mode supports glob patterns for selective retranslation: + +- `*.md` - All markdown files +- `2024-*.md` - Files starting with "2024-" +- `**/important.md` - Files named "important.md" anywhere +- `{faq,proposal}.md` - Specific files using brace expansion + +## Environment Configuration + +### Locale Selection + +```bash +# English only (default for local development) +PARAGLIDE_LOCALES=en + +# Specific locales +PARAGLIDE_LOCALES=en,nl,de + +# All available locales (default in CI) +PARAGLIDE_LOCALES=all + +# Exclude specific locales +PARAGLIDE_LOCALES=-fr,es # All except French and Spanish +``` + +### L10n Capture + +```bash +# Enable LLM to be prompted to find new l10ns (required for perform mode) +L10N_OPENROUTER_API_KEY="your-api-key-here" + +# Override which l10n cage branch to use (optional) +L10N_BRANCH="my-feature-branch" +``` + +The `L10N_BRANCH` override might be useful when you want to: + +- Use a different cage branch than your current website branch +- Test l10n content from another branch +- Work around branch naming conflicts + +## Branch Safety 
System + +The system prevents accidental writes to the production cache: + +### Local Development Rules + +- **Cannot write to main branch** of the l10n cage from local development +- **Must work on feature branches** in the website repository if new l10ns are to be captured + +### CI/CD Exception + +- **CI environments can write to main** for production deployments +- **Branch detection** uses environment variables and Git state to separate distinct work + +### Override Options + +If you need to work around branch safety: + +1. **Use a feature branch** in the website repository (recommended): + + ```bash + git checkout -b my-l10n-work + pnpm l10n # This creates/uses matching l10n cage branch + ``` + +2. **Set explicit branch** in environment: + ```bash + L10N_BRANCH="my-branch" pnpm l10n + ``` + +## L10n Cage Architecture + +### How the Cage Works + +The l10n cage is a Cache Adopting Git's Engine: a [repository](https://github.com/PauseAI/paraglide) that gets cloned locally as `l10n-cage/`. + +The cage: + +- **Holds captured l10n content** - in dry-run mode, you access cached l10ns previously captured +- **Captures new l10n content** - in perform mode, LLMs are prompted to find new l10ns that are captured in the cage +- **Locks newly captured l10ns by committing** - to cache the work and prevent reperforming it +- **Ships the captured l10ns to safety by pushing** - makes it safe to e.g. clean out the cage + +This provides: + +- **Version control** of all l10n decisions and changes +- **Branch isolation** between different feature development +- **Audit trail** of what was generated when and why +- **Cost efficiency** by avoiding duplicate LLM calls + +### Cage Structure + +``` +l10n-cage/ +├── json/ # Aggregated short messages +│ ├── de.json +│ ├── nl.json +│ └── ... +└── md/ # Localized markdown pages + ├── de/ + │ ├── faq.md + │ ├── proposal.md + │ └── ... + ├── nl/ + └── ... 
+``` + +### Branch Strategy + +- **main branch**: Production l10n content in the cage +- **Feature branches**: L10n operations create matching cage branches (website `my-feature` → cage `my-feature`) +- **Branch naming**: See [`l10nCageBranch()` function](./scripts/l10n/branch-safety.ts) for complete logic +- **Promotion**: You can use standard git merge commands to promote l10n commits between cage branches + +## Development Workflow + +### Day-to-Day Content Changes + +**Most developers never need to run l10n locally** - the CI/CD system handles it automatically: + +1. **Edit content** in English using Decap CMS or direct file editing +2. **Create pull request** from your website branch or fork - this triggers l10n work on a matching cage branch +3. **CMS users**: The CMS creates pull requests automatically, which also trigger l10n generation +4. **L10n is generated automatically** in CI/CD on the appropriate cage branch +5. **Preview l10n results** in staging before merging + +**Current limitation**: While main branch is locked to en-only locale, target your pull requests to `l10-preview` or similar branches that have all locales enabled in order to have l10ns automatically captured. + +### L10n Development + +When working on l10n system improvements: + +1. **Set up environment**: + + ```bash + cp template.env .env + # Add your L10N_OPENROUTER_API_KEY + ``` + +2. **Work on a feature branch** in the website repository: + + ```bash + git checkout -b improve-german-l10n + ``` + +3. **Test changes**: + + ```bash + # Preview/estimate the pending l10n-capturing work + pnpm l10n --dry-run --verbose + + # Perform the captures + PARAGLIDE_LOCALES=en,de pnpm l10n + ``` + +4. 
**Force recapturing particular l10ns**: + ```bash + pnpm l10n --force "specific-file.md" + ``` + +### Reusing L10n Content Between Branches + +If you want to use l10n content you already captured in another cage branch without regenerating it (to avoid cost or non-determinism), you can promote commits between l10n cage branches: + +```bash +# Switch to the l10n cage directory +cd l10n-cage + +# Merge commits from another branch +git checkout my-current-branch +git merge other-feature-branch + +# Return to website directory +cd .. +``` + +This lets you reuse expensive LLM-generated content across different feature branches. + +### Adding New Locales + +1. **Add locale to configuration**: + + ```javascript + // project.inlang/default-settings.js + locales: ['en', 'nl', 'de', 'fr'] // Add 'fr' + ``` + +2. **Estimate work locally**: + + ```bash + PARAGLIDE_LOCALES=en,fr pnpm l10n --dry-run --verbose + ``` + +3. **Capture the new l10ns**, either locally or through a pull request + +## Troubleshooting + +### Common Issues + +**"Cannot write to main branch"** + +- Solution: Work on a feature branch, hide your API key, or use `--dry-run` + +**"API key too short"** + +- Solution: Set a valid `L10N_OPENROUTER_API_KEY` (10+ characters) + +**"Git push authentication failed"** + +- Solution: Set up a GitHub Personal Access Token or use `--dry-run` + +**Missing l10n files** + +- Solution: Run `pnpm l10n --dry-run` to ensure the cage is set up + +### Debug Mode + +For detailed debugging: + +```bash +pnpm l10n --dry-run --verbose +``` + +This shows: + +- Mode determination reasoning +- Cache setup details +- File processing information +- Branch and environment status, and estimated costs + +## Production Considerations + +### CI/CD Integration + +The l10n system is designed to run automatically in CI/CD: + +- **Defaults to all locales** in CI environments without overrides +- **Uses production API keys** from the environment +- **Writes to appropriate cache branches** [based on deployment 
context](./scripts/l10n/branch-safety.ts#:~:text=function%20l10nCageBranch) +- **Integrates with build process** via `pnpm build` + +### Performance + +- **Cache-first approach**: Avoids duplicate LLM calls +- **Selective regeneration**: Only processes changed content +- **Branch isolation**: Prevents conflicts between parallel development + +### Cost Management + +- **Git-based caching** eliminates duplicate API calls +- **Force mode** allows targeted regeneration when needed +- **Dry-run mode** enables testing without API costs diff --git a/README.md b/README.md index cab31635c..0faa36448 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,60 @@ -# PauseAI.info website +# PauseAI.info Website [![Netlify Status](https://api.netlify.com/api/v1/badges/628797a4-8d5a-4b5f-94d7-236b4604b23c/deploy-status)](https://app.netlify.com/sites/pauseai/deploys) -SvelteKit website for [PauseAI.info](https://pauseai.info/). +A SvelteKit website for [PauseAI.info](https://pauseai.info/) with automatic localization (l10n) support. + +## What is Localization (L10n)? + +Localization goes beyond simple translation — it adapts content for specific locales (e.g., `en`, `de`, sometimes more complicated combinations such as `en-US` or `fr-CA`). While translating text between languages is a major part of l10n, true localization also considers cultural context, regional preferences, and local conventions. This project can use LLMs to automatically generate locale-appropriate content. + +If you are not yourself developing/changing the l10n system, you can let it run automatically. + +## Quick Start + +```bash +# Clone the repository +git clone git@github.com:PauseAI/pauseai-website.git +cd pauseai-website + +# Install dependencies (we use pnpm, but npm or yarn also work) +pnpm install + +# Start development server (en-only mode) +pnpm dev + +# Open http://localhost:37572 +``` + +That's it! By default, all commands run in English-only mode for maximum speed. 
No API keys or special setup required. + +## Development Commands + +| Command | Description | +| -------------- | -------------------------------------------- | +| `pnpm dev` | Start development server | +| `pnpm build` | Build for production | +| `pnpm preview` | Preview production build | +| `pnpm test` | Run test suite | +| `pnpm lint` | Check code style | +| `pnpm format` | Auto-fix code style | +| `pnpm clean` | Clean build artifacts and caches | +| `pnpm l10n` | Run l10n manually (see [L10N.md](./L10N.md)) | +| `pnpm netlify` | Show Netlify preview options | + +## Environment Setup - for l10n and some dynamic pages + +To make some development choices, including seeing dynamic pages or l10ns locally, you'll have to set up your development environment. + +- But **no `.env` is needed** for basic development +- **Dynamic content** (teams, chat, geo, email validation, write features) needs API keys but has fallbacks +- **Multiple locales** needs `PARAGLIDE_LOCALES` set, while **developing the l10n system itself** usually also needs an OpenRouter API key + +Just start by copying the environment template. Comments in the template explain more about each option. +```bash +cp template.env .env +# then edit .env to add API keys and configure locales if required +``` ## Creating Articles @@ -31,41 +83,9 @@ You can create and edit most content using [Decap CMS](https://pauseai-cms.netli - **Ready**: The article is ready to be published. 8. Decap CMS will automatically create a pull request on GitHub to submit your changes for review. -The article will be published (and localized) automatically after approval. - -If you are sufficiently changing prominent text, consider inspecting the relevant automatic localizations/translations as a preview. - -## Running locally - -```bash - git clone git@github.com:PauseAI/pauseai-website.git - - # Instead of pnpm you could use npm or yarn - - # Install dependencies - pnpm install - - # Start development server. 
If it needs to, it will perform some further development setup before it runs. - pnpm dev - - # Open http://localhost:37572 -``` - -## Working with locales and other features - -To make other development choices, including building locally rather than running under dev, you'll have to set up your development environment. - -Start by copying the environment template. - -```bash -cp template.env .env -``` - -The setup script and copied template explain more. - -Briefly, some dynamic pages need API keys for service calls, and website translations require more opting in. +The article (and its automatic l10ns) will then be available to preview. -We cache content for other locales in a [git repository](https://github.com/PauseAI/paraglide) that you clone locally. That's enough to access existing translations for other locales during local development - set the ones you want in your environment. New/changed translations are generated using LLMs, and you can further opt in to test that locally and even add new locales in `project.inlang/settings.json`, but for day-to-day content changes it is easier to wait for any updated translations to be automatically generated by our pre-production build, and previewed. +If you are substantially changing prominent text, consider inspecting the relevant l10ns as well as the original. 
## Deployment diff --git a/package.json b/package.json index beaaa5cd8..2ddc01797 100644 --- a/package.json +++ b/package.json @@ -6,23 +6,14 @@ "scripts": { "clean": "tsx scripts/clean.ts", "inlang:settings": "tsx scripts/inlang-settings.ts", - "inlang:force": "tsx scripts/inlang-settings.ts --force", - "l10n": "tsx scripts/translation/translate", - "l10n:debug": "tsx scripts/translation/translate --mode debug", - "l10n:dry-run": "tsx scripts/translation/translate --dryRun", - "l10n:estimate": "tsx scripts/translation/translate --dryRun --verbose", - "l10n:spend": "cross-env L10N_FORCE_TRANSLATE=true tsx scripts/translation/translate", - "l10n:tamer": "tsx scripts/l10ntamer.ts", - "l10n:tamer:validate": "tsx scripts/l10ntamer.ts --validate", - "dev": "run-s dev:steps", - "dev:steps": "run-s inlang:settings l10n && vite dev --host 0.0.0.0", - "build": "cross-env NODE_ENV=production run-s build:steps", - "build:dev": "cross-env NODE_ENV=development node scripts/filter-build-log.js \"run-s build:steps\"", - "build:steps": "run-s inlang:settings l10n && vite build --emptyOutDir=false && run-s _postbuild:*", + "l10n": "tsx scripts/l10n/run", + "dev": "tsx scripts/l10n/run --dryRun && vite dev --host 0.0.0.0", + "build": "cross-env NODE_ENV=production node scripts/filter-build-log.js \"tsx scripts/l10n/run && vite build --emptyOutDir=false && run-s _postbuild:*\"", + "netlify": "echo '📦 Netlify CLI vs Standard Preview:\n\n• Fast preview (no Netlify features):\n pnpm build && pnpm preview\n\n• Test with Netlify runtime (edge functions, redirects):\n netlify serve (rebuilds first!)\n\n• Development with Netlify features:\n pnpm dev (terminal 1) + netlify dev (terminal 2)\n\nNote: Install with npm install -g netlify-cli'", "_postbuild:pagefind": "tsx scripts/create-pagefind-index.ts", "_postbuild:exclude": "tsx scripts/exclude-from-edge-function.ts", "_postbuild:caching": "tsx scripts/opt-in-to-caching.ts", - "_postbuild:l10ntamer": "tsx scripts/l10ntamer.ts --validate", + 
"_postbuild:l10ntamer": "tsx scripts/l10ntamer.ts", "test": "vitest", "preview": "vite preview", "check": "svelte-kit sync && svelte-check --tsconfig ./tsconfig.json", @@ -39,6 +30,8 @@ "@sveltejs/vite-plugin-svelte": "^3.1.2", "@types/escape-html": "^1.0.4", "@types/glidejs__glide": "^3.6.6", + "@types/js-cookie": "^3.0.6", + "@types/minimatch": "^5.1.2", "@types/minimist": "^1.2.5", "@types/node": "^20.19.0", "@types/remark-heading-id": "^1.0.0", @@ -55,6 +48,7 @@ "husky": "^9.1.7", "lint-staged": "^15.5.2", "mdsvex": "^0.11.2", + "minimatch": "9", "minimist": "^1.2.8", "npm-run-all2": "^6.2.6", "openai": "^4.104.0", diff --git a/pnpm-lock.yaml b/pnpm-lock.yaml index 5ef12350c..e742c38c2 100644 --- a/pnpm-lock.yaml +++ b/pnpm-lock.yaml @@ -120,6 +120,12 @@ importers: '@types/glidejs__glide': specifier: ^3.6.6 version: 3.6.6 + '@types/js-cookie': + specifier: ^3.0.6 + version: 3.0.6 + '@types/minimatch': + specifier: ^5.1.2 + version: 5.1.2 '@types/minimist': specifier: ^1.2.5 version: 1.2.5 @@ -165,6 +171,9 @@ importers: mdsvex: specifier: ^0.11.2 version: 0.11.2(svelte@4.2.20) + minimatch: + specifier: '9' + version: 9.0.5 minimist: specifier: ^1.2.8 version: 1.2.8 @@ -982,6 +991,9 @@ packages: '@types/hast@2.3.10': resolution: {integrity: sha512-McWspRw8xx8J9HurkVBfYj0xKoE25tOFlHGdx4MJ5xORQrMGZNqJhVQWaIbm6Oyla5kYOXtDiopzKRJzEOkwJw==} + '@types/js-cookie@3.0.6': + resolution: {integrity: sha512-wkw9yd1kEXOPnvEeEV1Go1MmxtBJL0RR79aOTAApecWFVu7w0NNXNqhcWgvw2YgZDYadliXkl14pa3WXw5jlCQ==} + '@types/json-schema@7.0.15': resolution: {integrity: sha512-5+fP8P8MFNC+AyZCDxrB2pkZFPGzqQWUzpSeuuVLvm8VMcorNYavBqoFcxK8bQz4Qsbn4oUEEem4wDLfcysGHA==} @@ -1003,6 +1015,9 @@ packages: '@types/mdast@4.0.4': resolution: {integrity: sha512-kGaNbPh1k7AFzgpud/gMdvIm5xuECykRR+JnWKQno9TAXVa6WIVCGTPvYGekIDL4uwCZQSYbUxNBSb1aUo79oA==} + '@types/minimatch@5.1.2': + resolution: {integrity: sha512-K0VQKziLUWkVKiRVrx4a40iPaxTUefQmjtkQofBkYRcoaaL/8rhwDWww9qWbrgicNOgnpIsMxyNIUM4+n6dUIA==} + 
'@types/minimist@1.2.5': resolution: {integrity: sha512-hov8bUuiLiyFPGyFPE1lwWhmzYbirOXQNNo40+y3zow8aFVTeyn3VWL0VFFfdNddA8S4Vf0Tc062rzyNr7Paag==} @@ -3365,6 +3380,8 @@ snapshots: dependencies: '@types/unist': 2.0.11 + '@types/js-cookie@3.0.6': {} + '@types/json-schema@7.0.15': {} '@types/mapbox-gl@2.7.21': @@ -3389,6 +3406,8 @@ snapshots: dependencies: '@types/unist': 3.0.3 + '@types/minimatch@5.1.2': {} + '@types/minimist@1.2.5': {} '@types/node-fetch@2.6.12': diff --git a/project.inlang/default-settings.js b/project.inlang/default-settings.js index 78f60682b..cd96fd88a 100644 --- a/project.inlang/default-settings.js +++ b/project.inlang/default-settings.js @@ -13,7 +13,7 @@ export default { 'https://cdn.jsdelivr.net/npm/@inlang/plugin-paraglide-js-adapter@latest/dist/index.js' ], 'plugin.inlang.messageFormat': { - pathPattern: './messages/{locale}.json' + pathPattern: './l10n-cage/json/{locale}.json' }, 'plugin.paraglide-js-adapter': { routing: { diff --git a/scripts/check-setup-needed.js b/scripts/check-setup-needed.js index 1cca69121..d6929f045 100644 --- a/scripts/check-setup-needed.js +++ b/scripts/check-setup-needed.js @@ -7,7 +7,7 @@ import fs from 'fs' import path from 'path' import dotenv from 'dotenv' import { execSync } from 'child_process' -import { L10NS_BASE_DIR, MARKDOWN_L10NS } from '../src/lib/l10n.ts' +import { L10N_CAGE_DIR, MARKDOWN_L10NS } from '../src/lib/l10n.ts' dotenv.config() @@ -16,9 +16,9 @@ let activeLocales = Array.from(runtimeModule.locales) let setupNeeded = false let reason = '' -if (!fs.existsSync(L10NS_BASE_DIR)) { +if (!fs.existsSync(L10N_CAGE_DIR)) { setupNeeded = true - reason = `Basic setup directory not found (${L10NS_BASE_DIR})` + reason = `Basic setup directory not found (${L10N_CAGE_DIR})` } const nonEnglishLocales = activeLocales.filter((locale) => locale !== 'en') @@ -41,13 +41,13 @@ if (nonEnglishLocales.length > 0) { } // Always check if translations repo is needed but missing -if (activeLocales.length > 1 && 
!fs.existsSync(path.join(L10NS_BASE_DIR, '.git'))) { +if (activeLocales.length > 1 && !fs.existsSync(path.join(L10N_CAGE_DIR, '.git'))) { setupNeeded = true reason = 'Translation repository not found' } // Check if English messages file is available for Paraglide -const enMessageTarget = path.join(L10NS_BASE_DIR, 'json', 'en.json') +const enMessageTarget = path.join(L10N_CAGE_DIR, 'json', 'en.json') if (!fs.existsSync(enMessageTarget)) { setupNeeded = true reason = 'English messages file not found' @@ -64,7 +64,7 @@ console.log('\n🔍 Environment check:') console.log(`- Active locales: ${activeLocales.join(', ')}`) console.log(`- SvelteKit initialized: ${fs.existsSync('.svelte-kit') ? 'yes ✓' : 'no ❌'}`) console.log( - `- Base directory (${L10NS_BASE_DIR}): ${fs.existsSync(L10NS_BASE_DIR) ? 'exists ✓' : 'missing ❌'}` + `- Base directory (${L10N_CAGE_DIR}): ${fs.existsSync(L10N_CAGE_DIR) ? 'exists ✓' : 'missing ❌'}` ) console.log( `- English messages (${enMessageTarget}): ${fs.existsSync(enMessageTarget) ? 
'exists ✓' : 'missing ❌'}` diff --git a/scripts/clean.ts b/scripts/clean.ts index ba0df0122..c4fb35172 100644 --- a/scripts/clean.ts +++ b/scripts/clean.ts @@ -4,10 +4,30 @@ import fs from 'fs' import path from 'path' -import { removeMultiple } from './translation/utils' +import { removeMultiple } from './l10n/utils' +import { hasEndangeredL10ns } from './l10n/branch-safety' +import { L10N_CAGE_DIR, MESSAGE_L10NS } from '../src/lib/l10n' console.log('Cleaning generated files...') +// Remove en.json symlink if it exists (this is always safe to remove during clean) +const enJsonSymlink = path.join(MESSAGE_L10NS, 'en.json') +if (fs.existsSync(enJsonSymlink) && fs.lstatSync(enJsonSymlink).isSymbolicLink()) { + fs.unlinkSync(enJsonSymlink) +} + +// Check for endangered l10ns before cleaning +const endangeredDetails = hasEndangeredL10ns(L10N_CAGE_DIR) +if (endangeredDetails) { + console.error('\n🚨 WARNING: Endangered l10ns detected!') + console.error('The l10n cage contains uncommitted changes or unpushed commits that may be lost.') + console.error('\nDetails:') + console.error(endangeredDetails) + console.error('\nTo force clean anyway (MAY LOSE DATA):') + console.error(` rm -rf ${L10N_CAGE_DIR}`) + process.exit(1) +} + removeMultiple( [ './src/lib/paraglide', @@ -17,12 +37,9 @@ removeMultiple( '.svelte-kit', '.netlify/functions-internal', './static/pagefind', - // Our L10N generated files + // Our L10N generated files (keep old dir for migration compatibility) './src/temp/translations', - './cache/l10n', - '.setup-cache', - '.setup-cache.json', - '.inlang-settings-cache.json' + L10N_CAGE_DIR ], /* description */ undefined, /* verbose */ true diff --git a/scripts/filter-build-log.js b/scripts/filter-build-log.js index f476beac4..9c3bb84f4 100755 --- a/scripts/filter-build-log.js +++ b/scripts/filter-build-log.js @@ -12,77 +12,88 @@ import { spawn } from 'child_process' import readline from 'readline' -// Default to running build:dev if no command provided -const 
buildCommand = process.argv[2] || 'npm run build:dev' - -// Split the command into program and args -const [program, ...args] = buildCommand.split(' ') - -// Stats counters -let clientChunks = 0 -let serverChunks = 0 -let clientSize = 0 -let serverSize = 0 - -// Chunk pattern - match svelte-kit output chunks with size info -const chunkPattern = /\.svelte-kit\/output\/(client|server)\/.*\s+(\d+\.\d+)\s+kB/ - -// Start the build process -const build = spawn(program, args, { shell: true, stdio: ['inherit', 'pipe', 'inherit'] }) - -// Create a readline interface to process the output line by line -const rl = readline.createInterface({ - input: build.stdout, - crlfDelay: Infinity -}) - -// Process each line -rl.on('line', (line) => { - // Check if this is a chunk line - const chunkMatch = chunkPattern.exec(line) - if (chunkMatch) { - // This is a chunk line - extract information but don't print it - const [_, type, sizeStr] = chunkMatch - const size = parseFloat(sizeStr) - - // Classify chunks solely based on their paths - if (type === 'client') { - clientChunks++ - clientSize += size - } else if (type === 'server') { - serverChunks++ - serverSize += size +// Check if --verbose is anywhere in the command line +const hasVerbose = process.argv.includes('--verbose') + +if (hasVerbose) { + // Just run the command with no filtering + const buildCommand = process.argv[2] + const [program, ...args] = buildCommand.split(' ') + const build = spawn(program, args, { shell: true, stdio: 'inherit' }) + build.on('exit', (code) => process.exit(code)) +} else { + // Default to running build:dev if no command provided + const buildCommand = process.argv[2] || 'npm run build:dev' + + // Split the command into program and args + const [program, ...args] = buildCommand.split(' ') + + // Stats counters + let clientChunks = 0 + let serverChunks = 0 + let clientSize = 0 + let serverSize = 0 + + // Chunk pattern - match svelte-kit output chunks with size info + const chunkPattern = 
/\.svelte-kit\/output\/(client|server)\/.*\s+(\d+\.\d+)\s+kB/ + + // Start the build process + const build = spawn(program, args, { shell: true, stdio: ['inherit', 'pipe', 'inherit'] }) + + // Create a readline interface to process the output line by line + const rl = readline.createInterface({ + input: build.stdout, + crlfDelay: Infinity + }) + + // Process each line + rl.on('line', (line) => { + // Check if this is a chunk line + const chunkMatch = chunkPattern.exec(line) + if (chunkMatch) { + // This is a chunk line - extract information but don't print it + const [_, type, sizeStr] = chunkMatch + const size = parseFloat(sizeStr) + + // Classify chunks solely based on their paths + if (type === 'client') { + clientChunks++ + clientSize += size + } else if (type === 'server') { + serverChunks++ + serverSize += size + } + + // Don't output the chunk line + return } - // Don't output the chunk line - return - } - - // Pass through all other lines - console.log(line) -}) - -// Print the final summary when the process exits -build.on('exit', (code) => { - // Print the complete build summary - if (clientChunks > 0 || serverChunks > 0) { - console.log(`\n📊 Complete build summary:`) - - if (clientChunks > 0) { - console.log(` Client: ${clientChunks} chunks (${clientSize.toFixed(2)} kB)`) - } - - if (serverChunks > 0) { - console.log(` Server: ${serverChunks} chunks (${serverSize.toFixed(2)} kB)`) - } - - if (clientChunks > 0 && serverChunks > 0) { - console.log( - ` Total: ${clientChunks + serverChunks} chunks (${(clientSize + serverSize).toFixed(2)} kB)` - ) + // Pass through all other lines + console.log(line) + }) + + // Print the final summary when the process exits + build.on('exit', (code) => { + // Print the complete build summary + if (clientChunks > 0 || serverChunks > 0) { + console.log(`\n📊 Complete build summary:`) + + if (clientChunks > 0) { + console.log(` Client: ${clientChunks} chunks (${clientSize.toFixed(2)} kB)`) + } + + if (serverChunks > 0) { + 
console.log(` Server: ${serverChunks} chunks (${serverSize.toFixed(2)} kB)`) + } + + if (clientChunks > 0 && serverChunks > 0) { + console.log( + ` Total: ${clientChunks + serverChunks} chunks (${(clientSize + serverSize).toFixed(2)} kB)` + ) + } } - } - console.log(`\n🏁 Build process completed with code ${code}`) - process.exit(code) -}) + console.log(`\n🏁 Build process completed with code ${code}`) + process.exit(code) + }) +} diff --git a/scripts/inlang-settings.ts b/scripts/inlang-settings.ts index a50221605..01fc1522d 100644 --- a/scripts/inlang-settings.ts +++ b/scripts/inlang-settings.ts @@ -5,14 +5,14 @@ import path from 'path' import { getDevContext, possiblyOverriddenLocales } from '../src/lib/env' import { getDefaultSettings, - L10NS_BASE_DIR, + L10N_CAGE_DIR, MARKDOWN_L10NS, MESSAGE_L10NS, MESSAGE_SOURCE, writeSettingsFile } from '../src/lib/l10n' -import { setupTranslationRepo } from './translation/git-ops' -import { createSymlinkIfNeeded, ensureDirectoriesExist } from './translation/utils' +import { setupL10nCage } from './l10n/git-ops' +import { cullCommentary, createSymlinkIfNeeded, ensureDirectoriesExist } from './l10n/utils' // Load environment variables from .env file dotenv.config() @@ -60,15 +60,15 @@ function regenerateSettings(verbose = false): void { console.log(`Using locales: ${settings.locales.join(', ')}`) } - // Determine if we're allowing translation generation based on API key presence + // Determine if we're allowing l10n generation based on API key presence const allowGeneration = !!process.env.TRANSLATION_OPENROUTER_API_KEY if (verbose) { - console.log(`\ud83e\udd16 Translation generation: ${allowGeneration ? 'ENABLED' : 'DISABLED'}`) + console.log(`\ud83e\udd16 L10n generation: ${allowGeneration ? 
'ENABLED' : 'DISABLED'}`) } // Create required directories if (verbose) console.log('\n\ud83d\udcc1 Creating required directories...') - ensureDirectoriesExist([L10NS_BASE_DIR, MESSAGE_L10NS, MARKDOWN_L10NS], verbose) + ensureDirectoriesExist([L10N_CAGE_DIR, MESSAGE_L10NS, MARKDOWN_L10NS], verbose) // Create locale-specific directories settings.locales.forEach((locale) => { @@ -82,17 +82,18 @@ function regenerateSettings(verbose = false): void { // Skip repository setup if we're only using English if (settings.locales.length === 1 && settings.locales[0] === 'en') { if (verbose) { - console.log( - "\n\ud83d\udcdd Translation repository setup skipped - English-only mode doesn't need translations" - ) + console.log("\n\ud83d\udcdd L10n cage setup skipped - English-only mode doesn't need l10ns") } } else { - // Clone or update the translation repository + // Clone or update the l10n cage if (verbose) - console.log( - `\n\ud83d\udd04 Setting up translation repository (need at least ${settings.locales}...` - ) - setupTranslationRepo(L10NS_BASE_DIR, verbose) + console.log(`\n\ud83d\udd04 Setting up l10n cage (need at least ${settings.locales}...`) + setupL10nCage(L10N_CAGE_DIR, verbose) + if (verbose) console.log(`\n🧹 Cleaning up l10n files to remove LLM commentary...`) + for (const locale of settings.locales) { + if (locale === 'en') continue + cullCommentary(path.join(MESSAGE_L10NS, `${locale}.json`), verbose) + } } // For English locale, we only need to provide messages file for Paraglide diff --git a/scripts/translation/additions.ts b/scripts/l10n/additions.ts similarity index 100% rename from scripts/translation/additions.ts rename to scripts/l10n/additions.ts diff --git a/scripts/l10n/branch-safety.test.ts b/scripts/l10n/branch-safety.test.ts new file mode 100644 index 000000000..455777dba --- /dev/null +++ b/scripts/l10n/branch-safety.test.ts @@ -0,0 +1,114 @@ +import { describe, it, expect, beforeEach, afterEach } from 'vitest' +import { l10nCageBranch, 
validateBranchForWrite } from './branch-safety' + +describe('Branch Safety', () => { + // Save original env values + const originalEnv = { + CI: process.env.CI, + L10N_BRANCH: process.env.L10N_BRANCH, + BRANCH: process.env.BRANCH, + REVIEW_ID: process.env.REVIEW_ID + } + + beforeEach(() => { + // Clean environment for tests + delete process.env.CI + delete process.env.L10N_BRANCH + delete process.env.BRANCH + delete process.env.REVIEW_ID + }) + + afterEach(() => { + // Restore original env + Object.entries(originalEnv).forEach(([key, value]) => { + if (value === undefined) { + delete process.env[key] + } else { + process.env[key] = value + } + }) + }) + + describe('L10N_BRANCH override', () => { + it('detects current git branch without override', () => { + const branch = l10nCageBranch() + // Should be the current git branch (l10-preview or whatever is current) + expect(branch).toBeTruthy() + expect(typeof branch).toBe('string') + }) + + it('uses L10N_BRANCH when set', () => { + process.env.L10N_BRANCH = 'my-custom-branch' + const branch = l10nCageBranch() + expect(branch).toBe('my-custom-branch') + }) + + it('allows writing to custom branch', () => { + process.env.L10N_BRANCH = 'my-custom-branch' + const branch = l10nCageBranch() + expect(() => validateBranchForWrite(branch)).not.toThrow() + }) + + it('changes branch when L10N_BRANCH changes', () => { + process.env.L10N_BRANCH = 'first-branch' + expect(l10nCageBranch()).toBe('first-branch') + + process.env.L10N_BRANCH = 'second-branch' + expect(l10nCageBranch()).toBe('second-branch') + }) + }) + + describe('Main branch write protection', () => { + it('blocks local development from writing to main', () => { + process.env.L10N_BRANCH = 'main' + expect(() => { + validateBranchForWrite('main') + }).toThrow('Cannot write to main l10n branch from local development') + }) + + it('allows local development to write to feature branches', () => { + process.env.L10N_BRANCH = 'feature-branch' + const branch = l10nCageBranch() 
+ expect(() => validateBranchForWrite(branch)).not.toThrow() + }) + + it('allows CI to write to main', () => { + process.env.CI = 'true' + process.env.L10N_BRANCH = 'main' + const branch = l10nCageBranch() + expect(() => validateBranchForWrite(branch)).not.toThrow() + }) + }) + + describe('CI environment detection', () => { + beforeEach(() => { + process.env.CI = 'true' + }) + + it('detects Netlify PR preview branches', () => { + process.env.REVIEW_ID = '456' + const branch = l10nCageBranch() + expect(branch).toBe('pr-456') + }) + + it('detects Netlify branch deploys', () => { + process.env.BRANCH = 'staging' + const branch = l10nCageBranch() + expect(branch).toBe('staging') + }) + + it('falls back to main in CI without branch info', () => { + // CI=true but no REVIEW_ID or BRANCH + const branch = l10nCageBranch() + expect(branch).toBe('main') + }) + + it('prioritizes L10N_BRANCH over CI detection', () => { + process.env.L10N_BRANCH = 'override-branch' + process.env.REVIEW_ID = '789' + process.env.BRANCH = 'other-branch' + const branch = l10nCageBranch() + expect(branch).toBe('override-branch') + }) + }) +}) diff --git a/scripts/l10n/branch-safety.ts b/scripts/l10n/branch-safety.ts new file mode 100644 index 000000000..90f882ce9 --- /dev/null +++ b/scripts/l10n/branch-safety.ts @@ -0,0 +1,278 @@ +/** + * L10n branch safety and Git authentication utilities + * + * This module provides branch safety validation and Git operations + * to ensure l10n operations don't interfere with main branch workflows. 
+ */ + +import fs from 'fs' +import path from 'path' +import { execSync } from 'child_process' +import { SimpleGit } from 'simple-git' + +/** + * Get the current Git branch name of the pauseai-website working directory + * @returns The current branch name or 'main' as fallback + */ +function currentWebsiteBranch(): string { + try { + const branch = execSync('git rev-parse --abbrev-ref HEAD', { encoding: 'utf8' }).trim() + return branch || 'main' + } catch (error) { + console.warn('Failed to detect current Git branch, defaulting to main') + return 'main' + } +} + +/** + * Determine which l10n cage branch to use based on environment + * @returns The l10n cage branch name + */ +export function l10nCageBranch(): string { + // 1. Explicit override via environment variable + if (process.env.L10N_BRANCH) { + return process.env.L10N_BRANCH + } + + // 2. CI environment detection + if (process.env.CI === 'true') { + // Netlify PR preview + if (process.env.REVIEW_ID) { + return `pr-${process.env.REVIEW_ID}` + } + // Netlify branch deploy or other CI + if (process.env.BRANCH) { + return process.env.BRANCH + } + // Fallback for CI without branch info + return 'main' + } + + // 3. Local development - use current Git branch + return currentWebsiteBranch() +} + +/** + * Validate if writing to the l10n branch is allowed + * @param branch The branch to validate + * @throws Error if attempting to write to main from local development + */ +export function validateBranchForWrite(branch: string): void { + // Prevent local development from writing to main + if (process.env.CI !== 'true' && branch === 'main') { + throw new Error( + 'Cannot write to main l10n branch from local development.\n' + + 'Options:\n' + + '1. Work on a feature branch (recommended)\n' + + '2. Set L10N_BRANCH= in .env\n' + + '3. 
Use read-only mode by commenting out L10N_OPENROUTER_API_KEY' + ) + } +} + +/** + * Set up branch and tracking for the l10n cage + * @param cageDir Directory of the l10n cage + * @param branch The branch to set up + * @param verbose Whether to log detailed output + */ +function setupBranchAndTracking(cageDir: string, branch: string, verbose: boolean): void { + try { + // Check if the branch exists locally + const localBranchExists = + execSync( + `cd ${cageDir} && git rev-parse --verify ${branch} 2>/dev/null || echo "not found"`, + { encoding: 'utf8' } + ).trim() !== 'not found' + + if (!localBranchExists) { + // Check if remote branch exists + const remoteBranchExists = + execSync(`cd ${cageDir} && git ls-remote --heads origin ${branch} | wc -l`, { + encoding: 'utf8' + }).trim() !== '0' + + if (remoteBranchExists) { + // Create local branch from remote (automatically sets up tracking) + execSync(`cd ${cageDir} && git checkout -b ${branch} origin/${branch}`, { + stdio: verbose ? 'inherit' : 'ignore' + }) + if (verbose) console.log(` ✓ Created and switched to ${branch} branch from remote`) + } else { + // Create new local branch (NO upstream setup yet) + execSync(`cd ${cageDir} && git checkout -b ${branch}`, { + stdio: verbose ? 'inherit' : 'ignore' + }) + if (verbose) + console.log(` ✓ Created new ${branch} branch (upstream will be set on first push)`) + } + } else { + // Branch exists, just checkout + execSync(`cd ${cageDir} && git checkout ${branch}`, { stdio: verbose ? 
'inherit' : 'ignore' }) + if (verbose) console.log(` ✓ Switched to ${branch} branch`) + } + + // Only set upstream if remote branch actually exists + const remoteBranchExists = + execSync(`cd ${cageDir} && git ls-remote --heads origin ${branch} | wc -l`, { + encoding: 'utf8' + }).trim() !== '0' + if (remoteBranchExists) { + try { + const currentUpstream = execSync( + `cd ${cageDir} && git rev-parse --abbrev-ref ${branch}@{upstream} 2>/dev/null || echo "none"`, + { encoding: 'utf8' } + ).trim() + if (currentUpstream === 'none') { + execSync(`cd ${cageDir} && git branch --set-upstream-to=origin/${branch} ${branch}`, { + stdio: 'ignore' + }) + if (verbose) console.log(` ✓ Set up tracking for ${branch}`) + } + } catch (e) { + // Tracking setup failed, but this is not critical + if (verbose) console.log(' ⚠️ Could not set up tracking (continuing anyway)') + } + } + } catch (e) { + if (verbose) console.log(` ℹ️ Could not switch to ${branch}, staying on current branch`) + } +} + +/** + * Check if the l10n cage has uncommitted changes or unpushed commits + * @param cageDir Directory of the l10n cage + * @returns Details string if there are endangered l10ns, empty string if safe + */ +export function hasEndangeredL10ns(cageDir: string): string { + try { + // Check if cage directory and git repo exist + if (!fs.existsSync(path.join(cageDir, '.git'))) { + return '' + } + + const details: string[] = [] + + // Check for uncommitted changes + const statusOutput = execSync(`cd ${cageDir} && git status --porcelain`, { encoding: 'utf8' }) + if (statusOutput.trim()) { + const lines = statusOutput.trim().split('\n') + details.push(`• ${lines.length} uncommitted file(s):`) + lines.slice(0, 5).forEach((line) => { + const status = line.substring(0, 2) + const file = line.substring(3) + const statusDesc = status.includes('M') + ? 'modified' + : status.includes('A') + ? 'added' + : status.includes('D') + ? 
'deleted' + : 'changed' + details.push(` - ${file} (${statusDesc})`) + }) + if (lines.length > 5) { + details.push(` ... and ${lines.length - 5} more`) + } + } + + // Check for unpushed commits (if upstream exists) + try { + const unpushedCount = execSync(`cd ${cageDir} && git rev-list @{u}..HEAD --count`, { + encoding: 'utf8' + }) + const count = parseInt(unpushedCount.trim()) + if (count > 0) { + details.push(`• ${count} unpushed commit(s)`) + + // Show recent unpushed commit messages + const recentCommits = execSync(`cd ${cageDir} && git log @{u}..HEAD --oneline -n 3`, { + encoding: 'utf8' + }) + if (recentCommits.trim()) { + details.push(' Recent commits:') + recentCommits + .trim() + .split('\n') + .forEach((commit) => { + details.push(` - ${commit}`) + }) + } + } + } catch { + // No upstream set up or other git error - not necessarily dangerous + // Only report if we already found uncommitted changes + } + + return details.join('\n') + } catch (error) { + // Git error, assume safe to avoid blocking normal operations + return '' + } +} + +/** + * Push changes to remote, automatically setting upstream for new branches + * @param git SimpleGit instance for the repository + * @param verbose Whether to log detailed output + */ +export async function pushWithUpstream(git: SimpleGit, verbose = false): Promise<void> { + try { + // Check if current branch has upstream tracking + await git.raw(['rev-parse', '--abbrev-ref', '@{upstream}']) + // Upstream exists, use normal push + await git.push() + if (verbose) console.log(' ✓ Pushed to existing upstream') + } catch (e) { + // No upstream, use push with --set-upstream + const currentBranch = await git.revparse(['--abbrev-ref', 'HEAD']) + if (verbose) console.log(` ✓ Setting upstream for new branch: ${currentBranch}`) + await git.push(['--set-upstream', 'origin', currentBranch]) + } +} + +/** + * Check if Git credentials work for pushing to remote by testing them now + * @param cageDir Path to the l10n cage directory + *
@returns true if credentials work + */ +export function canPushToRemote(cageDir: string): boolean { + try { + // Configure Git for seamless operation + execSync(`cd ${cageDir} && git config push.autoSetupRemote true`, { stdio: 'pipe' }) + + // Check current remote URL + const currentRemote = execSync(`cd ${cageDir} && git remote get-url origin`, { + encoding: 'utf8' + }).trim() + + // If using HTTPS, try switching to SSH (which often works better) + if (currentRemote.startsWith('https://github.com/')) { + const sshUrl = currentRemote.replace('https://github.com/', 'git@github.com:') + try { + // Test SSH access quietly first + execSync(`cd ${cageDir} && git ls-remote ${sshUrl}`, { stdio: 'pipe', timeout: 5000 }) + // SSH works, switch to it + execSync(`cd ${cageDir} && git remote set-url origin ${sshUrl}`, { stdio: 'pipe' }) + console.log('🔧 Switched to SSH authentication (more reliable than HTTPS)') + } catch { + // SSH doesn't work, stick with HTTPS + console.log('🔐 Testing Git push credentials...') + console.log(' If prompted, use your Personal Access Token as the password') + console.log(' Create a token at: https://github.com/settings/tokens') + } + } + + // Test credentials with current URL (SSH or HTTPS) + execSync(`cd ${cageDir} && git push --dry-run`, { + stdio: 'inherit' + }) + + console.log('✅ Git push access verified') + return true + } catch (error) { + return false + } +} + +// Export setupBranchAndTracking for internal use by git-ops +export { setupBranchAndTracking } diff --git a/scripts/translation/dry-run.ts b/scripts/l10n/dry-run.ts similarity index 79% rename from scripts/translation/dry-run.ts rename to scripts/l10n/dry-run.ts index 15a0f09c0..555aa85c5 100644 --- a/scripts/translation/dry-run.ts +++ b/scripts/l10n/dry-run.ts @@ -1,5 +1,5 @@ /** - * This file handles the dry run mode functionality for the translation process. + * This file handles the dry run mode functionality for the l10n process. 
* It allows cost estimation and reporting without making actual API calls. */ @@ -17,8 +17,8 @@ export const MODEL_CONFIGS = { PROMPT_OVERHEAD_WORDS: 300, // Markdown formatting overhead (percentage of content words) MARKDOWN_OVERHEAD_PERCENT: 15, - // Translation quality description - TRANSLATION_QUALITY: 'High-quality for most language pairs; excellent with context' + // L10n quality description + L10N_QUALITY: 'High-quality for most language pairs; excellent with context' } // Additional models can be added here as needed } @@ -27,8 +27,8 @@ export const MODEL_CONFIGS = { const DEFAULT_MODEL = 'meta-llama/llama-3.1-405b-instruct' // Type definitions for statistics collection -export type TranslationStats = { - filesToTranslate: number +export type Stats = { + l10nsToCapture: number totalWordCount: number contentWordCount: number overheadWordCount: number @@ -44,8 +44,8 @@ export type TranslationStats = { } // Initialize statistics object -export const createDryRunStats = (): TranslationStats => ({ - filesToTranslate: 0, +export const createDryRunStats = (): Stats => ({ + l10nsToCapture: 0, totalWordCount: 0, contentWordCount: 0, overheadWordCount: 0, @@ -64,16 +64,16 @@ function countWords(text: string): number { } /** - * Track content that would be translated in dry run mode + * Track content that would be localized in dry run mode * * @param stats - The stats object to update - * @param content - The content that would be translated + * @param content - The content that would be localized * @param language - The target language * @param filePath - Optional file path for reporting * @param modelName - Optional model name to use (defaults to DEFAULT_MODEL) */ -export function trackTranslation( - stats: TranslationStats, +export function trackL10n( + stats: Stats, content: string, language: string, filePath?: string, @@ -92,8 +92,8 @@ export function trackTranslation( ) const totalOverheadWords = promptOverheadWords + markdownOverheadWords - // Calculate total 
words for the translation task - // For two-pass translation: original content + first pass + review + // Calculate total words for the l10n task + // For two-pass l10n: original content + first pass + review const firstPassWords = contentWordCount + totalOverheadWords const reviewPassWords = contentWordCount + totalOverheadWords + contentWordCount const totalWords = firstPassWords + reviewPassWords @@ -120,7 +120,7 @@ export function trackTranslation( stats.byLanguage[language].estimatedCost += estimatedCost if (filePath) { - stats.filesToTranslate++ + stats.l10nsToCapture++ stats.byLanguage[language].files.push(fileName) } } @@ -130,20 +130,20 @@ export function trackTranslation( * * @param stats - The statistics object * @param verbose - Whether to print detailed information - * @param cacheCount - Number of cached files (not needing translation) + * @param cacheCount - Number of cached files (not needing l10n) * @param modelName - Optional model name to display (defaults to DEFAULT_MODEL) */ export function printDryRunSummary( - stats: TranslationStats, + stats: Stats, verbose = false, cacheCount = 0, modelName: string = DEFAULT_MODEL ): void { const modelConfig = MODEL_CONFIGS[modelName] || MODEL_CONFIGS[DEFAULT_MODEL] - console.log('\n=== DRY RUN TRANSLATION SUMMARY ===') + console.log('\n=== DRY RUN L10N SUMMARY ===') console.log(`Model: ${modelName}`) - console.log(`Files to translate: ${stats.filesToTranslate}`) + console.log(`L10ns to capture: ${stats.l10nsToCapture}`) console.log(`Files using cache: ${cacheCount}`) // Word count breakdown @@ -151,13 +151,11 @@ export function printDryRunSummary( console.log( `Overhead word count: ${stats.overheadWordCount.toLocaleString()} (prompt instructions and formatting)` ) - console.log( - `Total word count: ${stats.totalWordCount.toLocaleString()} (includes two-pass translation)` - ) + console.log(`Total word count: ${stats.totalWordCount.toLocaleString()} (includes two-pass l10n)`) // Cost in 1000-word units 
const wordUnits = stats.totalWordCount / 1000 - console.log(`Translation workload: ${wordUnits.toFixed(2)} thousand-word units`) + console.log(`L10n workload: ${wordUnits.toFixed(2)} thousand-word units`) console.log(`Estimated cost: $${stats.estimatedCost.toFixed(2)}`) if (modelConfig.COST_PER_1000_WORDS === 0) { @@ -173,13 +171,13 @@ export function printDryRunSummary( ) if (verbose) { - console.log(' Files to translate:') + console.log(' L10ns to capture:') langStats.files.forEach((file) => { console.log(` - ${file}`) }) } }) - console.log('\nNote: This is a dry run - no translations were performed') + console.log('\nNote: This is a dry run - no l10ns were captured') console.log('===================================\n') } diff --git a/scripts/l10n/force.test.ts b/scripts/l10n/force.test.ts new file mode 100644 index 000000000..d07222ef4 --- /dev/null +++ b/scripts/l10n/force.test.ts @@ -0,0 +1,143 @@ +/** + * Integration tests for --force mode functionality + * Tests the actual force pattern resolution against the real filesystem + */ + +import { describe, it, expect, beforeAll } from 'vitest' +import { existsSync } from 'fs' +import { resolve as resolveForcePatterns } from './force' + +describe('Force mode integration tests', () => { + const MARKDOWN_SOURCE = 'src/posts' + + // Files we expect to exist and use in our tests + const expectedFiles = { + messages: 'messages/en.json', + posts: { + 'join.md': 'src/posts/join.md', + 'donate.md': 'src/posts/donate.md', + 'learn.md': 'src/posts/learn.md', + 'action.md': 'src/posts/action.md', + 'faq.md': 'src/posts/faq.md', + '2024-november.md': 'src/posts/2024-november.md', + '2024-february.md': 'src/posts/2024-february.md' + } + } + + beforeAll(() => { + // Verify our test assumptions - these files should exist + expect(existsSync(expectedFiles.messages), `Expected ${expectedFiles.messages} to exist`).toBe( + true + ) + + for (const [filename, filepath] of Object.entries(expectedFiles.posts)) { + 
expect(existsSync(filepath), `Expected ${filepath} to exist for testing`).toBe(true) + } + }) + + describe('Pattern resolution against real filesystem', () => { + it('resolves exact filename patterns', async () => { + const result = await resolveForcePatterns(['join.md'], MARKDOWN_SOURCE) + expect(result).toContain('join.md') + expect(result).not.toContain('donate.md') + expect(result.length).toBe(1) + }) + + it('resolves multiple exact filenames', async () => { + const result = await resolveForcePatterns( + ['join.md', 'donate.md', 'learn.md'], + MARKDOWN_SOURCE + ) + expect(result).toContain('join.md') + expect(result).toContain('donate.md') + expect(result).toContain('learn.md') + expect(result).not.toContain('action.md') + }) + + it('resolves "*" to include all source files', async () => { + const result = await resolveForcePatterns(['*'], MARKDOWN_SOURCE) + expect(result).toContain('en.json') + expect(result).toContain('join.md') + expect(result).toContain('donate.md') + expect(result).toContain('2024-november.md') + // Should have many files + expect(result.length).toBeGreaterThan(10) + }) + + it('resolves prefix patterns like "2024-*"', async () => { + const result = await resolveForcePatterns(['2024-*'], MARKDOWN_SOURCE) + expect(result).toContain('2024-november.md') + expect(result).toContain('2024-february.md') + expect(result).not.toContain('join.md') + expect(result).not.toContain('en.json') + // All results should start with "2024-" + expect(result.every((f) => f.startsWith('2024-'))).toBe(true) + }) + + it('combines different pattern types', async () => { + const result = await resolveForcePatterns(['en.json', '2024-*', 'donate.md'], MARKDOWN_SOURCE) + expect(result).toContain('en.json') + expect(result).toContain('donate.md') + expect(result).toContain('2024-november.md') + expect(result).toContain('2024-february.md') + }) + + it('handles non-existent files gracefully', async () => { + const result = await resolveForcePatterns(['nonexistent.md', 
'join.md'], MARKDOWN_SOURCE) + expect(result).toContain('join.md') + expect(result).not.toContain('nonexistent.md') + expect(result.length).toBe(1) + }) + + it('returns empty array for no matches', async () => { + const result = await resolveForcePatterns(['xyz-nonexistent-*'], MARKDOWN_SOURCE) + expect(result).toEqual([]) + }) + + it('deduplicates when patterns overlap', async () => { + const result = await resolveForcePatterns(['join.md', 'join.md', 'joi*'], MARKDOWN_SOURCE) + const joinCount = result.filter((f) => f === 'join.md').length + expect(joinCount).toBe(1) + }) + + it('handles empty pattern array', async () => { + const result = await resolveForcePatterns([], MARKDOWN_SOURCE) + expect(result).toEqual([]) + }) + + // Test that glob patterns now work + it('supports *.md pattern', async () => { + const result = await resolveForcePatterns(['*.md'], MARKDOWN_SOURCE) + expect(result.length).toBeGreaterThan(0) + expect(result.every((f) => f.endsWith('.md'))).toBe(true) + expect(result).not.toContain('en.json') + }) + + it('supports patterns with extensions like 2024-*.md', async () => { + const result = await resolveForcePatterns(['2024-*.md'], MARKDOWN_SOURCE) + expect(result.length).toBeGreaterThan(0) + expect(result.every((f) => f.startsWith('2024-') && f.endsWith('.md'))).toBe(true) + }) + }) + + describe('Additional glob patterns', () => { + it('should support **/*.md for recursive markdown files', async () => { + const result = await resolveForcePatterns(['**/*.md'], MARKDOWN_SOURCE) + expect(result.length).toBeGreaterThan(0) + expect(result.every((f) => f.endsWith('.md'))).toBe(true) + expect(result).toContain('join.md') + }) + + it('should support character classes like 202[34]-*.md', async () => { + const result = await resolveForcePatterns(['202[34]-*.md'], MARKDOWN_SOURCE) + expect(result).toContain('2024-november.md') + expect(result).toContain('2023-november-uk.md') + expect(result).not.toContain('2025-something.md') + }) + + it('should support 
brace expansion like {join,donate,learn}.md', async () => { + const result = await resolveForcePatterns(['{join,donate,learn}.md'], MARKDOWN_SOURCE) + expect(result.sort()).toEqual(['donate.md', 'join.md', 'learn.md']) + }) + }) +}) diff --git a/scripts/l10n/force.ts b/scripts/l10n/force.ts new file mode 100644 index 000000000..e8747d10a --- /dev/null +++ b/scripts/l10n/force.ts @@ -0,0 +1,58 @@ +/** + * Force mode functionality for l10n + */ + +import { readdir } from 'fs/promises' +import { minimatch } from 'minimatch' + +/** + * Display help for force mode usage + */ +export function showForceHelp(): void { + console.error('🔄 Force mode requires pattern(s)') + console.error('\nUsage:') + console.error(' pnpm l10n --force "*" # Force all files') + console.error(' pnpm l10n --force en.json # Force specific file') + console.error(' pnpm l10n --force "*.md" # All markdown files') + console.error(' pnpm l10n --force "2024-*" # Files starting with "2024-"') + console.error(' pnpm l10n --force en.json join.md # Multiple patterns') + console.error('\nSupported patterns (using minimatch glob syntax):') + console.error(' - Exact match: en.json, learn.md') + console.error(' - Wildcards: *.md, 2024-*.md') + console.error(' - Character classes: 202[34]-*.md') + console.error(' - Brace expansion: {join,donate,learn}.md') + console.error('\nPatterns match against:') + console.error(' - messages/en.json') + console.error(' - src/posts/*.md') +} + +/** + * Resolves force patterns to actual source files + * @param patterns - Array of patterns to match against source files + * @param markdownSource - Path to markdown source directory + * @returns Array of source file paths that match the patterns + */ +export async function resolve(patterns: string[], markdownSource: string): Promise<string[]> { + const sourceFiles: string[] = [] + + // Add message file + sourceFiles.push('en.json') + + // Add all markdown files from posts + try { + const markdownFiles = await readdir(markdownSource) + for
(const file of markdownFiles) { + if (file.endsWith('.md')) { + sourceFiles.push(file) + } + } + } catch (error) { + console.warn(`⚠️ Could not read markdown source directory: ${markdownSource}`) + } + + // Match patterns against source files + const matches = patterns.flatMap((pattern) => + sourceFiles.filter((file) => minimatch(file, pattern)) + ) + return [...new Set(matches)] +} diff --git a/scripts/l10n/git-ops.ts b/scripts/l10n/git-ops.ts new file mode 100644 index 000000000..0d945865b --- /dev/null +++ b/scripts/l10n/git-ops.ts @@ -0,0 +1,213 @@ +/** + * Git operations for l10n management + * Handles l10n cage operations, commit tracking, and other Git operations + */ + +import fs from 'fs' +import fsPromises from 'fs/promises' +import path from 'path' +import { execSync } from 'child_process' +import simpleGit, { SimpleGit, SimpleGitOptions } from 'simple-git' +import { L10N_CAGE_DIR } from '../../src/lib/l10n' +import { ensureDirectoryExists } from './utils' +import { l10nCageBranch, validateBranchForWrite, setupBranchAndTracking } from './branch-safety' + +/** + * Configuration for Git operations + */ +export const GIT_CONFIG = { + EMAIL: 'example@example.com', + MAX_CONCURRENT_PROCESSES: 8, + USERNAME: 'L10nKeeper' +} + +// L10n cage URL (public access) +export const L10N_CAGE_URL = 'github.com/PauseAI/paraglide' + +/** + * Creates a SimpleGit instance with configured options + * + * @returns A configured SimpleGit instance + */ +export function createGitClient(): SimpleGit { + const gitOptions: Partial<SimpleGitOptions> = { + maxConcurrentProcesses: GIT_CONFIG.MAX_CONCURRENT_PROCESSES + } + return simpleGit(gitOptions) +} + +/** + * Initialize or update the l10n cage repository + * Can be called directly to manage the l10n cage + * + * @param cageDir Directory where the l10n cage should be + * @param verbose Whether to log detailed output + * @param branch Optional branch to use (defaults to dynamic detection) + * @returns Success status + */ +export function
setupL10nCage(cageDir: string, verbose = false, branch?: string): boolean { + try { + // Use provided branch or detect it + const targetBranch = branch || l10nCageBranch() + + // Log the branch being used + if (verbose) { + console.log(` ℹ️ Using l10n branch: ${targetBranch}`) + } + + // Check if the directory exists and is a git repo + if (fs.existsSync(path.join(cageDir, '.git'))) { + if (verbose) { + console.log(' ✓ L10n cage already exists, pulling latest changes...') + } + + // Ensure we're on the correct branch before pulling + setupBranchAndTracking(cageDir, targetBranch, verbose) + + // Pull latest changes (if upstream exists) + try { + // Check if current branch has upstream tracking + const hasUpstream = + execSync( + `cd ${cageDir} && git rev-parse --abbrev-ref @{upstream} 2>/dev/null || echo "none"`, + { encoding: 'utf8' } + ).trim() !== 'none' + + if (hasUpstream) { + execSync(`cd ${cageDir} && git pull`, { stdio: verbose ? 'inherit' : 'ignore' }) + if (verbose) console.log(' ✓ Updated l10n cage') + } else { + if (verbose) { + console.log(' ℹ️ Branch has no upstream yet (this is normal for new branches)') + console.log(' The upstream will be set automatically on first push') + } + } + } catch (e) { + if (verbose) console.log(' ⚠️ Could not pull latest changes: ' + (e as Error).message) + } + } else { + // Clone the l10n cage + if (verbose) console.log(' ✓ Cloning l10n cage repository...') + + // If directory exists but isn't a git repo, remove it + if (fs.existsSync(cageDir)) { + fs.rmSync(cageDir, { recursive: true, force: true }) + } + + // Ensure parent directory exists + ensureDirectoryExists(path.dirname(cageDir), verbose) + + // Clone public l10n cage - no token needed for public repos + const gitCommand = `git clone https://${L10N_CAGE_URL}.git ${cageDir}` + execSync(gitCommand, { stdio: verbose ? 
'inherit' : 'ignore' }) + if (verbose) console.log(' ✓ Cloned l10n cage repository') + + // Switch to the target branch + setupBranchAndTracking(cageDir, targetBranch, verbose) + } + return true + } catch (error) { + console.error('\n❌ FAILED TO SET UP L10N CAGE!') + console.error(` Error accessing l10n cage: ${(error as Error).message}`) + console.error('\n Options:') + console.error(' 1. Continue with English-only: Edit .env to set PARAGLIDE_LOCALES=en') + console.error(' 2. Check your internet connection and try again') + console.error(' 3. Contact the project maintainers if the issue persists') + return false + } +} + +/** + * Initializes the Git cage by removing the existing directory, + * cloning the remote l10n cage, and configuring Git user settings. + * + * @param options - An object containing the target directory, authentication token, cage URL, username, and email. + * @returns A Promise that resolves when the cage has been cloned and configured. + */ +export async function initializeGitCage(options: { + dir: string + token?: string + repo: string + username: string + email: string + git: SimpleGit + branch?: string +}): Promise<void> { + // Determine the branch to use + const targetBranch = options.branch || l10nCageBranch() + + // Validate branch for write operations + validateBranchForWrite(targetBranch) + + // Don't delete existing cage, use setupL10nCage instead + const setupSuccess = setupL10nCage(options.dir, true, targetBranch) + if (!setupSuccess) { + throw new Error('Failed to set up l10n cage') + } + + /* Skip the destructive rm and clone - setupL10nCage already handles everything */ + + await options.git.cwd(options.dir) + + // Test if we're authenticated by checking remote URL format + try { + const remoteUrl = await options.git.remote(['get-url', 'origin']) + const isAuthenticated = remoteUrl.includes('@') + console.log(`Authentication status: ${isAuthenticated ?
'SUCCESS' : 'FAILURE'}`) + } catch (err) { + console.log(`Failed to verify authentication: ${(err as Error).message}`) + } + + // Always set git config in case we need to make local commits + await options.git.addConfig('user.name', options.username) + await options.git.addConfig('user.email', options.email) +} +/** + * Extracts the latest commit dates for each file by parsing the Git log. + * + * @param git - The SimpleGit instance used to retrieve the log. + * @param repoType - Description of the repository type for logging + * @returns A Promise that resolves to a Map where keys are file paths and values are the latest commit dates. + */ +export async function getLatestCommitDates( + git: SimpleGit, + repoType: string +): Promise<Map<string, Date>> { + console.log(`Starting git log retrieval for ${repoType} commit dates...`) + const latestCommitDatesMap = new Map<string, Date>() + const log = await git.log({ + '--stat': 4096 + }) + for (const entry of log.all) { + const files = entry.diff?.files + if (!files) continue + for (const file of files) { + if (!latestCommitDatesMap.has(file.file)) { + latestCommitDatesMap.set(file.file, new Date(entry.date)) + } + } + } + return latestCommitDatesMap +} + +/** + * Generates an appropriate commit message based on whether the l10n file already existed. + * + * @param sourceFileName - The name of the source file. + * @param locale - The locale for the l10n. + * @param fileExists - Boolean indicating if the file existed. + * @returns The commit message. + */ +export function getCommitMessage( + sourceFileName: string, + locale: string, + fileExists: boolean +): string { + return fileExists + ?
`Update outdated l10n for ${sourceFileName} in ${locale}` + : `Create new l10n for ${sourceFileName} in ${locale}` +} + +export function cleanUpGitSecrets() { + fs.unlinkSync(path.join(L10N_CAGE_DIR, '.git/config')) +} diff --git a/scripts/l10n/heart.ts b/scripts/l10n/heart.ts new file mode 100644 index 000000000..5f108d450 --- /dev/null +++ b/scripts/l10n/heart.ts @@ -0,0 +1,326 @@ +/** + * Core l10n logic + * Handles the main l10n workflow and file operations + */ + +import fs from 'fs/promises' +import fsSync from 'fs' +import path from 'path' +import PQueue from 'p-queue' +import { SimpleGit } from 'simple-git' +import { PromptGenerator } from './prompts' +import { postChatCompletion } from './llm-client' +import { getCommitMessage } from './git-ops' +import { preprocessMarkdown, postprocessMarkdown, extractWebPath, placeInCage } from './utils' +import { trackL10n, Stats } from './dry-run' + +/** + * Type definition for target path locator function + */ +export type Targeting = (locale: string, sourcePath: string) => string + +/** + * Configuration options for l10n operations + */ +export interface Options { + /** Whether to run in dry run mode (skip actual API calls) */ + isDryRun: boolean + /** Whether to output verbose logs */ + verbose: boolean + /** Axios client for LLM API requests */ + llmClient: any + /** Queue for managing API request rate limiting */ + requestQueue: PQueue + /** Queue for Git operations to prevent concurrency issues */ + gitQueue: PQueue + /** Function to generate language names from language codes */ + languageNameGenerator: Intl.DisplayNames + /** Git client for the l10n cage */ + cageGit: SimpleGit + /** Map of latest commit dates in the l10n cage */ + cageLatestCommitDates: Map<string, Date> + /** Map of latest commit dates in the website repository */ + websiteLatestCommitDates: Map<string, Date> + /** Statistics object for dry run mode */ + dryRunStats: Stats + /** List of files to force re-l10n (ignore cache) */ + forceFiles: string[] +} + +/** + *
Localizes the provided content to a specified language using a two-pass process, + * or collects statistics in dry run mode without making API calls. + * + * @param content - The original content to be localized. + * @param promptGenerators - Functions for generating the l10n and review prompt. + * @param locale - The target locale code. + * @param promptAdditions - Additional context to include in the prompt. + * @param options - L10n configuration options. + * @param filePath - Optional file path for dry run statistics. + * @returns A Promise that resolves to the reviewed (final) l10n, or a placeholder in dry run mode. + * @throws {Error} If either the l10n or review pass fails (in non-dry run mode). + */ +export async function l10nFromLLM( + content: string, + promptGenerators: PromptGenerator[], + locale: string, + promptAdditions: string, + options: Options, + filePath?: string +): Promise<string> { + const languageName = options.languageNameGenerator.of(locale) + if (!languageName) throw new Error(`Couldn't resolve locale code: ${locale}`) + + const l10nPrompt = promptGenerators[0](languageName, content, promptAdditions) + // L10n prompt ready + + // In dry run mode, collect statistics instead of making API calls + if (options.isDryRun) { + // Track what would be localized for reporting + trackL10n(options.dryRunStats, content, locale, filePath) + + if (options.verbose) { + console.log( + `🔍 [DRY RUN] Would localize ${content.length} characters to ${languageName}${filePath ? 
` (${path.basename(filePath)})` : ''}` + ) + } + + // Return a placeholder for localized content + return `[DRY RUN L10N PLACEHOLDER for ${languageName}]` + } + + // Regular API-based l10n for non-dry-run mode + // First pass: generate initial l10n + const firstPass = await postChatCompletion(options.llmClient, options.requestQueue, [ + { role: 'user', content: l10nPrompt } + ]) + + if (!firstPass) throw new Error(`L10n to ${languageName} failed`) + + if (options.verbose) { + console.log('First prompt: ', l10nPrompt) + console.log('First pass response:', firstPass) + } else { + console.log( + `Completed first pass l10n to ${languageName}${filePath ? ` for ${path.basename(filePath)}` : ''}` + ) + } + + // Second pass: review and refine l10n with context + const reviewPrompt = promptGenerators[1](languageName) + const reviewed = await postChatCompletion(options.llmClient, options.requestQueue, [ + { role: 'user', content: l10nPrompt }, + { role: 'assistant', content: firstPass }, + { role: 'user', content: reviewPrompt } + ]) + + if (!reviewed) throw new Error(`Review of ${languageName} l10n failed`) + + if (options.verbose) { + console.log('Review prompt: ', reviewPrompt) + console.log('Review pass response:', reviewed) + } else { + console.log( + `Completed review pass l10n to ${languageName}${filePath ? ` for ${path.basename(filePath)}` : ''}` + ) + } + + return reviewed +} + +/** + * Retrieves localized message files using a JSON prompt generator. + * Processes the source JSON file and creates separate l10ns for each target language. + * + * @param params - An object containing the source path, locales, prompt generators, target directory, and cache working directory. 
+ * @param options - Global l10n configuration options + * @returns A Promise that resolves with results of the operation + */ +export async function retrieveMessages( + params: { + sourcePath: string + locales: string[] + promptGenerators: PromptGenerator[] + targetDir: string + cageWorkingDir: string + logMessageFn?: (msg: string) => void + }, + options: Options +): Promise<{ cacheCount: number; totalProcessed: number }> { + const result = await retrieve( + { + sourcePaths: [params.sourcePath], + locales: params.locales, + promptGenerators: params.promptGenerators, + locateTarget: (locale) => path.join(params.targetDir, locale + '.json'), + cageWorkingDir: params.cageWorkingDir, + logMessageFn: params.logMessageFn + }, + options + ) + return { cacheCount: result.cacheCount, totalProcessed: result.totalProcessed } +} + +/** + * Retrieves localized markdown files using a Markdown prompt generator. + * Reads markdown files from the source directory and outputs localized files organized by language. + * + * @param params - An object with sourcePaths, sourceBaseDir, locales, prompt generators, target directory, and cache working directory. 
+ * @param options - Global l10n configuration options + * @returns A Promise that resolves with results of the operation + */ +export async function retrieveMarkdown( + params: { + sourcePaths: string[] + sourceBaseDir: string + locales: string[] + promptGenerators: PromptGenerator[] + targetDir: string + cageWorkingDir: string + logMessageFn?: (msg: string) => void + }, + options: Options +): Promise<{ cacheCount: number; totalProcessed: number }> { + const result = await retrieve( + { + sourcePaths: params.sourcePaths, + locales: params.locales, + promptGenerators: params.promptGenerators, + locateTarget: (locale, sourcePath) => { + const relativePath = path.relative(params.sourceBaseDir, sourcePath) + return path.join(params.targetDir, locale, relativePath) + }, + cageWorkingDir: params.cageWorkingDir, + logMessageFn: params.logMessageFn + }, + options + ) + return { cacheCount: result.cacheCount, totalProcessed: result.totalProcessed } +} + +/** + * Generalized function that retrieves localized files for various languages. + * It checks whether a cached l10n is up-to-date before generating a new l10n. + * + * @param params - An object containing source file paths, locales, prompt generators, target locator, and the cache working directory. 
+ * @param options - Global l10n configuration options + * @returns A Promise that resolves with the results of the operation + */ +export async function retrieve( + params: { + sourcePaths: string[] + locales: string[] + promptGenerators: PromptGenerator[] + locateTarget: Targeting + cageWorkingDir: string + logMessageFn?: (msg: string) => void + }, + options: Options +): Promise<{ cacheCount: number; totalProcessed: number }> { + const log = params.logMessageFn || console.log + // Log function ready + let done = 1 + let total = 0 + let cacheCount = 0 + + await Promise.all( + params.sourcePaths.map(async (sourcePath) => { + const sourceFileName = path.basename(sourcePath) + /** Backslash to forward slash to match Git log and for web path */ + const processedSourcePath = path.relative('.', sourcePath).replaceAll(/\\/g, '/') + await Promise.all( + params.locales.map(async (locale) => { + const targetPath = params.locateTarget(locale, sourcePath) + let useCachedL10n = false + let fileExists = false + + // Check if this file is being forced (skip cache) + const sourceFileName = path.basename(sourcePath) + const isForced = options.forceFiles.includes(sourceFileName) + + // Check if target file exists (needed for commit messages) + if (fsSync.existsSync(targetPath)) { + fileExists = true + } + + if (isForced) { + if (options.verbose) { + log(`🔄 Force mode: Skipping cache for ${sourceFileName}`) + } + } + + // Check if we can use the cached l10n (unless forced) + if (!isForced && fileExists) { + const sourceLatestCommitDate = options.websiteLatestCommitDates.get(processedSourcePath) + if (!sourceLatestCommitDate) { + log( + `Didn't prepare latest commit date for ${processedSourcePath}, using cached version` + ) + useCachedL10n = true + cacheCount++ // PATCH: Count uncommitted files as cached + } else { + // Only compare dates if source has a commit date + const cageRelativePath = path + .relative(params.cageWorkingDir, targetPath) + .replaceAll(/\\/g, '/') + const 
cageLatestCommitDate = options.cageLatestCommitDates.get(cageRelativePath) + if (!cageLatestCommitDate) + throw new Error(`Didn't prepare latest commit date for ${targetPath}`) + if (cageLatestCommitDate > sourceLatestCommitDate) { + useCachedL10n = true + cacheCount++ + } + } + } + + // If we can't use cache, generate a new l10n + if (!useCachedL10n) { + total++ + const content = await fs.readFile(sourcePath, 'utf-8') + const processedContent = preprocessMarkdown(content) + + if (options.verbose) { + log(processedContent) + } + + const page = extractWebPath(sourcePath) + const promptAdditions = '' // Will be handled by the prompt generator + + const capturedL10n = await l10nFromLLM( + processedContent, + params.promptGenerators, + locale, + promptAdditions, + options, + sourcePath + ) + + // Only perform actual file writes and Git operations in non-dry run mode + if (!options.isDryRun) { + const groomedL10n = postprocessMarkdown(processedContent, capturedL10n) + await placeInCage(targetPath, groomedL10n) + + const message = getCommitMessage(sourceFileName, locale, fileExists) + try { + await options.gitQueue.add(() => + (fileExists ? 
options.cageGit : options.cageGit.add('.')).commit(message, ['-a']) + ) + } catch (e) { + if (e instanceof Error && e.message.includes('nothing to commit')) { + log(`${sourceFileName} in ${locale} didn't change`) + } else { + throw e + } + } + log(`${message} (${done++} / ${total})`) + } + } + }) + ) + }) + ) + + // Total count of processed files is the count of source files multiplied by target languages + const totalProcessed = params.sourcePaths.length * params.locales.length + return { cacheCount, totalProcessed } +} diff --git a/scripts/translation/llm-client.ts b/scripts/l10n/llm-client.ts similarity index 95% rename from scripts/translation/llm-client.ts rename to scripts/l10n/llm-client.ts index 1d94a9a89..604fb0cbf 100644 --- a/scripts/translation/llm-client.ts +++ b/scripts/l10n/llm-client.ts @@ -1,6 +1,6 @@ /** - * LLM API client for translation operations - * Handles communication with language model APIs for translation + * LLM API client for l10n operations + * Handles communication with language model APIs for l10n */ import axios from 'axios' diff --git a/scripts/l10n/mode.test.ts b/scripts/l10n/mode.test.ts new file mode 100644 index 000000000..077137544 --- /dev/null +++ b/scripts/l10n/mode.test.ts @@ -0,0 +1,295 @@ +import { describe, it, expect, beforeEach, afterEach, vi } from 'vitest' +import { Mode } from './mode' + +describe('Mode', () => { + // Save original environment + const originalEnv = { + CI: process.env.CI, + L10N_BRANCH: process.env.L10N_BRANCH + } + + beforeEach(() => { + // Clean environment before each test + delete process.env.CI + delete process.env.L10N_BRANCH + }) + + afterEach(() => { + // Restore original environment + Object.entries(originalEnv).forEach(([key, value]) => { + if (value === undefined) { + delete process.env[key] + } else { + process.env[key] = value + } + }) + }) + + describe('English-only mode', () => { + it('returns en-only mode when only English locale is configured', () => { + const mode = new Mode({ 
locales: ['en'] }) + expect(mode.mode).toBe('en-only') + expect(mode.canReadCache).toBe(false) + expect(mode.canWrite).toBe(false) + expect(mode.reason).toContain('Only English locale') + }) + }) + + describe('Dry run mode', () => { + it('returns dry-run mode when isDryRun is true', () => { + const mode = new Mode({ + locales: ['en', 'de'], + isDryRun: true + }) + expect(mode.mode).toBe('dry-run') + expect(mode.canReadCache).toBe(true) + expect(mode.canWrite).toBe(false) + expect(mode.reason).toContain('Dry run mode') + }) + + it('allows reading from main branch in dry run mode', () => { + process.env.L10N_BRANCH = 'main' + const mode = new Mode({ + locales: ['en', 'de'], + isDryRun: true, + apiKey: 'valid-api-key-thats-long-enough' + }) + expect(mode.mode).toBe('dry-run') + expect(mode.branch).toBe('main') + expect(mode.canReadCache).toBe(true) + expect(mode.canWrite).toBe(false) + }) + }) + + describe('Dry-run mode', () => { + it('returns dry-run mode when API key is missing', () => { + const mode = new Mode({ + locales: ['en', 'de'] + }) + expect(mode.mode).toBe('dry-run') + expect(mode.canReadCache).toBe(true) + expect(mode.canWrite).toBe(false) + expect(mode.reason).toContain('No API key') + }) + + it('returns dry-run mode when API key is too short', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'short' + }) + expect(mode.mode).toBe('dry-run') + expect(mode.canReadCache).toBe(true) + expect(mode.canWrite).toBe(false) + expect(mode.reason).toContain('API key too short') + }) + + it('treats API key with exactly 10 characters as valid', () => { + process.env.L10N_BRANCH = 'feature-branch' + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: '1234567890' + }) + expect(mode.mode).toBe('perform') + expect(mode.canWrite).toBe(true) + }) + }) + + describe('Perform mode', () => { + it('returns perform mode for valid API key on feature branch', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 
'valid-api-key-thats-long-enough', + branch: 'feature-branch' + }) + expect(mode.mode).toBe('perform') + expect(mode.canReadCache).toBe(true) + expect(mode.canWrite).toBe(true) + expect(mode.reason).toContain('Local development on feature-branch') + }) + + it('allows CI to write to main branch', () => { + process.env.CI = 'true' + process.env.L10N_BRANCH = 'main' + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough' + }) + expect(mode.mode).toBe('perform') + expect(mode.branch).toBe('main') + expect(mode.canWrite).toBe(true) + expect(mode.reason).toContain('CI can write to main') + }) + + it('proceeds to perform mode in CI with invalid API key', () => { + process.env.CI = 'true' + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'short' // Invalid API key + }) + expect(mode.mode).toBe('perform') + expect(mode.canWrite).toBe(true) + expect(mode.reason).toContain('CI can write') + // This ensures CI fails loudly at LLM calls rather than silently dry-running + }) + }) + + describe('Main branch protection', () => { + it('throws error when local dev tries to write to main', () => { + process.env.L10N_BRANCH = 'main' + expect(() => { + new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough' + }) + }).toThrow('Cannot write to main branch of the l10n repos from local development') + }) + + it('allows reading from main with no API key', () => { + process.env.L10N_BRANCH = 'main' + const mode = new Mode({ + locales: ['en', 'de'] + }) + expect(mode.mode).toBe('dry-run') + expect(mode.branch).toBe('main') + expect(mode.canReadCache).toBe(true) + expect(mode.canWrite).toBe(false) + }) + }) + + describe('Branch detection', () => { + it('uses branch parameter when provided', () => { + const mode = new Mode({ + locales: ['en', 'de'], + branch: 'custom-branch' + }) + expect(mode.branch).toBe('custom-branch') + }) + + it('uses L10N_BRANCH env var when no branch parameter', () => { + process.env.L10N_BRANCH = 
'env-branch' + const mode = new Mode({ + locales: ['en', 'de'] + }) + expect(mode.branch).toBe('env-branch') + }) + }) + + describe('announce method', () => { + it('can be called without throwing', () => { + const mode = new Mode({ + locales: ['en', 'de'], + verbose: true + }) + expect(() => mode.announce()).not.toThrow() + }) + }) + + describe('force mode options', () => { + let consoleLogSpy: any + + beforeEach(() => { + // Spy on console.log to verify announcement output + consoleLogSpy = vi.spyOn(console, 'log').mockImplementation(() => {}) + }) + + afterEach(() => { + consoleLogSpy.mockRestore() + }) + + it('stores force mode options correctly', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough', + branch: 'feature-branch', + force: true, + forceFiles: ['join.md', 'donate.md'] + }) + expect(mode.options.force).toBe(true) + expect(mode.options.forceFiles).toEqual(['join.md', 'donate.md']) + }) + + it('announces force mode when enabled', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough', + branch: 'feature-branch', + force: true, + forceFiles: ['join.md'] + }) + + mode.announce() + + const output = consoleLogSpy.mock.calls.flat().join('\n') + expect(output).toContain('Force mode enabled') + expect(output).toContain('join.md') + }) + + it('announces force mode with multiple files', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough', + branch: 'feature-branch', + force: true, + forceFiles: ['en.json', '2024-*', 'donate.md'] + }) + + mode.announce() + + const output = consoleLogSpy.mock.calls.flat().join('\n') + expect(output).toContain('Force mode enabled') + expect(output).toContain('en.json, 2024-*, donate.md') + }) + + it('announces force mode without specific files', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough', + branch: 'feature-branch', + force: 
true, + forceFiles: [] + }) + + mode.announce() + + const output = consoleLogSpy.mock.calls.flat().join('\n') + expect(output).toContain('Force mode enabled') + expect(output).toContain('all files') + }) + + it('shows full force file list in verbose mode', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough', + branch: 'feature-branch', + force: true, + forceFiles: ['join.md', 'donate.md', 'learn.md', 'action.md', 'faq.md'], + verbose: true + }) + + mode.announce() + + const output = consoleLogSpy.mock.calls.flat().join('\n') + expect(output).toContain('Force mode enabled') + // In verbose mode, should show all files + expect(output).toContain('join.md') + expect(output).toContain('donate.md') + expect(output).toContain('learn.md') + expect(output).toContain('action.md') + expect(output).toContain('faq.md') + }) + + it('does not announce force mode when disabled', () => { + const mode = new Mode({ + locales: ['en', 'de'], + apiKey: 'valid-api-key-thats-long-enough', + branch: 'feature-branch', + force: false + }) + + mode.announce() + + const output = consoleLogSpy.mock.calls.flat().join('\n') + expect(output).not.toContain('Force mode') + }) + }) +}) diff --git a/scripts/l10n/mode.ts b/scripts/l10n/mode.ts new file mode 100644 index 000000000..7d887d65d --- /dev/null +++ b/scripts/l10n/mode.ts @@ -0,0 +1,175 @@ +/** + * L10n mode determination and configuration + * Determines how the l10n process should operate based on environment and options + */ + +import { l10nCageBranch, canPushToRemote } from './branch-safety' +import { L10N_CAGE_DIR } from '../../src/lib/l10n' + +/** + * L10n operation modes + */ +export type L10nBreed = 'en-only' | 'dry-run' | 'perform' + +/** + * L10n mode configuration options + */ +export interface ModeOptions { + branch?: string + isDryRun?: boolean + apiKey?: string + locales?: readonly string[] + verbose?: boolean + force?: boolean + forceFiles?: string[] +} + +/** + * L10n mode 
manager + * Determines and describes how the l10n process will operate + */ +export class Mode { + readonly mode: L10nBreed + readonly canReadCache: boolean + readonly canWrite: boolean + readonly branch: string + readonly reason: string + readonly isCI: boolean + readonly options: ModeOptions + + constructor(options: ModeOptions = {}) { + this.options = options + this.isCI = process.env.CI === 'true' + const result = this.determine() + this.mode = result.mode + this.canReadCache = result.canReadCache + this.canWrite = result.canWrite + this.branch = result.branch + this.reason = result.reason + } + + /** + * Determine the l10n operation mode + */ + private determine(): { + mode: L10nBreed + canReadCache: boolean + canWrite: boolean + branch: string + reason: string + } { + const branch = this.options.branch || l10nCageBranch() + + // Check if only English locale + if ( + this.options.locales && + this.options.locales.length === 1 && + this.options.locales[0] === 'en' + ) { + return { + mode: 'en-only', + canReadCache: false, + canWrite: false, + branch, + reason: 'Only English locale configured' + } + } + + // Check for dry run mode (explicit flag or missing/invalid API key in local dev) + const hasValidApiKey = this.options.apiKey && this.options.apiKey.length >= 10 + const shouldDryRun = this.options.isDryRun || (!hasValidApiKey && !this.isCI) + + if (shouldDryRun) { + return { + mode: 'dry-run', + canReadCache: true, + canWrite: false, + branch, + reason: this.options.isDryRun + ? 'Dry run mode requested' + : this.options.apiKey + ? 'API key too short (< 10 chars)' + : 'No API key provided' + } + } + + // Full perform mode - check if allowed + const isMainBranch = branch === 'main' + + if (!this.isCI && isMainBranch) { + // This would write to main from local dev - not allowed + throw new Error( + 'Cannot write to main branch of the l10n repos from local development.\n' + + 'Options:\n' + + '1. Work on a feature branch (recommended)\n' + + '2. 
Set L10N_BRANCH= in .env\n' + + '3. Use dry-run mode by commenting out L10N_OPENROUTER_API_KEY\n' + + '4. Use --dry-run flag to preview without performing l10ns' + ) + } + + // Test Git credentials for pushing (only in local dev) + if (!this.isCI && !canPushToRemote(L10N_CAGE_DIR)) { + throw new Error( + 'Git push authentication failed.\n' + + 'GitHub requires a Personal Access Token, not your password.\n' + + 'Create one at: https://github.com/settings/tokens\n' + + 'Required scopes: repo (for private repos) or public_repo (for public repos)\n' + + '\nAlternatively, use dry-run mode: pnpm l10n --dryRun' + ) + } + + return { + mode: 'perform', + canReadCache: true, + canWrite: true, + branch, + reason: this.isCI ? `CI can write to ${branch}` : `Local development on ${branch} branch` + } + } + + /** + * Announce/describe what the l10n process will do + */ + announce(): void { + // Main mode announcement + switch (this.mode) { + case 'en-only': + console.log('🌐 L10n Mode: en-only: Can copy English files to build directory') + break + + case 'dry-run': + console.log(`🌐 L10n Mode: dry-run: Reading cage ${this.branch}, no LLM calls, no writes`) + break + + case 'perform': + console.log(`🌐 L10n Mode: perform: Cage ${this.branch}, can call LLM, update, and push`) + break + } + + // Force mode info + if (this.options.force) { + console.log('\n⚡ Force mode enabled') + if (this.options.forceFiles && this.options.forceFiles.length > 0) { + console.log(` → Forcing re-l10n of: ${this.options.forceFiles.join(', ')}`) + } else { + console.log(' → Forcing re-l10n of all files') + } + } + + // Verbose mode hint + if (!this.options.verbose && this.mode === 'dry-run') { + console.log('💡 Tip: Use --verbose to see detailed file-by-file status') + } + + // Show detailed reasoning in verbose mode + if (this.options.verbose) { + console.log('\n📋 Decision details:') + console.log(` → Reason: ${this.reason}`) + console.log(` → Can read cage: ${this.canReadCache}`) + console.log(` → Can 
write: ${this.canWrite}`) + } + + console.log() + } +} diff --git a/scripts/translation/prompts.ts b/scripts/l10n/prompts.ts similarity index 100% rename from scripts/translation/prompts.ts rename to scripts/l10n/prompts.ts diff --git a/scripts/l10n/run.ts b/scripts/l10n/run.ts new file mode 100644 index 000000000..e41f4ff20 --- /dev/null +++ b/scripts/l10n/run.ts @@ -0,0 +1,263 @@ +/** + * L10n script for the PauseAI website + * + * Main entry point for l10n operations. + * Uses modular components to handle different aspects of the l10n process. + */ + +import dotenv from 'dotenv' +import fs from 'fs/promises' +import minimist from 'minimist' +import path from 'path' +import { execSync } from 'child_process' + +// Import functionality from our own modules +import { createDryRunStats, printDryRunSummary } from './dry-run' +import { resolve as resolveForcePatterns, showForceHelp } from './force' +import { + cleanUpGitSecrets, + createGitClient, + getLatestCommitDates, + GIT_CONFIG, + initializeGitCage +} from './git-ops' +import { l10nCageBranch, pushWithUpstream } from './branch-safety' +import { createLlmClient, createRequestQueue, LLM_DEFAULTS } from './llm-client' +import { Mode } from './mode' +import { generateJsonPrompt, generateMarkdownPrompt, generateReviewPrompt } from './prompts' +import { retrieveMarkdown, retrieveMessages } from './heart' + +// Import from project modules +import { + L10N_CAGE_DIR, + MARKDOWN_L10NS, + MARKDOWN_SOURCE, + MESSAGE_L10NS, + MESSAGE_SOURCE +} from '../../src/lib/l10n.ts' + +// Load environment variables first +dotenv.config() + +// Parse command line arguments +const argv = minimist(process.argv.slice(2), { + boolean: ['dryRun', 'verbose', 'force'] +}) + +// Handle --force with no patterns (show help) +if (argv.force && argv._.length === 0) { + showForceHelp() + process.exit(1) +} + +// Validate arguments +const validArgs = ['dryRun', 'verbose', 'force', '_'] +const unknownArgs = Object.keys(argv).filter((arg) => 
!validArgs.includes(arg)) +if (unknownArgs.length > 0) { + console.error(`❌ Unknown argument(s): ${unknownArgs.join(', ')}`) + console.error('\nValid options:') + console.error(' --dryRun Run without making changes') + console.error(' --verbose Show detailed output') + console.error(' --force Force re-l10n with patterns (see --force for help)') + process.exit(1) +} + +// Ensure inlang settings are current before importing runtime +execSync('tsx scripts/inlang-settings.ts', { stdio: 'ignore' }) + +// This let / try / catch lets the ESM scan succeed in the absence of a runtime +let locales: readonly string[] +try { + const runtime = await import('../../src/lib/paraglide/runtime.js') + locales = runtime.locales + if (runtime.baseLocale !== 'en') + throw new Error( + `runtime.baseLocale set to ${runtime.baseLocale} but our code assumes and hardcodes 'en'` + ) +} catch (error) { + console.error('Failed to read locales from runtime', error.message) + process.exit(1) +} + +// Get API key early for mode determination +const LLM_API_KEY = process.env.L10N_OPENROUTER_API_KEY + +// Determine l10n mode +const mode = new Mode({ + locales, + apiKey: LLM_API_KEY, + isDryRun: argv.dryRun === true, + verbose: argv.verbose === true, + force: argv.force === true, + forceFiles: argv._ // Files passed as positional arguments +}) + +// Announce what we're going to do +mode.announce() + +// Exit early for en-only mode +if (mode.mode === 'en-only') { + process.exit(0) +} + +// L10n options configuration + +// Initialize statistics tracking for dry run mode +const dryRunStats = createDryRunStats() + +// Create Git clients +const cageGit = createGitClient() +const websiteGit = createGitClient() + +// Repository configuration +const GIT_REPO_PARAGLIDE = 'github.com/PauseAI/paraglide' +const GIT_TOKEN = process.env.GITHUB_TOKEN + +// Configure LLM API client +// For read-only/dry-run modes, we use a placeholder key +const llmClient = createLlmClient({ + baseUrl: LLM_DEFAULTS.BASE_URL, + 
apiKey: mode.canWrite ? LLM_API_KEY : 'dry-run-placeholder', + model: LLM_DEFAULTS.MODEL, + providers: LLM_DEFAULTS.PROVIDERS +}) + +// Create request queues +const requestQueue = createRequestQueue(LLM_DEFAULTS.REQUESTS_PER_SECOND) +const gitQueue = createRequestQueue(1) // Only one git operation at a time + +const languageNamesInEnglish = new Intl.DisplayNames('en', { type: 'language' }) + +// Main execution block +{ + // Track cached files count + let cacheCount = 0 + + // Only output file-by-file messages in verbose mode or when writing + const logMessage = (msg: string) => { + if (mode.options.verbose || mode.canWrite) { + console.log(msg) + } + } + + // Wrap the main execution in an async IIFE + ;(async () => { + // Resolve force patterns if force mode is enabled + let forceFiles: string[] = [] + if (mode.options.force) { + forceFiles = await resolveForcePatterns(argv._, MARKDOWN_SOURCE) + if (forceFiles.length > 0) { + console.log(`🔄 Force mode: Will re-l10n [${forceFiles.join(', ')}]`) + } else { + console.warn(`⚠️ Force patterns matched no files: [${argv._.join(', ')}]`) + } + } + + const options = { + isDryRun: !mode.canWrite, + verbose: mode.options.verbose, + llmClient, + requestQueue, + gitQueue, + languageNameGenerator: languageNamesInEnglish, + cageGit, + dryRunStats, + cageLatestCommitDates: new Map(), + websiteLatestCommitDates: new Map(), + forceFiles: forceFiles + } + + // Get non-English languages directly from compiled runtime + const targetLocales = Array.from(locales).filter((locale) => locale !== 'en') + console.log(`Using target locales from compiled runtime: [${targetLocales.join(', ')}]`) + + await Promise.all([ + (async () => { + await initializeGitCage({ + dir: L10N_CAGE_DIR, + token: GIT_TOKEN, + repo: GIT_REPO_PARAGLIDE, + username: GIT_CONFIG.USERNAME, + email: GIT_CONFIG.EMAIL, + git: cageGit + }) + options.cageLatestCommitDates = await getLatestCommitDates(cageGit, 'cage') + })(), + (async () => + (options.websiteLatestCommitDates 
= await getLatestCommitDates(websiteGit, 'website')))() + ]) + + // Process both message files and markdown files in parallel + // Begin message l10n + const results = await Promise.all([ + (async () => { + const result = await retrieveMessages( + { + sourcePath: MESSAGE_SOURCE, + locales: targetLocales, + promptGenerators: [generateJsonPrompt, generateReviewPrompt], + targetDir: MESSAGE_L10NS, + cageWorkingDir: L10N_CAGE_DIR, + logMessageFn: logMessage + }, + options + ) + + // Files are already in the correct location for paraglide to find them + + return result + })(), + (async () => { + const markdownPathsFromBase = await fs.readdir(MARKDOWN_SOURCE, { recursive: true }) + const markdownPathsFromRoot = markdownPathsFromBase.map((file) => + path.join(MARKDOWN_SOURCE, file) + ) + return await retrieveMarkdown( + { + sourcePaths: markdownPathsFromRoot, + sourceBaseDir: MARKDOWN_SOURCE, + locales: targetLocales, + promptGenerators: [generateMarkdownPrompt, generateReviewPrompt], + targetDir: MARKDOWN_L10NS, + cageWorkingDir: L10N_CAGE_DIR, + logMessageFn: logMessage + }, + options + ) + })() + ]) + + // Sum up cache counts and calculate how many new l10ns were created + cacheCount = results.reduce((total, result) => total + result.cacheCount, 0) + const totalFiles = results.reduce((total, result) => total + result.totalProcessed, 0) + const newL10ns = totalFiles - cacheCount + + // Only push changes in write mode + if (mode.canWrite) { + // Show a summary of cached l10ns + console.log(`\n📦 L10n summary:`) + if (cacheCount > 0) { + console.log(` - ${cacheCount} files used cached l10ns`) + } + console.log(` - ${newL10ns} files needed new l10ns`) + + if (newL10ns > 0) { + console.log(`\nPushing l10n changes to remote cage...`) + await pushWithUpstream(cageGit, mode.options.verbose) + } else { + console.log(`\nNo new l10ns to push to remote cage - skipping Git push.`) + } + } else { + // Print summary for read-only/dry-run mode + printDryRunSummary(dryRunStats, 
mode.options.verbose, cacheCount) + } + + // Clean up Git secrets in CI environments to prevent secret persistence + if (mode.isCI) { + cleanUpGitSecrets() + } + })().catch((error) => { + console.error('L10n process failed:', error) + process.exit(1) + }) +} diff --git a/scripts/translation/utils.ts b/scripts/l10n/utils.ts similarity index 80% rename from scripts/translation/utils.ts rename to scripts/l10n/utils.ts index adc06dae5..f6e4f5d3a 100644 --- a/scripts/translation/utils.ts +++ b/scripts/l10n/utils.ts @@ -1,5 +1,5 @@ /** - * Utility functions for translation operations + * Utility functions for l10n operations */ import fs from 'fs/promises' @@ -32,7 +32,7 @@ export const PREPROCESSING_COMMENT_AFTER_PATTERN: PatternCommentPair[] = [ }, { pattern: /\]\(#[a-z0-9-_.]+\)/g, - comment: `don't translate target, only label` + comment: `don't localize target, only label` } ] @@ -178,27 +178,26 @@ export function preprocessMarkdown(source: string): string { } /** - * Postprocesses translated markdown content by optionally adding heading IDs. - * It compares the headings in the source and the translated content and appends a generated ID to each heading. + * Postprocesses localized markdown content by optionally adding heading IDs. + * It compares the headings in the source and the localized content and appends a generated ID to each heading. * * @param source - The original markdown content. - * @param translation - The translated markdown content before postprocessing. + * @param l10n - The localized markdown content before postprocessing. * @returns The postprocessed markdown content with heading IDs. - * @throws {Error} If the number of headings in the translation does not match those in the source. + * @throws {Error} If the number of headings in the l10n does not match those in the source. 
 */ -export function postprocessMarkdown(source: string, translation: string): string { +export function postprocessMarkdown(source: string, l10n: string): string { const slugger = new GithubSlugger() - let processed = translation + let processed = l10n if (MARKDOWN_CONFIG.POSTPROCESSING_ADD_HEADING_IDS) { const REGEX_HEADING = /^#+ (.*)/gm const headingsInSource = Array.from(source.matchAll(REGEX_HEADING)) if (headingsInSource.length > 0) { let i = 0 - processed = translation.replace(REGEX_HEADING, (_match) => { + processed = l10n.replace(REGEX_HEADING, (_match) => { const sourceResult = headingsInSource[i] - if (!sourceResult) - throw new Error(`Different heading count in translation:\n\n${translation}`) + if (!sourceResult) throw new Error(`Different heading count in l10n:\n\n${l10n}`) const headingInSource = sourceResult[1] const stripped = removeMarkdown(headingInSource) const slugged = slugger.slug(stripped) @@ -232,8 +231,34 @@ export function extractWebPath(localPath: string): string { * @param filePath - Target file path * @param content - Content to write to the file */ -export async function writeFileWithDir(filePath: string, content: string): Promise<void> { +export async function placeInCage(filePath: string, content: string): Promise<void> { const dir = path.dirname(filePath) await fs.mkdir(dir, { recursive: true }) fsSync.writeFileSync(filePath, content) } + +/** + * Cleans up potential LLM commentary in l10n JSON files + * Strips anything before the first '{' and after the last '}' + * + * @param filePath - Path to the JSON file to clean + * @param verbose - Whether to output verbose logs + */ +export function cullCommentary(filePath: string, verbose = false): boolean { + try { + const content = fsSync.readFileSync(filePath, 'utf-8') + const firstBrace = content.indexOf('{') + const lastBrace = content.lastIndexOf('}') + + if (firstBrace === -1 || lastBrace === -1) throw new Error('No JSON object found in file') + + const jsonContent =
content.substring(firstBrace, lastBrace + 1) + JSON.parse(jsonContent) // checks validity + if (jsonContent === content) return false // nothing to cull, file left untouched + fsSync.writeFileSync(filePath, jsonContent, 'utf-8') + + if (verbose) console.log(`✅ Culled LLM commentary in ${filePath}`) + return true // assumed meaning of the boolean: the file was rewritten + } catch (error) { + console.error(`Error cleaning up file ${filePath}:`, (error as Error).message) + return false + } +} diff --git a/scripts/translation/git-ops.ts b/scripts/translation/git-ops.ts deleted file mode 100644 index e5ce4cf76..000000000 --- a/scripts/translation/git-ops.ts +++ /dev/null @@ -1,165 +0,0 @@ -/** - * Git operations for translation management - * Handles repository cloning, commit tracking, and other Git operations - */ - -import fs from 'fs' -import fsPromises from 'fs/promises' -import path from 'path' -import { execSync } from 'child_process' -import simpleGit, { SimpleGit, SimpleGitOptions } from 'simple-git' -import { L10NS_BASE_DIR } from '../../src/lib/l10n' -import { ensureDirectoryExists } from './utils' - -/** - * Configuration for Git operations - */ -export const GIT_CONFIG = { - EMAIL: 'example@example.com', - MAX_CONCURRENT_PROCESSES: 8, - USERNAME: 'Translations' -} - -// Translation repository URL (public access) -export const TRANSLATION_REPO_URL = 'github.com/PauseAI/paraglide' - -/** - * Creates a SimpleGit instance with configured options - * - * @returns A configured SimpleGit instance - */ -export function createGitClient(): SimpleGit { - const gitOptions: Partial<SimpleGitOptions> = { - maxConcurrentProcesses: GIT_CONFIG.MAX_CONCURRENT_PROCESSES - } - return simpleGit(gitOptions) -} - -/** - * Initialize or update a translation repository - * Can be called directly to manage the translation repository - * - * @param repoDir Directory where the repository should be - * @param verbose Whether to log detailed output - * @returns Success status - */ -export function setupTranslationRepo(repoDir: string, verbose = false): boolean { - try { - // Check if the directory exists and is a git repo - if
(fs.existsSync(path.join(repoDir, '.git'))) { - if (verbose) { - console.log(' \u2713 Translation repository already exists, pulling latest changes...') - } - - // Pull latest changes - execSync(`cd ${repoDir} && git pull`, { stdio: verbose ? 'inherit' : 'ignore' }) - if (verbose) console.log(' \u2713 Updated translation repository') - } else { - // Clone the repository - if (verbose) console.log(' \u2713 Cloning translation repository...') - - // If directory exists but isn't a git repo, remove it - if (fs.existsSync(repoDir)) { - fs.rmSync(repoDir, { recursive: true, force: true }) - } - - // Ensure parent directory exists - ensureDirectoryExists(path.dirname(repoDir), verbose) - - // Clone public repository - no token needed for public repos - const gitCommand = `git clone https://${TRANSLATION_REPO_URL}.git ${repoDir}` - execSync(gitCommand, { stdio: verbose ? 'inherit' : 'ignore' }) - if (verbose) console.log(' \u2713 Cloned translation repository') - } - return true - } catch (error) { - console.error('\n\u274c FAILED TO SET UP TRANSLATION REPOSITORY!') - console.error(` Error accessing translation repository: ${(error as Error).message}`) - console.error('\n Options:') - console.error(' 1. Continue with English-only: Edit .env to set PARAGLIDE_LOCALES=en') - console.error(' 2. Check your internet connection and try again') - console.error(' 3. Contact the project maintainers if the issue persists') - return false - } -} - -/** - * Initializes the Git cache by removing the existing directory, - * cloning the remote repository, and configuring Git user settings. - * - * @param options - An object containing the target directory, authentication token, repository URL, username, and email. - * @returns A Promise that resolves when the cache repository has been cloned and configured. 
- */ -export async function initializeGitCache(options: { - dir: string - token?: string - repo: string - username: string - email: string - git: SimpleGit -}): Promise<void> { - await fsPromises.rm(options.dir, { - recursive: true, - force: true - }) - - // Use token if available (CI/CD with write access) or public URL (local dev, read-only) - const remote = options.token - ? `https://${options.token}@${options.repo}` - : `https://${options.repo}.git` - - console.log( - `\ud83d\udd04 Setting up translation repository from ${options.repo} into ${options.dir}` - ) - await options.git.clone(remote, options.dir) - await options.git.cwd(options.dir) - - // Always set git config in case we need to make local commits - await options.git.addConfig('user.name', options.username) - await options.git.addConfig('user.email', options.email) -} - -/** - * Extracts the latest commit dates for each file by parsing the Git log. - * - * @param git - The SimpleGit instance used to retrieve the log. - * @returns A Promise that resolves to a Map where keys are file paths and values are the latest commit dates. - */ -export async function getLatestCommitDates(git: SimpleGit): Promise<Map<string, Date>> { - const latestCommitDatesMap = new Map() - const log = await git.log({ - '--stat': 4096 - }) - for (const entry of log.all) { - const files = entry.diff?.files - if (!files) continue - for (const file of files) { - if (!latestCommitDatesMap.has(file.file)) { - latestCommitDatesMap.set(file.file, new Date(entry.date)) - } - } - } - return latestCommitDatesMap -} - -/** - * Generates an appropriate commit message based on whether the translation file already existed. - * - * @param sourceFileName - The name of the source file. - * @param language - The language code for the translation. - * @param fileExists - Boolean indicating if the file existed. - * @returns The commit message.
- */ -export function getCommitMessage( - sourceFileName: string, - language: string, - fileExists: boolean -): string { - return fileExists - ? `Update outdated translation for ${sourceFileName} in ${language}` - : `Create new translation for ${sourceFileName} in ${language}` -} - -export function cleanUpGitSecrets() { - fs.unlinkSync(path.join(L10NS_BASE_DIR, '.git/config')) -} diff --git a/scripts/translation/translate-core.ts b/scripts/translation/translate-core.ts deleted file mode 100644 index 82ebf6618..000000000 --- a/scripts/translation/translate-core.ts +++ /dev/null @@ -1,308 +0,0 @@ -/** - * Core translation logic - * Handles the main translation workflow and file operations - */ - -import fs from 'fs/promises' -import fsSync from 'fs' -import path from 'path' -import PQueue from 'p-queue' -import { SimpleGit } from 'simple-git' -import { PromptGenerator } from './prompts' -import { postChatCompletion } from './llm-client' -import { getCommitMessage } from './git-ops' -import { preprocessMarkdown, postprocessMarkdown, extractWebPath, writeFileWithDir } from './utils' -import { trackTranslation } from './dry-run' - -/** - * Type definition for target path strategy function - */ -export type TargetStrategy = (language: string, sourcePath: string) => string - -/** - * Configuration options for translation operations - */ -export interface TranslationOptions { - /** Whether to run in dry run mode (skip actual API calls) */ - isDryRun: boolean - /** Whether to output verbose logs */ - verbose: boolean - /** Axios client for LLM API requests */ - llmClient: any - /** Queue for managing API request rate limiting */ - requestQueue: PQueue - /** Queue for Git operations to prevent concurrency issues */ - gitQueue: PQueue - /** Function to generate language names from language codes */ - languageNameGenerator: Intl.DisplayNames - /** Git client for the cache repository */ - cacheGit: SimpleGit - /** Map of latest commit dates in the cache repository */ - 
cacheLatestCommitDates: Map<string, Date> - /** Map of latest commit dates in the source repository */ - mainLatestCommitDates: Map<string, Date> - /** Statistics object for dry run mode */ - dryRunStats: any -} - -/** - * Translates the provided content to a specified language using a two-pass process, - * or collects statistics in dry run mode without making API calls. - * - * @param content - The original content to be translated. - * @param promptGenerator - A function for generating the translation prompt. - * @param language - The target language code. - * @param promptAdditions - Additional context to include in the prompt. - * @param options - Translation configuration options. - * @param filePath - Optional file path for dry run statistics. - * @returns A Promise that resolves to the reviewed (final) translation, or a placeholder in dry run mode. - * @throws {Error} If either the translation or review pass fails (in non-dry run mode). - */ -export async function translate( - content: string, - promptGenerator: PromptGenerator, - language: string, - promptAdditions: string, - options: TranslationOptions, - filePath?: string -): Promise<string> { - const languageName = options.languageNameGenerator.of(language) - if (!languageName) throw new Error(`Couldn't resolve language code: ${language}`) - - const translationPrompt = promptGenerator(languageName, content, promptAdditions) - - // In dry run mode, collect statistics instead of making API calls - if (options.isDryRun) { - // Track what would be translated for reporting - trackTranslation(options.dryRunStats, content, language, filePath) - - if (options.verbose) { - console.log( - `🔍 [DRY RUN] Would translate ${content.length} characters to ${languageName}${filePath ? 
` (${path.basename(filePath)})` : ''}` - ) - } - - // Return a placeholder for translated content - return `[DRY RUN TRANSLATION PLACEHOLDER for ${languageName}]` - } - - // Regular API-based translation for non-dry-run mode - // First pass: generate initial translation - const firstPass = await postChatCompletion(options.llmClient, options.requestQueue, [ - { role: 'user', content: translationPrompt } - ]) - - if (!firstPass) throw new Error(`Translation to ${languageName} failed`) - - if (options.verbose) { - console.log('First pass response:', firstPass) - } else { - console.log( - `Completed first pass translation to ${languageName}${filePath ? ` for ${path.basename(filePath)}` : ''}` - ) - } - - // Second pass: review and refine translation with context - const reviewPrompt = `Please review your translation to ${languageName} for accuracy and naturalness. Make improvements where necessary but keep the meaning identical to the source.` - - const reviewed = await postChatCompletion(options.llmClient, options.requestQueue, [ - { role: 'user', content: translationPrompt }, - { role: 'assistant', content: firstPass }, - { role: 'user', content: reviewPrompt } - ]) - - if (!reviewed) throw new Error(`Review of ${languageName} translation failed`) - - if (options.verbose) { - console.log('Review pass response:', reviewed) - } else { - console.log( - `Completed review pass translation to ${languageName}${filePath ? ` for ${path.basename(filePath)}` : ''}` - ) - } - - return reviewed -} - -/** - * Translates or loads message files using a JSON prompt generator. - * Processes the source JSON file and creates separate translations for each target language. - * - * @param options - An object containing the source path, language tags, prompt generator, target directory, and cache working directory. 
- * @param translationOptions - Global translation configuration options - * @returns A Promise that resolves with results of the operation - */ -export async function translateOrLoadMessages( - options: { - sourcePath: string - languageTags: string[] - promptGenerator: PromptGenerator - targetDir: string - cacheGitCwd: string - logMessageFn?: (msg: string) => void - }, - translationOptions: TranslationOptions -): Promise<{ cacheCount: number; totalProcessed: number }> { - const result = await translateOrLoad( - { - sourcePaths: [options.sourcePath], - languageTags: options.languageTags, - promptGenerator: options.promptGenerator, - targetStrategy: (language) => path.join(options.targetDir, language + '.json'), - cacheGitCwd: options.cacheGitCwd, - logMessageFn: options.logMessageFn - }, - translationOptions - ) - return { cacheCount: result.cacheCount, totalProcessed: result.totalProcessed } -} - -/** - * Translates or loads markdown files using a Markdown prompt generator. - * Reads markdown files from the source directory and outputs translated files organized by language. - * - * @param options - An object with sourcePaths, sourceBaseDir, language tags, prompt generator, target directory, and cache working directory. 
- * @param translationOptions - Global translation configuration options - * @returns A Promise that resolves with results of the operation - */ -export async function translateOrLoadMarkdown( - options: { - sourcePaths: string[] - sourceBaseDir: string - languageTags: string[] - promptGenerator: PromptGenerator - targetDir: string - cacheGitCwd: string - logMessageFn?: (msg: string) => void - }, - translationOptions: TranslationOptions -): Promise<{ cacheCount: number; totalProcessed: number }> { - const result = await translateOrLoad( - { - sourcePaths: options.sourcePaths, - languageTags: options.languageTags, - promptGenerator: options.promptGenerator, - targetStrategy: (language, sourcePath) => { - const relativePath = path.relative(options.sourceBaseDir, sourcePath) - return path.join(options.targetDir, language, relativePath) - }, - cacheGitCwd: options.cacheGitCwd, - logMessageFn: options.logMessageFn - }, - translationOptions - ) - return { cacheCount: result.cacheCount, totalProcessed: result.totalProcessed } -} - -/** - * Generalized function that handles the translation or loading of files for various languages. - * It checks whether a cached translation is up-to-date before generating a new translation. - * - * @param options - An object containing source file paths, language tags, prompt generator, target strategy, and the cache working directory. 
- * @param translationOptions - Global translation configuration options - * @returns A Promise that resolves with the results of the operation - */ -export async function translateOrLoad( - options: { - sourcePaths: string[] - languageTags: string[] - promptGenerator: PromptGenerator - targetStrategy: TargetStrategy - cacheGitCwd: string - logMessageFn?: (msg: string) => void - }, - translationOptions: TranslationOptions -): Promise<{ cacheCount: number; totalProcessed: number }> { - const log = options.logMessageFn || console.log - let done = 1 - let total = 0 - let cacheCount = 0 - - await Promise.all( - options.sourcePaths.map(async (sourcePath) => { - const sourceFileName = path.basename(sourcePath) - /** Backslash to forward slash to match Git log and for web path */ - const processedSourcePath = path.relative('.', sourcePath).replaceAll(/\\/g, '/') - await Promise.all( - options.languageTags.map(async (languageTag) => { - const target = options.targetStrategy(languageTag, sourcePath) - let useCachedTranslation = false - let fileExists = false - - // Check if we can use the cached translation - if (fsSync.existsSync(target)) { - fileExists = true - const sourceLatestCommitDate = - translationOptions.mainLatestCommitDates.get(processedSourcePath) - if (!sourceLatestCommitDate) { - log( - `Didn't prepare latest commit date for ${processedSourcePath}, use Cached version` - ) - useCachedTranslation = true - } - const cachePathFromCwd = path.relative(options.cacheGitCwd, target) - const processedCachePathFromCwd = cachePathFromCwd.replaceAll(/\\/g, '/') - const cacheLatestCommitDate = - translationOptions.cacheLatestCommitDates.get(processedCachePathFromCwd) - if (!cacheLatestCommitDate) - throw new Error(`Didn't prepare latest commit date for ${target}`) - if (cacheLatestCommitDate > sourceLatestCommitDate) { - useCachedTranslation = true - cacheCount++ - } - } - - // If we can't use cache, generate a new translation - if (!useCachedTranslation) { - total++ - 
const content = await fs.readFile(sourcePath, 'utf-8') - const processedContent = preprocessMarkdown(content) - - if (translationOptions.verbose) { - log(processedContent) - } - - const page = extractWebPath(sourcePath) - const promptAdditions = '' // Will be handled by the prompt generator - - const translation = await translate( - processedContent, - options.promptGenerator, - languageTag, - promptAdditions, - translationOptions, - sourcePath - ) - - // Only perform actual file writes and Git operations in non-dry run mode - if (!translationOptions.isDryRun) { - const processedTranslation = postprocessMarkdown(processedContent, translation) - await writeFileWithDir(target, processedTranslation) - - const message = getCommitMessage(sourceFileName, languageTag, fileExists) - try { - await translationOptions.gitQueue.add(() => - (fileExists - ? translationOptions.cacheGit - : translationOptions.cacheGit.add('.') - ).commit(message, ['-a']) - ) - } catch (e) { - if (e instanceof Error && e.message.includes('nothing to commit')) { - log(`${sourceFileName} in ${languageTag} didn't change`) - } else { - throw e - } - } - log(`${message} (${done++} / ${total})`) - } - } - }) - ) - }) - ) - - // Total count of processed files is the count of source files multiplied by target languages - const totalProcessed = options.sourcePaths.length * options.languageTags.length - return { cacheCount, totalProcessed } -} diff --git a/scripts/translation/translate.ts b/scripts/translation/translate.ts deleted file mode 100644 index ef0a47890..000000000 --- a/scripts/translation/translate.ts +++ /dev/null @@ -1,239 +0,0 @@ -/** - * Translation script for the PauseAI website - * - * Main entry point for translation operations. - * Uses modular components to handle different aspects of the translation process. 
- */ - -import dotenv from 'dotenv' -import fs from 'fs/promises' -import minimist from 'minimist' -import path from 'path' - -// Import functionality from our own modules -import { createDryRunStats, printDryRunSummary } from './dry-run' -import { - cleanUpGitSecrets, - createGitClient, - getLatestCommitDates, - GIT_CONFIG, - initializeGitCache -} from './git-ops' -import { createLlmClient, createRequestQueue, LLM_DEFAULTS } from './llm-client' -import { generateJsonPrompt, generateMarkdownPrompt } from './prompts' -import { translateOrLoadMarkdown, translateOrLoadMessages } from './translate-core' -import { requireEnvVar } from './utils' - -// Import from project modules -import { - L10NS_BASE_DIR, - MARKDOWN_L10NS, - MARKDOWN_SOURCE, - MESSAGE_L10NS, - MESSAGE_SOURCE -} from '../../src/lib/l10n.ts' - -// This let / try / catch lets the ESM scan succeed in the absence of a runtime -let locales: readonly string[] -try { - const runtime = await import('../../src/lib/paraglide/runtime.js') - locales = runtime.locales - if (runtime.baseLocale !== 'en') - throw new Error( - `runtime.baseLocale set to ${runtime.baseLocale} but our code assumes and hardcodes 'en'` - ) - - if (locales.length === 1 && locales[0] === 'en') { - console.log('No translation needed: en only') - process.exit(0) - } -} catch (error) { - console.error('Failed to read locales from runtime', error.message) - process.exit(1) -} - -import { getDevContext, isDev } from '../../src/lib/env' - -// Translation options & debugging configuration -const DEBUG_RETRANSLATE_EVERYTHING = false -const DEBUG_RETRANSLATE_FILES: string[] = [ - 'en.json', - 'learn.md', - 'proposal.md', - 'events.md', - 'faq.md', - 'action.md', - 'donate.md', - 'join.md' -] - -// Load environment variables -dotenv.config() - -// Parse command line arguments -const argv = minimist(process.argv.slice(2)) - -// Configure dry run mode and verbosity -const DEBUG = argv.mode === 'debug' -const VERBOSE = argv.verbose || DEBUG - -// Add dry 
run mode via CLI flag or for development environments unless forced -const isDryRun = argv.dryRun || (isDev() && process.env.L10N_FORCE_TRANSLATE !== 'true') - -// Initialize statistics tracking for dry run mode -const dryRunStats = createDryRunStats() - -// Create Git clients -const cacheGit = createGitClient() -const mainGit = createGitClient() - -// Repository configuration -const GIT_REPO_PARAGLIDE = 'github.com/PauseAI/paraglide' -const GIT_TOKEN = process.env.GITHUB_TOKEN - -// Configure LLM API client -const LLM_API_KEY = requireEnvVar( - 'TRANSLATION_OPENROUTER_API_KEY', - 'dry-run-placeholder', - isDryRun, - VERBOSE -) -const llmClient = createLlmClient({ - baseUrl: LLM_DEFAULTS.BASE_URL, - apiKey: LLM_API_KEY, - model: LLM_DEFAULTS.MODEL, - providers: LLM_DEFAULTS.PROVIDERS -}) - -// Create request queues -const requestQueue = createRequestQueue(LLM_DEFAULTS.REQUESTS_PER_SECOND) -const gitQueue = createRequestQueue(1) // Only one git operation at a time - -// Create language name translator -const languageNamesInEnglish = new Intl.DisplayNames('en', { type: 'language' }) - -// Main execution block -{ - // Track cached files count - let cacheCount = 0 - - // Only output file-by-file messages in verbose mode - const logMessage = (msg: string) => { - if (VERBOSE || !isDryRun) { - console.log(msg) - } - } - - // Set up translation options - const translationOptions = { - isDryRun, - verbose: VERBOSE, - llmClient, - requestQueue, - gitQueue, - languageNameGenerator: languageNamesInEnglish, - cacheGit, - dryRunStats, - cacheLatestCommitDates: new Map(), - mainLatestCommitDates: new Map(), - debugMode: DEBUG, - debugRetranslateEverything: DEBUG_RETRANSLATE_EVERYTHING, - debugRetranslateFiles: DEBUG_RETRANSLATE_FILES - } - - // Wrap the main execution in an async IIFE - ;(async () => { - // Get non-English languages directly from compiled runtime - const languageTags = Array.from(locales).filter((locale) => locale !== 'en') - console.log(`Translation running in 
${isDryRun ? 'DRY RUN' : 'ACTIVE'} mode ${getDevContext()}`) - console.log(`Using target locales from compiled runtime: [${languageTags.join(', ')}]`) - - await Promise.all([ - (async () => { - await initializeGitCache({ - dir: L10NS_BASE_DIR, - token: GIT_TOKEN, - repo: GIT_REPO_PARAGLIDE, - username: GIT_CONFIG.USERNAME, - email: GIT_CONFIG.EMAIL, - git: cacheGit - }) - translationOptions.cacheLatestCommitDates = await getLatestCommitDates(cacheGit) - })(), - (async () => - (translationOptions.mainLatestCommitDates = await getLatestCommitDates(mainGit)))() - ]) - - // Process both message files and markdown files in parallel - const results = await Promise.all([ - (async () => { - const result = await translateOrLoadMessages( - { - sourcePath: MESSAGE_SOURCE, - languageTags: languageTags, - promptGenerator: generateJsonPrompt, - targetDir: MESSAGE_L10NS, - cacheGitCwd: L10NS_BASE_DIR, - logMessageFn: logMessage - }, - translationOptions - ) - - // Only copy files in non-dry-run mode - if (!isDryRun) { - await fs.cp(MESSAGE_L10NS, L10NS_BASE_DIR, { recursive: true }) - } - - return result - })(), - (async () => { - const markdownPathsFromBase = await fs.readdir(MARKDOWN_SOURCE, { recursive: true }) - const markdownPathsFromRoot = markdownPathsFromBase.map((file) => - path.join(MARKDOWN_SOURCE, file) - ) - return await translateOrLoadMarkdown( - { - sourcePaths: markdownPathsFromRoot, - sourceBaseDir: MARKDOWN_SOURCE, - languageTags: languageTags, - promptGenerator: generateMarkdownPrompt, - targetDir: MARKDOWN_L10NS, - cacheGitCwd: L10NS_BASE_DIR, - logMessageFn: logMessage - }, - translationOptions - ) - })() - ]) - - // Sum up cache counts and calculate how many new translations were created - cacheCount = results.reduce((total, result) => total + result.cacheCount, 0) - const totalFiles = results.reduce((total, result) => total + result.totalProcessed, 0) - const newTranslations = totalFiles - cacheCount - - // Only push changes in non-dry-run mode - if 
(!isDryRun) { - // Show a summary of cached translations - console.log(`\n📦 Translation summary:`) - if (cacheCount > 0) { - console.log(` - ${cacheCount} files used cached translations`) - } - console.log(` - ${newTranslations} files needed new translations`) - - // Only push to Git if we actually created new translations - if (newTranslations > 0) { - console.log(`\nPushing translation changes to repository...`) - await cacheGit.push() - } else { - console.log(`\nNo new translations to push to repository - skipping Git push.`) - } - } else { - // Print summary for dry run mode - printDryRunSummary(dryRunStats, VERBOSE, cacheCount) - } - if (!isDev()) cleanUpGitSecrets() - })().catch((error) => { - console.error('Translation process failed:', error) - process.exit(1) - }) -} diff --git a/src/lib/env.ts b/src/lib/env.ts index 44fd6b7b6..757b926b4 100644 --- a/src/lib/env.ts +++ b/src/lib/env.ts @@ -20,36 +20,28 @@ export function getEnvironment(): Record { /** * Determines if the current environment is development or test + * In practice, this means "not in CI" unless explicitly overridden by env.DEV */ export function isDev(): boolean { const env = getEnvironment() - const envMode = env.NODE_ENV || env.MODE - // Note that an undefined environment defaults to dev - fits with the frameworks we use - return !envMode || envMode === 'development' || envMode === 'test' || env.DEV === true + // Explicit DEV=true forces dev mode regardless of CI + return env.DEV === true || env.DEV === 'true' || env.CI !== 'true' } /** * Returns a formatted string with complete environment context for logging - * Shows all factors that contribute to the isDev decision + * Shows CI status and DEV override which determine isDev behavior */ export function getDevContext(): string { const env = getEnvironment() const icon = isDev() ? 
'✓' : '✗' - // Build context showing all relevant environment variables - const parts = [] + // Build context string showing relevant factors + const parts = [`CI: ${env.CI || 'false'}`] - // Always show NODE_ENV, even if undefined - parts.push(`NODE_ENV=${env.NODE_ENV || 'undefined'}`) - - // Show MODE if it exists and differs from NODE_ENV - if (env.MODE && (!env.NODE_ENV || env.MODE !== env.NODE_ENV)) { - parts.push(`MODE=${env.MODE}`) - } - - // Show DEV flag if it's true (since it can override everything else) - if (env.DEV === true) { - parts.push(`DEV=true`) + // Show DEV if it's set (since it can override CI) + if (env.DEV === true || env.DEV === 'true') { + parts.push(`DEV: ${env.DEV}`) } return `isDev: ${icon} (${parts.join(', ')})` diff --git a/src/lib/l10n.ts b/src/lib/l10n.ts index e9d8947ae..a892a508b 100644 --- a/src/lib/l10n.ts +++ b/src/lib/l10n.ts @@ -7,11 +7,9 @@ import path from 'path' // Import default settings from our JavaScript module import defaultSettingsConfig from '../../project.inlang/default-settings.js' -export const L10NS_BASE_DIR = './src/temp/translations' -export const MARKDOWN_L10NS = `${L10NS_BASE_DIR}/md` -export const MESSAGE_L10NS = `${L10NS_BASE_DIR}/json` - -// Source paths for content to be translated +export const L10N_CAGE_DIR = './l10n-cage' +export const MARKDOWN_L10NS = `${L10N_CAGE_DIR}/md` +export const MESSAGE_L10NS = `${L10N_CAGE_DIR}/json` export const MESSAGE_SOURCE = './messages/en.json' export const MARKDOWN_SOURCE = './src/posts' @@ -27,7 +25,7 @@ export function getDefaultSettings(): typeof defaultSettingsConfig { return defaultSettingsConfig } -// For translation scripts that need to know target languages +// For l10n scripts that need to know target languages export function getTargetLocales(): string[] { return getDefaultSettings().locales.filter((tag) => tag !== 'en') } diff --git a/src/routes/[slug]/+page.ts b/src/routes/[slug]/+page.ts index fb6de47f5..159dfb845 100644 --- a/src/routes/[slug]/+page.ts 
+++ b/src/routes/[slug]/+page.ts @@ -25,7 +25,7 @@ async function importMarkdown(locale: string, slug: string) { return await import(`../../posts/${slug}.md`) } else { try { - return await import(`../../temp/translations/md/${locale}/${slug}.md`) + return await import(`../../../l10n-cage/md/${locale}/${slug}.md`) } catch (error) { if (import.meta.env.DEV) { return { diff --git a/template.env b/template.env index c0c49f2d9..5bba56a97 100644 --- a/template.env +++ b/template.env @@ -13,14 +13,17 @@ ANTHROPIC_API_KEY_FOR_WRITE = "" # The set of all locales is defined in project.inlang/default-settings.js, # but we allow an environment variable override because using fewer locales # can significantly improve local development convenience and build speed! -# We even default this to only "en" if not specified in a developer environment. +# Default: "en" for local development, "all" in CI/CD environments # This is a comma-separated list. "-" switches into exclude mode. Examples: "en", "en,nl", "all", "-de,fr" PARAGLIDE_LOCALES=en -# Only set this if you want to test generation of new translations locally -# ! Please don't use the default cache repos if so - your dev env should not write to it ! 
-# (Normally translations are generated only in CI/CD pipelines) -# If this is empty, only existing translations cloned from a cache will be used -TRANSLATION_OPENROUTER_API_KEY="" -# Uncomment the line above and add your API key to enable translation generation +# Only set this if you want to test generation of new l10n content locally +# (Normally l10n content is generated only in CI/CD pipelines) +# If this is empty, only existing l10n content from the cache will be used +L10N_OPENROUTER_API_KEY="" +# Add your OpenRouter API key above to enable l10n generation + +# Optionally override which branch of the l10n cache to use +# (defaults to same name as current website branch, or 'main' if on main) +# L10N_BRANCH="my-feature-branch" diff --git a/vite.config.ts b/vite.config.ts index bc10cc160..c240c4abc 100644 --- a/vite.config.ts +++ b/vite.config.ts @@ -36,7 +36,11 @@ export default defineConfig(({ command, mode }) => { }, server: { - port: 37572 + port: 37572, + fs: { + // Allow serving files from l10n-cage directory + allow: [MARKDOWN_L10NS] + } }, // Improve build performance and reduce log output