[STORY] Cluster Migrate Script #426

@jsbattig

Part of: #408

Story: Cluster Migrate Script (Seed Cluster from Single Server)

[Conversation Reference: "cluster migrate is to seed a cluster with the initial state of a working local server"]

Story Overview

Objective: Create a script that seeds a new cluster with the complete state of a working standalone CIDX server. The script orchestrates three tasks: running the SQLite-to-PostgreSQL data migration (Story 10), copying golden repo files to shared ONTAP storage, and converting the standalone server into the first node of the cluster.

User Value: An existing production CIDX server can be converted to a cluster with downtime limited to a single maintenance window. All data, repositories, indexes, and configuration are preserved.

Acceptance Criteria

AC1: End-to-End Migration Orchestration

Scenario: The script migrates a standalone server to become the first cluster node.

Given a working standalone CIDX server with data in SQLite and golden repos on local disk
When the cluster-migrate script is run
Then it stops the CIDX server service
And it runs the SQLite-to-PostgreSQL data migration tool (Story 10)
And it copies golden-repos/ and .versioned/ to the shared NFS mount
And it updates alias JSON files to reflect new NFS-based paths
And it updates config.json for cluster mode
And it restarts the CIDX server in cluster mode
And the server comes up with all previous data and repos accessible

Technical Requirements:

  • Shell script: scripts/cluster-migrate.sh
  • Prerequisite check: cluster-join.sh must have been run first (NFS mount, PostgreSQL configured)
  • Stop server: sudo systemctl stop cidx-server
  • Run data migration: python3 -m code_indexer.server.tools.migrate_to_postgres
  • Copy repos: rsync -a from local golden-repos/ and .versioned/ to NFS mount
  • Update alias JSON target_path values for new base paths
  • Start server: sudo systemctl start cidx-server
  • Verification step after restart
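
The external commands listed above can be sketched as an ordered plan. This is a minimal illustration in Python rather than the shell script the story calls for; `build_command_plan` is a hypothetical helper, and the alias JSON and config.json updates are omitted because they are file edits rather than external commands.

```python
from pathlib import Path


def build_command_plan(local_root: Path, nfs_mount: Path) -> list[list[str]]:
    """Return the external commands from AC1 in execution order."""
    plan = [
        ["sudo", "systemctl", "stop", "cidx-server"],
        ["python3", "-m", "code_indexer.server.tools.migrate_to_postgres"],
    ]
    for name in ("golden-repos", ".versioned"):
        # Trailing slashes make rsync copy the directory contents into the
        # destination directory instead of nesting a second directory inside it.
        plan.append(["rsync", "-a", f"{local_root / name}/", f"{nfs_mount / name}/"])
    plan.append(["sudo", "systemctl", "start", "cidx-server"])
    return plan


if __name__ == "__main__":
    # Dry run: print the plan instead of executing it.
    # /mnt/cidx-shared is the mount point used in the output example.
    for command in build_command_plan(Path.home() / ".cidx-server", Path("/mnt/cidx-shared")):
        print(" ".join(command))
```

Printing the plan first gives the operator a dry-run view before anything is stopped or copied.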

AC2: Golden Repo File Migration to Shared Storage

Scenario: All golden repo files are copied to the FSx for NetApp ONTAP NFS mount.

Given golden-repos/ contains cloned repositories and .versioned/ contains snapshots
When the file migration runs
Then all files under golden-repos/ are copied to <nfs_mount>/golden-repos/
And all files under .versioned/ are copied to <nfs_mount>/.versioned/
And all alias JSON files are copied and updated with new paths
And .code-indexer/index/ directories (vector indexes) are copied
And file permissions are preserved
And rsync is used for efficient, resumable copy

Technical Requirements:

  • rsync -av --progress for copy with progress reporting
  • Source: ~/.cidx-server/golden-repos/ and ~/.cidx-server/.versioned/
  • Destination: <nfs_mount>/golden-repos/ and <nfs_mount>/.versioned/
  • Resumable: if interrupted, a re-run copies only new or changed files
  • Space check: verify NFS mount has sufficient free space before starting
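
The space check and the rsync invocation could look roughly like the sketch below. The helper names and the 10% headroom factor are assumptions, not part of the story. The size calculation sums apparent file sizes, which may differ from actual usage on the NFS volume.

```python
import os
import shutil
from pathlib import Path


def tree_size_bytes(root: Path) -> int:
    """Sum the apparent sizes of all regular files under root (symlinks skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def has_enough_space(sources: list[Path], destination: Path, headroom: float = 1.10) -> bool:
    """True if destination has room for all sources plus a safety margin (assumed 10%)."""
    required = sum(tree_size_bytes(src) for src in sources if src.exists())
    free = shutil.disk_usage(destination).free
    return free >= required * headroom


def rsync_command(source: Path, destination: Path) -> list[str]:
    """Build the copy command from AC2; re-running it transfers only new or changed files."""
    return ["rsync", "-av", "--progress", f"{source}/", f"{destination}/"]
```

Because the local directories are copied rather than moved, the check only needs to consider free space on the NFS side.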

AC3: Alias JSON Path Update

Scenario: Alias JSON files are updated to reflect the new NFS-based paths.

Given alias JSON files contain target_path pointing to local filesystem paths
When the migration updates paths
Then all target_path values are rewritten to use the NFS mount base path
And the old local path prefix is replaced with the NFS mount path
And the JSON structure is otherwise unchanged

Technical Requirements:

  • Scan all *.json files in <nfs_mount>/golden-repos/
  • Replace path prefix: ~/.cidx-server/ -> <nfs_mount>/
  • Validate JSON after update (no corruption)
  • Backup original JSON files before modification
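
A possible shape for the path rewrite is sketched below. It assumes each alias file is a JSON object whose target_path holds an absolute path (so the old prefix must be the expanded form of ~/.cidx-server/, not the literal tilde). The function names and the `.bak` and `.tmp` suffixes are illustrative choices, not taken from the codebase.

```python
import json
import shutil
from pathlib import Path


def rewrite_alias_file(alias_file: Path, old_prefix: str, new_prefix: str) -> bool:
    """Rewrite target_path in one alias file. Returns True if the file changed."""
    data = json.loads(alias_file.read_text(encoding="utf-8"))
    target = data.get("target_path") if isinstance(data, dict) else None
    if not isinstance(target, str) or not target.startswith(old_prefix):
        return False  # nothing to rewrite (already migrated, or not an alias file)

    # Back up the original once; a re-run must not overwrite the first backup.
    backup = alias_file.with_name(alias_file.name + ".bak")
    if not backup.exists():
        shutil.copy2(alias_file, backup)

    data["target_path"] = new_prefix + target[len(old_prefix):]

    # Write to a temporary file, re-parse it to validate, then swap it in.
    tmp = alias_file.with_name(alias_file.name + ".tmp")
    tmp.write_text(json.dumps(data, indent=2) + "\n", encoding="utf-8")
    json.loads(tmp.read_text(encoding="utf-8"))
    tmp.replace(alias_file)
    return True


def rewrite_all(alias_dir: Path, old_prefix: str, new_prefix: str) -> int:
    """Rewrite every *.json file directly under alias_dir; return the number changed."""
    return sum(rewrite_alias_file(p, old_prefix, new_prefix) for p in sorted(alias_dir.glob("*.json")))
```

Files whose target_path does not start with the old prefix are skipped, so running the rewrite a second time changes nothing. Only the target_path value is touched; all other keys are written back unchanged, although the file is re-serialized with two-space indentation.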

AC4: Pre-Migration Validation

Scenario: The script validates prerequisites before starting.

Given the operator runs the cluster-migrate script
When pre-migration checks run
Then it verifies cluster-join.sh has been run (config.json has cluster settings)
And it verifies NFS mount is active and writable
And it verifies PostgreSQL is reachable
And it verifies the CIDX server is stopped (not running during migration)
And it verifies sufficient disk space on NFS mount
And if any check fails, it reports the issue and exits without migrating

Technical Requirements:

  • Check config.json has storage_mode: "postgres" (cluster-join was run)
  • Check NFS mount: mountpoint -q <mount_point>
  • Check PostgreSQL: test connection
  • Check server stopped: systemctl is-active cidx-server prints "inactive" and exits with a non-zero status
  • Check disk space: compare local data size vs NFS free space
  • Exit with clear message on any failure
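
Three of these checks can be sketched as small functions that return a problem description or None. All function names are hypothetical. The PostgreSQL check here is only a TCP connect to the host and port, which is weaker than the real connection test the story asks for since it does not exercise credentials. The server-state and disk-space checks are left out of this sketch.

```python
import json
import os
import socket
from pathlib import Path
from typing import Callable, Optional


def check_cluster_config(config_path: Path) -> Optional[str]:
    """Return None if config.json has storage_mode 'postgres', else a problem description."""
    try:
        config = json.loads(config_path.read_text(encoding="utf-8"))
    except (OSError, ValueError) as exc:
        return f"cannot read {config_path}: {exc}"
    if not isinstance(config, dict) or config.get("storage_mode") != "postgres":
        return 'storage_mode is not "postgres"; run cluster-join.sh first'
    return None


def check_nfs_mount(mount_point: Path) -> Optional[str]:
    """Return None if mount_point is an active, writable mount."""
    if not os.path.ismount(mount_point):
        return f"{mount_point} is not a mount point"
    if not os.access(mount_point, os.W_OK):
        return f"{mount_point} is not writable"
    return None


def check_tcp_reachable(host: str, port: int, timeout: float = 5.0) -> Optional[str]:
    """Return None if a TCP connection to host:port succeeds (does not test credentials)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return f"cannot connect to {host}:{port}: {exc}"


def run_prechecks(checks: list[tuple[str, Callable[[], Optional[str]]]]) -> list[str]:
    """Run every check and collect all failures, so the operator sees them in one pass."""
    failures = []
    for name, check in checks:
        problem = check()
        if problem:
            failures.append(f"{name}: {problem}")
    return failures
```

Collecting every failure before exiting, rather than stopping at the first one, lets the operator fix all missing prerequisites in a single round.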

AC5: Post-Migration Verification

Scenario: The script verifies the migration was successful.

Given the migration has completed
When the post-migration verification runs
Then it starts the CIDX server
And it waits for the server to be healthy (health endpoint returns 200)
And it lists repositories via API and verifies all repos are accessible
And it runs a test query against one repository to verify indexes work
And it reports the verification results

Technical Requirements:

  • Start server: sudo systemctl start cidx-server
  • Health check: poll /health endpoint until ready (30s timeout)
  • List repos: call list_repositories API
  • Test query: simple search_code query against first repo
  • Report: "Migration verified: N repos accessible, query test passed"
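
The health poll might be structured as follows. This is a sketch under assumptions: `http_ok` and `wait_until_healthy` are hypothetical names, the one-second polling interval is an arbitrary choice, and the server's base URL is left to the caller because the story does not state a host or port.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError are OSError subclasses, so connection
        # failures and 4xx/5xx responses both land here.
        return False


def wait_until_healthy(
    probe: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll probe until it succeeds; return elapsed seconds, or None on timeout."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(interval)
```

A caller would pass something like `lambda: http_ok(base_url + "/health")` as the probe, where `base_url` is the server address. The clock and sleep parameters exist so the loop can be tested without waiting in real time.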

AC6: Rollback Capability

Scenario: If migration fails, the standalone server can be restored.

Given the migration fails at any step
When the operator wants to rollback
Then the original SQLite databases are unchanged (read-only during migration)
And the original config.json is backed up as config.json.pre-cluster
And restoring config.json.pre-cluster and restarting the server returns it to standalone mode
And the script provides rollback instructions on failure

Technical Requirements:

  • SQLite databases read during migration, never modified
  • Config backup: config.json.pre-cluster
  • On failure: print rollback instructions
  • Local golden-repos/ preserved (copied to NFS, not moved)
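
The config backup and the rollback message could be sketched as below. The function names are hypothetical. Keeping an existing backup instead of overwriting it is an assumption made here so that a re-run after a partial failure cannot replace the original standalone config with an already-modified one.

```python
import shutil
from pathlib import Path

ROLLBACK_TEMPLATE = """\
Rollback (if needed):
  sudo systemctl stop cidx-server
  cp {backup} {config}
  sudo systemctl start cidx-server
"""


def backup_config(config_path: Path) -> Path:
    """Copy config.json to config.json.pre-cluster unless a backup already exists."""
    backup = config_path.with_name(config_path.name + ".pre-cluster")
    if not backup.exists():
        shutil.copy2(config_path, backup)  # copy2 preserves timestamps and permissions
    return backup


def rollback_instructions(config_path: Path, backup: Path) -> str:
    """Render the rollback steps the script prints on failure."""
    return ROLLBACK_TEMPLATE.format(backup=backup, config=config_path)
```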

Implementation Status

  • Core implementation complete
  • Unit tests passing
  • Integration tests passing
  • E2E tests passing
  • Code review approved
  • Manual E2E testing completed
  • Documentation updated

Technical Implementation Details

File Structure

scripts/
    cluster-migrate.sh     # Main migration orchestration script

Script Flow

1. Pre-migration validation
   a. Check config.json has cluster settings
   b. Check NFS mount is active
   c. Check PostgreSQL is reachable
   d. Check server is stopped (per AC1, the script stops it with sudo systemctl stop cidx-server)
   e. Check disk space
2. Backup config.json -> config.json.pre-cluster
3. Run SQLite-to-PostgreSQL data migration (Story 10)
4. Copy golden-repos/ to NFS mount (rsync)
5. Copy .versioned/ to NFS mount (rsync)
6. Update alias JSON paths
7. Start server in cluster mode
8. Post-migration verification
9. Print summary
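
One way to drive this flow is a small step runner that halts at the first failure and hands control to a failure handler, which is where rollback instructions would be printed. The sketch below is illustrative: `run_steps` and its callback signature are assumptions, and the step actions are left to the caller.

```python
from typing import Callable, Sequence

Step = tuple[str, Callable[[], None]]


def run_steps(steps: Sequence[Step], on_failure: Callable[[str, Exception], None]) -> bool:
    """Run steps in order; stop at the first exception and report it via on_failure."""
    for index, (label, action) in enumerate(steps, start=1):
        print(f"Step {index}/{len(steps)}: {label}")
        try:
            action()
        except Exception as exc:
            # Later steps must not run against a half-migrated state.
            on_failure(label, exc)
            return False
    return True
```

Each action should be safe to repeat (rsync skips unchanged files, and the alias rewrite can skip paths that already point at the NFS mount), so the operator can re-run the whole flow after fixing the cause of a failure.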

Output Example

CIDX Cluster Migration
=======================

Pre-checks:
  Cluster config:  OK (storage_mode: postgres)
  NFS mount:       OK (/mnt/cidx-shared, 450GB free)
  PostgreSQL:      OK (postgresql://cidx@pg-host:5432/cidx)
  Server stopped:  OK

Step 1/5: SQLite to PostgreSQL data migration
  Migrating users...          45 rows  OK
  Migrating global_repos...   12 rows  OK
  [...]
  Validation: PASSED

Step 2/5: Copy golden repos to shared storage
  rsync golden-repos/ -> /mnt/cidx-shared/golden-repos/
  12 repositories, 8.4 GB total ... done

Step 3/5: Copy versioned snapshots
  rsync .versioned/ -> /mnt/cidx-shared/.versioned/
  12 snapshots, 8.2 GB total ... done

Step 4/5: Update alias JSON paths
  Updated 12 alias files

Step 5/5: Post-migration verification
  Server startup:  OK (healthy in 8s)
  Repositories:    12/12 accessible
  Query test:      OK (search_code returned results)

Migration complete!
This server is now the first node of the cluster.
To add more nodes: run 'scripts/cluster-join.sh' on each new server.

Rollback (if needed):
  sudo systemctl stop cidx-server
  cp ~/.cidx-server/config.json.pre-cluster ~/.cidx-server/config.json
  sudo systemctl start cidx-server

Testing Requirements

  • Automated: Pre-migration validation catches missing prerequisites.
  • Automated: Alias JSON path rewriting works correctly.
  • Automated: Rollback instructions are correct.
  • Manual E2E: Run the full migration on a test standalone server with real data. Verify that all repos are accessible, queries work, and the dashboard shows data in cluster mode. Test rollback by restoring the backed-up config.json.pre-cluster.

Definition of Done

  • End-to-end migration orchestration script operational
  • SQLite-to-PostgreSQL data migration invoked successfully
  • Golden repo files copied to NFS mount via rsync
  • Alias JSON paths updated for NFS base
  • Pre-migration validation catches all prerequisites
  • Post-migration verification confirms repos and queries work
  • Rollback path documented and tested
  • Script is idempotent (safe to re-run after a partial failure)
