@irjudson

Summary

This PR adds comprehensive test coverage for the multi-table rolling window functionality (issue #6) and fixes a timer cleanup bug that was causing test interference.

Changes

Test Coverage (25 new tests)

  • checkDataRange: Data range detection and validation (4 tests)
  • initializeGenerators: Generator lifecycle management (2 tests)
  • generateAndInsertBatch: Coordinated multi-table generation (2 tests)
  • cleanupOldData: Retention management and error handling (3 tests)
  • backfillTable: Historical data filling with progress tracking (3 tests)
  • start/stop lifecycle: Service state and timer management (5 tests)
  • start with backfill: Automatic and optional backfill modes (2 tests)
  • Comprehensive error handling and edge case coverage (4 tests)

Bug Fix

  • Fixed timer cleanup issue where the cleanup schedule setTimeout was not being tracked or cleared
  • Added cleanupScheduleTimeout field to track the initial 60-second setTimeout
  • Modified stop() method to properly clear all timers including the schedule timeout
  • This fix prevents timer interference between tests
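
The timer fix described above can be sketched roughly as follows. This is a minimal illustration, not the orchestrator's actual code: only `cleanupScheduleTimeout` and `stop()` are named in this PR, while the class name, the other timer fields, and the `start()` options are hypothetical.

```javascript
// Hypothetical sketch of tracking every timer handle so stop() can clear them.
// Only cleanupScheduleTimeout and stop() come from the PR; other names are assumed.
class TimerOwner {
  constructor() {
    this.generationTimer = null;        // assumed: recurring generation interval
    this.cleanupTimer = null;           // assumed: recurring cleanup interval
    this.cleanupScheduleTimeout = null; // initial delayed cleanup (60 seconds in the PR)
  }

  start({ generationIntervalMs = 60000, cleanupDelayMs = 60000,
          cleanupIntervalMs = 24 * 60 * 60 * 1000 } = {}) {
    this.generationTimer = setInterval(() => this.generate(), generationIntervalMs);

    // Keep the handle: without it, stop() cannot cancel the pending first cleanup,
    // and the callback can fire after the test that created it has finished.
    this.cleanupScheduleTimeout = setTimeout(() => {
      this.cleanupScheduleTimeout = null;
      this.cleanup();
      this.cleanupTimer = setInterval(() => this.cleanup(), cleanupIntervalMs);
    }, cleanupDelayMs);
  }

  stop() {
    // clearInterval/clearTimeout accept null, so no guards are needed.
    clearInterval(this.generationTimer);
    clearInterval(this.cleanupTimer);
    clearTimeout(this.cleanupScheduleTimeout);
    this.generationTimer = null;
    this.cleanupTimer = null;
    this.cleanupScheduleTimeout = null;
  }

  generate() {} // placeholder
  cleanup() {}  // placeholder
}
```

Calling `stop()` before the initial timeout fires leaves no pending handles, which is what keeps one test's timers from leaking into the next.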

Test Results

All 91 tests passing:

  • 45 existing tests (BigQuery client, config loader, sync engine, generator)
  • 25 new multi-table orchestrator tests
  • 21 sync engine tests

Files Changed

  • test/multi-table-orchestrator.test.js - New comprehensive test file
  • tools/maritime-data-synthesizer/multi-table-orchestrator.js - Timer cleanup fix
  • package.json - Added new test file to test scripts

This PR completes the testing requirements for the cleanup branch issues (#3, #5, #6).

Implemented exponential backoff with jitter for BigQuery API calls:

**Implementation:**
- Added configurable retry parameters (maxRetries, initialRetryDelay)
- Exponential backoff: initialRetryDelay * 2^attempt with random jitter
- Capped maximum delay at 30 seconds
- Intelligent error detection (retryable vs non-retryable)
- Detailed retry logging (warnings on retry, errors on final failure)
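
A minimal sketch of the delay calculation and retry loop described above. The function names, the default values, and the exact jitter formula are assumptions for illustration; the commit only states the `initialDelay * 2^attempt` shape, random jitter, and the 30-second cap.

```javascript
// Hypothetical sketch; names and defaults are illustrative, not the project's API.
const MAX_DELAY_MS = 30000; // cap stated in the commit message

// attempt is zero-based. Jitter here is additive and at most one initial delay;
// that particular shape is an assumption.
function computeRetryDelay(attempt, initialRetryDelay = 1000, random = Math.random) {
  const exponential = initialRetryDelay * 2 ** attempt;
  const jitter = random() * initialRetryDelay;
  return Math.min(exponential + jitter, MAX_DELAY_MS);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retries fn until it succeeds, the error is non-retryable, or maxRetries is exhausted.
async function withRetry(fn, {
  maxRetries = 5,
  initialRetryDelay = 1000,
  isRetryable = () => true,
  sleep = defaultSleep,
} = {}) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt >= maxRetries || !isRetryable(err)) {
        console.error(`Giving up after ${attempt + 1} attempt(s): ${err.message}`);
        throw err;
      }
      const delay = computeRetryDelay(attempt, initialRetryDelay);
      console.warn(`Attempt ${attempt + 1} failed, retrying in ${Math.round(delay)}ms`);
      await sleep(delay);
    }
  }
}
```

With a 1000 ms initial delay and jitter ignored, the base delays are 1 s, 2 s, 4 s, 8 s, 16 s, and then 30 s once the cap applies.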

**Retryable errors:**
- Rate limits (rateLimitExceeded, 429)
- Quota exceeded
- Internal/backend errors
- Service unavailable (503)

**Non-retryable errors** fail immediately:
- Invalid queries
- Permission errors
- Schema mismatches
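
The retryable versus non-retryable split above could be expressed as a small predicate. This sketch assumes the error object carries an HTTP status in `code` and BigQuery-style reasons in `errors[].reason`; that shape, the helper name, and the exact reason strings checked are assumptions rather than the project's confirmed logic.

```javascript
// Hypothetical classifier; the error shape (code, errors[].reason) is assumed.
const RETRYABLE_REASONS = new Set([
  'rateLimitExceeded',
  'quotaExceeded',
  'internalError',
  'backendError',
]);
const RETRYABLE_STATUS = new Set([429, 503]);

function isRetryableError(err) {
  if (!err) return false;
  if (RETRYABLE_STATUS.has(Number(err.code))) return true;
  const reasons = Array.isArray(err.errors)
    ? err.errors.map((e) => e && e.reason)
    : [];
  return reasons.some((reason) => RETRYABLE_REASONS.has(reason));
}
```

Anything not matched (invalid queries, permission errors, schema mismatches) falls through to `false` and fails immediately.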

**Testing:**
- Added comprehensive unit tests for retry logic
- Tests verify exponential backoff, jitter, max retries
- All 65 tests passing

**Documentation:**
- Added retry configuration section to README
- Included example delays and configuration options

Fixed unbounded memory growth in maritime vessel generator:

**Root Cause:**
- Journey Map grew indefinitely without cleanup
- Completed journeys were never removed
- Journeys were never marked as completed

**Solution:**
1. Added configurable maxJourneys limit (default: 10,000)
2. Implemented cleanupOldJourneys() method:
   - Removes oldest 20% of journeys when limit exceeded
   - Removes journeys older than 7 days
   - Prioritizes removing completed journeys
3. Mark journeys as completed:
   - After 12+ hours in port
   - When vessel leaves port to start new journey
4. Periodic cleanup:
   - Every 100 batches
   - When 80% of maxJourneys reached
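
The cleanup rules above might look roughly like this. It is a sketch under assumptions: the journey Map is taken to hold objects with `startTime` (epoch milliseconds) and `completed` fields, and the standalone functions stand in for the generator's `cleanupOldJourneys()` method, whose real signature is not shown in this commit.

```javascript
// Hypothetical sketch; the journey record shape and function signatures are assumed.
const SEVEN_DAYS_MS = 7 * 24 * 60 * 60 * 1000;

function cleanupOldJourneys(journeys, maxJourneys = 10000, now = Date.now()) {
  // Rule 1: drop journeys older than 7 days (deleting during Map iteration is safe).
  for (const [id, journey] of journeys) {
    if (now - journey.startTime > SEVEN_DAYS_MS) journeys.delete(id);
  }

  // Rule 2: if still over the limit, remove 20% of entries,
  // completed journeys first, then oldest first.
  if (journeys.size > maxJourneys) {
    const removeCount = Math.ceil(journeys.size * 0.2);
    const candidates = [...journeys.entries()].sort(([, a], [, b]) => {
      if (a.completed !== b.completed) return a.completed ? -1 : 1;
      return a.startTime - b.startTime;
    });
    for (const [id] of candidates.slice(0, removeCount)) journeys.delete(id);
  }
  return journeys.size;
}

// Periodic trigger: every 100 batches, or once 80% of the limit is reached.
function shouldCleanup(batchCount, journeyCount, maxJourneys = 10000) {
  return (batchCount > 0 && batchCount % 100 === 0) || journeyCount >= maxJourneys * 0.8;
}
```

Triggering at 80% of `maxJourneys` rather than at the limit itself leaves headroom, so the Map rarely reaches the hard cap between cleanups.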

**Testing:**
- Re-enabled generator tests (previously disabled as generator.test.js.skip)
- Updated tests to match current API
- Added memory leak prevention test
- All 70 tests passing

**Impact:**
- Memory usage now bounded and predictable
- Enables safe long-running data generation
- No performance impact on generation speed

Implemented complete rolling window infrastructure for multi-table orchestrator:

**New Methods Added:**
1. `start()` - Continuous generation with per-table rolling windows
2. `stop()` - Graceful shutdown of all timers
3. `checkDataRange()` - Per-table data coverage analysis
4. `backfillTable()` - Historical data gap filling for each table
5. `cleanupOldData()` - Retention management across all tables
6. `generateAndInsertBatch()` - Coordinated multi-table generation
7. `initializeGenerators()` - Initialize generators for continuous mode

**Features:**
- **Rolling Window**: Maintains configurable retention period (default: 30 days)
- **Auto-Backfill**: Checks each table and fills gaps automatically
- **Continuous Generation**: Parallel generation for all 3 tables
  - vessel_positions: 100 records/batch
  - port_events: 10 records/batch
  - vessel_metadata: 1 record/batch
- **Automatic Cleanup**: Removes old data beyond retention period
- **State Management**: Tracks batches, records, errors, uptime
- **Graceful Shutdown**: Proper SIGINT/SIGTERM handling

**CLI Updates:**
- Added `start` command support for multi-table mode
- Maintains feature parity with single-table mode
- Supports `--no-backfill` flag for generation-only mode

**Configuration:**
- `batchSize`: Records per generation cycle (default: 100)
- `generationIntervalMs`: Generation frequency (default: 60000ms)
- `retentionDays`: Rolling window size (default: 30 days)
- `cleanupIntervalHours`: Cleanup frequency (default: 24 hours)
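
As an illustration of how these options could fit together, the sketch below merges overrides onto the documented defaults and derives the rolling-window cutoff from `retentionDays`. The helper names `resolveConfig` and `retentionCutoff` are hypothetical and do not appear in this commit.

```javascript
// Hypothetical helpers; only the option names and defaults come from the commit message.
const DEFAULTS = Object.freeze({
  batchSize: 100,              // records per generation cycle
  generationIntervalMs: 60000, // generation frequency
  retentionDays: 30,           // rolling window size
  cleanupIntervalHours: 24,    // cleanup frequency
});

function resolveConfig(overrides = {}) {
  return { ...DEFAULTS, ...overrides };
}

// Rows with timestamps before this cutoff fall outside the rolling window.
function retentionCutoff(config, now = new Date()) {
  const dayMs = 24 * 60 * 60 * 1000;
  return new Date(now.getTime() - config.retentionDays * dayMs);
}
```

For example, `retentionCutoff(resolveConfig({ retentionDays: 7 }))` gives the timestamp seven days before the current time, while unspecified options keep their defaults.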

**Testing:**
- All 70 existing tests still passing
- Ready for integration testing with live BigQuery

**Impact:**
- Multi-table mode now feature-complete
- Enables production continuous data generation
- No breaking changes to existing functionality

This commit adds 25 unit tests for multi-table rolling window functionality
and fixes a test interference issue where timers were not being properly
cleaned up between tests.

Test Coverage Added (25 tests):
- checkDataRange: data range detection and validation (4 tests)
- initializeGenerators: generator lifecycle management (2 tests)
- generateAndInsertBatch: coordinated multi-table generation (2 tests)
- cleanupOldData: retention management and error handling (3 tests)
- backfillTable: historical data filling with progress tracking (3 tests)
- start/stop lifecycle: service state and timer management (5 tests)
- start with backfill: automatic and optional backfill modes (2 tests)
- Comprehensive error handling and edge case coverage (4 tests)

Bug Fix:
- Fixed timer cleanup issue where the cleanup schedule setTimeout was not
  being tracked or cleared, causing test interference
- Added cleanupScheduleTimeout field to track the initial 60-second setTimeout
- Modified stop() method to properly clear all timers including the schedule timeout

All 91 tests now passing (45 existing + 25 new + 21 sync engine tests).

@irjudson irjudson merged commit 779e02e into main Nov 13, 2025
4 checks passed
@irjudson irjudson deleted the cleanup branch November 13, 2025 18:44