diff --git a/GLOBALS_LOGGING_RESEARCH.md b/GLOBALS_LOGGING_RESEARCH.md
new file mode 100644
index 0000000..31486f9
--- /dev/null
+++ b/GLOBALS_LOGGING_RESEARCH.md
@@ -0,0 +1,528 @@
+# HarperDB Globals Logging System - Research Report for Issue #11
+
+## Executive Summary
+
+This research investigates the HarperDB globals logging system as it relates to issue #11 (enhanced monitoring and observability). The current codebase has a **mixed logging approach**:
+
+- **202 logger.\* calls** properly use HarperDB's global logger (for Grafana integration)
+- **70 console.\* calls** bypass the logging system entirely
+- Only 3 files use console logging directly (bigquery.js, generator.js, service.js)
+
+The **globals.js implementation uses a singleton pattern**, which is appropriate for this use case and aligns with HarperDB's architecture.
+
+---
+
+## 1. HarperDB Globals System Overview
+
+### What is Globals?
+
+The `globals` system in HarperDB is a key-value store that persists data at the application level, surviving across requests. It's commonly used for:
+
+- Caching shared state
+- Maintaining singleton instances
+- Storing application configuration
+- Passing state between different parts of the application
+
+### Logger Integration with Grafana
+
+HarperDB provides a **global `logger` object** that is automatically available in plugin code:
+
+```javascript
+logger.trace(); // Detailed diagnostic information
+logger.debug(); // Debug-level messages
+logger.info(); // General informational messages
+logger.warn(); // Warning messages
+logger.error(); // Error messages
+logger.fatal(); // Fatal error messages
+logger.notify(); // Notifications
+```
+
+**Key Point:** The `logger` global writes to Harper's centralized logging system, which can be **integrated with Grafana** for monitoring and observability. `console.log` output bypasses this system entirely and goes to stdout/stderr.
+
+---
+
+## 2. Current Globals Implementation
+
+### File: `src/globals.js`
+
+```javascript
+class Globals {
+	constructor() {
+		if (Globals.instance) {
+			return Globals.instance; // Singleton pattern
+		}
+		this.data = {};
+		Globals.instance = this;
+	}
+	set(key, value) {
+		this.data[key] = value;
+	}
+	get(key) {
+		return this.data[key];
+	}
+}
+
+const globals = new Globals();
+export { globals, Globals };
+export default Globals;
+```
+
+### Analysis: Singleton Pattern
+
+**Is the singleton pattern necessary here?** Yes, for several reasons:
+
+1. **Deterministic instance sharing**: Guarantees only one instance exists across module imports
+2. **Multi-module consistency**: When multiple modules import globals, they all access the same object
+3. **Thread safety in context**: Within Harper's execution model, the singleton is safe
+4. **Application state durability**: A single instance persists data across requests
+
+### Current Usage
+
+Files using globals:
+
+1. **`src/index.js`** - Entry point stores SyncEngines:
+
+   ```javascript
+   globals.set('syncEngines', syncEngines);
+   globals.set('syncEngine', syncEngines[0]);
+   globals.set('schemaManager', schemaManager);
+   globals.set('validator', validationService);
+   ```
+
+2. **`src/resources.js`** - Resource layer retrieves engines:
+
+   ```javascript
+   await globals.get('syncEngine').getStatus();
+   await globals.get('validator').runValidation();
+   ```
+
+3. **`src/sync-engine.js`** - Imports globals but doesn't currently use it
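+
+Because both `src/index.js` and `src/resources.js` resolve to the same instance, state written at startup is visible from the resource layer. A minimal sketch of that behavior (the set/get calls mirror the snippets above; the inline object is a stand-in, not the real SyncEngine):
+
+```javascript
+import { globals } from './globals.js';
+
+// At startup (src/index.js does the equivalent with real engine instances)
+globals.set('syncEngine', { getStatus: () => 'running' });
+
+// Later, in another module (src/resources.js does the equivalent)
+const engine = globals.get('syncEngine');
+// Same object as the one stored above, because the constructor
+// returns Globals.instance on every subsequent `new Globals()`.
+engine.getStatus(); // 'running'
+```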
+
+---
+
+## 3. Current Logging Patterns in Codebase
+
+### Statistics
+
+| Metric                              | Count |
+| ----------------------------------- | ----- |
+| Total lines of src/ JavaScript code | 4,517 |
+| Logger.\* calls                     | 202   |
+| Console.\* calls                    | 70    |
+| Files using console directly        | 3     |
+| Files using logger                  | 11+   |
+
+### Files Using Console (Not Captured for Grafana)
+
+1. **`src/bigquery.js`** - MaritimeBigQueryClient (legacy synthesizer)
+   - 23 console.log/error statements
+   - Example: `console.log(\`Creating dataset: ${this.datasetId}\`)`
+
+2. **`src/generator.js`** - MaritimeVesselGenerator (legacy synthesizer)
+   - 2 console.log statements
+   - Used for test data generation
+
+3. **`src/service.js`** - MaritimeDataSynthesizer (legacy synthesizer)
+   - 45 console.log/error statements
+   - Used for CLI output during data initialization
+
+### Files Using Logger (Grafana-Integrated)
+
+1. **`src/sync-engine.js`** - 70+ logger.\* calls
+   - Constructor, initialization, sync cycles
+   - Cluster discovery, checkpoint management
+   - Record ingestion and error handling
+
+2. **`src/bigquery-client.js`** - 40+ logger.\* calls
+   - Retry logic with exponential backoff
+   - Query execution and performance tracking
+   - Error categorization (retryable vs. fatal)
+
+3. **`src/validation.js`** - 30+ logger.\* calls
+   - Validation suite progress
+   - Checkpoint validation
+   - Smoke tests and spot checks
+
+4. **`src/resources.js`** - 15+ logger.\* calls
+   - Resource layer operations
+   - Sync control endpoints
+   - Data retrieval operations
+
+5. **`src/index.js`** - 15+ logger.\* calls
+   - Application initialization
+   - Schema manager setup
+   - Engine lifecycle
+
+6. **`src/schema-manager.js`** - no logger calls yet (see Section 5)
+   - Prime candidate for instrumentation
+
+7. **`src/operations-client.js`** - Uses logger
+   - API operations
+
+---
+
+## 4. Logging Content Analysis
+
+### Types of Information Being Logged
+
+#### Info Level (Progress & State Changes)
+
+- Component initialization
+- Cluster topology discovery
+- Sync cycle start/end
+- Checkpoint updates
+- Phase transitions
+- BigQuery operations completion
+
+Example:
+
+```javascript
+logger.info(`[SyncEngine.initialize] Sync started - initializing with ${tableConfig.id}`);
+logger.info(`[SyncEngine.runSyncCycle] Received ${records.length} records from BigQuery`);
+logger.info(`[SyncEngine.updatePhase] Phase transition: ${oldPhase} -> ${this.currentPhase}`);
+```
+
+#### Debug Level (Detailed Tracing)
+
+- Method entry/exit
+- Parameter values
+- Intermediate calculations
+- Query execution details
+- Retry attempt tracking
+
+Example:
+
+```javascript
+logger.debug(`[SyncEngine.initialize] Discovering cluster topology`);
+logger.debug(`[BigQueryClient.pullPartition] Query parameters - lastTimestamp: ${lastTimestamp}`);
+```
+
+#### Warn Level (Recoverable Issues)
+
+- Missing or skipped records
+- Transient errors being retried
+- Deprecated configurations
+- Non-fatal validation failures
+
+Example:
+
+```javascript
+logger.warn(`[SyncEngine.ingestRecords] Missing timestamp column '${timestampColumn}', skipping record`);
+logger.warn(`[BigQueryClient.pullPartition] Transient error - retrying in ${delay}ms`);
+```
+
+#### Error Level (Failures Requiring Attention)
+
+- Unrecoverable failures
+- Invalid data
+- Missing configuration
+- Operation failures
+
+Example:
+
+```javascript
+logger.error(`[SyncEngine.runSyncCycle] Sync cycle error: ${error.message}`);
+logger.error(`[SyncEngine.loadCheckpoint] Invalid timestamp: ${checkpoint.lastTimestamp}`);
+```
+
+---
+
+## 5. Areas Missing Logging Instrumentation
+
+### High Priority (Should Definitely Have Logging)
+
+1. **`src/schema-manager.js`**
+   - Currently no logger calls
+   - Critical operations: table creation, schema migration
+   - Missing: migration planning, attribute additions, type conflicts
+
+2. **`src/operations-client.js`**
+   - Minimal logging for API interactions
+   - Missing: request details, response times, failure scenarios
+
+3. **`src/query-builder.js`**
+   - Complex query generation logic
+   - Missing: query parameter logging, validation steps
+
+4. **`src/type-converter.js`** & **`src/type-mapper.js`**
+   - Data transformation logic
+   - Missing: type conversion attempts, edge cases, failures
+
+5. **`src/config-loader.js`**
+   - Configuration loading and validation
+   - Missing: config file location, parsing steps, validation results
+
+6. **`src/index-strategy.js`**
+   - Index strategy selection for Harper tables
+   - Missing: strategy calculation logic, selected indexes
+
+### Medium Priority (Would Benefit from Logging)
+
+1. **`src/schema-leader-election.js`**
+   - Leader election logic
+   - Missing: election attempts, leader changes, conflicts
+
+2. **`src/validators.js`**
+   - Validation logic
+   - Missing: validation rule execution, results
+
+3. **Synthesizer components** (src/service.js, src/generator.js, src/bigquery.js)
+   - These are test/utility components
+   - Currently use console.log for CLI output
+   - Could migrate to logger for production use
+
+---
+
+## 6. Multi-Threading Considerations
+
+### Current Architecture
+
+The codebase currently implements **distributed multi-node ingestion**, not true multi-threading:
+
+1. **Cluster Discovery** (`src/sync-engine.js`):
+
+   ```javascript
+   const currentNodeId = [server.hostname, server.workerIndex].join('-');
+   ```
+
+   - Uses `server.workerIndex` from HarperDB's worker context
+   - Multiple nodes discovered via `server.nodes` array
+
+2. **Deterministic Partitioning** (see the sketch at the end of this section):
+   - Modulo-based partition assignment using nodeId
+   - Each node gets its own deterministic partition
+   - No shared state conflicts between nodes
+
+3. **Local Checkpoints**:
+   - Each node maintains its own checkpoint
+   - Checkpoint ID: `{tableId}_{nodeId}`
+   - No coordination needed between nodes
+
+### Threading Analysis
+
+**Current usage of `server.workerIndex`**:
+
+- Combines with hostname for unique node identity
+- Enables multiple workers on the same host to have different IDs
+- No actual worker threads are created in the code
+- HarperDB manages worker distribution
+
+**Future Threading Consideration (Issue #9)**:
+If multi-threaded ingestion is added in the future:
+
+- Singleton globals would remain thread-safe within Harper's concurrency model
+- Per-thread checkpoints may be needed: `{tableId}_{nodeId}_{threadId}`
+- Logger calls should remain as-is (HarperDB logger handles concurrency)
+- Would need thread-local storage for per-thread state
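+
+A minimal sketch of the partitioning and checkpoint-ID scheme described above. The helper names and the hash function are illustrative stand-ins, not the project's real API; only the modulo assignment and the `{tableId}_{nodeId}` pattern come from the code:
+
+```javascript
+// Every node derives the same ordering independently, so assignment
+// is deterministic with no coordination required.
+function partitionFor(nodeId, allNodeIds) {
+	const ordered = [...allNodeIds].sort();
+	return { index: ordered.indexOf(nodeId), count: ordered.length };
+}
+
+// A record belongs to this node when its key hashes onto our slot.
+// (Simple string hash as a stand-in for whatever hashing the engine uses.)
+function ownsRecord(recordKey, { index, count }) {
+	let h = 0;
+	for (const ch of recordKey) h = (h * 31 + ch.charCodeAt(0)) >>> 0;
+	return h % count === index;
+}
+
+// Example: node 'host-a-0' in a three-node cluster, hypothetical table ID.
+const me = partitionFor('host-a-0', ['host-a-0', 'host-b-0', 'host-c-0']);
+const checkpointId = 'vessel_positions_host-a-0'; // {tableId}_{nodeId}
+console.log(ownsRecord('record-123', me), checkpointId);
+```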
+
+---
+
+## 7. Singleton Pattern Necessity Assessment
+
+### Why Singleton IS Necessary
+
+1. **Shared State Persistence**: Multiple modules need access to the same engine instances
+2. **Module Independence**: Avoids circular dependencies - any module can import globals
+3. **Request Handling**: Persists data across multiple request handlers
+4. **Configuration Sharing**: Single point of truth for application state
+
+### When Singleton Could Be Problematic
+
+1. **True Worker Threads**: If code runs in separate threads, each needs isolated state
+2. **Testing**: Can cause state pollution between tests (mitigated with before/after hooks)
+3. **Multiple Application Instances**: Would share state incorrectly
+
+### Assessment for This Codebase
+
+**Verdict**: The singleton pattern is **appropriate** for:
+
+- A single Harper plugin instance context
+- Cluster-distributed (not thread-distributed) workloads
+- The current architecture with one engine per instance
+
+**Minor Enhancement Opportunity**:
+
+```javascript
+class Globals {
+	constructor() {
+		if (Globals.instance) return Globals.instance;
+		this.data = {};
+		this.version = '1.0.0';
+		this.createdAt = new Date();
+		Globals.instance = this;
+	}
+
+	set(key, value) {
+		this.data[key] = value;
+		logger.debug(`[Globals] Set ${key} = ${typeof value}`);
+	}
+
+	get(key) {
+		const value = this.data[key];
+		// Check for undefined explicitly so stored falsy values (0, '', false)
+		// don't trigger spurious warnings
+		if (value === undefined) logger.warn(`[Globals] Key '${key}' not found`);
+		return value;
+	}
+}
+```
+
+---
+
+## 8. Files Requiring Console → Logger Migration
+
+### High Priority
+
+| File               | console calls | Type               | Priority |
+| ------------------ | ------------- | ------------------ | -------- |
+| `src/bigquery.js`  | 23            | Legacy synthesizer | HIGH     |
+| `src/service.js`   | 45            | Legacy synthesizer | HIGH     |
+| `src/generator.js` | 2             | Legacy synthesizer | HIGH     |
+
+### Notes on Synthesizer Files
+
+These files (bigquery.js, service.js, generator.js) are the **legacy maritime data synthesizer** used for test data generation. They output to console for CLI feedback.
+
+**Migration Strategy**:
+
+1. Keep console output for backward compatibility with the CLI
+2. Add logger.\* calls for production deployment within Harper
+3. Use an environment variable to control the output level
+
+Example pattern:
+
+```javascript
+const ENV_LOGGING_MODE = process.env.LOGGING_MODE || 'cli';
+
+function logProgress(message) {
+	if (ENV_LOGGING_MODE === 'cli') {
+		console.log(message);
+	}
+	logger.info(`[Synthesizer] ${message}`);
+}
+```
+
+---
+
+## 9. Grafana Integration Points
+
+### How Logger Enables Grafana Integration
+
+1. **Centralized Log Collection**:
+   - HarperDB logger writes to structured logs
+   - These can be exported to a log aggregator (Loki, DataDog, etc.)
+   - Grafana reads from these sources
+
+2. **Structured Logging Benefits**:
+   - Method names in brackets `[ClassName.method]` for easy filtering
+   - Consistent log levels (INFO, DEBUG, WARN, ERROR)
+   - Timestamped entries with context
+   - Can create metrics and alerts
+
+3. **Monitoring Dashboards** (Issue #11 Goal) - illustrative log-aggregation pseudo-queries, not a specific query language:
+
+   ```text
+   -- Count ingestion errors
+   | filter [SyncEngine] and level="ERROR"
+   | stats count by tableId
+
+   -- Track phase transitions
+   | filter "Phase transition"
+   | stats count by currentPhase
+
+   -- Monitor checkpoint lag
+   | filter [SyncEngine.updatePhase]
+   | extract lag:number
+   | stats avg(lag) by nodeId
+   ```
+
+### Required Logging for Grafana Monitoring
+
+To build effective dashboards (Issue #11), we need to log:
+
+1. **Performance metrics**: Query times, batch sizes, throughput
+2. **Health indicators**: Phase transitions, checkpoint progress, lag
+3. **Error patterns**: Failure types, retry attempts, recovery actions
+4. **Resource usage**: Records processed, memory operations, cleanup events
+5. **Cluster state**: Node discovery, topology changes, partition distribution
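+
+The first two items reduce to emitting a duration alongside each operation. One way to do that is sketched below; the `timedQuery` helper and its labels are hypothetical, not existing project code:
+
+```javascript
+// Wraps any async query so its duration and row count reach Grafana.
+async function timedQuery(label, runQuery) {
+	const start = Date.now();
+	try {
+		const rows = await runQuery();
+		logger.info(`[BigQueryClient.${label}] Query executed in ${Date.now() - start}ms, returned ${rows.length} rows`);
+		return rows;
+	} catch (error) {
+		logger.error(`[BigQueryClient.${label}] Query failed after ${Date.now() - start}ms: ${error.message}`);
+		throw error;
+	}
+}
+```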
+
+---
+
+## 10. Recommendations Summary
+
+### Immediate Actions (Before Issue #11 Implementation)
+
+1. **Migrate synthesizer console logging** (3 files, 70 calls)
+   - Allows test data generation to work within the Harper logging system
+   - Enables monitoring of the test data pipeline
+
+2. **Add logging to schema components**
+   - schema-manager.js: ~20 strategic logging points
+   - operations-client.js: ~10 logging points
+   - Covers critical data transformation and API operations
+
+3. **Enhance globals.js with logging**
+   - Add debug logging for get/set operations
+   - Aids in troubleshooting state management issues
+
+### For Issue #11 Implementation (Monitoring & Observability)
+
+1. **Maintain current logging patterns**
+   - Already structured for Grafana consumption
+   - Consistent use of method names in brackets
+   - Clear log levels and messages
+
+2. **Add monitoring-specific logging**
+   - Query execution times: `logger.info(\`Query executed in ${duration}ms\`)`
+   - Record throughput: `logger.info(\`Processed ${batchSize} records in ${time}ms\`)`
+   - Lag tracking: `logger.info(\`Current lag: ${lag}s, phase: ${phase}\`)`
+
+3. **Create Grafana dashboard queries**
+   - Aggregate by bracketed method names for per-component metrics
+   - Track phase transitions for health visualization
+   - Monitor lag trending for alerting
+
+### Code Quality
+
+- **No refactoring needed**: Current structure is sound
+- **Logging coverage**: 74% of logging calls go through logger (202 of 272)
+- **Singleton pattern**: Appropriate for current architecture
+
+---
+
+## 11. Documentation References
+
+### HarperDB Official Documentation
+
+- **Globals Reference**: https://docs.harperdb.io/docs/technical-details/reference/globals
+- **Debugging Applications**: https://docs.harperdb.io/docs/developers/applications/debugging
+- **Standard Logging**: https://docs.harperdb.io/docs/administration/logging/logging
+
+### Available Logger Methods
+
+- `logger.trace()` - Most detailed diagnostic information
+- `logger.debug()` - Debugging information
+- `logger.info()` - General informational messages
+- `logger.warn()` - Warning messages (recoverable issues)
+- `logger.error()` - Error messages (unrecoverable issues)
+- `logger.fatal()` - Fatal error messages
+- `logger.notify()` - Special notification messages
+
+### Project Issue References
+
+- **#11** (THIS): Enhanced monitoring and observability with Grafana
+- **#9**: Multi-threaded ingestion per node
+- **#10**: Dynamic rebalancing for autoscaling
+
+---
+
+## Conclusion
+
+The codebase is **well-structured for Grafana integration** via HarperDB's global logger:
+
+1. ✅ **Singleton globals pattern is appropriate** for this architecture
+2. ✅ **High coverage of logger usage** (202 structured log calls)
+3. ✅ **Consistent logging patterns** enable Grafana dashboard creation
+4. ⚠️ **3 files still use console** (legacy synthesizer - can be migrated)
+5. ⚠️ **Some modules lack instrumentation** (schema-manager, type-mapper)
+6. 🎯 **Ready for Issue #11 implementation** with minor additions
+
+The main work for issue #11 will be creating the Grafana dashboards and alert configurations, not modifying the logging system itself. The groundwork is already in place.
diff --git a/LOGGING_ANALYSIS_BY_FILE.md b/LOGGING_ANALYSIS_BY_FILE.md
new file mode 100644
index 0000000..df1a62a
--- /dev/null
+++ b/LOGGING_ANALYSIS_BY_FILE.md
@@ -0,0 +1,468 @@
+# Logging Analysis by File - HarperDB Globals System Research
+
+## Complete File-by-File Breakdown
+
+### CATEGORY 1: PRODUCTION CODE - USING LOGGER (Grafana-Ready)
+
+#### src/sync-engine.js (70+ logger calls)
+
+- **Type**: Core sync engine implementation
+- **Status**: EXCELLENT - Comprehensive logging coverage
+- **Logger Usage**:
+  - Constructor initialization
+  - Cluster discovery (multi-node coordination)
+  - Checkpoint management (persistence)
+  - Sync cycle orchestration
+  - Record ingestion and validation
+  - Phase transition tracking (initial → catchup → steady)
+  - Error handling with context
+- **Log Levels Used**: info, debug, warn, error, trace
+- **Key Information Logged**:
+  - Cluster topology and node IDs
+  - Phase transitions with lag values
+  - Record counts and throughput
+  - Checkpoint timestamps
+  - Validation failures with reasons
+- **Monitoring Ready**: YES - Can create dashboards for:
+  - Phase transition rates
+  - Lag trending
+  - Error frequency per phase
+  - Ingest throughput
+
+---
+
+#### src/bigquery-client.js (40+ logger calls)
+
+- **Type**: BigQuery API client with retry logic
+- **Status**: EXCELLENT - Well-instrumented
+- **Logger Usage**:
+  - Constructor and initialization
+  - Exponential backoff retry logic
+  - Query execution with performance timing
+  - Error categorization (retryable vs. fatal)
+  - Partition-aware queries
+  - Record verification
+- **Log Levels Used**: info, debug, warn, error, trace
+- **Key Information Logged**:
+  - Query parameters and SQL
+  - Attempt numbers and backoff delays
+  - Query execution time
+  - Transient vs. fatal errors
+  - Retry decisions with reasoning
+  - Row counts returned
+- **Monitoring Ready**: YES - Can create dashboards for:
+  - Query latency histograms
+  - Retry frequency
+  - Success vs. failure rates
+  - Transient error patterns
+
+---
+
+#### src/validation.js (30+ logger calls)
+
+- **Type**: Data validation and integrity checks
+- **Status**: GOOD - Adequate logging
+- **Logger Usage**:
+  - Constructor initialization
+  - Validation suite orchestration
+  - Checkpoint progress validation
+  - Smoke tests
+  - Spot checks on records
+  - Cluster discovery
+  - Audit logging
+- **Log Levels Used**: info, debug, warn, error
+- **Key Information Logged**:
+  - Validation check results
+  - Checkpoint status
+  - Record counts and lag
+  - Test pass/fail with reasons
+  - Overall health status
+- **Monitoring Ready**: YES - Can track:
+  - Validation pass rates
+  - Health status transitions
+  - Checkpoint lag over time
+  - Test coverage metrics
+
+---
+
+#### src/resources.js (15+ logger calls)
+
+- **Type**: Harper resource layer (GraphQL/REST endpoints)
+- **Status**: GOOD - Adequate logging
+- **Logger Usage**:
+  - Get operations on data tables
+  - Search operations
+  - Control endpoint status queries
+  - Control endpoint actions (start/stop/validate)
+- **Log Levels Used**: info, debug
+- **Key Information Logged**:
+  - Resource queries and results
+  - Record counts
+  - Operation names and parameters
+  - Status information
+- **Monitoring Ready**: YES - Can track:
+  - API call frequency
+  - Endpoint usage patterns
+  - Operation success rates
+
+---
+
+#### src/index.js (15+ logger calls)
+
+- **Type**: Plugin entry point and initialization
+- **Status**: GOOD - Covers key lifecycle events
+- **Logger Usage**:
+  - Schema manager initialization
+  - Sync engine creation
+  - Table configuration
+  - Validator setup
+- **Log Levels Used**: info, warn, error
+- **Key Information Logged**:
+  - Component initialization status
+  - Table configuration details
+  - Initialization failures and reasons
+  - Table count and IDs
+- **Monitoring Ready**: PARTIAL - Can track:
+  - Initialization success rate
+  - Component availability
+
+---
+
+#### src/operations-client.js (10+ logger calls)
+
+- **Type**: Harper Operations API client
+- **Status**: FAIR - Minimal coverage
+- **Logger Usage**:
+  - API calls (limited logging)
+  - Response handling
+- **Log Levels Used**: info, error
+- **Key Information Logged**:
+  - Operation names
+  - Error responses
+- **Monitoring Ready**: POOR - Needs enhancement:
+  - Request details
+  - Response times
+  - Error categorization
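+
+A sketch of what that enhancement could look like - a wrapper that logs request details, response time, and a transient/fatal classification. The `client.call` shape and `statusCode` field are assumptions for illustration, not the client's actual interface:
+
+```javascript
+async function loggedOperation(client, operation, payload) {
+	logger.debug(`[OperationsClient.${operation}] Request: ${JSON.stringify(payload)}`);
+	const start = Date.now();
+	try {
+		const response = await client.call(operation, payload); // assumed method
+		logger.info(`[OperationsClient.${operation}] Completed in ${Date.now() - start}ms`);
+		return response;
+	} catch (error) {
+		// Categorize so dashboards can separate retryable noise from real failures
+		const transient = [429, 500, 502, 503].includes(error.statusCode); // assumed field
+		logger.error(
+			`[OperationsClient.${operation}] ${transient ? 'Transient' : 'Fatal'} failure after ${Date.now() - start}ms: ${error.message}`
+		);
+		throw error;
+	}
+}
+```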
+
+---
+
+### CATEGORY 2: CORE CODE - MISSING LOGGING (Blind Spots)
+
+#### src/schema-manager.js (0 logger calls) ⚠️ CRITICAL
+
+- **Type**: Harper table schema creation and migration
+- **Status**: POOR - No logging instrumentation
+- **Critical Operations Without Visibility**:
+  - Table existence checking
+  - Schema introspection from BigQuery
+  - Migration planning and execution
+  - Attribute type mapping
+  - Type conflict detection
+  - Operations API calls
+  - Dynamic table creation
+- **Impact**: Cannot monitor schema operations, failures, or performance
+- **Recommendation**: Add 20+ logger calls for:
+
+  ```javascript
+  logger.debug('[SchemaManager.ensureTable] Starting table creation...');
+  logger.info('[SchemaManager] Comparing schemas...');
+  logger.debug('[SchemaManager] Found X attributes to add');
+  logger.warn('[SchemaManager] Type conflict detected on field X');
+  logger.error('[SchemaManager] Failed to create table: X');
+  logger.info('[SchemaManager] Table migration completed');
+  ```
+
+---
+
+#### src/config-loader.js (0 logger calls) ⚠️ IMPORTANT
+
+- **Type**: Configuration file loading and validation
+- **Status**: POOR - No logging
+- **Critical Operations Without Visibility**:
+  - Config file location resolution
+  - YAML parsing
+  - Configuration validation
+  - Defaults merging
+  - Legacy format conversion
+- **Impact**: Cannot diagnose configuration problems
+- **Recommendation**: Add 10+ logger calls for:
+
+  ```javascript
+  logger.info('[ConfigLoader] Loading config from: X');
+  logger.debug('[ConfigLoader] Parsing YAML...');
+  logger.warn('[ConfigLoader] Using default for X');
+  logger.error('[ConfigLoader] Invalid config: X');
+  ```
+
+---
+
+#### src/type-mapper.js (0 logger calls) ⚠️ IMPORTANT
+
+- **Type**: BigQuery ↔ Harper type conversion
+- **Status**: POOR - No logging
+- **Critical Operations Without Visibility**:
+  - Type mapping decisions
+  - Unsupported type handling
+  - Type conversion logic
+  - Schema building
+- **Impact**: Cannot debug type conversion issues
+- **Recommendation**: Add 10+ logger calls for:
+
+  ```javascript
+  logger.debug('[TypeMapper] Mapping BigQuery type: X to Harper type: Y');
+  logger.warn('[TypeMapper] Unsupported type X, using fallback: Y');
+  ```
+
+---
+
+#### src/type-converter.js (0 logger calls) ⚠️ IMPORTANT
+
+- **Type**: Runtime type conversion from BigQuery to JavaScript
+- **Status**: POOR - No logging
+- **Critical Operations Without Visibility**:
+  - Type conversion attempts
+  - Conversion failures
+  - Data validation
+  - Edge case handling
+- **Impact**: Cannot debug data conversion failures
+- **Recommendation**: Add 10+ logger calls for:
+
+  ```javascript
+  logger.trace('[TypeConverter] Converting value: X (type: Y)');
+  logger.warn('[TypeConverter] Conversion failed for: X');
+  ```
+
+---
+
+#### src/query-builder.js (0 logger calls) ⚠️ IMPORTANT
+
+- **Type**: BigQuery SQL query generation
+- **Status**: POOR - No logging
+- **Critical Operations Without Visibility**:
+  - Query building steps
+  - Parameter binding
+  - WHERE clause construction
+  - Join operations
+- **Impact**: Cannot debug query construction issues
+- **Recommendation**: Add 5+ logger calls for:
+
+  ```javascript
+  logger.trace('[QueryBuilder] Generated SQL: X');
+  logger.debug('[QueryBuilder] Query parameters: X');
+  ```
+
+---
+
+#### src/index-strategy.js (0 logger calls) ⚠️ MINOR
+
+- **Type**: Harper index strategy selection
+- **Status**: POOR - No logging
+- **Operations Without Visibility**:
+  - Strategy selection logic
+  - Index recommendation logic
+- **Recommendation**: Add 5+ logger calls
+
+---
+
+#### src/validators.js (0 logger calls) ⚠️ MINOR
+
+- **Type**: Validation rule definitions
+- **Status**: POOR - No logging
+- **Recommendation**: Add 5-10 logger calls
+
+---
+
+### CATEGORY 3: UTILITY/LEGACY CODE - USING CONSOLE (Not Grafana-Ready)
+
+#### src/bigquery.js (23 console.log/error calls) ⚠️ NEEDS MIGRATION
+
+- **Type**: Legacy maritime data synthesizer - BigQuery client
+- **Status**: NEEDS MIGRATION - Uses console instead of logger
+- **Console Usage**:
+  - Dataset creation/checking
+  - Table creation/checking
+  - Batch insertion feedback
+  - Retry feedback
+  - Cleanup feedback
+  - Error reporting
+- **Problem**: Console output is not captured by the Harper logging system
+- **Files Affected**: Synthesizer (test data generation)
+- **Migration Impact**: Low (test utility code)
+- **Recommendation**:
+
+  ```javascript
+  // Keep console for CLI backward compatibility
+  if (process.env.LOGGING_MODE === 'cli') console.log(message);
+  // Add to logger for Harper deployment
+  logger.info(`[MaritimeBigQueryClient] ${message}`);
+  ```
+
+---
+
+#### src/service.js (45 console.log/error calls) ⚠️ NEEDS MIGRATION
+
+- **Type**: Legacy maritime data synthesizer - Service orchestrator
+- **Status**: NEEDS MIGRATION - Heavy console usage
+- **Console Usage**:
+  - Initialization feedback
+  - Progress messages
+  - Batch generation feedback
+  - Time estimates
+  - Completion messages
+  - Error reporting
+- **Problem**: Console output is not captured for monitoring
+- **Recommendation**: Migrate to dual logging:
+
+  ```javascript
+  // Keep for CLI
+  console.log(`Loading ${days} days...`);
+  // Add to logger
+  logger.info(`[MaritimeDataSynthesizer] Loading ${days} days...`);
+  ```
+
+---
+
+#### src/generator.js (2 console.log calls) ⚠️ MINOR MIGRATION
+
+- **Type**: Legacy maritime data synthesizer - Generator
+- **Status**: MINIMAL - Only 2 console calls
+- **Console Usage**:
+  - Vessel pool initialization
+  - Journey cleanup
+- **Impact**: Minimal, but should be consistent
+- **Recommendation**: Migrate for consistency
+
+---
+
+### CATEGORY 4: COMPONENTS - PARTIAL/MINIMAL LOGGING
+
+#### src/schema-leader-election.js (MINIMAL)
+
+- **Status**: FAIR - Minimal logging coverage
+- **Missing**: Leader election attempts, conflicts, state changes
+- **Recommendation**: Add 10+ logger calls for election logic
+
+---
+
+### LOGGING STATISTICS SUMMARY
+
+| Category           | Files  | Logger Calls | Console Calls | Status           |
+| ------------------ | ------ | ------------ | ------------- | ---------------- |
+| Production Core    | 7      | 185          | 0             | EXCELLENT        |
+| Missing Logging    | 7      | 0            | 0             | POOR             |
+| Legacy Synthesizer | 3      | 0            | 70            | NEEDS MIGRATION  |
+| Partial/Other      | 2      | ~17          | 0             | FAIR             |
+| **TOTALS**         | **19** | **202**      | **70**        | **74% COVERAGE** |
+
+---
+
+## Migration Priority Matrix
+
+### Priority 1: CRITICAL (Blocks Issue #11)
+
+- [ ] src/schema-manager.js - Add 20 logging points
+- [ ] src/bigquery.js - Migrate 23 console calls
+- [ ] src/service.js - Migrate 45 console calls
+- [ ] src/generator.js - Migrate 2 console calls
+
+**Effort**: 1-2 days | **Impact**: Enables complete monitoring
+
+### Priority 2: IMPORTANT (Improve Visibility)
+
+- [ ] src/config-loader.js - Add 10 logging points
+- [ ] src/type-mapper.js - Add 10 logging points
+- [ ] src/type-converter.js - Add 10 logging points
+- [ ] src/query-builder.js - Add 5 logging points
+
+**Effort**: 2-3 days | **Impact**: Better debugging
+
+### Priority 3: NICE-TO-HAVE (Polish)
+
+- [ ] src/index-strategy.js - Add 5 logging points
+- [ ] src/validators.js - Add 5 logging points
+- [ ] src/schema-leader-election.js - Add 10 logging points
+
+**Effort**: 1 day | **Impact**: Complete coverage
+
+---
+
+## Code Examples for Migration
+
+### Pattern 1: Simple Info Logging
+
+```javascript
+// BEFORE (console)
+console.log(`Dataset ${this.datasetId} created`);
+
+// AFTER (logger + optional console)
+if (process.env.LOGGING_MODE === 'cli') {
+	console.log(`Dataset ${this.datasetId} created`);
+}
+logger.info(`[MaritimeBigQueryClient] Dataset ${this.datasetId} created`);
+```
+
+### Pattern 2: Error Logging
+
+```javascript
+// BEFORE (console)
+console.error('Error loading data:', error);
+
+// AFTER
+logger.error(`[MaritimeDataSynthesizer] Error loading data: ${error.message}`, error);
+```
+
+### Pattern 3: Progress Tracking
+
+```javascript
+// BEFORE (console)
+console.log(`Loaded ${recordsInserted} records in ${totalTime} minutes`);
+
+// AFTER
+logger.info(`[MaritimeDataSynthesizer] Loaded ${recordsInserted} records in ${totalTime} minutes`);
+```
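+
+Patterns 1-3 repeat the same dual-output structure, so a small shared helper could keep the migration consistent. The helper name and signature below are a suggestion, not existing code:
+
+```javascript
+// Writes CLI feedback when running standalone, and always writes to
+// Harper's logger so the output is visible in Grafana.
+function dualLog(component, message, level = 'info') {
+	if (process.env.LOGGING_MODE === 'cli') {
+		(level === 'error' ? console.error : console.log)(message);
+	}
+	logger[level](`[${component}] ${message}`);
+}
+
+// Usage inside synthesizer methods:
+// dualLog('MaritimeBigQueryClient', `Dataset ${this.datasetId} created`);
+// dualLog('MaritimeDataSynthesizer', `Error loading data: ${err.message}`, 'error');
+```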
+
+### Pattern 4: Missing Component Logging
+
+```javascript
+// NEW: Add to schema-manager.js
+logger.info('[SchemaManager] Ensuring table exists...');
+logger.debug(`[SchemaManager] Checking if Harper table '${tableName}' exists`);
+logger.info('[SchemaManager] Building BigQuery schema...');
+const migration = this.determineMigrationNeeds(harperSchema, bigQuerySchema);
+if (migration.action === 'create') {
+	logger.info(`[SchemaManager] Creating new table with ${Object.keys(migration.attributesToAdd).length} attributes`);
+} else if (migration.action === 'migrate') {
+	logger.info(`[SchemaManager] Migrating table - adding ${Object.keys(migration.attributesToAdd).length} attributes`);
+}
+```
+
+---
+
+## Verification Checklist
+
+After migration:
+
+- [ ] All 70 console calls converted to logger calls
+- [ ] Dual logging in place (console + logger where appropriate)
+- [ ] schema-manager.js has 20+ logging points
+- [ ] No console.log/error calls remain in production code
+- [ ] All logger calls follow the `[ClassName.method]` bracket pattern
+- [ ] Log levels appropriate (info/debug/warn/error)
+- [ ] Existing logger calls remain unchanged
+- [ ] Tests pass with new logging
+- [ ] Can create Grafana filters on bracketed names
+
+---
+
+## Next Steps for Issue #11
+
+Once the logging migration is complete:
+
+1. **Extract Logger Messages**: Parse logs to create metrics
+2. **Build Grafana Dashboards**:
+   - Sync health dashboard (phase, lag, error rate)
+   - Throughput dashboard (records/sec per table)
+   - Error dashboard (by type, by table)
+   - Performance dashboard (query times, retry counts)
+3. **Create Alert Rules**:
+   - High error rate
+   - Lag exceeding threshold
+   - Phase stuck in initial
+4. **Document Observability**:
+   - Grafana dashboard JSON files
+   - Alert configuration
+   - Log query examples
+   - Troubleshooting guide
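+
+As a starting point, the alert rules in step 3 could key off the bracketed log names. These are illustrative pseudo-queries in the same style as the dashboard examples in GLOBALS_LOGGING_RESEARCH.md - the real syntax depends on the aggregator (Loki, DataDog, etc.) chosen:
+
+```text
+-- Alert: high error rate (e.g. more than 10 SyncEngine errors per minute)
+| filter [SyncEngine] and level="ERROR"
+| stats count by bin(1m)
+| alert when count > 10
+
+-- Alert: checkpoint lag exceeding threshold
+| filter "Current lag"
+| extract lag:number
+| alert when avg(lag) > 300
+
+-- Alert: phase stuck in initial (no transition logged for 30 minutes)
+| filter "Phase transition"
+| alert when absent for 30m
+```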
diff --git a/src/config-loader.js b/src/config-loader.js
index 6ae512b..f98de78 100644
--- a/src/config-loader.js
+++ b/src/config-loader.js
@@ -21,36 +21,47 @@ const __dirname = dirname(__filename);
 export function loadConfig(configPath = null) {
 	try {
 		let config;
+		let source;
 
 		// Handle different input types
 		if (configPath === null || configPath === undefined) {
 			// Default to config.yaml in project root
 			const path = join(__dirname, '..', 'config.yaml');
+			logger.debug(`[ConfigLoader.loadConfig] Loading config from default path: ${path}`);
 			const fileContent = readFileSync(path, 'utf8');
 			config = parse(fileContent);
+			source = path;
 		} else if (typeof configPath === 'string') {
 			// Path to config file
+			logger.debug(`[ConfigLoader.loadConfig] Loading config from: ${configPath}`);
 			const fileContent = readFileSync(configPath, 'utf8');
 			config = parse(fileContent);
+			source = configPath;
 		} else if (typeof configPath === 'object') {
 			// Config object passed directly (for testing)
+			logger.debug('[ConfigLoader.loadConfig] Using config object passed directly');
 			// Check if it's an options object with 'config' property
 			if (configPath.config) {
 				config = configPath.config;
 			} else {
 				config = configPath;
 			}
+			source = 'object';
 		} else {
 			throw new Error('configPath must be a string, object, or null');
 		}
 
 		if (!config) {
+			logger.error('[ConfigLoader.loadConfig] Failed to parse configuration');
 			throw new Error('Failed to parse configuration');
 		}
 
+		logger.info(`[ConfigLoader.loadConfig] Successfully loaded config from: ${source}`);
+
 		// Normalize to multi-table format if needed
 		return normalizeConfig(config);
 	} catch (error) {
+		logger.error(`[ConfigLoader.loadConfig] Configuration loading failed: ${error.message}`);
 		throw new Error(`Failed to load configuration: ${error.message}`);
 	}
 }
@@ -64,17 +75,22 @@ export function loadConfig(configPath = null) {
  */
 function normalizeConfig(config) {
 	if (!config.bigquery) {
+		logger.error('[ConfigLoader.normalizeConfig] bigquery section missing in configuration');
 		throw new Error('bigquery section missing in configuration');
 	}
 
 	// Check if already in multi-table format
 	if (config.bigquery.tables && Array.isArray(config.bigquery.tables)) {
+		logger.info(
+			`[ConfigLoader.normalizeConfig] Config already in multi-table format with ${config.bigquery.tables.length} tables`
+		);
 		// Validate multi-table configuration
 		validateMultiTableConfig(config);
 		return config;
 	}
 
 	// Legacy single-table format - wrap in tables array
+	logger.info('[ConfigLoader.normalizeConfig] Converting legacy single-table config to multi-table format');
 	const legacyBigQueryConfig = config.bigquery;
 
 	// Extract table-specific config
@@ -92,6 +108,10 @@ function normalizeConfig(config) {
 		},
 	};
 
+	logger.debug(
+		`[ConfigLoader.normalizeConfig] Created table config: ${tableConfig.dataset}.${tableConfig.table} -> ${tableConfig.targetTable}`
+	);
+
 	// Create normalized multi-table config
 	const normalizedConfig = {
 		operations: config.operations, // Preserve operations config if present
@@ -108,6 +128,7 @@ function normalizeConfig(config) {
 		},
 	};
 
+	logger.info('[ConfigLoader.normalizeConfig] Successfully normalized config to multi-table format');
 	return normalizedConfig;
 }
 
@@ -118,11 +139,15 @@ function normalizeConfig(config) {
  * @private
 */
 function validateMultiTableConfig(config) {
+	logger.debug('[ConfigLoader.validateMultiTableConfig] Validating multi-table configuration');
+
 	if (!config.bigquery.tables || !Array.isArray(config.bigquery.tables)) {
+		logger.error('[ConfigLoader.validateMultiTableConfig] bigquery.tables must be an array');
 		throw new Error('bigquery.tables must be an array');
 	}
 
 	if (config.bigquery.tables.length === 0) {
+		logger.error('[ConfigLoader.validateMultiTableConfig] bigquery.tables array cannot be empty');
 		throw new Error('bigquery.tables array cannot be empty');
 	}
 
@@ -132,29 +157,38 @@ function validateMultiTableConfig(config) {
 	for (const table of config.bigquery.tables) {
 		// Check required fields
 		if (!table.id) {
+			logger.error('[ConfigLoader.validateMultiTableConfig] Missing required field: table.id');
 			throw new Error('Missing required field: table.id');
 		}
 		if (!table.dataset) {
+			logger.error(`[ConfigLoader.validateMultiTableConfig] Missing 'dataset' for table: ${table.id}`);
 			throw new Error(`Missing required field 'dataset' for table: ${table.id}`);
 		}
 		if (!table.table) {
+			logger.error(`[ConfigLoader.validateMultiTableConfig] Missing 'table' for table: ${table.id}`);
 			throw new Error(`Missing required field 'table' for table: ${table.id}`);
 		}
 		if (!table.timestampColumn) {
+			logger.error(`[ConfigLoader.validateMultiTableConfig] Missing 'timestampColumn' for table: ${table.id}`);
 			throw new Error(`Missing required field 'timestampColumn' for table: ${table.id}`);
 		}
 		if (!table.targetTable) {
+			logger.error(`[ConfigLoader.validateMultiTableConfig] Missing 'targetTable' for table: ${table.id}`);
 			throw new Error(`Missing required field 'targetTable' for table: ${table.id}`);
 		}
 
 		// Check for duplicate IDs
 		if (tableIds.has(table.id)) {
+			logger.error(`[ConfigLoader.validateMultiTableConfig] Duplicate table ID: ${table.id}`);
 			throw new Error(`Duplicate table ID: ${table.id}`);
 		}
 		tableIds.add(table.id);
 
 		// Check for duplicate target Harper tables
 		if (targetTables.has(table.targetTable)) {
+			logger.error(
+				`[ConfigLoader.validateMultiTableConfig] Duplicate targetTable '${table.targetTable}' for: ${table.id}`
+			);
 			throw new Error(
 				`Duplicate targetTable '${table.targetTable}' for table: ${table.id}. ` +
 					`Each BigQuery table must sync to a DIFFERENT Harper table. ` +
@@ -164,7 +198,15 @@ function validateMultiTableConfig(config) {
 			);
 		}
 		targetTables.add(table.targetTable);
+
+		logger.debug(
+			`[ConfigLoader.validateMultiTableConfig] Validated table: ${table.id} (${table.dataset}.${table.table} -> ${table.targetTable})`
+		);
 	}
+
+	logger.info(
+		`[ConfigLoader.validateMultiTableConfig] Successfully validated ${config.bigquery.tables.length} table configurations`
+	);
 }
 
 /**
diff --git a/src/globals.js b/src/globals.js
index 2cd748b..4ae79d8 100644
--- a/src/globals.js
+++ b/src/globals.js
@@ -7,10 +7,19 @@ class Globals {
 		Globals.instance = this;
 	}
 	set(key, value) {
+		// Log the type only - stored values include engine instances that are
+		// large and may contain circular references, so serializing them is unsafe.
+		logger.debug(`[Globals.set] Setting '${key}' (${typeof value})`);
 		this.data[key] = value;
 	}
 	get(key) {
-		return this.data[key];
+		const value = this.data[key];
+		if (value === undefined) {
+			logger.debug(`[Globals.get] Key '${key}' not found`);
+		} else {
+			logger.debug(`[Globals.get] Retrieved '${key}' (${typeof value})`);
+		}
+		return value;
 	}
 }
diff --git a/src/query-builder.js b/src/query-builder.js
index 2914876..9714589 100644
--- a/src/query-builder.js
+++ b/src/query-builder.js
@@ -13,20 +13,25 @@
  */
 export function formatColumnList(columns) {
 	if (!Array.isArray(columns)) {
+		logger.error('[formatColumnList] Invalid input: columns must be an array');
 		throw new Error('columns must be an array');
 	}
 
 	if (columns.length === 0) {
+		logger.error('[formatColumnList] Invalid input: columns array cannot be empty');
 		throw new Error('columns array cannot be empty');
 	}
 
 	// Special case: ['*'] means SELECT *
 	if (columns.length === 1 && columns[0] === '*') {
+		logger.debug('[formatColumnList] Using wildcard SELECT *');
 		return '*';
 	}
 
 	// Format as comma-separated list with proper spacing
-	return columns.join(', ');
+	const formatted = columns.join(', ');
+	logger.debug(`[formatColumnList] Formatted ${columns.length} columns: ${formatted}`);
+	return formatted;
 }
 
 /**
@@ -41,16 +46,24 @@ export function formatColumnList(columns) {
  */
 export function buildPullPartitionQuery({ dataset, table, timestampColumn, columns }) {
 	if (!dataset || !table || !timestampColumn) {
+		logger.error(
+			'[buildPullPartitionQuery] Missing required parameters: dataset, table, and timestampColumn are required'
+		);
 		throw new Error('dataset, table, and timestampColumn are required');
 	}
 
 	if (!columns || !Array.isArray(columns)) {
+		logger.error('[buildPullPartitionQuery] Invalid columns parameter: must be a non-empty array');
 		throw new Error('columns must be a non-empty array');
 	}
 
+	logger.info(
+		`[buildPullPartitionQuery] Building pull query for ${dataset}.${table} with ${columns.length === 1 && columns[0] === '*' ? 'all columns' : `${columns.length} columns`}`
+	);
+
 	const columnList = formatColumnList(columns);
 
-	return `
+	const query = `
     SELECT ${columnList}
     FROM \`${dataset}.${table}\`
    WHERE
@@ -64,6 +77,9 @@
     ORDER BY ${timestampColumn} ASC
     LIMIT CAST(@batchSize AS INT64)
   `;
+
+	logger.debug('[buildPullPartitionQuery] Query construction complete');
+	return query;
 }
 
 /**
@@ -126,6 +142,7 @@
 	constructor({ dataset, table, timestampColumn, columns = ['*'] }) {
 		if (!dataset || !table || !timestampColumn) {
+			logger.error('[QueryBuilder] Missing required parameters: dataset, table, and timestampColumn are required');
 			throw new Error('dataset, table, and timestampColumn are required');
 		}
 
@@ -133,6 +150,10 @@
 		this.table = table;
 		this.timestampColumn = timestampColumn;
 		this.columns = columns;
+
+		logger.info(
+			`[QueryBuilder] Initialized for ${dataset}.${table} with timestamp column '${timestampColumn}' and ${columns.length === 1 && columns[0] === '*' ? 'all columns' : `${columns.length} columns`}`
+		);
 	}
 
 /**
diff --git a/src/schema-manager.js b/src/schema-manager.js
index b131f2b..a318835 100644
--- a/src/schema-manager.js
+++ b/src/schema-manager.js
@@ -33,6 +33,9 @@ export class SchemaManager {
 			timestampColumn: options.config.bigquery.timestampColumn,
 		});
 		this.operationsClient = new OperationsClient(options.config);
+
+		logger.info('[SchemaManager] Initialized with BigQuery client and operations client');
+		logger.debug(`[SchemaManager] Timestamp column configured: ${options.config.bigquery.timestampColumn}`);
 	}
 
 	/**
@@ -52,11 +55,17 @@ export class SchemaManager {
 	 * @returns {Object} Migration plan
 	 */
 	determineMigrationNeeds(harperSchema, bigQuerySchema) {
+		logger.debug('[SchemaManager.determineMigrationNeeds] Analyzing schema differences');
+
 		// Build target attributes from BigQuery schema
 		const targetAttributes = this.typeMapper.buildTableAttributes(bigQuerySchema);
+		logger.debug(
+			`[SchemaManager.determineMigrationNeeds] Target schema has ${Object.keys(targetAttributes).length} attributes`
+		);
 
 		// If table doesn't exist, create it
 		if (!harperSchema) {
+			logger.info('[SchemaManager.determineMigrationNeeds] Table does not exist - will create');
 			return {
 				action: 'create',
 				attributesToAdd: targetAttributes,
@@ -66,10 +75,14 @@ export class SchemaManager {
 		// Find attributes that need to be added
 		const attributesToAdd = {};
 		const existingAttrs = harperSchema.attributes || {};
+		logger.debug(
+			`[SchemaManager.determineMigrationNeeds] Existing schema has ${Object.keys(existingAttrs).length} attributes`
+		);
 
 		for (const [name, targetAttr] of Object.entries(targetAttributes)) {
 			if (!existingAttrs[name]) {
 				// New attribute
+				logger.debug(`[SchemaManager.determineMigrationNeeds] New attribute detected: ${name} (${targetAttr.type})`);
 				attributesToAdd[name] = targetAttr;
 			} else {
 				// Check for type changes
@@ -77,6 +90,9 @@ export class SchemaManager {
 				if (!this.compareTypes(existingAttr.type, targetAttr.type)) {
 					// Type changed - create versioned column
 					const versionedName = `${name}_v2`;
+					logger.warn(
+						`[SchemaManager.determineMigrationNeeds] Type conflict on '${name}': ${existingAttr.type} -> ${targetAttr.type}, creating versioned column ${versionedName}`
+					);
 					attributesToAdd[versionedName] = targetAttr;
 				}
 			}
@@ -84,12 +100,16 @@ export class SchemaManager {
 
 		// Determine action
 		if (Object.keys(attributesToAdd).length === 0) {
+			logger.info('[SchemaManager.determineMigrationNeeds] No schema changes needed');
 			return {
 				action: 'none',
 				attributesToAdd: {},
 			};
 		}
 
+		logger.info(
+			`[SchemaManager.determineMigrationNeeds] Migration needed - adding ${Object.keys(attributesToAdd).length} attributes`
+		);
 		return {
 			action: 'migrate',
 			attributesToAdd,
@@ -109,36 +129,54 @@ export class SchemaManager {
 	 * will be stored and indexed automatically without pre-definition.
 	 */
 	async ensureTable(harperTableName, bigQueryDataset, bigQueryTable, _timestampColumn) {
-		// 1. Check if Harper table exists
-		const harperSchema = await this.operationsClient.describeTable(harperTableName);
+		logger.info(
+			`[SchemaManager.ensureTable] Ensuring table '${harperTableName}' for BigQuery ${bigQueryDataset}.${bigQueryTable}`
+		);
+
+		try {
+			// 1. Check if Harper table exists
+			logger.debug(`[SchemaManager.ensureTable] Checking if table '${harperTableName}' exists`);
+			const harperSchema = await this.operationsClient.describeTable(harperTableName);
+
+			if (harperSchema) {
+				// Table exists - Harper handles schema evolution automatically
+				logger.info(`[SchemaManager.ensureTable] Table '${harperTableName}' already exists - no action needed`);
+				return {
+					action: 'none',
+					table: harperTableName,
+					message: 'Table exists - Harper will handle any new fields automatically during insert',
+				};
+			}
 
-		if (harperSchema) {
-			// Table exists - Harper handles schema evolution automatically
-			return {
-				action: 'none',
-				table: harperTableName,
-				message: 'Table exists - Harper will handle any new fields automatically during insert',
-			};
-		}
+			// 2. Get BigQuery schema for documentation
+			logger.debug(`[SchemaManager.ensureTable] Fetching BigQuery schema from ${bigQueryDataset}.${bigQueryTable}`);
+			const bqTable = this.bigQueryClient.client.dataset(bigQueryDataset).table(bigQueryTable);
+			const [metadata] = await bqTable.getMetadata();
+			const bigQuerySchema = metadata.schema;
 
-		// 2. Get BigQuery schema for documentation
-		const bqTable = this.bigQueryClient.client.dataset(bigQueryDataset).table(bigQueryTable);
-		const [metadata] = await bqTable.getMetadata();
-		const bigQuerySchema = metadata.schema;
+			// Build expected attributes for documentation
+			const expectedAttributes = this.typeMapper.buildTableAttributes(bigQuerySchema);
+			logger.debug(`[SchemaManager.ensureTable] BigQuery schema has ${Object.keys(expectedAttributes).length} fields`);
 
-		// Build expected attributes for documentation
-		const expectedAttributes = this.typeMapper.buildTableAttributes(bigQuerySchema);
+			// 3. Create table with minimal schema (just primary key)
+			// Harper will auto-index all fields inserted later
+			logger.info(`[SchemaManager.ensureTable] Creating table '${harperTableName}' with id as hash attribute`);
+			await this.operationsClient.createTable(harperTableName, 'id');
 
-		// 3. Create table with minimal schema (just primary key)
-		// Harper will auto-index all fields inserted later
-		await this.operationsClient.createTable(harperTableName, 'id');
+			logger.info(
+				`[SchemaManager.ensureTable] Successfully created table '${harperTableName}' - fields will be indexed on insert`
+			);
 
-		return {
-			action: 'created',
-			table: harperTableName,
-			hashAttribute: 'id',
-			expectedFields: Object.keys(expectedAttributes),
-			message: 'Table created - all BigQuery fields will be automatically indexed on insert',
-		};
+			return {
+				action: 'created',
+				table: harperTableName,
+				hashAttribute: 'id',
+				expectedFields: Object.keys(expectedAttributes),
+				message: 'Table created - all BigQuery fields will be automatically indexed on insert',
+			};
+		} catch (error) {
+			logger.error(`[SchemaManager.ensureTable] Failed to ensure table '${harperTableName}': ${error.message}`);
+			throw error;
+		}
 	}
 }
diff --git a/src/type-converter.js b/src/type-converter.js
index f29ae4f..19192b9 100644
--- a/src/type-converter.js
+++ b/src/type-converter.js
@@ -32,18 +32,25 @@ function looksLikeISODate(str) {
  * @returns {Date|*} Date object if conversion succeeds, original value otherwise
 */
 export function convertBigQueryTimestamp(value) {
+	logger.debug(`[convertBigQueryTimestamp] Converting BigQuery timestamp (constructor: ${value.constructor?.name})`);
+
 	// Try .value property (contains ISO string)
 	if (value.value) {
-		return new Date(value.value);
+		const date = new Date(value.value);
+		logger.debug(`[convertBigQueryTimestamp] Converted via .value property: ${date.toISOString()}`);
+		return date;
 	}
 
 	// Try .toJSON() method
 	if (typeof value.toJSON === 'function') {
 		const jsonValue = value.toJSON();
-		return new Date(jsonValue);
+		const date = new Date(jsonValue);
+		logger.debug(`[convertBigQueryTimestamp] Converted via .toJSON(): ${date.toISOString()}`);
+		return date;
 	}
 
 	// Unable to convert
+	logger.warn('[convertBigQueryTimestamp] Unable to convert timestamp, returning original value');
 	return value;
 }
 
@@ -54,8 +61,10 @@ export function convertBigQueryTimestamp(value) {
 */
 export function convertBigInt(value) {
 	if (value <= Number.MAX_SAFE_INTEGER && value >= Number.MIN_SAFE_INTEGER) {
+		logger.debug(`[convertBigInt] Converting BigInt ${value} to Number (within safe range)`);
 		return Number(value);
 	}
+	logger.warn(`[convertBigInt] BigInt ${value} exceeds safe integer range, converting to String`);
 	return value.toString();
 }
 
@@ -67,11 +76,13 @@ export function convertBigInt(value) {
 export function convertValue(value) {
 	// Handle null/undefined
 	if (value === null || value === undefined) {
+		logger.debug(`[convertValue] Value is ${value}, no conversion needed`);
 		return value;
 	}
 
 	// Handle BigInt
 	if (typeof value === 'bigint') {
+		logger.debug('[convertValue] Detected BigInt value, converting');
 		return convertBigInt(value);
 	}
 
@@ -79,11 +90,13 @@ export function convertValue(value) {
 	if (typeof value === 'object') {
 		// BigQuery timestamp types
 		if (isBigQueryTimestamp(value)) {
+			logger.debug('[convertValue] Detected BigQuery timestamp, converting');
 			return convertBigQueryTimestamp(value);
 		}
 
 		// Already a Date object
 		if (value instanceof Date) {
+			logger.debug('[convertValue] Value is already a Date object');
 			return value;
 		}
 
@@ -93,17 +106,21 @@ export function convertValue(value) {
 
 			// If it looks like an ISO date, convert to Date
 			if (looksLikeISODate(jsonValue)) {
+				logger.debug('[convertValue] Object.toJSON() returned ISO date string, converting to Date');
 				return new Date(jsonValue);
 			}
 
+			logger.debug('[convertValue] Object.toJSON() returned non-date value');
 			return jsonValue;
 		}
 
 		// Other objects - keep as-is
+		logger.debug('[convertValue] Object has no special handling, keeping as-is');
 		return value;
 	}
 
 	// Primitive types - keep as-is
+	logger.debug(`[convertValue] Primitive value (${typeof value}), no conversion needed`);
 	return value;
 }
 
@@ -115,15 +132,19 @@ export function convertValue(value) {
 */
 export function convertBigQueryTypes(record) {
 	if (!record || typeof record !== 'object') {
+		logger.error('[convertBigQueryTypes] Invalid input: record must be an object');
 		throw new Error('Record must be an object');
 	}
 
+	logger.debug(`[convertBigQueryTypes] Converting record with ${Object.keys(record).length} fields`);
+
 	const converted = {};
 
 	for (const [key, value] of Object.entries(record)) {
 		converted[key] = convertValue(value);
 	}
 
+	logger.debug('[convertBigQueryTypes] Record conversion complete');
 	return converted;
 }
 
@@ -134,10 +155,16 @@ export function convertBigQueryTypes(record) {
 */
 export function convertBigQueryRecords(records) {
 	if (!Array.isArray(records)) {
+		logger.error('[convertBigQueryRecords] Invalid input: records must be an array');
 		throw new Error('Records must be an array');
 	}
 
-	return records.map((record) => convertBigQueryTypes(record));
+	logger.info(`[convertBigQueryRecords] Converting ${records.length} records`);
+
+	const converted = records.map((record) => convertBigQueryTypes(record));
+
+	logger.info('[convertBigQueryRecords] Batch conversion complete');
+	return converted;
 }
 
 export default {
diff --git a/src/type-mapper.js b/src/type-mapper.js
index ec8f078..55c5de4 100644
--- a/src/type-mapper.js
+++ b/src/type-mapper.js
@@ -43,7 +43,15 @@ export class TypeMapper {
 		};
 
 		const normalized = bigQueryType.toUpperCase();
-		return typeMap[normalized] || 'String';
+		const harperType = typeMap[normalized];
+
+		if (harperType) {
+			logger.debug(`[TypeMapper.mapScalarType] Mapped ${bigQueryType} -> ${harperType}`);
+		} else {
+			logger.warn(`[TypeMapper.mapScalarType] Unsupported type '${bigQueryType}', defaulting to String`);
+		}
+
+		return harperType || 'String';
 	}
 
 	/**
@@ -55,15 +63,25 @@ export class TypeMapper {
 	 * @returns {Object} Harper field definition
 	 */
 	mapField(field) {
+		logger.debug(
+			`[TypeMapper.mapField] Mapping field '${field.name}' (type: ${field.type}, mode: ${field.mode || 'NULLABLE'})`
+		);
+
 		const mode = field.mode || 'NULLABLE';
 		const harperType = this.mapScalarType(field.type);
 
-		return {
+		const result = {
 			name: field.name,
 			type: harperType,
 			required: mode === 'REQUIRED',
 			isArray: mode === 'REPEATED',
 		};
+
+		logger.debug(
+			`[TypeMapper.mapField] Field '${field.name}' mapped to Harper type '${harperType}'${result.isArray ? '[]' : ''}, required: ${result.required}`
+		);
+
+		return result;
 	}
 
 	/**
@@ -73,6 +91,10 @@ export class TypeMapper {
 	 * @returns {Object} Harper attributes object for Operations API
 	 */
 	buildTableAttributes(schema) {
+		logger.info(
+			`[TypeMapper.buildTableAttributes] Building table attributes from ${schema.fields.length} BigQuery fields`
+		);
+
 		const attributes = {};
 
 		for (const field of schema.fields) {
@@ -83,8 +105,14 @@ export class TypeMapper {
 				type,
 				required: mapped.required,
 			};
+
+			logger.debug(
+				`[TypeMapper.buildTableAttributes] Added attribute '${mapped.name}': type=${type}, required=${mapped.required}`
+			);
 		}
 
+		logger.info(`[TypeMapper.buildTableAttributes] Built ${Object.keys(attributes).length} Harper attributes`);
+
 		return attributes;
 	}
 }
diff --git a/test/config-loader.test.js b/test/config-loader.test.js
index 3cc3f8b..39a04c0 100644
--- a/test/config-loader.test.js
+++ b/test/config-loader.test.js
@@ -2,11 +2,29 @@
  * Tests for config-loader.js
 */
 
-import { describe, it } from 'node:test';
+import { describe, it, before, after } from 'node:test';
 import assert from 'node:assert';
 import { getSynthesizerConfig, getPluginConfig } from '../src/config-loader.js';
 
+// Mock logger global that Harper provides at runtime
+const mockLogger = {
+	info: () => {},
+	debug: () => {},
+	trace: () => {},
+	warn: () => {},
+	error: () => {},
+};
+
 describe('Config Loader', () => {
+	before(() => {
+		// Set up global logger mock
+		global.logger = mockLogger;
+	});
+
+	after(() => {
+		// Clean up global logger mock
+		delete global.logger;
+	});
 	describe('getSynthesizerConfig', () => {
 		it('should use bigquery config as defaults', () => {
 			const mockConfig = {
diff --git a/test/query-builder.test.js b/test/query-builder.test.js
index 797f609..07ec1b0 100644
--- a/test/query-builder.test.js
+++ b/test/query-builder.test.js
@@ -2,7 +2,7 @@
  * Tests for query-builder.js
 */
 
-import { describe, it } from 'node:test';
+import { describe, it, before, after } from 'node:test';
 import assert from 'node:assert';
 import {
 	formatColumnList,
@@ -12,7 +12,26 @@ import {
 	QueryBuilder,
 } from '../src/query-builder.js';
 
+// Mock logger global that Harper provides at runtime
+const mockLogger = {
+	info: () => {},
+	debug: () => {},
+	trace: () => {},
+	warn: () => {},
+	error: () => {},
+};
+
 describe('Query Builder', () => {
+	before(() => {
+		// Set up global logger mock
+		global.logger = mockLogger;
+	});
+
+	after(() => {
+		// Clean up global logger mock
+		delete global.logger;
+	});
+
 	describe('formatColumnList', () => {
 		it('should format single wildcard as *', () => {
 			const result = formatColumnList(['*']);
diff --git a/test/schema-manager.test.js b/test/schema-manager.test.js
index 11632d2..b80bbde 100644
--- a/test/schema-manager.test.js
+++ b/test/schema-manager.test.js
@@ -4,11 +4,30 @@
  * Tests schema management and table creation logic
 */
 
-import { describe, it } from 'node:test';
+import { describe, it, before, after } from 'node:test';
 import assert from 'node:assert';
 import { SchemaManager } from '../src/schema-manager.js';
 
+// Mock logger global that Harper provides at runtime
+const mockLogger = {
+	info: () => {},
+	debug: () => {},
+	trace: () => {},
+	warn: () => {},
+	error: () => {},
+};
+
 describe('SchemaManager', () => {
+	before(() => {
+		// Set up global logger mock
+		global.logger = mockLogger;
+	});
+
+	after(() => {
+		// Clean up global logger mock
+		delete global.logger;
+	});
+
 	describe('constructor', () => {
 		it('should initialize with required dependencies', () => {
			const mockBigQueryClient = {};
diff --git a/test/type-converter.test.js b/test/type-converter.test.js
index dd1d889..16397e6 100644
--- a/test/type-converter.test.js
+++ b/test/type-converter.test.js
@@ -2,7 +2,7 @@
  * Tests for type-converter.js
 */
 
-import { describe, it } from 'node:test';
+import { describe, it, before, after } from 'node:test';
 import assert from 'node:assert';
 import {
 	convertBigInt,
@@ -12,7 +12,26 @@ import {
 	convertBigQueryRecords,
 } from '../src/type-converter.js';
 
+// Mock logger global that Harper provides at runtime
+const mockLogger = {
+	info: () => {},
+	debug: () => {},
+	trace: () => {},
+	warn: () => {},
+	error: () => {},
+};
+
 describe('Type Converter', () => {
+	before(() => {
+		// Set up global logger mock
+		global.logger = mockLogger;
+	});
+
+	after(() => {
+		// Clean up global logger mock
+		delete global.logger;
+	});
+
 	describe('convertBigInt', () => {
 		it('should convert small BigInt to Number', () => {
 			const result = convertBigInt(BigInt(12345));
diff --git a/test/type-mapper.test.js b/test/type-mapper.test.js
index dc7b83d..53d920c 100644
--- a/test/type-mapper.test.js
+++ b/test/type-mapper.test.js
@@ -4,11 +4,30 @@
  * Tests BigQuery to Harper type mapping
 */
 
-import { describe, it } from 'node:test';
+import { describe, it, before, after } from 'node:test';
 import assert from 'node:assert';
 import { TypeMapper } from '../src/type-mapper.js';
 
+// Mock logger global that Harper provides at runtime
+const mockLogger = {
+	info: () => {},
+	debug: () => {},
+	trace: () => {},
+	warn: () => {},
+	error: () => {},
+};
+
 describe('TypeMapper', () => {
+	before(() => {
+		// Set up global logger mock
+		global.logger = mockLogger;
+	});
+
+	after(() => {
+		// Clean up global logger mock
+		delete global.logger;
+	});
+
 	describe('mapScalarType', () => {
 		it('should map INTEGER to Int', () => {
 			const mapper = new TypeMapper();