Merged
3 changes: 2 additions & 1 deletion .gitignore
@@ -33,5 +33,6 @@ service-account-key.json
test-*.js
!test/**/*.test.js

# Internal docs (not for publication)
# Historical development artifacts (not for publication)
docs/internal/
docs/plans/
38 changes: 37 additions & 1 deletion CHANGELOG.md
@@ -9,6 +9,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- None yet

### Changed

- None yet

### Fixed

- None yet

## [2.0.0] - 2025-12-15

### Added

- **Multi-table support** - Sync multiple BigQuery tables simultaneously with independent settings
- **Column selection** - Reduce costs by fetching only needed columns from BigQuery
- **Per-table configuration** - Independent batch sizes, sync intervals, and strategies per table
- **Exponential backoff retry logic** - Smart retry with jitter for transient BigQuery errors
- **Comprehensive logging** - Structured logging throughout codebase for Grafana observability
- **Optional streaming insert API** - Configurable streaming inserts for production deployments
- **Multi-table validation** - Independent validation and monitoring per table
- **Multi-table maritime synthesizer** - Generate realistic data for multiple related tables
- Rolling window mode for automatic data window maintenance
- `clear` command to truncate table without deleting schema
- Automatic backfill on service start
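
The "exponential backoff retry logic ... with jitter" entry above can be illustrated with a minimal sketch. This is not the code in `src/bigquery-client.js`; the names `backoffDelay` and `withRetry`, the default delays, and the `isTransient` predicate are all assumptions made for illustration, and the "full jitter" variant shown here is only one common way to add jitter.

```javascript
// Hypothetical helpers for illustration; the plugin's real retry code may differ.

// Delay before retry number `attempt` (0-based), using "full jitter":
// a random value between 0 and min(maxMs, baseMs * 2^attempt).
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `operation` up to `maxAttempts` times, sleeping between failed attempts.
// `isTransient` decides whether an error is worth retrying (assumed predicate).
async function withRetry(
  operation,
  { maxAttempts = 5, isTransient = () => true, sleep = defaultSleep, ...delayOptions } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const isLastAttempt = attempt >= maxAttempts - 1;
      if (isLastAttempt || !isTransient(err)) throw err;
      await sleep(backoffDelay(attempt, delayOptions));
    }
  }
}
```

Randomizing the delay spreads retries from concurrent table syncs over time, so they do not all hit BigQuery again at the same moment after a shared failure.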
@@ -30,15 +52,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

- `start` command now auto-backfills by default (rolling window mode)
- Documentation reorganized into logical sections
- Test files moved to examples/ directory
- Improved error messages and retry handling
- Enhanced BigQuery client with smart retry logic
- Better organization of codebase (src/ vs tools/)
- Improved error messages and user feedback

### Fixed

- Memory leak in journey tracking system
- Checkpoint timestamp handling edge cases
- Prettier formatting for markdown documentation
- Configuration loading from config.yaml
- BigQuery credential handling
- Service account key path resolution

### Documentation

- Added streaming insert API design document
- Enhanced logging analysis and research
- Project history documentation
- Multi-table configuration examples
- "Why Maritime Data?" rationale section
- Backward compatibility maintained with single-table format
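
One way backward compatibility with the single-table format could work is to normalize both formats into a list of per-table settings. The sketch below is an assumption, not the logic in `src/config-loader.js`: the function name `normalizeTables` and the keys `dataset`, `table`, `tables`, and `batchSize` are hypothetical, and the real `config.yaml` keys may differ.

```javascript
// Hypothetical config shapes for illustration; real key names may differ.

// Accepts either a legacy single-table config or a multi-table config with a
// `tables` array, and always returns an array of per-table settings.
function normalizeTables(bigqueryConfig) {
  const { tables, ...shared } = bigqueryConfig;

  if (Array.isArray(tables) && tables.length > 0) {
    // Multi-table format: each entry inherits the shared settings
    // and may override them (for example, its own batch size).
    return tables.map((tableConfig) => ({ ...shared, ...tableConfig }));
  }

  // Legacy single-table format: wrap it in a one-element list.
  return [shared];
}
```

With this shape, downstream code only ever iterates over a list of tables, so an existing single-table configuration keeps working unchanged.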

## [1.0.0] - 2024-XX-XX

### Added
50 changes: 33 additions & 17 deletions CONTRIBUTING.md
@@ -126,13 +126,24 @@ generateBatch(count, timestampOffset = 0) {

## Testing

Currently, we have example scripts rather than formal tests. When adding functionality:
We have comprehensive unit tests for core functionality. When adding or modifying features:

1. Create an example in `examples/` directory
2. Document how to run it
3. Verify it works on clean install
1. Run the test suite: `npm test`
2. Add tests for new functionality in `test/` directory
3. Ensure all tests pass before submitting PR
4. Test coverage includes:
   - BigQuery client with retry logic
   - Config loader for single and multi-table formats
   - Sync engine with phase calculation
   - Maritime data generators
   - Multi-table orchestrator

**Future**: We'll add proper unit and integration tests.
**Example test run:**

```bash
npm test
# All tests should pass (91 tests currently)
```

## Documentation

@@ -147,19 +158,19 @@ If your change affects user-facing features:

### High Priority

1. **Testing Infrastructure**
   - Unit tests for generator
   - Integration tests
   - CI/CD pipeline
1. **Production Operations**
   - Deployment documentation for Fabric and self-hosted
   - Monitoring dashboard templates (Grafana, CloudWatch)
   - Operational runbooks for common scenarios

2. **Error Handling**
   - Better error messages
   - Recovery strategies
   - Validation improvements
2. **Testing Expansion**
   - Integration tests for end-to-end sync validation
   - Performance benchmarks
   - Multi-node cluster testing

3. **Documentation**
   - More examples
   - Video tutorials
   - Video tutorials (setup, configuration, troubleshooting)
   - More real-world configuration examples
   - Architecture diagrams

### Medium Priority
@@ -194,9 +205,14 @@ The BigQuery plugin integrates with HarperDB. When modifying plugin code:

**Key Files**:

- `src/sync-engine.js` - Main sync engine logic
- `src/validation.js` - Data validation
- `src/index.js` - Plugin entry point and Harper integration
- `src/sync-engine.js` - Core sync engine with adaptive batch sizing
- `src/bigquery-client.js` - BigQuery API client with retry logic
- `src/validation.js` - Data validation and auditing
- `src/query-builder.js` - SQL query construction
- `src/config-loader.js` - Configuration parsing (single/multi-table)
- `schema/harper-bigquery-sync.graphql` - GraphQL schema
- `test/` - Unit tests for all core functionality
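
To give a feel for what "SQL query construction" with column selection might involve, here is a minimal sketch. It is not the contents of `src/query-builder.js`: the function `buildSelectQuery`, its option names, the `@lastTimestamp` and `@batchSize` parameter names, and the identifier rule are assumptions for illustration only.

```javascript
// Hypothetical sketch; the plugin's real query builder may look different.

// Conservative identifier rule so column and table names can be safely quoted.
const IDENTIFIER = /^[A-Za-z_][A-Za-z0-9_]*$/;

function quoteIdentifier(name) {
  if (!IDENTIFIER.test(name)) {
    throw new Error(`Invalid identifier: ${name}`);
  }
  return `\`${name}\``;
}

// Builds an incremental SELECT that fetches only the requested columns.
// An empty `columns` list falls back to SELECT *.
function buildSelectQuery({ dataset, table, columns = [], timestampColumn }) {
  // Always include the timestamp column so checkpointing can still work.
  const selected =
    columns.length === 0
      ? '*'
      : [...new Set([timestampColumn, ...columns])].map(quoteIdentifier).join(', ');

  return [
    `SELECT ${selected}`,
    `FROM ${quoteIdentifier(dataset)}.${quoteIdentifier(table)}`,
    `WHERE ${quoteIdentifier(timestampColumn)} > @lastTimestamp`,
    `ORDER BY ${quoteIdentifier(timestampColumn)}`,
    'LIMIT @batchSize',
  ].join('\n');
}
```

Values such as the last checkpoint and batch size are left as named query parameters rather than interpolated into the string, and only identifiers are validated and quoted. Selecting fewer columns can reduce cost because BigQuery bills on-demand queries by the bytes read from the referenced columns.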

## Synthesizer Development
