Commit 0753b37

Merge pull request #18 from HarperFast/docs/cleanup-and-restructure
Documentation cleanup and v2.0 production release
2 parents f392ef9 + 168ce73 commit 0753b37

11 files changed, 556 additions and 3864 deletions

.gitignore

Lines changed: 2 additions & 1 deletion
@@ -33,5 +33,6 @@ service-account-key.json
 test-*.js
 !test/**/*.test.js
 
-# Internal docs (not for publication)
+# Historical development artifacts (not for publication)
 docs/internal/
+docs/plans/

CHANGELOG.md

Lines changed: 37 additions & 1 deletion
@@ -9,6 +9,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 
+- None yet
+
+### Changed
+
+- None yet
+
+### Fixed
+
+- None yet
+
+## [2.0.0] - 2025-12-15
+
+### Added
+
+- **Multi-table support** - Sync multiple BigQuery tables simultaneously with independent settings
+- **Column selection** - Reduce costs by fetching only needed columns from BigQuery
+- **Per-table configuration** - Independent batch sizes, sync intervals, and strategies per table
+- **Exponential backoff retry logic** - Smart retry with jitter for transient BigQuery errors
+- **Comprehensive logging** - Structured logging throughout codebase for Grafana observability
+- **Optional streaming insert API** - Configurable streaming inserts for production deployments
+- **Multi-table validation** - Independent validation and monitoring per table
+- **Multi-table maritime synthesizer** - Generate realistic data for multiple related tables
 - Rolling window mode for automatic data window maintenance
 - `clear` command to truncate table without deleting schema
 - Automatic backfill on service start
@@ -30,15 +52,29 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 - `start` command now auto-backfills by default (rolling window mode)
 - Documentation reorganized into logical sections
-- Test files moved to examples/ directory
+- Improved error messages and retry handling
+- Enhanced BigQuery client with smart retry logic
+- Better organization of codebase (src/ vs tools/)
 - Improved error messages and user feedback
 
 ### Fixed
 
+- Memory leak in journey tracking system
+- Checkpoint timestamp handling edge cases
+- Prettier formatting for markdown documentation
 - Configuration loading from config.yaml
 - BigQuery credential handling
 - Service account key path resolution
 
+### Documentation
+
+- Added streaming insert API design document
+- Enhanced logging analysis and research
+- Project history documentation
+- Multi-table configuration examples
+- "Why Maritime Data?" rationale section
+- Backward compatibility maintained with single-table format
+
 ## [1.0.0] - 2024-XX-XX
 
 ### Added
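
Review note on the "Exponential backoff retry logic" entry: the changelog names the technique but this diff does not show the code. The sketch below is one common way to do it (capped exponential growth with "full jitter"), not the project's actual implementation. The names `backoffDelay` and `withRetry`, and every option (`baseMs`, `maxMs`, `retries`, `isTransient`, `sleep`, `random`), are hypothetical and chosen only for illustration.

```javascript
// Hypothetical helper, NOT taken from src/bigquery-client.js.
// Returns a randomized delay in milliseconds for a given attempt (0-based).
function backoffDelay(attempt, { baseMs = 100, maxMs = 10_000, random = Math.random } = {}) {
  // Exponential ceiling: baseMs * 2^attempt, capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: choose uniformly in [0, ceiling) so concurrent clients spread out.
  return Math.floor(random() * ceiling);
}

// Hypothetical wrapper: retries `fn` while errors look transient.
// `isTransient` and `sleep` are injectable assumptions so the sketch is testable.
async function withRetry(
  fn,
  {
    retries = 5,
    isTransient = () => true,
    sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    ...delayOptions
  } = {}
) {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      // Give up once the retry budget is spent or the error is not retryable.
      if (attempt >= retries || !isTransient(err)) throw err;
      await sleep(backoffDelay(attempt, delayOptions));
    }
  }
}
```

A caller would wrap a single BigQuery request in `withRetry` and supply an `isTransient` predicate that matches whichever errors the project treats as retryable; which errors those are is not stated in this commit.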

CONTRIBUTING.md

Lines changed: 33 additions & 17 deletions
@@ -126,13 +126,24 @@ generateBatch(count, timestampOffset = 0) {
 
 ## Testing
 
-Currently, we have example scripts rather than formal tests. When adding functionality:
+We have comprehensive unit tests for core functionality. When adding or modifying features:
 
-1. Create an example in `examples/` directory
-2. Document how to run it
-3. Verify it works on clean install
+1. Run the test suite: `npm test`
+2. Add tests for new functionality in `test/` directory
+3. Ensure all tests pass before submitting PR
+4. Test coverage includes:
+   - BigQuery client with retry logic
+   - Config loader for single and multi-table formats
+   - Sync engine with phase calculation
+   - Maritime data generators
+   - Multi-table orchestrator
 
-**Future**: We'll add proper unit and integration tests.
+**Example test run:**
+
+```bash
+npm test
+# All tests should pass (91 tests currently)
+```
 
 ## Documentation
 
@@ -147,19 +158,19 @@ If your change affects user-facing features:
 
 ### High Priority
 
-1. **Testing Infrastructure**
-   - Unit tests for generator
-   - Integration tests
-   - CI/CD pipeline
+1. **Production Operations**
+   - Deployment documentation for Fabric and self-hosted
+   - Monitoring dashboard templates (Grafana, CloudWatch)
+   - Operational runbooks for common scenarios
 
-2. **Error Handling**
-   - Better error messages
-   - Recovery strategies
-   - Validation improvements
+2. **Testing Expansion**
+   - Integration tests for end-to-end sync validation
+   - Performance benchmarks
+   - Multi-node cluster testing
 
 3. **Documentation**
-   - More examples
-   - Video tutorials
+   - Video tutorials (setup, configuration, troubleshooting)
+   - More real-world configuration examples
    - Architecture diagrams
 
 ### Medium Priority
@@ -194,9 +205,14 @@ The BigQuery plugin integrates with HarperDB. When modifying plugin code:
 
 **Key Files**:
 
-- `src/sync-engine.js` - Main sync engine logic
-- `src/validation.js` - Data validation
+- `src/index.js` - Plugin entry point and Harper integration
+- `src/sync-engine.js` - Core sync engine with adaptive batch sizing
+- `src/bigquery-client.js` - BigQuery API client with retry logic
+- `src/validation.js` - Data validation and auditing
+- `src/query-builder.js` - SQL query construction
+- `src/config-loader.js` - Configuration parsing (single/multi-table)
 - `schema/harper-bigquery-sync.graphql` - GraphQL schema
+- `test/` - Unit tests for all core functionality
 
 ## Synthesizer Development
 