@irjudson

Summary

Implements #8 - adds opt-in streaming insert API alongside the existing load job API for BigQuery.

Features

  • New useStreamingAPIs config option - defaults to false for backward compatibility
  • Streaming Insert API - lower latency (sub-second to a few seconds), billed at $0.01 per 200 MB
  • Load Job API - free tier compatible, higher latency (remains the default)
  • Per-table configuration - can enable streaming for specific tables in multi-table setup
  • Comprehensive error handling - retry logic for transient errors (HTTP 429 and 5xx, including 503)
  • Full test coverage - 16 new unit tests covering all scenarios

Implementation Approach

Followed Test-Driven Development (TDD):

  1. RED: Wrote 16 unit tests (all failed)
  2. GREEN: Implemented _insertStreaming() and routing logic (all passed)
  3. REFACTOR: Clean separation between streaming and load job methods

Changes

Modified Files

  • src/bigquery.js - Added streaming insert support with dispatcher logic
  • config.yaml - Documentation and examples for both insert methods
  • test/bigquery-streaming.test.js - 16 comprehensive unit tests
  • docs/plans/2025-11-14-streaming-insert-api-design.md - Complete design document

Key Implementation Details

Configuration:

bigquery:
  tables:
    - id: vessel_positions
      useStreamingAPIs: false  # Default: batch/free tier
    
    - id: port_events
      useStreamingAPIs: true   # Real-time events

Dispatcher logic: insertBatch() routes each batch to the streaming or load job method based on the table's useStreamingAPIs setting
Retry logic: Exponential backoff for transient errors (1s, 2s, 4s)
Error handling: Partial failure detection and detailed logging
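
The routing and retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in src/bigquery.js: only insertBatch(), _insertStreaming() and useStreamingAPIs come from this PR. The class name, _insertLoadJob(), the injected insert functions, the retry cap of three, and the assumption that errors carry a numeric `code` are all hypothetical.

```javascript
// Hypothetical sketch: names other than insertBatch, _insertStreaming and
// useStreamingAPIs are assumptions, not taken from src/bigquery.js.

const BACKOFF_BASE_MS = 1000; // delays of 1s, 2s, 4s
const MAX_RETRIES = 3;        // assumed cap: one initial attempt plus three retries

// Treat HTTP 429 and any 5xx status as transient (assumes err.code holds the status).
function isTransient(err) {
  const code = Number(err && err.code);
  return code === 429 || (code >= 500 && code <= 599);
}

// Delay before retry number `retry` (0-based): 1000, 2000, 4000 ms.
function backoffDelayMs(retry) {
  return BACKOFF_BASE_MS * 2 ** retry;
}

// Run fn, retrying transient failures with exponential backoff.
// `sleep` is injectable so tests do not have to wait for real timers.
async function withRetry(fn, sleep = (ms) => new Promise((r) => setTimeout(r, ms))) {
  for (let retry = 0; ; retry++) {
    try {
      return await fn();
    } catch (err) {
      if (!isTransient(err) || retry >= MAX_RETRIES) throw err;
      await sleep(backoffDelayMs(retry));
    }
  }
}

class BigQueryWriter {
  // streamingInsert and loadJobInsert stand in for the real BigQuery client calls.
  constructor(tableConfig, { streamingInsert, loadJobInsert, sleep } = {}) {
    this.useStreamingAPIs = tableConfig.useStreamingAPIs === true; // defaults to false
    this.streamingInsert = streamingInsert;
    this.loadJobInsert = loadJobInsert;
    this.sleep = sleep;
  }

  // Dispatcher: route the batch based on the per-table setting.
  async insertBatch(rows) {
    return this.useStreamingAPIs
      ? this._insertStreaming(rows)
      : this._insertLoadJob(rows);
  }

  async _insertStreaming(rows) {
    return withRetry(() => this.streamingInsert(rows), this.sleep);
  }

  async _insertLoadJob(rows) {
    return this.loadJobInsert(rows);
  }
}
```

Injecting the insert functions and the sleep helper keeps the dispatcher testable without a BigQuery client or real timers, which is one plausible way to support the unit tests described under Testing.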

Testing

All 107 tests passing:

  • 91 existing tests (unchanged)
  • 16 new streaming insert tests

Tests cover:

  • Configuration handling
  • Method selection/routing
  • Successful inserts
  • Error scenarios (partial failures, quota exceeded, service unavailable)
  • Retry behavior
  • Backward compatibility
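
As an illustration of the partial-failure scenario listed above, a test could feed a stubbed error into a small summarizing helper. The helper name and return shape are hypothetical. The error shape is assumed to match what table.insert() in @google-cloud/bigquery documents for a PartialFailureError, and it has not been checked against the code in this PR.

```javascript
// Hypothetical helper, not taken from src/bigquery.js.
// Assumed error shape: err.name === 'PartialFailureError' and
// err.errors = [{ row, errors: [{ reason, message }] }].
function summarizePartialFailure(err) {
  if (!err || err.name !== 'PartialFailureError') return null;
  const failed = (err.errors || []).map((entry) => ({
    row: entry.row,
    reasons: (entry.errors || []).map((e) => e.reason),
  }));
  return { failedCount: failed.length, failed };
}

// Stubbed error, standing in for what a mocked client might throw in a unit test.
const stubError = Object.assign(new Error('partial failure'), {
  name: 'PartialFailureError',
  errors: [{ row: { id: 2 }, errors: [{ reason: 'invalid', message: 'bad field' }] }],
});

const summary = summarizePartialFailure(stubError);
console.log(`${summary.failedCount} row(s) reported as failed`);
```

Returning null for any other error type lets the caller fall through to the generic retry or rethrow path.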

Migration Guide

Existing deployments: No changes required - defaults to load job API

Enable streaming for a table:

tables:
  - id: your_table
    useStreamingAPIs: true  # Add this line

Decision Guide

Use Load Job API (default):

  • Development and testing
  • Batch workloads
  • Cost-sensitive deployments
  • Free-tier projects

Use Streaming API:

  • Production real-time dashboards
  • Low-latency requirements
  • Time-sensitive event data

Cost Implications

  • Load Job API: Free
  • Streaming API: $0.01 per 200 MB (minimum $0.01/day)

Example: 144K records/day × 1 KB/record = 144 MB/day. That is under 200 MB, so the $0.01/day minimum applies: $0.01 × 30 days = $0.30/month.
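
The arithmetic behind that example can be reproduced with a small estimator. This is only a sketch: the rate and daily minimum are the figures quoted in this PR, not values checked against current BigQuery pricing, and the function name, decimal units, and 30-day month are assumptions.

```javascript
// Assumed constants, taken from the figures quoted in this PR description.
const USD_PER_BLOCK = 0.01;   // price per 200 MB streamed
const BLOCK_MB = 200;
const MIN_USD_PER_DAY = 0.01; // daily minimum stated above

// Hypothetical helper: estimates streaming cost for a steady daily volume.
// Uses decimal units (1 MB = 1000 KB), as the example does.
function estimateStreamingCostUsd(recordsPerDay, kbPerRecord, days = 30) {
  const mbPerDay = (recordsPerDay * kbPerRecord) / 1000;
  const meteredUsdPerDay = (mbPerDay / BLOCK_MB) * USD_PER_BLOCK;
  // The daily minimum applies whenever metered usage falls below it.
  const usdPerDay = Math.max(meteredUsdPerDay, MIN_USD_PER_DAY);
  return { mbPerDay, usdPerDay, usdPerPeriod: usdPerDay * days };
}

const estimate = estimateStreamingCostUsd(144000, 1);
console.log(estimate.mbPerDay, estimate.usdPerPeriod.toFixed(2)); // 144 '0.30'
```

At 144 MB/day the metered charge would be $0.0072/day, so the stated daily minimum is what produces the $0.30/month figure.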

Documentation

Complete design document included: docs/plans/2025-11-14-streaming-insert-api-design.md

@irjudson merged commit 09bf2cb into main on Nov 14, 2025
4 checks passed
@irjudson deleted the feature/streaming-insert-api branch on November 14, 2025 15:49