
Refactor to make use of QuestDB instead of tick CSV files #1

@mccaffers

Description


Refactor ingest pipeline to use QuestDB instead of tick CSV files

The current ingest system processes tick data through CSV files using StreamReader instances to maintain synchronisation across multiple symbols. While this approach handles the basic requirements of time-ordered ingestion, it creates significant limitations for multi-timeframe analysis. The CSV-based storage requires loading entire files for aggregation operations and lacks the query capabilities needed for efficient time-series analysis across different periods.

QuestDB would replace this CSV storage layer with a purpose-built time-series database that maintains the same ingestion synchronisation while adding multi-timeframe capabilities. The database's columnar storage and built-in time-series functions would enable direct SQL queries for generating 1-minute, 5-minute, hourly, and daily aggregations from the same tick data source. This eliminates the current bottleneck where different timeframe analyses require separate processing pipelines or memory-intensive CSV manipulations.
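For illustration, assuming a hypothetical `ticks` table holding `timestamp`, `symbol`, `bid` and `ask` columns, a single QuestDB `SAMPLE BY` query collapses raw ticks into any bar size with a one-token change:

```sql
-- Average mid-price per 1-minute bucket; swapping 1m for 5m, 1h or 1d
-- yields every other timeframe from the same underlying tick data
SELECT timestamp, symbol, avg((bid + ask) / 2) AS mid
FROM ticks
WHERE symbol = 'EURUSD'
SAMPLE BY 1m;
```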

Implementation involves designing a schema that mirrors the current PriceObj structure, updating the ingest BufferBlock consumer to write to QuestDB instead of CSV files, and modifying the analysis modules to query the database. The migration would preserve the existing concurrent ingestion architecture while adding the query performance and multi-timeframe functionality needed for more sophisticated backtesting strategies.
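As a rough sketch of that schema, assuming PriceObj carries a timestamp, symbol and bid/ask quotes (the real field list may differ):

```sql
-- Hypothetical table mirroring PriceObj. The SYMBOL type interns the
-- instrument name, the designated timestamp enables SAMPLE BY queries,
-- and daily partitions keep ingestion append-friendly.
CREATE TABLE ticks (
    timestamp TIMESTAMP,
    symbol    SYMBOL,
    bid       DOUBLE,
    ask       DOUBLE
) TIMESTAMP(timestamp) PARTITION BY DAY;
```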

Implementation Areas

1. Direct QuestDB Ingestion Pipeline

Replace the entire StreamReader dictionary and BufferBlock architecture with a direct CSV-to-QuestDB import process. This removes the need for the complex time-synchronisation logic across multiple symbols since QuestDB can handle the ordering through SQL queries.
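A minimal sketch of that import, assuming QuestDB's REST `/imp` endpoint on the default port 9000 and a local `tickdata` directory of per-symbol CSVs (both illustrative). QuestDB sorts rows by the designated timestamp on ingestion, so files arriving out of order across symbols are fine:

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

// One-shot CSV import into a single 'ticks' table via QuestDB's
// HTTP /imp endpoint. Paths, table and column names are assumptions.
class CsvImporter
{
    static async Task Main()
    {
        using var http = new HttpClient { BaseAddress = new Uri("http://localhost:9000") };
        foreach (var path in Directory.EnumerateFiles("tickdata", "*.csv"))
        {
            using var form = new MultipartFormDataContent();
            form.Add(new StreamContent(File.OpenRead(path)), "data", Path.GetFileName(path));
            // name=ticks appends each file into the same table;
            // timestamp=... designates the timestamp column
            var response = await http.PostAsync("/imp?name=ticks&timestamp=timestamp", form);
            response.EnsureSuccessStatusCode();
            Console.WriteLine($"Imported {path}");
        }
    }
}
```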

2. Query-Based Data Processing

Build a service layer that retrieves tick data from QuestDB using time-based queries instead of streaming from the buffer. The backtesting engine would query for specific time ranges and symbols on-demand rather than processing a continuous stream.
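One possible shape for that service layer, assuming QuestDB's PostgreSQL wire interface on its default port 8812 with Npgsql as the client (connection settings follow QuestDB's documented Npgsql example and may need adjusting per client version); the `PriceObj` fields here are placeholders for the real ones:

```csharp
using System;
using System.Collections.Generic;
using Npgsql;

// Field names on PriceObj are assumed for illustration
public record PriceObj(DateTime Date, string Symbol, double Bid, double Ask);

// Query-based replacement for the BufferBlock stream: callers ask for
// a symbol and time range instead of consuming a continuous feed
public class TickQueryService
{
    private const string ConnString =
        "Host=localhost;Port=8812;Username=admin;Password=quest;" +
        "Database=qdb;ServerCompatibilityMode=NoTypeLoading";

    public IEnumerable<PriceObj> GetTicks(string symbol, DateTime from, DateTime to)
    {
        using var conn = new NpgsqlConnection(ConnString);
        conn.Open();
        using var cmd = new NpgsqlCommand(
            "SELECT timestamp, symbol, bid, ask FROM ticks " +
            "WHERE symbol = @symbol AND timestamp BETWEEN @from AND @to " +
            "ORDER BY timestamp", conn);
        cmd.Parameters.AddWithValue("symbol", symbol);
        cmd.Parameters.AddWithValue("from", from);
        cmd.Parameters.AddWithValue("to", to);

        using var reader = cmd.ExecuteReader();
        while (reader.Read())
            yield return new PriceObj(
                reader.GetDateTime(0), reader.GetString(1),
                reader.GetDouble(2), reader.GetDouble(3));
    }
}
```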

3. Strategy Execution Model Redesign

Refactor the IStrategy interface and Consumer class since there's no longer a buffer to consume from. Strategies would now query QuestDB for the data they need (current tick, historical context, etc.) rather than receiving PriceObj events from the buffer.
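A hypothetical sketch of that pull-based contract, reusing the `TickQueryService` and `PriceObj` placeholders from the previous sketch; the real signature will depend on what existing strategies consume:

```csharp
using System;
using System.Collections.Generic;

// Strategies are driven tick-by-tick but pull any historical context
// they need from QuestDB instead of receiving buffered events
public interface IStrategy
{
    void OnTick(PriceObj tick, TickQueryService data);
}

public class MovingAverageCrossover : IStrategy
{
    public void OnTick(PriceObj tick, TickQueryService data)
    {
        // Pull an hour of context on demand (window purely illustrative)
        IEnumerable<PriceObj> history =
            data.GetTicks(tick.Symbol, tick.Date.AddHours(-1), tick.Date);
        // ... compute indicators and generate orders here ...
    }
}
```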

4. Time-Series Analysis Integration

Implement multi-timeframe queries that generate OHLC data and moving averages directly in QuestDB. This replaces any current in-memory calculations and leverages the database's time-series functions for better performance.
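For instance, hourly OHLC candles with a simple moving average could come straight from the database. This is a sketch against the table assumed earlier and presumes a QuestDB version with window-function support:

```sql
-- Hourly bid-price candles plus a 20-period moving average of the
-- close, computed entirely inside QuestDB
WITH candles AS (
    SELECT
        timestamp,
        first(bid) AS open,
        max(bid)   AS high,
        min(bid)   AS low,
        last(bid)  AS close
    FROM ticks
    WHERE symbol = 'EURUSD'
    SAMPLE BY 1h
)
SELECT
    timestamp, open, high, low, close,
    avg(close) OVER (ORDER BY timestamp
                     ROWS BETWEEN 19 PRECEDING AND CURRENT ROW) AS sma20
FROM candles;
```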

5. Backtesting Loop Restructure

Redesign the main backtesting execution flow to iterate through time periods via database queries rather than processing the concurrent ingest/consume tasks. This fundamentally changes how the system steps through historical data during backtesting runs.
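The restructured loop might reduce to something like this sketch, built on the placeholder types above; the one-day window size is an open tuning choice balancing memory against query round-trips:

```csharp
using System;

// Steps through history in fixed windows, replaying each window's
// ticks through the strategy -- no producer/consumer tasks to race
public class Backtester
{
    private readonly TickQueryService _data = new();

    public void Run(string symbol, DateTime start, DateTime end, IStrategy strategy)
    {
        var step = TimeSpan.FromDays(1);
        for (var from = start; from < end; from += step)
        {
            var to = from + step < end ? from + step : end;
            foreach (var tick in _data.GetTicks(symbol, from, to))
            {
                strategy.OnTick(tick, _data);
            }
        }
    }
}
```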

This approach trades the real-time streaming complexity for query-based simplicity, which makes more sense for historical backtesting scenarios.


This should eliminate so much of the complexity around time synchronisation. Right now I'm spending more time debugging the concurrent CSV processing and chasing skipped time blocks than working on trading strategies. The ConcurrentDictionary management and the careful orchestration of multiple StreamReader instances feel fragile, especially when scaling up on AWS. QuestDB looks awesome in initial tests.
