Reasoning
Without batch support, Osprey users currently cannot:
- Run retroactive rules against historical data efficiently
- Perform investigations at scale
- Conduct bulk historical analysis
Current Behavior
Osprey currently supports historical investigation through UI-based queries with the following limitations:
- Historical data queries are limited to a 90-day window (configurable via MAX_HISTORICAL_QUERY_WINDOW_DAYS)
- Events are retrieved through paginated scan queries requiring manual iteration
- Individual Top N results can be exported to CSV, but no batch processing mechanism exists
- Historical investigation is designed as a secondary capability to real-time event processing
- All queries require manual UI interaction or individual API calls
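To illustrate the manual iteration that paginated scan queries require today, here is a minimal sketch of the loop a caller has to write. `fetch_page` and the `(events, next_cursor)` page shape are hypothetical stand-ins for whatever the scan endpoint actually returns. They are not Osprey's real API.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: (events, next_cursor).
# next_cursor is None once the last page has been returned.
Page = tuple[list[dict], Optional[str]]


def scan_all(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every event by following cursors until the scan is exhausted."""
    cursor: Optional[str] = None
    while True:
        events, cursor = fetch_page(cursor)
        yield from events
        if cursor is None:
            break
```

Every investigation needs a loop like this (or the equivalent clicks in the UI), which is the overhead batch support would remove.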
Currently, Osprey lacks:
- A mechanism to programmatically process large historical datasets in batches
- Background job support for retroactive event analysis
- Bulk query execution against historical data without UI interaction
Desired Behavior
Enable batch processing for historical investigation to support:
- Queuing or scheduling multiple historical investigations to run together, without manual UI interaction for each one
- UI support for defining, monitoring, and managing batch historical query jobs
- Background execution of batches of historical investigations across large time periods
- Bulk export/analysis of results from multiple historical queries
- Processing of events beyond the current 90-day historical window in batch mode
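One possible shape for this, as a rough sketch only: split each long investigation into windows no longer than MAX_HISTORICAL_QUERY_WINDOW_DAYS and run the windows in the background. `HistoricalQuery`, `split_into_windows`, `run_batch`, and the `run_query` callback are hypothetical names for illustration. Windowing is just one way to get past the 90-day limit, not a proposed design.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Callable

# Setting name taken from this issue; 90 is the window it describes.
MAX_HISTORICAL_QUERY_WINDOW_DAYS = 90


@dataclass(frozen=True)
class HistoricalQuery:
    """Hypothetical description of one historical investigation."""
    query: str
    start: datetime
    end: datetime


def split_into_windows(
    q: HistoricalQuery, max_days: int = MAX_HISTORICAL_QUERY_WINDOW_DAYS
) -> list[HistoricalQuery]:
    """Split a query into consecutive windows of at most max_days each."""
    step = timedelta(days=max_days)
    windows = []
    cursor = q.start
    while cursor < q.end:
        window_end = min(cursor + step, q.end)
        windows.append(HistoricalQuery(q.query, cursor, window_end))
        cursor = window_end
    return windows


def run_batch(
    queries: list[HistoricalQuery],
    run_query: Callable[[HistoricalQuery], Any],
    max_workers: int = 4,
) -> list[tuple[HistoricalQuery, Any]]:
    """Run every window of every query on a worker pool, keeping input order."""
    windows = [w for q in queries for w in split_into_windows(q)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_query, windows))
    return list(zip(windows, results))
```

A real implementation would likely use a persistent job queue rather than an in-process thread pool so that jobs survive restarts and can be monitored from the UI. The thread pool here only keeps the sketch self-contained.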
Other similar projects:
https://medium.com/pinterest-engineering/fighting-spam-with-guardian-a-real-time-analytics-and-rules-engine-938e7e61fa27