"Mohair" is a project to prototype the use of Substrait and Arrow to push down partial queries to remote storage. Initially, this project will be largely inspired by Skytether, an extension of SkyhookDM for single-cell gene expression analysis and computational storage. However, work on Skytether started shortly before Arrow Flight, Acero, and Substrait did. As a result, much of Skytether is low-level and specific, whereas Mohair will try to be higher-level and more generic.
The overall goal is to be able to delegate part of a query plan to remote storage, execute the remaining query plan on the intermediate results (from remote storage), and then return the final results to the client. It is expected that this will require some amount of:
- splitting a query plan
- implementing a Flight service
- communicating with a remote Flight service
- communicating with Flight services using Substrait
To track progress informally, we list some milestones here (to be updated as appropriate).
- Execute a single query without splitting
    - Submit query as a Substrait plan
    - Submit query to a Flight service
    - Implement Flight service as a very simple data server (probably using a file system for now)
- Execute a single query with a simple split
    - Submit query as a Substrait plan
    - Split plan into 2 pieces
    - Execute the 2nd piece on the intermediate results of the 1st piece
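
To make the "simple split" milestone more concrete, here is a minimal sketch in plain Python. It is only an illustration of the idea, not Mohair's implementation: the plan is a toy list of operators rather than a Substrait plan, remote storage is an in-memory list rather than a Flight service, and the names `Op`, `split_plan`, `execute`, and `STORED_ROWS` are hypothetical. The split rule (push down the longest prefix of operators that storage is assumed to support) is also just one possible choice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


@dataclass(frozen=True)
class Op:
    """One step of a toy linear query plan (hypothetical, not a Substrait type)."""
    name: str
    apply: Callable[[List[Row]], List[Row]]
    pushdown: bool  # assumption: True if remote storage can run this step


def split_plan(plan: List[Op]) -> Tuple[List[Op], List[Op]]:
    """Split the plan at the first operator that storage cannot run."""
    for i, op in enumerate(plan):
        if not op.pushdown:
            return plan[:i], plan[i:]
    return plan, []


def execute(piece: List[Op], rows: List[Row]) -> List[Row]:
    """Apply each operator of a plan piece in order."""
    for op in piece:
        rows = op.apply(rows)
    return rows


# Stand-in for data held by remote storage.
STORED_ROWS: List[Row] = [
    {"gene": "A", "expr": 5},
    {"gene": "B", "expr": 0},
    {"gene": "C", "expr": 9},
]

plan = [
    Op("filter expr > 0",
       lambda rows: [r for r in rows if r["expr"] > 0], True),
    Op("project gene, expr",
       lambda rows: [{"gene": r["gene"], "expr": r["expr"]} for r in rows], True),
    Op("sort by expr desc",
       lambda rows: sorted(rows, key=lambda r: r["expr"], reverse=True), False),
]

# 1st piece is delegated to storage; 2nd piece runs on the intermediate results.
storage_piece, local_piece = split_plan(plan)
intermediate = execute(storage_piece, STORED_ROWS)
final = execute(local_piece, intermediate)
```

In this sketch the filter and projection would run next to the data, so only the reduced intermediate rows travel back before the sort is applied. In the real project, each piece would presumably be serialized as a Substrait plan and sent to a Flight service instead of being called directly.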