Ballista Roadmap

There is an excellent discussion in #30 about the future of the project, and we encourage you to participate and add your feedback there if you are interested in using or contributing to Ballista.

The current focus is on the following items:

Make production ready
- Shuffle file cleanup
  - Periodic cleanup (#185)
  - gRPC and REST interfaces that let clients or the UI trigger cleanup on demand, for a single job or for the whole system
- Fill functional gaps between DataFusion and Ballista
- Improve task scheduling and data exchange efficiency
- Better error handling
  - Scheduler restart
- Improve monitoring, logging, and metrics
- Auto scaling support
- Better configuration management
- Support multi-scheduler deployments, initially for resiliency and fault tolerance, and ultimately to enable sharding for scalability and more efficient caching

Shuffle improvement
- Shuffle memory control (#320)
- Improve shuffle IO to avoid producing too many files
- Support sort-based shuffle
- Support range partition
- Support broadcast shuffle (#342)

Scheduler Improvements
- All-at-once job task scheduling
- Executor deployment grouping based on resource allocation

Cloud Support
- Support Azure Blob Storage (#294)
- Support Google Cloud Storage (#293)

Performance and scalability
- Implement Adaptive Query Execution (#387)
- Implement bubble execution (#408)
- Improve benchmark results (#339)

Python Support
- Support Python UDFs (#173)
