
Discussion: DataFusion Improvement Proposal (DIPs) Process? #16886

@alamb

Description


Is your feature request related to a problem or challenge?

@viirya says in a comment on #16800:

Sometimes, I feel that some important proposals in DataFusion lack sufficient context, or that the relevant context is scattered across various issues and PR comments. This makes it difficult to fully understand the proposals or to trace their motivations and evaluate their soundness. As a result, we sometimes see large PRs — hundreds or even thousands of lines — that are based on these proposals, making the review process even more challenging. Only the author or those who were involved in the initial discussions seem to be in a position to effectively review them.

For example, Spark has the SPIP (Spark Project Improvement Proposal) mechanism, where contributors submit formal documents for review when proposing significant changes. These documents typically consolidate the technical details, motivation, and background of the proposal into a single place. This approach helps the community better understand and participate in discussions around major changes.

I wonder if it would be beneficial for DataFusion to adopt a similar lightweight proposal process for major design changes — something that allows ideas and context to be collected and reviewed before implementation begins. It could help improve transparency, facilitate broader community involvement, and make the review process more accessible.

If the full SPIP process — including voting and formal approval — feels too heavy or unnecessary for our context, perhaps we could at least establish a lightweight template for major change proposals. This template could include sections for motivation, background, technical details, and other relevant context. Having a consistent format would make it easier for the community to follow and engage with significant design discussions.

My opinions:

  1. Finding the outstanding proposals and discussions is difficult. They are all public, but there are many of them in progress at once
  2. The context for proposals is often scattered across issues and PRs
  3. It is hard to know when "enough" communication has been done for a proposal to move forward and when it needs more work
  4. Improving the communication around major changes is becoming more important as the project grows and we have more users and contributors

For example, there are several recent discussions that could benefit from a more formal proposal process, including, but not limited to, the discussion quoted above.

Describe the solution you'd like

Some sort of "process" that

  1. Makes it easy to find outstanding community improvement proposals
  2. Makes it easy to know the steps to create a new improvement proposal
  3. Is documented

Describe alternatives you've considered

Here is a strawman (for discussion) proposal:

  1. Add a new label in the DataFusion repo ("DIP - DataFusion Improvement Proposal")
  2. Add a new ISSUE_TEMPLATE for proposal issues, based on the SPIP template and the current DataFusion issue template
  3. Add a section to the site documentation describing the process

I personally worry that DataFusion is not yet at a point where formal voting / formal approval would add much value, but I do think formalizing the proposal format and making proposals easier to find would be beneficial.

I propose starting with more formalization around how proposals are communicated; we can add more explicit approval / consensus standards if and when they become necessary.

Additional context

Here is the documentation for the Spark process: https://spark.apache.org/improvement-proposals.html

I looked through the list of SPIPs in Spark, and the few I looked at didn't have large amounts of discussion. They often linked to a Google Doc with more details.

Metadata

Labels: enhancement (New feature or request)