@adriangb adriangb commented Aug 27, 2025

This will enable TableProviders to produce files in an order and partitioning that optimize query execution, e.g. to let a TopK operator stop earlier via dynamic filters, or to optimize away a sort entirely if the files can be ordered accordingly.

Note that this:

  1. Is a preference and not an absolute.
  2. Does not remove the sort node.
    Physical optimizer rules should remove unnecessary sorts later on.
    This avoids the complexity of negotiating sort order with the TableProvider: ExecutionPlan (the thing TableProvider returns) already has APIs to negotiate sort orders.
    So a TableProvider can encode into the ExecutionPlan it returns that "the sort is completely handled in the scan", and physical optimizer rules will then remove the SortExec.
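
To make the second point concrete, here is a minimal sketch of how a later pass could drop a sort once the scan reports that it already produces the requested order. It uses toy stand-in types, not DataFusion's actual `ExecutionPlan`, `SortExec`, or `EnforceSorting` APIs; `PhysicalPlan`, `ordering_of`, and `remove_redundant_sort` are hypothetical names, and the prefix check is a simplification of real ordering-equivalence logic.

```rust
// Toy stand-ins; every name here is hypothetical and not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: String,
    asc: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum PhysicalPlan {
    /// A scan that reports the ordering it actually produces, if any.
    Scan {
        table: String,
        output_ordering: Option<Vec<SortExpr>>,
    },
    Sort {
        exprs: Vec<SortExpr>,
        input: Box<PhysicalPlan>,
    },
}

/// Ordering guaranteed by the output of `plan`, if known.
fn ordering_of(plan: &PhysicalPlan) -> Option<&[SortExpr]> {
    match plan {
        PhysicalPlan::Scan { output_ordering, .. } => output_ordering.as_deref(),
        PhysicalPlan::Sort { exprs, .. } => Some(exprs.as_slice()),
    }
}

/// Drop a Sort whose input already produces the required ordering.
/// Simplification: the required ordering must be a prefix of the input's.
fn remove_redundant_sort(plan: PhysicalPlan) -> PhysicalPlan {
    match plan {
        PhysicalPlan::Sort { exprs, input } => {
            let satisfied = ordering_of(&input).map_or(false, |o| o.starts_with(&exprs));
            if satisfied {
                *input
            } else {
                PhysicalPlan::Sort { exprs, input }
            }
        }
        other => other,
    }
}
```

In this toy model, a scan that ignores the preference simply reports no ordering and the sort stays in place, which is why the hint can remain a preference rather than a contract.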

adriangb and others added 2 commits August 27, 2025 14:16
This commit adds a new optional field `preferred_ordering` to the `TableScan`
logical plan node to support sort pushdown optimizations.

Changes include:
- Add `preferred_ordering: Option<Vec<SortExpr>>` field to `TableScan` struct
- Add `try_new_with_preferred_ordering` constructor method
- Update all `TableScan` constructors throughout the codebase to include the new field
- Update `Debug`, `PartialEq`, `Hash`, and `PartialOrd` implementations
- Update pattern matching in optimizer and other modules

The preferred_ordering field is currently not used by any optimization rules
but provides the foundation for future sort pushdown implementations.

This is part 2 of 2 PRs split from apache#17273 as requested in
apache#17273 (comment)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
This commit adds a new optimizer rule that pushes sort expressions down into
TableScan nodes as preferred_ordering, enabling table providers to potentially
optimize scans based on sort requirements.

Features:
- PushDownSort optimizer rule that detects Sort -> TableScan patterns
- Pushes down simple column-based sort expressions only
- Sets TableScan.preferred_ordering field for table provider optimization
- Completely eliminates Sort node when all expressions can be pushed down
- Comprehensive test coverage

The rule is positioned strategically in the optimizer pipeline after limit
pushdown but before filter pushdown to maximize optimization opportunities.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@github-actions github-actions bot added logical-expr Logical plan and expressions optimizer Optimizer rules proto Related to proto crate labels Aug 27, 2025
@adriangb adriangb changed the title Push down sorting preferences into TableScan logical plan node Push down sorts into TableScan logical plan node Aug 27, 2025
@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Aug 27, 2025
Comment on lines +2528 to +2529
/// Optional preferred ordering for the scan
pub preferred_ordering: Option<Vec<SortExpr>>,
Contributor Author

@berkaysynnada do you think this is the right information to pass down? Or is there a world where it makes sense to pass down some sort of "equivalence" information?

cc @alamb

Contributor

I also think @suremarc and @ozankabak may be interested in this

Contributor

A more future-proof API (one that lets us change the internal representation later) might be something like

/// Preferred ordering
///
/// Preferred orderings can potentially help DataFusion optimize queries, even in cases
/// when the output does not completely follow that order. This is information passed 
/// to the scan about what might help. 
///
/// For example, for a query with `ORDER BY time DESC LIMIT 10`, DataFusion's dynamic
/// predicates and TopK operator will work better if the data is roughly ordered by descending
/// time (more recent data first).
struct PreferredOrdering {
  exprs: Vec<SortExpr>
}

And then change this API to

Suggested change
- /// Optional preferred ordering for the scan
- pub preferred_ordering: Option<Vec<SortExpr>>,
+ /// Optional preferred ordering for the scan
+ pub preferred_ordering: Option<PreferredOrdering>,
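
As a rough illustration of why the wrapper is more future-proof, the sketch below keeps the `exprs` field private and exposes it only through methods, so the internal representation could change without touching callers. `SortExpr` and `TableScan` here are simplified stand-ins, and the `new` and `exprs` methods are assumptions, not something proposed in the review.

```rust
/// Simplified stand-in for DataFusion's `SortExpr` (hypothetical fields).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct SortExpr {
    column: String,
    asc: bool,
}

/// Wrapper from the suggestion above. Callers go through methods, so the
/// internal representation can change later without breaking them.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Default)]
struct PreferredOrdering {
    exprs: Vec<SortExpr>,
}

impl PreferredOrdering {
    /// Hypothetical constructor.
    fn new(exprs: Vec<SortExpr>) -> Self {
        Self { exprs }
    }

    /// Hypothetical read-only accessor for the sort expressions.
    fn exprs(&self) -> &[SortExpr] {
        &self.exprs
    }
}

/// Toy scan node carrying the optional hint.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
struct TableScan {
    table_name: String,
    preferred_ordering: Option<PreferredOrdering>,
}
```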

Contributor

> @berkaysynnada do you think this is the right information to pass down? Or is there a world where it makes sense to pass down some sort of "equivalence" information?
>
> cc @alamb

When we are registering the sources, we can provide multiple orderings if the table supports them. However, the requirements are singular, and I don't think there would be any meaning in ordering the table for both col_a and col_b simultaneously. So, I've always thought that requirements need only one ordering, but specs should be capable of having multiple orderings. So there isn't any obvious advantage of using equivalences here, IMO
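
The asymmetry described above (a table may declare several orderings, a query requires only one) can be sketched as follows. This is a toy model under stated assumptions: `SortExpr` is a simplified stand-in, `satisfies` is a hypothetical helper, and a plain prefix match stands in for DataFusion's real equivalence handling.

```rust
/// Simplified stand-in for a sort expression (hypothetical).
#[derive(Debug, Clone, PartialEq, Eq)]
struct SortExpr {
    column: &'static str,
    asc: bool,
}

/// A table may declare several orderings, while a query requires a single one.
/// The requirement is met if it is a prefix of any declared ordering.
fn satisfies(declared: &[Vec<SortExpr>], required: &[SortExpr]) -> bool {
    required.is_empty() || declared.iter().any(|ordering| ordering.starts_with(required))
}
```

With declared orderings `[col_a, col_b]` and `[col_c DESC]`, a requirement on `col_a` or on `col_c DESC` is satisfied, while a requirement on `col_b` alone is not.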

@alamb alamb changed the title Push down sorts into TableScan logical plan node Push down preferred sorts into TableScan logical plan node Sep 5, 2025

alamb commented Sep 5, 2025

🤖 ./gh_compare_branch_bench.sh Benchmark Script Running
Linux aal-dev 6.14.0-1014-gcp #15~24.04.1-Ubuntu SMP Fri Jul 25 23:26:08 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing add-preferred-ordering-to-table-scan (c84cf55) to b6a8a0e diff
BENCH_NAME=sql_planner
BENCH_COMMAND=cargo bench --bench sql_planner
BENCH_FILTER=
BENCH_BRANCH_NAME=add-preferred-ordering-to-table-scan
Results will be posted here when complete

alamb left a comment

I think this looks good, but I was confused that the tests don't seem to show the preferred ordering. I think we should fix those tests before merging. I also expect them to show that some of the pushdown isn't working quite as expected (e.g. pushing through a projection or filter).

I also recommend putting the preferred sort expressions in their own struct, but that is not required in my mind.

As I understand the plan, in the next few PRs, @adriangb will update the various APIs so that this preferred sort is provided to TableProvider::scan (really via scan_with_args)

I also wonder if we should wait for the DataFusion 50 release before merging this or if it is ok to merge now.

/// Sets the preferred ordering for this table scan using the builder pattern.
///
/// The preferred ordering serves as a hint to table providers about the desired
Contributor

maybe we can move some of this comment to PreferredOrdering if we go with the struct approach

}

// If the table scan already has preferred ordering, don't overwrite it
// This preserves any existing sort preferences from other optimizations
Contributor

when would there be existing preferences?

Comment on lines +84 to +86
/// Currently, we only support pushing down simple column references
/// because table providers typically can't optimize complex expressions
/// in sort pushdown.
Contributor

is this a fundamental limitation? I ask because @pepijnve was asking about "column only" support the other day at

///
/// # Examples
///
/// ```rust
Contributor

as much as I love examples, I am not sure this one adds much

Comment on lines +133 to +141
let new_table_scan = TableScan {
    table_name: table_scan.table_name.clone(),
    source: Arc::clone(&table_scan.source),
    projection: table_scan.projection.clone(),
    projected_schema: Arc::clone(&table_scan.projected_schema),
    filters: table_scan.filters.clone(),
    fetch: table_scan.fetch,
    preferred_ordering: Some(sort.expr.clone()),
};
Contributor

I think this would be clearer and less error prone if you could do something that only changes the field of interest. Perhaps like this:

Suggested change
- let new_table_scan = TableScan {
-     table_name: table_scan.table_name.clone(),
-     source: Arc::clone(&table_scan.source),
-     projection: table_scan.projection.clone(),
-     projected_schema: Arc::clone(&table_scan.projected_schema),
-     filters: table_scan.filters.clone(),
-     fetch: table_scan.fetch,
-     preferred_ordering: Some(sort.expr.clone()),
- };
+ let new_table_scan = table_scan.clone()
+     .with_preferred_ordering(Some(sort.expr.clone()))
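
A minimal sketch of the builder-style setter this suggestion relies on, using a cut-down toy `TableScan` rather than the real struct. The exact signature of `with_preferred_ordering` in the PR is an assumption here. The point is that a new field added to `TableScan` later cannot be forgotten at the call site.

```rust
/// Simplified stand-in for a sort expression (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: String,
    asc: bool,
}

/// Cut-down toy version of `TableScan` with only a few of its fields.
#[derive(Debug, Clone, PartialEq)]
struct TableScan {
    table_name: String,
    projection: Option<Vec<usize>>,
    fetch: Option<usize>,
    preferred_ordering: Option<Vec<SortExpr>>,
}

impl TableScan {
    /// Consume the scan and return it with only the ordering hint replaced.
    /// All other fields are carried over untouched.
    fn with_preferred_ordering(mut self, ordering: Option<Vec<SortExpr>>) -> Self {
        self.preferred_ordering = ordering;
        self
    }
}
```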

Sort: t1.a ASC NULLS LAST
  Inner Join: t1.a = t2.a
    TableScan: t1
    TableScan: t2
Contributor

these tests don't really show that the preferred ordering is pushed through. Perhaps we can update the plan to show any preferred ordering
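
One way the plan output could surface the hint is sketched below with toy types. The `preferred_ordering=[...]` suffix is a made-up format for illustration only, not what DataFusion prints, and `SortExpr` and `TableScan` are simplified stand-ins.

```rust
use std::fmt;

/// Simplified stand-in for a sort expression (hypothetical).
struct SortExpr {
    column: String,
    asc: bool,
    nulls_first: bool,
}

impl fmt::Display for SortExpr {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let dir = if self.asc { "ASC" } else { "DESC" };
        let nulls = if self.nulls_first { "NULLS FIRST" } else { "NULLS LAST" };
        write!(f, "{} {} {}", self.column, dir, nulls)
    }
}

/// Toy scan node carrying the optional hint.
struct TableScan {
    table_name: String,
    preferred_ordering: Option<Vec<SortExpr>>,
}

impl fmt::Display for TableScan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "TableScan: {}", self.table_name)?;
        // Only print the hint when one is present, so existing plans are unchanged.
        if let Some(exprs) = &self.preferred_ordering {
            let rendered: Vec<String> = exprs.iter().map(|e| e.to_string()).collect();
            write!(f, " preferred_ordering=[{}]", rendered.join(", "))?;
        }
        Ok(())
    }
}
```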

#[derive(Default, Debug)]
pub struct PushDownSort {}

impl PushDownSort {
Contributor

I think the EnforceSorting rule already pushes sorts down in the plan -- https://docs.rs/datafusion/latest/datafusion/physical_optimizer/enforce_sorting/struct.EnforceSorting.html

Do you think we will need more sort pushdown? Or will this always just be "pass down preferred sorts" to LogicalPlans?

Contributor Author

Nice. I was hoping another optimizer rule does the "hard work" so we can do just the simple thing here (only a subset of node types we need to support).

@ r"
Sort: test.a ASC NULLS LAST
  Filter: test.a > Int32(10)
    TableScan: test
Contributor

FWIW I think this table scan should have the preferred ordering passed to it, but I am not sure the current code will do so
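
A rough sketch of what passing the hint through a filter could look like, on a toy plan enum rather than DataFusion's `LogicalPlan`. `Plan`, `hint_scan`, and `push_down` are hypothetical names, and the sketch assumes a filter neither reorders rows nor renames columns, so the sort expressions stay valid below it. The Sort node is kept, matching the PR description.

```rust
/// Simplified stand-in for a sort expression (hypothetical).
#[derive(Debug, Clone, PartialEq)]
struct SortExpr {
    column: String,
    asc: bool,
}

/// Toy logical plan with just the node types needed for the example.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    TableScan {
        table: String,
        preferred_ordering: Option<Vec<SortExpr>>,
    },
    Filter {
        predicate: String,
        input: Box<Plan>,
    },
    Sort {
        exprs: Vec<SortExpr>,
        input: Box<Plan>,
    },
}

/// Walk down through Filter nodes and attach the hint to the first scan found.
/// A scan that already has a preference is left unchanged.
fn hint_scan(plan: Plan, exprs: &[SortExpr]) -> Plan {
    match plan {
        Plan::TableScan { table, preferred_ordering: None } => Plan::TableScan {
            table,
            preferred_ordering: Some(exprs.to_vec()),
        },
        Plan::Filter { predicate, input } => Plan::Filter {
            predicate,
            input: Box::new(hint_scan(*input, exprs)),
        },
        other => other,
    }
}

/// For a Sort at the root, push its expressions down as a hint and keep the Sort.
fn push_down(plan: Plan) -> Plan {
    match plan {
        Plan::Sort { exprs, input } => {
            let input = Box::new(hint_scan(*input, &exprs));
            Plan::Sort { exprs, input }
        }
        other => other,
    }
}
```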

alamb commented Sep 5, 2025

Here is a PR that avoids some clones, which might improve performance

alamb commented Sep 5, 2025

🤔 this seems to have caused a massive slowdown in the sql planner benchmark somehow:

Benchmarking physical_sorted_union_order_by_300: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 3942.3s, or reduce sample count to 10.
physical_sorted_union_order_by_300
                        time:   [38.914 s 38.997 s 39.079 s]

Benchmarking logical_plan_optimize: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 17189.6s, or reduce sample count to 10.

It is still running...

adriangb commented Sep 5, 2025

I'm sure it's just a dumb mistake on my end. Let me do a round of looking at your comments and investigating, thank you for your patience 🙏🏻

adriangb commented Sep 5, 2025

> I also wonder if we should wait for the DataFusion 50 release before merging this

I think we should wait until after v50
