Commit c91dcfd

Docs: add additional links to blog posts (#19833)
## Which issue does this PR close?

- Part of #7013

## Rationale for this change

We have written some good blogs recently that provide additional context and backstory. Let's make sure they are available for others to read.

## What changes are included in this PR?

Add links to select doc pages.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example, are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api change` label.
-->
1 parent f3f6dec commit c91dcfd

5 files changed: +51 -0 lines changed

docs/source/library-user-guide/extending-sql.md

Lines changed: 5 additions & 0 deletions
@@ -27,6 +27,11 @@ need to:
 - Add custom data types not natively supported
 - Implement SQL constructs like `TABLESAMPLE`, `PIVOT`/`UNPIVOT`, or `MATCH_RECOGNIZE`

+You can read more about this topic in the [Extending SQL in DataFusion: from ->>
+to TABLESAMPLE] blog.
+
+[extending sql in datafusion: from ->> to tablesample]: https://datafusion.apache.org/blog/2026/01/12/extending-sql
+
 ## Architecture Overview

 When DataFusion processes a SQL query, it goes through these stages:

docs/source/library-user-guide/functions/adding-udfs.md

Lines changed: 4 additions & 0 deletions
@@ -684,6 +684,10 @@ No function matches the given name and argument types substr(Utf8).
 Scalar UDFs are functions that take a row of data and return a single value. Window UDFs are similar, but they also have
 access to the rows around them. Access to the proximal rows is helpful, but adds some complexity to the implementation.

+For background and other considerations, see the [User defined Window Functions in DataFusion] blog.
+
+[user defined window functions in datafusion]: https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions
+
 For example, we will declare a user defined window function that computes a moving average.

 ```rust

docs/source/library-user-guide/query-optimizer.md

Lines changed: 9 additions & 0 deletions
@@ -28,8 +28,17 @@ This crate is a submodule of DataFusion that provides a query optimizer for logi
 contains an extensive set of [`OptimizerRule`]s and [`PhysicalOptimizerRule`]s that may rewrite the plan and/or its expressions so
 they execute more quickly while still computing the same result.

+For a deeper background on optimizer architecture and rule types and predicates, see
+[Optimizing SQL (and DataFrames) in DataFusion, Part 1], [Part 2],
+[Using Ordering for Better Plans in Apache DataFusion], and
+[Dynamic Filters: Passing Information Between Operators During Execution for 25x Faster Queries].
+
 [`optimizerrule`]: https://docs.rs/datafusion/latest/datafusion/optimizer/trait.OptimizerRule.html
 [`physicaloptimizerrule`]: https://docs.rs/datafusion/latest/datafusion/physical_optimizer/trait.PhysicalOptimizerRule.html
+[optimizing sql (and dataframes) in datafusion, part 1]: https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one
+[part 2]: https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two
+[using ordering for better plans in apache datafusion]: https://datafusion.apache.org/blog/2025/03/11/ordering-analysis
+[dynamic filters: passing information between operators during execution for 25x faster queries]: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters
 [`logicalplan`]: https://docs.rs/datafusion/latest/datafusion/logical_expr/enum.LogicalPlan.html

 ## Running the Optimizer

docs/source/user-guide/concepts-readings-events.md

Lines changed: 28 additions & 0 deletions
@@ -37,6 +37,34 @@

 This is a list of DataFusion related blog posts, articles, and other resources. Please open a PR to add any new resources you create or find

+- **2026-01-12** [Blog: Extending SQL in DataFusion: from ->> to TABLESAMPLE](https://datafusion.apache.org/blog/2026/01/12/extending-sql)
+
+- **2025-12-15** [Blog: Optimizing Repartitions in DataFusion: How I Went From Database Noob to Core Contribution](https://datafusion.apache.org/blog/2025/12/15/avoid-consecutive-repartitions)
+
+- **2025-09-21** [Blog: Implementing User Defined Types and Custom Metadata in DataFusion](https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata)
+
+- **2025-09-10** [Blog: Dynamic Filters: Passing Information Between Operators During Execution for 25x Faster Queries](https://datafusion.apache.org/blog/2025/09/10/dynamic-filters)
+
+- **2025-08-15** [Blog: Using External Indexes, Metadata Stores, Catalogs and Caches to Accelerate Queries on Apache Parquet](https://datafusion.apache.org/blog/2025/08/15/external-parquet-indexes)
+
+- **2025-07-14** [Blog: Embedding User-Defined Indexes in Apache Parquet Files](https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes)
+
+- **2025-06-30** [Blog: Using Rust async for Query Execution and Cancelling Long-Running Queries](https://datafusion.apache.org/blog/2025/06/30/cancellation)
+
+- **2025-06-15** [Blog: Optimizing SQL (and DataFrames) in DataFusion, Part 1: Query Optimization Overview](https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-one)
+
+- **2025-06-15** [Blog: Optimizing SQL (and DataFrames) in DataFusion, Part 2: Optimizers in Apache DataFusion](https://datafusion.apache.org/blog/2025/06/15/optimizing-sql-dataframes-part-two)
+
+- **2025-04-19** [Blog: User defined Window Functions in DataFusion](https://datafusion.apache.org/blog/2025/04/19/user-defined-window-functions)
+
+- **2025-04-10** [Blog: tpchgen-rs World's fastest open source TPC-H data generator, written in Rust](https://datafusion.apache.org/blog/2025/04/10/fastest-tpch-generator)
+
+- **2025-03-11** [Blog: Using Ordering for Better Plans in Apache DataFusion](https://datafusion.apache.org/blog/2025/03/11/ordering-analysis)
+
+- **2024-05-07** [Blog: Announcing Apache Arrow DataFusion is now Apache DataFusion](https://datafusion.apache.org/blog/2024/05/07/datafusion-tlp)
+
+- **2024-03-06** [Blog: Announcing Apache Arrow DataFusion Comet](https://datafusion.apache.org/blog/2024/03/06/comet-donation)
+
 - **2025-03-21** [Blog: Efficient Filter Pushdown in Parquet](https://datafusion.apache.org/blog/2025/03/21/parquet-pushdown/)

 - **2025-03-20** [Blog: Parquet Pruning in DataFusion: Read Only What Matters](https://datafusion.apache.org/blog/2025/03/20/parquet-pruning/)

docs/source/user-guide/sql/data_types.md

Lines changed: 5 additions & 0 deletions
@@ -25,6 +25,11 @@ execution. The SQL types from
 are mapped to [Arrow data types](https://docs.rs/arrow/latest/arrow/datatypes/enum.DataType.html) according to the following table.
 This mapping occurs when defining the schema in a `CREATE EXTERNAL TABLE` command or when performing a SQL `CAST` operation.

+For background on extension types and custom metadata, see the
+[Implementing User Defined Types and Custom Metadata in DataFusion] blog.
+
+[implementing user defined types and custom metadata in datafusion]: https://datafusion.apache.org/blog/2025/09/21/custom-types-using-metadata
+
 You can see the corresponding Arrow type for any SQL expression using
 the `arrow_typeof` function. For example:
