diff --git a/vignettes/fallback.Rmd b/vignettes/fallback.Rmd
index 96efeb320..99cb1672d 100644
--- a/vignettes/fallback.Rmd
+++ b/vignettes/fallback.Rmd
@@ -45,8 +45,11 @@ conflict_prefer("filter", "dplyr")
 ## Introduction
 
 The duckplyr package aims at providing a fully compatible drop-in replacement for dplyr.
-To achieve this, only a carefully selected subset of dplyr's operations, R functions, and R data types are implemented.
-Whenever a request cannot be handled by DuckDB, duckplyr falls back to dplyr.
+All operations, R functions, and data types that are supported by dplyr should work in an identical way with duckplyr.
+This is achieved in two ways:
+
+- A carefully selected subset of dplyr operations, R functions, and R data types are implemented in DuckDB, focusing on faithful translation.
+- When DuckDB does not support an operation, duckplyr falls back to dplyr, guaranteeing identical behavior.
 
 ## DuckDB operation
 
@@ -67,7 +70,12 @@ duckdb %>%
   explain()
 ```
 
-The plan shows three operations: a data frame scan (the input), a sort operation, and a projection (adding the `b` column and removing the `a` column).
+The plan shows three operations:
+
+- a data frame scan (the input),
+- a sort operation,
+- a projection (adding the `b` column and removing the `a` column).
+
 Because each operation is supported by DuckDB, the resulting object contains a plan for the entire pipeline.
 The plan is only executed when the data is needed.
 
@@ -110,7 +118,7 @@ fallback <-
   select(-a)
 ```
 
-The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is handled by dplyr and already executed when the pipeline is defined.
+The `verbose_plus_one()` function is not supported by DuckDB, so the `mutate()` step is forwarded to dplyr and already executed (eagerly) when the pipeline is defined.
 This is confirmed by the `last_rel()` function:
 
 ```{r}
@@ -140,6 +148,26 @@ duckplyr::last_rel()
 
 The `last_rel()` function confirms that only the final `select()` is handled by DuckDB again.
 
+## Enforce DuckDB operation
+
+For any duck frame, you can control automatic materialization.
+A fallback to dplyr is only possible if automatic materialization is allowed for the frame at hand, because dplyr requires eager evaluation.
+
+Therefore, by making a data frame frugal, you can ensure that a pipeline errors whenever a fallback to dplyr would normally have happened.
+See `vignette("prudence")` for details.
+
+Using operations supported by duckplyr and avoiding fallbacks as much as possible ensures that your pipelines are executed by DuckDB in an optimized way.
+
+## Configure fallbacks
+
+With the `fallback_sitrep()` and `fallback_config()` functions, you can examine and change settings related to fallbacks.
+
+- You can make fallbacks verbose with `fallback_config(info = TRUE)`.
+
+- You can change settings related to logging fallbacks and reporting them to the duckplyr development team, which helps inform their work.
+
+See `vignette("telemetry")` for details.
+
 ## Conclusion
 
 The fallback mechanism in duckplyr allows for a seamless integration of dplyr verbs and R functions that are not supported by DuckDB.