eduaguilera · jinfama · Dec 12, 2025 · Dec 12, 2025 · Dec 15, 2025 · Dec 15, 2025
diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md
@@ -4,13 +4,46 @@
 
 - Follow the workflow: https://lbm364dl.github.io/follow-the-workflow/
 - Follow tidyverse style guide: https://style.tidyverse.org/
-- In the documentation section of functions, ensure these tags exist: `@description`, `@param` (for each function parameter), `@return`, `@export`, `@examples`
 - Maximum line width is 80 characters
-- There must be one space after `#'`
-- If you start the text for an annotation in that same line, then the next lines should be indented two spaces to easily know it's for that same section
-- Finish all doc sentences with full stop
-- Use `snake_case` for column namings
+- For code formatting, use `air format` if it is available on the user's computer
+- Functions should be short. Ideally no more than 25 lines. There might be exceptions if the code is easy to follow, but try to keep functions short and modular. If necessary, split large functions into smaller ones, with meaningful names.
+- Main exported functions should come first in the file. The private helpers should come at the end, after all exported functions.
+- Use `snake_case` for column names in tibbles
 - Always make sure that all rules in this document and all lintrs have passed after a change in the code
+- Don't use imported functions without namespace prefix (e.g. use `dplyr::filter()` instead of just `filter()`). This also means in the package functions you must not use `@importFrom` in the roxygen2 documentation.
+- This is a tidy data project. Don't use `data.frame` or `data.table`. Always use `tibble` from tidyverse.
+- For argument validation try to use `rlang` functions as much as possible instead of base R. For example, instead of `time_col %in% names(data)` use `rlang::has_name(data, time_col)`.
+- For error messages, try to use `cli::cli_abort()` instead of `stop()`. For example, instead of `stop("Time column '", time_col, "' not found in data")` use `cli::cli_abort("Time column '{time_col}' not found in data")`. This also applies to warnings with `cli::cli_warn()`, info messages, etc. Try to use well-formatted cli messages instead of base R messages.
+- If the code uses regex, keep in mind that escaped characters must be double-escaped in R strings. For example, to match a dot (`.`) you must use `\\.` in the R string, so `\.` is not enough.
+- When defining functions that expect column names as arguments, expect symbolic names (unquoted) instead of strings (quoted). For example, use `function(data, time_col)` instead of `function(data, time_col_name)`. Inside the function then, use `{{ time_col }}` to refer to the column.
+- There must not be functions inside functions. All functions must be defined at the top level. This means that if you need a helper function, define it as a private function (with a name starting with a dot) at the end of the script, instead of including its definition inside the function where it's used.
+- Always strive to use pipes, specifically native R pipes (`|>`). In general, use piped expressions as much as possible to improve readability. Ideally, functions should look like this:
+```
+result <- data |>
+  dplyr::filter(...) |>
+  dplyr::mutate(...) |>
+  some_other_function(...) |>
+  dplyr::summarise(...) |>
+  some_final_function(...)
+```
+- Make as many functions as necessary to make code look like the above example. Avoid long intermediate expressions that assign to variables, unless necessary for readability.
+- For the same purpose, avoid things like for loops. Try to use vectorised operations, use `purrr` if necessary or just plain functions from `dplyr` or `tidyr` to operate on entire columns or datasets at once. Explicit loops should be the last resort.
+
+
+## Documentation
+
+- Use roxygen2 for function documentation
+- It's enough to document only exported functions. The private ones can remain undocumented. The private functions' names should start with a dot (`.`).
+- Finish all doc sentences with a full stop
+- The first line must be the documentation title. No need to use `@title` tag. Keep it short and start with a verb in imperative form.
+- Next part should be the description, starting with `@description` tag. This part can be multiple lines. It can become long if necessary, but don't fill it with useless sentences that say nothing.
+- Next part should be the parameters, each starting with `@param` tag. Each parameter should have its own `@param` tag.
+- Next part should be the return value, starting with `@return` tag. Describe what the function returns.
+- Next part should be the `@export` tag.
+- Last part should be the examples, starting with `@examples` tag. Provide meaningful examples that show how to use the function. Avoid unnecessary examples.
+- In examples, the last line should show the output of the function, so that when the user runs the example, they can see what the function returns.
+- There must be one space after `#'` for each roxygen2 line
+- If you start the text for one tag in that same line, then the next lines should be indented two spaces to easily know it's for that same section. For example, if each parameter description starts in the same line as `@param`, then the next lines should be indented two spaces.
 
 ## Tests
 

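For reference, here is a minimal sketch of how the conventions added to `.github/copilot-instructions.md` might look when applied together: roxygen2 tag order, unquoted column arguments with `{{ }}`, namespace-prefixed calls, `cli::cli_abort()`, native pipes, and a dot-prefixed private helper at the end. The names `summarise_mean()`, `.check_column()`, `year` and `yield` are hypothetical and not part of whep. Converting the unquoted argument to a string with `rlang::as_name(rlang::enquo(...))` is one possible way to validate it, not something the instructions prescribe.

```r
# Hypothetical example written for illustration; not part of the whep package.

#' Summarise mean value per period.
#'
#' @description
#' Compute the mean of a value column for each value of a time column.
#'
#' @param data A tibble with at least the two columns given below.
#' @param time_col Unquoted name of the column holding the time period.
#' @param value_col Unquoted name of the numeric column to average.
#'
#' @return A tibble with one row per period and a `mean_value` column.
#'
#' @export
#'
#' @examples
#' yields <- tibble::tibble(year = c(2000, 2000, 2001), yield = c(1, 3, 5))
#' summarise_mean(yields, year, yield)
summarise_mean <- function(data, time_col, value_col) {
  # Turn the unquoted column arguments into strings for validation only.
  .check_column(data, rlang::as_name(rlang::enquo(time_col)))
  .check_column(data, rlang::as_name(rlang::enquo(value_col)))

  data |>
    dplyr::group_by({{ time_col }}) |>
    dplyr::summarise(mean_value = mean({{ value_col }}), .groups = "drop")
}

# Private helper: undocumented, dot-prefixed, placed after exported functions.
.check_column <- function(data, col_name) {
  if (!rlang::has_name(data, col_name)) {
    cli::cli_abort("Column {.val {col_name}} not found in data.")
  }
}
```

Calling `summarise_mean(yields, year, yield)` with the tibble from the `@examples` section returns one row per `year`. Passing a column that does not exist should raise a formatted cli error instead of a base R `stop()` message.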
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -15,6 +15,7 @@ Description: A set of tools for processing and analyzing data developed in the
 License: MIT + file LICENSE
 Imports:
     cli,
+    data.table,
     dplyr,
     fs,
     FAOSTAT,
@@ -26,6 +27,7 @@ Imports:
     readr,
     rlang,
     stringr,
+    tibble,
     tidyr,
     withr,
     yaml,
@@ -40,8 +42,7 @@ Suggests:
     knitr,
     pointblank,
     rmarkdown,
-    testthat (>= 3.0.0),
-    tibble
+    testthat (>= 3.0.0)
 Config/testthat/edition: 3
 VignetteBuilder: knitr
 URL: https://eduaguilera.github.io/whep/, https://github.com/eduaguilera/whep

diff --git a/NAMESPACE b/NAMESPACE
@@ -8,19 +8,20 @@ export(add_item_cbs_name)
 export(add_item_prod_code)
 export(add_item_prod_name)
 export(build_supply_use)
+export(calculate_lmdi)
 export(expand_trade_sources)
+export(fill_growth)
+export(fill_linear)
+export(fill_sum)
 export(get_bilateral_trade)
 export(get_faostat_data)
 export(get_feed_intake)
 export(get_primary_production)
 export(get_primary_residues)
 export(get_processing_coefs)
 export(get_wide_cbs)
-export(linear_fill)
-export(proxy_fill)
-export(sum_fill)
 export(whep_list_file_versions)
 export(whep_read_file)
+importFrom(data.table,":=")
 importFrom(pins,pin_fetch)
-importFrom(rlang,":=")
 importFrom(stats,ave)
diff --git a/NEWS.md b/NEWS.md
@@ -2,7 +2,7 @@
 
 # whep 0.2.0
 
-* Add gapfilling functions `linear_fill()`, `proxy_fill()`, `sum_fill()` (@eduaguilera, #11).
+* Add gapfilling functions `fill_linear()`, `fill_sum()` (@eduaguilera, #11).
 * Now examples can't fail because of unavailable Internet resources (#58).
 
 # whep 0.1.0