-
Notifications
You must be signed in to change notification settings - Fork 0
Juan/functions #65
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Juan/functions #65
Conversation
- Add calculate_lmdi(): LMDI decomposition with flexible identity parsing, automatic factor detection, multi-period support, and grouping capabilities - Add fill_growth() to gapfilling.R: Growth-based gap filling using proxy variables with hierarchical fallback and lambda adjustment - Add tests for calculate_lmdi function
…ntre rlang y data.table, declarar globalVariables
…for fill_sum and fill_linear
…sistent with the other gapfilling functions)
eduaguilera
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice, @jinfama! I have added a few comments on fill_growth() and I also used this opportunity to make changes in the functions I have added before:
-Removing proxy_fill() (fill_growth() does the same, but much better)
-Adopting function and variable names conventions of fill_growth()
-Adding smoothing to fill_linear()
-Adding time_col to fill_sum() (more robust if the input dataset is unsorted)
| #' @importFrom rlang enquo quo_is_null .data | ||
| #' @importFrom tibble as_tibble tibble | ||
| #' @importFrom stats approx | ||
| fill_growth <- function( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest "fill_proxy_growth", or similar, because just "growth" makes me think of filling a growth rate, and the key here is filling a variable with the growth rate of another variable
R/gapfilling.R
Outdated
| max_gap = Inf, | ||
| max_gap_linear = 3, | ||
| fill_scope = NULL, | ||
| smooth_window = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very nice. As I understand it, it refers to smoothing the proxy variable for the calculation of growth rates. I wonder if it would also be possible to add a similar smoothing to the gapfilled variable, as I've done now for fill_linear() (or maybe it exists and I've missed it?). Then, instead of smooth_window, we may have something like value_smooth_window and proxy_smooth_window
| value_col, | ||
| proxy_col, | ||
| time_col = "year", | ||
| group_by = NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the other functions we use .by. Is it ok to have here group_by or should we always use the same grouping method? @lbm364dl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we should keep everything consistent. We should change to .by.
… section (calculate_lmdi)
lbm364dl
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used this opportunity to add more content in copilot-instructions.md. I will keep adding more as I go through the code, though I'm afraid the AI might ignore some of it (it's far from perfect in that sense).
Some things we should revise:
- Good, extensive, informative examples for the functions. They need to cover what users would actually want to do with them. For example, the current examples for
fill_growthonly includes the easiest possible way of using the function. It must include more things. - Human-reviewed tests, to make sure these are actually checking the correct thing. This is very important because if I start making changes in the code I need to be sure they are not breaking anything else.
- Column inputs should be symbols directly, not quoted names, e.g.,
gdpand not"gdp", so that we follow tibble functions conventions. - We should strive to use tibbles, so that means no raw
data.frameordata.table.
I think this is going to be very useful, @lbm364dl ! |
No description provided.