Draft: specs and vctrs support #862

Open
wants to merge 5 commits into base: master

krlmlr (Contributor) commented Jan 12, 2022

This is a very basic sketch for extracting the formatting logic out of the fmt_*() functions into first-class objects.

The idea is to support the following workflow:

  • The formatting is attached to columns in a data frame at any point in the analysis
  • The formatting survives most arithmetic transformations
  • gt, ggplot2, and other users pick up the formatting and apply it automatically to the data

Reference:

Remaining work:

  • Arithmetic support for gt_vctr()
  • Perhaps the spec also needs to know the data types to which it applies, so that gt_vctr() can check.
  • Extract more gt_spec_*() functions, watch out for open PRs
  • Automatically format data of type "gt_vctr"
  • Use the formatting for ggplot2 scales
  • Automatically format other data: pillar::num(), units package, ...
library(gt)

gt_spec_number(n_sigfig = 5, drop_trailing_zeros = TRUE)
#> <gt_spec_number(n_sigfig = 5, drop_trailing_zeros = TRUE)>
gt_spec_number(n_sigfig = 5, drop_trailing_zeros = FALSE)
#> <gt_spec_number(n_sigfig = 5)>

x <- gt_vctr(1:3, gt_spec_number(decimals = 3))
x
#> <gt_spec_number(decimals = 3)[3]>
#> [1] 1 2 3
# Doesn't work yet, should be supported ultimately:
# x + 1

Created on 2022-01-12 by the reprex package (v2.0.1)
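
To illustrate what "the formatting survives most arithmetic transformations" could mean in practice, here is a minimal sketch in base R S3. It deliberately avoids vctrs so that it runs without extra packages, and every name in it (demo_spec(), demo_vctr(), strip()) is hypothetical rather than part of gt or of this PR. When two formatted vectors meet, it simply keeps the spec of the left operand, which is an assumption and not a settled design.

```r
# Hypothetical helpers for illustration; none of these names are part of gt.
demo_spec <- function(decimals = 2) {
  structure(list(decimals = decimals), class = "demo_spec")
}

demo_vctr <- function(x, spec) {
  stopifnot(is.numeric(x), inherits(spec, "demo_spec"))
  structure(x, spec = spec, class = "demo_vctr")
}

# Drop the class and the spec to get back the bare numbers.
strip <- function(x) {
  attr(x, "spec") <- NULL
  unclass(x)
}

# Group generic: called for +, -, *, /, comparisons, and so on.
Ops.demo_vctr <- function(e1, e2) {
  # Take the spec from the left operand if it has one, else from the right.
  spec <- if (inherits(e1, "demo_vctr")) attr(e1, "spec") else attr(e2, "spec")
  op <- get(.Generic, mode = "function")
  out <- if (nargs() == 1L) op(strip(e1)) else op(strip(e1), strip(e2))
  # Arithmetic keeps the formatting; comparisons return plain logicals.
  if (is.numeric(out)) demo_vctr(out, spec) else out
}

format.demo_vctr <- function(x, ...) {
  formatC(as.numeric(strip(x)), format = "f", digits = attr(x, "spec")$decimals)
}

x <- demo_vctr(1:3, demo_spec(decimals = 3))
format(x + 1)
#> [1] "2.000" "3.000" "4.000"
format(-x / 2)
#> [1] "-0.500" "-1.000" "-1.500"
```

A vctrs-based version would presumably express the same idea through vec_arith() methods, with the spec stored as an attribute of the vector.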

jcheng5 (Member) commented Jan 14, 2022

Hi @krlmlr, this is a really cool idea. We were already thinking that we should eventually refactor the formatting functions so that other table packages (like {rtables}) could take advantage of them, but your proposal goes much further in including non-table packages like ggplot2.

Two questions for you:

  1. Do you imagine these spec functions would ultimately live in gt, a different existing package, or an entirely new package?
  2. gt has to go out of its way to support HTML, LaTeX, RTF, and whatever output formats we want to support in the future. Is that baggage that you want to carry? (Actually it seems like you will have even more formats that are important, like plain text, VT100 text...)

krlmlr (Contributor, Author) commented Jan 20, 2022

Thanks for the heads up.

Home for this code

My initial thought was to keep this code in gt, but there is some advantage in putting it into a separate package. We could move to {pillar}, but that package brings in dependencies which might put off dependency-aware users -- this contradicts the idea of making the formatters easy to use for other users. I'm not sure we want it in {vctrs} either. So the options seem to be: gt or a new package, perhaps {fmts}, depending only on {vctrs} (and maybe even only suggesting it)?

If we consider a separate package, scope, extensibility, and back-compatibility become relevant. We could also consider starting the work in gt, perhaps without exporting the new code.

Scope

For this package to be useful "out of the box" to other table formatters, I think it should target HTML, LaTeX, RTF, plain, VT100, and ggplot2 directly. New important targets would have to be added and maintained in that package.

It is possible to change how the formatting is done to use a form of double dispatch, which we can simulate with two rounds of single dispatch until R7 is available. Because the number of targets is finite and very small, we would dispatch over the target first and then over the spec. This feels a bit over-engineered to me for this particular problem.
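
As a rough sketch of the two-step dispatch described above, the following base R S3 code dispatches on the target first and on the spec second. All names here (target(), spec_number(), spec_percent(), render(), render_html(), render_latex()) are made up for this illustration, and the formatting rules are placeholders rather than what gt actually emits.

```r
# Hypothetical constructors; the classes carry the dispatch information.
target <- function(name) structure(list(), class = paste0("target_", name))
spec_number <- function(decimals = 2) {
  structure(list(decimals = decimals), class = "spec_number")
}
spec_percent <- function(decimals = 1) {
  structure(list(decimals = decimals), class = "spec_percent")
}

# First dispatch: over the target (a small, closed set).
render <- function(x, spec, target) UseMethod("render", target)
render.target_html <- function(x, spec, target) render_html(spec, x)
render.target_latex <- function(x, spec, target) render_latex(spec, x)

# Second dispatch: over the spec, one generic per target.
render_html <- function(spec, x) UseMethod("render_html")
render_latex <- function(spec, x) UseMethod("render_latex")

fixed <- function(x, decimals) formatC(x, format = "f", digits = decimals)

render_html.spec_number <- function(spec, x) {
  sub("-", "&minus;", fixed(x, spec$decimals), fixed = TRUE)
}
render_latex.spec_number <- function(spec, x) {
  paste0("$", fixed(x, spec$decimals), "$")
}
render_html.spec_percent <- function(spec, x) {
  paste0(fixed(100 * x, spec$decimals), "%")
}
render_latex.spec_percent <- function(spec, x) {
  paste0(fixed(100 * x, spec$decimals), "\\%")
}

render(-1.5, spec_number(2), target("html"))
render(0.256, spec_percent(1), target("latex"))
```

Adding a new target in this layout means one new render.target_*() method plus one generic with a method per spec, which shows where the maintenance cost of each additional target would land.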

Extensibility

With the proposed approach, it is possible to create a derived spec that supports a new format by wrapping an existing spec. Extending a spec to add detail to how numbers are printed is difficult though.

To effectively support even the initial target formats, we might consider a new DSL that specifies, in a target-agnostic way, how numbers are to be formatted. Ideally we could use a single expression in this DSL to define the format of a number and then render it to all targets using the same expression. Do you feel that such a DSL is feasible for all the formats that gt currently supports?
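
One way to picture such a DSL, purely as a sketch: the spec first produces a target-agnostic description of the number (a few tokens), and each target only supplies a symbol table to turn tokens into text. The token set, the symbols list, and the function names below are assumptions made for this example; a real design would need many more tokens (grouping marks, exponents, currency, and so on) and would have to be vectorised.

```r
# Step 1: target-agnostic description of one formatted number (scalar only).
tokenize_percent <- function(x, decimals = 1) {
  list(
    negative = x < 0,
    digits = formatC(abs(100 * x), format = "f", digits = decimals),
    unit = "percent"
  )
}

# Step 2: each target only knows how to spell the symbols.
symbols <- list(
  plain = list(minus = "-", percent = "%"),
  html = list(minus = "&minus;", percent = "%"),
  latex = list(minus = "$-$", percent = "\\%")
)

render_tokens <- function(tokens, target) {
  sym <- symbols[[target]]
  if (is.null(sym)) stop("unknown target: ", target)
  paste0(
    if (tokens$negative) sym$minus else "",
    tokens$digits,
    sym[[tokens$unit]]
  )
}

tokens <- tokenize_percent(-0.125, decimals = 1)
render_tokens(tokens, "plain")
render_tokens(tokens, "html")
render_tokens(tokens, "latex")
```

The same tokens object is rendered three times here, which is the property the DSL is meant to guarantee: one expression, many targets.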

Back-compatibility

Changes in the formats should not break gt or other packages. We could use the concept of "editions" similarly to what testthat and dbplyr already do.
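
A minimal sketch of how an edition switch might look, assuming a hypothetical option name (demo.fmt_edition) and a hypothetical format_number() helper. The behaviour change between editions (thousands separators in edition 2) is invented for the example; the point is only that existing callers keep edition 1 output until they opt in.

```r
# Hypothetical option; callers opt in to newer behaviour explicitly.
current_edition <- function() getOption("demo.fmt_edition", default = 1L)

format_number <- function(x, decimals = 2, edition = current_edition()) {
  # Edition 1: no digit grouping. Edition 2 (assumed change): group thousands.
  big_mark <- if (edition >= 2L) "," else ""
  formatC(x, format = "f", digits = decimals, big.mark = big_mark)
}

format_number(1234.5, decimals = 1)
#> [1] "1234.5"
format_number(1234.5, decimals = 1, edition = 2L)
#> [1] "1,234.5"
```

A dependent package could set the option once, for example in its own setup code, so that a change in defaults never reaches it unannounced.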

Incremental approach

Keeping the specs in {gt} leads to much quicker results and leaves more room for experimentation. We could implement them here and mark them as experimental, or even avoid exporting them.

2 participants