Supporting unit()s comprehensively in ggplot2 aesthetics #5609

Open
mjskay opened this issue Dec 22, 2023 · 7 comments


@mjskay
Contributor

mjskay commented Dec 22, 2023

The problem

Currently, a big source of fiddliness in ggplot2 arises when you need to manually adjust the positions of objects in the final plot. Often (but not always) this happens during annotation, e.g. positioning labels so that they are nicely and consistently spaced relative to the data and/or to the plot itself. This can be frustrating because if the plot or data dimensions change, manually-positioned elements will also move, because the only way to position them is by supplying values in data units. However, annotation usually needs to be specified in plot units (e.g., points, npc, etc.), or even worse, in a combination of data units and plot units.

That said, {grid} has robust support for unit conversion and combination in the form of grid::unit(). If ggplot2 could specify positions (and even sizes, linewidths, etc.) using grid::unit()s, it would be a lot easier to create charts that look good even when the limits of the underlying x/y scales change or when the dimensions of the plot change.

Even better would be the ability to combine data units and plot units: e.g., to be able to specify something like "put this text label exactly 5 points to the left of this data position". grid::unit() already has a data unit (the "native" unit) that could be used for this purpose. It is somewhat unused in ggplot2 because ggplot2 internally scales everything into c(0,1), so "native" and "npc" are effectively equivalent. That means "native" units could be used to solve this problem...
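For readers less familiar with {grid}, here is a minimal sketch of the behaviour described above, using only existing grid functions (nothing here is part of the proposal). It assumes a 7in null PDF device purely so the conversions are reproducible; the variable names are made up for illustration.

```r
library(grid)

# Draw to a null PDF device so the conversions do not depend on a screen.
pdf(NULL, width = 7, height = 7)

# Units of different types combine into a single compound unit.
inset <- unit(1, "npc") - unit(10, "pt")

# In a viewport with the default xscale of c(0, 1), "native" matches "npc".
pushViewport(viewport())
native_default <- convertX(unit(0.3, "native"), "npc", valueOnly = TRUE)
popViewport()

# With a data-like xscale of c(0, 10), "native" follows the scale instead.
pushViewport(viewport(xscale = c(0, 10)))
native_scaled <- convertX(unit(5, "native"), "npc", valueOnly = TRUE)
# The inset lands 10pt short of the right edge (7in = 7 * 72.27pt wide).
inset_pt <- convertX(inset, "pt", valueOnly = TRUE)
popViewport()

invisible(dev.off())
```

The compound unit is only resolved to a number when it is converted inside a viewport, which is what lets it survive resizing.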

A possible solution for positional aesthetics (draft PR: #5610)

I started playing with the above problem earlier this week, and I think I've come up with something that, with some polish, might be able to solve it without much surgery---at least for positional aesthetics. The solution would also allow extension packages to more-or-less automatically take advantage of it.

The basic idea is to allow unit() vectors containing at most one "native" unit to be assigned to positional aesthetics, and for this native unit to represent the (transformed) data. Then you can do stuff like this:

data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) + 
  geom_point() +
  # a line exactly 5 points lower than `var2`
  geom_line(aes(y = unit(var2, "native") - unit(5, "pt"))) +
  # labels exactly 10 points left of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = unit(var1, "native") - unit(10, "pt"))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text", 
    x = unit(1, "npc") - unit(10, "pt"), 
    y = unit(10, "pt"), 
    label = "some label", vjust = 0, hjust = 1
  )

image

The implementation is a work-in-progress (draft PR: #5610). It hides/exposes units in a way similar to what @teunbrand implemented for "AsIs" objects. It works slightly differently in that when a unit column is hidden, the "native" unit contained within the unit expression is left behind so scales can manipulate it, then the transformed native values replace their corresponding values in the hidden unit expression when it is unhidden later.

I was able to do this without modifying most components of the grammar, except for Coords --- these also need to do some hiding/unhiding of units which cannot be done in ggplot_build. The solution I came up with was to do the hiding/unhiding in Coord$transform() and move the implementation of each Coord's transform to Coord$transform_native(), which means that extension package Coords would continue to work (but without supporting units), and they would need only to change the name of their Coord$transform() functions to Coord$transform_native() to get support for units.

I also ran into some snags with grid::unit() in that it (1) does not directly support being added to data frames (needs an as.data.frame implementation); (2) doesn't support zero-length vectors (I had to construct them manually); and (3) needed some additional vctrs methods to be implemented to work within ggplot2 more easily. I put these in the draft PR (#5610), but these are probably more appropriate for some combination of vctrs and grid.

Finally, I think a subclass of unit() specific to ggplot2 (call it ggunit()) could be useful, as it would allow some simplification of syntax and improved semantics specific to the "only one native-unit component of the unit subexpression" interpretation of units. Specifically, if casting rules are written such that numerics are cast to "native" units when combined with ggunit()s, then instances like unit(var1, "native") above can be replaced with var1, making the code much cleaner:

data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) + 
  geom_point() +
  # a line exactly 5 points lower than `var2`
  geom_line(aes(y = var2 - ggunit(5, "pt"))) +
  # labels exactly 10 points left of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = var1 - ggunit(10, "pt"))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text", 
    x = ggunit(1, "npc") - ggunit(10, "pt"), 
    y = ggunit(10, "pt"), 
    label = "some label", vjust = 0, hjust = 1
  )

Some shortcut functions for commonly-used units, like as_pt(...) and as_npc(...), simplify things further:

data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) + 
  geom_point() +
  # a line exactly 5 points lower than `var2`
  geom_line(aes(y = var2 - as_pt(5))) +
  # labels exactly 10 points left of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = var1 - as_pt(10))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text", 
    x = as_npc(1) - as_pt(10), 
    y = as_pt(10), 
    label = "some label", vjust = 0, hjust = 1
  )

It's worth noting that this all works with coordinate transformations, too:

data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) + 
  geom_point() +
  geom_line() +
  # labels exactly 10 points right of their points, no matter how the
  # plot is resized
  geom_text(aes(label = name, x = var1 + as_pt(10))) +
  # an annotation that is always 10 points inset from the lower right
  annotate("text", 
    x = as_npc(1) - as_pt(10), 
    y = as_pt(10), 
    label = "some label", vjust = 0, hjust = 1
  ) +
  coord_polar()

image

On the slightly crazier side, I also prototyped an implementation of pmin and pmax for units, which makes it easy to say things like "put the label 10 pts left/up from the point, but also make sure it's at least 10 pts from the plot edge":

data.frame(var1 = 1:5, var2 = 1:5, name = letters[1:5]) |>
  ggplot(aes(var1, var2)) + 
  geom_point() +
  geom_text(aes(
    label = name, 
    x = ggunit_pmax(var1 - as_pt(10), as_pt(10)), 
    y = ggunit_pmin(var2 + as_pt(10), as_npc(1) - as_pt(10))
  ))

image

(I don't like the names ggunit_pmin and ggunit_pmax; and possibly it would be better to make pmin/pmax generic)
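For comparison, plain {grid} already ships unit.pmin() and unit.pmax(), which do the same kind of clamping on ordinary units. The sketch below is not the prototype's ggunit_pmin/ggunit_pmax; it only illustrates the idea, assuming a 4in-wide viewport with an x scale of 0 to 10 as a stand-in for a plot panel.

```r
library(grid)

pdf(NULL, width = 7, height = 7)
# A 4in-wide region with an x scale of 0..10 stands in for a plot panel.
pushViewport(viewport(width = unit(4, "in"), xscale = c(0, 10)))

# "10pt left of x = 0, but never closer than 10pt to the left edge".
x_left <- unit.pmax(unit(0, "native") - unit(10, "pt"), unit(10, "pt"))

# "10pt right of x = 10, but never closer than 10pt to the right edge".
x_right <- unit.pmin(unit(10, "native") + unit(10, "pt"),
                     unit(1, "npc") - unit(10, "pt"))

# Resolve both against the current viewport, in points from its left edge.
x_left_pt <- convertX(x_left, "pt", valueOnly = TRUE)
x_right_pt <- convertX(x_right, "pt", valueOnly = TRUE)

popViewport()
invisible(dev.off())
```

In both cases the clamp wins, so the resolved positions sit exactly 10pt inside the left and right edges of the viewport.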

unit in non-positional aesthetics

Getting unit to work in non-positional aesthetics, like size and linewidth, is a bit more complicated. The issue is that the corresponding properties in grid grobs for these aesthetics (e.g. fontsize and lwd) don't take units, so it is necessary to wrap the grid version of the grob to get the desired functionality.

I only did this to geom_point() to test it. Here are some points that are always the same width in data space when resized:

ggplot(data.frame(var1 = 1:5, var2 = c(1,3,3,3,5)), aes(var1, var2)) + 
  geom_point(size = unit(1.33, "native"))

image
image
image

While a bit of meta-programming could make this a straightforward task, it might make more sense to petition to get the underlying grid grobs to fully support units for these properties. Otherwise, extension package developers would not necessarily get these changes "for free", but would have to change to using ggplot's version of each grob. Tagging @pmur002 for thoughts.
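To make the grob-wrapping idea in this section concrete, here is a rough sketch in plain {grid}. It is not the code from the draft PR: the unit_lwd_line class and its lwd_unit field are hypothetical names, and it assumes that lwd = 1 corresponds to 1/96 inch. The unit is resolved in a makeContent() method, which grid calls at draw time.

```r
library(grid)

# Hypothetical constructor: a line whose width is stored as a unit.
unit_lwd_line <- function(x, y, lwd_unit) {
  gTree(x = x, y = y, lwd_unit = lwd_unit, cl = "unit_lwd_line")
}

# makeContent() runs at draw time, inside the grob's viewport, so the unit
# can be resolved against the current size of the drawing region.
makeContent.unit_lwd_line <- function(x) {
  # Assumption: lwd = 1 corresponds to 1/96 inch.
  lwd <- convertWidth(x$lwd_unit, "in", valueOnly = TRUE) * 96
  setChildren(x, gList(linesGrob(x$x, x$y, gp = gpar(lwd = lwd))))
}

pdf(NULL, width = 7, height = 7)
pushViewport(viewport(width = unit(4, "in"), xscale = c(0, 10)))

g <- unit_lwd_line(unit(c(0, 1), "npc"), unit(c(0, 1), "npc"),
                   lwd_unit = unit(1, "native"))

# Call makeContent() by hand to inspect what would be drawn here.
resolved <- makeContent(g)
resolved_lwd <- resolved$children[[1]]$gp$lwd

grid.draw(g)
popViewport()
invisible(dev.off())
```

The downside mentioned above is visible here: every grob type would need its own wrapper like this unless grid itself accepts units in gpar().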

Why I think this should be in ggplot2, not an extension

Fundamentally, annotation is very important to good visualization, and can be very fiddly to do well in ggplot2 currently (in fact, in a study of ggplot2 experts I conducted a while back, this was one of their big pain points; see Sec 5.2.1). Comprehensive support for unit()s could go a long way to making this easier, and I think can only be done within core ggplot2 --- plus, if the solution works well, all (or nearly all) extension packages would then support it too.

@MattCowgill
Contributor

Heavy ggplot2 user here: I would absolutely love functionality like this

@mjskay
Contributor Author

mjskay commented Dec 23, 2023

Playing around more, here are some triangular points that always rest on the line even when resized (not possible without a custom geom currently AFAIK):

set.seed(1234)
data.frame(var1 = c("a","b"), var2 = rnorm(100, c(1,2))) |>
  ggplot(aes(y = var1, x = var2)) +
  stat_summary(fun.data = mean_se, geom = "linerange") +
  stat_summary(
    aes(y = stage(var1, after_stat = y + as_pt(4.5))),
    fun = mean, geom = "point", shape = 25, size = as_pt(5)
  )

image
image

This suggests that perhaps positions should also support units (e.g. so the above could also be done with position_nudge()); the current implementation hides units from positions but that could perhaps be fixed (amount of surgery necessary unknown).

@teunbrand
Collaborator

I agree that this would be very nice to have in principle.
At its core, I think grid's units have too many quirks in terms of their almost-but-not-quite vector-like behaviour.
As #5610 shows, it requires a lot of defensive programming around these quirks.
I think for this to work, there should be some upstream changes in {grid} that make units behave more like vectors and make it straightforward to get and set a unit type (e.g. 'native' unit) component from a compound unit (e.g. 'sum' units).

> unit in non-positional aesthetics

With regards to this, I think mostly the upstream plumbing is the issue, but wrapping gpar() to convert units should be feasible.

@mjskay
Contributor Author

mjskay commented Jan 15, 2024

Thanks for taking a look @teunbrand! I agree an important question is what should live where. Re: non-positional aesthetics, I agree that may be largely {grid}, so perhaps this issue should be split into two issues: one for positional aesthetics and the other for non-positional aesthetics, and to focus on positional aesthetics first.

At this stage I would be curious: what is the sense among ggplot2 maintainers that this feature would be useful, and is it worth me investing additional time working on solutions? Basically: is there a chance of this getting in?

If there is, I'd be happy to continue prototyping, and then come back with a proposal (or maybe a few possible proposals) outlining changes to {ggplot2} and {grid} (and maybe {vctrs}) that would be necessary. I can already foresee different levels of complexity for different solutions, so I would probably propose at least a "minimal" solution and one or two more complex versions with varying tradeoffs. But I'd like to know the time wouldn't be wasted before putting the work in ;). Thanks!

@pmur002
Contributor

pmur002 commented Jan 15, 2024

A separate issue for discussing improvements to {grid} units sounds useful. There are plenty of complications to consider. For example, something like gpar(fontsize=unit(2, "lines")) presents problems of circularity and a thoroughly foolproof guard against that would be challenging. There would also be issues around when to "resolve" units within a gpar(), though there is precedent for this, for example, in the resolution of gradient and pattern fills.
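A small sketch of why the "lines" case is circular, using only existing grid functions: the size of a "lines" unit is itself derived from the current fontsize (and lineheight), so a fontsize given in "lines" would depend on its own value. The helper name lines_in_bigpts is made up for illustration.

```r
library(grid)

pdf(NULL)

# Resolve 2 "lines" to big points (1/72 in) under a given fontsize.
lines_in_bigpts <- function(fontsize) {
  pushViewport(viewport(
    gp = gpar(fontsize = fontsize, lineheight = 1.2, cex = 1)
  ))
  on.exit(popViewport())
  convertHeight(unit(2, "lines"), "bigpts", valueOnly = TRUE)
}

# The same unit resolves differently depending on the fontsize in effect.
at_10 <- lines_in_bigpts(10)
at_20 <- lines_in_bigpts(20)

invisible(dev.off())
```

Doubling the fontsize doubles the resolved height, so there is no fixed value that gpar(fontsize = unit(2, "lines")) could resolve against without some rule about which fontsize comes first.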

@thomasp85
Member

I'd say that there is definitely merit to the idea of providing access to some of grid's unit system for positional aesthetics. However, no amount of hacking or heuristic assumptions on the unit system will get into ggplot2; that is simply too brittle.

I think a good way forward is to nail down what the minimal amount of features would be that can solve this. Is a single absolute unit enough? Then maybe we can avoid a lot of issues by not using grid's unit system at all.

As for unit support in non-positional aesthetics, that will not make it into ggplot2 before it makes it into gpar().

@mjskay
Contributor Author

mjskay commented Jan 18, 2024

> I'd say that there is definitely merit to the idea of providing access to some of grid's unit system for positional aesthetics. However, no amount of hacking or heuristic assumptions on the unit system will get into ggplot2; that is simply too brittle.

Makes sense :)

> I think a good way forward is to nail down what the minimal amount of features would be that can solve this. Is a single absolute unit enough? Then maybe we can avoid a lot of issues by not using grid's unit system at all.

Agreed re: nailing down minimal features. Though I'm not sure what you mean by "is a single absolute unit enough"?

I think the minimal version, conceptually, might be that the user needs some way to transform positional variables (x, y, xmin, ymin, etc) as/into units before the grob is created inside Geom$draw_layer(). This would allow them to specify things like "the x position in data space plus 5 points". The transformation also needs to happen after Coord$transform(), as arbitrary units cannot be properly transformed by non-linear transformations (like polar coordinates) because the dimensions of the drawing region are not known at that point. Because the code inside draw_layer() (or methods it calls) calls down to Coord$transform(), this suggests that Coord$transform() should be where these unit transformations are applied.

That is more or less what the prototype I came up with allows, but in a bit of a convoluted way, by essentially hiding the transformation inside a unit() that is an expression of a single native unit and swapping out the value of that single native unit. But as you (@thomasp85) point out, this is pretty hacky.

So maybe the problem is that we need some way for the user to be able to specify "unit transformations", where a unit transformation is a function (or expression or something) that takes in positional variables from the layer data (x, y, xmin, etc) and returns a new value for a positional variable as a unit(), and is applied after Coord$transform() (or most likely, by Coord$transform() as its final step). Then we'd also need an interface for people to specify unit transformations and a vehicle to deliver them to Coord$transform(), perhaps as a datatype that wraps the original value and the unit function, or perhaps as parallel hidden columns a la @teunbrand's solution for I().
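As a very rough sketch of what a "unit transformation" could look like in isolation, using only {grid} and base R: nothing below is ggplot2 API, and nudge_right_5pt and apply_unit_transforms are hypothetical names standing in for a user-supplied transformation and for the final step of Coord$transform(). It assumes positions have already been rescaled to c(0, 1).

```r
library(grid)

# Hypothetical "unit transformation": takes coord-transformed layer data
# (positions already rescaled to 0..1) and returns a unit for one aesthetic.
nudge_right_5pt <- function(data) unit(data$x, "native") + unit(5, "pt")

# Hypothetical stand-in for the last step of Coord$transform(): apply each
# transformation and keep the results in a list rather than a data frame.
apply_unit_transforms <- function(data, transforms) {
  lapply(transforms, function(fn) fn(data))
}

transformed <- data.frame(x = c(0.1, 0.5, 0.9), y = c(0.2, 0.4, 0.6))
positions <- apply_unit_transforms(transformed, list(x = nudge_right_5pt))

# Resolve the result in a 4in-wide viewport to check where the points land.
pdf(NULL, width = 7, height = 7)
pushViewport(viewport(width = unit(4, "in")))
x_pt <- convertX(positions$x, "pt", valueOnly = TRUE)
popViewport()
invisible(dev.off())
```

Each resolved position is the rescaled x value times the viewport width, plus 5pt, and the 5pt offset stays fixed if the viewport width changes.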

One version of the interface could be an "after_coord" stage, like aes(x = stage(some_var, after_coord = unit(x, "native") + unit(5, "pt"))) which could generate the necessary wrapped data type or a hidden column containing the expression to be executed against the coord-transformed data frame.

Although, the hidden-column solution might not work with direct specification of position values outside of aes() or when using annotate(), which could be a big limitation given the value of unit()s in creating annotations. So, a wrapped pair of original data and unit transformation might work better, though the implementation would probably be more complex.

Finally, if desired (personally I think it makes things much more accessible), a lightweight ggunit() subtype of unit() (not like the hacky one in my prototype) could be created to simplify the above to something like aes(x = stage(some_var, after_coord = x + ggunit(5, "pt"))) or aes(x = stage(some_var, after_coord = x + as_pt(5))), by just implementing consistent coercion rules from a numeric to a ggunit in "native" units.

> As for unit support in non-positional aesthetics, that will not make it into ggplot2 before it makes it into gpar().

👍

5 participants