--- title: "User Guide: 2 Manipulating ggplots" subtitle: "'gginnards' `r packageVersion('gginnards')`" author: "Pedro J. Aphalo" date: "`r Sys.Date()`" output: rmarkdown::html_vignette: toc: yes vignette: > %\VignetteIndexEntry{User Guide: 2 Manipulating ggplots} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include=FALSE, echo=FALSE} library(knitr) opts_chunk$set(fig.align = 'center', fig.show = 'hold', fig.width = 7, fig.height = 4) options(warnPartialMatchArgs = FALSE) ``` ## Introduction The functions described here are not expected to be useful in everyday plotting as when using the _grammar of graphics_ one can simply change the order in which layers are added to a ggplot, or remove unused variables from the data before passing it as argument to the `ggplot()` constructor. However, if one uses high level methods like `autoplot()` or other functions that automatically produce a full plot using 'ggplot2' internally, one may need to add, move or delete layers so as to profit from such canned methods and retain enough flexibility. Some time ago I needed to manipulate the layers of a `ggplot`, and found a [matching question in Stackoverflow](https://stackoverflow.com/questions/13407236/remove-a-layer-from-a-ggplot2-chart). I used the answers found in Stackoverflow as the starting point for writing the functions described in the first part of this vignette. In a `ggplot` object, layers reside in a list, and their positions in the list determine the plotting order when generating the graphical output. The _grammar of graphics_ treats the list of layers as a _stack_ using only _push_ operations. In other words, always the most recently added layer resides at the end of the list, and during rendering over-plots all layers previously added. The functions described in this vignette allow overriding the **normal** syntax at the cost of breaking the expectations of the grammar. These functions are, as told above, to be used only in exceptional cases. This notwithstanding, they are rather easy to use and the user interface is consistent across all of them. Moreover, they are designed to return objects that are identical to objects created using the normal syntax rules of the _grammar of graphics_. The table below list the names and purpose of these functions. Function | Use -------- | ------------------------------------- `delete_layers()` | delete one or more layers `append_layers()` | append layers at a specific position `move_layers()` | move layers to an absolute position `shift_layers()` | move layers to a relative position `which_layers()` | obtain the index positions of layers `extract_layers()` | extract matched or indexed layers `num_layers()` | obtain number of layers `top_layer()` | obtain position of top layer `bottom_layer()` | obtain position of bottom layer Although their definitions do not rely on code internal to 'ggplot2', they rely on the internal structure of objects belonging to class `gg` and `ggplot`. Consequently, long-term backwards and forward compatibility cannot be guaranteed, or even expected. ## Preliminaries ```{r} library(ggplot2) library(gginnards) library(tibble) library(magrittr) library(stringr) eval_pryr <- requireNamespace("pryr", quietly = TRUE) ``` We generate some artificial data and create a data frame with them. ```{r} set.seed(4321) # generate artificial data my.data <- data.frame( group = factor(rep(letters[1:4], each = 30)), panel = factor(rep(LETTERS[1:2], each = 60)), y = rnorm(40), unused = "garbage" ) ``` We add attributes to the data frame with the fake data. ```{r} attr(my.data, "my.atr.char") <- "my.atr.value" attr(my.data, "my.atr.num") <- 12345678 ``` We change the default theme to an uncluttered one. ```{r} old_theme <- theme_set(theme_bw()) ``` We generate a plot to be used later to demonstrate the use of the functions. We ue `expand_limits()` to ensure that the effect of later manipulations is easier to notice. ```{r} p <- ggplot(my.data, aes(group, y)) + geom_point() + stat_summary(fun.data = mean_se, colour = "cornflowerblue", size = 1) + facet_wrap(~panel, scales = "free_x", labeller = label_both) + expand_limits(y = c(-2, 2)) p ``` ## Exploring how ggplots are stored To display summary textual information about a `gg` object we use method `summary()` from package 'ggplot2', while methods `print()` and `plot()` will display the actual plot. ```{r} summary(p) ``` Layers in a ggplot object are stored in a list as nameless members. This means that they have to be accessed using numerical indexes, and that we need to use some indirect way of finding the indexes corresponding to the layers of interest. ```{r} names(p$layers) ``` The output of `summary()` is compact. ```{r} summary(p$layers) ``` The default `print()` method for a list of layers displays only a small part of the information in a layer. ```{r} print(p$layers) ``` To see all the fields, we need to use `str()`, which we use here for a single layer. ```{r} str(p$layers[[1]]) ``` ## Manipulation of plot layers We start by using `which_layers()` as it produces simply a vector of indexes into the list of layers. The third statement is useless here, but demonstrates how layers are selected in all the functions described in this document. We can see that each layer, as described in the first volume of this User Guide, contains one geometry and one statistic. ```{r} which_layers(p, "GeomPoint") which_layers(p, "StatIdentity") which_layers(p, "GeomPointrange") which_layers(p, "StatSummary") which_layers(p, idx = 1L) ``` We can also easily extract matching layers with `extract_layers()`. Here one layer is returned, and displayed using the default `print()` method. Method `str()` can also be used as shown above. ```{r} extract_layers(p, "GeomPoint") ``` With `delete_layers()` we can remove layers from a plot, selecting them using the match to a class, as shown here, or by a positional index as shown next. ```{r} delete_layers(p, "GeomPoint") ``` ```{r} delete_layers(p, idx = 1L) ``` ```{r} delete_layers(p, "StatSummary") ``` With `move_layers()` we can alter the stacking order of layers. The layers to move are selected in the same way as in the examples above, while `position` gives where to move the layers to. Two character strings, `"top"` and `"bottom"` are accepted as `position` argument, as well as `integer`s. In the later case, the layer(s) is/are appended after the supplied position with reference to the list of layers not being moved. ```{r} move_layers(p, "GeomPoint", position = "top") ``` The equivalent operation using a relative position. A positive value for `shift` is interpreted as an upward displacement and a negative one as downwards displacement. ```{r} shift_layers(p, "GeomPoint", shift = +1) ``` Here we show how to add a layer behind all other layers. ```{r} append_layers(p, geom_line(colour = "orange", size = 1), position = "bottom") ``` It is also possible to append the new layer immediately above an arbitrary existing layer using a numeric index, which as shown here can be also obtained by matching to a class name. In this example we insert a new layer in-between two layers already present in the plot. As with the `+` operator of the Grammar of Graphics, `object` also accepts a list of layers as argument (no example shown). ```{r} append_layers(p, object = geom_line(colour = "orange", size = 1), position = which_layers(p, "GeomPoint")) ``` Annotations add layers, so they can be manipulated in the same way as other layers. ```{r} p1 <- p + annotate("text", label = "text label", x = 1.1, y = 0, hjust = 0) p1 ``` ```{r} delete_layers(p1, "GeomText") ``` ## Replacing scales, coordinates, whole themes and data. Elements that are normally _added_ to a ggplot with operator `+`, such as scales, themes, aesthetics can be replaced with the `%+%` operator. The situation with layers is different as a plot may contain multiple layers and layers are nameless. With layers `%+%` is not a replacement operator. ```{r} num_layers(p) num_layers(p %+% geom_point(colour = "blue")) num_layers(p + geom_point(colour = "blue")) ``` ```{r} p1 <- p + theme_bw() p1 p1 + theme_void() p1 %+% theme_void() ``` ## Editing theme elements Method `summary()` is available for themes. ```{r,eval=FALSE} summary(theme_bw()) ``` However, to see the actual values stored, we need to use `str()`. To avoid excessive output we first find the names for the elements of the theme and then look as how the default text settings are stored. ```{r, eval=FALSE} names(theme_bw()) ``` ```{r, eval=FALSE} str(theme_bw()$text) ``` Themes can be modified using `theme()`. See the 'ggplot2' documentation for details. ## Removing unused data The argument passed through `data` to `ggplot()` or a layer is stored in whole in the `ggplot` object, even the data columns not mapped to any aesthetic. In most cases this does not matter, but in the case of huge datasets, the use of RAM and disk space can add up, and occasionally printing of each plot can slow down. The reason for storing the whole data set is that it is always possible to add layers with the grammar of graphics to an existing plot and consequently only the user can know which variables can be removed or not. One obvious way of not storing unused data in `ggplot` objects is for the user to select the required variables and pass only these to the `ggplot()` constructor or layers. A less efficient alternative, but possibly easier to use for some users, is for users to drop the unused variables when they consider that a plot is ready. We show here how to do this, with a function that started as a self-imposed exercise. To simplify the embedded data objects we need to find which variables are mapped to aesthetics and which are not. Here is a naive attempt at handling the possibility of mappings to expressions involving computations and multiple variables per mapping, and facets. This is naive in that it ignores mapping within layers and variables used for faceting. ```{r} mapped.vars <- gsub("[~*\\%^]", " ", as.character(p$mapping)) %>% str_split(boundary("word")) %>% unlist() %>% c(names(p$facet$params$facets)) ``` We need also to find which variables are present in the data. ```{r} data.vars <- names(p$data) ``` Next we identify which variables in `data` are not used, and delete them. ```{r} unused.vars <- setdiff(data.vars, c(mapped.vars)) keep.idxs <- which(!data.vars %in% unused.vars) ``` ```{r} p1 <- p p1$data <- p$data[ , keep.idxs] ``` For a data set this small, removing a single column saves very little space. ```{r} object.size(my.data) object.size(p) object.size(p1) ``` ```{r} names(my.data) names(p$data) names(p1$data) ``` The plot has not changed. ```{r} p1 ``` We can assemble all the code into a function for convenience, and expand the code to also recognize mappings within layers and variables used in faceting. Such a function, only cursorily tested is included in the package as `drop_vars()`. Given its design the most likely failure mode is keeping too many variables rather than removing too many. ```{r} drop_vars(p) ``` When saving `ggplot` objects to disk avoiding to carry along unused data can be beneficial. Of course, removing unused data means that they will not be available at a later time if we want to add more layers to the same saved ggplot object. It was not clear to me when R does make a copy of the data embedded in a `ggplot` object and when not. R's policy is to copy data objects lazily, or only when modified. Does the 'ggplot2' code modify the argument passed to its `data` parameter triggering a real copy operation or not. We can check this with the help of package 'pryr'. ```{r, eval = eval_pryr} pryr::address(my.data) z <- p$data pryr::address(z) ``` In this case, R has not created a copy. So, from the point of view of total memory usage, deleting the unused columns in `p` is not always beneficial. If the object is saved to disk or `my.data` modified in any way after `p` was created a copy of `my.data` will be created at this later time. In this simple example we modify the value of an attribute. ```{r, eval = eval_pryr} attr(my.data, "my.atr.num") <- 1324567 pryr::address(z) pryr::address(my.data) ``` ## Attributes of the embedded data object 'ggplot2' version 3.1.0 and later preserves most attributes of the object passed as argument to the data parameter of the `ggplot()` constructor. The class of the object seems to be modified if it is derived from data frame or tibble, but other attributes are retained in the copy stored in the `gg` object. ```{r} data_attributes(p) ``` Another interesting question is whether these user attributes are copied when data are passed to geometries and statistics. We can find out with `geom_debug_panel()` that they are not. ```{r} p + geom_debug_panel(dbgfun.data = attributes, dbgfun.params = NULL) ``` ## Coda The are many other things that we could explore about ggplot objects, but a package to be submitted to CRAN cannot have too many pages of documentation, so we hope this package and its documentation can serve as a starting point for further exploration.