| Title: | Miscellaneous Extensions to 'ggplot2' |
|---|---|
| Description: | Extensions to 'ggplot2' respecting the grammar of graphics paradigm. Statistics to locate and tag peaks and valleys and to label plots with the equation of a fitted polynomial model by ordinary least squares, major axis, quantile and robust and resistant regression approaches. Line and model equation for Normal mixture models. Labels for P-value, R^2 or adjusted R^2 or information criteria for fitted models; parametric and non-parametric correlation; ANOVA table or summary table for fitted models as plot insets; annotations for multiple pairwise comparisons with adjusted P-values. Model fit classes for which suitable methods are provided by package 'broom' and 'broom.mixed' are supported as well as user-defined wrappers on model fit functions, allowing model selection and conditional labelling. Scales and stats to build volcano and quadrant plots based on outcomes, fold changes, p-values and false discovery rates. |
| Authors: | Pedro J. Aphalo [aut, cre] (ORCID: <https://orcid.org/0000-0003-3385-972X>), Kamil Slowikowski [ctb] (ORCID: <https://orcid.org/0000-0002-2843-6370>), Samer Mouksassi [ctb] (ORCID: <https://orcid.org/0000-0002-7152-6654>) |
| Maintainer: | Pedro J. Aphalo <[email protected]> |
| License: | GPL (>= 2) |
| Version: | 0.7.0.9003 |
| Built: | 2026-06-05 11:31:51 UTC |
| Source: | https://github.com/aphalo/ggpmisc |
Package 'ggpmisc' is over 10 years old, but its development has tracked the changes in 'ggplot2', making it possible to use several new features soon after they became available. Support for additional model fitting functions has been added regularly.
The focus of package 'ggpmisc' is on statistical annotations, providing stats that generate labels useful to annotate plots and matching stats for consistently adding prediction lines and bands. Model fitting is done by calling functions already available in R and other R packages. No new model fit methods or algorithms are implemented; instead, 'ggpmisc' provides new, simpler ways of adding fitted values and other statistics as plot annotations.
Several geometries for annotations from package 'ggpp' are used by default in 'ggpmisc' statistics, with labels formatted by default ready to be parsed into R's plotmath expressions. However, other geometries can also be used. Two variations of Markdown-formatted labels work smoothly with geoms from package 'ggtext' or from package 'marquee'. 'LaTeX'-formatted labels work smoothly with package 'xdvir' and most likely also with other approaches to the use of 'LaTeX' and 'TeX' formatted labels. 'LaTeX'-formatted labels can be generated as bare maths-mode-encoded text, or enclosed in "fences" that enable either in-line or display-maths modes.
The label formatting functions used to implement the statistics and scales are exported and can be used as an aid in building customised labels and scales.
Extensions provided:
Statistics for annotations for parametric and non-parametric correlations.
Statistics for generation of labels for fitted models, including formatted equations. By default labels are R's plotmath expressions but LaTeX, markdown and plain text formatted labels are optionally returned.
Matching statistics for plotting curves and confidence bands for the same fitted models.
Statistics for adding ANOVA tables and fitted model summaries as inset tables in plots.
Statistic for adding annotations for multiple pairwise comparisons, based on arbitrary contrasts and a choice of P-value adjustment methods.
Statistics for locating and tagging "peaks" and "valleys" (local or global maxima and minima) and spikes (very narrow peaks or valleys).
Access to functions and objects exported by package ggpp.
The signatures of stat_peaks() and stat_valleys() from
'ggpmisc' are nearly identical to those of stat_peaks() and
stat_valleys() from package 'ggspectra'. While those from 'ggpmisc'
are designed for numeric or time objects mapped to the x aesthetic,
those from 'ggspectra' are for light spectra and expect a numeric variable
describing wavelength mapped to the x aesthetic.
Maintainer: Pedro J. Aphalo (ORCID)
Authors:
Pedro J. Aphalo (ORCID)
Other contributors:
Kamil Slowikowski (ORCID) [contributor]
Samer Mouksassi (ORCID) [contributor]
Useful links:
Report bugs at https://github.com/aphalo/ggpmisc/issues

    ggplot(lynx, as.numeric = FALSE) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_peaks(geom = "text", colour = "red", angle = 66,
                 hjust = -0.1, x.label.fmt = "%Y") +
      ylim(NA, 8000)

    formula <- y ~ poly(x, 2, raw = TRUE)
    ggplot(cars, aes(speed, dist)) +
      geom_point() +
      stat_poly_line(formula = formula) +
      stat_poly_eq(use_label("eq", "R2", "P"), formula = formula, parse = TRUE) +
      labs(x = expression("Speed, "*x~("mph")),
           y = expression("Stopping distance, "*y~("ft")))

    formula <- y ~ x
    ggplot(PlantGrowth, aes(group, weight)) +
      stat_summary(fun.data = "mean_se") +
      stat_fit_tb(method = "lm",
                  method.args = list(formula = formula),
                  tb.type = "fit.anova",
                  tb.vars = c(Term = "term", "df", "M.S." = "meansq",
                              "italic(F)" = "statistic",
                              "italic(p)" = "p.value"),
                  tb.params = c("Group" = 1, "Error" = 2),
                  table.theme = ttheme_gtbw(parse = TRUE)) +
      labs(x = "Group", y = "Dry weight of plants") +
      theme_classic()

Replace NULL output.type based on geom and validate
other values. Convert synonyms and change malformed input into
lower case.

    check_output_type(
      output.type,
      geom = "text",
      supported.types = c("expression", "text", "markdown", "marquee", "numeric",
                          "latex", "latex.eqn", "latex.deqn")
    )

| Argument | Description |
|---|---|
| output.type | character User-set argument or default from stat. |
| geom | character The name of the geom that will be used to render the labels. |
| supported.types | character vector of accepted values for user input. |
If output.type is NULL, a suitable value based on the
name of the geom is returned, defaulting to "expression". If not
NULL, the value is validated against supported.types and returned after converting synonyms and normalising case.
Character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently or, in some cases, is not applied at all.
"expression": The labels are encoded as character strings to be parsed into R's plotmath expressions.
"LaTeX", "TeX", "tikz", "latex": The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into maths mode.
"latex.eqn": Same as "latex" but enclosed in single $, i.e., as in-line maths.
"latex.deqn": Same as "latex" but enclosed in double $$, i.e., as display maths.
"markdown": The labels are encoded as character strings using markdown syntax, with some embedded HTML.
"marquee": The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans.
"text": The labels are plain ASCII character strings.
"numeric": No labels are generated. This value is accepted by the statistics, but not by the label formatting functions.
NULL: The value used depends on the argument passed to geom.
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.

    check_output_type(NULL)
    check_output_type("text")
    check_output_type(NULL, geom = "text")
    check_output_type(NULL, geom = "latex")

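The selection rules described for NULL and for user-supplied values can be sketched in base R. This is an illustrative approximation, not the package's code: the function name sketch_output_type() is hypothetical, and the treatment of "tex" and "tikz" as synonyms of "latex" is an assumption drawn from the list of accepted values.

```r
# Hypothetical helper; a sketch of the documented rules, not ggpmisc code.
sketch_output_type <- function(output.type = NULL,
                               geom = "text",
                               supported.types =
                                 c("expression", "text", "markdown", "marquee",
                                   "numeric", "latex", "latex.eqn",
                                   "latex.deqn")) {
  if (is.null(output.type)) {
    # Default chosen from the name of the geom; "expression" is the fallback.
    return(switch(geom,
                  latex = "latex.eqn",
                  richtext = ,
                  textbox = "markdown",
                  marquee = "marquee",
                  "expression"))
  }
  # Assumption: malformed case is repaired and synonyms mapped to "latex".
  output.type <- tolower(output.type)
  if (output.type %in% c("tex", "tikz")) {
    output.type <- "latex"
  }
  if (!output.type %in% supported.types) {
    stop("Unsupported 'output.type': ", output.type)
  }
  output.type
}

sketch_output_type(NULL, geom = "richtext")
sketch_output_type("LaTeX")
```

The fall-through form `richtext = ,` in switch() lets both 'ggtext' geoms share one result without repeating it.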
Analyse a model formula to determine if it describes a polynomial with terms in order of increasing powers, and fulfils the expectations of the algorithm used to generate the equation-label.

    check_poly_formula(
      formula,
      x.name = "x",
      warn.incr.poly.text = "'formula' not an increasing polynomial: 'eq.label' set to NA!",
      warn.transf.rhs.txt = paste0("rhs includes transformations requiring an argument for ",
                                   "'eq.x.rhs': 'eq.label' set to NA!."),
      warn.transf.lhs.txt = paste0("lhs includes transformations requiring an argument for ",
                                   "'eq.with.lhs': 'eq.label' set to NA!."),
      warn.as.is.txt = paste0("Power (^) terms in model formula of a polynomial need to ",
                              "be protected by 'I()': 'eq.label' set to NA!."),
      warn.poly.raw.txt = paste0("'poly()' in model formula has to be passed 'raw = TRUE': ",
                                 "'eq.label' set to NA!"),
      stop.pow.poly.text = "Both 'poly()' and power (^) terms in model formula.",
      check.transf.rhs = TRUE,
      check.transf.lhs = TRUE
    )

| Argument | Description |
|---|---|
| formula | A model formula in |
| x.name | character The name of the explanatory variable in the formula. |
| warn.incr.poly.text, warn.transf.lhs.txt, warn.transf.rhs.txt, warn.as.is.txt, warn.poly.raw.txt, stop.pow.poly.text | character Text for warnings and errors. |
| check.transf.rhs, check.transf.lhs | logical flag enabling test for transformation of variables. |
The assumption is that this function will be called from within a
ggplot2-compatible layer function, that model formulas will always have
a single explanatory variable, and that the variables will be x and y. Its
behaviour is undefined or erroneous in other cases.
This validation check could return false positive or false negative
results with some formulas, as it is difficult to test, or even to list, all
possible variations of supported vs. unsupported formulas. In addition,
many valid model formulas that can be
successfully fitted are not correctly converted into character labels.
Thus, this function triggers a warning in case of failure, not an error,
and returns a logical value. If this value is FALSE, the statistics in
'ggpmisc' skip the generation of an equation label, setting it to
NA. However, if the formula is accepted by the model fit function,
other labels and the numeric estimates of the fitted coefficients remain
usable. The stats can also be used with models that are not polynomials or
that contain transformations.
Model formulas with and without an intercept term are accepted as valid, as
+0, -1 and +1 are accepted. If a single as.is
power term is included or if arithmetic (sqrt(), exp(),
log()), or trigonometric functions (cos(), sin(),
tan(), etc.) are encountered a warning is issued about the need to
pass a matching argument to parameter eq.x.rhs of the statistic.
If two or more terms are as.is (I( ) protected) powers
(^), they are expected to be in increasing order with no missing
intermediate power terms. If poly() is used in the model formula, a
single term is expected. When calling function poly(),
raw = TRUE must be passed to obtain suitable estimates for the
fitted coefficients, and this is also checked.
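The requirement for raw = TRUE can be illustrated with base R alone. The sketch below, which uses the built-in cars data as an arbitrary example, shows that raw and orthogonal polynomials give the same fitted values but different coefficient estimates, so only the raw form yields coefficients that can be written directly into an equation in powers of x.

```r
# Same quadratic model specified in three ways.
fit_raw  <- lm(dist ~ poly(speed, 2, raw = TRUE), data = cars)
fit_asis <- lm(dist ~ speed + I(speed^2), data = cars)
fit_orth <- lm(dist ~ poly(speed, 2), data = cars)

# Raw polynomial and I()-protected powers give the same estimates.
same_raw_asis <- isTRUE(all.equal(unname(coef(fit_raw)),
                                  unname(coef(fit_asis))))

# Orthogonal polynomial coefficients refer to a different basis.
same_orth_asis <- isTRUE(all.equal(unname(coef(fit_orth)),
                                   unname(coef(fit_asis))))

# The fitted curve is nevertheless the same in all cases.
same_fitted <- isTRUE(all.equal(unname(fitted(fit_raw)),
                                unname(fitted(fit_orth))))

c(same_raw_asis, same_orth_asis, same_fitted)
```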
When the formula rhs contains more than one power term, all power
terms defined using ^ must be protected as "as.is"
I(), as otherwise they are not powers but instead part of the
formula specification.
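The reason for protecting powers with I() can be seen by inspecting how R itself expands a formula. In this base R sketch, terms() treats an unprotected ^ as a formula operator, so the quadratic term silently disappears.

```r
# Unprotected power: '^' is interpreted as term crossing, and x^2 reduces to x.
labels_bare <- attr(terms(y ~ x + x^2), "term.labels")

# Protected power: I() makes '^' an arithmetic operator on the values of x.
labels_asis <- attr(terms(y ~ x + I(x^2)), "term.labels")

labels_bare
labels_asis
```

With the unprotected formula a model fit would succeed, but it would be a straight-line fit rather than the intended second degree polynomial.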
If the warning text is NULL or character(0) no warning is
issued, but the test is done. In contrast, check.transf.rhs =
FALSE and check.transf.lhs = FALSE skip these two tests. The caller
always receives a length-1 logical as returned value.
A logical, TRUE if the formula describes an increasing polynomial suitable for conversion into a text label, and FALSE otherwise. When validation fails, warnings are issued describing the problem encountered.

    # polynomials
    check_poly_formula(y ~ 1)
    check_poly_formula(y ~ x)
    check_poly_formula(y ~ x^3)
    check_poly_formula(y ~ x + 0)
    check_poly_formula(y ~ x - 1)
    check_poly_formula(y ~ x + 1)
    check_poly_formula(y ~ x + I(x^2))
    check_poly_formula(y ~ 1 + x + I(x^2))
    check_poly_formula(y ~ x + I(x^2) + I(x^3))
    check_poly_formula(y ~ I(x) + I(x^2) + I(x^3))

    # transformations on x, first degree polynomials
    check_poly_formula(y ~ sqrt(x))
    check_poly_formula(y ~ log(x))
    check_poly_formula(y ~ I(x^2))

    # incomplete or terms in decreasing/mixed order
    check_poly_formula(y ~ I(x^2) + x)
    check_poly_formula(y ~ I(x^2) + I(x^3))
    check_poly_formula(y ~ I(x^2) + I(x^4))
    check_poly_formula(y ~ x + I(x^3) + I(x^2))

    # polynomials using poly()
    check_poly_formula(y ~ poly(x, 2, raw = TRUE)) # label o.k.
    check_poly_formula(y ~ poly(x, 2)) # orthogonal polynomial -> bad label

coef is a generic function which extracts model coefficients from
objects returned by modeling functions. coefficients is an alias for
it.

    ## S3 method for class 'lmodel2'
    coef(object, method = "MA", ...)

| Argument | Description |
|---|---|
| object | a fitted model object. |
| method | character One of the methods available in |
| ... | ignored by this method. |
Function lmodel2() from package 'lmodel2' returns a fitted
model object of class "lmodel2" which differs from that returned by
lm(). Here we implement a coef() method for objects of this
class. It differs from the generic method and that for lm objects in having
an additional formal parameter method that must be used to select
which of the methods supported by lmodel2() the extracted estimates
are based on. The returned object is identical in its structure to that
returned by coef.lm().
A named numeric vector of length two.
Uses a vector of coefficients from a model fit of a polynomial to build the fitted model equation with embedded coefficient estimates.

    coefs2poly_eq(
      coefs,
      coef.digits = 3L,
      coef.keep.zeros = TRUE,
      decreasing = getOption("ggpmisc.decreasing.poly.eq", FALSE),
      eq.x.rhs = "x",
      lhs = "y~`=`~",
      output.type = "expression",
      decimal.mark = "."
    )

| Argument | Description |
|---|---|
| coefs | numeric Terms always sorted by increasing powers. |
| coef.digits | integer |
| coef.keep.zeros | logical This flag refers to trailing zeros. |
| decreasing | logical It specifies the order of the terms in the returned character string; in increasing (default) or decreasing powers. |
| eq.x.rhs | character |
| lhs | character |
| output.type | character One of "expression", "latex", "tex", "text", "tikz", "markdown", "marquee". |
| decimal.mark | character |
A character string.
Terms with zero-valued coefficients are dropped from the polynomial.

    coefs2poly_eq(c(1, 2, 0, 4, 5, 2e-5))
    coefs2poly_eq(c(1, 2, 0, 4, 5, 2e-5), output.type = "latex")
    coefs2poly_eq(0:2)
    coefs2poly_eq(0:2, decreasing = TRUE)
    coefs2poly_eq(c(1, 2, 0, 4, 5), coef.keep.zeros = TRUE)
    coefs2poly_eq(c(1, 2, 0, 4, 5), coef.keep.zeros = FALSE)

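The idea of assembling an equation from coefficients sorted by increasing powers can be sketched in base R. The function sketch_poly_eq() below is hypothetical and much simpler than coefs2poly_eq(): it returns plain text only, and its formatting of numbers and spacing are assumptions made for illustration.

```r
# Hypothetical helper; a plain-text sketch, not the ggpmisc implementation.
sketch_poly_eq <- function(coefs, digits = 3, decreasing = FALSE) {
  powers <- seq_along(coefs) - 1L        # coefs[1] is the intercept
  keep <- coefs != 0                     # drop zero-valued terms
  coefs <- coefs[keep]
  powers <- powers[keep]
  if (length(coefs) == 0L) {
    return("y = 0")
  }
  if (decreasing) {
    coefs <- rev(coefs)
    powers <- rev(powers)
  }
  # Format each magnitude separately so widths are not padded to match.
  mag <- vapply(abs(coefs),
                function(z) format(signif(z, digits)),
                character(1))
  x.part <- ifelse(powers == 0L, "",
                   ifelse(powers == 1L, " x", paste0(" x^", powers)))
  terms <- paste0(mag, x.part)
  signs <- ifelse(coefs < 0, "-", "+")
  eq <- paste0(if (signs[1] == "-") "-" else "", terms[1])
  if (length(terms) > 1L) {
    eq <- paste(eq, paste(signs[-1], terms[-1], collapse = " "))
  }
  paste("y =", eq)
}

sketch_poly_eq(c(1, 2, 0, 4))
sketch_poly_eq(c(1, 2, 0, 4), decreasing = TRUE)
sketch_poly_eq(c(-1, 0.5))
```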
Computes confidence intervals for one or more parameters in a fitted model. This is a method for objects inheriting from class "lmodel2".

    ## S3 method for class 'lmodel2'
    confint(object, parm, level = 0.95, method = "MA", ...)

| Argument | Description |
|---|---|
| object | a fitted model object. |
| parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
| level | the confidence level required. Currently only 0.95 accepted. |
| method | character One of the methods available in |
| ... | ignored by this method. |
Function lmodel2() from package 'lmodel2' returns a fitted
model object of class "lmodel2" which differs from that returned by
lm(). Here we implement a confint() method for objects of
this class. It differs from the generic method and that for lm objects in
having an additional formal parameter method that must be used to
select estimates based on which of the methods supported by
lmodel2() are to be extracted. The returned object is identical in
its structure to that returned by confint.lm().
A data frame with two rows and three columns.
These functions find peaks (maxima) and valleys (minima) in a numeric vector,
using a user selectable span and global and local size thresholds, returning
a logical vector.

    find_peaks(
      x,
      global.threshold = NULL,
      local.threshold = NULL,
      local.reference = "median",
      threshold.range = NULL,
      span = 3,
      strict = FALSE,
      na.rm = FALSE
    )

    find_valleys(
      x,
      global.threshold = NULL,
      local.threshold = NULL,
      local.reference = "median",
      threshold.range = NULL,
      span = 3,
      strict = FALSE,
      na.rm = FALSE
    )

| Argument | Description |
|---|---|
| x | numeric vector. |
| global.threshold | numeric A value belonging to class |
| local.threshold | numeric A value belonging to class |
| local.reference | character One of |
| threshold.range | numeric vector If of length 2 or a longer vector |
| span | odd positive integer A peak is defined as an element in a sequence which is greater than all other elements within a moving window of width |
| strict | logical flag: if |
| na.rm | logical indicating whether |
As find_valleys, stat_peaks and stat_valleys
call find_peaks to search for peaks or valleys, this description
applies to all four functions.
Function find_peaks is a wrapper built on function
peaks from splus2R. It adds support for peak
height thresholds and handles span = NULL and non-finite (including
NA) values differently than splus2R::peaks. Instead of giving an
error when na.rm = FALSE and x contains NA values,
NA values are replaced with the smallest finite value in x.
span = NULL is treated as a special case and selects max(x).
Passing strict = TRUE ensures that multiple global and within-window
maxima are ignored, and can result in no peaks being returned.
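The moving-window definition of a peak can be sketched in a few lines of base R. This is not the algorithm used by splus2R::peaks or find_peaks: the function sketch_find_peaks() is hypothetical, and both the truncation of windows at the ends of the vector and the tie rule used when strict = FALSE are simplifying assumptions.

```r
# Hypothetical helper; a naive sketch of window-based peak detection.
sketch_find_peaks <- function(x, span = 3, strict = FALSE) {
  stopifnot(span >= 3, span %% 2 == 1)
  half <- (span - 1) %/% 2
  n <- length(x)
  is_peak <- logical(n)
  for (i in seq_len(n)) {
    # Window centred on i; assumed to be truncated at the vector ends.
    idx <- setdiff(max(1L, i - half):min(n, i + half), i)
    others <- x[idx]
    is_peak[i] <- if (strict) {
      all(x[i] > others)    # ties disqualify the candidate
    } else {
      all(x[i] >= others)   # ties are accepted (assumption)
    }
  }
  is_peak
}

x <- c(1, 3, 2, 5, 4, 4, 1)
which(sketch_find_peaks(x, span = 3))
which(sketch_find_peaks(x, span = 3, strict = TRUE))
```

In this toy vector the plateau of two equal values is tagged only in the non-strict case, which mirrors the documented effect of strict = TRUE on multiple within-window maxima.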
Two tests make it possible to ignore irrelevant peaks. One test
(global.threshold) is based on the absolute height of the peaks and
can be used in all cases to ignore globally low peaks. A second test
(local.threshold) is available when the window defined by 'span'
does not include all observations and can be used to ignore peaks that are
not locally prominent. In this second approach the height of each peak is
compared to a summary computed from other values within the window of width
equal to span where it was found. In this second case, the reference
value used within each window containing a peak is given by
local.reference. Parameter threshold.range determines how the
bare numeric values passed as argument to global.threshold
and local.threshold are scaled. The default, NULL uses the
range of x. Thresholds for ignoring too small peaks are applied
after peaks are searched for, and threshold values can in some cases result
in no peaks being found. If either threshold is not available (NA)
the returned value is a NA vector of the same length as x.
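How a relative global threshold might be applied after peaks have been located can be sketched as follows. The scaling rule used here, the minimum of the range plus the threshold times the width of the range, is an assumption about what "scaled" means and should be checked against the package before relying on it. The function name sketch_global_filter() is hypothetical.

```r
# Hypothetical helper; the scaling rule is an assumption, not ggpmisc code.
sketch_global_filter <- function(x, is_peak, global.threshold,
                                 threshold.range = NULL) {
  if (is.null(threshold.range)) {
    threshold.range <- range(x, finite = TRUE)  # default: range of x
  }
  # Convert the relative threshold into a cut-off in the units of x.
  cutoff <- threshold.range[1] + global.threshold * diff(threshold.range)
  is_peak & x >= cutoff
}

x <- c(0, 10, 2, 4, 1)
is_peak <- c(FALSE, TRUE, FALSE, TRUE, FALSE)  # peaks located beforehand
kept <- sketch_global_filter(x, is_peak, global.threshold = 0.5)
which(kept)
```

Because filtering happens after the search, a stringent threshold can discard every candidate, consistent with the statement that thresholds can result in no peaks being found.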
The local.threshold argument is used as is when
local.reference is "median" or "farthest", i.e., the
same distance between peak and reference is used as cut-off irrespective of
the value of the reference. In cases when the prominence of peaks is
positively correlated with the baseline, a local.threshold that
increases together with the within-window median or
farthest value applies a less stringent height requirement in regions
with overall low height. In this case, natural logarithm or square root
weighting can be requested by passing "median.log",
"farthest.log", "median.sqrt" or "farthest.sqrt" as argument to
local.reference.
A vector of logical values of the same length as x. Values
that are TRUE correspond to local peaks in vector x and can be used
to extract the rows corresponding to peaks from a data frame.
The default for parameter strict is FALSE in functions
find_peaks() and find_valleys(), while it is
strict = TRUE in peaks.
Other peaks and valleys functions:
find_spikes()

    # lynx is a time.series object
    lynx_num.df <-
      try_tibble(lynx,
                 col.names = c("year", "lynx"),
                 as.numeric = TRUE) # years -> as numeric

    which(find_peaks(lynx_num.df$lynx, span = 5))
    which(find_valleys(lynx_num.df$lynx, span = 5))
    lynx_num.df[find_peaks(lynx_num.df$lynx, span = 5), ]
    lynx_num.df[find_peaks(lynx_num.df$lynx, span = 51), ]
    lynx_num.df[find_peaks(lynx_num.df$lynx, span = NULL), ]
    lynx_num.df[find_peaks(lynx_num.df$lynx,
                           span = 15,
                           global.threshold = 2/3), ]
    lynx_num.df[find_peaks(lynx_num.df$lynx,
                           span = 15,
                           global.threshold = I(4000)), ]
    lynx_num.df[find_peaks(lynx_num.df$lynx,
                           span = 15,
                           local.threshold = 0.5), ]

Find spikes in a numeric vector using the algorithm of Whitaker and Hayes (2018). Spikes are values in spectra that are unusually high or low compared to neighbours. They are usually individual values or very short runs of similar "unusual" values. Spikes caused by cosmic radiation are a frequent problem in Raman spectra. Another source of spikes are "hot pixels" in CCD and diode arrays. Other kinds of accidental "outliers" can be also detected.

    find_spikes(
      x,
      x.is.delta = FALSE,
      height.threshold = 10,
      z.threshold = 5,
      k = 20,
      spike.direction = "both",
      na.rm = FALSE
    )

| Argument | Description |
|---|---|
| x | numeric vector containing the data. |
| x.is.delta | logical Flag indicating whether |
| height.threshold | numeric The minimum height of spikes expressed relative to the median amplitude of the baseline local variation of |
| z.threshold | numeric Modified local |
| k | integer width of median window used for smoothing; must be odd |
| spike.direction | character One of |
| na.rm | logical indicating whether |
Spikes are detected based on a modified z-score calculated
from the differenced spectrum. The threshold used should be
adjusted to the characteristics of the input and desired sensitivity. The
lower the threshold, the more sensitive the test becomes, with smaller
spikes being detected.
The algorithm uses running differences to detect abrupt changes in value,
compared to an estimate of the baseline variation of the differences,
approximating the baseline variation from the MAD and the baseline value from the
median of the differences. Currently, a single estimate of MAD is used, while running
medians are used as baseline when possible. This comparison detects running
differences that are unusually large, in most cases signalling a transition
between values near the baseline and far from it, in both directions.
Transitions into- and out of spikes are distinguished based on the median of the non-differenced values, as a descriptor of the data baseline. As for the median of the differences, a running median is used when possible.
This function thus detects the start and end of each spike, and distinguishes upward and downward spikes.
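The core of the approach, a modified z-score computed on running differences, can be sketched in base R. This is a simplified illustration and not the code in find_spikes(): it uses a single median and a single MAD for the whole vector, omits the height test and the running-median baseline, and the name sketch_spike_z() is hypothetical. The constant 0.6745 is the usual scaling factor for modified z-scores.

```r
# Hypothetical helper; a simplified sketch of the differencing step only.
sketch_spike_z <- function(x) {
  d <- diff(x)                           # running differences
  med_d <- median(d)
  mad_d <- median(abs(d - med_d))        # unscaled MAD; must not be zero
  0.6745 * (d - med_d) / mad_d           # modified z-score of each difference
}

x <- c(10, 11, 10, 12, 11, 60, 11, 10, 12, 11)  # one spike at position 6
z <- sketch_spike_z(x)
flagged <- which(abs(z) > 5)
flagged
```

The two flagged differences are the steps into and out of the single spike, which is why a further step based on the non-differenced values is needed to tell the start of a spike from its end.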
k is the width in number of observations of the window used for
running median smoothing to extract the baseline. A value several times the
width of the broader spike but narrow enough to track broader peaks needs
to be manually set in most cases.
With na.rm = TRUE, NA values are omitted before searching for
spikes and set to 0L in the returned vector.
If all spikes are guaranteed to be one observation-wide and either going up
or down from the baseline, it is possible to detect them based purely on
the z.threshold by passing height.threshold = NA and either
spike.direction = "up" or spike.direction = "down", which
ensures very fast computation.
An integer vector of the same length as x. Values are
0, +1 or -1, corresponding to no spike, upward spike
and downward spike in the data, respectively. Conversion to logical with
as.logical() results in a vector with TRUE for spikes and
FALSE otherwise.
Whitaker, D. A.; Hayes, K. (2018) A simple algorithm for despiking Raman spectra. Chemometrics and Intelligent Laboratory Systems, 179, 82-84. doi:10.1016/j.chemolab.2018.06.009.
Other peaks and valleys functions:
find_peaks()
Methods implemented in package 'broom' to tidy, glance and augment the output
from model fits return a consistently organized tibble with generic column
names. Although this simplifies later steps in the data analysis and
reporting, it drops key information needed for interpretation.
keep_tidy() makes it possible to retain fields from the model fit
object passed as argument to parameter x in the attribute "fm".
The class of x is always stored, and by default also fields
"call", "terms", "formula", "fixed" and
"random" if available.

    keep_tidy(x, ..., to.keep = c("call", "terms", "formula", "fixed", "random"))

    keep_glance(x, ..., to.keep = c("call", "terms", "formula", "fixed", "random"))

    keep_augment(
      x,
      ...,
      to.keep = c("call", "terms", "formula", "fixed", "random")
    )

| Argument | Description |
|---|---|
| x | An object for which |
| ... | Other named arguments passed along to |
| to.keep | character vector of field names in |
Functions keep_tidy(), keep_glance() and
keep_augment() are simple wrappers of the generic methods which make
it possible to add to the returned values an attribute named "fm"
preserving user-selected fields and the class of the model fit object. Field
names in to.keep missing in x are silently ignored.
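The wrapper pattern described here can be sketched without 'broom'. In this base R illustration the hypothetical function sketch_keep() takes the summarising function as an argument (coef() stands in for tidy()), and the layout of the "fm" attribute is an assumption made for the example.

```r
# Hypothetical helper; shows the attribute mechanism, not ggpmisc code.
sketch_keep <- function(x, FUN,
                        to.keep = c("call", "terms", "formula",
                                    "fixed", "random")) {
  z <- FUN(x)
  # Fields absent from the model fit object are silently skipped.
  present <- intersect(to.keep, names(x))
  attr(z, "fm") <- c(list(class = class(x)), unclass(x)[present])
  z
}

mod <- lm(dist ~ speed, data = cars)
out <- sketch_keep(mod, FUN = coef)

attr(out, "fm")[["class"]]
names(attr(out, "fm"))
```

An "lm" object has "call" and "terms" members but none named "formula", "fixed" or "random", so only the first two are carried over together with the class.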

    # these examples can only be run if package 'broom' is available

    if (requireNamespace("broom", quietly = TRUE)) {

      library(broom)

      mod <- lm(mpg ~ wt + qsec, data = mtcars)

      attr(keep_tidy(mod), "fm")[["class"]]
      attr(keep_glance(mod), "fm")[["class"]]
      attr(keep_augment(mod), "fm")[["class"]]

      attr(keep_tidy(summary(mod)), "fm")[["class"]]

      library(MASS)
      rmod <- rlm(mpg ~ wt + qsec, data = mtcars)
      attr(keep_tidy(rmod), "fm")[["class"]]

    }

Convert numeric ternary outcomes into a factor

    outcome2factor(x, n.levels = 3L)

    threshold2factor(x, n.levels = 3L, threshold = 0)

| Argument | Description |
|---|---|
| x | a numeric vector of -1, 0, and +1 values, indicating down-regulation, uncertain response or up-regulation, or a numeric vector that can be converted into such values using a pair of thresholds. |
| n.levels | numeric Number of levels to create, either 3 or 2. |
| threshold | numeric vector Range enclosing the values to be considered uncertain. |
These functions convert the numerically encoded values into a factor
with the three levels "down", "uncertain" and "up", or
into a factor with the two levels "de" and "uncertain", as expected by
default by scales scale_colour_outcome,
scale_fill_outcome and scale_shape_outcome.
When n.levels = 2 both -1 and +1 are merged into the same level of the
factor, with label "de".
These are convenience functions that only save some typing. The same
result can be achieved by a direct call to factor and
comparisons. These functions aim at making it easier to draw volcano and
quadrant plots.
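The direct calls to factor() and comparisons mentioned here could look as follows in base R. The order of the factor levels and the treatment of values exactly equal to a threshold are assumptions, so the results may differ in those details from what outcome2factor() and threshold2factor() return.

```r
# Outcomes already encoded as -1, 0, +1.
x <- c(-1, 1, 0, 1)
f3 <- factor(x, levels = c(-1, 0, 1),
             labels = c("down", "uncertain", "up"))

# Two levels: both directions merged into "de".
f2 <- factor(ifelse(x == 0, "uncertain", "de"),
             levels = c("de", "uncertain"))

# Continuous values converted with a pair of thresholds (assumed exclusive).
z <- c(-0.1, -2, 0, 5)
threshold <- c(-1, 1)
outcome <- ifelse(z < threshold[1], -1,
                  ifelse(z > threshold[2], 1, 0))
f.thr <- factor(outcome, levels = c(-1, 0, 1),
                labels = c("down", "uncertain", "up"))

f3
f2
f.thr
```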
Other Functions for quadrant and volcano plots:
FC_format(),
scale_colour_outcome(),
scale_shape_outcome(),
scale_y_Pvalue(),
xy_outcomes2factor()
Other scales for omics data:
scale_colour_logFC(),
scale_shape_outcome(),
scale_x_logFC(),
xy_outcomes2factor()
outcome2factor(c(-1, 1, 0, 1))
outcome2factor(c(-1, 1, 0, 1), n.levels = 2L)
threshold2factor(c(-0.1, -2, 0, +5))
threshold2factor(c(-0.1, -2, 0, +5), n.levels = 2L)
threshold2factor(c(-0.1, -2, 0, +5), threshold = c(-1, 1))
These functions format numeric values as character labels including the symbol for statistical parameter estimates suitable for adding to plots. The labels can be formatted as strings to be parsed as plotmath expressions, or encoded using LaTeX or Markdown.
plain_label(value, value.name, digits = 3, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
italic_label(value, value.name, digits = 3, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
bold_label(value, value.name, digits = 3, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
p_value_label(value,
  small.p = getOption("ggpmisc.small.p", default = FALSE),
  subscript = "", superscript = "", digits = 4, fixed = NULL,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
f_value_label(value, df1 = NULL, df2 = NULL, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
t_value_label(value, df = NULL, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
z_value_label(value, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
S_value_label(value, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
mean_value_label(value, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
var_value_label(value, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
sd_value_label(value, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
se_value_label(value, digits = 4, fixed = FALSE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
r_label(value, method = "pearson",
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  digits = 3, fixed = TRUE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
rr_label(value,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  digits = 3, pc.out = FALSE, fixed = TRUE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
adj_rr_label(value,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  digits = 3, pc.out = FALSE, fixed = TRUE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
rr_ci_label(value, conf.level,
  range.brackets = c("[", "]"), range.sep = NULL,
  digits = 2, fixed = TRUE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
r_ci_label(value, conf.level,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  range.brackets = c("[", "]"), range.sep = NULL,
  digits = 2, fixed = TRUE,
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = "."))
| Argument | Description |
|---|---|
| value | numeric vector The value of the estimate(s), accepted vector length depends on the function. |
| value.name | character The symbol used to represent the value, or its name. |
| digits | integer Number of digits to which numeric values are formatted. |
| fixed | logical Interpret |
| output.type | character One of "expression", "latex", "tex", "text", "tikz", "markdown", "marquee". |
| decimal.mark | character Defaults to the value of R option |
| small.p, small.r | logical If |
| subscript, superscript | character Text for a subscript and superscript to P symbol. |
| df, df1, df2 | numeric The degrees of freedom of the estimate. |
| method | character The method used to estimate correlation, which selects the symbol used for the value. |
| pc.out | logical If |
| conf.level | numeric Confidence level expressed as a fraction in [0..1]. |
| range.brackets, range.sep | character Strings used to format a range. |
A character string with formatting, encoded to be parsed as an R plotmath expression, as plain text, as markdown or to be used with 'LaTeX' within math mode.
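To make the "expression" output type more concrete, the sketch below parses a plotmath-encoded string with base R. The string is written by hand for illustration; it is only assumed to resemble, not to reproduce, what a function such as rr_label() returns.

```r
# Hand-written plotmath string (an assumption, not actual package output)
lbl <- 'italic(R)^2~`=`~"0.95"'

# Strings encoded as plotmath must be parsed before being drawn
expr <- parse(text = lbl)

# Base graphics accept the parsed expression directly, e.g.
# plot(1, 1, main = expr)

# In 'ggplot2', text geoms can do the parsing themselves when called
# with parse = TRUE, e.g.
# annotate("text", x = 1, y = 1, label = lbl, parse = TRUE)
```

Strings produced with the other output types are meant to be passed on unparsed to whatever renders LaTeX, Markdown or plain text.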
plain_label(value = 123, value.name = "n", output.type = "expression")
plain_label(value = 123, value.name = "n", output.type = "markdown")
plain_label(value = 123, value.name = "n", output.type = "latex")
italic_label(value = 123, value.name = "n", output.type = "expression")
italic_label(value = 123, value.name = "n", output.type = "markdown")
italic_label(value = 123, value.name = "n", output.type = "latex")
bold_label(value = 123, value.name = "n", output.type = "expression")
bold_label(value = 123, value.name = "n", output.type = "markdown")
bold_label(value = 123, value.name = "n", output.type = "latex")
plain_label(value = NA, value.name = "n", output.type = "expression")
plain_label(value = c(123, NA), value.name = "n", output.type = "latex")
plain_label(value = c(123, 1.2), value.name = "n", output.type = "expression")
plain_label(value = c(123, 1.2), value.name = "n", output.type = "markdown")
plain_label(value = c(123, 1.2), value.name = "n", output.type = "latex")
p_value_label(value = 0.345, digits = 2, output.type = "expression")
p_value_label(value = 0.345, digits = Inf, output.type = "expression")
p_value_label(value = 0.345, digits = 6, output.type = "expression")
p_value_label(value = 0.345, output.type = "markdown")
p_value_label(value = 0.345, output.type = "latex")
p_value_label(value = 0.345, subscript = "Holm")
p_value_label(value = 1e-25, digits = Inf, output.type = "expression")
f_value_label(value = 123.4567, digits = 2, output.type = "expression")
f_value_label(value = 123.4567, digits = Inf, output.type = "expression")
f_value_label(value = 123.4567, digits = 6, output.type = "expression")
f_value_label(value = 123.4567, output.type = "markdown")
f_value_label(value = 123.4567, output.type = "latex")
f_value_label(value = 123.4567, df1 = 3, df2 = 123, digits = 2, output.type = "expression")
f_value_label(value = 123.4567, df1 = 3, df2 = 123, digits = 2, output.type = "latex")
t_value_label(value = 123.4567, digits = 2, output.type = "expression")
t_value_label(value = 123.4567, digits = Inf, output.type = "expression")
t_value_label(value = 123.4567, digits = 6, output.type = "expression")
t_value_label(value = 123.4567, output.type = "markdown")
t_value_label(value = 123.4567, output.type = "latex")
t_value_label(value = 123.4567, df = 12, digits = 2, output.type = "expression")
t_value_label(value = 123.4567, df = 123, digits = 2, output.type = "latex")
r_label(value = 0.95, digits = 2, output.type = "expression")
r_label(value = -0.95, digits = 2, output.type = "expression")
r_label(value = 0.0001, digits = 2, output.type = "expression")
r_label(value = -0.0001, digits = 2, output.type = "expression")
r_label(value = 0.1234567890, digits = Inf, output.type = "expression")
r_label(value = 0.95, digits = 2, method = "pearson")
r_label(value = 0.95, digits = 2, method = "kendall")
r_label(value = 0.95, digits = 2, method = "spearman")
rr_label(value = 0.95, digits = 2, output.type = "expression")
rr_label(value = 0.0001, digits = 2, output.type = "expression")
rr_label(value = 1e-17, digits = Inf, output.type = "expression")
adj_rr_label(value = 0.95, digits = 2, output.type = "expression")
adj_rr_label(value = 0.0001, digits = 2, output.type = "expression")
rr_ci_label(value = c(0.3, 0.4), conf.level = 0.95)
rr_ci_label(value = c(0.3, 0.4), conf.level = 0.95, output.type = "text")
rr_ci_label(value = c(0.3, 0.4), conf.level = 0.95, range.sep = ",")
r_ci_label(value = c(-0.3, 0.4), conf.level = 0.95)
r_ci_label(value = c(-0.3, 0.4), conf.level = 0.95, output.type = "text")
r_ci_label(value = c(-0.3, 0.4), conf.level = 0.95, range.sep = ",")
r_ci_label(value = c(-1.0, 0.4), conf.level = 0.95, range.sep = ",")
Differs from polynom::as.character.polynomial() in that trailing zeros
are preserved.
poly2character(
  x,
  decreasing = getOption("ggpmisc.decreasing.poly.eq", FALSE),
  digits = 3,
  keep.zeros = TRUE
)
| Argument | Description |
|---|---|
| x | a |
| decreasing | logical It specifies the order of the terms; in increasing (default) or decreasing powers. |
| digits | integer Giving the number of significant digits to use for printing. |
| keep.zeros | logical It indicates if zeros are to be retained in the formatted coefficients. |
A character string.
This is an edit of the code in package 'polynom' so that trailing zeros are retained during the conversion. It is defined under a different name, poly2character(), so as not to interfere with the original.
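What preserving trailing zeros means can be shown with base R alone. The sketch below only contrasts two ways of formatting numbers to three significant digits; it is not taken from, and may not match, the implementation of poly2character().

```r
coefs <- c(1.5, 2, 3.25)

# "%.3g" keeps at most three significant digits and drops trailing zeros
dropped <- sprintf("%.3g", coefs)

# the "#" flag keeps trailing zeros, so every value shows three digits
kept <- sprintf("%#.3g", coefs)

print(dropped)
print(kept)
```

Keeping the zeros makes all coefficients in an equation display with the same number of significant digits.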
poly2character(1:3)
poly2character(1:3, decreasing = TRUE)
predict is a generic function for predictions from the results of
various model fitting functions. predict.lmodel2 is the method
for model fit objects of class "lmodel2".
## S3 method for class 'lmodel2'
predict(
  object,
  method = "MA",
  newdata = NULL,
  interval = c("none", "confidence"),
  level = 0.95,
  ...
)
| Argument | Description |
|---|---|
| object | a fitted model object. |
| method | character One of the methods available in |
| newdata | An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are used. |
| interval | Type of interval calculation. |
| level | the confidence level required. Currently only 0.95 accepted. |
| ... | ignored by this method. |
Function lmodel2() from package 'lmodel2' returns a fitted
model object of class "lmodel2" which differs from that returned by
lm(). Here we implement a predict() method for objects of
this class. It differs from the generic method and that for lm
objects in having an additional formal parameter method that must be
used to select which of the methods supported by lmodel2() are to be
used in the prediction. The returned object is similar in its structure to
that returned by predict.lm() but lacking names or rownames.
If interval = "none" a numeric vector is returned, while if
interval = "confidence" a data frame with columns fit,
lwr and upr is returned.
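A minimal usage sketch follows, assuming that packages 'lmodel2' and 'ggpmisc' are both installed; the code is skipped otherwise. The data set and the model formula are arbitrary choices made for illustration.

```r
pred.fit <- NULL
pred.ci <- NULL

if (requireNamespace("lmodel2", quietly = TRUE) &&
    requireNamespace("ggpmisc", quietly = TRUE)) {
  # loading the 'ggpmisc' namespace registers the predict() method
  fm <- lmodel2::lmodel2(mpg ~ wt, data = mtcars, nperm = 0)

  # fitted values as a numeric vector, using major axis (MA) regression
  pred.fit <- predict(fm, method = "MA", interval = "none")

  # data frame with columns fit, lwr and upr
  pred.ci <- predict(fm, method = "MA", interval = "confidence")
}
```

As newdata is not passed, both calls are expected to return predictions for the observations used in the fit.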
Continuous scales for colour and fill aesthetics with defaults
suitable for values expressed as log2 fold change in data and
fold-change in tick labels. Supports tick labels and data expressed in any
combination of fold-change, log2 fold-change and log10 fold-change. Supports
addition of units to legend title passed as argument to the name
formal parameter.
scale_colour_logFC(
  name = "Abundance of y%unit",
  breaks = NULL,
  labels = NULL,
  limits = symmetric_limits,
  oob = scales::squish,
  expand = expansion(mult = 0.05, add = 0),
  log.base.labels = FALSE,
  log.base.data = 2L,
  midpoint = NULL,
  low.colour = "dodgerblue2",
  mid.colour = "grey50",
  high.colour = "red",
  na.colour = "black",
  aesthetics = "colour",
  ...
)
scale_color_logFC(
  name = "Abundance of y%unit",
  breaks = NULL,
  labels = NULL,
  limits = symmetric_limits,
  oob = scales::squish,
  expand = expansion(mult = 0.05, add = 0),
  log.base.labels = FALSE,
  log.base.data = 2L,
  midpoint = NULL,
  low.colour = "dodgerblue2",
  mid.colour = "grey50",
  high.colour = "red",
  na.colour = "black",
  aesthetics = "colour",
  ...
)
scale_fill_logFC(
  name = "Abundance of y%unit",
  breaks = NULL,
  labels = NULL,
  limits = symmetric_limits,
  oob = scales::squish,
  expand = expansion(mult = 0.05, add = 0),
  log.base.labels = FALSE,
  log.base.data = 2L,
  midpoint = 1,
  low.colour = "dodgerblue2",
  mid.colour = "grey50",
  high.colour = "red",
  na.colour = "black",
  aesthetics = "fill",
  ...
)
| Argument | Description |
|---|---|
| name | The name of the scale without units, used for the legend title. |
| breaks | The positions of ticks or a function to generate them. Default varies depending on argument passed to |
| labels | The tick labels or a function to generate them from the tick positions. The default is a function that uses the arguments passed to |
| limits | One of: NULL, to use the default scale range from ggplot2; a numeric vector of length two providing limits of the scale, using NA to refer to the existing minimum or maximum; or a function that accepts the existing (automatic) limits and returns new limits. The default is function |
| oob | Function that handles values outside of the scale limits (out of bounds). The default squishes out-of-bounds values to the boundary. |
| expand | Vector of range expansion constants used to add some padding around the data, to ensure that they are placed some distance away from the axes. The default is to expand the scale by 5% on each end for log-fold-data (expansion(mult = 0.05, add = 0)), so as to leave space for counts annotations. |
| log.base.labels, log.base.data | integer or logical Base of logarithms used to express fold-change values in tick labels and in |
| midpoint | numeric Value at the middle of the colour gradient, defaults to FC = 1, assuming data is expressed as logarithm. |
| low.colour, mid.colour, high.colour, na.colour | character Colour definitions to use for the gradient extremes and middle. |
| aesthetics | Character string or vector of character strings listing the name(s) of the aesthetic(s) that this scale works with. This can be useful, for example, to apply colour settings to the colour and fill aesthetics at the same time, via aesthetics = c("colour", "fill"). |
| ... | other named arguments passed to |
These scales only alter default arguments of
scale_colour_gradient2() and scale_fill_gradient2(). Please,
see documentation for scale_continuous for details.
The name argument supports the use of "%unit" at the end of the
string to automatically add a units string, otherwise user-supplied values
for names, breaks, and labels work as usual. Tick labels in the legend are
built based on the transformation already applied to the data (log2 by
default) and a possibly different log transformation (default is
fold-change with no transformation). The default for handling out of
bounds values is to "squish" them to the extreme of the scale, which is
different from the default used in 'ggplot2'.
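The arithmetic behind these defaults can be sketched in base R. The helper names below (sym_limits, squish_to) are hypothetical stand-ins written for this sketch; they are assumed to mimic symmetric_limits() and scales::squish() and are not their actual implementations.

```r
# data expressed as log2 fold change
log2fc <- c(-3, -1, 0, 2, 5)

# hypothetical helper: limits symmetric around zero
sym_limits <- function(x) {
  c(-1, 1) * max(abs(range(x, na.rm = TRUE)))
}

# hypothetical helper: move out-of-bounds values to the nearest limit
squish_to <- function(x, limits) {
  pmin(pmax(x, limits[1]), limits[2])
}

limits <- sym_limits(log2fc)
squished <- squish_to(log2fc, c(-2, 2))

# tick labels as fold change (no log transformation) from log2 data
fold.change <- 2^c(-2, -1, 0, 1, 2)

# tick labels as log10 fold change from log2 data
log10fc <- log10(2^c(-2, -1, 0, 1, 2))
```

With symmetric limits the midpoint of the colour gradient stays at no change (a fold change of 1, or 0 on a log scale) irrespective of the range of the data.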
Other scales for omics data:
outcome2factor(),
scale_shape_outcome(),
scale_x_logFC(),
xy_outcomes2factor()
set.seed(12346)
my.df <- data.frame(x = rnorm(50, sd = 4), y = rnorm(50, sd = 4))
# we assume that both x and y values are expressed as log2 fold change

ggplot(my.df, aes(x, y, colour = y)) +
  geom_point(shape = "circle", size = 2.5) +
  scale_x_logFC() +
  scale_y_logFC() +
  scale_colour_logFC()

ggplot(my.df, aes(x, y, fill = y)) +
  geom_point(shape = "circle filled", colour = "black", size = 2.5) +
  scale_x_logFC() +
  scale_y_logFC() +
  scale_fill_logFC()

my.labels <-
  scales::trans_format(function(x) {log10(2^x)}, scales::math_format())
ggplot(my.df, aes(x, y, colour = y)) +
  geom_point() +
  scale_x_logFC(labels = my.labels) +
  scale_y_logFC(labels = my.labels) +
  scale_colour_logFC(labels = my.labels)

ggplot(my.df, aes(x, y, colour = y)) +
  geom_point() +
  scale_x_logFC(log.base.labels = 2) +
  scale_y_logFC(log.base.labels = 2) +
  scale_colour_logFC(log.base.labels = 2)

ggplot(my.df, aes(x, y, colour = y)) +
  geom_point() +
  scale_x_logFC(log.base.labels = 10) +
  scale_y_logFC(log.base.labels = 10) +
  scale_colour_logFC(log.base.labels = 10)

ggplot(my.df, aes(x, y, colour = y)) +
  geom_point() +
  scale_x_logFC(log.base.labels = 10) +
  scale_y_logFC(log.base.labels = 10) +
  scale_colour_logFC(log.base.labels = 10,
                     labels = FC_format(log.base.labels = 10,
                                        log.base.data = 2L,
                                        fmt = "% .*g"))

# override default arguments.
ggplot(my.df, aes(x, y, colour = y)) +
  geom_point() +
  scale_x_logFC() +
  scale_y_logFC() +
  scale_colour_logFC(name = "Change",
                     labels = function(x) {paste(2^x, "fold")})
Manual scales for colour and fill aesthetics with defaults suitable for the three way outcome from some statistical tests.
scale_colour_outcome(
  ...,
  name = "Outcome",
  ns.colour = "grey80",
  up.colour = "red",
  down.colour = "dodgerblue2",
  de.colour = "goldenrod",
  na.colour = "black",
  values = "outcome:updown",
  drop = TRUE,
  aesthetics = "colour"
)
scale_color_outcome(
  ...,
  name = "Outcome",
  ns.colour = "grey80",
  up.colour = "red",
  down.colour = "dodgerblue2",
  de.colour = "goldenrod",
  na.colour = "black",
  values = "outcome:updown",
  drop = TRUE,
  aesthetics = "colour"
)
scale_fill_outcome(
  ...,
  name = "Outcome",
  ns.colour = "grey80",
  up.colour = "red",
  down.colour = "dodgerblue2",
  de.colour = "goldenrod",
  na.colour = "black",
  values = "outcome:both",
  drop = TRUE,
  aesthetics = "fill"
)
| Argument | Description |
|---|---|
| ... | other named arguments passed to |
| name | The name of the scale, used for the legend title. |
| ns.colour, down.colour, up.colour, de.colour | The colour definitions to use for each of the three possible outcomes. |
| na.colour | colour definition used for NA. |
| values | a set of aesthetic values to map data values to. The values will be matched in order (usually alphabetical) with the limits of the scale, or with breaks if provided. If this is a named vector, then the values will be matched based on the names instead. Data values that don't match will be given na.value. In addition the special values |
| drop | logical Should unused factor levels be omitted from the scale? The default, TRUE, uses the levels that appear in the data; FALSE uses all the levels in the factor. |
| aesthetics | Character string or vector of character strings listing the name(s) of the aesthetic(s) that this scale works with. This can be useful, for example, to apply colour settings to the colour and fill aesthetics at the same time, via aesthetics = c("colour", "fill"). |
These scales only alter the breaks, values, and
na.value default arguments of scale_colour_manual() and
scale_fill_manual(). Please, see documentation for
scale_manual for details.
In 'ggplot2' versions 3.3.4, 3.3.5 and 3.3.6, scale_colour_manual() and
scale_fill_manual() do not obey drop, most likely due to a
bug, as this worked in version 3.3.3 and earlier. With these versions of
'ggplot2' this results in spurious levels in the plot legend.
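A quick way to check whether an installed 'ggplot2' is one of the versions listed above is sketched here. The helper name drop_is_ignored is hypothetical, and the list of versions is copied from the note.

```r
# versions of 'ggplot2' in which 'drop' is reported not to be obeyed
affected.versions <- c("3.3.4", "3.3.5", "3.3.6")

# hypothetical helper: TRUE if 'version' is one of the affected versions
drop_is_ignored <- function(version) {
  as.character(version) %in% affected.versions
}

if (requireNamespace("ggplot2", quietly = TRUE) &&
    drop_is_ignored(utils::packageVersion("ggplot2"))) {
  message("Unused levels may show up in legends with this 'ggplot2' version.")
}
```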
Other Functions for quadrant and volcano plots:
FC_format(),
outcome2factor(),
scale_shape_outcome(),
scale_y_Pvalue(),
xy_outcomes2factor()
set.seed(12346)
outcome <- sample(c(-1, 0, +1), 50, replace = TRUE)
my.df <- data.frame(x = rnorm(50),
                    y = rnorm(50),
                    outcome2 = outcome2factor(outcome, n.levels = 2),
                    outcome3 = outcome2factor(outcome))

ggplot(my.df, aes(x, y, colour = outcome3)) +
  geom_point() +
  scale_colour_outcome() +
  theme_bw()

ggplot(my.df, aes(x, y, colour = outcome2)) +
  geom_point() +
  scale_colour_outcome() +
  theme_bw()

ggplot(my.df, aes(x, y, fill = outcome3)) +
  geom_point(shape = 21) +
  scale_fill_outcome() +
  theme_bw()
Manual scale for the shape aesthetic with defaults suitable for the three-way outcome from some statistical tests.
scale_shape_outcome(
  ...,
  name = "Outcome",
  ns.shape = "circle filled",
  up.shape = "triangle filled",
  down.shape = "triangle down filled",
  de.shape = "square filled",
  na.shape = "cross"
)
| Argument | Description |
|---|---|
| ... | other named arguments passed to |
| name | The name of the scale, used for the legend title. |
| ns.shape, down.shape, up.shape, de.shape | The shapes to use for each of the three possible outcomes. |
| na.shape | Shape used for NA. |
This scale only alters the values and
na.value default arguments of
scale_shape_manual(). Please see the
documentation for scale_manual for details.
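For comparison, a roughly equivalent call to scale_shape_manual() is sketched below, assuming 'ggplot2' is installed. The shape names are the defaults from the usage above, while matching them to factor levels by name is an assumption about how the scale works, not a description of its code.

```r
shape.scale <- NULL

if (requireNamespace("ggplot2", quietly = TRUE)) {
  shape.scale <-
    ggplot2::scale_shape_manual(
      name = "Outcome",
      # names are assumed to match the levels created by outcome2factor()
      values = c(uncertain = "circle filled",
                 up = "triangle filled",
                 down = "triangle down filled",
                 de = "square filled"),
      na.value = "cross"
    )
}
```

The resulting object can be added to a plot with + in the same way as scale_shape_outcome().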
Other Functions for quadrant and volcano plots:
FC_format(),
outcome2factor(),
scale_colour_outcome(),
scale_y_Pvalue(),
xy_outcomes2factor()
Other scales for omics data:
outcome2factor(),
scale_colour_logFC(),
scale_x_logFC(),
xy_outcomes2factor()
set.seed(12346)
outcome <- sample(c(-1, 0, +1), 50, replace = TRUE)
my.df <- data.frame(x = rnorm(50),
                    y = rnorm(50),
                    outcome2 = outcome2factor(outcome, n.levels = 2),
                    outcome3 = outcome2factor(outcome))

ggplot(my.df, aes(x, y, shape = outcome3)) +
  geom_point() +
  scale_shape_outcome() +
  theme_bw()

# guide = "none" replaces guide = FALSE, deprecated in recent 'ggplot2'
ggplot(my.df, aes(x, y, shape = outcome3)) +
  geom_point() +
  scale_shape_outcome(guide = "none") +
  theme_bw()

ggplot(my.df, aes(x, y, shape = outcome2)) +
  geom_point(size = 2) +
  scale_shape_outcome() +
  theme_bw()

ggplot(my.df, aes(x, y, shape = outcome3, fill = outcome2)) +
  geom_point() +
  scale_shape_outcome() +
  scale_fill_outcome() +
  theme_bw()

ggplot(my.df, aes(x, y, shape = outcome3, fill = outcome2)) +
  geom_point() +
  scale_shape_outcome(name = "direction") +
  scale_fill_outcome(name = "significance") +
  theme_bw()
Continuous scales for x and y aesthetics with defaults suitable for values
expressed as log2 fold change in data and fold-change in tick labels.
Supports tick labels and data expressed in any combination of fold-change,
log2 fold-change and log10 fold-change. Supports addition of units to
axis labels passed as argument to the name formal parameter.
scale_x_logFC(
  name = "Abundance of x%unit",
  breaks = NULL,
  labels = NULL,
  limits = symmetric_limits,
  oob = scales::squish,
  expand = expansion(mult = 0.05, add = 0),
  log.base.labels = FALSE,
  log.base.data = 2L,
  ...
)

scale_y_logFC(
  name = "Abundance of y%unit",
  breaks = NULL,
  labels = NULL,
  limits = symmetric_limits,
  oob = scales::squish,
  expand = expansion(mult = 0.05, add = 0),
  log.base.labels = FALSE,
  log.base.data = 2L,
  ...
)
name |
The name of the scale without units, used for the axis-label. |
breaks |
The positions of ticks or a function to generate them. Default
varies depending on argument passed to |
labels |
The tick labels or a function to generate them from the tick
positions. The default is a function that uses the arguments passed to
|
limits |
One of: NULL to use the default scale range from
ggplot2. A numeric
vector of length two providing limits of the scale, using NA to refer to the
existing minimum or maximum. A function that accepts the existing
(automatic) limits and returns new limits. The default is the function
|
oob |
Function that handles values outside of the scale limits (out of bounds). The default squishes out-of-bounds values to the boundary. |
expand |
Vector of range expansion constants used to add some padding around the data, to ensure that they are placed some distance away from the axes. The default is to expand the scale by 5% on each end for log-fold-data, so as to leave space for counts annotations. |
log.base.labels, log.base.data
|
integer or logical Base of logarithms used to
express fold-change values in tick labels and in |
... |
other named arguments passed to |
These scales only alter default arguments of
scale_x_continuous() and scale_y_continuous(). Please, see
documentation for scale_continuous for details. The
name argument supports the use of "%unit" at the end of the string
to automatically add a units string, otherwise user-supplied values for
names, breaks, and labels work as usual. Tick labels are built based on the
transformation already applied to the data (log2 by default) and a possibly
different log transformation (default is fold-change with no
transformation). The default for handling out of bounds values is to
"squish" them to the extreme of the scale, which is different from the
default used in 'ggplot2'.
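The relationship between the transformation of the data and the one used for tick labels can be sketched in base R. This is only an illustration of the arithmetic, not the package's code: the helper `convert_fc()` and its arguments are hypothetical, with `base_out = FALSE` standing in for "fold change with no transformation".

```r
# Data are assumed to be expressed as log2 fold change (the documented default).
log2_fc <- c(-2, -1, 0, 1, 2)

# Hypothetical helper: re-express log fold-change values in another base.
# base_out = FALSE means no logarithm, i.e., plain fold change.
convert_fc <- function(x, base_in = 2, base_out = FALSE) {
  fc <- base_in^x                       # undo the transformation of the data
  if (isFALSE(base_out)) fc else log(fc, base = base_out)
}

fold_change <- convert_fc(log2_fc)                 # 0.25 0.5 1 2 4
log10_fc    <- convert_fc(log2_fc, base_out = 10)  # log10 of the values above
```

Under these assumptions, tick positions stay on the log2 scale of the data while only the text of the labels changes.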
Other scales for omics data:
outcome2factor(),
scale_colour_logFC(),
scale_shape_outcome(),
xy_outcomes2factor()
set.seed(12346)
my.df <- data.frame(x = rnorm(50, sd = 4),
                    y = rnorm(50, sd = 4))
# we assume that both x and y values are expressed as log2 fold change

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC() +
  scale_y_logFC()

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC(labels = scales::trans_format(function(x) {log10(2^x)},
                                              scales::math_format())) +
  scale_y_logFC(labels = scales::trans_format(function(x) {log10(2^x)},
                                              scales::math_format()))

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC(log.base.labels = 2) +
  scale_y_logFC(log.base.labels = 2)

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC("A concentration%unit", log.base.labels = 10) +
  scale_y_logFC("B concentration%unit", log.base.labels = 10)

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC("A concentration%unit", breaks = NULL) +
  scale_y_logFC("B concentration%unit", breaks = NULL)

# taking into account that data are expressed as log2 FC.
ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC("A concentration%unit", breaks = log2(c(1/100, 1, 100))) +
  scale_y_logFC("B concentration%unit", breaks = log2(c(1/100, 1, 100)))

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC(labels = scales::trans_format(function(x) {log10(2^x)},
                                              scales::math_format())) +
  scale_y_logFC(labels = scales::trans_format(function(x) {log10(2^x)},
                                              scales::math_format()))

# override "special" default arguments.
ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC("A concentration", breaks = waiver(), labels = waiver()) +
  scale_y_logFC("B concentration", breaks = waiver(), labels = waiver())

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC() +
  scale_y_logFC() +
  geom_quadrant_lines() +
  stat_quadrant_counts(size = 3.5)
Scales for x and y aesthetics mapped to P-values and
false discovery rates (FDR), suitable for volcano plots as used for
transcriptomics and metabolomics data.
scale_y_Pvalue(
  ...,
  name = expression(italic(P) - plain(value)),
  transform = NULL,
  breaks = NULL,
  labels = NULL,
  limits = c(1, 1e-20),
  oob = NULL,
  expand = NULL
)

scale_y_FDR(
  ...,
  name = "False discovery rate",
  transform = NULL,
  breaks = NULL,
  labels = NULL,
  limits = c(1, 1e-10),
  oob = NULL,
  expand = NULL
)

scale_x_Pvalue(
  ...,
  name = expression(italic(P) - plain(value)),
  transform = NULL,
  breaks = NULL,
  labels = NULL,
  limits = c(1, 1e-20),
  oob = NULL,
  expand = NULL
)

scale_x_FDR(
  ...,
  name = "False discovery rate",
  transform = NULL,
  breaks = NULL,
  labels = NULL,
  limits = c(1, 1e-10),
  oob = NULL,
  expand = NULL
)
... |
other named arguments passed to |
name |
The name of the scale without units, used for the axis-label. |
transform |
Either the name of a transformation object, or the object itself. Use NULL for the default. |
breaks |
The positions of ticks or a function to generate them. Default
varies depending on argument passed to |
labels |
The tick labels or a function to generate them from the tick
positions. The default is a function that uses the arguments passed to
|
limits |
Use one of: |
oob |
Function that handles values outside of the scale limits (out of bounds). The default squishes out-of-bounds values to the boundary. |
expand |
Vector of range expansion constants used to add some padding around the data, to ensure that they are placed some distance away from the axes. The default is to expand the scale by 15% on each end for log-fold-data, so as to leave space for counts annotations. |
These scales only replace default arguments of
scale_x_continuous() and scale_y_continuous(). Please, see
documentation for scale_continuous for details.
These scales set transformations suitable for plotting
log-P-value, log-fold-change and FDR (false discovery rate), together with
matching tick marks (breaks and labels) and scale names
(axis titles).
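As a rough illustration of why such a transformation suits volcano plots, the base R sketch below applies a negative log10 transformation to P-values and squishes out-of-bounds values to the limits. The exact transformation object used by these scales is not stated here, so `-log10()` and the manual squishing with `pmin()`/`pmax()` are assumptions made only for illustration.

```r
# P-values spanning the default limits of scale_y_Pvalue(), c(1, 1e-20).
p_values <- c(1, 0.05, 1e-3, 1e-10, 1e-20)

# Assumed transformation: smaller P-values map to larger plotted values.
neg_log10_p <- -log10(p_values)   # 0, about 1.3, 3, 10, 20

# Squishing: values beyond the limits are moved to the nearest limit.
limits <- c(1, 1e-20)
squished <- pmin(pmax(c(1e-25, 0.5), min(limits)), max(limits))
```

With these assumptions the most significant observations end up at the top of the axis, and a P-value of 1e-25 is drawn at the 1e-20 boundary instead of being dropped.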
Other Functions for quadrant and volcano plots:
FC_format(),
outcome2factor(),
scale_colour_outcome(),
scale_shape_outcome(),
xy_outcomes2factor()
set.seed(12346)
my.df <- data.frame(x = rnorm(50, sd = 4),
                    y = 10^-runif(50, min = 0, max = 20))

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC() +
  scale_y_Pvalue()

ggplot(my.df, aes(x, y)) +
  geom_point() +
  scale_x_logFC() +
  scale_y_FDR(limits = c(NA, 1e-20))
Use sprintf() to flexibly format numbers as character strings
encoded for parsing into R expressions or using LaTeX or markdown
notation.
sprintf_dm(fmt, ..., decimal.mark = getOption("OutDec", default = "."))

value2char(
  value,
  digits = Inf,
  format = "g",
  output.type = "expression",
  decimal.mark = getOption("OutDec", default = ".")
)
fmt |
character as in |
... |
as in |
decimal.mark |
character If |
value |
numeric The value of the estimate. |
digits |
integer Number of digits to which numeric values are formatted. |
format |
character One of "e", "f" or "g" for exponential, fixed, or significant digits formatting. |
output.type |
character One of "expression", "latex", "tex", "text", "tikz", "markdown", "marquee". |
These functions are used to format the character strings returned,
which can be used as labels in plots. Encoding used for the formatting is
selected by the argument passed to output.type, thus, supporting
different R graphic devices.
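The idea of formatting with sprintf() and then adjusting the decimal mark can be sketched in base R as follows. The function `sprintf_mark()` is a hypothetical stand-in written for this illustration and is not the implementation of sprintf_dm().

```r
# Hypothetical stand-in: format with sprintf(), then swap the decimal mark.
sprintf_mark <- function(fmt, ..., decimal.mark = ".") {
  out <- sprintf(fmt, ...)
  if (decimal.mark != ".") {
    # Replaces every "." in the result, including literal dots in 'fmt'.
    out <- gsub(".", decimal.mark, out, fixed = TRUE)
  }
  out
}

sprintf_mark("%2.3f", 2.34)                      # "2.340"
sprintf_mark("%2.3f", 2.34, decimal.mark = ",")  # "2,340"
```

A simple text substitution like this is only safe when the format string contains no other dots, which is one reason to prefer the package's own functions in real use.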
sprintf_dm("%2.3f", 2.34)
sprintf_dm("%2.3f", 2.34, decimal.mark = ",")

value2char(2.34)
value2char(2.34, digits = 3, format = "g")
value2char(2.34, digits = 3, format = "f")
value2char(2.34, output.type = "text")
value2char(2.34, output.type = "text", format = "f")
value2char(2.34, output.type = "text", format = "g")
Statistic stat_correlation() applies stats::cor.test()
respecting grouping with method = "pearson" default but alternatively
using "kendall" or "spearman" methods. It adds textual labels
to a plot.
stat_correlation(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  method = "pearson",
  n.min = 2L,
  alternative = "two.sided",
  exact = NULL,
  r.conf.level = ifelse(method == "pearson", 0.95, NA),
  continuity = FALSE,
  fit.seed = NA,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  small.p = getOption("ggpmisc.small.p", default = FALSE),
  coef.keep.zeros = TRUE,
  r.digits = 2,
  t.digits = 3,
  p.digits = 3,
  CI.brackets = c("[", "]"),
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = NULL,
  output.type = NULL,
  boot.R = ifelse(method == "pearson", 0, 999),
  na.rm = FALSE,
  parse = NULL,
  show.legend = FALSE,
  inherit.aes = TRUE
)
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
method |
character One of "pearson", "kendall" or "spearman". |
n.min |
integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
alternative |
character One of "two.sided", "less" or "greater". |
exact |
logical Whether an exact p-value should be computed. Used for Kendall's tau and Spearman's rho. |
r.conf.level |
numeric Confidence level for the returned confidence
interval. If set to |
continuity |
logical If TRUE , a continuity correction is used for Kendall's tau and Spearman's rho when not computed exactly. |
fit.seed |
RNG seed argument passed to
|
small.r, small.p
|
logical Flags to switch use of lower case r and p for
coefficient of correlation (only for |
coef.keep.zeros |
logical Keep or drop trailing zeros when formatting the correlation coefficients and t-value, z-value or S-value (see note below). |
r.digits, t.digits, p.digits
|
integer Number of digits after the decimal
point to use for R, r.squared, tau or rho and P-value in labels. If
|
CI.brackets |
character vector of length 2. The opening and closing brackets used for the CI label. |
label.x, label.y
|
|
hstep, vstep
|
numeric in npc units, the horizontal and vertical step used between labels for different groups. |
output.type |
character One of "expression", "text", "markdown", "marquee", "latex", "latex.eqn", "latex.deqn" or "numeric". |
boot.R |
integer The number of bootstrap resamples. Set to zero for no bootstrap estimates for the CI. |
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
parse |
logical Passed to the geom. If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
This statistic can be used to annotate a plot with the correlation
coefficient and the outcome of its test of significance. It supports
Pearson, Kendall and Spearman methods to compute correlation. This
statistic generates labels as R expressions by default but LaTeX (use TikZ
device), markdown (use package 'ggtext') and plain text are also supported,
as well as numeric values for user-generated text labels. The character
labels include the symbol describing the quantity together with the numeric
value. For the confidence interval (CI) the default is to follow the APA
recommendation of using square brackets. When the CI is computed by
bootstrapping, fit.seed, if different from NA, is used to set the RNG seed
immediately before this computation.
The value of parse is set automatically based on output.type,
but if you assemble labels that need parsing from numeric output,
the default needs to be overridden. By default the value of
output.type is guessed from the name of the geometry.
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. cor.test() is always applied to the variables
mapped to the x and y aesthetics, so the scales used for
x and y should both be continuous scales rather than
discrete.
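What applying cor.test() to x and y "respecting grouping" amounts to can be sketched with base R alone. The column names in the result (`r`, `t.value`, `r.confint.low`, and so on) are chosen for this illustration and are not guaranteed to match the variables returned by the statistic.

```r
set.seed(4321)
my.data <- data.frame(x = (1:100) / 10,
                      group = rep(c("A", "B"), 50))
my.data$y <- my.data$x + rnorm(nrow(my.data))

# One correlation test per group, as a grouped layer would compute.
by_group <- lapply(split(my.data, my.data$group), function(d) {
  ct <- cor.test(d$x, d$y, method = "pearson", alternative = "two.sided")
  data.frame(r = unname(ct$estimate),
             t.value = unname(ct$statistic),
             df = unname(ct$parameter),
             p.value = ct$p.value,
             r.confint.low = ct$conf.int[1],
             r.confint.high = ct$conf.int[2],
             n = nrow(d))
})
results <- do.call(rbind, by_group)
results
```

Each row of `results` corresponds to one group, which mirrors how a separate set of labels is produced for each group in a plot.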
If output.type is "numeric" the returned
tibble contains the columns listed below with variations depending on the
method. If the model fit function used does not return a value, the
variable is set to NA_real_.
x position
y position
numeric values for correlation coefficient estimates
numeric values for statistic estimates
numeric values.
numeric value, as fraction of one.
Confidence interval limit for r.
Confidence interval limit for r.
Set according to mapping in aes.
Set according to the method used.
character values
If output.type different from "numeric" the returned tibble contains
in addition to the columns listed above those listed below. If the numeric
value is missing the label is set to character(0L).
Correlation coefficient as a character string.
t-value and degrees of freedom, z-value or S-value as a character string.
P-value for test against zero, as a character string.
Confidence interval for r (only with method = "pearson").
Number of observations used in the fit, as a character string.
Set according to mapping in aes, as a character string.
To explore the computed values returned for a given input we suggest the use
of geom_debug as shown in the last examples below.
When data are grouped by mapping a factor to an aesthetic, e.g.,
colour, shape and/or linetype the model is fitted
separately to each group, and for each group a whole set of labels is
generated. If the argument passed to label.y is a vector of length
1, this value determines the position of the equation and/or other labels
for the first group, and the positions of the labels for the remaining
groups are generated by adding vstep based on the group number.
If the argument passed to label.y is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring vstep.
If the labels are rotated by 90 degrees then the automatic stepping is
best based on hstep with vstep = 0. Similarly as described
above, if label.x is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring hstep.
When using facets and with a grouping that does not repeat in each panel,
the automatic positioning in most cases will not be the desired one. Manual
positioning using a vector of length > 1 for label.x and/or
label.y is the currently available workaround.
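The stepping rule described above can be sketched as a small base R function. The helper `step_positions()` is hypothetical, and the choice to step downwards from the first label is an assumption made for this example; the package may compute positions differently.

```r
# Hypothetical helper: label positions (npc units) for n.groups groups.
step_positions <- function(label.y, n.groups, vstep) {
  if (length(label.y) > 1) {
    # A vector of length > 1 is used unchanged, recycled if needed.
    rep_len(label.y, n.groups)
  } else {
    # A single value positions the first group; later groups are stepped.
    label.y - (seq_len(n.groups) - 1) * vstep
  }
}

step_positions(0.95, n.groups = 3, vstep = 0.1)         # 0.95 0.85 0.75
step_positions(c(0.9, 0.5), n.groups = 3, vstep = 0.1)  # 0.9 0.5 0.9
```

The second call shows why manual positioning works as a workaround with facets: the supplied vector fully determines where each group's label goes.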
Character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently, or in some cases no mark-up is applied.
"expression": The labels are encoded as character strings to be parsed into R's plotmath expressions.
"LaTeX", "TeX", "tikz", "latex": The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into math mode.
"latex.eqn": Same as "latex" but enclosed in single $, i.e., as in-line maths.
"latex.deqn": Same as "latex" but enclosed in double $$, i.e., as display maths.
"markdown": The labels are encoded as character strings using markdown syntax, with some embedded HTML.
"marquee": The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans.
"text": The labels are plain ASCII character strings.
"numeric": No labels are generated. This value is accepted by the statistics, but not by the label formatting functions.
NULL: The value used depends on the argument passed to geom.
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.
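The mapping from geom name to default output type listed above can be written as a small lookup. The function name `guess_output_type()` is hypothetical and the sketch only restates the rules given in the text; it does not reproduce the package's validation of arguments.

```r
# Hypothetical lookup restating the documented defaults for output.type = NULL.
guess_output_type <- function(geom) {
  switch(geom,
         latex    = "latex.eqn",   # package 'xdvir'
         richtext = ,              # falls through to the next entry
         textbox  = "markdown",    # package 'ggtext'
         marquee  = "marquee",     # package 'marquee'
         "expression")             # all other geoms
}

guess_output_type("text_npc")   # "expression"
guess_output_type("richtext")   # "markdown"
```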
stat_correlation() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| Aesthetic | Default |
|---|---|
| **x** | |
| **y** | |
| group | inferred |
| grp.label | |
| hjust | "inward" |
| label | after_stat(r.label) |
| npcx | after_stat(npcx) |
| npcy | after_stat(npcy) |
| vjust | "inward" |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Currently coef.keep.zeros is ignored, with trailing zeros always
retained in the character labels returned but not protected from
being dropped by R when these character strings are parsed into
plotmath expressions (i.e., when output.type = "expression").
See cor.test for details on the computations.
# generate artificial data
set.seed(4321)
x <- (1:100) / 10
y <- x + rnorm(length(x))
my.data <- data.frame(x = x,
                      y = y,
                      y.desc = - y,
                      group = c("A", "B"))

# by default only R is displayed
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation()

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(small.r = TRUE)

ggplot(my.data, aes(x, y.desc)) +
  geom_point() +
  stat_correlation(label.x = "right")

# non-default methods
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(method = "kendall")

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(method = "spearman")

# use_label() can map a user selected label
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R2"))

# use_label() can assemble and map a combined label
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "P", "n", "method"))

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"))

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"),
                   r.conf.level = 0.95)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"),
                   method = "kendall",
                   r.conf.level = 0.95)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(use_label("R", "R.CI"),
                   method = "spearman",
                   r.conf.level = 0.95)

# manually assemble and map a specific label using paste() and aes()
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(aes(label = paste(after_stat(r.label),
                                     after_stat(p.value.label),
                                     after_stat(n.label),
                                     sep = "*\", \"*")))

# manually format and map a specific label using sprintf() and aes()
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_correlation(aes(label = sprintf("%s*\" with \"*%s*\" for \"*%s",
                                       after_stat(r.label),
                                       after_stat(p.value.label),
                                       after_stat(t.value.label))))

# Inspecting the returned data using geom_debug_group()
# This provides a quick way of finding out the names of the variables that
# are available for mapping to aesthetics with after_stat().

gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

# the whole of computed data
if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group", method = "pearson")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group", method = "kendall")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group", method = "spearman")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group", output.type = "numeric")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group", output.type = "markdown")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_correlation(geom = "debug_group", output.type = "LaTeX")
Statistics stat_distrmix_line() and stat_distrmix_eq() fit a
Normal mixture model. While stat_distrmix_line() adds prediction
lines, stat_distrmix_eq() adds textual labels to a plot.
stat_distrmix_eq(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  orientation = NA,
  method = "normalmixEM",
  method.args = list(),
  n.min = 10L * k,
  level = 0.95,
  k = 2,
  free.mean = TRUE,
  free.sd = TRUE,
  se = FALSE,
  fit.seed = NA,
  fm.values = TRUE,
  components = NULL,
  eq.with.lhs = TRUE,
  eq.digits = 2,
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = NULL,
  output.type = NULL,
  na.rm = FALSE,
  parse = NULL,
  show.legend = NA,
  inherit.aes = TRUE
)

stat_distrmix_line(
  mapping = NULL,
  data = NULL,
  geom = "line",
  position = "identity",
  ...,
  orientation = NA,
  method = "normalmixEM",
  se = NULL,
  fit.seed = NA,
  fm.values = FALSE,
  n = min(100 + 50 * k, 300),
  fullrange = TRUE,
  level = 0.95,
  method.args = list(),
  k = 2,
  free.mean = TRUE,
  free.sd = TRUE,
  components = "all",
  n.min = 10L * k,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
orientation |
character Either "x" or "y" controlling the aesthetic to
which the density model is fit. With the default |
method |
function or character If character, |
method.args |
named list with additional arguments. Not |
n.min |
integer Minimum number of distinct values in the variable for
fitting to be attempted. The default depends on |
level |
Level of confidence interval to use (0.95 by default). |
k |
integer Number of mixture components to fit. |
free.mean, free.sd
|
logical If TRUE, allow the fitted |
se |
logical If |
fit.seed |
RNG seed argument passed to
|
fm.values |
logical Add parameter estimates and their standard errors to the returned values ('FALSE' by default.) |
components |
character One of |
eq.with.lhs |
If |
eq.digits |
integer Number of digits after the decimal point to
use for parameters in labels. If |
label.x, label.y
|
|
hstep, vstep
|
numeric in npc units, the horizontal and vertical step used between labels for different groups. |
output.type |
character One of "expression", "text", "markdown", "marquee", "latex", "latex.eqn", "latex.deqn" or "numeric". |
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
parse |
logical Passed to the geom. If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
n |
Number of points at which to predict with the fitted model. |
fullrange |
logical Should the fit prediction span the full range of the plot, or just the range of the explanatory variable? |
stat_distrmix_line() is similar to
stat_density but in addition to fitting a single
distribution it can fit a mixture of two or more Normal distributions,
using an approach related to clustering. Defaults are consistent between
stat_distrmix_line() and stat_distrmix_eq().
stat_distrmix_eq() can be used to add matched textual annotations.
If k >= 2 a mixture of Normals model is fitted with
normalmixEM(), while if k == 1 a single
Normal distribution is fitted with function fitdistr().
Only for k == 1 the SE values are exact estimates.
Parameter fit.seed if not NA is used in a call to
set.seed() immediately before calling the model fit function. As the
fitting procedure makes use of the (pseudo-)random number generator (RNG),
convergence can depend on it, and in such cases setting fit.seed to
the same value in stat_distrmix_line() and in
stat_distrmix_eq() can ensure consistency, and more generally,
reproducibility.
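The effect of seeding the RNG immediately before a fit that uses random starting values can be shown with a toy example in base R. The function `fit_with_seed()` is a hypothetical stand-in for a model fit function; it only draws random starting values and does not fit a mixture model.

```r
# Hypothetical stand-in for a fit that depends on random starting values.
fit_with_seed <- function(x, fit.seed = NA) {
  if (!is.na(fit.seed)) {
    set.seed(fit.seed)     # seed set immediately before the "fit"
  }
  sample(x, size = 2)      # e.g., random initial guesses for two means
}

x <- seq(0, 10, by = 0.5)
start1 <- fit_with_seed(x, fit.seed = 123)
start2 <- fit_with_seed(x, fit.seed = 123)
identical(start1, start2)  # TRUE: same seed, same starting values
```

Passing the same seed to two layers plays the same role: both fits start from the same state of the RNG, so a line and its matching equation are based on the same solution.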
The minimum number of observations with distinct values in the explanatory
variable can be set through parameter n.min. The default depends on
k, the number of components in the mix. Model fits with too few
observations are unreliable, thus, using larger values of n.min than
the default is wise.
The value returned by stat_distrmix_line() is a data frame, with n
rows of predicted density for each component of the mixture plus their
sum and the corresponding vector of x values.
The value returned by stat_distrmix_eq() is a data frame, with one
row of estimates for each group of data in the plot.
Both statistics optionally also return additional values related to the model fit.
stat_distrmix_line()
Some of the returned variables depend on the orientation.
predicted density values
the n values for the quantiles
A factor indexing the components and/or their sum
If fm.values = TRUE is passed then columns with diagnostic values and
parameter estimates are added, with the same value in each row within a
group:
logical indicating if convergence was achieved
numeric the number of x values
numeric the number of density values
character the most derived class of the fitted model object
character the method, as given by the ft
field of the fitted model objects
This provides a simple and robust approach to achieve effects like colouring or hiding the model fit line by group depending on the outcome of model fitting.
stat_distrmix_eq()
Some of the variables depend on the orientation:
the location of text labels
the location of text labels
character string for equations
character string for number of observations
character string for model fit method
numeric the estimate of the contribution of the
component of the mixture towards the joint density
numeric the estimate of the mean
numeric the estimate of the standard deviation
A factor indexing the components of the mixture and/or their sum
If se = TRUE is passed then columns with standard errors for the
parameter estimates are added:
numeric the standard error of the estimate of the contribution
of the component of the mixture towards the joint density
numeric the standard error of the estimate of the mean
numeric the standard error of the estimate of the standard deviation
If fm.values = TRUE the same additional columns are returned as by
stat_distrmix_line(). This is wasteful of storage space as values are
stored in multiple copies and, thus, disabled by default. However, it
provides a simple and robust approach to achieve effects like colouring or
hiding of annotations by group depending on the outcome of model
fitting.
The character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently, or in some cases no mark-up is applied.
"expression": The labels are encoded as character strings to be parsed into R's plotmath expressions.
"LaTeX", "TeX", "tikz", "latex": The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into math mode.
"latex.eqn": Same as "latex" but enclosed in single $, i.e., as in-line maths.
"latex.deqn": Same as "latex" but enclosed in double $$, i.e., as display maths.
"markdown": The labels are encoded as character strings using markdown syntax, with some embedded HTML.
"marquee": The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans.
"text": The labels are plain ASCII character strings.
"numeric": No labels are generated. This value is accepted by the statistics, but not by the label formatting functions.
NULL: The value used depends on the argument passed to geom.
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.
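The default selection described above can be summarised as a small lookup. The function default_output_type() is a hypothetical helper written only to restate the rules in code; it is not part of 'ggpmisc'.

```r
# Hypothetical helper restating the documented defaults for output.type = NULL.
default_output_type <- function(geom) {
  switch(geom,
         latex    = "latex.eqn",  # geom from package 'xdvir'
         richtext = ,             # falls through to "markdown"
         textbox  = "markdown",   # geoms from package 'ggtext'
         marquee  = "marquee",    # geom from package 'marquee'
         "expression")            # every other geom
}

default_output_type("richtext")  # "markdown"
default_output_type("text_npc")  # "expression"
```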
When data are grouped by mapping a factor to an aesthetic, e.g.,
colour, shape and/or linetype the model is fitted
separately to each group, and for each group a whole set of labels is
generated. If the argument passed to label.y is a vector of length
1, this value determines the position of the equation and/or other labels
for the first group, and the positions of the labels for the remaining
groups are generated by adding steps of size vstep based on the group number.
If the argument passed to label.y is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring vstep.
If the labels are rotated by 90 degrees then the automatic stepping is
best based on hstep with vstep = 0. As described above for
label.y, if label.x is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring hstep.
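The stepping and recycling rules can be sketched as follows. The helper label_positions() is hypothetical, and the simple linear stepping by group index is an assumption made for illustration; the direction and exact computation used by the statistics may differ.

```r
# Hypothetical helper: positions for the labels of 'n.groups' groups.
# Assumption: a length-one position is stepped linearly by group index.
label_positions <- function(label.y, vstep, n.groups) {
  if (length(label.y) == 1L) {
    # first group at 'label.y', each later group shifted by one more step
    label.y + (seq_len(n.groups) - 1) * vstep
  } else {
    # longer vectors are used as is, recycled if needed; 'vstep' is ignored
    rep_len(label.y, n.groups)
  }
}

label_positions(0.95, vstep = -0.05, n.groups = 3)         # 0.95 0.90 0.85
label_positions(c(0.9, 0.5), vstep = -0.05, n.groups = 3)  # 0.9 0.5 0.9
```

The same logic would apply to label.x and hstep.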
When using facets and with a grouping that does not repeat in each panel,
the automatic positioning in most cases will not be the desired one. Manual
positioning using a vector of length > 1 for label.x and/or
label.y is the currently available workaround.
stat_distrmix_eq() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x or y |
|
| • | group |
→ after_stat(component) |
| • | hjust |
→ "inward" |
| • | label |
→ after_stat(eq.label) |
| • | npcx |
→ after_stat(npcx) |
| • | npcy |
→ after_stat(npcy) |
| • | vjust |
→ "inward"
|
stat_distrmix_line() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x or y |
|
| • | group |
→ after_stat(component) |
| • | weight |
→ NULL
|
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Other 'ggpmisc' statistics for model fits:
stat_fit_deviations(),
stat_fit_glance(),
stat_fit_tb(),
stat_fit_tidy(),
stat_ma_eq(),
stat_poly_eq(),
stat_quant_band()
    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "sum") +
      stat_distrmix_eq()

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "sum") +
      stat_distrmix_eq(use_label("eq", "n", "method"))

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "sum") +
      stat_distrmix_eq(geom = "label_npc")

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "sum") +
      stat_distrmix_eq(geom = "text", label.x = "center", label.y = "bottom")

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "sum") +
      stat_distrmix_eq(geom = "text", hjust = "inward")

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "members") +
      stat_distrmix_eq(components = "members")

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(components = "members") +
      stat_distrmix_eq(components = "members", se = TRUE)

    ggplot(faithful, aes(y = waiting)) +
      stat_distrmix_line(components = "sum") +
      stat_distrmix_eq(label.x = "right")

    ggplot(faithful, aes(x = waiting)) +
      geom_histogram(aes(y = after_stat(density)), bins = 20) +
      stat_distrmix_line(aes(colour = after_stat(component),
                             fill = after_stat(component)),
                         geom = "area", linewidth = 1, alpha = 0.25) +
      stat_distrmix_eq(aes(colour = after_stat(component)))

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(aes(colour = after_stat(component),
                             fill = after_stat(component)),
                         geom = "area", linewidth = 1, alpha = 0.25,
                         components = "members") +
      stat_distrmix_eq(aes(colour = after_stat(component)),
                       components = "members")

    ggplot(faithful, aes(x = waiting)) +
      stat_distrmix_line(geom = "area", linewidth = 1, alpha = 0.25,
                         colour = "black", outline.type = "upper",
                         components = "sum", se = FALSE) +
      stat_distrmix_eq(components = "sum")

    # special case of no mixture
    ggplot(subset(faithful, waiting > 66), aes(x = waiting)) +
      stat_distrmix_line(k = 1) +
      stat_distrmix_eq(k = 1)

    ggplot(subset(faithful, waiting > 66), aes(x = waiting)) +
      stat_distrmix_line(k = 1) +
      stat_distrmix_eq(k = 1, se = TRUE)

    # Inspecting the returned data using geom_debug_group()
    gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

    if (gginnards.installed)
      library(gginnards)

    if (gginnards.installed)
      ggplot(faithful, aes(x = waiting)) +
        stat_distrmix_line(geom = "debug_group", components = "all") +
        stat_distrmix_eq(geom = "debug_group", components = "all")

    if (gginnards.installed)
      ggplot(faithful, aes(x = waiting)) +
        stat_distrmix_eq(geom = "debug_group", components = "sum")

    if (gginnards.installed)
      ggplot(faithful, aes(x = waiting)) +
        stat_distrmix_eq(geom = "debug_group", components = "members")

    if (gginnards.installed)
      ggplot(faithful, aes(x = waiting)) +
        stat_distrmix_eq(geom = "debug_group", components = "members",
                         fm.values = TRUE)
stat_fit_augment() fits a model and returns a "tidy"
version of the model's data with predictions added, using augment()
methods from packages 'broom', 'broom.mixed', or other sources. The
prediction can be added to the plot as a line.
    stat_fit_augment(
      mapping = NULL,
      data = NULL,
      geom = "smooth",
      position = "identity",
      ...,
      method = "lm",
      method.args = list(formula = y ~ x),
      n.min = 2L,
      fit.seed = NA,
      augment.args = list(),
      level = 0.95,
      y.out = ".fitted",
      na.rm = FALSE,
      show.legend = FALSE,
      inherit.aes = TRUE
    )
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
method |
function or character If character, "lm", "rlm", "lmrob",
"lts", "gls", "ma", "sma", "segreg", "rq" or the name of a model fit
function are accepted, possibly followed by the fit function's
|
method.args, augment.args
|
list of arguments to pass to |
n.min |
integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
fit.seed |
RNG seed argument passed to
|
level |
Level of confidence interval to use (0.95 by default). |
y.out |
character (or numeric) index to column to return as |
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
stat_fit_augment() together with
stat_fit_glance() and stat_fit_tidy(), based on
package 'broom' can be used with a broad range of model fitting functions
as supported at any given time by 'broom'. In contrast to
stat_poly_eq() which can generate text or expression labels
automatically, for these functions the mapping of aesthetic label
needs to be explicitly supplied in the call, and labels built on the fly.
Although arguments passed to parameter augment.args will be
passed to augment(), whether they are silently
ignored or obeyed depends on each specialization of augment(), so do
carefully read the documentation for the version of augment()
corresponding to the method used to fit the model. Be aware that
se_fit = FALSE is the default in these methods even when supported.
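One way to check whether an argument is obeyed is to call augment() directly on a fitted model and compare the returned columns. The sketch below does this for an lm() fit and assumes package 'broom' is installed; other model classes may behave differently.

```r
# Sketch: compare the columns returned by augment() with and without se_fit.
# Assumption: package 'broom' is installed; only the "lm" method is checked.
if (requireNamespace("broom", quietly = TRUE)) {
  fit <- lm(mpg ~ disp, data = mtcars)
  default.cols <- colnames(broom::augment(fit))
  se.cols <- colnames(broom::augment(fit, se_fit = TRUE))
  # columns present only when standard errors are requested
  print(setdiff(se.cols, default.cols))
}
```

In a plot, the same argument would be passed as augment.args = list(se_fit = TRUE).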
Warning! Not all augment() method specializations are
defined in package 'broom'. augment() specializations for mixed
models fits of classes "lme", "nlme", "lme4" and many
others are defined in package 'broom.mixed'.
stat_fit_augment() applies the function
given by method separately to each group of observations; in
'ggplot2' factors mapped to aesthetics generate a separate group for each
level. Because of this, stat_fit_augment() is not useful for
annotating plots with results from t.test() or ANOVA or ANCOVA
(e.g., when a factor is mapped to the _x_ or _y_ aesthetics). In such cases
use instead stat_fit_tb() which applies the model fitting per panel.
The output of augment() is
returned as is, except for y which is set based on y.out and
y.observed which preserves the y returned by the
generics::augment methods. This renaming is needed so that the geom
works as expected.
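The renaming can be pictured with a small base R sketch. The data frame below only mimics the kind of columns augment() returns for an lm() fit, and remap_y() is a hypothetical helper, not the code used by the statistic.

```r
# Sketch of the renaming step, using a hand-built stand-in for augment() output.
fit <- lm(mpg ~ disp, data = mtcars)
augmented <- data.frame(x = mtcars$disp,
                        y = mtcars$mpg,
                        .fitted = fitted(fit),
                        .resid = residuals(fit))

# Hypothetical helper: keep the observed y and expose the selected column as y.
remap_y <- function(df, y.out = ".fitted") {
  df[["y.observed"]] <- df[["y"]]  # preserve the original response
  df[["y"]] <- df[[y.out]]         # the geom will plot this column
  df
}

out <- remap_y(augmented, y.out = ".resid")
head(out[, c("x", "y", "y.observed")], 3)
```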
To explore the values returned by this statistic, which vary depending
on the model fitting function and model formula we suggest the use of
geom_debug. An example is shown below.
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time and model formulas defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
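How an orientation could be derived from a formula is sketched below. The helper guess_orientation() is hypothetical and only inspects the right-hand side of the formula; the actual logic in the package may be more elaborate.

```r
# Hypothetical helper: infer the orientation from the explanatory variable.
guess_orientation <- function(formula) {
  rhs.vars <- all.vars(formula[[3]])  # variables on the right-hand side
  if ("x" %in% rhs.vars) {
    "x"
  } else if ("y" %in% rhs.vars) {
    "y"
  } else {
    NA_character_
  }
}

guess_orientation(y ~ x)           # "x"
guess_orientation(x ~ poly(y, 2))  # "y"
```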
Spline-based smoothers are only partially supported.
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
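A wrapper of this kind might look like the sketch below. The function lm_or_null() and its alpha parameter are hypothetical; it fits a linear model and returns NULL when the overall F-test is not significant, so that plotting would be skipped for that group.

```r
# Hypothetical wrapper: return the fit only if the regression is significant.
lm_or_null <- function(formula, data, ..., alpha = 0.05) {
  fm <- stats::lm(formula = formula, data = data, ...)
  f <- summary(fm)$fstatistic  # named vector: value, numdf, dendf
  p.value <- stats::pf(f[["value"]], f[["numdf"]], f[["dendf"]],
                       lower.tail = FALSE)
  if (p.value > alpha) NULL else fm
}

# Statistics pass data with variables named after the aesthetics x and y.
dat <- data.frame(x = mtcars$disp, y = mtcars$mpg)
class(lm_or_null(y ~ x, data = dat))          # "lm"
is.null(lm_or_null(y ~ x, data = dat, alpha = 0))  # TRUE: never significant
```

Such a function would then be passed as the argument to parameter method, e.g., method = lm_or_null.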
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted as the structure of fitted model objects belonging to different classes and the values returned by their accessors can vary, potentially resulting in decoding errors leading to the return of wrong values for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing the function
by name or as a string, or by using double colon notation when passing the
name of the function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data, subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
Table 1 lists natively supported model fit functions, with the caveat that only some 'broom' methods' specializations have been actually tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call and occasionally some detective work to find out the names of variables in the returned data frame as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
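The parsing of the "head:tail" form can be sketched in base R. The helper split_method() is hypothetical and deliberately minimal: it only handles the single colon form and would not cope with a string that uses double colon notation.

```r
# Hypothetical helper: split "head:tail" into a function name and the
# argument for that function's 'method' parameter.
split_method <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun.name = parts[1],
       method.arg = if (length(parts) > 1L) parts[2] else NULL)
}

str(split_method("rlm:MM"))  # fun.name "rlm", method.arg "MM"
str(split_method("lm"))      # fun.name "lm", method.arg NULL
```

Passing method = "rlm:MM" would thus be a shorthand for method = "rlm" together with method.args = list(method = "MM").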
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.

| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_fit_augment() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x |
|
| • | y |
|
| • | group |
→ inferred |
| • | ymax |
→ after_stat(y + .se.fit * t.value) |
| • | ymin |
→ after_stat(y - .se.fit * t.value)
|
Learn more about setting these aesthetics in vignette("ggplot2-specs").
See package 'broom' for details on how the tidying of
the result of model fits is done.
    # Package 'broom' needs to be installed to run these examples.
    # We check availability before running them to avoid errors.
    broom.installed <- requireNamespace("broom", quietly = TRUE)
    gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

    if (broom.installed) {
      library(broom)
    }

    # Inspecting the returned data using geom_debug_group()
    if (gginnards.installed) {
      library(gginnards)
    }

    # Regression by panel, inspecting data
    if (broom.installed && gginnards.installed) {
      ggplot(mtcars, aes(x = disp, y = mpg)) +
        geom_point(aes(colour = factor(cyl))) +
        stat_fit_augment(method = "lm",
                         method.args = list(formula = y ~ x),
                         geom = "debug_group",
                         dbgfun.data = colnames)
    }

    # Regression by panel example
    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg)) +
        geom_point(aes(colour = factor(cyl))) +
        stat_fit_augment(method = "lm",
                         method.args = list(formula = y ~ x))

    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg)) +
        geom_point(aes(colour = factor(cyl))) +
        stat_fit_augment(method = "lm",
                         augment.args = list(se_fit = TRUE),
                         method.args = list(formula = y ~ x + I(x^2)))

    # Residuals from regression by panel example
    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg)) +
        geom_hline(yintercept = 0, linetype = "dotted") +
        stat_fit_augment(geom = "point",
                         method = "lm",
                         method.args = list(formula = y ~ x),
                         y.out = ".resid")

    # Regression by group example
    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
        geom_point() +
        stat_fit_augment(method = "lm",
                         augment.args = list(se_fit = TRUE),
                         method.args = list(formula = y ~ x))

    # Residuals from regression by group example
    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
        geom_hline(yintercept = 0, linetype = "dotted") +
        stat_fit_augment(geom = "point",
                         method.args = list(formula = y ~ x),
                         y.out = ".resid")

    # Weighted regression example
    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg, weight = cyl)) +
        geom_point(aes(colour = factor(cyl))) +
        stat_fit_augment(method = "lm",
                         method.args = list(formula = y ~ x,
                                            weights = quote(weight)))

    # Residuals from weighted regression example
    if (broom.installed)
      ggplot(mtcars, aes(x = disp, y = mpg, weight = cyl)) +
        geom_hline(yintercept = 0, linetype = "dotted") +
        stat_fit_augment(geom = "point",
                         method.args = list(formula = y ~ x,
                                            weights = quote(weight)),
                         y.out = ".resid")
Statistic stat_fit_residuals fits a model and plots residuals vs.
x. Statistic stat_fit_deviations fits a model and
highlights residuals as segments in a y vs. x plot. Statistic
stat_fit_fitted plots the fitted values vs. x.
    stat_fit_deviations(
      mapping = NULL,
      data = NULL,
      geom = "segment",
      position = "identity",
      ...,
      orientation = NA,
      method = "lm",
      method.args = list(),
      n.min = 2L,
      formula = NULL,
      fit.seed = NA,
      na.rm = FALSE,
      show.legend = TRUE,
      inherit.aes = TRUE
    )

    stat_fit_fitted(
      mapping = NULL,
      data = NULL,
      geom = "point",
      position = "identity",
      orientation = NA,
      ...,
      method = "lm",
      method.args = list(),
      n.min = 2L,
      formula = NULL,
      fit.seed = NA,
      na.rm = FALSE,
      show.legend = FALSE,
      inherit.aes = TRUE
    )

    stat_fit_residuals(
      mapping = NULL,
      data = NULL,
      geom = "point",
      position = "identity",
      ...,
      orientation = NA,
      method = "lm",
      method.args = list(),
      n.min = 2L,
      formula = NULL,
      fit.seed = NA,
      resid.type = NULL,
      weighted = FALSE,
      na.rm = FALSE,
      show.legend = TRUE,
      inherit.aes = TRUE
    )
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
orientation |
character Either "x" or "y" controlling the default for
|
method |
function or character If character, "lm", "rlm", "lmrob",
"lts", "gls", "ma", "sma", "segreg", "rq" or the name of a model fit
function are accepted, possibly followed by the fit function's
|
method.args |
named list with additional arguments. Not |
n.min |
integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
formula |
a formula object. Using aesthetic names |
fit.seed |
RNG seed argument passed to
|
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
resid.type |
character passed to |
weighted |
logical If TRUE, weighted residuals will be returned. |
stat_fit_deviations() can be used to highlight residuals as
segments in a plot of a fitted model prediction. This statistic returns the
original x and y values and the fitted y or x
values depending on the orientation, together with prior and
posterior weights.
stat_fit_fitted() can be used to highlight as points the fitted
values. This statistic returns the original x or y values
and the fitted y or x values depending on the
orientation.
stat_fit_residuals() plots residuals as points. It applies to the
fitted model object methods residuals() or
weighted.residuals() depending on the argument passed
to parameter weighted. This statistic returns the original x
and y values and residuals depending on the orientation,
together with prior and posterior weights.
The returned value is always a data frame with the same number of
rows as the argument passed to data, except when model fitting fails,
in which case a data frame with no rows is returned. The
columns returned vary between the three statistics, and for each statistic
depending on the orientation.
To explore the values returned by statistics we suggest the use of
geom_debug_group(). Examples are shown below,
where one can also see in addition to the computed values the default
mapping of the fitted values to aesthetics xend and yend.
Two types of weights are possible: prior ones supplied in the call, and
posterior weights (called "robustness weights" in robust regression
methods) implicitly or explicitly used by fit methods to address
heterogeneity of error variance, including the presence of outlier
observations. Not all the supported methods accept prior weights, and
gls() returns posterior weights that are not in 0..1, unlike
most other fits. When weights are not accessible, they are set to 1 when
known to be equal to 1, which is the most frequent case, or to NA
otherwise.
How weights are applied to residuals depends on the method used to fit the model. For ordinary least squares (OLS), weights are applied to the squares of the residuals, so the weighted residuals are obtained by multiplying the "deviance" residuals by the square root of the weights. When residuals are penalized differently to fit a model, the weighted residuals need to be computed accordingly.
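For the OLS case the relationship can be checked directly in base R. The sketch below fits a weighted lm() and compares weighted.residuals() with the default residuals() scaled by the square root of the prior weights; it covers lm() only, not the other supported fit methods.

```r
# Sketch for the lm() case only: weighted residuals from prior weights.
fit <- lm(mpg ~ disp, data = mtcars, weights = cyl)

# default residuals (response minus fitted) scaled by sqrt of prior weights
manual <- residuals(fit) * sqrt(weights(fit))

all.equal(weighted.residuals(fit), manual)  # TRUE
```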
stat_fit_residuals()
x coordinates of observations
y coordinates of observations
x residuals from fitted values
y residuals from fitted values
the weights passed as input to lm(), rlm(), lmrob(), or to other model fit functions using aesthetic weight. More generally the value returned by method weights() applied to the model fit object
the "weights" of the applied minimization criterion relative to those of OLS in rlm() or lmrob(), or the divisor weights from gls(), lme() or nlme()
stat_fit_deviations()
x coordinates of observations
y coordinates of observations
x coordinates of fitted values
y coordinates of fitted values
the weights passed as input to lm(), rlm(), or lmrob(), using aesthetic weight. More generally the value returned by weights()
the "weights" of the applied minimization criterion relative to those of OLS in rlm(), or lmrob()
stat_fit_fitted()
x coordinates of observations or fitted
y coordinates of observations or fitted
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time, and model formulas should be defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
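How an orientation can be inferred from a model formula is sketched below in base R. This is only an illustration of the idea, not the code used by 'ggpmisc'; the helper guess_orientation() is hypothetical.

```r
# Hypothetical helper: infer the orientation from the right-hand side of a
# two-sided model formula that uses x and y as variable names
guess_orientation <- function(formula) {
  rhs.vars <- all.vars(formula[[3]])  # variables in the right-hand side
  if ("x" %in% rhs.vars && !"y" %in% rhs.vars) {
    "x"            # x is the explanatory variable
  } else if ("y" %in% rhs.vars && !"x" %in% rhs.vars) {
    "y"            # y is the explanatory variable
  } else {
    NA_character_  # ambiguous or unsupported formula
  }
}

guess_orientation(y ~ poly(x, 3, raw = TRUE))
guess_orientation(x ~ poly(y, 3, raw = TRUE))
```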
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
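A wrapper of the kind described above might look like the following sketch. The function name fit_or_skip(), its min.obs parameter and the AIC-based selection rule are assumptions made for illustration; only lm(), update() and AIC() from base R are used.

```r
# Hypothetical wrapper: return NULL when there are too few observations,
# otherwise choose between the requested model and an intercept-only model
fit_or_skip <- function(formula, data, ..., min.obs = 5L) {
  if (nrow(data) < min.obs) {
    return(NULL)  # per the text above, a NULL fit means plotting is skipped
  }
  full <- lm(formula = formula, data = data, ...)
  null <- lm(formula = update(formula, . ~ 1), data = data, ...)
  # keep the model with the lower AIC
  if (AIC(full) < AIC(null)) full else null
}

# made-up data with a clear linear trend
set.seed(42)
d <- data.frame(x = 1:30)
d$y <- 1 + 2 * d$x + rnorm(30)

class(fit_or_skip(y ~ x, data = d))   # a fitted "lm" object
fit_or_skip(y ~ x, data = d[1:3, ])   # too few rows: NULL
```

Because both branches return an object of class "lm", a late check based on the class of the returned object would see a supported fit whichever model is selected.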
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted as the structure of fitted model objects belonging to different classes and the values returned by their accessors can vary, potentially resulting in decoding errors leading to the return of wrong values for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing the function by name
or as a string, or by using double colon notation when passing the name of the
function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
Table 1 lists natively supported model fit functions, with the caveat that only some specializations of 'broom' methods have actually been tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call, and occasionally to do some detective work to find out the names of variables in the returned data frame, as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
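The "head:tail" convention can be illustrated with a short base R sketch. The helper split_method_name() is hypothetical and only mimics the idea; it is not the parser used by 'ggpmisc'.

```r
# Hypothetical helper: split a "head:tail" method name into the name of the
# model fit function and the value for that function's method parameter
split_method_name <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun.name = parts[1],
       method.arg = if (length(parts) > 1L) parts[2] else NA_character_)
}

split_method_name("rlm:MM")  # function name "rlm", method argument "MM"
split_method_name("lm")      # no tail, so the method argument is NA
```

Under this reading, "rlm:MM" stands for calling rlm() with method = "MM", which can equally be requested through method.args.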
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.
| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_fit_residuals() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x | |
| • | y | |
| • | group | → inferred |
stat_fit_deviations() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x | |
| • | y | |
| • | group | → inferred |
| • | xend | → after_stat(x.fitted) |
| • | yend | → after_stat(y.fitted) |
stat_fit_fitted() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x | |
| • | y | |
| • | group | → inferred |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
In the case of method = "rq" quantiles are fixed at tau =
0.5 unless method.args has length > 0. Parameter orientation
is redundant as it only affects the default for formula but is
included for consistency with ggplot2.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_glance(),
stat_fit_tb(),
stat_fit_tidy(),
stat_ma_eq(),
stat_poly_eq(),
stat_quant_band()
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x, y)

# give a name to a formula
my.formula <- y ~ poly(x, 3, raw = TRUE)
my.y.formula <- x ~ poly(y, 3, raw = TRUE)

# plot residuals from linear model
ggplot(my.data, aes(x, y)) +
  stat_poly_line(method = "lm", formula = my.formula) +
  stat_fit_deviations(method = "lm", formula = my.formula, colour = "red") +
  geom_point()

# plot residuals from linear model with y as explanatory variable
ggplot(my.data, aes(x, y)) +
  stat_poly_line(method = "lm", formula = my.y.formula) +
  stat_fit_deviations(method = "lm", formula = my.y.formula, colour = "red") +
  geom_point()

# plot robust regression
ggplot(my.data, aes(x, y)) +
  stat_poly_line(formula = my.formula, method = "rlm") +
  stat_fit_deviations(formula = my.formula, method = "rlm", colour = "red") +
  geom_point()

# plot robust regression with weights indicated by colour
my.data.outlier <- my.data
my.data.outlier[6, "y"] <- my.data.outlier[6, "y"] * 5
ggplot(my.data.outlier, aes(x, y)) +
  stat_poly_line(method = MASS::rlm, formula = my.formula) +
  stat_fit_deviations(formula = my.formula, method = "rlm",
                      mapping = aes(colour = after_stat(robustness.weights)),
                      show.legend = TRUE) +
  scale_color_gradient(low = "red", high = "blue", limits = c(0, 1),
                       guide = "colourbar") +
  geom_point()

# plot quantile regression (= median regression)
ggplot(my.data, aes(x, y)) +
  stat_quantile(formula = my.formula, quantiles = 0.5) +
  stat_fit_deviations(formula = my.formula, method = "rq", colour = "red") +
  geom_point()

# plot quantile regression (= "quartile" regression)
ggplot(my.data, aes(x, y)) +
  stat_quantile(formula = my.formula, quantiles = 0.75) +
  stat_fit_deviations(formula = my.formula, colour = "red",
                      method = "rq", method.args = list(tau = 0.75)) +
  geom_point()

# plot residuals from linear model
ggplot(my.data, aes(x, y)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  stat_fit_residuals(formula = my.formula)

# plot residuals from linear model with y as explanatory variable
ggplot(my.data, aes(x, y)) +
  geom_vline(xintercept = 0, linetype = "dashed") +
  stat_fit_residuals(formula = my.y.formula) +
  coord_flip()

ggplot(my.data, aes(x, y)) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  stat_fit_residuals(formula = my.formula, resid.type = "response")

# plot residuals with weights indicated by colour
my.data.outlier <- my.data
my.data.outlier[6, "y"] <- my.data.outlier[6, "y"] * 5
ggplot(my.data.outlier, aes(x, y)) +
  stat_fit_residuals(formula = my.formula, method = "rlm",
                     mapping = aes(colour = after_stat(robustness.weights)),
                     show.legend = TRUE) +
  scale_color_gradient(low = "red", high = "blue", limits = c(0, 1),
                       guide = "colourbar")

# plot weighted residuals with weights indicated by colour
ggplot(my.data.outlier) +
  stat_fit_residuals(formula = my.formula, method = "rlm",
                     mapping = aes(x = x,
                                   y = stage(start = y, after_stat = y * weights),
                                   colour = after_stat(robustness.weights)),
                     show.legend = TRUE) +
  scale_color_gradient(low = "red", high = "blue", limits = c(0, 1),
                       guide = "colourbar")

# inspecting the returned data
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

# plot, using geom_debug_group() to explore the after_stat data
if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    stat_fit_deviations(formula = my.formula, geom = "debug_group")

if (gginnards.installed)
  ggplot(my.data.outlier, aes(x, y)) +
    stat_fit_deviations(formula = my.formula, method = "rlm",
                        geom = "debug_group")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    stat_fit_residuals(formula = my.formula, resid.type = "working",
                       geom = "debug_group")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    stat_fit_residuals(formula = my.formula, method = "rlm",
                       geom = "debug_group")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    stat_fit_fitted(formula = my.formula, geom = "debug_group")
stat_fit_glance() fits a model and returns a "tidy"
version of the model's fit, using glance() methods from packages
'broom', 'broom.mixed', or other sources.
stat_fit_glance(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  method = "lm",
  method.args = list(formula = y ~ x),
  n.min = 2L,
  fit.seed = NA,
  glance.args = list(),
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = 0.075,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
method |
function or character. If character, "lm", "rlm", "lmrob",
"lts", "gls", "ma", "sma", "segreg", "rq" or the name of a model fit
function are accepted, possibly followed by the fit function's
|
method.args, glance.args
|
list of arguments to pass to |
n.min |
integer. Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
fit.seed |
RNG seed argument passed to
|
label.x, label.y
|
|
hstep, vstep
|
numeric in npc units, the horizontal and vertical step used between labels for different groups. |
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
stat_fit_glance(), together with stat_fit_tidy()
and stat_fit_augment(), is based on package 'broom' and can be used
with a broad range of model fitting functions as supported at any given
time by package 'broom'. In contrast to stat_poly_eq(), which
can generate text or expression labels automatically, for these statistics
the mapping of aesthetic label needs to be explicitly supplied in
the call, and labels built on the fly in the mapping to geom aesthetics.
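Building a label on the fly amounts to formatting numeric values into a character string that, with parse = TRUE, is interpreted as an R plotmath expression. The sketch below does this outside a plot in base R; the values are taken from summary() of an lm() fit as a stand-in for the r.squared and p.value columns, and the object names are illustrative.

```r
# Fit a model and extract the two values used in the label
fit <- lm(mpg ~ disp, data = mtcars)
fit.summary <- summary(fit)
r.squared <- fit.summary$r.squared
f <- fit.summary$fstatistic
p.value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Same format string as used with after_stat() in a mapping to label
label <- sprintf('italic(r)^2~"="~%.3f~~italic(P)~"="~%.2g',
                 r.squared, p.value)
label

# The string must parse as an R expression for parse = TRUE to work
label.expr <- parse(text = label)
```

Inside a layer, the same sprintf() call is wrapped in aes(label = ...), with after_stat(r.squared) and after_stat(p.value) in place of the two numeric values.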
Although arguments passed to parameter glance.args are
passed to glance(), whether they are silently
ignored or obeyed depends on each specialization of glance(), so do
carefully read the documentation for the version of glance()
corresponding to the method used to fit the model.
Warning! Not all glance() methods are defined in package
'broom'. glance() specializations for mixed models fits of classes
"lme", "nlme", "lme4" and many others are defined in
package 'broom.mixed'.
The output of the glance() methods is returned almost as is in
the data object, as a data frame. The names of the columns in the
returned data are consistent with those returned by method glance()
from package 'broom', which will frequently differ from the names of values
returned by the print methods corresponding to the fit or test function
used. To explore the values returned by this statistic, including the names
of variables/columns, which vary depending on the model fitting function
and model formula, we suggest the use of
geom_debug. An example is shown below.
stat_fit_glance applies the function
given by method separately to each group of observations, and
factors mapped to aesthetics, including x and y, create a
separate group for each factor level. Because of this,
stat_fit_glance is not useful for annotating plots with results from
t.test(), ANOVA or ANCOVA. In such cases use the
stat_fit_tb() statistic which applies the model fitting per panel.
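The difference between fitting by group and fitting per panel can be sketched in base R, without any plotting. This only illustrates the two ways of partitioning the data and is not how either statistic is implemented.

```r
# By group: the data are split by the factor and each subset is fitted
# independently, so no fit can compare the levels of the factor
fits.by.group <- lapply(split(mtcars, mtcars$cyl),
                        function(d) lm(mpg ~ disp, data = d))
sapply(fits.by.group, function(fm) coef(fm)[["disp"]])

# Per panel: a single fit sees all the data, so the factor can be a term
# in the model, as needed for ANOVA
fit.panel <- lm(mpg ~ factor(cyl), data = mtcars)
anova(fit.panel)
```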
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time, and model formulas should be defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted as the structure of fitted model objects belonging to different classes and the values returned by their accessors can vary, potentially resulting in decoding errors leading to the return of wrong values for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing the function by name
or as a string, or by using double colon notation when passing the name of the
function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
Table 1 lists natively supported model fit functions, with the caveat that only some specializations of 'broom' methods have actually been tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call, and occasionally to do some detective work to find out the names of variables in the returned data frame, as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.
| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_fit_glance() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x | |
| • | y | |
| • | group | → inferred |
| • | hjust | → "inward" |
| • | npcx | → after_stat(npcx) |
| • | npcy | → after_stat(npcy) |
| • | vjust | → "inward" |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
See package 'broom' for details on how the tidying of
the result of model fits is done.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_deviations(),
stat_fit_tb(),
stat_fit_tidy(),
stat_ma_eq(),
stat_poly_eq(),
stat_quant_band()
# package 'broom' needs to be installed to run these examples
broom.installed <- requireNamespace("broom", quietly = TRUE)
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (broom.installed) {
  library(broom)
}

if (gginnards.installed) {
  library(gginnards)
}

# Inspecting the returned data using geom_debug_group()
if (broom.installed && gginnards.installed) {
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    stat_smooth(method = "lm") +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_glance(method = "lm",
                    method.args = list(formula = y ~ x),
                    geom = "debug_group")
}

# Regression by panel example
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    stat_smooth(method = "lm", formula = y ~ x) +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_glance(method = "lm",
                    label.y = "bottom",
                    method.args = list(formula = y ~ x),
                    mapping = aes(label = sprintf('italic(r)^2~"="~%.3f~~italic(P)~"="~%.2g',
                                                  after_stat(r.squared), after_stat(p.value))),
                    parse = TRUE)

# Regression by group example
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
    stat_smooth(method = "lm") +
    geom_point() +
    stat_fit_glance(method = "lm",
                    label.y = "bottom",
                    method.args = list(formula = y ~ x),
                    mapping = aes(label = sprintf('r^2~"="~%.3f~~italic(P)~"="~%.2g',
                                                  after_stat(r.squared), after_stat(p.value))),
                    parse = TRUE)

# Weighted regression example
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg, weight = cyl)) +
    stat_smooth(method = "lm") +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_glance(method = "lm",
                    label.y = "bottom",
                    method.args = list(formula = y ~ x, weights = quote(weight)),
                    mapping = aes(label = sprintf('r^2~"="~%.3f~~italic(P)~"="~%.2g',
                                                  after_stat(r.squared), after_stat(p.value))),
                    parse = TRUE)

# correlation test
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    geom_point() +
    stat_fit_glance(method = "cor.test",
                    label.y = "bottom",
                    method.args = list(formula = ~ x + y),
                    mapping = aes(label = sprintf('r[Pearson]~"="~%.3f~~italic(P)~"="~%.2g',
                                                  after_stat(estimate), after_stat(p.value))),
                    parse = TRUE)

if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    geom_point() +
    stat_fit_glance(method = "cor.test",
                    label.y = "bottom",
                    method.args = list(formula = ~ x + y, method = "spearman", exact = FALSE),
                    mapping = aes(label = sprintf('r[Spearman]~"="~%.3f~~italic(P)~"="~%.2g',
                                                  after_stat(estimate), after_stat(p.value))),
                    parse = TRUE)
stat_fit_tb() fits a model and returns a "tidy" version of
the model's summary or ANOVA table, using tidy() methods from
packages 'broom', 'broom.mixed', or other 'broom' extensions. The
annotation is added to the plots in tabular form.
stat_fit_tb(
  mapping = NULL,
  data = NULL,
  geom = "table_npc",
  position = "identity",
  ...,
  method = "lm",
  method.args = list(formula = y ~ x),
  n.min = 2L,
  fit.seed = NA,
  tidy.args = list(),
  tb.type = "fit.summary",
  tb.vars = NULL,
  tb.params = NULL,
  digits = 3,
  p.digits = digits,
  label.x = "center",
  label.y = "top",
  table.theme = NULL,
  table.rownames = FALSE,
  table.colnames = TRUE,
  table.hjust = 1,
  parse = FALSE,
  na.rm = FALSE,
  show.legend = FALSE,
  inherit.aes = TRUE
)
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
method |
function or character. If character, "lm", "rlm", "lmrob",
"lts", "gls", "ma", "sma", "segreg", "rq" or the name of a model fit
function are accepted, possibly followed by the fit function's
|
method.args, tidy.args
|
lists of arguments to pass to |
n.min |
integer. Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
fit.seed |
RNG seed argument passed to
|
tb.type |
character One of |
tb.vars, tb.params
|
character or numeric vectors, optionally named, used to select and/or rename the columns or the parameters in the table returned. |
digits |
integer indicating the number of significant digits to be used for all numeric values in the table. |
p.digits |
integer indicating the number of decimal places to round
p-values to, with those rounded to zero displayed as the next larger
possible value preceded by "<". If |
label.x, label.y
|
|
table.theme |
NULL, list or function A 'gridExtra' |
table.rownames, table.colnames
|
logical flags to enable or disable printing of row names and column names. |
table.hjust |
numeric Horizontal justification for the core and column headings of the table. |
parse |
logical Passed to the geom. If |
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
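The rule described above for p.digits can be sketched in base R as follows. format_p() is a hypothetical helper written only for illustration; it is not part of 'ggpmisc', and the package's own formatting code may differ in its details.

```r
# Hypothetical helper illustrating the p.digits rule; not the 'ggpmisc' code.
format_p <- function(p, p.digits = 3) {
  rounded <- round(p, digits = p.digits)
  # smallest value that can be displayed with p.digits decimal places
  smallest <- 10^(-p.digits)
  ifelse(rounded == 0,
         paste0("<", formatC(smallest, digits = p.digits, format = "f")),
         formatC(rounded, digits = p.digits, format = "f"))
}

# the second p-value rounds to zero and is shown as "<0.001"
format_p(c(0.0421, 0.00001), p.digits = 3)
```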
stat_fit_tb() applies a model fitting function per panel,
using the grouping factors from aesthetic mappings in the fitted model.
This is suitable, for example, for analysis of variance used to test for
differences among groups.
The argument to method can be any fit method for which a suitable
tidy() method is available, including non-linear regression. Fit
methods retain their default arguments unless overridden.
A tibble with columns named fm.tb (a tibble returned by
tidy() with possibly renamed and subset columns and rows, within a
list), fm.tb.type (copy of argument passed to tb.type),
fm.class (the class of the fitted model object), fm.method
(the fit function's name), fm.call (the call if available), x
and y.
To explore the values returned by this statistic, which vary depending on
the model fitting function and model formula, we suggest the use of
geom_debug.
The output of tidy() is returned as a
single "cell" in a tibble (i.e., a tibble nested within a tibble). The
returned data object contains a single tibble, containing the result
from a single model fit to all data in a panel. If grouping is present, it
is ignored in the sense of returning a single table, but the grouping
aesthetic can be a term in the fitted model.
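The idea of a table held in a single "cell" can be sketched in base R with a list column. This is only an illustration of the data structure, using mtcars and column names taken from the description above; it is not the code used by 'ggpmisc', which relies on tidy() and tibbles.

```r
# Base-R sketch of a table stored as a single "cell"; not the 'ggpmisc' code.
fit <- lm(mpg ~ disp, data = mtcars)

# coefficient table, similar in spirit to what tidy() returns
coef.tb <- as.data.frame(summary(fit)$coefficients)
coef.tb <- cbind(term = rownames(coef.tb), coef.tb)
rownames(coef.tb) <- NULL

# one row per panel, with the whole table held in a list column
returned <- data.frame(fm.tb.type = "fit.summary",
                       fm.class = class(fit)[1])
returned$fm.tb <- list(coef.tb)

nrow(returned)        # a single row for the panel
returned$fm.tb[[1]]   # the nested table
```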
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time and model formulas defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members; if they are found, complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the extracted values. The structure of fitted model objects belonging to different classes, and the values returned by their accessors, can vary, potentially resulting in decoding errors that lead to wrong values being returned for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing the function
by name or as a string, or by using double colon notation when passing the
name of the function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data, subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
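A user-defined fit function of the kind described above might look like the following sketch. The name linear_or_mean(), the minimum of four observations and the 0.1 threshold are illustrative assumptions, not part of 'ggpmisc'; handling of weights is left out to keep the example short.

```r
# Hypothetical user-defined fit function; name and thresholds are
# illustrative assumptions, not part of 'ggpmisc'.
linear_or_mean <- function(formula, data, ...) {
  # too few observations: returning NULL signals that there is nothing to plot
  if (nrow(data) < 4L) {
    return(NULL)
  }
  fm <- lm(formula = formula, data = data, ...)
  # keep the requested model only if its first term is significant,
  # otherwise fall back to fitting the mean
  if (anova(fm)[["Pr(>F)"]][1] > 0.1) {
    lm(formula = y ~ 1, data = data, ...)
  } else {
    fm
  }
}

# data with a clear slope, and data with essentially none
d.slope <- data.frame(x = 1:20, y = 2 * (1:20) + rep(c(0.5, -0.5), 10))
d.flat  <- data.frame(x = 1:20, y = rep(c(1, -1), 10))

coef(linear_or_mean(y ~ x, d.slope))   # intercept and slope
coef(linear_or_mean(y ~ x, d.flat))    # intercept only
linear_or_mean(y ~ x, d.flat[1:3, ])   # NULL
```

Because the function returns an ordinary "lm" object or NULL, it should be usable as the argument to method in the statistics that accept function objects, with the model chosen separately for each group or panel.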
Table 1 lists natively supported model fit functions, with the caveat that only some specializations of the 'broom' methods have actually been tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call, and occasionally to do some detective work to find out the names of variables in the returned data frame, as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
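The "head:tail" convention can be sketched in base R as below. split_method_name() is a hypothetical helper used only to illustrate the parsing idea; it is not the function used internally by 'ggpmisc'.

```r
# Hypothetical helper illustrating the "head:tail" convention;
# not the parser used by 'ggpmisc'.
split_method_name <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun.name = parts[1],
       method.arg = if (length(parts) > 1L) parts[2] else NULL)
}

split_method_name("rlm:MM")   # fun.name "rlm", method.arg "MM"
split_method_name("lm")       # fun.name "lm", no method argument

# According to the text above, the two calls sketched in these comments
# should then be equivalent ways of requesting an MM-estimator fit:
#   method = "rlm:MM"
#   method = "rlm", method.args = list(formula = y ~ x, method = "MM")
```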
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.

| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_fit_tb() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
• x
• y
• group → inferred
• hjust → "inward"
• label → after_stat(fm.tb)
• vjust → "inward"
Learn more about setting these aesthetics in vignette("ggplot2-specs").
See package 'broom' for details on how the tidying of
the result of model fits is done. See geom_table for
details on how inset tables respond to mapped aesthetics and table themes.
For details on predefined table themes see
ttheme_gtdefault.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_deviations(),
stat_fit_glance(),
stat_fit_tidy(),
stat_ma_eq(),
stat_poly_eq(),
stat_quant_band()
    # Package 'broom' needs to be installed to run these examples.
    # We check availability before running them to avoid errors.
    broom.installed <- requireNamespace("broom", quietly = TRUE)

    if (broom.installed)
      library(broom)

    # data for examples
    x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
    covariate <- sqrt(x) + rnorm(9)
    group <- factor(c(rep("A", 4), rep("B", 5)))
    my.df <- data.frame(x, group, covariate)

    gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

    if (gginnards.installed)
      library(gginnards)

    ## covariate is a numeric or continuous variable
    # Linear regression fit summary, all defaults
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb() +
        expand_limits(y = 70)

    # we can use geom_debug_panel() and str() to inspect the returned value
    # and discover the variables that can be mapped to aesthetics with
    # after_stat()
    if (broom.installed && gginnards.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(geom = "debug_panel", dbgfun.data = str) +
        expand_limits(y = 70)

    # Linear regression fit summary, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.summary") +
        expand_limits(y = 70)

    # Linear regression fit summary, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(digits = 2,
                    p.digits = 4,
                    tb.params = c("intercept" = 1, "covariate" = 2),
                    tb.vars = c(Term = 1, Estimate = 2,
                                "italic(s)" = 3, "italic(t)" = 4,
                                "italic(P)" = 5),
                    parse = TRUE) +
        expand_limits(y = 70)

    # Linear regression ANOVA table, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova") +
        expand_limits(y = 70)

    # Linear regression ANOVA table, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova",
                    tb.params = c("Covariate" = 1, 2),
                    tb.vars = c(Effect = 1, d.f. = 2,
                                "M.S." = 4, "italic(F)" = 5,
                                "italic(P)" = 6),
                    parse = TRUE) +
        expand_limits(y = 67)

    # Linear regression fit coefficients, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.coefs") +
        expand_limits(y = 67)

    # Linear regression fit coefficients, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.coefs",
                    tb.params = c(a = 1, b = 2),
                    tb.vars = c(Term = 1, Estimate = 2)) +
        expand_limits(y = 67)

    ## x is also a numeric or continuous variable
    # Polynomial regression, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(method.args = list(formula = y ~ poly(x, 2))) +
        expand_limits(y = 70)

    # Polynomial regression, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(method.args = list(formula = y ~ poly(x, 2)),
                    tb.params = c("x^0" = 1, "x^1" = 2, "x^2" = 3),
                    tb.vars = c("Term" = 1, "Estimate" = 2, "S.E." = 3,
                                "italic(t)" = 4, "italic(P)" = 5),
                    parse = TRUE) +
        expand_limits(y = 70)

    ## group is a factor or discrete variable
    # ANOVA summary, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb() +
        expand_limits(y = 70)

    # ANOVA table, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova") +
        expand_limits(y = 70)

    # ANOVA table, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova",
                    tb.vars = c(Effect = "term", "df",
                                "italic(F)" = "statistic",
                                "italic(P)" = "p.value"),
                    tb.params = c(Group = 1, Error = 2),
                    parse = TRUE)

    # ANOVA table, with manual table formatting
    # using column names with partial matching
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova",
                    tb.vars = c(Effect = "term", "df",
                                "italic(F)" = "stat",
                                "italic(P)" = "p"),
                    tb.params = c(Group = "x", Error = "Resid"),
                    parse = TRUE)

    # ANOVA summary, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb() +
        expand_limits(y = 70)

    ## covariate is a numeric variable and group is a factor
    # ANCOVA (covariate not plotted) ANOVA table, with default formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x, z = covariate)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova",
                    method.args = list(formula = y ~ x + z))

    # ANCOVA (covariate not plotted) ANOVA table, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x, z = covariate)) +
        geom_point() +
        stat_fit_tb(tb.type = "fit.anova",
                    method.args = list(formula = y ~ x + z),
                    tb.vars = c(Effect = 1, d.f. = 2,
                                "M.S." = 4, "italic(F)" = 5,
                                "italic(P)" = 6),
                    tb.params = c(Group = 1, Covariate = 2, Error = 3),
                    parse = TRUE)

    ## group is a factor or discrete variable
    # t-test, minimal output, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb(method = "t.test",
                    tb.vars = c("italic(t)" = "statistic",
                                "italic(P)" = "p.value"),
                    parse = TRUE)

    # t-test, more detailed output, with manual table formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb(method = "t.test",
                    tb.vars = c("\"Delta \"*italic(x)" = "estimate",
                                "CI low" = "conf.low",
                                "CI high" = "conf.high",
                                "italic(t)" = "statistic",
                                "italic(P)" = "p.value"),
                    parse = TRUE) +
        expand_limits(y = 67)

    # t-test (equal variances assumed), minimal output, with manual
    # table formatting
    if (broom.installed)
      ggplot(my.df, aes(group, x)) +
        geom_point() +
        stat_fit_tb(method = "t.test",
                    method.args = list(formula = y ~ x, var.equal = TRUE),
                    tb.vars = c("italic(t)" = "statistic",
                                "italic(P)" = "p.value"),
                    parse = TRUE)

    ## covariate is a numeric or continuous variable
    # Linear regression using a table theme and non-default position
    if (broom.installed)
      ggplot(my.df, aes(covariate, x)) +
        geom_point() +
        stat_fit_tb(table.theme = ttheme_gtlight,
                    npcx = "left", npcy = "bottom") +
        expand_limits(y = 35)
stat_fit_tidy() fits a model and returns a "tidy" version
of the model's summary, using tidy() method specializations from
packages 'broom', 'broom.mixed', or other sources.
    stat_fit_tidy(
      mapping = NULL,
      data = NULL,
      geom = "text_npc",
      position = "identity",
      ...,
      method = "lm",
      method.args = list(formula = y ~ x),
      n.min = 2L,
      fit.seed = NA,
      tidy.args = list(),
      label.x = "left",
      label.y = "top",
      hstep = 0,
      vstep = NULL,
      sanitize.names = FALSE,
      na.rm = FALSE,
      show.legend = FALSE,
      inherit.aes = TRUE
    )
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
method |
function or character. If character, "lm", "rlm", "lmrob",
"lts", "gls", "ma", "sma", "segreg", "rq" or the name of a model fit
function are accepted, possibly followed by the fit function's
|
method.args, tidy.args
|
list of arguments to pass to |
n.min |
integer. Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
fit.seed |
RNG seed argument passed to
|
label.x, label.y
|
|
hstep, vstep
|
numeric in npc units, the horizontal and vertical step used between labels for different groups. |
sanitize.names |
logical If true sanitize column names in the returned
|
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
stat_fit_tidy, together with stat_fit_glance
and stat_fit_augment, all based on package 'broom', can be used
with a broad range of model fitting functions as supported at any given
time by 'broom'. In contrast to stat_poly_eq, which can
generate text or expression labels automatically, for these statistics the
mapping of aesthetic label needs to be explicitly supplied in the
call, and labels built on the fly.
Although arguments passed to parameter tidy.args will be passed
to tidy(), whether they are silently ignored or
obeyed depends on each specialization of tidy(), so do carefully
read the documentation for the version of tidy() corresponding to
the method used to fit the model. You will also need to manually
install the package, such as 'broom', where the tidiers you intend to use
are defined.
Warning! Not all tidy() methods are defined in package
'broom'. tidy() specializations for mixed-model fits of classes
"lme", "nlme", "lme4" and many others are defined in
package 'broom.mixed'.
The output of tidy() is returned after reshaping it into a
single row. Grouping is respected, and the model is fitted separately to each
group of data. The returned data object has one row for each group
within a panel. Note that the term (Intercept) in the output of
tidy() is renamed to Intercept, which is the name to use when
accessing the intercept. Otherwise, the names
of the columns in the returned data are based on those returned by the
tidy() method for the model fit class returned by the fit function.
These will frequently differ from the names of the values returned by the print
methods corresponding to the fit or test function used. To explore the
values returned by this statistic, including the names of variables/columns,
which vary depending on the model fitting function and model formula, we
suggest the use of geom_debug. An example is shown
below. Names of columns as returned by default are not always syntactically
valid R names, making it necessary to use back ticks to access them.
Syntactically valid names are guaranteed if sanitize.names = TRUE is
added to the call.
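The reshaping into a single row can be sketched in base R as follows. This is an illustration only, not the 'ggpmisc' implementation: the suffixes "se" and "stat" are assumptions (only names of the form x_estimate and x_p.value appear in the examples for this statistic), and make.names() merely stands in for what sanitize.names = TRUE might do.

```r
# Base-R sketch of reshaping a coefficient table into one row;
# an illustration only, not the 'ggpmisc' implementation.
d <- data.frame(x = mtcars$disp, y = mtcars$mpg)
fit <- lm(y ~ x + I(x^2), data = d)

coefs <- summary(fit)$coefficients   # one row per term
# column suffixes: "se" and "stat" are assumed names
colnames(coefs) <- c("estimate", "se", "stat", "p.value")

# rename the intercept term as described above
terms <- sub("(Intercept)", "Intercept", rownames(coefs), fixed = TRUE)

# flatten term by term and build names such as "x_estimate"
values <- as.vector(t(coefs))
names(values) <- paste(rep(terms, each = ncol(coefs)),
                       colnames(coefs), sep = "_")
one.row <- as.data.frame(t(values))

names(one.row)                 # some names are not syntactically valid
one.row[["I(x^2)_estimate"]]   # so back ticks or [[ ]] are needed

# make.names() used as a stand-in for sanitized column names
make.names(names(one.row))
```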
stat_fit_tidy applies the function
given by method separately to each group of observations; in ggplot2
factors mapped to aesthetics generate a separate group for each level.
Because of this, stat_fit_tidy is not useful for annotating plots
with results from t.test() or ANOVA or ANCOVA. In such cases use
instead stat_fit_tb() which applies the model fitting per panel.
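Why a by-group fit cannot produce a two-sample test can be seen with a small base-R sketch, using the data from the stat_fit_tb() examples. The splitting below only mimics what happens when a factor mapped to x creates one group per level; it is not code from 'ggpmisc'.

```r
# Sketch of why a two-sample test fails group by group; base R only.
x <- c(44.4, 45.9, 41.9, 53.3, 44.7, 44.1, 50.7, 45.2, 60.1)
group <- factor(c(rep("A", 4), rep("B", 5)))

# data as seen by a statistic: aesthetics x and y instead of variable names
panel.data <- data.frame(x = group, y = x)

# per panel: both levels are present, so the test can be computed
panel.p <- t.test(y ~ x, data = panel.data)$p.value

# per group: each subset contains a single level, so the test fails
by.group <- split(panel.data, panel.data$x)
group.try <- try(t.test(y ~ x, data = by.group[["A"]]), silent = TRUE)

panel.p
inherits(group.try, "try-error")   # TRUE
```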
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time and model formulas defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members; if they are found, complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the extracted values. The structure of fitted model objects belonging to different classes, and the values returned by their accessors, can vary, potentially resulting in decoding errors that lead to wrong values being returned for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing the function
by name or as a string, or by using double colon notation when passing the
name of the function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data, subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
Table 1 lists natively supported model fit functions, with the caveat that only some specializations of the 'broom' methods have actually been tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call, and occasionally to do some detective work to find out the names of variables in the returned data frame, as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.
| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_fit_tidy() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
• **x**
• **y**
• group → inferred
• hjust → "inward"
• npcx → after_stat(npcx)
• npcy → after_stat(npcy)
• vjust → "inward"
Learn more about setting these aesthetics in vignette("ggplot2-specs").
See package 'broom' for details on how the results of model fits are
tidied.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_deviations(),
stat_fit_glance(),
stat_fit_tb(),
stat_ma_eq(),
stat_poly_eq(),
stat_quant_band()
# Package 'broom' needs to be installed to run these examples.
# We check availability before running them to avoid errors.
broom.installed <- requireNamespace("broom", quietly = TRUE)
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (broom.installed) {
  library(broom)
}

# Inspecting the returned data using geom_debug_group()
if (gginnards.installed) {
  library(gginnards)
}

# Regression by panel, inspecting data
if (broom.installed && gginnards.installed) {
  # Regression by panel, default column names
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    stat_smooth(method = "lm", formula = y ~ x + I(x^2)) +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_tidy(method = "lm",
                  method.args = list(formula = y ~ x + I(x^2)),
                  geom = "debug_group")

  # Regression by panel, sanitized column names
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    stat_smooth(method = "lm", formula = y ~ x + I(x^2)) +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_tidy(method = "lm",
                  method.args = list(formula = y ~ x + I(x^2)),
                  geom = "debug_group",
                  sanitize.names = TRUE)
}

# Regression by panel example
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg)) +
    stat_smooth(method = "lm", formula = y ~ x) +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_tidy(method = "lm",
                  label.x = "right",
                  method.args = list(formula = y ~ x),
                  mapping = aes(label = sprintf("Slope = %.3g\np-value = %.3g",
                                                after_stat(x_estimate),
                                                after_stat(x_p.value))))

# Regression by group example
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg, colour = factor(cyl))) +
    stat_smooth(method = "lm", formula = y ~ x) +
    geom_point() +
    stat_fit_tidy(method = "lm",
                  label.x = "right",
                  method.args = list(formula = y ~ x),
                  mapping = aes(label = sprintf("Slope = %.3g, p-value = %.3g",
                                                after_stat(x_estimate),
                                                after_stat(x_p.value))))

# Weighted regression example
if (broom.installed)
  ggplot(mtcars, aes(x = disp, y = mpg, weight = cyl)) +
    stat_smooth(method = "lm", formula = y ~ x) +
    geom_point(aes(colour = factor(cyl))) +
    stat_fit_tidy(method = "lm",
                  label.x = "right",
                  method.args = list(formula = y ~ x, weights = quote(weight)),
                  mapping = aes(label = sprintf("Slope = %.3g\np-value = %.3g",
                                                after_stat(x_estimate),
                                                after_stat(x_p.value))))
Statistics stat_ma_line() and stat_ma_eq() fit model II
regressions. While stat_ma_line() adds a prediction line and band,
stat_ma_eq() adds textual labels to a plot.
stat_ma_eq(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  orientation = NA,
  formula = NULL,
  method = "lmodel2:MA",
  method.args = list(),
  n.min = 2L,
  range.y = NULL,
  range.x = NULL,
  nperm = 99,
  fit.seed = NA,
  eq.with.lhs = TRUE,
  eq.x.rhs = NULL,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  small.p = getOption("ggpmisc.small.p", default = FALSE),
  coef.digits = 3,
  coef.keep.zeros = TRUE,
  decreasing = getOption("ggpmisc.decreasing.poly.eq", FALSE),
  rr.digits = 2,
  theta.digits = 2,
  p.digits = max(1, ceiling(log10(nperm))),
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = NULL,
  output.type = NULL,
  na.rm = FALSE,
  parse = NULL,
  show.legend = FALSE,
  inherit.aes = TRUE
)

stat_ma_line(
  mapping = NULL,
  data = NULL,
  geom = "smooth",
  position = "identity",
  ...,
  orientation = NA,
  method = "lmodel2:MA",
  method.args = list(),
  n.min = 2L,
  formula = NULL,
  range.y = NULL,
  range.x = NULL,
  se = TRUE,
  fit.seed = NA,
  fm.values = FALSE,
  n = 80,
  nperm = 99,
  fullrange = FALSE,
  limit.to = NULL,
  level = 0.95,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
| Argument | Description |
|---|---|
| mapping | The aesthetic mapping, usually constructed with |
| data | A layer specific dataset, only needed if you want to override the plot defaults. |
| geom | The geometric object to use to display the data |
| position | The position adjustment to use for overlapping points on this layer. |
| ... | other arguments passed on to |
| orientation | character Either "x" or "y" controlling the default for |
| formula | a formula object. Using aesthetic names |
| method | function or character If character, "MA", "SMA", "RMA" or "OLS", alternatively "lmodel2" or the name of a model fit function are accepted, possibly followed by the fit function's |
| method.args | named list with additional arguments. Not |
| n.min | integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
| range.y, range.x | character Pass "relative" or "interval" if method "RMA" is to be computed. |
| nperm | integer Number of permutations used to estimate significance. |
| fit.seed | RNG seed argument passed to |
| eq.with.lhs | If |
| eq.x.rhs | |
| small.r, small.p | logical Flags to switch use of lower case r and p for coefficient of determination and p-value. |
| coef.digits | integer Number of significant digits to use for the fitted coefficients in the equation label. |
| coef.keep.zeros | logical Keep or drop trailing zeros when formatting the fitted coefficients and F-value. |
| decreasing | logical It specifies the order of the terms in the returned character string; in increasing (default) or decreasing powers. |
| rr.digits, theta.digits, p.digits | integer Number of digits after the decimal point to use for R^2, theta and P-value in labels. If |
| label.x, label.y | |
| hstep, vstep | numeric in npc units, the horizontal and vertical step used between labels for different groups. |
| output.type | character One of "expression", "text", "markdown", "marquee", "latex", "latex.eqn", "latex.deqn" or "numeric". |
| na.rm | a logical indicating whether NA values should be stripped before the computation proceeds. |
| parse | logical Passed to the geom. If |
| show.legend | logical. Should this layer be included in the legends? |
| inherit.aes | If |
| se | logical Return confidence interval around smooth? ('TRUE' by default, see 'level' to control.) |
| fm.values | logical Add metadata and parameter estimates extracted from the fitted model object; |
| n | Number of points at which to predict with the fitted model. |
| fullrange | logical Should the fit prediction span the full range of the plot, or just the range of the explanatory variable? |
| limit.to | character or numeric If character one of |
| level | Level of confidence interval to use (only 0.95 currently). |
Statistics stat_ma_line() and stat_ma_eq() fit major
axis ("MA") and other model II regressions with function
lmodel2() from package 'lmodel2'. They support linear major axis (MA),
standard major axis (SMA) and ranged major axis (RMA) regression.
MA and SMA regressions are also supported by stat_poly_line() and
stat_poly_eq(), which use package 'smatr' instead of 'lmodel2'.
stat_ma_line() adds the predicted line and a confidence band based on
the uncertainty of the slope estimate. stat_ma_eq()
adds textual annotations with the fitted model equation and other parameter
estimates.
Model II regression is called for when both x and y are
subject to random variation and the intention is not to predict y
from x by means of the model but rather to study the relationship
between the two variables. A frequent case in biology is that of
allometric relationships among body parts.
As the fitted line is the same whether x or y is on the rhs
of the model equation, orientation even if accepted does not have an
effect on the fitted line. It does, however, have an effect on the
formulation of the equation displayed in the label.
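The symmetry described above can be checked numerically. The base R sketch below uses the textbook closed-form expression for the major axis slope; it is only an illustration and not the code used by lmodel2(). The helper name ma_slope() and the simulated data are assumptions made for this example.

```r
# Hypothetical helper: closed-form major axis (MA) slope of v on u
ma_slope <- function(u, v) {
  s.uu <- var(u)
  s.vv <- var(v)
  s.uv <- cov(u, v)
  (s.vv - s.uu + sqrt((s.vv - s.uu)^2 + 4 * s.uv^2)) / (2 * s.uv)
}

set.seed(1)
x <- rnorm(50)
y <- 0.8 * x + rnorm(50, sd = 0.5)

# OLS: the two orientations give different lines in the x-y plane,
# because the product of the two slopes is r^2 rather than 1
ols.yx <- cov(x, y) / var(x)  # slope of y ~ x
ols.xy <- cov(x, y) / var(y)  # slope of x ~ y
all.equal(ols.yx * ols.xy, cor(x, y)^2)

# MA: swapping the variables gives the reciprocal slope, i.e., the same line
ma.yx <- ma_slope(x, y)
ma.xy <- ma_slope(y, x)
all.equal(ma.yx, 1 / ma.xy)
```

As both MA lines also pass through the point given by the two means, reciprocal slopes imply a single line, which is why only the formulation of the displayed equation changes with the orientation.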
The minimum number of observations with distinct values can be set through
parameter n.min. The default, n.min = 2L, is the smallest
possible value. However, model fits with very few observations are of
little interest and using a larger number for n.min than the default
is wise. As model fitting functions could depend on the RNG,
fit.seed, if different from NA, is used as argument in a call to
set.seed() immediately ahead of model fitting.
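The checks described above can be sketched in base R as follows. This is not the package's code: the helper guarded_fit() is hypothetical, lm() stands in for any model fit function, and the default n.min = 2L is taken from the usage shown earlier.

```r
# Hypothetical helper mimicking the described checks; not the package's code
guarded_fit <- function(data, n.min = 2L, fit.seed = NA) {
  # count distinct values of the explanatory variable
  if (length(unique(data$x)) < n.min) {
    return(NULL)          # too few distinct x values: the fit is skipped
  }
  if (!is.na(fit.seed)) {
    set.seed(fit.seed)    # make RNG-dependent fits reproducible
  }
  lm(y ~ x, data = data)  # lm() stands in for any model fit function
}

d <- data.frame(x = c(1, 1, 2, 2, 3),
                y = c(1.1, 0.9, 2.2, 1.8, 3.1))
is.null(guarded_fit(d, n.min = 4L))  # TRUE: only 3 distinct x values
class(guarded_fit(d, n.min = 3L))    # "lm"
```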
In lmodel2() MA, SMA and OLS fits are always computed
while RMA requires that "relative" or "interval" be passed to at least
one of range.y or range.x. The statistics extract estimates for one of
the methods based on the argument for method.
Package 'lmodel2' implements a model fit function and fitted model object that differ from the usual approach of R. Thus, their use was implemented as a separate pair of statistics.
stat_ma_eq() returns a data frame with a single row and columns
as described below. stat_ma_line() returns a data frame with
n rows. In cases when the number of observations is less than
n.min or when the model fit method returns NA or
NULL, a data frame with no rows or columns is returned and rendered
as an empty/invisible plot layer.
predicted value
lower pointwise confidence interval around the mean
upper pointwise confidence interval around the mean
standard error
If fm.values = TRUE is passed then columns based on the summary of
the model fit are added, with the same value in each row within a group.
This is wasteful and disabled by default, but provides a simple and robust
approach to achieve effects like colouring or hiding of the model fit line
based on P-values, r-squared or the number of observations.
If output.type is "numeric" the returned tibble contains columns
listed below. If the model fit function used does not return a value,
the variable is set to NA_real_.
x position
y position
list containing the "coefficients" matrix from the summary of the fit object
numeric values, from the model fit object
Set according to mapping in aes.
TRUE if the polynomial is forced through the origin
One or two columns with the coefficient estimates
If output.type is different from "numeric" the returned tibble
contains columns listed below. If the fitted model does not contain a given
value, the label is set to character(0L).
x position
y position
equation for the fitted polynomial as a character string to be parsed
of the fitted model as a character string to be parsed
P-value if available, depends on method.
Angle in degrees between the two OLS lines for lines estimated from y ~ x and x ~ y linear model (lm) fits.
Number of observations used in the fit.
Set according to mapping in aes.
Set according to the method used.
numeric values, from the model fit object
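As an illustration of the angle between the two OLS lines mentioned in the list above, the base R sketch below computes it from two lm() fits. It follows the usual geometric definition and is an assumption about how such an angle can be obtained, not a copy of the computation done in lmodel2(); the simulated data and variable names are made up for the example.

```r
set.seed(2)
x <- rnorm(40)
y <- x + rnorm(40)

b.yx <- coef(lm(y ~ x))[["x"]]  # slope of the y ~ x fit
b.xy <- coef(lm(x ~ y))[["y"]]  # slope of the x ~ y fit

# Express both lines as slopes in the x-y plane (the x ~ y line has
# slope 1 / b.xy there) and take the angle between them, in degrees
theta <- abs(atan(1 / b.xy) - atan(b.yx)) * 180 / pi
theta
```

The angle approaches zero as the correlation between x and y approaches one, as the two OLS lines then coincide.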
To explore the computed values returned for a given input we suggest the use
of geom_debug() as shown in the last examples below.
Character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently or, in some cases, no mark-up is applied.
"expression": The labels are encoded as character strings to be parsed into R's plotmath expressions.
"LaTeX", "TeX", "tikz", "latex": The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into math mode.
"latex.eqn": Same as "latex" but enclosed in single $, i.e., as in-line maths.
"latex.deqn": Same as "latex" but enclosed in double $$, i.e., as display maths.
"markdown": The labels are encoded as character strings using markdown syntax, with some embedded HTML.
"marquee": The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans.
"text": The labels are plain ASCII character strings.
"numeric": No labels are generated. This value is accepted by the statistics, but not by the label formatting functions.
NULL: The value used depends on the argument passed to geom.
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.
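The rules for the default used when output.type = NULL can be summarised as a small lookup. The function below is a hypothetical base R sketch that restates those rules; it is not the function used internally by 'ggpmisc'.

```r
# Hypothetical helper: default output type implied by the name of the geom,
# restating the rules listed above
default_output_type <- function(geom) {
  switch(geom,
         latex    = "latex.eqn",
         richtext = ,            # falls through to the next value
         textbox  = "markdown",
         marquee  = "marquee",
         "expression")           # any other geom
}

default_output_type("richtext")  # "markdown"
default_output_type("text_npc")  # "expression"
```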
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time and model formulas defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
By default the equation label uses as symbols the names of the aesthetics,
x and y. However, "x" and "y" can be
substituted by providing a replacement character string for the
right-hand-side and left-hand-side through eq.x.rhs and
eq.with.lhs, respectively. For backward compatibility a logical is
also accepted as argument for eq.with.lhs, with FALSE
suppressing the left-hand-side.
If the model formula includes a transformation of the explanatory
variable in its right-hand-side (rhs), a matching argument should be passed
to parameter eq.x.rhs as its default value would result in an
equation label that does not reflect the applied transformation. In most
cases, a transformation should not be applied within the left hand side
(lhs) of the model formula, but instead in the mapping of the response
variable within aes. In this case it may be necessary to also pass a
matching argument to parameter eq.with.lhs.
Parameter orientation is redundant as the orientation can be set
by the formula but is included for consistency with
ggplot2::stat_smooth().
When data are grouped by mapping a factor to an aesthetic, e.g.,
colour, shape and/or linetype, the model is fitted
separately to each group, and for each group a whole set of labels is
generated. If the argument passed to label.y is a vector of length
1, this value determines the position of the equation and/or other labels
for the first group, and the positions of the labels for the remaining
groups are generated by adding vstep based on the group number.
If the argument passed to label.y is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring vstep.
If the labels are rotated by 90 degrees then the automatic stepping is
best based on hstep with vstep = 0. Similarly as described
above, if label.x is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring hstep.
When using facets and with a grouping that does not repeat in each panel,
the automatic positioning in most cases will not be the desired one. Manual
positioning using a vector of length > 1 for label.x and/or
label.y is the currently available workaround.
The range of the prediction line is
controlled by parameters fullrange and limit.to.
fullrange is backwards compatible both with earlier versions of
'ggpmisc' and with stat_smooth() from 'ggplot2'; an argument passed
to limit.to overrides fullrange making it possible to
constrain the range to that of x, y, or both simultaneously,
with "x", "y", or "xy", respectively, as argument.
limit.to also accepts a numeric vector of values to be used as
newdata when computing the prediction. Limiting the range based on
both aesthetics is the best approach for major axis regression (MA, SMA,
RMA) but can occasionally be useful also with some other methods when
slopes are very steep and error variance in the explanatory variable is
large. A numeric vector can be used to predict the response at specific
values of the explanatory variable. If a single or very few values are
predicted, it can be necessary to override the default geom =
"smooth" with geom = "pointrange".
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted as the structure of fitted model objects belonging to different classes and the values returned by their accessors can vary, potentially resulting in decoding errors leading to the return of wrong values for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing it by name
or as string, or by using double colon notation when passing the name of the
function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
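As a sketch of a user-defined wrapper that does model selection, the function below chooses between a first and a second degree polynomial by AIC and returns the selected "lm" object. The name poly_by_aic() is hypothetical, and the sketch assumes that the data it receives contain columns named x and y, as is the case for data seen by a statistic; for simplicity it ignores the formula passed to it.

```r
# Hypothetical wrapper: pick a linear or quadratic polynomial by AIC.
# Assumes 'data' has columns named x and y; 'formula' is accepted but ignored.
poly_by_aic <- function(formula, data, ...) {
  fm1 <- stats::lm(y ~ x, data = data, ...)
  fm2 <- stats::lm(y ~ poly(x, 2, raw = TRUE), data = data, ...)
  if (stats::AIC(fm2) < stats::AIC(fm1)) fm2 else fm1
}

# Deterministic toy data: a small alternating deviation around each curve
e <- rep(c(-1, 1), 10)
d.linear <- data.frame(x = 1:20, y = 2 * (1:20) + e)
d.curved <- data.frame(x = 1:20, y = (1:20)^2 + e)

length(coef(poly_by_aic(y ~ x, d.linear)))  # number of fitted coefficients
length(coef(poly_by_aic(y ~ x, d.curved)))
```

With 'ggpmisc' attached, such a function could then be passed by name, e.g., as method = poly_by_aic in a call to stat_poly_line(). As the returned object is of class "lm", the late compatibility check described above is expected to accept it.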
Table 1 lists natively supported model fit functions, with the caveat that only some specializations of 'broom' methods have actually been tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call, and occasionally some detective work to find out the names of variables in the returned data frame, as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).
| Statistic | G/P | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
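The idea behind the single colon notation can be sketched in base R as follows. The helper split_method_name() is hypothetical and only illustrates how a "head:tail" string can be split; it is not the parser used by 'ggpmisc'.

```r
# Hypothetical helper: split "head:tail" into the name of the model fit
# function and the argument for that function's method parameter
split_method_name <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun.name = parts[1],
       method.arg = if (length(parts) > 1L) parts[2] else NULL)
}

split_method_name("lmodel2:MA")  # fun.name "lmodel2", method.arg "MA"
split_method_name("lm")          # fun.name "lm", method.arg NULL
```

This sketch only handles a single colon; names written with double colon notation, which identifies a package rather than a method argument, would need separate treatment.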
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.
| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_ma_line() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
• **x**
• **y**
• group → inferred
stat_ma_eq() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
• **x**
• **y**
• group → inferred
• grp.label
• hjust → "inward"
• label → after_stat(rr.label)
• npcx → after_stat(npcx)
• npcy → after_stat(npcy)
• vjust → "inward"
Learn more about setting these aesthetics in vignette("ggplot2-specs").
The major axis regression model is fitted with function
lmodel2(), please consult its documentation. Statistic
stat_ma_eq() can return different ready formatted labels depending
on the argument passed to output.type.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_deviations(),
stat_fit_glance(),
stat_fit_tb(),
stat_fit_tidy(),
stat_poly_eq(),
stat_quant_band()
    # generate artificial data
    set.seed(98723)
    my.data <- data.frame(x = rnorm(100) + (0:99) / 10 - 5,
                          y = rnorm(100) + (0:99) / 10 - 5,
                          group = c("A", "B"))

    # using defaults (major axis regression)
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line() +
      stat_ma_eq()

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line() +
      stat_ma_eq(mapping = use_label("eq"))

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line() +
      stat_ma_eq(mapping = use_label("eq"), decreasing = TRUE)

    # use_label() can assemble and map a combined label
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line(method = "MA") +
      stat_ma_eq(mapping = use_label("eq", "R2", "P"))

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line(method = "MA") +
      stat_ma_eq(mapping = use_label("R2", "P", "theta", "method"))

    # using ranged major axis regression
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line(method = "RMA",
                   range.y = "interval",
                   range.x = "interval") +
      stat_ma_eq(mapping = use_label("eq", "R2", "P"),
                 method = "RMA",
                 range.y = "interval",
                 range.x = "interval")

    # No permutation-based test
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line(method = "MA") +
      stat_ma_eq(mapping = use_label("eq", "R2"),
                 method = "MA",
                 nperm = 0)

    # explicit formula "x explained by y"
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_ma_line(formula = x ~ y) +
      stat_ma_eq(formula = x ~ y,
                 mapping = use_label("eq", "R2", "P"))

    # modifying both variables within aes()
    ggplot(my.data, aes(log(x + 10), log(y + 10))) +
      geom_point() +
      stat_poly_line() +
      stat_poly_eq(mapping = use_label("eq"),
                   eq.x.rhs = "~~log(x+10)",
                   eq.with.lhs = "log(y+10)~~`=`~~")

    # grouping
    ggplot(my.data, aes(x, y, color = group)) +
      geom_point() +
      stat_ma_line() +
      stat_ma_eq()

    # labelling equations
    ggplot(my.data,
           aes(x, y, shape = group, linetype = group, grp.label = group)) +
      geom_point() +
      stat_ma_line(color = "black") +
      stat_ma_eq(mapping = use_label("grp", "eq", "R2")) +
      theme_classic()

    # Inspecting the returned data using geom_debug_group()
    # This provides a quick way of finding out the names of the variables that
    # are available for mapping to aesthetics with after_stat().

    gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

    if (gginnards.installed)
      library(gginnards)

    # default is output.type = "expression"
    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_ma_eq(geom = "debug_group")

    ## Not run:
    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_ma_eq(mapping = aes(label = after_stat(eq.label)),
                   geom = "debug_group",
                   output.type = "markdown")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_ma_eq(geom = "debug_group", output.type = "text")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_ma_eq(geom = "debug_group", output.type = "numeric")
    ## End(Not run)
stat_multcomp() fits a model, by default a linear model with stats::lm(),
although other model fit functions can be used instead. The fitted model is
passed to function glht() from package 'multcomp' to test Tukey, Dunnett or
other pairwise contrasts, and labels are generated based on the adjusted
P-values.
    stat_multcomp(
      mapping = NULL,
      data = NULL,
      geom = NULL,
      position = "identity",
      ...,
      orientation = "x",
      formula = y ~ factor(x),
      method = "lm",
      method.args = list(),
      contrasts = "Tukey",
      p.adjust.method = NULL,
      fit.seed = NA,
      fm.cutoff.p.value = 1,
      mc.cutoff.p.value = 1,
      mc.critical.p.value = 0.05,
      small.p = getOption("ggpmisc.small.p", default = FALSE),
      adj.method.tag = 4,
      p.digits = 3,
      label.type = "bars",
      label.y = NULL,
      vstep = NULL,
      output.type = NULL,
      na.rm = FALSE,
      parse = NULL,
      show.legend = FALSE,
      inherit.aes = TRUE
    )
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset, only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data. |
position |
The position adjustment to use for overlapping points on this layer. |
... |
other arguments passed on to |
orientation |
character Either "x" or "y" controlling the default for
|
formula |
a formula object. Using aesthetic names |
method |
function or character If character, "lm" (or its equivalent
"aov"), "rlm" or the name of a model fit function are accepted, possibly
followed by the fit function's |
method.args |
named list with additional arguments. |
contrasts |
character vector of length one or a numeric matrix. If
character, one of "Tukey" or "Dunnet". If a matrix, one column per level
of the factor mapped to |
p.adjust.method |
character As the argument for parameter |
fit.seed |
RNG seed argument passed to |
fm.cutoff.p.value |
numeric [0..1] The P-value for the main
effect of factor |
mc.cutoff.p.value |
numeric [0..1] The P-value for the individual contrasts above which no labelled bars are generated. Default is 1, labelling all pairwise contrasts tested. |
mc.critical.p.value |
numeric The critical P-value used for tests when encoded as letters. |
small.p |
logical If TRUE, use lower case p instead of capital P as the symbol for P-value in labels. |
adj.method.tag |
numeric, character or function If |
p.digits |
integer Number of digits after the decimal point to
use for |
label.type |
character One of "bars", "letters" or "LETTERS", selects
how the results of the multiple comparisons are displayed. Only "bars" can
be used together with |
label.y |
numeric vector Values in native data units or if
|
vstep |
numeric in npc units, the vertical displacement step-size
used between labels for different contrasts when |
output.type |
character One of "expression", "LaTeX", "text",
"markdown" or "numeric". The default depends on the |
na.rm |
a logical indicating whether NA values should be stripped before the computation proceeds. |
parse |
logical Passed to the geom. If |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
This statistic can be used to automatically annotate a plot with
P-values for pairwise multiple comparison tests, based on
Tukey contrasts (all pairwise), Dunnett contrasts (other levels against the
first one) or a subset of all possible pairwise contrasts. See Meier (2022,
Chapter 3) for an accessible explanation of multiple comparisons and
contrasts with package 'multcomp', of which stat_multcomp() is
mostly a wrapper.
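As a rough illustration of the computation that stat_multcomp() wraps, the sketch below fits a model with stats::lm() and passes it to multcomp::glht() outside of any plot. The data set (mtcars), the column name cyl.f and the choice of "holm" as adjustment are assumptions made for this example, not defaults of the stat, and the internals of stat_multcomp() may differ in their details.

```r
# Sketch only: assumes package 'multcomp' is installed.
adj.p <- NULL
if (requireNamespace("multcomp", quietly = TRUE)) {
  # the explanatory variable must be a factor
  dat <- transform(mtcars, cyl.f = factor(cyl))
  fm <- lm(mpg ~ cyl.f, data = dat)
  # all pairwise (Tukey) contrasts among the levels of cyl.f
  tukey <- multcomp::glht(fm, linfct = multcomp::mcp(cyl.f = "Tukey"))
  # P-values adjusted for multiple comparisons with Holm's method
  tukey.holm <- summary(tukey, test = multcomp::adjusted(type = "holm"))
  adj.p <- tukey.holm$test$pvalues
  print(adj.p)
}
```

With three factor levels, three pairwise contrasts are tested, so three adjusted P-values are returned.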
The explanatory variable mapped to the x aesthetic must be a factor as this creates the required grouping. Currently, contrasts that involve more than two levels of a factor, such as the average of two treatment levels against a control level, are not supported, mainly because they require a new geometry that I need to design, implement and add to package 'ggpp'.
Two ways of displaying the outcomes are implemented, and are selected by '"bars"', '"letters"' or '"LETTERS"' as argument to parameter 'label.type'. '"letters"' and '"LETTERS"' can be used only with Tukey contrasts, as otherwise the encoding is ambiguous. As too many bars clutter a plot, the maximum number of factor levels supported for '"bars"' together with Tukey contrasts is five, while together with Dunnett contrasts or contrasts defined by a numeric matrix, no limit is imposed.
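One way to see why the number of levels is limited only for Tukey contrasts is to count the bars needed. The short sketch below uses only base R: all pairwise comparisons among k levels require choose(k, 2) bars, while comparisons of each level against the first one require k - 1 bars. The data frame and its column names are made up for illustration.

```r
k <- 2:8
n.bars <- data.frame(levels = k,
                     tukey.bars = choose(k, 2),  # all pairwise contrasts
                     dunnett.bars = k - 1)       # each level vs. the first
n.bars
```

At five levels Tukey contrasts already need ten bars, while contrasts against a control need only four.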
stat_multcomp() by default generates character labels ready to be
parsed as R expressions but LaTeX (use TikZ device), markdown (use package
'ggtext') and plain text are also supported, as well as numeric values for
user-generated text labels. The value of parse is set automatically
based on output.type, but if you assemble labels that need parsing
from numeric output, the default needs to be overridden. This
statistic only generates annotation labels and segments connecting the
compared factor levels, or letter labels that discriminate significantly
different groups.
A data frame with one row per comparison for label.type =
"bars", or a data frame with one row per factor x level for
label.type = "letters" and for label.type = "LETTERS".
Variables (= columns) as described under Computed variables.
If output.type = "numeric" and
label.type = "bars" the returned tibble contains
columns listed below. In all cases if the model fit function used does not return a value,
the label is set to character(0L) and the numeric value to NA.
x position, numeric.
y position, numeric.
Delta estimate from pairwise contrasts, numeric.
Contrasts as two levels' ordinal "numbers" separated by a dash, character.
t-statistic estimates for the pairwise contrasts, numeric.
P-value for the pairwise contrasts.
Set according to the method used.
Most derived class of the fitted model object.
Formula extracted from the fitted model object if available, or the formula argument.
Formula extracted from the fitted model object if available, or the formula argument, formatted as character.
The method used to adjust the P-values.
The type of contrast used for multiple comparisons.
The total number of observations or rows in data.
text label, always included, but possibly NA.
If output.type is not "numeric" the returned data frame includes in
addition the following labels:
P-value for the pairwise contrasts encoded as "stars", character.
P-value for the pairwise contrasts, character.
The coefficient or estimate for the difference between compared pairs of levels.
t-statistic estimates for the pairwise contrasts, character.
If label.type = "letters" or label.type = "LETTERS" the returned tibble contains
columns listed below.
x position, numeric.
y position, numeric.
P-value used in pairwise tests, numeric.
Set according to the method used.
Most derived class of the fitted model object.
Formula extracted from the fitted model object if available, or the formula argument.
Formula extracted from the fitted model object if available, or the formula argument, formatted as character.
The method used to adjust the P-values.
The type of contrast used for multiple comparisons.
The total number of observations or rows in data.
text label, always included, but possibly NA.
If output.type is not "numeric" the returned data frame includes in
addition the following labels:
Letters that distinguish levels based on significance from multiple comparisons test.
stat_signif() in package 'ggsignif' is
an earlier and independent implementation of pairwise tests.
Character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently or, in some cases, no mark-up is applied.
"expression": The labels are encoded as character strings to be parsed into R's plotmath expressions.
"LaTeX", "TeX", "tikz", "latex": The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into math mode.
"latex.eqn": Same as "latex" but enclosed in single $, i.e., as in-line maths.
"latex.deqn": Same as "latex" but enclosed in double $$, i.e., as display maths.
"markdown": The labels are encoded as character strings using markdown syntax, with some embedded HTML.
"marquee": The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans.
"text": The labels are plain ASCII character strings.
"numeric": No labels are generated. This value is accepted by the statistics, but not by the label formatting functions.
NULL: The value used depends on the argument passed to geom.
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.
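The rule described above for output.type = NULL can be summarised as a small lookup. The function below is a hypothetical helper written for this illustration, not part of 'ggpmisc'; it only restates the mapping from geom name to default output type given in the text.

```r
# Hypothetical helper: restates the documented defaults, not package code.
default_output_type <- function(geom) {
  switch(geom,
         latex = "latex.eqn",     # package 'xdvir'
         richtext = ,             # falls through to "markdown"
         textbox = "markdown",    # package 'ggtext'
         marquee = "marquee",     # package 'marquee'
         "expression")            # all other geoms
}

default_output_type("richtext")
default_output_type("text_npc")
```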
stat_multcomp() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | **x** | |
|---|---|---|
| • | **y** | |
| • | group | → inferred |
| • | hjust | → after_stat(just) |
| • | label | → after_stat(default.label) |
| • | size | → 2.5 |
| • | weight | → 1 |
| • | xmax | → after_stat(x.right.tip) |
| • | xmin | → after_stat(x.left.tip) |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
R option OutDec is obeyed based on its value at the time the plot
is rendered, i.e., displayed or printed. Set options(OutDec = ",")
for languages like Spanish or French.
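The effect of option OutDec on number formatting can be checked in base R, independently of any plot. This is only a sketch of how the option behaves with format(); how and when 'ggpmisc' applies it to labels is as described above, and the variable names are made up for the example.

```r
old <- options(OutDec = ",")   # decimal comma
label.comma <- format(0.05)
options(old)                   # restore the previous setting
label.point <- format(0.05)

c(label.comma, label.point)
```

Restoring the previous value with options(old) avoids leaving the session with a changed decimal mark.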
stat_multcomp() understands x and
y, to be referenced in the formula and weight passed
as argument to parameter weights. A factor must be mapped to
x and numeric variables to y, and, if used, to
weight. In addition, the aesthetics understood by the geom
("label_pairwise" is the default for label.type = "bars",
"text" is the default for label.type = "letters" and for
label.type = "LETTERS") are understood and grouping
respected.
Meier, Lukas (2022) ANOVA and Mixed Models: A Short Introduction Using R. Chapter 3 Contrasts and Multiple Testing. The R Series. Boca Raton: Chapman and Hall/CRC. ISBN: 9780367704209, doi:10.1201/9781003146216.
This statistic uses the implementation of Tests of General Linear
Hypotheses in function glht. See
summary.glht and p.adjust
for the supported tests and the references therein for the theory
behind them.
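To get a feeling for how the choice of adjustment method changes the P-values, stats::p.adjust() can be called directly. The three raw P-values below are invented for the example; the method names are the same ones used as arguments to p.adjust.method in the examples for this statistic.

```r
p.raw <- c(0.01, 0.02, 0.04)   # made-up unadjusted P-values

p.bonferroni <- p.adjust(p.raw, method = "bonferroni")
p.holm       <- p.adjust(p.raw, method = "holm")
p.fdr        <- p.adjust(p.raw, method = "fdr")

rbind(raw = p.raw,
      bonferroni = p.bonferroni,
      holm = p.holm,
      fdr = p.fdr)
```

Bonferroni is the most conservative of the three, with Holm and FDR yielding smaller or equal adjusted values.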
    p1 <- ggplot(mpg, aes(factor(cyl), hwy)) +
      geom_boxplot(width = 0.33)

    ## labelled bars

    p1 + stat_multcomp()

    p1 + stat_multcomp(adj.method.tag = 0)

    # test against a control, with first level being the control
    # change order of factor levels in data to set the control group
    p1 + stat_multcomp(contrasts = "Dunnet")

    # arbitrary pairwise contrasts, in arbitrary order
    p1 +
      stat_multcomp(contrasts = rbind(c(0, 0, -1, 1),
                                      c(0, -1, 1, 0),
                                      c(-1, 1, 0, 0)))

    # different methods to adjust the contrasts
    p1 + stat_multcomp(p.adjust.method = "bonferroni")

    p1 + stat_multcomp(p.adjust.method = "holm")

    p1 + stat_multcomp(p.adjust.method = "fdr")

    # no correction, useful only for comparison
    p1 + stat_multcomp(p.adjust.method = "none")

    # sometimes we need to expand the plotting area
    p1 +
      stat_multcomp(geom = "text_pairwise") +
      scale_y_continuous(expand = expansion(mult = c(0.05, 0.10)))

    # position of contrasts' bars (based on scale limits)
    p1 + stat_multcomp(label.y = "bottom")

    p1 + stat_multcomp(label.y = 11)

    # use different labels: difference and P-value from hypothesis tests
    p1 + stat_multcomp(use_label("Delta", "P"), size = 2.75)

    # control smallest P-value displayed and number of digits
    p1 + stat_multcomp(p.digits = 4)

    # label only significant differences
    # but test and correct for all pairwise contrasts!
    p1 + stat_multcomp(mc.cutoff.p.value = 0.01)

    ## letters as labels for test results

    p1 + stat_multcomp(label.type = "letters")

    # use capital letters
    p1 + stat_multcomp(label.type = "LETTERS")

    # location
    p1 + stat_multcomp(label.type = "letters", label.y = "top")

    p1 + stat_multcomp(label.type = "letters", label.y = 0)

    # stricter critical p-value than default used for test
    p1 + stat_multcomp(label.type = "letters", mc.critical.p.value = 0.01)

    # Inspecting the returned data using geom_debug_panel()
    # This provides a quick way of finding out the names of the variables that
    # are available for mapping to aesthetics with after_stat().

    gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

    if (gginnards.installed)
      library(gginnards)

    if (gginnards.installed)
      p1 + stat_multcomp(label.type = "bars", geom = "debug_panel")

    if (gginnards.installed)
      p1 + stat_multcomp(label.type = "letters", geom = "debug_panel")

    if (gginnards.installed)
      p1 + stat_multcomp(label.type = "bars",
                         output.type = "numeric",
                         geom = "debug_panel")
stat_peaks() tags or extracts rows in data containing local
or global maxima of y.
stat_valleys() tags or extracts rows in data containing local
or global minima of y. They make it
easy to highlight and label peaks and valleys based on their x and/or y
coordinates. Flipped orientation as well as dates and times are
supported.
    stat_peaks(
      mapping = NULL,
      data = NULL,
      geom = "point",
      position = "identity",
      ...,
      orientation = "x",
      span = 5,
      global.threshold = 0,
      local.threshold = 0,
      local.reference = "median",
      strict = FALSE,
      label.fmt = NULL,
      x.label.fmt = label.fmt,
      y.label.fmt = NULL,
      extract.peaks = NULL,
      na.rm = FALSE,
      show.legend = FALSE,
      inherit.aes = TRUE
    )

    stat_valleys(
      mapping = NULL,
      data = NULL,
      geom = "point",
      position = "identity",
      ...,
      orientation = "x",
      span = 5,
      global.threshold = 0.01,
      local.threshold = NULL,
      local.reference = "median",
      strict = FALSE,
      label.fmt = NULL,
      x.label.fmt = label.fmt,
      y.label.fmt = NULL,
      extract.valleys = NULL,
      na.rm = FALSE,
      show.legend = FALSE,
      inherit.aes = TRUE
    )
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset - only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer |
... |
other arguments passed on to |
orientation |
character The orientation of the layer can be set to
either |
span |
odd positive integer A peak is defined as an element in a
sequence which is greater than all other elements within a moving window of
width |
global.threshold |
numeric A value belonging to class |
local.threshold |
numeric A value belonging to class |
local.reference |
character One of |
strict |
logical flag: if |
label.fmt, x.label.fmt, y.label.fmt
|
character strings giving a format
definition for construction of character strings labels with function
|
extract.peaks, extract.valleys
|
If |
na.rm |
a logical value indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
As find_valleys, stat_peaks and stat_valleys
call find_peaks to search for peaks or valleys, this description
applies to all four functions.
Function find_peaks is a wrapper built onto function
peaks from splus2R. It adds support for peak
height thresholds and handles span = NULL and non-finite (including
NA) values differently than splus2R::peaks. Instead of giving an
error when na.rm = FALSE and x contains NA values,
NA values are replaced with the smallest finite value in x.
span = NULL is treated as a special case and selects max(x).
Passing strict = TRUE ensures that multiple global and within window
maxima are ignored, and can result in no peaks being returned.
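The moving-window idea behind the peak search can be sketched in a few lines of base R. The function window_peaks() below is a hypothetical, simplified stand-in written for this illustration: it is not find_peaks() nor splus2R::peaks(), it does not handle NA values or span = NULL, and it assumes that positions closer to either end than half a window are never reported as peaks.

```r
# Hypothetical simplified peak search, for illustration only.
window_peaks <- function(x, span = 5, strict = FALSE) {
  stopifnot(span %% 2 == 1, span >= 3, length(x) >= span)
  half <- (span - 1) / 2
  is.peak <- logical(length(x))
  # only positions with a complete window are tested (an assumption)
  for (i in (half + 1):(length(x) - half)) {
    others <- x[c((i - half):(i - 1), (i + 1):(i + half))]
    is.peak[i] <- if (strict) all(x[i] > others) else all(x[i] >= others)
  }
  is.peak
}

x <- c(1, 3, 2, 5, 4, 1, 2)
which(window_peaks(x, span = 3))   # positions 2 and 4
which(window_peaks(x, span = 5))   # position 4 only

# a flat-topped maximum: found with strict = FALSE, ignored with strict = TRUE
flat <- c(1, 4, 4, 1, 0)
which(window_peaks(flat, span = 3, strict = FALSE))   # positions 2 and 3
which(window_peaks(flat, span = 3, strict = TRUE))    # no peaks
```

A wider window reports fewer peaks, and the last call shows how strict = TRUE can result in no peaks being returned.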
Two tests make it possible to ignore irrelevant peaks. One test
(global.threshold) is based on the absolute height of the peaks and
can be used in all cases to ignore globally low peaks. A second test
(local.threshold) is available when the window defined by 'span'
does not include all observations and can be used to ignore peaks that are
not locally prominent. In this second approach the height of each peak is
compared to a summary computed from other values within the window of width
equal to span where it was found. In this second case, the reference
value used within each window containing a peak is given by
local.reference. Parameter threshold.range determines how the
bare numeric values passed as argument to global.threshold
and local.threshold are scaled. The default, NULL, uses the
range of x. Thresholds for ignoring too small peaks are applied
after peaks are searched for, and threshold values can in some cases result
in no peaks being found. If either threshold is not available (NA),
the returned value is an NA vector of the same length as x.
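The difference between a bare numeric threshold and one wrapped in I() can be sketched as follows. The helper threshold_in_data_units() is hypothetical and written only for this illustration; in particular, offsetting the scaled value by the minimum of the range is an assumption about how a relative threshold is converted, not a statement about the package's code.

```r
# Hypothetical helper, for illustration only.
threshold_in_data_units <- function(threshold, x) {
  if (inherits(threshold, "AsIs")) {
    unclass(threshold)               # I(): already expressed in data units
  } else {
    rng <- range(x, na.rm = TRUE)
    rng[1] + threshold * diff(rng)   # assumed: fraction of the range of x
  }
}

y <- c(0, 2, 10, 4)
threshold_in_data_units(0.5, y)    # half of the data range
threshold_in_data_units(I(7), y)   # used as is
```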
The local.threshold argument is used as is when
local.reference is "median" or "farthest", i.e., the
same distance between peak and reference is used as cut-off irrespective of
the value of the reference. In cases when the prominence of peaks is
positively correlated with the baseline, a local.threshold that
increases together with the computed within-window median or
farthest value applies a less stringent height requirement in regions
with overall low height. In this case, natural logarithm or square root
weighting can be requested by passing "median.log",
"farthest.log", "median.sqrt" or "farthest.sqrt" as argument to
local.reference.
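The two unweighted local references can be illustrated with a small base R sketch. The function local_prominence() is hypothetical: it assumes that "median" means the median of the other values in the window and that "farthest" means the value in the window farthest from the peak, i.e., the window minimum in the case of peaks. The logarithm and square root weightings are not covered.

```r
# Hypothetical helper, for illustration only.
local_prominence <- function(x, i, span = 5,
                             reference = c("median", "farthest")) {
  reference <- match.arg(reference)
  half <- (span - 1) / 2
  others <- x[c((i - half):(i - 1), (i + 1):(i + half))]
  ref <- switch(reference,
                median = median(others),
                farthest = min(others))   # assumed meaning for peaks
  x[i] - ref   # height of the peak above the local reference
}

z <- c(2, 3, 9, 4, 1)
local_prominence(z, i = 3, span = 5, reference = "median")     # 9 - 2.5
local_prominence(z, i = 3, span = 5, reference = "farthest")   # 9 - 1
```

Under these assumptions a peak would be kept only if this difference exceeds the local threshold expressed in the same units.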
A data frame with one row for each peak (or valley) found in the data
extracted from the input data or all rows in data. Added columns
contain the labels.
x-value at the peak (or valley) as numeric.
y-value at the peak (or valley) as numeric.
x-value at the peak (or valley) formatted as character.
y-value at the peak (or valley) formatted as character.
logical vector, TRUE at peaks or valleys.
stat_peaks(),
stat_valleys() and stat_spikes() work nicely together with
geoms geom_text_repel(), geom_label_repel(), and
geom_marquee_repel() from package ggrepel to
solve the problem of overlapping labels by displacing them. If using
geom_text(), discard overlapping labels using
check_overlap = TRUE.
By default the labels are character values ready to be plotted as plain
text, but with a suitable label.fmt argument, labels formatted as
plotmath expressions, markdown or LaTeX
(e.g., containing Greek letters, superscripts or subscripts, maths or
colour) can be generated for use with geoms from packages 'marquee',
'ggtext' and 'xdvir'.
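How a format string turns a value into a label can be tried out in base R before passing it to label.fmt, x.label.fmt or y.label.fmt. The sketch assumes sprintf()-style formats for numeric values and strftime()-style formats such as "%Y" for dates and times, as the package examples suggest; the values and the plotmath format are made up for the illustration.

```r
# plain text label from a numeric value
lbl.text <- sprintf("%.0f", 1370.4)

# label to be parsed as a plotmath expression
lbl.plotmath <- sprintf("italic(x)[max]~`=`~%.0f", 1370.4)
lbl.expr <- parse(text = lbl.plotmath)   # checks that the label is parseable

# year extracted from a date
lbl.year <- format(as.Date("1934-06-01"), "%Y")

c(lbl.text, lbl.plotmath, lbl.year)
```

Parsing the label with parse() is a quick way of checking a plotmath format before using it in a plot.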
The default, geom = "point", is likely to work well in almost
any situation. The default aesthetic mappings set by these stats allow
their direct use with geom_text(), geom_label(),
geom_line(), geom_rug(), geom_hline() and
geom_vline() by just passing an argument to geom.
stat_peaks() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | **x** | |
|---|---|---|
| • | **y** | |
| • | group | → inferred |
| • | label | → after_stat(x.label) |
| • | xintercept | → after_stat(x) |
| • | yintercept | → after_stat(y) |
stat_valleys() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | **x** | |
|---|---|---|
| • | **y** | |
| • | group | → inferred |
| • | label | → after_stat(x.label) |
| • | xintercept | → after_stat(x) |
| • | yintercept | → after_stat(y) |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
find_peaks, for the functions used to locate the
peaks and valleys.
    # lynx and Nile are time.series objects recognized by
    # ggpp::ggplot.ts() and converted on-the-fly with a default mapping

    # numeric, date times and dates are supported with data frames

    # using defaults
    ggplot(Nile) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_valleys(colour = "blue")

    # using wider window
    ggplot(Nile) +
      geom_line() +
      stat_peaks(colour = "red", span = 11) +
      stat_valleys(colour = "blue", span = 11)

    # global threshold for peak height
    ggplot(Nile) +
      geom_line() +
      stat_peaks(colour = "red",
                 global.threshold = 0.5) # half of data range

    ggplot(Nile) +
      geom_line() +
      stat_peaks(colour = "red",
                 global.threshold = I(1100)) + # data unit
      expand_limits(y = c(0, 1500))

    # local (within window) threshold for peak height
    # narrow peaks at the tip and locally tall
    ggplot(Nile) +
      geom_line() +
      stat_peaks(colour = "red",
                 span = 9,
                 local.threshold = 0.3,
                 local.reference = "farthest")

    # with narrower window
    ggplot(Nile) +
      geom_line() +
      stat_peaks(colour = "red",
                 span = 5,
                 local.threshold = 0.25,
                 local.reference = "farthest")

    ggplot(lynx) +
      geom_line() +
      stat_peaks(colour = "red",
                 local.threshold = 1/5,
                 local.reference = "median")

    ggplot(Nile) +
      geom_line() +
      stat_valleys(colour = "blue",
                   global.threshold = I(700))

    # orientation is supported
    ggplot(lynx, aes(lynx, time)) +
      geom_line(orientation = "y") +
      stat_peaks(colour = "red", orientation = "y") +
      stat_valleys(colour = "blue", orientation = "y")

    # default aesthetic mapping supports additional geoms
    ggplot(lynx) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_peaks(colour = "red", geom = "rug")

    ggplot(lynx) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_peaks(colour = "red", geom = "text", hjust = -0.1, angle = 33)

    ggplot(lynx, aes(lynx, time)) +
      geom_line(orientation = "y") +
      stat_peaks(colour = "red", orientation = "y") +
      stat_peaks(colour = "red", orientation = "y",
                 geom = "text", hjust = -0.1)

    # Force conversion of time series time into POSIXct date time
    ggplot(lynx, as.numeric = FALSE) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_peaks(colour = "red",
                 geom = "text",
                 hjust = -0.1,
                 x.label.fmt = "%Y",
                 angle = 33)

    ggplot(Nile, as.numeric = FALSE) +
      geom_line() +
      stat_peaks(colour = "red") +
      stat_peaks(colour = "red",
                 geom = "text_s",
                 position = position_nudge_keep(x = 0, y = 60),
                 hjust = -0.1,
                 x.label.fmt = "%Y",
                 angle = 90) +
      expand_limits(y = 2000)

    ggplot(lynx, as.numeric = FALSE) +
      geom_line() +
      stat_peaks(colour = "red",
                 geom = "text_s",
                 position = position_nudge_to(y = 7600),
                 arrow = arrow(length = grid::unit(1.5, "mm")),
                 point.padding = 0.7,
                 x.label.fmt = "%Y",
                 angle = 90) +
      expand_limits(y = 9000)
Statistics stat_poly_line() and stat_poly_eq() fit a
model, by default with stats::lm(), but alternatively using other
model fit functions. While stat_poly_line() adds a prediction line and
band, stat_poly_eq() adds textual labels to a plot.
stat_poly_eq(
  mapping = NULL,
  data = NULL,
  geom = "text_npc",
  position = "identity",
  ...,
  orientation = NA,
  formula = NULL,
  method = "lm",
  method.args = list(),
  n.min = 2L,
  fit.seed = NA,
  eq.with.lhs = TRUE,
  eq.x.rhs = NULL,
  small.r = getOption("ggpmisc.small.r", default = FALSE),
  small.p = getOption("ggpmisc.small.p", default = FALSE),
  CI.brackets = c("[", "]"),
  rsquared.conf.level = 0.95,
  coef.digits = 3,
  coef.keep.zeros = TRUE,
  decreasing = getOption("ggpmisc.decreasing.poly.eq", FALSE),
  rr.digits = 2,
  f.digits = 3,
  p.digits = 3,
  label.x = "left",
  label.y = "top",
  hstep = 0,
  vstep = NULL,
  output.type = NULL,
  na.rm = FALSE,
  parse = NULL,
  show.legend = FALSE,
  inherit.aes = TRUE
)

stat_poly_line(
  mapping = NULL,
  data = NULL,
  geom = "smooth",
  position = "identity",
  ...,
  orientation = NA,
  method = "lm",
  formula = NULL,
  se = NULL,
  fit.seed = NA,
  fm.values = FALSE,
  n = 80,
  fullrange = FALSE,
  limit.to = NULL,
  level = 0.95,
  method.args = list(),
  n.min = 2L,
  na.rm = FALSE,
  show.legend = NA,
  inherit.aes = TRUE
)
| Argument | Description |
|---|---|
| mapping | The aesthetic mapping, usually constructed with aes(). |
| data | A layer specific dataset, only needed if you want to override the plot defaults. |
| geom | The geometric object to use to display the data. |
| position | The position adjustment to use for overlapping points on this layer. |
| ... | Other arguments passed on to the layer. |
| orientation | character Either "x" or "y", controlling the default for formula. |
| formula | a formula object. Using aesthetic names x and y instead of original variable names. |
| method | function or character If character, "lm", "rlm", "lmrob", "lts", "gls", "ma", "sma", "segreg", "rq" or the name of a model fit function are accepted, possibly followed by the fit function's method argument separated by a colon (e.g., "rlm:M"). |
| method.args | named list with additional arguments. Not data or weights, which are always passed through aesthetic mappings. |
| n.min | integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
| fit.seed | RNG seed argument passed to set.seed(). With the default, NA, set.seed() is not called. |
| eq.with.lhs | If character, the string used as the left-hand side of the equation label; if logical, whether the left-hand side is included. |
| eq.x.rhs | character String used as replacement for "x" in the right-hand side of the equation label. |
| small.r, small.p | logical Flags to switch use of lower case r and p for coefficient of determination and p-value. |
| CI.brackets | character vector of length 2. The opening and closing brackets used for the CI label. |
| rsquared.conf.level | numeric Confidence level for the returned confidence interval. Set to NA to skip CI computation. |
| coef.digits, f.digits | integer Number of significant digits to use for the fitted coefficients and F-value. |
| coef.keep.zeros | logical Keep or drop trailing zeros when formatting the fitted coefficients and F-value. |
| decreasing | logical It specifies the order of the terms in the returned character string; in increasing (default) or decreasing powers. |
| rr.digits, p.digits | integer Number of digits after the decimal point to use for R^2 and P-value in labels. |
| label.x, label.y | numeric or character Position of the labels. With the default geom = "text_npc", numeric values are in npc units and character values such as "left", "right", "top" or "bottom" are accepted; with geom = "text", numeric values are in data units. Vectors of length > 1 set the position for each group. |
| hstep, vstep | numeric in npc units, the horizontal and vertical step used between labels for different groups. |
| output.type | character One of "expression", "text", "markdown", "marquee", "latex", "latex.eqn", "latex.deqn" or "numeric". |
| na.rm | a logical indicating whether NA values should be stripped before the computation proceeds. |
| parse | logical Passed to the geom. If TRUE, the labels are parsed into expressions. With the default, NULL, the value is set based on output.type. |
| show.legend | logical. Should this layer be included in the legends? |
| inherit.aes | If FALSE, overrides the default aesthetics, rather than combining with them. |
| se | Display confidence interval around smooth? (TRUE by default only for fits with some methods; see level to control.) |
| fm.values | logical Add metadata and parameter estimates extracted from the fitted model object; FALSE by default. |
| n | Number of points at which to predict with the fitted model. |
| fullrange | logical Should the fit prediction span the full range of the plot, or just the range of the explanatory variable? |
| limit.to | character or numeric If character, one of "x", "y" or "xy"; if numeric, a vector of values of the explanatory variable at which to predict. |
| level | Level of confidence interval to use (0.95 by default). |
Statistics stat_poly_line() and stat_poly_eq()
fit a model consistently, but return different values.
stat_poly_line() plots a prediction line and band, similarly to
stat_smooth()
but has different defaults and supports a different set of model fit
functions.
stat_poly_eq() adds textual labels for
R^2, adjusted R^2, the fitted model equation, P-value, and
other parameters from a fitted model to a plot.
Lack of methods or explicit support for extraction of individual parameters
results in the affected estimates and corresponding labels being set to
NA. Similarly, confidence bands for the prediction line are not
plotted in some cases, while in the case of MA and SMA models, the band
only displays the uncertainty of the slope rather than for both slope plus
intercept. While strings for R^2, adjusted R^2, F, and P
annotations are returned for all valid linear models and many other
types of fitted models, an automatically constructed character string for
the fitted model equation is returned only for polynomials (see below).
However, when not generated automatically, the equation can still be
assembled by the user within the call to aes(). A
label for the confidence interval of R^2, based on values computed
with function ci_rsquared() from package 'confintr',
is returned when possible.
When possible, i.e., nearly always, the formula used to build the
equation label is extracted from the returned fitted model object. Most
fitted model objects follow the example of lm() and include the
formula for the model that has been fitted. Thus, this model formula can
safely differ from the argument passed to parameter formula in the
call to stat_poly_eq().
The stats are designed to support user-defined methods that
implement any or all of method selection, model formula
selection, dynamically adjusted method.args and conditional skipping
of labelling on a by group basis.
The minimum number of observations with distinct values in the explanatory
variable can be set through parameter n.min. The default n.min
= 2L is the smallest suitable for method "lm" but too small for
method "rlm" for which n.min = 3L is needed. Anyway, model
fits with very few observations are of little interest and using larger
values of n.min than the default is wise.
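As a hedged illustration of the effect of n.min, the sketch below uses a small made-up data set (small.data, a hypothetical name, not part of the package) in which group "B" has only three distinct values of x. With n.min = 5L the fit for group "B" is expected to be skipped while group "A" is still fitted. The plotting code is guarded so that it only runs when 'ggpmisc' is installed.

```r
# hypothetical data: group "A" has 10 distinct x values, group "B" only 3
small.data <- data.frame(
  x = c(1:10, 1:3),
  y = c(2 * (1:10) + rep(c(0.3, -0.3), 5), 1.0, 2.5, 2.9),
  group = rep(c("A", "B"), times = c(10, 3))
)

# number of distinct x values per group, the quantity compared against n.min
distinct.x <- tapply(small.data$x, small.data$group,
                     function(z) length(unique(z)))

if (requireNamespace("ggpmisc", quietly = TRUE)) {
  library(ggplot2)
  library(ggpmisc)
  # a larger-than-default n.min: only groups with >= 5 distinct x values are fitted
  p.nmin <- ggplot(small.data, aes(x, y, colour = group)) +
    geom_point() +
    stat_poly_line(n.min = 5L) +
    stat_poly_eq(n.min = 5L)
  print(p.nmin)
}
```

Passing the same n.min to both statistics keeps the line and the label consistent with each other.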
As some model fitting approaches depend on the RNG (pseudo-Random Number
Generator), when fit.seed is not NA it is used as argument in
a call to set.seed() immediately ahead of model
fitting, i.e., once for each group of observations.
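The seeding behaviour described above can be sketched in base R. This is only an illustration of the idea, not the package's internal code: fit_by_group() and demo.data are hypothetical names, and a bootstrap resample followed by lm() stands in for a model fit function that depends on the RNG.

```r
# hypothetical stand-in for an RNG-dependent fit, applied group by group
fit_by_group <- function(data, fit.seed = NA) {
  lapply(split(data, data$group), function(d) {
    # the seed is set once per group, immediately ahead of fitting
    if (!is.na(fit.seed)) set.seed(fit.seed)
    idx <- sample(nrow(d), replace = TRUE)  # the RNG-dependent step
    coef(lm(y ~ x, data = d[idx, ]))
  })
}

demo.data <- data.frame(
  x = rep(1:20, 2),
  y = c(1:20 + rep(c(0.5, -0.5), 10), 2 * (1:20) + rep(c(-1, 1), 10)),
  group = rep(c("A", "B"), each = 20)
)

# with the same seed, repeated runs give identical estimates
first.run  <- fit_by_group(demo.data, fit.seed = 123)
second.run <- fit_by_group(demo.data, fit.seed = 123)
identical(first.run, second.run)
```

In the statistics themselves the equivalent is passing, for example, fit.seed = 123 in the call to stat_poly_line() or stat_poly_eq() when the chosen method uses random numbers.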
Singularity, convergence, etc., are handled by the model fit functions.
With method "lm", singularity results in terms being dropped, with a
message, when they are more numerous than can be fitted with a singular
(exact) fit. In this case, and when the model results in a perfect fit due
to a low number of observations, estimates for various parameters are NaN
or NA. With methods other than "lm", the model fit functions simply fail in
case of singularity, e.g., singular fits are not implemented in
"rlm".
stat_poly_eq() returns a data frame, with a single row per
group and columns as described below. stat_poly_line() returns a
data frame, with n rows per group and columns as described below. In
cases when the number of observations is less than n.min or when the
model fit function returns a single NA or NULL, a data frame
with no rows or columns (built by data.frame()) is returned, and
silently rendered as an empty/invisible plot layer.
When a predict() method is not available for the fitted model class,
the value returned by calling fitted(), if available, replaces it,
and a data frame with as many rows as observations, instead of
n rows, is returned with a message.
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time and model formulas defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
By default the equation label uses as symbols the names of the aesthetics,
x and y. However, "x" and "y" can be
substituted by providing a replacement character string for the
right-hand-side and left-hand-side through eq.x.rhs and
eq.with.lhs, respectively. For backward compatibility a logical is
also accepted as argument for eq.with.lhs, with FALSE
suppressing the left-hand-side.
If the model formula includes a transformation of the explanatory
variable in its right-hand-side (rhs), a matching argument should be passed
to parameter eq.x.rhs as its default value would result in an
equation label that does not reflect the applied transformation. In most
cases, a transformation should not be applied within the left hand side
(lhs) of the model formula, but instead in the mapping of the response
variable within aes. In this case it may be necessary to also pass a
matching argument to parameter eq.with.lhs.
Parameter orientation is redundant as the orientation can be set
by the formula but is included for consistency with
ggplot2::stat_smooth().
When data are grouped by mapping a factor to an aesthetic, e.g.,
colour, shape and/or linetype the model is fitted
separately to each group, and for each group a whole set of labels is
generated. If the argument passed to label.y is a vector of length
1, this value determines the position of the equation and/or other labels
for the first group, and the positions of the labels for the remaining
groups are generated by adding vstep based on the group number.
If the argument passed to label.y is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring vstep.
If the labels are rotated by 90 degrees then the automatic stepping is
best based on hstep with vstep = 0. Similarly as described
above, if label.x is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring hstep.
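A minimal sketch of manual label positioning for grouped data follows. The data set grouped.data and the vector label.y.npc are made up for the illustration; the vector has one value per group, so, per the description above, it is used unchanged and vstep is ignored. The plotting code only runs when 'ggpmisc' is installed.

```r
# hypothetical data with two groups
grouped.data <- data.frame(
  x = rep(1:20, 2),
  y = c(1:20 + rep(c(0.5, -0.5), 10), 30 - (1:20) + rep(c(-1, 1), 10)),
  group = rep(c("A", "B"), each = 20)
)

# one vertical position (npc units) per group
label.y.npc <- c(0.95, 0.85)

if (requireNamespace("ggpmisc", quietly = TRUE)) {
  library(ggplot2)
  library(ggpmisc)
  p.labels <- ggplot(grouped.data, aes(x, y, colour = group)) +
    geom_point() +
    stat_poly_line() +
    stat_poly_eq(use_label("eq"), label.y = label.y.npc)
  print(p.labels)
}
```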
When using facets and with a grouping that does not repeat in each panel,
the automatic positioning in most cases will not be the desired one. Manual
positioning using a vector of length > 1 for label.x and/or
label.y is the currently available workaround.
The range of the prediction line is
controlled by parameters fullrange and limit.to.
fullrange is backwards compatible both with earlier versions of
'ggpmisc' and with stat_smooth() from 'ggplot2'; an argument passed
to limit.to overrides fullrange making it possible to
constrain the range to that of x, y, or both simultaneously,
with "x", "y", or "xy", respectively, as argument.
limit.to also accepts a numeric vector of values to be used as
newdata when computing the prediction. Limiting the range based on
both aesthetics is the best approach for major axis regression (MA, SMA,
RMA) but can occasionally be useful also with some other methods when
slopes are very steep and error variance in the explanatory variable is
large. A numeric vector can be used to predict the response at specific
values of the explanatory variable. If a single or very few values are
predicted, it can be necessary to override the default geom =
"smooth" with geom = "pointrange".
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted as the structure of fitted model objects belonging to different classes and the values returned by their accessors can vary, potentially resulting in decoding errors leading to the return of wrong values for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc', either by
attaching the package where the function is defined and passing the
function by name or as a string, or by using double colon notation when
passing the name of the function.
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
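As a sketch of a user-defined model fit function, the wrapper below (lm_or_mean(), a hypothetical name) fits the requested model with lm() and falls back to an intercept-only model when the first term is not significant at the 0.1 level. The threshold, the data sets and the use of `...` to forward weights and other arguments to lm() are assumptions made for the illustration.

```r
# hypothetical wrapper doing a simple model selection
lm_or_mean <- function(formula, data, ...) {
  fm <- lm(formula = formula, data = data, ...)
  p.value <- anova(fm)[["Pr(>F)"]][1]
  if (is.na(p.value) || p.value > 0.1) {
    # no evidence of a trend: fit the mean only
    lm(formula = y ~ 1, data = data, ...)
  } else {
    fm
  }
}

# made-up data: one set with a clear trend, one without
trend.data <- data.frame(x = 1:20, y = 2 * (1:20) + rep(c(0.5, -0.5), 10))
flat.data  <- data.frame(x = 1:20, y = rep(c(1, -1), 10))

n.coef.trend <- length(coef(lm_or_mean(y ~ x, trend.data)))  # slope kept
n.coef.flat  <- length(coef(lm_or_mean(y ~ x, flat.data)))   # intercept only

if (requireNamespace("ggpmisc", quietly = TRUE)) {
  library(ggplot2)
  library(ggpmisc)
  p.wrapper <- ggplot(flat.data, aes(x, y)) +
    geom_point() +
    stat_poly_line(method = lm_or_mean, formula = y ~ x) +
    stat_poly_eq(use_label("eq"), method = lm_or_mean, formula = y ~ x)
  print(p.wrapper)
}
```

Because the equation label is built from the formula stored in the returned fitted model object, the label is expected to match whichever model the wrapper selected.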
Table 1 lists natively supported model fit functions, with the caveat that only some specializations of the 'broom' methods have actually been tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call, and occasionally some detective work to find out the names of variables in the returned data frame, as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. Column G/P indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | G/P | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail", "head" gives the name
of the model fit function and "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
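The "head:tail" convention can be illustrated with a few lines of base R. The helper split_method() is hypothetical and only mirrors the description above; it is not the parsing code used by the package. The guarded plotting code shows the two spellings that, according to the text, are expected to be equivalent for "rlm" with the "MM" estimator; it needs packages 'ggpmisc' and 'MASS' to be installed.

```r
# illustrative only: split a method string of the form "head:tail"
split_method <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun.name = parts[1],
       method.arg = if (length(parts) > 1L) parts[2] else NULL)
}

with.tail <- split_method("rlm:MM")  # function name plus its method argument
no.tail   <- split_method("lm")      # function name only

if (requireNamespace("ggpmisc", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  library(ggplot2)
  library(ggpmisc)
  # single colon notation
  p.colon <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    stat_poly_line(method = "rlm:MM")
  # the same request spelled out through method.args
  p.args <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    stat_poly_line(method = "rlm", method.args = list(method = "MM"))
  print(p.colon)
}
```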
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.

| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
Some of the variables depend on the orientation.
predicted value
lower confidence limit around the fitted line
upper confidence limit around the fitted line
standard error
If fm.values = TRUE is passed then columns based on the summary of
the model fit are added, with the same value in each row within a group.
This is wasteful and disabled by default, but provides a simple and robust
approach to achieve effects like colouring or hiding of the model fit line
based on P-value, R^2, adjusted R^2 or the number of
observations in a fit.
stat_poly_eq()
For all output.type arguments the following values are returned.
x position
y position
fitted coefficients, named numeric vector as a list member
numeric values, from the model fit object
Set according to mapping in aes.
list containing a numeric vector of knot or "psi" x-value for linear splines
name of method used, character
most derived class of the fitted model object, character
formatted model formula, character
If output.type is not "numeric" the returned tibble contains in
addition to those above the columns listed below, each containing a single
character string. The markup used depends on the value of output.type.
equation for the fitted polynomial as a character string to be parsed or NA
R^2 of the fitted model as a character string to be parsed
Adjusted R^2 of the fitted model as a character string to be parsed
Confidence interval for R^2 of the fitted model as a character string to be parsed
F value and degrees of freedom for the fitted model as a whole.
P-value for the F-value above.
AIC for the fitted model.
BIC for the fitted model.
Number of observations used in the fit.
The knots or change points in segmented regression.
Set according to mapping in aes.
Set according to the method used.
If output.type is "numeric" the returned tibble contains columns
listed below in addition to the base ones. If the model fit function used
does not return a value, the variable is set to NA_real_.
list containing the "coefficients" matrix from the summary of the fit object
TRUE if the polynomial is forced through the origin
One or more columns with the coefficient estimates
To explore the computed values returned for a given input we suggest the use
of geom_debug() as shown in the last examples below.
Character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently, or in some cases no mark-up is applied.

| output.type | Encoding of the labels |
|---|---|
| "expression" | The labels are encoded as character strings to be parsed into R's plotmath expressions. |
| "LaTeX", "TeX", "tikz", "latex" | The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into math mode. |
| "latex.eqn" | Same as "latex" but enclosed in single $, i.e., as in-line maths. |
| "latex.deqn" | Same as "latex" but enclosed in double $$, i.e., as display maths. |
| "markdown" | The labels are encoded as character strings using markdown syntax, with some embedded HTML. |
| "marquee" | The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans. |
| "text" | The labels are plain ASCII character strings. |
| "numeric" | No labels are generated. This value is accepted by the statistics, but not by the label formatting functions. |
| NULL | The value used depends on the argument passed to geom. |
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.
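The rule for the default output type can be summarised in a small lookup, sketched below. The function default_output_type() is hypothetical and only restates the paragraph above; it is not the package's implementation. The guarded plotting code contrasts the default ("expression" with geom = "text_npc") with an explicit output.type = "text"; it runs only when 'ggpmisc' is installed.

```r
# illustrative lookup mirroring the rule stated above; not the package's code
default_output_type <- function(geom) {
  switch(geom,
         latex = "latex.eqn",
         richtext = ,              # falls through to "markdown"
         textbox = "markdown",
         marquee = "marquee",
         "expression")             # all other geoms
}

if (requireNamespace("ggpmisc", quietly = TRUE)) {
  library(ggplot2)
  library(ggpmisc)
  # default: labels are plotmath expressions parsed by the geom
  p.expression <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    stat_poly_line() +
    stat_poly_eq(aes(label = after_stat(eq.label)))
  # plain text labels, no mark-up
  p.text <- ggplot(mpg, aes(displ, hwy)) +
    geom_point() +
    stat_poly_line() +
    stat_poly_eq(aes(label = after_stat(eq.label)), output.type = "text")
  print(p.text)
}
```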
stat_poly_eq() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| Aesthetic | Default |
|---|---|
| **x** | |
| **y** | |
| group | → inferred |
| grp.label | |
| hjust | → "inward" |
| label | → after_stat(rr.label) |
| npcx | → after_stat(npcx) |
| npcy | → after_stat(npcy) |
| vjust | → "inward" |
| weight | → 1 |
stat_poly_line() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| Aesthetic | Default |
|---|---|
| **x** | |
| **y** | |
| group | → inferred |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Originally written as an answer to question 7549694 at Stackoverflow but enhanced based on suggestions from several users and my own needs.
Consult the documentation of the model fit functions used
for the details and additional arguments that can be passed to
them by name through parameter method.args.
Please, see the articles in online-only documentation for additional use examples and guidance.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_deviations(),
stat_fit_glance(),
stat_fit_tb(),
stat_fit_tidy(),
stat_ma_eq(),
stat_quant_band()
library(ggpmisc)

# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
y <- y / max(y)
my.data <- data.frame(x = x,
                      y = y,
                      group = c("A", "B"),
                      y2 = y * c(1, 2) + c(0, 0.1),
                      w = sqrt(x))

# give a name to a formula
formula <- y ~ poly(x, 3, raw = TRUE)

# using defaults
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line() +
  stat_poly_eq()

# no weights
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula)

# other labels
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("eq"), formula = formula)

# other labels
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("eq"), formula = formula, decreasing = TRUE)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("eq", "R2"), formula = formula)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("R2", "R2.CI", "P", "method"), formula = formula)

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(use_label("R2", "F", "P", "n", sep = "*\"; \"*"),
               formula = formula)

# grouping
ggplot(my.data, aes(x, y2, color = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula)

# rotation
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula, angle = 90)

# label location
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula, label.y = "bottom", label.x = "right")

ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula, label.y = 0.1, label.x = 0.9)

# modifying the explanatory variable within the model formula
# modifying the response variable within aes()
# eq.x.rhs and eq.with.lhs defaults must be overridden!!
formula.trans <- y ~ I(x^2)
ggplot(my.data, aes(x, y + 1)) +
  geom_point() +
  stat_poly_line(formula = formula.trans) +
  stat_poly_eq(use_label("eq"),
               formula = formula.trans,
               eq.x.rhs = "~x^2",
               eq.with.lhs = "y + 1~~`=`~~")

# using weights
ggplot(my.data, aes(x, y, weight = w)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula)

# no weights, 4 digits for R square
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(formula = formula, rr.digits = 4)

# manually assemble and map a specific label using paste() and aes()
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(aes(label = paste(after_stat(rr.label),
                                 after_stat(n.label), sep = "*\", \"*")),
               formula = formula)

# manually assemble and map a specific label using sprintf() and aes()
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(aes(label = sprintf("%s*\" with \"*%s*\" and \"*%s",
                                   after_stat(rr.label),
                                   after_stat(f.value.label),
                                   after_stat(p.value.label))),
               formula = formula)

# x on y regression
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula, orientation = "y") +
  stat_poly_eq(use_label("eq", "adj.R2"),
               formula = x ~ poly(y, 3, raw = TRUE))

# conditional user specified label
ggplot(my.data, aes(x, y2, color = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(aes(label = ifelse(after_stat(adj.r.squared) > 0.96,
                                  paste(after_stat(adj.rr.label),
                                        after_stat(eq.label),
                                        sep = "*\", \"*"),
                                  after_stat(adj.rr.label))),
               rr.digits = 3,
               formula = formula)

# geom = "text"
ggplot(my.data, aes(x, y)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(geom = "text", label.x = 100, label.y = 0, hjust = 1,
               formula = formula)

# Inspecting the returned data using geom_debug_group()
# This provides a quick way of finding out the names of the variables that
# are available for mapping to aesthetics with after_stat().

gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed)
  library(gginnards)

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_poly_line(formula = formula) +
    stat_poly_eq(formula = formula, geom = "debug_group")

if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_poly_line(formula = formula) +
    stat_poly_eq(formula = formula, geom = "debug_group",
                 output.type = "numeric")

# names of the variables
if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_poly_line(formula = formula) +
    stat_poly_eq(formula = formula, geom = "debug_group",
                 dbgfun.data = colnames)

# only data$eq.label
if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_poly_line(formula = formula) +
    stat_poly_eq(formula = formula, geom = "debug_group",
                 output.type = "expression",
                 dbgfun.data = function(x) {x[["eq.label"]]})

# only data$eq.label
if (gginnards.installed)
  ggplot(my.data, aes(x, y)) +
    geom_point() +
    stat_poly_line(formula = formula) +
    stat_poly_eq(formula = formula, geom = "debug_group",
                 output.type = "text",
                 dbgfun.data = function(x) {x[["eq.label"]]})
Statistics stat_quant_line(), stat_quant_band() and
stat_quant_eq() fit models by quantile regression. While
stat_quant_line() and stat_quant_band() add prediction lines and
bands, stat_quant_eq() adds textual labels to a plot.
    stat_quant_band(
      mapping = NULL,
      data = NULL,
      geom = "smooth",
      position = "identity",
      ...,
      orientation = NA,
      quantiles = c(0.25, 0.5, 0.75),
      formula = NULL,
      fit.seed = NA,
      fm.values = FALSE,
      n = 80,
      fullrange = FALSE,
      limit.to = NULL,
      method = "rq",
      method.args = list(),
      n.min = 3L,
      na.rm = FALSE,
      show.legend = NA,
      inherit.aes = TRUE
    )

    stat_quant_eq(
      mapping = NULL,
      data = NULL,
      geom = "text_npc",
      position = "identity",
      ...,
      orientation = NA,
      formula = NULL,
      quantiles = c(0.25, 0.5, 0.75),
      method = "rq:br",
      method.args = list(),
      n.min = 10L,
      fit.seed = NA,
      eq.with.lhs = TRUE,
      eq.x.rhs = NULL,
      coef.digits = 3,
      coef.keep.zeros = TRUE,
      decreasing = getOption("ggpmisc.decreasing.poly.eq", FALSE),
      rho.digits = 4,
      label.x = "left",
      label.y = "top",
      hstep = 0,
      vstep = NULL,
      output.type = NULL,
      na.rm = FALSE,
      parse = NULL,
      show.legend = FALSE,
      inherit.aes = TRUE
    )

    stat_quant_line(
      mapping = NULL,
      data = NULL,
      geom = "smooth",
      position = "identity",
      ...,
      orientation = NA,
      quantiles = c(0.25, 0.5, 0.75),
      formula = NULL,
      se = length(quantiles) == 1L,
      fit.seed = NA,
      fm.values = FALSE,
      n = 80,
      fullrange = FALSE,
      limit.to = NULL,
      method = "rq",
      method.args = list(),
      n.min = 3L,
      level = 0.95,
      type = "direct",
      interval = "confidence",
      na.rm = FALSE,
      show.legend = NA,
      inherit.aes = TRUE
    )
| Argument | Description |
|---|---|
| mapping | The aesthetic mapping, usually constructed with |
| data | A layer specific dataset, only needed if you want to override the plot defaults. |
| geom | The geometric object to use to display the data |
| position | The position adjustment to use for overlapping points on this layer. |
| ... | other arguments passed on to |
| orientation | character Either "x" or "y" controlling the default for |
| quantiles | numeric vector Values in 0..1 indicating the quantiles. |
| formula | a formula object. Using aesthetic names |
| fit.seed | RNG seed argument passed to |
| fm.values | logical Add metadata and parameter estimates extracted from the fitted model object; |
| n | Number of points at which to predict with the fitted model. |
| fullrange | logical Should the fit prediction span the full range of the plot, or just the range of the explanatory variable? |
| limit.to | character or numeric If character one of |
| method | function or character If character, "rq", "rqss" or the name of a model fit function are accepted, possibly followed by the fit function's |
| method.args | named list with additional arguments passed to |
| n.min | integer Minimum number of distinct values in the explanatory variable (on the rhs of formula) for fitting to be attempted. |
| na.rm | a logical indicating whether NA values should be stripped before the computation proceeds. |
| show.legend | logical. Should this layer be included in the legends? |
| inherit.aes | If |
| eq.with.lhs | If |
| eq.x.rhs | |
| coef.digits, rho.digits | integer Number of significant digits to use for the fitted coefficients and rho in labels. |
| coef.keep.zeros | logical Keep or drop trailing zeros when formatting the fitted coefficients and F-value. |
| decreasing | logical It specifies the order of the terms in the returned character string; in increasing (default) or decreasing powers. |
| label.x, label.y | |
| hstep, vstep | numeric in npc units, the horizontal and vertical step used between labels for different groups. |
| output.type | character One of "expression", "text", "markdown", "marquee", "latex", "latex.eqn", "latex.deqn" or "numeric". |
| parse | logical Passed to the geom. If |
| se | logical Passed to |
| level | numeric in range [0..1] Passed to |
| type | character Passed to |
| interval | character Passed to |
While stat_poly_line() and stat_poly_eq() fit
a single model per plot layer, stat_quant_line(), stat_quant_band()
and stat_quant_eq() can fit multiple models sharing the same
method and formula but differing in their
probability. These probabilities are passed as a vector argument to parameter
quantiles.
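As a rough illustration of what fitting "multiple models sharing the same method and formula" means, the sketch below calls `quantreg::rq()` directly with a vector of probabilities. It assumes that the `quantiles` argument of the statistics ends up as the `tau` argument of the fit function, as the description of user-defined methods further down suggests; the data frame `d` is made up for the example.

```r
library(quantreg)

# artificial data, for illustration only
set.seed(4321)
d <- data.frame(x = 1:100)
d$y <- d$x + rnorm(100, sd = 10)

# one call, three quantile regressions with a shared formula
fm <- rq(y ~ x, tau = c(0.25, 0.5, 0.75), data = d)

class(fm)  # class of a fit with more than one tau
coef(fm)   # matrix with one column of estimates per quantile
```

Each column of the coefficient matrix corresponds to one regression line or one equation label in the plot layer.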
stat_quant_line fits one or more quantile regressions and obtains
predictions similarly to stat_quantile() from
'ggplot2', but in addition it computes confidence regions for the
prediction lines. By default each quantile is plotted as a line, with a
confidence band when se = TRUE.
stat_quant_band() fits quantile regressions and obtains predictions
identically to stat_quant_line(). stat_quant_band() fits 2 or
3 quantiles in the same plot layer and displays the area between the
predicted regression lines for the extreme quantiles as a band.
stat_quant_eq() fits quantile regressions and generates a set of
labels for each regression line fitted. By default the labels are formatted
as R's plotmath expressions; LaTeX and
markdown are also supported.
stat_quant_eq(), stat_quant_line() and
stat_quant_band() support both "rq" and "rqss" as
method. In the case of "rqss" the model formula
normally makes use of qss() to formulate the spline and its constraints.
User defined functions are supported as method as long as they
accept arguments named formula, data, weights,
tau and method and return a model fit object of class
rq, rqs or rqss. Such user-defined functions can
implement model selection and/or method selection, or conditionally skip
model fitting on a per data group basis.
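A minimal sketch of such a user-defined function is shown below. The name `rq_min_obs` and the threshold of 30 observations are hypothetical, chosen only for illustration; the function accepts the arguments listed above and skips the fit by returning `NULL` for small groups. For simplicity this sketch accepts `weights` but does not forward it to `rq()`.

```r
library(quantreg)

# Hypothetical wrapper: fit with rq() only when a group has enough rows.
# 'weights' is accepted for compatibility but ignored in this sketch.
rq_min_obs <- function(formula, data, weights = NULL,
                       tau = 0.5, method = "br", ...) {
  if (nrow(data) < 30L) {
    return(NULL)  # NULL is documented above as "skip this group"
  }
  rq(formula = formula, tau = tau, data = data, method = method, ...)
}

# direct calls, outside of any plot
set.seed(4321)
d <- data.frame(x = 1:100)
d$y <- d$x + rnorm(100, sd = 10)

fit.small <- rq_min_obs(y ~ x, data = d[1:10, ])
fit.large <- rq_min_obs(y ~ x, data = d, tau = 0.5)
```

Under these assumptions the wrapper could be passed as `method = rq_min_obs` to `stat_quant_line()` and `stat_quant_eq()`, so that groups with fewer than 30 rows produce no line and no label.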
The minimum number of observations with distinct values in the explanatory
variable can be set through parameter n.min. The default n.min
= 10L is a bare minimum for quantile regression. Model fits with such a
small number of observations are of little interest and using larger values
of n.min than the default is wise.
There are interesting uses for double quantile regression, i.e., a
pair of quantile regressions on x and y on the same data. For
example, when two variables are subject to mutual constraints, it is useful
to consider both of them as explanatory and interpret the relationship
based on them considered as limiting. 'ggpmisc' (>= 0.4.1) supports
orientation making it easy to implement the approach described by
Cardoso (2019) under the name of "Double quantile regression".
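A minimal sketch of how a double quantile regression could be assembled from two layers follows. The simulated data, the choice of the 0.95 quantile and the colours are assumptions made for illustration only and are not taken from Cardoso (2019).

```r
library(ggplot2)
library(ggpmisc)

# artificial data with an upper boundary: y cannot exceed 1 - x
set.seed(4321)
d <- data.frame(x = runif(200))
d$y <- runif(200) * (1 - d$x)

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  # y explained by x
  stat_quant_line(formula = y ~ x, quantiles = 0.95, se = FALSE) +
  # x explained by y, orientation set by the formula
  stat_quant_line(formula = x ~ y, quantiles = 0.95, se = FALSE,
                  colour = "red")
p
```

The two lines estimate the same boundary from each variable's point of view; where they diverge, the estimate of the boundary depends on which variable is treated as explanatory.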
stat_quant_eq() returns a data frame, with one row per
quantile and columns as described below, while stat_quant_line()
and stat_quant_band() return a data frame, with n rows per
quantile and columns as described below. If the number of observations
is less than n.min or if the model fit method returns NA or
NULL, a data frame with no rows or columns is returned, resulting
in an empty/invisible plot layer.
stat_quant_eq()
If output.type is "numeric" the returned tibble contains columns
in addition to a modified version of the original group:
x position
y position
list containing the "coefficients" matrix from the summary of the fit object
numeric values extracted or computed from fit object
character, method used.
Set to "inward" to override the default of the "text" geom.
Indicating the quantile used for the fit
Factor with a level for each quantile
TRUE if polynomial is forced through the origin
One or more columns with the coefficient estimates
If output.type different from "numeric" the returned tibble contains
columns below in addition to a modified version of the original group:
x position
y position
equation for the fitted polynomial as a character string to be parsed
of the fitted model as a character string to be parsed
AIC for the fitted model.
Number of observations used in the fit.
Set according to the method used.
character, method used.
numeric values extracted or computed from fit object.
Set to "inward" to override the default of the "text" geom.
Numeric value of the quantile used for the fit
Factor with a level for each quantile
To explore the computed values returned for a given input we suggest the use
of geom_debug as shown in the example below.
stat_quant_line()
predicted value
lower confidence limit around the fitted line
upper confidence limit around the fitted line
If fm.values = TRUE is passed then one column with the number of
observations n used for each fit is also included, with the same
value in each row within a group. This is wasteful and disabled by default,
but provides a simple and robust approach to achieve effects like colouring
or hiding of the model fit line based on the number of observations.
stat_quant_band()
Regression prediction for the middle quantile, if three quantiles are passed as argument
Regression prediction for the smallest quantile
Regression prediction for the largest quantile
If fm.values = TRUE is passed then one column with the number of
observations n used for each fit is also included, with the same
value in each row within a group. This is wasteful and disabled by default,
but provides a simple and robust approach to achieve effects like colouring
or hiding of the model fit line based on the number of observations.
Character strings to be displayed in plots are marked up as mathematical equations. Depending on the geom used, the mark-up needs to be encoded differently or, in some cases, is not applied.
"expression": The labels are encoded as character strings to be parsed into R's plotmath expressions.
"LaTeX", "TeX", "tikz", "latex": The labels are encoded as 'LaTeX' maths equations, without the "fences" for switching into math mode.
"latex.eqn": Same as "latex" but enclosed in single $, i.e., as in-line maths.
"latex.deqn": Same as "latex" but enclosed in double $$, i.e., as display maths.
"markdown": The labels are encoded as character strings using markdown syntax, with some embedded HTML.
"marquee": The labels are encoded as character strings using markdown syntax, with 'marquee' supported spans.
"text": The labels are plain ASCII character strings.
"numeric": No labels are generated. This value is accepted by the statistics, but not by the label formatting functions.
NULL: The value used depends on the argument passed to geom.
If geom = "latex" (package 'xdvir') the output type used is
"latex.eqn". If geom = "richtext" (package 'ggtext') or
geom = "textbox" (package 'ggtext') the output type used is
"markdown". If geom = "marquee" (package 'marquee') the output
type used is "marquee". For all other values of geom the default
is "expression". Invalid values as argument trigger an error.
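The rule for `output.type = NULL` can be summarised as a small lookup. The helper below, `default_output_type()`, is hypothetical and written only to restate the paragraph above in code; it is not a function exported by 'ggpmisc' and does not reproduce the package's validation of arguments.

```r
# Hypothetical helper restating the documented defaults; not part of 'ggpmisc'
default_output_type <- function(geom) {
  switch(geom,
         latex    = "latex.eqn",  # package 'xdvir'
         richtext = ,             # package 'ggtext', falls through
         textbox  = "markdown",   # package 'ggtext'
         marquee  = "marquee",    # package 'marquee'
         "expression")            # all other geoms
}

default_output_type("richtext")
default_output_type("text_npc")
```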
By default the equation label uses as symbols the names of the aesthetics,
x and y. However, "x" and "y" can be
substituted by providing a replacement character string for the
right-hand-side and left-hand-side through eq.x.rhs and
eq.with.lhs, respectively. For backward compatibility a logical is
also accepted as argument for eq.with.lhs, with FALSE
suppressing the left-hand-side.
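As a sketch of this substitution, the plot below replaces `y` and `x` in the equation label with the symbols *h* and *z*. The data and the symbol names are made up for the example; the strings follow plotmath syntax because the default output type for the default geom is "expression".

```r
library(ggplot2)
library(ggpmisc)

# artificial data, for illustration only
set.seed(4321)
d <- data.frame(x = 1:100)
d$y <- d$x + rnorm(100, sd = 10)

p <- ggplot(d, aes(x, y)) +
  geom_point() +
  stat_quant_line(quantiles = 0.5) +
  stat_quant_eq(quantiles = 0.5,
                eq.with.lhs = "italic(h)~`=`~",  # replaces "y ="
                eq.x.rhs = "~italic(z)") +       # replaces "x"
  labs(x = expression(italic(z)), y = expression(italic(h)))
p
```

Matching the axis titles to the symbols used in the equation, as done here with `labs()`, keeps the label and the axes consistent.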
If the model formula includes a transformation of the explanatory
variable in its right-hand-side (rhs), a matching argument should be passed
to parameter eq.x.rhs as its default value would result in an
equation label that does not reflect the applied transformation. In most
cases, a transformation should not be applied within the left hand side
(lhs) of the model formula, but instead in the mapping of the response
variable within aes. In this case it may be necessary to also pass a
matching argument to parameter eq.with.lhs.
Parameter orientation is redundant as the orientation can be set
by the formula but is included for consistency with
ggplot2::stat_smooth().
When data are grouped by mapping a factor to an aesthetic, e.g.,
colour, shape and/or linetype the model is fitted
separately to each group, and for each group a whole set of labels is
generated. If the argument passed to label.y is a vector of length
1, this value determines the position of the equation and/or other labels
for the first group, and the positions of the labels for the remaining
groups are generated by adding vstep based on the group number.
If the argument passed to label.y is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring vstep.
If the labels are rotated by 90 degrees then the automatic stepping is
best based on hstep with vstep = 0. Similarly as described
above, if label.x is a vector of length > 1, it is
used unchanged, possibly extended by recycling, ignoring hstep.
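A minimal sketch of manual positioning with two groups follows. It assumes that numeric values passed to `label.x` and `label.y` are interpreted in npc units by the default geom, and it fits a single quantile so that each group yields one label; the data are artificial.

```r
library(ggplot2)
library(ggpmisc)

# artificial data with two groups, for illustration only
set.seed(4321)
d <- data.frame(x = rep(1:50, times = 2),
                group = rep(c("A", "B"), each = 50))
d$y <- ifelse(d$group == "A", 1, 2) * d$x + rnorm(100, sd = 5)

p <- ggplot(d, aes(x, y, colour = group)) +
  geom_point() +
  stat_quant_line(quantiles = 0.5) +
  # one y position per group; vstep is ignored for a vector of length > 1
  stat_quant_eq(quantiles = 0.5,
                label.x = 0.05,
                label.y = c(0.95, 0.85))
p
```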
When using facets and with a grouping that does not repeat in each panel,
the automatic positioning in most cases will not be the desired one. Manual
positioning using a vector of length > 1 for label.x and/or
label.y is the currently available workaround.
A ggplot statistic receives as data a data frame that is not the one
passed as argument by the user, but instead a data frame with the variables
mapped to aesthetics. In stat_poly_eq() the compute function is
applied by group, each call "seeing" the subset of data for an
individual group. As supported models are for regression lines,
variables mapped to x and y should both be continuous, i.e.,
numeric or date time and model formulas defined using x and y
as variable names.
The interpretation of the argument passed to formula is enhanced
compared to stat_smooth(). Formulas with x as explanatory
variable work as in stat_smooth() but formulas with y as
explanatory variable are also accepted. orientation is set
automatically based on which explanatory variable appears in the formula.
Spline-based smoothers are only partially supported.
The range of the prediction line is
controlled by parameters fullrange and limit.to.
fullrange is backwards compatible both with earlier versions of
'ggpmisc' and with stat_smooth() from 'ggplot2'; an argument passed
to limit.to overrides fullrange making it possible to
constrain the range to that of x, y, or both simultaneously,
with "x", "y", or "xy", respectively, as argument.
limit.to also accepts a numeric vector of values to be used as
newdata when computing the prediction. Limiting the range based on
both aesthetics is the best approach for major axis regression (MA, SMA,
RMA) but can occasionally be useful also with some other methods when
slopes are very steep and error variance in the explanatory variable is
large. A numeric vector can be used to predict the response at specific
values of the explanatory variable. If a single or very few values are
predicted, it can be necessary to override the default geom =
"smooth" with geom = "pointrange".
Several model fit functions are supported explicitly (see tables), and some
of their differences smoothed out. Compatibility is checked late, based on
the class of the returned fitted model object. This makes it possible to
use wrapper functions that do model selection or other adjustments to the
fit procedure on a per panel or per group basis. Moreover, if the value
returned as model fit object is NULL or NA, plotting is
skipped on a per group within panel basis.
In the case of fitted model objects of classes not explicitly supported, an attempt is made to find the usual accessors and/or fitted object members, and if found, either complete or partial support is frequently achieved. In this case a message is issued encouraging users to check the validity of the values extracted as the structure of fitted model objects belonging to different classes and the values returned by their accessors can vary, potentially resulting in decoding errors leading to the return of wrong values for estimates.
The argument to parameter method can be either the name of a
function object, possibly using double colon notation in case its package
is not attached, or a character string matching the function name for
functions in the search path. This approach makes it possible to support
model fit functions that are not dependencies of 'ggpmisc'. Either by
attaching the package where the function is defined and passing it by name
or as string, or using double colon notation when passing the name of the
function.
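The three ways of naming a fit function described above could look as follows. This is a sketch with artificial data; "rq:br" is used because it appears as a default in the usage section, and `quantreg::rq` is the function the predefined name "rq" is listed as calling.

```r
library(ggplot2)
library(ggpmisc)

# artificial data, for illustration only
set.seed(4321)
d <- data.frame(x = 1:100)
d$y <- d$x + rnorm(100, sd = 10)

base.plot <- ggplot(d, aes(x, y)) + geom_point()

# predefined name as a character string
p1 <- base.plot + stat_quant_line(method = "rq")

# function object, using double colon notation
p2 <- base.plot + stat_quant_line(method = quantreg::rq)

# character string with single colon notation
p3 <- base.plot + stat_quant_line(method = "rq:br")
```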
User-defined functions can be passed as argument to parameter method
as long as they have parameters formula, data subset
and possibly weights. Additional arguments can be passed to any
method as a named list through parameter method.args. As in
stat_smooth() prior weights are
passed to the model fit functions' weights (plural!) parameter by
mapping a numeric variable to plot aesthetic weight (singular!).
Table 1 lists natively supported model fit functions, with the caveat that only some 'broom' methods' specializations have been actually tested with statistics from 'ggpmisc'. In addition, the statistics based on 'broom' methods require the user to tailor their behaviour by passing additional arguments in the call and occasionally some detective work to find out the names of variables in the returned data frame as these names are set by methods from 'broom'.
Table 1. Model fit methods supported by the different statistics
available in package 'ggpmisc'. The second column indicates whether
computations are done by group (G) or by plot panel (P).

| Statistic | | Supported model fit methods |
|---|---|---|
| stat_poly_line() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with methods predict() or fitted() |
| stat_poly_eq() | G | "lm", "rlm", "lts", "sma", "ma", "gls", others with needed accessors |
| stat_quant_line() | G | "rq", "rqss" |
| stat_quant_band() | G | "rq", "rqss" |
| stat_quant_eq() | G | "rq", "rqss" |
| stat_ma_line() | G | "SMA", "MA", "RMA", "OLS" |
| stat_ma_eq() | G | "SMA", "MA", "RMA", "OLS" |
| stat_fit_residuals() | G | "lm", "rlm", "lts", "sma", "ma", "gls", "rq", "rqss", others with method residuals() |
| stat_fit_fitted() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with method fitted() |
| stat_fit_deviations() | G | "lm", "rlm", "lts", "gls", "rq", "rqss", others with methods fitted() and weights() |
| stat_fit_augment() | G | any with 'broom' method augment() |
| stat_fit_glance() | G | any with 'broom' method glance() |
| stat_fit_tidy() | G | any with 'broom' method tidy() |
| stat_fit_tb() | P | any with 'broom' method tidy() |
The single colon notation is based on parsing
the name and is available when passing the name of the fit method as a
character string. In a string such as "head:tail" the "head" gives the name
of the model fit function and the "tail" gives the argument to pass to its
method parameter. This is only a convenience, as method.args
can also be used. With some methods, e.g., splines, the default
formula = y ~ x needs to be overridden by the user.
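The idea behind the single colon notation can be sketched in base R. The helper `split_method()` below is hypothetical and only illustrates the "head:tail" convention; it is not the parser used by 'ggpmisc' and it does not handle double colon notation such as "quantreg::rq".

```r
# Hypothetical helper illustrating "head:tail"; not the 'ggpmisc' parser
split_method <- function(method) {
  parts <- strsplit(method, ":", fixed = TRUE)[[1]]
  list(fun = parts[1],
       method.arg = if (length(parts) > 1L) parts[2] else NULL)
}

m1 <- split_method("rq:br")  # head "rq", tail "br"
m2 <- split_method("lm")     # no tail, the fit function's default is used
```

Under this reading, `method = "rq:br"` has the same effect as `method = "rq"` combined with `method.args = list(method = "br")`.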
Table 2 lists the correspondence of pre-defined method names to model fit method functions. As mentioned above, these are only a subset of the model fit methods that are expected to work. When using these names there is no need for users to attach additional packages but the packages must be available (installed).
Table 2. Available predefined method names, the model fit functions
they call, the packages where the functions reside, the class of the
returned fitted model object and the arguments that can be
passed to their method parameter using single colon notation.
| Predefined method names | Model fit methods | R package | Object class |
|---|---|---|---|
| "lm", "lm:qr" | lm() | 'stats' | "lm" |
| "rlm", "rlm:M", "rlm:MM" | rlm() | 'MASS' | "rlm" ("lm") |
| "lts", "ltsReg" | ltsReg() | 'robustbase' | "lts" |
| "ma", "sma", "sma:SMA", "sma:MA", "sma:OLS" | sma() | 'smatr' | "ma" or "sma" |
| "gls", "gls:REML", "gls:ML" | gls() | 'nlme' | "gls" |
| "rq", "rq:sfn", "rq:sfnc", "rq:lasso" | rq() | 'quantreg' | "rq" |
| "rqss", "rqss:sfn", "rqss:sfnc", "rqss:lasso" | rqss() | 'quantreg' | "rqss" |
| "SMA", "MA", "RMA", "OLS" | lmodel2() | 'lmodel2' | ("list") |
stat_quant_line() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | **x** | |
| • | **y** | |
| • | group | → after_stat(group) |
| • | weight | → 1 |
stat_quant_band() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | **x** | |
| • | **y** | |
| • | group | → inferred |
stat_quant_eq() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | **x** | |
| • | **y** | |
| • | group | → inferred |
| • | grp.label | |
| • | hjust | → "inward" |
| • | label | → after_stat(eq.label) |
| • | npcx | → after_stat(npcx) |
| • | npcy | → after_stat(npcy) |
| • | vjust | → "inward" |
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Cardoso, G. C. (2019) Double quantile regression accurately assesses distance to boundary trade-off. Methods in Ecology and Evolution, 10(8), 1322-1331.
Other 'ggpmisc' statistics for model fits:
stat_distrmix_eq(),
stat_fit_deviations(),
stat_fit_glance(),
stat_fit_tb(),
stat_fit_tidy(),
stat_ma_eq(),
stat_poly_eq()
    # generate artificial data
    set.seed(4321)
    x <- 1:100
    y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
    y <- y / max(y)
    my.data <- data.frame(x = x, y = y,
                          group = c("A", "B"),
                          y2 = y * c(1, 2) + max(y) * c(0, 0.1),
                          w = sqrt(x))

    # Predictions as lines
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line()

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(quantiles = 0.5, se = TRUE)

    # Predictions as band
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_band()

    # y as explanatory variable (orientation = y)
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_band(formula = x ~ y)

    # Using splines
    library(quantreg)
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(method = "rqss",
                      formula = y ~ qss(x, constraint = "D"),
                      quantiles = 0.5, se = FALSE)

    # Adding annotations
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line() +
      stat_quant_eq()

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line() +
      stat_quant_eq(mapping = use_label("eq"))

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line() +
      stat_quant_eq(mapping = use_label("eq"), decreasing = TRUE)

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line() +
      stat_quant_eq(mapping = use_label("eq", "method"))

    # same formula as default
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = y ~ x) +
      stat_quant_eq(formula = y ~ x)

    # explicit formula "x explained by y"
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = x ~ y) +
      stat_quant_eq(formula = x ~ y)

    # using color
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(mapping = aes(color = after_stat(quantile.f))) +
      stat_quant_eq(mapping = aes(color = after_stat(quantile.f))) +
      labs(color = "Quantiles")

    # location and colour
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(mapping = aes(color = after_stat(quantile.f))) +
      stat_quant_eq(mapping = aes(color = after_stat(quantile.f)),
                    label.y = "bottom", label.x = "right") +
      labs(color = "Quantiles")

    # give a name to a formula
    formula <- y ~ poly(x, 3, raw = TRUE)

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = formula, linewidth = 0.5) +
      stat_quant_eq(formula = formula)

    # angle
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = formula, linewidth = 0.5) +
      stat_quant_eq(formula = formula, angle = 90, hstep = 0.04, vstep = 0,
                    label.y = 0.02, hjust = 0, size = 3) +
      expand_limits(x = -15) # make space for equations

    # user set quantiles
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = formula, quantiles = 0.5) +
      stat_quant_eq(formula = formula, quantiles = 0.5)

    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_band(formula = formula, quantiles = c(0.1, 0.5, 0.9)) +
      stat_quant_eq(formula = formula, parse = TRUE,
                    quantiles = c(0.1, 0.5, 0.9))

    # grouping
    ggplot(my.data, aes(x, y2, color = group)) +
      geom_point() +
      stat_quant_line(formula = formula, linewidth = 0.5) +
      stat_quant_eq(formula = formula)

    ggplot(my.data, aes(x, y2, color = group)) +
      geom_point() +
      stat_quant_band(formula = formula, linewidth = 0.75) +
      stat_quant_eq(formula = formula) +
      theme_bw()

    # labelling equations
    ggplot(my.data,
           aes(x, y2, shape = group, linetype = group, grp.label = group)) +
      geom_point() +
      stat_quant_band(formula = formula, color = "black", linewidth = 0.75) +
      stat_quant_eq(mapping = use_label("grp", "eq", sep = "*\": \"*"),
                    formula = formula) +
      expand_limits(y = 3) +
      theme_classic()

    # modifying the explanatory variable within the model formula
    # modifying the response variable within aes()
    formula.trans <- y ~ I(x^2)
    ggplot(my.data, aes(x, y + 1)) +
      geom_point() +
      stat_quant_line(formula = formula.trans) +
      stat_quant_eq(mapping = use_label("eq"),
                    formula = formula.trans,
                    eq.x.rhs = "~x^2",
                    eq.with.lhs = "y + 1~~`=`~~")

    # using weights
    ggplot(my.data, aes(x, y, weight = w)) +
      geom_point() +
      stat_quant_line(formula = formula, linewidth = 0.5) +
      stat_quant_eq(formula = formula)

    # no weights, quantile set to upper boundary
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = formula, quantiles = 0.95) +
      stat_quant_eq(formula = formula, quantiles = 0.95)

    # manually assemble and map a specific label using paste() and aes()
    ggplot(my.data, aes(x, y2, color = group, grp.label = group)) +
      geom_point() +
      stat_quant_line(method = "rq", formula = formula,
                      quantiles = c(0.05, 0.5, 0.95),
                      linewidth = 0.5) +
      stat_quant_eq(mapping = aes(label = paste(after_stat(grp.label), "*\": \"*",
                                                after_stat(eq.label), sep = "")),
                    quantiles = c(0.05, 0.5, 0.95),
                    formula = formula, size = 3)

    # manually assemble and map a specific label using sprintf() and aes()
    ggplot(my.data, aes(x, y2, color = group, grp.label = group)) +
      geom_point() +
      stat_quant_band(method = "rq", formula = formula,
                      quantiles = c(0.05, 0.5, 0.95)) +
      stat_quant_eq(mapping = aes(label = sprintf("%s*\": \"*%s",
                                                  after_stat(grp.label),
                                                  after_stat(eq.label))),
                    quantiles = c(0.05, 0.5, 0.95),
                    formula = formula, size = 3)

    # geom = "text"
    ggplot(my.data, aes(x, y)) +
      geom_point() +
      stat_quant_line(formula = formula, quantiles = 0.5) +
      stat_quant_eq(label.x = "left", label.y = "top",
                    formula = formula, quantiles = 0.5)

    # Inspecting the returned data using geom_debug_group()
    # This provides a quick way of finding out the names of the variables that
    # are available for mapping to aesthetics using after_stat().
    gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

    if (gginnards.installed)
      library(gginnards)

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        stat_quant_line(geom = "debug_group")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        stat_quant_band(geom = "debug_group")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_quant_eq(formula = formula, geom = "debug_group")

    ## Not run:
    if (gginnards.installed)
      ggplot(mpg, aes(displ, hwy)) +
        stat_quant_line(geom = "debug_group", fm.values = TRUE)

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        stat_quant_band(geom = "debug_group", fm.values = TRUE)

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_quant_eq(mapping = aes(label = after_stat(eq.label)),
                      formula = formula, geom = "debug_group",
                      output.type = "markdown")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_quant_eq(formula = formula, geom = "debug_group",
                      output.type = "text")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_quant_eq(formula = formula, geom = "debug_group",
                      output.type = "numeric")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_quant_eq(formula = formula, quantiles = c(0.25, 0.5, 0.75),
                      geom = "debug_group", output.type = "text")

    if (gginnards.installed)
      ggplot(my.data, aes(x, y)) +
        geom_point() +
        stat_quant_eq(formula = formula, quantiles = c(0.25, 0.5, 0.75),
                      geom = "debug_group", output.type = "numeric")
    ## End(Not run)
stat_spikes() tags or extracts rows in data containing narrow local
maxima and/or minima in y with very steep shoulders. It makes it
possible to highlight and label spikes based on their x and/or
y coordinates. Flipping of orientation as well as dates and times are
supported.
stat_spikes( mapping = NULL, data = NULL, geom = "point", position = "identity", ..., orientation = "x", height.threshold = 20, z.threshold = 7, k = 20, spike.direction = "both", label.fmt = NULL, x.label.fmt = label.fmt, y.label.fmt = NULL, extract.spikes = NULL, na.rm = FALSE, show.legend = FALSE, inherit.aes = TRUE )
mapping |
The aesthetic mapping, usually constructed with
|
data |
A layer specific dataset - only needed if you want to override the plot defaults. |
geom |
The geometric object to use to display the data |
position |
The position adjustment to use for overlapping points on this layer |
... |
other arguments passed on to |
orientation |
character The orientation of the layer can be set to
either |
height.threshold |
numeric The minimum height of spikes expressed
relative to the median amplitude of the baseline local variation of
|
z.threshold |
numeric Modified local |
k |
integer width of median window used for smoothing; must be odd |
spike.direction |
character One of |
label.fmt, x.label.fmt, y.label.fmt
|
character strings giving a format
definition for construction of character strings labels with function
|
extract.spikes |
If |
na.rm |
a logical value indicating whether NA values should be stripped before the computation proceeds. |
show.legend |
logical. Should this layer be included in the legends?
|
inherit.aes |
If |
Spikes are detected based on a modified z-score calculated
from the differenced data. The threshold used should be
adjusted to the characteristics of the input and the desired sensitivity. The
lower the threshold, the more sensitive the test becomes, with shorter
spikes also being detected.
The algorithm uses running differences to detect abrupt changes in value,
compared to an estimate of the baseline variation of the differences. The
spread of the baseline is approximated by the MAD and its central value by the
median of the differences. Currently, a single estimate of MAD is used, while
running medians are used, when possible, as baseline. This comparison detects running
differences that are unusually large, in most cases signalling a transition
between values near the baseline and far from it, in both directions.
Transitions into and out of spikes are distinguished based on the median of the non-differenced values, as a descriptor of the data baseline. As for the median of the differences, a running median is used when possible.
This function thus detects the start and end of each spike, and distinguishes upward and downward spikes.
k is the width, in number of observations, of the window used for
running median smoothing to extract the baseline. A value several times the
width of the broadest spike, but narrow enough to track broader peaks, needs
to be set manually in most cases.
With na.rm = TRUE, NA values are omitted before searching for
spikes and set to 0L in the returned vector.
If all spikes are guaranteed to be one observation-wide and either going up
or down from the baseline, it is possible to detect them based purely on
the z.threshold by passing height.threshold = NA and either
spike.direction = "up" or spike.direction = "down", which
ensures very fast computation.
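The fast path described above, for one-observation-wide spikes, can be illustrated with a minimal base R sketch. This is not the code used by stat_spikes() or find_spikes(): the function name find_spikes_sketch() is hypothetical, a single global median and MAD are used instead of running medians, and no height test is applied.

```r
# Hypothetical, simplified sketch: upward spikes that are one observation
# wide, detected only from z-scores of the running differences.
find_spikes_sketch <- function(y, z.threshold = 7) {
  d <- diff(y)                     # d[i] = y[i + 1] - y[i]
  # stats::mad() rescales by 1.4826 by default, so this ratio is close to
  # the modified z-score 0.6745 * (d - median) / (unscaled MAD)
  z <- (d - median(d)) / mad(d)
  jump.up <- z > z.threshold       # abrupt increase
  jump.down <- z < -z.threshold    # abrupt decrease
  i <- 2:(length(y) - 1)           # candidate positions
  # an upward spike at i is entered by a jump up and left by a jump down
  i[jump.up[i - 1] & jump.down[i]]
}

# artificial data: small regular variation plus two added spikes
y <- rep(c(0, 0.1, -0.1, 0.05), 25)
y[c(20, 60)] <- y[c(20, 60)] + 10
spikes <- find_spikes_sketch(y)
spikes
```

Swapping the roles of jump.up and jump.down in the last line of the function would detect downward spikes instead.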
A data frame with one row for each spike found in the data
extracted from the input data or all rows in data. Added columns
contain the labels.
x-values at the spikes as numeric.
y-values at the spikes as numeric.
x-values at the spikes formatted as character.
y-values at the spikes formatted as character.
integer vector of 0, 1 or -1.
stat_peaks(),
stat_valleys() and stat_spikes() work nicely together with
geoms geom_text_repel(), geom_label_repel(), and
geom_marquee_repel() from package ggrepel to
solve the problem of overlapping labels by displacing them. If using
geom_text(), discard overlapping labels using
check_overlap = TRUE.
By default the labels are character values ready to be plotted as plain
text, but with a suitable label.fmt argument, labels formatted as
plotmath expressions, markdown or LaTeX (e.g., containing Greek
letters, superscripts or subscripts, maths or colour) can be generated
for use with geoms from packages 'marquee',
'ggtext' and 'xdvir'.
The default, geom = "point", is likely to work well in almost
any situation. The default aesthetics mappings set by these stats allow
their direct use with geom_text(), geom_label(),
geom_line(), geom_rug(), geom_hline() and
geom_vline() by just passing an argument to geom.
stat_spikes() understands the following aesthetics. Required aesthetics are displayed in bold and defaults are displayed for optional aesthetics:
| • | x |
|
| • | y |
|
| • | group |
→ inferred |
| • | label |
→ after_stat(x.label) |
| • | xintercept |
→ after_stat(x) |
| • | yintercept |
→ after_stat(y)
|
Learn more about setting these aesthetics in vignette("ggplot2-specs").
Whitaker, D. A.; Hayes, K. (2018) A simple algorithm for despiking Raman spectra. Chemometrics and Intelligent Laboratory Systems, 179, 82-84. doi:10.1016/j.chemolab.2018.06.009.
find_spikes, for the function used to locate the
spikes.
# generate artificial data with five added spikes
n = 500
set.seed(45678)
my.data <- data.frame(x = 1:n, y = rep(sin((0:19)/20 * 2 * pi), n / 20) + stats::rnorm(n, sd = 0.5))
selector <- sample(seq_len(n), 5)
my.data$y[selector] <- my.data$y[selector] + 10

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(colour = "orange")

ggplot(my.data, aes(x, -y)) + geom_line() + stat_spikes(colour = "orange")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(geom = "text", vjust = -0.5) + stat_spikes(geom = "rug", colour = "red")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(colour = "red", spike.direction = "up") + stat_spikes(colour = "blue", spike.direction = "down")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(colour = "red", spike.direction = "up")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(colour = "blue", spike.direction = "down")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(z.threshold = 2, colour = "orange")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(z.threshold = 20, colour = "orange")

ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(colour = "red", spike.direction = "up", height.threshold = NA)

# Inspecting the returned data using geom_debug_group()
# This provides a quick way of finding out the names of the variables that
# are available for mapping to aesthetics with after_stat().
gginnards.installed <- requireNamespace("gginnards", quietly = TRUE)

if (gginnards.installed) library(gginnards)

if (gginnards.installed) ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(geom = "debug_group")

if (gginnards.installed) ggplot(my.data, aes(x, y)) + geom_line() + stat_spikes(geom = "debug_group", extract.spikes = FALSE)
By default a formula of x on y is converted into a formula of y
on x, while the reverse swap is done only if backwards = TRUE.
swap_xy(f, backwards = FALSE)
f |
formula An R model formula |
backwards |
logical If |
If backwards = TRUE, a formula with x in the lhs is
always returned. If backwards = FALSE, a formula with y in the
lhs is always returned. If backwards = NULL, x and y
are always swapped.
This function is meant to be used only as a helper within 'ggplot2'
statistics. Normally together with geometries supporting orientation when we
want to automate the change in orientation based on a user-supplied formula.
Only x and y are exchanged, and in other respects the formula
is rebuilt copying the environment from f.
A copy of f with x and y swapped with each other
in the lhs and rhs.
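The swap described above can be sketched in base R as follows. This is an illustration only, not the implementation in the package: swap_symbols() and swap_xy_sketch() are hypothetical names, the formula is assumed to be two-sided, and the backwards = NULL case is not handled.

```r
# Hypothetical helper: walk an expression and exchange the symbols x and y
swap_symbols <- function(e) {
  if (is.name(e)) {
    if (identical(e, quote(x))) return(quote(y))
    if (identical(e, quote(y))) return(quote(x))
    return(e)
  }
  if (is.call(e)) {
    return(as.call(lapply(as.list(e), swap_symbols)))
  }
  e  # constants such as 2 or TRUE are returned unchanged
}

# Hypothetical sketch of the conditional swap for two-sided formulas
swap_xy_sketch <- function(f, backwards = FALSE) {
  lhs.vars <- all.vars(f[[2]])  # variables in the left-hand side
  needs.swap <- if (backwards) "y" %in% lhs.vars else "x" %in% lhs.vars
  if (!needs.swap) return(f)
  # rebuild the formula, copying the environment from f
  as.formula(swap_symbols(f), env = environment(f))
}

f1 <- swap_xy_sketch(x ~ poly(y, 2, raw = TRUE))
f2 <- swap_xy_sketch(y ~ x)
f3 <- swap_xy_sketch(y ~ x, backwards = TRUE)
```

In this sketch f1 has y in the lhs, f2 is returned unchanged, and f3 has x in the lhs.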
Expand scale limits to make them symmetric around zero. Can be
passed as argument to parameter limits of continuous scales from
packages 'ggplot2' or 'scales'. Can be also used to obtain an enclosing
symmetric range for numeric vectors.
symmetric_limits(x)
x |
numeric The automatic limits when used as argument to a scale's
|
A numeric vector of length two with the new limits, which are always such that the absolute value of upper and lower limits is the same.
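The computation implied by this description can be sketched in base R. The function symmetric_limits_sketch() is a hypothetical stand-in written for illustration; the package's own code may differ, for example in how missing values are handled.

```r
# Hypothetical sketch: limits symmetric around zero enclosing all of x
symmetric_limits_sketch <- function(x) {
  max.abs <- max(abs(range(x, na.rm = TRUE)))
  c(-max.abs, max.abs)
}

lims1 <- symmetric_limits_sketch(c(-1, 1.8))
lims2 <- symmetric_limits_sketch(-5:20)
```

Because a function of this form takes the automatic limits as its only argument and returns a vector of length two, it can be passed as is to the limits parameter of a continuous scale.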
symmetric_limits(c(-1, 1.8))
symmetric_limits(c(-10, 1.8))
symmetric_limits(-5:20)
Typeset/format numbers preserving trailing zeros
typeset_numbers(eq.char, output.type)
eq.char |
character A polynomial model equation as a character string. |
output.type |
character One of "expression", "latex", "tex", "text", "tikz", "markdown", "marquee". |
A character string.
Exponential number notation is replaced by its typeset equivalent. Protecting trailing zeros in negative numbers is more involved than I would like. Not only do we need to enclose numbers in quotation marks, but we also need to replace dashes with the minus character. I am not sure we can do the replacement portably, but the support for UTF in recent versions of R gives some hope.
Assemble model-fit-derived text or expressions and map them to
the label aesthetic.
use_label(..., labels = NULL, other.mapping = NULL, sep = "*\", \"*")
... |
character Strings giving the names of at most six label components in the order they will be included in the combined label. |
labels |
character A vector with the name of at most six label
components. If provided, values passed through |
other.mapping |
An unevaluated expression constructed with function
|
sep |
character A string used as separator when pasting the label components together. |
Statistics stat_poly_eq(), stat_ma_eq(),
stat_quant_eq() and stat_correlation() return
multiple text strings to be used individually or assembled into longer
character strings depending on the labels actually desired. Assembling and
mapping them requires verbose R code and familiarity with R expression
syntax. Function use_label() automates these two tasks and accepts
abbreviated familiar names for the parameters in addition to the name of
the columns in the data object returned by the statistics. The default
separator is suitable for plotmath expressions.
These four statistics return several character variables with names
ending in .label. This ending can be omitted, as well as
.value for f.value.label, t.value.label,
z.value.label, S.value.label and p.value.label.
R2 can be used in place of rr. Furthermore, case is ignored.
Thus, with the default separator, use_label("eq", "R2") is equivalent to
aes(label = paste(after_stat(eq.label), after_stat(rr.label), sep = "*\", \"*"))
Function use_label() calls aes() to create a mapping for
the label aesthetic, but it can in addition combine this mapping
with other mappings directly created with aes().
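The abbreviation rules given above can be illustrated with a small base R sketch that expands user-supplied names into column names. The function expand_label_names() is hypothetical and covers only the rules stated in this section; it is not the code used by use_label() and it does not validate names or restore the capitalization of other columns.

```r
# Hypothetical sketch of the name expansion rules described above
expand_label_names <- function(labels) {
  labels <- tolower(labels)                   # case is ignored
  labels <- sub("\\.label$", "", labels)      # ".label" can be omitted
  labels <- sub("\\.value$", "", labels)      # ".value" can be omitted
  labels <- sub("r2$", "rr", labels)          # "R2" stands for "rr"
  is.stat <- labels %in% c("f", "t", "z", "s", "p")
  labels[is.stat] <- paste0(labels[is.stat], ".value")
  labels[labels == "s.value"] <- "S.value"    # capital S in S.value.label
  paste0(labels, ".label")
}

cols <- expand_label_names(c("eq", "R2", "F", "p.value", "S.value.label"))
cols
```

Each expanded name corresponds to a variable that would be wrapped in after_stat() and pasted together with the separator sep.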
A mapping to the label aesthetic and optionally additional
mappings as an unevaluated R expression, built using function
aes(), ready to be passed as argument to the
mapping parameter of the supported statistics.
Function use_label() can be used to generate an argument
passed to formal parameter mapping of the statistics
stat_poly_eq, stat_ma_eq,
stat_quant_eq and stat_correlation. Please
see their documentation for the labels they generate.
# generate artificial data
set.seed(4321)
x <- 1:100
y <- (x + x^2 + x^3) + rnorm(length(x), mean = 0, sd = mean(x^3) / 4)
my.data <- data.frame(x = x, y = y * 1e-5, group = c("A", "B"), y2 = y * 1e-5 + c(2, 0))

# give a name to a formula
formula <- y ~ poly(x, 3, raw = TRUE)

# default label constructed by use_label()
ggplot(data = my.data, mapping = aes(x = x, y = y2, colour = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(mapping = use_label(), formula = formula)

# user specified label components
ggplot(data = my.data, mapping = aes(x = x, y = y2, colour = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(mapping = use_label("eq", "F"), formula = formula)

# user specified label components and separator
ggplot(data = my.data, mapping = aes(x = x, y = y2, colour = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(mapping = use_label("R2", "F", sep = "*\" with \"*"), formula = formula)

# combine the mapping to the label aesthetic with other mappings
ggplot(data = my.data, mapping = aes(x = x, y = y2)) +
  geom_point(mapping = aes(colour = group)) +
  stat_poly_line(mapping = aes(colour = group), formula = formula) +
  stat_poly_eq(mapping = use_label("grp", "eq", "F", aes(grp.label = group)), formula = formula)

# combine other mappings with default labels
ggplot(data = my.data, mapping = aes(x = x, y = y2)) +
  geom_point(mapping = aes(colour = group)) +
  stat_poly_line(mapping = aes(colour = group), formula = formula) +
  stat_poly_eq(mapping = use_label(aes(colour = group)), formula = formula)

# example with other available components
ggplot(data = my.data, mapping = aes(x = x, y = y2, colour = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(mapping = use_label("eq", "adj.R2", "n"), formula = formula)

# multiple labels
ggplot(data = my.data, mapping = aes(x, y2, colour = group)) +
  geom_point() +
  stat_poly_line(formula = formula) +
  stat_poly_eq(mapping = use_label("R2", "F", "P", "AIC", "BIC"), formula = formula) +
  stat_poly_eq(mapping = use_label(c("eq", "n")), formula = formula, label.y = "bottom", label.x = "right")

# quantile regression
ggplot(data = my.data, mapping = aes(x, y)) +
  stat_quant_band(formula = formula) +
  stat_quant_eq(mapping = use_label("eq", "n"), formula = formula) +
  geom_point()

# major axis regression
ggplot(data = my.data, aes(x = x, y = y)) +
  stat_ma_line() +
  stat_ma_eq(mapping = use_label("eq", "n")) +
  geom_point()

# correlation
ggplot(data = my.data, mapping = aes(x = x, y = y)) +
  stat_correlation(mapping = use_label("r", "t", "p")) +
  geom_point()
Convert two numeric ternary outcomes into a factor
xy_outcomes2factor(x, y) xy_thresholds2factor(x, y, x_threshold = 0, y_threshold = 0)
x, y
|
numeric vectors of -1, 0, and +1 values, indicating down regulation, uncertain response or up-regulation, or numeric vectors that can be converted into such values using a pair of thresholds. |
x_threshold, y_threshold
|
numeric vector Ranges enclosing the values to be considered uncertain for each of the two vectors. |
This function converts the numerically encoded values into a factor
with the four levels "xy", "x", "y" and "none".
The factor created can be used for faceting or can be mapped to aesthetics.
This is a utility function that only saves some typing. The same
result can be achieved by a direct call to factor. This
function aims at making it easier to draw quadrant plots with facets
based on the combined outcomes.
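A direct call to factor() of the kind mentioned above could look like the following base R sketch. The function xy_outcomes_sketch() is hypothetical, and the assignment of outcome pairs to levels is an assumption made for illustration: "xy" when both outcomes differ from zero, "x" or "y" when only one does, and "none" otherwise.

```r
# Hypothetical sketch: combine two ternary outcomes (-1, 0, +1) into a factor
xy_outcomes_sketch <- function(x, y) {
  x.resp <- x != 0   # response detected along x
  y.resp <- y != 0   # response detected along y
  labels <- ifelse(x.resp & y.resp, "xy",
                   ifelse(x.resp, "x",
                          ifelse(y.resp, "y", "none")))
  factor(labels, levels = c("xy", "x", "y", "none"))
}

outcomes <- xy_outcomes_sketch(c(-1, 0, 0, 1, -1), c(0, 1, 0, 1, -1))
outcomes
```

A factor built in this way keeps all four levels even when some are absent from the data, which gives a consistent order of panels when it is used for faceting.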
Other Functions for quadrant and volcano plots:
FC_format(),
outcome2factor(),
scale_colour_outcome(),
scale_shape_outcome(),
scale_y_Pvalue()
Other scales for omics data:
outcome2factor(),
scale_colour_logFC(),
scale_shape_outcome(),
scale_x_logFC()
xy_outcomes2factor(c(-1, 0, 0, 1, -1), c(0, 1, 0, 1, -1))
xy_thresholds2factor(c(-1, 0, 0, 1, -1), c(0, 1, 0, 1, -1))
xy_thresholds2factor(c(-1, 0, 0, 0.1, -5), c(0, 2, 0, 1, -1))