Computes summary statistics by groups, similar to the summary
procedure in SAS.
A more flexible alternative to base R's aggregate().
summary_by(
data,
formula,
id = NULL,
FUN = mean,
keep.names = FALSE,
p2d = FALSE,
order = TRUE,
full.dimension = FALSE,
var.names = NULL,
fun.names = NULL,
...
)
summaryBy(
formula,
data = parent.frame(),
id = NULL,
FUN = mean,
keep.names = FALSE,
p2d = FALSE,
order = TRUE,
full.dimension = FALSE,
var.names = NULL,
fun.names = NULL,
...
)
data: A data frame.
formula: A formula specifying response and grouping variables.
id: A formula indicating variables to retain (not grouped by).
FUN: A function or list of functions to apply to the response variables.
keep.names: Logical; keep original variable names if only one function is applied.
p2d: Logical; replace parentheses in output names with dots?
order: Logical; should the result be ordered by the grouping variables?
full.dimension: Logical; if TRUE, repeat rows so the output matches the input size.
var.names: Optional custom names for response variables.
fun.names: Optional custom names for the functions applied.
...: Additional arguments passed to the functions in FUN.
A data frame of grouped summary statistics.
Extra arguments in ... are passed to all functions in FUN. If needed, wrap functions so that they handle these arguments consistently (e.g., for na.rm = TRUE).
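The wrapping mentioned above can be sketched as follows. This is a minimal illustration, not part of the package: the names mean_na, n_obs and stats_na are hypothetical, and the final call assumes the doBy package (which provides summaryBy) is installed. Each wrapper accepts ... so that an extra argument such as na.rm = TRUE is absorbed instead of being forwarded to a function that does not accept it.

```r
# Hypothetical wrappers: each takes ... so extra arguments forwarded to
# every function in FUN are absorbed rather than causing an error.
mean_na <- function(x, ...) mean(x, na.rm = TRUE)
n_obs   <- function(x, ...) sum(!is.na(x))   # number of non-missing values

x <- c(1, NA, 3)
mean_na(x, na.rm = TRUE)   # 2; the extra argument is ignored by the wrapper
n_obs(x, na.rm = TRUE)     # 2

# A single wrapper returning several named statistics, in the same style
# as a custom summary function.
stats_na <- function(x, ...) c(mean = mean_na(x), n = n_obs(x))

# Assumption: doBy is installed; skipped otherwise.
if (requireNamespace("doBy", quietly = TRUE)) {
  doBy::summaryBy(Ozone ~ Month, data = airquality,
                  FUN = stats_na, na.rm = TRUE)
}
```

Handling missing values inside the wrapper keeps the behavior the same whether or not na.rm is supplied through ... .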
data(CO2)
# Simple groupwise mean
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean)
#>          Type  Treatment uptake.mean
#> 1      Quebec nonchilled    35.33333
#> 2      Quebec    chilled    31.75238
#> 3 Mississippi nonchilled    25.95238
#> 4 Mississippi    chilled    15.81429
summaryBy(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)
#>          Type  Treatment uptake.mean conc.mean
#> 1      Quebec nonchilled    35.33333       435
#> 2      Quebec    chilled    31.75238       435
#> 3 Mississippi nonchilled    25.95238       435
#> 4 Mississippi    chilled    15.81429       435
# Compare with base R's aggregate() (note the different row order and column names):
aggregate(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)
#>          Type  Treatment   uptake conc
#> 1      Quebec nonchilled 35.33333  435
#> 2 Mississippi nonchilled 25.95238  435
#> 3      Quebec    chilled 31.75238  435
#> 4 Mississippi    chilled 15.81429  435
## Using '.' on the right-hand side of a formula means stratifying by
## all variables not used elsewhere in the formula:
summaryBy(uptake ~ ., data = CO2, FUN = mean)
#>    Plant        Type  Treatment uptake.mean
#> 1    Qn1      Quebec nonchilled    33.22857
#> 2    Qn2      Quebec nonchilled    35.15714
#> 3    Qn3      Quebec nonchilled    37.61429
#> 4    Qc1      Quebec    chilled    29.97143
#> 5    Qc3      Quebec    chilled    32.58571
#> 6    Qc2      Quebec    chilled    32.70000
#> 7    Mn3 Mississippi nonchilled    24.11429
#> 8    Mn2 Mississippi nonchilled    27.34286
#> 9    Mn1 Mississippi nonchilled    26.40000
#> 10   Mc2 Mississippi    chilled    12.14286
#> 11   Mc3 Mississippi    chilled    17.30000
#> 12   Mc1 Mississippi    chilled    18.00000
# Multiple statistics from a single custom summary function
myfun <- function(x, ...) {
  c(m = mean(x, na.rm = TRUE), v = var(x, na.rm = TRUE), n = length(x))
}
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = myfun)
#>          Type  Treatment uptake.m uptake.v uptake.n
#> 1      Quebec nonchilled 35.33333 92.09033       21
#> 2      Quebec    chilled 31.75238 93.02262       21
#> 3 Mississippi nonchilled 25.95238 54.79162       21
#> 4 Mississippi    chilled 15.81429 16.47529       21
# Summary on transformed variables
# works:
summaryBy(cbind(lu=log(uptake), conc) ~ Type, data = CO2, FUN = mean)
#>          Type  lu.mean conc.mean
#> 1      Quebec 3.453928       435
#> 2 Mississippi 2.964352       435
# fails (here the transformed variable is not given a name):
# summaryBy(cbind(log(uptake), conc) ~ Type, data = CO2, FUN = mean)
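
The keep.names argument is not exercised in the examples above, so here is a small sketch of it. It assumes the doBy package (which provides summaryBy) is installed; the requireNamespace() guard and the object name res are illustrative only. Per the argument description, with a single function and keep.names = TRUE the summary column should keep the name uptake rather than uptake.mean.

```r
# Assumption: doBy is installed; the block is skipped otherwise.
if (requireNamespace("doBy", quietly = TRUE)) {
  data(CO2)
  # One function plus keep.names = TRUE: the summary column is expected
  # to be called "uptake" instead of "uptake.mean".
  res <- doBy::summaryBy(uptake ~ Type + Treatment, data = CO2,
                         FUN = mean, keep.names = TRUE)
  print(names(res))
}
```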