Computes groupwise summary statistics, much like the summary procedure in SAS. A more flexible alternative to base R's aggregate().

summary_by(
  data,
  formula,
  id = NULL,
  FUN = mean,
  keep.names = FALSE,
  p2d = FALSE,
  order = TRUE,
  full.dimension = FALSE,
  var.names = NULL,
  fun.names = NULL,
  ...
)

summaryBy(
  formula,
  data = parent.frame(),
  id = NULL,
  FUN = mean,
  keep.names = FALSE,
  p2d = FALSE,
  order = TRUE,
  full.dimension = FALSE,
  var.names = NULL,
  fun.names = NULL,
  ...
)

Arguments

data

A data frame.

formula

A formula specifying response and grouping variables.

id

A formula naming variables that the data are not grouped by but that should still appear in the output.

FUN

A function or list of functions to apply to the response variables.

keep.names

Logical; if TRUE and FUN contains only one function, output variables keep the names of the input variables.

p2d

Logical; should parentheses in output variable names be replaced by dots?

order

Logical; should result be ordered by grouping variables?

full.dimension

Logical; if TRUE, rows of summary statistics are repeated so that the result has as many rows as the input data.

var.names

Optional custom names for response variables.

fun.names

Optional custom names for functions applied.

...

Additional arguments passed to functions in FUN.
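
The naming and id arguments are easiest to see side by side. The following is a minimal sketch using the CO2 data from the Examples; it assumes the doBy package is installed, and the column names mentioned in the comments are what the argument descriptions above imply rather than captured output.

```r
library(doBy)
data(CO2)

# keep.names = TRUE with a single function: the summary column is expected
# to keep the input name "uptake" instead of "uptake.mean".
kept <- summaryBy(uptake ~ Type, data = CO2, FUN = mean, keep.names = TRUE)
names(kept)

# fun.names supplies the suffixes used in place of the function names.
renamed <- summaryBy(uptake ~ Type, data = CO2, FUN = c(mean, sd),
                     fun.names = c("avg", "sdev"))
names(renamed)

# id carries a variable into the output without grouping by it.
# Each Plant belongs to exactly one Type, so Type is constant within groups.
with_id <- summaryBy(uptake ~ Plant, data = CO2, id = ~ Type, FUN = mean)
head(with_id)
```

An id variable is most useful when it is constant within each group, as Type is within Plant here; otherwise it is unclear which of its values should be reported.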

Value

A data frame of grouped summary statistics.
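
To make the shape of the result concrete, the sketch below (assuming doBy is installed, and using the CO2 data from the Examples) compares the default result, which has one row per combination of grouping values, with the result for full.dimension = TRUE, which has one row per input row.

```r
library(doBy)
data(CO2)

# Default: one row per observed Type x Treatment combination.
grouped <- summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean)
is.data.frame(grouped)
nrow(grouped)

# full.dimension = TRUE: group summaries are repeated for every input row.
full <- summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean,
                  full.dimension = TRUE)
nrow(full) == nrow(CO2)
```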

Details

Extra arguments in ... are passed to every function in FUN, so each function must accept them. If one does not (for example, it has no na.rm argument), wrap it in a function that takes ... and handles or ignores the extra arguments.
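
As a concrete case of the wrapping advice above: length() takes no na.rm argument, so it cannot be combined directly with na.rm = TRUE. A minimal sketch, assuming doBy is installed; n_obs and CO2_na are hypothetical names introduced for illustration, and the missing value is planted artificially.

```r
library(doBy)
data(CO2)

# Illustration only: plant one missing value in a plain data-frame copy.
CO2_na <- as.data.frame(CO2)
CO2_na$uptake[1] <- NA

# Hypothetical wrapper: accepts and ignores extra arguments such as na.rm,
# and counts the non-missing observations.
n_obs <- function(x, ...) sum(!is.na(x))

# na.rm = TRUE reaches both functions: mean() uses it, n_obs() ignores it.
res <- summaryBy(uptake ~ Type, data = CO2_na, FUN = c(mean, n_obs),
                 na.rm = TRUE)
res
```

Writing the wrapper with ... keeps the call site simple: any further argument added for the other functions in FUN is absorbed without changing n_obs.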

Author

Søren Højsgaard, sorenh@math.aau.dk

Examples

data(CO2)

# Simple groupwise mean
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = mean)
#>          Type  Treatment uptake.mean
#> 1      Quebec nonchilled    35.33333
#> 2      Quebec    chilled    31.75238
#> 3 Mississippi nonchilled    25.95238
#> 4 Mississippi    chilled    15.81429
summaryBy(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)
#>          Type  Treatment uptake.mean conc.mean
#> 1      Quebec nonchilled    35.33333       435
#> 2      Quebec    chilled    31.75238       435
#> 3 Mississippi nonchilled    25.95238       435
#> 4 Mississippi    chilled    15.81429       435

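# Compare with base R's aggregate(), which orders the rows differently and
# does not append the function name to the column names: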
aggregate(cbind(uptake, conc) ~ Type + Treatment, data = CO2, FUN = mean)
#>          Type  Treatment   uptake conc
#> 1      Quebec nonchilled 35.33333  435
#> 2 Mississippi nonchilled 25.95238  435
#> 3      Quebec    chilled 31.75238  435
#> 4 Mississippi    chilled 15.81429  435

## Using '.' on the right hand side of a formula means to stratify by
## all variables not used elsewhere:
summaryBy(uptake ~ ., data = CO2, FUN = mean)
#>    Plant        Type  Treatment uptake.mean
#> 1    Qn1      Quebec nonchilled    33.22857
#> 2    Qn2      Quebec nonchilled    35.15714
#> 3    Qn3      Quebec nonchilled    37.61429
#> 4    Qc1      Quebec    chilled    29.97143
#> 5    Qc3      Quebec    chilled    32.58571
#> 6    Qc2      Quebec    chilled    32.70000
#> 7    Mn3 Mississippi nonchilled    24.11429
#> 8    Mn2 Mississippi nonchilled    27.34286
#> 9    Mn1 Mississippi nonchilled    26.40000
#> 10   Mc2 Mississippi    chilled    12.14286
#> 11   Mc3 Mississippi    chilled    17.30000
#> 12   Mc1 Mississippi    chilled    18.00000

# Several statistics at once from a single custom summary function
myfun <- function(x, ...)
  c(m = mean(x, na.rm = TRUE), v = var(x, na.rm = TRUE), n = length(x))
summaryBy(uptake ~ Type + Treatment, data = CO2, FUN = myfun)
#>          Type  Treatment uptake.m uptake.v uptake.n
#> 1      Quebec nonchilled 35.33333 92.09033       21
#> 2      Quebec    chilled 31.75238 93.02262       21
#> 3 Mississippi nonchilled 25.95238 54.79162       21
#> 4 Mississippi    chilled 15.81429 16.47529       21

# Summary on transformed variables
# works (transformed variable named via lu = ...):
summaryBy(cbind(lu=log(uptake), conc) ~ Type, data = CO2, FUN = mean)
#>          Type  lu.mean conc.mean
#> 1      Quebec 3.453928       435
#> 2 Mississippi 2.964352       435
# fails (transformed variable left unnamed):
# summaryBy(cbind(log(uptake), conc) ~ Type, data = CO2, FUN = mean)