pb-modcomp.Rd
Model comparison of nested models using parametric bootstrap methods. Implemented for some commonly applied model types.
PBmodcomp(
largeModel,
smallModel,
nsim = 1000,
ref = NULL,
seed = NULL,
cl = NULL,
details = 0
)
# S3 method for merMod
PBmodcomp(
largeModel,
smallModel,
nsim = 1000,
ref = NULL,
seed = NULL,
cl = NULL,
details = 0
)
# S3 method for lm
PBmodcomp(
largeModel,
smallModel,
nsim = 1000,
ref = NULL,
seed = NULL,
cl = NULL,
details = 0
)
seqPBmodcomp(largeModel, smallModel, h = 20, nsim = 1000, cl = 1)
largeModel: A model object. Can be a linear mixed effects
model or a generalized linear mixed effects model (as fitted with
the lmer() and glmer() functions in the lme4 package), or a
linear normal model or a generalized linear model. largeModel
must be larger than smallModel (see below).

smallModel: A model of the same type as largeModel, a
restriction matrix, or a formula (as illustrated in the
examples below).

nsim: The number of simulations used to form the reference
distribution.

ref: Vector containing samples from the reference
distribution. If NULL, this vector is generated using
PBrefdist().

seed: A seed passed to the simulation of new datasets.

cl: A vector identifying a cluster; used for calculating the
reference distribution on several cores. See examples below.

details: The amount of output produced. Mainly relevant for
debugging purposes.

h: For sequential computation of bootstrap p-values: the
number of extreme samples that must be generated before the
sampling process stops.
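The stopping rule used by seqPBmodcomp can be sketched in a language-agnostic way. The following Python illustration (all names are invented for this sketch; it is not pbkrtest's implementation) draws simulated LRT values until either nsim draws have been made or h of them were at least as large as the observed statistic:

```python
def seq_pb_pvalue(t_obs, simulate_lrt, h=20, nsim=1000):
    """Sequential bootstrap p-value: sampling stops early once h draws
    at least as large as the observed statistic have been seen."""
    extremes, n = 0, 0
    while n < nsim and extremes < h:
        n += 1
        if simulate_lrt() >= t_obs:
            extremes += 1
    # (extremes + 1) / (n + 1): the usual add-one bootstrap p-value
    return (extremes + 1) / (n + 1)
```

With a large observed statistic, few draws are extreme and all nsim simulations are used; with a small observed statistic the loop stops after roughly h extreme draws, saving many model refits.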
The model object must be fitted with maximum likelihood
(i.e. with REML=FALSE). If the object is fitted with
restricted maximum likelihood (i.e. with REML=TRUE), then
the model is refitted with REML=FALSE before the
p-values are calculated. Put differently, the user need not
worry about this issue.
Under the fitted hypothesis (i.e. under the fitted small model) nsim
samples of the likelihood ratio test statistic (LRT) are generated.
Then p-values are calculated as follows:
LRT: Assuming that LRT has a chi-square distribution.
PBtest: The fraction of simulated LRT-values that are larger or equal to the observed LRT value.
Bartlett: A Bartlett correction of the LRT is calculated from the mean of the simulated LRT-values.
Gamma: The reference distribution of LRT is assumed to be a gamma distribution with mean and variance determined as the sample mean and sample variance of the simulated LRT-values.
F: The LRT divided by the number of degrees of freedom is assumed to be F-distributed, where the denominator degrees of freedom are determined by matching the first moment of the reference distribution.
It can happen that some values of the LRT statistic in the reference distribution are negative. When this happens, the number of samples actually used (those where the LRT is positive) is reported; this number is smaller than the requested number of samples.
In theory one cannot have a negative value of the LRT statistic, but in practice one can. We speculate that the reason is as follows: We simulate data under the small model and fit both the small and the large model to the simulated data. The large model therefore represents, by definition, an overfit: it has superfluous parameters. For some simulated datasets the fits of the two models will consequently be very similar, resulting in similar values of the log-likelihood. There is no guarantee that the log-likelihood for the large model will in practice always be larger than for the small one (convergence problems and other numerical issues can play a role here).
To look further into the problem, one can use the PBrefdist()
function for simulating the reference distribution (this reference
distribution can be provided as input to PBmodcomp()). Inspection
sometimes reveals that while many values are negative, they are
numerically very small. In this case one may try replacing the
negative values by a small positive value and then invoking
PBmodcomp() again to get an idea of how strongly the resulting
p-values are influenced. (The p-values get smaller this way compared
to the case where only the originally positive values are used.)
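That sensitivity check can be sketched as follows (a Python illustration with invented names; the eps replacement value is an arbitrary choice):

```python
def pb_pvalue(t_obs, t_sim):
    """Add-one bootstrap p-value from the draws in t_sim."""
    return (1 + sum(t >= t_obs for t in t_sim)) / (len(t_sim) + 1)

def negative_lrt_sensitivity(t_obs, t_sim, eps=1e-8):
    """Compare the p-value from positive draws only with the p-value
    obtained when negative draws are replaced by a small positive value."""
    positive_only = [t for t in t_sim if t > 0]
    replaced = [t if t > 0 else eps for t in t_sim]
    return pb_pvalue(t_obs, positive_only), pb_pvalue(t_obs, replaced)
```

Because the replaced draws enlarge the reference sample without adding extreme values, the second p-value is never larger than the first, matching the remark above.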
Ulrich Halekoh, Søren Højsgaard (2014). A Kenward-Roger Approximation and Parametric Bootstrap Methods for Tests in Linear Mixed Models - The R Package pbkrtest. Journal of Statistical Software, 59(9), 1-32. https://www.jstatsoft.org/v59/i09/
data(beets, package="pbkrtest")
head(beets)
#> harvest block sow yield sugpct
#> 1 harv1 block1 sow3 128.0 17.1
#> 2 harv1 block1 sow4 118.0 16.9
#> 3 harv1 block1 sow5 95.0 16.6
#> 4 harv1 block1 sow2 131.0 17.0
#> 5 harv1 block1 sow1 136.5 17.0
#> 6 harv2 block2 sow3 136.5 17.0
library(pbkrtest)  ## also attaches lme4
NSIM <- 50 ## Simulations in parametric bootstrap
## Linear mixed effects model:
sug <- lmer(sugpct ~ block + sow + harvest + (1|block:harvest),
data=beets, REML=FALSE)
sug.h <- update(sug, .~. -harvest)
sug.s <- update(sug, .~. -sow)
#> boundary (singular) fit: see help('isSingular')
anova(sug, sug.h)
#> Data: beets
#> Models:
#> sug.h: sugpct ~ block + sow + (1 | block:harvest)
#> sug: sugpct ~ block + sow + harvest + (1 | block:harvest)
#> npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
#> sug.h 9 -69.084 -56.473 43.542 -87.084
#> sug 10 -79.998 -65.986 49.999 -99.998 12.914 1 0.0003261 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(sug, sug.h, nsim=NSIM, cl=1)
#> Bootstrap test; time: 1.82 sec; samples: 50; extremes: 2;
#> large : sugpct ~ block + sow + harvest + (1 | block:harvest)
#> small : sugpct ~ block + sow + (1 | block:harvest)
#> stat df p.value
#> LRT 12.914 1 0.0003261 ***
#> PBtest 12.914 0.0588235 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(sug, sug.s)
#> Data: beets
#> Models:
#> sug.s: sugpct ~ block + harvest + (1 | block:harvest)
#> sug: sugpct ~ block + sow + harvest + (1 | block:harvest)
#> npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
#> sug.s 6 -2.795 5.612 7.398 -14.795
#> sug 10 -79.998 -65.986 49.999 -99.998 85.203 4 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(sug, sug.s, nsim=NSIM, cl=1)
#> boundary (singular) fit: see help('isSingular')
#> boundary (singular) fit: see help('isSingular')
#> Bootstrap test; time: 1.71 sec; samples: 50; extremes: 0;
#> large : sugpct ~ block + sow + harvest + (1 | block:harvest)
#> small : sugpct ~ block + harvest + (1 | block:harvest)
#> stat df p.value
#> LRT 85.203 4 < 2e-16 ***
#> PBtest 85.203 0.01961 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## Linear normal model:
sug <- lm(sugpct ~ block + sow + harvest, data=beets)
sug.h <- update(sug, .~. -harvest)
sug.s <- update(sug, .~. -sow)
anova(sug, sug.h)
#> Analysis of Variance Table
#>
#> Model 1: sugpct ~ block + sow + harvest
#> Model 2: sugpct ~ block + sow
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 22 0.062667
#> 2 23 0.159000 -1 -0.096333 33.819 7.508e-06 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(sug, sug.h, nsim=NSIM, cl=1)
#> Bootstrap test; time: 0.19 sec; samples: 50; extremes: 0;
#> large : sugpct ~ block + sow + harvest
#> small : sugpct ~ block + sow
#> stat df p.value
#> LRT 27.932 1 1.256e-07 ***
#> PBtest 27.932 0.01961 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
anova(sug, sug.s)
#> Analysis of Variance Table
#>
#> Model 1: sugpct ~ block + sow + harvest
#> Model 2: sugpct ~ block + harvest
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 22 0.06267
#> 2 26 1.07267 -4 -1.01 88.644 3.073e-13 ***
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(sug, sug.s, nsim=NSIM, cl=1)
#> Bootstrap test; time: 0.15 sec; samples: 50; extremes: 0;
#> large : sugpct ~ block + sow + harvest
#> small : sugpct ~ block + harvest
#> stat df p.value
#> LRT 85.202 4 < 2e-16 ***
#> PBtest 85.202 0.01961 *
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
## Generalized linear model
counts <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome <- gl(3, 1, 9)
treatment <- gl(3, 3)
d.AD <- data.frame(treatment, outcome, counts)
head(d.AD)
#> treatment outcome counts
#> 1 1 1 18
#> 2 1 2 17
#> 3 1 3 15
#> 4 2 1 20
#> 5 2 2 10
#> 6 2 3 20
glm.D93 <- glm(counts ~ outcome + treatment, family = poisson())
glm.D93.o <- update(glm.D93, .~. -outcome)
glm.D93.t <- update(glm.D93, .~. -treatment)
anova(glm.D93, glm.D93.o, test="Chisq")
#> Analysis of Deviance Table
#>
#> Model 1: counts ~ outcome + treatment
#> Model 2: counts ~ treatment
#> Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1 4 5.1291
#> 2 6 10.5814 -2 -5.4523 0.06547 .
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(glm.D93, glm.D93.o, nsim=NSIM, cl=1)
#> Error in eval(predvars, data, env): object 'outcome' not found
anova(glm.D93, glm.D93.t, test="Chisq")
#> Analysis of Deviance Table
#>
#> Model 1: counts ~ outcome + treatment
#> Model 2: counts ~ outcome
#> Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1 4 5.1291
#> 2 6 5.1291 -2 -2.6645e-15 1
PBmodcomp(glm.D93, glm.D93.t, nsim=NSIM, cl=1)
#> Error in eval(predvars, data, env): object 'outcome' not found
## The errors above arise because the models were fitted without a
## data argument; refitting them with data=d.AD allows PBmodcomp()
## to simulate new datasets and refit the models.
## Generalized linear mixed model (it takes a while to fit these)
if (FALSE) {
(gm1 <- glmer(cbind(incidence, size - incidence) ~ period + (1 | herd),
data = cbpp, family = binomial))
(gm2 <- update(gm1, .~.-period))
anova(gm1, gm2)
PBmodcomp(gm1, gm2)
}
if (FALSE) {
(fmLarge <- lmer(Reaction ~ Days + (Days|Subject), sleepstudy))
## removing Days
(fmSmall <- lmer(Reaction ~ 1 + (Days|Subject), sleepstudy))
anova(fmLarge, fmSmall)
PBmodcomp(fmLarge, fmSmall, cl=1)
## The same test using a restriction matrix
L <- cbind(0,1)
PBmodcomp(fmLarge, L, cl=1)
## Vanilla; beet0 and beet_no.harv are assumed to be the beets
## models from above, e.g.:
beet0 <- lmer(sugpct ~ block + sow + harvest + (1|block:harvest),
              data=beets, REML=FALSE)
beet_no.harv <- update(beet0, .~. -harvest)
PBmodcomp(beet0, beet_no.harv, nsim=NSIM, cl=1)
## Simulate reference distribution separately:
refdist <- PBrefdist(beet0, beet_no.harv, nsim=1000)
PBmodcomp(beet0, beet_no.harv, ref=refdist, cl=1)
## Do computations with multiple processors:
## Number of cores (detectCores() and makeCluster() are in the
## parallel package):
library(parallel)
(nc <- detectCores())
## Create clusters
cl <- makeCluster(rep("localhost", nc))
## Then do:
PBmodcomp(beet0, beet_no.harv, cl=cl)
## Or in two steps:
refdist <- PBrefdist(beet0, beet_no.harv, nsim=NSIM, cl=cl)
PBmodcomp(beet0, beet_no.harv, ref=refdist)
## It is recommended to stop the clusters before quitting R:
stopCluster(cl)
}
## Linear and generalized linear models:
m11 <- lm(dist ~ speed + I(speed^2), data=cars)
m10 <- update(m11, ~.-I(speed^2))
anova(m11, m10)
#> Analysis of Variance Table
#>
#> Model 1: dist ~ speed + I(speed^2)
#> Model 2: dist ~ speed
#> Res.Df RSS Df Sum of Sq F Pr(>F)
#> 1 47 10825
#> 2 48 11354 -1 -528.81 2.296 0.1364
PBmodcomp(m11, m10, cl=1, nsim=NSIM)
#> Bootstrap test; time: 0.12 sec; samples: 50; extremes: 5;
#> large : dist ~ speed + I(speed^2)
#> small : dist ~ speed
#> stat df p.value
#> LRT 2.3848 1 0.1225
#> PBtest 2.3848 0.1176
PBmodcomp(m11, ~.-I(speed^2), cl=1, nsim=NSIM)
#> Bootstrap test; time: 0.10 sec; samples: 50; extremes: 7;
#> large : dist ~ speed + I(speed^2)
#> small : dist ~ speed
#> stat df p.value
#> LRT 2.3848 1 0.1225
#> PBtest 2.3848 0.1569
PBmodcomp(m11, c(0, 0, 1), cl=1, nsim=NSIM)
#> Bootstrap test; time: 0.11 sec; samples: 50; extremes: 8;
#> large : dist ~ speed + I(speed^2)
#> small :
#> [,1] [,2] [,3]
#> [1,] 0 0 1
#> stat df p.value
#> LRT 2.3848 1 0.1225
#> PBtest 2.3848 0.1765
m21 <- glm(dist ~ speed + I(speed^2), family=Gamma("identity"), data=cars)
m20 <- update(m21, ~.-I(speed^2))
anova(m21, m20, test="Chisq")
#> Analysis of Deviance Table
#>
#> Model 1: dist ~ speed + I(speed^2)
#> Model 2: dist ~ speed
#> Resid. Df Resid. Dev Df Deviance Pr(>Chi)
#> 1 47 7.6741
#> 2 48 8.0945 -1 -0.42037 0.106
PBmodcomp(m21, m20, cl=1, nsim=NSIM)
#> Bootstrap test; time: 0.22 sec; samples: 50; extremes: 5;
#> large : dist ~ speed + I(speed^2)
#> small : dist ~ speed
#> stat df p.value
#> LRT 2.7364 1 0.09809 .
#> PBtest 2.7364 0.11765
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(m21, ~.-I(speed^2), cl=1, nsim=NSIM)
#> Bootstrap test; time: 0.20 sec; samples: 50; extremes: 7;
#> large : dist ~ speed + I(speed^2)
#> small : dist ~ speed
#> stat df p.value
#> LRT 2.7364 1 0.09809 .
#> PBtest 2.7364 0.15686
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
PBmodcomp(m21, c(0, 0, 1), cl=1, nsim=NSIM)
#> Bootstrap test; time: 0.23 sec; samples: 50; extremes: 10;
#> large : dist ~ speed + I(speed^2)
#> small :
#> [,1] [,2] [,3]
#> [1,] 0 0 1
#> stat df p.value
#> LRT 2.7364 1 0.09809 .
#> PBtest 2.7364 0.21569
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1