How to do power simulations for structural equation models in R

Computing a priori power analyses for simple statistical models can be done analytically (e.g., with G*Power or the pwr package in R). However, estimating the power for more complex models and in particular structural equation models (SEM) is not as straightforward and requires simulations. I recently came across the package paramtest (Hughes, 2017) which provides a great framework for conducting more complex power simulations.

In what follows, I provide some examples of how to simulate power for SEMs with the ultimate goal of estimating the required sample size. On a side note, SEM are comparatively complex models. Sample size justifications are hence not only based on power calculations but should also be informed by general considerations of model complexity (see e.g., Kline, 2015).

For the following analyses, we only need two packages. I will use various functions from the tidyverse (primarily from dplyr and ggplot2) and the functions gen_data() and grid_search() from the paramtest package.

library(tidyverse)
library(paramtest)

Power for a bivariate relationship with manifest variables

Let us first consider a simple bivariate relationship between two manifest variables. In other words, we investigate the power for a relationship between two variables with “no measurement error”. The approach that I am going to use throughout this post is always as follows:

Define a function that simulates data, draw samples, estimate the relevant models, and extract parameters of interest.
Use grid_search() to run the simulations.
Count the number of significant results to “get” power.
Visualize the power curve.

The first step requires defining a function with the following arguments: a) simNum = the iteration number (do not change this argument, it is required by grid_search()), b) n = the sample sizes that should be tests, c) any other variables that should vary across the simulations (could be effect size(s) of the relationship(s) of interest or factor loadings of latent variables, etc.). Within this function, we use the function gen_data() that generates samples from the predefined data properties.
These properties include a “factor matrix” and an “effect matrix”. Defining both matrixes requires a good knowledge about the model that one wants to estimate. All steps are outlined below.

# Define the function
model_bivar <- function(simNum, n, b1) {
  
  # compute factor matrix 
  factor <- matrix(c(1, 0,    # x (no measurement error)
                     0, 1),   # y (no measurement error)
                   nrow = 2, ncol = 2, 
                   byrow = TRUE, 
                   dimnames = list(c('x', 'y'),  # rows
                                   c('x', 'y'))) # cols 
  # compute effect matrix
  y_res <- sqrt(1 - b1^2) 
  effect <- matrix(c(1, b1,    # b1 = effect size of the relationship
                     0, y_res), 
                    nrow = 2, ncol = 2, 
                    byrow = TRUE, 
                    dimnames = list(c('x', 'y'),    # rows
                                    c('x', 'y')))   # columns
  
  # generate data based on both matrices
  data <- paramtest::gen_data(factor, effect, n_cases = n)
  
  # fit lavaan model
  model <- '
    y ~ b1*x
  '
  fit <- lavaan::sem(model, data = data)
  
  # extract relevant parameters
  ests <- subset(lavaan::parameterEstimates(fit), label == "b1") 
  return(ests)
}

Now, we use grid_search() to run the simulations. In this case, we investigate how many participants we need to sample in order to test a bivariate effect between manifest variables of \(\beta\) = .10 (small according to Cohen) to reach different levels of power.

We hence provide a list with the varying parameters (in this case only the sample sizes). It is important to run enough iterations. As always, results stabilize with higher iterations. Side note: We can profit from parallel computing by using parallel = 'snow' and ncpus = [number of available cores]. This increases the computation speed considerably.

# Run simulations
power_bivar <- grid_search(
  func = model_bivar, 
  params = list(n = seq(100, 1500, by = 50)), 
  b1 = .1,
  n.iter = 1000, 
  output = 'data.frame',
  parallel='snow', 
  ncpus=4
  )

## Running 29,000 tests...

# result table
results(power_bivar) %>% tbl_df

## # A tibble: 29,000 x 12
##     iter n.test lhs   op    rhs   label      est     se       z  pvalue ci.lower
##    <int>  <dbl> <chr> <chr> <chr> <chr>    <dbl>  <dbl>   <dbl>   <dbl>    <dbl>
##  1     1    100 y     ~     x     b1     0.0335  0.0862  0.388  6.98e-1  -0.135 
##  2     2    100 y     ~     x     b1     0.0391  0.0958  0.408  6.83e-1  -0.149 
##  3     3    100 y     ~     x     b1    -0.150   0.111  -1.35   1.77e-1  -0.368 
##  4     4    100 y     ~     x     b1     0.238   0.0961  2.48   1.31e-2   0.0501
##  5     5    100 y     ~     x     b1     0.164   0.0983  1.67   9.53e-2  -0.0287
##  6     6    100 y     ~     x     b1     0.120   0.109   1.10   2.71e-1  -0.0935
##  7     7    100 y     ~     x     b1     0.319   0.0968  3.30   9.82e-4   0.129 
##  8     8    100 y     ~     x     b1     0.200   0.108   1.85   6.41e-2  -0.0117
##  9     9    100 y     ~     x     b1     0.00610 0.0954  0.0639 9.49e-1  -0.181 
## 10    10    100 y     ~     x     b1     0.0585  0.0929  0.630  5.29e-1  -0.124 
## # … with 28,990 more rows, and 1 more variable: ci.upper <dbl>

The resulting data frame includes relevant parameters for the relationship between x and y. The column n.test indicates the sample size of each iteration.

To compute the power, we only need to count the number of significant results per sample size group (n = 100, 150, 200, 250,…). We can then check the “needed” sample size for various power levels. It would also suggest plotting a power curve as it provides more information than a single value alone.

# compute power
(result_bivar <- results(power_bivar) %>%
  group_by(n.test) %>%
  mutate(sig = (pvalue < .05)) %>%   
  summarise(power = mean(sig, na.rm=TRUE)))

## # A tibble: 29 x 2
##    n.test power
##     <dbl> <dbl>
##  1    100 0.178
##  2    150 0.258
##  3    200 0.325
##  4    250 0.376
##  5    300 0.424
##  6    350 0.439
##  7    400 0.485
##  8    450 0.559
##  9    500 0.606
## 10    550 0.663
## # … with 19 more rows

# Extract sample size for 80% power
result_bivar %>%
  filter(power >= 0.80)

## # A tibble: 15 x 2
##    n.test power
##     <dbl> <dbl>
##  1    800 0.813
##  2    850 0.821
##  3    900 0.867
##  4    950 0.865
##  5   1000 0.878
##  6   1050 0.905
##  7   1100 0.921
##  8   1150 0.931
##  9   1200 0.949
## 10   1250 0.944
## 11   1300 0.953
## 12   1350 0.97 
## 13   1400 0.979
## 14   1450 0.957
## 15   1500 0.973

# plot power curve
(p1 <- result_bivar %>%  
  ggplot(., aes(x = n.test, 
                y = power)) +
  geom_point() +
  geom_smooth(se = F, color = "black") +
  geom_hline(yintercept = .80, linetype = "dashed") +
  geom_vline(xintercept = 800, linetype = "dashed") +
  theme_bw() +
  labs (x = "sample size", 
        y = "power"))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

In this example, we see that we need roughly 800 participants in order to test the small correlation between the two variables with a power of 80% (assuming an alpha-level of .05!).

For this simple example, this procedure is a bit complex as a solution can also be derived analytically (the difference in the resulting sample size is explainable by the lack of precision of our simulations. If we would use smaller intervals between the sample sizes and higher iterations, we would get closer to the analytical solution):

library(pwr)
pwr.r.test(r = c(.1), 
           sig.level = .05, 
           power = .80, 
           alternative = "two.sided")

## 
##      approximate correlation power calculation (arctangh transformation) 
## 
##               n = 781.7516
##               r = 0.1
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Power for a bivariate relationship between latent variables

The whole procedure becomes more complex if we try to simulate the power for a relationship between latent variables. Here, we need to make assumptions about the items’ reliability. In this first example, we assume that we have comparatively good scales that have item reliability around .70. Defining the factor matrix is hence more complex and the model is now a structural equation model with latent variables. Otherwise, the procedure remains the same.

# define function
model_sem <- function(simNum, 
                      n, 
                      b1, 
                      x_ind1, x_ind2, x_ind3, 
                      y_ind1, y_ind2, y_ind3) {
  
  # compute factor matrix (in this case with measurement error)
  factor <- matrix(c(x_ind1, 0,   # y1
                     x_ind2, 0,   # y2
                     x_ind3, 0,   # y3
                     0, y_ind1,   # y1
                     0, y_ind2,   # y2
                     0, y_ind3),  # y3
                   nrow=6, ncol=2, 
                   byrow=TRUE, 
                   dimnames=list(c('x1', 'x2', 'x3', 'y1', 'y2', 'y3'),  # rows (observed)
                                 c('x', 'y')))                           # columns (latent)
  
  # matrix of the effects structure (variance-covariance)
  y_res <- sqrt(1 - b1^2)
  effect <- matrix(c(1, b1,     # x
                     0, y_res), # y
                    nrow=2, ncol=2, 
                    byrow=TRUE, 
                    dimnames=list(c('x', 'y'),    # rows
                                  c('x', 'y')))   # columns
  
  # generate data based on both matrices
  data <- paramtest::gen_data(factor, effect, n_cases = n)

  # fit lavaan model
  model <- '
   # measurement model
   x =~ x1 + x2 + x3
   y =~ y1 + y2 + y3

   # structural regression model
   y ~ b1*x
  '
  fit <- lavaan::sem(model, data = data)
  ests <- lavaan::parameterEstimates(fit)
  
  # extract relevant parameters
  est <- ests[ests$label == 'b1', 'est']
  p <- ests[ests$label == 'b1', 'pvalue']
  return(c(est = est, 
           p = p, 
           sig = (p < .05)))
}

# Measurement error specified by factor loading = .7
power_sem1 <- grid_search(
  func = model_sem, 
  params = list(n = seq(500, 4000, by = 100)),
  b1 = .1,
  n.iter = 1000, 
  output = 'data.frame',
  x_ind1 = .7,    
  x_ind2 = .7, 
  x_ind3 = .7,      
  y_ind1 = .7, 
  y_ind2 = .7, 
  y_ind3 = .7, 
  parallel='snow', 
  ncpus=4
  )

## Running 36,000 tests...

Let us quickly check the resulting data frame.

results(power_sem1) %>% tbl_df

## # A tibble: 36,000 x 5
##     iter n.test     est        p   sig
##    <int>  <dbl>   <dbl>    <dbl> <dbl>
##  1     1    500 0.0856  0.136        0
##  2     2    500 0.150   0.00839      1
##  3     3    500 0.00381 0.946        0
##  4     4    500 0.0483  0.383        0
##  5     5    500 0.186   0.00151      1
##  6     6    500 0.0677  0.154        0
##  7     7    500 0.0792  0.241        0
##  8     8    500 0.119   0.0264       1
##  9     9    500 0.231   0.000533     1
## 10    10    500 0.173   0.00175      1
## # … with 35,990 more rows

This time, we only extracted the estimate, the p-value, and already transformed the p-value into a binary variable (p > .05).

But before we estimate the power, let us quickly run the simulation again, but change the reliability of the items to more realistic values (at least in the social sciences, scales often have items with factor loadings around .50). This is very easy, as the defined function has arguments for each factor loading in the model.

We then combine the results from both simulations. After a bit of data wrangling, we can now plot both power curves.

# Let's also test a more realistic example (factor loadings = .5)
power_sem2 <- grid_search(
  func = model_sem, 
  params = list(n = seq(500, 4000, by = 100)),
  b1 = .1,
  n.iter = 1000, 
  output = 'data.frame',
  x_ind1 = .5,    # changed to .5 
  x_ind2 = .5, 
  x_ind3 = .5,      
  y_ind1 = .5, 
  y_ind2 = .5, 
  y_ind3 = .5, 
  parallel='snow', 
  ncpus=4
  )

## Running 36,000 tests...

# Extract result table
(result_sem1 <- results(power_sem1) %>%
    mutate(rel = .7) %>%
    bind_rows(results(power_sem2) %>%
                mutate(rel = .5)) %>%
  group_by(rel, n.test) %>%
  summarise(power=mean(sig, na.rm=TRUE)))

## # A tibble: 72 x 3
## # Groups:   rel [2]
##      rel n.test power
##    <dbl>  <dbl> <dbl>
##  1   0.5    500 0.155
##  2   0.5    600 0.211
##  3   0.5    700 0.247
##  4   0.5    800 0.266
##  5   0.5    900 0.315
##  6   0.5   1000 0.365
##  7   0.5   1100 0.376
##  8   0.5   1200 0.379
##  9   0.5   1300 0.427
## 10   0.5   1400 0.442
## # … with 62 more rows

# plot power curve
(p2 <- result_sem1 %>%  
  ggplot(., aes(x = n.test, 
                y = power,
                color = factor(rel))) +
  geom_point() +
  geom_smooth(se = F) +
  geom_hline(yintercept = .80, linetype = "dashed") +
  geom_hline(yintercept = .95, linetype = "dashed") +
  theme_bw() +
  labs (x = "sample size", 
        y = "power",
        color = "reliability\nof items"))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

We can see two very interesting things in this plot: First, a small correlation between two latent variables requires a LOT more participants than the same correlation between manifest variables (of course: There is no measurement error in the model). But the difference is considerable depending on the items’ reliability. Whereas factor loadings >= .70 require already more participants to reach a power of 80% (roughly 1,500 compared to 800 in the first example), reducing factor loadings to .50 (the more realistic scenario) requires about 3,000 (!!!) participants to reach a power of 80%.

Power for a simple mediation model with latent variables

At last, let us consider a simple mediation model with latent variables. Again, the procedure remains the same, only the factor and effect matrix needs to be defined correctly (I again assumed good scales with high item reliabilities: .70). Likewise, the SEM needs to be specified correctly and the parameters of interest need to be extracted. Interestingly, we can also compute the power to detect the indirect or total effect even though we may only have assumptions about the direct paths in the model!

model_med <- function(simNum, 
                      n, 
                      b1, 
                      b2,
                      b3,
                      x_ind1, x_ind2, x_ind3,
                      m_ind1, m_ind2, 
                      y_ind1, y_ind2, y_ind3) {
  
  # compute factor matrix (in this case with measurement error)
  factor <- matrix(c(x_ind1, 0, 0,   # x1
                     x_ind2, 0, 0,   # x2
                     x_ind3, 0, 0,   # x3
                     0, m_ind1, 0,   # m1
                     0, m_ind2, 0,   # m2
                     0, 0, y_ind1,   # y1
                     0, 0, y_ind2,   # y2
                     0, 0, y_ind3),  # y3
                   nrow=8, ncol=3, 
                   byrow=TRUE, 
                   dimnames=list(c('x1', 'x2', 'x3', 'm1', 'm2', 'y1', 'y2', 'y3'),  # rows (observed)
                                 c('x', 'm', 'y')))                           # columns (latent)
  
  # matrix of the effects structure (variance-covariance)
  m_res <- sqrt(1 - b1^2)
  y_res <- sqrt(1 - b2^2 - b3^2)
  effect <- matrix(c(1, b1, b2,     # x
                     0, m_res, b3,
                     0, 0, y_res), # y
                    nrow=3, ncol=3, 
                    byrow=TRUE, 
                    dimnames=list(c('x', 'm', 'y'),    # rows
                                  c('x', 'm', 'y')))   # columns
  
  # generate data based on both matrices
  data <- paramtest::gen_data(factor, effect, n_cases = n)

  # fit lavaan model
  model <- '
   # measurement model
   x =~ x1 + x2 + x3
   y =~ y1 + y2 + y3
   m =~ m1 + m2

   # structural regression model
   m ~ b1*x
   y ~ b2*x + b3*m

   ind := b1*b3
   total := b2 + ind
  '
  fit <- lavaan::sem(model, data = data)
  ests <- lavaan::parameterEstimates(fit)
  
  # extract relevant parameters
  b1_est     <- ests[ests$label == 'b1', 'est']
  b1_p       <- ests[ests$label == 'b1', 'pvalue']
  b2_est     <- ests[ests$label == 'b2', 'est']
  b2_p       <- ests[ests$label == 'b2', 'pvalue']
  b3_est     <- ests[ests$label == 'b3', 'est']
  b3_p       <- ests[ests$label == 'b3', 'pvalue']
  ind_est    <- ests[ests$label == 'ind', 'est']
  ind_p      <- ests[ests$label == 'ind', 'pvalue']
  total_est  <- ests[ests$label == 'total', 'est']
  total_p    <- ests[ests$label == 'total', 'pvalue']

  return(c(b1_est   = b1_est,    b1_p    = b1_p,
           b2_est   = b2_est,    b2_p    = b2_p,
           b3_est   = b3_est,    b3_p    = b3_p,
           ind_est  = ind_est,   ind_p   = ind_p,
           total_est= total_est, total_p = total_p))
}

power_med <- grid_search(
  func = model_med, 
  params = list(n = seq(700, 4500, by = 100)),
  b1 = .1,  # assuming a small effect for all three paths       
  b2 = .1,
  b3 = .1,
  n.iter = 1000, 
  output = 'data.frame',
  x_ind1 = .7,    
  x_ind2 = .7, 
  x_ind3 = .7, 
  m_ind1 = .7,    
  m_ind2 = .7,
  y_ind1 = .7, 
  y_ind2 = .7, 
  y_ind3 = .7, 
  parallel='snow', 
  ncpus=4
  )

## Running 39,000 tests...

# Extract result table
(result_med <- results(power_med) %>%
  group_by(n.test) %>%
    mutate(b1_sig = (b1_p < .05),
           b2_sig = (b2_p < .05),
           b3_sig = (b3_p < .05),
           ind_sig = (ind_p < .05),
           total_sig = (total_p < .05)) %>%
  summarise(b1_power = mean(b1_sig, na.rm=TRUE),
            b2_power = mean(b2_sig, na.rm=TRUE),
            b3_power = mean(b3_sig, na.rm=TRUE),
            ind_power = mean(ind_sig, na.rm=TRUE),
            total_power = mean(total_sig, na.rm=TRUE)))

## # A tibble: 39 x 6
##    n.test b1_power b2_power b3_power ind_power total_power
##     <dbl>    <dbl>    <dbl>    <dbl>     <dbl>       <dbl>
##  1    700    0.370    0.404    0.382    0.0504       0.491
##  2    800    0.432    0.493    0.438    0.0802       0.555
##  3    900    0.507    0.540    0.484    0.107        0.619
##  4   1000    0.522    0.552    0.546    0.130        0.654
##  5   1100    0.587    0.588    0.549    0.159        0.678
##  6   1200    0.621    0.654    0.604    0.214        0.743
##  7   1300    0.665    0.670    0.622    0.253        0.761
##  8   1400    0.710    0.687    0.676    0.310        0.785
##  9   1500    0.717    0.728    0.713    0.336        0.810
## 10   1600    0.745    0.754    0.728    0.372        0.845
## # … with 29 more rows

# plot power curve
(p3 <- result_med %>%
  gather(key, value, -n.test) %>%
  separate(key, c("effect", "power")) %>%
  ggplot(aes(x = n.test, 
             y = value,
             color = effect)) +
  geom_point() +
  geom_smooth(se = F) +
  geom_hline(yintercept = .80, linetype = "dashed") +
  geom_hline(yintercept = .95, linetype = "dashed") +
  theme_bw() +
  labs(x = "sample size", y = "power",
       color = "type of effect"))

## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

The result shows (as expected) that we need the same sample size (~1,800-2,000) to test the three direct paths in the model (as we assumed the same effect size for all three relationships). However, we need considerably more participants to test the indirect effect with the same power! Here, we would need roughly between 2,800 – 3,000 participants!

Conclusion

Simulating power for complex models is always going to be challenging due to the large number of parameters that one needs to make assumptions about. That said, the paramtest package provides a framework for doing so. In the examples above, I only varied the sample sizes. But I could have varied the effect sizes (or any other parameter in the simulation), too. This would allow analyzing the interaction between sample size and effect size on power to test such effects – from my point of view, this should be an important step before any study that aims at implementing structural equation modeling.

sessionInfo()

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Mojave 10.14.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] pwr_1.2-2       paramtest_0.1.0 forcats_0.4.0   stringr_1.4.0  
##  [5] dplyr_0.8.3     purrr_0.3.2     readr_1.3.1     tidyr_1.0.0    
##  [9] tibble_2.1.3    ggplot2_3.2.1   tidyverse_1.2.1 knitr_1.28     
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.5 xfun_0.12        haven_2.1.1      lattice_0.20-38 
##  [5] colorspace_1.4-1 vctrs_0.2.0      generics_0.0.2   htmltools_0.3.6 
##  [9] yaml_2.2.0       utf8_1.1.4       rlang_0.4.0      pillar_1.4.2    
## [13] glue_1.3.1       withr_2.1.2      modelr_0.1.5     readxl_1.3.1    
## [17] lifecycle_0.1.0  munsell_0.5.0    gtable_0.3.0     cellranger_1.1.0
## [21] rvest_0.3.4      codetools_0.2-16 evaluate_0.14    labeling_0.3    
## [25] parallel_3.6.1   fansi_0.4.0      broom_0.5.2      Rcpp_1.0.2      
## [29] scales_1.0.0     backports_1.1.5  jsonlite_1.6     hms_0.5.1       
## [33] digest_0.6.23    stringi_1.4.3    grid_3.6.1       cli_1.1.0       
## [37] tools_3.6.1      magrittr_1.5     lazyeval_0.2.2   crayon_1.3.4    
## [41] pkgconfig_2.0.3  zeallot_0.1.0    ellipsis_0.3.0   xml2_1.2.2      
## [45] lubridate_1.7.4  assertthat_0.2.1 rmarkdown_2.1    httr_1.4.1      
## [49] rstudioapi_0.10  R6_2.4.0         nlme_3.1-140     compiler_3.6.1

References

Hugh, J. (2017). paramtest: R package for varying parameters in simulations or other iterative processes. https://github.com/jeff-hughes/paramtest

Kline, R. (2015). Principles and Practice of Structural Equation Modeling. The Guilford Press.