Confirmed Speakers
A small sample solution for SEM: ML estimation with bounds – Julie De Jonckere
If the sample size is (very) small, ML estimation in structural equation modeling (SEM) often fails even for relatively simple models. Either the optimizer used by the SEM software cannot find a solution at all, or the solution it returns contains nonsensical (often extreme) values for several parameters in the model. A partial solution that has been proposed (in the frequentist framework) is to break the model down into smaller pieces and use a stepwise approach to estimate the model parameters. In this presentation, we propose a different strategy (that can be combined with the stepwise approach): using ML estimation with simple bounds on the parameters.
First, we need to compute the theoretical lower and upper bounds of all the parameters in the model, conditional on the data. For example, for the residual variance of an indicator, the natural lower bound is zero, and the natural upper bound equals the observed variance of this indicator. For factor loadings, the formulas for the bounds depend on how we have set the metric of the latent variables. If we have fixed the variances of the latent variables to unity, the formulas are simple. However, if we have set the metric of the latent variables by fixing the factor loading of a marker indicator to unity, we must specify a minimum value (say, 0.1) for the reliability of the marker indicator. Once we have determined the natural bounds for all the parameters, we can optionally widen a subset of the bounds (downwards, upwards, or both) by about 10-30%. This allows for (mildly) negative residual variances for indicators (Heywood cases), while keeping the variances of the latent variables non-negative.
In this presentation, we will report the results of a simulation study in which we compared several settings for choosing these bounds. For each setting, we will report the convergence rate, and the bias and MSE of the estimated parameter values. Sample sizes range from 10 to 100.
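As an illustration of the residual-variance bounds described above, here is a minimal Python sketch. It is not the authors' implementation: the function name `residual_variance_bounds`, the `widen_lower` and `widen_upper` arguments, and the choice to express the widening as a fraction of the width of each natural interval are assumptions made for this example, as is the use of the N-1 divisor for the observed variances.

```python
import numpy as np

def residual_variance_bounds(data, widen_lower=0.0, widen_upper=0.0):
    """Natural bounds for indicator residual variances, optionally widened.

    Hypothetical helper: the widening is taken as a fraction of the
    width of the natural interval [0, observed variance].
    """
    data = np.asarray(data, dtype=float)
    # Observed variance of each indicator (one column per indicator).
    # Assumption: divisor N-1; some SEM software uses N under ML.
    observed_var = data.var(axis=0, ddof=1)
    lower = np.zeros_like(observed_var)   # natural lower bound: zero
    upper = observed_var.copy()           # natural upper bound: observed variance
    width = upper - lower
    # Widen downwards and/or upwards by the requested fractions.
    return lower - widen_lower * width, upper + widen_upper * width
```

For example, `residual_variance_bounds(data, widen_lower=0.2)` would permit residual variances down to minus 20% of each indicator's observed variance, which allows mild Heywood cases while leaving the upper bounds at their natural values.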
The simulation study is based on a simple model: two latent variables with three indicators each, and a single regression linking the two latent variables. The same model was used in a presentation given by the second author last year during the first S4 conference, and the model was also used in a study reported in chapter 17 of the (first) S4 book.
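To make the structure of this model concrete, the following Python sketch generates data from a model of the same shape: two latent variables, three indicators each, and one regression between the latent variables. All numerical values (loadings, regression coefficient, residual variance) and the function name `simulate_two_factor_data` are hypothetical placeholders chosen for this example; they are not the population values used in the simulation study.

```python
import numpy as np

def simulate_two_factor_data(n, beta=0.25, loadings=(1.0, 0.8, 0.6),
                             resid_var=0.5, seed=0):
    """Draw n observations from a two-factor model with one regression.

    All parameter values are illustrative assumptions, not the
    values used in the study.
    """
    rng = np.random.default_rng(seed)
    lam = np.asarray(loadings, dtype=float)
    # First latent variable, variance 1.
    f1 = rng.normal(size=n)
    # Second latent variable regressed on the first (disturbance variance 1).
    f2 = beta * f1 + rng.normal(size=n)
    # Residuals for the six indicators.
    e = rng.normal(scale=np.sqrt(resid_var), size=(n, 6))
    x = np.outer(f1, lam) + e[:, :3]   # three indicators of the first factor
    y = np.outer(f2, lam) + e[:, 3:]   # three indicators of the second factor
    return np.hstack([x, y])           # shape (n, 6)
```

Calling `simulate_two_factor_data(10)` gives a 10 x 6 data matrix, the smallest sample size considered in the abstract.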
Julie De Jonckere
Ghent University