

Goodness of fit in structural equation models

Tucker and Lewis (1973) [371] Using an ANOVA analogy, defines a GOF index (`reliability coefficient') for factor analysis models [generalises immediately to other covariance structure models], given by $(D_{0}/df_{0}-D_{H}/df_{H})/(D_{0}/df_{0}-1)$.
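
As a minimal sketch of the index in this entry, the following Python function evaluates it from the discrepancy statistics and degrees of freedom of the null model and of the model of interest. The function name, argument names and numerical values are illustrative assumptions, not taken from Tucker and Lewis (1973).

```python
def tucker_lewis_index(d0, df0, dh, dfh):
    """Return (D0/df0 - DH/dfH) / (D0/df0 - 1).

    d0, df0: discrepancy statistic and degrees of freedom of the null model.
    dh, dfh: the same quantities for the model of interest H.
    """
    ratio0 = d0 / df0
    ratioh = dh / dfh
    return (ratio0 - ratioh) / (ratio0 - 1.0)


# Made-up values, for illustration only.
tli = tucker_lewis_index(d0=900.0, df0=45, dh=70.0, dfh=35)
print(round(tli, 4))
```

The index equals 1 when $D_{H}/df_{H}=1$, i.e. when the discrepancy of model H equals its degrees of freedom.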

Bentler and Bonett (1980) [31] An expository article on goodness of fit and model choice for covariance structure models for psychologists. Discussion of GOF tests ($\chi^{2}$), both overall and for nested comparisons. Proposes `incremental fit indices' based on the discrepancy function used for estimation (i.e. essentially $R^{2}_{D}$; deviance gives $R^{2}_{L}$). Discussion of sample sizes, practical and statistical significance, choice of the null model, de-emphasising significance tests.

Steiger and Lind (1980) [350] Propose an interval estimation approach for (a function of) the noncentrality parameter.

Saxena and Alam (1982) [329] Estimation of the noncentrality parameter of a noncentral $\chi^{2}$.

Hoelter (1983) [188] Proposes a statistic which estimates the smallest N such that the lack of fit of the current model would be judged statistically significant.
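
A rough sketch of how such a `critical N' can be computed. The entry does not give a formula, so the form below is an assumption: the test statistic is taken to be $(N-1)$ times the minimised discrepancy, and the $\chi^{2}$ critical value is replaced by the normal approximation $\frac{1}{2}(z+\sqrt{2\,df-1})^{2}$. The function and argument names are hypothetical.

```python
import math
from statistics import NormalDist


def hoelter_critical_n(chi2, df, n, alpha=0.05):
    """Approximate sample size at which the observed lack of fit would
    just reach significance at level alpha (assumed form, see text)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Normal approximation to the upper-alpha chi-square critical value.
    chi2_crit = 0.5 * (z + math.sqrt(2.0 * df - 1.0)) ** 2
    # Minimised discrepancy, assuming chi2 = (n - 1) * F.
    f_min = chi2 / (n - 1.0)
    return chi2_crit / f_min + 1.0


# Made-up values, for illustration only.
critical_n = hoelter_critical_n(chi2=70.0, df=35, n=500)
print(round(critical_n))
```

A value below the actual sample size indicates that the lack of fit is already significant at the chosen level, under the approximation used.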

Jöreskog and Sörbom (1984) [204] In the LISREL manual, propose several indices of fit for covariance structure models (see also Tanaka and Huba (1985) [366]).

Steiger et al. (1985) [352] Joint asymptotic distribution of nested $\chi^{2}$ goodness-of-fit tests and their differences. Models are not assumed to be true (as long as they are not too far from the truth), leading to a noncentral $\chi^{2}$ distribution for each. Gives the correlations of the GOF statistics; the differences are asymptotically independent. Simulation suggests that the formulas work well in reasonably large samples.

Satorra and Saris (1985) [327] Estimating power of a likelihood ratio test for covariance structure analysis. Shows that under local alternatives the noncentrality parameter of the relevant $\chi^{2}$ distribution can be well approximated by $N$ times the minimum discrepancy from the alternative model (a special case of a more general result, which is essentially the same as that of [352]).

Tanaka and Huba (1985) [366] Present a GOF index which generalises and explains those of Jöreskog and Sörbom (1984) [204]. Essentially a generalised $R^{2}$ for a GLS estimate of the covariance matrix.

Sobel and Bohrnstedt (1985) [346] Discuss the choice of null (baseline) models in Bentler and Bonett's (1980) [31] and similar $R^{2}$-type indices. Argue that the choice should depend on the current state of theory and how exploratory / confirmatory the modelling is. [An uncontroversial point rather too extensively and vigorously argued.]

Bollen (1986) [42] Proposes the index $(D_{0}/df_{0}-D_{H}/df_{H})/(D_{0}/df_{0})$.
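
A minimal sketch of this index in Python, with hypothetical function and argument names and made-up input values. It differs from the index of Tucker and Lewis (1973) [371] only in that the denominator is $D_{0}/df_{0}$ rather than $D_{0}/df_{0}-1$.

```python
def bollen_1986_index(d0, df0, dh, dfh):
    """Return (D0/df0 - DH/dfH) / (D0/df0)."""
    ratio0 = d0 / df0
    ratioh = dh / dfh
    return (ratio0 - ratioh) / ratio0


# Made-up values, for illustration only.
rho1 = bollen_1986_index(d0=900.0, df0=45, dh=70.0, dfh=35)
print(round(rho1, 4))
```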

Tanaka (1987) [364] Discusses sample size in covariance structure modelling, concentrating on problems with small samples. Discusses indices of fit.

Saris et al. (1987) [325] Detection and identification of specification errors in structural equation models. Shows that both the power of the $\chi^{2}$ test and the `modification index' are different for misspecification in different parts of the model. Suggests a statistic which measures changes in each of the parameters when restrictions on them are lifted.

Marsh et al. (1988) [259] Consider the effect of sample size on GOF indices in confirmatory factor analysis. Review of existing indices. A large simulation with both simple and complex true models. Concludes that `type ii' incremental fit indices are least sensitive to sample size.

Satorra (1989) [326] A review of asymptotic properties of test statistics ($L^{2}$, score and Wald) when (i) neither null nor alternative model needs to be true, (ii) model may be overparametrized and (iii) the discrepancy function may not be asymptotically optimal. Emphasis on covariance structure models, but applied more generally.

Mulaik et al. (1989) [280] A very wordy discussion of GOF indices (here meaning ones normalised to 0-1) for structural equation models. A discussion of the concept and philosophy of parsimony and adjustments motivated by it.

Bollen (1989) [43] Proposes an adjustment to the incremental fit index of [31] where the denominator is $F_{0}-E(F_{H}\,|\,\mbox{Model H holds})$ instead of $F_{0}$. Argues that the mean of this is fairly insensitive to $N$.
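
A small sketch contrasting the two denominators. It rests on assumptions not stated in the entry: that the index of [31] has the form $(F_{0}-F_{H})/F_{0}$, that each test statistic is $(N-1)F$, and that $E(F_{H}\,|\,\mbox{Model H holds})$ is approximated by $df_{H}/(N-1)$, so that the factor $N-1$ cancels. Function names and values are hypothetical.

```python
def normed_fit_index(chi2_0, chi2_h):
    """(F0 - FH) / F0, written in terms of the test statistics."""
    return (chi2_0 - chi2_h) / chi2_0


def adjusted_fit_index(chi2_0, chi2_h, df_h):
    """(F0 - FH) / (F0 - E(FH | H holds)), with the expectation
    approximated by df_h / (N - 1); the factor (N - 1) cancels."""
    return (chi2_0 - chi2_h) / (chi2_0 - df_h)


# Made-up values, for illustration only.
nfi = normed_fit_index(chi2_0=900.0, chi2_h=70.0)
ifi = adjusted_fit_index(chi2_0=900.0, chi2_h=70.0, df_h=35)
print(round(nfi, 4), round(ifi, 4))
```

Under these assumptions the adjusted index is 1 when the statistic for model H equals its degrees of freedom.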

McDonald (1989) [266] Proposes the index $\exp[-(1/2)d]$ where $d$ estimates the noncentrality parameter.
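
A minimal sketch of this index. The entry does not say how $d$ is estimated; the code assumes $d=(\chi^{2}-df)/N$, a commonly used estimate of the rescaled noncentrality parameter, and this choice should be checked against the original paper. Names and values are hypothetical.

```python
import math


def mcdonald_index(chi2, df, n):
    """Return exp(-d / 2) with d = (chi2 - df) / n (assumed estimator)."""
    d = (chi2 - df) / n
    return math.exp(-0.5 * d)


# Made-up values, for illustration only.
mc = mcdonald_index(chi2=70.0, df=35, n=500)
print(round(mc, 4))
```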

Bollen (1990) [45] Effect of $N$ on goodness-of-fit indices. Distinguishes two kinds of effects: (1) whether the (formula of the) index is a function of $N$, (2) whether the mean of the sampling distribution of the index depends on $N$. Fairly generally, measures normalised to 0-1 have (2) and those that are not (ones with degrees of freedom adjustments) have (1).

Bentler (1990) [29] Considers comparative fit indices (essentially $R^{2}_{D}$ and generalisations of it). Argues that these have been used solely as descriptive statistics. Defines a population quantity -- a similar ratio for noncentrality parameters -- and discusses the indices as estimators of it. Simulations.

McDonald and Marsh (1990) [267] Considers various model selection indices (including AIC) and what population parameters (involving the noncentrality parameter) they estimate. Dependence on sample size or lack of it. Very similar to Bentler (1990) [29].

Kaplan (1990) [206] Covariance structure modelling. What to do when a GOF test is significant, especially in large samples. Take into account nonnormality and missing data, using appropriate estimation methods. Attempt to separate lack of fit from effect of sample size on $P$-value using statistics proposed in the covariance structure modelling literature. Discussion ([249], [30], [351], [44], [365], [183]) and rejoinder [207].

Breckler (1990) [55] Discusses the use of covariance structure modelling in psychology, especially personality and social psychology. Concerned with assessment of GOF, choice of one model when others fit equally well, and lack of cross-validation. Survey and critical assessment of applications in the literature.

Cudeck and Henly (1991) [103] Discusses the large-sample ``problem'', i.e. that different models are preferred in samples of different sizes. Argues that this is to some extent inevitable when the true model is not one of the candidates. Different discrepancies between estimates, true model etc.; estimates (cross-validation, noncentrality parameter) for these. Comments on models and modelling in general.

MacCallum and Tucker (1991) [250] Considers a representation of the common-factor model which incorporates an (additive) term for the lack of fit of the true model. Discussion of this model error and different types of sampling error in estimates. Notes that when the c-f model does not hold, different estimation methods for it estimate different population parameters.

Kaplan (1991) [208] Discusses the relationship between methods for specification error searches (`modification index') and measures of predictive validity (AIC). Basically the MI is a score test statistic, asymptotically equivalent to an LR test statistic (i.e. deviance), hence predictive validity improves if the MI is greater than 2. Not terribly exciting.
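
The threshold of 2 can be sketched numerically. The code assumes that AIC is defined as the deviance plus twice the number of free parameters, and that the MI approximates the drop in deviance when one restriction is lifted; the function name is hypothetical.

```python
def approximate_aic_change(modification_index, freed_parameters=1):
    """Approximate AIC(relaxed model) - AIC(restricted model).

    Assumes AIC = deviance + 2 * (number of free parameters) and that the
    modification index approximates the drop in deviance.
    """
    return -modification_index + 2.0 * freed_parameters


# Negative values mean that the relaxed model has the smaller AIC.
for mi in (1.0, 2.0, 5.0):
    print(mi, approximate_aic_change(mi))
```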

Browne and Du Toit (1992) [65] Describe algorithms for fitting nonstandard mean and covariance structure models. For model assessment, use the interval estimation (for the noncentrality parameter) of Steiger and Lind (1980) [350].

Bollen and Long (1992) [47] Introduction to a special issue of SMR on model assessment for structural equation models. Also the same as the introduction to Bollen and Long (1993) [48]. The issue contains articles [160], [248], [49], [66] and [32].

Gerbing and Anderson (1992) [160] A nice review of existing indices of fit and Monte Carlo studies of them.

Bollen and Stine (1992) [49] Bootstrap to assess significance of assessments of goodness-of-fit in structural equation models using likelihood ratio tests and fit indices. Identifies a problem which means that the standard application of the bootstrap may not work, and suggests a modification.

Browne and Cudeck (1992) [66] Assessment of model fit, with modelling of covariance matrices (esp. factor analysis) as an example. Two types of error: between true model and best model (with respect to a discrepancy function) in a given model class (approximation error); between best model and sample estimate of it (estimation error); together comprise overall error. Approximation error ($F_{0}$) essentially noncentrality parameter of a $\chi^{2}$; point and interval estimates, tests of `close fit', i.e. $F_{0}$ small (e.g. $\le 0.05$), both assuming that the discrepancy function is correct for the class containing the true model. For fit-parsimony tradeoff, use RMSE of approximation $\sqrt{F_{0}/df}$. A second approach: cross-validation indices with and without a validation sample. Nice general remarks on model choice (e.g. interpretability vs. fit).
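
A minimal sketch of a point estimate of the RMSE of approximation $\sqrt{F_{0}/df}$. The entry does not give the estimator of $F_{0}$; the code assumes $\hat{F}_{0}=\max\{(\chi^{2}-df)/(N-1),\,0\}$, i.e. a test statistic of the form $(N-1)\hat{F}$ with the estimate truncated at zero. Names and values are hypothetical.

```python
import math


def rmsea_point_estimate(chi2, df, n):
    """Return sqrt(F0_hat / df), with the assumed estimator
    F0_hat = max((chi2 - df) / (n - 1), 0)."""
    f0_hat = max((chi2 - df) / (n - 1.0), 0.0)
    return math.sqrt(f0_hat / df)


# Made-up values, for illustration only.
rmsea = rmsea_point_estimate(chi2=70.0, df=35, n=501)
print(round(rmsea, 4))
```

Dividing by the degrees of freedom is what gives the fit-parsimony tradeoff: for the same estimated approximation error, a model with more degrees of freedom gets a smaller value.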

Bentler and Chou (1992) [32] Considers tests and statistics for characterizing the effect of `model modification', i.e. relaxing some of the restrictions of the current model. Essentially based on the score statistic and one step of the Newton-Raphson algorithm starting from the restricted values.

Bollen and Long (1993) [48] A book-length collection of papers on model assessment for structural equation models. Of the articles, [160], [248], [49], [66] and [32] appear also in the special issue of SMR introduced by Bollen and Long (1992) [47], ??? only in the book.


Jouni Kuha 2003-07-16