Next: Bayesian methods Up: Model Assessment and Model Previous: Comparisons: Penalised criteria   Contents


Significance Testing, Goodness-of-fit Tests

Cox (1961) [87] Considers tests for separate (non-nested) families of hypotheses, concentrating on the `likelihood ratio' and its distribution.

Cox (1962) [88] Further results on the approach suggested in Cox (1961) [87].

Anderson (1962) [21] Considers choice of degree for polynomial regression. Proposes a sequential testing procedure, starting with testing the highest-order polynomial term.
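
A sequential procedure of this kind can be sketched as follows. This is a minimal illustration, not Anderson's exact formulation: it assumes the F statistic for the degree-k term uses the residual sum of squares from the degree-k fit, and the names `select_degree` and `critical_value` are hypothetical. Because the Python standard library has no F quantile function, the caller supplies the critical value as a function of the residual degrees of freedom.

```python
import math

def select_degree(x, y, max_degree, critical_value):
    """Test polynomial terms from the highest order downwards.

    Returns the first degree k whose F statistic exceeds
    critical_value(residual_df), or 0 if no term is significant.
    """
    n = len(x)
    # Orthonormalise the columns 1, x, ..., x^max_degree (Gram-Schmidt),
    # so that each coefficient can be tested separately.
    basis = []
    for k in range(max_degree + 1):
        col = [xi ** k for xi in x]
        for q in basis:
            proj = sum(c * qi for c, qi in zip(col, q))
            col = [c - proj * qi for c, qi in zip(col, q)]
        norm = math.sqrt(sum(c * c for c in col))
        basis.append([c / norm for c in col])
    # Coefficients of y on the orthonormal basis.
    coef = [sum(yi * qi for yi, qi in zip(y, q)) for q in basis]
    total_ss = sum(yi * yi for yi in y)
    for k in range(max_degree, 0, -1):
        # Residual sum of squares of the degree-k fit (assumption: the
        # error variance is estimated from this fit).
        rss = total_ss - sum(c * c for c in coef[: k + 1])
        df = n - k - 1
        f_stat = coef[k] ** 2 / (rss / df)
        if f_stat > critical_value(df):
            return k
    return 0

# Usage: quadratic signal plus a small fixed perturbation.
noise = [0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1]
x = list(range(10))
y = [1 + 2 * xi + 0.5 * xi ** 2 + e for xi, e in zip(x, noise)]
# A constant cut-off of 6.0 is used for simplicity; the 5% points of
# F(1, 6) and F(1, 7) are roughly 5.99 and 5.59.
degree = select_degree(x, y, 3, lambda df: 6.0)
```

The procedure stops at the first significant term, so lower-order terms are never tested once a higher-order term has been retained.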

Guttman (1967) [171] Proposes a goodness-of-fit testing procedure in which (1) a Bayesian predictive distribution for the data is obtained (under noninformative priors), (2) the predicted distribution over a partition of the range of the response is computed, and (3) observed and predicted counts are compared using a chi-squared goodness-of-fit test. Also recommends choosing the model (the example is polynomial regression) for which this chi-squared statistic is smallest.
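
Step (3) and the selection rule can be illustrated with a short sketch. It assumes that the predicted counts for each cell of the partition have already been obtained from the predictive distribution; the function names and the example counts are hypothetical.

```python
def pearson_chi2(observed, predicted):
    """Pearson chi-squared statistic comparing observed and predicted counts."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, predicted))

def pick_smallest_chi2(observed, predicted_by_model):
    """Return the name of the model with the smallest statistic, and all scores."""
    scores = {name: pearson_chi2(observed, pred)
              for name, pred in predicted_by_model.items()}
    return min(scores, key=scores.get), scores

# Usage with made-up counts over a four-cell partition.
observed = [18, 22, 30, 30]
candidates = {
    "linear": [25, 25, 25, 25],
    "quadratic": [20, 20, 30, 30],
}
best, scores = pick_smallest_chi2(observed, candidates)
```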

Ramsey (1974) [311] Model selection between a set of (not necessarily nested) models, with the aid of significance tests (in particular, `specification error tests' for various alternatives to the basic, e.g. linear, model).

Cox (1977) [90] General review of the nature and role of significance tests, including interpretation of significance levels; choice of test statistics; types of null hypothesis (plausible / dividing; simple primary / simple secondary structure; embedded / not); ways of deriving tests; diagnostic ability; modification of analysis in the light of data; other points, including comparison of models.

Aitkin (1980) [3] Model selection for log-linear models using a simultaneous testing procedure with Bonferroni-type adjusted critical levels.
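
As a generic illustration of a Bonferroni-type adjustment (not Aitkin's specific procedure, whose critical levels are not reproduced here), each of m tests can be carried out at level alpha / m so that the overall error rate is at most alpha. The function name is hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Flag the hypotheses rejected at per-test level alpha / m."""
    m = len(p_values)
    threshold = alpha / m
    return [p < threshold for p in p_values]

# Usage: three tests at overall level 0.05, so each is tested at about 0.0167.
decisions = bonferroni_reject([0.001, 0.02, 0.04], alpha=0.05)
```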

Cox (1982) [91] Discussion of significance tests [to a different audience and slightly more informal than Cox (1977) [90]]. One / two-sided; types of null hypothesis; interpretation of results; multiple testing; remarks on the Bayesian approach.

Rubin (1984) [321] Use of frequency calculations in applied Bayesian analysis: (1) as an aid to interpreting results of Bayesian analysis; (2) assessing operating characteristics of Bayesian procedures; (3) monitoring the adequacy of models under consideration. Number (3) involves generating samples from the posterior distribution of some statistic T(D), and assessing whether T(D_obs) for the observed data D_obs is unusual for that distribution (i.e. in a way analogous to the motivation of P-values). (This could also be in the Bayesian section of this bibliography, but it does not deal with Bayes factors or posterior model probabilities.)
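
The idea in (3) can be sketched with a toy model. The sketch assumes normal data with known unit variance and a flat prior on the mean, so that the posterior of the mean is N(ybar, 1/n); replicated data sets are drawn from the resulting predictive distribution and the statistic is recomputed on each. The model, the function name and the data are illustrative assumptions, not taken from Rubin's paper.

```python
import random

def posterior_predictive_pvalue(data, statistic, n_draws=2000, seed=0):
    """Share of replicated data sets whose statistic is at least T(D_obs)."""
    rng = random.Random(seed)
    n = len(data)
    ybar = sum(data) / n
    t_obs = statistic(data)
    exceed = 0
    for _ in range(n_draws):
        # Draw the mean from its posterior, N(ybar, 1/n), under the toy model.
        mu = rng.gauss(ybar, (1.0 / n) ** 0.5)
        # Generate a replicated data set of the same size given that mean.
        replicated = [rng.gauss(mu, 1.0) for _ in range(n)]
        if statistic(replicated) >= t_obs:
            exceed += 1
    return exceed / n_draws

# Usage: the maximum is used as T to check for an outlying observation.
p_outlier = posterior_predictive_pvalue([0.1, -0.3, 0.2, 0.0, 8.0], max)
p_regular = posterior_predictive_pvalue([0.1, -0.3, 0.2, 0.0, 0.4], max)
```

A very small value indicates that the observed statistic is unusual under the fitted model, which here would cast doubt on the unit-variance normal assumption for the first data set.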

Diaconis and Efron (1985) [109] Testing for independence in a two-way table. Proposes an alternative model of a completely random choice from all possible tables, and a test for this hypothesis (to be used especially in large-sample cases where the independence model is strongly rejected). Intermediate (essentially random-effect) models when neither extreme holds. [With long discussion, which I have not read yet.]

Koehler (1986) [225] Goodness-of-fit tests for log-linear models for sparse tables. Derives a normal approximation for the likelihood ratio statistic for certain models under asymptotics where not all expected frequencies become large.

Heckman and Walker (1987) [184] Goodness-of-fit tests for duration models with unobserved heterogeneity, applied to data on fertility. Tests for both nested models (LR) and non-nested models (BIC and classical chi-squared goodness-of-fit tests) are considered.

Vuong (1989) [379] Defines likelihood ratio-based tests for comparing two models, which may be non-nested and neither of which needs to be true. The null hypothesis tested is then that both models are equally close (in the Kullback-Leibler sense) to the true model.
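
A minimal sketch of the basic statistic for strictly non-nested models is given below. It assumes that the pointwise log-likelihood contributions of the two fitted models are already available, and it omits any adjustment for the number of parameters. The statistic is the sum of the pointwise log-likelihood differences divided by the square root of n times their standard deviation, and it is compared with the standard normal distribution. The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def vuong_z(loglik_a, loglik_b):
    """Return (z, two-sided p-value) from pointwise log-likelihoods."""
    if len(loglik_a) != len(loglik_b):
        raise ValueError("both models need one log-likelihood per observation")
    n = len(loglik_a)
    # Pointwise log-likelihood ratios between model A and model B.
    m = [a - b for a, b in zip(loglik_a, loglik_b)]
    mean_m = sum(m) / n
    var_m = sum((v - mean_m) ** 2 for v in m) / n
    if var_m == 0:
        raise ValueError("log-likelihood differences have zero variance")
    # sqrt(n) * mean / sd equals sum(m) / (sqrt(n) * sd).
    z = math.sqrt(n) * mean_m / math.sqrt(var_m)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Usage with made-up pointwise log-likelihoods.
ll_a = [-1.0, -2.0, -1.5, -0.5]
ll_b = [-1.2, -1.8, -1.9, -1.0]
z, p = vuong_z(ll_a, ll_b)
```

A large positive z favours the first model, a large negative z favours the second, and values near zero are consistent with the null hypothesis that the two models are equally close to the truth.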

Long and Trivedi (1992) [248] Reviews the most important `specification tests' for linear models from econometrics and gives simulation results of their small-sample performance.

Kallenberg and Ledwina (1997) [205] Using BIC to select the order of the alternative hypothesis for a `smooth test' of goodness of (distributional) fit.


Jouni Kuha 2003-07-16