

Model selection: general

Box and Hill (1967) [50] Discriminating between several (theory-based) models for understanding a system. Sequential design for selecting next experimental setting to achieve best discriminatory power (maximum expected change in entropy). Bayesian model probabilities under normal linear model.

Cox and Snell (1974) [97] Comments on choice of variables in observational studies, with emphasis on multiple linear regression. Design (what to measure) and analysis stages. Prediction vs. interpretation. For models for interpretation, suggest initial exhaustive search of models to find subset of models consistent with the data.

Box (1976) [51] Notes on aspects of statistics and the scientific method, using R.A. Fisher as an example [paper is a Fisher lecture]. `Motivated iteration' between theory and practice, induction and deduction. On model selection: since all models are wrong, need to `worry selectively about model inadequacies'.

Guttman (1977) [172] Comments on various open problems and common misunderstandings in statistics at the time of writing. Includes notes on, for example, choice of null and alternative hypothesis, simultaneous inference and stepwise regression.

Box (1980) [52] Discussion of the iteration between model criticism and estimation in statistics (and their relation to the iterative process of science). Suggests Bayesian approach to estimation and a sampling-theory one for criticism (model adequacy, diagnostics); proposes the sampling distribution of the Bayesian predictive distribution p(D|M) as a basis for diagnostic tests. Comments on robustness, favouring robust parametric (Bayesian) models over robust procedures. Discussion by an eminent cast of discussants.

Cox and Snell (1981) [98] Discussion [in Part 1] of the general principles involved in applying statistical methods, including discussion of model formulation, significance tests etc.

Hodges (1987) [187] Long discussion of the different types of uncertainty associated with statistical work (essentially model uncertainty, estimation/prediction uncertainty and numerical/approximation errors) and of how to take them into account, especially in the context of policy analysis. Bayesian approach recommended.

Turney (1990) [372] Defines `stability', which for linear models is $E(e^{\top} P_j e) = \sigma^2 \dim(\beta_j)$. Argues that minimizing instability is a good motivation for preferring simplicity in models. A philosophy of science paper, less interesting from statistical point of view.
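A minimal numerical sketch of the identity quoted in the Turney entry, under assumptions that are not stated in the note itself: the formula is read as E(e' P_j e) = sigma^2 dim(beta_j), with P_j taken to be the orthogonal projection onto the column space of the design matrix of model j and e a vector of independent errors with mean zero and variance sigma^2. The function names and the simulated design matrix are hypothetical and serve only as illustration.

```python
import numpy as np

def projection_matrix(X):
    """Orthogonal projection onto the column space of X (X assumed full column rank)."""
    return X @ np.linalg.solve(X.T @ X, X.T)

def mean_quadratic_form(X, sigma, n_draws=20000, seed=0):
    """Monte Carlo estimate of E(e' P e) for e ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    P = projection_matrix(X)
    e = rng.normal(0.0, sigma, size=(n_draws, X.shape[0]))
    # One quadratic form e_i' P e_i per simulated error vector.
    q = np.einsum("ij,jk,ik->i", e, P, e)
    return q.mean()

# Hypothetical design: 50 observations, 3 regression coefficients.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
sigma = 2.0

estimate = mean_quadratic_form(X, sigma)
theory = sigma**2 * X.shape[1]  # sigma^2 * dim(beta_j)
print(f"simulated: {estimate:.2f}, sigma^2 * dim(beta): {theory:.2f}")
```

Under this reading the quantity grows linearly with the number of fitted coefficients, because E(e' P e) = sigma^2 tr(P) and the trace of a projection matrix equals its rank. That linear growth is what links the quantity to a preference for simpler models.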

Miller (1990) [272] Book on subset selection for linear models. See Section 5.2 for further discussion.

Cox (1990) [93] Role of probability models in statistical analysis. Distinction between substantive, empirical and indirect models (with subdivisions). Remarks on model formulation.

Lehmann (1990) [236] Issues in model formulation (where do models come from?). Views of Fisher and Neyman. A `reservoir' of standard models as building blocks. Theories of formal model selection from narrowly defined class of choices. Empirical and explanatory models, and various differences between them. Very nicely written.

Freedman (1991) [149] Criticizes the use of regression models in observational studies in social sciences. Argues that the use of models is not a substitute for thinking, good design etc., and, by extension [it seems] that the use of regression models for non-experimental data is almost inevitably flawed. Comment articles by Berk (1991) [37] (repeatability, cross-validation), Blalock (1991) [40] (nothing new in Freedman) and Mason (1991) [260] (broadly agrees, many [and most constructive] comments and questions to the sociological and statistical communities), and a rejoinder by Freedman [148].

Cox (1995) [94] General comments on model selection. Model formulation and how to incorporate subject-matter information into the statistical formulation. Model choice: model adequacy and model comparison. Role of significance tests and Bayesian model choice. Derivation and criticism of BIC. In discussion, further comments on Bayesian model selection (see especially comments by Bernardo and Pericchi and reply by Cox).
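For orientation, a short sketch of how BIC is typically computed for a Gaussian linear model. It uses the standard form BIC = -2 log L + k log n and does not reproduce the derivation or the criticism in Cox (1995). The function name `gaussian_bic` and the simulated data are hypothetical.

```python
import numpy as np

def gaussian_bic(y, X):
    """BIC = -2 log L + k log n for a normal linear model fitted by least squares.

    k counts the regression coefficients plus the error variance.
    """
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2_hat = rss / n  # maximum likelihood estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2_hat) + 1)
    k = p + 1
    return -2 * loglik + k * np.log(n)

# Simulated data (assumption): y depends on x1 but not on x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_null = np.ones((n, 1))
X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, x2])

bic_null = gaussian_bic(y, X_null)
bic_small = gaussian_bic(y, X_small)
bic_big = gaussian_bic(y, X_big)
print(bic_null, bic_small, bic_big)
```

Smaller values are preferred. Adding one parameter to a nested model can raise BIC by at most log n, since the maximised likelihood cannot decrease, so the extra term is retained only if it improves -2 log L by more than log n.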

Chatfield (1995) [76] A book on statistical analysis in a wide sense (`statistical problem solving'). A section on model formulation. Particular emphasis on using background (subject-matter) information and combining information from different studies.

Freedman (1995) [147] Discussion of (1) different approaches to probability and statistics and (2) the validity of statistical models and their connection to reality. A nice discussion by Berger, Lehmann, Holland, Clogg and Henry, and a rejoinder by Freedman.

Cox and Wermuth (1996) [100] In a book, important discussion of model formulation, model comparison, significance testing etc.



Jouni Kuha 2003-07-16