

Other predictive criteria

San Martini and Spezzaferri (1984) [324] Choose the model which gives the best prediction for a future observation, where `best' means maximum expected utility under a logarithmic utility function. Closed-form results for nested normal linear models. Averaging future observations over observed data gives a criterion of the type $L^{2}-c\times df$, where $c\approx \log n$.
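Schematically (notation assumed here, not taken from the paper): with $f_{k}(z)=p(z|y,M_{k})$ the predictive density of a future observation $z$ under model $M_{k}$, the logarithmic utility of predicting with $M_{k}$ is $\log f_{k}(z)$, so the chosen model maximises the expectation of $\log f_{k}(z)$ given the observed data; for nested normal linear models this ordering reduces to one of the $L^{2}-c\times df$ form above.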

Efron (1986) [121] Estimates of the prediction error for general GLM prediction models and general error measures. AIC and Cp as special cases. Compared to the cross-validation estimate (PRESS) in the linear case; the results differ because cross-validation is based on a random future X, Cp on a fixed X.
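To make the linear-case comparison concrete, here is a minimal sketch (illustrative code, not from the paper; the function name, the use of a $\hat\sigma^{2}$ from the largest model, and the Cp convention are assumptions) of the two estimates for a single candidate design matrix:

import numpy as np

def press_and_cp(X, y, sigma2_full):
    """PRESS and Mallows' Cp for one candidate linear model.
    X: n x p design matrix (assumed full rank), y: length-n response,
    sigma2_full: error variance estimate taken from the largest model."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)               # hat matrix (fixed-X fit)
    resid = y - H @ y                                   # ordinary residuals
    press = np.sum((resid / (1.0 - np.diag(H))) ** 2)   # leave-one-out squared errors
    cp = np.sum(resid ** 2) / sigma2_full - n + 2 * p   # in-sample fit plus 2p penalty
    return float(press), float(cp)

PRESS charges each observation for being predicted from the other n-1 cases, while Cp corrects the in-sample residual sum of squares for optimism at the observed, fixed X.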

Hemerly and Davis (1989) [185] Strong consistency of the estimated order of a (finite-order) AR model selected using the PLS criterion.

Hannan et al. (1989) [176] Strong consistency of the estimated order of a (finite-order) AR model selected using a version of the PLS criterion. Similar to Hemerly and Davis (1989) [185], but with a different proof, more discussion, and a comparison to BIC.

Wei (1992) [381] Model selection for linear regression using the predictive least squares (PLS) criterion of Rissanen [the accumulated sum of squared one-step prediction errors]. PLS expressed as a penalised criterion. Strong consistency of PLS. Asymptotic equivalence of PLS and BIC (does not always hold, but even then the two are typically close). Simulations. Proposes an alternative criterion (Fisher information criterion or FIC) whose penalty term is essentially the log of the Fisher information for the parameters [i.e. it in effect combines the effective sample size and the effective number of parameters].
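A minimal sketch of the PLS idea as described here (an illustrative implementation, not Wei's; the function name and the choice of starting index are assumptions): each case is predicted from a fit to the preceding cases only, and the squared one-step prediction errors are accumulated.

import numpy as np

def pls(X, y, first=None):
    """Accumulated squared one-step prediction errors for one candidate model.
    X: n x p design, y: length-n response; case t is predicted from cases 0..t-1."""
    n, p = X.shape
    first = p + 1 if first is None else first                 # need enough cases to fit
    total = 0.0
    for t in range(first, n):
        beta, *_ = np.linalg.lstsq(X[:t], y[:t], rcond=None)  # fit on the past only
        total += float((y[t] - X[t] @ beta) ** 2)             # one-step prediction error
    return total

Candidate models are then compared by their PLS values (smallest is best); for the AR entries above, the same loop applies with X built from lagged values of the series.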

Laud and Ibrahim (1995) [232] Three selection criteria based on the predictive distribution $f_{k}=p(z|y,M_{k})$ for new data $z$ given model $M_{k}$ and observed data $y$: (i) $E[(z-y)'(z-y)]$, i.e. the `MSE' of the predictive distribution; (ii) (a function of) $p(y|y,M_{k})$, which is effectively the posterior Bayes factor of [4]; (iii) the Kullback-Leibler divergence between $f_{k}$ and $f_{k'}$. Expressions for (i) and (ii) in the form
$L^{2}-\alpha(df_{2}-df_{1})$ (cf. BIC). Measures of uncertainty for (i) and (ii). Examples of computations for the normal linear model; harder (not in closed form) for many other models. Examples: selection of covariates, link and variance functions. Also proposes priors for the parameters of the normal linear model inferred from a prior mean for the response.
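Criterion (i) can also be estimated by simulation when it is not available in closed form. A minimal sketch under an assumed setup (illustrative code, not from the paper; z_draws is taken to be an S x n array of replicates drawn from $p(z|y,M_{k})$), using the decomposition of the expected squared distance into a fit term and a predictive-variance term:

import numpy as np

def predictive_mse(z_draws, y):
    """Monte Carlo estimate of E[(z-y)'(z-y)] under p(z|y,M_k).
    z_draws: S x n array of posterior predictive replicates, y: observed data (length n)."""
    fit = np.sum((z_draws.mean(axis=0) - y) ** 2)  # sum_i (E[z_i|y] - y_i)^2
    spread = np.sum(z_draws.var(axis=0))           # sum_i var(z_i|y)
    return float(fit + spread)                     # E||z - y||^2 = fit + spread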

Gelfand and Ghosh (1998) [155] Decision-theoretic approach to model selection: the chosen model minimises, over the set of models, the minimised expected loss for future data given the observed data. Closed form for a quadratic loss function. The criterion can be partitioned into a goodness-of-fit term and a penalty term.
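In a commonly quoted form under squared-error loss (schematic notation assumed here, not necessarily that of the paper), with $\mu_{i}=E(z_{i}|y,M_{k})$ and $\sigma^{2}_{i}=var(z_{i}|y,M_{k})$ for replicate data $z$, the criterion is $\sum_{i}(\mu_{i}-y_{i})^{2}+\sum_{i}\sigma^{2}_{i}$: the first sum is the goodness-of-fit term and the second the penalty. More general versions downweight the goodness-of-fit sum by a factor determined by the loss function.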

XXX

