Takeuchi (1976) [363] First to propose TIC as a more accurate estimate of the expected log-likelihood than AIC. Not read, as my Japanese is a bit shaky.
Findley (1985) [131] Linear time series models, stationary Gaussian ARMA as example. Asymptotic bias of AIC (i.e. of observed likelihood) when true model is in the class considered, or is not but is close.
Hurvich and Tsai (1989) [196] Small-sample correction of AIC, extending results of [359] to nonlinear regression (still with normal errors) and Gaussian time series models. Notes on the small-sample bias of AIC, and examples in simulations.
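For reference (not quoted from the paper, which uses a regression-specific parametrisation), the usual form of the correction, with $k$ the total number of estimated parameters and $n$ the sample size, is
\[
\text{AIC}_{c} \;=\; -2\log L(\hat\theta) + 2k + \frac{2k(k+1)}{n-k-1}
\;=\; \text{AIC} + \frac{2k(k+1)}{n-k-1},
\]
which converges to AIC as $n\to\infty$ with $k$ fixed but penalises much more heavily when $k$ is a substantial fraction of $n$.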
Hurvich et al. (1990) [194] Gaussian ARMA models, containing the true model. $\text{AIC}_{c}$ is still biased, improved version proposed. Shows that the bias term is asymptotically (and even in small samples approximately) independent of $\theta_{0}$; estimated by simulation and tabulated. Simulations show improvement over $\text{AIC}_{c}$, especially when $d/N>1/2$.
Hurvich and Tsai (1991) [198] Normal linear regression and normal AR, when the true model is not included in the candidates. Bias of AIC and $\text{AIC}_{c}$. Simulations showing that the bias of $\text{AIC}_{c}$ can be much smaller than that of AIC, and model selections better, also compared to results from BIC.
Ripley (1995) [314] Model selection for a neural network: choosing degree of smoothness and number of `hidden units' rather than covariates. Cross validation, AIC and NIC; BF and model averaging briefly.
Ripley (1996) [315] Model selection in the context of pattern recognition and neural networks. Discusses AIC, NIC and BIC, as well as general issues.
Konishi and Kitagawa (1996) [226] Derives estimates of the expected log-likelihood (as in the derivation of AIC) when (i) the true model is not necessarily in the class considered, and (ii) parameters can be estimated by other than MLE (e.g. robust, penalised likelihood or Bayesian estimates). With (i) + MLE, we get TIC. Illustrates the bias in a normal mixture example. Also describes bootstrap estimation of the expected log-likelihood (giving a criterion called EIC). A nice summary of the entropy maximisation paradigm in the introduction of the paper.
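To recall the criterion referred to here (standard form, not quoted from the paper): with $\hat J$ the average negative Hessian of the log-likelihood at the MLE and $\hat I$ the average outer product of score vectors,
\[
\text{TIC} \;=\; -2\log L(\hat\theta) + 2\,\mathrm{tr}\!\bigl(\hat J^{-1}\hat I\bigr),
\qquad
\hat J = -\frac{1}{n}\sum_{i=1}^{n}
  \frac{\partial^{2}\log f(y_{i};\theta)}{\partial\theta\,\partial\theta^{\top}}\bigg|_{\hat\theta},
\quad
\hat I = \frac{1}{n}\sum_{i=1}^{n}
  \frac{\partial\log f(y_{i};\theta)}{\partial\theta}
  \frac{\partial\log f(y_{i};\theta)}{\partial\theta^{\top}}\bigg|_{\hat\theta}.
\]
Under a correctly specified model $\mathrm{tr}(\hat J^{-1}\hat I)\approx k$, the number of parameters, and TIC reduces to AIC.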
Fujikoshi and Satoh (1997) [150] Multivariate normal linear models, where the largest model contains the true model (but some of the submodels considered do not, i.e. they are underspecified). Derives a corrected AIC which is consistent ($O(n^{-1})$) for underspecified models and better than AIC (with $O(n^{-2})$ bias) for overspecified models. Similar exercise for $C_{p}$.
Kieseppä (1997) [224] Paper from a philosophy of science journal. Shows (without putting it quite like this) that for normal linear models AIC is unbiased for expected log-likelihood even when the model does not hold.
Shi and Tsai (1998) [336] Generalises AIC in several ways: (i) M-estimators rather than MLEs, (ii) expected `K-L' distance between estimating functions rather than log-likelihoods, (iii) small-sample corrections. Simulations to compare performance in model identification.
Hurvich et al. (1998) [195] Selection of smoothing parameter for nonparametric estimation of an unknown smooth regression function. Derives improved (less biased) versions of AIC; the simplest is essentially a generalisation of $\text{AIC}_{c}$ to this case.
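If I recall correctly, the simplest version, for a linear smoother with hat matrix $H$ (fitted values $\hat y = Hy$, residual variance estimate $\hat\sigma^{2} = n^{-1}\sum_{i}(y_{i}-\hat y_{i})^{2}$), takes $\mathrm{tr}(H)$ as the effective number of parameters:
\[
\text{AIC}_{c} \;=\; \log\hat\sigma^{2} + 1 + \frac{2\bigl(\mathrm{tr}(H)+1\bigr)}{n-\mathrm{tr}(H)-2},
\]
which, up to additive constants, recovers the parametric $\text{AIC}_{c}$ when $H$ is a rank-$k$ projection.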