Cox and Hinkley (1978) [95] Derivation of an approximation of BF; BIC is a special case. Discussion of the choice of priors for the two models. (A solution to an exercise in Cox and Hinkley (1974) [96].)
Schwarz (1978) [332] BIC obtained as a model selection criterion for linear exponential family models with bounded priors. Motivated as a method of choosing the dimensionality of a model, e.g. the degree of a polynomial regression or the order of a Markov chain.
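The polynomial-regression case is easy to illustrate: under Gaussian errors, BIC for degree $d$ reduces to $n\log(\mathrm{RSS}/n) + k\log n$ with $k=d+1$ parameters, and the criterion is minimised over $d$. A minimal sketch (the data-generating setup here is my own illustration, not Schwarz's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data whose true model is a quadratic (degree 2).
n = 200
x = rng.uniform(-1, 1, n)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(0, 0.1, n)

def bic(degree):
    """BIC for polynomial regression with Gaussian errors:
    n*log(RSS/n) + k*log(n) with k = degree + 1 coefficients
    (a constant shift, e.g. for sigma^2, does not affect the comparison)."""
    coeffs = np.polyfit(x, y, degree)
    rss = np.sum((y - np.polyval(coeffs, x)) ** 2)
    return n * np.log(rss / n) + (degree + 1) * np.log(n)

scores = {d: bic(d) for d in range(6)}
best = min(scores, key=scores.get)
print("selected degree:", best)
```

The $\log n$ penalty is what distinguishes this from AIC's constant penalty of 2 per parameter; for $n = 200$ each extra coefficient costs about 5.3 units.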
Smith and Spiegelhalter (1980) [342] BF for choice between nested linear models (main conclusions hold more generally) under different priors for the parameters. A prior with constant (w.r.t. $n$) variance leads to a BIC-type BF; a prior for the larger model which gives nonnegligible weight to a neighbourhood of the smaller model gives an AIC-type BF (with factor 3/2 instead of 2). General discussion of model selection criteria of the type $L^{2}-k \Delta_{df}$ with $k$ constant or a function of $n$. Lindley's paradox.
Pericchi (1984) [292] Suggests assigning prior model probabilities in a way (based on `expected gains in information about the parameters') that avoids Lindley's paradox. In some cases leads to penalised criteria with a constant penalty term or even to just the deviance. Normal linear model as an example. [Did not follow this very well.]
Haughton (1988) [180] Gives a[n even] more precise statement of the results of Schwarz (1978) [332] and extends them to the case of the curved exponential family. Consistency of BIC under certain conditions.
Kass and Vaidyanathan (1992) [214] Testing a sharp null hypothesis (nested models). Laplace approximation of BF and its accuracy. Sensitivity of the result to changes in the prior: (1) insensitivity to the prior of the nuisance parameters under `null orthogonality' and when the true value of $\theta$ is close to the null value; (2) lower bound for BF over all normal priors centered at $\theta_{0}$; (3) transformation of BF from one set of priors to another. An example demonstrates these and the sensitivity of BF to the prior variance of $\theta$.
McCulloch and Rossi (1992) [265] BFs for hypotheses which involve nonlinear restrictions. Projection methods to define priors from priors for the unrestricted models. Monte Carlo integration for the computations.
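The Monte Carlo step amounts to estimating the marginal likelihood $p(y|M)=\int p(y|\theta)\,p(\theta)\,d\theta$; in its simplest form one draws $\theta$ from the prior and averages the likelihood. A toy conjugate check (a single $N(\mu,1)$ observation with a $N(0,1)$ prior, so the exact marginal is $N(0,2)$) — this setup is my own, not from the paper:

```python
import math
import numpy as np

rng = np.random.default_rng(1)

y = 1.0                              # one observation, y ~ N(mu, 1)
mu = rng.normal(0.0, 1.0, 500_000)   # draws from the prior mu ~ N(0, 1)

# Simple Monte Carlo: p(y) ~= average over prior draws of p(y | mu)
lik = np.exp(-0.5 * (y - mu) ** 2) / math.sqrt(2 * math.pi)
mc = lik.mean()

# Conjugacy gives the exact marginal density: y ~ N(0, 2)
exact = math.exp(-y**2 / 4) / math.sqrt(4 * math.pi)
print(mc, exact)
```

Averaging over prior draws is inefficient when the likelihood is concentrated relative to the prior, which is one reason for the more elaborate schemes in the literature.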
Kass and Wasserman (1995) [215] Choice of reference priors for BF when comparing two nested models, i.e. testing the hypothesis $H_{0}: \psi=\psi_{0}$, with nuisance parameters $\beta$. Assume $\psi$ and $\beta$ null orthogonal and the prior for $\beta$ the same for both models. Laplace approximation for BF. For the prior for $\psi$ under $H_{1}$, assume (a) elliptical symmetry, (b) information equal to the information in one observation. If (a)+(b)+prior normal, get BIC, an approximation of log BF with $O(n^{-1/2})$ error. If the prior is Cauchy, get BIC + constant, also a version of a criterion by Jeffreys (error $O(n^{-1/2})$). Examples.
Raftery (1996) [309] Approximations of BF based on the Laplace approximation. One further assumption / approximation yields BIC. Applied to generalized linear models. A set of proper reference priors based on the null hypothesis of no predictors; choice of parameters for the prior. Mainly choice of predictors; choice of link functions and error distributions / variance functions also discussed. Model averaging. Raftery (1994) [306] is the same paper in technical-report form, with some further numerical results. Raftery (1988) [305] is an even earlier version, with a different example on social class and educational achievement.
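The Laplace-then-BIC route can be checked on a toy case with a closed-form answer: $y$ successes in $n$ Bernoulli trials with a uniform prior on $\theta$, where the exact marginal likelihood is $1/(n+1)$. Laplace expands $\log p(y|\theta)p(\theta)$ around its mode; dropping the $O(1)$ Hessian and prior terms leaves the BIC-style approximation. The numbers below are my own illustration, not from the paper:

```python
import math

n, y = 100, 60
theta_hat = y / n                       # mode of the flat-prior posterior (= MLE)

def log_lik(theta):
    return (math.log(math.comb(n, y))
            + y * math.log(theta) + (n - y) * math.log(1 - theta))

# Laplace: log p(y) ~= log p(y|th*) + log prior(th*)
#                      + 0.5*log(2*pi) - 0.5*log|H|,
# with H = -(d^2/dtheta^2) log lik at the mode = n / (th*(1-th*)), prior = 1.
hess = n / (theta_hat * (1 - theta_hat))
log_laplace = (log_lik(theta_hat)
               + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(hess))

# BIC-style: drop the O(1) terms, keep log p(y|th*) - (d/2)*log n, d = 1.
log_bic = log_lik(theta_hat) - 0.5 * math.log(n)

log_exact = -math.log(n + 1)            # Beta integral: p(y) = 1/(n+1)
print(log_exact, log_laplace, log_bic)
```

The Laplace value lands very close to the exact log marginal, while the BIC value carries the expected $O(1)$ error from the dropped constants.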
Hsiao (1997) [193] Laplace approximation to $p(y|M)$ when the integrand has a boundary mode. The resulting approximation is a small modification of BIC when only one parameter is on the boundary. Unlike in the standard case, this approximation always has at least $O(1)$ relative error.
Pauler (1998) [289] Variable selection in normal linear models, nested hypotheses. Approximation of BF under a fairly general informative prior [nothing new there]; particular choices of prior lead to BIC and other proposed criteria. Comparison of these in examples. Careful discussion of conditions and assumptions. Also considers mixed linear models (choosing fixed effects). There a key problem is the determination of `$n$' in the BIC formula (the order of the determinant of the information matrix). This depends on which hypotheses are tested and which random effects (if any) are associated with each fixed effect. A dramatic example where the effective sample size is much smaller than the total sample size and leads to a very different conclusion. [Read this again]