

Approximations to BF: training samples

Atkinson (1978) [22] Posterior probabilities for normal linear models. Discusses difficulties with uninformative priors, priors (notionally or actually) derived from training samples, and the behaviour of posterior probabilities for models of different dimensions (nested or not; when one, both or neither is true).

Spiegelhalter and Smith (1982) [349] BF for comparing two nested models. When the prior is improper, the BF involves an arbitrary, unspecified constant. A value for this is assigned by imagining a training sample which (i) is the smallest possible to allow a comparison between the models, and (ii) among such samples, provides maximum support for the smaller model. Examples for designed experiments and other linear models, and for (saturated vs. non-saturated) log-linear models (appealing to asymptotic posteriors).
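In outline (a sketch of the device in generic notation, not the authors' own): if the improper priors are written as $\pi_{j}(\theta_{j})=c_{j}h_{j}(\theta_{j})$ with unspecified constants $c_{j}$, then

$B_{21}(y) = \frac{c_{2}}{c_{1}}\cdot\frac{\int h_{2}(\theta_{2}) p(y|\theta_{2}; M_{2}) d\theta_{2}}{\int h_{1}(\theta_{1}) p(y|\theta_{1}; M_{1}) d\theta_{1}},$

so the BF is determined only up to the arbitrary ratio $c_{2}/c_{1}$. Setting $B_{21}(y_{t})=1$ for the imaginary training sample $y_{t}$ described above fixes this ratio.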

Raftery (1986b) [302] Two comments on Spiegelhalter and Smith (1982) [349]. (1) The approximate BF proposed by SS for log-linear models is indeterminate if any cell counts are zero; a change of prior remedies this. (2) The BF of SS is asymptotically equivalent to BIC.
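In symbols, point (2) says that (in generic notation, not the paper's) $2\log B \approx L^{2} - df\times\log n$ for nested models, i.e. the BIC comparison, where $L^{2}$ is the likelihood-ratio statistic, $df$ the difference in dimension and $n$ the sample size.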

Fornell and Rust (1989) [139] Comparing covariance structure models with possibly unequal prior probabilities. Proposes (1) splitting the data in half ($D_{1}$ and $D_{2}$) and approximating $p(D_{2}|M_{j})$ by $p(D_{2}|\hat{\theta}(D_{1}); M_{j})$, and (2) with a vague appeal to cross-validation, replacing $p(D_{2}|\hat{\theta}(D_{1}); M_{j})$ in the above by a leave-one-out cross-validation pseudo-likelihood, estimated via AIC (i.e. ends up with an approximate BF with the prior implied by AIC). Not a very good paper, with much confusion about what is being said and assumed.
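A minimal Python sketch of step (1); the model objects and their fit/loglik interfaces are hypothetical, not from the paper:

    import numpy as np

    def split_half_log_bf(data, model_1, model_2, seed=0):
        # Sketch of step (1): approximate p(D2|Mj) by the likelihood of D2
        # evaluated at the estimate fitted to D1.  The fit/loglik methods
        # are assumed interfaces, not anything defined in the paper.
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(data))
        d1, d2 = data[idx[:len(data) // 2]], data[idx[len(data) // 2:]]
        theta_1 = model_1.fit(d1)   # \hat{\theta}(D1) under M1
        theta_2 = model_2.fit(d1)   # \hat{\theta}(D1) under M2
        # log p(D2|\hat{\theta}(D1); M1) - log p(D2|\hat{\theta}(D1); M2)
        return model_1.loglik(d2, theta_1) - model_2.loglik(d2, theta_2)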

Aitkin (1991) [4] Proposes a model selection criterion which is the ratio of quantities of the form $\int p(y|\theta_{j}; M_{j}) p(\theta_{j}|y; M_{j}) d\theta_{j}$, i.e. the posterior mean of the likelihood. This leads to criteria of the type $L^{2}-df\times\text{constant}$ (constant$=\log 2$ in examples). Nice summary of some earlier literature. In discussion, criticism from the Bayesian point of view (examples where it leads to contradictions) and for `using the data twice'.
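Aitkin's quantity is easy to estimate from posterior draws; a minimal Monte Carlo sketch (the function names and interfaces are assumptions, not Aitkin's):

    import numpy as np
    from scipy.special import logsumexp

    def log_posterior_mean_likelihood(loglik, posterior_draws):
        # Estimate log int p(y|theta; M) p(theta|y; M) dtheta by averaging
        # the likelihood over draws theta_1, ..., theta_S from the posterior.
        # loglik(theta) must return log p(y|theta; M); note that y enters
        # both the likelihood and the posterior ('using the data twice').
        logs = np.array([loglik(theta) for theta in posterior_draws])
        return logsumexp(logs) - np.log(len(posterior_draws))

Comparing this quantity across two models gives Aitkin's criterion; with an asymptotically normal posterior the integral is approximately the maximised likelihood times $2^{-df/2}$, which is where the $\log 2$ constant above comes from.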

McCulloch and Rossi (1992) [265] Using BFs to test models from arbitrage pricing theory. Nested hypotheses, using the Savage density ratio to simplify the BF. Conjugate priors; careful discussion and sensitivity analysis of the choice of prior.
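For reference, the Savage density ratio device (in generic notation): if $M_{0}$ fixes $\psi=\psi_{0}$ within $M_{1}$, with parameters $(\psi, \omega)$ and priors satisfying $p(\omega|\psi_{0}; M_{1}) = p(\omega; M_{0})$, then

$B_{01} = \frac{p(\psi_{0}|y; M_{1})}{p(\psi_{0}; M_{1})},$

the ratio of the posterior to the prior density of $\psi$ at the null value under the larger model.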

Gelfand and Dey (1994) [153] Notes that BF and many of its modifications (intrinsic, partial and fractional BF) involve predictive densities of the type $f(y_{1}|y_{2}; M)$, where $y=(y_{1}, y_{2})$ is a partitioning of the data. Laplace approximations for these under different asymptotics for $y_{1}$ and $y_{2}$ and corresponding asymptotics for the resulting BFs. Exact calculations using Monte Carlo methods.
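The common structure, in the entry's notation, is the standard identity $f(y_{1}|y_{2}; M) = m(y; M)/m(y_{2}; M) = \int f(y_{1}|\theta, y_{2}; M) p(\theta|y_{2}; M) d\theta$, where $m(\cdot; M)$ denotes a marginal density (this notation is ours, not the paper's). The standard BF takes $y_{2}$ empty; partial and intrinsic BFs take $y_{2}$ to be a training sample.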

O'Hagan (1995) [286] When prior distributions for the parameters of the models are improper, the standard BF depends on unspecified constants. The partial BF gets around this by assigning part of the data to a training sample; posteriors from the training sample are used as (proper) priors for the rest of the data. Problem: how to choose the training sample, even after its size $m$ has been chosen. O'Hagan proposes a fractional BF (FBF), motivated by the idea that for large $m$ and $n$ (the total sample size), the likelihood for the training sample is approximately $f_{\text{full}}^{b}$, where $f_{\text{full}}$ is the full-sample likelihood and $b=m/n$. Examples, discussion of sensitivity of the BF to the choice of prior. Choice of $b$: in all examples $b=m_{0}/n$ (where $m_{0}$ is the smallest size to give a proper posterior from the training sample), but other possibilities are discussed. In discussion, concerns about model selection in general, BFs (especially with improper priors) and FBFs, even accusations of `ad hoc non-Bayesianism'.
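For reference, the FBF of $M_{2}$ against $M_{1}$ has the form (generic notation, not O'Hagan's)

$B^{F}_{21}(b) = \frac{q_{2}(b, y)}{q_{1}(b, y)}, \qquad q_{j}(b, y) = \frac{\int \pi_{j}(\theta_{j}) f(y|\theta_{j}; M_{j}) d\theta_{j}}{\int \pi_{j}(\theta_{j}) f(y|\theta_{j}; M_{j})^{b} d\theta_{j}},$

so any arbitrary constant in an improper $\pi_{j}$ cancels within $q_{j}$.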

Berger and Pericchi (1996a) [34] The `intrinsic BF' as an automatic default method of constructing BFs from noninformative priors. Essentially averages the partial BFs based on all minimal training samples. Different versions, and the priors implied by the approach. Numerical computations. Comparison to other default choices for BFs.
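In outline, the arithmetic version for $M_{2}$ against $M_{1}$ is

$B^{AI}_{21} = B^{N}_{21}(y)\times\frac{1}{L}\sum_{l=1}^{L} B^{N}_{12}(y(l)),$

where $B^{N}$ denotes BFs computed directly from the noninformative priors and $y(1),\ldots,y(L)$ are the minimal training samples; a geometric-mean version is also considered (notation generic, not the authors').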

Berger and Pericchi (1996b) [33] The intrinsic BF of [34], especially for linear models (including choice of error distribution and hierarchical models). With discussion.

