Atkinson (1978) [22] Posterior probabilities for normal linear models. Discusses difficulties with uninformative priors, priors (notionally or actually) derived from training samples, behaviour of posterior probabilities for models of different dimensions (nested or not; when one, both or neither is true).
Spiegelhalter and Smith (1982) [349] BF for comparing two nested models. When prior is improper, BF involves an arbitrary, unspecified constant. Value for this assigned by imagining a training sample which (i) is the smallest possible to allow a comparison between the models, (ii) among such samples, provides maximum support for the smaller model. Examples for designed experiments and other linear models, and for (saturated vs. non-saturated) log-linear models (appealing to asymptotic posteriors).
Raftery (1986b) [302] Two comments on Spiegelhalter and Smith (1982) [349]. (1) Approximate BF proposed by SS for log-linear models is indeterminate if any cell counts are zero; change of prior remedies this. (2) BF of SS is asymptotically equivalent to BIC.
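The asymptotic link between the BF and BIC noted in (2) can be sketched numerically; the data, the nested linear models, and all numbers below are invented for illustration, and the criterion used is the standard BIC approximation $\mathrm{BF}(M_1\ \text{vs}\ M_0)\approx\exp(-(\mathrm{BIC}_1-\mathrm{BIC}_0)/2)$ rather than anything specific to the SS construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for two nested Gaussian linear models (all choices invented):
# M0: intercept only, M1: intercept + slope.
x = np.linspace(0.0, 1.0, 60)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, size=60)
n = len(y)

def bic(X):
    """BIC of a Gaussian linear model with design matrix X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    sigma2 = np.mean((y - X @ beta) ** 2)          # ML estimate of error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = X.shape[1] + 1                             # coefficients + variance
    return -2 * loglik + k * np.log(n)

bic0 = bic(np.ones((n, 1)))                        # M0: intercept only
bic1 = bic(np.column_stack([np.ones(n), x]))       # M1: intercept + slope

# BIC-based approximation to the log Bayes factor.
log_bf_10 = -(bic1 - bic0) / 2
print("approximate log BF(M1 vs M0):", log_bf_10)
```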
Fornell and Rust (1989) [139]
Comparing covariance structure models with possibly unequal prior
probabilities. Proposes (1) splitting the data in half ($D_{1}$ and
$D_{2}$) and approximating $p(D_{2}|M_{j})$ by
$p(D_{2}|\hat{\theta}(D_{1}), M_{j})$; (2) with a vague appeal to
cross-validation, replacing $p(D_{2}|\hat{\theta}(D_{1}), M_{j})$ by a
leave-one-out c-v pseudo-likelihood, estimated by AIC (i.e. ends up with
an approximate BF with the prior implied by AIC). Not a very good paper,
with much confusion about what is being said and assumed.
Aitkin (1991) [4]
Proposes a model selection criterion which is the ratio of quantities
$\int p(y|\theta_{j}; M_{j}) p(\theta_{j}|y; M_{j}) d\theta_{j}$, i.e.
the posterior mean of the likelihood. This leads to criteria of the type
$L^{2}-df\times\text{constant}$ (constant $=\log 2$ in examples). Nice
summary of some earlier literature. In discussion, criticism from the
Bayesian point of view (examples where it leads to contradictions) and
for `using the data twice'.
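Aitkin's posterior mean of the likelihood can be estimated by plain Monte Carlo; a minimal sketch on an invented conjugate problem (all choices here are made up, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: y_i ~ N(theta, 1); with a flat prior, theta | y ~ N(ybar, 1/n).
y = rng.normal(0.3, 1.0, size=50)
n, ybar = len(y), y.mean()

def log_lik(theta):
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - theta) ** 2)

# M1 (theta free): Monte Carlo estimate of the posterior mean of the
# likelihood, on the log scale via a log-sum-exp for stability.
draws = rng.normal(ybar, 1.0 / np.sqrt(n), size=10_000)
ll = np.array([log_lik(t) for t in draws])
m = ll.max()
log_pml_1 = m + np.log(np.mean(np.exp(ll - m)))

# M0 (point null theta = 0) has no parameters, so its posterior mean of
# the likelihood is just the likelihood itself.
log_pml_0 = log_lik(0.0)
print("log posterior Bayes factor (M1 vs M0):", log_pml_1 - log_pml_0)
```

Note that the posterior mean of the likelihood can never exceed the maximised likelihood, which is one way to see the `using the data twice' flavour of the criterion.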
McCulloch and Rossi (1992) [265] Using BFs to test models from arbitrage pricing theory. Nested hypothesis, using the Savage density ratio to simplify the BF. Conjugate priors; careful discussion and sensitivity analysis of the choice of the prior.
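The Savage density ratio simplification for nested models can be sketched on a toy conjugate problem (the model, prior and data below are invented, not from the paper): for suitable nested pairs, $\mathrm{BF}(M_0\ \text{vs}\ M_1)$ reduces to the ratio of posterior to prior density at the null value.

```python
import numpy as np

def npdf(x, mean, sd):
    """Normal density, written out to keep the example self-contained."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Toy nested pair: y_i ~ N(theta, 1); M0: theta = 0 nested in
# M1: theta ~ N(0, tau^2).
tau = 1.0
y = 0.5 + np.linspace(-1.0, 1.0, 30)   # made-up data with ybar = 0.5 exactly
n, ybar = len(y), y.mean()

# Conjugate posterior under M1: theta | y ~ N(mu, v).
v = 1.0 / (n + 1.0 / tau**2)
mu = v * n * ybar

# Savage density ratio: BF(M0 vs M1) = posterior density / prior density at 0.
bf_01 = npdf(0.0, mu, np.sqrt(v)) / npdf(0.0, 0.0, tau)
print("BF(M0 vs M1):", bf_01)
```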
Gelfand and Dey (1994) [153]
Notes that BF and many of its modifications (intrinsic, partial and
fractional BF) involve predictive densities of the type
$f(y_{1}|y_{2}; M)$, where $y=(y_{1}, y_{2})$ is a partitioning of the
data. Laplace approximations for these under different asymptotics for
$y_{1}$ and $y_{2}$, and corresponding asymptotics for the resulting
BFs. Exact calculations using Monte Carlo methods.
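A Monte Carlo calculation of such a predictive density can be sketched on a toy conjugate model (the model, prior, and data split below are invented for illustration): $f(y_{2}|y_{1};M)=E_{\theta|y_{1}}[f(y_{2}|\theta)]$, estimated by averaging over posterior draws.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: y_i ~ N(theta, 1) with prior theta ~ N(0, 1); interleaved
# split of made-up data into a conditioning part y1 and a target part y2.
y = 0.4 + np.linspace(-1.0, 1.0, 20)
y1, y2 = y[::2], y[1::2]

# Conjugate posterior given y1: theta | y1 ~ N(mu1, v1).
n1 = len(y1)
v1 = 1.0 / (n1 + 1.0)
mu1 = v1 * y1.sum()

# Monte Carlo: f(y2 | y1) = E_{theta|y1}[ f(y2 | theta) ], averaged over
# posterior draws on the log scale (log-sum-exp for stability).
draws = rng.normal(mu1, np.sqrt(v1), size=50_000)
ll = (-0.5 * len(y2) * np.log(2 * np.pi)
      - 0.5 * ((y2[:, None] - draws) ** 2).sum(axis=0))
m = ll.max()
log_pred = m + np.log(np.mean(np.exp(ll - m)))
print("Monte Carlo log f(y2 | y1):", log_pred)
```

In this conjugate case the predictive density is also available in closed form (a multivariate normal), which makes the Monte Carlo answer easy to check.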
O'Hagan (1995) [286]
When prior distributions for the parameters of the models are improper,
the standard BF depends on unspecified constants. The partial BF gets
around this by assigning part of the data to a training sample;
posteriors for the training sample are used as (proper) priors for the
rest of the data. Problem: how to choose the training sample, even after
its size $m$ has been chosen. O'Hagan proposes a fractional BF,
motivated by the idea that for large $m$ and $n$ (total sample size),
the likelihood for the training sample is approximately
$f_{\text{full}}^{b}$, where $f_{\text{full}}$ is the full-sample
likelihood and $b=m/n$. Examples, discussion of sensitivity of the BF to
the choice of prior. Choice of $b$: in all examples $b=m_{0}/n$ (where
$m_{0}$ is the smallest size giving a proper posterior in the training
sample), but other possibilities are discussed. In discussion, concerns
about model selection in general, BFs (especially with improper priors)
and FBFs, even accusations of `ad hoc non-Bayesianism'.
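The fractional-likelihood idea can be sketched on a toy problem (all choices below are invented, not O'Hagan's examples): for $y_i\sim N(\theta,1)$ with M0: $\theta=0$ vs M1: $\theta$ free under a flat prior, Gaussian integration of $f$ and $f^{b}$ gives the closed form $\log\mathrm{FBF}(M_1\ \text{vs}\ M_0)=(1-b)\,n\bar{y}^{2}/2+\tfrac{1}{2}\log b$.

```python
import numpy as np

# Toy data: y_i ~ N(theta, 1), made up so that ybar = 0.3 exactly.
y = 0.3 + np.linspace(-1.0, 1.0, 40)
n, ybar = len(y), y.mean()

def log_fbf(b):
    """Closed-form log FBF(M1 vs M0) for this toy normal-mean problem."""
    return (1.0 - b) * n * ybar**2 / 2.0 + 0.5 * np.log(b)

# Default choice in the spirit of b = m0/n: here m0 = 1, since a single
# observation already gives a proper posterior under the flat prior.
print("log FBF with b = 1/n:", log_fbf(1.0 / n))
```

At $b=1$ the whole sample is "training" and the FBF collapses to 1, which the closed form reproduces ($\log\mathrm{FBF}=0$).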
Berger and Pericchi (1996a) [34] The `intrinsic BF' as an automatic default method of constructing priors for BFs. Essentially averages priors over all minimal training samples. Different versions, and priors implied by the approach. Numerical computations. Comparison to other default choices for BFs.
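The arithmetic version of the intrinsic BF can be sketched on the same toy normal-mean problem (model, prior and data invented for illustration): for each minimal training sample (here a single observation), form a partial BF using the training-sample posterior as prior, then average over all such samples.

```python
import numpy as np

def npdf(x, mean, sd):
    """Normal density, kept explicit so the example is self-contained."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

# Toy data: y_i ~ N(theta, 1); M0: theta = 0 vs M1: theta free, flat prior.
y = 0.4 + np.linspace(-1.0, 1.0, 15)   # made up so that ybar = 0.4
n = len(y)
theta = np.linspace(-10.0, 10.0, 4001)  # quadrature grid for the M1 integral
dx = theta[1] - theta[0]

def partial_bf(l):
    """Partial BF(M1 vs M0) with observation l as the minimal training sample."""
    rest = np.delete(y, l)
    post = npdf(theta, y[l], 1.0)   # posterior from y_l under the flat prior
    lik = np.prod(npdf(rest[:, None], theta, 1.0), axis=0)
    m1 = np.sum(lik * post) * dx    # predictive p(rest | y_l, M1) by quadrature
    m0 = np.prod(npdf(rest, 0.0, 1.0))
    return m1 / m0

# Arithmetic intrinsic BF: average partial BFs over all minimal training samples.
aibf = np.mean([partial_bf(l) for l in range(n)])
print("arithmetic intrinsic BF(M1 vs M0):", aibf)
```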
Berger and Pericchi (1996b) [33] The intrinsic BF of [34], especially for linear models (including choice of error distribution and hierarchical models). With discussion.