Next: Estimating R2 of a Up: R2-like measures Previous: R2-like measures   Contents


Proportional reduction in discrepancy

Goodman and Kruskal (1954) [163] Overview of measures of association for cross-classified categorical data. The tau statistic is [at least for a binary response variable] equal to R2D using the squared error criterion.

Light and Margolin (1971) [242] Define a measure of variation (related to the Gini coefficient) for a categorical response variable, and a corresponding pseudo-R2 statistic for a one-way `ANOVA' model for such a response.

Margolin and Light (1974) [258] Shows (among many other things) that the R2 of Light and Margolin (1971) [242] is equal to the tau of Goodman and Kruskal (1954) [163].
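
The equality noted in the entries above can be checked numerically for a binary response. The sketch below is illustrative only: the function names and the example table are hypothetical, rows are taken to be categories of the explanatory variable and columns the two response categories, and the fitted values for the squared-error R2 are assumed to be the within-row proportions (a one-way `ANOVA'-type model).

```python
def goodman_kruskal_tau(table):
    """Tau for predicting the column (response) variable from the row variable.

    Uses the Gini-type variation 1 - sum_j p_j^2 (an assumed, standard form).
    """
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    total_var = 1.0 - sum((c / n) ** 2 for c in col_totals)
    within_var = 0.0
    for row in table:
        r = sum(row)
        if r == 0:
            continue  # empty rows contribute nothing
        within_var += (r / n) * (1.0 - sum((c / r) ** 2 for c in row))
    return (total_var - within_var) / total_var


def r2_squared_error(table):
    """1 - SSE/SST for a binary response coded 0/1 (columns: y=0, y=1).

    Fitted values are the within-row proportions of y=1.
    """
    n = sum(sum(row) for row in table)
    ybar = sum(row[1] for row in table) / n
    sst = sum(row[0] * (0 - ybar) ** 2 + row[1] * (1 - ybar) ** 2 for row in table)
    sse = 0.0
    for row in table:
        r = sum(row)
        if r == 0:
            continue
        p = row[1] / r
        sse += row[0] * (0 - p) ** 2 + row[1] * (1 - p) ** 2
    return 1.0 - sse / sst


# Hypothetical 2x2 table of counts, for illustration only.
example = [[30, 10], [20, 40]]
tau = goodman_kruskal_tau(example)
r2 = r2_squared_error(example)
```

For this table both functions return the same value (up to rounding), as the binary-response equality implies.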

Neter and Maynes (1970) [283] Comments on the (mis)use of product-moment correlation between binary Y and continuous X.

Lave (1970) [234] Fits a probit model for the choice of mode of transport. Uses an R2 (discussed in a footnote) computed from the standard formula, but with the fitted probabilities from the probit model as the fitted values (also `corrected for degrees of freedom', presumably as in the adjusted R2).

Theil (1970) [367] Binomial and multinomial logit models, fitted using GLS. Assessing the `explanatory power' using increases in log-likelihood (motivated as average conditional entropies); relative increases motivated as R2-type measures.

Goodman (1970) [164] An early paper on log-linear modelling. Proposes, in a brief note in an example, R2L (using Pearson X2 rather than deviance) relative to some baseline model as an approximate analogue of R2 of linear models.

Goodman (1971) [165] An early paper on log-linear modelling and multinomial logistic models. Proposes R2L in more detail than in [164].

Goodman (1972) [166] Paper on log-linear models (and their causal interpretations) in a sociological journal (AJS). Describes R2L as in [165].

Morrison (1972) [277] Computes theoretical upper bounds for R2 (defined as explained variance / total variance) when responses are binary, predictions are predicted probabilities (these minimize expected squared loss), and true probabilities have a beta distribution.

Goldberger (1973) [161] Points out an error in Morrison (1972) [277].

McFadden (1974) [268] Discusses deviance-$R^{2}$ for multinomial logit models. Econometrics, models for qualitative choice.
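
As a rough illustration of a deviance-based R2 for ungrouped categorical responses, the sketch below assumes the form 1 - logL(model) / logL(null), where the null model fits only the marginal category proportions. For ungrouped data the saturated log-likelihood is zero, so this coincides with one minus the ratio of deviances. The function names and the fitted probabilities are hypothetical, not taken from the cited paper.

```python
import math
from collections import Counter


def log_likelihood(y, probs):
    """Multinomial log-likelihood: y[i] is the observed category index,
    probs[i] the vector of fitted category probabilities for unit i."""
    return sum(math.log(p[k]) for k, p in zip(y, probs))


def deviance_r2(y, probs):
    """1 - logL(model) / logL(null), with the null model fitting only
    the observed marginal category proportions (an assumed baseline)."""
    n = len(y)
    counts = Counter(y)
    ll_null = sum(c * math.log(c / n) for c in counts.values())
    ll_model = log_likelihood(y, probs)
    return 1.0 - ll_model / ll_null


# Hypothetical data: four binary observations and two sets of fitted probabilities.
y = [0, 0, 1, 1]
uninformative = [[0.5, 0.5]] * 4
informative = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9], [0.1, 0.9]]

r2_null = deviance_r2(y, uninformative)
r2_good = deviance_r2(y, informative)
```

A model that reproduces only the marginal proportions gives a value of zero, and the measure approaches one as the fitted probabilities of the observed categories approach one.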

Hauser (1978) [181] Establishes the interpretation of R2L in terms of Kullback-Leibler information for multinomial logistic models.

Efron (1978) [119] Using an axiomatic approach, derives a family of `measures of variation' (loss functions) for binary response models; log-likelihood/entropy and squared error are special cases. Discusses a general R2 measure (explained variation / total variation) based on these.

Schwartz (1985) [331] Notes that the `deviance explained' statistic ($R_{L}^{2}$) in log-linear models cannot be compared to $R^{2}$ in multiple regression. The latter refers to variation in the response explained, the former to how much of the association between variables is explained, ignoring within-cell variation. Consequently $R_{L}^{2}$ is typically much higher than $R^{2}$.

Ben-Akiva and Lerman (1985) [28] Discuss R2L and propose an adjusted (for degrees of freedom) version.
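
A degrees-of-freedom adjustment of this kind is often quoted in the form 1 - (logL(model) - K) / logL(null), where K is the number of estimated parameters. The sketch below assumes that form; it has not been checked against the cited book, and the function name and numerical values are hypothetical.

```python
def adjusted_likelihood_r2(ll_model, ll_null, n_params):
    """Adjusted likelihood-ratio R2, assumed form: 1 - (ll_model - K) / ll_null.

    ll_model and ll_null are maximized log-likelihoods (both negative for
    discrete responses); n_params is the number of estimated parameters K.
    """
    return 1.0 - (ll_model - n_params) / ll_null


# Hypothetical log-likelihood values, for illustration only.
unadjusted = 1.0 - (-80.0) / (-100.0)
adjusted = adjusted_likelihood_r2(-80.0, -100.0, 5)
```

Because the null log-likelihood is negative, subtracting K from the model log-likelihood lowers the measure, so adding parameters that do not improve the fit is penalized.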

Dhrymes (1986) [108] A review of models for limited dependent variables for econometricians. Uses R2L for binary logistic models. Discussion of requirements for pseudo-R2 measures.

Harrell (1986) [178] For proportional hazards models, uses R2D where D is the partial log likelihood (this information from [221]).

Cox and Wermuth (1992) [99] Points out that if a binary response is modelled using linear regression, the upper bound of standard R2 is severely restricted (depending on the distribution of X, overall maximum 0.36, more typically around 0.1) when the response probabilities (here 0.2-0.8) are such that linear modelling is at all sensible. R2 increases when modelling grouped data. R2LR mentioned as an alternative measure.

Cameron and Windmeijer (1997) [71] Presents an R2 based on the reduction in Kullback-Leibler divergence from the saturated model, which is equal to R2L. Discussion of other R2 measures and various useful equalities between them.


Jouni Kuha 2003-07-16