
Analysis of variance

Activity 9.1   Think carefully about Tables 9.1 to 9.3. The sum of squares identity says that the total sum of squares from all the data in the first table is obtained by adding together the total sums of squares of each of the last two tables. Also, the entries in the first table are the sums of the corresponding entries in the last two tables. We have broken down the original grouped observations into the last table, which reflects the group structure, and the second table, whose entries look randomly distributed around zero. How would these tables change if groups 3 and 4 were put together as one group?
$ \blacksquare$

Answer 9.1   Tables 9.1 to 9.3 would change only in the third and fourth columns. These would form one column in the new tables, and entries in them would be different.
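
The sum of squares identity itself is easy to verify numerically. Below is a minimal Python sketch with made-up grouped observations (not the data of Tables 9.1 to 9.3): the data table is split into a table of group means and a table of residuals, and the total sums of squares add up as claimed.

    import numpy as np

    # Hypothetical grouped data in three groups (not the data of Tables 9.1 to 9.3).
    groups = [np.array([4.1, 5.0, 4.6]),
              np.array([6.2, 5.8, 6.5]),
              np.array([5.1, 4.9, 5.3])]

    data = np.concatenate(groups)
    grand_mean = data.mean()

    # Break each observation into its group mean plus a residual about that mean.
    group_part = np.concatenate([np.full(len(g), g.mean()) for g in groups])
    residual_part = data - group_part

    ss_total = ((data - grand_mean) ** 2).sum()
    ss_groups = ((group_part - grand_mean) ** 2).sum()   # reflects the group structure
    ss_residuals = (residual_part ** 2).sum()            # entries look random about zero

    print(ss_total, ss_groups + ss_residuals)            # the two totals agree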
$ \blacksquare$

Activity 9.2   If you fix $ \nu_1$ and let $ \nu_2$ become large, what distribution does the F distribution approach?
$ \blacksquare$

Answer 9.2   If $ \nu_2$ is large, then $ V/\nu_2$, which can be thought of as the average of the squares of $ \nu_2$ independent standard normal variables, gets closer and closer to the expected value of the square of a standard normal variable, which is its variance, 1. So the F statistic becomes effectively $ U/\nu_1$, a $ \chi^2$ random variable with $ \nu_1$ degrees of freedom divided by its degrees of freedom; that is, the F distribution approaches the distribution of $ \chi^2_{\nu_1}/\nu_1$.
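
A quick numerical check of this limit, using scipy with the arbitrary illustrative choice $ \nu_1=3$: the upper 5% point of the F distribution settles down, as $ \nu_2$ grows, to the upper 5% point of $ \chi^2_{\nu_1}$ divided by $ \nu_1$.

    from scipy.stats import f, chi2

    nu1 = 3                                     # arbitrary illustrative value
    for nu2 in (10, 100, 10000):
        print(nu2, f.ppf(0.95, nu1, nu2))       # decreases towards the limit below
    print("limit:", chi2.ppf(0.95, nu1) / nu1)  # about 7.815 / 3 = 2.605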
$ \blacksquare$

Activity 9.3   Suppose that in a one-way ANOVA you reject the Null Hypothesis that all the population group means are equal with a test at the 5% level. Does it follow that a 95% confidence interval for the difference between at least one pair of population group means will not include 0?
$ \blacksquare$

Answer 9.3   It does not follow that some pairwise difference of means must be significantly different from zero. We are putting together information from all the groups for the overall test, and it may not be possible to isolate the significant information in any particular pair.
$ \blacksquare$

Activity 9.4   Check that using $ 100(1-0.05/r)$% confidence intervals gives $ r$ simultaneous intervals with an overall confidence level of at least 95%.
$ \blacksquare$

Answer 9.4   Each of the $ r$ intervals has probability $ (5/r)$% of not covering its corresponding population mean difference. The probability that at least one of them does not cover its population mean difference is therefore no more than $ r\times(5/r)$%, which is $ 5$%. So the probability that they all cover their corresponding population mean differences is no less than 95%.
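
A minimal numerical sketch of this Bonferroni argument, taking (purely for illustration) $ r=6$ simultaneous intervals:

    r = 6                               # hypothetical number of simultaneous intervals
    per_interval_level = 1 - 0.05 / r   # each interval built at 100(1 - 0.05/r)% confidence
    miss_one = 0.05 / r                 # non-coverage probability of a single interval
    miss_any_bound = r * miss_one       # union bound on "at least one interval misses"
    print(per_interval_level)           # about 0.9917
    print(1 - miss_any_bound)           # 0.95, so simultaneous confidence is at least 95%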
$ \blacksquare$

Activity 9.5   Give the interval in the simultaneous set for the contrast with $ d_1=-0.854953$, $ d_2=0.203993$, $ d_3=0.237256$, $ d_4=0.413704$. Do you notice anything that connects your interval with the analysis of variance table?
$ \blacksquare$

Answer 9.5   $ \sum_{i=1}^k d_i \bar{X}_i=3.724$, and $ \sqrt{\sum d_i^2/n_i}=0.429$. The 95% simultaneous interval is

$\displaystyle 3.724\pm s\sqrt{3F_{0.05,3,15}}\times 0.429, $

which is

$\displaystyle 3.724\pm \sqrt{0.833}\times\sqrt{3\times 3.287}\times 0.429, $

which is

$\displaystyle 3.724\pm 1.229. $

There is a connection with the analysis of variance table, because $ (3.724/0.429)^2=75.424$, apart from rounding errors. So we can see that the confidence interval for this particular contrast does not include zero because

$\displaystyle \frac{75.424/3}{s^2}\ge F_{0.05,3,15}. $

This is precisely the condition that is used in the analysis of variance for rejecting the Null Hypothesis that the groups all have the same population mean. One can always find a contrast with this property. The F-test in the analysis of variance, in effect, rejects the Null Hypothesis of equal group means exactly when the simultaneous interval for some contrast does not include zero. That contrast may, as in this example, be of little interest in its own right.
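
A sketch of this calculation in Python, using scipy for the F quantile. The inputs 3.724 and 0.429 are the quantities worked out above; the value $ s^2=0.833$ is an assumption consistent with the numbers in the interval, and the group means and sample sizes behind 3.724 and 0.429 are not reproduced here.

    import math
    from scipy.stats import f

    d_xbar = 3.724                  # sum of d_i times the group means (from above)
    scale = 0.429                   # sqrt(sum of d_i^2 / n_i)         (from above)
    s2 = 0.833                      # assumed error mean square from the ANOVA table
    df1, df2 = 3, 15

    f_crit = f.ppf(0.95, df1, df2)  # about 3.287
    margin = math.sqrt(s2) * math.sqrt(df1 * f_crit) * scale
    print(d_xbar - margin, d_xbar + margin)              # roughly 3.724 +/- 1.23

    # The interval excludes zero exactly when the ANOVA-style condition holds:
    print((d_xbar / scale) ** 2 / df1 / s2 >= f_crit)    # True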
$ \blacksquare$

Activity 9.6   In the following table of population means, what is the difference between means in column 1 and column 2? What is the difference between means in row 2 and row 3? What is the interaction for the 2$ \times$2 table of cells in columns 1 and 2 and rows 2 and 3? Does this table show an additive structure for row and column effects?

$\displaystyle \begin{array}{|r|r|r|}\hline 1.0&2.0&10.0\\ \hline 2.1&3.1&4.3\\ \hline 3.2&4.2&5.4\\ \hline \end{array} $

$ \blacksquare$

Answer 9.6   The difference between cells in columns 1 and 2 that are in the same row is 1.0. For instance, in row 1, column $ 2-$column $ 1 = 2.0-1.0 = 1.0$, while in row 2, column $ 2-$column $ 1 = 3.1-2.1 = 1.0$.

The difference between cells in rows 2 and 3 that are in the same column is 1.1. For instance, in column 1, row $ 3-$row $ 2 = 3.2-2.1 = 1.1$, while in column 2, row $ 3-$row $ 2 = 4.2-3.1 = 1.1$.

The interaction for the 2$ \times$2 table of cells in columns 1 and 2 and rows 2 and 3 is

$\displaystyle 2.1-3.1-3.2+4.2=0. $

By contrast, a 2$ \times$2 table of cells that includes the top right-hand corner, for instance columns 2 and 3 and rows 1 and 2, has interaction

$\displaystyle 2.0-10.0-3.1+4.3=-6.8. $

This table of means has almost an additive structure, but the large value in the top right-hand corner has destroyed it. If that value were 3.2 instead of 10.0 the table would have the additive structure.
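
A quick numpy check of these interactions, using the table of means given in the activity (a sketch; the helper function is just for illustration):

    import numpy as np

    means = np.array([[1.0, 2.0, 10.0],
                      [2.1, 3.1,  4.3],
                      [3.2, 4.2,  5.4]])

    def interaction(m, rows, cols):
        # 2x2 interaction: m[r0,c0] - m[r0,c1] - m[r1,c0] + m[r1,c1]
        (r0, r1), (c0, c1) = rows, cols
        return m[r0, c0] - m[r0, c1] - m[r1, c0] + m[r1, c1]

    print(interaction(means, rows=(1, 2), cols=(0, 1)))  # rows 2,3 and columns 1,2: about 0
    print(interaction(means, rows=(0, 1), cols=(1, 2)))  # rows 1,2 and columns 2,3: about -6.8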


$ \blacksquare$

Activity 9.7   What happens to all the sums of squares if one entry in a data table is allowed to become very large? This is the effect of a gross error in recording the data.
$ \blacksquare$

Answer 9.7   One can, of course, just try putting in a large value and see what happens. To get some idea without calculating, note that if one entry in the table is very large, it will dominate all the others, so one may as well imagine that all the other entries are zero. So let us think of a table in which the top left-hand cell has a large value $ a$ and all the other entries are zero.

The overall mean is $ a/rc$, the row effect for the first row is $ a/c-a/rc$, and for all other rows it is $ -a/rc$. The sum of squares between rows is therefore

$\displaystyle c(a/c-a/rc)^2+c(r-1)(-a/rc)^2=a^2(r-1)/rc. $

The sum of squares between columns is, similarly, $ a^2(c-1)/rc$. The total sum of squares is $ a^2-a^2/rc$, and the sum of squares for error is, by subtraction, $ a^2(r-1)(c-1)/rc$. The mean squares for rows, columns and error are then all equal to $ a^2/rc$, so the F ratios for testing whether there are population row differences and column differences are both equal to 1. So the effect of a single large outlier is to remove all the evidence for row and column effects.
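
A numerical check of this argument: a sketch in Python with an $ r\times c$ table that is zero everywhere except for one large entry, using the arbitrary illustrative choices $ r=4$, $ c=5$, $ a=1000$.

    import numpy as np

    r, c, a = 4, 5, 1000.0        # arbitrary illustrative dimensions and outlier size
    x = np.zeros((r, c))
    x[0, 0] = a                   # a single gross error dominating the table

    grand = x.mean()
    row_means = x.mean(axis=1)
    col_means = x.mean(axis=0)

    ss_rows = c * ((row_means - grand) ** 2).sum()   # a^2 (r-1) / rc
    ss_cols = r * ((col_means - grand) ** 2).sum()   # a^2 (c-1) / rc
    ss_total = ((x - grand) ** 2).sum()              # a^2 - a^2 / rc
    ss_error = ss_total - ss_rows - ss_cols          # a^2 (r-1)(c-1) / rc

    ms_error = ss_error / ((r - 1) * (c - 1))
    print(ss_rows / (r - 1) / ms_error)              # F ratio for rows: 1.0
    print(ss_cols / (c - 1) / ms_error)              # F ratio for columns: 1.0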
$ \blacksquare$

