For the sake of concreteness here, let's recall one of the analysis of variance tables from the previous page:
Source | DF | SS | MS | F | P
---|---|---|---|---|---
Factor | 2 | 2510.5 | 1255.3 | 93.44 | 0.000
Error | 12 | 161.2 | 13.4 | |
Total | 14 | 2671.7 | | |
In working to digest all that is contained in an ANOVA table, let's start with the column headings:

- **Source** identifies the source of the variation in the data.
- **DF** is the degrees of freedom associated with each source.
- **SS** is the sum of squares due to each source.
- **MS** is the mean sum of squares due to each source.
- **F** is the F-statistic.
- **P** is the P-value associated with the F-statistic.

Now, let's consider the row headings:

- **Factor** refers to the variability due to the factor of interest, that is, the variability *between* the groups.
- **Error** refers to the variability *within* the groups, the variation not explained by the factor.
- **Total** refers to the total variation in the data from all sources.
With the column headings and row headings now defined, let's take a look at the individual entries inside a general one-factor ANOVA table:
Source | DF | SS | MS | F | P
---|---|---|---|---|---
Factor | m-1 | SS(Between) | MSB | MSB/MSE | P-value
Error | n-m | SS(Error) | MSE | |
Total | n-1 | SS(Total) | | |
Yikes, that looks overwhelming! Let's work our way through it entry by entry to see if we can make it all clear. Let's start with the degrees of freedom (DF) column:

- The degrees of freedom associated with the Factor are m-1, where m is the number of groups.
- The degrees of freedom associated with the Error are n-m, where n is the total number of observations.
- The total degrees of freedom are n-1, and, as you can check, (m-1) + (n-m) = n-1.
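For instance, in the learning study recalled at the top of this page, \(m = 3\) methods of learning were tried on \(n = 15\) students, so the degrees of freedom work out to:

\[m-1 = 3-1 = 2, \qquad n-m = 15-3 = 12, \qquad n-1 = 15-1 = 14,\]

exactly the entries in the DF column of that table.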
Now, the sums of squares (SS) column:

- SS(Between), the sum of squares due to the Factor, quantifies the variation *between* the group means.
- SS(Error) quantifies the variation of the observations *within* each group.
- SS(Total) quantifies the total variation in the data, and, as we'll verify later on this page, SS(Total) = SS(Between) + SS(Error).
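As a quick check with the learning study's numbers, the sums of squares do indeed add up:

\[2510.5 + 161.2 = 2671.7.\]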
The mean squares (MS) column, as the name suggests, contains the "average" sum of squares for the Factor and the Error:

- The Mean Sum of Squares Between the groups, denoted MSB, is the Factor sum of squares divided by its degrees of freedom, that is, MSB = SS(Between)/(m-1).
- The Error Mean Sum of Squares, denoted MSE, is the Error sum of squares divided by its degrees of freedom, that is, MSE = SS(Error)/(n-m).
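In the learning study, for instance, dividing each sum of squares by its degrees of freedom reproduces the MS column:

\[MSB = \frac{2510.5}{2} = 1255.25 \approx 1255.3 \qquad\text{and}\qquad MSE = \frac{161.2}{12} \approx 13.4.\]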
The F column, not surprisingly, contains the F-statistic. Because we want to compare the "average" variability between the groups to the "average" variability within the groups, we take the ratio of the Between Mean Sum of Squares to the Error Mean Sum of Squares. That is, the F-statistic is calculated as F = MSB/MSE.
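Checking this against the learning study's numbers (using the unrounded mean squares from the previous step):

\[F = \frac{MSB}{MSE} = \frac{2510.5/2}{161.2/12} = \frac{1255.25}{13.4333\ldots} \approx 93.44,\]

which is the value reported in the F column.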
When, on the next page, we delve into the theory behind the analysis of variance method, we'll see that the F-statistic follows an F-distribution with m−1 numerator degrees of freedom and n−m denominator degrees of freedom. Therefore, we'll calculate the P-value, as it appears in the column labeled P , by comparing the F-statistic to an F-distribution with m−1 numerator degrees of freedom and n−m denominator degrees of freedom.
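In symbols, with \(F^*\) denoting the observed F-statistic (a piece of notation introduced here only for this line), the P-value is the upper-tail probability

\[P = P\!\left(F_{m-1,\,n-m} \ge F^*\right),\]

and for the learning study \(P\!\left(F_{2,\,12} \ge 93.44\right)\) is far smaller than 0.001, which is why it is reported as 0.000.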
Now, having defined the individual entries of a general ANOVA table, let's revisit and, in the process, dissect the ANOVA table for the first learning study on the previous page, in which n = 15 students were subjected to one of m = 3 methods of learning:
Source | DF | SS | MS | F | P
---|---|---|---|---|---
Factor | 2 | 2510.5 | 1255.3 | 93.44 | 0.000
Error | 12 | 161.2 | 13.4 | |
Total | 14 | 2671.7 | | |
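If you like to check this sort of thing with software, here is a minimal Python sketch (not part of the lesson itself, and assuming SciPy is installed) that rebuilds the derived entries of the table from its sums of squares and degrees of freedom:

```python
from scipy.stats import f

# Quantities read directly from the learning study's ANOVA table
m, n = 3, 15            # number of groups, total number of observations
ss_between = 2510.5     # SS for the Factor (between groups)
ss_error = 161.2        # SS for the Error (within groups)

df_between = m - 1      # 2
df_error = n - m        # 12

msb = ss_between / df_between                  # 1255.25 -> reported as 1255.3
mse = ss_error / df_error                      # 13.43... -> reported as 13.4
f_stat = msb / mse                             # about 93.44
p_value = f.sf(f_stat, df_between, df_error)   # upper-tail probability

print(f"MSB = {msb:.1f}, MSE = {mse:.1f}, F = {f_stat:.2f}, P = {p_value:.3f}")
```

Printed to three decimal places, the P-value shows up as 0.000, which is exactly how it appears in the table above.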
Okay, slowly but surely, we keep adding, bit by bit, to our knowledge of the analysis of variance table. Let's now work a bit on the sums of squares.
In essence, we now know that we want to break down the TOTAL variation in the data into two components:

- a component that is due to the differences between the group means, that is, the variation due to the treatment (the factor), and
- a component that is due to the differences within the groups, that is, the variation due to random error.
Let's see what kind of formulas we can come up with for quantifying these components. But first, as always, we need to define some notation. Let's represent our data, the group means, and the grand mean as follows:
Group | Data | Means
---|---|---
1 | \(X_{11}, X_{12}, \ldots, X_{1,n_1}\) | \(\bar{X}_{1.}\)
2 | \(X_{21}, X_{22}, \ldots, X_{2,n_2}\) | \(\bar{X}_{2.}\)
\(\vdots\) | \(\vdots\) | \(\vdots\)
\(m\) | \(X_{m1}, X_{m2}, \ldots, X_{m,n_m}\) | \(\bar{X}_{m.}\)
Grand Mean | | \(\bar{X}_{..}\)
That is, we'll let:

- \(m\) denote the number of groups,
- \(n_i\) denote the number of observations in group \(i\), with \(n = n_1 + n_2 + \cdots + n_m\) the total sample size,
- \(X_{ij}\) denote the \(j\)th observation in the \(i\)th group,
- \(\bar{X}_{i.} = \dfrac{1}{n_i}\sum_{j=1}^{n_i} X_{ij}\) denote the sample mean of the \(i\)th group, and
- \(\bar{X}_{..} = \dfrac{1}{n}\sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij}\) denote the grand mean of all \(n\) observations.
Okay, with the notation now defined, let's first consider the total sum of squares, which we'll denote here as SS(TO). Because we want the total sum of squares to quantify the variation in the data regardless of its source, it makes sense that SS(TO) would be the sum of the squared distances of the observations \(X_{ij}\) to the grand mean \(\bar{X}_{..}\). That is:

\[SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{..}\right)^2\]
With just a little bit of algebraic work, the total sum of squares can be alternatively calculated as:

\[SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i} X_{ij}^2 - n\bar{X}_{..}^2\]
Can you do the algebra?
Now, let's consider the treatment sum of squares, which we'll denote SS(T). Because we want the treatment sum of squares to quantify the variation between the treatment groups, it makes sense that SS(T) would be the sum of the squared distances of the treatment means \(\bar{X}_{i.}\) to the grand mean \(\bar{X}_{..}\), with each squared distance weighted by the group's sample size \(n_i\). That is:

\[SS(T) = \sum_{i=1}^{m} n_i\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2\]
Again, with just a little bit of algebraic work, the treatment sum of squares can be alternatively calculated as:

\[SS(T) = \sum_{i=1}^{m} n_i\bar{X}_{i.}^2 - n\bar{X}_{..}^2\]
Can you do the algebra?
Finally, let's consider the error sum of squares, which we'll denote SS(E). Because we want the error sum of squares to quantify the variation in the data not otherwise explained by the treatment, it makes sense that SS(E) would be the sum of the squared distances of the observations \(X_{ij}\) to their treatment means \(\bar{X}_{i.}\). That is:

\[SS(E) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{i.}\right)^2\]
As we'll see in just one short minute, though, the easiest way to calculate the error sum of squares is to subtract the treatment sum of squares from the total sum of squares. That is:

\[SS(E) = SS(TO) - SS(T)\]
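To see the shortcut in action, here is a small Python sketch using made-up scores (three hypothetical groups of five observations, not the learning study's actual data) that computes all three sums of squares from their definitions and confirms that the subtraction gives the same SS(E):

```python
import numpy as np

# Hypothetical data for illustration only: m = 3 groups with 5 observations each
groups = [
    np.array([42.0, 45.0, 41.0, 47.0, 44.0]),
    np.array([65.0, 62.0, 68.0, 66.0, 63.0]),
    np.array([55.0, 58.0, 54.0, 57.0, 56.0]),
]

all_obs = np.concatenate(groups)
grand_mean = all_obs.mean()                               # the grand mean

# SS(TO): squared distances of every observation to the grand mean
ss_to = np.sum((all_obs - grand_mean) ** 2)

# SS(T): weighted squared distances of the group means to the grand mean
ss_t = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

# SS(E): squared distances of the observations to their own group means
ss_e = sum(np.sum((g - g.mean()) ** 2) for g in groups)

print(ss_to, ss_t + ss_e)   # the two totals agree
print(ss_e, ss_to - ss_t)   # and so does the subtraction shortcut
```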
Okay, now, do you remember that part about wanting to break down the total variation SS(TO) into a component due to the treatment, SS(T), and a component due to random error, SS(E)? Well, some simple algebra leads us to this:

\[SS(TO) = SS(T) + SS(E)\]
and hence why we get the simple way of calculating the error sum of squares. At any rate, here's the simple algebra:
Well, okay, so the proof does involve a little trick of adding 0, in a special way, to the total sum of squares:

\[SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{..}\right)^2 = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left[\left(X_{ij} - \bar{X}_{i.}\right) + \left(\bar{X}_{i.} - \bar{X}_{..}\right)\right]^2\]
Then, squaring the term in brackets, as well as distributing the summation signs, we get:

\[SS(TO) = \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{i.}\right)^2 + 2\sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{i.}\right)\left(\bar{X}_{i.} - \bar{X}_{..}\right) + \sum_{i=1}^{m}\sum_{j=1}^{n_i}\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2\]
Now, it's just a matter of recognizing each of the terms:

- The first term is, by definition, the error sum of squares SS(E).
- The middle term is 0, because for each group \(i\) the deviations from the group mean sum to zero, \(\sum_{j=1}^{n_i}\left(X_{ij} - \bar{X}_{i.}\right) = 0\), and the factor \(\left(\bar{X}_{i.} - \bar{X}_{..}\right)\) doesn't depend on \(j\).
- The last term is the treatment sum of squares SS(T), because \(\left(\bar{X}_{i.} - \bar{X}_{..}\right)^2\) doesn't depend on \(j\), so summing it over \(j = 1, \ldots, n_i\) just multiplies it by \(n_i\).
That is, we've shown that:

\[SS(TO) = SS(E) + SS(T)\]

which is exactly the decomposition SS(TO) = SS(T) + SS(E) that we claimed above.
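It's perhaps worth noting, although the algebra above concerns only the sums of squares, that the degrees of freedom in the ANOVA table partition in exactly the same way:

\[(n - 1) = (m - 1) + (n - m),\]

which is why the DF column of the table adds up just as the SS column does.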