Variation: a kind of average of how much each number in a group differs from the group mean.
Several statistics are available for measuring variation. All of them work the same way: The larger the value of the statistic, the more the numbers differ from the mean and vice versa.
Suppose you measure the heights of a group of children and find they are 48, 48, 48, 48, and 48 inches.
Then you measure another group and find their heights are 50, 47, 52, 46, and 45 inches.
If you calculate the mean of each group, you'll find they're the same, 48 inches. Just looking at the numbers tells you the two groups of heights are different though.
One way to show the dissimilarity between the two groups is to examine the deviations in each one. Think of a "deviation" as the difference between a score and the mean of all the scores in a group.
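Here's what I mean: in the first group every deviation from the mean of 48 is 0, while in the second group the deviations are 2, -1, 4, -2, and -3.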
One way to proceed is to average the deviations. The average of the deviations is 0 in both sets of data though.
Averaging the deviations doesn't help you see a difference between the two groups because the average of deviations from the mean in any group of numbers is always zero.
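To see this with the two groups of heights, here is a minimal Python sketch. The names `group_a`, `group_b`, and `deviations` are made up for illustration.

```python
# Heights (in inches) of the two groups from the example.
group_a = [48, 48, 48, 48, 48]
group_b = [50, 47, 52, 46, 45]


def deviations(values):
    """Return each value's difference from the mean of the list."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


dev_a = deviations(group_a)
dev_b = deviations(group_b)

print(dev_b)                    # [2.0, -1.0, 4.0, -2.0, -3.0]
print(sum(dev_a) / len(dev_a))  # 0.0
print(sum(dev_b) / len(dev_b))  # 0.0
```

The positive and negative deviations in the second group cancel out, so its average deviation is zero just like the first group's.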
The joker in the deck is the negative numbers: the negative deviations cancel out the positive ones. The trick is to use something from algebra: a minus times a minus is a plus.
You multiply each deviation by itself, so none of the results is negative, and then average the results. Each product is a squared deviation, and the average of the squared deviations is the variance.
The variance in the second group is (4+1+16+4+9)/5 = 34/5 = 6.8.
The variance of the first group is 0.
So to summarize, to calculate variance:
1. Find all the deviations from the mean
2. Square the deviations
3. Add them all up and find the average
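The three steps above can be sketched in Python as follows. The function name `population_variance` is just an illustrative choice.

```python
def population_variance(values):
    """Average of the squared deviations from the mean."""
    mean = sum(values) / len(values)
    # Step 1: find all the deviations from the mean.
    devs = [v - mean for v in values]
    # Step 2: square the deviations.
    squared = [d * d for d in devs]
    # Step 3: add them all up and find the average.
    return sum(squared) / len(squared)


print(population_variance([48, 48, 48, 48, 48]))  # 0.0
print(population_variance([50, 47, 52, 46, 45]))  # 6.8
```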
Two Excel worksheet functions, VAR.P and VARPA, calculate population variance.
Start with VAR.P using the second set of data from above.
Using the VAR.P function with the data in cells 8-12, you get the result 6.8. If the range includes blank cells, VAR.P ignores them. It also ignores text and logical values in the range, which is where it differs from VARPA.
VARPA takes text and logical values into consideration and includes them in its variance calculation. If a cell contains text, VARPA sees that cell as containing a zero. If a cell contains the logical value FALSE, that's also zero. If the cell contains TRUE, that's counted as 1.
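To get a feel for how much that rule can change the answer, here is a rough Python sketch of the conversion described above. It is not Excel's implementation. The helper names `as_varpa_number` and `varpa_like`, the sample cells, and the use of `None` for a blank cell that gets skipped are all assumptions made for illustration.

```python
def as_varpa_number(cell):
    """Convert a cell the way the text describes: TRUE -> 1, FALSE -> 0, text -> 0."""
    if isinstance(cell, bool):  # check bool first, since bool is a kind of int in Python
        return 1 if cell else 0
    if isinstance(cell, str):
        return 0
    return cell


def varpa_like(cells):
    """Population variance after converting text and logical values to numbers."""
    # Assumption: None stands for a blank cell and is left out entirely.
    values = [as_varpa_number(c) for c in cells if c is not None]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


cells = [50, 47, "n/a", True, None]
print(varpa_like(cells))     # 577.25, computed from 50, 47, 0, and 1
print(varpa_like([50, 47]))  # 2.25, computed from the two numbers alone
```

A single text entry treated as zero pulls the mean far from the real heights, so the variance balloons.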
It's important to note that sample variance is a little different. If your set of numbers is a sample drawn from a large population, you're probably interested in using the variance of the sample to estimate the variance of the population. The formula above for variance doesn't quite work as an estimate of the population variance. Although the sample mean works just fine as an estimate of the population mean, this doesn't hold true for variance.
The difference in calculating the sample variance is that instead of averaging the squared deviations (Step 3 above), you add them all up and divide by the number of numbers minus 1. So in the example above it would be (4+1+16+4+9)/4 = 34/4 = 8.5.
So, if these numbers 50, 47, 52, 46, and 45 are an entire population, their variance is 6.8. If they're a sample drawn from a larger population, the best estimate of that population's variance is 8.5.
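If you want to check both figures outside Excel, Python's standard `statistics` module has a function for each: `pvariance` divides by the number of values, and `variance` divides by that number minus 1. The variable names here are only illustrative.

```python
import statistics

heights = [50, 47, 52, 46, 45]

# Treat the numbers as an entire population: divide by N.
population_var = statistics.pvariance(heights)

# Treat the numbers as a sample: divide by N - 1.
sample_var = statistics.variance(heights)

print(population_var)  # 6.8
print(sample_var)      # 8.5
```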