Wednesday, September 18, 2013

SA With Excel For Dummies (Notes: Standardizing Scores STANDARDIZE)

A number in isolation doesn't really tell a story.  In order to fully understand what a number means, you have to consider the process that produced it.  In order to compare one number to another, they both have to be on the same scale.

In some cases, like currency conversion, it's easy to figure out a standard.  In others, like temperature conversion or conversion into the metric system, a formula guides you.

When it's not all laid out for you, you can use the mean and the standard deviation to standardize scores that come from different processes.  The idea is to take a set of scores and use its mean a a zero-point and its standard deviation as a unit of measure.  Then you compare the deviation of each score from the mean to the standard deviation.  You're asking, "How big is a particular deviation relative to (something like) an average of all the deviations?"

To do this, you divide the score's deviation by the standard deviation.  In effect, you transform the score inotoanother kind of score.  The transformed score is called a standard score or a Z-Score.

Characteristics of Z-Scores

A z-score can be positive, negative or zero.  A negative z-score represents a score that's less than the mean and a positive z-score represents a score that's greater than the mean.  When the score is equal to the mean, the z-score is zero.

When you calculate the z-score for every score in the set, the mean of the z-scores is 0, and the standard deviation is 1.

After you do this for several sets of scores, you can legitimately compare a score from one set to a score from another. If the two sets have different means and different standard deviations, comparing without standardizing is like comparing apples with oranges.

Here's an example of using a z-score:

Who is the greatest home run hitter of all-time, Barry Bonds or Babe Ruth?  One way to do this is look at each player's best season and compare the two.  Bonds hit 73 home runs in 2001, and Ruth hit 60 in 1927.  On the surface Bonds appears to be the more productive hitter.

The year 1927 was very different from 2001, however.  Baseball went through huge changes in the intervening years, and player statistics reflect those changes.  A home run was harder to hit in the 1920s than in the 2000s.

Standard scores can help decide which season was better.  To standardize I took the top 50 home run hitters of 1927 and the top 50 from 2001.  I calculated the mean and standard deviation of each group, and then turned Ruth's 60 and Bonds 73 into z-scores.

The average is 1927 is 12.68 homers with a standard deviation of 10.49.  The average from 2001 is 37.02 homers with a standard deviation of 9.64.  Although the means differ greatly, the standard deviations are pretty close.

And the z-scores?

Ruth's is (60-12.68)/10.49 = 4.51

Bond's is (73-37.02)/9.64 - 3.73

Just to show how times have changed, Gherig hit 47 home runs in 1927 finishing 2nd for a z-score of 3.27.  In 2001, 47 home runs amounted ot a z-score of 1.04.

Another example:

One practical application of z-scores is the assignment of grades to exam scores.  Based on percentage scoring, instructors traditionally evaluate a score of 90 points or higher as an A, 80-89 as a B and so on.  Then they average scores from several exams together to assign a course grade.

Is that fair?  Just as a peso from Argentina is worth more than a peso from Chile, and a home run was harder to hit in 1927 than in 2001, is a point on one exam worth the same as a point on another?  Like Peso, isn't that just word magic?

Indeed it is.  A point on a difficult exam is, by definition, harder to come by than a point on an easy exam.  Because points might not mean the same thing from one exam to another, the fairest thing to do is convert scores from each exam into z-scores before averaging them.  That way, you're averaging numbers on a level playing field.  For example, on an exam where the mean is 65 and the standard deviation is 12, a score of 71 results in a z-score of .5.  On another exam, with a mean of 69, a score of 75 is equivalent to a z-score of .429.  Moral of the story: Numbers in isolation tell you very little.  You have to understand the process that produces them.

Excel's STANDARDIZE function calculates z-scores.

Pictured below is a set of exam scores along with their mean and standard devation.  I used AVERAGE and STDEVP to calculate the statistics.



































Here are the steps for calculating the Z-scores:

1.  The data are in C2:C22.  I selected D2 to hold the z-score from the score in C2.  Ultimately I'll autofill column D and line up all the z-scores next to the corresponding exam scores.

2.  From the Statistical Functions Menu, select STANDARDIZE.

3.  In the Functions Arguments dialog box, enter the appropriate values for the arguments.

First I entered the cell that holds the first exam score in to the X box (D2).  In the mean box, I entered the cell that holds the mean (C24).  It has to be in absolute reference format, so the entry is $C$24.  You can type it that way, or you select C24 and press F4.  In the Standard_Dev box, I entered the cell that holds the standard deviation (C24).  This also has to be in absolute reference format $C$24.

No comments:

Post a Comment