Monday, September 9, 2013

Statistics for Dummies (Notes on Number Crunching Basics) - Five Number Summary/Interquartile Range

Five Number Summary:

Beyond reporting a single measure of center and/or a single measure of spread, you can create a group of statistics and put them together to get a more detailed description of a data set.

The empirical rule uses the mean and standard deviation in tandem to describe a bell-shaped data set.  In the case where your data are not bell-shaped, you use a different set of statistics (based on percentiles) to describe the big picture of data.  This method involves cutting the data into four pieces (with an equal amount of data in each piece) and reporting the resulting five cutoff points that separate these pieces.  These cutoff points are represented by a set of five statistics that describe how the data are laid out.

The five numbers in a five-number summary are:

1.  The minimum (smallest) number in the data set.
2.  The 25th percentile (also known as the first quartile, or Q1).
3.  The median (50th percentile).
4.  The 75th percentile (also known as the third quartile, or Q3).
5.  The maximum (largest) number in the data set.

For example, suppose you want to find the five-number summary of the following 25 (ordered) exam scores:
43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99.

1.  The minimum is 43.
2.  Multiply .25 * 25 = 6.25 and round up to 7.  The 7th number from the left (counting up from the smallest score) is 68.
3.  Multiply .5 * 25 = 12.5 and round up to 13.  The 13th number from the left is 77.
4.  Multiply .75 * 25 = 18.75 and round up to 19.  The 19th number from the left is 89.
5.  The maximum is 99.
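The steps above can be sketched in Python.  This is a minimal sketch, not a definitive implementation: the function names percentile and five_number_summary are made up for illustration, and the branch for a position that comes out as a whole number (averaging that value with the next one) is an assumption, since the exam-score example never hits that case.  Other textbooks and software packages use slightly different percentile rules and may report different quartiles for the same data.

```python
def percentile(ordered, percent):
    """Value at the given percentile (0 < percent < 100) of an ascending list."""
    n = len(ordered)
    # percent * n / 100 is the position; divmod keeps the arithmetic in integers.
    whole, remainder = divmod(percent * n, 100)
    if remainder:
        # Position is not a whole number: round up and take that value.
        # The rounded-up 1-based position (whole + 1) is 0-based index `whole`.
        return ordered[whole]
    # Assumption (not exercised by the exam-score example): for a whole-number
    # position, average that value and the next one.
    return (ordered[whole - 1] + ordered[whole]) / 2


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    return (
        ordered[0],
        percentile(ordered, 25),
        percentile(ordered, 50),
        percentile(ordered, 75),
        ordered[-1],
    )


scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

summary = five_number_summary(scores)
print(summary)  # (43, 68, 77, 89, 99)
```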

To best interpret a five-number summary, you can use a boxplot.

Interquartile Range:

The purpose of the five-number summary is to give descriptive statistics for center, variation, and relative standing all in one shot.  The measure of center is the median, and the first quartile, median, and third quartile are measures of relative standing.

To obtain a measure of variation based on the five-number summary, you can find what's called the interquartile range (or IQR).  The IQR equals Q3 minus Q1 (that is, the 75th percentile minus the 25th percentile) and reflects the distance taken up by the innermost 50% of the data.  If the IQR is small, you know a lot of data are close to the median.  If the IQR is large, you know the data are more spread out from the median.  The IQR from the test scores data set is 89 - 68 = 21, which is fairly large seeing as how test scores only go from 1 to 100.
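As a quick check of that arithmetic, here is a small Python sketch using the exam scores and the quartile positions from the worked example (7th and 19th values, giving Q1 = 68 and Q3 = 89).  The variable names are illustrative only.  It also counts how many scores fall between the two quartiles, to show that the IQR spans roughly the middle half of the data.

```python
scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

q1 = scores[7 - 1]    # 7th value in the ordered list
q3 = scores[19 - 1]   # 19th value in the ordered list
iqr = q3 - q1

# Scores lying between Q1 and Q3, endpoints included.
inside = [s for s in scores if q1 <= s <= q3]
share = len(inside) / len(scores)

print(iqr, len(inside), share)  # 21 13 0.52
```

The share comes out slightly above one half because the count includes the quartile values themselves.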

The interquartile range is a much better measure of variation than the regular range (maximum value minus minimum value) because the IQR doesn't take outliers into account.  It focuses on the distance within the middle 50% of the data.
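To see the difference, the sketch below compares the range and the IQR on the exam scores, then again after the lowest score is swapped for a made-up outlier of 3.  The altered data set and the helper names value_range and iqr_25 are hypothetical, introduced only for this illustration; iqr_25 hard-codes the 7th and 19th positions, so it applies only to an ordered list of 25 values.

```python
def value_range(ordered):
    """Maximum minus minimum of an ascending list."""
    return ordered[-1] - ordered[0]


def iqr_25(ordered):
    """Q3 - Q1 for an ascending list of exactly 25 values (7th and 19th positions)."""
    return ordered[19 - 1] - ordered[7 - 1]


scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

# Hypothetical variant: the lowest score becomes an outlier of 3.
with_outlier = [3] + scores[1:]

print(value_range(scores), iqr_25(scores))              # 56 21
print(value_range(with_outlier), iqr_25(with_outlier))  # 96 21
```

A single extreme score stretches the range from 56 to 96, while the IQR stays at 21.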

