Monday, September 9, 2013

Statistics for Dummies (Notes on Number Crunching Basics) - Median/Standard Deviation/Percentiles

A descriptive statistic (statistic for short) is a number that summarizes or describes some characteristic of a set of data.

How to find the Median:

1.  Order the numbers from smallest to largest
2.  If the data set contains an odd number of numbers, choose the one that is exactly in the middle.
3.  If the data set contains an even number of numbers, take the two numbers that appear in the middle and average them to find the median.
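The three steps above can be sketched in Python roughly as follows. The function name `median` and the sample lists are just illustrative choices, not part of the original notes.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median of an empty data set is undefined")
    ordered = sorted(values)          # Step 1: order smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Step 2: odd count, take the middle value
        return ordered[mid]
    # Step 3: even count, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([9, 2, 5]))      # 5
print(median([7, 1, 5, 3]))   # 4.0
```

Python's standard library also ships `statistics.median`, which follows the same rule.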

Standard Deviation: Measures how concentrated the data are around the mean.  The more concentrated, the smaller the standard deviation.  Often reported in parentheses: (s=2.68)

The steps for calculating Standard Deviation are:

1.  Find the average of the data set
2.  Take each number in the data set and subtract the mean from it
3.  Square each of the differences
4.  Add up all the results from 3 to get the sum of squares
5.  Divide the sum of squares by the number of numbers in the data set minus 1 (that is, n - 1)
6.  Take the square root of the result from 5

Example:

To find the standard deviation of 1, 3, 5, 7:

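1.  The mean is 16/4 = 4
2.  The differences from the mean are -3, -1, 1, 3
3.  The squared differences are 9, 1, 1, 9
4.  The sum of squares is 20
5.  20/3 = 6.67
6.  The square root of 6.67 is 2.58, so s = 2.58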
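As a rough check on the arithmetic, the six steps can be written out in Python. This is a minimal sketch; the name `sample_std` is made up for illustration, and it divides by n - 1 exactly as in step 5.

```python
import math


def sample_std(values):
    """Standard deviation using the n - 1 divisor from step 5."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two numbers")
    mean = sum(values) / n                       # Step 1: the average
    differences = [x - mean for x in values]     # Step 2: subtract the mean
    squares = [d * d for d in differences]       # Step 3: square each difference
    sum_of_squares = sum(squares)                # Step 4: add them up
    variance = sum_of_squares / (n - 1)          # Step 5: divide by n - 1
    return math.sqrt(variance)                   # Step 6: square root


print(round(sample_std([1, 3, 5, 7]), 2))   # 2.58
```

The standard library's `statistics.stdev` uses the same n - 1 divisor, so it should agree with this function.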

In statistics, the 68-95-99.7 rule, also known as the three-sigma rule or empirical rule, states that nearly all values lie within 3 standard deviations of the mean in a normal distribution.

About 68.27% of the values lie within 1 standard deviation of the mean.  Similarly, about 95.45% of the values lie within 2 standard deviations of the mean, and about 99.73% lie within 3.
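These percentages can be reproduced with a few lines of Python. The sketch below relies on a standard fact that is not spelled out in these notes: for a normal distribution, the fraction of values within k standard deviations of the mean equals erf(k / sqrt(2)). The helper name `fraction_within` is just for illustration.

```python
import math


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(100 * fraction_within(k), 2))
# 1 68.27
# 2 95.45
# 3 99.73
```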

Percentiles: Measures where you stand compared to the rest of the herd.  The kth percentile is a number in the data set that splits the data into two pieces.  The lower piece contains k percent of the data, and the upper piece contains 100-k percent.  The median is the 50th percentile.

To calculate the kth percentile (where k is any number between one and one hundred), do the following:

1.  Order all the numbers in the data set from smallest to largest
2.  Multiply k percent times the total number of numbers, n.
3a.  If your result from Step 2 is a whole number, go to Step 4.  If the result from Step 2 is not a whole number, round it up to the nearest whole number and go to Step 3b.
3b.  Count the numbers in your data set from left to right (from the smallest to the largest number) until you reach the value indicated in Step 3a.  The corresponding value in your data set is the kth percentile.
4.  Count the numbers in your data set from left to right until you reach the one indicated by Step 2.  The kth percentile is the average of that corresponding value in your data set and the value that directly follows it.

For example, suppose you have 25 test scores, and in order from lowest to highest they look like this:
43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77, 78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99

To find the 90th percentile for these ordered scores:
1.  Multiply 90% times the total number of scores: 0.9 * 25 = 22.5.  This is not a whole number, so round up to 23.
2.  Counting from left to right, you go until you find the 23rd number in the data set, which is 98.  The 90th percentile is 98.
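The percentile procedure can be sketched in Python as below. A few assumptions of mine, not from the notes: the function name `percentile` is made up, k is taken to be a whole number, and k must be strictly less than 100, because at k = 100 the averaging step would need a value after the last one. Other textbooks and libraries define percentiles slightly differently, so results may not match theirs.

```python
def percentile(values, k):
    """kth percentile using the round-up / average rule described above."""
    if not values:
        raise ValueError("data set is empty")
    if not 0 < k < 100:
        raise ValueError("this sketch only handles 0 < k < 100")
    ordered = sorted(values)                 # Step 1: order the data
    n = len(ordered)
    product = k * n                          # Step 2, scaled by 100 to stay in integers
    if product % 100 == 0:
        # Step 4: whole number, so average that value and the one after it
        position = product // 100
        return (ordered[position - 1] + ordered[position]) / 2
    # Steps 3a/3b: not a whole number, so round up and take that value
    position = -(-product // 100)            # integer ceiling of product / 100
    return ordered[position - 1]


scores = [43, 54, 56, 61, 62, 66, 68, 69, 69, 70, 71, 72, 77,
          78, 79, 85, 87, 88, 89, 93, 95, 96, 98, 99, 99]

print(percentile(scores, 90))   # 98
print(percentile(scores, 20))   # 64.0
```

For the 20th percentile, 0.2 * 25 = 5 is a whole number, so the result is the average of the 5th and 6th scores, (62 + 66) / 2 = 64.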

* A percentile is not a percent; a percentile is a number (or the average of two numbers) in the data set that marks a certain percentage of the way through the data.  Suppose your score on the GRE was reported to be the 80th percentile.  This doesn't mean you answered 80% of the questions correctly.  It means that 80% of the students' scores were lower than yours and 20% higher.








