Tuesday, September 10, 2013

Statistics for Dummies Notes (Graphing Categorical and Numerical Data)

The most common types of data displays for categorical data are pie charts and bar graphs.

Pie Charts:  A pie chart takes categorical data and breaks it down by group, showing the percentage of individuals that fall into each group.  Because a pie chart is a circle, the "slices" that represent each group can easily be compared and contrasted.



Bar Graphs: Like a pie chart, a bar graph breaks categorical data down by group.  Unlike a pie chart, it represents these amounts with bars of different lengths.  Whereas a pie chart most often reports the amount in each group as a percentage, a bar graph uses either the number of individuals in each group (called the frequency) or the percentage in each group (called the relative frequency).
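To make the difference between frequency and relative frequency concrete, here is a minimal Python sketch.  The `frequency_table` helper and the pet survey data are made up for illustration; they are not from the book.

```python
from collections import Counter


def frequency_table(values):
    """Return {category: (frequency, relative_frequency)} for categorical data."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, n / total) for category, n in counts.items()}


# Hypothetical survey answers, invented for this example.
pets = ["dog", "cat", "dog", "fish", "dog", "cat", "dog", "cat"]
table = frequency_table(pets)

for category, (freq, rel) in table.items():
    # A bar graph could use either number as the bar length;
    # a pie chart would normally use the relative frequency.
    print(f"{category}: frequency={freq}, relative frequency={rel:.1%}")
```

The relative frequencies always add up to 100%, which is why they work as slices of a pie.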



The most common types of data displays for numerical data are time charts, histograms, and boxplots.

Histograms: A special graph applied to data broken down into numerically ordered groups; for example, age groups such as 10-20, 21-30, 31-40, and so on.  The bars in a histogram touch each other, as opposed to a bar graph for categorical data, where the bars are separated and represent categories that don't have a particular order.  The height of each bar of a histogram represents either the number of individuals (called the frequency) in each group or the percentage of individuals (the relative frequency) in each group.  Each individual in the data set falls into exactly one bar.

A histogram provides a snapshot of all the data broken down into numerically ordered groups, making it a quick way to get the big picture of the data, in particular, its general shape.

A histogram tells you three main features of numerical data:

- How the data are distributed among the groups (shape of the data)
- The amount of variability in the data (spread)
- Where the center of the data is (statisticians use different measures)
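The grouping step behind a histogram can be sketched in a few lines of Python.  This is only an illustration: the `histogram_counts` helper, the ages, and the choice of equal-width groups 10-19, 20-29, and so on are my own assumptions, not the book's.

```python
def histogram_counts(data, start, width, n_bins):
    """Count values in n_bins equal-width groups beginning at `start`.

    Group i covers start + i*width <= x < start + (i+1)*width, so every
    value in range lands in exactly one group.
    """
    counts = [0] * n_bins
    for x in data:
        i = int((x - start) // width)
        if not 0 <= i < n_bins:
            raise ValueError(f"{x} is outside the histogram range")
        counts[i] += 1
    return counts


# Hypothetical ages, invented for this example.
ages = [12, 15, 19, 22, 25, 27, 28, 31, 34, 38, 45, 47]
counts = histogram_counts(ages, start=10, width=10, n_bins=4)

for i, n in enumerate(counts):
    low = 10 + 10 * i
    # Text-mode bars: the bar length is the frequency of the group.
    print(f"{low}-{low + 9}: {'#' * n}  frequency={n}, relative={n / len(ages):.0%}")
```

Reading the bars from top to bottom gives a rough feel for the shape, spread, and center of the ages.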



Boxplot: A one-dimensional graph of numerical data based on the five-number summary, which includes the minimum value, the 25th percentile (Q1), the median, the 75th percentile (Q3), and the maximum value.  In essence, these five descriptive statistics divide the data set into four parts; each part contains roughly 25% of the data.
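As a rough sketch, the five-number summary can be computed with Python's standard `statistics` module.  Textbooks and software differ slightly in how they compute quartiles; the "inclusive" method used here is my assumption and may not match the book's hand calculation exactly.  The `five_number_summary` helper and the data are made up for illustration.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for numerical data."""
    # quantiles(n=4) returns the three cut points that split the data
    # into four parts; the quartile convention is an assumption.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


# Hypothetical data set, invented for this example.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
summary = five_number_summary(values)
print(summary)
```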

To make a boxplot:

1.  Find the five-number summary of your data set.
2.  Create a vertical (or horizontal) number line whose scale includes the numbers in the five-number summary and uses appropriate units of equal distance from each other.
3.  Mark the location of each number in the five-number summary just above the number line (for a horizontal boxplot) or just to the right of the number line (for a vertical boxplot).
4.  Draw a box around the marks for the 25th and 75th percentile.
5.  Draw a line in the box where the median is located.
6.  Determine whether or not outliers are present.
To make this determination, calculate the IQR (Q3 minus Q1), then multiply it by 1.5.  Add this amount to the value of Q3 and subtract it from the value of Q1.  This gives you a wider boundary around the median than the box does.  Any data points that fall outside this boundary are considered outliers.

7.  If there are no outliers, draw lines from the upper and lower edges of the box out to the minimum and maximum values in the data set.
8.  If there are outliers, indicate their location on the boxplot with asterisks (*).  Instead of drawing a line from the edge of the box all the way to the most extreme outlier, stop the line at the last data value that isn't an outlier.
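The outlier check in step 6 and the line endpoints in steps 7 and 8 can be sketched in Python as follows.  The `boxplot_stats` helper, its dictionary keys, and the data are hypothetical, and the "inclusive" quartile method is again an assumption.

```python
import statistics


def boxplot_stats(data):
    """Compute the box, the outlier boundary, the line ends, and the outliers."""
    values = sorted(data)
    # Quartile convention is an assumption; other methods give slightly
    # different Q1 and Q3 on small data sets.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    step = 1.5 * (q3 - q1)  # 1.5 times the IQR
    lower, upper = q1 - step, q3 + step  # boundary for outliers
    inside = [x for x in values if lower <= x <= upper]
    outliers = [x for x in values if x < lower or x > upper]
    return {
        "box": (q1, median, q3),
        "boundary": (lower, upper),
        # Lines stop at the last data values that are not outliers.
        "line_ends": (inside[0], inside[-1]),
        "outliers": outliers,
    }


# Hypothetical data with one unusually large value.
stats = boxplot_stats([2, 4, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this made-up data set the IQR is 4, so the boundary runs from -2 to 14; the value 30 falls outside it and would be drawn as an asterisk, while the upper line stops at 9.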



