Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Boxplot

A boxplot (also known as a box-and-whisker plot) provides a graphical overview of how data is distributed over the number line. R's boxplot function displays a graphical representation of the textual summary of the data, which makes any skewness in the distribution clear.

A boxplot shows the median (the second quartile or the 50th percentile) as the thicker line within the box ($Ash=2.36$). The top and bottom extents of the box ($2.558$ and $2.210$ respectively) identify the upper quartile (the third quartile or the 75th percentile) and the lower quartile (the first quartile or the 25th percentile). The height of the box is known as the interquartile range ($2.558-2.210=0.348$). The dashed lines (the whiskers) extend to the maximum and minimum data points that lie no more than $1.5$ times the interquartile range beyond the top and bottom of the box. Outliers (points further than $1.5$ times the interquartile range from the box) are then plotted individually (at $3.23$, $3.22$, and $1.36$). Our plot here adds faint horizontal lines to make the various values easier to read off.
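The arithmetic behind the box, the whiskers, and the outliers can be sketched by hand. This is only an illustration: the vector x holds made-up values (not the wine data), and the names lo_fence and hi_fence are our own. It uses fivenum, whose hinges are what R's boxplot draws as the box edges.

```r
# Hypothetical sample values, NOT the wine data discussed in the text.
x <- c(1.4, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 3.3)

# fivenum() returns: minimum, lower hinge, median, upper hinge, maximum.
five  <- fivenum(x)
lower <- five[2]            # bottom of the box
upper <- five[4]            # top of the box
iqr   <- upper - lower      # height of the box

# Fences sit 1.5 times the box height beyond the box edges.
lo_fence <- lower - 1.5 * iqr
hi_fence <- upper + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences.
whiskers <- range(x[x >= lo_fence & x <= hi_fence])

# Anything outside the fences is plotted individually as an outlier.
outliers <- x[x < lo_fence | x > hi_fence]

print(whiskers)
print(outliers)
```

Note that the fences are measured from the box edges rather than from the median, so a skewed box produces whiskers of unequal reach.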




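# Load the wine dataset and make its columns available by name.
load("wine.Rdata")
attach(wine)
# Draw the boxplot, then overlay faint horizontal guide lines.
boxplot(Ash, xlab="Ash")
abline(h=seq(1.4, 3.2, 0.1), col="lightgray", lty="dotted")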

http://rattle.togaware.com/code/rplot-wine-boxplot-single.R
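Rather than reading the values off the guide lines by eye, the same numbers can be retrieved directly. The following is a minimal sketch using boxplot.stats from base R. The vector v is a made-up stand-in so that the example runs without wine.Rdata; with the wine data loaded, wine$Ash could be passed instead.

```r
# Hypothetical stand-in values; replace with wine$Ash if the data is loaded.
v <- c(0.5, 2.0, 2.2, 2.3, 2.4, 2.5, 2.6, 2.8, 3.0, 4.9)

# boxplot.stats() computes what boxplot() draws (coef = 1.5 by default).
s <- boxplot.stats(v)

# Lower whisker, lower hinge, median, upper hinge, upper whisker.
print(s$stats)

# Points that would be plotted individually as outliers.
print(s$out)
```

Calling boxplot(v, plot=FALSE) returns the same stats and out components without drawing anything.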



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010