DATA MINING
Desktop Survival Guide by Graham Williams

Histogram
The plot also includes a line plot showing the so-called density estimate, which is a more accurate display of the actual (or at least the estimated true) distribution of the data (the values of Income). It allows us to see that, rather than values across a wide range occurring frequently, there is in fact a much smaller range that occurs very frequently.
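To make the difference between the bars and the density line concrete, here is a minimal sketch in plain Python. It is not Rattle's own code: the incomes are made up for illustration, and the Gaussian kernel, the bandwidth of 2000 and the bin width of 50000 are all assumptions chosen to show the effect.

```python
import math

def histogram_counts(values, bin_width):
    """Count values per bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    counts = {}
    for v in values:
        k = int(v // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return counts

def gaussian_kde(values, bandwidth):
    """Return a function that estimates the density at x using Gaussian kernels."""
    n = len(values)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in values
        )

    return density

# Hypothetical incomes (not the book's data): most sit in a narrow band.
incomes = [21000, 22000, 22500, 23000, 23500, 24000, 24500, 48000, 95000]

counts = histogram_counts(incomes, bin_width=50000)   # assumed bin width
density = gaussian_kde(incomes, bandwidth=2000)       # assumed bandwidth

print(counts)
print(density(23000), density(40000))
```

The first bin holds eight of the nine values, so the bar alone suggests that the whole of that bin's range is common. The density estimate is far higher near 23000 than near 40000, which is the narrower, very frequent range that the line plot reveals.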
The third element of the plot is the so-called rug along the bottom of the plot. The rug is a one-dimensional plot of the data along the number line. It is useful for seeing exactly where the data points actually lie. For large collections of data with a relatively even spread of values the rug ends up being quite black, as is the case here, up to about $25,000. Above about $35,000 we can see that there is only a scattering of entities with such values. In fact, from the Summary option, using the Describe check box, we can see that the highest values are actually $361,092.60, $380,018.10, $391,436.70, $404,420.70, and $421,362.70.
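The idea of the rug, and of looking up the largest values directly, can be sketched in a few lines of plain Python. This is only an illustration: the function names text_rug and largest are hypothetical, the values are made up, and a text-mode rug is a crude stand-in for the one drawn under the plot.

```python
def text_rug(values, lo, hi, width=60):
    """Draw a one-line rug: '|' in every column that holds at least one value."""
    cells = [" "] * width
    for v in values:
        if lo <= v <= hi:
            # Map the value to a column; clamp so that v == hi stays in range.
            col = min(int((v - lo) * width / (hi - lo)), width - 1)
            cells[col] = "|"
    return "".join(cells)

def largest(values, n=5):
    """Return the n largest values in ascending order."""
    return sorted(values)[-n:]

# Hypothetical values (not the book's data): a dense cluster and one outlier.
values = [1000, 2000, 2500, 3000, 90000]

rug = text_rug(values, lo=0, hi=100000, width=10)
top_two = largest(values, n=2)

print("[" + rug + "]")
print(top_two)
```

Where many values fall into the same column the rug shows a solid mark, while an isolated value far to the right stands alone. Listing the largest values, as the Describe summary does, then tells us exactly what those isolated marks are.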
Copyright © 2004-2008 Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.