DATA MINING
Desktop Survival Guide by Graham Williams
Basics
Use the printcp function to view the performance of the model.
> printcp(wine.rpart)

Classification tree:
rpart(formula = Type ~ ., data = wine)

Variables actually used in tree construction:
[1] Dilution   Flavanoids Hue        Proline

Root node error: 107/178 = 0.60112

n= 178

        CP nsplit rel error  xerror     xstd
1 0.495327      0   1.00000 1.00000 0.061056
2 0.317757      1   0.50467 0.47664 0.056376
3 0.056075      2   0.18692 0.28037 0.046676
4 0.028037      3   0.13084 0.23364 0.043323
5 0.010000      4   0.10280 0.21495 0.041825
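The rel error column is scaled so that the root node has an error of 1, so multiplying it by the root node error gives the error rate on the training data. For the table above, 0.10280 x 107 is approximately 11, so the final tree misclassifies about 11 of the 178 training observations. The sketch below checks this relationship on R's built-in iris data, which is an assumption made only so that the example runs without the wine dataset; the object names (iris.rpart, root.err and so on) are illustrative.

```r
library(rpart)

# Assumption: iris stands in for the wine data so the example is self-contained.
iris.rpart <- rpart(Species ~ ., data = iris)

# Root node error: misclassified cases at the root divided by the number of cases.
root.err <- iris.rpart$frame$dev[1] / iris.rpart$frame$n[1]

# The last row of the CP table describes the final tree.
cp.table <- iris.rpart$cptable
final.rel.err <- cp.table[nrow(cp.table), "rel error"]

# Rescale the relative error to an absolute training error rate.
train.err <- unname(final.rel.err * root.err)

# Compare with the error rate computed directly from the predicted classes.
pred <- predict(iris.rpart, iris, type = "class")
direct.err <- mean(pred != iris$Species)

print(c(from.cptable = train.err, direct = direct.err))
```

The xerror column is on the same scale but comes from cross-validation, so it can change from run to run unless the random seed is fixed.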
The predict function applies the model to data. The data must contain
every variable that appeared in the formula used to build the model,
not only the variables that the tree actually splits on. If any are
missing, an error is generated. This is a common problem when applying
the model to a new dataset that lacks some of the original variables
but does contain the variables you are interested in.
> cols <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,cols])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found
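One way to diagnose this kind of failure before calling predict is to compare the predictors recorded in the model's terms with the columns of the new data. The following is a minimal sketch, assuming the built-in iris data in place of wine; newdata, needed and missing.vars are hypothetical names chosen for illustration.

```r
library(rpart)

# Assumption: iris stands in for the wine data so the example is self-contained.
iris.rpart <- rpart(Species ~ ., data = iris)

# A new dataset holding only the two petal measurements.
newdata <- iris[, c("Petal.Length", "Petal.Width")]

# Predictor names from the model's stored terms, with the response dropped.
needed <- all.vars(delete.response(iris.rpart$terms))

# Variables the formula expects but the new data does not supply.
missing.vars <- setdiff(needed, names(newdata))
print(missing.vars)
```

An empty result would suggest that the new data supplies every variable named in the formula.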
Fix this by rebuilding the model using only the variables that the
tree actually uses:
> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline,
                      data=wine)
> predict(wine.rpart, wine[,cols])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
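The list of variables does not have to be typed by hand. As a sketch, again assuming the built-in iris data rather than wine, the split variables can be read from the frame component of the fitted model and turned into a formula with reformulate. The names full.rpart, small.rpart and used are illustrative. The refitted tree is not guaranteed to match the original in every case, because surrogate splits on the dropped variables are no longer available.

```r
library(rpart)

# Assumption: iris stands in for the wine data so the example is self-contained.
full.rpart <- rpart(Species ~ ., data = iris)

# Variables actually used in splits; leaf rows are labelled "<leaf>".
used <- setdiff(unique(as.character(full.rpart$frame$var)), "<leaf>")

# Build Species ~ <used variables> and refit on those variables only.
small.formula <- reformulate(used, response = "Species")
small.rpart <- rpart(small.formula, data = iris)

# Prediction now needs only the used variables.
probs <- predict(small.rpart, iris[, used, drop = FALSE])
print(head(probs))
```

Each row of the result holds the class probabilities for one observation, so the rows sum to 1.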
Display a confusion matrix, with the predicted classes as rows and the
actual classes as columns.
> table(predict(wine.rpart, wine, type="class"), wine$Type)

     1  2  3
  1 57  2  0
  2  2 66  4
  3  0  3 44
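Summary figures can be computed from the confusion matrix. The sketch below re-enters the counts shown above by hand, so that it runs without the wine data, and derives the overall accuracy and the proportion of each actual class that was predicted correctly. The names cm, accuracy and per.class are illustrative. Because the model is evaluated on the same data used to build it, these figures probably overstate performance on new data.

```r
# Confusion matrix copied from the output above:
# rows are predicted classes, columns are actual classes.
cm <- matrix(c(57,  2,  0,
                2, 66,  4,
                0,  3, 44),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("1", "2", "3"),
                             actual = c("1", "2", "3")))

# Correct predictions lie on the diagonal.
accuracy <- sum(diag(cm)) / sum(cm)

# Share of each actual class that was predicted correctly.
per.class <- diag(cm) / colSums(cm)

print(round(accuracy, 4))
print(round(per.class, 4))
```

Here 167 of the 178 observations fall on the diagonal, leaving 11 misclassified.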
Copyright © 2004-2008 Togaware Pty Ltd