Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

R



> library(rpart)
> wine.rpart <- rpart(Type ~ ., data=wine)

You can find which terminal node (leaf) each observation in the training dataset ends up in with the where component of the object. Each value is a row index into the frame component of the object, not a node number.

> wine.rpart$where
  1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
  3   3   3   3   6   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3

 21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3   3

[...]

161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178
  9   8   9   9   9   9   9   9   9   9   9   9   9   9   9   4   4   9
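
To turn the values in where into the node numbers that print displays, look up the row names of the frame component. The following is a minimal sketch of that lookup. It assumes the built-in iris dataset as a stand-in for the wine data (which does not ship with R), and the names iris.rpart, leaf.rows and leaf.nodes are illustrative only.

```r
library(rpart)

# Assumption: iris stands in for the wine data, which is not bundled with R.
iris.rpart <- rpart(Species ~ ., data=iris)

# For each training observation, where holds a row index into iris.rpart$frame.
leaf.rows <- iris.rpart$where

# The row names of the frame are the node numbers shown by print(iris.rpart).
leaf.nodes <- as.integer(rownames(iris.rpart$frame))[leaf.rows]

# Count the training observations that fall into each leaf node.
table(leaf.nodes)
```

Every row of the frame picked out by where has var equal to "<leaf>", which is a quick way to confirm that the lookup points at terminal nodes.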

The predict function applies the model to a dataset. That dataset must contain every variable on which the model was built. If any is missing, an error is generated. This is a common problem when applying the model to a new dataset that lacks some of the original variables, even though it contains the variables you are interested in.

> vars <- c("Type", "Dilution", "Flavanoids", "Hue", "Proline")
> predict(wine.rpart, wine[,vars])
Error in eval(expr, envir, enclos) : Object "Alcohol" not found

One fix is to rebuild the model using only the variables that the new dataset contains:

> wine.rpart <- rpart(Type ~ Dilution + Flavanoids + Hue + Proline, data=wine)
> predict(wine.rpart, wine[,vars])
             1          2          3
1   0.96610169 0.03389831 0.00000000
2   0.96610169 0.03389831 0.00000000
[...]
70  0.03076923 0.93846154 0.03076923
71  0.00000000 0.25000000 0.75000000
[...]
177 0.00000000 0.25000000 0.75000000
178 0.00000000 0.02564103 0.97435897
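
You can also check for missing variables before calling predict, rather than waiting for the error. The sketch below is one possible way to do that, not the only one. It again assumes iris as a stand-in for the wine data, and newdata, needed, missing.vars and iris.rpart2 are hypothetical names. It also shows type="class", which returns the predicted class for each observation instead of the matrix of class probabilities.

```r
library(rpart)

# Assumption: iris stands in for the wine data, which is not bundled with R.
iris.rpart <- rpart(Species ~ ., data=iris)

# Input variables recorded in the model's terms. The term labels equal the
# variable names here because the formula has no interactions or transforms.
needed <- attr(iris.rpart$terms, "term.labels")

# A hypothetical new dataset that lacks one of those variables.
newdata <- iris[, c("Sepal.Length", "Sepal.Width", "Petal.Length")]

# Variables the model expects but the new dataset does not supply.
missing.vars <- setdiff(needed, names(newdata))
missing.vars

# Rebuild the model on the variables that are available, then predict.
avail <- intersect(needed, names(newdata))
f <- reformulate(avail, response="Species")
iris.rpart2 <- rpart(f, data=iris)

# type="class" gives a factor of predicted classes, one per observation.
pred <- predict(iris.rpart2, newdata, type="class")
head(pred)
```

Rebuilding on the reduced set of variables changes the model, so its predictions need not match those of the original tree.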



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010