Data Mining Survivor: Random Forests

DATA MINING
Desktop Survival Guide
by Graham Williams

Tuning Parameters

For the Two Class paradigm of Rattle, the random forest model builder builds a classification model. Each tree in the resulting ensemble model is used to predict the class of an observation, and the proportion of trees predicting the positive class is taken as the probability that the observation belongs to the positive class.
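The voting scheme described above can be illustrated with a minimal sketch. This is not Rattle's code: the function name `positive_class_probability`, the example votes, and the 0.5 decision threshold are all assumptions made for illustration.

```python
def positive_class_probability(tree_votes):
    """Return the proportion of trees that predicted the positive class.

    tree_votes is a sequence of booleans, one per tree in the ensemble,
    where True means that tree predicted the positive class.
    """
    if not tree_votes:
        raise ValueError("need at least one tree vote")
    positive = sum(1 for vote in tree_votes if vote)
    return positive / len(tree_votes)


# Hypothetical votes for one observation from a 500-tree forest:
# 350 trees predict the positive class, 150 predict the negative class.
votes = [True] * 350 + [False] * 150

probability = positive_class_probability(votes)

# Assumed decision rule for illustration: call the observation positive
# when at least half of the trees vote positive.
predicted_positive = probability >= 0.5

print(probability, predicted_positive)
```

With these made-up votes, the probability is simply 350 out of 500 trees.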

Rattle provides access to just three parameters (Figure 13.1) for tuning the models built by the random forest model builder: the number of trees, the sample size, and the number of variables. As is generally the case with Rattle, the defaults are a very good starting point! The defaults are to build 500 trees, to not do any sampling of the training dataset, and to set the number of variables to choose from to the square root of the number of input variables available. In Figure 13.1 we see that the number of variables has automatically been set to 3 for the audit_auto.csv dataset, which has 9 input variables.
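The arithmetic behind the default number of variables can be sketched as follows. This is an illustration only: the function `default_num_variables` and the dictionary keys are hypothetical names, and rounding the square root down to a whole number is an assumption (the text gives only the exact case of 9 inputs yielding 3).

```python
import math


def default_num_variables(num_inputs):
    """Default number of variables to choose from.

    Assumption: the square root is rounded down to a whole number,
    with a minimum of 1 so that at least one variable is considered.
    """
    if num_inputs < 1:
        raise ValueError("need at least one input variable")
    return max(1, math.isqrt(num_inputs))


# Defaults as described in the text, stored under hypothetical key names.
# None stands for "no sampling of the training dataset".
defaults = {
    "trees": 500,
    "sample_size": None,
    "variables": default_num_variables(9),  # 9 inputs in audit_auto.csv
}

print(defaults)
```

For the 9 input variables of audit_auto.csv this gives 3, matching the value shown in Figure 13.1.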


Brought to you by Togaware. This page generated: Sunday, 22 August 2010