Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Overfitting

Overfitting is more of a problem when training on smaller datasets.

A characteristic of the random forest algorithm is that it will often overfit the training data, typically classifying almost every training observation correctly. For any model builder this may at first be a little disconcerting, since the usual expectation is that such a model will not generalise to new data. For random forests, however, this overfitting is not usually a problem. Applying the model to a test dataset will usually show that it generalises quite well and does not suffer the usual consequence of having overfit the training dataset.
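To make this pattern concrete, the following is a simplified Python sketch written for illustration only. It assumes a hypothetical toy dataset with two variables, where the class is 1 when x0 + x1 > 1 and 10% of labels are then flipped as noise. It builds a forest of 51 unpruned trees, each grown on a bootstrap sample and choosing one randomly selected variable at each split. All function names and parameter values are illustrative assumptions. Comparing accuracy on the training data with accuracy on a separately generated test set should typically show near-perfect training accuracy alongside a lower, but still reasonable, test accuracy.

```python
import random
from collections import Counter


def make_data(n, noise, rng):
    """Hypothetical toy data: class 1 when x0 + x1 > 1, then a fraction
    `noise` of the labels is flipped to simulate noisy observations."""
    X, y = [], []
    for _ in range(n):
        x = (rng.random(), rng.random())
        label = 1 if x[0] + x[1] > 1 else 0
        if rng.random() < noise:
            label = 1 - label
        X.append(x)
        y.append(label)
    return X, y


def best_split(X, y, idx, features):
    """Return (feature, threshold) minimising weighted Gini impurity, or None."""
    best, best_score, n = None, float("inf"), len(idx)
    for f in features:
        order = sorted(idx, key=lambda i: X[i][f])
        total_pos = sum(y[i] for i in order)
        left_pos = 0
        for k in range(1, n):
            left_pos += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if lo == hi:
                continue  # equal values cannot be separated
            nl, nr, right_pos = k, n - k, total_pos - left_pos
            # n_side * Gini(side) for binary labels is 2 * pos * neg / n_side
            score = (2 * left_pos * (nl - left_pos) / nl
                     + 2 * right_pos * (nr - right_pos) / nr)
            if score < best_score:
                thr = (lo + hi) / 2
                if thr >= hi:  # guard against float rounding
                    thr = lo
                best, best_score = (f, thr), score
    return best


def build_tree(X, y, idx, rng, max_features):
    """Grow an unpruned tree: keep splitting until each leaf is pure."""
    labels = [y[i] for i in idx]
    if len(set(labels)) == 1:
        return labels[0]  # pure leaf
    n_feats = len(X[0])
    feats = rng.sample(range(n_feats), max_features)  # random variable subset
    # Fall back to all variables if the random subset admits no split.
    split = best_split(X, y, idx, feats) or best_split(X, y, idx, range(n_feats))
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, thr = split
    left = [i for i in idx if X[i][f] <= thr]
    right = [i for i in idx if X[i][f] > thr]
    return (f, thr, build_tree(X, y, left, rng, max_features),
            build_tree(X, y, right, rng, max_features))


def fit_forest(X, y, n_trees, rng, max_features=1):
    """Each tree is grown on a bootstrap sample (drawn with replacement)."""
    n = len(X)
    return [build_tree(X, y, [rng.randrange(n) for _ in range(n)], rng, max_features)
            for _ in range(n_trees)]


def predict_tree(tree, x):
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


def predict_forest(forest, x):
    """Majority vote over all trees (an odd tree count avoids ties)."""
    return Counter(predict_tree(t, x) for t in forest).most_common(1)[0][0]


def accuracy(forest, X, y):
    return sum(predict_forest(forest, x) == t for x, t in zip(X, y)) / len(y)


rng = random.Random(42)
X_train, y_train = make_data(300, 0.1, rng)
X_test, y_test = make_data(500, 0.1, rng)
forest = fit_forest(X_train, y_train, n_trees=51, rng=rng)
train_acc = accuracy(forest, X_train, y_train)
test_acc = accuracy(forest, X_test, y_test)
print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
```

Because 10% of the test labels are flipped, no model can expect much more than 90% test accuracy on this data. The gap between the two figures therefore largely reflects the forest memorising noise in the training data, while the test figure indicates that it still generalises reasonably well.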



Copyright © 2004-2010 Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010