Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Building Models

In this section we present a framework within which we cast the task of data mining, which here means the task of model building. We refer to an algorithm for building a model as a model builder. Rattle supports a number of model builders, including decision tree induction, boosted decision trees, random forests, support vector machines, logistic regression, k-means, and association rules. In essence, the model builders differ in how they represent the models they build (i.e., the discovered knowledge) and in how they find (or search for) the best model within this representation.

We can think of the discovered knowledge, or the model, as being expressed as sentences in a language. We are familiar with the fact that we express ourselves using sentences in our own specific human languages (whether that be English, French, or Chinese, for example). As we know, there is an infinite number of sentences that we can construct in our human languages.

The situation is similar for the "sentences" we construct using model builders: there is generally an infinite number of possible sentences. In human language we are generally very skilled at choosing sentences from this infinite number of possibilities to best represent what we would like to communicate. And so it is with model building. The skill is to express, within the chosen language, the sentences that best capture what we are attempting to model.
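To make the idea of a language of sentences and a search for the best sentence concrete, here is a minimal sketch in Python. It is not how Rattle or any of its model builders is implemented. The toy language (a single threshold rule on one numeric variable), the names Stump, candidate_sentences, and build_model, and the data are all hypothetical and chosen only for illustration. Any real number is a valid threshold, so the language is infinite; the search considers only the midpoints between adjacent observed values, and it assumes at least two distinct values and two distinct labels.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stump:
    """One 'sentence' in the toy language: predict `left` if x <= threshold, else `right`."""
    threshold: float
    left: str
    right: str

    def predict(self, x: float) -> str:
        return self.left if x <= self.threshold else self.right


def accuracy(model: Stump, xs: Sequence[float], ys: Sequence[str]) -> float:
    """Fraction of observations that the sentence describes correctly."""
    hits = sum(model.predict(x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


def candidate_sentences(xs: Sequence[float], labels: Sequence[str]) -> List[Stump]:
    """Enumerate a finite subset of the infinite language.

    Only midpoints between adjacent distinct observed values are tried
    as thresholds (an assumption of this sketch).
    """
    values = sorted(set(xs))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return [
        Stump(t, left, right)
        for t in thresholds
        for left in labels
        for right in labels
        if left != right
    ]


def build_model(xs: Sequence[float], ys: Sequence[str]) -> Stump:
    """A toy model builder: exhaustive search for the most accurate sentence."""
    labels = sorted(set(ys))
    return max(candidate_sentences(xs, labels), key=lambda m: accuracy(m, xs, ys))


# Hypothetical toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
ys = ["low", "low", "low", "high", "high", "high"]

best = build_model(xs, ys)
print(best, accuracy(best, xs, ys))
```

In this sketch the representation is the Stump class and the search is the exhaustive scan in build_model. Changing either one gives a different model builder in the sense described above.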

We formally present this general framework. The following sections then present model builders for various tasks in the context of this framework.

FRAMEWORK GOES HERE

Copyright © 2004-2008 Togaware Pty Ltd