Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


First Contact

In Chapter [*] we noted that a significant amount of the effort in a data mining project is spent processing our data into a form suitable for mining. This effort should not be underestimated.

Once we have processed our data we are ready to build a model, and with Rattle we can build one with just a few mouse clicks. Using a sample dataset that has already been prepared for us, in Rattle we simply:

  1. Click on the Execute button;
  2. Click on Yes within the resulting popup, which offers to load the sample weather dataset;
  3. Click on the Model tab;
  4. Click on the Execute button.

The resulting decision tree model is built from a sample dataset of historic daily weather observations (curious readers can skip ahead to Figure 2.5 to see the actual decision tree).

The data comes from a weather monitoring station located in Canberra, Australia. Each observation summarises the weather conditions on a particular day. The data has been processed to include a target variable that indicates whether it rained on the day following each observation. Using this historic data we have built a model to predict whether it will rain tomorrow. Weather data is widely available, and you may be able to build a similar model using data from your own region.
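
The target variable is derived by looking one day ahead: each day's observation is labelled with what happened on the following day. The Python sketch below illustrates the idea on made-up values. The `DailyObservation` fields, the `add_rain_tomorrow` helper and the 1 mm rainfall threshold are hypothetical assumptions chosen for illustration; they are not the actual processing used to prepare Rattle's sample dataset.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DailyObservation:
    """One day's summary of weather conditions (hypothetical fields)."""
    date: str
    rainfall_mm: float
    rain_tomorrow: Optional[str] = None  # target variable, filled in below


def add_rain_tomorrow(observations: List[DailyObservation],
                      threshold_mm: float = 1.0) -> List[DailyObservation]:
    """Label each observation with whether it rained on the following day.

    Assumes the observations are sorted by date with no missing days.
    A day counts as rainy if its rainfall exceeds threshold_mm (an assumed
    cut-off). The final observation has no following day, so it is dropped
    rather than given a guessed label.
    """
    labelled = []
    # Pair each day with the next one: (day 1, day 2), (day 2, day 3), ...
    for today, tomorrow in zip(observations, observations[1:]):
        rained = tomorrow.rainfall_mm > threshold_mm
        labelled.append(DailyObservation(today.date, today.rainfall_mm,
                                         "Yes" if rained else "No"))
    return labelled


if __name__ == "__main__":
    # Made-up values, for illustration only.
    days = [DailyObservation("day 1", 0.0),
            DailyObservation("day 2", 3.6),
            DailyObservation("day 3", 0.0)]
    for obs in add_rain_tomorrow(days):
        print(obs.date, obs.rain_tomorrow)
```

Dropping the last day, rather than guessing its label, keeps every target value grounded in an actual observation; the cost is that the most recent day cannot be used for training.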

With only one or two more clicks, further models can be built. A few more clicks will display an evaluation chart showing the performance of the model. A click or two more will then apply the model to a new dataset to generate scores for new observations.

Now to the details. We will work with both the Rattle GUI and, in simple ways, the R command line. Familiarity with the command line is not strictly necessary for using Rattle, but it becomes useful as our data mining capability develops. We will load data into Rattle and explain the model that we have built. We will build a second model and compare the performance of the two models. We will then apply a model to a new dataset to provide scores for a collection of new observations (i.e., predictions of the likelihood of it raining tomorrow).

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010