Desktop Survival Guide
by Graham Williams
In an earlier chapter we identified that a significant amount of the effort in a data mining project is spent processing our data into a form suitable for mining. That effort should not be underestimated.
Once we have processed our data we are ready to build a model--and with Rattle we can build the model with just a few mouse clicks. Using a sample dataset that someone else has already prepared for use, in Rattle we simply:
The data comes from a weather monitoring station located in Canberra, Australia. Each observation is a summary of the weather conditions on a particular day. It has been processed to include a target variable which indicates whether it rained the day following the particular observation. Using this historic data we have built a model to predict whether it will rain tomorrow. Weather data is commonly available, and you might be able to build a similar model based on data from your own region.
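To make the idea of the target variable concrete, the following is a minimal sketch (in Python, not Rattle) of how a "rain tomorrow" label might be derived from daily observations. The record layout, the field names, the 1.0 mm threshold, and the sample values are all assumptions made for illustration; the actual dataset may have been prepared differently.

```python
from dataclasses import dataclass


@dataclass
class DailyObservation:
    date: str           # ISO date of the observation (hypothetical field)
    rainfall_mm: float  # rain recorded on that day (hypothetical field)


def add_rain_tomorrow(days, threshold_mm=1.0):
    """Pair each day with a yes/no label describing the following day.

    Assumes `days` is sorted by date with no missing days.
    The threshold is an illustrative assumption, not taken from the source.
    Returns (observation, label) pairs; the label is None for the last
    day because its following day has not been observed.
    """
    labelled = []
    for today, tomorrow in zip(days, days[1:]):
        labelled.append((today, tomorrow.rainfall_mm > threshold_mm))
    if days:
        labelled.append((days[-1], None))
    return labelled


# Made-up values for illustration only, not real Canberra readings.
days = [
    DailyObservation("2010-01-01", 0.0),
    DailyObservation("2010-01-02", 3.6),
    DailyObservation("2010-01-03", 0.2),
    DailyObservation("2010-01-04", 12.4),
]
labels = [label for _, label in add_rain_tomorrow(days)]
print(labels)
```

The final observation has no label because the next day's weather is unknown; in practice such a row would be dropped before modelling, or kept aside as an observation to be scored.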
With only one or two more clicks, further models can be built. A further few clicks will have an evaluation chart displaying the performance of the model. Then a click or two more will have the model applied to a new dataset to generate scores for new observations.
Now to the details. We will continue to use the Rattle GUI alongside some simple command line work (which is not strictly necessary for using Rattle, but becomes useful as we develop our data mining capability). We will load data into Rattle and explain the model that we have built. We will then build a second model and compare the performance of the two. Finally, we will apply a model to a new dataset to provide scores for a collection of new observations (i.e., predictions of the likelihood of it raining tomorrow).
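The workflow just outlined (build a model, build a second, compare them, then score new observations) can be sketched outside Rattle in a few lines of Python. This is only an illustration under stated assumptions: the two models here are a majority-class baseline and a single-split "decision stump", chosen for brevity rather than because Rattle builds them this way, and the single input variable (afternoon humidity) and all data values are made up.

```python
from collections import Counter


def fit_majority(train):
    """Baseline: always predict the most common label in the training data."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def fit_stump(train):
    """One-split model: predict rain when the input exceeds a learned cut.

    Assumes the training data holds at least two distinct input values.
    """
    values = sorted({x for x, _ in train})
    # Candidate cuts lie halfway between adjacent distinct values.
    cuts = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def training_accuracy(cut):
        return sum((x > cut) == label for x, label in train) / len(train)

    best = max(cuts, key=training_accuracy)
    return lambda x: x > best


def accuracy(model, data):
    """Fraction of observations whose prediction matches the known label."""
    return sum(model(x) == label for x, label in data) / len(data)


# Hypothetical (humidity %, rain_tomorrow) pairs, invented for illustration.
train = [(30, False), (35, False), (42, False), (48, False),
         (55, False), (60, True), (71, True), (80, True)]
test = [(33, False), (58, True), (65, True),
        (50, False), (77, True), (62, False)]

baseline = fit_majority(train)
stump = fit_stump(train)

# Compare the two models on data that was not used to build them.
baseline_acc = accuracy(baseline, test)
stump_acc = accuracy(stump, test)

# Apply the better model to new, unlabelled observations.
new_observations = [40, 90]
scores = [stump(x) for x in new_observations]
print(baseline_acc, stump_acc, scores)
```

Comparing the models on held-out data rather than on the training data gives a fairer picture of how each is likely to perform on new observations.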
Copyright © 2004-2010 Togaware Pty Ltd