Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Interacting with Rattle

The Rattle interface is built around a set of tabs that we work through in order. On any tab, once we have set up the required information, we click the Execute button to perform the actions. Take a moment to explore the interface. Notice the Help menu, whose layout mimics the tab layout.

We will work through the functionality of Rattle with the use of a simple dataset, the audit dataset, which is supplied as part of the Rattle package (it is also available for download as a CSV file from http://rattle.togaware.com/audit.csv). This is an artificial dataset consisting of 2,000 fictional clients who have been audited, perhaps for tax refund compliance. For each case an outcome is recorded (whether the taxpayer's claims had to be adjusted or not) and any amount of adjustment that resulted is also recorded.

The dataset has only 2,000 entities so that model building is relatively quick for illustrative purposes. Real data typically contains tens of thousands of entities or more, often millions. The audit dataset contains 13 columns (or variables), the first being a unique client identifier. Again, real data will often have one hundred or more variables.
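To check these properties outside of Rattle, we might read the CSV file directly in R. The following is a minimal sketch rather than what Rattle itself does: the helper names load_audit and describe_shape are our own, and the default URL is the one quoted above, which may no longer be reachable.

```r
# Read a CSV copy of the audit data. `path` may be a local file or a URL.
# The default is the URL quoted in the text (assumed to still be available).
load_audit <- function(path = "http://rattle.togaware.com/audit.csv") {
  read.csv(path, stringsAsFactors = FALSE)
}

# Report the basic shape of a data frame: number of rows and columns,
# the name of the first column, and whether its values are all distinct
# (as we would expect of a unique identifier).
describe_shape <- function(ds) {
  list(
    rows = nrow(ds),
    cols = ncol(ds),
    first_column = names(ds)[1],
    first_column_unique = anyDuplicated(ds[[1]]) == 0
  )
}
```

Calling describe_shape(load_audit()) should then report 2,000 rows and 13 columns, with a unique first column, if the downloaded file matches the description above.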

We proceed through the typical steps of a data mining project, beginning with a data load and selection, then an exploration of the data, some transformations, and finally, modelling and evaluation.
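For readers who like to see the shape of such a project in code, the sequence of steps might be sketched in plain R as follows. This is an illustration under stated assumptions, not the code that Rattle generates: the function name mine_sketch and its arguments are hypothetical, the only transformation shown is dropping incomplete rows, and logistic regression (glm from base R) stands in for whichever model we would actually choose in Rattle.

```r
# Sketch of the explore -> transform -> model -> evaluate sequence.
# `ds` is a data frame that has already been loaded, `target` names a 0/1
# outcome column and `inputs` names the predictor columns. These names,
# and the 70% training split, are illustrative assumptions.
mine_sketch <- function(ds, target, inputs, train_fraction = 0.7, seed = 42) {
  cols <- c(inputs, target)

  # Explore: basic summaries of the selected columns.
  overview <- summary(ds[cols])

  # Transform: here simply drop rows with missing values in those columns.
  ds <- ds[complete.cases(ds[cols]), , drop = FALSE]

  # Partition the rows into a training set and a test set.
  set.seed(seed)
  train_idx <- sample(nrow(ds), floor(train_fraction * nrow(ds)))
  train <- ds[train_idx, , drop = FALSE]
  test <- ds[-train_idx, , drop = FALSE]

  # Model: logistic regression as a stand-in for any classifier.
  model <- glm(reformulate(inputs, response = target),
               data = train, family = binomial)

  # Evaluate: compare predictions with actual outcomes on the test set.
  prob <- predict(model, newdata = test, type = "response")
  predicted <- as.integer(prob > 0.5)
  confusion <- table(actual = test[[target]], predicted = predicted)

  list(overview = overview, model = model, confusion = confusion)
}
```

As a usage example with a dataset that ships with R, mine_sketch(mtcars, target = "am", inputs = "wt") returns the summaries, the fitted model, and a confusion matrix for the held-out rows.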

We step through each tab, left to right, performing the corresponding actions. Remember that on any tab we configure the options and then click the Execute button (or press F5) to perform the appropriate tasks. It is important to note that no task is performed until the Execute button is clicked (or F5 is pressed, or the Execute menu item under Tools is chosen).

The Status Bar at the base of the window indicates when the action has completed. Messages from R, such as error messages, appear in the R console from which Rattle was started, although Rattle captures many R error messages and displays them in a popup.

The R code that is executed underneath appears in the Log tab. This allows us to review the R commands that perform the corresponding data mining tasks. The R code snippets can be copied as text from the Log tab and pasted into the R console from which Rattle is running, to be executed directly. This allows us to use Rattle for basic tasks while still having the full power of R available as needed, perhaps by using more command options than are exposed through the Rattle interface. We can also export the whole session as an R script file, as a record of the actions taken and possibly for running directly and automatically through R itself at a later time. Simply click on the Export button to export the log to a file that will have the .R extension.
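Running an exported log at a later time might look like the following sketch. The helper run_exported_log is our own name, not part of Rattle, and we assume the exported file is a self-contained R script whose required packages are installed. It uses R's standard source function to run the script in a fresh environment, so that the objects it creates do not overwrite those in our workspace.

```r
# Run a previously exported .R script and return the environment that
# holds the objects it created. `script` is the path chosen at export time.
run_exported_log <- function(script, envir = new.env()) {
  stopifnot(file.exists(script))
  # echo = TRUE prints each command as it runs, much as the Log tab shows it.
  source(script, local = envir, echo = TRUE)
  invisible(envir)
}
```

For example, with a hypothetical file name, session <- run_exported_log("audit_session.R") runs the script, after which ls(session) lists the objects it created. The same file can usually also be run non-interactively from a command line with Rscript.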

Copyright © 2004-2008 Togaware Pty Ltd