Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Transforming Data

The Transform tab provides numerous options for transforming our datasets. Cleaning our data and creating new features from it occupies much of our time as data miners. There are myriad approaches, and a programming language like R supports them all. Through the Rattle user interface we can perform some of the more common transformations. These include normalising our data, filling in missing values, turning numeric variables into categoric variables and vice versa, dealing with outliers, and removing variables or observations with missing values. More complex transformations are available through R.
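To make two of these transformations concrete, here is a minimal sketch of rescaling a numeric variable onto the range 0 to 1 and of turning a numeric variable into a categoric one by equal-width binning. It is written in Python using only the standard library and is not the R code that Rattle generates. The function names (`rescale_01`, `bin_equal_width`), the `income` values, and the use of `None` to mark a missing value are all assumptions made for illustration.

```python
def rescale_01(values):
    """Linearly rescale numeric values onto [0, 1], leaving missing values (None) alone."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    if hi == lo:
        # A constant variable has no spread, so map every observed value to 0.0.
        return [None if v is None else 0.0 for v in values]
    return [None if v is None else (v - lo) / (hi - lo) for v in values]


def bin_equal_width(values, bins):
    """Turn a numeric variable into a categoric one using equal-width bins."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        if v is None:
            labels.append(None)  # missing stays missing
            continue
        # The maximum value would fall just past the last bin, so clamp it.
        index = min(int((v - lo) / width), bins - 1) if width > 0 else 0
        labels.append(f"bin{index + 1}")
    return labels


# Hypothetical income values with one missing observation.
income = [20000, 35000, None, 50000, 120000]
scaled = rescale_01(income)
binned = bin_equal_width(income, 4)
print(scaled)
print(binned)
```

Note how the single large income pushes most of the other observations into the lowest bins. This is one reason outliers are usually considered before a variable is rescaled or binned.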

Figure 23.1: Transform options.
Image rattle-audit-transform-normalise-income

In this chapter we introduce the various transformations supported by Rattle. Transformations are not always appropriate, so we indicate where they might be applicable as well as providing warnings about the different approaches, particularly in the context of imputation, which can significantly alter the distribution of our datasets.
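The warning about imputation can be illustrated with a small sketch. Assuming missing values are replaced by the mean of the observed values (one common choice, used here only as an example), the mean of the variable is preserved but its spread shrinks, because every filled-in value sits exactly at the centre. The code is Python using only the standard library. The `ages` values and the function name `impute_mean` are hypothetical.

```python
from statistics import mean, stdev


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    present = [v for v in values if v is not None]
    centre = mean(present)
    return [centre if v is None else v for v in values]


# Hypothetical ages with three of eight observations missing.
ages = [22, None, 35, None, 41, 58, None, 64]
observed = [v for v in ages if v is not None]
filled = impute_mean(ages)

# The mean is unchanged, but the standard deviation is smaller after imputation.
print("mean  observed:", mean(observed), " imputed:", mean(filled))
print("stdev observed:", stdev(observed), " imputed:", stdev(filled))
```

The larger the share of missing values, the stronger this shrinking effect, so it is worth comparing the distribution before and after imputation.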

In tuning our dataset to suit our needs, we often transform it in many different ways. Once we have transformed our dataset, we will want to save the new version. After working on our dataset through the Transform tab we can save the data through the Export button. We will be prompted for a CSV file into which the current transformation of the dataset will be saved. This is the same save operation as is available through the Export button on the Data and Select tabs.
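As a rough picture of what saving a transformed dataset to CSV involves, the following sketch writes a small table of records to a file with a header row. It is Python using only the standard library and does not reproduce what the Export button does internally. The function name `export_csv`, the column names, and the file name are hypothetical, and the file is written to a temporary directory so that running the example leaves nothing behind.

```python
import csv
import os
import tempfile


def export_csv(rows, fieldnames, path):
    """Write a list of dict records to a CSV file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)  # None is written as an empty field


# Hypothetical transformed records: the original income and a rescaled copy.
records = [
    {"id": 1, "income": 20000, "income_01": 0.0},
    {"id": 2, "income": None, "income_01": None},
    {"id": 3, "income": 120000, "income_01": 1.0},
]

with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "transformed.csv")
    export_csv(records, ["id", "income", "income_01"], target)
    with open(target, newline="") as handle:
        saved = list(csv.reader(handle))

print(saved)
```

Keeping the original variable alongside the transformed one, as in this sketch, makes it easier to check later what a transformation did.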



Copyright © 2004-2010 Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010