DATA MINING
Desktop Survival Guide
by Graham Williams

Organisation

In Chapter 2 we introduce Rattle, a graphical user interface (GUI) developed to make data mining projects much simpler. The chapter covers the installation of both R and Rattle, as well as basic interaction with Rattle.

Chapters 4 to 22 then detail the steps of the data mining process, following the straightforward interface that Rattle presents.

Chapter 4 introduces data and explains how to select variables and perform sampling. Chapter 5 covers loading data into Rattle. Chapters 6 and 7 then review various approaches to exploring the data, so that we can gain insights into it, understand its distribution, and assess the appropriateness of any modelling. Chapter 23 reviews approaches to transforming data so that it can be better modelled.

Chapters and cover modelling, both descriptive (unsupervised) and predictive (supervised). Evaluating the performance of models and deploying them are covered in Chapter 22.

Chapter 27 provides an introduction to migrating from Rattle to the underlying R system. It does not attempt to cover every aspect of interacting with R, but it is sufficient for a competent programmer or software engineer to extend and further fine-tune the modelling performed in Rattle. It aims to set the data miner on the road to leaving Rattle behind and gaining access to the full power of the R statistical language.

Chapter 25 covers troubleshooting within Rattle.

Part delves much deeper into the use of R for data mining. In particular, R is introduced as a programming language for data mining. Chapter 28 introduces the basic R environment. Data and data types are covered in Chapter 30, and R's extensive capabilities for producing stunning graphics are introduced in Chapter 31. Chapter 32 pulls these capabilities together to help us understand data. We then move on to preparing our data for data mining in Chapter 33, building models in Chapter 37, and evaluating those models in Chapter 38.

Part I reviews the algorithms employed in data mining. This encyclopedic overview covers many of the tools and techniques used in data mining, ranging from decision tree induction and association rules to multivariate adaptive regression splines and patient rule induction methods. We also cover standards for sharing data and models.

We continue the Desktop Guide with a snapshot of current alternative data mining products: first open source products in Part II, Open Source Products, and then commercial products in Part III, Commercial Off The Shelf Products.

Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010