DATA MINING
Desktop Survival Guide
by Graham Williams
List of Figures
The Rattle window.
Initial steps of the data mining process (Tony Nolan)
The data mining process
The Rattle window showing paradigms
Selecting the Unsupervised paradigm
A sample of plots
Rattle title bar showing the file name
The Rattle window.
The CSV file chooser
After identifying a file to load
Data tab dataset summary.
Loading an ARFF file
Loading data through an ODBC database connection
Teradata ODBC connection
Netezza ODBC connection
Netezza configuration
Loading an R binary data file.
Loading an already defined R data frame
Selected region of a spreadsheet copied to the clipboard
Loading an R data frame originally from the clipboard
Data entry spreadsheet
Select tab choosing Adjusted as a Risk variable.
Missing value summary for a version of the audit modified to include missing values.
Benford stratified by Marital and Gender.
Mosaic plot of Age by Adjusted.
Correlations between keywords in documents.
Transform options.
Selection of normalisations performed on Income.
Normalisations of Age.
Normalisations of Age.
Selection of imputations.
Imputation using the mode for missing values of Age.
Binning Age.
Distributions of binned Age.
Turning Gender into an Indicator Variable.
Selection of cleanup operations.
KMeans Iteration Interface
KMeans Iteration Plot
Random forest tuning parameters.
Random forests only support factors with up to 32 levels.
Random forest model of audit data.
Random forest model measure of variable importance.
Random forest risk charts: test and train datasets.
Warning when evaluating a model on the training dataset.
Random forest ROC chart.
Informational dialog.
Evaluate tab with Score option and a CSV file.
Scores have been saved.
Load and analyse score data using the Gnumeric spreadsheet.
Distribution of scores displayed using Rattle.
R command line under GNU/Linux
R command line under MS/Windows
R GUI using ESS for Emacs
R Commander GUI
An ordered monthly box plot.
An approximate model of random data.
Reduced example of an alternating decision tree.
Audit risk chart from an alternating decision tree.
Togaware's Rattle Gnome Data Mining interface.
The Weka GUI chooser.
Weka explorer viewing data.
Import CSV data into Weka.
Output from running J48 (C4.5).
Fujitsu GhostMiner interface.
Sample ODMiner interface to ODM.
SAS Enterprise Miner interface (Version 4).
Statistica Data Miner graphical interface.
Copyright © 2004-2008 Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is properly formatted and forms a comprehensive book (draft with over 600 pages).
Brought to you by Togaware.