Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Usage

Weka runs as a Java application, so all you need is the Java archive file weka.jar and a Java runtime to start it. On GNU/Linux and Unix the command is usually:

  java -jar weka.jar

On MS Windows it is usually:

  javaw -jar weka.jar
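
If you want to start Weka from your own Java code rather than from a shell, the choice between the two commands above can be made at run time. The following is only a sketch: the class WekaLauncher and its method commandFor are names invented for this illustration (they are not part of Weka), and it assumes that weka.jar sits in the current working directory.

```java
import java.util.List;
import java.util.Locale;

// Illustrative helper only; WekaLauncher is not part of Weka.
class WekaLauncher {
    // Builds the start-up command for the given value of the os.name property.
    static List<String> commandFor(String osName, String jarPath) {
        boolean windows = osName.toLowerCase(Locale.ROOT).startsWith("windows");
        // javaw starts Java without opening a console window on MS Windows.
        String launcher = windows ? "javaw" : "java";
        return List.of(launcher, "-jar", jarPath);
    }

    public static void main(String[] args) {
        // Assumption: weka.jar is in the current working directory.
        List<String> command = commandFor(System.getProperty("os.name"), "weka.jar");
        System.out.println(String.join(" ", command));
        // To actually start the GUI Chooser, uncomment the next line
        // (main would then also need "throws java.io.IOException"):
        // new ProcessBuilder(command).inheritIO().start();
    }
}
```

Running the class prints the command it would use on your platform, which is a cheap way to check the result before starting the application itself.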

On start-up you will see the Weka GUI Chooser (Figure 53.1). From here you can either run the system from a simple command line interface (Simple CLI), or else start up an interactive data explorer and modeller with the Explorer button.

Figure 53.1: The Weka GUI chooser.

The Weka Explorer (Figure 53.2) can be used to interactively load data, pre-process it, and run the modelling tools. Figure 53.2 shows a loaded dataset: the left pane lists the variables found in the CSV file, and the right pane plots the distribution of the output variable (yexno).
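
The distribution plot is, at heart, a count of how often each value of the output variable occurs. The sketch below reproduces that count in plain Java, without Weka, so you can sanity-check a file before loading it. It is not how Weka itself reads data: the class name ClassDistribution, the column name Class and the sample rows are all made up for illustration, and the parser assumes a simple CSV with a header row and no quoted commas.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper only; not part of Weka.
class ClassDistribution {
    // Counts how often each value occurs in the named column.
    // Assumes the first line is the header and no field contains a comma.
    static Map<String, Integer> count(List<String> csvLines, String column) {
        String[] header = csvLines.get(0).split(",", -1);
        int index = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals(column)) {
                index = i;
            }
        }
        if (index < 0) {
            throw new IllegalArgumentException("No such column: " + column);
        }
        // LinkedHashMap keeps the values in the order they first appear.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : csvLines.subList(1, csvLines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String value = line.split(",", -1)[index].trim();
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration; not taken from any real dataset.
        List<String> lines = List.of(
            "Alcohol,Class",
            "High,A",
            "Low,B",
            "High,A");
        System.out.println(count(lines, "Class")); // prints {A=2, B=1}
    }
}
```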

Figure 53.2: Weka explorer viewing data.

To load a CSV file, for example, click the Open file... button. This will bring up the Weka Open dialogue (Figure 53.3). Click the button labelled Arff data files and change it to CSV data files. Then browse to the CSV file you wish to load. In our example this is wine-nominal.csv. Double-click the name, and then click the Open button, to import the data.

Figure 53.3: Import CSV data into Weka.

To start building models, go to the Classify tab (Figure 53.4). The default model builder is ZeroR, a very basic model builder indeed! Click the Choose button to select from over 60 model builders. For example, under Trees you could choose J48, which is an implementation of C4.5. You will also find support vector machines (SMO under Functions), random forests (under Trees), and AdaBoost (under Meta). Once you have chosen your model builder, the corresponding command is shown in the text box to the right of the button. Click in this box to change any of the parameters, or to read some documentation about the chosen method. From the drop-down menu above the Start button, choose the output variable (in this case we have chosen Class). When you are ready to build your model, click on the Start button.
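
To see why ZeroR is so basic: for a nominal output variable it ignores every input variable and always predicts the most frequent class. That makes it a useful baseline against which to judge J48 and the other model builders. The sketch below shows the idea in plain Java. It is not Weka's implementation: the class MajorityBaseline, its method names, the tie-breaking rule and the sample labels are all assumptions made for this illustration, and it expects a non-empty list of labels.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative majority-class baseline; not Weka's ZeroR code.
class MajorityBaseline {
    // Returns the most frequent label. Ties go to the label seen first,
    // which is a choice made for this sketch only.
    static String majority(List<String> labels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : labels) {
            counts.merge(label, 1, Integer::sum);
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // Fraction of labels that a constant prediction gets right.
    static double accuracy(List<String> labels, String prediction) {
        long hits = labels.stream().filter(prediction::equals).count();
        return (double) hits / labels.size();
    }

    public static void main(String[] args) {
        // Made-up class labels for illustration.
        List<String> labels = List.of("A", "B", "A", "A", "C");
        String prediction = majority(labels);
        System.out.println("Always predict: " + prediction);
        System.out.println("Baseline accuracy: " + accuracy(labels, prediction));
    }
}
```

A model that cannot beat this baseline accuracy on your data has learnt nothing useful from the input variables.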

Figure 53.4: Output from running J48 (C4.5).

A tree built this way will list, for each branch, the number of training instances and the number of these that are misclassified.
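
In the J48 text output these two numbers appear in parentheses at the end of each leaf line, as (N/E), or just (N) when no training instance at that leaf is misclassified. If you copy the tree into your own program, the counts can be pulled out with a regular expression, as in the sketch below. The class LeafCounts and the sample leaf lines are invented for this illustration, and the pattern assumes the counts are printed as plain decimal numbers.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative parser for the counts at the end of a leaf line; not part of Weka.
class LeafCounts {
    // Matches a trailing "(N)" or "(N/E)" where N and E are decimal numbers.
    private static final Pattern COUNTS =
        Pattern.compile("\\((\\d+(?:\\.\\d+)?)(?:/(\\d+(?:\\.\\d+)?))?\\)\\s*$");

    // Returns {instances, misclassified}, or null if the line carries no counts.
    static double[] parse(String line) {
        Matcher m = COUNTS.matcher(line);
        if (!m.find()) {
            return null;
        }
        double total = Double.parseDouble(m.group(1));
        // A missing "/E" part means no misclassified instances at this leaf.
        double errors = m.group(2) == null ? 0.0 : Double.parseDouble(m.group(2));
        return new double[] { total, errors };
    }

    public static void main(String[] args) {
        // Hypothetical leaf lines in the style of the J48 output.
        String[] leaves = {
            "Alcohol = High: A (50.0/3.0)",
            "Alcohol = Low: B (12.0)"
        };
        for (String leaf : leaves) {
            double[] c = parse(leaf);
            System.out.println(leaf + " -> instances=" + c[0]
                + ", misclassified=" + c[1]
                + ", error rate=" + (c[1] / c[0]));
        }
    }
}
```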

Copyright © Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010