DATA MINING
Desktop Survival Guide
by Graham Williams

Weka: From University of Waikato

WEKA (the Waikato Environment for Knowledge Analysis) is an open source, freely available workbench for applying machine learning techniques to practical problems. It integrates many different machine learning tools within a common framework and a uniform user interface that is basic but functional. WEKA incorporates over 60 machine learning techniques, ranging from traditional decision trees, association rules, and clustering through to modern random forests and support vector machines. A WEKA user is able to use machine learning techniques to derive useful knowledge from quite large databases. Typical users include both researchers and industrial scientists.

WEKA is a great suite of data mining algorithms that allows us to quickly explore alternative approaches to data mining. However, perhaps because it is written in Java, the WEKA user interface is well known to be very heavy in its use of memory. Staying with the command line in WEKA (see the excellent guide written by Alexander K. Seewald) is a good option. There are reports, for example, of using the rather efficient NaiveBayesNominal algorithm on the command line to process a large ham/spam dataset with 500,000 samples and 1.3 million attributes "in a few minutes" (see KDD Nuggets, 2007, n24).
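
To make the command line option concrete, here is a minimal sketch of how such a call might be assembled and launched from Java. The file names weka.jar and spam.arff and the 2048 MB heap size are hypothetical placeholders, and the standard weka.classifiers.bayes.NaiveBayes classifier stands in for NaiveBayesNominal, whose full class name is not given here. The -Xmx option is the usual JVM heap limit and -t is WEKA's training file option; with no separate test file, WEKA reports cross-validated results.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class WekaCommandLine {

    /**
     * Builds the argument list for running a WEKA classifier without the GUI.
     * All argument values are supplied by the caller; nothing here is
     * specific to a particular dataset.
     */
    static List<String> buildCommand(String wekaJar, String classifier,
                                     String trainingFile, int heapMegabytes) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-Xmx" + heapMegabytes + "m"); // JVM option: maximum heap size
        command.add("-cp");
        command.add(wekaJar);                      // class path pointing at the WEKA jar
        command.add(classifier);                   // fully qualified classifier class
        command.add("-t");                         // WEKA option: training data file
        command.add(trainingFile);
        return command;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical file names and heap size, chosen only for illustration.
        String wekaJar = "weka.jar";
        List<String> command = buildCommand(
                wekaJar, "weka.classifiers.bayes.NaiveBayes", "spam.arff", 2048);

        // Show the equivalent shell command.
        System.out.println(String.join(" ", command));

        // Launch WEKA only if the jar is actually present in the working directory.
        if (Files.exists(Paths.get(wekaJar))) {
            int exitCode = new ProcessBuilder(command).inheritIO().start().waitFor();
            System.out.println("WEKA exited with code " + exitCode);
        }
    }
}
```

The printed line can equally be pasted straight into a terminal. Keeping the heap size as a parameter makes it easy to raise the limit for larger datasets without touching the rest of the call.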

Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010
