Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Project 1

The task for this project is to explore a dataset and then apply one or more of the predictive modelling algorithms we have studied, together with one or more of the techniques for improving performance. The aim is to improve predictive performance on the task that the dataset represents.

Two datasets are available. The first is for building the model: it contains approximately 5,000 observations and includes the value of the target variable for each observation. You are encouraged to partition this dataset into training and validation datasets.
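A random partition of the modelling dataset can be sketched as follows. This is a minimal sketch in Python, assuming the observations are already loaded as a list of rows; the 70/30 split, the seed, and the function name `partition` are illustrative choices, not something the project prescribes.

```python
import random

def partition(rows, train_fraction=0.7, seed=42):
    """Randomly split rows into (training, validation) lists.

    The 70/30 default and the seed are illustrative assumptions.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # fixed seed makes the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

# Usage with stand-in data: 5000 row indices in place of real observations.
training, validation = partition(range(5000))
print(len(training), len(validation))  # 3500 1500
```

Fixing the seed means the same partition can be recreated later, so results on the validation dataset remain comparable as different models are tried.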

The second dataset is a testing dataset, which does not include values for the target variable. It contains approximately 25,000 observations, enough to give a reasonable estimate of the performance of each of the models submitted by students. (Instructors are welcome to substitute their own datasets here and retain a holdout dataset for evaluating the performance of the models submitted by students.)
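An instructor who retains the target values for the testing dataset could compare submissions with a simple measure such as the error rate (the proportion of observations predicted incorrectly). The sketch below assumes a categorical target and predictions supplied in the same order as the holdout; the names `error_rate` and `rank_submissions` and the example labels are hypothetical, and another measure (for example, the area under the ROC curve) may suit a given task better.

```python
def error_rate(predicted, actual):
    """Proportion of observations whose predicted class differs from the actual class."""
    if len(predicted) != len(actual):
        raise ValueError("predictions and holdout must have the same length")
    if not actual:
        raise ValueError("holdout must not be empty")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

def rank_submissions(submissions, actual):
    """Return (name, error rate) pairs sorted from lowest to highest error."""
    scores = [(name, error_rate(preds, actual)) for name, preds in submissions.items()]
    return sorted(scores, key=lambda pair: pair[1])

# Usage with made-up labels, for illustration only.
holdout = ["yes", "no", "no", "yes", "no"]
submissions = {
    "student_a": ["yes", "no", "yes", "yes", "no"],  # 1 of 5 wrong
    "student_b": ["no", "no", "yes", "no", "no"],    # 3 of 5 wrong
}
print(rank_submissions(submissions, holdout))
# [('student_a', 0.2), ('student_b', 0.6)]
```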

Students are required to submit predictions for the testing dataset and to deliver a report covering the documentation required for a data mining project.
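Predictions for the testing dataset might be written to a comma-separated file for submission. The two-column layout (an observation identifier and a predicted value), the column names, the file name, and the function `write_predictions` below are all assumptions for illustration; follow whatever format the instructor specifies.

```python
import csv

def write_predictions(path, ids, predictions):
    """Write a header row, then one row per test observation: identifier, predicted value."""
    if len(ids) != len(predictions):
        raise ValueError("need exactly one prediction per observation")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["id", "prediction"])  # hypothetical header
        writer.writerows(zip(ids, predictions))

# Usage with three illustrative observations.
write_predictions("predictions.csv", [1, 2, 3], ["yes", "no", "no"])
```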



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010