Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Building a Model

Let's have a look at the simplest of problems. Suppose we want to model one variable (e.g., a person's height) in terms of another variable (e.g., a person's age).

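We can create a collection of people's ages and heights using randomly generated data: ages drawn uniformly between 1 and 20, and heights that follow a linear trend in age plus normally distributed noise whose spread grows with age: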

> set.seed(123)			# To ensure repeatability.
> ages <- runif(10, 1, 20)	# Random ages between 1 and 20
> heights <- 30 + rnorm(10, 1, as.integer(ages)) + ages*5
> plot(ages, heights)

We can now build a model (in fact, a linear interpolation) that approximates this data using R's approxfun:

> my.model <- approxfun(ages, heights)
> my.model(15)
[1] 85.38172
> plot(my.model, add=TRUE, col=2, ylim=c(20,200), xlim=c(1,20))

The resulting plot is shown in Figure 18.1. We can see it is only an approximate model and, indeed, not a very good one. The data is pretty deficient, and we also know that height generally does not decrease with age in this range. It illustrates the modelling task, though.
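One way to see where the interpolated model contradicts what we know about height is to sort the observations by age and look at the change in height from one observation to the next. The following is a minimal sketch only; the names idx, sorted, and decreasing are introduced here for illustration and are not part of the original example.

```r
set.seed(123)                      # Same seed as in the text.
ages <- runif(10, 1, 20)
heights <- 30 + rnorm(10, 1, as.integer(ages)) + ages*5

# Sort the observations by age (sorted is a hypothetical helper name).
idx <- order(ages)
sorted <- data.frame(age = ages[idx], height = heights[idx])

# Change in height between consecutive ages; the first row has no predecessor.
sorted$change <- c(NA, diff(sorted$height))
print(sorted)

# A negative change marks a segment where the linear interpolation slopes down.
decreasing <- which(sorted$change < 0)
cat("Segments where height decreases with age:", length(decreasing), "\n")
```

Because the linear interpolation simply joins neighbouring points, every negative change corresponds to a segment of the model in which predicted height falls as age increases.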

Figure 18.1: An approximate model of random data.
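To make concrete what the linear interpolation built with approxfun computes, the value at age 15 can be reproduced by hand from the two observed ages that bracket it. This is a sketch under the assumption that 15 lies strictly between two observed ages (it does for this seed); the names x, lo, hi, y.lo, y.hi, and by.hand are illustrative only.

```r
set.seed(123)                      # Same seed as in the text.
ages <- runif(10, 1, 20)
heights <- 30 + rnorm(10, 1, as.integer(ages)) + ages*5
my.model <- approxfun(ages, heights)

x <- 15
lo <- max(ages[ages <= x])         # Nearest observed age at or below x.
hi <- min(ages[ages >= x])         # Nearest observed age at or above x.
y.lo <- heights[ages == lo]
y.hi <- heights[ages == hi]

# Straight line between the two bracketing points, evaluated at x.
# Assumes lo < hi, i.e. x is not itself one of the observed ages.
by.hand <- y.lo + (x - lo) / (hi - lo) * (y.hi - y.lo)

print(by.hand)
print(my.model(x))
print(all.equal(by.hand, my.model(x)))
```

Note that, with its default settings, approxfun returns NA for ages outside the range of the observed data, so the model says nothing about ages younger than the youngest or older than the oldest person in the sample.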



We can similarly build a smoother model, a cubic spline interpolation, using R's splinefun:

> my.spline <- splinefun(ages, heights)
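The spline function can be used in the same way as the linear interpolation. The sketch below evaluates both at age 15 and overlays them on the data for comparison; the colour choices and the use of curve are assumptions made here for illustration, not taken from the original text.

```r
set.seed(123)                      # Same seed as in the text.
ages <- runif(10, 1, 20)
heights <- 30 + rnorm(10, 1, as.integer(ages)) + ages*5
my.model <- approxfun(ages, heights)
my.spline <- splinefun(ages, heights)

# Both interpolants pass through every observed point.
print(all.equal(my.model(ages), heights))
print(all.equal(my.spline(ages), heights))

# They generally differ between the observed points.
print(my.model(15))
print(my.spline(15))

# Overlay both models on the data across the observed age range.
plot(ages, heights)
curve(my.model(x), from=min(ages), to=max(ages), add=TRUE, col=2)
curve(my.spline(x), from=min(ages), to=max(ages), add=TRUE, col=4)
```

Since both functions reproduce the noisy observations exactly, the spline is smoother but not necessarily a better model of height: it can still decrease with age, and it may overshoot between points.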



Copyright © 2004-2008 Togaware Pty Ltd