Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Data Frames

A data frame is essentially a list of named vectors of equal length. Like a database table, each column has a single data type, but different columns can have different data types. This distinguishes it from a matrix, in which all elements must be of the same data type.

> age <- c(35, 23, 56, 18)
> gender <- c("m", "m", "f", "f")
> people <- data.frame(Age=age, Gender=gender)
> people
  Age Gender
1  35      m
2  23      m
3  56      f
4  18      f
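
The contrast with a matrix can be checked directly. The following is a minimal sketch rather than part of the original example: it rebuilds people so that it is self-contained, and the names col.classes, m and m.type are introduced here purely for illustration.

```r
# Rebuild the example data frame so this sketch is self-contained.
people <- data.frame(Age=c(35, 23, 56, 18), Gender=c("m", "m", "f", "f"))

# Each column keeps its own type. Age is numeric; Gender is character
# or factor, depending on the stringsAsFactors default of your R version.
col.classes <- sapply(people, class)

# A matrix holds a single type, so combining numbers and strings
# coerces every element to character.
m <- cbind(c(35, 23, 56, 18), c("m", "m", "f", "f"))
m.type <- typeof(m)
```

Printing col.classes and m.type shows one type per column for the data frame and a single type for the whole matrix.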

The columns of a data frame have names, which can be assigned when the data frame is created, as with Age and Gender in the call to data.frame above. The names can also be changed at any time by assigning to the result of colnames:

> colnames(people)
[1] "Age"    "Gender"
> colnames(people)[2] <- "Sex"
> colnames(people)
[1] "Age" "Sex"
> people
  Age Sex
1  35   m
2  23   m
3  56   f
4  18   f
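
Renaming by position relies on knowing where the column sits. As a hedged alternative, the sketch below locates the column by its current name instead. It rebuilds people so that it runs on its own, and same.names is a name introduced here only for illustration.

```r
# Rebuild the example data frame so this sketch is self-contained.
people <- data.frame(Age=c(35, 23, 56, 18), Gender=c("m", "m", "f", "f"))

# Select the column by its current name rather than by its position.
colnames(people)[colnames(people) == "Gender"] <- "Sex"

# For a data frame, names() and colnames() return the same vector.
same.names <- identical(names(people), colnames(people))
```

This form keeps working if the columns are later reordered, at the cost of silently doing nothing when no column has the given name.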

If the datasets we wish to combine are held in a single list, we can use the do.call function to apply rbind to that list, so that each element of the list becomes one argument to the rbind function:

j <- list()                     # Generate a list of 26 data frames
for (i in letters)
{
    # Each data frame has 25 rows: an id column and 10 random columns.
    j[[i]] <- data.frame(id=rep(i, 25), matrix(rnorm(250), nrow=25))
}
j[[1]]                          # Inspect the first data frame
allj <- do.call("rbind", j)     # Combine list of data frames into one.
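
To see what do.call is doing, it can help to try it on a list small enough to write out by hand. This is only a sketch; the data frames a and b and the names parts, combined and direct are hypothetical and not part of the example above.

```r
# Two small data frames with the same columns.
a <- data.frame(id="a", value=1:2)
b <- data.frame(id="b", value=3:4)
parts <- list(a=a, b=b)

# do.call passes each list element as a separate argument to rbind ...
combined <- do.call("rbind", parts)

# ... which amounts to writing the call out by hand.
direct <- rbind(a=a, b=b)
```

The advantage of do.call is that the number of data frames need not be known when the code is written.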

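You can reshape data in a data frame using unstack, which splits the values of one column into separate columns according to the groups defined by another: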

> ds <- data.frame(type=c('x', 'y', 'x', 'x', 'x', 'y', 'y', 'x', 'y', 'y'),
                   value=c(10, 5, 2, 6, 4, 8, 3, 6, 6, 8))
> ds
   type value
1     x    10
2     y     5
3     x     2
4     x     6
5     x     4
6     y     8
7     y     3
8     x     6
9     y     6
10    y     8
> unstack(ds, value ~ type)
   x y
1 10 5
2  2 8
3  6 3
4  4 6
5  6 8
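
The result above is a data frame because both types happen to have five values each. The sketch below, which rebuilds ds to be self-contained, illustrates what happens otherwise; wide, groups and uneven are names chosen here for illustration.

```r
ds <- data.frame(type=c('x', 'y', 'x', 'x', 'x', 'y', 'y', 'x', 'y', 'y'),
                 value=c(10, 5, 2, 6, 4, 8, 3, 6, 6, 8))

# Five x values and five y values line up as two columns.
wide <- unstack(ds, value ~ type)

# split() groups the same values into a plain list, one element per type.
groups <- split(ds$value, ds$type)

# Dropping the first row leaves four x values and five y values.
# The groups no longer fit side by side, so unstack returns a list.
uneven <- unstack(ds[-1, ], value ~ type)
```

Checking is.data.frame on the result is a simple guard when the group sizes are not known in advance.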

To go a step further and make the values available as variables named after the types, you could use attach, which places the columns of the data frame on the search path:

> attach(unstack(ds, value ~ type))
> x
[1] 10  2  6  4  6
> y
[1] 5 8 3 6 8
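
An attached data frame stays on the search path until it is detached. The following sketch, under the assumption that the attachment is only needed briefly, shows with as an alternative and pairs attach with detach. The names wide, mean.x and sum.y are introduced here for illustration.

```r
ds <- data.frame(type=c('x', 'y', 'x', 'x', 'x', 'y', 'y', 'x', 'y', 'y'),
                 value=c(10, 5, 2, 6, 4, 8, 3, 6, 6, 8))
wide <- unstack(ds, value ~ type)

# with() evaluates an expression using the columns as variables,
# without changing the search path.
mean.x <- with(wide, mean(x))

# attach() adds the data frame to the search path; detach() removes it.
attach(wide)
sum.y <- sum(y)
detach(wide)
```

Note that a variable of the same name in the workspace takes precedence over an attached column, which is one reason with is often the safer choice.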

We can see that a data frame is really just a list by combining the unclass and str functions:

> str(unclass(iris))
List of 5
 $ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
 - attr(*, "row.names")= int [1:150] 1 2 3 4 5 6 7 8 9 10 ...
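
Because a data frame is a list underneath, the usual list operations apply to it column by column. A brief sketch using the same iris dataset follows; the variable names are chosen here for illustration.

```r
# A data frame passes the list test.
is.a.list <- is.list(iris)

# The length of the list is the number of columns.
n.cols <- length(iris)

# List-style extraction returns a single column as a vector.
widths <- iris[["Sepal.Width"]]

# Functions that iterate over lists iterate over the columns.
col.classes <- sapply(iris, class)
```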



Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010