Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Elements



> letters		        # a b c [...] z
> letters[10]			# "j"
> letters[10:15]		# "j" "k" "l" "m" "n" "o"
> letters[c(1, 2, 4, 8, 16)]	# "a" "b" "d" "h" "p"
> letters[-(10:26)]		# "a" "b" "c" "d" "e" "f" "g" "h" "i"

An operator (or function) can be applied to a vector to return a vector. This is particularly useful for comparison operators, which return a vector of logical (boolean) values that can then be used to select specific elements of a vector:

> letters > "j"		        		# FALSE FALSE FALSE [...] TRUE
> letters[letters > "j"]	        	# "k" "l" "m" "n" [...] "y" "z"
> letters[letters > "w" | letters < "e"]	# "a" "b" "c" "d" "x" "y" "z"
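The same idea carries over to numeric vectors. The following is a minimal sketch, using a made-up vector ages (not part of the original example), of how a logical vector can select elements, report their positions with which(), and count matches with sum():

```r
# Hypothetical data for illustration only.
ages <- c(23, 41, 17, 65, 38, 12)

adult <- ages >= 18          # logical vector, one value per element
ages[adult]                  # elements where the condition is TRUE
which(adult)                 # positions of the TRUE values
sum(adult)                   # TRUE counts as 1, so this counts the matches
ages[adult & ages < 60]      # conditions combine with & (and) and | (or)
```

Storing the logical vector under its own name, as with adult here, makes it easy to reuse the same condition for selecting, locating and counting.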

Here's a useful trick to ensure we don't divide by zero, which would otherwise give an infinite answer (Inf):

> x <- c(0.28, 0.55, 0, 2)
> y <- c(0.53, 1.34, 1.2, 2.07)
> sum(((x-y)^2/x))                  
[1] Inf
> sum(((x-y)^2/x)[x!=0])            # Exclude the zeros
[1] 1.360392
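An alternative, sketched here as one possible variation rather than the approach used above, is to compute every term first and then keep only the finite ones with is.finite(). The name terms is introduced purely for illustration:

```r
x <- c(0.28, 0.55, 0, 2)
y <- c(0.53, 1.34, 1.2, 2.07)

terms <- (x - y)^2 / x       # the third term is Inf because x[3] is 0
is.finite(terms)             # TRUE TRUE FALSE TRUE
total <- sum(terms[is.finite(terms)])
total                        # same value as excluding x == 0 directly

# 0/0 gives NaN rather than Inf; is.finite() filters that out as well.
is.finite(c(1/0, 0/0, 1))    # FALSE FALSE TRUE
```

Filtering on the result rather than on x also drops NaN and NA terms, which may or may not be what is wanted for a given calculation.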

We can also generate random subsets of our data. Here we sample 1000 rows, without replacement, from a data frame:



> subdataset <- dataset[sample(seq(1, nrow(dataset)), 1000),]
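As a runnable sketch of the same idea, the built-in iris data frame (150 rows) can stand in for dataset. The seed value, the sample size of 100 and the names idx and remainder are assumptions made for illustration only:

```r
# iris ships with R (datasets package); it stands in for `dataset` here.
set.seed(42)                      # make the random draw repeatable
n <- nrow(iris)                   # 150 rows
idx <- sample(n, 100)             # 100 distinct row numbers, no replacement
subdataset <- iris[idx, ]         # the random subset
remainder <- iris[-idx, ]         # the rows that were not sampled
dim(subdataset)
dim(remainder)
```

Keeping the sampled row numbers in idx means the complementary rows can be recovered later with a negative index.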

We can select elements meeting set inclusion conditions. Here we first select the rows of a data frame having particular colours, and then the rows whose colour occurs more than 11 times in the data.

> ds[ds$colour %in% c("green", "blue"),]
> ds[ds$colour %in% names(which(table(ds$colour) > 11)),]
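To see what each step produces, here is a small sketch with a made-up data frame. The contents of ds, the threshold of 1 and the names counts and common are hypothetical and chosen only to keep the output short:

```r
# Hypothetical data frame for illustration only.
ds <- data.frame(id = 1:8,
                 colour = c("red", "green", "blue", "red",
                            "green", "red", "yellow", "blue"))

ds[ds$colour %in% c("green", "blue"), ]   # rows 2, 3, 5 and 8

counts <- table(ds$colour)                # blue 2, green 2, red 3, yellow 1
common <- names(which(counts > 1))        # "blue" "green" "red"
ds[ds$colour %in% common, ]               # every row except the yellow one
```

Breaking the nested expression into counts and common makes it easier to check which values pass the frequency threshold before filtering the rows.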



Copyright © 2004-2008 Togaware Pty Ltd