DATA MINING
Desktop Survival Guide by Graham Williams

Elements
> letters                     # "a" "b" "c" [...] "z"
> letters[10]                 # "j"
> letters[10:15]              # "j" "k" "l" "m" "n" "o"
> letters[c(1, 2, 4, 8, 16)]  # "a" "b" "d" "h" "p"
> letters[-(10:26)]           # "a" "b" "c" "d" "e" "f" "g" "h" "i"
An operator (or function) can be applied to a vector to return a
vector. This is particularly useful for comparison and logical operators,
which return a vector of logical (boolean) values that can then be used
to select specific elements of a vector:
> letters > "j"                           # FALSE FALSE FALSE [...] TRUE
> letters[letters > "j"]                  # "k" "l" "m" "n" [...] "y" "z"
> letters[letters > "w" | letters < "e"]  # "a" "b" "c" "d" "x" "y" "z"
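As a minimal sketch of the same idea on numeric data, the following uses a hypothetical vector `temps` (invented here, not part of the examples above). It suggests how a logical condition can be stored and reused, and how `which()` turns it into the positions of the matching elements.

```r
# Hypothetical example data (an assumption for illustration only).
temps <- c(12.5, 18.0, 21.3, 9.8, 25.1)

# A comparison returns a logical vector, one value per element.
warm <- temps > 15

# The logical vector selects the elements where it is TRUE.
warm_temps <- temps[warm]

# which() gives the positions of the TRUE values instead.
warm_idx <- which(warm)

# sum() counts the TRUE values, since TRUE is treated as 1.
n_warm <- sum(warm)

print(warm_temps)
print(warm_idx)
print(n_warm)
```

Storing the condition in a variable such as `warm` avoids repeating the comparison when the same selection is needed more than once.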
Here's a useful trick to ensure we don't divide by zero, which would
otherwise give an infinite answer (Inf):
> x <- c(0.28, 0.55, 0, 2)
> y <- c(0.53, 1.34, 1.2, 2.07)
> sum(((x-y)^2/x))
[1] Inf
> sum(((x-y)^2/x)[x!=0])   # Exclude the zeros
[1] 1.360392
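One possible variation, sketched here with the same `x` and `y`, builds the logical mask once and applies it to both vectors before dividing, so the division by zero is never carried out. The names `nonzero`, `terms` and `total` are introduced only for illustration.

```r
x <- c(0.28, 0.55, 0, 2)
y <- c(0.53, 1.34, 1.2, 2.07)

# Logical mask marking the elements of x that are safe to divide by.
nonzero <- x != 0

# Subset both vectors first, then compute the individual terms.
terms <- (x[nonzero] - y[nonzero])^2 / x[nonzero]
total <- sum(terms)

print(total)
```

Both forms give the same sum. Subsetting first simply keeps `Inf` out of the intermediate results, which may be easier to inspect when debugging.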
We can also generate a random subset of our data, here by sampling
1000 row numbers of a data frame without replacement:
> subdataset <- dataset[sample(seq(1, nrow(dataset)), 1000),]
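A self-contained sketch of the same idea follows, assuming R's built-in `iris` data frame stands in for `dataset` and a sample size of 50 rather than 1000 (both are assumptions made for illustration). Calling `set.seed()` first makes the random subset reproducible.

```r
# iris ships with R (150 rows); it stands in for `dataset` here.
dataset <- iris

# Fix the random seed so the same rows are drawn on every run.
set.seed(42)

# Draw 50 distinct row numbers, then keep only those rows.
rows <- sample(nrow(dataset), 50)
subdataset <- dataset[rows, ]

print(dim(subdataset))
```

Because `sample()` draws without replacement by default, the requested size must not exceed the number of rows in the data frame.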
We can select elements meeting set inclusion conditions using %in%.
Here we first select the rows of a data frame having particular colours,
and then the rows whose colour occurs more than 11 times in the data.
> ds[ds$colour %in% c("green", "blue"),]
> ds[ds$colour %in% names(which(table(ds$colour) > 11)),]
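To see both selections at work, here is a small sketch using a hypothetical data frame `ds` with a `colour` column. The data, and the frequency threshold of 2 used in place of 11, are assumptions chosen to keep the example short.

```r
# Hypothetical data frame (invented for illustration).
ds <- data.frame(
  id     = 1:7,
  colour = c("red", "green", "blue", "red", "red", "green", "yellow"),
  stringsAsFactors = FALSE
)

# Rows whose colour is one of the listed values.
green_blue <- ds[ds$colour %in% c("green", "blue"), ]

# table() counts each colour; which() keeps the counts above the
# threshold; names() recovers the colour labels themselves.
frequent <- names(which(table(ds$colour) > 2))

# Rows whose colour is among the frequent colours.
frequent_rows <- ds[ds$colour %in% frequent, ]

print(green_blue)
print(frequent)
print(frequent_rows)
```

In this toy data only "red" occurs more than twice, so the second selection keeps just the red rows.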