DATA MINING
Desktop Survival Guide by Graham Williams
Using gbm
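Generalised boosted models, as proposed by () and extended by (), have been implemented for R by Greg Ridgeway as the gbm package. It is a much more extensive package for boosting than the boost package: through its distribution argument it supports a range of loss functions, including "bernoulli", "adaboost", "gaussian", "laplace" and "poisson".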
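Generalised boosting builds an additive score f one tree at a time. Each new small regression tree is fitted to the negative gradient of a loss function and then added to f, scaled down by a shrinkage factor. Below is a minimal base-R sketch of this for the exponential loss, which gbm documents as the loss behind its "adaboost" distribution for 0/1 outcomes. It is not gbm's code. It uses one-split trees (stumps) with a Newton step for each node's value, and it does no subsampling. The helper names (exp_loss, fit_stump, boost_exp) and the synthetic data are hypothetical.

```r
# Hypothetical helpers: a sketch of gradient boosting, not gbm's implementation.
# Exponential (AdaBoost) loss for a 0/1 target y and real-valued score f.
exp_loss <- function(y, f) mean(exp(-(2 * y - 1) * f))

# Least-squares stump: the single split X[, j] < s that best fits z.
fit_stump <- function(X, z) {
  best <- list(sse = Inf)
  for (j in seq_len(ncol(X))) {
    v <- sort(unique(X[, j]))
    cuts <- (head(v, -1) + tail(v, -1)) / 2     # midpoints between distinct values
    for (s in cuts) {
      left <- X[, j] < s
      sse <- sum((z[left] - mean(z[left]))^2) +
             sum((z[!left] - mean(z[!left]))^2)
      if (sse < best$sse) best <- list(sse = sse, var = j, split = s)
    }
  }
  best
}

# Gradient boosting with stumps on the exponential loss.
boost_exp <- function(X, y, n.trees = 50, shrinkage = 0.1) {
  yt <- 2 * y - 1                               # recode 0/1 as -1/+1
  p <- mean(y)
  f <- rep(0.5 * log(p / (1 - p)), length(y))   # best constant starting score
  loss <- numeric(n.trees)
  for (m in seq_len(n.trees)) {
    e <- exp(-yt * f)
    z <- yt * e                                 # negative gradient of the loss
    st <- fit_stump(X, z)
    left <- X[, st$var] < st$split
    # One Newton step per node: sum(yt * e) / sum(e) over the node's cases
    g.left  <- sum(yt[left] * e[left]) / sum(e[left])
    g.right <- sum(yt[!left] * e[!left]) / sum(e[!left])
    f <- f + shrinkage * ifelse(left, g.left, g.right)
    loss[m] <- exp_loss(y, f)
  }
  list(f = f, loss = loss)
}

# Usage on hypothetical synthetic data.
set.seed(42)
X <- matrix(rnorm(200 * 2), ncol = 2)
y <- as.integer(X[, 1] + 0.5 * rnorm(200) > 0)
fit <- boost_exp(X, y)
round(fit$loss[c(1, 10, 50)], 4)
```

For a binary target, the Newton step in a node always lies between zero and that node's exact loss minimiser. A shrunken step therefore never increases the training loss, so the recorded loss should fall steadily from iteration to iteration. This matches the slowly falling TrainDeviance column in the gbm trace.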
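We illustrate AdaBoost using the distribution option of
the gbm function, setting distribution="adaboost". This option expects a 0/1 target, so we first recode the three-class wine Type as 1 for the first class and 0 for the other two.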
> library(gbm)
> load("wine.RData")
> ds <- wine
> ds$Type <- as.numeric(ds$Type)
> ds$Type[ds$Type>1] <- 0
> ds$Type
[1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
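The recoding above turns the three-class Type into a one-versus-rest 0/1 target. The wine data has 59, 71 and 48 observations in its three classes, so the result should contain 59 ones and 119 zeros, as printed. The small check below uses a hypothetical stand-in factor with those class sizes, so it needs neither gbm nor wine.RData. It confirms that the two-step recoding agrees with a one-line alternative based on the factor's first level.

```r
# Hypothetical stand-in for wine$Type with the wine data's class sizes.
Type <- factor(rep(c("1", "2", "3"), times = c(59, 71, 48)))

# The recoding used above: factor codes 1, 2, 3, then classes 2 and 3 become 0.
y1 <- as.numeric(Type)
y1[y1 > 1] <- 0

# An equivalent one-line recoding: 1 for the first level, 0 otherwise.
y2 <- as.integer(Type == levels(Type)[1])

table(y2)
```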
> ds.gbm <- gbm(Type ~ Alcohol + Malic + Ash + Alcalinity + Magnesium +
Phenols + Flavanoids + Nonflavanoids + Proanthocyanins +
Color + Hue + Dilution + Proline,
data=ds, distribution="adaboost", n.trees=100)
Iter TrainDeviance ValidDeviance StepSize Improve
1 0.9408 nan 0.0010 0.0006
2 0.9402 nan 0.0010 0.0006
3 0.9394 nan 0.0010 0.0007
4 0.9387 nan 0.0010 0.0007
5 0.9381 nan 0.0010 0.0005
6 0.9374 nan 0.0010 0.0006
7 0.9368 nan 0.0010 0.0006
8 0.9361 nan 0.0010 0.0007
9 0.9354 nan 0.0010 0.0006
10 0.9349 nan 0.0010 0.0004
100 0.8750 nan 0.0010 0.0007
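Where does a starting deviance of about 0.94 come from? Suppose, as an assumption, that the reported training deviance is the mean exponential loss and that boosting starts from the best constant score. With a class-1 proportion p, that constant is f0 = log(p/(1-p))/2, and the loss there simplifies to 2*sqrt(p*(1-p)). For 59 of 178 observations this is about 0.94, close to the 0.9408 reported at iteration 1. The match is not exact, because gbm appears to fit each tree on a random half of the data: the root node Weight of 89 in the tree printed below suggests a bag.fraction of 0.5. The name const_loss is hypothetical.

```r
# Sketch (not gbm code): exponential loss of a constant score f for a 0/1
# target whose class-1 proportion is p.
const_loss <- function(f, p) p * exp(-f) + (1 - p) * exp(f)

p  <- 59 / 178                  # share of class 1 after the recoding above
f0 <- 0.5 * log(p / (1 - p))    # closed-form minimiser of const_loss
const_loss(f0, p)               # equals 2 * sqrt(p * (1 - p))
2 * sqrt(p * (1 - p))

# Cross-check the closed form with a one-dimensional optimiser.
optimize(const_loss, c(-5, 5), p = p)$minimum
```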
> summary(ds.gbm)
var rel.inf
1 Proline 91.82978
2 Flavanoids 8.17022
3 Alcohol 0.00000
4 Malic 0.00000
5 Ash 0.00000
6 Alcalinity 0.00000
7 Magnesium 0.00000
8 Phenols 0.00000
9 Nonflavanoids 0.00000
10 Proanthocyanins 0.00000
11 Color 0.00000
12 Hue 0.00000
13 Dilution 0.00000
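summary() reports each predictor's relative influence. Assume it is computed as in Friedman's measure: sum the error reduction of every split on that predictor across all trees (the ErrorReduction column of pretty.gbm.tree() shown below), then rescale the totals to sum to 100. Under that assumption, any predictor never chosen for a split scores exactly zero. That explains why only Proline and Flavanoids have non-zero influence above. The sketch below uses hypothetical split records with illustrative numbers, not output from this model.

```r
# Hypothetical split records (illustrative numbers, not output from the model
# above): the predictor used by each split and its ErrorReduction.
splits <- data.frame(
  var       = c("Proline", "Proline", "Flavanoids", "Proline"),
  reduction = c(65.4, 60.1, 10.2, 58.7),
  stringsAsFactors = FALSE
)
predictors <- c("Alcohol", "Flavanoids", "Proline")

# Total reduction per predictor; a predictor never split on sums to zero.
raw <- sapply(predictors, function(v) sum(splits$reduction[splits$var == v]))

# Rescale to percentages and sort, largest influence first.
rel.inf <- sort(100 * raw / sum(raw), decreasing = TRUE)
round(rel.inf, 2)
```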
> pretty.gbm.tree(ds.gbm)
  SplitVar SplitCodePred LeftNode RightNode MissingNode ErrorReduction Weight    Prediction
0       12  8.675000e+02        1         2           3       65.36408     89 -0.0002868090
1       -1 -8.139656e-04       -1        -1          -1        0.00000     62 -0.0008139656
2       -1  9.236987e-04       -1        -1          -1        0.00000     27  0.0009236987
3       -1 -2.868090e-04       -1        -1          -1        0.00000     89 -0.0002868090
> gbm.show.rules(ds.gbm)
Number of models: 100
Tree 1: Weight XXXX
Proline < 867.50 : 0 (XXXX/XXXX)
Proline >= 867.50 : 1 (XXXX/XXXX)
Proline missing : 0 (XXXX/XXXX)
[...]
Tree 100: Weight XXXX
Proline < 755.00 : 0 (XXXX/XXXX)
Proline >= 755.00 : 1 (XXXX/XXXX)
Proline missing : 0 (XXXX/XXXX)
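A rule listing of this kind can also be built directly from the pretty.gbm.tree() data frame. The minimal sketch below recurses from the root and records the condition taken at each split, producing one rule per terminal node. It works on trees of any depth with continuous splits and prints each leaf's Prediction rather than a class label. The function name tree_rules is hypothetical, and the tree is the first one printed above.

```r
# The first tree as printed by pretty.gbm.tree(); node k is row k + 1.
tree <- data.frame(
  SplitVar      = c(12, -1, -1, -1),
  SplitCodePred = c(867.5, -8.139656e-04, 9.236987e-04, -2.868090e-04),
  LeftNode      = c(1, -1, -1, -1),
  RightNode     = c(2, -1, -1, -1),
  MissingNode   = c(3, -1, -1, -1),
  Prediction    = c(-2.868090e-04, -8.139656e-04, 9.236987e-04, -2.868090e-04)
)
# Predictor names in the order used in the gbm() formula above.
var.names <- c("Alcohol", "Malic", "Ash", "Alcalinity", "Magnesium", "Phenols",
               "Flavanoids", "Nonflavanoids", "Proanthocyanins", "Color",
               "Hue", "Dilution", "Proline")

# Hypothetical helper: one rule per terminal node.
tree_rules <- function(tree, var.names, node = 0, path = character(0)) {
  row <- tree[node + 1, ]
  if (row$SplitVar == -1) {                    # terminal node: emit its rule
    cond <- if (length(path) > 0) paste(path, collapse = " & ") else "TRUE"
    return(sprintf("%s : %.6g", cond, row$Prediction))
  }
  v <- var.names[row$SplitVar + 1]             # SplitVar is 0-based
  s <- formatC(row$SplitCodePred, format = "f", digits = 2)
  c(tree_rules(tree, var.names, row$LeftNode,    c(path, paste(v, "<", s))),
    tree_rules(tree, var.names, row$RightNode,   c(path, paste(v, ">=", s))),
    tree_rules(tree, var.names, row$MissingNode, c(path, paste(v, "missing"))))
}

writeLines(tree_rules(tree, var.names))
```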