DATA MINING Desktop Survival Guide by Graham Williams

Min Bucket (minbucket)
The minbucket argument is the minimum number of observations allowed in any terminal (leaf) node.
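The two arguments minbucket and minsplit are closely related. In rpart, if only one of them is specified, the other is by default calculated from it as

$$\mathit{minbucket} = \mathrm{round}(\mathit{minsplit}/3) \quad\text{or}\quad \mathit{minsplit} = 3 \times \mathit{minbucket}.$$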
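The relationship between the two defaults can be checked by inspecting the list that rpart.control returns. This is a small sketch that assumes only that the rpart package is installed; the values 30 and 100 are arbitrary illustrations.

```r
library(rpart)

# With neither argument given, minsplit defaults to 20 and
# minbucket is derived from it as round(20 / 3).
rpart.control()$minsplit                 # 20
rpart.control()$minbucket                # 7

# Only minsplit given: minbucket is derived as round(minsplit / 3).
rpart.control(minsplit = 30)$minbucket   # 10

# Only minbucket given: minsplit is derived as 3 * minbucket.
rpart.control(minbucket = 100)$minsplit  # 300
```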
Using rpart directly, we specify minbucket within an argument called control, which takes the result of a call to the function rpart.control. In this example we set minbucket to 100:
> library(rpart)
> audit <- read.csv(url("http://rattle.togaware.com/audit.csv"))
> audit.rpart <- rpart(TARGET_Adjusted ~ Age + Marital 
                                             + Occupation 
                                             + Deductions, 
                       data=audit,
                       method="class", 
                       control=rpart.control(minbucket=100))
> audit.rpart
Changing minbucket can result in different variables being chosen at different nodes. Compare the tree obtained with the command above (with minbucket set to 100) to the result when minbucket is set to 10. Note how node 7 was originally split using Age, but with the minimum bucket size set to 10 the node is split on Deductions. We can see why: the resulting node 15 has only 30 entities, fewer than the 100 required by the original setting, so that split was not permitted there.
[...] 
  control=rpart.control(minbucket=100))
[...]
   7) Occupation=Clerical [...] 516 207 1 (0.40116279 0.59883721)  
    14) Age< 36.5 151  72 0 (0.52317881 0.47682119) *
    15) Age>=36.5 365 128 1 (0.35068493 0.64931507) *
[...]
  control=rpart.control(minbucket=10))
[...]
    7) Occupation=Clerical [...] 516 207 1 (0.40116279 0.59883721)  
     14) Deductions< 1299.833 486 207 1 (0.42592593 0.57407407)  
[...]
     15) Deductions>=1299.833 30   0 1 (0.00000000 1.00000000) *
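The same kind of comparison can be reproduced without downloading any data. The sketch below is an illustration only: it assumes the kyphosis data frame that ships with rpart as a stand-in for the audit data, uses the arbitrary values 20 and 5 for minbucket, and defines a hypothetical helper, smallest_leaf, that is not part of rpart.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for the audit data here (assumption).
fit_large <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(minbucket = 20))
fit_small <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
                   method = "class",
                   control = rpart.control(minbucket = 5))

# Hypothetical helper: number of observations in the smallest leaf.
# fit$frame has one row per node; leaves are marked "<leaf>" in column var.
smallest_leaf <- function(fit) {
  min(fit$frame$n[fit$frame$var == "<leaf>"])
}

smallest_leaf(fit_large)  # never below 20
smallest_leaf(fit_small)  # never below 5

# Variables used at the internal nodes of each tree, for comparison.
setdiff(unique(as.character(fit_large$frame$var)), "<leaf>")
setdiff(unique(as.character(fit_small$frame$var)), "<leaf>")
```

Printing fit_large and fit_small side by side shows whether the smaller bucket size has changed the variables or split points chosen at any node.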
Whilst the default is to set minbucket to one third of minsplit, there is no requirement for minbucket to be less than minsplit. A node will always have at least minbucket entities, and it will be considered for splitting if it has at least minsplit entities and, on splitting, each of its children has at least minbucket entities.
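The size rule just described can be written as a small predicate. This is a sketch of the rule as stated in the text, not rpart's internal code: split_allowed is a hypothetical helper, it ignores the other criteria rpart applies (such as the complexity parameter), and the node counts are taken from the node 7 output shown earlier.

```r
# Hypothetical helper: does a candidate split satisfy both size thresholds?
split_allowed <- function(n_parent, n_left, n_right,
                          minsplit = 20,
                          minbucket = round(minsplit / 3)) {
  n_parent >= minsplit && n_left >= minbucket && n_right >= minbucket
}

# Node 7 has 516 entities. The Deductions split yields children of 486 and 30.
# With minbucket = 100 (so minsplit defaults to 300) the split is rejected.
split_allowed(516, 486, 30, minsplit = 300, minbucket = 100)   # FALSE

# With minbucket = 10 (so minsplit defaults to 30) the same split is allowed.
split_allowed(516, 486, 30, minsplit = 30, minbucket = 10)     # TRUE

# The Age split yields children of 151 and 365, allowed even at minbucket = 100.
split_allowed(516, 151, 365, minsplit = 300, minbucket = 100)  # TRUE
```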
Copyright © Togaware Pty Ltd