Desktop Survival Guide
by Graham Williams
Statistics is one of the fundamental tools for the data miner. Statistics is essentially about uncertainty: understanding it and thereby making allowance for it. It also provides a framework for understanding the discoveries made in data mining. Discoveries need to be statistically sound and statistically significant, so the uncertainty associated with modelling needs to be understood.
We might also note some of the controversy around machine learning and statistics. Leading computational statistician (and member of the R Core Team) Brian D. Ripley provocatively suggests that machine learning is statistics minus any checking of models and assumptions.