
Loss Matrix

The loss matrix is used to weight different types of misclassification differently. For a two-class problem these are the false positives (type I errors) and the false negatives (type II errors). Often one type of error is more of a loss to us than the other. In fraud detection, for example, a model that identifies too many false positives is probably better than a model that produces too many false negatives, because each false negative is a real fraud that we miss.

The default loss for each case is 1: all errors have equal impact. In the case of a rare and underrepresented class (like fraud) we might consider a false negative to be 4 or even 10 times worse than a false positive. Thus we communicate to the model builder that we want it to work harder to build a model that finds all of the positive cases.

The loss matrix records these relative weights, for the two-class case, as:

\begin{displaymath}\begin{array}{r|cc}
\mbox{Predicted} \backslash \mbox{Observed} & 0 & 1\\
\hline
0 & TN & FN \\
1 & FP & TP \\
\end{array}\end{displaymath}

Note that we do not specify any weights in the loss matrix for the true positives (TP) and the true negatives (TN): we supply weights of 0 for them. To specify the matrix in the Rattle interface we thus supply a list of the form $0,FN,FP,0$, which is used to construct the loss matrix row-wise.
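As a rough sketch of that row-wise construction in R (illustrative only; the variable names here are not Rattle's internals), the comma-separated string can be split and poured into a 2 by 2 matrix:

## Turn a Rattle-style loss string into a matrix, filled row-wise.
loss.string <- "0,10,1,0"
loss <- matrix(as.numeric(strsplit(loss.string, ",")[[1]]),
               nrow = 2, byrow = TRUE)
loss
##      [,1] [,2]
## [1,]    0   10
## [2,]    1    0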

In general, the loss matrix must be square, with dimension equal to the number of classes (i.e., the number of levels of the target variable) in the training dataset. For binary classification we must supply four numbers, with zeros on the diagonal.

An example is the string of numbers $0,10,1,0$, which might be interpreted as saying that an actual 1 predicted as a 0 (i.e., a false negative) is 10 times more unwelcome than a false positive!
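Underneath, Rattle builds its decision trees with rpart, which accepts a loss matrix through its parms argument. The following is a minimal sketch (the dataset train and the variable target are placeholders for our own data). Note that rpart's documentation defines loss[i, j] as the cost of classifying an observation whose true class is i as class j, so the false negative weight sits in row 2, column 1:

library(rpart)

## Penalise a false negative 10 times more than a false positive.
loss <- matrix(c( 0, 1,    # observed 0: TN costs 0, FP costs 1
                 10, 0),   # observed 1: FN costs 10, TP costs 0
               nrow = 2, byrow = TRUE)

## 'train' and 'target' are placeholders for our own data.
model <- rpart(target ~ ., data = train, method = "class",
               parms = list(loss = loss))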

The loss matrix is used to alter the priors, which in turn affects the choice of the variable on which to split the dataset at each node, giving more weight where appropriate.
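For reference, the rpart documentation describes this as using altered priors: writing $\pi_i$ for the prior probability of class $i$ and $L_i$ for the sum of row $i$ of the loss matrix, each prior is rescaled as

\begin{displaymath}
\tilde{\pi}_i = \frac{\pi_i L_i}{\sum_j \pi_j L_j}
\end{displaymath}

so that a class whose misclassifications are expensive behaves as if it were more common.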

Using the loss matrix is often indicated when the decision tree we build ends up being just a single root node, often because the positive class represents less than 5% of the population, so that the most accurate model is simply to predict every case as negative.
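We can illustrate this symptom with a small simulation (a sketch only: whether the default tree actually collapses to the root depends on the signal strength and on the complexity parameter cp):

library(rpart)

## Simulate a dataset in which the positive class is rare (around 3%).
set.seed(42)
n      <- 10000
x1     <- rnorm(n)
x2     <- rnorm(n)
target <- factor(rbinom(n, 1, plogis(-4 + x1)))
train  <- data.frame(x1, x2, target)

## With equal losses the most accurate tree may be a single root node
## predicting every case as 0.
fit.equal <- rpart(target ~ ., data = train, method = "class")

## Penalising false negatives 10-fold encourages splits that chase the
## rare positive cases.
fit.loss <- rpart(target ~ ., data = train, method = "class",
                  parms = list(loss = matrix(c(0, 1, 10, 0),
                                             nrow = 2, byrow = TRUE)))

print(fit.equal)
print(fit.loss)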
