DATA MINING
Desktop Survival Guide by Graham Williams
Predicted versus Observed
The Predicted versus Observed plot shows the actual (observed) values on the X axis against the values predicted by the model on the Y axis. A linear model, regressing the predicted values on the observed values, is also fitted and displayed as the blue line. The diagonal line (Predicted = Observed) represents a perfect model: every predicted value equals its observed value, so the predicted and observed values are perfectly correlated.
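The blue line is an ordinary least-squares fit of the predicted values on the observed values. The sketch below computes its intercept and slope in pure Python, without plotting. It assumes `observed` and `predicted` are equal-length numeric lists; `fit_line` is a hypothetical helper, not part of any tool described in this guide.

```python
# Sketch: the line fitted on a Predicted versus Observed plot.
# Assumption: observed and predicted are equal-length numeric lists;
# fit_line is a hypothetical helper name used only for illustration.

def fit_line(observed, predicted):
    """Least-squares fit of predicted = intercept + slope * observed."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    # Sum of squared deviations of the observed values (X axis).
    sxx = sum((x - mean_x) ** 2 for x in observed)
    if sxx == 0:
        raise ValueError("observed values must not all be equal")
    # Cross-products of deviations: how predicted varies with observed.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.5, 4.0]
intercept, slope = fit_line(observed, predicted)
print(f"predicted = {intercept:.2f} + {slope:.2f} * observed")

# A perfect model lies on the diagonal: intercept 0 and slope 1.
print(fit_line(observed, observed))
```

Comparing the fitted slope and intercept with 1 and 0 gives a rough numeric sense of how far the blue line departs from the diagonal.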
A pseudo-R-square value is also reported. Ordinary least squares regression, which predicts numeric values, reports an R-square to measure how well the model fits the data, but when we build models to predict categoric outputs there is no equivalent measure. Instead, a so-called pseudo-R-square can be calculated, which measures the correlation between the predicted and observed values. A higher value indicates that the model fits the data better. However, a pseudo-R-square cannot be interpreted in the same way as the R-square of ordinary regression. There are also several approaches to calculating a pseudo-R-square; here we have used a correlation.
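A correlation-based pseudo-R-square can be sketched as follows. The text says only that a correlation is used, so this sketch assumes the squared Pearson correlation between observed and predicted values, a common choice among several definitions. The helper names `pearson_r` and `pseudo_r_square` are hypothetical, and the data are made up to resemble a categoric (0/1) target scored with predicted probabilities.

```python
import math

# Sketch of a correlation-based pseudo-R-square.
# Assumption: the statistic is the squared Pearson correlation; other
# pseudo-R-square definitions exist. Helper names are hypothetical.

def pearson_r(observed, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in predicted)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant values")
    return sxy / math.sqrt(sxx * syy)


def pseudo_r_square(observed, predicted):
    """Squared correlation: 1.0 for a perfect linear relationship, near 0 for none."""
    return pearson_r(observed, predicted) ** 2


# Example: observed 0/1 classes against predicted probabilities.
observed = [0, 0, 1, 1, 1, 0]
predicted = [0.1, 0.3, 0.8, 0.6, 0.9, 0.4]
print(f"pseudo-R-square = {pseudo_r_square(observed, predicted):.3f}")
```

Squaring keeps the value between 0 and 1, which makes it look like an R-square, but as noted above it should not be read as the proportion of variance explained.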