Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Stratified Benford Plots

Figure: Benford plot of Income stratified by Marital.
We often want to stratify our data, that is, split it into subgroups. For example, in fraud investigations we might split the data into groups associated with different geographic regions or different auditors. Suppose we are considering accounts payable data where each record is a payment and there are, say, ten individuals who sign off on the invoices. In the Data tab, we can choose the variable that identifies the individual signing off as the Target variable.

The plot here illustrates the idea using the audit dataset. In the Data tab, we have given Marital the role of Target variable. We then asked for a Benford plot of the Income variable, and the resulting plot is stratified over the possible values of Marital.
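The calculation behind a stratified Benford plot can be sketched outside of Rattle. The following Python is a minimal illustration only, not Rattle's own implementation (Rattle is an R package): it groups records by a categoric variable, tallies the leading digit of a numeric variable within each group, and provides the Benford expectation log10(1 + 1/d) for comparison. The function names and the toy records are hypothetical, made up for this example, and are not taken from the audit dataset.

```python
import math
from collections import Counter, defaultdict


def leading_digit(value):
    """Return the first significant digit (1-9) of a number, or None if it has none."""
    value = abs(value)
    if value == 0 or math.isnan(value) or math.isinf(value):
        return None
    # Scientific notation puts the first significant digit first, e.g. '4.5...e-02'.
    # 17 significant digits avoids rounding a value such as 9.9999999 up to 10.
    return int(f"{value:.16e}"[0])


def benford_expected():
    """Expected proportion of each leading digit under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def stratified_digit_proportions(records, value_key, group_key):
    """Observed leading-digit proportions of value_key within each level of group_key."""
    counts = defaultdict(Counter)
    for rec in records:
        digit = leading_digit(rec[value_key])
        if digit is not None:  # skip zeros and non-finite values
            counts[rec[group_key]][digit] += 1
    proportions = {}
    for group, digit_counts in counts.items():
        total = sum(digit_counts.values())
        proportions[group] = {d: digit_counts[d] / total for d in range(1, 10)}
    return proportions


# Hypothetical toy records, for illustration only.
records = [
    {"Marital": "Married", "Income": 12500.0},
    {"Marital": "Married", "Income": 1830.5},
    {"Marital": "Married", "Income": 24000.0},
    {"Marital": "Absent", "Income": 310.0},
    {"Marital": "Absent", "Income": 0.0},
    {"Marital": "Absent", "Income": 3999.99},
]

expected = benford_expected()
proportions = stratified_digit_proportions(records, "Income", "Marital")
for group, observed in proportions.items():
    for d in range(1, 10):
        print(f"{group:8s} digit {d}: observed {observed[d]:.3f}, Benford {expected[d]:.3f}")
```

With only a handful of records per group the observed proportions are far too noisy to compare against Benford's law; in practice each stratum needs a reasonable number of observations before departures from the expected curve mean anything.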

Figure 6.2: Benford stratified by Marital and Gender.
To stratify on more than one categoric variable requires a little extra work, because Rattle does not allow more than a single target to be selected. However, the Remap option under the Transform tab (see Section 23.3.3) lets you "join" two categoric variables into one, and you can then set this combined categoric as your target variable.

This could be useful when, using the accounts payable example again, one person signs off the invoices and another person issues them, and we wish to explore whether there are any patterns in the combination of the two. For instance, the person signing off invoices might be manipulating only those invoices issued by a specific individual. Re-mapping these two categoric variables into a single combined categoric variable allows us to explore this relationship.
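The idea of joining two categoric variables can be sketched in a few lines of Python. This is an illustration of the concept only and does not reproduce how Rattle's Remap option is implemented; the variable names SignedBy and IssuedBy, the separator, and the toy payment records are all hypothetical.

```python
from collections import Counter


def join_categorics(records, first_key, second_key, new_key, sep="_"):
    """Return copies of the records with a new categoric combining two existing ones."""
    joined = []
    for rec in records:
        combined = f"{rec[first_key]}{sep}{rec[second_key]}"
        # Copy the record so the original data is left untouched.
        joined.append({**rec, new_key: combined})
    return joined


# Hypothetical accounts payable records, for illustration only.
payments = [
    {"SignedBy": "alice", "IssuedBy": "carol", "Amount": 1200.0},
    {"SignedBy": "alice", "IssuedBy": "dave", "Amount": 430.0},
    {"SignedBy": "bob", "IssuedBy": "carol", "Amount": 95.5},
    {"SignedBy": "alice", "IssuedBy": "carol", "Amount": 1875.0},
]

joined = join_categorics(payments, "SignedBy", "IssuedBy", "SignedBy_IssuedBy")

# Each distinct combination becomes one level of the new variable,
# which can then play the role of the single stratifying (target) variable.
group_sizes = Counter(rec["SignedBy_IssuedBy"] for rec in joined)
for level, size in sorted(group_sizes.items()):
    print(level, size)
```

The number of levels in the combined variable can grow as large as the product of the two original level counts, so some combinations may contain too few records to support a meaningful Benford comparison.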

Copyright © Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010