Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Export KMeans Clusters

The export functionality for kmeans clusters saves the model itself as PMML, with the cluster centroids recorded in the PMML model specification.
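To see where the centroids end up, here is a minimal sketch that reads them back out of a PMML document using only Python's standard library. The PMML text is a hypothetical, cut-down ClusterModel written from our reading of the PMML schema (ClusteringField, Cluster and Array elements). The variable names and centroid values are invented, and a file exported here may carry additional elements or a different PMML version namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down PMML ClusterModel: two variables, two clusters.
PMML_TEXT = """<PMML xmlns="http://www.dmg.org/PMML-4_0" version="4.0">
  <ClusterModel modelName="KMeans_Model" functionName="clustering"
                modelClass="centerBased" numberOfClusters="2">
    <MiningSchema>
      <MiningField name="height"/>
      <MiningField name="weight"/>
    </MiningSchema>
    <ComparisonMeasure kind="distance">
      <squaredEuclidean/>
    </ComparisonMeasure>
    <ClusteringField field="height" compareFunction="absDiff"/>
    <ClusteringField field="weight" compareFunction="absDiff"/>
    <Cluster name="1"><Array n="2" type="real">160.5 55.2</Array></Cluster>
    <Cluster name="2"><Array n="2" type="real">178.0 80.4</Array></Cluster>
  </ClusterModel>
</PMML>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def read_centroids(pmml_text):
    """Return (field names, {cluster name: centroid values}) from a ClusterModel."""
    root = ET.fromstring(pmml_text)
    fields, centroids = [], {}
    for elem in root.iter():  # depth-first, in document order
        if local(elem.tag) == "ClusteringField":
            fields.append(elem.get("field"))
        elif local(elem.tag) == "Cluster":
            for child in elem:
                if local(child.tag) == "Array":
                    # A real-valued Array is a whitespace-separated list of numbers.
                    centroids[elem.get("name")] = [float(v) for v in child.text.split()]
    return fields, centroids


fields, centroids = read_centroids(PMML_TEXT)
print(fields)     # ['height', 'weight']
print(centroids)  # {'1': [160.5, 55.2], '2': [178.0, 80.4]}
```

Stripping the namespace by hand keeps the sketch independent of which PMML version the exporter declares.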

To save a CSV file that records the cluster to which each entity belongs, go to the Evaluate tab and select Score.

When a cluster model is saved as PMML, the PMML specification allows for considerable generality in how new data are scored. In general, to score a new data point with a kmeans cluster model we calculate the distance between the new data point and each centroid, and assign the point to the cluster whose centroid is nearest. The variables may be of very different types, so for each variable we might use a different mechanism for calculating the distance. The default is simply the absolute difference (called absDiff in the resulting PMML), which is appropriate for numeric data. For categoric data we might instead use the Jaccard index.

We then combine the per-variable distances into a single overall distance. For this we use the squared Euclidean distance as the comparison measure, as specified in the ComparisonMeasure element of the PMML: the per-variable distances are squared and then summed.
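The scoring procedure can be sketched in a few lines of Python for the all-numeric case. The per-variable comparison is the absolute difference c(x, y) = |x - y|, and the squared Euclidean measure combines them as D = sum of w_i * c(x_i, y_i)^2, with each weight w_i assumed to default to 1. The function names and centroid values are hypothetical illustrations, not part of the exported PMML, and a categoric comparison such as the Jaccard index is not covered.

```python
def abs_diff(x, y):
    """Per-variable comparison for numeric data: the absolute difference (absDiff)."""
    return abs(x - y)


def squared_euclidean(point, centroid, weights=None):
    """Combine per-variable comparisons as the weighted sum of their squares."""
    if weights is None:
        weights = [1.0] * len(point)  # assume every variable weighs equally
    return sum(w * abs_diff(x, y) ** 2 for w, x, y in zip(weights, point, centroid))


def score(point, centroids):
    """Return the name of the cluster whose centroid is nearest to point."""
    return min(centroids, key=lambda name: squared_euclidean(point, centroids[name]))


# Hypothetical centroids for two numeric variables (height, weight).
centroids = {"1": [160.5, 55.2], "2": [178.0, 80.4]}
print(score([165.0, 60.0], centroids))  # prints 1
```

Because squaring |x - y| gives the same result as squaring x - y, absDiff combined with squaredEuclidean reproduces the ordinary squared Euclidean distance that kmeans minimises.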

Copyright © Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010