Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Bibliography

Aggarwal, C. C. & Yu, P. S. (2001),
Outlier detection for high dimensional data, in Proceedings of the 27th ACM SIGMOD International Conference on Management of Data (SIGMOD01), pp. 37-46.

Agrawal, R. & Srikant, R. (1994),
Fast algorithms for mining association rules in large databases, in J. B. Bocca, M. Jarke & C. Zaniolo, eds, Proceedings of the 20th International Conference on Very Large Databases (VLDB94), Morgan Kaufmann, pp. 487-499. http://citeseer.ist.psu.edu/agrawal94fast.html.

Barnett, V. & Lewis, T. (1994),
Outliers in Statistical Data, John Wiley.

Bauer, E. & Kohavi, R. (1999),
`An empirical comparison of voting classification algorithms: Bagging, boosting, and variants', Machine Learning 36(1-2), 105-139. http://citeseer.ist.psu.edu/bauer99empirical.html.

Beyer, K. S., Goldstein, J., Ramakrishnan, R. & Shaft, U. (1999),
When is ``nearest neighbor'' meaningful?, in Proceedings of the 7th International Conference on Database Theory (ICDT99), Jerusalem, Israel, pp. 217-235. http://citeseer.ist.psu.edu/beyer99when.html.

Bhandari, I., Colet, E., Parker, J., Pines, Z., Pratap, R. & Ramanujam, K. (1997),
`Advanced Scout: data mining and knowledge discovery in NBA data', Data Mining and Knowledge Discovery 1(1), 121-125.

Blake, C. & Merz, C. (1998),
`UCI repository of machine learning databases'. http://www.ics.uci.edu/~mlearn/MLRepository.html.

Breiman, L. (1996),
`Bagging predictors', Machine Learning 24(2), 123-140. http://citeseer.ist.psu.edu/breiman96bagging.html.

Breiman, L. (2001),
`Random forests', Machine Learning 45(1), 5-32.

Breunig, M. M., Kriegel, H., Ng, R. & Sander, J. (1999),
OPTICS-OF: Identifying local outliers, in Proceedings of the XXXXth Conference on Principles of Data Mining and Knowledge Discovery (PKDD99), Springer-Verlag, pp. 262-270.

Breunig, M. M., Kriegel, H., Ng, R. & Sander, J. (2000),
LOF: Identifying density-based local outliers, in Proceedings of the 26th ACM SIGMOD International Conference on Management of Data (SIGMOD00).

Caruana, R. & Niculescu-Mizil, A. (2006),
An empirical comparison of supervised learning algorithms, in Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA.

Cendrowska, J. (1987),
`An algorithm for inducing modular rules', International Journal of Man-Machine Studies 27(4), 349-370.

Cleveland, W. S. (1993),
Visualizing Data, Hobart Press, Summit, New Jersey.

Culp, M., Johnson, K. & Michailidis, G. (2006),
`ada: An R package for stochastic boosting', Journal of Statistical Software 17(2). http://www.jstatsoft.org/v17/i02/v17i02.pdf.

Cypher, A., ed. (1993),
Watch What I Do: Programming by Demonstration, The MIT Press, Cambridge, Massachusetts. http://www.acypher.com/wwid/WWIDToC.html.

Dalgaard, P. (2002),
Introductory Statistics with R, Statistics and Computing, Springer, New York.

Freund, Y. & Mason, L. (1999),
The alternating decision tree algorithm, in Proceedings of the 16th International Conference on Machine Learning, pp. 124-133.

Freund, Y. & Schapire, R. E. (1995),
A decision-theoretic generalization of on-line learning and an application to boosting, in Proceedings of the 2nd European Conference on Computational Learning Theory (Eurocolt95), Barcelona, Spain, pp. 23-37. http://citeseer.ist.psu.edu/freund95decisiontheoretic.html.

Friedman, J. H. (2001),
`Greedy function approximation: A gradient boosting machine', Annals of Statistics 29(5), 1189-1232. http://citeseer.ist.psu.edu/46840.html.

Friedman, J. H. (2002),
`Stochastic gradient boosting', Computational Statistics and Data Analysis 38(4), 367-378. http://citeseer.ist.psu.edu/friedman99stochastic.html.

Hahsler, M., Grün, B. & Hornik, K. (2005),
A Computational Environment for Mining Association Rules and Frequent Item Sets, R Package, Version 0.2-1.

Hastie, T., Tibshirani, R. & Friedman, J. (2001),
The elements of statistical learning: Data mining, inference, and prediction, Springer Series in Statistics, Springer-Verlag, New York.

Hawkins, D. (1980),
Identification of Outliers, Chapman and Hall, London.

Ho, T. K. (1998),
`The random subspace method for constructing decision forests', IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832-844.

Jin, W., Tung, A. K. H. & Han, J. (2001),
Mining top-n local outliers in large databases, in Proceedings of the 7th International Conference on Knowledge Discovery and Data Mining (KDD01).

King, R. D., Feng, C. & Sutherland, A. (1995),
`Statlog: Comparison of classification algorithms on large real-world problems', Applied Artificial Intelligence 9(3), 289-333.

Knorr, E. & Ng, R. (1998),
Algorithms for mining distance based outliers in large databases, in Proceedings of the 24th International Conference on Very Large Databases (VLDB98), pp. 392-403.

Knorr, E. & Ng, R. (1999),
Finding intensional knowledge of distance-based outliers, in Proceedings of the 25th International Conference on Very Large Databases (VLDB99), pp. 211-222.

Kohavi, R. (1996),
Scaling up the accuracy of naive-Bayes classifiers: A decision tree hybrid, in Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining (KDD96), Portland, OR, pp. 202-207. http://citeseer.ist.psu.edu/kohavi96scaling.html.

Lin, W., Orgun, M. A. & Williams, G. J. (2000),
Temporal data mining using multilevel-local polynomial models, in Proceedings of the 2nd International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2000), Hong Kong, Vol. 1983 of Lecture Notes in Computer Science, Springer-Verlag, pp. 180-186.

Lin, W., Orgun, M. A. & Williams, G. J. (2001),
Temporal data mining using hidden Markov-local polynomial models, in D. W.-L. Cheung, G. J. Williams & Q. Li, eds, Proceedings of the 5th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD01), Hong Kong, Vol. 2035 of Lecture Notes in Computer Science, Springer-Verlag, pp. 324-335.

Mingers, J. (1989),
`An empirical comparison of selection measures for decision-tree induction', Machine Learning 3(4), 319-342.

Proceedings of the 25th International Conference on Very Large Databases (VLDB99) (1999).

Proceedings of the 26th ACM SIGMOD International Conference on Management of Data (SIGMOD00) (2000),
ACM Press.

Provost, F. J., Jensen, D. & Oates, T. (1999),
Efficient progressive sampling, in Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining (KDD99), San Diego, CA, ACM Press, pp. 23-32. http://citeseer.ist.psu.edu/provost99efficient.html.

Quinlan, J. R. (1993),
C4.5: Programs for machine learning, Morgan Kaufmann.

R Development Core Team (2005),
R Data Import/Export, version 2.1.1 edn.

Ramaswamy, S., Rastogi, R. & Shim, K. (2000),
Efficient algorithms for mining outliers from large data sets, in Proceedings of the 26th ACM SIGMOD International Conference on Management of Data (SIGMOD00), pp. 427-438.

Schafer, J. L. (1997),
Analysis of Incomplete Multivariate Data, Chapman and Hall, London.

Schapire, R. E., Freund, Y., Bartlett, P. & Lee, W. S. (1997),
Boosting the margin: a new explanation for the effectiveness of voting methods, in Proceedings of the 14th International Conference on Machine Learning (ICML97), Morgan Kaufmann, pp. 322-330. http://citeseer.ist.psu.edu/schapire97boosting.html.

Soares, C., Brazdil, P. B. & Kuba, P. ( 2004),
`Meta-learning method to select the kernel width in support vector regression', Machine Learning 54(3), 195-209.

Tufte, E. R. (1985),
The Visual Display of Quantitative Information, Graphics Press.
A timeless classic on how complex information should be presented graphically. The Strunk & White of visual design. Should occupy a place of honor, within arm's reach, of everyone attempting to understand or depict numerical data graphically. The design of the book is an exemplar of the principles it espouses: elegant typography and layout, and seamless integration of lucid text and perfectly chosen graphical examples. Very highly recommended.

Tukey, J. W. (1977),
Exploratory data analysis, Addison-Wesley.

Venables, W. N. & Ripley, B. D. (2002),
Modern Applied Statistics with S, Statistics and Computing, 4th edn, Springer, New York.

Viveros, M. S., Nearhos, J. P. & Rothman, M. J. (1999),
Applying data mining techniques to a health insurance information system, in Proceedings of the 25th International Conference on Very Large Databases (VLDB99), pp. 286-294. http://www.informatik.uni-trier.de/~ley/vldb/ViverosNR96/Article.PS.

Williams, G. J. (1987),
`Some experiments in decision tree induction', Australian Computer Journal 19(2), 84-91.

Williams, G. J. (1988),
Combining decision trees: Initial results from the MIL algorithm, in J. S. Gero & R. B. Stanton, eds, Artificial Intelligence Developments and Applications, Elsevier Science Publishers B.V. (North-Holland), pp. 273-289.

Williams, G. J. (1991),
Inducing and combining decision structures for expert systems, PhD thesis, Australian National University. http://togaware.redirectme.net/papers/gjwthesis.pdf.

Yamanishi, K., Takeuchi, J.-I., Williams, G. J. & Milne, P. (2000),
On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms, in Proceedings of the 6th International Conference on Knowledge Discovery and Data Mining (KDD00), pp. 320-324. http://citeseer.ist.psu.edu/446936.html.

Ye, J. (1998),
`On measuring and correcting the effects of data mining and model selection', Journal of the American Statistical Association 93(441), 120-131.



Copyright © 2004-2008 Togaware Pty Ltd