DATA MINING
Desktop Survival Guide by Graham Williams
Sample Business Case
(Replace Togaware with your own company!)
Business Case: Provision of R Statistical Software Package
Background
Togaware undertakes a wide range of statistical analyses using a variety of statistical software packages, mostly expensive packages from vendors like SAS and SPSS. The choice of package generally depends on the data analyst's previous exposure to, and training in, a specific package.
A number of keen analysts in the organisation have evaluated the freely available statistical software package called R (an open source implementation of the S Statistical Language) on several research projects. With demand for R increasing, this business case proposes rolling R out to more users.
R is open source software and comes with a license that allows it to be freely installed and ensures that it will always be so. The software is well documented, with documentation available on the World Wide Web and installed with the package. A growing number of companies now offer support for R. More significantly, several active mailing lists offer the opportunity to seek help directly from the developers and from a very large user community.
Why is R Needed
R provides extensive support for statistical analysis, with features particularly useful to Togaware's analysts. Data can be sourced directly from databases, including SQL Server, over an ODBC connection and then imported into R for statistical analysis.
About R
R (http://www.r-project.org/) is a statistical analysis package consisting of a programming language and a graphics system. It is the most comprehensive statistical analysis software available, with over 1400 packages specialising in topics ranging from Econometrics and Data Mining to Spatial Analysis and Bio-Informatics. The packages are freely available from the Comprehensive R Archive Network (CRAN). R incorporates all of the standard statistical tests, models, and analyses, and provides a comprehensive language for managing and manipulating data.
R is free and open source software, allowing anyone to use and, importantly, to modify it without limitation. R is licensed under the GNU General Public License, with copyright held by The R Foundation for Statistical Computing. R can therefore be installed on any desktop or server without a license fee, under a license that ensures it can be used without limitation.
The development of the core R system is assured through a rigorous process that is documented for the US Food and Drug Administration in the Regulatory Compliance and Validation Issues document. See http://www.r-project.org/doc/R-FDA.png.
R is in use by statisticians throughout the world, and is deployed at organisations including the Australian Taxation Office, Geoscience Australia, Google, and Merck and Co. The importance of R is acknowledged by the traditional statistical software a
Copyright © 2004-2010 Togaware Pty Ltd