Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Excel

The simplest way to transfer data from Excel, or indeed from any spreadsheet, is to save the data in CSV (comma-separated values) format, usually into a file with the extension .csv. Every spreadsheet application supports this format. It also lets us play to our strengths: if we are fluent with data manipulation in Excel, we can get our data into shape in Excel and then load the CSV file into Rattle for data mining.
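The CSV route can be tried without Excel at all. The sketch below stands in for a sheet saved from Excel with File > Save As: it writes a small made-up data frame to a temporary CSV file with write.csv and reads it back with read.csv, as we would from the R console. The names audit and csvfile and the data values are hypothetical, chosen only for illustration.

```r
# Hypothetical stand-in for a sheet saved from Excel as CSV.
# The column names and values are invented for illustration only.
csvfile <- tempfile(fileext = ".csv")
audit <- data.frame(ID     = c(1L, 2L, 3L),
                    Type   = c("TOC", "TOC", "PH"),
                    Amount = c(120.5, 98.0, 7.2))

# Write without row names so the file looks like a plain spreadsheet export.
write.csv(audit, csvfile, row.names = FALSE)

# Load the CSV file back into R; the first line supplies the column names.
ds <- read.csv(csvfile, stringsAsFactors = FALSE)
str(ds)
```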

The read.xls function in the gdata package can read specified sheets from an Excel spreadsheet.

Alternatively, on MS/Windows, Excel spreadsheets can be accessed and manipulated directly through ODBC using odbcConnectExcel from the RODBC package. The available sheets can be listed with sqlTables, and individual sheets can be queried with the sqlQuery function or imported whole with sqlFetch. To use a spreadsheet as a database, though, the first row of the spreadsheet must contain the column names! The first row is always taken as the header, so if it holds data instead, that row is used up as column names and we end up reading from the second row of our data.
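The ODBC driver itself can only be exercised on MS/Windows, but the header-row pitfall is easy to see with base R's read.csv, which behaves analogously when header = TRUE. This is an illustration by analogy, not the RODBC code path. The three rows below are invented, and rows, with_header and without_header are hypothetical names.

```r
# Three rows of (invented) data with no header row.
rows <- c("TOC,5010-05,12.1",
          "TOC,5010-06,13.4",
          "PH,5010-05,7.0")

# header = TRUE: the first data row is consumed as the column names.
with_header <- read.csv(text = rows, header = TRUE)

# header = FALSE: all rows are kept and R supplies names V1, V2, V3.
without_header <- read.csv(text = rows, header = FALSE)

nrow(with_header)     # only 2 rows survive
nrow(without_header)  # all 3 rows
names(with_header)    # built from the values of the first data row
```

Adding a header row to the sheet before connecting avoids losing the first row of data.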

In this example we open a connection to a spreadsheet, run a sample query, and then close the connection:

> library(RODBC)
> channel <- odbcConnectExcel("h:/audit.xls")
> ds <- sqlQuery(channel, "SELECT * FROM `Sheet1$`
                           WHERE Type  = 'TOC'
                           AND   Valve = '5010-05'")
> odbcClose(channel)


To simply fetch the full contents of a single sheet of a spreadsheet, we can use the sqlFetch function:

library(RODBC)
channel <- odbcConnectExcel("h:/audit.xls")
ds <- sqlFetch(channel, "Sheet1")
odbcClose(channel)

On MS/Windows we can also use the xlsReadWrite package to access and manipulate an Excel spreadsheet directly. For example, to read a spreadsheet we can use its read.xls function (not to be confused with the function of the same name in gdata):

library(xlsReadWrite)
ds <- read.xls("audit.xls", colNames=TRUE, sheet=6,
               colClasses=c("factor","integer","double"))
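The colClasses argument above fixes the type of each column rather than leaving R to guess. Base R's read.csv accepts a similar colClasses argument, so the idea can be tried on any platform with a CSV export. This is a sketch using a small invented file; the file contents and names are hypothetical, and it uses "numeric" for the real-valued column since that is a standard class name for read.csv.

```r
# Write a small invented CSV file to read back with explicit column types.
csvfile <- tempfile(fileext = ".csv")
writeLines(c("Type,Count,Value",
             "TOC,3,1.5",
             "PH,7,2.25"), csvfile)

# Force Type to a factor, Count to integer and Value to double precision.
ds <- read.csv(csvfile, colClasses = c("factor", "integer", "numeric"))
str(ds)
```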

Copyright © 2004-2008 Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
PDF version is properly formatted and forms a comprehensive book (draft with over 600 pages).
Brought to you by Togaware.