Desktop Survival Guide
by Graham Williams
The simplest way to transfer data from Excel, or any spreadsheet in fact, is to save the data in CSV (Comma Separated Value) format, usually into a file with extension .csv. This is supported in all spreadsheet applications and is effective in that if we are fluent with data manipulation in Excel, then we can get our data into shape using Excel, and then load it into Rattle for data mining.
The read.xls function in the gdata package can read specified sheets from an Excel spreadsheet.
Alternatively, on MS/Windows Excel spreadsheetscan be directly accessed and manipulated through ODBCusing odbcConnectExcel. Available sheets can be listed with sqlTables and individual sheets can be queried through the sqlQuery function or else imported with sqlFetch. To use a spreadsheet as a database though, the first row of the spreadsheet must be the column names! If not, we will find that we end up reading from the second row of our data.
In this example we open a connection to a spreadsheet and then give a
> library(RODBC) > channel <- odbcConnectExcel("h:/audit.xls") > ds <- sqlQuery(channel, "SELECT * FROM `Sheet1$` WHERE Type = "TOC" AND Valve="5010-05"") > odbcClose(channel)
To simply fetch the full contents of a single sheet of a spreedsheet
we can use the sqlFetch query:
library(RODBC) channel <- odbcConnectExcel("h:/audit.xls") ds <- sqlFetch(xlsConnect, "Sheet1") odbcClose(xlsConnect)
On MS/Windows you can also use the xlsReadWrite
to directly access and manipulate an Excel spreadsheet. For example,
to read a spreadsheet we can use read.xls:
library(xlsReadWrite) ds <- read.xls("audit.xls", colNames=TRUE, sheet=6, colClasses=c("factor","integer","double"))
Copyright © 2004-2010 Togaware Pty Ltd Support further development through the purchase of the PDF version of the book.