Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Locating and Loading Data

Using the Spreadsheet option of Rattle's Data tab we can load data directly from a .csv file. Click the Filename button (Figure 5.2) to display the file chooser dialogue (Figure 5.3).

We can browse to the .csv file we wish to load, highlight it, and click the Open button.

Figure 5.2: The toolbar and Spreadsheet options of the Data tab, highlighting the Filename button. Click this button to open up the file chooser.

Figure 5.3: The CSV file chooser showing just those files with a .csv extension in the folder. Using the drop-down menu at the bottom right, we can instead display just the .txt files (the extension often used for tab-delimited files) or all files.

We have now told Rattle the location and name of the file to load. To actually load the data we click the Execute button (or press the F2 key). This reads the contents of the file from the hard disk into the computer's memory, ready for processing by Rattle.
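Loading the file amounts to reading the CSV data into an R data frame held in memory. The following is a minimal sketch of a comparable load typed into the R Console, assuming the rattle package (and hence its sample weather.csv file) is installed. It uses the standard read.csv function and is not necessarily the exact call Rattle issues internally; the variable name weather is purely illustrative.

```r
# Locate the sample file inside the installed rattle package.
# system.file() returns "" (an empty string) if the file cannot be found.
fn <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(fn)) {
  # Read the header row as variable names and every later row as one
  # observation, keeping the result in memory as a data frame.
  # "weather" is an illustrative name, not one required by Rattle.
  weather <- read.csv(fn)
  print(dim(weather))  # number of observations, number of variables
} else {
  message("weather.csv not found; is the rattle package installed?")
}
```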

As mentioned above, rattle supplies a number of sample CSV files, including the weather.csv data file, which was installed along with the rattle package. We can ask R for the file's actual location by typing a call to the system.file function into the R Console:



> system.file("csv", "weather.csv", package = "rattle")



[1] "/usr/local/lib/R/site-library/rattle/csv/weather.csv"

The reported location depends on your particular installation and operating system. The one shown here is from my own installation, a standard GNU/Linux system.
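Because the location varies between systems, and because system.file quietly returns an empty string when it cannot find a file (for example, when rattle is not installed), a script may want to check the result before using it. The sketch below shows two such checks using only base R: testing for an empty string with nzchar, and passing mustWork = TRUE so that a missing file raises an error. It also lists the sample CSV files in the package's csv folder. The names csvdir and found are illustrative only.

```r
# system.file() returns "" when the requested file does not exist.
fn <- system.file("csv", "weather.csv", package = "rattle")
if (nzchar(fn)) {
  cat("Found:", fn, "\n")
} else {
  cat("weather.csv not found; is rattle installed?\n")
}

# With mustWork = TRUE a missing file becomes an error, caught here and
# replaced by NA so the script can carry on ("found" is illustrative).
found <- tryCatch(
  system.file("csv", "weather.csv", package = "rattle", mustWork = TRUE),
  error = function(e) NA_character_
)

# List all the sample CSV files supplied in rattle's csv folder, if present.
csvdir <- system.file("csv", package = "rattle")
if (nzchar(csvdir)) print(list.files(csvdir, pattern = "\\.csv$"))
```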

We can review the contents of the file using the file.show function, which pops up a window displaying the file.



> fn <- system.file("csv", "weather.csv", package = "rattle")
> file.show(fn)
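Where a separate window is inconvenient, we can instead print just the first few lines in the R Console. A minimal sketch using base R's readLines, again assuming rattle is installed; the name top is illustrative.

```r
fn <- system.file("csv", "weather.csv", package = "rattle")

if (nzchar(fn)) {
  # Read only the first 4 lines as raw text: the header row
  # followed by the first 3 observations.
  top <- readLines(fn, n = 4)
  writeLines(top)
} else {
  message("weather.csv not found; is the rattle package installed?")
}
```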

The file can also be viewed directly, outside of R and Rattle, with any simple text editor. If you aren't familiar with CSV files, it is instructive to do so. The top of the file will appear as:



Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine...
2007-11-01,Canberra,8,24.3,0,3.4,6.3,NW,30,SW,NW,6,20,68...
2007-11-02,Canberra,14,26.9,3.6,4.4,9.7,ENE,39,E,W,4,17,80...
2007-11-03,Canberra,13.7,23.4,3.6,5.8,3.3,NW,85,N,NNE,6,6,82...
2007-11-04,Canberra,13.3,15.5,39.8,7.2,9.1,NW,54,WNW,W,30,24,62...
2007-11-05,Canberra,7.6,16.1,2.8,5.6,10.6,SSE,50,SSE,ESE,20,28,68...
2007-11-06,Canberra,6.2,16.9,0,5.8,8.2,SE,44,SE,E,20,24,70...

A CSV file is simply a plain text file. It begins with a header row listing the names of the variables, separated by commas. The remainder of the file consists of rows of data, one per observation, with comma-separated fields recording the value of each variable for that observation.
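To see how this structure maps onto R, the sketch below parses the first three observations shown above, keeping only their first seven columns, from a string rather than a file. It uses read.csv's text argument; the names csv_text and ds are illustrative. The header row supplies the variable names and each remaining row becomes one observation.

```r
# The top of weather.csv, trimmed to its first seven columns.
csv_text <- "Date,Location,MinTemp,MaxTemp,Rainfall,Evaporation,Sunshine
2007-11-01,Canberra,8,24.3,0,3.4,6.3
2007-11-02,Canberra,14,26.9,3.6,4.4,9.7
2007-11-03,Canberra,13.7,23.4,3.6,5.8,3.3"

# read.csv() assumes header = TRUE and sep = "," by default:
# the first row gives the variable names, each later row one observation.
ds <- read.csv(text = csv_text)
str(ds)
```

Here ds has 3 observations of 7 variables, with numeric columns such as MinTemp and MaxTemp recognised automatically.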

Copyright © Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
The PDF version is a formatted comprehensive draft book (with over 800 pages).
Brought to you by Togaware. This page generated: Sunday, 22 August 2010