Togaware DATA MINING
Desktop Survival Guide
by Graham Williams

Manipulating Data As SQL

The Structured Query Language (SQL) is a declarative language commonly used to query databases. Many data analysts already know SQL and can manipulate data easily with it. The sqldf package provides a mechanism for analysts familiar with SQL to manipulate R data frames using SQL statements.



> library(sqldf)
# Simple count
> sqldf("select count(*) from iris")
  count(*)
1      150
> sqldf("select * from iris order by Sepal_Length desc limit 3")
  Sepal_Length Sepal_Width Petal_Length Petal_Width   Species
1          7.9         3.8          6.4         2.0 virginica
2          7.7         3.8          6.7         2.2 virginica
3          7.7         2.6          6.9         2.3 virginica
# New data frame with Species2 a factor with two levels.
> sqldf("select Sepal_Length, Sepal_Width, 
                Petal_Length, Petal_Width, 
                Species as Species2 
         from iris where Species <> 'setosa'")
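
The same approach extends to grouped summaries. The following is a minimal sketch, not taken from the original text, that assumes the sqldf package is installed and uses R's built-in mtcars dataset. mtcars is chosen here because its column names contain no dots, so the query does not depend on how the installed version of sqldf handles names such as Sepal.Length. The names by_cyl, n, mean_mpg and base_means are illustrative only.

```r
library(sqldf)

# Count the cars and average the fuel economy (mpg) within each
# cylinder class. mtcars is a data frame that ships with R.
by_cyl <- sqldf("select cyl,
                        count(*) as n,
                        avg(mpg) as mean_mpg
                 from mtcars
                 group by cyl
                 order by cyl")
print(by_cyl)

# The equivalent summary in base R, for comparison.
base_means <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(base_means)
```

Assigning the result to a name, as with by_cyl above, keeps the returned data frame available for further processing in R.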

See http://code.google.com/p/sqldf/ for further examples.



Copyright © 2004-2010 Togaware Pty Ltd
Brought to you by Togaware. This page generated: Sunday, 22 August 2010