Togaware DATA MINING
Desktop Survival Guide
by Graham Williams


Dates and Times

To calculate the difference between two times, use difftime.
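Here is a minimal sketch of difftime using two made-up timestamps, parsed in UTC so the result does not depend on the local time zone. The units argument fixes the unit of the result. Plain subtraction also returns a difftime, but with the units chosen automatically.

```r
# Two made-up timestamps, both interpreted in UTC
t1 <- as.POSIXct("2005-05-22 12:35:00", tz = "UTC")
t2 <- as.POSIXct("2005-05-23 14:05:00", tz = "UTC")

# difftime with the units chosen explicitly
d.hours <- difftime(t2, t1, units = "hours")
print(d.hours)                        # Time difference of 25.5 hours

# Plain subtraction also returns a difftime, with units picked automatically
print(t2 - t1)                        # Time difference of 1.0625 days

# Drop the units to get an ordinary number, optionally converting first
as.numeric(d.hours)                   # 25.5
as.numeric(d.hours, units = "mins")   # 1530

# For Date objects the difference is counted in days
difftime(as.Date("2005-06-05"), as.Date("2005-05-22"), units = "days")  # 14 days
```

Converting with as.numeric(..., units = ...) is convenient when the difference feeds into further arithmetic, because it avoids depending on whichever unit was picked automatically.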

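When importing data from a CSV file, for example, dates are simply read as strings. They become factors in R versions before 4.0.0, where stringsAsFactors defaulted to TRUE, and character vectors since then. Either form is easily converted to Date objects using as.Date: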

> ds <- read.csv("authors.csv")
> ds$Notified
   [1]            2005/06/05 2005/06/05 
> as.Date(ds$Notified, format="%Y/%m/%d")
   [1] NA           "2005-06-05" "2005-06-05"
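The authors.csv file itself is not shown, so this self-contained sketch uses a small hypothetical CSV held in a string and read through the text argument of read.csv. Its first Notified entry is left empty, and that empty entry is why the first converted value is NA. Setting stringsAsFactors = TRUE reproduces the factor behaviour described above regardless of the R version.

```r
# Hypothetical stand-in for authors.csv; the first Notified entry is empty
csv <- "Name,Notified
Ann,
Bob,2005/06/05
Cat,2005/06/05"

# stringsAsFactors = TRUE mimics the default of R versions before 4.0.0
ds <- read.csv(text = csv, stringsAsFactors = TRUE)
class(ds$Notified)                      # "factor"

# as.Date accepts a factor (using its labels) as well as character strings
notified <- as.Date(ds$Notified, format = "%Y/%m/%d")
notified                                # NA "2005-06-05" "2005-06-05"
class(notified)                         # "Date"
```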

The default format is "%Y-%m-%d" (recent versions of R also try "%Y/%m/%d" when no format is given). See the help for strftime for an explanation of the format codes. Any text remaining in the string after the format has been matched is simply ignored, but if the format does not match at the beginning of the string, NA is returned:

> ds <- c("2005-05-22 12:35:00", "2005-05-23 abc","abc 2005-05-24")
> ds
[1] "2005-05-22 12:35:00" "2005-05-23 abc"      "abc 2005-05-24"     
> class(ds)
[1] "character"
> ds <- as.Date(ds)
> ds
[1] "2005-05-22" "2005-05-23" NA
> class(ds)
[1] "Date"
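The format codes follow the strptime conventions. This brief sketch uses made-up strings to parse a few other layouts with an explicit format. It also confirms the behaviour described above: trailing text is ignored, and a mismatch at the start gives NA.

```r
# Parse a few other layouts by giving an explicit format string
d1 <- as.Date("22/05/2005", format = "%d/%m/%Y")   # day/month/year
d2 <- as.Date("05-22-05",   format = "%m-%d-%y")   # two-digit year
d3 <- as.Date("20050522",   format = "%Y%m%d")     # no separators

# Trailing text after the matched format is ignored ...
d4 <- as.Date("2005-05-22T12:35:00", format = "%Y-%m-%d")
# ... whereas a mismatch at the start of the string gives NA
d5 <- as.Date("on 2005-05-22", format = "%Y-%m-%d")

c(d1, d2, d3, d4, d5)    # four copies of "2005-05-22", then NA

# format() converts in the other direction, from Date to text
format(d1, "%d/%m/%Y")   # "22/05/2005"
```

With %y, R maps the values 00 to 68 to the 2000s and 69 to 99 to the 1900s. A four-digit %Y is therefore safer whenever the data allow it.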

To compare a vector of dates with a particular date, convert that date with as.Date as well:

> ds > as.Date("2005-05-22")
[1] FALSE  TRUE    NA
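Comparisons return NA for missing dates, so subsetting with them needs a little care. This sketch uses made-up dates to select those within an inclusive range, with the NAs dropped explicitly.

```r
ds <- as.Date(c("2005-05-22", "2005-05-23", NA))

# Element-wise comparison with a single Date; missing dates stay NA
ds > as.Date("2005-05-22")          # FALSE TRUE NA

# Select the dates within an inclusive range, dropping the NAs explicitly
start <- as.Date("2005-05-01")
end   <- as.Date("2005-05-22")
keep  <- !is.na(ds) & ds >= start & ds <= end
ds[keep]                            # "2005-05-22"

# Summary functions also work, given na.rm = TRUE
range(ds, na.rm = TRUE)             # "2005-05-22" "2005-05-23"
```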

To view the methods associated with the Date class:

> methods(class = "Date")
 [1] as.character.Date  as.data.frame.Date as.POSIXct.Date    Axis.Date*        
 [5] c.Date             cut.Date           -.Date             [<-.Date          
 [9] [.Date             [[.Date            +.Date             diff.Date         
[13] format.Date        hist.Date*         is.numeric.Date    julian.Date       
[17] Math.Date          mean.Date          months.Date        Ops.Date          
[21] plot.Date*         print.Date         quarters.Date      rep.Date          
[25] round.Date         seq.Date           summary.Date       Summary.Date      
[29] trunc.Date         weekdays.Date     

   Non-visible functions are asterisked
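This short sketch exercises a few of the methods listed above (seq, diff, +, cut, julian, weekdays and months) on an arbitrarily chosen date. The names returned by weekdays and months depend on the current locale, so no output is shown for them.

```r
d <- as.Date("2005-05-22")

# seq.Date: a regular sequence, here four dates one week apart
s <- seq(d, by = "week", length.out = 4)
s                          # "2005-05-22" "2005-05-29" "2005-06-05" "2005-06-12"

# diff.Date: gaps between consecutive dates, as a difftime in days
diff(s)                    # 7 7 7 days

# +.Date: adding a number adds that many days
d + 30                     # "2005-06-21"

# cut.Date: bin dates by month, then count the dates in each bin
table(cut(s, "month"))     # 2 in May, 2 in June

# julian.Date: days since the origin, 1970-01-01 by default
julian(d)

# weekdays.Date and months.Date: names in the current locale
weekdays(d)
months(d)
```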

Some alternatives for aggregating daily observations by month, using the chron and zoo packages:

> library(chron)
> dts <- seq.dates("1/1/01", "12/31/03")     # daily chron dates, 2001 to 2003
> rnum <- rnorm(length(dts))                 # one random observation per day
> df <- data.frame(date=dts, obs=rnum)
> aggregate(df$obs, list(year=years(df$date), month=months(df$date)), sum)
> library(zoo)
> aggregate(zoo(rnum, dts), as.yearmon, sum)
> aggregate(rnum, list(dts=as.yearmon(dts)), sum)
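The same monthly totals can also be computed in base R alone. This sketch assumes a daily Date sequence and random observations, mirroring the example above (the seed is arbitrary). It labels each day with its year and month through format and aggregates on that label.

```r
# Daily Date sequence for 2001 to 2003 with random observations
set.seed(42)
dts  <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
rnum <- rnorm(length(dts))

# Label each day with its year and month, e.g. "2001-01"
ym <- format(dts, "%Y-%m")

# Sum the observations within each month
monthly <- aggregate(rnum, by = list(month = ym), FUN = sum)
head(monthly)
nrow(monthly)                 # 36 months

# tapply gives the same sums as a named vector
monthly.v <- tapply(rnum, ym, sum)
```

A "%Y-%m" label sorts chronologically as text, so both results come out in calendar order without any further conversion.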

To extract the year from a vector of date strings written in assorted formats, a regular expression can be used:

> dates <- c("26 Jan 1974", "April 3, 2002", "23 June, 1999", "2007")
> gsub(".*([1-9][0-9]{3}).*", "\\1", dates)
[1] "1974" "2002" "1999" "2007"
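The leading .* is greedy, so the captured group is the rightmost four-digit match. A string with no match at all is returned unchanged by gsub. In this sketch the extracted years are converted to integers. For values that are already Date objects, format reads the year directly, without any regular expression (the dates used are illustrative).

```r
dates <- c("26 Jan 1974", "April 3, 2002", "23 June, 1999", "2007")

# Capture the rightmost four-digit number starting with 1 to 9
years <- as.integer(gsub(".*([1-9][0-9]{3}).*", "\\1", dates))
years                                   # 1974 2002 1999 2007

# For proper Date objects, format() extracts the year directly
d <- as.Date(c("1974-01-26", "2002-04-03"))
format(d, "%Y")                         # "1974" "2002"
as.integer(format(d, "%Y"))             # 1974 2002
```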

The POSIXlt class stores a date-time as a list of its components, which unlist displays:

> as.POSIXlt('2005-7-1')
[1] "2005-07-01"
> unlist(as.POSIXlt('2005-7-1'))
  sec   min  hour  mday   mon  year  wday  yday isdst 
    0     0     0     1     6   105     5   181     0
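The numbers above follow the POSIXlt conventions. year counts years since 1900 (105 is 2005), and mon runs from 0 for January (6 is July). wday runs from 0 for Sunday (5 is Friday), and yday counts days from 0 on 1 January. Newer versions of R may store extra components such as zone and gmtoff, so the output of unlist can differ from that shown. This sketch reads the components directly with $ and converts them back to everyday values.

```r
# Parse in UTC so the result does not depend on the local time zone
lt <- as.POSIXlt("2005-07-01", tz = "UTC")

lt$year + 1900    # 2005: year counts from 1900
lt$mon + 1        # 7: months are numbered 0 (January) to 11
lt$mday           # 1: day of the month starts at 1
lt$wday           # 5: 0 is Sunday, so 5 is Friday
lt$yday + 1       # 182: yday starts at 0 on 1 January

# Date objects can be converted to POSIXlt to reach the same fields
as.POSIXlt(as.Date("2005-07-01"))$yday    # 181
```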

Copyright © 2004-2008 Togaware Pty Ltd
Support further development through the purchase of the PDF version of the book.
PDF version is properly formatted and forms a comprehensive book (draft with over 600 pages).