
20 date-times


Page 1: 20 date-times

Today we will use the ggplot2 and lubridate packages

install.packages("lubridate")

library(ggplot2) # load first!

library(lubridate) # load second!

Page 2: 20 date-times

Garrett Grolemund

PhD Student / Rice University

Department of Statistics

Dates and Times

Page 3: 20 date-times

1. four more table rules

2. manipulating date-times

3. manipulating time spans (i.e., math with date-times)

Page 4: 20 date-times

Table style guidelines

Page 5: 20 date-times

1. When to use a table

Use a table instead of a graphic if:

a) there is only a small amount of data, or

b) precision is important

Page 6: 20 date-times

2. Significant digits

Pick an appropriate number of significant digits (at most 4 or 5)

Use signif( ) to round data to that amount
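
As a small illustration of the rule above, signif() keeps a fixed number of significant digits whatever the magnitude of the value. The measurements below are made up for the example.

```r
# Made-up measurements spanning several orders of magnitude
x <- c(123456.789, 12.3456789, 0.00123456789)

# Keep three significant digits in each value
rounded <- signif(x, 3)
print(rounded)
```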

Page 7: 20 date-times

3. Align decimals

Ensure that decimal points line up so that differences in order of magnitude are easy to spot.

Page 9: 20 date-times

4. Include captions

Always include a caption.

Less enthusiastic readers will only look at your figures, so try to summarize the whole story there.

Captions should explain what data the figure shows and highlight the important finding.

Page 10: 20 date-times

date-times in R

Page 11: 20 date-times

Time is a measurement system similar to the number system (which measures quantity). Just as numbers can be arranged on a number line, times can be arranged on a time line.

[Figure: a monotonically increasing time line through 0 CE, with BC to its left and AD to its right]

Page 12: 20 date-times

A date-time is a specific instant of time. It refers to an exact point on the time line. For example,

January 1st, 2000 12:34:00, or

right now

[Figure: both instants marked on a time line that starts at 0 CE]

Page 13: 20 date-times

Identifying instants

Instants of time are commonly identified in two ways:

1) as the number of seconds since a reference time

2) by a unique combination of year, month, day, hour, minute, second, and time zone values

[Figure: time line marking 2000-01-01 12:34:00 as x seconds since 0 CE]

Page 14: 20 date-times

R stores date-times as either POSIXct or POSIXlt objects.

POSIXct objects are stored as the number of seconds since a reference time (by default 1970-01-01 00:00:00 UTC)

unclass(now())

POSIXlt objects are stored as a unique combination of year, month, day, hour, minute, second, and time zone values

unclass(as.POSIXlt(now()))
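
The two storage schemes can be compared on a fixed instant, which keeps the result reproducible (now() changes on every call). This is a minimal sketch; the variable names are illustrative only.

```r
library(lubridate)

# A fixed instant in UTC
x <- ymd_hms("2000-01-01 12:34:00", tz = "UTC")

# POSIXct: a single number, seconds since 1970-01-01 00:00:00 UTC
secs <- as.numeric(x)
print(secs)

# POSIXlt: a list of clock components
# (year is counted from 1900 and mon from 0)
parts <- unclass(as.POSIXlt(x))
print(parts$year + 1900)
print(parts$mon + 1)
print(parts$hour)
print(parts$min)
```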

Page 15: 20 date-times

Parsing dates

Parse character strings into date-times with the ymd() type functions in lubridate

e.g., ymd("2010-11-01")*

*Note: this is also a quick way to create new dates

Page 16: 20 date-times

Parsing dates

Use the function whose name matches the order of the elements in the date

ymd("2010-11-02")

dmy("02/11/2010")

mdy("11.02.10")

ymd_hms("2010-11-02 04:22:58")
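
A quick check that the three spellings above describe the same day. This sketch assumes a current lubridate release, where ymd(), dmy(), and mdy() return Date objects and ymd_hms() returns a POSIXct; the variable names are illustrative.

```r
library(lubridate)

# Three spellings of 2 November 2010, each parsed by the matching function
a <- ymd("2010-11-02")
b <- dmy("02/11/2010")
c3 <- mdy("11.02.10")

# All three parse to the same date
print(a == b)
print(b == c3)

# A string with a clock time needs the _hms variant
stamp <- ymd_hms("2010-11-02 04:22:58")
print(class(stamp))
```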

Page 17: 20 date-times

Your turn

Parse the column of dates in emails.csv into POSIXct objects.

Page 18: 20 date-times

Access a specific element of a date-time with the function that has the element’s name

Page 19: 20 date-times

now()

year(now())

hour(now())

day(now())

yday(now())

wday(now())

wday(now(), label = TRUE)

wday(now(), label = TRUE, abbr = FALSE)

month(now(), label = TRUE, abbr = FALSE)

Determine which day of the week you were born on.
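
The accessors are easier to check on a fixed date-time than on now(). A minimal sketch, assuming a current lubridate release (the week_start argument of wday() is spelled out so the numbering does not depend on session options):

```r
library(lubridate)

x <- ymd_hms("2010-11-01 03:53:06", tz = "UTC")

print(year(x))
print(month(x))
print(hour(x))
print(yday(x))                  # day of the year
print(wday(x, week_start = 7))  # day of the week as a number, Sunday = 1

# Day name; the text returned depends on the session locale
print(wday(x, label = TRUE, abbr = FALSE))
```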

Page 20: 20 date-times

Your turn

Use the email data to determine at which hour of the day Professor Wickham is most likely to respond to your email.

Page 21: 20 date-times

em <- read.csv("emails.csv")

em$hour <- hour(em$time)

qplot(hour, data = em, binwidth = 1)

Page 22: 20 date-times

Accessor functions can also be used to change the elements of a date-time

x <- now()

year(x) <- 1999

hour(x) <- 23

day(x) <- 45

tz(x) <- "UTC"

Determine what day of the week your birthday will be on next year.
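
A small sketch of the setters on fixed values. The assignment needs a variable on the left-hand side, so the instant is stored in x first; the names x and y are illustrative.

```r
library(lubridate)

x <- ymd_hms("2010-11-01 03:53:06", tz = "UTC")

# Overwrite individual elements of the date-time
year(x) <- 1999
hour(x) <- 23
print(x)

# A day value past the end of the month rolls over into the next month
y <- ymd("2010-01-01")
day(y) <- 45
print(y)
```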

Page 23: 20 date-times

Time zones

Page 24: 20 date-times

Different clock times. All refer to the same instant of time

2010-11-01 03:53:06 PDT

2010-11-01 04:53:06 MDT

2010-11-01 05:53:06 CDT

2010-11-01 06:53:06 EDT

Switch between time zones with with_tz()

Page 25: 20 date-times

with_tz(now(), "UTC")

with_tz(now(), "America/New_York")
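
To see that with_tz() changes only the clock display, a fixed instant can be shown in two zones. This is a sketch; the instant is chosen to match the example times on the previous slide.

```r
library(lubridate)

x <- ymd_hms("2010-11-01 10:53:06", tz = "UTC")

west <- with_tz(x, "America/Los_Angeles")
east <- with_tz(x, "America/New_York")

# Different clock hours ...
print(hour(west))
print(hour(east))

# ... but still the same instant
print(west == east)
```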

Page 26: 20 date-times

Same clock times. All refer to different instants of time

2010-11-01 03:53:06 PDT

2010-11-01 03:53:06 MDT

2010-11-01 03:53:06 CDT

2010-11-01 03:53:06 EDT

Alter time zone with force_tz()

Page 27: 20 date-times

Recall that force_tz() returns a new instant of time

force_tz(now(), "UTC")

force_tz(now(), "America/New_York")
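
By contrast, force_tz() keeps the clock reading and moves the instant. A minimal sketch on a fixed date-time:

```r
library(lubridate)

x <- ymd_hms("2010-11-01 03:53:06", tz = "America/Los_Angeles")
y <- force_tz(x, "America/New_York")

# Same clock reading in both zones
print(hour(x) == hour(y))

# Different instants: the New York reading happens earlier
gap_hours <- as.numeric(difftime(x, y, units = "hours"))
print(gap_hours)
```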

What time is it now where Hadley is (London)? What time was it here when London clocks displayed our current time?

Page 28: 20 date-times

rounding instants

We can round an instant to a specified unit using floor_date(), ceiling_date(), or round_date().

e.g.

round_date(now(), "day")

round_date(now(), "month")
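
The three rounding functions can be compared on one fixed afternoon timestamp. A sketch with illustrative variable names:

```r
library(lubridate)

x <- ymd_hms("2010-11-01 15:53:06", tz = "UTC")

down <- floor_date(x, "day")       # start of the same day
up <- ceiling_date(x, "day")       # start of the next day
nearest <- round_date(x, "day")    # past noon, so this rounds up
by_month <- round_date(x, "month") # early in the month, so this rounds down

print(down)
print(up)
print(nearest)
print(by_month)
```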

Page 29: 20 date-times

Your turn

Use ddply to calculate how many emails Hadley received per day since September 1st. How will you subset? How will you pair up emails sent on the same day?

Plot the number of emails per day over time. Compare the number of emails sent by day of the week.

Page 30: 20 date-times

# just the recent emails

recent <- subset(em, time > ymd("2010-09-01"))

# binning into days

recent$day <- lubridate::floor_date(recent$time, "day")

# calculating daily totals

library(plyr) # provides ddply()

daily <- ddply(recent, "day", summarise, words = sum(words), emails = length(day))

Page 31: 20 date-times

qplot(day, emails, data = daily, geom = "line")

qplot(wday(day, label = TRUE), emails, data = daily, geom = "boxplot")
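
The binning step in the solution above can be tried without the email file. The sketch below uses a few hypothetical timestamps in place of the real data and base R's table() in place of ddply(), so it is a stand-in for the approach, not the course solution.

```r
library(lubridate)

# Hypothetical timestamps standing in for the time column of the email data
toy <- data.frame(time = ymd_hms(c(
  "2010-09-01 08:15:00",
  "2010-09-01 17:40:00",
  "2010-09-02 09:05:00"
), tz = "UTC"))

# Bin each timestamp into its day, then count rows per day
toy$day <- floor_date(toy$time, "day")
emails_per_day <- table(format(toy$day, "%Y-%m-%d"))
print(emails_per_day)
```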

Page 32: 20 date-times

Time spans / math with date-times

Page 33: 20 date-times

0 CE

A time span is a period of time. It refers to an interval on the time line. For example,

19 months, or

one century


Page 34: 20 date-times

What does time measure?

- one half of something called space-time?

- position of the sun?

- tilt of the Earth’s axis?

- number of days left until the weekend?

- All of the above (and poorly at that)?

Page 35: 20 date-times

The month suggests the tilt of the Earth’s axis (but requires a leap day to get back in sync)

The hour suggests where the sun is in the sky (but requires time zones and daylight saving time)

The Earth’s movement is decelerating, but space-time is constant (which requires random leap seconds)

as.period(diff(.leap.seconds))

Page 36: 20 date-times

Consider

Suppose we wish to record the opening value of the S&P 500 every day for a month. Since the stock market opens at 8:30 CST, we could calculate:

force_tz(ymd_hms("2010-01-01_08:30:00"), "") + ddays(0:30)

how about

force_tz(ymd_hms("2010-03-01_08:30:00"), "") + ddays(0:30)

What went wrong?
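
One way to see the problem is to look at the clock hour of each computed opening time. The sketch below assumes the zone "America/Chicago" as a stand-in for the local CST/CDT zone (the slide uses the session's local zone), where daylight saving time began on 2010-03-14.

```r
library(lubridate)

# Assumption: "America/Chicago" stands in for the local time zone
jan <- ymd_hms("2010-01-01 08:30:00", tz = "America/Chicago") + ddays(0:30)
mar <- ymd_hms("2010-03-01 08:30:00", tz = "America/Chicago") + ddays(0:30)

# January: every result falls in the 8 o'clock hour
print(unique(hour(jan)))

# March: adding exact 24-hour steps drifts by an hour once DST starts
print(unique(hour(mar)))
```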

Page 37: 20 date-times

Why does this matter?

What do we mean by exactly one month from now?

How long is an hour?

How long will an hour be at 2:00 this Sunday morning?

Page 38: 20 date-times

According to clock times, the time line looks more like this

[Figure: time line with breaks labeled daylight saving, leap day, and daylight saving]

Page 39: 20 date-times

3 types of time spans

We can still do math on the time line, as long as we’re specific about how we want to measure time.

We can use 3 types of time spans, each of which measures time differently: durations, periods, and intervals.

Page 40: 20 date-times

durations

Durations measure the exact number of seconds that pass between two time points.

DST <- force_tz(ymd_hms("2010-11-7 01:06:39"), "")

DST + ddays(1)

Page 41: 20 date-times

Use new_duration() or a helper function to create a duration object. Helper functions are named d + the plural of the unit you are trying to create.

new_duration(3601)

new_duration(minute = 5)

dminutes(5)

dhours(278)

dmonths(4) # no dmonths()
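
Because a duration is just a count of seconds, the helpers can be checked by converting them to numbers. This sketch assumes a current lubridate release, in which the constructor is spelled duration() rather than new_duration().

```r
library(lubridate)

# Each helper wraps a fixed number of seconds
five_min <- dminutes(5)
print(as.numeric(five_min))
print(as.numeric(dhours(1)))
print(as.numeric(ddays(1)))

# The general constructor gives the same object as the helper
same <- five_min == duration(300, "seconds")
print(same)
```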

Page 42: 20 date-times

Durations are appropriate when you wish to measure a time span exactly, or compare two time spans. For example,

- the radioactive half life of an atom

- the lifespan of two brands of lightbulb

- the speed of a baseball pitch

Page 43: 20 date-times

periods

Periods measure time spans in units larger than seconds. Periods pay no attention to how many sub-units occur during the unit of measurement.

DST + days(1)
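
The difference between the two kinds of "one day" shows up across a daylight saving change. The sketch assumes the zone "America/Chicago" as a stand-in for the local zone, where clocks fell back on 2010-11-07, and starts at 00:30 so the clock time is unambiguous.

```r
library(lubridate)

# Assumption: "America/Chicago" stands in for the local time zone
start <- ymd_hms("2010-11-07 00:30:00", tz = "America/Chicago")

by_duration <- start + ddays(1) # exactly 86400 seconds later
by_period <- start + days(1)    # same clock time on the next calendar day

print(by_duration)
print(by_period)

# The day of the change is 25 hours long, so the two answers differ by an hour
gap_hours <- as.numeric(difftime(by_period, by_duration, units = "hours"))
print(gap_hours)
```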


Page 44: 20 date-times

Why use periods?

No surprises.

2010-11-01 00:00:00 + months(1) will always equal 2010-12-01 00:00:00, no matter how many leap seconds, leap days, or changes in DST occur in between.

Page 45: 20 date-times

Why not use periods?

We cannot accurately compare two periods unless we know when they occur.

1 month = 31 days

January = 31 days

February = 31 days?
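
The point can be checked by anchoring the same period at two different start dates. A minimal sketch with illustrative variable names:

```r
library(lubridate)

jan <- ymd("2010-01-01")
feb <- ymd("2010-02-01")

# Number of days covered by "one month", depending on where it starts
days_from_jan <- as.numeric((jan + months(1)) - jan)
days_from_feb <- as.numeric((feb + months(1)) - feb)
print(days_from_jan)
print(days_from_feb)

# The clock-time result itself holds no surprises
nov <- ymd_hms("2010-11-01 00:00:00", tz = "UTC")
print(nov + months(1))
```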

Page 46: 20 date-times

Use new_period() or a helper function to create a period object. Helper functions are simply the plural of the unit you are trying to create.

new_period(3601)

new_period(minute = 5)

minutes(5)

hours(278)

months(4) # months are not a problem

Page 47: 20 date-times

Periods are appropriate when you wish to model events that depend on the clock time. For example,

- the opening bell of a stock market

- quarterly earnings reports

- reoccurring deadlines

Page 48: 20 date-times

parsing time spans

A time span that contains only hours, minutes, and seconds information can be parsed as a period with hms(), hm(), ms(), hours(), minutes(), or seconds()

e.g., ms("11:45")
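
A short sketch of what the parsed period contains, using the accessor functions on the result:

```r
library(lubridate)

# "11:45" read as minutes and seconds
p <- ms("11:45")
print(minute(p))
print(second(p))
print(period_to_seconds(p)) # total length in seconds

# Hours, minutes, and seconds together
q <- hms("01:02:03")
print(hour(q))
```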

Page 49: 20 date-times

intervals

Intervals measure a time span by recording its endpoints. Since we know when the time span occurs, we can calculate the lengths of all the units involved.

Page 50: 20 date-times

Intervals retain all of the information available about a time span, but cannot be generalized to other spots on the time line.

Intervals can be accurately converted to either periods or durations with as.period() and as.duration()

Page 51: 20 date-times

Create an interval with new_interval() or by subtracting two dates.

int <- ymd("2010-01-01") - ymd("2009-01-01")

Access and set the endpoints with start() and end(). Note that setting preserves length (in seconds).

start(int)

end(int) <- ymd("2010-03-14")

Intervals are always positive
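
A sketch of creating an interval and converting it, assuming a current lubridate release. There the constructor is interval() and the endpoint accessors are int_start() and int_end(), and subtracting two dates returns a difftime rather than an interval, so the names differ from the ones on this slide.

```r
library(lubridate)

# Assumption: current lubridate API (interval(), int_start(), int_end())
int <- interval(ymd("2009-01-01"), ymd("2010-01-01"))

print(int_start(int))
print(int_end(int))

# Exact length in seconds (2009 is not a leap year)
len <- as.duration(int)
print(len)

# The same span in clock units
per <- as.period(int)
print(per)
```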

Page 52: 20 date-times

converting between time spans

Periods can be converted to durations by using the most common lengths (in seconds) of each time unit.

These are just estimates. For accuracy, convert a period to an interval first and then convert the interval to a duration.
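
The two routes can be compared for a one-month period. This sketch assumes a current lubridate release (interval() as the constructor); the start date is an arbitrary choice for illustration.

```r
library(lubridate)

one_month <- months(1)

# Direct conversion: an estimate based on a typical month length
estimate <- as.duration(one_month)

# Via an interval anchored at a specific start: exact for that month
start <- ymd("2010-01-01")
exact <- as.duration(interval(start, start + one_month))

print(estimate)
print(exact / ddays(1)) # length of January 2010 in days
```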

Page 53: 20 date-times

arithmetic with date-times

Time spans are meant to be added to and subtracted from both each other and instants

e.g., now() + days(1) - minutes(25:30)
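
The same expression on a fixed instant shows that the arithmetic is vectorized: one date-time plus a vector of periods gives a vector of date-times.

```r
library(lubridate)

x <- ymd_hms("2010-11-01 12:00:00", tz = "UTC")

# One day later, then 25 to 30 minutes earlier: six results
out <- x + days(1) - minutes(25:30)
print(out)
```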

Page 54: 20 date-times

Challenge

Write a function that takes a date and returns the last day of the month the date occurs in.
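
One possible answer, offered as a sketch and not as the intended solution: step back to the first of the month, move one month forward, then back one day. The function name last_day_of_month is hypothetical.

```r
library(lubridate)

# Hypothetical helper: last calendar day of the month containing `date`
last_day_of_month <- function(date) {
  first_of_month <- floor_date(date, "month")
  # The first of the next month always exists, so adding a month is safe here
  first_of_month + months(1) - days(1)
}

print(last_day_of_month(ymd("2010-02-15")))
print(last_day_of_month(ymd("2012-02-01")))
print(last_day_of_month(ymd("2010-12-31")))
```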

Page 55: 20 date-times

multiplication/division

Multiplication and division of time spans work as expected.

Dividing periods with durations or other periods can only provide an estimate. Convert to intervals for accuracy.

Dividing intervals by durations creates an exact answer.

Dividing intervals by periods performs integer division.
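
A sketch of the two interval divisions, assuming a current lubridate release with interval() as the constructor. The integer division is written explicitly with %/%.

```r
library(lubridate)

year_2009 <- interval(ymd("2009-01-01"), ymd("2010-01-01"))

# Interval divided by a duration: exact number of 24-hour days
print(year_2009 / ddays(1))

# Interval integer-divided by a period: whole months that fit
print(year_2009 %/% months(1))

# A span of two and a half months holds two whole months
partial <- interval(ymd("2009-01-01"), ymd("2009-03-15"))
print(partial %/% months(1))
```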

Page 56: 20 date-times

modulo arithmetic and integer division

integer division

128 %/% 5 = 25

modulo

128 %% 5 = 3

Page 57: 20 date-times

Your turn

Calculate your age in minutes. In hours. In days. How old is someone who was born on June 4th, 1958?

Page 58: 20 date-times

lifetime <- now() - ymd("1981-01-22")

lifetime / dminutes(1)

lifetime / dhours(1)

lifetime / ddays(1)

(now() - ymd("1981-01-22")) / years(1)
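
For the June 4th, 1958 question, here is a sketch that assumes a current lubridate release. There, subtracting two date-times gives a difftime and not an interval, so the span is built with interval(). A fixed reference date replaces now() to keep the output reproducible.

```r
library(lubridate)

born <- ymd("1958-06-04")
ref <- ymd("2010-11-01") # assumed reference date, used in place of now()

life <- interval(born, ref)

print(life / dminutes(1)) # age in minutes
print(life / dhours(1))   # age in hours
print(life / ddays(1))    # age in days
print(life %/% years(1))  # completed years
```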

Page 59: 20 date-times
Page 60: 20 date-times

This work is licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.