17. Dates
This script covers some of the more common issues we may face while dealing with dates.
Date details
See the help page for strptime() (short for "string parse time") for guidance on date and time format codes.
Check the local time zone with Sys.timezone().
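As a minimal sketch of both steps, the snippet below reports the session time zone and applies a few format codes to a known date. The object names `local_tz` and `example_day` are assumptions made for illustration only.

```r
# Open the help page that lists the format codes (run interactively):
# ?strptime

# Report the time zone R is currently using
local_tz <- Sys.timezone()
print(local_tz)

# A few common format codes applied to a known date
example_day <- as.Date("1998-01-13")
format(example_day, "%Y-%m-%d")  # four-digit year, month, day
format(example_day, "%d/%m/%y")  # day, month, two-digit year
```

The same codes work in both directions: format() uses them to write a date as text, and as.Date() uses them to read text as a date.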
Creating daily dates
Create date columns out of the mangled date data we have loaded.
# Create good date column
new_dates <- sad_dates %>%
  mutate(new_good = as.Date(good))
# Correct bad date column
new_dates <- new_dates %>%
  mutate(new_bad = as.Date(bad, format = "%m/%d/%y"))
# Correct ugly date column
new_dates <- new_dates %>%
  mutate(new_ugly = seq(as.Date("1998-01-13"), as.Date("1998-01-21"), by = "day"))
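The conversions above can be tried on a small stand-in data frame. This is only a sketch: `toy_dates` and its values are hypothetical, and the month/day/two-digit-year layout of the `bad` column is assumed from the format string used above. Base R is used so that the example runs without additional packages.

```r
# Hypothetical stand-in for the loaded data (not the real sad_dates)
toy_dates <- data.frame(
  good = c("1998-01-13", "1998-01-14", "1998-01-15"),
  bad  = c("01/13/98", "01/14/98", "01/15/98"),
  stringsAsFactors = FALSE
)

# ISO 8601 text (YYYY-MM-DD) needs no format argument
toy_dates$new_good <- as.Date(toy_dates$good)

# Any other layout needs a format string that matches the text exactly
toy_dates$new_bad <- as.Date(toy_dates$bad, format = "%m/%d/%y")

# Both new columns are of class Date and hold the same days
class(toy_dates$new_good)
all(toy_dates$new_good == toy_dates$new_bad)
```

If the format string does not match the text, as.Date() returns NA for those values rather than guessing, so it is worth checking the result for missing values after conversion.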
Creating hourly dates
If we want to create date values out of data that have hourly (or finer) values, we must create ‘POSIXct’ values because ‘Date’ values cannot have a finer temporal resolution than one day.
# Correcting good time stamps with hours
new_dates <- new_dates %>%
  mutate(new_good_hours = as.POSIXct(good_hours, tz = "Africa/Mbabane"))
# Correcting bad time stamps with hours
new_dates <- new_dates %>%
  mutate(new_bad_hours = as.POSIXct(bad_hours, format = "%Y-%m-%d %I:%M:%S %p", tz = "Africa/Mbabane"))
# Correcting ugly time stamps with hours
new_dates <- new_dates %>%
  mutate(new_ugly_hours = seq(as.POSIXct("1998-01-13 09:00:00", tz = "Africa/Mbabane"),
                              as.POSIXct("1998-01-13 17:00:00", tz = "Africa/Mbabane"),
                              by = "hour"))
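To illustrate why ‘POSIXct’ is needed for sub-daily data, the sketch below compares how the two classes treat the same time stamp. The object names `nine`, `ten` and `two_pm` and the example time stamps are assumptions for illustration.

```r
# A Date silently drops the time of day
as.Date("1998-01-13 09:00:00")

# POSIXct keeps it; the time zone matches the one used above
nine <- as.POSIXct("1998-01-13 09:00:00", tz = "Africa/Mbabane")
ten  <- as.POSIXct("1998-01-13 10:00:00", tz = "Africa/Mbabane")
format(nine, "%H:%M")

# Differences between POSIXct values can be expressed in seconds
difftime(ten, nine, units = "secs")

# A 12-hour stamp needs %I (hour 01-12) and %p (AM/PM);
# how %p is matched can depend on the locale
two_pm <- as.POSIXct("1998-01-13 02:00:00 PM",
                     format = "%Y-%m-%d %I:%M:%S %p", tz = "Africa/Mbabane")
format(two_pm, "%H:%M")
```

Setting `tz` explicitly avoids depending on the time zone of the machine the script happens to run on.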
But shouldn’t there be a function that loads dates correctly?
Importing dates in one step
Why yes, yes there is. read_csv() from the readr package is the way to go.
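A minimal sketch of a one-step import follows. The file, its contents and the name `toy_smart` are hypothetical; the file is written to a temporary location only so that the example is self-contained. It assumes the readr package is installed.

```r
library(readr)

# Hypothetical file written to a temporary location for illustration
csv_path <- tempfile(fileext = ".csv")
writeLines(c("good,bad,numbers",
             "1998-01-13,01/13/98,1.5",
             "1998-01-14,01/14/98,2.5"), csv_path)

# read_csv() guesses column types; ISO 8601 dates are parsed as Date,
# and a non-standard layout can be declared with col_date()
toy_smart <- read_csv(csv_path,
                      col_types = cols(bad = col_date(format = "%m/%d/%y")))

class(toy_smart$good)
class(toy_smart$bad)
```

Columns not named in `col_types` are still guessed, so only the awkward columns need to be spelled out.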
But why does it matter that we correct the values to dates? For starters, it affects the way our plots look/work. Let’s create some random numbers for plotting and see how these compare against our date values when we create figures.
# Generate random numbers
smart_dates$numbers <- rnorm(9, mean = 2, sd = 10)
# Scatterplot with correct dates
ggplot(smart_dates, aes(x = good, y = numbers)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
# Scatterplot with incorrect dates
ggplot(smart_dates, aes(x = bad, y = numbers)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
# OR
ggplot(smart_dates, aes(x = ugly, y = numbers)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
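One reason the plots differ is that dates stored as text are treated as labels rather than as points along a continuous axis. The sketch below shows the effect on ordering alone. The vectors `as_text` and `as_date` are hypothetical and are not taken from the data used above.

```r
# Hypothetical example: three days stored as text (day/month/year)
as_text <- c("13/01/1998", "02/02/1998", "21/01/1998")
as_date <- as.Date(as_text, format = "%d/%m/%Y")

# Text sorts alphabetically, so the February date lands first
sort(as_text)

# Date values sort chronologically
sort(as_date)

# Dates are numbers underneath, so the spacing between them is meaningful
diff(sort(as_date))
```

A fitted line such as the one drawn by geom_smooth(method = "lm") needs that numeric spacing, which text labels do not provide.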
If the dates are formatted correctly, we can also do snazzy things with the data, such as date arithmetic and subsetting by position.
R> [1] "1998-02-17"
R> Time difference of 6 days
R> [1] "1998-01-21" "1998-01-20" "1998-01-19" "1998-01-18" "1998-01-17"
R> [6] "1998-01-16" "1998-01-15"
R> [1] "1970-01-01"
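Operations of the following kind give results like those printed above. This is a sketch under assumptions: the vector `dates` is a hypothetical stand-in holding the nine days from 1998-01-13 to 1998-01-21, and the specific indices and offsets are chosen for illustration.

```r
# Hypothetical stand-in: nine consecutive days
dates <- seq(as.Date("1998-01-13"), as.Date("1998-01-21"), by = "day")

# Adding a number to a Date moves it forward by that many days
later <- dates[4] + 32
later

# Subtracting two Dates gives a time difference
gap <- dates[9] - dates[3]
gap

# Indexing in reverse returns the dates in descending order
dates[9:3]

# Dates are counted in days from an origin, here 1970-01-01
as.Date(0, origin = "1970-01-01")
```

None of these operations work on dates stored as text, which is the practical reason for converting them first.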
Citation
@online{j._smit2021,
author = {J. Smit, Albertus},
title = {17. {Dates}},
date = {2021-01-01},
url = {http://tangledbank.netlify.app/BCB744/intro_r/17-dates.html},
langid = {en}
}