Dates From netCDF Files: Two Approaches

This vignette demonstrates two ways of obtaining dates from netCDF files.

Author

AJ Smit

Published

November 22, 2023

Working with dates in netCDF files can be tricky. Data products are often distributed as one netCDF file per day, spanning several decades. In such files the time dimension has length one and the time coordinate variable holds the date. The date is usually also encoded in the filename. One therefore has two options for extracting the dates:

parse the dates encoded in the filename and create a date data class from scratch, or
extract the date from the netCDF files by using information about the date units in the coordinate variable attributes and value.

Let us look at each approach.

Parse dates encoded within the filename

# load the libraries
library(tidyverse)
library(lubridate)
library(ncdf4)

# list the files in the directory
ncDir <- "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME"
SST_files <- dir(path = ncDir, full.names = TRUE)

The number of netCDF files (one per day) is:

length(SST_files)

[1] 13993

The full path and filenames are:

SST_files[1:5] # showing the first 5 files

[1] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810901120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[2] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810902120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[3] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810903120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[4] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810904120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[5] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810905120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"

Looking at the filenames, we see the first eight digits indicate the date in the format YYYYMMDD. How do I know this? I read the data product’s manual!

The first file in the time series indicates the time series starts on 19810901, or 1981-09-01. The last date is 2019-12-31:

basename(SST_files[length(SST_files)]) # I removed the file path

[1] "20191231120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"

Now that we know where to find the dates in the filename, let us create a date from scratch.

fName <- basename(SST_files[1]) # the filename without the file path
fName

[1] "19810901120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"

fDate <- substr(fName, 1, 8) # extract the substring comprised of the first eight characters
fDate

[1] "19810901"

date <- as.Date(fDate, format = "%Y%m%d")
date

[1] "1981-09-01"
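The same steps extend to many filenames at once. The sketch below is only an illustration: `fNames` holds three hypothetical filenames that follow the naming pattern shown above (with one day deliberately left out), and `expected` and `missing_days` are names introduced here. It parses all the dates in one call and then checks for gaps in the daily sequence.

```r
# Hypothetical filenames that follow the naming pattern shown above;
# 1981-09-03 is left out on purpose to mimic a missing day
fNames <- c(
  "19810901120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc",
  "19810902120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc",
  "19810904120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
)

# substr() and as.Date() are vectorised, so no loop is needed
fDates <- as.Date(substr(basename(fNames), 1, 8), format = "%Y%m%d")

# a filename that does not start with a valid YYYYMMDD string parses to NA
stopifnot(!anyNA(fDates))

# compare against an unbroken daily sequence to find missing days
expected <- seq(min(fDates), max(fDates), by = "day")
missing_days <- expected[!expected %in% fDates]
missing_days
```

With the real directory listing, `SST_files` would take the place of `fNames`. The gap check is optional, but it is a cheap way to confirm that the number of files matches the number of days in the period.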

Extract date information from the netCDF coordinate variable

To use this approach, we must first open the netCDF file with one of the R netCDF packages. Here I use ncdf4. Then we get the units attribute of the time coordinate variable and the value stored in the variable:

# open the first file in the listing
nc <- nc_open(SST_files[1])

# extract the date units
tunits <- ncatt_get(nc, "time", "units")
tunits

$hasatt
[1] TRUE

$value
[1] "seconds since 1981-01-01 00:00:00"

# extract the value of the time coordinate variable
time <- ncvar_get(nc, "time")
time

[1] 20995200

This is a strange value for a date! This is because the time is stored as the number of seconds elapsed since a predefined reference time, in this case exactly midnight on 1981-01-01. We convert this to something useful like this:

date <- as.POSIXct(time, origin = "1981-01-01 00:00:00")
date

[1] "1981-09-01 02:00:00 SAST"

Note the result above. Why not exactly midnight, 1981-09-01 00:00:00? Instead, we have two hours after midnight. This is because as.POSIXct() displays the result in the local time zone, here SAST (GMT+2). We can prevent this behaviour by setting the time zone explicitly with the tz argument. Below I ignore this discrepancy, but it might be important to consider in some situations.
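A minimal sketch of setting the time zone explicitly follows. It assumes the time values in the file are in UTC, which is the usual reading when the units string names no time zone, but this is worth checking against the product manual. The names `secs` and `date_utc` are introduced here for illustration.

```r
secs <- 20995200  # the value read from the time variable above

# tz = "UTC" controls how the result is displayed; the instant itself is unchanged
date_utc <- as.POSIXct(secs, origin = "1981-01-01 00:00:00", tz = "UTC")
format(date_utc, "%Y-%m-%d %H:%M:%S", tz = "UTC")

# the same instant rendered in South African time, two hours ahead of UTC
format(date_utc, "%Y-%m-%d %H:%M:%S", tz = "Africa/Johannesburg")
```

The first `format()` call should show midnight and the second 02:00:00, which matches the SAST output above. Only the display differs between the two.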

To get rid of the HH:MM:SS we convert to a normal date class (not POSIXct).

date <- as.Date(date)
date

[1] "1981-09-01"
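As a sanity check on the seconds-since-origin encoding, the conversion can also be done by hand. This is plain arithmetic and not part of the workflow above; `secs` and `days` are names introduced here.

```r
secs <- 20995200

# 60 * 60 * 24 = 86400 seconds in a day
days <- secs / (60 * 60 * 24)
days  # 243 whole days after the origin

# adding whole days to the origin date gives the same result as above
as.Date("1981-01-01") + days
```

Because the value is an exact multiple of 86400, the time stored in this file falls exactly on midnight of 1981-09-01 relative to the origin.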

Above I showed how to find the date for a single file in a long list of files. Once we know how to do it for one file, we can easily apply the same steps to every file in the directory listing when we create a data frame that combines all the daily files into one (combining all the coordinate variables, typically lon, lat, and time).
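One way this could look for the coordinate-variable approach is sketched below. It is not the code used to build the full data frame. The helpers `nc_date()` and `make_toy()` are hypothetical, the two temporary files are tiny stand-ins written so the example runs without the SST data, and the code assumes the units always have the form "seconds since YYYY-MM-DD HH:MM:SS" and that times are in UTC.

```r
library(ncdf4)

# Hypothetical helper: read the time coordinate of one file and return a Date.
# Assumes units of the form "seconds since YYYY-MM-DD HH:MM:SS" (taken as UTC).
nc_date <- function(file) {
  nc <- nc_open(file)
  on.exit(nc_close(nc))  # close the file even if an error occurs
  tunits <- ncatt_get(nc, "time", "units")$value
  origin <- sub("^seconds since ", "", tunits)
  secs <- as.vector(ncvar_get(nc, "time"))
  as.Date(as.POSIXct(secs, origin = origin, tz = "UTC"))
}

# Hypothetical helper: write a tiny stand-in file with a single time step
make_toy <- function(secs) {
  f <- tempfile(fileext = ".nc")
  tdim <- ncdim_def("time", "seconds since 1981-01-01 00:00:00", vals = secs)
  v <- ncvar_def("sst", "kelvin", list(tdim), missval = -999)
  nc <- nc_create(f, v)
  ncvar_put(nc, v, 290)
  nc_close(nc)
  f
}

# two consecutive days, 86400 seconds apart
toy_files <- c(make_toy(20995200), make_toy(20995200 + 86400))

# apply the helper to each file and combine the results into one Date vector
dates <- do.call(c, lapply(toy_files, nc_date))
dates
```

With the real data, `SST_files` would replace `toy_files`. Opening thousands of files only to read one number is slow, so parsing the filenames is usually the quicker route when the naming convention is reliable.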

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit2023,
  author = {Smit, A. J.},
  title = {Dates {From} {netCDF} {Files:} {Two} {Approaches}},
  date = {2023-11-22},
  url = {http://tangledbank.netlify.app/vignettes/netCDF_dates.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit, A. J. 2023. Dates From netCDF Files: Two Approaches. November 22, 2023. http://tangledbank.netlify.app/vignettes/netCDF_dates.html.