Dates From netCDF Files: Two Approaches
Working with dates in netCDF files can be tricky. Long time series are often distributed as one netCDF file per day, spanning several decades. In each such file the time dimension has length one, and the time coordinate variable provides the date. The date is usually also encoded within the filename. One therefore has two options for extracting the dates:
- parse the dates encoded in the filename and create a date data class from scratch, or
- extract the date from the netCDF file itself, using the value of the time coordinate variable together with the units given in its attributes.
Let us look at each approach.
Parse dates encoded within the filename
The number of netCDF files (one per day) is:
The full path and filenames are:
[1] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810901120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[2] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810902120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[3] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810903120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[4] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810904120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[5] "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810905120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
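A listing like the one above could be produced with list.files(). The sketch below is self-contained and therefore works on a temporary directory holding three empty placeholder files; demo_dir, suffix, and demo_names are hypothetical names introduced only for illustration, and in practice the directory would be the one that holds the real data. SST_files is the name used for the listing later in this text.

```r
# Hypothetical stand-in for the data directory: a temporary folder
# containing empty placeholder files named like the real ones.
demo_dir <- file.path(tempdir(), "Africa_LME_demo")
dir.create(demo_dir, showWarnings = FALSE)
suffix <- "120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
demo_names <- paste0(c("19810901", "19810902", "19810903"), suffix)
invisible(file.create(file.path(demo_dir, demo_names)))

# List every netCDF file, keeping the full path
SST_files <- list.files(demo_dir, pattern = "\\.nc$", full.names = TRUE)

# The number of files (one per day)
length(SST_files)

# The first few full paths
head(SST_files, 5)

# The last filename, without its directory
basename(SST_files[length(SST_files)])
```

Because the filenames begin with YYYYMMDD, the alphabetical order returned by list.files() is also chronological order.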
Looking at the filenames, we see that the first eight digits give the date in the format YYYYMMDD. How do I know this? I read the data product’s manual!
The first file indicates that the time series starts on 19810901, or 1981-09-01. The last date is 2019-12-31:
[1] "20191231120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
Now that we know where to find the dates in the filename, let us create a date from scratch.
[1] "19810901120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
[1] "19810901"
[1] "1981-09-01"
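The three results above (filename, date string, date) can be reproduced along the following lines. This is a minimal sketch using base R only; the variable names first_file, file_name, date_string, and file_date are my own, and the original may well have used other functions to the same effect.

```r
# Path of the first file, copied from the listing above
first_file <- "/Volumes/OceanData/AVHRR_OI-NCEI-L4-GLOB-v2.0/Africa_LME/19810901120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"

# Drop the directory part so that the date digits come first
file_name <- basename(first_file)
file_name

# The first eight characters hold the date as YYYYMMDD
date_string <- substr(file_name, 1, 8)
date_string

# Convert the string to an object of class Date
file_date <- as.Date(date_string, format = "%Y%m%d")
file_date
```

Both substr() and as.Date() are vectorised, so the same two calls work unchanged on the whole vector of filenames.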
Extract date information from the netCDF coordinate variable
To use this approach, we must first open the netCDF file with one of the R netCDF libraries. Here I use ncdf4. Then we get the units attribute of the time coordinate variable and the value stored in the variable:
library(ncdf4)

# open the first file in the listing
nc <- nc_open(SST_files[1])
# extract the date units
tunits <- ncatt_get(nc, "time", "units")
tunits
$hasatt
[1] TRUE
$value
[1] "seconds since 1981-01-01 00:00:00"
# extract the value of the time coordinate variable
ncvar_get(nc, "time")
[1] 20995200
This is a strange value for a date! The reason is that time is stored as the number of seconds elapsed since a predefined starting time, in this case exactly midnight on 1981-01-01. We convert this to something useful like this:
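A minimal sketch of the conversion, using only the two values printed earlier. The names time_value, time_units, time_origin, and date_time are mine. To make the result independent of the machine it runs on, I set the display time zone to "Africa/Johannesburg" (SAST) explicitly; this is an assumption standing in for a session whose local time zone is SAST.

```r
# Values taken from the output shown earlier
time_value <- 20995200
time_units <- "seconds since 1981-01-01 00:00:00"

# Strip the "seconds since " prefix to recover the origin as text
time_origin <- sub("^seconds since ", "", time_units)

# Add the seconds to the origin (the origin string is read as UTC)
# and display the result in South African Standard Time
date_time <- as.POSIXct(time_value, origin = time_origin,
                        tz = "Africa/Johannesburg")
format(date_time, "%Y-%m-%d %H:%M:%S")
```

As a sanity check on the arithmetic, 20995200 / 86400 is exactly 243 days, which is the number of days from 1 January to 1 September in 1981.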
Note the result above. Why is it not exactly midnight, 1981-09-01 00:00:00? Instead, we have 2 hours after midnight. This is because as.POSIXct() displays the time in our local time zone, automatically converting it to SAST (GMT+2). We can prevent this behaviour by setting the time zone explicitly. Below I ignore this discrepancy, but it might be important to consider in some situations.
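Setting the time zone explicitly might look like the sketch below, which asks for the result in UTC through the tz argument of as.POSIXct(). The variable names are mine.

```r
# Seconds since the origin, as read from the file
time_value <- 20995200

# Keep the result in UTC instead of the session's local time zone
date_time_utc <- as.POSIXct(time_value, origin = "1981-01-01 00:00:00",
                            tz = "UTC")
format(date_time_utc, "%Y-%m-%d %H:%M:%S")
```

The underlying instant is the same in both cases; only the time zone used for display differs.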
To get rid of the HH:MM:SS component, we convert to a normal date class (not POSIXct).
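One way to do this is with as.Date(), sketched below. I pass tz explicitly to as.Date() so that the calendar day is evaluated in a known time zone rather than relying on a default; the variable names are mine.

```r
# The date-time from the previous step, kept in UTC
date_time <- as.POSIXct(20995200, origin = "1981-01-01 00:00:00", tz = "UTC")

# Drop the time of day, evaluating the calendar day in UTC
file_date <- as.Date(date_time, tz = "UTC")
file_date
class(file_date)
```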
Above I showed how to find the date for any one of the files in a long list of files. Once we know how to do it for one file, we can easily apply it to each file in the directory listing when we create a dataframe that combines all the daily files into one (combining all the coordinate variables, typically lon, lat, and time).
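As a rough sketch of how either approach could be applied to every file, the code below builds a small index of files and dates rather than the full combined dataframe. The helper names seconds_to_date() and read_nc_date() are hypothetical, the latter assumes the ncdf4 package and a time variable called "time" with units of the form "seconds since ...", and the three filenames are placeholders patterned on the listing earlier in this text.

```r
# Turn a value plus a "seconds since ..." units string into a Date
seconds_to_date <- function(value, units) {
  stopifnot(grepl("^seconds since ", units))
  origin <- sub("^seconds since ", "", units)
  as.Date(as.POSIXct(as.numeric(value), origin = origin, tz = "UTC"),
          tz = "UTC")
}

# Hypothetical wrapper: read the date stored inside one netCDF file
read_nc_date <- function(path) {
  nc <- ncdf4::nc_open(path)
  on.exit(ncdf4::nc_close(nc))
  units <- ncdf4::ncatt_get(nc, "time", "units")$value
  seconds_to_date(ncdf4::ncvar_get(nc, "time"), units)
}

# Placeholder filenames following the pattern of the real files
suffix <- "120000-NCEI-L4_GHRSST-SSTblend-AVHRR_OI-GLOB-v02.0-fv02.0_subset.nc"
file_names <- paste0(c("19810901", "19810902", "19810903"), suffix)

# Approach 1: parsing the filename is vectorised over all files
dates_from_names <- as.Date(substr(basename(file_names), 1, 8),
                            format = "%Y%m%d")

# Approach 2 needs the real files, one call per file, for example:
# dates_from_nc <- do.call(c, lapply(SST_files, read_nc_date))

# An index of files and dates, ready to join onto the extracted data
file_index <- data.frame(file = file_names, date = dates_from_names)
file_index
```

Parsing the filenames avoids opening each file, while reading the coordinate variable does not depend on a naming convention; comparing the two is a cheap consistency check.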
Citation
@online{j._smit,
author = {J. Smit, Albertus and Smit, AJ},
title = {Dates {From} {netCDF} {Files:} {Two} {Approaches}},
date = {},
url = {http://tangledbank.netlify.app/vignettes/netCDF_dates.html},
langid = {en}
}