Regridding gridded data
Aligning and matching spatial grids
First, I define the Benguela region and time extent of interest:
Then I find some data.
VIIRS chlorophyll-a data
I select the VIIRS chl-a data. These data start in 2012 and it has a spatial resolution of ~4km lat/lon. The data, “VIIRSN, Suomi-NPP, Level-3 SMI, NASA, Global, 4km, Chlorophyll a, OCI Algorithm, R2018, 2012-present, Daily,” were retrieved from here.
And now I download it for the region and time period specified earlier:
ERA5 reanalysis wind data
I use the “ERA5 hourly data on single levels from 1940 to present” data, which come in at an hourly resolution for the whole world, starting in 1940. The spatial resolution is an unimpressive 0.25° × 0.25° lat/lon, far coarser than the chl-a data.
The ERA5 reanalysis wind data (u and v components) were downloaded from Copernicus using a python script, which can be generated on the website using the “Show API request” option after selecting the variables and spatio-temporal ranges of interest:
import cdsapi
c = cdsapi.Client()
c.retrieve(
'reanalysis-era5-single-levels',
{
'product_type': 'reanalysis',
'variable': [
'10m_u_component_of_wind', '10m_v_component_of_wind',
],
'year': '2021',
'month': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
],
'day': [
'01', '02', '03',
'04', '05', '06',
'07', '08', '09',
'10', '11', '12',
'13', '14', '15',
'16', '17', '18',
'19', '20', '21',
'22', '23', '24',
'25', '26', '27',
'28', '29', '30',
'31',
],
'time': [
'00:00', '01:00', '02:00',
'03:00', '04:00', '05:00',
'06:00', '07:00', '08:00',
'09:00', '10:00', '11:00',
'12:00', '13:00', '14:00',
'15:00', '16:00', '17:00',
'18:00', '19:00', '20:00',
'21:00', '22:00', '23:00',
],
'area': [
-20, 15, -37.5,
20,
],
'format': 'netcdf',
},
'download.nc')
The end product of the python download is a 52.3 Mb netCDF file, which I will now load and process to produce the daily temperature values to match the daily resolution of the VIIRS chl-a data:
# time units: hours since 1900-01-01 00:00:00.0
origin <- as.POSIXct("1900-01-01 00:00:00", tz = "UTC")
ncFile <- "/Volumes/OceanData/ERA5/ERA5_2021_Benguela.nc"
# ncFile <- "~/Downloads/ERA5_2021_Benguela.nc"
era5 <- tidync(ncFile) |>
hyper_tibble() %>%
mutate(time = floor_date(time * 3600 + origin, "day")) |>
reframe(u10 = mean(u10),
v10 = mean(v10),
.by = c(time, longitude, latitude)) |>
as.data.table()
Note that I coerce the data to a date.table object since the regridding step (Option 1) uses
Regridding
Let’s check out their respective spatial resolutions:
There is a huge difference. It is seldom a good idea to go from a low resolution like 25 km to a higher resolution like 4 km, but if you insist, you can do it with bi-linear interpolation. Here we will degrade the 4 km product to match the 25 km resolution of the wind data.
Option 1
Define a new output grid. This will be the coarsest resolution one from ERA5:
Using the metR package and its Interpolate()
function, interpolate the data to the new coarser resolution grid. I show two approaches: i) a method shown in the function’s help file that uses data.table, and ii) a dplyr (tidyverse) method using the new reframe()
function. What reframe()
does when used within a dplyr data pipe is reveal the column names, which can then be given to the function of interest in the usual way; it then returns a dataframe or tibble of arbitrary length. reframe()
also accommodates the grouping structure within the function itself through the use of the .by =
argument (i.e. no need for an a priori group_by()
), making the syntax not dissimilar to that of data.table’s. A few years ago the data.table approach would have been faster, but it seems the new versions of dplyr have undergone some significant speed improvements. See the results of the system.time()
function:
library(metR)
# using the data.table method
system.time(
interp_chl <- chl[, Interpolate(chla ~ longitude + latitude, lon.out, lat.out), by = time]
)
head(interp_chl)
# using dplyr
system.time(
interp_chl <- chl |>
reframe(Interpolate(chla ~ longitude + latitude,
x.out = lon.out, y.out = lat.out), .by = time) |>
as_tibble()
)
interp_chl
Note that daily chl-a data are very gappy and hence there are many NA
s in the interpolated dataset.
Let’s verify that the output grids are now the same:
These approaches seem to work… ‘work’ as in they don’t fail. I have not tested the output data to see if the results are believable; for example, how does the Interpolate()
function handle missing values? I am unsure.
Option 2
Another commonly used function for interpolation lives in the akima package. The advantage of this package is that it can do spline interpolation in addition to linear interpolation (Interpolate()
only does linear interpolation). The disadvantage is that is really does not like NA
s. This is how it would work:
Option 3
The next regridding option is done entirely within a spatial data framework. Traditionally we used the raster package, but this has been phased out in favour of sf (Simple Features for R), stars (Spatiotemporal Arrays: Raster and Vector Data Cubes), and terra (Spatial Data Analysis). Here I shall use sf and stars. Refer to Spatial Data Science for information about these spatial methods; specifically, see Chapter 7, Introduction to sf and stars.
R spatial packages are experiencing a rapid evolution and the learning curve might be steep. I think, however, that it’s well worth one’s time as a host of spatial mapping options become available, bringing R closer in functionality to GIS. I am still learning all the various features myself and I am exploring options for integrating the spatial functionality into our marine heatwave workflows.
Make stars objects from the gridded data. Let’s start with the chl-a data first:
Here are a few interrogation methods to see info about the data’s spatial extent:
We can also plot the stars data directly; here I plot the data on the 34th day in the time series:
Next we also need to get the ERA5 data into a stars format, and we print out some spatial info and make a basic map:
All of this was to bring us to a point where we can do the actual regridding. This is done with the same st_warp()
function. Previously it was used to make the data conform to a specific coordinate reference system (CRS) but here I use it to perform the regridding:
Let us see if it worked as advertised:
Convert the stars object back to a tibble if necessary:
That’s it, folks!
If you have any suggestions about how to do regridding or make the spatial functions more user-friendly in a marine heatwave analysis workflow, please let me know.
Reuse
Citation
@online{smit,_a._j.,
author = {Smit, A. J., and Smit, AJ},
title = {Regridding Gridded Data},
date = {},
url = {http://tangledbank.netlify.app/vignettes/regridding.html},
langid = {en}
}