Regridding gridded data
Aligning and matching spatial grids
First, I define the Benguela region and time extent of interest:
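The code defining the region is not shown; a minimal sketch consistent with the extents used later in the ERA5 request (lon 15–20°E, lat 37.5–20°S, year 2021), with hypothetical variable names:

```r
# Benguela region bounding box and study period (variable names are assumptions)
lon_min <- 15; lon_max <- 20
lat_min <- -37.5; lat_max <- -20
start_date <- "2021-01-01"; end_date <- "2021-12-31"
```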
Then I find some data.
VIIRS chlorophyll-a data
I select the VIIRS chl-a data. These data start in 2012 and have a spatial resolution of ~4 km lat/lon. The data, “VIIRSN, Suomi-NPP, Level-3 SMI, NASA, Global, 4km, Chlorophyll a, OCI Algorithm, R2018, 2012-present, Daily,” were retrieved from here.
And now I download it for the region and time period specified earlier:
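The download code itself is hidden; a hedged sketch using the rerddap package (the dataset ID `erdVH2018chla1day` is my assumption for this product on the CoastWatch ERDDAP server, and should be checked):

```r
library(rerddap)
library(data.table)

# dataset ID is an assumption; verify on ERDDAP for the exact ID of the
# VIIRSN R2018 daily chl-a product
chl_info <- info("erdVH2018chla1day")
chl_dl <- griddap(chl_info,
                  time = c("2021-01-01", "2021-12-31"),
                  latitude = c(-37.5, -20),
                  longitude = c(15, 20),
                  fields = "chla")
chl <- as.data.table(chl_dl$data)
```

The `chla`, `longitude`, and `latitude` column names match those used in the interpolation code further down.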
ERA5 reanalysis wind data
I use the “ERA5 hourly data on single levels from 1940 to present” data, which come in at an hourly resolution for the whole world, starting in 1940. The spatial resolution is an unimpressive 0.25° × 0.25° lat/lon, far coarser than the chl-a data.
The ERA5 reanalysis wind data (u and v components) were downloaded from Copernicus using a Python script, which can be generated on the website using the “Show API request” option after selecting the variables and spatio-temporal ranges of interest:
import cdsapi

c = cdsapi.Client()

c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'variable': [
            '10m_u_component_of_wind', '10m_v_component_of_wind',
        ],
        'year': '2021',
        'month': [
            '01', '02', '03', '04', '05', '06',
            '07', '08', '09', '10', '11', '12',
        ],
        'day': [
            '01', '02', '03', '04', '05', '06',
            '07', '08', '09', '10', '11', '12',
            '13', '14', '15', '16', '17', '18',
            '19', '20', '21', '22', '23', '24',
            '25', '26', '27', '28', '29', '30',
            '31',
        ],
        'time': [
            '00:00', '01:00', '02:00', '03:00', '04:00', '05:00',
            '06:00', '07:00', '08:00', '09:00', '10:00', '11:00',
            '12:00', '13:00', '14:00', '15:00', '16:00', '17:00',
            '18:00', '19:00', '20:00', '21:00', '22:00', '23:00',
        ],
        # area order: North, West, South, East
        'area': [
            -20, 15, -37.5, 20,
        ],
        'format': 'netcdf',
    },
    'download.nc')
The end product of the Python download is a 52.3 MB netCDF file, which I will now load and process to produce daily wind values to match the daily resolution of the VIIRS chl-a data:
library(tidync)
library(dplyr)
library(lubridate)
library(data.table)

# time units: hours since 1900-01-01 00:00:00.0
origin <- as.POSIXct("1900-01-01 00:00:00", tz = "UTC")

ncFile <- "/Volumes/OceanData/ERA5/ERA5_2021_Benguela.nc"
# ncFile <- "~/Downloads/ERA5_2021_Benguela.nc"

era5 <- tidync(ncFile) |>
  hyper_tibble() |>
  mutate(time = floor_date(time * 3600 + origin, "day")) |>
  reframe(u10 = mean(u10),
          v10 = mean(v10),
          .by = c(time, longitude, latitude)) |>
  as.data.table()
Note that I coerce the data to a data.table object since the regridding step (Option 1) uses data.table syntax.
Regridding
Let’s check out their respective spatial resolutions:
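The code that prints the coordinates below is hidden; a minimal sketch, assuming the chl-a data live in `chl` and the wind data in `era5`, both with `longitude` and `latitude` columns:

```r
# chl-a longitudes (~4 km spacing) vs ERA5 longitudes (0.25° spacing)
head(unique(chl$longitude))
unique(era5$longitude)

# chl-a latitudes vs ERA5 latitudes
head(unique(chl$latitude))
head(unique(era5$latitude))
```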
[1] 14.97917 15.02084 15.06251 15.10417 15.14584 15.18751
[1] 15.00 15.25 15.50 15.75 16.00 16.25 16.50 16.75 17.00 17.25 17.50 17.75
[13] 18.00 18.25 18.50 18.75 19.00 19.25 19.50 19.75 20.00
[1] -19.97917 -20.02084 -20.06250 -20.10417 -20.14584 -20.18750
[1] -37.50 -37.25 -37.00 -36.75 -36.50 -36.25
There is a huge difference. It is seldom a good idea to go from a low resolution like 25 km to a higher resolution like 4 km, but if you insist, you can do it with bi-linear interpolation. Here we will degrade the 4 km product to match the 25 km resolution of the wind data.
Option 1
Define a new output grid. This will be the coarsest resolution one from ERA5:
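The grid definition is not shown; a sketch under the assumption that the target coordinates are taken directly from the ERA5 data (the names `lon.out` and `lat.out` are the ones used in the interpolation calls below):

```r
# use the coarser ERA5 coordinates as the target output grid
lon.out <- unique(era5$longitude)
lat.out <- unique(era5$latitude)
```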
Using the metR package and its Interpolate() function, interpolate the data onto the new, coarser grid. I show two approaches: i) the data.table method demonstrated in the function’s help file, and ii) a dplyr (tidyverse) method using the new reframe() function. Within a dplyr pipe, reframe() exposes the column names, which can then be passed to the function of interest in the usual way; it returns a dataframe or tibble of arbitrary length. reframe() also accommodates the grouping structure within the function call itself through the .by = argument (i.e. no need for a prior group_by()), making the syntax not dissimilar to data.table’s. A few years ago the data.table approach would have been faster, but recent versions of dplyr have seen significant speed improvements. See the results of the system.time() function:
library(metR)

# using the data.table method
system.time(
  interp_chl <- chl[, Interpolate(chla ~ longitude + latitude, lon.out, lat.out), by = time]
)
user system elapsed
5.955 0.823 5.662
time longitude latitude chla
<Date> <num> <num> <num>
1: 2021-01-01 15.00 -37.5 NA
2: 2021-01-01 15.25 -37.5 NA
3: 2021-01-01 15.50 -37.5 NA
4: 2021-01-01 15.75 -37.5 NA
5: 2021-01-01 16.00 -37.5 NA
6: 2021-01-01 16.25 -37.5 NA
# using dplyr
system.time(
  interp_chl <- chl |>
    reframe(Interpolate(chla ~ longitude + latitude,
                        x.out = lon.out, y.out = lat.out), .by = time) |>
    as_tibble()
)
user system elapsed
6.008 0.892 6.086
# A tibble: 544,215 × 4
time longitude latitude chla
<date> <dbl> <dbl> <dbl>
1 2021-01-01 15 -37.5 NA
2 2021-01-01 15.2 -37.5 NA
3 2021-01-01 15.5 -37.5 NA
4 2021-01-01 15.8 -37.5 NA
5 2021-01-01 16 -37.5 NA
6 2021-01-01 16.2 -37.5 NA
7 2021-01-01 16.5 -37.5 NA
8 2021-01-01 16.8 -37.5 NA
9 2021-01-01 17 -37.5 NA
10 2021-01-01 17.2 -37.5 NA
# ℹ 544,205 more rows
Note that daily chl-a data are very gappy and hence there are many NAs in the interpolated dataset.
Let’s verify that the output grids are now the same:
[1] 15.00 15.25 15.50 15.75 16.00 16.25
[1] 15.00 15.25 15.50 15.75 16.00 16.25
[1] -37.50 -37.25 -37.00 -36.75 -36.50 -36.25
[1] -37.50 -37.25 -37.00 -36.75 -36.50 -36.25
These approaches seem to work… ‘work’ as in they don’t fail. I have not tested the output data to see if the results are believable; for example, how does the Interpolate() function handle missing values? I am unsure.
Option 2
Another commonly used function for interpolation lives in the akima package. The advantage of this package is that it can do spline interpolation in addition to linear interpolation (Interpolate() only does linear interpolation). The disadvantage is that it really does not like NAs. This is how it would work:
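The akima code is not shown; a hedged sketch for a single day, removing the NAs first since akima::interp() cannot handle them:

```r
library(akima)

# interpolate one day of chl-a onto the coarse grid; NAs must be dropped first
chl_day <- chl[time == as.Date("2021-01-01") & !is.na(chla)]
interp_day <- with(chl_day,
                   interp(x = longitude, y = latitude, z = chla,
                          xo = lon.out, yo = lat.out,
                          linear = TRUE))  # set linear = FALSE for spline interpolation
```

To cover the full year this would have to be applied day by day, e.g. inside a loop or a grouped operation.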
Option 3
The next regridding option is done entirely within a spatial data framework. Traditionally we used the raster package, but this has been phased out in favour of sf (Simple Features for R), stars (Spatiotemporal Arrays: Raster and Vector Data Cubes), and terra (Spatial Data Analysis). Here I shall use sf and stars. Refer to Spatial Data Science for information about these spatial methods; specifically, see Chapter 7, Introduction to sf and stars.
R spatial packages are experiencing a rapid evolution and the learning curve might be steep. I think, however, that it’s well worth one’s time as a host of spatial mapping options become available, bringing R closer in functionality to GIS. I am still learning all the various features myself and I am exploring options for integrating the spatial functionality into our marine heatwave workflows.
Make stars objects from the gridded data, starting with the chl-a data:
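The conversion code is hidden; a sketch modelled on the ERA5 conversion shown further down, assuming the long-format chl-a data live in `chl` (the object name `chl_st` is an assumption):

```r
library(stars)
library(sf)

chl_st <- chl |>
  st_as_stars(dims = c("longitude", "latitude", "time")) |>
  st_set_crs(4326)
```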
Here are a few interrogation methods to see info about the data’s spatial extent:
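The interrogation calls themselves are not shown; the output below is consistent with something like:

```r
print(chl_st)          # object summary with attribute statistics
dim(chl_st)            # dimension sizes
st_dimensions(chl_st)  # dimension details (offset, delta, refsys)
st_bbox(chl_st)        # spatial extent
```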
stars object with 3 dimensions and 1 attribute
attribute(s), summary of first 1e+05 cells:
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
chla 0.09295794 0.1467107 0.2234708 0.5681996 0.375212 7.863433 99166
dimension(s):
from to offset delta refsys x/y
x 1 121 14.9583 0.0416985 WGS 84 [x]
y 1 422 -19.9583 -0.0416985 WGS 84 [y]
time 1 365 2021-01-01 1 days Date
x y time
121 422 365
from to offset delta refsys x/y
x 1 121 14.9583 0.0416985 WGS 84 [x]
y 1 422 -19.9583 -0.0416985 WGS 84 [y]
time 1 365 2021-01-01 1 days Date
xmin ymin xmax ymax
14.95834 -37.55509 20.00385 -19.95834
We can also plot the stars data directly; here I plot the data on the 34th day in the time series:
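The plotting call is hidden; with stars subsetting (attribute first, then x, y, time) it would be along the lines of:

```r
# plot the chl-a field on the 34th day of the time series
plot(chl_st["chla", , , 34])
```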
Next we also need to get the ERA5 data into a stars format, and we print out some spatial info and make a basic map:
era5_st <- era5 |>
st_as_stars(dims = c("longitude", "latitude", "time"),
raster = c("u10", "v10")) |>
sf::st_set_crs(4326) |>
st_warp(crs = st_crs(4326))
print(era5_st)
stars object with 3 dimensions and 2 attributes
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
u10 -17.47988 -2.111889 -0.1112329 0.009983922 1.863335 17.6071 7665
v10 -11.88497 -1.194172 0.7711142 1.171964005 3.200816 15.3441 7665
dimension(s):
from to offset delta refsys x/y
x 1 21 14.875 0.250205 WGS 84 [x]
y 1 72 -19.875 -0.250205 WGS 84 [y]
time 1 365 2021-01-01 UTC 1 days POSIXct
xmin ymin xmax ymax
14.87500 -37.88974 20.12930 -19.87500
All of this was to bring us to a point where we can do the actual regridding. This is done with the same st_warp() function. Previously it was used to make the data conform to a specific coordinate reference system (CRS), but here I use it to perform the regridding:
Let us see if it worked as advertised:
stars object with 3 dimensions and 1 attribute
attribute(s):
Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
chla 0.04547131 0.1944372 0.3054729 0.8114727 0.6934659 93.51793 488906
dimension(s):
from to offset delta refsys x/y
x 1 21 14.875 0.250205 WGS 84 [x]
y 1 72 -19.875 -0.250205 WGS 84 [y]
time 1 365 2021-01-01 1 days Date
xmin ymin xmax ymax
14.87500 -37.88974 20.12930 -19.87500
Convert the stars object back to a tibble if necessary:
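The conversion is a one-liner (again assuming the regridded object is called `chl_regrid`):

```r
library(dplyr)

chl_tbl <- chl_regrid |>
  as_tibble()
```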
# A tibble: 551,880 × 4
x y time chla
<dbl> <dbl> <date> <dbl>
1 15.0 -20.0 2021-01-01 NA
2 15.3 -20.0 2021-01-01 NA
3 15.5 -20.0 2021-01-01 NA
4 15.8 -20.0 2021-01-01 NA
5 16.0 -20.0 2021-01-01 NA
6 16.3 -20.0 2021-01-01 NA
7 16.5 -20.0 2021-01-01 NA
8 16.8 -20.0 2021-01-01 NA
9 17.0 -20.0 2021-01-01 NA
10 17.3 -20.0 2021-01-01 NA
# ℹ 551,870 more rows
That’s it, folks!
If you have any suggestions about how to do regridding or make the spatial functions more user-friendly in a marine heatwave analysis workflow, please let me know.