Global Earth and Ocean Data

An Introduction to Earth and Ocean Science Datasets

This vignette intorduces the variety gridded data products available freely on the internet.

Author

AJ Smit

Published

Invalid Date

1 Introduction to Earth and Ocean Science Datasets

The wealth of data about the global ocean we have access to today has gradually accumulated since 1784, which is the earliest date of the sea surface temperature data that populate the International Comprehensive Ocean Archive Network (ICOADS). Astounding improvements in data coverage and quality have resulted from the application of scientific principles and technological developments to the sounding the ocean depths, the pin-pointing of locational information, the measurement of variations in the physical properties of seawater, the gathering of ever-larger quantities of data in well-described datasets, and their processing, analysis, and interpretation. Current-day datasets come in various ‘flavours’ across a range of spatial, spectral, and temporal resolutions. Some, such as ICOADS, is comprised entirely of data sources, while others contain data obtained remotely, usually by instruments mounted on Earth-orbiting satellites, aeroplanes, or autonomous aerial vehicles. Many are blends of and satellite-derived sources. Such data may cover geophysical phenomena such as sea surface temperature or topography, or they may be maps of the ocean floor. Others may be about biological variables, such as ocean colour that relates to phytoplankton community properties.

The wealth of Earth and Ocean science datasets covers diverse topics such as climate, weather, geology, oceanography, and more. Some of the most important datasets include:

The Global Historical Climatology Network (GHCN) dataset, which provides temperature and precipitation data for thousands of weather stations around the world dating back to the late 1800s.
The Advanced Microwave Scanning Radiometer (AMSR) data, which provide information on global precipitation, sea surface temperature, and sea ice concentration.
Data from the SeaWiFS sensors, which provides information on global ocean color and chlorophyll concentrations, which can be used to study ocean productivity and the health of marine ecosystems.
Landsat data, which provide detailed images of the Earth’s surface, including information on vegetation, land use, and other geographical features.
The MODIS Aqua and Terra datasets, which provide information on a wide variety of Earth’s features, including vegetation, land use, sea ice, and more, at a high spatial resolution.
Various National Oceanic and Atmospheric Administration (NOAA) datasets, which provide a wide range of data on the oceans, including sea surface temperature, currents, salinity, and more.

These datasets are important for researchers, policymakers, and the general public to better understand the Earth and its systems, and to aid in decision making and resource management.

Regardless of how these data have been obtained or what they represent, a common feature is that they are stored digitally in common file formats that are described in a similar manner, and therefore can be accessed using a small collection of software tools.

2 Levels of Processing

These datasets undergo a complex series of processing from then they are first captured by the instruments onboard Earth-orbiting satellites (Level 1) up to the stage where they are used by users (L3 and L4).

L1, L2, L3, and L4 are different levels or stages of processing for gridded data products. These levels refer to the amount of processing that has been applied to the raw data before it is made available to users.

L1 data products are the rawest form of data, and often consist of sensor measurements or observations. They have not undergone any processing or calibration and may contain errors or inconsistencies.

L2 data products are an intermediate stage of processing, where some basic corrections and calibrations have been applied to the L1 data. This can include removing instrumental biases or correcting for sensor drift. This level of data is often used for initial analysis and quality control.

L3 data products are created by processing L2 data, which has undergone some basic corrections and calibrations, converted to the geophysical variable of interest, and then gridded to a regular spatial resolution. This level of data is often used for operational applications and in the creation of higher-level data products. The data are usually gridded in time-series format, often daily, and cover a specific region or global coverage. This level of data is commonly used for monitoring and analysing the variability of physical and bio-geochemical ocean properties and for the creation of derived variables and indices.

L4 data products are the highest level of processing, where L3 data are further processed to provide specific geophysical or environmental variables in a gap-free spatial and temporal format. This means that the data have to be interpollated to fill any spatial and temporal gaps. These gaps exist in the L3 data because some atmospheric (e.g. cloud cover) or ocean (e.g. sea glint reflectance) conditions can prevent the retrieval of information, or the quality of the data is deemed too low and the data are then discarded. Sometimes L4 data can undergo a blending of the measured data with the modelled or statistical representations of these measured geophysical phenomena in regions where, and during times when, direct measurements are absent in the gaps. L4 data are often used for long-term climate studies and other research applications. The data is usually gridded in a climatological format, covering a specific period of time, such as monthly or seasonal and covers a specific region or global coverage. This level of data is commonly used for climate studies, long-term variability and trend analysis, and model validation.

In summary, L1 data products are raw data, L2 data products are corrected and calibrated data, L3 data products are specific geophysical or environmental variables gridded data with some gaps in space and time, and L4 data products are gap free geophysical variables.

3 About This Module

In this Module, we will:

introduce the mostly commonly format in which gridded data are stored,
look at some of the most commonly sites where these data are housed,
provide examples of accessing the data using
- the python MOTU client (in R),
- OPeNDAP
- ERDDAP

Other download options are also available—notably via web interfaces, FTP, and WMS—but I’ll restrict the examples to those available within R .

Reuse

CC BY-NC-SA 4.0

Citation

BibTeX citation:

@online{smit,
  author = {Smit, A. J. and Smit, AJ},
  title = {Global {Earth} and {Ocean} {Data}},
  date = {},
  url = {https://tangledbank.netlify.app/vignettes/gridded_data_intro.html},
  langid = {en}
}

For attribution, please cite this work as:

Smit AJ, Smit A Global Earth and Ocean Data. https://tangledbank.netlify.app/vignettes/gridded_data_intro.html.

--- author: "AJ Smit" title: "Global Earth and Ocean Data" subtitle: "An Introduction to Earth and Ocean Science Datasets" date: "`r Sys.Date()`" description: "This vignette intorduces the variety gridded data products available freely on the internet." bibliography: ../references.bib csl: ../styles/marine-biology.csl format: html: code-fold: false toc-title: "On this page" standalone: true toc-location: right page-layout: full --- ## Introduction to Earth and Ocean Science Datasets The wealth of data about the global ocean we have access to today has gradually accumulated since 1784, which is the earliest date of the sea surface temperature data that populate the International Comprehensive Ocean Archive Network (ICOADS). Astounding improvements in data coverage and quality have resulted from the application of scientific principles and technological developments to the sounding the ocean depths, the pin-pointing of locational information, the measurement of variations in the physical properties of seawater, the gathering of ever-larger quantities of data in well-described datasets, and their processing, analysis, and interpretation. Current-day datasets come in various ‘flavours’ across a range of spatial, spectral, and temporal resolutions. Some, such as ICOADS, is comprised entirely of \emph{in situ} data sources, while others contain data obtained remotely, usually by instruments mounted on Earth-orbiting satellites, aeroplanes, or autonomous aerial vehicles. Many are blends of \emph{in situ} and satellite-derived sources. Such data may cover geophysical phenomena such as sea surface temperature or topography, or they may be maps of the ocean floor. Others may be about biological variables, such as ocean colour that relates to phytoplankton community properties. The wealth of Earth and Ocean science datasets covers diverse topics such as climate, weather, geology, oceanography, and more. Some of the most important datasets include: * The Global Historical Climatology Network (GHCN) dataset, which provides temperature and precipitation data for thousands of weather stations around the world dating back to the late 1800s. * The Advanced Microwave Scanning Radiometer (AMSR) data, which provide information on global precipitation, sea surface temperature, and sea ice concentration. * Data from the SeaWiFS sensors, which provides information on global ocean color and chlorophyll concentrations, which can be used to study ocean productivity and the health of marine ecosystems. * Landsat data, which provide detailed images of the Earth's surface, including information on vegetation, land use, and other geographical features. * The MODIS Aqua and Terra datasets, which provide information on a wide variety of Earth's features, including vegetation, land use, sea ice, and more, at a high spatial resolution. * Various National Oceanic and Atmospheric Administration (NOAA) datasets, which provide a wide range of data on the oceans, including sea surface temperature, currents, salinity, and more. These datasets are important for researchers, policymakers, and the general public to better understand the Earth and its systems, and to aid in decision making and resource management. Regardless of how these data have been obtained or what they represent, a common feature is that they are stored digitally in common file formats that are described in a similar manner, and therefore can be accessed using a small collection of software tools. ## Levels of Processing These datasets undergo a complex series of processing from then they are first captured by the instruments onboard Earth-orbiting satellites (Level 1) up to the stage where they are used by users (L3 and L4). L1, L2, L3, and L4 are different levels or stages of processing for gridded data products. These levels refer to the amount of processing that has been applied to the raw data before it is made available to users. L1 data products are the rawest form of data, and often consist of sensor measurements or observations. They have not undergone any processing or calibration and may contain errors or inconsistencies. L2 data products are an intermediate stage of processing, where some basic corrections and calibrations have been applied to the L1 data. This can include removing instrumental biases or correcting for sensor drift. This level of data is often used for initial analysis and quality control. L3 data products are created by processing L2 data, which has undergone some basic corrections and calibrations, converted to the geophysical variable of interest, and then gridded to a regular spatial resolution. This level of data is often used for operational applications and in the creation of higher-level data products. The data are usually gridded in time-series format, often daily, and cover a specific region or global coverage. This level of data is commonly used for monitoring and analysing the variability of physical and bio-geochemical ocean properties and for the creation of derived variables and indices. L4 data products are the highest level of processing, where L3 data are further processed to provide specific geophysical or environmental variables in a gap-free spatial and temporal format. This means that the data have to be interpollated to fill any spatial and temporal gaps. These gaps exist in the L3 data because some atmospheric (e.g. cloud cover) or ocean (e.g. sea glint reflectance) conditions can prevent the retrieval of information, or the quality of the data is deemed too low and the data are then discarded. Sometimes L4 data can undergo a blending of the measured data with the modelled or statistical representations of these measured geophysical phenomena in regions where, and during times when, direct measurements are absent in the gaps. L4 data are often used for long-term climate studies and other research applications. The data is usually gridded in a climatological format, covering a specific period of time, such as monthly or seasonal and covers a specific region or global coverage. This level of data is commonly used for climate studies, long-term variability and trend analysis, and model validation. In summary, L1 data products are raw data, L2 data products are corrected and calibrated data, L3 data products are specific geophysical or environmental variables gridded data with some gaps in space and time, and L4 data products are gap free geophysical variables. ## About This Module In this Module, we will: * introduce the mostly commonly format in which gridded data are stored, * look at some of the most commonly sites where these data are housed, * provide examples of accessing the data using * the python MOTU client (in R), * OPeNDAP * ERDDAP Other download options are also available---notably via web interfaces, FTP, and WMS---but I'll restrict the examples to those available within R \citep{R2020}.