Back to Tutorials and Data Recipes
Authors: Guoqing Wang (NASA, GSFC), Ian Carroll (NASA, UMBC), Eli Holmes (NOAA), Anna Windle (NASA, GSFC)
An Earthdata Login account is required to access data from the NASA Earthdata system, including NASA ocean color data.
This tutorial demonstrates accessing and analyzing NASA ocean color data using Python from the NASA Ocean Biology Distributed Active Archive Center (OB.DAAC) archives. Currently, there are several ways to find and access ocean color data:
In this tutorial, we will focus on using earthaccess
Python module to access MODIS Aqua ocean color data through NASA’s
Common Metadata Repository (CMR), a metadata system that catalogs Earth
Science data and associated metadata records. The level 2 dataset of
MODIS Aqua is one of the most popular datasets of OB.DAAC. Here we will
use MODIS Aqua L2 Chlorophyll a data of the Chesapeake Bay as
an example. The standard NASA ocean color Chlorophyll a
algorithm is described in the Algorithm
Theoretical Basis Document (ATBD).
At the end of this notebok you will know: * How to find OB.DAAC ocean
color data * How to download files using earthaccess
* How
to create a plot using xarray
We begin by importing all of the packages used in this notebook. If you have created an environment following the guidance provided with this tutorial, then the imports will be successful.
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter
from IPython.display import JSON
import cartopy
import earthaccess
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
Next, we authenticate using our Earthdata Login credentials.
Authentication is not needed to search publicaly available collections
in Earthdata, but is always needed to access data. We can use the
login
method from the earthaccess
package.
This will create an authenticated session when we provide a valid
Earthdata Login username and password. The earthaccess
package will search for credentials defined by environmental
variables or within a .netrc file saved in the
home directory. If credentials are not found, an interactive prompt will
allow you to input credentials.
The persist=True
argument ensures any discovered
credentials are stored in a .netrc
file, so the argument is
not necessary (but it’s also harmless) for subsequent calls to
earthaccess.login
.
auth = earthaccess.login(persist=True)
The MODIS instrument, on board the Aqua satellite, collects ocean
color data, processed from Level-1 through Level-4 and distributed by
the OB.DAAC. In this example, we will use the standard Chlorophyll a
data from Level-2 ocean color files. To find data we will use the
earthaccess
Python library to search on NASA’s CMR API.
NASA data collections, i.e. a series of related granules, are
discoverable with earthaccess.search_datasets
. Various
search parameters can be used to search collections and granules using
metadata attributes. See more details here.
Below, CMR Catalog are queried to find collections with “L2
ocean color” keyword in them, with the MODIS
instrument. The returned response can be used to retrieve the
ShortName
and concept-id
for each dataset.
results = earthaccess.search_datasets(
keyword="L2 ocean color",
instrument="MODIS",
)
Datasets found: 16
Each result has a summary
method with information such
as the collection’s short-name.
set((i.summary()["short-name"] for i in results))
{'MODISA_L2_IOP', 'MODISA_L2_IOP_NRT', 'MODISA_L2_OC', 'MODISA_L2_OC_NRT', 'MODISA_L2_SST', 'MODISA_L2_SST4', 'MODISA_L2_SST4_NRT', 'MODISA_L2_SST_NRT', 'MODIST_L2_IOP', 'MODIST_L2_IOP_NRT', 'MODIST_L2_OC', 'MODIST_L2_OC_NRT', 'MODIST_L2_SST', 'MODIST_L2_SST4', 'MODIST_L2_SST4_NRT', 'MODIST_L2_SST_NRT'}
We are interested in the MODISA_L2_OC
dataset.
We can use spatial and temporal arguments to search for granules covering Chesapeake Bay during the time frame of Oct 15 - 23, 2020. We can also add the cloud_cover parameter to filter out granules with too much cloud coverage. cloud_cover = (0, 50) # max 50% of cloud coverage
tspan = ("2020-10-15", "2020-10-23")
bbox = (-76.75, 36.97, -75.74, 39.01)
cc = (0, 50)
results = earthaccess.search_data(
short_name="MODISA_L2_OC",
temporal=tspan,
bounding_box=bbox,
cloud_cover=cc,
)
Granules found: 2
Now we can print some info about these granules using the built-in methods. We can see how each result prefers to display itself.
results[0]
Or we could use the data_links
and size
methods provided on each result.
data_links = [{"links": i.data_links(), "size (MB):": i.size()} for i in results]
JSON(data_links, expanded=True)
<IPython.core.display.JSON object>
Or we can interactively inspect all the fields underlying a result.
JSON(results)
<IPython.core.display.JSON object>
Since the data are not hosted in the Earthdata Cloud (see output from
results[0]
above), we need to download files. This will
download the data in a folder called “data” in your working
directory.
paths = earthaccess.download(results, "data")
Getting 2 granules, approx download size: 0.1 GB
File AQUA_MODIS.20201017T183001.L2.OC.nc already downloaded File AQUA_MODIS.20201016T174500.L2.OC.nc already downloaded
Step-by-step, we’ll build a nice map showing the log-transformed chlorophyll a estimate for the first granule we downloaded. The first step is to open some of the “groups” present within the NetCDF files to begin preparing a variable to plot.
prod = xr.open_dataset(paths[0])
obs = xr.open_dataset(paths[0], group="geophysical_data")
nav = xr.open_dataset(paths[0], group="navigation_data")
The “navigation_data” group has geospatial coordinates that we merge into the “geophysical_data” group, which has the “chlor_a” product.
nav = (
nav
.set_coords(("longitude", "latitude"))
.rename({"pixel_control_points": "pixels_per_line"})
)
dataset = xr.merge((prod, obs, nav.coords))
Now, we can pull out and fine-tune the “chlor_a” variable for visualization.
array = np.log10(dataset["chlor_a"])
array.attrs.update(
{
"units": f'log10({dataset["chlor_a"].attrs["units"]})',
}
)
The plot
method from XArray’s plotting API is an easy
way to take an xr.Dataset
or xr.DataArray
to a
matplotlib
figure.
plot = array.plot(
x="longitude", y="latitude", aspect=2, size=4, cmap="jet", robust=True
)
We can enrich the visualiation using matplotlib
and
cartopy
. The coordinates are latitude and longitude, so if
we add the “Plate Carree” coordinate reference system (CRS) to our axes,
we will get an improved map.
fig = plt.figure(figsize=(10, 7))
ax = plt.axes(projection=cartopy.crs.PlateCarree())
array.plot(x="longitude", y="latitude", cmap="jet", robust=True, ax=ax)
ax.gridlines(draw_labels={"bottom": "x", "left": "y"})
ax.add_feature(cartopy.feature.STATES, linewidth=0.5)
ax.set_title(dataset.attrs["product_name"], loc="center")
plt.show()
You have completed the notebook on Aqua/MODIS L2 data exploration.