Methods for Assessing the Quality and Consistency of Ocean Color Products

Bryan Franz
NASA Goddard Space Flight Center
Ocean Biology Processing Group
18 January 2005

September 2009 (Update in Progress)


This document provides details on several of the standard methods used by the Ocean Biology Processing Group (OBPG) at NASA/GSFC to evaluate the oceanic optical properties derived from spaceborne ocean color sensors. Many of these analyses are performed routinely for standard products, but they are also used to evaluate changes in processing algorithms or calibration. The analyses serve to verify the implementation of proposed changes and to provide quantitative feedback on the impact of those changes on field-data comparisons, sensor-to-sensor agreement, temporal and spatial stability in derived product retrievals, and long-term sensor stability.

Although not discussed here, any evaluation of instrument calibration or processing algorithm changes is normally preceded by a re-evaluation of the vicarious calibration (Franz et al., 2007). This effectively removes any bias on the mission-mean normalized water-leaving radiance retrievals at the vicarious calibration site. When comparing products from different sensors, any algorithm changes that are applicable to both sensors are applied equally, and both sensors are vicariously recalibrated to a common source (e.g., the Marine Optical Buoy, MOBY).

The plots and images shown in this document come from various processing and testing events. They are provided as examples only, and thus they do not reflect the current state of product quality. This document is intended to describe the analysis methods. The analysis results are posted elsewhere.

II. Comparison with in situ Observations

The primary mechanism for assessing the quality of retrieved ocean color properties is comparison with ground-truth measurements. A detailed description of the in situ match-up process is provided in Bailey and Werdell (2006), and current operational results are posted on the OBPG Validation Website. It should be recognized, however, that the temporal and geographic distribution of the in situ dataset is limited. These match-ups are generally not sufficient for assessing the quality of satellite remotely sensed ocean color data over the full range of geometries through which the spaceborne sensor views the Earth, or over the full temporal and geographic distribution of the Level-3 products. Nor do they account for the effects of temporal and spatial averaging or for systematic errors associated with Level-3 masking decisions.

III. Level-2 Regional Analysis

Going beyond traditional match-ups, the OBPG also looks at regional analyses in locations where significant concentrations of in situ measurements exist. As described in Werdell et al. (2007), the in situ data can be used to characterize the region, and the bulk statistics (e.g., seasonal histogram distributions, monthly-mean trends) can then be compared against equivalent statistics from the Level-2 satellite sensor retrievals. A one-for-one match-up between satellite retrievals and field observations is not required, so the "match-up" return is much higher. Examples of such analyses are provided in Figures 1 and 2. Figure 1 is a seasonal comparison of satellite retrievals collected within the northern region of Chesapeake Bay over a 12-year period against in situ measurements of chlorophyll-a collected over the same time span. The colored lines show results for different satellite processing algorithms (a change in the near-infrared water-leaving radiance algorithm, in this case). The black line is the distribution of in situ observations. Similarly, Figure 2 shows a time series of monthly mean satellite retrievals of aerosol properties relative to monthly mean in situ measurements, where the latter come from an AERONET sun photometer located within the region.

Figure 1: Example of a regional analysis against bulk in situ statistics. The plots show seasonal distributions of SeaWiFS chlorophyll-a retrievals, before (blue) and after (red) a particular algorithm change, with the regional distribution of in situ measurements (black).
Figure 2: Example of a regional time-series analysis. The plots show SeaWiFS time-series aerosol property retrievals, before (blue) and after (red) a particular algorithm change, with the monthly mean in situ (AERONET) measurements (black).

IV. Level-3 Temporal Trending

Level-3 trend analysis looks at long-term trends on global and regional spatial scales. It provides a standard mechanism for evaluating derived product consistency and sensor stability, and it quantifies the relative impact of calibration and algorithm changes on life-of-mission time scales. The Level-3 products are global binned, multi-day averages at 4.6-km or 9-km resolution, with bins distributed in an equal-area, integerized sinusoidal projection (Campbell et al., 1995). The typical composite period is 8 days, but for quick turn-around test processing the OBPG uses a temporal subset of the mission lifespan consisting of 4-day composites generated from the start of each consecutive 32-day period (i.e., 4 of every 32 days, or 12.5% of the mission dataset). The temporal subset is generated at 9-km resolution, and it can be processed within 1 day. The 4-day compositing period generally provides sufficient opportunity to observe most of the day-lit side of the Earth, including coverage in orbit and glint gaps. The analysis is focused on the trends in normalized water-leaving radiances (nLw) or remote sensing reflectances (Rrs), but trends in bio-optical and atmospheric products are also evaluated.
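The temporal subsetting described above can be illustrated with a minimal Python sketch. The function name, the mission dates, and the choice to count 32-day periods consecutively from the mission start are assumptions made for illustration; the operational scheduling may differ (for example, by resetting at year boundaries).

```python
from datetime import date, timedelta

def subset_composites(mission_start, mission_end, period_days=32, composite_days=4):
    """Return (first_day, last_day) pairs for the quick-look temporal subset.

    One composite of `composite_days` is taken from the start of each
    consecutive `period_days` window (hypothetical scheduling rule).
    """
    composites = []
    start = mission_start
    while start + timedelta(days=composite_days - 1) <= mission_end:
        end = start + timedelta(days=composite_days - 1)
        composites.append((start, end))
        start += timedelta(days=period_days)
    return composites

# Hypothetical one-year window, for illustration only.
windows = subset_composites(date(2003, 1, 1), date(2003, 12, 31))
fraction_processed = 4 / 32  # 0.125, i.e. 12.5% of the mission days
```

Each returned pair marks the first and last day of one 4-day composite, so the list length gives the number of composites to process for a given mission span.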

From these multi-day global composites, a subset of the filled bins is selected and the binned products are averaged and trended with time. For bin selection, five global subsets are defined, corresponding to 1) all bins, 2) all deep-water bins, and bins from locations that are typically associated with 3) oligotrophic, 4) mesotrophic, and 5) eutrophic conditions. The deep-water subset consists of all bins where the water depth is greater than 1000 meters. The three trophic subsets are predefined based on a previous analysis of the SeaWiFS global mission-averaged chlorophyll (chl). The oligotrophic subset is all bins where 0.0 < chl < 0.1 mg/m^3. Similarly, the mesotrophic and eutrophic subsets correspond to mean chl ranges of 0.1 to 1 and 1 to 10 mg/m^3, respectively.
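As a rough illustration of the subset definitions above, the following Python sketch assigns a bin to subsets and averages a product over one subset. The field names (`depth_m`, `mean_chl`, `rrs_443`), the handling of the range boundaries at exactly 0.1 and 1 mg/m^3, and the treatment of the trophic subsets as independent of water depth are all assumptions, since the text does not specify them.

```python
def classify_bin(depth_m, mean_chl):
    """Return the set of subset names that a Level-3 bin belongs to."""
    subsets = {"all"}
    if depth_m > 1000.0:                 # deep-water criterion from the text
        subsets.add("deep")
    if 0.0 < mean_chl < 0.1:             # mission-mean chl in mg/m^3
        subsets.add("oligotrophic")
    elif 0.1 <= mean_chl < 1.0:          # boundary handling is an assumption
        subsets.add("mesotrophic")
    elif 1.0 <= mean_chl < 10.0:
        subsets.add("eutrophic")
    return subsets

def subset_mean(bins, subset, product):
    """Average one binned product over the filled bins of a subset."""
    values = [
        b[product] for b in bins
        if b.get(product) is not None
        and subset in classify_bin(b["depth_m"], b["mean_chl"])
    ]
    return sum(values) / len(values) if values else None

# Hypothetical bins, for illustration only.
bins = [
    {"depth_m": 4000.0, "mean_chl": 0.05, "rrs_443": 0.010},
    {"depth_m": 3000.0, "mean_chl": 0.08, "rrs_443": 0.008},
    {"depth_m": 50.0, "mean_chl": 2.5, "rrs_443": 0.002},
    {"depth_m": 2000.0, "mean_chl": 0.4, "rrs_443": 0.005},
]
oligo_mean = subset_mean(bins, "oligotrophic", "rrs_443")
```

Repeating `subset_mean` for each composite period yields one point per period in the subset trend.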

Figure 3: SeaWiFS mission mean chlorophyll showing the distribution of mesotrophic bins.

An example of a trend analysis is the SeaWiFS annual cycle for Rrs shown in Figure 4. In the absence of any major geophysical events, we expect the trend in global deep-water or global oligotrophic-water Rrs to repeat from year to year. Low-level differences may be due to geographic sampling biases or real geophysical changes, but on the large scale these plots tell us that SeaWiFS products are, to first order, self-consistent over time.

Figure 4: SeaWiFS annual cycle analysis for oligotrophic and deep-water subsets. Plots show trends in remote sensing reflectance for the six visible channels.

V. Temporal Anomaly Analysis

To evaluate temporal trends in a more quantitative manner, the mean annual cycle for each parameter is subtracted from the respective time series to produce a temporal anomaly trend. An example is shown in Figure 5. The variations observed in these trends may be due to real geophysical variability or to residual instrument calibration errors. The trend plots are often compared to known spacecraft or instrument state changes to assess whether instrumental artifacts are present in the derived products, and to verify the impact of characterization and correction efforts. Rrs or nLw anomaly trends in the 500-560 nm range in clear water are of particular interest, as this spectral range is relatively insensitive to changes in chlorophyll concentration, which is the primary driver of optical properties in clear water. The oligotrophic Rrs(555) trend in Figure 5, for example, confirms that the SeaWiFS time series is consistent with no trend, since the gray region encompasses the zero-anomaly line.
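The anomaly calculation can be sketched in a few lines of Python. The data layout (tuples of year, period index within the year, and subset mean) and the function name are assumptions for illustration. The mean annual cycle is taken here as the simple multi-year mean for each period of the year.

```python
from collections import defaultdict

def annual_cycle_anomaly(series):
    """Subtract the mean annual cycle from a time series.

    series: list of (year, period_index, value), where period_index
    identifies the composite period within the year (hypothetical layout).
    Returns the climatology and a list of (year, period_index, anomaly).
    """
    by_period = defaultdict(list)
    for _, period, value in series:
        by_period[period].append(value)
    # Mean annual cycle: multi-year mean for each period of the year.
    climatology = {p: sum(v) / len(v) for p, v in by_period.items()}
    anomalies = [(y, p, v - climatology[p]) for y, p, v in series]
    return climatology, anomalies

# Hypothetical two-year series with two periods per year.
series = [(2000, 0, 1.0), (2000, 1, 3.0), (2001, 0, 2.0), (2001, 1, 5.0)]
climatology, anomalies = annual_cycle_anomaly(series)
```

With this layout, the anomalies for each period of the year sum to zero over the years by construction, so any remaining drift appears as a slope across years.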

Figure 5: SeaWiFS anomaly trend relative to the mean annual cycle, for chlorophyll-a and nLw(555) in oligotrophic waters. Black symbols are the instantaneous subset mean minus the multi-year mean, with error bars indicating standard uncertainty on the mean. The blue line is an 11-point box-car average through the data points. The gray region is the range of linear fits that can be drawn through the data points (least-squares fit, plus and minus twice the uncertainties on the fit coefficients).
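The box-car average and the range of linear fits described in the Figure 5 caption could be computed along the following lines. This is a sketch only: it assumes an unweighted least-squares fit, a box-car window that shrinks near the ends of the series, and an envelope built from all combinations of slope and intercept at plus or minus twice their uncertainties. None of these details are confirmed by the text, and all function names are hypothetical.

```python
import math

def boxcar(values, width=11):
    """Centered running mean; the window shrinks near the ends (assumption)."""
    half = width // 2
    out = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def linear_fit(x, y):
    """Unweighted least-squares line with 1-sigma coefficient uncertainties."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    resid_var = ssr / (n - 2)            # requires at least 3 points
    se_slope = math.sqrt(resid_var / sxx)
    se_intercept = math.sqrt(resid_var * (1.0 / n + x_mean ** 2 / sxx))
    return slope, intercept, se_slope, se_intercept

def fit_envelope(x, slope, intercept, se_slope, se_intercept, k=2.0):
    """Lower/upper bound at each x over lines with coefficients within k sigma."""
    bounds = []
    for xi in x:
        candidates = [
            (intercept + si * k * se_intercept) + (slope + ss * k * se_slope) * xi
            for si in (-1, 1) for ss in (-1, 1)
        ]
        bounds.append((min(candidates), max(candidates)))
    return bounds

def consistent_with_no_trend(bounds):
    """True if the zero-anomaly line lies inside the envelope everywhere."""
    return all(lo <= 0.0 <= hi for lo, hi in bounds)

# Hypothetical anomaly series, for illustration only.
x = [0.0, 1.0, 2.0, 3.0]
y = [0.1, -0.1, 0.1, -0.1]
slope, intercept, se_slope, se_intercept = linear_fit(x, y)
bounds = fit_envelope(x, slope, intercept, se_slope, se_intercept)
```

Because each candidate line is linear in its two coefficients, the extreme values at any x occur at the corners of the coefficient range, which is why only four combinations are evaluated.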

VI. Trend Comparisons and Common Bins

Another useful tool for separating geophysical changes from sensor calibration and algorithm artifacts is to compare Level-3 trends between missions. Similarly, comparison of trends derived with different processing algorithms or sensor calibrations can help to verify proper implementation and to assess the impact of such changes on the global science products. This analysis looks at average values in coincident Level-3 retrievals on global and regional spatial scales, and presents the results as a comparative time series over the common mission lifespan. We begin with Level-3 products composited over a common time period (usually 4 or 8 days). All OBPG Level-3 ocean color products use the same equal-area binning approach (Campbell et al., 1995), but standard MODIS products are distributed at 4.6-km resolution while SeaWiFS is distributed at 9-km resolution. To allow for a direct, bin-for-bin comparison, the MODIS products are re-binned to the SeaWiFS 9-km resolution using standard binning algorithms. With the Level-3 composited data products in an equivalent form, the data sets are further reduced to a set of common bins. This means that only those bins for which a retrieval exists for both sensors or both test processing configurations are included in subsequent averaging and trending. This is critical to the statistics, as some sensors show systematic data gaps even after 8 days of compositing, and this can result in geographic sampling bias if both sensors are not equivalently masked. Figure 6 shows an example of a trend comparison for remote sensing reflectance retrievals from two different SeaWiFS processing configurations (Reprocessing 2007 vs. Reprocessing 2009).
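The common-bin reduction can be illustrated with a short Python sketch. It assumes that each composite has already been brought to the same 9-km bin grid and is represented as a dictionary mapping bin number to retrieved value; that representation and the function name are hypothetical. The sketch returns both means so that either a direct comparison or a ratio can be formed; whether the operational ratio is a ratio of means or a mean of ratios is not stated in the text.

```python
def common_bin_means(product_a, product_b):
    """Average two binned products over the bins that both have filled.

    product_a, product_b: dict mapping bin number -> retrieved value
    (hypothetical in-memory layout of a Level-3 composite).
    Returns (mean_a, mean_b, n_common), or None if no bins are shared.
    """
    common = product_a.keys() & product_b.keys()
    if not common:
        return None
    n = len(common)
    mean_a = sum(product_a[b] for b in common) / n
    mean_b = sum(product_b[b] for b in common) / n
    return mean_a, mean_b, n

# Hypothetical composites: bin 1 is missing from B and bin 4 from A.
composite_a = {1: 2.0, 2: 4.0, 3: 9.0}
composite_b = {2: 2.0, 3: 3.0, 4: 7.0}
mean_a, mean_b, n_common = common_bin_means(composite_a, composite_b)
```

Only bins 2 and 3 contribute in this example, so a gap in either data set cannot shift the geographic sampling of one mean relative to the other.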

Figure 6: Example of a comparison for SeaWiFS remote sensing reflectance trends in the six visible wavelengths. The plot on the left is a direct comparison, while the plot on the right is a ratio of the two cases.

Figure 7 shows a comparison of nLw trends between SeaWiFS and MODIS/Aqua for a pair of test processings. (The ST and AT in the plot title correspond to SeaWiFS Test and Aqua Test, respectively, and the numbers are sequence numbers for tracking the test configurations.) When comparing spectral parameter trends between different missions, the closest equivalent band is compared. No effort is made to force the sensor retrievals to a common band-pass, so some level of difference is expected when the nominal center wavelengths are not identical.

Figure 7: Example of a comparison between MODIS/Aqua and SeaWiFS normalized water-leaving radiance trends for seven visible wavelengths of MODIS. The plot on the left is a direct comparison, while the plot on the right is a ratio of the two sensors. The nearest wavelength bands of SeaWiFS are selected for the comparison, so the MODIS 531-nm band is compared to the SeaWiFS 510-nm band, and both the MODIS 667-nm and 678-nm bands are compared to the SeaWiFS 670-nm band.

It is often useful to further stratify the trend comparisons into geographic regions. The current stratification employed within the OBPG includes a number of regions selected for proximity to vicarious calibration sites, as well as a set of latitudinally distributed zones covering a wide range of solar viewing conditions. The regions are described in Table 2.

Region   Min Lat   Max Lat   Min Lon   Max Lon
AtlN55      50.0      60.0     -50.0     -20.0
PacN45      40.0      50.0    -179.0    -140.0
PacN35      30.0      40.0    -179.0    -140.0
PacN25      20.0      30.0    -179.0    -140.0
PacN15      10.0      20.0    -179.0    -140.0
PacEqu     -10.0      10.0    -179.0    -140.0
PacS15     -20.0     -10.0    -179.0    -140.0
PacS25     -30.0     -20.0    -179.0    -140.0
PacS35     -40.0     -30.0    -179.0    -140.0
PacS45     -50.0     -40.0    -179.0    -140.0
AtlS55     -60.0     -50.0     -20.0      10.0
Hawaii      15.0      25.0    -163.0    -153.0
SIO        -35.0     -25.0      75.0      85.0
SPG        -32.0     -22.0    -139.0    -129.0
AtlNE       36.0      40.0     -74.0     -70.0

Table 2: Regional Subset Definitions
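A minimal Python sketch of how the regional boxes in Table 2 might be applied to bin center coordinates follows. The column interpretation (minimum and maximum latitude, then minimum and maximum longitude, in degrees with north and east positive), the inclusive treatment of box edges, and the function name are assumptions. None of the listed boxes cross the dateline, so simple range tests suffice.

```python
# Region boxes from Table 2: (min_lat, max_lat, min_lon, max_lon) in degrees.
REGIONS = {
    "AtlN55": (50.0, 60.0, -50.0, -20.0),
    "PacN45": (40.0, 50.0, -179.0, -140.0),
    "PacN35": (30.0, 40.0, -179.0, -140.0),
    "PacN25": (20.0, 30.0, -179.0, -140.0),
    "PacN15": (10.0, 20.0, -179.0, -140.0),
    "PacEqu": (-10.0, 10.0, -179.0, -140.0),
    "PacS15": (-20.0, -10.0, -179.0, -140.0),
    "PacS25": (-30.0, -20.0, -179.0, -140.0),
    "PacS35": (-40.0, -30.0, -179.0, -140.0),
    "PacS45": (-50.0, -40.0, -179.0, -140.0),
    "AtlS55": (-60.0, -50.0, -20.0, 10.0),
    "Hawaii": (15.0, 25.0, -163.0, -153.0),
    "SIO": (-35.0, -25.0, 75.0, 85.0),
    "SPG": (-32.0, -22.0, -139.0, -129.0),
    "AtlNE": (36.0, 40.0, -74.0, -70.0),
}

def regions_containing(lat, lon):
    """Return the names of all regions whose box contains the point.

    Box edges are treated as inclusive (an assumption), so a point on a
    shared edge belongs to both neighboring zones.
    """
    return [
        name for name, (lat0, lat1, lon0, lon1) in REGIONS.items()
        if lat0 <= lat <= lat1 and lon0 <= lon <= lon1
    ]

# A hypothetical bin center near the Hawaiian Islands.
hits = regions_containing(20.8, -157.2)
```

Regions can overlap (the Hawaii box lies partly inside the Pacific zonal boxes), so the function returns a list rather than a single name.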

Figure 8 shows an example of the utility of the zonal subsets to isolate the impact of a specific algorithm change. The plots show the ratio of MODIS/Aqua water-leaving radiance retrievals before and after adding the correction for NO2. The plot on the left is derived from the equatorial Pacific, while the plot on the right is a time series at high latitude (PacN45). The results here serve to quantify the impact of the relatively small stratospheric NO2 absorption effect that, when magnified by the extended solar path length at high latitudes, results in significant seasonal variations.

Figure 8: MODIS/Aqua water-leaving radiance ratios, before and after NO2 corrections. The left panel shows the effect at the equator while the right panel shows the effect at high latitude.

Taken alone, these comparative temporal analyses cannot be used to determine absolute error, since relative differences may be due to errors in either dataset or to real geophysical effects that are not yet understood. However, when taken in concert with the self-consistency analyses and in situ comparisons, the sensor-to-sensor comparisons can serve to identify and isolate the likely cause of differences between sensors and possible sources of geographically or temporally dependent discrepancies with field measurements.

VII. Level-2 to Level-3 Comparison

This analysis seeks to quantify and track changes in residual cross-scan artifacts and, in the case of MODIS and OCTS, detector-to-detector relative differences (i.e., striping). The approach takes advantage of the fact that residual errors at specific scan angles or specific detectors will tend to average out over time and space. A procedure was therefore developed to generate match-ups between Level-2 observations and Level-3 bins, where the Level-3 product is typically a 7-day or 15-day mean at 9-km resolution, temporally centered on the date of the Level-2 granule. The software gathers all relevant information relating to the match-up, including scan pixel, detector number, and mirror side of the Level-2 observation. Match-ups for all granules collected over a complete day are screened, and those cases corresponding to deep, clear water (chlorophyll < 0.15 mg/m^3) with minimal glint contamination are accepted. Standard binner masking is also employed, with the objective of obtaining a large number of Level-2 to Level-3 match-ups from homogeneous, temporally stable waters, where the Level-3 retrieval is likely to be a good representation of what the Level-2 retrieval should be. The Level-2 to Level-3 ratios for each derived product can then be averaged within scan pixel or detector number. Figure 9 shows an example of cross-scan trends derived in this manner. The case shown is nLw at 443 and 547 nm for MODIS/Aqua, from a recent test processing.

Figure 9: Example of scan-angle-dependent residuals in MODIS/Aqua normalized water-leaving radiances. Data are from a recent test processing for day 289 of 2005. Red and blue symbols are mean Level-2 to Level-3 ratios within each scan pixel, stratified by scan mirror side. The dashed line shows one standard deviation on the mean.
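The averaging of Level-2 to Level-3 ratios within scan pixel or detector can be sketched as follows in Python. The match-up record layout (`l2`, `l3`, `chl`, `pixel`, `detector`, `mirror_side`) and the function name are hypothetical, and only the chlorophyll screen is shown; depth, glint, and binner masking are assumed to have been applied upstream. The "standard deviation on the mean" is interpreted here as the standard error, which is an assumption.

```python
import math
from collections import defaultdict

def ratio_stats(matchups, key, max_chl=0.15):
    """Mean Level-2/Level-3 ratio grouped by `key` and mirror side.

    matchups: list of dicts with 'l2', 'l3', 'chl', 'mirror_side' and the
    grouping field named by `key`, e.g. 'pixel' or 'detector'.
    Returns {(key_value, mirror_side): (mean_ratio, std_error, n)}.
    """
    groups = defaultdict(list)
    for m in matchups:
        # Clear-water screen from the text: chlorophyll < 0.15 mg/m^3.
        if m["chl"] < max_chl and m["l3"] > 0.0:
            groups[(m[key], m["mirror_side"])].append(m["l2"] / m["l3"])
    stats = {}
    for group, ratios in groups.items():
        n = len(ratios)
        mean = sum(ratios) / n
        if n > 1:
            var = sum((r - mean) ** 2 for r in ratios) / (n - 1)
            std_error = math.sqrt(var / n)
        else:
            std_error = None             # undefined for a single match-up
        stats[group] = (mean, std_error, n)
    return stats

# Hypothetical match-ups, for illustration only.
matchups = [
    {"l2": 1.0, "l3": 1.0, "chl": 0.05, "detector": 1, "mirror_side": 0},
    {"l2": 1.2, "l3": 1.0, "chl": 0.10, "detector": 1, "mirror_side": 0},
    {"l2": 0.9, "l3": 1.0, "chl": 0.30, "detector": 1, "mirror_side": 0},
    {"l2": 2.0, "l3": 2.0, "chl": 0.08, "detector": 2, "mirror_side": 1},
]
by_detector = ratio_stats(matchups, "detector")
```

Calling the same function with `key="pixel"` would give the cross-scan version of the statistic, with the two mirror sides kept separate in both cases.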

As another example, Figure 10 shows the same data as a function of detector number (MODIS has 10 detectors in each of the standard ocean color bands). These analyses are typically done for the start and end of the mission, and once per year throughout the mission, to assess the impact of uncorrected changes in sensor response across scan or between detectors.

Figure 10: Example of detector-dependent residuals in MODIS/Aqua normalized water-leaving radiances. Data are from a recent test processing for day 289 of 2005. Red and blue symbols are mean Level-2 to Level-3 ratios within each detector of the respective waveband, stratified by scan mirror side. The error bars show one standard deviation on the mean.


Bailey, S. W., & Werdell, P. J. (2006). A multi-sensor approach for the on-orbit validation of ocean color satellite data products. Remote Sensing of Environment, 102(1-2), 12-23.

Campbell, J.W., J.M. Blaisdell, and M. Darzi (1995). Level-3 SeaWiFS Data Products: Spatial and Temporal Binning Algorithms. NASA Tech. Memo. 104566, Vol. 32, S.B. Hooker, E.R. Firestone, and J.G. Acker, Eds., NASA Goddard Space Flight Center, Greenbelt, Maryland.

Franz, B.A., S.W. Bailey, P.J. Werdell, and C.R. McClain (2007). Sensor-Independent Approach to Vicarious Calibration of Satellite Ocean Color Radiometry, Appl. Opt., 46 (22).

Meister, G., Kwiatkowska, E.J., Franz, B.A., Patt, F.S., Feldman, G.C., and McClain, C.R. (2005). Moderate-Resolution Imaging Spectroradiometer ocean color polarization correction, Appl. Opt., 44 (26), 5524-5535.

Werdell, P.J., Franz, B.A., Bailey, S.W., Harding, L.W.Jr., and Feldman, G.C. (2007). Approach for the long-term spatial and temporal evaluation of ocean color satellite data products in a coastal environment, Coastal Ocean Remote Sensing.