Methods for Assessing the Quality and Consistency of Ocean Color Products
NASA Goddard Space Flight Center
Ocean Biology Processing Group
18 January 2005
September 2009 (Update in Progress)
I. Introduction
This document provides details on several of the standard methods used by the Ocean Biology Processing Group (OBPG) at NASA/GSFC to evaluate the oceanic optical properties derived from spaceborne ocean color sensors. Many of these analyses are performed routinely for standard products, but they are also used to evaluate changes in processing algorithms or calibration. The analyses serve to verify the implementation of proposed changes and to provide quantitative feedback on the impact of those changes on field-data comparisons, sensor-to-sensor agreement, temporal and spatial stability of derived product retrievals, and long-term sensor stability.
Although not discussed here, any evaluation of instrument calibration or processing algorithm changes is normally preceded by a re-evaluation of the vicarious calibration (Franz et al., 2007). This effectively removes any bias on the mission-mean normalized water-leaving radiance retrievals at the vicarious calibration site. When comparing products from different sensors, any algorithm changes that are applicable to both sensors are applied equally, and both sensors are vicariously recalibrated to a common source (e.g., the Marine Optical Buoy, MOBY).
The plots and images shown in this document come from various processing and testing events. They are provided as examples only, and thus they do not reflect the current state of product quality. This document is intended to describe the analysis methods. The analysis results are posted elsewhere.
II. Comparison with in situ Observations
The primary mechanism for assessing the quality of retrieved ocean color properties is comparison with ground-truth measurements. A detailed description of the in situ match-up process is provided in Bailey and Werdell (2006), and current operational results are posted on the OBPG Validation Website. It should be recognized, however, that the temporal and geographic distribution of the in situ dataset is limited. These match-ups are generally not sufficient for assessing the quality of remotely sensed ocean color data over the full range of geometries through which the spaceborne sensor views the earth, or over the full temporal and geographic distribution of the Level-3 products; nor do they account for the effects of temporal and spatial averaging or for systematic errors associated with Level-3 masking decisions.
III. Level-2 Regional Analysis
Going beyond traditional match-ups, the OBPG also performs regional analyses in locations where significant concentrations of in situ measurements exist. As described in Werdell et al. (2007), the in situ data can be used to characterize the region, and the bulk statistics (e.g., seasonal histogram distributions, monthly-mean trends) can then be compared against equivalent statistics from the Level-2 satellite sensor retrievals. A one-for-one match-up between satellite retrievals and field observations is not required, so the "match-up" return is much higher. Examples of such analyses are provided in Figures 1 and 2. Figure 1 is a seasonal comparison of satellite retrievals collected within the northern region of Chesapeake Bay over a 12-year period against in situ measurements of chlorophyll-a collected over the same time span. The colored lines show results for different satellite processing algorithms (a change in the near-infrared water-leaving radiance algorithm, in this case). The black line is the distribution of in situ observations. Similarly, Figure 2 shows a time-series of monthly mean satellite retrievals of aerosol properties relative to monthly mean in situ measurements, where the latter come from an AERONET sun photometer located within the region.
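The bulk-statistics approach described above can be illustrated with a minimal sketch. The function below is a hypothetical reduction (not OBPG code): it collapses any set of (month, value) observations to monthly means, so that a satellite record and an in situ record for the same region can be compared as distributions rather than as one-for-one match-ups.

```python
from collections import defaultdict

def monthly_means(records):
    """Reduce (month, value) observations to per-month means.

    `records` is any iterable of (month, value) pairs; satellite and
    in situ records reduced this way can be compared as bulk statistics
    without requiring coincident one-for-one match-ups.
    """
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}

# hypothetical regional chlorophyll retrievals (month, mg/m^3)
satellite = [(1, 2.0), (1, 4.0), (2, 1.0)]
sat_means = monthly_means(satellite)
```

The same reduction applied to the in situ record yields the comparison curve (e.g., the black line in Figure 1's monthly-mean analog).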
IV. Level-3 Temporal Trending
Level-3 trend analysis looks at long-term trends on global and regional spatial scales. It provides a standard mechanism for evaluating derived product consistency and sensor stability, and it quantifies the relative impact of calibration and algorithm changes on life-of-mission time scales. The Level-3 products are global binned, multi-day averages at 4.6- or 9-km resolution, with bins distributed in an equal-area, integerized sinusoidal projection (Campbell et al., 1995). The typical composite period is 8 days, but for quick turn-around test processing the OBPG uses a temporal subset of the mission lifespan consisting of 4-day composites generated from the start of each consecutive 32-day period (i.e., 12.5% of the mission dataset). The temporal subset is generated at 9-km resolution, and it can be processed within one day. The 4-day compositing period generally provides sufficient opportunity to observe most of the daylit side of the earth, filling coverage gaps due to orbit spacing and sun glint. The analysis focuses on trends in normalized water-leaving radiances (nLw) or remote sensing reflectances (Rrs), but trends in bio-optical and atmospheric products are also evaluated.
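The temporal-subset scheme (one 4-day composite per 32-day period, i.e., 12.5% of the mission) can be sketched as follows. This is an illustrative reconstruction under the stated assumptions, not the OBPG scheduler; the function name and the example mission span are hypothetical.

```python
from datetime import date, timedelta

def subset_composite_starts(mission_start, mission_end, period=32, composite=4):
    """Return the start date of one `composite`-day window per consecutive
    `period`-day interval, covering composite/period (here 4/32 = 12.5%)
    of the mission dataset."""
    starts = []
    d = mission_start
    # keep a window only if its full composite span fits in the mission
    while d + timedelta(days=composite - 1) <= mission_end:
        starts.append(d)
        d += timedelta(days=period)
    return starts

# hypothetical one-year span: one 4-day composite every 32 days
starts = subset_composite_starts(date(1998, 1, 1), date(1998, 12, 31))
```

Each returned date seeds one 4-day global composite for the quick-turn-around trend analysis.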
From these multi-day global composites, a subset of the filled bins is selected and the binned products are averaged and trended with time. For bin selection, five global subsets are defined, corresponding to 1) all bins, 2) all deep-water bins, and those bins from locations that are typically associated with 3) oligotrophic, 4) mesotrophic, and 5) eutrophic conditions. The deep-water subset consists of all bins where water depth is greater than 1000 meters. The three trophic subsets are predefined based on a previous analysis of the SeaWiFS global mission-averaged chlorophyll. The oligotrophic subset is all bins where 0.0 < chl < 0.1 mg/m^3. Similarly, the mesotrophic and eutrophic subsets correspond to mean chl ranges of 0.1 to 1 and 1 to 10 mg/m^3, respectively.
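The subset definitions above amount to a simple per-bin classification. The sketch below encodes the stated thresholds; the exact boundary inclusivity at 0.1 and 1 mg/m^3 is not specified in the text, so the choices here are assumptions, and the function itself is illustrative rather than OBPG code.

```python
def classify_bin(depth_m, mission_mean_chl):
    """Assign a Level-3 bin to the global trend-analysis subsets.

    Thresholds follow the text: deep water is depth > 1000 m;
    trophic classes use the SeaWiFS mission-mean chlorophyll (mg/m^3).
    Boundary inclusivity at 0.1 and 1.0 is an assumption.
    """
    subsets = ["all"]
    if depth_m > 1000.0:
        subsets.append("deep")
    if mission_mean_chl is not None:
        if 0.0 < mission_mean_chl < 0.1:
            subsets.append("oligotrophic")
        elif 0.1 <= mission_mean_chl < 1.0:
            subsets.append("mesotrophic")
        elif 1.0 <= mission_mean_chl < 10.0:
            subsets.append("eutrophic")
    return subsets
```

Averaging a binned product over each subset, per composite period, yields the time series that are trended in the figures that follow.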
An example of a trend analysis is the SeaWiFS annual cycle for Rrs shown in Figure 3. In the absence of any major geophysical events, we expect the trend in global deep-water or global oligotrophic-water Rrs to repeat from year to year. Low-level differences may be due to geographic sampling biases or real geophysical changes, but at these large scales the plots indicate that SeaWiFS products are, to first order, self-consistent over time.
V. Temporal Anomaly Analysis
To evaluate temporal trends in a more quantitative manner, the mean annual cycle for each parameter is subtracted from the respective time series to produce a temporal anomaly trend. An example is shown in Figure 5. The variations observed in these trends may be due to real geophysical variability or to residual instrument calibration errors. The trend plots are often compared to known spacecraft or instrument state changes to assess whether instrumental artifacts are present in the derived products, and to verify the impact of characterization and correction efforts. Rrs or nLw anomaly trends in the 500-560 nm range in clear water are of particular interest, as this spectral range is relatively insensitive to changes in chlorophyll concentration, which is the primary driver of optical properties in clear water. The oligotrophic Rrs(555) trend in Figure 5, for example, confirms that the SeaWiFS time-series is consistent with no trend, since the gray region encompasses the Rrs(555) = 0 line.
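The anomaly computation is the standard climatology subtraction: group the time series by position within the annual cycle, average across years, and subtract. A minimal sketch, with the representation of the series (pairs of day-of-year bin and value, e.g., one value per 4-day composite) as an assumption:

```python
from collections import defaultdict

def temporal_anomaly(series):
    """Subtract the mean annual cycle from a time series.

    `series` is a list of (doy_bin, value) pairs in time order, where
    doy_bin identifies the position within the annual cycle (e.g., the
    4-day composite number). The climatology for each doy_bin is its
    multi-year mean; the anomaly is the departure from that mean.
    """
    climatology = defaultdict(list)
    for doy, value in series:
        climatology[doy].append(value)
    mean_cycle = {doy: sum(v) / len(v) for doy, v in climatology.items()}
    return [(doy, value - mean_cycle[doy]) for doy, value in series]
```

A flat anomaly trend (statistically consistent with zero, as in the Figure 5 example) indicates a temporally stable record.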
VI. Trend Comparisons and Common Bins
Another useful tool for separating geophysical changes from sensor calibration and algorithm artifacts is to compare Level-3 trends between missions. Similarly, comparison of trends derived with different processing algorithms or sensor calibrations can help to verify proper implementation and to assess the impact of such changes on the global science products. This analysis looks at average values in coincident Level-3 retrievals on global and regional spatial scales, and presents the results as a comparative time-series over the common mission lifespan. We begin with Level-3 products composited over a common time period (usually 4 or 8 days). All OBPG Level-3 ocean color products use the same equal-area binning approach (Campbell et al., 1995), but standard MODIS products are distributed at 4.6-km resolution while SeaWiFS is distributed at 9-km resolution. To allow for a direct, bin-for-bin comparison, the MODIS products are re-binned to the SeaWiFS 9-km resolution using standard binning algorithms. With Level-3 composited data products in an equivalent form, the data sets are further reduced to a set of common bins. This means that only those bins for which a retrieval exists for both sensors or both test processing configurations are included in subsequent averaging and trending. This is critical to the statistics, as some sensors show systematic data gaps even after 8 days of compositing, and this can result in geographic sampling bias if both sensors are not equivalently masked. Figure 6 shows an example of a trend comparison for remote sensing reflectance retrievals from two different SeaWiFS processing configurations (Reprocessing 2007 vs. Reprocessing 2009).
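The common-bin reduction can be sketched as a set intersection over filled bins. The sketch below assumes each composited product is represented as a mapping from bin number to mean value; the function name and data layout are illustrative, not the operational binner interface.

```python
def common_bin_means(binned_a, binned_b):
    """Average two binned products over their shared (common) bins only.

    `binned_a` and `binned_b` map bin number -> binned mean value for one
    composite period. Bins filled in only one product are excluded, so
    both sensors (or both test configurations) see identical geographic
    sampling, avoiding the sampling bias described in the text.
    """
    common = binned_a.keys() & binned_b.keys()
    if not common:
        return None, None, 0
    mean_a = sum(binned_a[b] for b in common) / len(common)
    mean_b = sum(binned_b[b] for b in common) / len(common)
    return mean_a, mean_b, len(common)

# hypothetical composites: bins 2 and 3 are common to both products
a = {1: 1.0, 2: 2.0, 3: 3.0}
b = {2: 4.0, 3: 6.0, 4: 8.0}
mean_a, mean_b, n = common_bin_means(a, b)
```

Repeating this per composite period produces the paired time series compared in Figures 6 and 7.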
Figure 7 shows a comparison of nLw trends between SeaWiFS and MODIS/Aqua, for a pair of test processings. (The ST and AT in the plot title correspond to SeaWiFS Test and Aqua Test, respectively, and the numbers are sequence numbers for tracking the test configurations). When comparing spectral parameter trends between different missions, the closest equivalent band is compared. No effort is made to force the sensor retrievals to a common band-pass, so some level of difference is expected when the nominal center wavelengths are not identical.
It is often useful to further stratify the trend comparisons into geographic regions. The current stratification employed within the OBPG includes a number of regions selected for proximity to vicarious calibration sites, as well as a set of latitudinally distributed zones covering a wide range of solar illumination conditions. The regions are described in Table 2.
Table 2: Regional Subset Definitions
Figure 8 shows an example of the utility of the zonal subsets for isolating the impact of a specific algorithm change. The plots show the ratio of MODIS/Aqua water-leaving radiance retrievals before and after adding the correction for NO2. The plot on the left is derived from the equatorial Pacific, while the plot on the right is a time-series at high latitude (PacN45). The results here serve to quantify the impact of the relatively small stratospheric NO2 absorption effect that, when magnified by the extended solar path length at high latitudes, results in significant seasonal variations.
Taken alone, these comparative temporal analyses cannot be used to determine absolute error, since relative differences may be due to errors in either dataset or to real geophysical effects that are not yet understood. However, when taken in concert with the self-consistency analyses and in situ comparisons, the sensor-to-sensor comparisons can serve to identify and isolate the likely cause of differences between sensors, as well as possible sources of geographically or temporally dependent discrepancies with field measurements.
VII. Level-2 to Level-3 Comparison
This analysis seeks to quantify and track changes in residual cross-scan artifacts and, in the case of MODIS and OCTS, detector-to-detector relative differences (i.e., striping). The approach takes advantage of the fact that residual errors at specific scan angles or specific detectors will tend to average out over time and space. A procedure was therefore developed to generate match-ups between Level-2 observations and Level-3 bins, where the Level-3 product is typically a 7- or 15-day mean at 9-km resolution, temporally centered on the date of the Level-2 granule. The software gathers all relevant information relating to the match-up, including scan-pixel, detector number, and mirror side of the Level-2 observation. Match-ups for all granules collected over a complete day are screened, and those cases corresponding to deep, clear water (chlorophyll < 0.15 mg/m^3) with minimal glint contamination are accepted. Standard binner masking is also employed, the object being to obtain a large number of Level-2 to Level-3 match-ups from homogeneous, temporally stable waters, where the Level-3 retrieval is likely to be a good representation of what the Level-2 retrieval should be. The Level-2 to Level-3 ratios for each derived product can then be averaged within scan-pixel or detector number. Figure 9 shows an example of cross-scan trends derived in this manner. The case shown is nLw at 443 and 547 nm for MODIS/Aqua, from a recent test processing.
As another example, Figure 10 shows the same data as a function of detector number (MODIS has 10 detectors in each of the standard ocean color bands). These analyses are typically done for the start and end of the mission, and once per year throughout the mission, to assess the impact of uncorrected changes in sensor response across scan or between detectors.
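The final averaging step of the Level-2 to Level-3 comparison can be sketched as a grouped mean of ratios. The match-up record layout and function name below are hypothetical; the point is that a residual cross-scan or striping artifact appears as a departure of the grouped mean ratio from 1.0 at a particular scan-pixel or detector.

```python
from collections import defaultdict

def mean_ratio_by_key(matchups, key):
    """Average Level-2/Level-3 ratios grouped by a geometry key.

    Each match-up is a dict carrying the Level-2 value ('l2'), the
    coincident Level-3 binned mean ('l3'), and geometry fields such as
    'scan_pixel' and 'detector'. Grouping the ratio l2/l3 by `key`
    isolates response differences tied to that geometry dimension.
    """
    groups = defaultdict(list)
    for m in matchups:
        if m["l3"] != 0.0:  # guard against empty/zero Level-3 bins
            groups[m[key]].append(m["l2"] / m["l3"])
    return {k: sum(v) / len(v) for k, v in groups.items()}

# hypothetical screened match-ups for one band
matchups = [
    {"l2": 1.1, "l3": 1.0, "scan_pixel": 0, "detector": 3},
    {"l2": 0.9, "l3": 1.0, "scan_pixel": 0, "detector": 4},
    {"l2": 1.2, "l3": 1.0, "scan_pixel": 5, "detector": 3},
]
by_pixel = mean_ratio_by_key(matchups, "scan_pixel")
```

Plotting the grouped means against scan-pixel yields a Figure 9-style cross-scan trend; grouping by "detector" yields a Figure 10-style striping diagnostic.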
References
Bailey, S.W., and P.J. Werdell (2006). A multi-sensor approach for the on-orbit validation of ocean color satellite data products. Remote Sensing of Environment, 102(1-2), 12-23.
Campbell, J.W., J.M. Blaisdell, and M. Darzi (1995). Level-3 SeaWiFS Data Products: Spatial and Temporal Binning Algorithms. NASA Tech. Memo. 104566, Vol. 32, S.B. Hooker, E.R. Firestone, and J.G. Acker, Eds., NASA Goddard Space Flight Center, Greenbelt, Maryland.
Franz, B.A., S.W. Bailey, P.J. Werdell, and C.R. McClain (2007). Sensor-Independent Approach to Vicarious Calibration of Satellite Ocean Color Radiometry, Appl. Opt., 46(22).
Meister, G., Kwiatkowska, E.J., Franz, B.A., Patt, F.S., Feldman G.C., McClain, C.R. (2005). Moderate-Resolution Imaging Spectroradiometer ocean color polarization correction, Appl. Opt., 44 (26) 5524-5535.
Werdell, P.J., B.A. Franz, S.W. Bailey, L.W. Harding, Jr., and G.C. Feldman (2007). Approach for the long-term spatial and temporal evaluation of ocean color satellite data products in a coastal environment, Coastal Ocean Remote Sensing.