Ocean Color Forum

The forum is locked.
The Ocean Color Forum has transitioned over to the Earthdata Forum (https://forum.earthdata.nasa.gov/). The information below is retained for historical reference. Please sign in to the Earthdata Forum for active user support.
Can I download data in bulk via HTTP?
Yes. It is possible to mimic FTP bulk data downloads using the HTTP-based data distribution server.
CAVEATS
1) The following examples are provided for informational purposes only.
2) No product endorsement is implied.
3) There is no guarantee that these options will work for all situations.
4) The examples below are not an exhaustive description of the possibilities.
Using command-line utilities:
wget:
1) Use the file search utility to find and download OCTS daily L3 binned chlorophyll data from November 1, 1996 through January 1, 1997
wget -q --post-data="sensor=octs&sdate=1996-11-01&edate=1997-01-01&dtype=L3b&addurl=1&results_as_file=1&search=*DAY_CHL*" -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search |wget -i -
file_search options (a combined example follows this list):
sensor : mission name. valid options include: aquarius, seawifs, aqua, terra, meris, octs, czcs, hico, viirs (for snpp), viirsj1, s3olci (for sentinel-3a), s3bolci
sdate : start date for a search (format YYYY-MM-DD)
edate : end date for a search (format YYYY-MM-DD)
psdate : file processing start date for a search (format YYYY-MM-DD)
pedate : file processing end date for a search (format YYYY-MM-DD)
dtype : data type (i.e. level). valid options: L0, L1, L2, L3b (for binned data), L3m (for mapped data), MET (for ancillary data), misc (for sundry products)
addurl : include the full URL in search results (boolean, 1=yes, 0=no)
results_as_file : return results as a text file listing (boolean, 1=yes, 0=no; 0 returns an HTML page)
search : text string search
subID: subscription ID to search
subType: subscription type, valid options are:
1 - non-extracted (default)
2 - extracted
std_only : restrict results to standard products (i.e. ignore extracts, regional processings, etc.; boolean)
cksum: return a checksum file for search results (boolean; sha1sums except for Aquarius soil moisture products which are md5sums; forces results_as_file; ignores addurl)
format: valid options are:
txt - returns as plain text
json - returns results as JSON objects
html - returns results as an HTML page
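As a combined illustration, the following sketch queries for standard SeaWiFS monthly mapped chlorophyll files for 1998 and returns full URLs as plain text. The sensor, dates, and wildcard pattern are assumptions chosen for illustration; swap in your own values.
# Sketch: list standard SeaWiFS monthly mapped chlorophyll products for 1998
# as a plain-text listing of full URLs (parameters are illustrative).
wget -q --post-data="sensor=seawifs&sdate=1998-01-01&edate=1998-12-31&dtype=L3m&search=*MO_CHL*&std_only=1&addurl=1&results_as_file=1&format=txt" -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search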
2) "mget *SST4* from /MODIS-Aqua/L2/2006/005
wget -q -O - https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005/ |grep SST4|wget --base https://oceandata.sci.gsfc.nasa.gov/ -N --wait=0.5 --random-wait --force-html -i -
or
wget -q -O - https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005?format=txt |grep SST4|awk -F',' '{print $1}'| wget --base https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/ -N --wait=0.5 --random-wait -i -
3) grab MERIS L1B data (or any other data set that requires user authentication and needs a username and password)
wget --user=username --password=passwd https://oceandata.sci.gsfc.nasa.gov/echo/getfile/MER_RR__1PRLRA20120330_112205_000026183113_00138_52738_8486.N1.bz2
(note: this method does not always work; if it fails, try the next method described below)
Or set up a netrc file to store credentials and use cookies to improve efficiency:
a) Configure your username and password for authentication using a .netrc file
$ touch ~/.netrc
$ echo "machine urs.earthdata.nasa.gov login <uid> password <password>" > ~/.netrc
$ chmod 0600 ~/.netrc
where <uid> is your Earthdata Login username and <password> is your Earthdata Login password.
b) Call wget with a cookie:
$ touch ~/.urs_cookies
$ wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies https://oceandata.sci.gsfc.nasa.gov/echo/getfile/MER_RR__1PRLRA20120330_112205_000026183113_00138_52738_8486.N1.bz2
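The cookie approach can also be combined with the file_search listing from example 1, so that every file in a result list is fetched through Earthdata Login. This is only a sketch: it assumes the ~/.netrc and ~/.urs_cookies files above are already in place, and the sensor and date parameters are illustrative.
# Sketch: list MERIS L1 files for one day via file_search, then fetch each one
# using the cookie jar and .netrc credentials configured above.
wget -q --post-data="sensor=meris&sdate=2012-03-30&edate=2012-03-30&dtype=L1&addurl=1&results_as_file=1" -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search | wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --keep-session-cookies -N --wait=0.5 --random-wait -i -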
4) retrieve recent files for a non-extracted subscription and check against the sha1sums:
wget --post-data="subID=###&cksum=1" -q -O - https://oceandata.sci.gsfc.nasa.gov/api/file_search > search.cksums && awk '{print $2}' search.cksums |
wget -N --base="https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/" -i - && sha1sum -c search.cksums
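For repeated (e.g. cron-driven) use, the same workflow can be wrapped in a short script. This is only a sketch; SUBID is a placeholder for your own subscription ID, and the URLs are the same ones used above.
#!/bin/bash
# Sketch: fetch the current checksum listing for a subscription, download the
# listed files, and verify them against the sha1sums (SUBID is a placeholder).
SUBID=1234
wget -q --post-data="subID=${SUBID}&cksum=1" -O search.cksums https://oceandata.sci.gsfc.nasa.gov/api/file_search
awk '{print $2}' search.cksums | wget -N --wait=0.5 --random-wait --base="https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/" -i -
sha1sum -c search.cksums || echo "one or more files failed verification" >&2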
Useful wget options (a combined example follows):
--timeout=10 : sets timeout to 10 seconds (by default wget will retry after timeout)
--wait=0.5 : tells wget to pause for 0.5 seconds between attempts
--random-wait : causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option
-N, --timestamping : prevents wget from downloading files already retrieved if a local copy exists and the remote copy is not newer
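For example, the second example above could be combined with these options as follows (a sketch only; the timeout value is arbitrary):
# Sketch: example 2 with a 10-second network timeout, polite waits, and
# timestamping so files already downloaded are skipped.
wget -q -O - "https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005?format=txt" |grep SST4|awk -F',' '{print $1}'| wget --base https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/ -N --timeout=10 --wait=0.5 --random-wait -i -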
cURL:
Unlike wget, cURL has no built-in method for downloading a list of URLs (although it can download multiple URLs given on the command line).
However, a shell or scripting-language (Perl, Python, etc.) loop is easy to write; the examples below use a bash for loop, and an xargs-based alternative is sketched after them:
1) grab MODIS L2 files for 2006 day 005 (Jan 5th, 2006)
for file in $(curl https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005/ | grep getfile | cut -d '"' -f 2);
do
curl -L -O $file;
done;
or
for file in $(curl https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005?format=txt |cut -d "," -f 1);
do
curl -L -O https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/$file;
done;
2) Use the file search utility to find and download OCTS daily L3 binned chlorophyll data from November 1, 1996 through January 1, 1997
for file in $(curl -d "sensor=octs&sdate=1996-11-01&edate=1997-01-01&dtype=L3b&addurl=1&results_as_file=1&search=*DAY_CHL*" https://oceandata.sci.gsfc.nasa.gov/api/file_search |grep getfile);
do
curl -L -O $file;
done;
3) grab MERIS L1B data (which needs username and password)
curl -u username:passwd -L -O https://oceandata.sci.gsfc.nasa.gov/echo/getfile/MER_RR__1PRLRA20120330_112205_000026183113_00138_52738_8486.N1.bz2
Or set up a netrc file to store credentials (see wget example above) and use cookies to improve efficiency:
a) Call cURL with a cookie:
$ curl -b ~/.urs_cookies -c ~/.urs_cookies -L -n https://oceandata.sci.gsfc.nasa.gov/echo/getfile/MER_RR__1PRLRA20120330_112205_000026183113_00138_52738_8486.N1.bz2
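As the alternative mentioned above to writing an explicit loop, a plain-text listing can also be fed to cURL through xargs. This is only a sketch; exact behavior depends on your xargs and curl versions, and the sensor and day are illustrative.
# Sketch: fetch each file named in the day-005 plain-text listing, one curl
# invocation per file, via xargs substitution.
curl -s "https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005?format=txt" | cut -d ',' -f 1 | xargs -I {} curl -s -L -O "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/{}"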
Useful curl options (a combined example follows):
--retry 10 - sets the number of retries to 10 (by default curl does not retry)
--max-time 10 - limits the total time allowed for each transfer to 10 seconds (use --connect-timeout to limit only the connection phase)
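For example, the day-005 loop above could be hardened with these options (a sketch only; note that --max-time caps the whole transfer, so a small value may cut off large files):
# Sketch: the curl loop from example 1 with retries and a generous per-transfer
# time limit (values are illustrative).
for file in $(curl -s "https://oceandata.sci.gsfc.nasa.gov/MODIS-Aqua/L2/2006/005?format=txt" | cut -d ',' -f 1);
do
curl --retry 10 --max-time 600 -L -O https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/$file;
done;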
Web Browser options:
Firefox add-on 'DownThemAll'
If you prefer a GUI-based option, there is an add-on for the Firefox web browser called 'DownThemAll'. It is easy to configure to download only
what you want from a page (it even has a default filter for archived products - gz, tar, bz2, etc.). It also lets you limit concurrent downloads, which is
important when downloading from our servers: we limit connections to one concurrent connection per file and 3 files per IP, so don't use the "accelerate"
features or your IP may get blocked.
Recommended preference settings:
1) Set the concurrent downloads to 1.
2) There is an option under the 'Advanced' tab called 'Multipart download'. Set the 'Max. number of segments per download' to 1.
3) Since this download manager does not efficiently close connection states, you may find that file downloads time out. You may want to
set the Auto Retry setting to retry every minute (1) with Max. Retries set to 10.
Another alternative that works for more than just Firefox (but isn't free) is "Internet Download Manager".
Like 'DownThemAll', it has features to grab all the links on a page, as well as to limit the number of concurrent downloads. It also advertises download
acceleration - do NOT use this feature with our servers, as your IP may get blocked.