Can I download data in bulk via HTTP?
Yes. It is possible to mimic FTP bulk data downloads using the HTTP-based data distribution server.
CAVEATS
1) The following examples are provided for informational purposes only.
2) No product endorsement is implied.
3) There is no guarantee that these options will work for all situations.
4) The examples below are not an exhaustive description of the possibilities.
Using command-line utilities:
w.get:
(*note: the period between 'w' and 'g' in w.get needs to be removed when executing the commands)
1) "mget *SST4*" from /MODISA/L2/2006/005
w.get -q -O - http://oceandata.sci.gsfc.nasa.gov/MODISA/L2/2006/005/ |grep SST4|w.get -N --wait=0.5 --random-wait --force-html -i -
2) Use the file search utility to find and download OCTS daily L3 binned data from November 1, 1996 through January 1, 1997
w.get -q --post-data="sensor=octs&sdate=1996-11-01&edate=1997-01-01&dtype=L3b&addurl=1&results_as_file=1&search=*DAY*" -O - http://oceandata.sci.gsfc.nasa.gov/search/file_search.cgi |w.get -i -
file_search.cgi options:
sensor : mission name. valid options include: aquarius, seawifs, aqua, terra, meris, octs, czcs
sdate : start date for a search
edate : end date for a search
dtype : data type (i.e. level). valid options: L0, L1, L2, L3b (for binned data), L3m (for mapped data), MET (for ancillary data), misc (for sundry products)
addurl : include full url in search result (boolean, 1=yes, 0=no)
results_as_file : return results as a text file listing (boolean, 1=yes, 0=no, which returns an HTML page)
search : text string search
std_only : restrict results to standard products (i.e. ignore extracts, regional processings, etc.)
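As a minimal sketch of how the options above combine into a request body, the following BASH helper assembles the same POST string used in example 2. The function name build_search_query and its argument order are hypothetical, chosen only for illustration; they are not part of the file search service.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the POST body for file_search.cgi from the
# options listed above. The name and argument order are illustrative only.
build_search_query() {
    local sensor=$1 sdate=$2 edate=$3 dtype=$4 search=$5
    # addurl=1 and results_as_file=1 ask for a plain-text list of full URLs
    printf 'sensor=%s&sdate=%s&edate=%s&dtype=%s&addurl=1&results_as_file=1&search=%s' \
        "$sensor" "$sdate" "$edate" "$dtype" "$search"
}

# Quote the search pattern so the shell does not expand the asterisks.
query=$(build_search_query octs 1996-11-01 1997-01-01 L3b '*DAY*')
echo "$query"
```

The resulting string can be passed to the --post-data or -d options used in the examples on this page.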
3) grab SeaWiFS data (which needs username and password)
w.get --user=username --password=passwd http://oceandata.sci.gsfc.nasa.gov/restrict/getfile/S2006010174900.L2_GAC_OC.bz2
Useful w.get options:
--timeout=10 : sets timeout to 10 seconds (by default w.get will retry after timeout)
--wait=0.5 : tells w.get to pause for 0.5 seconds between retrievals
--random-wait : causes the time between requests to vary between 0.5 and 1.5 * wait seconds, where wait was specified using the --wait option
-N, --timestamping : prevents w.get from downloading files already retrieved if a local copy exists and the remote copy is not newer
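To make the --random-wait range concrete, here is a small worked calculation for the --wait=0.5 value used in example 1. It only applies the 0.5 to 1.5 multiplier described above; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Compute the pause range implied by --wait=0.5 combined with --random-wait.
wait_seconds=0.5
range=$(awk -v w="$wait_seconds" 'BEGIN { printf "%.2f %.2f", 0.5 * w, 1.5 * w }')
echo "pause varies between (seconds): $range"
```

With --wait=0.5, each pause therefore falls between 0.25 and 0.75 seconds.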
cURL:
Unlike w.get, cURL has no method for downloading a list of URLs (although it can download multiple URLs on the command line).
However, a shell or script (perl, python, etc) loop can easily be written (examples below use a BASH for loop):
1) grab MODIS L2 files for 2006 day 005 (Jan 5th, 2006)
for file in $(curl http://oceandata.sci.gsfc.nasa.gov/MODISA/L2/2006/005/ | grep getfile | cut -d '"' -f 2);
do
curl -L -O "$file";
done;
2) Use the file search utility to find and download OCTS daily L3 binned data from November 1, 1996 through January 1, 1997
for file in $(curl -d "sensor=octs&sdate=1996-11-01&edate=1997-01-01&dtype=L3b&addurl=1&results_as_file=1&search=*DAY*" http://oceandata.sci.gsfc.nasa.gov/search/file_search.cgi | grep getfile);
do
curl -L -O "$file";
done;
3) grab SeaWiFS data (which needs username and password)
curl -u username:passwd -L -O http://oceandata.sci.gsfc.nasa.gov/restrict/getfile/S2006010174900.L2_GAC_OC.bz2
Useful curl options:
--retry 10 - sets the number of retries to 10, by default curl does not retry
--max-time 10 - limits each transfer to a maximum of 10 seconds in total, so choose a value large enough for the files being downloaded
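The loop pattern used in the cURL examples can be split into two small BASH functions, sketched below under stated assumptions. The function names, the one-second pause, and the sample markup with its example.com URL are all made up for illustration; the real directory listing may be laid out differently.

```shell
#!/usr/bin/env bash
# Print the first double-quoted value on each input line that mentions
# "getfile". Like the loops above, this assumes one link per line with the
# URL as the first quoted attribute.
extract_getfile_urls() {
    grep getfile | cut -d '"' -f 2
}

# Download each URL read from stdin, one file at a time.
fetch_sequentially() {
    local url
    while IFS= read -r url; do
        [ -n "$url" ] || continue
        curl -L -O --retry 10 "$url"
        sleep 1   # assumed pause between files, to stay within connection limits
    done
}

# Made-up sample listing, used only to show what the filter returns.
sample='<a href="http://example.com/getfile/FILE1.bz2">FILE1.bz2</a>
<a href="http://example.com/about.html">about</a>'
urls=$(printf '%s\n' "$sample" | extract_getfile_urls)
echo "$urls"
```

To download for real, the output of a curl listing request could be piped through extract_getfile_urls and then into fetch_sequentially. Nothing is downloaded until fetch_sequentially is called.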
Web Browser options:
Firefox add-on 'DownThemAll'
If you prefer a GUI-based option, there is an add-on for the Firefox web browser called 'DownThemAll'. It is easy to configure to download only
what you want from the page (it even has a default for archived products: gz, tar, bz2, etc.). It allows putting a limit on concurrent downloads, which is
important for downloading from our servers as we limit connections to one concurrent connection per file and 3 files per IP, so don't try the "accelerate"
features as your IP may get blocked.
Recommended preference settings:
1) Set the concurrent downloads to 1.
2) There is an option under the 'Advanced' tab called 'Multipart download'. Set the 'Max. number of segments per download' to 1.
3) Since this download manager does not close connections efficiently, you may find that file downloads time out. You may want to
set the Auto Retry setting to retry every (1) minute with Max. Retries set to 10.
Another alternative that works for more than just Firefox (but isn't free) is "Internet Download Manager".
Like 'DownThemAll', it has features to grab all the links on a page, as well as limit the number of concurrent downloads. It also advertises download
acceleration. Do NOT use this feature with our servers, as your IP may get blocked.