- By mcsorley Date 2006-06-01 16:09
Are there limits or restrictions in place for data retrieval?

Yes, there are limits and restrictions in place for data retrieval. The Ocean
Color community is made up of thousands of people from all around the world.
To give everyone in the community equal access to Ocean Color data, limits need
to be set so that one person does not prevent another from retrieving his or her
data.

The restrictions are based primarily on the number of connections and the time
between connections.

Examples of ways to get blocked from data retrieval:

1. Retrieving the same file over and over again in a loop.
2. Flooding our network with connection attempts.
3. Using download accelerators.
4. Out-of-control scripts.

People usually run into trouble with the restrictions when they write very
aggressive scripts. Scripting your data retrieval is a great idea, but bear in
mind that if your script retrieves files too quickly, you will begin hitting the
retrieval limits set by the OBPG. Try to keep it to a couple of connections at a
time.

For example, let's say you have a script that logs in and retrieves an FTP
directory listing of 1000 files, then executes an FTP request for each file in
that listing. This would open 1000 connections to the FTP or web server at one
time. Doing this in a loop until you successfully retrieve all the files puts a
heavy load on our servers. The number of times you can log in to the FTP server
is limited, so all of the extra connections will simply be denied. A constant
barrage of hundreds of connections at a time from one host can also be viewed as
a denial-of-service attack.
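Rather than re-requesting a file in a tight loop until it succeeds, a script can wait progressively longer between attempts and give up after a few tries. The following is a minimal Python sketch of exponential backoff; `fetch_file`, the attempt count and the delays are hypothetical values chosen for illustration, not limits set by the OBPG.

```python
import time
from typing import Callable


def fetch_with_backoff(name: str,
                       fetch_file: Callable[[str], bool],
                       max_attempts: int = 5,
                       base_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Try to retrieve one file, sleeping between failed attempts.

    The delay doubles after each failure (30 s, 60 s, 120 s, ... with the
    hypothetical defaults), so a busy or full server is not hit with a
    constant stream of new logins. Returns True on success, False once
    `max_attempts` attempts have failed.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        if fetch_file(name):
            return True
        if attempt < max_attempts:
            # Back off before opening another connection.
            sleep(delay)
            delay *= 2
    return False
```

A file that still fails after the last attempt is reported back to the caller instead of being retried forever.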

If you're going to script your data retrieval, here are some suggestions that
will help ensure you retrieve that data successfully.

1. Use script-friendly programs like curl, wget or ncftpget. They have many
settings, such as timeout periods, that you can adjust to help things fail
gracefully.

2. Use FTP programs that support resuming downloads. This will help you finish
downloading a file if your connection gets cut off.

3. Try to keep your FTP connections to a couple at a time. Trying to log in as
many times as possible, over and over again, will get you blocked at some point.
Keeping to a couple of connections at a time will ensure you can log in to get
your data as long as the server is not full.

4. Write your scripts with connection limits in mind. Pounding away with
hundreds of connections at once is useless. Put in pauses (sleeps) between
file retrieval attempts. Keep track of the files you need and the files you
currently have. Request the files from your list a couple at a time, and when
one of those files is done downloading, request the next one.
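These suggestions can be combined into one small script. This is a minimal Python sketch, assuming curl is installed and that the server supports resumed transfers over the HTTP or FTP URL you give it; the base URL, file names and timing values are hypothetical. It uses curl's `--connect-timeout` and `--max-time` so a stalled transfer fails instead of hanging, `-C -` to resume a partial file, and `--fail` so server errors count as failures. It compares the files you want against the files already on disk, so nothing is fetched twice, and it sleeps between retrievals.

```python
import os
import subprocess
import time
from typing import Callable, Iterable, List


def curl_command(url: str, dest: str,
                 connect_timeout: int = 30, max_time: int = 3600) -> List[str]:
    """Build a curl command line for one file.

    --connect-timeout / --max-time: fail instead of hanging forever.
    -C -: resume a partial download from where it stopped.
    --fail: treat HTTP errors as failures (non-zero exit status).
    """
    return ["curl", "--fail",
            "--connect-timeout", str(connect_timeout),
            "--max-time", str(max_time),
            "-C", "-",
            "-o", dest, url]


def still_needed(wanted: Iterable[str], dest_dir: str) -> List[str]:
    """Return the wanted file names that are not yet complete in dest_dir."""
    have = set(os.listdir(dest_dir)) if os.path.isdir(dest_dir) else set()
    return [name for name in wanted if name not in have]


def download_missing(base_url: str, wanted: Iterable[str], dest_dir: str,
                     pause: float = 5.0,
                     run: Callable[[List[str]], int] =
                         lambda cmd: subprocess.run(cmd).returncode,
                     sleep: Callable[[float], None] = time.sleep) -> List[str]:
    """Fetch each missing file one at a time, pausing between requests.

    Files are written to NAME.part and renamed only when curl succeeds,
    so an interrupted file is resumed (-C -) on the next run rather than
    mistaken for a finished one. Returns the names that failed.
    """
    os.makedirs(dest_dir, exist_ok=True)
    failed = []
    for name in still_needed(wanted, dest_dir):
        part = os.path.join(dest_dir, name + ".part")
        if run(curl_command(f"{base_url}/{name}", part)) == 0:
            os.replace(part, os.path.join(dest_dir, name))
        else:
            failed.append(name)
        sleep(pause)  # leave a gap before opening the next connection
    return failed
```

Running the script again later only requests the files returned as failed, which matches keeping track of what you need versus what you have. This sketch downloads one file at a time; to run two at once, split the list of missing files across a small worker pool rather than starting one process per file.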
