The forum is locked.
The Ocean Color Forum has transitioned over to the Earthdata Forum (https://forum.earthdata.nasa.gov/). The information existing below will be retained for historical reference. Please sign into the Earthdata Forum for active user support.
We are using the SeaDAS scripts on a cluster, running e.g. 500 concurrent processes on different inputs. For this we had to retrieve the auxiliary data once and then run the scripts with options that provide the ancillary data explicitly, e.g.
modis_GEO.py A2015088104500.L1A_LAC -o A2015088104500.GEO --ancdir=./anc --att1=anc/2015/088/PM1ATTNR_NRT.A2015088.1045.006 --att2=anc/2015/088/PM1ATTNR_NRT.A2015088.1050.006 --eph1=anc/2015/087/PM1EPHND.P2015087.1200.003 --disable-download -d -v
Running modis_GEO.py without the --att1 and --eph1 options effectively amounts to a denial of service against the oceancolor web server, because of the 500 concurrent requests from the "same" IP.
Unfortunately, the current implementation of modis_L1A_extract.py does not support these options when --extract_geo is used; instead it always issues the request to the oceancolor server. Would it be possible to add the options --att1 ... --eph2 to modis_L1A_extract.py as well? (I will attach my updated script as a proposal.)
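For illustration, a minimal sketch of the kind of option plumbing being proposed. The option names mirror those of modis_GEO.py shown above; the parser and forwarding helper are purely illustrative and are not the actual modis_L1A_extract.py code:

```python
import argparse

# Sketch: accept attitude/ephemeris options in modis_L1A_extract.py and
# forward them to the GEO step, so --extract_geo does not trigger a
# server lookup. Option names follow modis_GEO.py; everything else here
# is hypothetical.
parser = argparse.ArgumentParser()
parser.add_argument("l1a_file")
parser.add_argument("--extract_geo", action="store_true")
parser.add_argument("--att1", help="first attitude file")
parser.add_argument("--att2", help="second attitude file")
parser.add_argument("--eph1", help="first ephemeris file")
parser.add_argument("--eph2", help="second ephemeris file")
parser.add_argument("--disable-download", dest="download",
                    action="store_false", default=True)

def geo_args(opts):
    """Build the extra arguments to pass through to the GEO processing step."""
    extra = []
    for name in ("att1", "att2", "eph1", "eph2"):
        value = getattr(opts, name)
        if value:
            extra.append("--%s=%s" % (name, value))
    if not opts.download:
        extra.append("--disable-download")
    return extra
```

With this in place, whatever att/eph files the caller supplies are handed straight to the GEO step and no external query is made.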
A more general question, also relevant for l2gen: are the rules that determine the response to getanc requests published anywhere? I had to reverse-engineer a set of rules for NRT processing and for reprocessing of consolidated products by querying with different times and looking for patterns in the responses, but I am not sure whether my rules are complete.
With best wishes
First, thanks for the suggested improvement (complete with modified code!). We will include this modification in our next release (due out shortly).
As for the block you experience when running in your cluster environment, there are ways around this problem. Its source is the fact that the code we distribute is targeted at the typical user (i.e. a researcher running a single instance at a time). The scripts use a local flat-file database (sqlite3) to store information about the ancillary files needed to process a given file. The purpose of this database is to prevent repeated external queries to our system: once a file is processed the first time, the required ancillary data are cataloged locally, so subsequent runs for that file use the local cache. It sounds like your system is effectively 500 of these individual instances. There are two potential modifications you could make to your workflow that will prevent getting blocked by our servers.
1) Use a single machine to populate the ancillary database, then copy that database file to the machines in the cluster. If the cluster machines share the same filesystem layout (so the ancillary paths are identical), this should work without modifying the database path entries. If not, the paths will need to be updated, but that is a reasonably trivial task.
2) Similar to (1), but use a single "real" database that every machine in the cluster accesses. There is a version of the ancDB.py code that uses a MySQL database (cleverly called ancDBmysql.py). It can be used in place of the original with only minor tweaks to the scripts. It worked when I last tested it, but I've not had time to do any rigorous testing of it. In theory, any database could be used. As with case (1), this would work best with a single machine populating the database via modis_atteph.py (and getanc.py if you're doing L2 processing as well).
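The single-machine populate step in either option can be scripted as a serial loop over the granules. Only modis_atteph.py itself is a real SeaDAS script here; the file pattern, helper names, and the assumption that it accepts an --ancdir option spelled like the other scripts are all placeholders:

```python
import glob
import subprocess

def atteph_cmd(l1a, ancdir="./anc"):
    """Build the modis_atteph.py command line for one L1A granule.

    The --ancdir spelling is assumed to match the other SeaDAS scripts;
    verify against your installed version.
    """
    return ["modis_atteph.py", l1a, "--ancdir=%s" % ancdir]

def populate_anc_db(l1a_pattern, ancdir="./anc"):
    """Run modis_atteph.py serially for each matching granule, so all
    att/eph lookups come from a single host and land in one database."""
    for l1a in sorted(glob.glob(l1a_pattern)):
        subprocess.run(atteph_cmd(l1a, ancdir), check=True)
```

Running this from one machine before fanning the real processing out to the cluster keeps all external queries on a single IP at a modest rate.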
The responses to getanc requests are in the comments of the anc_utils.py script, copied here so you don't need to dig them out:
# FOR MET/OZONE:
# For each anc type, DB returns either a zero status if all optimal files are
# found, or different error statuses if not. However, 3 MET or OZONE files can be
# returned along with an error status meaning there were one or more missing
# files that were then filled with the file(s) found, and so though perhaps
# better than climatology it's still not optimal. Therefore check for cases
# where all 3 MET/ozone files are returned but status is negative and then
# warn the user there might be more files to come and they should consider
# reprocessing at a later date.
# DB return status bitwise values:
# -all bits off means all is well in the world
# -bit 1 = 1 - missing one or more MET
# -bit 2 = 1 - missing one or more OZONE
# -bit 3 = 1 - missing SST
# -bit 4 = 1 - missing NO2
# -bit 5 = 1 - missing ICE
# FOR ATT/EPH:
# 0 - all is well in the world
# 1 - predicted attitude selected
# 2 - predicted ephemeris selected
# 4 - no attitude found
# 8 - no ephemeris found
# 16 - invalid mission
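Since the statuses are bitmasks, they can be decoded programmatically. This sketch follows the bit values in the comments above; the function names and interface are illustrative, not part of SeaDAS, and a negative MET/OZONE status (mentioned in the comments) is treated by its magnitude:

```python
# Bit values copied from the anc_utils.py comments quoted above.
MET_OZONE_BITS = {
    1: "missing one or more MET",
    2: "missing one or more OZONE",
    4: "missing SST",
    8: "missing NO2",
    16: "missing ICE",
}

ATT_EPH_BITS = {
    1: "predicted attitude selected",
    2: "predicted ephemeris selected",
    4: "no attitude found",
    8: "no ephemeris found",
    16: "invalid mission",
}

def decode_met_ozone(status):
    """List the problems encoded in a MET/OZONE DB status.

    Zero means all optimal files were found; a negative status is
    decoded by its magnitude (hypothetical handling of the sign).
    """
    status = abs(status)
    return [msg for bit, msg in MET_OZONE_BITS.items() if status & bit]

def decode_att_eph(status):
    """List the conditions encoded in an ATT/EPH status."""
    return [msg for bit, msg in ATT_EPH_BITS.items() if status & bit]
```

For example, an ATT/EPH status of 5 decodes to both "predicted attitude selected" and "no attitude found" bits being set.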