Ocean Color Forum

The forum is locked.

The Ocean Color Forum has transitioned over to the Earthdata Forum (https://forum.earthdata.nasa.gov/). The information below will be retained for historical reference. Please sign in to the Earthdata Forum for active user support.

Up Topic Products and Algorithms / Satellite Data Access / Wget constantly redirected
- By schckngs Date 2020-02-21 10:49
Hi OC forum,
Quick question ... I'm using the new wget procedure and it works perfectly until... it doesn't. My script plugs along for a couple of hours, and then wget suddenly gets redirected in a loop and can't find the files (example below). Deleting the .urs_cookies file seems to fix it. Is there a way to tell whether this is an error on my end (e.g. network?), or why it's happening? I'm working on a large processing job, and it would be nice to be able to leave it running without having to check back constantly to make sure it's still working.

Example:
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2016201174500.L2_LAC_OC.nc
--2020-02-21 11:38:37--  https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2016201174500.L2_LAC_OC.nc
Resolving oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)... 169.154.128.84, 2001:4d0:2418:128::84
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|169.154.128.84|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /ob/getfile/A2016201174500.L2_LAC_OC.nc [following]
--2020-02-21 11:38:37--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/A2016201174500.L2_LAC_OC.nc
Reusing existing connection to oceandata.sci.gsfc.nasa.gov:443.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code [following]
--2020-02-21 11:38:37--  https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code
Resolving urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)... 198.118.243.33, 2001:4d0:241a:4081::89
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|198.118.243.33|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=7c6cfe60597147c8d27c205eceed61a246f5cdbb37f2c8a36b7e7119d78e6289 [following]
--2020-02-21 11:38:38--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=7c6cfe60597147c8d27c205eceed61a246f5cdbb37f2c8a36b7e7119d78e6289
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|169.154.128.84|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA&response_type=code [following]
--2020-02-21 11:38:38--  https://urs.earthdata.nasa.gov/oauth/authorize?redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&client_id=Z0u-MdLNypXBjiDREZ3roA&response_type=code
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|198.118.243.33|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://oceandata.sci.gsfc.nasa.gov/ob/getfile/restrict?code=24d69431beb953c1b53e7e01f9a9695febf519af9e04f82a1527452d197645f2 [following]

----ETC...---

20 redirections exceeded.
- By schckngs Date 2020-02-21 13:23 Edited 2020-02-21 13:34
I was mistaken... it is not deleting .urs_cookies that fixes it; I have no idea why it started working again. I was able to download 6 files before it stopped again. Once it worked when I added --user and --password to the wget command, but just once!
Copy/pasting the link into a browser works part of the time; other times the page keeps refreshing as well.
- By gnwiii Date 2020-02-21 15:49
Wget has been erratic for me. Running with the "--debug" option gives lots of detail, but so far all it has shown is that wget sometimes tries to use IPv6 and dies with "no route to host". That makes it necessary to use the "-4" option, but wget still gives random failures.
- By schckngs Date 2020-02-24 07:44 Edited 2020-02-24 09:12
Okay, glad in a sense to hear that this problem is not just me!
I left a script to process over the weekend and it was able to download 1 year of data before stopping...
Once again, it only started working after I deleted the .urs_cookies file.
I guess the next step is to see if curl is more reliable?
- By gnwiii Date 2020-02-24 10:29
Sean posted some download scripts that retry downloads after an error occurs and save a log file for failed downloads. I haven't tried these recently, but they were very useful in the past when the internet connection was heavily used. I modified the scripts to save the log file for each download so I could use the download rates from the logs to identify slowdown/failure times. At my site, internet usage was high from morning coffee break until afternoon coffee, and at night for replication of data stores to another site. Scheduling downloads for early AM (after replication finished) worked well.
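A minimal version of that retry-with-logging idea might look like the sketch below. This is not Sean's actual script: the retry count, list file, and log file names are made up, and the wget options are the ones from the command earlier in this thread.

```shell
#!/bin/bash
# Retry each download up to MAX_TRIES times, keeping a per-file wget log
# so failures (and transfer rates) can be reviewed later.
MAX_TRIES=3

fetch_with_log() {
    url=$1
    name=${url##*/}
    for try in $(seq 1 "$MAX_TRIES"); do
        if wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies \
                --auth-no-challenge=on --content-disposition \
                -o "logs/${name}.log" "$url"; then
            return 0
        fi
        echo "attempt $try failed for $name" >> failed_downloads.log
    done
    return 1
}

# Usage (hypothetical list file, one URL per line):
# mkdir -p logs
# while read -r url; do fetch_with_log "$url"; done < url_list.txt
```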
- By schckngs Date 2020-02-26 15:54
Thanks for sharing the link!  (Novice wget user here :grin:)

So I am not getting any error codes to work with: as posted above, it keeps retrying the download until it maxes out, so to test a condition the wget command first needs to exit. The problem seems to be the .urs_cookies file: the cookies are frequently (once an hour or more) saved incorrectly. When I delete the .urs_cookies file, it works again. Without the "--auth-no-challenge=on" option, the redirects also save an XML file for every retry once it stops working.

I'm following the wget command posted on the data download page:
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --keep-session-cookies --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${myfile}

Is it a bad idea to load only a .urs_cookies file I know works, and not save or keep session cookies, like so?
wget --load-cookies ~/.urs_cookies_good --auth-no-challenge=on --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${myfile}
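In the meantime, a crude automatic workaround (a sketch only, untested) would be to reset the cookie jar whenever wget fails and retry once. The COOKIE_JAR variable is made up for illustration; by default it points at the same ~/.urs_cookies file used above.

```shell
#!/bin/bash
# Cookie jar location (overridable for testing); defaults to the file
# used in the commands above.
COOKIE_JAR=${COOKIE_JAR:-$HOME/.urs_cookies}

fetch() {
    url=$1
    if ! wget --load-cookies "$COOKIE_JAR" --save-cookies "$COOKIE_JAR" \
              --auth-no-challenge=on --content-disposition "$url"; then
        rm -f "$COOKIE_JAR"      # discard the possibly-corrupt cookie jar
        touch "$COOKIE_JAR"      # next attempt starts a fresh session
        wget --load-cookies "$COOKIE_JAR" --save-cookies "$COOKIE_JAR" \
             --auth-no-challenge=on --content-disposition "$url"
    fi
}

# Usage: fetch "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${myfile}"
```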
- By amscott Date 2020-02-26 16:07
We removed the recommendation to --keep-session-cookies in the data download examples. What happens if you "save" cookies during your session, but don't "keep" them after you exit the browser?
- By schckngs Date 2020-02-27 13:21
Thanks, I'm now trying saving and loading cookies, but not keeping them.
I first did a run saving a cookie, to make sure I had one that works. Then in my script I ran subsequent wget commands with only the --load-cookies option (same as the example in my previous comment). It worked well for a few hours, and then the same thing happened... it stopped working with the same cookie. I must be exceeding some kind of time limit or file limit...?
- By schckngs Date 2020-02-28 08:04
As a follow up, that seems to have fixed it, thanks amscott. :)

Eg.
wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --content-disposition https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${l2nasa_file}
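For a batch run, that same command can be wrapped in a loop over a plain-text list of file names (a sketch; the list file name is made up, and the wget options are the ones from the Eg. above, without --keep-session-cookies per amscott's suggestion):

```shell
#!/bin/bash
# Download every file named in a list, one name per line, using the
# cookie options that worked above.
download_list() {
    list=$1
    while read -r name; do
        [ -z "$name" ] && continue   # skip blank lines
        wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies \
             --auth-no-challenge=on --content-disposition \
             "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${name}"
    done < "$list"
}

# Usage: download_list l2_files.txt
```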
- By khyde Date 2020-02-28 13:32
Hello,

I have been having similar issues, but this latest example did work when I tried it. However, I would really like to use wget's -N option to download a file only when the remote copy is newer than the local one, and that always returns a 400 Bad Request error. Is there a reason why -N will not work?

Thanks,
Kim

PS The example below also uses -c, which does work as long as -N isn't included, although I haven't tested it with a partially downloaded file yet.

wget --load-cookies ~/.urs_cookies --save-cookies ~/.urs_cookies --auth-no-challenge=on --content-disposition -c -N https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2020026183500.L1A_LAC.bz2
--2020-02-28 13:25:57--  https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/A2020026183500.L1A_LAC.bz2
Resolving oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)... 169.154.128.84, 2001:4d0:2418:128::84
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|169.154.128.84|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: /ob/getfile/A2020026183500.L1A_LAC.bz2 [following]
--2020-02-28 13:25:58--  https://oceandata.sci.gsfc.nasa.gov/ob/getfile/A2020026183500.L1A_LAC.bz2
Connecting to oceandata.sci.gsfc.nasa.gov (oceandata.sci.gsfc.nasa.gov)|169.154.128.84|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code [following]
--2020-02-28 13:25:58--  https://urs.earthdata.nasa.gov/oauth/authorize?client_id=Z0u-MdLNypXBjiDREZ3roA&redirect_uri=https%3A%2F%2Foceandata.sci.gsfc.nasa.gov%2Fob%2Fgetfile%2Frestrict&response_type=code
Resolving urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)... 198.118.243.33, 2001:4d0:241a:4081::89
Connecting to urs.earthdata.nasa.gov (urs.earthdata.nasa.gov)|198.118.243.33|:443... connected.
HTTP request sent, awaiting response... 400 Bad Request
2020-02-28 13:25:58 ERROR 400: Bad Request.
- By seanbailey Date 2020-02-28 15:40
Kim,

The -c won't work, as we don't support it.

The -N option isn't working (we believe) because wget issues a HEAD request, which is denied by the remote authentication server during the authentication redirection we do.
cURL works....
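For reference, a cURL command roughly equivalent to the wget examples in this thread might look like the sketch below. It assumes URS credentials in ~/.netrc (read via -n), and uses -O -J so the server-supplied filename is honored, mirroring wget's --content-disposition; -b/-c keep cookies across the URS redirect hops.

```shell
#!/bin/bash
# cURL equivalent of the wget download command: follow redirects (-L),
# read ~/.netrc credentials (-n), keep cookies across the URS hops
# (-b/-c), and save under the server-supplied name (-O -J).
fetch_curl() {
    name=$1
    curl -n -L -b ~/.urs_cookies -c ~/.urs_cookies -O -J \
         "https://oceandata.sci.gsfc.nasa.gov/cgi/getfile/${name}"
}

# Usage: fetch_curl A2020026183500.L1A_LAC.bz2
```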

I will (hopefully soon) be posting an updated python script under https://oceancolor.gsfc.nasa.gov/data/download_methods/ that also will only download files if the remote version is newer than the local copy.

Sean
- By khyde Date 2020-03-02 17:25
Thanks for the update Sean,

I'm not very familiar with curl, but will give it a try...

I did play around with curl a bit and tried the -C - option to continue a download, but that also returns an error: curl: (52) Empty reply from server.
It also appears that -z requires an input and doesn't work as well as wget's -N.  I typically use wget to download from a list of files and it looks like that is an option with curl as well, but I don't know if I can use -z with the download file list.

What are the chances that the wget -N bug can be fixed?  I'm just wondering if I should put the time in to overhauling my download methods or wait for an update on your end.

I also took a look at the current Python download script and it isn't clear to me how you would use that to download a specific set of files.  I'm not overly familiar with Python so I'm sure I'm missing something obvious.

Thanks again!
Kim
- By seanbailey Date 2020-03-04 15:37
Kim,
I may not have been clear in my last post, so let me clarify... our servers do NOT (at least, not any longer) support download continuation, regardless of which client is used.
So, no need to test that capability at your end :grin:

A modification to the python download script will soon be posted on the download methods page.
It will be a fully functional script; all that is required is a python installation with the requests library installed.

I have tested it under Mac, Linux and Windows with python v2.7 and v3.7, so it should work for you (and be more consistent than wget has been...)
I'll try to remember to post a reply once the script is available.

Here's a preview of the usage:

usage: obdaac_download.py [-h] [-v] [--filelist FILELIST]
                          [--http_manifest HTTP_MANIFEST] [--odir ODIR]
                          [--uncompress] [--force]
                          [filename]

Download files archived at the OB.DAAC

positional arguments:
  filename              name of the file (or the URL of the file) to retrieve

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         print status messages
  --filelist FILELIST   file containing list of filenames to retrieve, one per
                        line
  --http_manifest HTTP_MANIFEST
                        URL to http_manifest file for OB.DAAC data order
  --odir ODIR           full path to desired output directory; defaults to
                        current working directory:
                        /accounts/swbaile1/Downloads/junk
  --uncompress          uncompress the retrieved files (if compressed)
  --force               force download even if file already exists locally

Provide one of either filename, --filelist or --http_manifest. NOTE: For
authentication, a valid .netrc file in the user home ($HOME) is required,
e.g.: machine urs.earthdata.nasa.gov login USERNAME password PASSWD


Regards,
Sean
- By seanbailey Date 2020-03-04 17:21
As promised :smile:
The script is now available from https://oceancolor.gsfc.nasa.gov/data/download_methods

Sean
- By khyde Date 2020-03-05 11:59
Thanks Sean,

I will try your new python script.  This looks very handy. 

In hindsight, I'm not sure the -N (get file if newer) option is necessary for me, because I compare checksums before creating my list of files to download, so I can just remove any "old" files from my system and add the replacements to the download list.

Hopefully I will have some time to test and implement it before the weekend (I've got some downloading to catch up on :wink:).

Kim
- By gnwiii Date 2020-03-05 13:38 Edited 2020-03-05 14:05
obdaac_download.py ends up here with MSDOS <CR-LF> line endings. The script works nicely in Cygwin. On macOS the script displays the help text (using "python3 obdaac_download.py"), but the output from the file command is garbled:
$ file obdaac_download.py
script text executableython
$ dos2unix obdaac_download.py
dos2unix: converting file obdaac_download.py to Unix format...
$ file obdaac_download.py
obdaac_download.py: a python script text executable

... switching to Linux ... I'm back, now using a Linux VM:
$ python --version
Python 3.7.6

$ file obdaac_download.py
obdaac_download.py: Python script, ASCII text executable, with CRLF line terminators
$ obdaac_download.py
/usr/bin/env: ‘python\r’: No such file or directory
$ dos2unix obdaac_download.py
dos2unix: converting file obdaac_download.py to Unix format...


The obdaac_download.py script "works (at least once) for me" on Debian, Ubuntu, Fedora, macOS (El Capitan), and Windows 10 (Cygwin). For those living with unreliable internet, there are many documents that discuss debugging and troubleshooting the python requests library.
- By seanbailey Date 2020-03-05 18:06
Thanks for testing :smile:

Yes, it appears that the version downloaded from the web server has CRLF line endings, which causes issues on Mac and Linux... bummer.

A version of the script will eventually be part of the SeaDAS distribution, which will definitely be free of the pesky ^Ms.

We'll add a buyer-beware note to the web page.

Sean

Powered by mwForum 2.29.7 © 1999-2015 Markus Wichitill