Python requests vs. urllib.urlretrieve
Python’s urllib.request.urlretrieve doesn’t have a way to handle connection timeouts.
This can lead to user complaints that your program is hanging, when it’s really a bad internet connection, since urlretrieve can hang for many minutes.
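If you’re stuck with urlretrieve, a partial workaround is a process-wide socket timeout, which urllib falls back to when no explicit timeout is given. This is just a sketch: the 10 second value is arbitrary, the URL is a placeholder, and it only bounds individual socket operations, not DNS lookups or total download time.

import socket
import urllib.request

socket.setdefaulttimeout(10)  # seconds; applies to sockets urllib opens

urllib.request.urlretrieve("https://example.com/data.csv", "data.csv")  # placeholder URL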
Python requests download files
This is a more robust way to download files in Python, with a timeout.
I name it url_retrieve as a reminder not to use the old urlretrieve.
from pathlib import Path
import requests
def url_retrieve(url: str, outfile: Path, timeout: float = 10.0):
    # timeout (seconds) keeps a bad connection from hanging indefinitely
    R = requests.get(url, allow_redirects=True, timeout=timeout)
    if R.status_code != 200:
        raise ConnectionError("could not download {}\nerror code: {}".format(url, R.status_code))

    outfile.write_bytes(R.content)
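A call is then a one-liner; the URL and output filename here are placeholders:

url_retrieve("https://example.com/data.csv", Path("data.csv"))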
Why isn’t this in requests?
Because the Requests BDFL doesn’t want it.
pure Python download files
If you can’t or don’t want to use requests, here is how to download files in Python using only built-in modules:
from pathlib import Path
import urllib.request
import urllib.error
import socket
def url_retrieve(
    url: str,
    outfile: Path,
    overwrite: bool = False,
):
    """
    Parameters
    ----------
    url: str
        URL to download from
    outfile: pathlib.Path
        output filepath (including name)
    overwrite: bool
        overwrite if file exists
    """

    outfile = Path(outfile).expanduser().resolve()
    if outfile.is_dir():
        raise ValueError("Please specify full filepath, including filename")
    # need .resolve() in case intermediate relative dir doesn't exist
    if overwrite or not outfile.is_file():
        outfile.parent.mkdir(parents=True, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, str(outfile))
        except (socket.gaierror, urllib.error.URLError) as err:
            raise ConnectionError(
                "could not download {} due to {}".format(url, err)
            )
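A call might look like this (the URL and output path are placeholders). Note that urlretrieve itself still has no per-call timeout, so a dead connection can still hang; the socket.setdefaulttimeout() workaround shown earlier can be paired with it if that matters for your program.

url_retrieve("https://example.com/data.csv", Path("~/downloads/data.csv"), overwrite=True)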