Scientific Computing

Python requests vs. urllib.urlretrieve

Python’s urllib.request.urlretrieve has no timeout parameter, so a stalled connection can block for many minutes unless a global default has been set with socket.setdefaulttimeout. This can lead to complaints from users who think your program is hanging, when the real cause is a bad internet connection.

Python requests download files

This is a robust way to download files in Python with a timeout. I name it url_retrieve as a reminder not to use the old one.

from pathlib import Path
import requests

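def url_retrieve(url: str, outfile: Path, timeout: float = 10.0):
    # timeout is in seconds (10 is an assumed default); without it requests.get can wait indefinitely
    R = requests.get(url, allow_redirects=True, timeout=timeout)
    if R.status_code != 200:
        raise ConnectionError('could not download {}\nerror code: {}'.format(url, R.status_code))

    outfile.write_bytes(R.content)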

Why isn’t this in requests? Because the Requests BDFL doesn’t want it.

Pure Python download files

If you can’t or don’t want to use requests, here is how to download files in Python using only built-in modules:

from pathlib import Path
import urllib.request
import urllib.error
import socket


def url_retrieve(
    url: str,
    outfile: Path,
    overwrite: bool = False,
):
    """
    Parameters
    ----------
    url: str
        URL to download from
    outfile: pathlib.Path
        output filepath (including name)
    overwrite: bool
        overwrite if file exists
    """
    outfile = Path(outfile).expanduser().resolve()
    if outfile.is_dir():
        raise ValueError("Please specify full filepath, including filename")
    # need .resolve() in case intermediate relative dir doesn't exist
    if overwrite or not outfile.is_file():
        outfile.parent.mkdir(parents=True, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, str(outfile))
        except (socket.gaierror, urllib.error.URLError) as err:
            # chain the original exception so the root cause stays in the traceback
            raise ConnectionError(
                "could not download {} due to {}".format(url, err)
            ) from err
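
The built-in version above still relies on urlretrieve, which takes no per-call timeout. A minimal sketch of one alternative, assuming urllib.request.urlopen with its timeout argument is acceptable for your use: the function name url_retrieve_timeout and the 10 second default are illustrative choices, not part of the original code.

```python
import shutil
import socket
import urllib.error
import urllib.request
from pathlib import Path


def url_retrieve_timeout(url: str, outfile: Path, timeout: float = 10.0) -> Path:
    """Download url to outfile, raising if the connection stalls.

    url_retrieve_timeout and the default timeout are hypothetical names/values.
    """
    outfile = Path(outfile).expanduser()
    outfile.parent.mkdir(parents=True, exist_ok=True)
    try:
        # timeout (seconds) applies to blocking socket operations such as connect and read
        with urllib.request.urlopen(url, timeout=timeout) as response, outfile.open("wb") as f:
            # copy in chunks so large files are not held in memory all at once
            shutil.copyfileobj(response, f)
    except (socket.timeout, urllib.error.URLError) as err:
        raise ConnectionError(f"could not download {url} due to {err}") from err
    return outfile
```

If the connection drops partway through, a partial file can be left at outfile, so callers may want to delete it before retrying.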

Read CDF files in Python

To read and write CDF files, use cdflib, a pure Python + NumPy library that is OS-agnostic, easy to install and performant. The .cdf file format is totally different from “.nc” NetCDF4 files, which are essentially specially formatted HDF5 files.

VisPy OpenGL for Python

OpenGL support is widespread. OpenGL enables extremely fast 2D and 3D animation, including from Python. With VisPy, OpenGL is easily used with Matplotlib-like syntax to make interesting 3D plots from NumPy arrays. VisPy also has an advanced interface to OpenGL from Python.

VisPy is easiest to install with conda:

conda install vispy

Examples:

git clone https://github.com/vispy/vispy

The vispy/examples/demo directory contains numerous examples. Try using the mouse scroll wheel to zoom on some demos.

New AGU LaTeX template for all AGU journals

In April 2019, AGU released a new LaTeX template that replaces the 2016 AGUJournal.cls and 2001 agutex.cls.

The new AGU template syntax has:

  • much condensed and improved format
  • single command to select journal
  • improved PDF generation and formatting for reviewers and the editor

Download the AGU LaTeX template and modify the example article.


On Linux, if you get the error

File `newtxtext.sty' not found

try:

apt install texlive-fonts-extra

Git line endings on Windows with Cygwin / WSL

When using Git on Windows with Cygwin or Windows Subsystem for Linux, CRLF conflicts can falsely make a Git repo dirty. From Cygwin or WSL with line ending clashes, “git diff” will show ^M at the end of each line and “git pull” will fail to merge. This can cause missed code changes or needless commits. We suggest forcing LF line endings no matter what environment the user is in. Even Windows Notepad supports LF line endings.

git config --global core.autocrlf false

git config --global core.eol lf

This tells Git not to convert line endings automatically and to use LF (\n) in the working tree for files it treats as text. Use pre-commit to check for LF line endings before committing.

To clean up mixed line endings in a repository, run dos2unix and mac2unix recursively and make a Git commit just for those changes.
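
Where dos2unix and mac2unix are not available, the same cleanup can be sketched in Python. This is a minimal illustration, not a replacement for those tools: the function name normalize_line_endings and the default "*.txt" pattern are assumptions, and the pattern should only match text files, since rewriting binary files this way would corrupt them.

```python
from pathlib import Path


def normalize_line_endings(root: Path, pattern: str = "*.txt") -> list:
    """Rewrite files under root matching pattern so CRLF and lone CR become LF.

    Returns the list of files that were changed. Hypothetical helper for illustration.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        raw = path.read_bytes()
        # CRLF first (DOS/Windows), then any remaining lone CR (classic Mac)
        fixed = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
        if fixed != raw:
            path.write_bytes(fixed)
            changed.append(path)
    return changed
```

Run it once per file pattern of interest, review the result with “git diff”, then commit those changes on their own.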


To disregard line endings for diff and patch:

diff -Naur --strip-trailing-cr old.txt new.txt

Fix corrupt UTF8 files with Python

I find that files included in Python projects, for example Fortran source files, sometimes contain corrupted bytes that are not valid UTF-8. Maybe it’s a case of bad OCR, which also plagues LaTeX / BibTeX references copied and pasted from journal websites. Thus, this method also applies to BibTeX files.

The pure Python script find_bad_characters.py recursively:

  1. finds such corrupt files
  2. removes the corrupted characters
  3. backs up original file and overwrites if desired
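
The three steps can be sketched as follows. This is not the find_bad_characters.py script itself, only a minimal illustration of the idea; the function names, the "*.f90" default pattern and the ".bak" backup suffix are all assumptions.

```python
from pathlib import Path


def find_bad_utf8(root: Path, pattern: str = "*.f90") -> list:
    """Return files under root matching pattern whose bytes are not valid UTF-8."""
    bad = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        try:
            path.read_bytes().decode("utf-8")
        except UnicodeDecodeError:
            bad.append(path)
    return bad


def strip_bad_utf8(path: Path, backup: bool = True) -> None:
    """Drop undecodable bytes from path, optionally keeping a .bak copy first."""
    raw = path.read_bytes()
    if backup:
        path.with_name(path.name + ".bak").write_bytes(raw)
    # errors="ignore" silently discards bytes that cannot be decoded as UTF-8
    path.write_bytes(raw.decode("utf-8", errors="ignore").encode("utf-8"))
```

Dropping bytes loses whatever character was intended, so it is worth reviewing each changed file against its backup.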

Install Xrdp for VNC via Windows Remote Desktop

xrdp creates an RDP server on remote Linux PCs.

RDP client on laptop:

  • Windows: factory installed
  • macOS: RDP client
  • Linux: apt install xfreerdp

Set up the Xrdp server on the remote Linux PC by installing Xrdp and the Openbox desktop:

apt install xrdp openbox

Create ~/.xsession containing

exec openbox-session

Restart xrdp to apply the new config

service xrdp restart

Openbox shows a gray screen after you type your password at the Xrdp login. Right-click to open the menu. If you only get a gray or black screen and no menu, try editing /etc/xrdp/startwm.sh on the remote PC:

#!/bin/sh

if [ -r /etc/default/locale ]; then
  . /etc/default/locale
  export LANG LANGUAGE
fi

exec openbox-session