Scientific Computing

Limitations of loading HDF5 files with xarray

xarray.open_dataset can open HDF5 files. However, unexpected HDF5 file layouts can cause Python to crash silently, without an error message. This can happen even with the minimum required versions of xarray, h5py and h5netcdf installed.

We don’t have a specific workaround for this other than to use h5py to build up an xarray Dataset variable-by-variable.

Duplicate GitHub Wiki

Duplicating a GitHub repo AND its GitHub Wiki benefits from scripting.

An example solution is in GitEDU. This requires manually clicking to enable the new wiki via a web browser for each repo wiki.

GitHub Wiki Git accessibility

In general, GitHub Wiki is just another Git repo. The URL for the GitHub Wiki is obtained by appending .wiki.git to the associated GitHub repo URL.

Example:

  • Main repo: github.invalid/username/reponame
  • Wiki repo: github.invalid/username/reponame.wiki.git
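
As a small illustration of the naming rule above, here is a minimal Python sketch that derives the wiki URL from a repo URL. The helper name wiki_url is hypothetical and not part of any GitHub tooling; it only performs the string manipulation described in the text.

```python
def wiki_url(repo_url: str) -> str:
    """Return the Git URL of the wiki associated with a GitHub repo URL."""
    # Hypothetical helper for illustration only
    base = repo_url.rstrip("/")
    # Accept repo URLs given with or without a trailing ".git"
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return base + ".wiki.git"


print(wiki_url("https://github.invalid/username/reponame"))
# prints: https://github.invalid/username/reponame.wiki.git
```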

Related: Moving GitHub Wiki

Moving a GitHub Wiki

GitHub Wikis are not accessible via GitHub API v4. We can use a couple of simple Git commands to move a GitHub Wiki.

For these examples, we assume that:

  • old wiki: github.invalid/username/repo1.wiki.git
  • new wiki: github.invalid/username/repo2.wiki.git

Copy GitHub Wiki to the laptop:

git clone --bare https://github.invalid/username/repo1.wiki.git repo1

In a web browser, go to the new repo’s Wiki and create a blank Wiki page.

Mirror push Wiki to new repo:

git -C repo1 push --mirror https://github.invalid/username/repo2.wiki.git

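Once you see the new Wiki is OK, remove the old Wiki pages if desired. A bare clone has no working tree, so use a regular clone of the old Wiki for this:

git clone https://github.invalid/username/repo1.wiki.git repo1-pages

git -C repo1-pages rm "*.md"

git -C repo1-pages commit -m "deleted old wiki pages"

git -C repo1-pages push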

Related: Duplicating GitHub Wiki

Why use Python context manager for file I/O?

One should almost always use a Python context manager when working with file I/O in Python. A context manager closes the file as soon as the with block exits, even if an exception is raised, which helps avoid exceeding system limits on open files. For long-running jobs, this avoids seemingly random crashes caused by file handles left open. There are edge cases where the handle needs to stay open without a context manager, for example across the iterations of a for loop. In many cases, though, it is better and easier to let the context manager open and close the file.

It is also possible to create your own context managers with Python contextlib, which we use in georinex for example.
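
As a rough sketch of what a custom context manager built with contextlib can look like, the example below yields a text handle for either a plain or a gzip-compressed file and always closes it. The name open_text and the ".gz" suffix rule are assumptions made for illustration; this is not the georinex implementation.

```python
import contextlib
import gzip
import tempfile
from pathlib import Path


@contextlib.contextmanager
def open_text(path: Path):
    """Yield a text-mode handle for a plain or gzip-compressed file."""
    path = Path(path)
    # Assumption for this sketch: compression is detected by file suffix only
    if path.suffix == ".gz":
        f = gzip.open(path, "rt")
    else:
        f = path.open("r")
    try:
        yield f
    finally:
        # Runs even if the body of the with block raises
        f.close()


# Demo on a throwaway file so the sketch runs as-is
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt.gz"
    with gzip.open(demo, "wt") as g:
        g.write("first line\n")

    with open_text(demo) as f:
        line = f.readline()

    print(line.strip(), f.closed)  # prints: first line True
```

The try/finally around the yield is what guarantees the handle is closed, which is the same guarantee the built-in file context manager gives.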

Context manager examples, assuming:

from pathlib import Path

fn = Path('~/mydir/myfile').expanduser()

simple file I/O:

with fn.open('r') as f:
    line = f.readline()

Note, if just reading a whole file, consider pathlib.Path methods like:

txt = fn.read_text()

b = fn.read_bytes()

h5py:

import h5py

with h5py.File(fn, 'r') as f:
    data = f['myvar'][:]

NetCDF4:

import netCDF4

with netCDF4.Dataset(fn, 'r') as f:
    data = f['myvar'][:]

VNCserver setup on Ubuntu

The free TightVNC server works with Ubuntu and other modern Linux distros. In general, the default 3-D Ubuntu desktop is not available over VNC, so we use a traditional 2-D desktop environment over VNC.

Install Linux VNC Server:

apt install tightvncserver openbox

Choose a desktop environment on the server PC: XFCE4, Openbox or another you prefer. Pick ONE of the following:


Install XFCE on server PC

apt install xfce4
apt remove xscreensaver xscreensaver-data

Create ~/.vnc/xstartup

#!/bin/sh
unset SESSION_MANAGER
startxfce4 &

Correct possible keystroke issues on server PC:

xfconf-query -c xfce4-keyboard-shortcuts -p /xfwm4/custom/'<'Super'>'Tab -r

Openbox is among the lightest-weight desktop environments: right-click to open a menu. Otherwise, you just have a plain gray background. It is extremely minimal, which is good for embedded systems and old PCs.

Put into file ~/.vnc/xstartup

#!/bin/sh
unset SESSION_MANAGER
exec openbox-session &

Start VNC server

vncserver :1 -geometry 1200x700 -localhost

To start the VNC server automatically at boot, create a file ~/startVNC.sh containing:

#!/bin/sh
vncserver :1 -geometry 1200x700 -localhost

Make the script executable (chmod +x ~/startVNC.sh), then, logged in as that user on the server, run

crontab -e

adding the line:

@reboot  /home/username/startVNC.sh

If you have an encrypted /home drive, VNC and SSH require configuration to allow decrypting the home drive upon SSH login.

Setup VNC client

On your laptop, install VNC client

apt install tigervnc-viewer

Alternatively:

apt install vncviewer

Create a shell script that opens an SSH tunnel to the server and connects the VNC viewer through it:

#!/usr/bin/env bash
ssh -f -L 5901:localhost:5901 user@IPaddress sleep 1;
vncviewer  localhost::5901

Notes

Openbox-Message: Unable to find a valid menu file “/var/lib/openbox/debian-menu.xml”

When I get this error, I’m also unable to open a terminal.

  • You can leave your VNC desktop running – it is not the same as your local desktop.
  • It is a little tricky to share your local desktop reliably. X11VNC can be more trouble than it is worth. It’s MUCH easier to start up a new, separate desktop session with vncserver.
  • After installing a new desktop environment, at your next local login, you’ll need to rechoose the traditional Ubuntu desktop (it will then remember this choice).

Security

Be sure that ports 5900-5999 are NOT exposed to the outside world, because VNC is NOT secure by itself! VNC must be tunneled over the Internet with SSH.

The firewall rules for ports the PC exposes to the network are listed by:

ufw status

You should see only port 22, plus any other ports whose purpose you know.
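
ufw status reports the firewall rules on the server itself. As a complementary check, a sketch like the following could be run from a different machine to see whether any VNC port accepts a connection. The function names are hypothetical and only the Python standard library socket module is used; a clean result from one vantage point is a hint, not proof.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable or unresolvable host
        return False


def exposed_vnc_ports(host: str) -> list:
    """List the ports in 5900-5999 on host that accept a TCP connection."""
    return [p for p in range(5900, 6000) if port_is_open(host, p)]
```

Calling exposed_vnc_ports with the server address from outside the server should ideally return an empty list. With filtered ports, each probe waits for the full timeout, so a complete scan can take a couple of minutes.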


Kill a frozen or undesired desktop by logging out of VNC on your laptop, then running on the server:

vncserver -kill :1

Fix a non-working right-click Openbox menu by creating ~/.config/openbox/menu.xml with the content:

<?xml version="1.0" encoding="utf-8"?>
<openbox_menu xmlns="https://openbox.org/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="/usr/share/openbox/menu.xsd">
    <menu id="root-menu" label="Openbox 3">
        <item label="Run Program">
            <action name="Execute">
                <execute>
                    gmrun
                </execute>
            </action>
        </item>
        <separator/>
        <item label="Terminal">
            <action name="Execute">
                <execute>
                    xterm
                </execute>
            </action>
        </item>
    </menu>
</openbox_menu>

Fix missing XFCE4 VNC menu icons under Settings → Appearance. Set Style: Xfce; Icons: ubuntu-mono-light; Fonts: turn hinting on if you want.

Python requests vs. urllib.urlretrieve

Python’s urllib.request.urlretrieve has no parameter for a connection timeout. On a bad internet connection, urlretrieve can hang for many minutes, which leads users to complain that your program is hanging.

Python requests download files

This is a robust way to download files in Python with a timeout. I name it url_retrieve to remind myself not to use the old one.

from pathlib import Path
import requests

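def url_retrieve(url: str, outfile: Path, timeout: float = 10):
    # timeout is in seconds; the default of 10 is an arbitrary choice.
    # Without a timeout, requests.get can wait indefinitely on a bad connection.
    R = requests.get(url, allow_redirects=True, timeout=timeout)
    if R.status_code != 200:
        raise ConnectionError('could not download {}\nerror code: {}'.format(url, R.status_code))

    outfile.write_bytes(R.content)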

Why isn’t this in requests? Because the Requests BDFL doesn’t want it.

pure Python download files

If you can’t or don’t want to use requests, here is how to download files in Python using only built-in modules:

from pathlib import Path
import urllib.request
import urllib.error
import socket


def url_retrieve(
    url: str,
    outfile: Path,
    overwrite: bool = False,
):
    """
    Parameters
    ----------
    url: str
        URL to download from
    outfile: pathlib.Path
        output filepath (including name)
    overwrite: bool
        overwrite if file exists
    """
    outfile = Path(outfile).expanduser().resolve()
    if outfile.is_dir():
        raise ValueError("Please specify full filepath, including filename")
    # need .resolve() in case intermediate relative dir doesn't exist
    if overwrite or not outfile.is_file():
        outfile.parent.mkdir(parents=True, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, str(outfile))
        except (socket.gaierror, urllib.error.URLError) as err:
            # chain the original exception so the traceback shows the root cause
            raise ConnectionError(
                "could not download {} due to {}".format(url, err)
            ) from err
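
urlretrieve takes no timeout argument, but urllib.request.urlopen does. A minimal standard-library sketch using urlopen follows. The function name url_retrieve_timeout and the 10 second default are illustrative assumptions, not part of the function above.

```python
import shutil
import socket
import urllib.error
import urllib.request
from pathlib import Path


def url_retrieve_timeout(url: str, outfile: Path, timeout: float = 10.0):
    """Download url to outfile, giving up if the server stalls longer than timeout seconds."""
    outfile = Path(outfile).expanduser().resolve()
    outfile.parent.mkdir(parents=True, exist_ok=True)
    try:
        # The response is opened first, so outfile is not created if connecting fails.
        # A timeout in the middle of the transfer can still leave a partial file behind.
        with urllib.request.urlopen(url, timeout=timeout) as resp, outfile.open("wb") as f:
            shutil.copyfileobj(resp, f)
    except (socket.timeout, urllib.error.URLError) as err:
        raise ConnectionError(f"could not download {url} due to {err}") from err
```

The timeout applies to blocking socket operations such as connecting and each read, not to the total download time, so a slow but steady transfer is not interrupted.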

Read CDF files in Python

For reading and writing CDF files, use cdflib, which is pure Python + NumPy, OS-agnostic, easy to install and performant. The .cdf file format is totally different from “.nc” NetCDF4 files, which are essentially specially formatted HDF5 files.

VisPy OpenGL for Python

OpenGL support is widespread. OpenGL enables extremely fast 2-D and 3-D animation, including from Python. With VisPy, OpenGL is easily used with Matplotlib-like syntax to make interesting 3-D plots from NumPy arrays. VisPy also has an advanced interface to OpenGL from Python.

Installing VisPy is easiest by:

conda install vispy

Examples:

git clone https://github.com/vispy/vispy

The vispy/examples/demo directory has numerous examples. Try using the mouse scroll wheel to zoom on some demos.

New AGU LaTeX template for all AGU journals

In April 2019, AGU released a new LaTeX template that replaces the 2016 AGUJournal.cls and 2001 agutex.cls.

The new AGU template syntax has:

  • much condensed and improved format
  • single command to select journal
  • improved PDF generation and formatting for reviewers and the editor

Download the AGU LaTeX template and modify the example article.


On Linux, if you get the error

File `newtxtext.sty' not found

try:

apt install texlive-fonts-extra