Limitations of loading HDF5 files with xarray

xarray.open_dataset gained the ability to open HDF5 files in 0.12.0. However, opening HDF5 files this way can make Python crash silently, with no error message, which can be quite confusing. This happens even with the minimum required versions of xarray, h5py and h5netcdf installed.

We don’t have a specific workaround for this other than to use h5py directly. We have used h5py for several years in high-stakes operations, including data analysis and data collection. Both h5py and netcdf4 Python modules work with context managers to avoid excess I/O resource consumption.

Using Git SSH with GitLab self-managed instances

Since GitLab Community Edition is open source, large projects like CMake may host their own self-managed GitLab instance. To make merge requests to such projects, one can configure Git SSH. For this example, we use Kitware’s CMake GitLab instance https://gitlab.kitware.com/cmake. This procedure works for any operating system, including Windows.

First, create an account on the self-managed GitLab instance and fork the desired repo. The fork can then be cloned like:

git clone https://gitlab.kitware.com/username/cmake

To git push using SSH, type:

git config --global url."ssh://gitlab.kitware.com/".pushInsteadOf https://gitlab.kitware.com/

Generate an SSH key–don’t reuse SSH keys between sites.

ssh-keygen -t ed25519 -f ~/.ssh/kitware

Go to the GitLab SSH Key page like

https://gitlab.kitware.com/profile/keys

and add the contents of ~/.ssh/kitware.pub

Add to ~/.ssh/config:

Host gitlab.kitware.com
  User git
  IdentityFile ~/.ssh/kitware

Now check out a new branch, make your changes according to Contributing.rst and submit a merge request.

Duplicate GitHub Wiki

Related: Moving GitHub Wiki


The GitHub API v4 does not include Wikis. The GitLab API v4 does include Wikis.

Thus, duplicating a GitHub repo AND its GitHub Wiki benefits from a little scripting help.

An example solution is in GitEDU. This requires manually clicking to enable the new wiki via a web browser for each repo wiki.

GitHub Wiki Git accessibility

In general, GitHub Wiki is just another Git repo. The URL for the GitHub Wiki is obtained by appending .wiki.git to the associated GitHub repo URL.

Example:

  • Main repo is https://github.com/username/reponame
  • Wiki repo is https://github.com/username/reponame.wiki.git
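The URL rule above is simple enough to script. Here is a minimal sketch; the function name wiki_git_url is our own invention for illustration and is not part of any GitHub tooling:

```python
def wiki_git_url(repo_url: str) -> str:
    """Return the Git URL of the Wiki that belongs to a GitHub repo URL."""
    base = repo_url.rstrip("/")
    # tolerate a repo URL that already ends in ".git"
    if base.endswith(".git"):
        base = base[: -len(".git")]
    return base + ".wiki.git"


print(wiki_git_url("https://github.com/username/reponame"))
```

The result can be passed straight to git clone. The sketch does not check that the Wiki actually exists or is enabled for the repo.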

Moving a GitHub Wiki

Related: Duplicating GitHub Wiki


GitHub Wikis are not accessible via GitHub API v4. We can use a couple simple Git command line statements to move a GitHub Wiki.

For these examples, we assume that:

  • old wiki https://github.com/username/repo1.wiki.git
  • new wiki https://github.com/username/repo2.wiki.git

Copy GitHub Wiki to your laptop:

git clone --bare https://github.com/username/repo1.wiki.git

Browse to the new Wiki https://github.com/username/repo2/wiki and create a blank Wiki.

Mirror push Wiki to new repo:

git -C repo1.wiki.git push --mirror https://github.com/username/repo2.wiki.git

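Once the new Wiki https://github.com/username/repo2/wiki looks OK, remove the old Wiki pages if desired. A bare clone has no working tree, so make a regular clone of the old Wiki first:

git clone https://github.com/username/repo1.wiki.git repo1-wiki

git -C repo1-wiki rm "*.md"

git -C repo1-wiki commit -m "deleted old wiki pages"

git -C repo1-wiki push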

Why use Python context manager for file I/O?

One should almost always use a context manager for file I/O in Python. Context managers release I/O resources as soon as the block exits, which helps avoid exceeding system limits such as the maximum number of open files. For long-running jobs, this avoids seemingly random crashes caused by files left hanging open. There are edge cases where a handle needs to stay open outside a context manager, for example when it is reused across iterations of a for loop. Even then, it is often better and easier to let the context manager open and close the file each time.

It is also possible to create your own context managers with Python contextlib, which we use in georinex for example.
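As a rough illustration of a contextlib-based context manager, here is a minimal sketch. The helper name open_data and the throwaway file are assumptions made up for this example; this is not the code used in georinex.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def open_data(path: Path):
    """Hypothetical helper: open `path` for reading and always close it."""
    f = path.open("r")
    try:
        yield f  # the body of the `with` block runs here
    finally:
        f.close()  # runs even if the body raises an exception


# usage with a throwaway file
with tempfile.TemporaryDirectory() as d:
    fn = Path(d) / "myfile.txt"
    fn.write_text("first line\nsecond line\n")
    with open_data(fn) as f:
        line = f.readline()

print(line.strip(), f.closed)
```

The try/finally around the yield is what guarantees cleanup; without it, an exception inside the with block would skip f.close().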

Context Manager examples

These examples assume you’ve done something like

from pathlib import Path

fn = Path('~/mydir/myfile').expanduser()

simple file I/O

with fn.open('r') as f:
    line = f.readline()

Note, if just reading a whole file, consider pathlib.Path methods like:

txt = fn.read_text()

b = fn.read_bytes()

h5py

import h5py

with h5py.File(fn, 'r') as f:
    data = f['myvar'][:]

NetCDF4

import netCDF4

with netCDF4.Dataset(fn, 'r') as f:
    data = f['myvar'][:]

SSH Agent for WSL and Ubuntu

Related: Disable Gnome Keyring SSH Agent


SSH Agent remembers SSH Public Key authentication for a period of time. While native Windows has SSH built in, there is no straightforward way to have an SSH agent in Windows itself. Instead, one can use WSL for SSH agent as follows.

SSH agent setup

This works for Linux in general, including Windows Subsystem for Linux.

Add to ~/.bashrc:

if [ -z "$(pgrep ssh-agent)" ]; then
   rm -rf /tmp/ssh-*
   eval $(ssh-agent -s) > /dev/null
else
   export SSH_AGENT_PID=$(pgrep ssh-agent)
   export SSH_AUTH_SOCK=$(find /tmp/ssh-* -name 'agent.*')
fi

Open a new Terminal and type:

ssh-add -t 30m ~/.ssh/mygithubkey

The -t 30m option makes the agent remember the authentication for a limited time (here, 30 minutes).

When done SSHing, you can optionally remove all SSH agent keys from RAM by

ssh-add -D

Tips

Add multiple SSH keys in one command by commands like:

ssh-add ~/.ssh/{mygithub,mybitbucket}


Anaconda Python + Spyder on Windows Subsystem for Linux

Python on Windows can be used with Windows Subsystem for Linux. Using Python on WSL can be advantageous because of easier compiler access. This procedure includes the graphical Spyder IDE if you so desire.

Install Python in WSL

These commands are all from the Linux / WSL Terminal.

Download Miniconda (50 MB)

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Install Miniconda

bash Miniconda3*.sh

Press space bar a few times till it asks you to type yes and prepend the path to ~/.bashrc (the default choice).

Setup Spyder IDE in WSL

Ensure X11 prereqs are installed, including:

apt install libxcomposite1 libxss1

If Spyder won’t start, look in the error message for such missing libraries.

Setup X11 for WSL and then install Spyder:

conda install matplotlib spyder

Notes

You may be missing X11 prereqs, which will be specified in the error message on starting GUI programs like Spyder. Look above the error:

ModuleNotFoundError: No module named 'PyQt5.QtWebKitWidgets'

to see if things like libxcomposite or libxss etc. need to be installed via apt install.

Matplotlib trouble?

Notes on setting up Matplotlib for WSL

old Windows 10: WSL Ubuntu 14.04

Windows builds with WSL Ubuntu 16.04 / 18.04 work fine.

Very old Windows 10 builds using Ubuntu 14.04 won’t work with MKL. MKL-based modules like SciPy or NumPy fail, for example when running:

import scipy; scipy.test()

OMP: Error #100: Fatal system error detected. OMP: System error #22: Invalid argument

Workaround: install MKL-less versions of these packages by

conda install nomkl

VNCserver setup on Ubuntu

The free TightVNC server works with Ubuntu and other modern Linux distros. In general the default 3-D Ubuntu desktop is not available over VNC, so we use a traditional 2-D desktop environment over VNC.

Setup VNC server

Install Linux VNC Server:

apt install tightvncserver openbox

Choose desktop environment on server PC–XFCE4, Openbox or other you prefer. Pick ONE of the following:

XFCE4

Install XFCE on server PC

apt install xfce4
apt remove xscreensaver xscreensaver-data

Create ~/.vnc/xstartup

#!/bin/sh
unset SESSION_MANAGER
startxfce4 &

Correct possible keystroke issues on server PC:

xfconf-query -c xfce4-keyboard-shortcuts -p /xfwm4/custom/'<'Super'>'Tab -r

Openbox

Openbox is the lightest-weight desktop environment–right click to open a menu. Otherwise, you just have a plain gray background, extremely minimal–good for embedded systems and old PCs.

Put into file ~/.vnc/xstartup

#!/bin/sh
unset SESSION_MANAGER
exec openbox-session &

Start VNC server

vncserver :1 -geometry 1200x700 -localhost

To start the VNC server automatically at boot, create a file ~/startVNC.sh containing:

#!/bin/sh
vncserver :1 -geometry 1200x700 -localhost

Make the script executable with chmod +x ~/startVNC.sh and then, logged in as your user on the server, run

crontab -e

adding the line:

@reboot  /home/username/startVNC.sh

Encrypted /home

If you have an encrypted /home drive, VNC and SSH require configuration to allow decrypting home drive upon SSH login.

Setup VNC client

On your laptop, install VNC client

apt install tigervnc-viewer

Alternatively:

apt install vncviewer

Create a shell script:

#!/bin/bash
ssh -f -L 5901:localhost:5901 user@IPaddress sleep 1;
vncviewer  localhost::5901
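The port 5901 in the script above follows from the VNC convention that display :N listens on TCP port 5900 + N, so display :1 is port 5901. A small sketch of that arithmetic follows; the function names are made up for illustration, and the 0..99 range is our assumption matching the 5900-5999 port range:

```python
VNC_BASE_PORT = 5900


def vnc_port(display: int) -> int:
    """TCP port a VNC server listens on for display number `display`."""
    if not 0 <= display <= 99:
        raise ValueError("display number expected in 0..99")
    return VNC_BASE_PORT + display


def tunnel_command(display: int, remote: str) -> list:
    """Build the ssh port-forwarding command used in the shell script."""
    port = vnc_port(display)
    return ["ssh", "-f", "-L", f"{port}:localhost:{port}", remote, "sleep", "1"]


print(vnc_port(1))
print(" ".join(tunnel_command(1, "user@IPaddress")))
```

If you start the server on a different display, say vncserver :2, both the tunnel and the viewer need port 5902 instead.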

Notes

Openbox-Message: Unable to find a valid menu file "/var/lib/openbox/debian-menu.xml"

When I get this error, I’m also unable to open a terminal.

  • You can leave your VNC desktop running – it is not the same as your local desktop.
  • It is a little tricky to share your local desktop reliably–I have done so with X11VNC, but it can be more trouble than it is worth! It’s MUCH easier to startup a new separate desktop session with vncserver or x11vnc
  • After installing a new desktop environment, at your next local login, you’ll need to rechoose the traditional Ubuntu desktop (it will then remember this choice).

  • reference: (defunct link) http://blog.zerosum42.com/2011/10/tech-fixing-tab-key-in-vnc.html

Security

Be sure that ports 5900-5999 are NOT exposed to outside world–VNC is NOT secure by itself! VNC must be tunneled over the Internet with SSH.

The firewall rules, which show the ports the PC allows in from the network, are listed by:

ufw status

You should see only port 22, plus any other ports whose purpose you know.

Kill a frozen/undesired desktop

Log out of VNC from your laptop, then:

vncserver -kill :1

Fix non-working right-click Openbox menu

You might need to create ~/.config/openbox/menu.xml with the content

<?xml version="1.0" encoding="utf-8"?>
<openbox_menu xmlns="http://openbox.org/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="/usr/share/openbox/menu.xsd">
    <menu id="root-menu" label="Openbox 3">
        <item label="Run Program">
            <action name="Execute">
                <execute>
                    gmrun
                </execute>
            </action>
        </item>
        <separator/>
        <item label="Terminal">
            <action name="Execute">
                <execute>
                    xterm
                </execute>
            </action>
        </item>
  </menu>
</openbox_menu>

Fix missing XFCE4 VNC icons

The default XFCE4 desktop may be missing menu icons (you see black squares or red X’s).

Settings → Appearance

  • Style: Xfce
  • Icons: ubuntu-mono-light
  • Fonts: turn hinting on if you want

Meson download, verify and extract compressed files

Related: CMake download file


Meson does not have a built-in way to download files. While this could also be done via a custom_target(), we do it via run_command() in meson.build. This technique uses only Python stdlib modules; no extra pip install is needed.

meson.build

run_command('python', 'meson_file_download.py', url, zipfn, '-hash', 'md5', md5hash, check: true)

run_command('python', 'meson_file_extract.py', zipfn, outpath, check: true)

meson_file_download.py

#!/usr/bin/env python3
"""
We use SystemExit as this will not blast the whole traceback to Meson.
Usually just a terse stderr will suffice and not overwhelm the Meson user.
"""
from pathlib import Path
import urllib.request
import urllib.error
import hashlib
import argparse
import typing
import socket


def url_retrieve(
    url: str,
    outfile: Path,
    filehash: typing.Optional[typing.Sequence[str]] = None,
    overwrite: bool = False,
):
    """
    Parameters
    ----------
    url: str
        URL to download from
    outfile: pathlib.Path
        output filepath (including name)
    filehash: tuple of str, str
        hash type (md5, sha1, etc.) and hash
    overwrite: bool
        overwrite if file exists
    """
    outfile = Path(outfile).expanduser().resolve()
    if outfile.is_dir():
        raise ValueError("Please specify full filepath, including filename")
    # need .resolve() in case intermediate relative dir doesn't exist
    if overwrite or not outfile.is_file():
        outfile.parent.mkdir(parents=True, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, str(outfile))
        except (socket.gaierror, urllib.error.URLError) as err:
            raise SystemExit(
                "ConnectionError: could not download {} due to {}".format(url, err)
            )

    if filehash:
        if not file_checksum(outfile, filehash[0], filehash[1]):
            raise SystemExit("HashError: {}".format(outfile))


def file_checksum(fn: Path, mode: str, filehash: str) -> bool:
    h = hashlib.new(mode)
    h.update(fn.read_bytes())
    return h.hexdigest() == filehash


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("url", help="URL to file download")
    p.add_argument("outfile", help="filename to download to")
    p.add_argument("-hash", help="expected hash", nargs=2)
    P = p.parse_args()

    url_retrieve(P.url, P.outfile, P.hash)
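The file_checksum function above reads the whole file into memory with read_bytes(), which is fine for small downloads. For large files, a chunked variant may be preferable. This is a sketch under that assumption: the name file_checksum_chunked and the 1 MiB default chunk size are our own choices, not part of the script above.

```python
import hashlib
import tempfile
from pathlib import Path


def file_checksum_chunked(
    fn: Path, mode: str, filehash: str, chunk_size: int = 1 << 20
) -> bool:
    """Compare the hash of file `fn` to `filehash`, reading chunk by chunk."""
    h = hashlib.new(mode)
    with Path(fn).open("rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == filehash


# usage with a throwaway file and a tiny chunk size to exercise the loop
with tempfile.TemporaryDirectory() as d:
    fn = Path(d) / "data.bin"
    payload = b"hello" * 1000
    fn.write_bytes(payload)
    expected = hashlib.sha256(payload).hexdigest()
    ok = file_checksum_chunked(fn, "sha256", expected, chunk_size=64)
    bad = file_checksum_chunked(fn, "sha256", "0" * 64, chunk_size=64)

print(ok, bad)
```

The result is the same as hashing the whole file at once, since hashlib objects accept data incrementally through update().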

meson_file_extract.py

#!/usr/bin/env python3
from pathlib import Path
import argparse
import zipfile
import tarfile


def extract_zip(fn: Path, outpath: Path, overwrite: bool = False):
    outpath = Path(outpath).expanduser().resolve()
    # need .resolve() in case intermediate relative dir doesn't exist
    if outpath.is_dir() and not overwrite:
        return

    fn = Path(fn).expanduser().resolve()
    with zipfile.ZipFile(fn) as z:
        z.extractall(str(outpath.parent))


def extract_tar(fn: Path, outpath: Path, overwrite: bool = False):
    outpath = Path(outpath).expanduser().resolve()
    # need .resolve() in case intermediate relative dir doesn't exist
    if outpath.is_dir() and not overwrite:
        return

    fn = Path(fn).expanduser().resolve()
    if not fn.is_file():
        raise FileNotFoundError(fn)  # keep this, tarfile gives confusing error
    with tarfile.open(fn) as z:
        z.extractall(str(outpath.parent))


if __name__ == "__main__":
    p = argparse.ArgumentParser()
    p.add_argument("infile", help="compressed file to extract")
    p.add_argument("outpath", help="path to extract into")
    P = p.parse_args()

    infile = Path(P.infile)
    if infile.suffix.lower() == ".zip":
        extract_zip(infile, P.outpath)
    elif infile.suffix.lower() in (".tar", ".gz", ".bz2", ".xz"):
        extract_tar(infile, P.outpath)
    else:
        raise ValueError("Not sure how to decompress {}".format(infile))
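As an alternative to dispatching on the file suffix by hand, the Python stdlib function shutil.unpack_archive picks the zip or tar extractor from the filename. The sketch below is not the script above: the helper name extract_any is made up, and unlike the script it extracts directly into the given directory rather than its parent.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def extract_any(fn: Path, outdir: Path) -> None:
    """Extract a zip or tar archive into `outdir`, format chosen from the filename."""
    fn = Path(fn).expanduser().resolve()
    if not fn.is_file():
        raise FileNotFoundError(fn)
    outdir = Path(outdir).expanduser().resolve()
    outdir.mkdir(parents=True, exist_ok=True)
    shutil.unpack_archive(str(fn), str(outdir))


# usage: build a small zip in a temporary directory, then extract it
with tempfile.TemporaryDirectory() as d:
    d = Path(d)
    archive = d / "example.zip"
    with zipfile.ZipFile(archive, "w") as z:
        z.writestr("pkg/readme.txt", "hello")
    extract_any(archive, d / "out")
    text = (d / "out" / "pkg" / "readme.txt").read_text()

print(text)
```

As with zipfile and tarfile, only extract archives from sources you trust, since archive members can contain unexpected paths.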

Switch from Python urllib.urlretrieve to requests for better features

Python’s urllib.request.urlretrieve has no timeout parameter. On a bad internet connection it can hang for many minutes, so users complain that your program is hanging when the real problem is the network.

Python requests download files

This is a robust way to download files in Python with timeout. I name it url_retrieve to remind myself not to use the old one.

from pathlib import Path
import requests

def url_retrieve(url: str, outfile: Path, timeout: float = 10):
    # without timeout=, requests waits indefinitely on a stalled connection
    R = requests.get(url, allow_redirects=True, timeout=timeout)
    if R.status_code != 200:
        raise ConnectionError('could not download {}\nerror code: {}'.format(url, R.status_code))

    outfile.write_bytes(R.content)

Why isn’t this in requests? Because the Requests BDFL doesn’t want it.

pure Python download files

If you can’t or don’t want to use requests, here is how to download files in Python using only built-in modules:

from pathlib import Path
import urllib.request
import urllib.error
import socket


def url_retrieve(
    url: str,
    outfile: Path,
    overwrite: bool = False,
):
    """
    Parameters
    ----------
    url: str
        URL to download from
    outfile: pathlib.Path
        output filepath (including name)
    overwrite: bool
        overwrite if file exists
    """
    outfile = Path(outfile).expanduser().resolve()
    if outfile.is_dir():
        raise ValueError("Please specify full filepath, including filename")
    # need .resolve() in case intermediate relative dir doesn't exist
    if overwrite or not outfile.is_file():
        outfile.parent.mkdir(parents=True, exist_ok=True)
        try:
            urllib.request.urlretrieve(url, str(outfile))
        except (socket.gaierror, urllib.error.URLError) as err:
            raise ConnectionError(
                "could not download {} due to {}".format(url, err)
            )
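The stdlib version above still calls urlretrieve, which takes no timeout argument. If a timeout is wanted without requests, urllib.request.urlopen does accept one. Here is a minimal sketch under that assumption; the name url_retrieve_timeout and the 10 second default are our own choices, and the timeout applies to each blocking socket operation rather than to the whole download.

```python
import shutil
import socket
import tempfile
import urllib.error
import urllib.request
from pathlib import Path


def url_retrieve_timeout(url: str, outfile: Path, timeout: float = 10.0) -> None:
    """Download `url` to `outfile`, giving up if the socket stalls for `timeout` seconds."""
    outfile = Path(outfile).expanduser().resolve()
    outfile.parent.mkdir(parents=True, exist_ok=True)
    try:
        # urlopen is entered first, so a failed connection creates no output file
        with urllib.request.urlopen(url, timeout=timeout) as resp, outfile.open("wb") as f:
            shutil.copyfileobj(resp, f)
    except (socket.timeout, urllib.error.URLError) as err:
        raise ConnectionError(
            "could not download {} due to {}".format(url, err)
        ) from err


# usage without network access: urlopen also understands file:// URLs
with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "src.txt"
    src.write_text("example")
    dst = Path(d) / "sub" / "dst.txt"
    url_retrieve_timeout(src.resolve().as_uri(), dst)
    copied = dst.read_text()

print(copied)
```

A stall in the middle of a download can leave a partial file behind, so combining this with a hash check is a reasonable precaution.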