Scientific Computing

Force upgrade Windows

August 26, 2019

For certain computers, it may be necessary to force upgrade Windows. In general, the approach is:

Make an external backup of files to a cloud service and/or removable storage like a USB drive. We usually don’t back up the entire PC, but manually copy over folders containing needed data, since anything left on the PC may be lost in this procedure.
Obtain a sufficiently large USB 3 flash drive and any necessary adapters (e.g. USB-C to USB 3) for your PC. USB 2 flash drives are painfully slow.
Download and run the Windows Media Creation Tool. Be sure the USB 3 drive is plugged in before running the tool, then use it to create a bootable flash drive.
After the data backup, consider the install option “choose what to keep” → Nothing. That erases all files to help ensure no bad configuration is left over.

Generally Windows OS upgrades are a gamble that doesn’t always work, while hard reinstalls work unless there is a deeper problem like hardware failure or malware.

Convert animated GIF to PNG stack

August 18, 2019

Convert animated GIF to PNG stack using ImageMagick by:

magick in.gif out_%04d.png

where the 04 in %04d sets the zero-padded width of the frame number and is governed by the number of images in the GIF: 4 digits accommodate up to 10000 images.
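The padding width can be worked out from the frame count. This is a minimal sketch, assuming frame numbering starts at 0; pad_width and output_pattern are hypothetical helper names, not part of ImageMagick.

```python
def pad_width(n_frames: int) -> int:
    """Digits needed to number frames 0 .. n_frames-1 with zero padding."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    return len(str(n_frames - 1))


def output_pattern(stem: str, n_frames: int) -> str:
    """Build a printf-style filename pattern such as 'out_%04d.png'."""
    return f"{stem}_%0{pad_width(n_frames)}d.png"


print(output_pattern("out", 10000))  # out_%04d.png
print(output_pattern("out", 250))    # out_%03d.png
```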

GIFs are not a great format for science image data, because each frame is limited to an 8-bit palette (256 colors). For plotting reduced data, GIFs can be fine.

Fix Spyder IDE not visible

August 14, 2019

Spyder IDE is a complex but usually stable Python program. A problem symptom is Spyder not getting past the splash logo or not even showing the splash logo.

To totally reset Spyder (erasing all user preferences for Spyder), type in Terminal / Command Prompt:

spyder --reset

Normally, that fixes Spyder. To diagnose further, start Spyder from Terminal instead of the OS Start menu, as the console output might give some hints.

CUDA, cuDNN and NCCL for Anaconda Python

August 13, 2019

GPU CUDA, cuDNN and NCCL functionality is accessed in a NumPy-like way from CuPy. CuPy also allows lower-level use of the GPU.

Before starting GPU work in any programming language realize these general caveats:

I/O heavy workloads may make realizing GPU benefits more difficult
Consumer GPUs (GeForce) can be > 10x slower than workstation class (Tesla, Quadro)

CUDA requires a discrete Nvidia GPU. Check for existence of an Nvidia GPU by:

Linux: a blank response means an Nvidia GPU is not detected.
```
lspci | grep -i nvidia
```
Windows: Look under the “render” tab to see if an Nvidia GPU exists.
```
dxdiag
```

Determine the Compute Capability of the GPU and install the matching CUDA Toolkit. The CuPy package to install depends on the CUDA Toolkit version installed on your computer. Reboot afterwards.

CuPy syntax is very similar to NumPy. There is a large set of CuPy functions relevant to many engineering and scientific computing tasks.

import cupy

dev = cupy.cuda.Device()
print('Compute Capability', dev.compute_capability)
print('GPU Memory', dev.mem_info)

This should return something like:

Compute Capability 75
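CuPy reports the compute capability as a string of digits, so 75 means version 7.5. A small sketch for splitting it follows. It assumes the minor version is always the last digit, and parse_compute_capability is a hypothetical helper name, not a CuPy function.

```python
def parse_compute_capability(cc: str) -> tuple:
    """Split a CuPy-style capability string, e.g. '75', into (major, minor)."""
    if not cc.isdigit() or len(cc) < 2:
        raise ValueError(f"unexpected compute capability string: {cc!r}")
    # assumption: the last digit is the minor version, the rest is the major
    return int(cc[:-1]), int(cc[-1])


major, minor = parse_compute_capability("75")
print(f"Compute Capability {major}.{minor}")  # Compute Capability 7.5
```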

You may get an error like:

cupy.cuda.runtime.CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version

This means the CUDA Toolkit version is expecting a newer Nvidia driver. The Nvidia driver can be updated via your standard Nvidia update program that was installed from the factory. “Table 1” of the CUDA Toolkit release notes gives the CUDA Toolkit required Driver Versions.

Examples:

Python PyCUDA Matrix Multiplication Benchmark matmul_cuda.py
non-Python Graphics Benchmarks

Alternatives to CuPy include Numba.cuda, which is a lower-level C-like CUDA interface from Python. CUDA for Julia is provided in JuliaGPU. Anaconda Accelerate was discontinued.

Code cells in Python IDE

August 12, 2019

A code cell in popular Python IDEs including PyCharm and Spyder is created by a line starting with # %%. This “code cell” is analogous to IPython code cells and Matlab code sections.

It looks like:

import math

# %% user data
x = 3
y = 4
# %% main loop
for i in range(5):
    x += y

The code cells allow running sections of code in an IDE without the need to constantly set/unset breakpoints in the IDE. They also catch the eye of developers to delineate logical blocks of code in the algorithm.

We encourage the use of code cell syntax, even if you don’t use them in the IDE directly, as the IDE will highlight sections of code to visibly delineate these separate parts of the algorithm.
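To illustrate how a file is divided at the # %% markers, here is a simplified sketch that splits source text into cells and runs them in order in one shared namespace. split_cells is a hypothetical helper, not how any particular IDE implements cells.

```python
SOURCE = """\
import math

# %% user data
x = 3
y = 4
# %% main loop
for i in range(5):
    x += y
"""


def split_cells(source: str) -> list:
    """Split source text into (title, code) pairs at lines starting with '# %%'."""
    cells = []
    title, lines = "", []
    for line in source.splitlines():
        if line.startswith("# %%"):
            # close the previous cell, skipping an empty untitled leading cell
            if lines or title:
                cells.append((title, "\n".join(lines)))
            title, lines = line[4:].strip(), []
        else:
            lines.append(line)
    cells.append((title, "\n".join(lines)))
    return cells


namespace = {}
for title, code in split_cells(SOURCE):
    exec(code, namespace)  # each cell sees variables set by earlier cells

print(namespace["x"])  # 23
```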

Git SSH with GitLab self-managed instances

August 9, 2019

GitLab Community Edition is open source. Anyone may host their own self-managed GitLab instance instead of using gitlab.com. For this example of Git over SSH, we use Kitware’s CMake GitLab instance.

First, create an account on the self-managed GitLab instance and fork the desired repo. The fork can then be cloned like:

git clone https://gitlab.kitware.com/username/cmake

To git push using SSH, type:

git config --global url."ssh://gitlab.kitware.com/".pushInsteadOf https://gitlab.kitware.com/
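That setting makes Git rewrite the start of any matching push URL, and when several configured prefixes match, the longest one is used. Here is a simplified Python model of the rewrite, where rewrite_push_url is a hypothetical name and not part of Git:

```python
def rewrite_push_url(url: str, rules: dict) -> str:
    """Model of url.<base>.pushInsteadOf; rules maps old prefix -> new base."""
    matches = [prefix for prefix in rules if url.startswith(prefix)]
    if not matches:
        return url  # no rule applies, so the push URL is unchanged
    longest = max(matches, key=len)
    return rules[longest] + url[len(longest):]


rules = {"https://gitlab.kitware.com/": "ssh://gitlab.kitware.com/"}
print(rewrite_push_url("https://gitlab.kitware.com/username/cmake", rules))
# ssh://gitlab.kitware.com/username/cmake
```

Fetches keep using the HTTPS URL, since pushInsteadOf only affects pushes.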

Generate an SSH key–don’t reuse SSH keys between sites.

ssh-keygen -t ed25519 -f ~/.ssh/kitware

Go to the GitLab SSH Key page and add the contents of ~/.ssh/kitware.pub

Add to ~/.ssh/config:

Host gitlab.kitware.com
  User git
  IdentityFile ~/.ssh/kitware

Now check out a new branch, make your changes according to project guidelines and submit a merge request.

Limitations of loading HDF5 files with xarray

August 9, 2019

xarray.open_dataset can open HDF5 files. However, unexpected HDF5 file layouts can cause Python to quietly crash without an error message. This is true even with the minimum required versions of xarray, h5py and h5netcdf installed.

We don’t have a specific workaround for this other than to use h5py to build up an xarray Dataset variable-by-variable.
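A minimal sketch of that variable-by-variable approach follows. It assumes the requested HDF5 datasets are numeric arrays that fit in memory. h5_to_dataset and the placeholder dimension names are hypothetical, since plain HDF5 files need not carry named dimensions.

```python
import h5py
import numpy as np
import xarray as xr


def h5_to_dataset(path, names):
    """Read the named HDF5 datasets one at a time into an xarray.Dataset."""
    data_vars = {}
    with h5py.File(path, "r") as f:
        for name in names:
            arr = np.asarray(f[name][()])  # copy the whole dataset into memory
            # placeholder dimension names, unique per variable so sizes never clash
            dims = [f"{name}_dim{i}" for i in range(arr.ndim)]
            data_vars[name] = (dims, arr)
    return xr.Dataset(data_vars)
```

Usage would look like ds = h5_to_dataset(fn, ['myvar']), after which the dimensions can be renamed to something meaningful with ds.rename.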

Duplicate GitHub Wiki

August 8, 2019

GitHub API v4 does not include Wikis.
GitLab API v4 does include Wikis.

Duplicating a GitHub repo and its GitHub Wiki benefits from scripting.

An example solution is in GitEDU. This requires manually clicking to enable the new wiki via a web browser for each repo wiki.

GitHub Wiki Git accessibility

In general, GitHub Wiki is just another Git repo. The URL for the GitHub Wiki is obtained by appending .wiki.git to the associated GitHub repo URL.

Example:

Main repo: github.invalid/username/reponame
Wiki repo: github.invalid/username/reponame.wiki.git
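When scripting, the Wiki URL can be derived from the repo URL. Here is a small sketch, where wiki_url is a hypothetical helper that also tolerates a trailing slash or .git suffix:

```python
def wiki_url(repo_url: str) -> str:
    """Return the Wiki Git URL for a GitHub repo URL."""
    base = repo_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    if base.endswith(".wiki"):
        return base + ".git"  # already a Wiki URL
    return base + ".wiki.git"


print(wiki_url("https://github.invalid/username/reponame"))
# https://github.invalid/username/reponame.wiki.git
```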

Related: Moving GitHub Wiki

Moving a GitHub Wiki

August 7, 2019

GitHub Wikis are not accessible via GitHub API v4. We can use a couple of simple Git command line statements to move a GitHub Wiki.

For these examples, we assume that:

old wiki: github.invalid/username/repo1.wiki.git
new wiki: github.invalid/username/repo2.wiki.git

Copy GitHub Wiki to the laptop:

git clone --bare https://github.invalid/username/repo1.wiki.git

Browse to the new repo’s Wiki in a web browser and create a blank first page, so that the new Wiki repo exists.

Mirror push Wiki to new repo:

git -C repo1.wiki.git push --mirror https://github.invalid/username/repo2.wiki.git

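Once you see the new Wiki is OK, remove the old Wiki pages if desired. A bare clone has no working tree, so make a regular clone of the old Wiki first:

git clone https://github.invalid/username/repo1.wiki.git

git -C repo1.wiki rm "*.md"

git -C repo1.wiki commit -m "deleted old wiki pages"

git -C repo1.wiki push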
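The copy steps (bare clone, then mirror push) can be scripted when moving many Wikis. This minimal sketch only builds the Git command lines and leaves running them to the caller. wiki_move_commands and its workdir default are hypothetical names.

```python
def wiki_move_commands(old_wiki: str, new_wiki: str, workdir: str = "old_wiki.git") -> list:
    """Return the Git commands (as argument lists) that copy one Wiki to another."""
    return [
        # bare clone into an explicitly named directory
        ["git", "clone", "--bare", old_wiki, workdir],
        # push every branch and tag from that directory to the new Wiki
        ["git", "-C", workdir, "push", "--mirror", new_wiki],
    ]


for cmd in wiki_move_commands(
    "https://github.invalid/username/repo1.wiki.git",
    "https://github.invalid/username/repo2.wiki.git",
):
    print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True), so a failed clone stops the script before the push.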

Related: Duplicating GitHub Wiki

Why use Python context manager for file I/O?

August 6, 2019

One should almost always use a Python context manager when working with file I/O in Python. Context managers close I/O resources as soon as the with block exits, which helps avoid exceeding system resource limits such as the maximum number of open files. For long-running jobs, context managers help avoid seemingly random crashes due to excess file I/O resource utilization from files left hanging open. There are edge cases where you do need to keep the handle open without a context manager, for example across iterations of a for loop. In many cases, it may be better and easier to let the file open and close with the context manager.

It is also possible to create your own context managers with Python contextlib, which we use in georinex for example.
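A minimal sketch of a custom context manager built with contextlib.contextmanager follows. opened_for_reading is a hypothetical example name and is not taken from georinex.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def opened_for_reading(path):
    """Yield an open text file handle and always close it afterwards."""
    f = Path(path).expanduser().open("r")
    try:
        yield f
    finally:
        f.close()  # runs even if the body of the with block raises


with tempfile.TemporaryDirectory() as d:
    demo = Path(d) / "myfile"
    demo.write_text("first line\nsecond line\n")
    with opened_for_reading(demo) as f:
        line = f.readline()

print(line.strip(), f.closed)  # first line True
```

The try/finally around the yield is what guarantees cleanup, which is the same guarantee the built-in file context manager gives.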

Context manager examples, assuming:

from pathlib import Path

fn = Path('~/mydir/myfile').expanduser()

simple file I/O:

with fn.open('r') as f:
    line = f.readline()

Note, if just reading a whole file, consider pathlib.Path methods like:

txt = fn.read_text()

b = fn.read_bytes()

h5py:

import h5py

with h5py.File(fn, 'r') as f:
    data = f['myvar'][:]

NetCDF4:

import netCDF4

with netCDF4.Dataset(fn, 'r') as f:
    data = f['myvar'][:]