Scientific Computing

PNG stack to animated GIF using FFmpeg

FFmpeg converts all PNG files in a directory into an animated GIF with:

ffmpeg -pattern_type glob -i '*.png' out.gif
-pattern_type glob -i '*.png'
collect all .png files in this directory

Windows FFmpeg file globbing

FFmpeg file globbing does NOT work on Windows, even with FFmpeg 4.x. The error on Windows looks like:

Pattern type 'glob' was selected but globbing is not supported by this libavformat build

Windows FFmpeg users, consider using Windows Subsystem for Linux for file globbing with FFmpeg.
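
For those who prefer to stay with native Windows FFmpeg, one possible workaround is to expand the glob outside FFmpeg. The sketch below is an assumption rather than part of the original recipe: Python lists the PNG files in sorted order, writes them to a list file for FFmpeg's concat demuxer (-f concat), and returns the command to run. The names build_concat_command and frames.txt are hypothetical.

```python
from pathlib import Path


def build_concat_command(directory, output="out.gif", list_name="frames.txt"):
    """Write an FFmpeg concat list of the PNGs in `directory`; return the command."""
    directory = Path(directory)
    frames = sorted(directory.glob("*.png"))  # Python expands the glob, not FFmpeg
    if not frames:
        raise FileNotFoundError(f"no PNG files in {directory}")

    lines = []
    for frame in frames:
        # The concat list wraps each path in single quotes; escape embedded quotes.
        escaped = frame.resolve().as_posix().replace("'", "'\\''")
        lines.append(f"file '{escaped}'")

    list_file = directory / list_name
    list_file.write_text("\n".join(lines) + "\n")

    # -safe 0 lets the concat demuxer accept absolute paths.
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", str(list_file), output]
```

The returned list can be passed to subprocess.run(cmd, check=True). Frame timing with this approach may differ from the glob version, so check the result before relying on it.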

PNG stack to AVI using FFmpeg

FFmpeg converts all PNG files in a directory into a single lossless AVI video file with:

ffmpeg -framerate 5 -pattern_type glob -i '*.png' -c:v ffv1 out.avi
-framerate 5
show 5 PNG image frames per second.
-pattern_type glob -i '*.png'
collect all .png files in this directory
-c:v ffv1
use the lossless FFV1 codec

If the .avi file does not play back, try omitting the -c:v ffv1 parameter. Keep the framerate at about 3 frames/second or higher, because some viewers (e.g. VLC) don’t work below that.

Overloading functions in Matlab and GNU Octave

Matlab and GNU Octave are constantly adding new functionality. However, legacy versions remain in use for years. Overloading a built-in function in Matlab and Octave requires logic that accounts for Octave providing many functions as m-files where Matlab provides them as built-ins.

This example overloads isfile for single files.

Create a file “isfile.m” containing:

function ret = isfile(path)

if exist('isfile', 'builtin') == 5 || exist('isfile', 'file') == 2
  ret = builtin('isfile', path);
else
  ret = exist(path, 'file') == 2;
end

end

Using Ruby Gem install with GitHub Actions

To use Ruby quickly and easily in GitHub Actions, add this YAML snippet to your job:

    - uses: actions/setup-ruby
      with:
        ruby-version: '2.x'

Example

Below is a complete example job (named integration) in which Ruby packages are called from Python. The example is from Python Linguist.

  integration:
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout
    - uses: actions/setup-python
      with:
        python-version: '3.x'
    - uses: actions/setup-ruby
      with:
        ruby-version: '2.x'
    - run: gem install github-linguist --no-document
    - run: pip install -e .[tests]
    - run: pytest

Force apt to IPv4

Sometimes a remote host or ISP is temporarily misconfigured with regard to IPv6. In this case, apt update can simply hang, for example:

Connecting to archive.raspberrypi.org (2a00:1098:0:80:1000:13:0:5)

Force apt to use IPv4 with the apt option -o Acquire::ForceIPv4=true.

Example:

apt -o Acquire::ForceIPv4=true update

apt -o Acquire::ForceIPv4=true upgrade

Numpy can't read .zip files

ZIP, GZ and similar files can be a quick-and-dirty way to compress individual data files for retrieval from remote sensors. In particular, the GeoRinex program has extensive capabilities for transparently (without extracting to an uncompressed file) reading .zip, .z, .gz, etc. compressed text files, which benefit greatly from the storage space savings.

It was surprising to find that transparently processing similarly compressed binary data is not trivial, particularly with numpy.fromfile. Numpy has unresolved bugs with numpy.fromfile that preclude easy use with inline reading via zipfile.ZipFile or tarfile. Specifically, the .fileno attribute is not available from zipfile or tarfile, and numpy.fromfile() relies on .fileno among other attributes.

numpy.frombuffer is not generally suitable for this application either, because it does not advance the buffer position. We are not saying there’s no way around this situation, but we chose a more generally beneficial path.
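
The missing file descriptor can be seen with the Python standard library alone. The sketch below is illustrative, not the approach chosen in this post: it checks whether a stream opened from a ZIP archive exposes a usable .fileno, then falls back to decompressing the whole member into memory and decoding it with the standard array module. The function names are hypothetical, and the data are assumed to be native-endian 64-bit floats.

```python
import array
import io
import zipfile


def has_file_descriptor(stream):
    """Return True if `stream` exposes an OS-level file descriptor."""
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation):
        return False
    return True


def read_doubles_from_zip(zip_path, member):
    """Return the float64 values stored in `member` of the ZIP archive."""
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as stream:
            # The decompressed stream has no descriptor to hand to a reader
            # that works on OS-level files, so read the bytes directly.
            if has_file_descriptor(stream):
                raise RuntimeError("unexpected: stream has a file descriptor")
            raw = stream.read()  # the whole member is decompressed into memory

    values = array.array("d")  # native-endian 64-bit floats (assumed layout)
    values.frombytes(raw)
    return values
```

Reading the whole member into memory is acceptable for small files, but it gives up the partial reads that make a file-based reader attractive for large data.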

Use HDF5

When raw data files need to be compressed and then later analyzed, we use HDF5. Even when the original program writing the raw binary data cannot be modified, a simple post-processing Python script with h5py reads the raw data and converts it to losslessly compressed HDF5 on the sensor. Then, when the data is analyzed, out-of-core processing can be used, or at least the whole file doesn’t have to be read to retrieve data from an arbitrary location in the HDF5 file. This allows getting nearly all of the size and speed advantages of HDF5 without modifying the original program.

Delete empty zero-sized files

If faced with a large number of arbitrarily named empty (zero-byte) files that need to be deleted, GNU Findutils makes this easy. macOS Homebrew findutils installs the command “gfind” in place of “find”.

Verify the file list to be deleted:

find ~/foo -type f -empty | sort

where ~/foo is the directory to search for empty files, and sort is used because in general the files are listed in arbitrary order. If satisfied, actually delete the empty files with:

find ~/foo -type f -empty -delete
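
Where GNU Findutils is not available, the same verify-then-delete procedure can be sketched in Python with pathlib. This is an assumed equivalent, not part of the original recipe, and the function names are hypothetical.

```python
from pathlib import Path


def find_empty_files(root):
    """Return sorted regular files of size zero under `root`."""
    root = Path(root).expanduser()
    return sorted(
        p
        for p in root.rglob("*")
        # Skip symbolic links so that only regular files are considered.
        if p.is_file() and not p.is_symlink() and p.stat().st_size == 0
    )


def delete_empty_files(root, dry_run=True):
    """List empty files under `root`; delete them only if dry_run is False."""
    empty = find_empty_files(root)
    if not dry_run:
        for p in empty:
            p.unlink()
    return empty
```

Calling delete_empty_files("~/foo") first returns the list to inspect; calling it again with dry_run=False performs the deletion.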

Python using NaN or None as sentinel

Comparing to None instead of NaN is:

  • 4 to 50 times faster in CPython
  • more than 1000 times faster in PyPy3 with Numpy, same speed with math

Benchmarks using Intel Coffee Lake CPU with:

ipython

Python 3.7.3 IPython 7.7.0 Numpy 1.16.4

or PyPy3 with IPython

pypy3 -m IPython

Python 3.6.1 (PyPy3 7.1.1)

Numpy is well known to be slower at scalar operations than pure Python. But many data science and STEM applications using arrays are vastly faster and more convenient with Numpy than with pure Python methods.

from numpy import isnan
%timeit isnan(0.)
  • CPython: 428 ns ± 1.74 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Python NaN

from math import isnan
%timeit isnan(0.)
  • CPython: 45.7 ns ± 0.209 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
  • PyPy3: 0.988 ns ± 0.00506 ns per loop (mean ± std. dev. of 7 runs, 1000000000 loops each)

Python None

%timeit 0. is not None
  • CPython: 17 ns ± 0.328 ns per loop (mean ± std. dev. of 7 runs, 100000000 loops each)
  • PyPy3: 0.987 ns ± 0.0041 ns per loop (mean ± std. dev. of 7 runs, 1000000000 loops each)
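
Outside IPython, a similar comparison can be run with the standard timeit module. This is a minimal sketch, with the hypothetical function name time_sentinel_checks; the numbers it produces depend on the machine and interpreter and are not expected to match those listed here.

```python
import math
import timeit


def time_sentinel_checks(number=200_000):
    """Return seconds per call for a NaN check and a None check."""
    nan_total = timeit.timeit(
        "isnan(x)",
        globals={"isnan": math.isnan, "x": 0.0},
        number=number,
    )
    none_total = timeit.timeit(
        "x is not None",
        globals={"x": 0.0},
        number=number,
    )
    return {
        "math.isnan": nan_total / number,
        "is not None": none_total / number,
    }


if __name__ == "__main__":
    for name, seconds in time_sentinel_checks().items():
        print(f"{name}: {seconds * 1e9:.1f} ns per call")
```

Note that NaN compares unequal to itself, so a NaN sentinel has to be tested with isnan(x) rather than with an equality comparison.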

Numba

using python-performance

python NoneVsNan.py
--> Numba NaN sentinel: 1.00e-07
--> Numba None sentinel: 1.00e-07
--> CPython NaN sentinel: 2.00e-07
--> Numpy NaN sentinel: 6.00e-07
--> CPython None sentinel: 1.00e-07

Force upgrade Windows

Over the years and across major Windows releases, we have had to force upgrade Windows many times. This is especially so on development machines that see a lot of programs installed in unusual locations, external hard drives used, etc.

In general, the approach to force upgrade the Windows version is:

  1. Make an external backup of files. This could be to a cloud service like Google Drive or OneDrive as well as unpluggable storage like a USB drive. We usually don’t back up the entire PC; we just manually drag over folders containing needed info, as it very well may be lost in this procedure.
  2. Obtain a USB 3 flash drive and necessary adapters (e.g. USB-C to USB 3) for your PC. USB 2 flash drives will be painfully slow. At this time, 8 GB or larger is required.
  3. Download and run the Windows Media Creation Tool. Be sure the USB 3 drive is plugged in before running, and create a bootable flash drive using the tool.
  4. To help ensure you only have to do this once, and after ensuring you have backed up any data, consider the most powerful install option. That is “choose what to keep” → Nothing. That erases all files to help ensure there isn’t any bit of bad configuration left over. You don’t want to have to keep repeating the upgrade.

Screenshots are not included because, while the particulars change over the years, the process has been much the same since the Windows 9x or even DOS days. In general, in-place OS upgrades are a gamble that doesn’t always pay off, while clean reinstalls virtually always work. This is the case for Linux, including Ubuntu, as well.