Default MyPy type hint checks with .mypy.ini

MyPy is a Python type annotation checker. MyPy recursively checks all files in a Python project by typing:

mypy ~/myproject

Install

We typically install MyPy from PyPI instead of conda, to have the most recent MyPy version:

pip install mypy

Example config

It’s often useful to have a per-project MyPy configuration file to avoid excessive command line options. Put a file .mypy.ini in each Python project containing:

[mypy]
files = src/, scripts/

ignore_missing_imports = True
strict_optional = False
allow_redefinition = True
show_error_context = False
show_column_numbers = True

Where “files” is set appropriately for your project. Setting “files” per project is strongly recommended to ensure files aren’t missed in the type check. One can also make a user-wide ~/.mypy.ini. MyPy reads only the first configuration file it finds, so a per-project .mypy.ini takes precedence over ~/.mypy.ini rather than being merged with it.
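As a rough illustration of what two of these options permit, the sketch below is ordinary Python that we would expect to pass MyPy under the configuration above, assuming the documented behavior of strict_optional and allow_redefinition. The function names total_length and greet are hypothetical and exist only for this example.

```python
from typing import List


def total_length(items: List[str]) -> int:
    """Sum of the lengths of the strings in items."""
    # allow_redefinition = True lets "items" change from List[str] to
    # List[int] within the same block; without it MyPy would be expected
    # to report an incompatible assignment here.
    items = [len(item) for item in items]
    return sum(items)


def greet(name: str = None) -> str:
    """Greeting that tolerates a missing name."""
    # strict_optional = False treats None as compatible with str, so this
    # default is accepted; strict checking would require Optional[str].
    if name is None:
        return "hello"
    return "hello " + name


print(total_length(["ab", "cde"]))
print(greet())
```

Both functions run fine at runtime either way; the options only change what MyPy reports.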

Isolate problem packages

Sometimes an external package adds type hinting that is incompatible with the current MyPy release. This is relatively rare, but was the case with Xarray. To ignore a package’s type hinting, add the following to .mypy.ini, where we assume we want to ignore xarray type checking.

[mypy-xarray]
follow_imports = skip

Notes

Enhanced MyPy usage: http://calpaterson.com/mypy-hints.html

Get CPU count from Matlab

We use this function to capture the number of physical CPU cores available on a computer from Matlab. Like any such function it doesn’t always work, but we try 3 separate methods to help improve accuracy. Knowing the CPU count is useful when running mpiexec from Matlab.

function N = get_cpu_count()
%% get apparent number of physical CPU cores

if isoctave
  N = floor(nproc / 2);  % assume hyperthreading. nproc is a double, which idivide() rejects
else
  N = maxNumCompThreads;
  if N < 2  % happens on some HPC
    N = feature('NumCores');
  end
  if N < 2 && usejava('jvm')
    % assume hyperthreading
    N = java.lang.Runtime.getRuntime().availableProcessors / 2;
  end
end

N = max(N, 1);

end


function isoct = isoctave()
isoct = exist('OCTAVE_VERSION', 'builtin') == 5;
end

Python CPU count

Python psutil allows accessing numerous aspects of system parameters, including CPU count. We recommend using a recent version of psutil to cover more computing platforms.

import psutil
Ncpu = psutil.cpu_count(logical=False)

usually gives the physical CPU count, but returns None when the count cannot be determined.
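Because psutil may be missing or may not be able to determine the physical count, a fallback can help. The following is a minimal sketch, not a definitive implementation: the function name physical_cpu_count is hypothetical, and halving os.cpu_count() assumes hyperthreading, the same assumption made in the Matlab function above.

```python
import os


def physical_cpu_count() -> int:
    """Best-effort count of physical CPU cores, never less than 1."""
    try:
        import psutil  # third-party; treated as optional in this sketch

        n = psutil.cpu_count(logical=False)
    except ImportError:
        n = None

    if not n:
        # os.cpu_count() counts logical CPUs; halving assumes hyperthreading
        n = (os.cpu_count() or 1) // 2

    return max(n, 1)


print(physical_cpu_count())
```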

CMake RESOURCE_LOCK vs. RUN_SERIAL advantages

CMake (via CTest) can run tests in parallel via the option

ctest --parallel 4

where “4” is the number of tests to run in parallel; typically set it approximately equal to the number of CPU cores in your system. Some tests should not run in parallel with others, for example tests using MPI that use lots of CPU cores, tests that use a lot of RAM, or tests that must access a common file or hardware device.

We have found that when fixtures are used, RUN_SERIAL makes whole groups of tests run sequentially, instead of making each test individually run by itself. That is, all the FIXTURES_SETUP tests run, then all the FIXTURES_REQUIRED tests that have RUN_SERIAL. This is not necessarily desired, because we had consuming fixtures that didn’t have to wait for all the fixtures to be set up.

We found that using RESOURCE_LOCK did not suffer from this issue, and allows the proper test dependencies and the expected parallelism.

Example

For simplicity we omit the necessary add_test() and just show the properties.

The test has an MPI-using quick setup “Quick1” and then a long test “Long1” also using MPI. Finally, we have a quick Python script “Script1” checking the output.

In the real setup, we have Quick1, Quick2, … QuickN and so on. When we used RUN_SERIAL, we had to wait for ALL Quick* before Long* would start. With RESOURCE_LOCK the tests intermingle, making better use of CPU particularly on large CPU count systems, and with lots of tests.

The name “cpu_mpi” is arbitrary like the other names.

set_tests_properties(Quick1 PROPERTIES
  RESOURCE_LOCK cpu_mpi
  FIXTURES_SETUP Q1)

set_tests_properties(Long1 PROPERTIES
  RESOURCE_LOCK cpu_mpi
  FIXTURES_REQUIRED Q1
  FIXTURES_SETUP L1)

set_tests_properties(Script1 PROPERTIES
  FIXTURES_REQUIRED L1)
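
The locking behavior can be pictured with a small Python analogy. This is only a sketch of the idea that tests sharing a lock name never overlap while other tests keep running in parallel. It is not how CTest is implemented, it does not model fixtures, and the names locked_test and free_test are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

cpu_mpi = threading.Lock()  # stands in for RESOURCE_LOCK cpu_mpi
counter_guard = threading.Lock()  # protects the two counters below
running = 0  # locked tests executing right now
peak = 0  # most locked tests ever observed at the same time


def locked_test(name: str) -> str:
    """Pretend test that holds the shared lock while it runs."""
    global running, peak
    with cpu_mpi:
        with counter_guard:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # simulated work
        with counter_guard:
            running -= 1
    return name


def free_test(name: str) -> str:
    """Pretend test without a lock; it may overlap with any other test."""
    time.sleep(0.01)  # simulated work
    return name


with ThreadPoolExecutor(max_workers=4) as pool:  # loosely like ctest --parallel 4
    futures = [pool.submit(locked_test, f"Quick{i}") for i in range(1, 4)]
    futures += [pool.submit(free_test, f"Script{i}") for i in range(1, 4)]
    finished = [f.result() for f in futures]

print(peak)  # prints 1: locked tests never ran at the same time
```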

Notes

CMake Resource Groups are orthogonal to Resource Locks, and are much more complicated to use. There may be some systems that would benefit from Groups, but many can just use the simple Locks.

Put Git revision in executable or library

Traceability of a binary artifact such as an executable or library can be improved by writing information about the Git repository status into the artifact itself. This is a finer-grained implementation of the version number we are accustomed to seeing in the command line interface of executables. This example doesn’t cover every possible thing to be traced, for example non-version controlled artifacts that are linked in. This example just covers the Git repo of the current CMake project. Nonetheless, those needing more advanced traceability can build upon this example.

Usually for easier reuse across projects, we put this in a separate CMake script file like gitrev.cmake and include it from the main CMake project.

See the example in Fortran2018-examples repo.
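
The gitrev.cmake script itself is not reproduced here. As a hedged sketch of the kind of query such a script has to make, the Python function below asks Git for the short commit hash. The function name git_revision and the choice of “git rev-parse --short HEAD” as the identifier are assumptions for illustration; a real project may prefer “git describe” or also record whether the work tree has uncommitted changes.

```python
import subprocess


def git_revision(repo_dir: str = ".") -> str:
    """Short commit hash of repo_dir, or "unknown" if it cannot be determined."""
    try:
        out = subprocess.run(
            ["git", "-C", repo_dir, "rev-parse", "--short", "HEAD"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a Git work tree
        return "unknown"
    return out.stdout.strip() or "unknown"


print(git_revision())
```

Returning a placeholder instead of raising keeps a build usable from a source archive that has no Git metadata.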

Get Matlab HDF5 version

Matlab upgraded to HDF5 1.8.12 in R2015a. Matlab R2020b uses HDF5 1.8.12. HDF5 1.8.12 was released in November 2013. HDF Group official support for HDF5 1.8 ends in 2021.

Check Matlab HDF5 library version by:

[major,minor,rel] = H5.get_libversion()

Compatible HDF5 versions

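Newer versions of the HDF5 library can write files readable by HDF5 1.8 when the library version bounds are restricted, for example with H5Pset_libver_bounds() in the C API or the libver option of h5py.File in Python. This may be necessary to allow Matlab to read HDF5 files written by other applications.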

Append PATH in GitHub Actions

One can globally set environment variables in GitHub Actions by using env: at the top level of a “.github/workflows/ci.yml” file, for example:

name: ci

env:
  CMAKE_GENERATOR: Ninja
  CC: gcc

In other cases like appending to PATH, this must be done dynamically but can still have global scope. In GitHub Actions this is done by writing to environment files.

Prepend PATH

To add “~/.local/bin” to PATH, under a run: stanza per operating system:

Linux / macOS

echo "${HOME}/.local/bin" >> $GITHUB_PATH

Windows

Windows defaults to PowerShell, so the syntax is distinct from Unix shells:

echo "${HOME}/.local/bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append
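
When a workflow step already runs a Python script, the same thing can be done without shell-specific syntax, since the file named by GITHUB_PATH is a plain text file with one directory per line. The following is a minimal sketch; the function name add_to_github_path is hypothetical.

```python
import os
from pathlib import Path


def add_to_github_path(directory: str, path_file: str) -> None:
    """Append directory as one line of the file named by path_file."""
    with open(path_file, "a", encoding="utf-8") as f:
        f.write(directory + "\n")


if __name__ == "__main__":
    # GITHUB_PATH is set by the Actions runner; elsewhere this does nothing
    path_file = os.environ.get("GITHUB_PATH")
    if path_file:
        add_to_github_path(str(Path.home() / ".local" / "bin"), path_file)
```

As with the shell versions, the new PATH entry takes effect in subsequent steps, not in the step that writes it.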

Matlab websave SSL certificates

Matlab websave allows specifying detailed options to control HTTP behavior via weboptions. Typical options that are modified include Timeout and bypassing SSL certificate checking. While SSL certificate checking adds security to web operations, some HPC systems have old or broken certificates. Other systems may simply need environment variable SSL_CERT_FILE set to tell Matlab’s vendored cURL where the cert file is.

As a last resort, certificate checking can be turned off, but this opens up code / file integrity and concomitant security issues.

Configuration

A generally better solution than disabling certificate checking is to configure your user profile to tell cURL and Git the location of the system certificates. For this example we assume the certificate file is at “/etc/ssl/certs/ca-bundle.crt”.

cURL SSL config

Set the environment variable by adding this line to ~/.bashrc:

export SSL_CERT_FILE=/etc/ssl/certs/ca-bundle.crt

This for example can fix issues with Matlab websave() that uses Matlab’s vendored cURL.

Git SSL config

Tell Git where the cert file is by:

git config --global http.sslCAInfo /etc/ssl/certs/ca-bundle.crt

Example

This example sets timeout to 15 seconds and specifies custom SSL cert location when environment variable SSL_CERT_FILE is set.

if isfile(getenv("SSL_CERT_FILE"))
  web_opts = weboptions('CertificateFilename', getenv("SSL_CERT_FILE"), 'Timeout', 15);
else
  web_opts = weboptions('Timeout', 15);
end

websave(saved_file, url, web_opts);
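
For comparison, the same pattern can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not a replacement for the Matlab code: the function names make_ssl_context and download are hypothetical, and the 15 second timeout simply mirrors the Matlab example.

```python
import os
import shutil
import ssl
import urllib.request


def make_ssl_context() -> ssl.SSLContext:
    """TLS context that uses SSL_CERT_FILE when it names an existing file."""
    cert_file = os.environ.get("SSL_CERT_FILE", "")
    if os.path.isfile(cert_file):
        return ssl.create_default_context(cafile=cert_file)
    # otherwise fall back to the system default certificate locations
    return ssl.create_default_context()


def download(url: str, saved_file: str, timeout: float = 15.0) -> None:
    """Save the content at url to saved_file, with certificate checking on."""
    ctx = make_ssl_context()
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp, \
            open(saved_file, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Certificate verification stays enabled in both branches; only the location of the certificate bundle changes.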