Scientific Computing

CMake RESOURCE_LOCK vs. RUN_SERIAL advantages

CMake (via CTest) can run tests in parallel. Some tests must not run in parallel with others, for example MPI tests that use many CPU cores, tests that use a lot of RAM, or tests that must access a common file or hardware device. We have found that when fixtures are used, RUN_SERIAL makes whole groups of tests run sequentially, rather than making individual tests run sequentially. That is, all the FIXTURES_SETUP tests run first, then all the FIXTURES_REQUIRED tests that have RUN_SERIAL. This is not necessarily desired, because we had consuming tests that didn’t have to wait for all the fixtures to be set up.

We found that RESOURCE_LOCK does not suffer from this issue, and allows the proper test dependencies along with the expected parallelism.

CMake Resource Groups are orthogonal to Resource Locks, and are much more complicated to use. There may be some systems that would benefit from Groups, but many can just use the simple Locks.


For simplicity, this example omits the necessary add_test() calls and just shows the properties.

The test has an MPI-using quick setup “Quick1” and then a long test “Long1” also using MPI. Finally, we have a quick Python script “Script1” checking the output.

In the real setup, we have Quick1, Quick2, … QuickN and so on. When we used RUN_SERIAL, we had to wait for ALL Quick* before any Long* would start. With RESOURCE_LOCK the tests intermingle, making better use of the CPU, particularly on systems with many CPU cores and in projects with many tests.

The name “cpu_mpi” is arbitrary like the other names.

set_property(TEST Quick1 PROPERTY RESOURCE_LOCK cpu_mpi)
set_property(TEST Quick1 PROPERTY FIXTURES_SETUP Q1)

set_property(TEST Long1 PROPERTY RESOURCE_LOCK cpu_mpi)
set_property(TEST Long1 PROPERTY FIXTURES_REQUIRED Q1)
set_property(TEST Long1 PROPERTY FIXTURES_SETUP L1)

set_property(TEST Script1 PROPERTY FIXTURES_REQUIRED L1)

Put Git revision in executable or library

Traceability of a binary artifact such as an executable or library can be improved by writing information about the Git repository status into the artifact itself. This is a finer-grained implementation of the version number we are accustomed to seeing in the command line interface of executables. This example doesn’t cover every possible thing to be traced, for example non-version controlled artifacts that are linked in. This example just covers the Git repo of the current CMake project. Nonetheless, those needing more advanced traceability can build upon this example.

Usually for easier reuse across projects, we put this in a separate CMake script file like gitrev.cmake and include it from the main CMake project.

See the example in Fortran2018-examples repo.
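
As a language-neutral illustration of the idea (not the gitrev.cmake script mentioned above), the following minimal sketch queries Git and renders a C header that a build step could compile into the artifact. The function names git_revision and render_header and the macro name GIT_REVISION are hypothetical, chosen only for this sketch.

```python
import subprocess
from typing import Optional


def git_revision(repo_dir: str = ".") -> Optional[str]:
    """Return the output of `git describe --always --dirty`, or None if unavailable."""
    try:
        out = subprocess.run(
            ["git", "-C", repo_dir, "describe", "--always", "--dirty"],
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # Git is not installed, or repo_dir is not inside a Git repository
        return None
    return out.stdout.strip() or None


def render_header(revision: Optional[str]) -> str:
    """Render header text defining the (hypothetical) GIT_REVISION macro."""
    rev = revision if revision else "unknown"
    return f'#pragma once\n#define GIT_REVISION "{rev}"\n'


print(render_header(git_revision()))
```

The "-dirty" suffix that Git appends for a modified working tree is useful for traceability, since it flags binaries built from uncommitted changes.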

Numpy N-D rot90 flip

Rotating arrays in Python (and Matlab) is straightforward, even for N-dimensional arrays.

Numpy rot90 rotates N-dimensional arrays in the plane given by axes, in 90 degree steps: counterclockwise (k positive) or clockwise (k negative) for the default axes order.

numpy.rot90(data, k=2, axes=(1,2))

Matlab rot90 can also handle N-dimensional arrays, but doesn’t allow control over the rotated axes as Numpy does.

Numpy flip works for N-dimensional arrays along the specified axis, just like Matlab flip.
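
The rotation direction and the axes argument can be checked on small arrays. This is a minimal sketch; the array contents and variable names are arbitrary choices for illustration.

```python
import numpy as np

# 2-D case: direction of rotation for positive and negative k
a = np.arange(4).reshape(2, 2)  # [[0, 1], [2, 3]]
ccw = np.rot90(a, k=1)          # [[1, 3], [0, 2]]: counterclockwise
cw = np.rot90(a, k=-1)          # [[2, 0], [3, 1]]: clockwise

# 3-D case: rotate by 180 degrees in the plane of axes 1 and 2
data = np.arange(24).reshape(2, 3, 4)
r180 = np.rot90(data, k=2, axes=(1, 2))

# a single 90 degree rotation swaps the lengths of the two rotated axes
r90 = np.rot90(data, k=1, axes=(1, 2))
print(r90.shape)  # (2, 4, 3)

# a 180 degree rotation equals flipping both rotated axes
flipped = np.flip(np.flip(data, axis=1), axis=2)
print(np.array_equal(r180, flipped))  # True
```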

Matlab package import like Python

Matlab users can share code projects as toolboxes and/or packages. Matlab packages work for Matlab ≥ R2008a as well as GNU Octave. Matlab toolboxes work for Matlab ≥ R2016a and not GNU Octave. The packages format brings benefits to toolboxes as well.

Matlab namespaces: a key issue with Matlab vs. Python arises because Matlab users often add many paths for their project. If any function names clash, there can be unexpected behavior, as it’s not immediately clear which function is being used without investigating the path ordering. As in Python and other languages, there is considerable benefit to using a package format where function names are qualified by their namespace.

addpath example: Matlab package format. Suppose project directory structure:

myproj
  utils
    mem1.m
  conversion
    deg1.m
  sys
    disk1.m

To use these functions, the end users do:

addpath(genpath('myproj'))

This is where the namespace can have clashes, and with large projects it’s not clear where a function is without further introspection.

package example: make this project a Matlab / Octave package by changing the subdirectories containing .m files to start with a “+” plus symbol:

myproj
  +utils
    mem1.m
  +conversion
    deg1.m
  +sys
    disk1.m

The end users simply:

addpath('myproj')

and then access specific functions through the package name. Since the directory myproj itself is on the path, the package names are the “+” directories under it:

utils.mem1(arg1)

Then multiple subdirectories can have the same function name without clashing in the Matlab namespace. Suppose the function “mem1” is used frequently in another function. To avoid typing the fully qualified function name each time, use the import statement:

function myfunc(arg1, arg2)
import utils.mem1
mem1(arg1)
mem1(arg2)
end

Private functions: Matlab packages can have private functions that are only accessible from functions in that level of the namespace. Continuing the example from above, if we added function:

myproj
  +utils
    private
      mysecret.m

then only functions under +utils/ can see and use mysecret.m function. mysecret() is used directly, without import since it’s only visible to functions at that directory level.

Matlab .mltbx toolboxes became available in R2016a. The Matlab-proprietary toolbox format also allows end users to create their own packages containing code, examples and even graphical Apps. In effect .mltbx provides metadata and adds the package to the bottom of Matlab path upon installation. The installation directory is under (system specific)/MathWorks/MATLAB Add-Ons/Toolboxes/packageName. Whether or not the project uses .mltbx, the namespace of the project is kept cleaner by using a Matlab package layout.

Python ImportError vs. ModuleNotFoundError

Python raises ModuleNotFoundError when a Python module cannot be found. This exception covers a much narrower range of faults than its parent exception ImportError.

Although we often like to make exception handling more specific, we have found as a practical matter that using ImportError in a try: except: exception handler is almost always the most appropriate choice. This is particularly true for modules like h5py that require compiled-language interfaces. For example, h5py relies on the HDF5 library. We have found a small percentage of systems with conflicting HDF5 library versions on the system path, which causes h5py to raise ImportError. In these cases, we usually wish to detect that the imported module is ready to work, not just whether it is found or not.

Example

For Python imports loading compiled-language modules, the following is generally recommended:

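try:
    import h5py
except ImportError as e:
    h5py = None
    # Python unbinds "e" when the except block ends, so keep a reference
    h5py_import_error = e

def myfun():
    if h5py is None:
        raise ImportError(f"myfun() requires h5py, which failed to import with error {h5py_import_error}")
    # rest of myfun()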
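
Because ModuleNotFoundError is a subclass of ImportError, a handler for ImportError also covers the module-not-found case. A minimal sketch, where try_import and the module name no_such_module_for_demo are hypothetical names chosen for illustration:

```python
import importlib


def try_import(name):
    """Return (module, None) on success, or (None, exception) on any import failure."""
    try:
        return importlib.import_module(name), None
    except ImportError as exc:
        # hand the exception back to the caller so it outlives this block
        return None, exc


# hypothetical module name, assumed not to be installed
mod, err = try_import("no_such_module_for_demo")
print(type(err).__name__)  # ModuleNotFoundError

# a module that is present imports normally
math_mod, math_err = try_import("math")
```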

Matlab .empty array initialization

Many coding languages have objects that are useful as sentinels to indicate missing data or unused parameters. For example, if a boolean parameter’s state has not been checked, it’s a bit disingenuous to say available = false. Sentinel values give a state that can communicate that something is unknown or unset.

Some example sentinels:

In Matlab an empty array can be used as a sentinel.

Create empty arrays for Matlab data types by appending .empty to the data type name. Examples:

datetime.empty
string.empty
struct.empty

The commonly used isempty() works for any Matlab type:

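function out = myfun(cfg)
arguments
  cfg struct = struct.empty
end

if isempty(cfg)
  % index explicitly: dot assignment on an empty struct array is an error
  cfg(1).lims = [0,100];
end

% placeholder: return the resolved configuration
out = cfg;
end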
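
For comparison, the same pattern in Python typically uses None as the sentinel. This is a minimal sketch; the function name and the lims key simply mirror the Matlab example and are otherwise arbitrary.

```python
from typing import Optional


def myfun(cfg: Optional[dict] = None) -> dict:
    # None means "not provided"; an empty dict is still a value the caller chose
    if cfg is None:
        cfg = {"lims": [0, 100]}
    return cfg


print(myfun())  # {'lims': [0, 100]}
```

One difference from isempty(): the check "is None" distinguishes an omitted argument from an explicitly passed empty container.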

Detect Linux distro version

Windows and macOS have few maintained versions due to their commercial nature. In contrast, FOSS operating systems like BSD and Linux have hundreds of maintained distros. A few Linux distros dominate, such as Debian, Ubuntu and Red Hat. Less common Linux distros are often based on these popular distros.

For certain system management tasks and install scripts it’s useful to programmatically identify which major Linux distro family the current OS belongs to. A standard method to detect the Linux operating system version is via the plaintext file /etc/os-release. Prior to this de facto standard, older Linux distros used other files.

The algorithm we use to identify older and newer Linux distros is to first check the file “/etc/os-release” versus known distros. If “/etc/os-release” is not present, the algorithm looks for OS-specific files:

Distro        ID file
Red Hat       /etc/redhat-release
Debian        /etc/debian_version
Ubuntu        /etc/debian_version
Arch Linux    /etc/arch-release
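
The algorithm described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact implementation we use: the family names, the sets of ID values in KNOWN_FAMILIES, and the function names are illustrative, and FALLBACK_FILES lists only a subset of the ID files to keep the example short. The root parameter exists only so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional

# Assumed mapping from os-release ID / ID_LIKE values to a distro family
KNOWN_FAMILIES = {
    "redhat": {"rhel", "fedora", "centos"},
    "debian": {"debian", "ubuntu"},
    "arch": {"arch"},
}

# Fallback ID files for older systems (subset, for illustration)
FALLBACK_FILES = {
    "redhat-release": "redhat",
    "debian_version": "debian",
}


def parse_os_release(text: str) -> dict:
    """Parse KEY=value lines, skipping blanks and comments, removing quotes."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key.strip()] = value.strip().strip("\"'")
    return info


def distro_family(root: Path = Path("/")) -> Optional[str]:
    """Return the distro family name, or None if it is not recognized."""
    etc = Path(root) / "etc"
    os_release = etc / "os-release"
    if os_release.is_file():
        info = parse_os_release(os_release.read_text())
        # ID_LIKE is a space-separated list of related distros
        ids = [info.get("ID", ""), *info.get("ID_LIKE", "").split()]
        for family, names in KNOWN_FAMILIES.items():
            if any(i in names for i in ids):
                return family
        return None
    # os-release is absent: look for distro-specific files
    for filename, family in FALLBACK_FILES.items():
        if (etc / filename).is_file():
            return family
    return None


print(distro_family())
```

Checking ID_LIKE as well as ID lets derivative distros resolve to the family they are based on without listing each one.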