Scientific Computing

Build executable from Python

We often use executables from Python with data transfer via:

  • stdin/stdout (small transfers, less than a megabyte)
  • temporary files (arbitrarily large data)

This provides a language-agnostic interface that we can also use from other scripting languages like Matlab or Julia, future-proofing our efforts at the price of some runtime efficiency due to the out-of-process data transfer.
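As a minimal sketch of the two transfer styles above, the Python interpreter itself stands in for a compiled executable here. The names STDIN_EXE, FILE_EXE, run_via_stdin and run_via_tempfile are hypothetical and used only for illustration; a real program would define its own command line.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-ins for a compiled executable (assumption for this sketch):
# one filter that upper-cases stdin, one that reads and writes named files.
STDIN_EXE = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
FILE_EXE = [sys.executable, "-c",
            "import sys; open(sys.argv[2], 'w').write(open(sys.argv[1]).read().upper())"]


def run_via_stdin(text: str) -> str:
    """Small transfers: send data on stdin, read the result from stdout."""
    ret = subprocess.run(STDIN_EXE, input=text, capture_output=True,
                         text=True, check=True)
    return ret.stdout


def run_via_tempfile(text: str) -> str:
    """Large transfers: exchange data through files in a temporary directory."""
    with tempfile.TemporaryDirectory() as tmp:
        infile = Path(tmp) / "input.txt"
        outfile = Path(tmp) / "output.txt"
        infile.write_text(text)
        subprocess.run(FILE_EXE + [str(infile), str(outfile)], check=True)
        return outfile.read_text()
```

Using check=True raises an exception on a non-zero return code, so a failed executable is not silently ignored.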

Here is a snippet we use to compile an executable from a single C source file from Python (from the GeoRINEX program):

"""
Save this in, say, src/mypkg/build.py
then from the code that needs the output executable, say "myprog.bin":

from pathlib import Path
from .build import build
exe = Path("src/myprog.bin")
...
if not exe.is_file():
    build(Path("src/myprog.c"))
# code that passes data via stdin/stdout and/or files using subprocess.run()

"""
import subprocess
import shutil
from pathlib import Path


def build(src: Path, cc: str = None) -> int:
    """

    Parameters
    ----------

    src: pathlib.Path
        path to single C source file.
    cc: str, optional
        desired compiler path or name

    Returns
    -------

    ret: int
        return code from compiler (0 is success)
    """
    if cc:
        return do_compile(cc, src)

    compilers = ["cc", "gcc", "clang", "icx", "clang-cl"]
    ret = 1
    for cc in compilers:
        if shutil.which(cc):
            ret = do_compile(cc, src)
            if ret == 0:
                break

    return ret


def do_compile(cc: str, src: Path) -> int:
    if not src.is_file():
        raise FileNotFoundError(src)

    if cc.endswith("cl"):  # msvc-like
        # /Fe: sets the output executable path
        cmd = [cc, str(src), f"/Fe:{src.with_suffix('.bin')}"]
    else:
        cmd = [cc, str(src), "-O2", f"-o{src.with_suffix('.bin')}"]

    # return code is 0 on success
    return subprocess.run(cmd).returncode

LTE cellular smartwatch RF signal performance

LTE smartwatches may get up to 90% of the communications range of a smartphone. Most providers have turned off (or are turning off) 2G and 3G so coverage may be dynamic. Generally Bluetooth headsets can be used with LTE smartwatches, which helps call quality for any phone device.

Mobile devices including smartwatches may switch frequency bands when going from idle to a phone call or data usage. The status bar indicator shows the network mode in use:

  • E: 2G EDGE, the oldest digital network mode still in use, very slow.
  • H: 3G HSPA/HSPA+, good enough for basic web browsing and email.
  • 4G: really good 3G. Carriers may call their upgraded 3G networks 4G.
  • LTE: actually using 4G LTE.
  • 5G: not necessarily faster than LTE in NSA (non-standalone) mode, but can be much faster in SA (standalone) mode.

The signal bars may jump up/down a few notches when going from idle to active due to the phone band switching e.g. 700 MHz vs. 1900 MHz. Apps like Network Cell Info can help reveal these behaviors.

CI Python package install

For continuous integration, it’s important to test the traditional package install

pip install .

along with the more commonly used in situ pip development mode

pip install -e .

Otherwise, the Python package install may depend on files not included in the MANIFEST.in file and fail for most end users, who don’t use the “pip install -e” option.

A particular failure this will catch on Windows CI is a MANIFEST.in line like graft path/to/ where the trailing / fails on Windows only.
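The trailing-slash problem can also be flagged before CI runs. The following is only a sketch: the helper name trailing_slash_lines is hypothetical, and it assumes that checking the graft and prune directives is enough for the project at hand.

```python
def trailing_slash_lines(manifest_text: str) -> list[tuple[int, str]]:
    """Return (line number, line) for graft/prune directives ending in a slash."""
    bad = []
    for num, line in enumerate(manifest_text.splitlines(), start=1):
        words = line.split()
        # directives look like "graft path/to" or "prune path/to"
        if len(words) >= 2 and words[0] in ("graft", "prune"):
            if words[1].endswith(("/", "\\")):
                bad.append((num, line.strip()))
    return bad


example = "include README.md\ngraft path/to/\ngraft docs\n"
print(trailing_slash_lines(example))
```

This complements, but does not replace, running the non-editable pip install in CI.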

Get CPU count from Python

The psutil package gives Python access to many system parameters, including CPU count. We recommend using a recent version of psutil to cover more computing platforms.

Ncpu = psutil.cpu_count(logical=False)

usually gives the physical CPU count. It can return None if the count cannot be determined.

psutil uses both Python and compiled C code to determine CPU count; it’s not just a simple Python script.
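Since psutil is a third-party package and the physical count is not always available, a fallback can be useful. This is a minimal sketch; the function name physical_cpu_count is hypothetical, and falling back to the logical count from os.cpu_count() is an assumption that may not suit every workload.

```python
import os


def physical_cpu_count() -> int:
    """Physical core count if psutil can determine it, else the logical count."""
    try:
        import psutil  # third-party; treated as optional in this sketch

        count = psutil.cpu_count(logical=False)
    except ImportError:
        count = None
    # psutil may return None; os.cpu_count() may also return None
    return count or os.cpu_count() or 1


print(physical_cpu_count())
```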


Related: Matlab CPU count

CMake RESOURCE_LOCK vs. RUN_SERIAL advantages

CMake (via CTest) can run tests in parallel. Some tests must not run in parallel, for example tests using MPI that use lots of CPU cores, tests that use a lot of RAM, or tests that must access a common file or hardware device. We found that when fixtures are used, RUN_SERIAL makes whole groups of tests run sequentially instead of only the individual tests. That is, all the FIXTURES_SETUP tests run, then all the FIXTURES_REQUIRED tests that have RUN_SERIAL. This is not necessarily desired, because we had consuming tests that needed only their own fixture and did not have to wait for all the fixtures to be set up.

We found that using RESOURCE_LOCK did not suffer from this issue, and allows the proper test dependencies and the expected parallelism.

CMake Resource Groups are orthogonal to Resource Locks, and are much more complicated to use. There may be some systems that would benefit from Groups, but many can just use the simple Locks.


For simplicity, this example omits the necessary add_test() calls and just shows the properties.

The test has an MPI-using quick setup “Quick1” and then a long test “Long1” also using MPI. Finally, we have a quick Python script “Script1” checking the output.

In the real setup, we have Quick1, Quick2, … QuickN and so on. When we used RUN_SERIAL, we had to wait for ALL Quick* before Long* would start. With RESOURCE_LOCK the tests intermingle, making better use of CPU particularly on large CPU count systems, and with lots of tests.

The name “cpu_mpi” is arbitrary like the other names.

set_property(TEST Quick1 PROPERTY RESOURCE_LOCK cpu_mpi)
set_property(TEST Quick1 PROPERTY FIXTURES_SETUP Q1)

set_property(TEST Long1 PROPERTY RESOURCE_LOCK cpu_mpi)
set_property(TEST Long1 PROPERTY FIXTURES_REQUIRED Q1)
set_property(TEST Long1 PROPERTY FIXTURES_SETUP L1)

set_property(TEST Script1 PROPERTY FIXTURES_REQUIRED L1)

Put Git revision in executable or library

Traceability of a binary artifact such as an executable or library can be improved by writing information about the Git repository status into the artifact itself. This is a finer-grained implementation of the version number we are accustomed to seeing in the command line interface of executables. This example doesn’t cover every possible thing to be traced, for example non-version controlled artifacts that are linked in. This example just covers the Git repo of the current CMake project. Nonetheless, those needing more advanced traceability can build upon this example.

Usually for easier reuse across projects, we put this in a separate CMake script file like gitrev.cmake and include it from the main CMake project.

See the example in Fortran2018-examples repo.

Numpy N-D rot90 flip

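Rotating arrays in Python (and Matlab) is straightforward, even for N-dimensional arrays.

Numpy rot90 rotates N-dimensional arrays in the plane given by axes, from the first axis towards the second: counterclockwise (k positive) or clockwise (k negative) for a 2-D array with the default axes.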

numpy.rot90(data, k=2, axes=(1,2))

Matlab rot90 can also handle N-dimensional arrays, but doesn’t allow control over the rotated axes as Numpy does.

Numpy flip works for N-dimensional arrays along the specified axis, just like Matlab flip.
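A short sketch of these calls on small arrays follows. The array contents are arbitrary illustration values.

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])

ccw = np.rot90(a)       # k=1: counterclockwise -> [[2, 4], [1, 3]]
cw = np.rot90(a, k=-1)  # k=-1: clockwise -> [[3, 1], [4, 2]]

# 3-D data: rotate each 2-D slice by 180 degrees in the plane of axes 1 and 2
data = np.arange(24).reshape(2, 3, 4)
rot = np.rot90(data, k=2, axes=(1, 2))

# a 180 degree rotation is the same as flipping both axes of that plane
same = np.array_equal(rot, np.flip(data, axis=(1, 2)))
print(ccw, cw, same, sep="\n")
```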

Python asyncio.run boilerplate

Concurrency is built into Python via asyncio. Asynchronous generators are implemented with yield much like synchronous generators, and async for simplifies the expression of asynchronous for loops.

As in Julia, writing asynchronous structures in Python does not by itself cause concurrent execution. Concurrent execution in Python is governed by collections of tasks or futures such as asyncio.gather and initiated by a runner such as asyncio.run.
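A minimal boilerplate sketch of this pattern is below. The coroutine names worker and main and the delay values are hypothetical; asyncio.sleep stands in for real I/O.

```python
import asyncio


async def worker(name: str, delay: float) -> str:
    """Placeholder for an I/O-bound task."""
    await asyncio.sleep(delay)
    return name


async def main() -> list:
    # gather schedules the coroutines concurrently and
    # returns results in the order the awaitables were passed
    return await asyncio.gather(worker("a", 0.02), worker("b", 0.01))


if __name__ == "__main__":
    print(asyncio.run(main()))
```

Calling worker() alone only creates a coroutine object; nothing runs until it is awaited or scheduled by the runner.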

asyncio.subprocess

AsyncIO subprocess may need specific asyncio loop configuration. The options needed are not the same for every project, depending on the asynchronous functions used.

Example date_coro.py uses AsyncIO subprocess.
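For orientation, here is a minimal sketch of an AsyncIO subprocess call. It is not the date_coro.py code; the function name python_version is hypothetical, and the Python interpreter is used as the child program so the sketch does not depend on other executables. It relies on the default event loop, which may not be sufficient for every project.

```python
import asyncio
import sys


async def python_version() -> str:
    """Run the current interpreter with --version and capture its output."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "--version",
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, stderr = await proc.communicate()
    # use whichever stream has the text
    return (stdout or stderr).decode().strip()


if __name__ == "__main__":
    print(asyncio.run(python_version()))
```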

asyncio.open_connection

For networking apps, asyncio.open_connection allows a large number of concurrent connections, as shown in findssh.
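The idea can be sketched as a concurrent port check. This is not the findssh implementation; the names is_port_open and scan and the one second default timeout are assumptions for illustration.

```python
import asyncio


async def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        reader, writer = await asyncio.wait_for(
            asyncio.open_connection(host, port), timeout
        )
    except (OSError, asyncio.TimeoutError):
        return False
    writer.close()
    await writer.wait_closed()
    return True


async def scan(host: str, ports: list) -> dict:
    """Check many ports concurrently; results keep the order of ports."""
    results = await asyncio.gather(*(is_port_open(host, p) for p in ports))
    return dict(zip(ports, results))


if __name__ == "__main__":
    print(asyncio.run(scan("127.0.0.1", [22, 80, 443])))
```

Because each connection attempt mostly waits on the network, many attempts can be in flight at once within a single thread.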

Matlab package import like Python

Matlab users can share code projects as toolboxes and/or packages. Matlab packages work for Matlab ≥ R2008a as well as GNU Octave. Matlab toolboxes work for Matlab ≥ R2016a and not GNU Octave. The package format brings benefits to toolboxes as well.

Matlab namespaces: a key issue with Matlab vs. Python arises because Matlab users often add many paths for their project. If any function names clash, there can be unexpected behavior, as it’s not clear which function is being used without investigating the path ordering. As in Python and other languages, there is considerable benefit to using a package format where the function names are specified in their namespace.

addpath example: suppose a project without the package format has this directory structure:

myproj
  utils
    mem1.m
  conversion
    deg1.m
  sys
    disk1.m

To use these functions, the end users do:

addpath(genpath('myproj'))

This is where the namespace can have clashes, and with large projects it’s not clear where a function is without further introspection.

package example: make this project a Matlab / Octave package by changing the subdirectories containing .m files to start with a “+” plus symbol:

myproj
  +utils
    mem1.m
  +conversion
    deg1.m
  +sys
    disk1.m

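The end users simply:

addpath('myproj')

and access specific functions like:

utils.mem1(arg1)

The package name starts at the first “+” directory below the folder on the path. To use a fully qualified name like myproj.utils.mem1 instead, rename the top-level directory to +myproj and add its parent directory to the path.

Then multiple subdirectories can have the same function name without clashing in the Matlab namespace. Suppose the function “mem1” is used frequently in another function. To avoid typing the fully resolved function name each time, use the import statement:

function myfunc()
  import utils.mem1
  mem1(arg1)
  mem1(arg2)
end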

Private functions: Matlab packages can have private functions that are only accessible from functions in that level of the namespace. Continuing the example from above, if we added function:

myproj
  +utils
    private
      mysecret.m

then only functions under +utils/ can see and use mysecret.m function. mysecret() is used directly, without import since it’s only visible to functions at that directory level.

Matlab .mltbx toolboxes became available in R2016a. The Matlab-proprietary toolbox format also allows end users to create their own packages containing code, examples and even graphical Apps. In effect .mltbx provides metadata and adds the package to the bottom of Matlab path upon installation. The installation directory is under (system specific)/MathWorks/MATLAB Add-Ons/Toolboxes/packageName. Whether or not the project uses .mltbx, the namespace of the project is kept cleaner by using a Matlab package layout.