Scientific Computing

Install JDK

Install JDK by:

  • Windows: winget install Microsoft.OpenJDK.21 (find exact version by winget search Microsoft.OpenJDK).
  • macOS: brew install openjdk
  • Linux: apt install default-jdk

Restart Terminal, ensure that echo $JAVA_HOME or PowerShell echo $env:JAVA_HOME is not empty.

Python 3.12 Apple App Store conflict

LWN.net reports on changes to Python 3.13 urllib standard library. It was deduced that Apple rejected Python 3.12 apps due to a string in the Python stdlib that was rejected, regardless of code execution. There naturally was some very good discussion linked to in the LWN.net article that illustrates the conflict between closed commercial platforms with great financial might and open source software. The Python 3.13 patch has already been merged. A pull request backport patch for Python 3.12 has also been created, and illustrates the clean nature of the patch and new configure flag.

Github Actions dynamic environment variables

GitHub Actions jobs can have a run: step that dynamically sets environment variables for following steps in that job.

Dynamic job environment variables (such as appending to PATH) are done by writing to environment files.

Append to PATH

To add “~/.local/bin” to PATH with job scope, do like the following examples (distinct behavior for Windows).

The steps after the “run:” stanzas have the new value for the environment variable.

Linux / macOS:

- run: echo "${HOME}/.local/bin" >> $GITHUB_PATH

Windows defaults to PowerShell, so the syntax is distinct from Unix shells:

- run: echo "${HOME}/.local/bin" | Out-File -FilePath $env:GITHUB_PATH -Encoding utf8 -Append

Set arbitrary environment variable

Writing to environment files can set any environment variable. For example, to set environment variable “CMAKE_INSTALL_PREFIX” and “CMAKE_PREFIX_PATH” to “~/libs” for the following job steps:

Linux / macOS:

- run: echo "CMAKE_INSTALL_PREFIX=~/libs" >> $GITHUB_ENV

- run: echo "CMAKE_PREFIX_PATH=~/libs" >> $GITHUB_ENV

Windows:

- run: echo "CMAKE_INSTALL_PREFIX=$HOME/libs" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append

- run: echo "CMAKE_PREFIX_PATH=$HOME/libs" | Out-File -FilePath $env:GITHUB_ENV -Encoding utf8 -Append

Note: since jobs don’t normally interact without additional options, advanced techniques would be needed to make a dynamic workflow environment variable.

Static environment variables in GitHub Actions

GitHub Actions environment variables have distinct scopes:

  • Workflow
  • Job
  • Step

It’s trivial to set static environment variables in each of these scopes. Dynamically setting environment variables is also possible.

Workflow

Set static workflow environment variables in GitHub Actions by using env: at the top level of a “.github/workflows/ci.yml” file like:

name: ci

env:
  CTEST_PARALLEL_LEVEL: 0
  CMAKE_BUILD_PARALLEL_LEVEL: 4
  CMAKE_GENERATOR: Ninja
  CC: gcc

Job

Static job environment variables are set like:

jobs:

  base:
    runs-on: macos-latest

    strategy:
      matrix:
        cc: [gcc-13, clang]

    env:
      CMAKE_GENERATOR: Ninja
      CC: ${{ matrix.cc }}

Step

Set static step environment variables like:


    - run: cmake -B build
      env:
        CMAKE_GENERATOR: Ninja

Detect if program was compiled with optimizations

Users and developers might accidentally build a program or library without optimizations when they are desired. This could make the runtime 10 to 1000 times or more slower than it would be with optimizations. This could be devastating in computational cost on HPC and cause needless schedule delays. Programmatically detecting or using a heuristic to determine if a program was built with optimizations can help prevent this. Such methods are language-specific.

  • CMake, NDEBUG is set if CMAKE_BUILD_TYPE is Release or RelWithDebInfo.
  • Meson: NDEBUG is set if buildtype is release or debugoptimized with
project(..., default_options: ['b_ndebug=if-release'])

C / C++

There is currently no universal language standard method in C / C++ to determine if optimization was used on build. The presence of macro NDEBUG is used by the standard library to disable assertions. One could use if NDEBUG is defined as an indication if optimizations were used.

bool fs_is_optimized(){
// This is a heuristic, trusting the build system or user to set NDEBUG if optimized.
#if defined(NDEBUG)
  return true;
#else
  return false;
#endif
}

Fortran

If the Fortran code is compiled with preprocessing, a method using NDEBUG as above could be used. Fortran iso_fortran_env provides functions compiler_version and compiler_options. These could be used in a fine-grained, per compiler way to determine if optimizations were used.

Python

Distributed Python environments would virtually always be optimized. One can use heuristic checks to help indicate if the Python executable was built in debug mode. I am not yet aware of a universal method to determine if the CPython executable was built with optimizations.

import sysconfig

debug = bool(sysconfig.get_config_var('Py_DEBUG'))

HDF5 command line tools

HDF5 command line tools h5dump and h5ls are handy to quickly explore HDF5 files from the command line. Backup link to old documentation. They are particularly useful when accessing a remote computer such as HPC where the HDF5 files may be very large and would take a while to transfer to a local computer.


h5ls provides a high-level look at objects in an HDF5 file. Backup link to old documentation. Typically we start examining HDF5 files by printing the dataset hierarchy:

h5ls --recursive my.h5

Determine the filters used (e.g. was the data compressed):

h5ls --verbose my.h5

h5dump can print the entire contents of an HDF5 file to the screen. Backup link to old documentation. This can be overwhelming, so we typically print only the headers to start:

h5dump --header my.h5

Individual variables can be printed like:

h5dump --dataset=myvar my.h5

Determine the filters used (e.g. was the data compressed):

h5dump --properties --header --dataset=myvar my.h5

Related: HDF5 data GUI

C++ size_type property vs size_t

The C++ Standard Library uses size_type as a property of containers like std::vector, std::string, etc. This is generally recommended over using size_t directly.

Example C++ code snippets using size_type property:

std::vector<int> vec;

std::vector<int>::size_type L = vec.size();

//----------------------------------------------
std::string path = "/usr/bin:/usr/local/bin";
constexpr char pathsep = ':';

std::string::size_type start = 0;
std::string::size_type end = path.find_first_of(pathsep, start);

Related: ssize_t for Visual Studio

Install Intel oneAPI C++ and Fortran compiler

Intel oneAPI is a cross-platform toolset that covers several programming languages including C, C++, Fortran and Python. Intel oneAPI replaces Intel Parallel Studio. Intel oneAPI including the C++ “icpx” compiler, Fortran “ifx” compiler, and Intel MPI is free-to-use and no login is required to download oneAPI.

We suggest using the “online installer” download, which is a small download. The “online” installer can be copied over SSH to an HPC user directory for example and installed from the Terminal.

Windows requires Visual Studio Community to be installed first–IDE integration is optional and we don’t use it. Visual Studio integration is optional–if installed, cmake -G "Visual Studio 17 2022" can be used to generate Visual Studio project files.

Install the oneAPI Base Toolkit with options:

  • Math Kernel Library (oneMKL)
  • (optional) GDB debugger

Install oneAPI HPC toolkit with options:

  • Intel MPI library
  • Intel C++ compiler
  • Intel Fortran compiler

Usage

There are distinct usage patterns to access Intel oneAPI compilers on Windows vs. Linux.

Windows

On Windows a Start menu shortcut for a oneAPI command prompt is installed. Powershell can also use oneapi-vars.bat to set the environment variables.

If CMake Visual Studio generater is desired, ensure:

cmake -Bbuild -G "Visual Studio 17 2022" -T fortran=ifx

Linux

Set environment variables CC, CXX, FC via script

On Linux, oneAPI requires GNU GCC headers etc.. Some HPC systems have a too-old GCC version defaulting for Intel oneAPI. This can cause problems with C++ stdlib or other linking problems. If needed, set environment variable CXXFLAGS for Intel GCC toolchain if needed in custom “oneapi.sh” like:

export CXXFLAGS=--gcc-toolchain=/opt/rh/gcc-toolset-12/root/usr/

which can be determined like:

scl enable gcc-toolset-12 "which g++"

CI runners - stable vs. updated

CI runners across CI services often update software images regularly, perhaps weekly. This can break workflows, but reflects user devices.

GitHub Actions updates the runners weekly or so. A few times a year on average across projects and operating system this may require updating the CI YaML configuration. Apple updates of XCode a few times a year this can disrupt end users and CI runs.

To have a version stable CI image would generally require private on-premises CI like Jenkins or GitHub Actions for on-premises. Those on-premises CI runners then need maintenance.

The key issue with such frozen CI runners is they are out of date with what end users have. For example, macOS with Homebrew is probably the majority of scientific computing users besides HPC. Homebrew updates often and breaks occur across projects a few times a year. Better to catch that in CI rather than on end user devices.