HDF5 command line tools

The HDF5 command line tools h5dump and h5ls are handy for quickly exploring HDF5 files from the command line. They are particularly useful when accessing a remote computer such as an HPC system, where the HDF5 files may be very large and would take a while to transfer to a local computer.

h5ls provides a high-level look at objects in an HDF5 file. Typically we start examining HDF5 files like:

h5ls -r my.h5

h5dump can print the entire contents of an HDF5 file to the screen. This can be overwhelming, so we typically print only the headers to start:

h5dump -H my.h5

Individual variables can be printed like:

h5dump -d myvar my.h5
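
If only the metadata is of interest, h5dump can print attributes without the dataset contents (my.h5 is a placeholder file name, as above):

h5dump -A my.h5

Similarly, h5ls -v my.h5 gives more verbose per-object detail.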

Related: HDF5 data GUI

Download VTK test data

VTK test data can be useful for testing ParaView data flows and code, as a reference of known-good data. The VTK file I/O documentation itself recommends using the example data files to test one’s own project software. A straightforward way to obtain the data is by “building” a single ExternalData target, where virtually all of the time is spent downloading.

git clone --recursive https://gitlab.kitware.com/vtk/vtk.git --depth 1

cmake -S vtk -B build -DVTK_BUILD_TESTING=ON

cmake --build build -t VTKData

Numerous data folders are created under build/ExternalData/Testing/Data. For example, the VTKHDF sample data in HDF5 format is the file mandelbrot-vti.hdf.
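
To locate the downloaded files, for example all HDF5 files, a simple search works (paths assume the build layout above):

find build/ExternalData -name "*.hdf"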

Python directed dependency graphs

pyproject.toml specifies Python package prerequisites and typically all Python metadata. This helps security by allowing extremely fast recursive machine-parsing of prerequisites without installing packages first. Generally, specify Python package prerequisites in pyproject.toml as much as possible.

Python packages should minimize the size of their directed dependency graph for best package longevity with minimum maintenance effort. However, the most effective use of programmer/scientist/engineer time generally comes from reusing code wherever appropriate. Evaluating the quality of prerequisites therefore matters, and the techniques below make visible what a package actually pulls in.

Long-term archiving of Python software requires capturing direct and indirect dependencies. This is commonly done by pip freeze, which provides no direct sense of the module hierarchy. The techniques described below provide a detailed, zoomable hierarchical view of Python module dependencies.
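
For comparison, the usual flat snapshot is captured like (the output file name is conventional):

pip freeze > requirements.txt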

Python dependency analysis where packages use setup.py to specify package prerequisites generally requires modules to be installed to determine their dependencies. That is, setup.py is recursively executed for each module to determine what modules are needed overall. This is bad for automated security analysis, which is slowed greatly by needing to install packages just to determine prerequisites. Modern Python packages solve this problem by specifying most package configuration in pyproject.toml.

Currently, pipdeptree is the most practical solution to generate plots of Python directed dependency graphs. This method assumes:

  • self-test has adequate coverage to be meaningful for most users
  • packages only used as convenience methods for some users are under [project.optional-dependencies] in pyproject.toml (see the sketch after this list)
  • strictly necessary modules are specified
  • minimum Python version is specified
  • CI-only requirements are specified
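
A minimal pyproject.toml sketch illustrating these assumptions; the package name, dependencies, and extras here are hypothetical:

[project]
name = "mypkg"
requires-python = ">=3.9"
dependencies = ["numpy"]

[project.optional-dependencies]
tests = ["pytest"]
plots = ["matplotlib"]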

The process below is targeted for packages used in “development mode”, that is, not installed into site-packages except for a link back to the code directory.

Install prereqs:

pip install virtualenv

In the Python package directory, create a new Python virtual environment, since pipdeptree depends on having only the analyzed package and its dependencies installed.

virtualenv testdep
. testdep/bin/activate

pip install "pipdeptree[graphviz]"

Install the package to examine (and whatever dependencies it automatically installs)

pip install -e .

Make a hierarchical dependency graph

pipdeptree

This should be a very short tree (unless testing with a big package). Try it with a simple package, checking whether the dependency list is as expected.

Now create the directed dependency graph for the package. Install GraphViz by:

  • Linux: apt install graphviz
  • macOS: brew install graphviz
  • Windows: e.g. choco install graphviz

and then:

pipdeptree --graph-output svg > dep.svg

View the SVG in a web browser or image viewer software such as IrfanView.

Wrap up the previous discussion and scripts in this Bash script pydeptree.sh for a one-click Python dependency graph.

#!/usr/bin/env bash

set -o errexit

# optionally cd to a package directory given as the first argument
[[ -n $1 ]] && cd "$1"

virtualenv testdep     # it's OK if it already exists

. testdep/bin/activate

pip install "pipdeptree[graphviz]"

pip install -e ".[tests]"

pipdeptree --graph-output svg > dep.svg

deactivate   # a shell function defined by activate, not a file to source

eog dep.svg &  # image viewing program
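
Usage, assuming the script is executable and given a hypothetical package path:

./pydeptree.sh ~/code/mypkg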

Notes

In my opinion, other dependency graph modules are not yet ready to use due to the deficiencies noted below; they are included for reference.

Modulegraph is an established, maintained tool for creating a .dot dependency graph, but its output is extremely verbose: almost all of it is system stdlib modules. To make Modulegraph useful, the .dot output must be post-processed, for example with pydot. Alternatively, one could preemptively exclude a list of known stdlib modules, removing perhaps 98% of modulegraph output from the start.

pip install modulegraph

Examine a file’s requirements, creating a .dot graph.

python -mmodulegraph file.py -q -g > graph.dot
dot -Tsvg graph.dot > graph.svg
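
As a crude sketch of the stdlib-exclusion idea, a few known stdlib module names can be filtered out of the .dot file before rendering; the module list here is a tiny illustrative subset, not exhaustive:

grep -vE '"(os|sys|re|io|typing|collections)"' graph.dot > graph_filtered.dot
dot -Tsvg graph_filtered.dot > graph_filtered.svg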

Modulegraph has further command line options for controlling its output.


Snakefood is another dependency graph checker.


Dependency graphs are also easily created in Matlab and Fortran.

macOS Terminal key shortcuts

Regardless of the macOS Terminal shell, the key bindings are generally distinct from Linux terminal emulators. It is possible to use bindkey commands in ~/.zshrc to make macOS Terminal key shortcuts work like Linux terminal emulators. Learning the macOS Terminal default key shortcuts can be useful when at another person’s Mac laptop.
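
For example, a few bindkey lines in ~/.zshrc can remap common motions. The escape sequences below are typical but vary with terminal configuration (for instance, whether Option is set to act as Meta), so verify them before relying on them:

# hypothetical ~/.zshrc bindings; sequences depend on terminal settings
bindkey "^[[H" beginning-of-line   # Home
bindkey "^[[F" end-of-line         # End
bindkey "^[f" forward-word         # Option-Right, with Option as Meta
bindkey "^[b" backward-word        # Option-Left, with Option as Meta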

C / C++ exit status macro

By convention, exit status uses integer zero to represent “OK” no-error status; non-zero status is generally considered a failure. Languages such as Fortran also have built-in syntax to manage the program exit code returned to the operating system.

C++ defines EXIT_SUCCESS and EXIT_FAILURE macros in header cstdlib. C defines EXIT_SUCCESS and EXIT_FAILURE macros in header stdlib.h.

Because commonly included headers often themselves #include <cstdlib> or #include <stdlib.h>, developers may not realize these exit status macros must be included somewhere. As compilers transition to providing the standard library via C++20 modules and generally clean up excessive includes in built-in headers, code may suddenly complain about missing exit status macros at build time.

We feel it’s a good practice to use exit status macros as a findable and readable indication that program flow is ending and returning to the system. A best practice is to include the appropriate header in any code file where the exit status macros are used.

In C:

#include <stdlib.h>

int main(void){
  return EXIT_SUCCESS;
}

In C++:

#include <cstdlib>

int main(){
  return EXIT_SUCCESS;
}
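
To observe the exit status, compile and run either example and echo the shell’s last status; the file name is a placeholder:

gcc main.c -o main
./main
echo $?   # prints 0, the value of EXIT_SUCCESS on typical systems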

Anaconda Python clean unused packages

The conda clean command cleans up cached downloads and unused packages. Over time, Conda’s cache can grow to several gigabytes, even more than 10 GB of disk storage. To clean the unused files, use a command like:

conda clean --all --verbose
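
To preview what would be removed without deleting anything, conda clean has a dry-run option:

conda clean --all --dry-run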

Zoom Windows 64-bit client

Many Windows users of Zoom may have downloaded the 32-bit Zoom client. We have observed that manually reinstalling the 64-bit Zoom Windows client can significantly help avoid choppiness of Zoom audio/video under high CPU usage. In most cases, it’s good to use the 64-bit version of a program on a 64-bit CPU and operating system. For Zoom, as with many other programs, if the 32-bit version is installed, the updater doesn’t switch to the 64-bit version. The user typically has to manually reinstall the 64-bit version once; from then on, updates stay with the 64-bit program. Latency-sensitive 32-bit Windows programs may not work as well as 64-bit programs on 64-bit Windows, due to WOW64 emulation of 32-bit programs.

CMake presets build options

CMake presets can include build presets with options specific to a particular build tool. The CMakePresets.json below shows example build presets for Ninja. The preset name “ninja” in this example is arbitrary. As usual, presets help avoid copying script parameters and the possibility of typos in duplicated script code, whether for CI or developers themselves.

For example, to have ninja “explain” why a target is dirty:

cmake --build --preset explain

with this CMakePresets.json:
{
  "version": 2,
  "configurePresets": [
  {
    "name": "ninja",
    "generator": "Ninja",
    "binaryDir": "${sourceDir}/build"
  }
  ],
  "buildPresets":[
    {
      "name": "explain",
      "configurePreset": "ninja",
      "nativeToolOptions": ["-d", "explain"]
    },
    {
      "name": "keep",
      "configurePreset": "ninja",
      "nativeToolOptions": ["-d", "keeprsp", "-d", "keepdepfile"]
    },
    {
      "name": "stats",
      "configurePreset": "ninja",
      "nativeToolOptions": ["-d", "stats"]
    }
  ]
}
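
The configure preset must be applied before the build presets can run; a typical sequence from the project directory:

cmake --preset ninja
cmake --build --preset stats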

Git pull HTTPS, push SSH

For many public Git repos, using HTTPS for “git fetch”, “git pull”, and other Git download operations has adequate security. A primary concern when downloading public Git content is verifying the content is genuinely from the desired author. A reasonable degree of confidence can be achieved by downloading over HTTPS and verifying the author’s PGP-signed Git commits. Git download operations over HTTPS are perhaps twice as fast as Git over SSH and use less CPU. Typically it is desired for Git to verify SSL certificates:

git config --global http.sslVerify true
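
Signed commits can then be checked, for example on the latest commit (this requires the author’s public key in the local GPG keyring):

git verify-commit HEAD
git log --show-signature -1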

When pushing Git commits, SSH can provide enhanced security. Since “git push” operations typically take longer than “git pull”, particularly where pre-commit hooks and PGP commit signing are used, the SSH speed penalty on “git push” is often acceptable.

For developers, there are speed benefits from a hybrid Git configuration where Git downloads use HTTPS and Git uploads use SSH. Git has intrinsic functionality for this setup in its global configuration. The one-time setup below uses “https://” for the remote repo URL instead of “ssh://”. To upgrade existing local public repos, edit an individual repo’s Git config by:

git -C <repo_dir> config --edit

To set globally (normally we do this):

git config --global url."ssh://git@github.com/".pushInsteadOf https://github.com/

git config --global url."ssh://git@gitlab.com/".pushInsteadOf https://gitlab.com/

This makes all GitHub and GitLab public repos push over SSH, unless overridden in a repo’s own Git config. Confirm by git remote -v in a repo.
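
With this configuration, git remote -v should show HTTPS for fetch and SSH for push, along the lines of (hypothetical user/repo):

origin  https://github.com/user/repo (fetch)
origin  ssh://git@github.com/user/repo (push)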


If experiencing problems on “git push”, check that this matches the desired Git repo:

git config --get remote.origin.url

In particular, it must NOT have a trailing slash like repo/

CMake and Meson compiler option family

On Windows, compiler option syntax is generally MSVC-like or GCC-like. The compiler option families are:

  • MSVC-like: Intel oneAPI (IntelLLVM), Clang-CL, Visual Studio
  • GCC-like: GCC, Clang, and compilers on non-Windows OS

Meson and CMake can each detect compiler option family. In general, the “else” branch would have further nested “if” to handle compiler options with distinct syntax.

CMake uses the MSVC variable to detect compiler option family. More specific compiler option selection is often handled with an if-else tree for each compiler ID and / or check_c_compiler_flag().

cmake_minimum_required(VERSION 3.1)

project(Hello LANGUAGES C)

if(MSVC)
  message(STATUS "${CMAKE_C_COMPILER_ID} is MSVC-like")
else()
  message(STATUS "${CMAKE_C_COMPILER_ID} is GCC-like")
endif()
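
Configuring prints the status message; from the directory containing this CMakeLists.txt:

cmake -B build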

Meson uses the compiler object’s get_argument_syntax() method to detect compiler option family. Getting more specific compiler options can be done with an if-else tree for each compiler ID and / or the has_argument() method.

project('hello', 'c')

cc = meson.get_compiler('c')

if cc.get_argument_syntax() == 'msvc'
  message(cc.get_id() + ' is MSVC-like')
else
  message(cc.get_id() + ' is GCC-like')
endif
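
Likewise for Meson, from the directory containing this meson.build:

meson setup build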