Scientific Computing

Disable Visual Studio debug modal window

Visual Studio executables built in Debug mode by default pop up modal debug windows if an unhandled exception occurs. This can be annoying to developers, particularly when unit testing a project. On remote systems, modal windows can become a real issue if the window is accidentally off-screen; in such cases it can be hard to bring it back to the main desktop to close it.

Adding a few lines of code to the C++ program works around this issue by redirecting the error text to stderr instead of popping up the modal window. _CrtSetReportMode keeps the modal window from appearing; _CrtSetReportFile redirects the message text to stderr so that the message can be diagnosed.

This also matters for continuous integration systems such as GitHub Actions, which may otherwise hang on a modal dialog that no one can see.

#ifdef _MSC_VER
#include <crtdbg.h>
#endif

int main(){
#ifdef _MSC_VER
    // route CRT assert, warning and error reports to stderr
    // instead of popping up a modal debug window
    _CrtSetReportMode(_CRT_ASSERT, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_ASSERT, _CRTDBG_FILE_STDERR);
    _CrtSetReportMode(_CRT_WARN, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_WARN, _CRTDBG_FILE_STDERR);
    _CrtSetReportMode(_CRT_ERROR, _CRTDBG_MODE_FILE);
    _CrtSetReportFile(_CRT_ERROR, _CRTDBG_FILE_STDERR);
#endif

// rest of program
}

Visual Studio memory leak detection

Visual Studio can detect memory leaks in programs with _CrtDumpMemoryLeaks. Using these checks requires that the project be built in Debug mode.

Below is a minimal working example that prints the memory diagnostics with Visual Studio; the leak report goes through the _CRT_WARN channel, so the _CrtSetReportMode / _CrtSetReportFile calls from the previous section can redirect it to stderr. On Linux, Valgrind can be used to detect memory leaks. Numerous other free memory checkers are available and work with the CMake CTest frontend.

#ifdef _MSC_VER
#include <crtdbg.h>
#endif

#include <stdlib.h>

int main(void){
  char* c;

  c = malloc( 100 );
  // unfreed memory, a deliberate leak

  // near the end of the function to be checked
#ifdef _MSC_VER
  _CrtDumpMemoryLeaks();
#endif

  return 0;
}
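
Alternatively, instead of calling _CrtDumpMemoryLeaks before every return path, the CRT can be asked to dump leaks automatically at process exit via _CrtSetDbgFlag. A minimal sketch:

#ifdef _MSC_VER
#include <crtdbg.h>
#endif

#include <stdlib.h>

int main(void){
#ifdef _MSC_VER
  // query the current debug flags and add an automatic leak check at exit
  _CrtSetDbgFlag(_CrtSetDbgFlag(_CRTDBG_REPORT_FLAG) | _CRTDBG_LEAK_CHECK_DF);
#endif

  char* c = malloc( 100 );
  // unfreed memory, a deliberate leak

  return 0;
}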

Matplotlib AVI / MP4 movie

Matplotlib on any platform can use FFmpeg, Avconv or Mencoder to directly write lossy or lossless compressed movies created from sequences of plots.

Instead of creating hundreds of PNGs, or skipping plots and missing details, writing a Matplotlib movie of a large sequence of plots is highly effective for many processes that evolve across time and/or space.

Alternatively, convert a stack of PNGs to AVI, as shown below; however, it’s simpler, and often faster and more robust, to use Matplotlib.animation.
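
If a stack of PNGs already exists, a direct FFmpeg command is one option (a sketch assuming frames named plot0001.png, plot0002.png, …):

ffmpeg -framerate 15 -i plot%04d.png -c:v ffv1 out.avi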

Lossy

Quality: the default auto bitrate makes excessively compressed, blocky movies. Override the default auto bitrate with the following snippet:

import matplotlib.animation as anim

Writer = anim.writers['ffmpeg']
# large bitrate to avoid over-compression
writer = Writer(fps=15, codec='mpeg4', bitrate=1000000)

# fg is the figure handle, fn the output filename, 100 the DPI
with writer.saving(fg, fn, 100):
    # code to plot/update figure
    writer.grab_frame(facecolor='k')

Lossless

In matplotlib_writeavi.py, just four added lines of code do the AVI writing. The first line tells Matplotlib to use FFmpeg. The second line tells Matplotlib to make a lossless FFV1 video at 15 frames/sec; one can optionally use codec='mpeg4', but lossy encoding can wash out details of plots. The third line opens the output file at 100 DPI (smaller DPI, smaller file and movie size). The fourth line writes the current figure as a movie frame.

import matplotlib.animation as anim

# ...

Writer = anim.writers['ffmpeg']
writer = Writer(fps=15, codec='ffv1')
# ...
with writer.saving(fg, fn, 100):
    # ...
    writer.grab_frame(facecolor='k')

Troubleshooting

For problems playing back the .avi file, try omitting the codec='ffv1' parameter.

Minimum AVI frame rate: less than 3 fps can trigger bugs in VLC, which has trouble with slow frame rate video from any source.
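
To verify the codec and frame rate a file was actually written with, FFmpeg’s ffprobe prints the stream parameters:

ffprobe -hide_banner out.avi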

macOS terminal SSH locale

The macOS terminal defaults to UTF-8. When SSHing into a macOS computer from a non-macOS computer, or any computer with a different locale, there may be problems running programs on the remote machine where locale matters. For example, a Linux system with the “C” locale may cause .zip archive extraction on the remote macOS to fail like:

Pathname cannot be converted from UTF-8 to current locale.

Locally on the macOS computer (or using Remote Desktop over SSH), check locale with:

% locale

LANG="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_CTYPE="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_ALL=

while from Windows or Linux may result in:

% locale

LANG=""
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL=

Fix

We resolved this issue by creating a file “locale.sh” on the remote macOS computer containing:

export LANG="en_US.UTF-8"
export LC_COLLATE="en_US.UTF-8"
export LC_CTYPE="en_US.UTF-8"
export LC_MESSAGES="en_US.UTF-8"
export LC_MONETARY="en_US.UTF-8"
export LC_NUMERIC="en_US.UTF-8"
export LC_TIME="en_US.UTF-8"

then source it one time per session when needed:

source ~/locale.sh

This fixed an issue we had with CMake not extracting a .zip file for an ExternalProject URL, with the error noted at the top of this section.

Another workaround as noted above is to use Remote Desktop over SSH.
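
A further option, if the remote sshd allows it (many default sshd_config files include AcceptEnv LANG LC_*), is to have the SSH client forward the local locale. A sketch for the local ~/.ssh/config, where “mymac” is a hypothetical host alias; this only helps if the local locale is itself UTF-8:

Host mymac
  SendEnv LANG LC_*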

Python minimal package with pyproject.toml

Python packaging can be described in pyproject.toml alone per PEP 621. These packages are installable in live developer mode:

python -m pip install -e .

Or via PyPI like any other Python package. It can be most effective to put all project configuration, including Python package prerequisites, in pyproject.toml alone as a single source of truth. pyproject.toml is human-readable and machine-parseable without first installing the package. Putting all package metadata into pyproject.toml instead of setup.py gives benefits including:

  • reproducible results
  • security risk mitigation
  • dynamic prerequisite tree based on Python version etc.
  • static or dynamic package version

This is an example of a minimal pyproject.toml that works all alone; no other metadata files are required, except perhaps MANIFEST.in for advanced cases. The __version__ is contained in the file mypkg/__init__.py as Python code:

__version__ = "1.2.3"

pyproject.toml:

[build-system]
requires = ["setuptools>=61.0.0", "wheel"]
build-backend = "setuptools.build_meta"

[project]
name = "mypkg"
description = "really awesome package."
keywords = ["random", "cool"]
classifiers = ["Development Status :: 5 - Production/Stable",
 "Environment :: Console",
 "Intended Audience :: Science/Research",
 "Operating System :: OS Independent",
 "Programming Language :: Python :: 3",
]
requires-python = ">=3.7"
dynamic = ["version", "readme"]

[tool.setuptools.dynamic]
readme = {file = ["README.md"], content-type = "text/markdown"}
version = {attr = "mypkg.__version__"}
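
After the editable install shown above, a quick check that the dynamic version resolves from mypkg/__init__.py as intended:

python -c "import mypkg; print(mypkg.__version__)"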

PEP8 checking via flake8 is configured in .flake8:

[flake8]
max-line-length = 100
exclude = .git,__pycache__,doc/,docs/,build/,dist/,archive/
per-file-ignores =
  __init__.py:F401

MANIFEST.in is used to specify additional files to include with the package distribution.

Classifiers are optional and help projects be indexed by PyPI and search engines. Classifiers must be from the official classifier Trove, or uploading the package to PyPI will fail.

Python can easily import Fortran code using f2py. See this f2py example setup.py.
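
As a minimal hedged sketch, f2py can also compile a Fortran source directly into an importable extension module in one command (mymod.f90 and the module name mymod are placeholders):

python -m numpy.f2py -c mymod.f90 -m mymod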

Git LFS disable

If one is not using Git LFS but has a system with Git LFS installed, it can be useful to uninstall Git LFS to avoid problems like:

Remote "origin" does not support the Git LFS locking API. Consider disabling it with:
  $ git config lfs.locksverify false

To remove Git LFS, simply type:

git lfs uninstall

Download VTK test data

VTK test data can be useful to test ParaView data flows and code as a reference of known good data. The VTK file I/O documentation itself recommends using the example data files to test one’s own project software. A straightforward way to obtain the data is by “building” a single ExternalData target, where virtually all of the time is spent downloading.

git clone --recursive https://gitlab.kitware.com/vtk/vtk.git --depth 1
cd vtk

cmake -Bbuild -DVTK_BUILD_TESTING=ON

cmake --build build -t VTKData

Numerous data folders are created under build/ExternalData/Testing/Data. For example, VTKHDF data in an HDF5 file is in the file mandelbrot-vti.hdf.
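
As a quick sanity check of the downloaded data, a hedged sketch using h5py (assuming it is installed, with the path relative to the vtk/ directory) lists the contents of that VTKHDF file:

import h5py

# print every group and dataset name in the VTKHDF file
with h5py.File("build/ExternalData/Testing/Data/mandelbrot-vti.hdf", "r") as f:
    f.visit(print)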

Python directed dependency graphs

pyproject.toml specifies Python package prerequisites and typically all Python metadata. This helps security by allowing extremely fast recursive machine-parsing of prerequisites without installing packages first. Generally, specify Python package prerequisites in pyproject.toml as much as possible.

Python packages should minimize the size of their directed dependency graph for best package longevity with minimum maintenance effort. However, the most effective use of programmer/scientist/engineer time generally comes from reusing code wherever appropriate. How, then, to evaluate the quality of prerequisites? The techniques below help visualize what a package actually pulls in.

Long term archiving of Python software requires capturing direct and indirect dependencies. This is commonly done with pip freeze, which provides no direct sense of module hierarchy. The techniques described below provide a detailed, zoomable, hierarchical view of Python module dependencies.

Python dependency analysis, where packages use setup.py to specify package prerequisites, generally requires modules to be installed to determine their dependencies. That is, setup.py is recursively executed for each module to determine what modules are needed overall. This is bad for automated security analysis, which is slowed greatly by needing to install packages to determine prerequisites. Modern Python packages solve this problem by specifying most package configuration in pyproject.toml, which can be parsed statically.
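
For example, on Python ≥ 3.11 the standard library tomllib reads this metadata statically, without executing any packaging code; a minimal sketch:

import tomllib

# parse package metadata without installing or executing anything
with open("pyproject.toml", "rb") as f:
    meta = tomllib.load(f)

print(meta["project"]["name"])
print(meta["project"].get("dependencies", []))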

Currently, pipdeptree is the most practical solution to generate plots of Python directed dependency graphs. This method assumes:

  • self-test has adequate coverage to be meaningful for most users
  • packages only used as convenience methods for some users are under [project.optional-dependencies] in pyproject.toml
  • strictly necessary modules are specified
  • minimum Python version is specified
  • CI-only requirements are specified

The process below is targeted at packages used in “development mode”, that is, not installed into site-packages except for a link back to the code directory.

Install prereqs:

pip install virtualenv

In the Python package directory, create a new Python virtual environment, since pipdeptree depends on having only the analyzed package and its dependencies installed.

virtualenv testdep
. testdep/bin/activate

pip install pipdeptree[graphviz]

Install the package to examine (and whatever dependencies it automatically installs):

pip install -e .

Make a hierarchical dependency graph:

pipdeptree

This should be a very short tree (unless testing with a big package). Try it with a simple package, seeing if the dependency list is expected.

Now create the directed dependency graph for the package. Install Graphviz by:

  • Linux: apt install graphviz
  • macOS: brew install graphviz
  • Windows: download the installer from graphviz.org

and then:

pipdeptree --graph-output svg > dep.svg

View the SVG in web browser or image viewer software such as IrfanView.

The previous discussion and commands are wrapped up in this Bash script pydeptree.sh for a one-click Python dependency graph.

#!/usr/bin/env bash

set -o errexit

# optionally cd to the package directory given as the first argument
[[ -n $1 ]] && cd "$1"

virtualenv testdep     # it's OK if it already exists

. testdep/bin/activate

pip install "pipdeptree[graphviz]"

pip install -e ".[tests]"

pipdeptree --graph-output svg > dep.svg

deactivate

eog dep.svg &  # image viewing program
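
Example usage, assuming the script is executable and the package directory is the first argument:

./pydeptree.sh ~/code/mypkg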

Notes

Modulegraph is an established, maintained tool for creating a .dot dependency graph, but its output is extremely verbose: almost all of it is system stdlib modules. To make Modulegraph useful, the output must be post-processed, for example with pydot. What if we instead preemptively excluded a list of known stdlib modules, removing perhaps 98% of the Modulegraph output from the start?
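
A hedged sketch of that idea: on Python ≥ 3.10, sys.stdlib_module_names lists the standard library, and pydot can strip those nodes from the graph.dot produced below (graph.dot and filtered.dot are placeholder file names):

import sys
import pydot

graph = pydot.graph_from_dot_file("graph.dot")[0]

def is_stdlib(name: str) -> bool:
    # .dot node names are often quoted; compare the top-level module name
    return name.strip('"').split(".")[0] in sys.stdlib_module_names

# drop stdlib nodes, then edges touching stdlib modules
for node in list(graph.get_nodes()):
    if is_stdlib(node.get_name()):
        graph.del_node(node.get_name())

for edge in list(graph.get_edges()):
    if is_stdlib(edge.get_source()) or is_stdlib(edge.get_destination()):
        graph.del_edge(edge.get_source(), edge.get_destination())

graph.write("filtered.dot")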

pip install modulegraph

Examine a file’s requirements, creating a .dot graph:

python -m modulegraph file.py -q -g > graph.dot
dot -Tsvg graph.dot > graph.svg

See Modulegraph’s command line options for more.

Snakefood is another dependency graph checker.


C / C++ exit status macro

By convention, exit status integer zero represents “OK”, no error; non-zero status is generally considered a failure. Languages such as Fortran also have built-in syntax to manage the program exit code returned to the operating system.

C++ defines EXIT_SUCCESS and EXIT_FAILURE macros in header cstdlib. C defines EXIT_SUCCESS and EXIT_FAILURE macros in header stdlib.h.

Because commonly used headers often themselves #include <cstdlib> or <stdlib.h>, developers may not realize that these exit status macros must be explicitly included somewhere. As compilers transition to providing the standard library via C++20 modules and generally clean up excessive includes from built-in headers, code may suddenly complain about missing exit status macros at build time.

We feel it’s a good practice to use exit status macros as a findable and readable indication that program flow is ending and returning to the system. A best practice is to include the appropriate header in any code file where the exit status macros are used.

C:

#include <stdlib.h>

int main(void){
  return EXIT_SUCCESS;
}

C++:

#include <cstdlib>

int main(){
  return EXIT_SUCCESS;
}