Scientific Computing

ssize_t for Visual Studio

The POSIX C type ssize_t is available on Unix-like systems in <sys/types.h>. On Windows, Visual Studio provides SSIZE_T in BaseTsd.h.

However, ssize_t is defined by POSIX, not by the C standard. A portable signed size type can be defined in C and C++ by using “ptrdiff_t” in place of “ssize_t”. Using ptrdiff_t instead of ssize_t is the practice of major projects like Emacs.

The C and C++ standards guarantee that size_t has a bit width of at least 16.

The C standard guarantees that ptrdiff_t has a bit width of at least 16 (as of C23; C99 through C17 required at least 17), and the C++ standard guarantees a bit width of at least 17 (since C++11).

This example shows how to use ssize_t across computing platforms.


Related: C++ size_type property vs size_t

xarray to_netcdf() file compression

As with HDF5 and h5py, using xarray to_netcdf() to write netCDF files can losslessly compress Datasets and DataArrays, but file compression is off by default. Compression must be enabled for each data variable individually to take effect. We typically only compress variables of 2-D or higher rank.

Notes:

  • Specify format="NETCDF4", engine="netcdf4" to allow a broader range of data types.
  • If “chunksizes” is not set, the data variable will not compress. We arbitrarily made the chunk sizes half of each dimension, but this can be optimized for particular data.
  • “fletcher32” is a checksum that can be used to detect data corruption.
  • Attributes set in “.attrs” of a data variable are written to the netCDF file as well. This is useful to note physical units, for example.

from pathlib import Path
import xarray


def write_netcdf(ds: xarray.Dataset, out_file: Path) -> None:
    """Write a Dataset to netCDF4, compressing variables of rank 2 or higher."""
    enc = {}

    for k in ds.data_vars:
        # skip scalars and 1-D variables, which are left uncompressed
        if ds[k].ndim < 2:
            continue

        enc[k] = {
            "zlib": True,
            "complevel": 3,
            "fletcher32": True,
            # half of each dimension, but at least 1 so length-1 dimensions stay valid
            "chunksizes": tuple(max(1, n // 2) for n in ds[k].shape),
        }

    ds.to_netcdf(out_file, format="NETCDF4", engine="netcdf4", encoding=enc)

Read image metadata with Python

The Python imageio package reads and writes numerous image formats and their metadata. The time and location of citizen science images are often critical to their interpretation. Not all cameras have GPS modules, and not all cameras have clocks that are set accurately enough (including the time zone).

A typical metadata item of interest is “DateTimeOriginal”. How it is defined and how accurate it is depend on the camera implementation.

We show a few distinct ways to read image metadata.

ImageIO read metadata

Get the image time using imageio.immeta:

import imageio.v3 as iio

from sys import argv
from pathlib import Path

fn = Path(argv[1]).expanduser()

meta = iio.immeta(fn)

for k in ("DateTimeOriginal", "DateTimeDigitized", "DateTime"):
    print(k, meta.get(k))

Consider that the timezone may need to be corrected.
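
EXIF timestamps are conventionally stored as naive local time in the form "YYYY:MM:DD HH:MM:SS", so one way to correct the time zone is to attach a known UTC offset and convert to UTC. The following is a minimal sketch using only the standard library. The function name exif_time_to_utc, the sample timestamp, and the offset are hypothetical, and the offset of the camera clock has to be known from elsewhere.

```python
from datetime import datetime, timedelta, timezone


def exif_time_to_utc(exif_time: str, utc_offset_hours: float) -> datetime:
    """Interpret a naive EXIF timestamp at a fixed UTC offset and return UTC time."""
    # EXIF uses colons in the date part: "YYYY:MM:DD HH:MM:SS"
    local = datetime.strptime(exif_time, "%Y:%m:%d %H:%M:%S")
    tz = timezone(timedelta(hours=utc_offset_hours))
    return local.replace(tzinfo=tz).astimezone(timezone.utc)


# hypothetical camera clock set to UTC-5
print(exif_time_to_utc("2023:07:04 21:15:30", -5))
# prints 2023-07-05 02:15:30+00:00
```

A fixed offset does not account for daylight saving changes, so a named time zone may be the better choice when the location of the camera is known.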

ExifRead metadata

The ExifRead Python module is powerful for reading EXIF image metadata.

If the camera had a GPS module, the location may be available. An ExifRead example of reading the EXIF GPS location:

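import exifread

from sys import argv
from pathlib import Path

fn = Path(argv[1]).expanduser()

# read the EXIF tags into a dict-like object keyed by tag name
with open(fn, "rb") as f:
    tags = exifread.process_file(f)

# GPS tags are absent if the camera recorded no location, so use .get() to avoid KeyError
latitude = tags.get("GPS GPSLatitude")
longitude = tags.get("GPS GPSLongitude")

print(f"{fn}  latitude, longitude: {latitude}, {longitude}")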

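Exif metadata

The exif Python package exposes EXIF tags as attributes of an Image object. The GPS attributes are only present if the camera recorded a location.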

import exif

from sys import argv
from pathlib import Path

fn = Path(argv[1]).expanduser()

with open(fn, "rb") as f:
    tags = exif.Image(f)

latitude = tags.gps_latitude
longitude = tags.gps_longitude

print(f"{fn}  latitude, longitude: {latitude}, {longitude}")
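
EXIF stores latitude and longitude as degrees, minutes, and seconds, with the hemisphere held in a separate reference tag (for example “GPS GPSLatitudeRef” in ExifRead or gps_latitude_ref in exif). Mapping software usually wants signed decimal degrees instead. The following is a minimal sketch of that conversion, assuming the three values have already been extracted as numbers or fractions. The function name dms_to_decimal and the sample coordinates are hypothetical.

```python
from fractions import Fraction


def dms_to_decimal(degrees, minutes, seconds, ref: str) -> float:
    """Convert degrees, minutes, seconds and a hemisphere letter to signed decimal degrees."""
    value = float(degrees) + float(minutes) / 60 + float(seconds) / 3600
    # South latitudes and West longitudes are negative by convention
    return -value if ref.strip().upper() in ("S", "W") else value


# made-up sample coordinates, given as plain integers and as fractions
lat = dms_to_decimal(40, 26, 46, "N")
lon = dms_to_decimal(Fraction(79), Fraction(58), Fraction(56), "W")
print(f"{lat:.6f}, {lon:.6f}")
```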

GNU Make environment variables

These environment variables are a de facto standard across build systems, and assume a compiler that honors them, such as GCC or Clang.

Dynamic library path:

  • Linux: LD_LIBRARY_PATH
  • macOS: LIBRARY_PATH (searched by GCC and Clang at link time; the runtime loader itself reads DYLD_LIBRARY_PATH)
  • Windows: must be on environment variable PATH

Include path (where .h C header files are located):

  • Linux / macOS: CPATH

An example invocation of GNU Make from a Unix-like shell:

LD_LIBRARY_PATH=/path/to/lib CPATH=/path/to/include make
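
When Make is launched from a Python script rather than an interactive shell, the same variables can be passed through the env argument of subprocess.run. The following is a minimal sketch, assuming the Linux variable names. The helper names make_env and run_make are hypothetical, and the paths are the placeholders from the shell command above.

```python
import os
import subprocess


def make_env(lib_dir: str, include_dir: str, base=None) -> dict:
    """Return a copy of the environment with lib_dir and include_dir prepended."""
    env = dict(os.environ if base is None else base)
    for name, value in (("LD_LIBRARY_PATH", lib_dir), ("CPATH", include_dir)):
        old = env.get(name)
        # keep any existing entries, but search the new directory first
        env[name] = value + os.pathsep + old if old else value
    return env


def run_make(lib_dir: str, include_dir: str) -> None:
    """Run "make" in the current directory with the adjusted environment."""
    subprocess.run(["make"], env=make_env(lib_dir, include_dir), check=True)


# usage, not executed here:
# run_make("/path/to/lib", "/path/to/include")
```

Prepending instead of overwriting keeps any paths the user has already set, which matters on systems where these variables are configured globally.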

List paths starting with dot

By default, the typical directory listing command “ls” does not show paths that start with a dot. That is, paths such as “.ssh” or “.git” are hidden. In most shells, all paths, including those with a leading dot, are listed by:

ls -a

For PowerShell to list paths with a leading dot:

ls -Fo

“-Fo” is short for “-Force”.

Set curl user agent

In general, for programs that access the web, whether curl, Python, or others, web servers may block requests whose HTTP User-Agent doesn’t match typical graphical web browsers. The server filtering is often trivially overcome by setting a generic Mozilla user agent like “Mozilla/5.0”. For curl, this is done with the -A option.

curl -A "Mozilla/5.0" https://www.whatsmyua.info/api/v1/ua
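
The same approach applies in Python. The following is a minimal sketch using the standard library urllib.request, which sends its own “Python-urllib” user agent unless one is set explicitly. The function names make_request and fetch are hypothetical, and the URL is the one from the curl command above.

```python
import urllib.request


def make_request(url: str, user_agent: str = "Mozilla/5.0") -> urllib.request.Request:
    """Build a request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Download the URL and return the response body as text."""
    with urllib.request.urlopen(make_request(url), timeout=timeout) as resp:
        return resp.read().decode()


req = make_request("https://www.whatsmyua.info/api/v1/ua")
# usage, needs network access so it is not executed here:
# print(fetch("https://www.whatsmyua.info/api/v1/ua"))
```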

User global .gitattributes

Organizations or users may have Git attributes they wish to apply to all repositories used on their computer user account. Similar to user global .gitignore, user global .gitattributes can be used to apply Git attributes to all user repositories:

git config --global core.attributesfile ~/.gitattributes

A Git attributes example application is picking distinct “git diff” commands for different languages. Additional .gitattributes templates are available for inspiration.

Python subprocess package executable

Paths to executables for Python subprocess should be handled robustly to avoid unexpected errors on end user systems that may not occur on the developer’s laptop or CI system.

NOTE: relative paths (names with slashes and/or “..”) are not allowed. That means “build-on-run” or “build-at-setup/install” executables must live at the same directory level as the resource specified.

Example: a black-box executable “amender.bin” that has already been built and exists in the package directory.
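
One way to handle this case is sketched below, assuming Python 3.9 or newer for importlib.resources.files. The function name run_package_exe and the package name “mypkg” are hypothetical; only “amender.bin” comes from the example above.

```python
import importlib.resources
import subprocess


def run_package_exe(package: str, exe_name: str, args: list[str]) -> str:
    """Run an executable stored in the top level of a package and return its stdout."""
    exe_ref = importlib.resources.files(package) / exe_name
    # fail early with a clear error rather than a confusing subprocess failure
    if not exe_ref.is_file():
        raise FileNotFoundError(f"{exe_name} not found in package {package}")

    # as_file() yields a real filesystem path, even for packages imported from a zip file
    with importlib.resources.as_file(exe_ref) as exe:
        ret = subprocess.run([str(exe), *args], capture_output=True, text=True, check=True)

    return ret.stdout


# usage, with the hypothetical package "mypkg", not executed here:
# print(run_package_exe("mypkg", "amender.bin", []))
```

Checking for the file before running it turns a missing build product into an explicit FileNotFoundError on the end user system.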


Alternatives have downsides for this application including:

  • setuptools.pkg_resources is not always installed on user systems.
  • __file__ is not always defined.

Consider the performant Python stdlib module importlib.resources for general references to files inside a package. For PyTest test files, consider conftest.py to generate test files.