Scientific Computing

Python zipfile recursive write

This function recursively zips files in directories, storing only the relative path due to the use of the “arcname” parameter. Otherwise, you get the path relative to the root of your filesystem, which is rarely useful.

import os
import zipfile
from pathlib import Path


def zip_dirs(path: Path, pattern: str) -> list[Path]:
    """
    recursively .zip each directory under path that matches pattern

    returns the list of .zip files written
    """

    path = Path(path).expanduser().resolve()

    dlist = [d for d in path.glob(pattern) if d.is_dir()]
    if not dlist:
        raise FileNotFoundError(f"no directories to zip under {path} with {pattern}")

    zips = []
    for d in dlist:
        zip_name = d.with_suffix(".zip")
        with zipfile.ZipFile(zip_name, mode="w", compression=zipfile.ZIP_LZMA) as z:
            for root, _, files in os.walk(d):
                for file in files:
                    fn = Path(root, file)
                    # store the name relative to "path" instead of the absolute path
                    afn = fn.relative_to(path)
                    z.write(fn, arcname=afn)
        print("write", zip_name)
        zips.append(zip_name)

    return zips
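
To see the effect of “arcname”, the sketch below builds a small temporary directory tree, zips it with relative names, and lists what was stored. The helper names zip_one_dir and demo are hypothetical, and ZIP_DEFLATED is used here only to keep the demo fast; it is an illustration under those assumptions, not a replacement for the function above.

```python
import tempfile
import zipfile
from pathlib import Path


def zip_one_dir(d: Path, base: Path) -> Path:
    """Zip directory d, storing names relative to base. Returns the archive path."""
    zip_name = d.with_suffix(".zip")
    with zipfile.ZipFile(zip_name, mode="w", compression=zipfile.ZIP_DEFLATED) as z:
        for fn in sorted(d.rglob("*")):
            if fn.is_file():
                # arcname controls the name stored inside the archive
                z.write(fn, arcname=fn.relative_to(base))
    return zip_name


def demo() -> list[str]:
    """Create a tiny tree in a temporary directory and return the stored names."""
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        (base / "data" / "sub").mkdir(parents=True)
        (base / "data" / "a.txt").write_text("a")
        (base / "data" / "sub" / "b.txt").write_text("b")
        archive = zip_one_dir(base / "data", base)
        with zipfile.ZipFile(archive) as z:
            return sorted(z.namelist())


print(demo())  # ['data/a.txt', 'data/sub/b.txt']
```

The stored names start at the zipped directory rather than at the temporary directory's absolute path, which is what makes the archive portable between machines.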

Matlab parfor parallel plotting

Matlab plotting can be quite slow, as can Python Matplotlib plotting. Matlab parfor can sometimes be used to plot in parallel, provided all parfor restrictions are met. However, parallel plotting in Matlab doesn’t always work, or may work on some operating systems but not others. Use great caution when writing a “parfor” plotting loop, as it may not work for others.

Errors seen when attempting relatively simple plots in parallel include:

Warning: A worker aborted during execution of the parfor loop. The parfor loop will now run again
on the remaining workers.
Error using distcomp.remoteparfor/rebuildParforController (line 194)
All workers aborted during execution of the parfor loop.
Warning: worker(s) crashed while executing code in the current parallel pool. MATLAB may attempt
to run the code again on the remaining workers of the pool, unless an spmd block has run. View the
crash dump files to determine what caused the workers to crash.

Matlab / Octave detect JVM

Matlab “.m” script functionality can be readily extended with Java and Python code. It may seem obvious to check if Python is available. It is also important to check if the Matlab JVM interface is available as Matlab may be running in -nojvm mode.

Detect if JVM is available from within a Matlab script by:

ok = usejava('jvm');
% boolean

GNU Octave also has a JVM interface that extends Octave functionality. As with Matlab, check whether the Octave JVM interface is available by:

ok = usejava('jvm');
% boolean

Change file ownership

When desired and permitted by the computer filesystem, it’s possible to change file ownership. This may be necessary when a file was inadvertently created with root / admin ownership and a general user needs to edit or access the file. Changing file ownership can have unexpected consequences, like removing the ability of others to access the file or run a program that depends on the file. Therefore, change file ownership only when necessary and with consideration for others who may depend on the file.

These examples assume the file “example.txt” is in the current directory and the user logged in should own the file. These tasks are similarly done in other languages such as Go or Julia.

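As long as the user has filesystem permission, Python can change file ownership with shutil.chown. Note that the Python documentation lists shutil.chown as available on Unix-like systems only, so on Windows use the takeown command shown further down.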

import shutil
import getpass

shutil.chown("example.txt", user=getpass.getuser())
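
Before changing ownership, it can help to check whether the current user already owns the file. This is a minimal sketch assuming a Unix-like system (os.geteuid is not available on Windows); the helper name owned_by_current_user is hypothetical.

```python
import os
import tempfile


def owned_by_current_user(path: str) -> bool:
    """True if the file's owner uid matches the effective uid of this process."""
    return os.stat(path).st_uid == os.geteuid()


# a file just created by this process is normally owned by the current user
with tempfile.NamedTemporaryFile() as f:
    print(owned_by_current_user(f.name))
```

Comparing numeric user IDs avoids a username lookup, which can fail on systems where the ID has no entry in the user database.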

On Windows, check ownership of the file from Command Prompt:

dir /q example.txt

or PowerShell:

(get-acl example.txt).owner

Windows uses the takeown command to change file ownership. For simplicity we assume the desired user is logged in and executing the command.

takeown /f example.txt

On macOS / Linux, check ownership of the file with:

ls -l example.txt

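For simplicity we assume the desired user is logged in and executing the chown command. The current user and primary group names come from id, since a group with the same name as the user does not exist on every system. Prefix the command with sudo if the file is currently owned by root.

chown "$(id -un):$(id -gn)" example.txt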

Fix Python segmentation fault on exit

The “python_history” file contains recent commands entered into the Python interpreter. “python_history” files with incorrect permissions can cause a message on exiting Python like:

Error in atexit._run_exitfuncs:
PermissionError: [Errno 13] Permission denied

Find the owner of ~/.python_history

  • macOS / Linux: ls -l ~/.python_history
  • Windows / PowerShell: (get-acl ~/.python_history).owner

If it does not show the current username, that’s likely the problem.

Fix

This changes the file ownership to the current user.

  • macOS / Linux: chown "$(id -un)" ~/.python_history
  • Windows PowerShell: takeown /f $home\.python_history

Git rewrite commit email addresses

git-filter-repo is a Python program that is generally recommended over Git filter-branch and BFG Repo Cleaner. Rewriting history changes the commit hashes, so once the rewritten history is pushed, other users of your repo will get an “unrelated history” error and will have to reset or reclone the repo.

Example

Suppose an incorrect email address was used to make previous Git commits. To update Git commit email address history from the Git repo directory:

git filter-repo --mailmap mailmap --force

where the file “mailmap” lists the email addresses to change in Git mailmap format, with the new address first and the old address last:

Jane Doe <jane@new.com> <jane@old.com>
John Smith <john@new.com> <john@old.com>
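
When many addresses need to change, the mailmap lines can be generated instead of typed by hand. The sketch below is one hypothetical way to do it: the function name mailmap_lines and the dictionary layout are assumptions for illustration, not part of git-filter-repo.

```python
def mailmap_lines(changes: dict[str, tuple[str, str]]) -> list[str]:
    """Turn {old email: (proper name, new email)} into mailmap lines."""
    return [f"{name} <{new}> <{old}>" for old, (name, new) in changes.items()]


changes = {
    "jane@old.com": ("Jane Doe", "jane@new.com"),
    "john@old.com": ("John Smith", "john@new.com"),
}

# print the lines; redirect or write them to the file named "mailmap"
for line in mailmap_lines(changes):
    print(line)
```

The printed lines match the two-entry example above and can be saved to the file passed to --mailmap.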

Pip install develop mode PEP517

PEP517 / PEP518 introduced pyproject.toml as a supplement to, and even a replacement for, setup.py. pyproject.toml has been adopted by major Python projects including pytest. Some computers, especially those using system Python, may be configured without a user site-packages directory. When pyproject.toml is present, such systems may fail on pip install -e . with errors including:

running develop
WARNING: The user site-packages directory is disabled.
error: can't create or remove files in install directory

The fix below will work with or without pyproject.toml, installing just like usual:

  • pip install . → under user site-packages, a static copy of the package (any package code changes require package reinstall)
  • pip install -e . → Python-style link in user site-packages to this code directory (live development code)

Note: some systems require python3 -m pip instead of plain pip.

Fix

A general fix is to make setup.py contain site.ENABLE_USER_SITE = True. If the computer admin has locked down the user directory (very rare) this may still not work.

With the setup.py below, the package and any specified prerequisites that are not already installed go under the Python user directory site.USER_SITE.

#!/usr/bin/env python3
import site
import setuptools

# PEP517 workaround
site.ENABLE_USER_SITE = True

setuptools.setup()
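
To check whether a given Python has the user site-packages directory enabled before attempting the install, the standard library site module can be queried. This is a small diagnostic sketch; the helper name user_site_status is hypothetical.

```python
import site


def user_site_status() -> tuple[bool, str]:
    """Return (enabled, directory) for the per-user site-packages."""
    # ENABLE_USER_SITE is True when enabled; it is False or None when disabled
    enabled = site.ENABLE_USER_SITE is True
    return enabled, site.getusersitepackages()


enabled, user_dir = user_site_status()
print("user site-packages enabled:", enabled)
print("user site-packages directory:", user_dir)
```

Inside a virtual environment the user site-packages directory is normally reported as disabled, so run this with the same interpreter that pip uses.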

Notes

  • Pip issue discussing this matter

Resume download with curl or Wget

curl and Wget are popular web interface command line utilities that among their many features can resume interrupted or canceled downloads. curl is a library with a command-line frontend, while Wget is a command line tool.

Daniel Stenberg’s curl has a permissive license that has perhaps led it to become very widespread, installed “from the factory” in operating systems from Windows, macOS, Linux to automobiles and many other embedded systems. curl’s man page is perhaps best used by text search.

curl -O -C - https://mysite.example

The option -C - (long form --continue-at -) continues a previous transfer, automatically determining the byte offset.

A distinct Wget capability is recursive downloads. Wget resumes a download with the lowercase -c (--continue) option:

wget -c https://mysite.example
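
Over HTTP, resuming works by asking the server for only the bytes that are missing, using a Range request header whose offset is the size of the partial file. The sketch below illustrates that idea with the Python standard library; the function names resume_request and resume_download are hypothetical, and this is not how curl or Wget are implemented internally.

```python
import urllib.request
from pathlib import Path


def resume_request(url: str, dest: Path) -> urllib.request.Request:
    """Build a GET request that asks for only the bytes missing from dest."""
    req = urllib.request.Request(url)
    offset = dest.stat().st_size if dest.is_file() else 0
    if offset > 0:
        # ask the server to start at the first byte we do not have yet
        req.add_header("Range", f"bytes={offset}-")
    return req


def resume_download(url: str, dest: Path, chunk_size: int = 65536) -> None:
    """Download url to dest, appending to a partial file when the server allows it."""
    req = resume_request(url, dest)
    with urllib.request.urlopen(req) as resp:
        # 206 Partial Content means the server honored the Range header;
        # any other success status means it sent the whole file, so start over
        mode = "ab" if resp.status == 206 else "wb"
        with dest.open(mode) as f:
            while chunk := resp.read(chunk_size):
                f.write(chunk)
```

A call such as resume_download("https://mysite.example/file.bin", Path("file.bin")) would continue a partial file.bin; the file name here is a placeholder. If the local file is already complete, a server may answer with status 416, which urlopen raises as an HTTPError.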

FindNetCDF.cmake with imported targets

NetCDF4 is a powerful scientific data format built on the HDF5 format. NetCDF4 is directly usable by ParaView, including the CF (Climate and Forecast) metadata convention. We have created a FindNetCDF.cmake that efficiently finds NetCDF libraries for use in CMake.

Example:

Place FindNetCDF.cmake file under project cmake/ then from CMakeLists.txt:

list(APPEND CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake)

find_package(NetCDF COMPONENTS Fortran REQUIRED)

# example target
add_executable(main main.f90)
target_link_libraries(main PRIVATE NetCDF::NetCDF_Fortran)

Easy HDF5 Fortran interface

Using HDF5 from any language can be intimidating if directly using the low-level HDF5 API. HDF5 interfaces have sprung up for popular languages that make HDF5 trivially easy to use, such as Python h5py. Now Fortran also has an easy, powerful object-oriented and functional HDF5 interface named h5fortran. h5fortran builds and connects to the project using CMake or Meson. h5fortran uses Fortran 2018 standard code and is tested across multiple compilers including GFortran and Intel oneAPI. The efficient h5fortran interface is polymorphic (rank/dimension and type-agnostic) from scalars through 7-D arrays. A companion library for NetCDF is nc4fortran.

Examples

h5fortran is simpler to use than even the Matlab HDF5 interface.

Functional interface example:

use h5fortran

call h5write('foo.h5', '/x', x)

call h5read('bar.h5', '/y', y)

Object-oriented interface examples:

use h5fortran

type(hdf5_file) :: h5f

call h5f%open('test.h5', status='new')

call h5f%write('/value1', 123.)

call h5f%close()