Scientific Computing

Hex C2A0 is UTF8 for non-breaking space

When dealing with text files from many sources, it’s not uncommon to find stray bytes in the files. These bytes may be UTF-8 or some other character encoding. They may be invisible, or show up as empty squares or other odd glyphs. We had numerous Markdown files for this website that had been converted from a legacy blogging system. In certain files we observed invisible characters with hex code C2A0, which is the UTF-8 encoding of the non-breaking space.

Programmatically replace each C2A0 with a plain space, in-place, using GNU sed like:

sed -i 's/\xC2\xA0/ /g' myfile.txt
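Where GNU sed is not available, the same byte-level replacement can be sketched in Python. This is a minimal sketch, not the method used for this website: the function name replace_nbsp is hypothetical, and it assumes the file is small enough to read into memory.

```python
from pathlib import Path

NBSP_UTF8 = b"\xc2\xa0"  # UTF-8 encoding of U+00A0 (non-breaking space)


def replace_nbsp(path):
    """Replace each UTF-8 non-breaking space in the file with a plain space.

    Returns the number of replacements. The file is rewritten only if at
    least one non-breaking space was found.
    """
    file = Path(path)
    data = file.read_bytes()  # work on raw bytes, like the sed command
    count = data.count(NBSP_UTF8)
    if count:
        file.write_bytes(data.replace(NBSP_UTF8, b" "))
    return count
```

Calling replace_nbsp("myfile.txt") modifies the file in place, so it is worth committing or backing up the file first.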

Overwrite already pushed changes with Git

Overwriting Git history requires caution and care. An unlimited amount of work can be permanently lost in doing so. Git repos should have an “offsite” backup that is checkpointed, meaning the backup retains more than just the last copy. Overwriting Git history should normally only be done on “feature” branches, not on the “main” branch.

Overwrite Git history: common Git development patterns may require contributors to overwrite feature branch commits to avoid cluttered Git history. This Git development pattern usually has the contributor fork the original repo.

Make a feature branch in the fork, allowing maintainers to edit

git switch -c myfeature

CI pipelines validate/lint contributor pushes

git push -u origin myfeature

If CI reports errors, commit a correction, which will later be squashed and force-pushed

git commit -am "fixup"

Squash oops/typo commits

git rebase -i HEAD~2 myfeature

The “~2” indicates how far back to allow squashing. To reach farther back in history, increase “2”. Squash the fixup commit(s) by changing “pick” to “f” and then save in the editor that opens automatically.

Once changes are correct, force push.
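The text above does not give the force push command. The sketch below wraps one possible form in Python, using Git's --force-with-lease option, which refuses to overwrite the remote branch if it holds commits this clone has not fetched. That option is a choice made for this sketch (a plain --force also exists), the function names are hypothetical, and the remote "origin" and branch "myfeature" are taken from the earlier steps.

```python
import subprocess


def force_push_command(branch, remote="origin"):
    """Build the argument list for a guarded force push of one branch."""
    # --force-with-lease refuses to overwrite the remote branch if it has
    # commits that this clone has not fetched yet.
    return ["git", "push", "--force-with-lease", remote, branch]


def force_push(branch, remote="origin"):
    """Run the force push; raises CalledProcessError if Git reports failure."""
    subprocess.run(force_push_command(branch, remote), check=True)
```

For example, force_push("myfeature") run from inside the fork's working tree would overwrite the feature branch on the remote.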



Convert CMake to Meson

CMake has been growing for 20 years, and many major projects have switched to CMake from legacy build systems. Meson can replace CMake or be used alongside CMake. CMake’s long lifetime leaves legacy CMake 2.x scripts with difficult-to-understand behavior and variable scope. CMake 3.x brought considerable modernization, and a proposed auxiliary declarative CMake language is being discussed by Kitware.

Meson’s syntax is more declarative than CMake’s, and is deliberately not Turing-complete, with no user-defined functions. Key Meson goals include making meson.build scripts easier to understand while speeding up the build process itself. Meson vs. CMake is not an either-or choice: projects can provide both CMake and Meson scripts so that either build system can be used independently.

pip install meson ninja

Meson most commonly uses the Ninja build backend, which we also recommend for CMake.

A typical new Meson build starts like:

meson setup build  # from top meson.build directory

meson compile -C build

By default, Ninja builds in parallel.

Convert CMakeLists.txt to meson.build using tools/cmake2meson.py, which makes a first pass at recursively converting CMakeLists.txt to meson.build. The developer will need to manually complete this conversion process, but this script helps eliminate some of the tedious parts.

Run Matlab code from Python with oct2py

Python can run Matlab/Octave “.m” functions via Oct2Py, which transparently uses GNU Octave instead of Matlab. oct2py runs most “.m” code that’s compatible with the GNU Octave version installed. Shared memory (RAM) or disk (temporary file) is used to transfer data between Octave and Python. There are several ways to install GNU Octave.

Install oct2py Python ↔ Octave module:

pip install oct2py

Some Octave functions require package install.

If import oct2py does not find Octave or finds the wrong Octave, set the environment variable OCTAVE_EXECUTABLE with the full path to the Octave executable. It’s generally not recommended to add Octave to the system Path on Windows, as that can interfere with MinGW or MSYS2.
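As a minimal sketch, OCTAVE_EXECUTABLE can also be set from within Python before oct2py is imported, which avoids changing the system Path. The helper name set_octave_executable is hypothetical, and the path passed to it is whatever your Octave install uses.

```python
import os
from pathlib import Path


def set_octave_executable(octave_path):
    """Point oct2py at a specific Octave by setting OCTAVE_EXECUTABLE.

    Call this before `import oct2py` so the variable is already set
    when oct2py looks for Octave.
    """
    exe = Path(octave_path).expanduser()
    if not exe.is_file():
        raise FileNotFoundError(f"Octave executable not found: {exe}")
    # store the full absolute path, as the text above recommends
    os.environ["OCTAVE_EXECUTABLE"] = str(exe.resolve())
    return os.environ["OCTAVE_EXECUTABLE"]
```

This only affects the current Python process and its children, so other programs on the system are untouched.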

Matlab/Octave .m functions are transparently used from Python like:

from oct2py import Oct2Py
oc = Oct2Py()

oc.functionname(arg1,arg2,...)

Python via oct2py can use:

  • user functions (".m" files you create)
  • builtin functions e.g. svd()
  • package functions e.g. signal fir1()

Oct2Py can be greatly sped up by using a RAM drive (tmpfs) instead of the system temporary directory. This may be accomplished by:

from oct2py import Oct2Py
oc = Oct2Py(temp_dir='/run/shm')

oc.functionname(arg1,arg2,...)

Of course, replace /run/shm with your RAM drive location.
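Since the RAM drive location varies between systems, one option is to probe a few candidate directories and fall back to the system temporary directory. This is a sketch under assumptions: pick_temp_dir is a hypothetical helper, and /dev/shm is added here only as another location commonly found on Linux.

```python
import os
import tempfile


def pick_temp_dir(candidates=("/run/shm", "/dev/shm")):
    """Return the first existing, writable candidate directory.

    Falls back to the system temporary directory if none is usable.
    """
    for directory in candidates:
        if os.path.isdir(directory) and os.access(directory, os.W_OK):
            return directory
    return tempfile.gettempdir()


# Usage, assuming oct2py and GNU Octave are installed:
#   from oct2py import Oct2Py
#   oc = Oct2Py(temp_dir=pick_temp_dir())
```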

Advanced Octave functionality is split off into packages to:

  • speed up Octave startup
  • enhance stability and development cycles

Thus you’ll see pkg load ... commands where appropriate.

  1. create/reuse an .m function with the appropriate input & output variables.
  2. call this .m function using Oct2Py from Python

For example, Matlab/Octave fir1() is compared in tests/test_oct2py.py with scipy.signal.firwin().

A simpler Python script example is:

from oct2py import Oct2Py

k=5
p=0.2

with Oct2Py() as oc:
    oc.eval('pkg load signal')
    bmat = oc.fir1(k,p)
print(bmat)
# %%
import scipy.signal

bpy = scipy.signal.firwin(k+1,p)
print(bpy)

For your own .m files, simply call the functions with input/output arguments as in the oc.fir1() line of this example.


Related: call Matlab Engine from Python

Ninja bootstrap build

Ninja uses GitHub Actions for CI / CD to build and distribute Ninja binaries. If it’s necessary to compile Ninja from source, Ninja is quick to compile by:

git clone https://github.com/ninja-build/ninja

cd ninja

python configure.py --bootstrap

or with CMake:

cmake -B build

cmake --build build

Ninja build on Red Hat

The binary executables for Ninja 1.9.0 may not work on RHEL 7 due to an incompatible libstdc++. The symptom looks like:

ninja: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by ninja)

This was a known issue with the Ninja release artifact build process that was fixed.

Workaround: if a current version of Ninja is not available, use Ninja 1.8.2. This may work for other “older” Linux distros.

Why use CNAME flattening for apex domains

CNAME flattening was popularized in part by Cloudflare. We have used CNAME flattening successfully for years with multiple web-hosting providers. DNS A records are a pre-WWW artifact necessary to resolve the apex domain example.invalid from www.example.invalid or radar.example.invalid.

Implement CNAME flattening

To enable CNAME flattening on DNS records, first screenshot or otherwise back up the DNS settings. Don’t do this experiment during busy times. It’s better to try it first on a little-used or test website to ensure it works correctly with the proposed setup.

  1. Determine the web address the web hosting provider puts the website at. E.g. for GitHub Pages it would be username.github.io or at Netlify username.netlify.app
  2. Remove DNS A record for example.invalid that points to a specific IP address
  3. Add a CNAME record pointing example.invalid to the server address from step #1

For GitHub username joe with GitHub Pages site at joe.github.io, with desired web address example.invalid: make a CNAME DNS record with example.invalid as an alias to joe.github.io.

A or AAAA DNS records are unneeded with CNAME.
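After the DNS change has propagated, one rough check is whether the apex domain and the hosting provider's name resolve to at least one common IP address. The sketch below uses the system resolver through Python's socket module. The function names are hypothetical, and because CDNs may return different addresses on each query, a mismatch is only a hint and not proof of misconfiguration.

```python
import socket


def resolve_addresses(hostname):
    """Return the set of IP addresses the system resolver gives for hostname."""
    infos = socket.getaddrinfo(hostname, 443, proto=socket.IPPROTO_TCP)
    # each entry is (family, type, proto, canonname, sockaddr);
    # the IP address is the first item of sockaddr
    return {info[4][0] for info in infos}


def apex_matches_host(apex, host, resolve=resolve_addresses):
    """True if apex and host share at least one resolved IP address."""
    return bool(resolve(apex) & resolve(host))
```

For the example above, apex_matches_host("example.invalid", "joe.github.io") would be the call, with your real domain in place of example.invalid. The resolve parameter exists so that the comparison can be exercised without network access.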

Notes

  • Cloudflare CNAME flattening article
  • You may see some old blog posts from 2014 complaining about CNAME flattening, but these initial hiccups were resolved long ago.

Fortran submodule and CMake

Fortran submodule is supported by all popular Fortran compilers. While designed as a way to better manage public/private exposure of variables in large Fortran modules, submodules can also be used to seamlessly switch in/out advanced functionality.

For example, the GEMINI 3-D ionospheric model was created with raw binary file I/O. Since we had already written an object-oriented HDF5 interface, we integrated HDF5 file I/O into GEMINI. To help ensure a smooth transition with seamless fallback to raw binary if HDF5 wasn’t available, we used Fortran submodule with CMake. The user would call file_read and file_write subroutines with the same name, regardless of whether HDF5 was enabled. CMake would switch in submodule files depending on whether HDF5 was working or not.

Requirements

Fortran submodule requires adequate support from the Fortran compiler, and from the build system. CMake and Meson fully support Fortran submodule.

Compilers supporting Fortran submodule include:

  • Gfortran ≥ 6
  • Intel oneAPI
  • Cray
  • IBM XL / OpenXL
  • Flang
  • NAG
  • Nvidia HPC SDK

CMake

Rather than maintain a compiler feature table, in general we create simple test programs and verify that they compile, all handled automatically within CMake.

Insert into CMakeLists.txt

include(CheckSourceCompiles)

set(CMAKE_TRY_COMPILE_TARGET_TYPE STATIC_LIBRARY)
# save link time, only compile is needed

check_source_compiles(Fortran
"module b
interface
module subroutine d
end subroutine d
end interface
end

submodule (b) c
contains
module procedure d
end
end

program a
end"
f08submod)

if(NOT f08submod)
 return()  # or make FATAL_ERROR here
endif()

Selectively enable program functionality using submodule in CMakeLists.txt. This example is for HDF5:

add_library(io io.f90)

if(USE_HDF5)
  target_sources(io PRIVATE hdf5.f90)
else()
  target_sources(io PRIVATE raw.f90)
endif()

Complete examples of submodule are provided.

GitHub / GitLab Pages to Netlify

While both GitHub Pages and GitLab Pages are adequate for most personal, group and project pages, when website size and / or traffic have grown beyond what is feasible for these solutions, a more comprehensive hosting provider like Netlify may be considered. Netlify provides its own CDN, so those that had been using Cloudflare for DNS and CDN can configure Cloudflare to provide only DNS, if they so choose. Netlify is free for single users, allowing a private GitLab, GitHub or Bitbucket repo (or other suitable source) to deploy to a public custom domain HTTPS website. SSL certificates can be user-provided or can be created through Netlify for your custom domain.

Why transfer site to Netlify

Netlify provides a comparison of GitHub Pages and Netlify. GitLab Pages allows user choice of static site generator (Hugo, Jekyll, etc.). GitHub Pages can use GitHub Actions to build Hugo sites. GitLab Pages private repos have a monthly runtime quota. Netlify has a monthly traffic quota on the free tier, and a monthly build quota. For sites that are becoming very popular, GitHub Pages will simply want you to move elsewhere, while Netlify will have a paid plan to offer. The transfer process may be too burdensome for those with limited IT or bandwidth resources, or who simply lack the time to learn how to do this.

Netlify uses webhooks to detect a git push to the website GitLab repo, and then builds the site. Netlify has a CDN and DDoS protection built-in. Even if the other features aren’t needed, a key feature is the ability to have the website code in a private repo with unlimited public website deployments and traffic.

Build minute limits (such as on GitLab and Netlify) can legitimately be worked around by building the site locally on your laptop and pushing the publish-ready HTML.

Transfer site to Netlify

Note: This process may take down your site for a day or two if things go wrong. Even under normal conditions, all site visitors may need to allow an HTTPS exception due to SSL certificate error since Netlify requires all DNS servers to update before generating the domain certificate.

  1. if not already on GitLab, copy your website repo to GitLab (any name repo is fine).
  2. disable Auto DevOps and ensure no file named .gitlab-ci.yml exists.
  3. Log in to Netlify using GitLab, which will ask for your website repo.
  4. pick a custom Netlify subdomain like mycompany.netlify.app. Ensure this site is totally working before proceeding.
  5. Set Cloudflare or whatever your DNS provider is to point CNAME or A to mycompany.netlify.app (THIS IS THE PART THAT CAN TAKE YOUR MAIN WEBSITE DOWN!)
  6. Under Netlify Domain Management → HTTPS → Verify DNS config, ensure the verification completes. Until the DNS change propagates worldwide, visitors to your main HTTPS domain may get SSL verification errors. In the meantime they can use https://mycompany.netlify.app instead of https://mycompany.invalid. Do this at a low-traffic time! If using Cloudflare CDN, the old records may point to DigitalOcean while the new records point to *.netlify.app

Git commit date / time / author edit

If the Git commits have already been pushed to remote, this process will require other users of the repo to reset, rebase or reclone, as for any Git operation that edits history. If the Git commits have not already been pushed, then this process will not require extra steps from other repo users.

Show Git commit AuthorDate and CommitDate by

git show <commit_hash> --pretty=fuller

for the most recent commit, simply:

git show --pretty=fuller

edit last commit only

To reset the author of the last commit to the current Git username and email, as well as setting AuthorDate and CommitDate to the current time:

git commit --amend --reset-author --no-edit

  • --reset-author: reset date/time/author to current
  • --no-edit: skip opening text editor

edit previous commits, including already pushed

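Use git rebase -i as usual. For each commit whose author / date should be reset, change the operation from “pick” to “e” (edit). When the rebase stops at each such commit, run the command below, then git rebase --continue: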

git commit --amend --reset-author --no-edit

Specific date setting

Specific commit times can be set with a combination of the “--date” option and the environment variable GIT_COMMITTER_DATE.
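As a sketch of how the two settings combine when amending the last commit: --date sets the AuthorDate, while GIT_COMMITTER_DATE sets the CommitDate. The function names below are hypothetical, and the ISO 8601 timestamp in the usage note is only an example value.

```python
import os
import subprocess


def amend_date_command(date):
    """Build the command and environment to amend the last commit's dates."""
    # --date sets AuthorDate; --no-edit keeps the existing commit message
    cmd = ["git", "commit", "--amend", "--no-edit", f"--date={date}"]
    # GIT_COMMITTER_DATE sets CommitDate for this one Git invocation
    env = dict(os.environ, GIT_COMMITTER_DATE=date)
    return cmd, env


def amend_date(date):
    """Amend the last commit in the current repo so both dates equal `date`."""
    cmd, env = amend_date_command(date)
    subprocess.run(cmd, env=env, check=True)
```

For example, amend_date("2024-01-15T12:00:00") run inside the repo sets both dates of the most recent commit, which can then be confirmed with git show --pretty=fuller.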

Reference

GitHub commit troubleshooting