Scientific Computing

CMake file(ARCHIVE_EXTRACT) syntax

CMake file(ARCHIVE_EXTRACT) is more robust and easier to use than the older cmake -E tar syntax. The PATTERNS_EXCLUDE option saves time and disk space by skipping extraction of files that are not needed, such as documentation. The example guards the option with a version check, so that older CMake versions still extract the full archive.

cmake_minimum_required(VERSION 3.18)

if(CMAKE_VERSION VERSION_GREATER_EQUAL 4.5)
  set(_pat_excl_docs PATTERNS_EXCLUDE "docs/*")
endif()

# quote the paths so that spaces in them do not split the arguments
file(ARCHIVE_EXTRACT
  INPUT "${archive}"
  DESTINATION "${out_dir}"
  ${_pat_excl_docs}
)
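
To illustrate what excluding a pattern during extraction accomplishes, here is a minimal Python sketch of the same idea outside of CMake. The function name extract_excluding and the glob matching via fnmatch are assumptions made for illustration; CMake's own pattern matching rules may differ.

```python
import fnmatch
import tarfile
from pathlib import Path


def extract_excluding(archive: Path, out_dir: Path, exclude: list[str]) -> list[str]:
    """Extract a tar archive, skipping members whose name matches a glob in `exclude`.

    Returns the names of the members that were extracted.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter when this Python version provides it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive) as tar:
        keep = [
            member
            for member in tar.getmembers()
            if not any(fnmatch.fnmatch(member.name, pat) for pat in exclude)
        ]
        tar.extractall(out_dir, members=keep, **kwargs)
    return [member.name for member in keep]
```

For example, extract_excluding(Path("pkg.tar.gz"), Path("out"), ["docs/*"]) would skip every file under docs/, where pkg.tar.gz is a hypothetical archive name.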

Don't modify CMake _ROOT variables inside custom Find*.cmake

When writing a custom Find*.cmake module, avoid modifying the *_ROOT variable that corresponds to the Find module’s package. For example, in FindMy.cmake, do not modify My_ROOT, because the modified value will be ignored. Instead, store the modified value in a separate variable, say my_root. One reason to modify the value is to change the file separator from \ to / on Windows with cmake_path(CONVERT "${My_ROOT}" TO_CMAKE_PATH_LIST my_root). Then pass HINTS ${my_root} to every “find_*()” command in FindMy.cmake.

Windows coreutils from Microsoft

The ubiquitous GNU coreutils has long been missing from Windows. Until now, we reached these utilities on Windows through WSL, using wsl <coreutils command>. Microsoft has enhanced the Rust-based uutils coreutils to run natively on Windows, and has made it available via WinGet:

winget install --id=Microsoft.Coreutils -e

Close and reopen any Terminal windows to use Coreutils. Tool availability and conflicts differ between the ComSpec Command Prompt and PowerShell. Some commands are so POSIX-specific that they are not available or relevant in Microsoft coreutils. Other coreutils commands overlap or conflict so much with built-in Windows commands that they are omitted from Microsoft coreutils. Command parsing in Microsoft coreutils also differs in places from standard coreutils.

A general issue on systems that use “coreutils”, such as embedded or other minimal systems where not all coreutils are available, is that the developer must handle cases where a coreutils tool is unavailable or shadowed by something else. Build systems like CMake handle similar problems, such as what exactly “gcc” is when multiple compilers masquerade as “gcc”: CMake inspects the version string to formally identify the compiler vendor. A script that consumes coreutils on Windows needs to handle issues like the following, where MSVC “link” takes precedence over coreutils “link”:

where.exe link
C:\Program Files\Microsoft Visual Studio\18\Community\VC\Tools\MSVC\*\bin\HostARM64\arm64\link.exe
C:\Program Files\coreutils\bin\link.exe

See Cargo.toml in the Microsoft coreutils source for the list of included commands.

We have long augmented CMake projects with Bash and PowerShell scripts to handle tasks too awkward for CMake. Python usually has enough built-in capability in “os”, “pathlib”, and “shutil” to avoid needing coreutils in Python scripts.
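
As a minimal sketch of that approach, the Python standard library can both report which executable a name resolves to and perform the operation itself, which sidesteps the “link” ambiguity shown above. The helper names find_tool and hard_link are hypothetical and chosen only for illustration.

```python
import os
import shutil
from pathlib import Path
from typing import Optional


def find_tool(name: str, preferred_dir: Optional[str] = None) -> Optional[str]:
    """Return the full path of executable `name`, or None if it is not found.

    If `preferred_dir` is given, it is searched before the regular PATH,
    so a specific installation can be chosen over a same-named program.
    """
    if preferred_dir is not None:
        found = shutil.which(name, path=preferred_dir)
        if found:
            return found
    return shutil.which(name)


def hard_link(src: Path, dst: Path) -> None:
    """Create a hard link with os.link instead of running a "link" executable."""
    os.link(src, dst)
```

Calling find_tool("link") shows which program a bare “link” would run on the current PATH, while hard_link never depends on that answer.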

GNU / Windows?

Could Windows with Microsoft coreutils be considered a GNU / Windows hybrid? Strictly speaking, no: Microsoft coreutils is based on uutils coreutils, an MIT-licensed reimplementation of GNU coreutils in Rust, so it is not GNU software. GNU / Linux is a common term for Linux distributions that include GNU utilities, and Microsoft coreutils brings work-alikes of many of those utilities to Windows. It is not a GNU environment, but it does provide a significant portion of the GNU command-line toolset on Windows, which makes Windows a hybrid of sorts in terms of command-line utilities.

As background, the core components of a typical GNU/Linux system include:

  • Linux Kernel: Core of the system, handling hardware and process management
  • GNU Utilities: Essential tools for file management, text processing, and system administration
  • Display Server and Desktop Environment: X11 or Wayland for graphics, with desktop environments like GNOME, KDE Plasma, or Xfce
  • Package Manager: Software installation and updates (e.g., APT, DNF, Pacman)
  • Shell: Command-line interface for interacting with the system

On Windows the core components include:

  • Windows Kernel: Core of the system, handling hardware and process management
  • Microsoft coreutils: Essential tools for file management, text processing
  • Display Server and Desktop Environment: Windows GUI for graphics and user interface
  • Package Manager: WinGet for software installation and updates
  • PowerShell: Command-line interface for interacting with the system

On macOS the core components include:

  • XNU Kernel: Core of the system, handling hardware and process management
  • BSD Utilities: Essential tools for file management, text processing, and system administration
  • Display Server and Desktop Environment: Quartz for graphics, with the macOS desktop environment
  • Package Manager: Homebrew for software installation and updates
  • Shell: Terminal with Zsh for command-line interface

Python vs. Julia vs. GNU Octave in research

Python has become a dominant language for scientific computing, data analysis, machine learning, and engineering workflows. Julia offers a modern high-performance syntax specifically designed for numerical and scientific computing. GNU Octave is an open-source MATLAB alternative with largely MATLAB-compatible syntax.

GNU Octave, originally written by John W. Eaton, continues to be actively developed. Octave is a high-level interpreted language designed for numerical computations. The community continues to release major versions roughly yearly.

Octave shines when you need:

  • Near drop-in compatibility with MATLAB .m files (as long as proprietary toolboxes aren’t required).
  • A quick way to test whether it’s worth porting a MATLAB function or script to Python.
  • Calling MATLAB/Octave functions directly from Python using Oct2Py.

Octave includes its own growing set of packages (toolboxes) that extend its capabilities in areas like signal processing, control systems, and optimization.

Julia

Julia is a modern, high-performance language designed specifically for scientific and numerical computing. It aims to combine the ease of use of Python/MATLAB with the speed of C/Fortran.

Julia excels when:

  • You need high performance without dropping to lower-level languages (JIT compilation often delivers near-C speeds for numerical loops and linear algebra).
  • Working on large-scale simulations, differential equations, optimization, or other compute-intensive scientific tasks.
  • You want a clean, math-friendly syntax with advanced features like multiple dispatch, metaprogramming, and excellent built-in support for parallelism and distributed computing.
  • Reproducibility and package management are priorities (via its built-in package manager).

Julia has strong libraries for data science, machine learning, visualization, and more, though its overall ecosystem is smaller than Python’s. It’s particularly appealing for researchers writing performance-critical code from scratch.

Python

Key Advantages of Python:

  • Vast ecosystem: NumPy, SciPy, Pandas, Matplotlib, scikit-learn, PyTorch/TensorFlow, and thousands of other specialized libraries cover everything from microcontrollers to supercomputers.
  • Scalability: The same language and core libraries work from embedded devices → Raspberry Pi → laptops → HPC clusters.
  • Reproducibility: Open-source nature means anyone can run your code with pip install or conda environments—no license server or version-matching headaches.
  • Embedded / IoT support: Since 2014, MicroPython has brought a capable subset of Python (including exception handling, coroutines, etc.) to low-cost hardware like the Raspberry Pi Pico and many other MCUs/SoCs.

Python’s general-purpose nature also makes it easier to integrate with web apps, databases, GUIs, automation scripts, and version control workflows—areas where Octave is weaker.

Comparison Table

Use Case | Recommended Tool | Reason
Quick MATLAB script testing / porting | GNU Octave | Best compatibility
Teaching numerical methods | Any | Octave for pure MATLAB feel; Python for broader skills; Julia for high-performance numerical work
Large-scale data analysis & ML | Python | Mature ecosystem and tooling
High-performance numerical simulations | Julia or Python + Numba/Cython | Julia for clean high-speed code
Embedded / low-cost hardware | Python (MicroPython) | Much broader hardware support
Reproducible open research | Python or Julia | No licensing barriers
Existing large MATLAB codebase | Octave (or Python + Oct2Py) | Minimize immediate rewrite cost

With Python and Oct2Py, Octave can be a bridge for those transitioning away from MATLAB. While Python is often a default choice for new projects, Julia can be a compelling alternative for high-performance numerical work.

Other Mathematical Software

These systems generally have smaller user bases than Python or MATLAB/Octave, largely due to historical momentum and narrower focus.

  • SageMath — Open-source computer algebra system with excellent symbolic math capabilities.
  • Scilab — Another free MATLAB-like environment.
  • GDL — Open-source IDL work-alike, common in astronomy and geophysics.
  • Mathematica / Maple — Proprietary tools with strong symbolic mathematics focus.

Configure shells Bash, Zsh, PowerShell

The default interactive shell for operating systems is typically:

  • Linux: Bash
  • macOS: Zsh
  • Windows: PowerShell

Note that the non-interactive shell may default to a simpler POSIX shell like Dash, so ensure that each script’s shebang line specifies the intended shell.

Each shell has configuration files that change its default parameters. Shells typically keep a persistent command history file that stores previously executed commands, which allows users to recall and reuse them. A very long history may retain mistyped commands or commands that are no longer relevant.

Bash

Get the location of the Bash command history file:

echo "${HISTFILE:-$HOME/.bash_history}"

Edit the ~/.bashrc file to include the following settings:

# Number of commands remembered in the current session (in memory)
export HISTSIZE=500

# Number of commands saved to the history file on disk
# Keep at least a little bigger than HISTSIZE to handle duplicates
export HISTFILESIZE=1000

# Ignore consecutive duplicate commands and commands that begin with a space
export HISTCONTROL=ignoredups:ignorespace
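
To make the effect of these two HISTCONTROL values concrete, the following Python sketch mimics their filtering on a list of commands. It is only an illustration of the rules as described in the Bash manual, not how Bash implements them, and filter_history is a hypothetical name.

```python
def filter_history(commands, ignoredups=True, ignorespace=True):
    """Return the commands that would be kept in the history list."""
    kept = []
    for cmd in commands:
        # ignorespace: commands that begin with a space are not saved
        if ignorespace and cmd.startswith(" "):
            continue
        # ignoredups: a command equal to the previous saved entry is not saved
        if ignoredups and kept and kept[-1] == cmd:
            continue
        kept.append(cmd)
    return kept
```

Note that only consecutive repeats are dropped: a command that reappears after a different command is saved again.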

Zsh

Get the location of the Zsh command history file:

echo "${HISTFILE:-${ZDOTDIR:-$HOME}/.zsh_history}"

Edit the ~/.zshrc file to include the following settings:

# Number of commands remembered in the current session (in memory)
export HISTSIZE=500

# Number of commands saved to the history file on disk
# Zsh uses SAVEHIST for this, not Bash's HISTFILESIZE
export SAVEHIST=1000

# Do not record a command that duplicates the previous one
setopt hist_ignore_dups
# Do not record commands that begin with a space
setopt hist_ignore_space

PowerShell

Get the location of the PowerShell command history file:

(Get-PSReadLineOption).HistorySavePath

Edit the “$profile” file to include the following settings, which limit the number of commands in the history and ignore duplicates.

Set-PSReadLineOption -MaximumHistoryCount 500 `
                     -HistoryNoDuplicates

Matlab for Windows on ARM64

As ARM64 CPUs become more popular, or even the only choice for some computer models, MathWorks has certified that Matlab R2026a is supported on Windows ARM64 via Prism emulation.

Windows ARM64 laptops with Snapdragon X or Snapdragon X2 CPUs can fully replace Intel x86-based laptops, with nearly all software working natively or via Prism emulation. The next piece of the puzzle is GPUs on ARM64 laptops and desktops. NVIDIA RTX Spark GPU + ARM64 CPU computers are said to be emerging later in 2026 from Asus, Dell, HP, MSI, Microsoft Surface Ultra, etc.

As ARM64 CPU devices with GPUs supported by Matlab become available, hopefully Matlab will enable support for GPU computing on ARM64 as well.

CMake shorten build paths

CMake can be configured to use shorter paths for intermediate build files, which matters for large or complex projects on Windows, where the 260-character path limit is a problem for some tools. This is controlled by CMAKE_INTERMEDIATE_DIR_STRATEGY, which can be set as an environment variable or as a CMake variable on the command line with -D. The default is to use full paths for human readability; when path length is a problem, set the value to SHORT to use shorter paths.

The example below is contrived to use a long source file path. In practice, the problem comes from nested dependencies and build directories, which can easily exceed the 260-character limit on Windows. The example still demonstrates the issue and how the SHORT strategy addresses it.

cmake_minimum_required(VERSION 4.2)

project(soLong LANGUAGES CXX)

# make a long path to demonstrate the issue
set(long_path "${CMAKE_BINARY_DIR}/this/is/a/very/long/path/that/will/exceed/the/260/character/limit/on/windows/when/building/a/project/with/cmake/lets/see/if/it/works/with/the/shorten/build/path/option/just/to/make/sure/it/is/long/enough/to/exceed/the/limit/we/need/to/make/sure/it/is/long/enough/to/exceed/the/limit/")

string(LENGTH "${long_path}" L)

message(STATUS "Long path length: ${L} characters")
message(STATUS "CMAKE_INTERMEDIATE_DIR_STRATEGY: ${CMAKE_INTERMEDIATE_DIR_STRATEGY}")

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
message(STATUS "See file ${CMAKE_BINARY_DIR}/compile_commands.json for the compile commands with the long path")

file(MAKE_DIRECTORY "${long_path}")

file(GENERATE OUTPUT "${long_path}/main.cpp" CONTENT "int main() { return 0; }")

add_executable(soLong "${long_path}/main.cpp")

Compare the -o flag parameter in compile_commands.json between these two commands. The output path with SHORT will be much shorter than with FULL.

cmake -Bbuild -DCMAKE_INTERMEDIATE_DIR_STRATEGY=SHORT && cat build/compile_commands.json
cmake -Bbuild -DCMAKE_INTERMEDIATE_DIR_STRATEGY=FULL && cat build/compile_commands.json
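
Rather than reading the JSON by eye, the comparison can be scripted. The following Python sketch pulls the argument that follows -o from each entry of compile_commands.json. It assumes a GCC- or Clang-style command line with POSIX quoting (MSVC uses different flags), and output_paths is a hypothetical helper name.

```python
import json
import shlex
from pathlib import Path


def output_paths(compile_commands: Path) -> list[str]:
    """Return the path given after each -o flag in a compile_commands.json file."""
    entries = json.loads(compile_commands.read_text())
    outputs = []
    for entry in entries:
        # An entry carries either an "arguments" list or a "command" string.
        args = entry.get("arguments") or shlex.split(entry.get("command", ""))
        for i, arg in enumerate(args[:-1]):
            if arg == "-o":
                outputs.append(args[i + 1])
    return outputs
```

Printing len(p) for each p in output_paths(Path("build/compile_commands.json")) after each configure run shows the difference in path length between the two strategies.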

CMake print cache variables

CMake can print cache variables during the configuration phase using any of these methods.

The “cmake” command itself can print cache variables to the console. Variable values may be set by passing -D options to the “cmake” command, or by editing them in the CMake GUI or “ccmake” interface.

cmake -Bbuild -LAH
-L
Print the names and values of non-advanced cache variables, without help messages.
-LA
Print all cache variables, including advanced ones that are not shown by default.
-LAH
Also print the help message for each variable.

The CMake GUI is available if installed and a graphical desktop is available. Press “Configure” to see the cache variables. Values may be edited if desired.

cmake-gui -S . -B build

The “ccmake” Curses-based interface is available on non-Windows platforms, which can also edit cache variables.

ccmake -B build

From the “ccmake” interface, press “c” to configure, “t” to toggle visibility of Advanced variables that are not shown by default.
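
For scripted access, the cache can also be read from the CMakeCache.txt file in the build directory, where each entry is a line of the form NAME:TYPE=VALUE. The following Python sketch parses that form. It is a simplified illustration that ignores quoted variable names, read_cmake_cache is a hypothetical helper name, and the cmake -L options above remain the supported interface.

```python
from pathlib import Path


def read_cmake_cache(cache_file: Path) -> dict[str, tuple[str, str]]:
    """Map each cache variable name to a (type, value) tuple."""
    cache = {}
    for line in cache_file.read_text().splitlines():
        line = line.strip()
        # Skip blank lines, "#" comments, and "//" help strings.
        if not line or line.startswith(("#", "//")):
            continue
        # Split on the first "=" only, since values may contain "=".
        name_and_type, equals, value = line.partition("=")
        name, colon, var_type = name_and_type.partition(":")
        if equals and colon:
            cache[name] = (var_type, value)
    return cache
```

For example, read_cmake_cache(Path("build/CMakeCache.txt")).get("CMAKE_BUILD_TYPE") returns the type and value of that variable if it is in the cache, or None otherwise.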

Matlab Terminal app

Mathworks published a Terminal emulator app for Matlab, which is performant and well-integrated with Matlab. It does not require any Matlab toolboxes and can be used on all platforms that Matlab supports. Matlab Terminal supports multiple tabs, customizable themes, and various shell and AI Agent environments. For those using a separate IDE like VS Code with Matlab, the integrated terminal in VS Code is still a good choice. For users who prefer to work directly in Matlab, this Matlab Terminal app is a great addition.