CMake FetchContent vs. ExternalProject

Making multiple software projects work together is readily done by the build system (Meson subproject, CMake ExternalProject or CMake FetchContent) instead of by a Git submodule or monorepo.

Meson subproject and CMake ExternalProject keep project namespaces separate. Meson subproject and CMake FetchContent download and configure all projects at configure time. CMake FetchContent commingles the CMake project namespaces. FetchContent can be easier to use than ExternalProject if you control both software projects' CMake scripts. If you don’t control the “child” project, it may be better to use ExternalProject instead of FetchContent.

For these examples, suppose we have a top-level project “parent” and a “child” project containing a library that is desired in parent. Suppose the child project can be built standalone (by itself) but also may be used directly from other CMake projects.

project | CMAKE_SOURCE_DIR | CMAKE_BINARY_DIR | PROJECT_SOURCE_DIR
parent | ~/foo | ~/foo/build | ~/foo
child: standalone | ~/bar | ~/bar/build | ~/bar
child: CMake ExternalProject | ~/foo/build/child-prefix/src/child | ~/foo/build/child-prefix/src/child-build | ~/foo/build/child-prefix/src/child
child: CMake FetchContent | ~/foo | ~/foo/build | ~/foo/build/_deps/child-src

FetchContent

FetchContent populates content from the other project at configure time. FetchContent populates the “child” project with default values from the “parent” project. Variables set in the “child” project generally do not affect the “parent” project unless specifically used from the “parent” project.

From “parent” project CMakeLists.txt:

cmake_minimum_required(VERSION 3.14)
project(parent Fortran)

include(FetchContent)
FetchContent_Declare(child
  GIT_REPOSITORY https://github.invalid/username/child.git
  GIT_TAG develop   # it's much better to use a specific Git revision or Git tag for reproducibility
)

FetchContent_MakeAvailable(child)

# your program
add_executable(myprog main.f90)
target_link_libraries(myprog PRIVATE mylib)  # mylib is from "child"

FetchContent_MakeAvailable() makes the “child” code configure, populating variables and targets as if it were part of the “parent” CMake project.

Suppose the “child” project CMakeLists.txt contains:

project(child LANGUAGES Fortran)

add_library(mylib mylib.f90)
target_include_directories(mylib INTERFACE ${CMAKE_CURRENT_BINARY_DIR}/include)
set_target_properties(mylib PROPERTIES
  Fortran_MODULE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/include)

The child project CMAKE_BINARY_DIR and CMAKE_SOURCE_DIR will be those of the parent project. That is, if the parent project is in ~/foo and the build directory is ~/foo/build, then the child project populated by FetchContent (under ~/foo/build/_deps/child-src) will also have CMAKE_SOURCE_DIR of ~/foo and CMAKE_BINARY_DIR of ~/foo/build. So be careful in the child project when using such variables that may be defined by parent projects. This is why projects that aren’t specifically designed to work together may be better joined by ExternalProject. A typical technique within a child project that can also operate standalone is to refer to CMAKE_CURRENT_SOURCE_DIR instead of CMAKE_SOURCE_DIR, as the latter will break when used from FetchContent.

IMPORTANT: When using if() clauses to determine execution of FetchContent, ensure that the FetchContent stanzas are executed each time CMake is run. Otherwise, the FetchContent targets may fail to be available or may have missing target properties on CMake rebuild.

ExternalProject

ExternalProject populates content from the other project at build time. This means the other project’s libraries are not visible until the parent project is built. Since ExternalProject does not combine the project namespaces, ExternalProject may be necessary if you don’t control the other projects.

ExternalProject will not activate without the add_dependencies() statement. Upon cmake --build of the parent project, ExternalProject downloads, configures and builds.

From “parent” project CMakeLists.txt:

project(parent LANGUAGES Fortran)

include(ExternalProject)

set(mylist "a;b;c")
# passing a list to external project is best done via CMAKE_CACHE_ARGS
# CMAKE_ARGS doesn't work correctly for lists

set_directory_properties(PROPERTIES EP_UPDATE_DISCONNECTED true)
# don't repeatedly update and rebuild ExternalProjects.
# directory property scope: CMAKE_CURRENT_SOURCE_DIR and its subdirectories

set(child_ROOT ${PROJECT_BINARY_DIR}/child)

ExternalProject_Add(CHILD
  GIT_REPOSITORY https://github.com/scivision/cmake-externalproject
  GIT_TAG develop  # it's much better to use a specific Git revision or Git tag for reproducibility
  CMAKE_ARGS -DCMAKE_INSTALL_PREFIX:PATH=${child_ROOT}
  CMAKE_CACHE_ARGS -Dmyvar:STRING=${mylist}   # need variable type e.g. STRING for this
  CONFIGURE_HANDLED_BY_BUILD ON
  BUILD_BYPRODUCTS ${child_ROOT}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}timestwo${CMAKE_STATIC_LIBRARY_SUFFIX}  # must match IMPORTED_LOCATION below
)

file(MAKE_DIRECTORY ${child_ROOT}/include)  # avoid race condition

add_library(timestwo STATIC IMPORTED GLOBAL)
set_target_properties(timestwo PROPERTIES
  IMPORTED_LOCATION ${child_ROOT}/lib/${CMAKE_STATIC_LIBRARY_PREFIX}timestwo${CMAKE_STATIC_LIBRARY_SUFFIX}
  INTERFACE_INCLUDE_DIRECTORIES ${child_ROOT}/include)

add_executable(test_timestwo test_timestwo.f90)  # your program
add_dependencies(test_timestwo CHILD)  # externalproject won't download without this
target_link_libraries(test_timestwo PRIVATE timestwo)

add_dependencies(): makes the ExternalProject always update and build first.
CONFIGURE_HANDLED_BY_BUILD ON: tells CMake not to reconfigure on each build, unless the build system requests a configure.
BUILD_BYPRODUCTS: necessary for Ninja to not complain about missing targets. Note that we can’t use BINARY_DIR here, since it is only populated by ExternalProject_Get_Property(), which is called after ExternalProject_Add().

The imported library timestwo is used in the “parent” project just like any other library.


“child” project CMakeLists.txt includes:

project(child Fortran)

add_library(timestwo STATIC timestwo.f90)
set_target_properties(timestwo PROPERTIES
  Fortran_MODULE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/include)

Configure “child” Fortran_MODULE_DIRECTORY so that it’s not necessary for “parent” to introspect “child” directory structure.

We have created live ExternalProject examples.

CMake ≥ 3.21 can detect whether a project is “top level”, that is, NOT included via FetchContent, using PROJECT_IS_TOP_LEVEL.

cmake_minimum_required(VERSION 3.21)
project(child)

if(PROJECT_IS_TOP_LEVEL)
  message(STATUS "${PROJECT_NAME} directly building, not FetchContent")
endif()

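Note that the PARENT_DIRECTORY directory property and the PROJECT_IS_TOP_LEVEL variable are NOT useful for detecting if the “child” is being used as an ExternalProject, since ExternalProject configures the child in its own separate CMake run, where the child is the top-level project.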


  • target_link_directories() is generally NOT preferred because library name collisions can occur, particularly with system libraries.

Reference: CMake staff comparison of multiple project with CMake

CMake version recommendations and install

CMake ≥ 3.17 is strongly recommended for general users for more robust and easy syntax. For project developers, we recommend CMake >= 3.19 as the new features make debugging CMake and setting up CI considerably easier.

Downloading the latest release of CMake is usually easy. For Linux and Mac, admin/sudo is NOT required.

There is an unofficial PyPI CMake package:

python -m pip install cmake

For platforms where CMake binaries aren’t easily available, use build_cmake.cmake.

Key features added

The priority of these features is subjective–we write from the scientific computing perspective.

CMake 3.21 adds more preset features, including making “generator” optional–the default CMake behavior will be used to determine generator. The cmake --install-prefix option can be used instead of the cumbersome cmake -DCMAKE_INSTALL_PREFIX=. PROJECT_IS_TOP_LEVEL and <PROJECT-NAME>_IS_TOP_LEVEL identify if a project is at the top of the project hierarchy. ctest --output-junit gives test output in standard tooling format.

CMake 3.20 adds support for Intel NextGen LLVM compiler and NVIDIA HPC compiler. ExternalProject_Add() learned CONFIGURE_HANDLED_BY_BUILD which avoids CMake commanding a reconfigure on each build. try_run(WORKING_DIRECTORY) was added. CMake presets in CMakePresets.json now cover configure, build and test, allowing many parameters to be declared with inheritance in JSON. CMake presets are a key feature for CI, as well as user configurations. The ctest --test-dir build option avoids the need to manually cd build. cmake_path allows path manipulation and introspection without actually touching the filesystem.

CMake 3.19 added support for ISPC language. string(JSON GET|SET) parsing is very useful to avoid hard-coding parameters. FindPython/find_package accepts version ranges. Intel oneAPI works with CMake >= 3.19.6. Emits deprecation warning for cmake_minimum_required VERSION less than 2.8.12. CMakePresets.json enables configure parameter declarations in JSON.

CMake 3.18 adds the CMake Profiler:

cmake -B build --profiling-output=perf.json --profiling-format=google-trace

Adds REQUIRED parameter to find_program. Adds file(ARCHIVE_CREATE) and file(ARCHIVE_EXTRACT)

CMake 3.17 adds Ninja Multi-Config generator. cmake --debug-find shows what find_*() is doing. Eliminates Windows “sh.exe is on PATH” error. Recognizes that Ninja 1.10 correctly works with Fortran.

CMake 3.16 adds precompiled headers, unity builds, many advanced project features.

CMake 3.15 adds the CMAKE_GENERATOR environment variable that works like a global -G option. Enhances Python interpreter finding. Adds the cmake --install command instead of “cmake --build build --target install”. Added Zstd compression.

CMake 3.14 is where we added check_fortran_source_runs(). FetchContent was enhanced with simpler syntax. The transitive link resolution was considerably enhanced in CMake 3.14. Projects just work in CMake >= 3.14 that fail at link-time with CMake < 3.14.


We don’t recommend use of the older CMake versions below as they take significantly more effort to support.

CMake 3.13 adds ctest --progress and better Matlab compiler support. Lots of new linking options are added, fixes to Fortran submodule bugs. The very convenient cmake -B build incantation, target_sources() with absolute path are also added. It’s significantly more difficult to use CMake older than 3.13 with medium to large projects.

CMake 3.12 adds transitive library specification (out of same directory) and full Fortran Submodule support. get_property(_test_names DIRECTORY . TESTS) retrieves test names in current directory.

CMake 3.11 allows specifying targets initially without sources. FetchContent is added, allowing fast hierarchies of CMake and non-CMake projects.


The versions of CMake below have been deprecated as of CMake 3.19.

CMake 3.10 added Fortran Flang (LLVM) compiler and extensive MPI features.

CMake 3.9 added further C# and CUDA support, which was originally added in CMake 3.8.

CMake 3.8 added initial CUDA support.

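CMake 3.7 added the ≤ and ≥ comparisons (LESS_EQUAL, GREATER_EQUAL), including their version comparison forms (VERSION_LESS_EQUAL, VERSION_GREATER_EQUAL). Initial Fortran submodule support was added.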

CMake 3.6 gave better OpenBLAS support.

CMake 3.5 enhanced FindBoost target with auto Boost prereqs.

CMake 3.4 added if(TEST) to see if a test name exists.

CMake 3.3 added list operations such as IN_LIST.

Diagnose CTest failures from logs

CTest automatically logs test outputs to:

${CMAKE_BINARY_DIR}/Testing/Temporary/LastTest.log

If you have a test failure and want to diagnose, first copy this file somewhere else to work with it, in case it gets overwritten. This file is usually quite useful with nice formatting even when running many tests in parallel.

A simple list of all “failed” and “not run” tests is in:

${CMAKE_BINARY_DIR}/Testing/Temporary/LastTestsFailed.log

“Not run” tests are those whose required fixture (test property FIXTURES_REQUIRED) itself failed or did not run.
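
The failed-test list can also be read programmatically, for example to feed a rerun script. The following is a minimal Python sketch, assuming each non-empty line of LastTestsFailed.log has the form "<test number>:<test name>" and that the build directory is named "build"; the function names are hypothetical helpers, not part of CTest.

```python
from __future__ import annotations

from pathlib import Path


def failed_test_names(log_text: str) -> list[str]:
    """Return test names from text shaped like LastTestsFailed.log.

    Assumption: each non-empty line is "<test number>:<test name>".
    """
    names = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # split on the first colon only, since a test name may itself contain colons
        _, _, name = line.partition(":")
        # if there was no colon at all, keep the whole line as the name
        names.append(name if name else line)
    return names


def read_failed_tests(build_dir: str) -> list[str]:
    """Read the failed / not-run test names under a CMake build directory."""
    log = Path(build_dir) / "Testing" / "Temporary" / "LastTestsFailed.log"
    if not log.is_file():
        return []  # no log means no recorded failures (or CTest has not run)
    return failed_test_names(log.read_text())
```

Calling read_failed_tests("build") after a CTest run returns an empty list when the log file does not exist.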

At the time of running CTest, one can also use the -O option like:

ctest -O test.log

“ctest -O” only logs what is printed to the screen during the CTest run. If the “ctest -V” option wasn’t used, the extra useful information as in LastTest.log such as the command line run will be missing in “test.log”.

CTest set environment variable

It’s often useful to set per-test environment variables in CMake’s CTest testing frontend. The environment variables appear and disappear with the start and end of each test, in isolation from any other tests that may be running in parallel. This is accomplished via the test property ENVIRONMENT.

Example: set environment variable FOO=1 for a test “bar” like:

set_tests_properties(bar PROPERTIES ENVIRONMENT "FOO=1")

Multiple variables are set with a CMake list (semicolon delimited) like:

set_tests_properties(bar PROPERTIES ENVIRONMENT "FOO=1;BAZ=0")

Here comes an issue. In general, Windows needs DLLs to be in the current working directory or on the environment variable PATH. Since Windows delimits PATH entries with semicolons, which is also the CMake list delimiter, we need to do a little extra work to append to PATH on Windows for CTest. We handle this with a script that appends to PATH for CTest on Windows.


In Python, likewise set or unset environment variables within tests using the PyTest monkeypatch fixture.
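
As a rough illustration of the same per-test isolation in Python, the sketch below uses the standard library's unittest.mock.patch.dict rather than the PyTest monkeypatch fixture, so that it runs without PyTest installed. With PyTest, monkeypatch.setenv() and monkeypatch.delenv() play the same role. The function read_foo is a hypothetical stand-in for the code under test.

```python
import os
from unittest import mock


def read_foo() -> str:
    """Hypothetical code under test: reads FOO from the environment."""
    return os.environ.get("FOO", "unset")


def test_foo_set():
    # FOO=1 exists only inside this "with" block, like CTest's ENVIRONMENT property
    with mock.patch.dict(os.environ, {"FOO": "1"}):
        assert read_foo() == "1"


def test_foo_unset():
    # patch.dict restores the original environment on exit,
    # so removing FOO here does not leak into other tests
    with mock.patch.dict(os.environ):
        os.environ.pop("FOO", None)
        assert read_foo() == "unset"
```

Each test leaves os.environ as it found it, so the tests do not depend on execution order.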

Rename and cleanup conda Python environment

Rename Python conda environment “old” to “new” by copying the environment and deleting the original environment:

conda create --name new --clone old

conda remove --name old --all

Each Miniconda/Anaconda environment consumes disk space. One may wish to delete old, unused conda environments to free disk space. Conda environment disk size can be checked by first listing all environment paths:

conda env list

This will show entries like:

py37 ~/miniconda3/envs/py37

Print the disk size of a conda environment:

  • Linux / MacOS: du -sh ~/miniconda3/envs/py37
  • Windows: dir /s miniconda3\envs\py37
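
A cross-platform alternative is to sum file sizes in Python. This is a minimal sketch; dir_size_bytes is a hypothetical helper name. Note that it adds up apparent file sizes, so files that conda hard-links between environments and its package cache are counted in full, and the result may be larger than what du reports.

```python
import os


def dir_size_bytes(root: str) -> int:
    """Sum the apparent sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.islink(path):
                continue  # skip symlinks so their targets are not counted
            total += os.path.getsize(path)
    return total
```

For example, dir_size_bytes(os.path.expanduser("~/miniconda3/envs/py37")) / 2**20 gives the size in MiB for the environment path listed above.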

Linux control groups tips

Linux control groups can limit any user’s CPU, memory or other resource usage. Control groups can be used to test program behavior under constrained resources. Control groups v2 are recommended in general with a new architecture and better performance. By default with RHEL/CentOS 8, we need to enable cgroups-v2.

Although setting up persistent control groups is straightforward, it’s possible to create a transient, command-line-initiated control group using systemd-run. This use can be good for diagnosing program behavior, for example: does a program’s memory use blow up then come down faster than “top” might show? An example use constraining a program to 2 GB of RAM is like:

systemd-run --scope -p MemoryMax=2G -p MemorySwapMax=0 ./my_program

The flag --user did not work–we needed to type the sudo password despite running as the standard user.

Another way to set hardware/firmware-based limits for more intensive benchmarks is to simply use a device with less RAM, edit BIOS/UEFI to only enable a limited amount of RAM, or on Linux use GRUB kernel mem= parameter to constrain the available RAM. Ensure the swap/paging file is turned off.

Clean delete untracked files from Git repo.

Sometimes files are accidentally spilled into a Git repo. Until the files are added with git add, they are “untracked”. If the files match a pattern in “.gitignore” they will not appear in Git operations generally. Untracked, non-ignored files show with:

git status --porcelain

like

?? oops.txt

where the question marks indicate the file is untracked.
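
To review the untracked files from a script before cleaning, the porcelain output can be filtered. This is a minimal Python sketch, assuming the short porcelain format where untracked entries start with "?? "; the function names are hypothetical. Paths containing special characters may appear quoted in this output, which the sketch does not undo.

```python
from __future__ import annotations

import subprocess


def untracked_paths(porcelain: str) -> list[str]:
    """Extract paths of untracked entries from 'git status --porcelain' text."""
    # each line is a two-character status, a space, then the path
    return [line[3:] for line in porcelain.splitlines() if line.startswith("?? ")]


def git_untracked(repo: str = ".") -> list[str]:
    """Run 'git status --porcelain' in repo and return the untracked paths."""
    result = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        capture_output=True,
        text=True,
        check=True,  # raise if git fails, e.g. repo is not a Git repository
    )
    return untracked_paths(result.stdout)
```

Untracked directories are listed once with a trailing slash rather than file by file.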

These files may be interactively removed (deleted) by:

git clean -id

When there are files spilled in multiple directories, the “filter by pattern” option lets you exclude files from deletion so that they are retained. The updated display shows the files to be deleted. When satisfied, select “Clean”. There’s no recovering those files trivially, so be sure of your choices.

To clean files matching patterns in .gitignore, add the “-x” option like:

git clean -xid

That’s useful for cleaning up in-source builds, perhaps from Makefile or LaTeX.

Fortran allocate large variable memory

Variables that are larger than a few kilobytes often should be put into heap memory instead of stack memory. In Fortran, compilers typically put variables with the parameter attribute into stack memory. Our practice in Fortran is to put non-trivial arrays intended to be static/unchanged memory into an allocatable, protected array. Example:

module foo

implicit none (type, external)

integer, allocatable, protected :: x(:,:)

contains

subroutine init()
  allocate(x(1024,256))
  !! in real life, this would be some constant data array or
  !! expression filling the "constant" array x.
  x = 1
end subroutine init

end module


program bar

use foo, only : init, x

call init()

if (any(x /= 1)) error stop "did not init"

end program

In this example, x is approximately a one megabyte variable, assuming kind=int32. Even though the compiler may not warn if we instead declare this variable as parameter, it can cause segfaults and other seemingly random runtime errors.
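
The size estimate can be checked with quick arithmetic. This sketch is in Python for convenience and assumes 4 bytes per element (int32), as stated above.

```python
# array shape from allocate(x(1024,256))
elements = 1024 * 256

# assumption: default integer is int32, i.e. 4 bytes per element
bytes_per_element = 4

size_bytes = elements * bytes_per_element
print(size_bytes, size_bytes / 2**20)  # prints: 1048576 1.0
```

So the array is exactly 1 MiB under that assumption.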

Normally we would use a derived type instead of a bare module, but we did it here for simplicity.

Fortran allocate large variables

If the variable to be allocated is about one gigabyte or larger, sometimes special techniques are needed, even on systems with very large amounts of RAM including HPC. This is especially the case on Windows systems, where even the latest Windows 10 has particular limitations.

The error messages one may get upon allocating large variables in Fortran include:

Error allocating <N> bytes: Not enough space

Segmentation fault (core dumped)

For Windows, a peculiar limitation is that each variable (including allocatable) cannot exceed the virtual paging file size, even if the Windows computer has a large amount of RAM that isn’t being exceeded. The paging file size may be inspected and set under: Control Panel | System and Security | System | Advanced system settings | Advanced | Performance | Settings | Advanced | Virtual memory

In general, the compiler may need to have the memory model flag set for the situation. This flag has a set of implications.

MacOS 11 excessive SSD write wear

As noted by Hector Martin and others, MacOS 11 appeared to have a possible kernel bug causing excessive SSD write wear whenever the SSD was in the “on” state. One can use “smartmontools” to check SSD write history:

brew install smartmontools

smartctl --all /dev/disk0

Note that SSD on state time can be much less than Mac powered-on time, particularly if the Mac is sitting idle. This is especially the case for the Mac Mini, which may sit powered on but unused for the majority of the time by some users.

Thankfully, as noted by Jonas Ribe, Hector Martin and others, MacOS 11.4 appears to have fixed this SSD write bug.

Thankfully we haven’t seen the 100+ TB of excess SSD wear pre-11.4 as Jonas did. We saw less than 5 TB of excess wear on each of our mostly idle, continuously powered on Mac Minis.

Again, tentatively this problem is resolved by MacOS 11.4.