Scientific Computing

Fix Gfortran stack to static warning

GCC / Gfortran 10 added warnings for arrays too large for the current stack settings, which may cause unexpected behavior. The warning is triggered by code like:

real :: big2(1000,1000)

Warning: Array ‘big2’ at (1) is larger than limit set by ‘-fmax-stack-var-size=’, moved from stack to static storage. This makes the procedure unsafe when called recursively, or concurrently from multiple threads. Consider using ‘-frecursive’, or increase the ‘-fmax-stack-var-size=’ limit, or change the code to use an ALLOCATABLE array. [-Wsurprising]

This warning is generally legitimate when arrays declared as above are too large for the stack. Simply making the procedure recursive may lead to segfaults.

Correct the example above like:

real, allocatable :: big2(:,:)

allocate(big2(1000,1000))

For multiple arrays of the same shape do like:

integer :: M=1000,N=2000,P=500

real, allocatable, dimension(:,:,:) :: w,x, y, z

allocate(w(M,N,P))
allocate(x, y, z, mold=w)

As with the Intel oneAPI heap-arrays command-line options, there can be a speed penalty when large arrays move off the stack into heap memory.

Install latest GFortran on Linux

Newer versions of compilers generally have more useful and detailed warning messages. As with any compiler, newer versions of Gfortran may require rebuilding other libraries linked with the Fortran compiler if the ABI presented by libgfortran changes. On Linux, one can switch Gfortran versions with update-alternatives. If errors occur when installing any version of gfortran on Ubuntu, try:

add-apt-repository universe

The latest GCC / Gfortran for Ubuntu is available from the ubuntu-toolchain-r/test PPA. Add the PPA by:

add-apt-repository ppa:ubuntu-toolchain-r/test

apt update

Install the most recent Gfortran listed at the PPA. Switch between compiler versions with update-alternatives.


Related:

  • Windows: Install latest Gfortran
  • macOS: get latest gfortran by brew install gcc

Setup Astrometry.net and usage tips

Astrometry.net is easy to use on Linux, macOS, and Windows. Windows uses Windows Subsystem for Linux for Astrometry.net. To get star index files, use downloadIndex.py.

Download/install the pre-compiled binary code:

  • Linux / Windows Subsystem for Linux: apt install astrometry.net
  • macOS Homebrew

The major steps in achieving a useful WCS starfield polynomial fit are:

  1. source extraction (identifying star positions in the image frame)

  2. quad asterism hashing, including tolerance for star position wiggle due to noise.

  3. match the hash to a star catalog index (at least 3 catalog choices are available).

  4. Bayesian decision process: find an extremely high probability solution, or reject.
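To illustrate step 2, here is a toy geometric hash for a four-star quad, invariant to translation, rotation, and scale. This is only a sketch of the idea (the `quad_hash` name and details are ours, not astrometry.net's actual implementation, which also canonicalizes the A/B star ordering and bins hashes for noise tolerance):

```python
# Toy quad-asterism hash: map the two most separated stars A, B to fixed
# reference points, then hash the transformed positions of the other two.
# Illustrative only; NOT astrometry.net's real code.
from itertools import combinations


def quad_hash(stars):
    """Hash four (x, y) star positions, invariant to translation,
    rotation, and scale (similarity transforms)."""
    assert len(stars) == 4
    pts = [complex(x, y) for x, y in stars]
    # reference axis: the pair A, B with maximum separation
    a, b = max(combinations(pts, 2), key=lambda p: abs(p[0] - p[1]))
    rest = [p for p in pts if p not in (a, b)]
    # similarity transform sending A -> 0 and B -> 1+1j
    t = [(p - a) * (1 + 1j) / (b - a) for p in rest]
    t.sort(key=lambda z: (z.real, z.imag))  # crude canonical order for C, D
    return tuple(v for z in t for v in (z.real, z.imag))
```

Because only ratios of star separations enter the hash, the same quad photographed at any camera rotation or plate scale yields (nearly) the same hash, which is what makes catalog lookup possible.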

Astrometry.net tips and techniques


Optional compile: normally not necessary.

Prereqs:

apt install libcairo2-dev libnetpbm10-dev netpbm libpng12-dev libjpeg-dev zlib1g-dev swig libcfitsio-dev
curl -O https://astrometry.net/downloads/astrometry.net-latest.tar.gz

tar xf astrometry.net-*.gz

cd astrometry.net-*

make
make py
make extra

make install INSTALL_DIR=~/astrometry.net

Add to ~/.profile

export PATH="$PATH:$HOME/astrometry.net/bin"

Do not use ~ in the PATH entry, to avoid the error:

cannot find executable astrometry-engine

Uncomment inparallel in ~/astrometry.net/etc/astrometry.cfg (or /etc/astrometry.cfg)

Copy the star index files with downloadIndex.py

python downloadIndex.py

If it can’t find the index file, be sure ~/astrometry.net/etc/astrometry.cfg contains:

add_path /home/username/astrometry/data

~ or $HOME will NOT work!
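Since the config file needs a literal absolute path, one can generate the add_path line programmatically. A minimal sketch (the `astrometry/data` location and `add_path_line` helper are examples, not fixed names):

```python
# Build the astrometry.cfg "add_path" line with the home directory
# expanded to an absolute path, since astrometry-engine does not
# expand ~ or $HOME inside the config file.
from pathlib import Path


def add_path_line(data_dir="astrometry/data"):
    # Path.home() returns the already-expanded absolute home directory
    return f"add_path {Path.home() / data_dir}"
```

For example, `print(add_path_line())` emits a line such as `add_path /home/username/astrometry/data`, ready to paste into astrometry.cfg.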

Related:

  • Reference paper
  • Program giving azimuth/elevation for each pixel of sky image
  • Alternative: online astrometry.net image scaling

Xvfb makes fake X11 for CI

Continuous integration for programs that plot or need a display can be tricky, since in many cases the CI system doesn't have an X11 display server. A workaround is to generate plots under the X server virtual framebuffer (Xvfb) dummy X11 display server. This maintains code coverage and may allow dumping plots to disk for further checks.

GitHub Actions: “.github/workflows/ci.yml”: assuming the project uses PyTest, the xvfb-action enables Xvfb for that command:

- name: Run headless test
  uses: GabrielBB/xvfb-action
  with:
    run: pytest
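Inside the test suite itself, a complementary guard is to skip display-dependent tests when no X server is present. A sketch, assuming pytest (the `has_display` helper is hypothetical, not part of pytest):

```python
# Skip display-dependent tests when no X11 display (real or Xvfb)
# is available, e.g. CI without Xvfb. has_display() is a
# hypothetical helper for illustration.
import os


def has_display() -> bool:
    """True if an X11 display appears to be available."""
    return bool(os.environ.get("DISPLAY"))


# usage with pytest (assuming pytest is installed):
# @pytest.mark.skipif(not has_display(), reason="no X11 display")
# def test_plot_window():
#     ...
```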

Related: [Detect CI via environment variable](/ci-detect-environment-variable)

Save WSJT-X raw audio for data analysis

Upload raw .wav WSJT-X data to the HamSci Zenodo data archive to help future data analysis. The location of the WSJT-X raw data is found by the WSJT-X menu: File → Open Log Directory. The raw data save location is typically:

  • Windows: $Env:LocalAppData/WSJT-X/save
  • Linux: ~/.local/share/WSJT-X/save
  • macOS: ~/Library/Application Support/WSJT-X/save

To save the raw data, from the WSJT-X menu: Save → Save All. One .wav file is saved per two minute cycle. This setting is persistent.
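The typical save locations listed above can be resolved programmatically; a sketch (the `wsjtx_save_dir` helper is ours, and WSJT-X's File → Open Log Directory remains the authoritative source):

```python
# Resolve the typical WSJT-X raw-audio save directory per platform,
# following the paths listed above. Illustrative helper only.
import os
import sys
from pathlib import Path


def wsjtx_save_dir(platform: str = sys.platform) -> Path:
    home = Path.home()
    if platform.startswith("win"):
        return Path(os.environ.get("LOCALAPPDATA", home)) / "WSJT-X" / "save"
    if platform == "darwin":
        return home / "Library" / "Application Support" / "WSJT-X" / "save"
    # Linux and other Unix-like systems
    return home / ".local" / "share" / "WSJT-X" / "save"
```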

Archive the raw WSPR data into a single file for easier upload to the HamSci Zenodo archive. Upload the raw data by creating a Zenodo account. Upon clicking “Publish” the data is assigned a DOI and is citable.

Tips:

  • Avoid using a virtual machine for WSJT-X due to issues with broken/choppy audio.
  • WSJT-X collects about 1.7 GByte/day depending on how often you transmit (no recording occurs when you transmit).
  • raw audio data file size is: 12000 samples/sec * 16 bits/sample / 8 bits/byte * 86400 sec/day * 0.8 RX duty cycle = 1.7 GByte / day. That’s 2.88 Mbytes per 2 minute WSPR RX cycle.
  • Since this is 6 kHz of spectrum, you can widen your receiver filters (particularly if using an SDR or other advanced receiver) to also pass JT65, FT8, or other useful transmissions that fall within the 12 kS/s sampling bandwidth, for even more potent results.
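The data-rate arithmetic in the bullets above can be checked directly:

```python
# Reproduce the raw-audio data-rate arithmetic from the bullets above.
SAMPLE_RATE = 12000      # samples/sec
BITS_PER_SAMPLE = 16
RX_DUTY_CYCLE = 0.8      # no recording occurs while transmitting

bytes_per_sec = SAMPLE_RATE * BITS_PER_SAMPLE // 8          # 24000 B/s
gbyte_per_day = bytes_per_sec * 86400 * RX_DUTY_CYCLE / 1e9
mbyte_per_cycle = bytes_per_sec * 120 / 1e6                 # 2-minute RX cycle

print(f"{gbyte_per_day:.1f} GByte/day, {mbyte_per_cycle:.2f} MByte/cycle")
# → 1.7 GByte/day, 2.88 MByte/cycle
```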

The raw data .wav files are uncompressed PCM audio. “tar” is used to make one archive file instead of thousands of sound files per day. The files are full of noise, which by definition is poorly compressible.
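A sketch of the bundling step using Python's tarfile, with plain tar (no compression) since the noise-dominated audio compresses poorly; the `archive_wavs` helper and paths are examples:

```python
# Bundle a day's .wav files into one uncompressed tar archive,
# since noise-dominated audio does not compress well.
import tarfile
from pathlib import Path


def archive_wavs(save_dir, out_tar):
    """Add every .wav under save_dir to out_tar (mode "w" = plain tar)."""
    with tarfile.open(out_tar, "w") as tar:
        for wav in sorted(Path(save_dir).glob("*.wav")):
            tar.add(wav, arcname=wav.name)
```

For example, `archive_wavs(Path.home() / ".local/share/WSJT-X/save", "wspr-day.tar")` produces one file ready for Zenodo upload.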


Related: Load raw WSPR data for analysis

Git pull: don't merge or rebase by default

Git 2.27 introduced git pull behavior that we feel is beneficial: it does not merge or rebase by default unless the user configures a default behavior. Specify a “safe” default for git pull so that linear Git history is maintained unless git pull options are given explicitly. Git services such as GitHub can enforce linear history.

git config --global pull.ff only

If encountering a Git remote that cannot be fast-forwarded, the user can then either git rebase or git merge.

Reference: Git: rebase vs. merge

CMake CTest cost data

CMake’s CTest assigns a dynamic COST to each test, updated each time the test is run. Kitware considers the cost data to be undocumented behavior, so it’s not part of the CMake COST docs.

The computed test cost data is stored in ${CMAKE_BINARY_DIR}/Testing/Temporary/CTestCostData.txt. This file stores one row per test:

  TestName NumberOfTestRuns Cost
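Those rows are easy to read back, for example to inspect which tests dominate runtime. A sketch (the `read_ctest_costs` helper is ours; since the format is undocumented, parsing stops at the first line that is not a three-field row):

```python
# Parse CTestCostData.txt rows of the form "TestName NumberOfTestRuns Cost".
# The remainder of the file is undocumented, so stop at the first
# line that does not match the three-field row format.
def read_ctest_costs(text):
    costs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            break
        name, runs, cost = fields
        try:
            costs[name] = (int(runs), float(cost))
        except ValueError:
            break
    return costs
```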

Compare HDF5 data values

The h5diff tool has limitations for comparing HDF5 data files: it currently compares with either an absolute tolerance or a relative tolerance, but not both. This mutually exclusive comparison fails for much floating-point data. A more suitable comparison for floating-point data, similar to NumPy, is:

is_close = abs(actual-desired) <= max(rtol * max(abs(actual), abs(desired)), atol)

  • rtol: relative tolerance, perhaps 1e-5
  • atol: absolute tolerance, perhaps 1e-8 but not zero
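That formula translates directly to Python, e.g. for element-wise checks on values read with h5py (a minimal scalar sketch):

```python
# Mixed relative/absolute floating-point comparison, per the formula above.
def is_close(actual, desired, rtol=1e-5, atol=1e-8):
    """True when actual is within rtol (relative) or atol (absolute) of desired."""
    return abs(actual - desired) <= max(rtol * max(abs(actual), abs(desired)), atol)
```

NumPy users can instead reach for numpy.isclose, which uses the similar (though not identical) criterion `abs(a - b) <= atol + rtol * abs(b)`.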

We use h5py to read / write HDF5 files from Python.

Comparing floating point data for Python in CI can be done by pytest.approx.

Detect project primary code languages

GitHub, GitLab and similar repository services deal with hundreds of coding languages. Accurate detection of coding languages in a project is useful for discovery of repositories that are of interest to users and for security scanning, among other purposes. Scientific computing developers are generally interested in a narrow subset of programming languages. HPC developers are generally interested in an even narrower subset of programming languages. We recognize the “long tail” of advanced research using specialized languages or even their own language. However, most contemporary HPC and scientific computing work revolves around a handful of programming languages.

To rapidly detect coding languages at each “git push”, GitHub developed the open-source Ruby-based Linguist. GitLab also uses Linguist. We developed a Python interface to Linguist that requires the end user to install Ruby and Linguist. However, Linguist is not readily usable from native Windows (including MSYS2) because some of Linguist’s dependencies have Unix-specific code, despite being written in Ruby. The same issues can happen in general with Python if the developers aren’t using multi-OS CI. GitHub recognized the accuracy shortcomings of Linguist (cited as 84% on average) and developed the 99% accurate closed-source OctoLingua. OctoLingua deals with the 50 most popular code languages on GitHub. Little has been heard about OctoLingua since July 2019.

We provide an initial implementation of a tool, code-sleuth, that actively introspects projects using a variety of heuristics and direct actions. A key design factor of code-sleuth is to detect languages with specific techniques, such as invoking CMake or Meson, to introspect the languages the project developers intended. The goal is not to detect every language in a project, but to detect the primary languages of a project. We also wish to resolve the language standards required, including for Python, C++, C, and Fortran. This detection lets a user know, in automated fashion, which compiler or environment is needed.
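As a toy illustration of the simplest heuristic, counting files by extension, consider the sketch below. Real tools such as Linguist use far richer signals (shebangs, modelines, content classifiers), and code-sleuth additionally queries build systems; the `EXT_LANG` map here is a tiny example subset:

```python
# Naive primary-language detection by counting file extensions.
# Illustrative only; not how Linguist or code-sleuth actually work.
from collections import Counter
from pathlib import Path

EXT_LANG = {".py": "Python", ".f90": "Fortran", ".c": "C",
            ".cpp": "C++", ".cxx": "C++", ".rb": "Ruby"}


def primary_languages(root, top=3):
    """Return up to `top` languages ranked by file count under root."""
    counts = Counter(
        EXT_LANG[p.suffix]
        for p in Path(root).rglob("*")
        if p.is_file() and p.suffix in EXT_LANG
    )
    return [lang for lang, _ in counts.most_common(top)]
```

This already shows why extension counting alone is inadequate: vendored dependencies, generated code, and ambiguous extensions all skew the ranking, motivating build-system introspection instead.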