Scientific Computing

rsync to ExFAT drive

These rsync options are useful to sync with an ExFAT drive:

rsync -vrltD --progress --stats /source/a/ /dest/a

The -vrltD flags are the subset of the -a (archive) options that are compatible with ExFAT, plus -v for verbose output.

The standard rsync archive option:

rsync -a

doesn't work with an ExFAT drive because ExFAT doesn't store Unix permissions, owners, or groups. The result is errors like:

rsync: mkstemp … failed: Function not implemented (38)
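As a sketch of how -vrltD relates to -a: rsync documents -a as shorthand for -rlptgoD, and dropping the permission (-p), group (-g), and owner (-o) flags leaves -rltD. The Python helper names below (`exfat_flags`, `rsync_command`) are hypothetical and chosen only for illustration.

```python
ARCHIVE_FLAGS = "rlptgoD"  # rsync -a is shorthand for -rlptgoD

# Flags that rely on metadata ExFAT cannot store
UNSUPPORTED_ON_EXFAT = {"p": "permissions", "g": "group", "o": "owner"}


def exfat_flags() -> str:
    """Return the -a flags that remain after removing the unsupported ones."""
    return "".join(f for f in ARCHIVE_FLAGS if f not in UNSUPPORTED_ON_EXFAT)


def rsync_command(source: str, dest: str) -> list:
    """Build the argument list for an ExFAT-friendly verbose sync."""
    return ["rsync", "-v" + exfat_flags(), "--progress", "--stats", source, dest]


if __name__ == "__main__":
    # Only prints the command; pass the list to subprocess.run() to execute it.
    print(" ".join(rsync_command("/source/a/", "/dest/a")))
```

Building the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.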


Per-file progress:

rsync -av --progress

Overall progress:

rsync -a --info=progress2

Markdown reference for Jekyll websites

Related: Create fast Jekyll GitHub Pages website


A good website template improves website performance substantially (> 10%) by speeding up the site and making it more mobile-friendly. Mobile-friendly websites have been important (including for low-bandwidth users on any device) and feasible since the late 1990s with CSS. As Google has moved to a mobile-first web crawler, being mobile-friendly is critically important for virtually all websites. Using appropriate layout elements in the page helps visitors and search engines understand the content better, boosting engagement rates and search engine results page (SERP) position.

The WordPress Markdown syntax reference applies equally well to Jekyll and other website renderers that accept Markdown input. Almost all website presentation can be done with Markdown.

Use HTML directly only to draw text arrows ↔, to give detailed control of the few percent of images that need it, and to highlight text (e.g. red warning text).

Liquid generates almost all figures, table of contents, and image galleries.

HTML5 math characters: math characters can be written directly in HTML5. Not all of these work in Jekyll Markdown, so test them in the local preview first.

Speed of Matlab vs Python vs Julia vs IDL

The Benchmarks Game uses deep expert optimizations to exploit every advantage of each language. The benchmarks I’ve adapted from the Julia micro-benchmarks are instead written the way a general scientist or engineer who is competent in the language, but not an advanced expert, would write them. The priority is keeping the benchmarks simple and short, reflecting work where programmer time is far more important than CPU time. Jules Kouatchou runs benchmarks on massive clusters comparing Julia, Python, Fortran, etc. A prime purpose of these benchmarks is to show, for a canonical task (say Mandelbrot) programmed with comparable ease, which languages have distinct runtime performance benefits.

Julia’s growing advantage is the performance of compiled languages with the relative ease of a scripted language. The majority of analysts scripting in engineering and science are working in Python, with Matlab in second place. The stable Julia 1.0 release finally brings the promise of API stability that was an adoption blocker in earlier Julia releases. Julia allows abstract expression of formulas, ideas, and arrays in ways not feasible in other major analysis applications. This allows advanced analysts unique, performant capabilities with Julia. Since Julia is readily called from Python, Julia work can be exploited from more popular packages.

Python is often “close enough” in performance to compiled languages like Fortran and C, by virtue of numeric libraries such as NumPy and Numba. For particular tasks, TensorFlow, OpenCV, and directly loading Fortran libraries with f2py or ctypes minimize Python’s performance penalty. This was not the case when Julia was conceived in 2009 and first released in 2012. Thanks to Anaconda, Intel MKL and PyCUDA, momentum and performance are solidly behind Python for scientific and engineering computing for the next several years at least.

Cython has Python-like syntax that is compiled to .c code that is much larger than the original Python code and isn’t very readable. However, substantial speed increases can result. Don’t convert the entire program to Cython! Just the slow functions.

PyPy does sophisticated analysis of Python code and can also offer massive speedups, without changes to existing code.


We have created a multi-language benchmark suite. Fortran is comparable to Python with MKL, Matlab, and Julia. With single-precision floats, Python with CUDA can be 1000+ times faster than Python, Matlab, Julia, and Fortran. However, the usual “price” of GPUs is slow I/O. If large arrays need to be moved constantly on and off the GPU, special strategies may be necessary to get a speed advantage. For iterative algorithms, it’s worthwhile to use Numba or Cython with Python to get Fortran-like speeds, comparable with Matlab on the given test.
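To illustrate why transfer time matters, here is a rough model (an assumption for illustration, not part of the benchmark suite): the effective speedup is the CPU time divided by the sum of GPU compute time and host-to-device transfer time. The function name and all timings are hypothetical.

```python
def effective_speedup(cpu_seconds, gpu_compute_seconds, transfer_seconds):
    """Speedup of a GPU run over a CPU run once transfer time is included."""
    return cpu_seconds / (gpu_compute_seconds + transfer_seconds)


if __name__ == "__main__":
    # Made-up timings: the GPU kernel alone is about 1000x faster than the CPU
    print(effective_speedup(10.0, 0.01, 0.0))
    # The same kernel when moving the arrays takes 0.99 s per call
    print(effective_speedup(10.0, 0.01, 0.99))
```

With these made-up numbers, the transfer cost reduces the advantage from roughly 1000x to roughly 10x, which is why keeping arrays resident on the GPU between iterations tends to matter more than the kernel speed itself.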

L3 Harris Geospatial IDL is used mostly by astronomers. IDL can be replaced by GDL, the free open-source IDL-compatible program. A better choice would be to move from IDL/GDL to Python or Julia in many cases.

Pi Machin benchmark
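For reference, Machin's formula is pi = 16 arctan(1/5) - 4 arctan(1/239). The following is a minimal sketch of a loop-based implementation of the kind a benchmark might time. It is an assumption for illustration, not the benchmark code from the suite, and the function names are hypothetical.

```python
import math


def arctan_series(x: float, n_terms: int) -> float:
    """Taylor series for arctan(x): sum of (-1)^k * x^(2k+1) / (2k+1)."""
    total = 0.0
    power = x  # holds x^(2k+1) for the current k
    for k in range(n_terms):
        total += (-1) ** k * power / (2 * k + 1)
        power *= x * x
    return total


def machin_pi(n_terms: int = 20) -> float:
    """Approximate pi with Machin's formula using truncated arctan series."""
    return 4 * (4 * arctan_series(1 / 5, n_terms) - arctan_series(1 / 239, n_terms))


if __name__ == "__main__":
    print(machin_pi(), math.pi)
```

Because both arguments are small, the series converges quickly, so a benchmark based on this formula would normally repeat the computation many times to get a measurable runtime.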


Related: Anaconda Accelerate: GPU from Python / Numba

Open source video / audio codecs

Open source codecs are important to everyone as license fees erode the ability to provide free content without some form of subscription or sponsorship, which can lead to filtering of content. Here are a couple updates for the end of 2018.

In the early 2000s, Speex by Xiph.org (now making Opus) was great for dialup modem streaming of voice content. The Opus codec effectively replaces Speex, and the latest Opus release further enhances performance.

For FFmpeg compiled with libopus support, encoding Opus audio with FFmpeg is like:

ffmpeg -i voice.wav -ac 1 -cutoff 4000 -b:a 6000 -ar 8000 -vbr off -frame_duration 60 -application voip voice.opus
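As a rough check of what these settings imply: with -vbr off the encoder runs at a constant bitrate, so each 60 ms frame at 6000 bit/s carries about 45 bytes of audio payload. The sketch below does this arithmetic. The function names are hypothetical, and the estimate ignores Ogg container overhead.

```python
def payload_bytes_per_frame(bitrate_bps: float, frame_ms: float) -> float:
    """Audio payload per frame at constant bitrate, in bytes."""
    return bitrate_bps * frame_ms / 1000 / 8


def payload_bytes(bitrate_bps: float, duration_s: float) -> float:
    """Total audio payload for a recording of the given length, in bytes."""
    return bitrate_bps * duration_s / 8


if __name__ == "__main__":
    print(payload_bytes_per_frame(6000, 60), "bytes per frame")
    print(payload_bytes(6000, 3600) / 1e6, "MB of payload per hour")
```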

YouTube supports AV1.

Use Julia from Jupyter notebook

IJulia allows running Julia from within the web browser-based Jupyter IDE. For all operating systems, it’s recommended to install and update Julia via the downloads from the Julia website.

Install Jupyter: first locate the Jupyter binary

  • Windows: where jupyter
  • Linux, macOS: which jupyter

If Jupyter is not installed, just do

conda install jupyter

Install IJulia: start Julia with this environment variable (only necessary once)

  • Linux / macOS:

    JUPYTER=$(which jupyter) julia
  • Windows (PowerShell):

    $Env:JUPYTER=$(where.exe jupyter)
    
    julia
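The lookup and the environment variable can also be sketched in Python, which avoids the difference between where and which. `shutil.which` is the standard-library lookup; the helper names are hypothetical, and launching Julia this way assumes julia is on the PATH.

```python
import os
import shutil


def find_program(name: str = "jupyter"):
    """Return the full path of `name` on PATH, or None if it is not found."""
    return shutil.which(name)


def env_with_jupyter(jupyter_path: str) -> dict:
    """Copy the current environment and point JUPYTER at the given binary."""
    env = dict(os.environ)
    env["JUPYTER"] = jupyter_path
    return env


if __name__ == "__main__":
    jupyter = find_program()
    if jupyter is None:
        print("Jupyter not found on PATH; install it first")
    else:
        print("Jupyter found at", jupyter)
        # To start Julia with JUPYTER set (assumes julia is on PATH):
        #   import subprocess
        #   subprocess.run(["julia"], env=env_with_jupyter(jupyter))
```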

Finally, type at Julia prompt:

using Pkg

Pkg.add("IJulia")

This installs numerous packages via conda automatically.

Existing .jl files are NOT runnable from IJulia. Start the Notebook by EITHER:

  • Terminal/Command Prompt: jupyter lab
  • Julia: using IJulia; notebook()

Create an interactive Julia .ipynb notebook by clicking New → Julia.

Headless Raspberry Pi setup w/o Ethernet

Some Raspberry Pi models such as the Zero and Zero W do not have an Ethernet port on the board. While one can use a USB-Ethernet adapter in the USB OTG port, if one wants to use only the Pi itself without adapters, the procedure below is required.

Install Raspberry Pi operating system on micro SD card.

On the SD card, edit /boot/config.txt, adding the line:

dtoverlay=dwc2

Now pick one of the following connection methods.

Ethernet over USB

Edit /boot/cmdline.txt, adding after rootwait on the same line with a space:

rootwait modules-load=dwc2,g_ether
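Because cmdline.txt must remain a single line, a scripted edit is less error-prone than a hand edit. The following is a minimal sketch under the assumption that the file holds one line of space-separated options; the function name and the sample command line in the usage section are hypothetical.

```python
def add_gadget_module(cmdline: str, gadget: str = "g_ether") -> str:
    """Insert modules-load=dwc2,<gadget> right after rootwait.

    Returns the options as a single line. If a modules-load option is
    already present, the options are returned without adding another.
    """
    words = cmdline.split()
    if any(word.startswith("modules-load=") for word in words):
        return " ".join(words)
    if "rootwait" not in words:
        raise ValueError("rootwait not found in cmdline")
    position = words.index("rootwait") + 1
    words.insert(position, f"modules-load=dwc2,{gadget}")
    return " ".join(words)


if __name__ == "__main__":
    # Hypothetical cmdline.txt content, for illustration only
    sample = "console=tty1 root=/dev/mmcblk0p2 rootwait quiet"
    print(add_gadget_module(sample))
```

Passing gadget="g_serial" produces the line needed for the serial connection method instead.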

Boot the Pi with the micro SD card inserted, waiting 90 seconds or so. Then type from laptop (username, hostname are those picked for the Pi by the Raspberry Pi Imager program):

ssh username@hostname.local

If this doesn’t work, ensure that you see the new Ethernet port on your laptop. On Linux this would be seen in

ip a


Serial over USB

This method uses very basic USB drivers that should be on any laptop operating system.

Edit /boot/cmdline.txt, adding after rootwait on the same line with a space:

rootwait modules-load=dwc2,g_serial

Boot the Pi with the micro SD card (using an HDMI monitor or SSH), and type in the Pi:

systemctl enable getty@ttyGS0.service

This is a one-time command; the setting persists across reboots.

Reboot the Pi and connect from your laptop with a serial client like PuTTY at 115200 baud. You can find the port the device is on in Linux from

dmesg

ls /dev/tty*

before and after plugging in the Pi.
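The before-and-after comparison can be automated. This is a minimal sketch assuming a Linux laptop where serial devices appear under /dev/tty*; the function names are hypothetical, and the device name that shows up depends on the system.

```python
import glob
import time


def list_tty_devices(pattern: str = "/dev/tty*") -> set:
    """Return the set of device paths currently matching the pattern."""
    return set(glob.glob(pattern))


def new_devices(before, after) -> list:
    """Return the devices present in `after` but not in `before`, sorted."""
    return sorted(set(after) - set(before))


def wait_for_new_device(timeout_s: float = 30.0, poll_s: float = 0.5) -> list:
    """Poll until a new tty device appears, or return [] after the timeout."""
    before = list_tty_devices()
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        added = new_devices(before, list_tty_devices())
        if added:
            return added
        time.sleep(poll_s)
    return []
```

Call wait_for_new_device() and then plug in the Pi; the returned path is the port to give to the serial client.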



Related: Headless Raspberry Pi setup with Ethernet

Convert README.rst to README.md

Since PyPI accepts Markdown-formatted README.md, there is less reason to use the more complicated syntax of reStructuredText .rst files. We have converted hundreds of README.rst files to README.md with the process below.

Bulk convert RST .rst to Markdown .md:

pandoc -f rst -t markdown README.rst -o README.md

Slight hand correction: remove most of the \ escapes that pandoc inserted (find and replace with nothing)
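When converting many files, the find-and-replace can be scripted. This sketch assumes the unwanted backslashes are those escaping a single ASCII punctuation character; the function name is hypothetical. It also alters backslashes inside code blocks, so the result still needs a manual look.

```python
import re
import string

# A backslash followed by one ASCII punctuation character, e.g. \_ or \*
ESCAPED_PUNCTUATION = re.compile(r"\\([" + re.escape(string.punctuation) + r"])")


def strip_pandoc_escapes(markdown: str) -> str:
    """Remove the backslash from backslash-escaped punctuation."""
    return ESCAPED_PUNCTUATION.sub(r"\1", markdown)


if __name__ == "__main__":
    print(strip_pandoc_escapes(r"my\_file and \*args"))
```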

Replace in Git repo:

git rm README.rst
git add README.md

For Python repos, be sure that pyproject.toml is configured for README.md

Prepare Git repo for public release

Eliminating unnecessary (particularly large) files and removing needless historical development details are two significant parts of preparing a Git repo for public release. Public users, even if they are a limited group, don’t need large amounts of code development history, which is probably littered with large files. Here are several straightforward steps to prepare code for public Git release.

1. Create an empty Git repo

GitHub is an obvious first choice, as GitHub has by far the largest number of users and excellent integration with third party tools. Bitbucket and GitLab are two worthy alternatives.

Create an empty Git repo at the website, then clone the empty repo you created to the computer.

git clone https://github.invalid/username/myrepo

Copy the files you want into the myrepo directory. Extra files are cleaned up in the next step.

2. Remove unneeded files

These commands are executed in the new myrepo directory you cloned, NOT your old directory. These commands assume a Unix-like shell.

Find the biggest directories in myrepo directory:

du -h | sort -h

To inspect the biggest files within mydir, sorted largest first:

ls -lhS mydir

Find binary files (non-text, non-code) recursively:

find . -type f | perl -lne 'print if -B'
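A rough Python counterpart is sketched below. It uses a simpler heuristic than Perl's -B test (an assumption made here for brevity): a file is treated as binary if its first chunk contains a NUL byte. Unlike the find command, it skips the .git directory. The function names are hypothetical.

```python
from pathlib import Path


def looks_binary(chunk: bytes) -> bool:
    """Heuristic: treat data containing a NUL byte as binary."""
    return b"\x00" in chunk


def find_binary_files(root: str = ".", chunk_size: int = 8192):
    """Yield files under `root` (skipping .git) whose first chunk looks binary."""
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue
        with path.open("rb") as f:
            if looks_binary(f.read(chunk_size)):
                yield path
```

For example, `for path in find_binary_files("."): print(path)` lists the candidates to review before committing.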

Find .DS_Store files (from macOS):

find . -name .DS_Store

Delete them (for example by appending -delete to the command above), and then add .DS_Store to .gitignore

Sometimes it’s handy to remove or list all files EXCEPT those matching a pattern (inverse globbing):

shopt -s extglob

ls !(*inverse_pattern*)

Keep unwanted files out of the Git repo in the future by adding filenames, directory names, and globbing patterns to .gitignore

3. Share and collaborate

When you’re confident things are ready, do

git add .

git commit -am "initial public release"

git push

and your files are on the Web for all.

Users will use GitHub Issues and Pull Requests to request and suggest code changes.

Instead of adding Collaborators, start by having people who want to make changes Fork and then Pull Request.

4. Ensure quality

Continuous Integration is vital to maintaining and improving code quality. GitHub Actions CI is a popular choice.

Google Earth on Linux

Google Earth Pro for Linux is available at no charge. Download the Google Earth Pro 64-bit .deb and install:

gdebi google-earth-pro-stable*amd64.deb

Open from Google Earth menu icon, or from Terminal:

google-earth-pro

which is a softlink under /usr/bin/ to /opt/google/earth/pro/googleearth

This also installs a software source in /etc/apt/sources.list.d/google-earth-pro.list for automatic updates.

As noted in Google Earth release notes, the key improvements include:

  • all Google Earth installs are now “Pro” for free
  • high-resolution (Hi DPI) support
  • fixed broken networking system (blank links) etc.
  • works with new Google Photos layer (that replaced Panoramio)

Allow NaN in Matplotlib pcolormesh x,y coords

Matplotlib pcolormesh() is 10-100x or more faster than pcolor(), especially when using Cartopy. However, the mesh generation requires valid edge coordinates: NaN is not allowed. A workaround for certain scenarios like geospatial plots is to “smear” the last valid x,y (say, latitude, longitude) out to replace the NaNs. Like pcolor(), this method hides the invalid values. There may be slight aberrations at the edges.

Example code for pcolormesh with NaN coordinates: pcolormesh_NaN.py
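The “smearing” idea can be sketched without any plotting library. This is a minimal illustration under the assumption that NaNs are filled along each row with the most recent valid value, and leading NaNs with the first valid value. It is not the code from pcolormesh_NaN.py, and the function names are hypothetical.

```python
import math


def smear_row(row):
    """Replace NaNs in a 1-D sequence with the nearest preceding valid value.

    Leading NaNs take the first valid value in the row.
    """
    valid = [value for value in row if not math.isnan(value)]
    if not valid:
        raise ValueError("row has no valid coordinates")
    last = valid[0]
    filled = []
    for value in row:
        if not math.isnan(value):
            last = value
        filled.append(last)
    return filled


def smear_grid(grid):
    """Apply smear_row to every row of a 2-D coordinate grid."""
    return [smear_row(row) for row in grid]


if __name__ == "__main__":
    nan = float("nan")
    print(smear_row([nan, 1.0, 2.0, nan, nan]))
```

The filled nested lists can be passed as the x or y coordinates; cells whose coordinates were smeared collapse to zero width, which is what hides them.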