Scientific Computing

Python plot HTML browser

HTML plotting is important for communicating key data to colleagues, the general public and policymakers. HTML plots allow easy sharing of interactive data plots to any web browser. There are numerous HTML plotting methods available from Python. These methods are completely open source and work in any web browser. The HTML file can be shared by several means:

  • email attachment
  • embedded in a webpage as an HTML5 iframe
  • IPython notebook
  • hosted plotting service (Plotly, figshare, et al)

Matplotlib HTMLWriter can plot animated sequences. More powerful HTML plotting capability requires mpld3. Virtually any Matplotlib plot can be converted to HTML for display in any web browser. mpld3 uses HTML and D3.js to animate the plots.

from matplotlib.pyplot import figure
import mpld3

fig = figure()
ax = fig.gca()
ax.plot([1,2,3,4])

mpld3.show(fig)

mpld3.show() converts any Matplotlib plot to HTML and opens the figure in the web browser. The mpld3.save_html(fig, 'myfig.html') function saves figures to HTML files for archiving or for uploading to a website, where they can be embedded as an iframe. D3.js-enabled interactive elements are available from the mpld3 API. Note the template_type='simple' keyword for .save_html(), which can increase robustness across web browsers.


The open source Plotly library can plot completely offline, without Plotly servers. Plotly can make offline Python plots that work in any web browser. Plotly offline plots don’t need IPython/Jupyter, although you can certainly use them as well.

Offline plotting is important as it doesn’t rely on external proprietary services that could prevent future users from making plots. The Plotly API is available for Python, R, Matlab, JavaScript, Scala and a growing number of other programming languages. Plotly examples show the simple syntax. When using plain Python rather than a notebook, be sure to use plotly.offline.plot(), as plotly.offline.iplot() is intended for notebooks and will silently fail to do anything!

TRENDnet TEG-S80g vs. TEG-S82g

TRENDnet TEG-S80g revisions 3.0 through 4.1 lack LED speed indicators, having only activity (plug-in detection) indicators for each port.

Model/Rev          LED speed indicator   metal jacks   buffer memory (kB)   MAC address table (entries)
TEG-S80g Rev 1.0   yes                   yes           128                  4K
TEG-S80g Rev 2.1   yes                   yes           128                  8K
TEG-S80g Rev 3.0   no                    no            256                  8K
TEG-S80g Rev 4.1   no                    no            192                  4K
TEG-S82g Rev 2.0   no                    no            256                  8K

The TRENDnet TEG-S80g model has been reliable for me across a decade of use in small instrument networks with challenging physical environments.

Trendnet internals

MyPy PEP 585, 604 support

MyPy supports PEP 585 and 604, bringing concise Python 3.10 type annotation syntax to earlier Python versions. The new type annotation syntax works on all supported Python versions, provided each file using it has at the top:

from __future__ import annotations

Separately, Numpy 1.20 made the long-awaited Numpy type hinting a reality.

MyPy type check quick start

The benefits of Python static type checking have been discussed at length, and static type checking has been widely adopted and funded by major tech companies, especially Dropbox. Static type checking enhances code quality now and in the future by defining (constraining) the types of variables and functions (methods).

Type enforcement can be done with assert. Type hinting is more concise, flexible and readable than assert, with significantly less performance impact. Type hinting is being continually enhanced in CPython, numerous IDEs and type annotation checkers. With type hinting, the hint is right at the variable name (e.g. in the function declaration), while assert must occur in the code body.
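As a minimal sketch of the difference, the two hypothetical functions below (the names are illustrative and not from any library) express the same constraint, first with assert in the code body and then with a type hint in the function declaration. Note that the hint is checked by a tool such as MyPy before the program runs and is not enforced at runtime.

```python
def scale_assert(x, factor):
    # runtime enforcement: the checks live in the body and run on every call
    assert isinstance(x, float), "x must be float"
    assert isinstance(factor, float), "factor must be float"
    return x * factor


def scale_hint(x: float, factor: float) -> float:
    # static enforcement: a type checker reads the declaration,
    # nothing extra is executed when the function is called
    return x * factor
```

With the hinted version, a call such as scale_hint("a", 3.0) would be flagged by MyPy without running the code, while the assert version only fails once that line actually executes.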

MyPy is installed and upgraded by:

pip install -U mypy

The MyPy static type checker considers the following substitutions to be valid due to duck type compatibility:

  • int where float is expected
  • float where complex is expected

Note that str is not equivalent to bytes.
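A small sketch of what this means in practice, using two hypothetical functions half() and shout() that exist only for illustration. The commented-out call is the kind of mismatch MyPy would be expected to report.

```python
def half(x: float) -> float:
    return x / 2


def shout(s: str) -> str:
    return s.upper()


a = half(3)      # int passed where float is annotated: accepted by MyPy
b = half(3.0)    # float passed where float is annotated: accepted

c = shout("abc")
# shout(b"abc")  # bytes passed where str is annotated: MyPy reports an incompatible type
```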

Usage

Add to pyproject.toml:

[tool.mypy]
files = ["src"]

assuming Python package files are under “src/”. Then issue the command:

python -m mypy

Note this command checks the package and not the top-level scripts, which must be manually specified. Configure pyproject.toml to eliminate nuisance errors or otherwise configure mypy.

It takes a little practice to understand the messages. Where multiple types are accepted, for example str and pathlib.Path, use typing.Union or the equivalent PEP 604 syntax Path | str. See the examples below.

Examples

Many times a function argument can handle more than one type. This is handled as follows:

from __future__ import annotations
from pathlib import Path


def reader(fn: Path | str) -> str:
    fn = Path(fn).expanduser()

    txt = fn.read_text()

    return txt

Another case is where lists or tuples are used; the types within can optionally be checked:

from __future__ import annotations
from pathlib import Path


def reader(fn: Path | str) -> tuple[float, float]:
    fn = Path(fn).expanduser()

    txt: list[str] = fn.read_text().split(',')

    latlon = (float(txt[0]), float(txt[1]))

    return latlon

Or perhaps dictionaries, where optionally types within can be checked:

from __future__ import annotations
from pathlib import Path


def reader(fn: Path | str) -> dict[str, float]:
    fn = Path(fn).expanduser()

    txt: list[str] = fn.read_text().split(',')

    params = {'lat': float(txt[0]),
              'lon': float(txt[1])}

    return params

If many value types are in the dictionary, or some types are not yet supported for type hinting, simply use typing.Any, e.g.

dict[str, typing.Any]

The default where no type is declared is typing.Any, which basically means “don’t check this variable at this location in the code”.
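A short sketch of both cases follows, assuming Python 3.9 or newer so that the built-in dict can be subscripted directly. The function names and the sample values are made up for illustration.

```python
from typing import Any


def summarize(params: dict[str, Any]) -> str:
    # keys must be str; values may be float, str, list, ... and are not checked
    return ", ".join(f"{k}={v}" for k, v in params.items())


def untyped(x):
    # no annotations: x and the return value are implicitly Any
    return x


text = summarize({"lat": 65.1, "site": "demo", "channels": [1, 2]})
```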


As in C++, Python can type hint that a function must not return.

import typing

def hello() -> typing.NoReturn:
    print("hello")

MyPy reports:

error: Implicit return in function which does not return

This is used for functions that always raise an error or always exit, and in general to help ensure control flow is not returned.
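For contrast, here is a minimal sketch of a function that satisfies the NoReturn hint because it always raises. The helper names fail() and parse_port() are hypothetical and exist only for this example.

```python
import typing


def fail(msg: str) -> typing.NoReturn:
    # always raises, so control never returns to the caller
    raise RuntimeError(msg)


def parse_port(text: str) -> int:
    if text.isdigit():
        return int(text)
    fail(f"not a port number: {text}")
    # a type checker treats anything after fail() as unreachable,
    # so no return statement is needed on this path
```

Because fail() is declared NoReturn, a type checker can also follow the control flow in parse_port() and see that every path either returns an int or raises.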

Unison file synchronizer

Unison file synchronizer works like a mashup of Dropbox, rsync and Git, although internally it works via its own means. Unison syncs file changes between a local and a (typically) remote location. We have used Unison to manage systems on isolated remote networks, that is, where the remote network cannot directly access the Internet and so Dropbox etc. aren’t available.

Assuming one trusts Unison, this could be useful for cases where it is desirable to keep files out of the “cloud”. Unison file synchronizer has executable downloads for popular operating systems.

Red Hat Linux Python install

Red Hat is a common Linux distro used for HPC and other high reliability computing environments. Miniconda Python works well on Red Hat and many other Linux distros as well as macOS and Windows. Thanks to .whl binary wheel packaging, many packages such as Numpy can be quickly installed via pip without compilation whether using system Python or Miniconda.

Miniconda Python install does not require “sudo”, while installing Python from EPEL or IUS does require “sudo”. One may also load Python 3 with modules, as is commonly done on HPC.

Improve blog post indexation

Google’s Search Console is useful for uncovering lagging performance in web pages, including for blogs. I tend to write terse posts that address very specific issues. Often, these pages perform well, but I saw a few percent of pages being marked in Search Console as status “crawled - currently not indexed”. The underlying theme on these pages was they had too little ordinary paragraph text. If there are too many lists, headers, or preformatted text relative to plain paragraphs, this “not indexed” status is likely to be applied.

A few pages of this type also suffered from “soft 404” status. I found these were very short pages that contained text with “error” or “missing”. I reworded those articles to avoid those terms, and made sure the titles didn’t include them either. I also ensured there were not too many header tags relative to the text, perhaps one header at most per “page” of text.

The fix for these issues is generally to include more meaningful text: be sure an article is at least one or two full paragraphs. Add context that would help a more novice user understand why you applied that solution or approach. Avoid sensational or colloquial text, as the search engines are smart enough to recognize this as low quality writing. As always, maintaining good spelling and adequate grammar helps the search engine better appraise the quality of your content. Also consider short (less than 50 character) but meaningful page titles.

For long-lived blogs such as this one, there is inevitably content that is no longer relevant to anyone except for historical purposes. You may not want to simply delete these posts that you took time to research and share, but realize these old posts are hurting your current performance by wasting crawler budget. The approach I take is to mark these old pages with “noindex” metadata. That allows reminiscing about old technology such as Blackberry OS 10 without degrading the performance of currently relevant content. I think of it as a soft deprecation of the content.

Avoid array copies in Matlab and Python

More convenient array broadcasting was added to Matlab years ago, removing the need for bsxfun. Python Numpy has even more advanced array indexing and broadcasting features to conserve memory, speeding computations.

When translating between Matlab and Python, avoid simply replacing Matlab repmat with numpy.tile, which copies data in memory. It may be possible to use numpy.newaxis or numpy.broadcast_arrays instead, which create views in O(1) time and memory rather than copies.
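A minimal sketch of the difference, assuming NumPy is installed. The array names and sizes are made up for illustration.

```python
import numpy as np

x = np.arange(3.0)  # shape (3,)
y = np.arange(4.0)  # shape (4,)

# copying approach, analogous to Matlab repmat: allocates two full 4x3 arrays
x_tiled = np.tile(x, (y.size, 1))                 # shape (4, 3)
y_tiled = np.tile(y[:, np.newaxis], (1, x.size))  # shape (4, 3)
total_tiled = x_tiled + y_tiled

# broadcasting approach: newaxis adds a length-1 axis without copying,
# and the addition expands the operands on the fly
total_bcast = x[np.newaxis, :] + y[:, np.newaxis]  # shape (4, 3)

# broadcast_arrays returns 4x3 views onto the original data, not copies
xb, yb = np.broadcast_arrays(x[np.newaxis, :], y[:, np.newaxis])
```

Both approaches give the same result, but only the tiled version stores the repeated data. The broadcast views should be treated as read-only since many elements refer to the same memory.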

Disable homebrew cleanup on macOS CI

Homebrew’s brew cleanup saves disk space by deleting old versions of packages. From time to time, CI macOS images get out of date with Homebrew, and auto-cleanup is triggered unintentionally upon brew install during a CI run. There is no benefit to this on CI.

Disable Homebrew auto cleanup by setting environment variable HOMEBREW_NO_INSTALL_CLEANUP=1. On GitHub Actions, this is accomplished near the top of the particular .github/workflows/*.yml file:

env:
   HOMEBREW_NO_INSTALL_CLEANUP: 1

Windows GCC availability

GCC (including G++ and Gfortran) on Windows is available via several means. We generally prefer MSYS2 for using GCC and numerous libraries with pacman package manager.

  • WSL: recommended, gives Linux system
  • MSYS2: recommended, well-maintained
  • Cygwin: usually WSL is preferred instead for better performance
  • TDM: usually MSYS2 is preferred instead