Scientific Computing

Convert Cinepak videos with FFmpeg for ImageJ

ImageJ cannot read Cinepak codec video files. Convert from Cinepak to popular video formats using FFmpeg.

Motion JPEG is widely compatible with video players, including ImageJ.

ffmpeg -i old.avi -c:v mjpeg -q:v 1 out.avi

The uncompressed AVI output file can be a factor of 10 larger than the Cinepak version, but virtually every video player, including ImageJ, can play uncompressed AVI.

ffmpeg -i old.avi -c:v rawvideo out.avi

FFV1 preserves the original video quality with lossless compression. Many video players can handle FFV1 AVI video.

ffmpeg -i old.avi -c:v ffv1 out.avi

The advantage of using a PNG image stack comes in frame-by-frame analysis of the video.
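
For example, one way to extract every frame to a numbered PNG image stack (the output filename pattern is arbitrary):

ffmpeg -i old.avi frame_%04d.png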

Consider converting video to HDF5 using dmcutils/avi2hdf5.py for analysis purposes.
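
A rough sketch of what such a conversion could look like (this is not the actual avi2hdf5.py script, and it assumes the imageio, imageio-ffmpeg and h5py packages):

import imageio
import h5py


def avi_to_hdf5(avi_file, h5_file):
    # read every frame into memory; fine for modest-size AVI files
    frames = imageio.mimread(avi_file, memtest=False)

    with h5py.File(h5_file, 'w') as f:
        # store the stack as one compressed dataset for frame-by-frame analysis
        f.create_dataset('video', data=frames, compression='gzip')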

NSF Dear Colleague letter on Sondrestrom

The 26 DEC 2017 NSF “Dear Colleague” letter notes an effective shutdown of Sondrestrom ISR on 31 MAR 2018. This generally follows Recommendations 7.2, 7.3, and 9.11 of the 2015 NSF Geospace Section Portfolio Review, whose final report was issued 14 APR 2016.

We have been running instruments remotely at Sondrestrom since 2012. As soon as the NSF 2016 report was issued, I started to hear murmurs about the publication count from Sondrestrom. I homed in on Recommendation 7.36:

NSF GS should develop a common set of annual metrics from each facility which can be collected year-on-year to provide an underpinning of the next Senior Review. These metrics could include

  • science outputs both from facility staff and external users
  • annual expenditure (capital and resource)
  • data downloads and usage
  • key technical developments (hardware and software).

While noting that NSF GS may not have the informatics systems necessary at present, it seems likely that other NSF directorates or other funding agencies such as NIH may have developed such metrics and they should be employed.

Rationale

One of the key issues acknowledged by many at the CEDAR 2017 “save Sondrestrom ISR” workshop meeting was the relatively low publication count involving Sondrestrom. NSF questioned whether the instruments had real-time streaming capability to support space weather nowcasting. The internet bandwidth to all of Greenland is limited, and the satellite link to Sondrestrom costs $45/GByte. The throughput of 50 kB/sec led to a number of difficulties and workarounds. Data are mostly transported by mailing or carrying USB hard drives in and out of Sondrestrom.

Given flat (effectively declining) budgets, any program manager looks for low-hanging fruit to cut. Sondrestrom is the only ISR in the world operating at such a short wavelength (23 cm, 1.29 GHz). Its vacuum-tube (klystron) transmitter technology presents longevity concerns. The mechanically steered dish limits spatiotemporal resolution considerably compared to electronically steered ISRs. The ISR is powered by a 600 kW generator, and station power is provided by two 180 kW generators that cycle periodically to even out wear.

NSF also planned to cut Arecibo’s budget by about 75%, so the cost of running large, expensive facilities is clearly a pressure the NSF Geospace Section wants to push down. I think we have to acknowledge that something had to change. However, as the assessment below notes, it’s not clear that the planned economies from accessing EISCAT-3D will come to fruition in the next few years.

Review of the Portfolio Review

An assessment of the Geospace Portfolio Review was conducted in 2016 by the National Academies of Sciences, Engineering, and Medicine. Their assessment report, DOI: 10.17226/24666, made a few critical points. (Note: you can download the 1.1 MB PDF for free as “guest”.) Per the report, the Portfolio Review used per-facility metrics like:

  • Hours of operation per annum
  • Publications for at least 5 years
  • Number of site users (instruments placed) and data users
  • Current state of maintenance
  • Future science and technology plans
  • Sources of funding
  • International agreements
  • Present and future plans in support of the survey

The plan to shift some recovered funds to EISCAT-3D was questioned in Section 5.2.2, as EISCAT-3D is not yet fully funded (and is not yet built).

A key takeaway from the two reports is that you can’t understand what isn’t measured. Opaque budgets and arbitrary metrics are not a great starting point for any effort. European funding agencies under Horizon 2020 have a mandate for open data and open publication. They accept metadata from repositories like Zenodo to close the loop. While not trivial, these problems have been at least partially solved by other funding agencies.

Simple AstroPy Python FITS image stack examples

Assume an image stack in file myimg.fits. FITS files do not memory map except in special cases. Usually FITS files are under 2 GB, making it feasible to work with the whole image stack on a modern PC. That is, load the whole image stack and then index the 3-D array in RAM.

from astropy.io import fits

fn = 'myimg.fits'

with fits.open(fn, mode='readonly') as h:
    img = h[0].data  # whole 3-D image stack as a NumPy array

    lat = h[0].header['GLAT']  # location metadata from the primary HDU header
    lon = h[0].header['GLON']

The header contains the location metadata, which we assign to lat and lon.

Newer file formats such as HDF5 and NetCDF4 have effectively unlimited file sizes and easily store arbitrary organizations of variables, data, and metadata.
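
For example, the image stack and location metadata loaded above could be written to HDF5 with h5py in a few lines (a minimal sketch; the file and dataset names are arbitrary):

import h5py

with h5py.File('myimg.h5', 'w') as f:
    f['imgs'] = img                # the 3-D image stack
    f['imgs'].attrs['GLAT'] = lat  # keep location metadata with the data
    f['imgs'].attrs['GLON'] = lon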


Related: read FITS image stack in Matlab

Identifying file type without extension

I received an email attachment, with no filename extension because of the spam filters in corporate email. I “knew” it was a legitimate file because I had just requested it from a notable researcher. Rather than bother the sender to tell me the original file extension, I determined the file type by:

file emailedfilename

which gave output

gzip compressed data, from Unix

Thus I changed the filename to be emailedfilename.tar.gz since a gzip’d file almost always contains a tar archive, and I could extract the files.

Note that tar is smart enough to work even with a missing or wrong file extension, so I could have instead simply run

tar -xf emailedfilename
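
If the file command is not at hand, a similar check can be done by reading the first few bytes of the file, for example in Python (a minimal sketch covering only a few common signatures):

from pathlib import Path

# a few common magic numbers; the file command knows many more
MAGIC = {
    b'\x1f\x8b': 'gzip',
    b'PK\x03\x04': 'zip',
    b'%PDF': 'pdf',
}


def guess_type(filename):
    head = Path(filename).read_bytes()[:8]
    for signature, kind in MAGIC.items():
        if head.startswith(signature):
            return kind
    return 'unknown'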

University / School ham radio club license

This discussion pertains to United States of America Federal Communications Commission Part 97 regulations on Amateur Radio.


FCC 47 CFR § 97.5(b)(2) lays out the structure for a club station license. In practical terms, this gives an amateur radio club a memorable callsign it can rally around to build branding and identity. The club station license does not confer operating privileges, but the trustee must hold an amateur radio license.

The trustee will receive FCC official mail, and ARRL LoTW (Logbook of the World) will only sign up clubs through the trustee. Thus clubs should ensure the trustee is someone who actually checks their physical mail and is on campus at least several times a month on average. The trustee does not have to be a school employee; however, some club constitutions or bylaws require the trustee to be one.

Control Operator vs. Club Trustee: §97.103(b) notes that by default, the station licensee (here, the school club trustee) is the control operator. Even if another person is the control operator, §97.103(a) holds the trustee and control operator equally responsible. Intuitively, §97.105 states the control operator is responsible for “immediate proper operation of the station, regardless of the type of control.”

Student operating privileges: provided the control operator is in control, anyone operating (including non-licensed persons) may use the control operator’s license privileges. Ideally, the club control operator will have an Extra class ham radio license so that operators get to use maximum privileges. §97.115(b) notes that for third-party communications (someone besides the control operator working the radio), the “control operator is present at the control point and is continuously monitoring and supervising the third party’s participation”. Many clubs take §97.115(b) to mean the control operator is physically on site, indeed in the radio room itself.

Reference: FCC § 97.5

AGU FM2017 Python lunch notes

At the meeting, it was mentioned that:

  • the Juno Waves instrument has used Python from day one
  • other researchers are also using Fortran from Python
  • PyAstro is a useful conference
  • Major packages like NumPy typically lack funding. NumPy got its first funding in 2017!
  • see paper: The AstroPy Problem on funding geoscience software development
  • should make open source software part of CEDAR Decadal Survey
  • Autoplot: one line command to plot many science formats including: CDF, HDF5, NetCDF and many more.

NetCDF4 vs. HDF5 for large datasets

NetCDF4 uses a subset of HDF5 features and adds some new features of its own. NetCDF4 reads/writes specially structured HDF5 files. Performance of HDF5 and NetCDF4 is very similar, including on supercomputers. The main idea behind NetCDF4 is a simpler API than HDF5, while maintaining the same performance.

Python h5py makes HDF5 read/write very easy. NetCDF4 is a little more complicated to use from Python.

Using HDF5 from compiled languages such as C, C++, and Fortran is somewhat elaborate compared to the ease of use of NetCDF4. From Python, using either HDF5 or NetCDF4 is easy, as the examples below collapse down to a couple of lines of code.
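
A minimal sketch writing the same small array both ways (file and variable names are arbitrary):

import numpy as np
import h5py
from netCDF4 import Dataset

data = np.arange(10.0)

# HDF5 via h5py: one line to create and write a dataset
with h5py.File('demo.h5', 'w') as f:
    f['data'] = data

# NetCDF4: dimensions must be declared before the variable is created
with Dataset('demo.nc', 'w') as ds:
    ds.createDimension('x', data.size)
    var = ds.createVariable('data', data.dtype, ('x',))
    var[:] = data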

Google Solar Eclipse MegaMovie

Eclipse MegaMovie is a Google project to register thousands of images taken during the solar eclipse, providing spatiotemporal diversity not available in any other practical way.

It’s pretty evident this is a laudable first pass. It looks like a lot of photos have over-exposure and dynamic range issues. This could perhaps be addressed by “gap data” techniques. Currently the nice filters are over $1000, but perhaps a compromise filter can be made much more cheaply.

  • AGU FM17 poster “Eclipse Megamovie 2017: How did we do?”
  • AGU FM17 talk “Eclipse Megamovie 2017: A Citizen Science Project”

DavitPy lunch discussion, Dec. 2017

We heard from seasoned developers as well as end-users who had the simplest use case: using DavitPy to load/read SuperDARN data. Topics aired included:

  • DavitPy hosting: there was an issue where someone left and data was inaccessible for weeks. We don’t want that to happen with code. Data should be in an archive like Zenodo.
    • consider countries with filtered internet. Code mirrored across multiple sites should help in general.
  • Breaking up DavitPy into a lean core that strictly handles SuperDARN data and makes basic plots. All else should be in a DavitPy-extras module.
  • Some users just want to load the SuperDARN data without using IDL. I noted that a lean DavitPy is amenable to loading transparently into Matlab. That is, Matlab can use Python user code, so that it’s transparent to Matlab users how to load the SuperDARN data into Matlab.

Previously, I had made an overly large pull request that incorporated fixes for the issues below. I will make separate, small pull requests for these issues.

A general issue for any Python program is that pyproject.toml should be used to configure the package as much as possible. A working example pyproject.toml using these features is illustrative.
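
A minimal sketch of such a pyproject.toml (the project name, version and dependencies are placeholders):

[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "mypackage"
version = "0.1.0"
requires-python = ">=3.7"
dependencies = ["numpy"]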

Avoid Bash scripts as these don’t work on Windows unless using Windows Subsystem for Linux.

Example of rich Python 3 exception handling with backward compatibility:

import six

if six.PY2:
    # Python 2 lacks ConnectionError; OSError is the closest built-in
    ConnectionError = OSError


def transfer():
    # wrapped in a function so the early return is valid
    try:
        pass  # download/upload from server or device
    except ConnectionError:
        print('could not connect to device')
        return

Best practice for forward compatibility: do not use if six.PY3 or if not six.PY3; rather, use if six.PY2, so that a future Python 4 is treated like Python 3.

Require “new enough” Python version

To avoid lots of needless GitHub Issues and emails from users with obsolete Python versions, set requires-python in pyproject.toml to the minimum Python version the program requires.
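
For example, in pyproject.toml (the version shown is only an illustration):

[project]
requires-python = ">=3.9"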