ZIP files or GZ files and the like can be quick-and-dirty ways to compress individual data files for retrieval from remote sensors.
In particular, the GeoRinex program has extensive capabilities for transparently reading (without extracting to an uncompressed file) .zip, .z, .gz, etc. compressed text files, which benefit greatly from storage space savings.
It was surprising to find that transparently processing similarly compressed binary data is not trivial, particularly with numpy.fromfile.
Numpy has unresolved bugs with numpy.fromfile that preclude easy use with inline reading via zipfile.ZipFile or tarfile.
Specifically, the .fileno attribute is not usable on file objects from zipfile or tarfile, and numpy.fromfile() relies on .fileno among other attributes.
numpy.frombuffer is not generally suitable for this application either, because it does not advance the buffer position.
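One possible workaround, sketched here for illustration and not the path we chose, is to let the ZIP member's own file object do the reading: its read() advances the decompressed stream position, and numpy.frombuffer then only has to interpret each block of bytes. The function names and the block size `count` are hypothetical.

```python
import zipfile

import numpy as np


def read_zip_member(zip_file, member, dtype):
    """Decompress one ZIP member fully into memory and view it as a 1-D array."""
    with zipfile.ZipFile(zip_file) as zf:
        raw = zf.read(member)
    # frombuffer returns a read-only view of the bytes; copy() makes it writable
    return np.frombuffer(raw, dtype=dtype).copy()


def iter_zip_member(zip_file, member, dtype, count):
    """Yield arrays of up to `count` elements, streaming through the member.

    Each f.read() advances the decompressed stream position,
    which numpy.frombuffer by itself does not do.
    """
    dt = np.dtype(dtype)
    with zipfile.ZipFile(zip_file) as zf, zf.open(member) as f:
        while True:
            raw = f.read(count * dt.itemsize)
            if not raw:
                break
            # Raises ValueError if the member length is not a multiple of itemsize
            yield np.frombuffer(raw, dtype=dt)
```

The chunked reader keeps memory bounded for large members, at the cost of handling the data block by block.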
We are not saying there’s no way around this situation, but we chose a more generally beneficial path.
When raw data files need to be compressed and then later analyzed, we use HDF5.
Even when the original program writing the raw binary data cannot be modified,
a simple post-processing Python script using h5py reads the raw data and converts it to losslessly compressed HDF5 on the sensor.
Then, when the data is analyzed, out-of-core processing can be used, or at least the whole file doesn’t have to be read to retrieve data from an arbitrary location in the HDF5 file.
This allows getting nearly all of the size and speed advantages of HDF5 without modifying the original program.
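A minimal sketch of such a post-processing script, assuming the raw file is a headerless stream of a single dtype. The dataset name "data", gzip level 4 and the shuffle filter are illustrative choices, not settings from our actual setup.

```python
import numpy as np
import h5py


def raw_to_hdf5(raw_path, h5_path, dtype, dset="data", level=4):
    """Convert a flat raw binary file to a gzip-compressed, chunked HDF5 dataset.

    Assumes the raw file is one homogeneous array of `dtype` with no header.
    """
    # The raw file is an ordinary uncompressed file on disk, so fromfile works here
    data = np.fromfile(raw_path, dtype=dtype)
    with h5py.File(h5_path, "w") as f:
        f.create_dataset(
            dset,
            data=data,
            chunks=True,  # let h5py choose a chunk shape
            compression="gzip",  # lossless DEFLATE filter
            compression_opts=level,
            shuffle=True,  # byte shuffle often helps compress numeric data
        )


def read_slice(h5_path, start, stop, dset="data"):
    """Read elements [start, stop) without reading the whole dataset."""
    with h5py.File(h5_path, "r") as f:
        # Only the chunks overlapping the slice are read and decompressed
        return f[dset][start:stop]
```

This sketch loads the whole raw file into memory before writing; for raw files larger than RAM, the dataset could instead be written in slices.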
If faced with a large number of arbitrarily named empty (zero-byte) files that should be deleted, this is easily done with GNU Findutils.
On macOS, Homebrew findutils provides the command “gfind” in place of “find”.
Verify the file list to be deleted:
find ~/foo -type f -empty | sort
where ~/foo is the directory in which to delete the files, and sort is used because find lists files in arbitrary (directory) order rather than sorted order.
If satisfied, delete the empty files with:
find ~/foo -type f -empty -delete
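Where GNU find is not at hand, the same list-then-delete procedure can be sketched in Python with pathlib. This is an illustrative alternative, not part of the find workflow above; the function names and the dry_run switch are hypothetical. Like find -type f, it skips symbolic links.

```python
from pathlib import Path


def find_empty_files(top):
    """Return sorted paths of zero-byte regular files under `top`, recursively."""
    top = Path(top).expanduser()
    return sorted(
        p
        for p in top.rglob("*")
        # is_file() follows symlinks, so exclude links explicitly like `find -type f`
        if p.is_file() and not p.is_symlink() and p.stat().st_size == 0
    )


def delete_empty_files(top, dry_run=True):
    """List empty files under `top`; delete them only when dry_run is False."""
    empty = find_empty_files(top)
    if not dry_run:
        for p in empty:
            p.unlink()
    return empty
```

Running first with dry_run=True mirrors the “verify, then delete” order of the find commands.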
Numpy is well known to be slower at scalar operations than pure Python.
But many data science and STEM applications using arrays are vastly faster and more convenient with Numpy than with pure Python methods.
from numpy import isnan
%timeit isnan(0.)
CPython: 428 ns ± 1.74 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
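Outside IPython, a comparable measurement can be sketched with the standard timeit module. This compares numpy.isnan against math.isnan for a scalar, and a single vectorized call against a Python-level loop for an array. The array size and repeat counts are arbitrary choices, and timings will vary by machine, so no numbers are claimed here.

```python
import timeit


def per_call_ns(stmt, setup="pass", number=100_000):
    """Mean wall time of one execution of `stmt`, in nanoseconds."""
    return timeit.timeit(stmt, setup=setup, number=number) / number * 1e9


if __name__ == "__main__":
    # Scalar call: NumPy ufunc overhead vs. the plain math module
    print(f"numpy.isnan scalar: {per_call_ns('isnan(0.)', 'from numpy import isnan'):.0f} ns")
    print(f"math.isnan scalar:  {per_call_ns('isnan(0.)', 'from math import isnan'):.0f} ns")

    # Whole-array work: one vectorized NumPy call vs. a Python-level loop
    setup = "import math; import numpy as np; x = np.random.rand(100_000); xl = x.tolist()"
    print(f"numpy.isnan array:  {per_call_ns('np.isnan(x)', setup, number=20):.0f} ns")
    print(f"math.isnan loop:    {per_call_ns('[math.isnan(v) for v in xl]', setup, number=20):.0f} ns")
```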
Over the years and major Windows releases, we have many times had to force upgrade Windows.
This is especially so on development machines that see a lot of programs installed in weird locations, external hard drives used, etc.
In general, the approach to force upgrade Windows version is:
make an external backup of files: this could be to a cloud service like Google Drive or OneDrive, or to removable storage like a USB drive.
We usually don’t back up the entire PC, just manually drag over folders containing needed info, as anything not backed up may well be lost in this procedure.
obtain a USB 3 flash drive and necessary adapters (e.g. USB-C to USB 3) for your PC. USB 2 flash drives will be painfully slow. At this time, 8 GB or larger is required.
Download and run the Windows Media Creation Tool.
Be sure the USB 3 drive is plugged in before running, and create a bootable flash drive using the tool.
To help ensure you only have to do this once, and after ensuring you have backed up any data, consider the most powerful install option.
That is, “choose what to keep” → Nothing. That erases all files, helping ensure no bit of bad configuration is left over. You don’t want to have to keep repeating the upgrade.
I didn’t include screenshots, because the particulars change over the years while the process has been essentially the same since the Windows 9x or even DOS days.
Generally, in-place OS upgrades are a gamble that doesn’t always work, while clean reinstalls virtually always work.
This is the case for Linux, including Ubuntu, as well.
Convert an animated GIF to a PNG stack using ImageMagick by:
magick in.gif out_%04d.png
where the 4 in %04d sets the zero-padded width of the frame number and is chosen based on the number of images in the GIF: 4 digits accommodate up to 10000 images.
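The padding width is simple arithmetic: it is the number of digits in the largest frame index. This small sketch assumes frames are numbered starting at 0; the helper names are hypothetical.

```python
def frame_pattern_width(n_frames):
    """Digits needed to number frames 0 .. n_frames-1 with zero padding."""
    if n_frames < 1:
        raise ValueError("need at least one frame")
    return len(str(n_frames - 1))


def magick_output_pattern(n_frames, prefix="out_", ext=".png"):
    """printf-style output name, such as the out_%04d.png used with magick."""
    return f"{prefix}%0{frame_pattern_width(n_frames)}d{ext}"


def frame_names(n_frames, prefix="out_", ext=".png"):
    """The file names that such a pattern expands to."""
    w = frame_pattern_width(n_frames)
    return [f"{prefix}{i:0{w}d}{ext}" for i in range(n_frames)]
```

For example, 10000 frames (indices 0 to 9999) need width 4, while 10001 frames need width 5.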
GIFs are not a great format for scientific image data, because they are limited to an 8-bit palette (256 colors).
For plotting reduced data, GIFs can be fine.
Spyder IDE is a complex but usually stable Python program.
A typical symptom of a problem is Spyder not getting past the splash logo, or not even showing the splash logo.
To totally reset Spyder (erasing all user preferences for Spyder), type in Terminal / Command Prompt:
spyder --reset
Normally, that fixes Spyder.
To diagnose further, start Spyder from Terminal instead of the OS Start menu; the console output might give some hints.
GPU CUDA, cuDNN and NCCL functionality is accessed in a Numpy-like way from CuPy.
CuPy also allows lower-level use of the GPU.
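To illustrate the Numpy-like interface, this sketch writes array code once against a module alias `xp` that is CuPy when it imports and Numpy otherwise. It assumes a working CUDA GPU whenever CuPy is installed: if CuPy imports but no usable GPU is present, the calls will raise CUDA errors rather than fall back. The function names are hypothetical.

```python
try:
    import cupy as xp  # GPU arrays; assumes a CUDA-capable Nvidia GPU is usable

    def to_host(a):
        return xp.asnumpy(a)  # copy a GPU array back to a Numpy array

except ImportError:
    import numpy as xp  # CPU fallback with the same array API

    def to_host(a):
        return a  # already a Numpy object


def mean_power(signal):
    """Mean power of a 1-D signal, computed with whichever array module is active."""
    x = xp.asarray(signal, dtype=xp.float32)
    return float(to_host(xp.mean(x * x)))


def power_spectrum(signal):
    """One-sided power spectrum via the module's FFT (cupy.fft or numpy.fft)."""
    x = xp.asarray(signal, dtype=xp.float32)
    return to_host(xp.abs(xp.fft.rfft(x)) ** 2)
```

Only the import and the host copy differ between CPU and GPU; the numerical code is unchanged.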
Before starting GPU work in any programming language, be aware of these general caveats:
I/O heavy workloads may make realizing GPU benefits more difficult
Consumer GPUs (GeForce) can be > 10x slower than workstation class (Tesla, Quadro)
CUDA requires a discrete Nvidia GPU.
Check for the existence of an Nvidia GPU by:
Linux: run the following; a blank response means an Nvidia GPU is not detected.
lspci | grep -i nvidia
Windows: Look under the “render” tab to see if an Nvidia GPU exists.
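A cross-platform check from Python can be sketched by calling nvidia-smi -L, which lists GPUs one per line as “GPU 0: NAME (UUID: ...)”. This assumes nvidia-smi is on the PATH, as it typically is once the Nvidia driver is installed; treating a missing or failing nvidia-smi as “no GPU” is a simplification, and the function names are hypothetical.

```python
import shutil
import subprocess


def parse_gpu_list(text):
    """Extract GPU names from `nvidia-smi -L` lines like 'GPU 0: NAME (UUID: ...)'."""
    names = []
    for line in text.splitlines():
        if line.startswith("GPU ") and ":" in line:
            names.append(line.split(":", 1)[1].split("(UUID", 1)[0].strip())
    return names


def list_nvidia_gpus():
    """Return Nvidia GPU names, or [] if nvidia-smi is missing or fails."""
    exe = shutil.which("nvidia-smi")  # normally installed alongside the Nvidia driver
    if exe is None:
        return []
    try:
        proc = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired):
        return []
    if proc.returncode != 0:
        return []
    return parse_gpu_list(proc.stdout)
```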
cupy.cuda.runtime.CUDARuntimeError: cudaErrorInsufficientDriver: CUDA driver version is insufficient for CUDA runtime version
This means the CUDA Toolkit version in use expects a newer Nvidia driver than the one installed.
The Nvidia driver can be updated via your standard Nvidia update program that was installed from the factory.
“Table 1” of the CUDA Toolkit release notes lists the driver version required by each CUDA Toolkit version.
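To see which versions are involved, CuPy exposes the CUDA runtime queries cupy.cuda.runtime.driverGetVersion() and cupy.cuda.runtime.runtimeGetVersion(). Both return integers encoded as 1000 × major + 10 × minor. The “driver at least as new as runtime” test below is a conservative rule of thumb we assume for illustration. Since CUDA 11, minor-version compatibility can allow an older driver within the same major version, so the release notes table remains authoritative. The helper names are hypothetical.

```python
def decode_cuda_version(v):
    """Decode CUDA's integer version encoding, e.g. 12040 -> (12, 4)."""
    return v // 1000, (v % 1000) // 10


def driver_vs_runtime():
    """Return (driver, runtime, driver_is_new_enough) using CuPy's runtime API.

    Assumes CuPy is installed; driver_is_new_enough is a conservative check.
    """
    import cupy

    drv = cupy.cuda.runtime.driverGetVersion()  # newest CUDA supported by the driver
    rt = cupy.cuda.runtime.runtimeGetVersion()  # CUDA runtime CuPy is using
    return decode_cuda_version(drv), decode_cuda_version(rt), drv >= rt
```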
A code cell in popular Python IDEs including PyCharm and Spyder is created by a line starting with # %%.
This “code cell” is analogous to IPython code cells and Matlab code sections.
For example:

import math

# %% user data
x = 3
y = 4

# %% main loop
for i in range(5):
    x += y
The code cells allow running sections of code in an IDE without the need to constantly set/unset breakpoints in the IDE.
They also catch the eye of developers to delineate logical blocks of code in the algorithm.
We encourage the use of code cell syntax even if you don’t run cells in the IDE directly, as the IDE will highlight the cells to visibly delineate these separate parts of the algorithm.
GitLab Community Edition is open source.
Anyone may host their own self-managed GitLab instance instead of using gitlab.com.
Git SSH.
For this example, we use Kitware’s CMake GitLab instance.
First, create an account on the self-managed GitLab instance and fork the desired repo.
This will be available like