NumPy can't read .zip files

ZIP, GZ and similar archive formats can be a quick-and-dirty way to compress individual data files for retrieval from remote sensors. In particular, the GeoRinex program has extensive capabilities for transparently reading .zip, .z, .gz, etc. compressed text files (that is, without first extracting them to an uncompressed file), and these text files benefit greatly from the storage space savings.

It was surprising to find that transparently processing similarly compressed binary data is not trivial, particularly with numpy.fromfile. NumPy has unresolved bugs with numpy.fromfile that preclude easy inline reading via zipfile.ZipFile or tarfile. Specifically, the file-like objects returned by zipfile and tarfile do not provide a usable .fileno, because there is no operating-system file descriptor behind them, and numpy.fromfile() relies on .fileno among other attributes.

numpy.frombuffer is not generally suitable for this application either: it operates on a bytes-like object already in memory and does not advance any buffer position, so the caller has to track offsets by hand. We are not saying there’s no way around this situation, but we chose a more generally beneficial path.
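For illustration only, here is a minimal sketch of one possible workaround, not the path we chose. It lets the archive member's own read() advance the position and hands each block of bytes to numpy.frombuffer. The function name read_chunks and its parameters are hypothetical, and the sketch assumes the member is a flat stream of fixed-size items of a known dtype.

```python
import zipfile

import numpy as np


def read_chunks(zip_path, member, dtype, count):
    """Yield arrays of up to `count` items from a binary member of a ZIP archive.

    Hypothetical helper: assumes `member` is a flat stream of `dtype` items.
    """
    dtype = np.dtype(dtype)
    nbytes = dtype.itemsize * count
    with zipfile.ZipFile(zip_path) as z, z.open(member) as f:
        while True:
            # read() decompresses on the fly and advances the stream position,
            # which numpy.frombuffer cannot do by itself
            raw = f.read(nbytes)
            if not raw:
                break
            # drop a trailing partial item, if any, so frombuffer does not raise
            usable = len(raw) - len(raw) % dtype.itemsize
            yield np.frombuffer(raw[:usable], dtype=dtype)
```

This only reads sequentially. Reaching data in the middle of the member still means decompressing everything before it.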

Use HDF5

When raw data files need to be compressed and later analyzed, we use HDF5. Even when the original program writing the raw binary data cannot be modified, a simple post-processing Python script using h5py can read the raw data and convert it to losslessly compressed HDF5 on the sensor. Then, when the data is analyzed, out-of-core processing can be used, or at the very least the whole file doesn’t have to be read to retrieve data from an arbitrary location in the HDF5 file. This gives nearly all of the size and speed advantages of HDF5 without modifying the original program.
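A post-processing script of this kind might look like the following sketch. It is not our production code: the function name raw_to_hdf5, the dataset name "data", the block size and the gzip settings are all illustrative assumptions, and the raw file is assumed to be a flat stream of a single known dtype.

```python
import h5py
import numpy as np


def raw_to_hdf5(raw_path, h5_path, dtype, block_items=1_000_000, name="data"):
    """Copy a flat raw binary file into a gzip-compressed, resizable HDF5 dataset.

    Hypothetical helper: dataset name, block size and compression level are
    illustrative choices, not requirements.
    """
    dtype = np.dtype(dtype)
    with open(raw_path, "rb") as fin, h5py.File(h5_path, "w") as h5:
        dset = h5.create_dataset(
            name,
            shape=(0,),
            maxshape=(None,),    # growable along the only axis
            dtype=dtype,
            chunks=True,         # let h5py choose a chunk size
            compression="gzip",  # lossless
            compression_opts=4,
            shuffle=True,        # byte shuffle often helps numeric data compress
        )
        while True:
            # a real on-disk file has a file descriptor, so numpy.fromfile works
            block = np.fromfile(fin, dtype=dtype, count=block_items)
            if block.size == 0:
                break
            start = dset.shape[0]
            dset.resize((start + block.size,))
            dset[start:] = block
```

Because the raw file is read in blocks, memory use on the sensor stays bounded. On the analysis side, a slice such as h5py.File(h5_path, "r")["data"][1000:2000] reads and decompresses only the chunks that overlap the requested range.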