xarray to_netcdf() file compression

As with HDF5 and h5py, xarray to_netcdf() can losslessly compress Datasets and DataArrays when writing netCDF files, but compression is off by default. Compression is enabled per data variable through the "encoding" argument, so each variable to be compressed must have the option set. We typically compress only variables of rank 2 or higher.

Notes:

  • Specify format="NETCDF4", engine="netcdf4" to allow a broader range of data types.
  • Compression requires chunked storage, so we set “chunksizes” explicitly rather than rely on the library's default chunking. We arbitrarily made the chunk sizes half of each dimension, but this can be optimized for particular data.
  • “fletcher32” is a checksum that can be used to detect data corruption.
  • Attributes set in “.attrs” of a data variable are written to the netCDF file as well. This is useful to note physical units, for example.

from pathlib import Path
import xarray


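def write_netcdf(ds: xarray.Dataset, out_file: Path) -> None:
    """Write ds to a NETCDF4 file, compressing variables of rank 2 or higher."""
    enc = {}

    for k in ds.data_vars:
        # leave scalars and 1-D variables uncompressed
        if ds[k].ndim < 2:
            continue

        enc[k] = {
            "zlib": True,
            "complevel": 3,
            "fletcher32": True,
            # half of each dimension, but at least 1 so length-1 dimensions stay valid
            "chunksizes": tuple(max(1, n // 2) for n in ds[k].shape),
        }

    ds.to_netcdf(out_file, format="NETCDF4", engine="netcdf4", encoding=enc)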