Compression
rustfits supports the FITS Tile Compression Convention for both
images (ZIMAGE) and tables (ZTABLE). The Python surface for a
compressed HDU is the same as the uncompressed one — read,
__getitem__, __setitem__, write, extend,
append — only the on-disk encoding differs.
This page covers turning compression on at write time, the
algorithm config objects, the Quantize config for float
images, the tile cache, and the repack() operation.
Compressed files written by rustfits are byte-for-byte identical
to fitsio / cfitsio output for the same input (verified by
heap-comparison tests across every algorithm), and funpack
decompresses rustfits-written files to bit-exact uncompressed
form. In the other direction, rustfits reads files written by
fpack, astropy, and fitsio. See Known limitations for the
narrow caveats.
Compressed images
Pass a compression config to compress= at create / write
time. The five algorithm classes are
Gzip1, Gzip2,
Rice1, Hcompress1, and
Plio1:
import numpy as np
import rustfits

img = np.random.randn(1024, 1024).astype("f4")
with rustfits.FITS("out.fits", "w+") as fits:
    fits.write_image(
        img, compress=rustfits.Gzip2(tile_shape=(128, 128)),
    )
The string alias form works too — case-insensitive, with cfitsio synonyms accepted:
with rustfits.FITS("out.fits", "w+") as fits:
    fits.write_image(img, compress="RICE_1")
    fits.write_image(img, compress="gzip_2")
Each algorithm class carries the parameters relevant to that
codec (tile_shape everywhere; blocksize on RICE; scale
and smooth on HCOMPRESS; level on the GZIPs). Class
equality is field-wise, so a round-trip pattern works:
cfg = rustfits.Rice1(tile_shape=(16, 16), blocksize=64)
with rustfits.FITS("out.fits", "w+") as fits:
    fits.write_image(img.astype("i4"), compress=cfg)
with rustfits.FITS("out.fits") as fits:
    assert fits[1].compression == cfg
Reading is automatic: rustfits detects the ZIMAGE convention and
returns a CompressedImageHDU (which subclasses
ImageHDU, so isinstance(hdu, ImageHDU) is
True).
arr = rustfits.read("out.fits") # decompresses transparently
Quantized vs lossless floats
Float-image compression has two modes — lossless or lossy — and
the default is lossless. Choose with the quantize= kwarg:
# Default: lossless raw float bytes through GZIP.
with rustfits.FITS("loss.fits", "w+") as fits:
    fits.write_image(img, compress=rustfits.Gzip2())
# Lossy: quantize floats to i32 in steps of sigma / level, then
# compress. Much better compression (4-10x); precision loss
# is controlled by `level`.
with rustfits.FITS("lossy.fits", "w+") as fits:
    fits.write_image(
        img,
        compress=rustfits.Rice1(),
        quantize=rustfits.Quantize(level=4.0, method="dither1"),
    )
Quantize parameters:
- level (default 4.0): number of quantization steps per noise sigma, so the step size is sigma / level. Negative values pin bscale directly to -level.
- method: one of "no_dither", "dither1" (default, matches cfitsio), or "dither2" (preserves NaN through a reserved sentinel).
- seed: the ZDITHER0 value (default 0, written to disk as 1).
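To make the role of level concrete, here is a minimal pure-Python sketch of uniform quantization. It assumes the noise sigma is already known and ignores dithering entirely, so it is not how rustfits or cfitsio implement the step (both estimate the noise from the data). The helper names quantize and dequantize are hypothetical.

```python
import random


def quantize(values, sigma, level=4.0):
    """Map floats to integers using a step of sigma / level.

    A negative level is treated as an absolute step size of -level,
    mirroring the "pin bscale directly" behaviour described above.
    """
    if level == 0:
        raise ValueError("level must be non-zero in this sketch")
    step = sigma / level if level > 0 else -level
    zero = min(values)  # stands in for a per-tile zero point
    ints = [round((v - zero) / step) for v in values]
    return ints, step, zero


def dequantize(ints, step, zero):
    """Invert quantize() up to the rounding error."""
    return [i * step + zero for i in ints]


rng = random.Random(0)
sigma = 2.0  # assumed known noise level
pixels = [rng.gauss(100.0, sigma) for _ in range(1000)]

ints, step, zero = quantize(pixels, sigma, level=4.0)
restored = dequantize(ints, step, zero)
max_err = max(abs(a - b) for a, b in zip(pixels, restored))
```

In this sketch the reconstruction error never exceeds half a step, which is why a larger level (a finer step) trades compression ratio for precision.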
Lossless float compression requires compress=Gzip1(...) or
compress=Gzip2(...) — Rice1, Hcompress1, and Plio1 don’t
round-trip raw float bit patterns. Integer HDUs reject
quantize= regardless of the algorithm.
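The GZIP codecs can be lossless on floats because they treat the pixels as opaque bytes. As an illustration only, the sketch below reproduces the byte shuffle that the FITS tile compression convention defines for GZIP_2 (bytes grouped by significance before gzip), using the standard-library struct and gzip modules. The shuffle and unshuffle helpers are hypothetical names and are not part of rustfits.

```python
import gzip
import struct


def shuffle(raw: bytes, itemsize: int) -> bytes:
    """Group byte 0 of every element, then byte 1, and so on."""
    return b"".join(raw[k::itemsize] for k in range(itemsize))


def unshuffle(shuffled: bytes, itemsize: int) -> bytes:
    """Inverse of shuffle(): interleave the byte planes again."""
    n = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for k in range(itemsize):
        out[k::itemsize] = shuffled[k * n:(k + 1) * n]
    return bytes(out)


values = [1.0 + i * 1e-3 for i in range(4096)]
# Big-endian float32, the byte order FITS uses on disk.
raw = struct.pack(f">{len(values)}f", *values)

packed = gzip.compress(shuffle(raw, 4))
restored = unshuffle(gzip.decompress(packed), 4)
```

Because nothing in this pipeline interprets the bytes as numbers, the restored buffer matches the input bit for bit, NaN payloads and all.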
BLANK on compressed integers
Same surface as the uncompressed case: blank= on write,
mask_blank=True on read, MaskedArray input auto-fills with
the sentinel. See Images for the full pattern.
Extending and modifying
extend(data) and __setitem__ both work on compressed
images. Boundary tiles (partial last tile for extend;
overlapping tiles for __setitem__) are decoded, modified,
re-encoded, and appended to the heap. Old tile blobs become
heap orphans; call repack() to reclaim them:
with rustfits.FITS("img.fits.fz", "r+") as fits:
    hdu = fits[1]
    hdu.extend(np.zeros((10,) + hdu.shape[1:], dtype=hdu.dtype))
    hdu[100, 100] = 0
    hdu.repack()
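To see why a small assignment can rewrite whole tiles, the following sketch computes which tiles a rectangular pixel region overlaps. It assumes tiles form a regular grid anchored at the array origin, as in the tile compression convention; tiles_touched is a hypothetical helper, not a rustfits function.

```python
from itertools import product


def tiles_touched(region, tile_shape):
    """Return the tile-grid indices overlapped by a pixel region.

    region is a sequence of (start, stop) pairs, one per axis, with
    stop exclusive; tile_shape gives the tile extent along each axis.
    """
    ranges = [
        range(start // t, (stop - 1) // t + 1)
        for (start, stop), t in zip(region, tile_shape)
    ]
    return list(product(*ranges))


# A single pixel at (100, 100) with 128x128 tiles touches one tile.
single = tiles_touched([(100, 101), (100, 101)], (128, 128))

# A block straddling the row boundary at 128 touches two tiles.
straddle = tiles_touched([(120, 136), (0, 10)], (128, 128))
```

Each index returned corresponds to one tile that would be decoded, modified, re-encoded, and appended, leaving its previous blob as an orphan.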
For quantized-float HDUs, __setitem__ reuses the existing
per-tile bscale/bzero/dither seed so unchanged pixels in a
modified tile round-trip bit-exactly — no compounding
quantization loss.
Compressed tables
Tables compress the same way: pass compress= to the table
writer. True picks cfitsio’s per-dtype defaults; a string or
algorithm-class instance applies one algorithm to every column; a
dict overrides per column:
import numpy as np
import rustfits

cat = np.zeros(1_000_000, dtype=[
    ("ra", "f8"), ("dec", "f8"), ("flag", "i4"),
])

# cfitsio's per-dtype defaults; fine for most cases.
with rustfits.FITS("cat.fits.fz", "w+") as fits:
    fits.write_table(cat, compress=True)

# One algorithm everywhere.
with rustfits.FITS("cat.fits.fz", "w+") as fits:
    fits.write_table(cat, compress="GZIP_2")

# Per-column overrides.
with rustfits.FITS("cat.fits.fz", "w+") as fits:
    fits.write_table(
        cat,
        compress={"ra": rustfits.Gzip2(),
                  "dec": rustfits.Gzip2(),
                  "flag": rustfits.Rice1()},
    )
The tile size (ztilelen) defaults to roughly 10 MB worth of
rows per tile. Pass ztilelen=N to override.
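As a rough back-of-the-envelope check, the sketch below estimates how many rows a tile of about 10 MB would hold for the three-column catalog above. The exact rule rustfits applies is not stated here, so the 10,000,000-byte budget and the simple floor division are assumptions made purely for illustration.

```python
import math
import struct

# Row layout of the example catalog: two float64 columns and one int32.
# "<" disables padding, so this is the packed row width in bytes.
row_bytes = struct.calcsize("<ddi")

TILE_BUDGET = 10_000_000  # assumed meaning of "roughly 10 MB"
n_rows = 1_000_000

rows_per_tile = max(1, TILE_BUDGET // row_bytes)
n_tiles = math.ceil(n_rows / rows_per_tile)
```

Under these assumptions the million-row catalog splits into a handful of large tiles; narrower tables get more rows per tile, wider ones fewer.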
Reading a compressed table is the same as reading an
uncompressed one — rustfits detects the ZTABLE convention and
returns a CompressedTableHDU (which subclasses
TableHDU). All the row / column / subset /
__setitem__ patterns from Tables work the same way.
Tile cache
Decoded tiles are held in a byte-bounded LRU cache so that repeated reads of overlapping regions don’t decompress the same tiles again. The default capacity is 32 MiB per HDU; tune it per HDU:
hdu.tile_cache_size # current capacity in bytes
hdu.set_tile_cache_size(0) # disable
hdu.set_tile_cache_size(256 * 1024 * 1024) # 256 MiB
hdu.tile_cache_used # bytes currently held
hdu.clear_tile_cache() # drop entries, keep capacity
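The behaviour of a byte-bounded LRU cache can be sketched in a few lines with collections.OrderedDict. This is an illustrative model under simple assumptions (tile size measured with len, oversized tiles skipped), not rustfits's actual cache; the ByteBoundedLRU class and its methods are hypothetical.

```python
from collections import OrderedDict


class ByteBoundedLRU:
    """LRU cache whose capacity is a byte budget, not an entry count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self._tiles = OrderedDict()

    def get(self, key):
        tile = self._tiles.get(key)
        if tile is not None:
            self._tiles.move_to_end(key)  # mark as most recently used
        return tile

    def put(self, key, tile):
        if key in self._tiles:
            self.used -= len(self._tiles.pop(key))
        if len(tile) > self.capacity:
            return  # too large to cache; capacity 0 disables caching
        self._tiles[key] = tile
        self.used += len(tile)
        while self.used > self.capacity:
            _, evicted = self._tiles.popitem(last=False)  # drop oldest
            self.used -= len(evicted)

    def clear(self):
        """Drop all entries but keep the configured capacity."""
        self._tiles.clear()
        self.used = 0


cache = ByteBoundedLRU(capacity=3 * 1024)
for key in [(0, 0), (0, 1), (0, 2)]:
    cache.put(key, bytes(1024))
cache.get((0, 0))             # touch (0, 0) so (0, 1) becomes the oldest
cache.put((1, 0), bytes(1024))  # over budget: evicts (0, 1)
```

Bounding by bytes rather than entry count keeps memory use predictable when tile sizes vary between HDUs.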
Reclaiming heap orphans
For both compressed images and compressed tables, mutations
(__setitem__, extend, append with merge-into-last-tile,
VLA writes) leave orphaned bytes in the heap. Call
repack() to rebuild the heap with only live data. If the
HDU is the last on disk, the file shrinks via set_len;
otherwise later HDUs shift backward.
with rustfits.FITS("img.fits.fz", "r+") as fits:
    fits[1].repack()
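The orphan mechanism can be modelled with a toy append-only heap: every rewrite appends a new blob and repoints the tile's descriptor, and repacking copies only the blobs that descriptors still reference. The Heap class below is a hypothetical illustration of that idea, not the on-disk layout code in rustfits.

```python
class Heap:
    """Toy append-only heap with per-tile (offset, length) descriptors."""

    def __init__(self):
        self.data = bytearray()
        self.descriptors = {}  # tile index -> (offset, length)

    def write_tile(self, index, blob):
        # Always append; any earlier blob for this tile becomes an orphan.
        self.descriptors[index] = (len(self.data), len(blob))
        self.data += blob

    def read_tile(self, index):
        offset, length = self.descriptors[index]
        return bytes(self.data[offset:offset + length])

    def repack(self):
        # Copy live blobs into a fresh buffer and repoint descriptors.
        live = bytearray()
        for index in sorted(self.descriptors):
            blob = self.read_tile(index)
            self.descriptors[index] = (len(live), len(blob))
            live += blob
        self.data = live


heap = Heap()
heap.write_tile(0, b"a" * 100)
heap.write_tile(1, b"b" * 100)
heap.write_tile(0, b"A" * 80)   # rewrite tile 0; the first 100 bytes are orphaned
size_before = len(heap.data)
heap.repack()
size_after = len(heap.data)
```

After repacking, the heap holds only the two live blobs, and both tiles still read back unchanged.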