CompressedTableHDU

Tile-compressed binary table HDU (ZTABLE convention). Subclass of rustfits.TableHDU, so isinstance(hdu, TableHDU) holds.

class rustfits.CompressedTableHDU

Bases: TableHDU

__getitem__(key, /)

Return self[key].

__len__()

Return len(self).

__setitem__(key, value, /)

Set self[key] to value.

add_checksum()

Compute and store both ZDATASUM and ZHECKSUM cards.

Same equivalent-uncompressed convention as add_datasum(). This is the call most users want.

add_datasum()

Compute and store the ZDATASUM checksum card.

Computed against the equivalent uncompressed table (original schema rebuilt from the Z-prefixed cards with cell data decoded back to its BITPIX-native big-endian layout), not the on-disk BINTABLE — per the FITS Tile Compression Convention. Same manual-refresh contract as TableHDU.add_datasum() — re-run after write() / append() / __setitem__ / repack().
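
For orientation, the FITS checksum convention accumulates a 32-bit ones'-complement sum over big-endian 32-bit words. The block below is a minimal pure-Python sketch of that accumulation only, not the rustfits implementation. The helper name ones_complement_sum32 is hypothetical, and the zero-padding step is a simplification, since real FITS data units are already padded to whole 2880-byte blocks.

```python
import struct


def ones_complement_sum32(data: bytes) -> int:
    """32-bit ones'-complement sum of big-endian words (illustrative helper)."""
    # Pad to a whole number of 4-byte words; zero words do not change the sum.
    if len(data) % 4:
        data += b"\x00" * (4 - len(data) % 4)
    total = 0
    for (word,) in struct.iter_unpack(">I", data):
        total += word
        # End-around carry: fold any overflow past bit 31 back into the low bits.
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total
```

Because the sum is taken over the equivalent uncompressed bytes, two files with different tile layouts but the same table content would be expected to carry the same value.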

append(data, *, names=None)

Append rows to a compressed table.

Same signature and input forms as TableHDU.append().

Notes

Merge-into-last-partial semantics. If the existing last tile has fewer than ztile_rows rows, append decodes it, concatenates the first M new rows (M = ztile_rows - last_tile_rows), re-encodes the now-fuller tile, and appends the new blobs to the heap end. Old last-tile blobs become orphans (PCOUNT grows monotonically — call repack() to reclaim them). Any rows that didn’t fit are encoded as fresh full tiles. Maintains the FITS Tile Compression Convention’s “all tiles same size except the last” invariant.
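
The row arithmetic described above can be sketched as a small stand-alone function. This is an illustrative model, not library code: plan_append is a hypothetical name, and the trailing partial tile for leftover rows is an assumption inferred from the "all tiles same size except the last" invariant.

```python
def plan_append(last_tile_rows: int, ztile_rows: int, new_rows: int):
    """Split an append into (merged, full_tiles, trailing_partial_rows).

    merged                rows topped up into the existing partial last tile
    full_tiles            count of fresh tiles holding exactly ztile_rows rows
    trailing_partial_rows rows left for a new, shorter last tile (0 if none)
    """
    # Room exists only when the last tile is present and not yet full.
    room = ztile_rows - last_tile_rows if 0 < last_tile_rows < ztile_rows else 0
    merged = min(room, new_rows)
    full_tiles, trailing = divmod(new_rows - merged, ztile_rows)
    return merged, full_tiles, trailing
```

For example, with ztile_rows = 1000 and a last tile holding 600 rows, appending 2500 rows merges 400 rows, writes 2 full tiles, and leaves a 100-row last tile.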

VLA columns are supported: existing per-cell compressed bytes are copied verbatim (no decode + re-encode); merge-tile new rows are encoded via the per-cell-then-fallback path; original-table heap offsets continue from the current ZPCOUNT so a funpack-reconstructed file stays consistent.

appending()

Open a batched-append context manager.

Inside the with block every append() (and extend(), its alias) call buffers its input in RAM rather than encoding into the trailing partial tile on every call. The buffer drains in ZTILELEN-aligned bursts when it crosses a 32 MB soft cap, and the residual drains at __exit__ (normal or exceptional), collapsing N merge-with-partial-tile re-encodes into 1.

Pattern:

with hdu.appending():
    for batch in batches:
        hdu.append(batch)
    # __exit__ here: drains the buffer

Performance: an append loop whose chunks are smaller than ZTILELEN costs roughly one write_table in total instead of N × the merge-tile cost; see the user-facing performance docs for the measured numbers.

Restrictions inside the context (raise ValueError): read(), __getitem__, write(), __setitem__, repack(), add_checksum(), add_datasum(), verify_checksum(), verify_datasum(). Exit the context first. FITS.close() also raises while a context is open (the natural nested-with pattern never triggers this; it fires only for forgotten __exit__).
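
To see why batching helps, the following simplified model counts how many appends land on a partial last tile and therefore force a merge re-encode. It is a sketch under stated assumptions: count_merge_reencodes is a hypothetical helper, the 32 MB soft cap is ignored, and the batched case is modelled as a single drain at __exit__.

```python
def count_merge_reencodes(batch_sizes, ztile_rows, start_rows=0):
    """Count appends that find a partial last tile and must re-encode it."""
    rows = start_rows
    merges = 0
    for n in batch_sizes:
        # A partial last tile exists whenever the row count is not tile-aligned.
        if n > 0 and rows % ztile_rows != 0:
            merges += 1
        rows += n
    return merges


batches = [100] * 50                      # fifty small appends
unbatched = count_merge_reencodes(batches, ztile_rows=1000)
# Inside appending(), the same rows are modelled as one drain of 5000 rows.
batched = count_merge_reencodes([sum(batches)], ztile_rows=1000, start_rows=600)
```

In this model the unbatched loop triggers 45 merge re-encodes, while the batched drain onto a table that already ends in a partial tile triggers 1.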

clear_tile_cache()

Drop every cached (tile, column) decompressed slab. Keeps tile_cache_size as-is.

colnames

Column names in on-disk order, as a tuple. Same semantics as TableHDU.colnames.

compression

Per-column compression algorithm, as a dict.

{column_name: zctyp_value} preserving on-disk column order. Column names come from TTYPEn (preserved verbatim from the original table). Algorithm strings are the FITS-spec ZCTYPn values found on disk ('RICE_1' / 'GZIP_1' / 'GZIP_2').

Returns:

{column_name: algorithm_string}.

Return type:

dict

delete_column(_key)

Not supported on compressed tables. See insert_column() for the workaround.

dtype

The numpy structured dtype the original (uncompressed) table reads into.

Same scaling rules as TableHDU.dtype; sourced from the synthesized uncompressed-view cards.

extend(data, *, names=None)

Alias for append(). Mirrors TableHDU.extend() for parity with ImageHDU.extend() — generic code iterating HDUs and calling .extend(...) keeps working.

extending()

Alias for appending(). Mirrors the CompressedImageHDU.extending() method on the image side, so generic code that iterates HDUs and writes with hdu.extending(): ... works uniformly across every HDU type that supports the batched-context pattern.

extname

EXTNAME header value, or None when the keyword is absent.

EXTNAME is the user-visible name of the HDU (e.g. 'SCI', 'CATALOG'). Combined with extver it’s the standard way to identify HDUs without relying on position-by-index.

extver

EXTVER header value, defaulting to 1 when absent.

Per the FITS standard, multiple HDUs may share an EXTNAME and are distinguished by EXTVER. Returns 1 rather than None for the absent case so callers can compare/select without handling Optional[int].

has_data

True iff this HDU has a non-empty data section.

Works uniformly across image and table HDUs: the test is NAXIS > 0 AND every NAXISn > 0. For images that means “at least one pixel”; for tables it means “at least one row of at least one column”.

Useful for picking the first HDU worth reading in a file (primary HDUs are often empty stubs):

hdu = next(h for h in fits if h.has_data)
arr = hdu.read()

Edge case: a VLA table with NAXIS2=0 but PCOUNT>0 (heap-only) returns False — no main rows means there’s nothing to interpret the heap through, which is the right answer for the “is this HDU worth reading?” question.
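
The rule can be restated over a plain dict of header values. This is only a model of the documented test (NAXIS > 0 and every NAXISn > 0); header_has_data is a hypothetical helper, and a dict stands in for the real header object.

```python
def header_has_data(header: dict) -> bool:
    """Model of the has_data rule: NAXIS > 0 and every NAXISn > 0."""
    naxis = header.get("NAXIS", 0)
    if naxis <= 0:
        return False
    # PCOUNT is deliberately ignored: a heap without main rows is not readable data.
    return all(header.get(f"NAXIS{i}", 0) > 0 for i in range(1, naxis + 1))


empty_primary = {"NAXIS": 0}
heap_only_vla = {"NAXIS": 2, "NAXIS1": 16, "NAXIS2": 0, "PCOUNT": 4096}
small_table = {"NAXIS": 2, "NAXIS1": 16, "NAXIS2": 3, "PCOUNT": 0}
```

Under this model only small_table counts as having data; the heap-only VLA case returns False, matching the edge case above.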

header

The HDU’s FITSHeader.

Returns a live view of this HDU’s header cards. Mutations via the header object (__setitem__, __delitem__, update, add_comment, add_history, add_blank) write through to disk immediately, following the disk-write-before-commit ordering documented on FITSHeader.

Reads are cheap; mutations may grow the reserved header blocks in place if the new card list exceeds the current allotment.

index

The HDU’s 0-based position in its file.

Stable for the lifetime of the FITS object — even when an earlier HDU grows and shifts this HDU’s bytes forward, the index is unchanged because the HDU is still at the same position in the file’s HDU list.

insert_column(_name, _data)

Not supported on compressed tables.

Schema edits would require re-encoding every tile. Workaround: build a fresh CompressedTableHDU with the new schema via FITS.create_table_hdu() (compress= set) + write().

Raises:

NotImplementedError – Always.

iter(*, chunksize=None, columns=None, scale=True)

Iterate over table rows or row-chunks.

hdu.iter() is equivalent to for row in hdu — one row per iteration as a numpy scalar record. Passing chunksize switches to yielding structured arrays instead.

Parameters:
  • chunksize (int, optional) – None (default) yields one row per iteration as a numpy scalar record (the same single-element value hdu[i] returns). A positive integer yields a structured numpy.ndarray of up to chunksize rows per iteration; the final chunk may be shorter. 0 is rejected.

  • columns (list of str, optional) – Restrict iteration to these columns (case-insensitive), forwarded to read(). Each yielded record / chunk then carries only the named fields. This is the supported way to iterate a column subset — a single iter(columns=["x"]) still yields 1-field records, so use row["x"] to get the value.

  • scale (bool, default True) – Apply TSCALn / TZEROn scaling, forwarded to read().

Returns:

Yields numpy scalar records (chunksize=None) or structured numpy.ndarray chunks (chunksize=N).

Return type:

iterator

Notes

The row count is snapshotted when the iterator is created; rows added via append() mid-iteration are not seen. Closing the file mid-iteration makes the next batch read raise the usual closed-file error. The internal read buffer is auto-sized to an ~8 MiB byte budget (rows = budget / row_width) and is not currently user-configurable — for a huge-row table, drive a manual hdu[lo:hi] loop instead.

Works identically on CompressedTableHDU, decoding only the tiles each batch touches.
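
The chunking and buffer-sizing arithmetic from the notes above can be sketched independently of the library. Both helpers are hypothetical: chunk_bounds yields the half-open row ranges a manual hdu[lo:hi] loop would use, and buffer_rows applies the rows = budget / row_width rule. Raising ValueError for a non-positive chunksize and flooring the result at one row are assumptions of this sketch, not documented behaviour.

```python
def chunk_bounds(nrows: int, chunksize: int):
    """Yield (lo, hi) half-open row ranges; the final range may be shorter."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    for lo in range(0, nrows, chunksize):
        yield lo, min(lo + chunksize, nrows)


def buffer_rows(row_width: int, budget: int = 8 * 1024 * 1024) -> int:
    """Rows that fit in the byte budget (at least one row, by assumption)."""
    return max(1, budget // row_width)
```

A manual loop for a table with very wide rows would then read hdu[lo:hi] for each (lo, hi) pair from chunk_bounds(hdu.nrows, n), with n chosen by the caller.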

n_tiles

Number of tile chunks the original table was split into.

Equals the compressed table’s on-disk NAXIS2 — one BINTABLE row per tile.

ncols

Number of columns in the table (TFIELDS).

nrows

Number of rows in the ORIGINAL (uncompressed) table.

Sourced from ZNAXIS2; the on-disk NAXIS2 holds the number of tile chunks, not the user-visible row count.

read(*, rows=None, columns=None, scale=True, mask_null=False)

Read the (decompressed) table into a numpy structured array.

Same signature as TableHDU.read(): rows / columns / scale / mask_null. Per-tile decode runs lazily — only the tiles overlapping the requested row range are decompressed.

Currently unsupported:

  • mask_null=True raises NotImplementedError — TNULL masking on compressed-table reads is a separate follow-up.

Notes

Decoded (tile, column) byte slabs populate the LRU cache (subject to tile_cache_size). Subsequent reads of the same column range, or of other columns within the same tile range, hit warm slabs.
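
Which tiles a row range touches follows directly from the tile size. The helper below is an illustrative sketch (tiles_for_rows is a hypothetical name) for a contiguous half-open range [lo, hi) with fixed-size tiles of ztile_rows rows.

```python
def tiles_for_rows(lo: int, hi: int, ztile_rows: int) -> range:
    """Indices of the tiles overlapping the half-open row range [lo, hi)."""
    if hi <= lo:
        return range(0)  # empty selection touches no tiles
    first = lo // ztile_rows
    last = (hi - 1) // ztile_rows  # tile holding the last requested row
    return range(first, last + 1)
```

With ztile_rows = 1000, rows 2500 to 3199 overlap tiles 2 and 3 only, so a lazy decode would leave every other tile untouched.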

read_column(name, *, rows=None, as_bytes=False, scale=True, mask_null=False)

Read a single column into a plain (non-structured) ndarray.

Equivalent to hdu.read(columns=[name])[name] but skips the structured-array packaging — useful when you only want one column’s data.

Parameters:
  • name (str) – Column name. Case-insensitive against the table’s TTYPEn values.

  • rows (slice, list of int, or None, optional) – Same semantics as read()’s rows=.

  • as_bytes (bool, optional) – Only meaningful for A (character) columns. If True, return the on-disk bytes in an S<n> field with no decode, no NUL-truncation, and no trailing-space strip — useful when a column has non-ASCII bytes that the default U decode would reject. Default False.

  • scale (bool, optional) – Same as read()’s scale=.

  • mask_null (bool, optional) – If True and this column carries TNULL, return a numpy.ma.MaskedArray. Default False.

Returns:

Array of shape (n_selected,) + field_shape. field_shape is empty for scalar columns, (repeat,) or the TDIM shape for subarray columns.

Return type:

numpy.ndarray or numpy.ma.MaskedArray

repack()

Rebuild the heap, reclaiming orphan blobs.

append() (when a merge into the last partial tile re-encodes the existing blobs) and __setitem__ (every affected tile re-encoded and appended to the heap end) leave the old compressed bytes as orphans referenced by no descriptor. repack walks every live descriptor, streams its referenced bytes into a compact new heap, and rewrites descriptors to point at it.

Shrinks the on-disk file when the new heap is smaller (last HDU: set_len; non-last HDU: trailing HDUs shift backward in lockstep). No-op for an already-compact heap.
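
The compaction step can be modelled on an in-memory heap. This is a simplified sketch rather than the library's code: compact_heap is a hypothetical helper, descriptors are plain (offset, length) tuples, and the real on-disk descriptor layout and file shifting are not represented.

```python
def compact_heap(heap: bytes, descriptors):
    """Copy live blobs into a gap-free heap and return (new_heap, new_descriptors)."""
    new_heap = bytearray()
    new_descriptors = []
    for offset, length in descriptors:
        # Each live blob is appended at the current end of the new heap.
        new_descriptors.append((len(new_heap), length))
        new_heap += heap[offset:offset + length]
    return bytes(new_heap), new_descriptors


# Bytes 4..7 are an orphan that no descriptor references.
old_heap = b"AAAA" + b"dead" + b"BB"
live = [(0, 4), (8, 2)]
packed, relocated = compact_heap(old_heap, live)
```

Here the packed heap shrinks from 10 to 6 bytes, and running the same function on the packed heap changes nothing, which mirrors the no-op behaviour for an already-compact heap.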

set_tile_cache_size(bytes)

Set the tile-cache capacity in bytes. 0 disables.

shape

Shape of the table, equivalent to (hdu.nrows,).

tile_cache_size

Current tile-cache capacity in bytes. Default 32 MiB. See CompressedImageHDU.tile_cache_size() for details.

tile_cache_used

Bytes currently held in the tile cache.
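
A byte-budgeted LRU cache of (tile, column) slabs can be modelled with the standard library. The class below is an illustrative sketch only: SlabCache and its methods are hypothetical, and the eviction details of the real cache are not documented here. It shows how a capacity, a used-bytes counter, and a clear operation that keeps the capacity relate to each other.

```python
from collections import OrderedDict


class SlabCache:
    """Least-recently-used cache bounded by total bytes (illustrative model)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.used = 0
        self._slabs = OrderedDict()

    def get(self, key):
        slab = self._slabs.get(key)
        if slab is not None:
            self._slabs.move_to_end(key)  # mark as most recently used
        return slab

    def put(self, key, slab: bytes):
        if key in self._slabs:
            self.used -= len(self._slabs.pop(key))
        # A capacity of 0 disables caching; oversized slabs are never stored.
        if self.capacity == 0 or len(slab) > self.capacity:
            return
        self._slabs[key] = slab
        self.used += len(slab)
        while self.used > self.capacity:
            _, evicted = self._slabs.popitem(last=False)  # drop the oldest slab
            self.used -= len(evicted)

    def clear(self):
        """Drop every slab but keep the configured capacity."""
        self._slabs.clear()
        self.used = 0
```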

units

Per-column units (TUNITn), as a dict. Same semantics as TableHDU.units.

verify_checksum()

Verify the stored ZHECKSUM over the full HDU.

Returns True / False / None (None means the card is absent).

verify_datasum()

Verify the stored ZDATASUM against the equivalent uncompressed table bytes.

Returns True / False / None (None means the card is absent).
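
Because the result is tri-state, a plain truthiness test would treat an absent card the same as a mismatch. The helper below is a hypothetical sketch of how a caller might keep the three cases apart; it takes the already-returned value rather than calling into the library.

```python
def checksum_status(result) -> str:
    """Map a True / False / None verification result to a label."""
    if result is None:
        return "absent"  # no card stored, so nothing was checked
    return "ok" if result else "mismatch"
```

The explicit `is None` check comes first so that only a genuine False is reported as a mismatch.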

write(data, *, names=None)

Compress and write data to the table.

Same signature and input forms as TableHDU.write(): structured ndarray, {name: ndarray} dict, or list of per-column ndarrays + names=. Encodes each (tile, column) per the per-column ZCTYPn algorithm the file was created with, streams compressed blobs to the heap, fills the descriptor table, and updates PCOUNT.

Notes

Mid-write I/O failures taint the file (close + reopen to recover). See TableHDU.write() for the per-form validation rules.

ztile_rows

Rows per tile used at compression time (ZTILELEN).

The last tile may contain fewer rows if ZNAXIS2 is not a multiple of ZTILELEN.
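
The relationship between ZNAXIS2, ZTILELEN, the tile count, and the size of the last tile is plain ceiling division. The helper below is a hypothetical sketch of that arithmetic, assuming ZTILELEN is a positive integer.

```python
def tile_layout(znaxis2: int, ztilelen: int):
    """Return (n_tiles, rows_in_last_tile) for a table of znaxis2 rows."""
    n_tiles = -(-znaxis2 // ztilelen)  # ceiling division
    if n_tiles == 0:
        return 0, 0
    last_rows = znaxis2 - (n_tiles - 1) * ztilelen
    return n_tiles, last_rows
```

For instance, 2500 rows at 1000 rows per tile give 3 tiles with a 500-row last tile, while 3000 rows give 3 full tiles.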