CompressedTableHDU¶
Tile-compressed binary table HDU (ZTABLE convention). Subclass
of rustfits.TableHDU — isinstance(hdu, TableHDU) holds.
- class rustfits.CompressedTableHDU¶
Bases: TableHDU
- __getitem__(key, /)¶
Return self[key].
- __len__()¶
Return len(self).
- __setitem__(key, value, /)¶
Set self[key] to value.
- add_checksum()¶
Compute and store both ZDATASUM and ZHECKSUM cards.
Same equivalent-uncompressed convention as add_datasum(). This is the call most users want.
- add_datasum()¶
Compute and store the ZDATASUM checksum card.
Computed against the equivalent uncompressed table (original schema rebuilt from the Z-prefixed cards, with cell data decoded back to its BITPIX-native big-endian layout), not the on-disk BINTABLE, per the FITS Tile Compression Convention. Same manual-refresh contract as TableHDU.add_datasum(): re-run after write() / append() / __setitem__ / repack().
- append(data, *, names=None)¶
Append rows to a compressed table.
Same signature and input forms as TableHDU.append().
Notes
Merge-into-last-partial semantics. If the existing last tile has fewer than ztile_rows rows, append decodes it, concatenates the first M new rows (M = ztile_rows - last_tile_rows), re-encodes the now-fuller tile, and appends the new blobs to the heap end. Old last-tile blobs become orphans (PCOUNT grows monotonically; call repack() to reclaim them). Any rows that didn’t fit are encoded as fresh full tiles. This maintains the FITS Tile Compression Convention’s “all tiles same size except the last” invariant.
VLA columns are supported: existing per-cell compressed bytes are copied verbatim (no decode + re-encode); merge-tile new rows are encoded via the per-cell-then-fallback path; original-table heap offsets continue from the current ZPCOUNT so a funpack-reconstructed file stays consistent.
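The merge arithmetic in the notes above can be sketched in plain Python. This is a toy model that only counts rows per tile: plan_append is a hypothetical helper, not part of rustfits, and it assumes that any remainder smaller than ztile_rows lands in a new partial last tile.

```python
def plan_append(nrows, ztile_rows, n_new):
    """Return (merged, fresh_tiles) for appending n_new rows.

    merged      -- new rows folded into the existing partial last tile (M)
    fresh_tiles -- row counts of the newly encoded tiles, in order
    Hypothetical helper: models row counts only, not rustfits internals.
    """
    if ztile_rows <= 0 or nrows < 0 or n_new < 0:
        raise ValueError("ztile_rows must be positive; row counts non-negative")
    last_tile_rows = nrows % ztile_rows           # 0 means the last tile is full
    room = (ztile_rows - last_tile_rows) if last_tile_rows else 0
    merged = min(room, n_new)                     # M, capped by the rows on offer
    remaining = n_new - merged
    full, tail = divmod(remaining, ztile_rows)
    # Assumption: a remainder shorter than one tile becomes the new last tile.
    fresh_tiles = [ztile_rows] * full + ([tail] if tail else [])
    return merged, fresh_tiles


# 250 existing rows in 100-row tiles: the last tile holds 50 and has room for 50.
print(plan_append(nrows=250, ztile_rows=100, n_new=180))
```

In this model only the merged rows trigger a re-encode of existing data; every other new row is encoded once.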
- appending()¶
Open a batched-append context manager.
Inside the with block every append() (and extend(), its alias) call buffers its input in RAM rather than encoding into the trailing partial tile on every call. The buffer drains in ZTILELEN-aligned bursts when it crosses a 32 MB soft cap, and the residual drains at __exit__ (normal or exceptional), collapsing N merge-with-partial-tile re-encodes into 1.
Pattern:
    with hdu.appending():
        for batch in batches:
            hdu.append(batch)
    # __exit__ here: drains the buffer
Performance: a sub-ZTILELEN-chunk append loop costs roughly the same in total as a single write_table, instead of N × merge-tile cost; see the user-facing performance docs for the measured numbers.
Restrictions inside the context (raise ValueError): read(), __getitem__, write(), __setitem__, repack(), add_checksum(), add_datasum(), verify_checksum(), verify_datasum(). Exit the context first.
FITS.close() also raises while a context is open (the natural nested-with pattern never triggers this; it fires only for forgotten __exit__).
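To see why batching helps, here is a toy model that counts how often a partial last tile would have to be re-encoded. ToyTable and _Batch are hypothetical stand-ins, not the rustfits classes, and the sketch leaves out the 32 MB soft-cap drains: it only drains at __exit__.

```python
class ToyTable:
    """Toy stand-in that counts partial-tile merges; not the rustfits class."""

    def __init__(self, ztile_rows):
        self.ztile_rows = ztile_rows
        self.nrows = 0
        self.merges = 0          # times a partial last tile was re-encoded
        self._buffer = None      # pending rows while a batch context is open

    def _encode(self, rows):
        if not rows:
            return
        if self.nrows % self.ztile_rows:
            self.merges += 1     # existing partial tile: decode + re-encode
        self.nrows += len(rows)

    def append(self, rows):
        if self._buffer is not None:
            self._buffer.extend(rows)    # buffered: nothing is encoded yet
        else:
            self._encode(list(rows))

    def appending(self):
        return _Batch(self)


class _Batch:
    def __init__(self, table):
        self.table = table

    def __enter__(self):
        self.table._buffer = []
        return self.table

    def __exit__(self, exc_type, exc, tb):
        pending, self.table._buffer = self.table._buffer, None
        self.table._encode(pending)      # drains on normal or exceptional exit
        return False                     # never swallow the exception


batches = [[0] * 10 for _ in range(8)]   # eight 10-row batches, 100-row tiles

plain = ToyTable(100)
for b in batches:
    plain.append(b)

batched = ToyTable(100)
with batched.appending():
    for b in batches:
        batched.append(b)

print(plain.merges, batched.merges)
```

Both tables end up with the same 80 rows; the unbatched loop pays for a merge on every call after the first, while the batched one encodes once.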
- clear_tile_cache()¶
Drop every cached (tile, column) decompressed slab. Keeps tile_cache_size as-is.
- colnames¶
Column names in on-disk order, as a tuple. Same semantics as
TableHDU.colnames.
- compression¶
Per-column compression algorithm, as a dict.
{column_name: zctyp_value} preserving on-disk column order. Column names come from TTYPEn (preserved verbatim from the original table). Algorithm strings are the FITS-spec ZCTYPn values found on disk ('RICE_1' / 'GZIP_1' / 'GZIP_2').
- Returns:
{column_name: algorithm_string}.
- Return type:
dict
- delete_column(_key)¶
Not supported on compressed tables. See insert_column() for the workaround.
- dtype¶
The numpy structured dtype the original (uncompressed) table reads into.
Same scaling rules as
TableHDU.dtype; sourced from the synthesized uncompressed-view cards.
- extend(data, *, names=None)¶
Alias for append(). Mirrors TableHDU.extend() for parity with ImageHDU.extend(), so generic code iterating HDUs and calling .extend(...) keeps working.
- extending()¶
Alias for appending(). Mirrors the CompressedImageHDU.extending() method on the image side, so generic code that iterates HDUs and writes with hdu.extending(): ... works uniformly across every HDU type that supports the batched-context pattern.
- extname¶
EXTNAME header value, or None when the keyword is absent.
EXTNAME is the user-visible name of the HDU (e.g. 'SCI', 'CATALOG'). Combined with extver, it’s the standard way to identify HDUs without relying on position-by-index.
- extver¶
EXTVER header value, defaulting to 1 when absent.
Per the FITS standard, multiple HDUs may share an EXTNAME and are distinguished by EXTVER. Returns 1 rather than None for the absent case so callers can compare/select without handling Optional[int].
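A minimal sketch of selecting by (extname, extver), using SimpleNamespace objects as stand-ins for HDUs. find_hdu is a hypothetical helper, not a rustfits function; it only relies on the two attributes described above.

```python
from types import SimpleNamespace


def find_hdu(hdus, extname, extver=1):
    """Return the first HDU matching (extname, extver), or None."""
    for hdu in hdus:
        # extver is always an int, so a plain equality test is enough.
        if hdu.extname == extname and hdu.extver == extver:
            return hdu
    return None


# Stand-in HDUs: an unnamed primary plus two 'SCI' extensions.
hdus = [
    SimpleNamespace(index=0, extname=None, extver=1),
    SimpleNamespace(index=1, extname="SCI", extver=1),
    SimpleNamespace(index=2, extname="SCI", extver=2),
]

print(find_hdu(hdus, "SCI", 2).index)
```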
- has_data¶
True iff this HDU has a non-empty data section.
Works uniformly across image and table HDUs: the test is NAXIS > 0 AND every NAXISn > 0. For images that means “at least one pixel”; for tables it means “at least one row of at least one column”.
Useful for picking the first HDU worth reading in a file (primary HDUs are often empty stubs):
    hdu = next(h for h in fits if h.has_data)
    arr = hdu.read()
Edge case: a VLA table with NAXIS2=0 but PCOUNT>0 (heap-only) returns False: with no main rows there’s nothing to interpret the heap through, which is the right answer for the “is this HDU worth reading?” question.
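The stated test can be written out against a header represented as a plain dict. This is a sketch of the rule as described, not the library's code; the function name and the dict representation are assumptions.

```python
def has_data(header):
    """NAXIS > 0 and every NAXISn > 0, for a header given as a plain dict."""
    naxis = header.get("NAXIS", 0)
    if naxis <= 0:
        return False
    return all(header.get(f"NAXIS{n}", 0) > 0 for n in range(1, naxis + 1))


primary_stub = {"NAXIS": 0}
image = {"NAXIS": 2, "NAXIS1": 10, "NAXIS2": 10}
heap_only = {"NAXIS": 2, "NAXIS1": 8, "NAXIS2": 0, "PCOUNT": 64}

print(has_data(primary_stub), has_data(image), has_data(heap_only))
```

PCOUNT plays no part in the test, which is why the heap-only table is reported as empty.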
- header¶
The HDU’s FITSHeader.
Returns a live view of this HDU’s header cards. Mutations via the header object (__setitem__, __delitem__, update, add_comment, add_history, add_blank) write through to disk immediately, following the disk-write-before-commit ordering documented on FITSHeader.
Reads are cheap; mutations may grow the reserved header blocks in place if the new card list exceeds the current allotment.
- index¶
The HDU’s 0-based position in its file.
Stable for the lifetime of the FITS object: even when an earlier HDU grows and shifts this HDU’s bytes forward, the index is unchanged because the HDU is still at the same position in the file’s HDU list.
- insert_column(_name, _data)¶
Not supported on compressed tables.
Schema edits would require re-encoding every tile. Workaround: build a fresh CompressedTableHDU with the new schema via FITS.create_table_hdu() (with compress= set) + write().
- Raises:
NotImplementedError – Always.
- iter(*, chunksize=None, columns=None, scale=True)¶
Iterate over table rows or row-chunks.
hdu.iter() is equivalent to for row in hdu: one row per iteration as a numpy scalar record. Passing chunksize switches to yielding structured arrays instead.
- Parameters:
chunksize (int, optional) – None (default) yields one row per iteration as a numpy scalar record (the same single-element value hdu[i] returns). A positive integer yields a structured numpy.ndarray of up to chunksize rows per iteration; the final chunk may be shorter. 0 is rejected.
columns (list of str, optional) – Restrict iteration to these columns (case-insensitive), forwarded to read(). Each yielded record / chunk then carries only the named fields. This is the supported way to iterate a column subset; a single-column iter(columns=["x"]) still yields 1-field records, so use row["x"] to get the value.
scale (bool, default True) – Apply TSCALn/TZEROn scaling, forwarded to read().
- Returns:
Yields numpy scalar records (chunksize=None) or structured numpy.ndarray chunks (chunksize=N).
- Return type:
iterator
Notes
The row count is snapshotted when the iterator is created; rows added via append() mid-iteration are not seen. Closing the file mid-iteration makes the next batch read raise the usual closed-file error. The internal read buffer is auto-sized to an ~8 MiB byte budget (rows = budget / row_width) and is not currently user-configurable; for a huge-row table, drive a manual hdu[lo:hi] loop instead.
Works identically on CompressedTableHDU, decoding only the tiles each batch touches.
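The buffer sizing and chunk boundaries described in these notes can be sketched as follows. Both helpers are hypothetical, and flooring at one row for very wide rows is an assumption rather than documented behaviour.

```python
BUDGET_BYTES = 8 * 1024 * 1024   # the ~8 MiB budget quoted in the notes


def buffer_rows(row_width):
    """Rows per internal read: budget / row_width, floored at one row."""
    return max(1, BUDGET_BYTES // row_width)   # the floor of 1 is an assumption


def chunk_bounds(nrows, chunksize):
    """Yield half-open (lo, hi) row ranges; the final range may be shorter."""
    if chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    for lo in range(0, nrows, chunksize):
        yield lo, min(lo + chunksize, nrows)


print(buffer_rows(64))
print(list(chunk_bounds(10, 4)))
```

For a huge-row table, the same bounds can drive the manual loop the notes recommend: iterate over chunk_bounds(hdu.nrows, n) and read hdu[lo:hi] for each pair.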
- n_tiles¶
Number of tile chunks the original table was split into.
Equals the compressed table’s on-disk NAXIS2: one BINTABLE row per tile.
- ncols¶
Number of columns in the table (TFIELDS).
- nrows¶
Number of rows in the ORIGINAL (uncompressed) table.
Sourced from ZNAXIS2; the on-disk NAXIS2 holds the number of tile chunks, not the user-visible row count.
- read(*, rows=None, columns=None, scale=True, mask_null=False)¶
Read the (decompressed) table into a numpy structured array.
Same signature as TableHDU.read(): rows / columns / scale / mask_null. Per-tile decode runs lazily: only the tiles overlapping the requested row range are decompressed.
Currently unsupported: mask_null=True raises NotImplementedError; TNULL masking on compressed-table reads is a separate follow-up.
Notes
Decoded (tile, column) byte slabs populate the LRU cache (subject to
tile_cache_size). Subsequent reads of the same column range, or of other columns within the same tile range, hit warm slabs.
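Which tiles a row range touches follows directly from the fixed tile height. A small sketch, assuming a contiguous half-open row range [lo, hi); tiles_for_rows is a hypothetical helper, not a rustfits function.

```python
def tiles_for_rows(lo, hi, ztile_rows):
    """Tile indices overlapping the half-open row range [lo, hi)."""
    if hi <= lo:
        return range(0)                       # empty request: no tiles
    return range(lo // ztile_rows, (hi - 1) // ztile_rows + 1)


# Rows 150..319 of a table with 100-row tiles touch tiles 1, 2 and 3.
print(list(tiles_for_rows(150, 320, 100)))
```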
- read_column(name, *, rows=None, as_bytes=False, scale=True, mask_null=False)¶
Read a single column into a plain (non-structured) ndarray.
Equivalent to hdu.read(columns=[name])[name] but skips the structured-array packaging; useful when you only want one column’s data.
- Parameters:
name (str) – Column name. Case-insensitive against the table’s TTYPEn values.
rows (slice, list of int, or None, optional) – Same semantics as read()’s rows=.
as_bytes (bool, optional) – Only meaningful for A (character) columns. If True, return the on-disk bytes in an S<n> field with no decode, no NUL-truncation, and no trailing-space strip; useful when a column has non-ASCII bytes that the default U decode would reject. Default False.
mask_null (bool, optional) – If True and this column carries TNULL, return a numpy.ma.MaskedArray. Default False.
- Returns:
Array of shape (n_selected,) + field_shape, where field_shape is empty for scalar columns and (repeat,) or the TDIM shape for subarray columns.
- Return type:
numpy.ndarray
- repack()¶
Rebuild the heap, reclaiming orphan blobs.
append() (when a merge into the last partial tile re-encodes the existing blobs) and __setitem__ (every affected tile re-encoded and appended to the heap end) leave the old compressed bytes as orphans referenced by no descriptor. repack walks every live descriptor, streams its referenced bytes into a compact new heap, and rewrites descriptors to point at it.
Shrinks the on-disk file when the new heap is smaller (last HDU: set_len; non-last HDU: trailing HDUs shift backward in lockstep). No-op for an already-compact heap.
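The compaction step can be illustrated with a toy heap held in memory. This sketch models descriptors as (offset, length) tuples over a bytes object; it is not the on-disk layout or the rustfits implementation, and the function name is only borrowed for readability.

```python
def repack(heap, descriptors):
    """Copy each live (offset, length) blob into a compact heap.

    Returns the new heap and the rewritten descriptors, in the same order.
    Toy model over bytes and tuples; not the rustfits on-disk layout.
    """
    new_heap = bytearray()
    new_descriptors = []
    for offset, length in descriptors:
        new_descriptors.append((len(new_heap), length))   # new offset
        new_heap += heap[offset:offset + length]
    return bytes(new_heap), new_descriptors


heap = b"AAAA" + b"old!" + b"BBBBBB"    # "old!" is referenced by no descriptor
live = [(0, 4), (8, 6)]
packed, moved = repack(heap, live)
print(packed, moved)
```

Running it on an already-compact heap returns the same bytes and descriptors, which matches the no-op case described above.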
- set_tile_cache_size(bytes)¶
Set the tile-cache capacity in bytes.
0 disables the cache.
- shape¶
Shape of the table, equivalent to (hdu.nrows,).
- tile_cache_size¶
Current tile-cache capacity in bytes. Default 32 MiB. See CompressedImageHDU.tile_cache_size() for details.
- tile_cache_used¶
Bytes currently held in the tile cache.
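The cache-related members (set_tile_cache_size(), tile_cache_size, tile_cache_used, clear_tile_cache()) fit a byte-budgeted LRU keyed by (tile, column). The class below is an illustrative sketch under that assumption; the eviction details and the name TileCache are not taken from rustfits.

```python
from collections import OrderedDict


class TileCache:
    """Byte-budgeted LRU keyed by (tile, column). Illustrative only."""

    def __init__(self, capacity):
        self.capacity = capacity             # plays the role of tile_cache_size
        self.used = 0                        # plays the role of tile_cache_used
        self._slabs = OrderedDict()

    def get(self, key):
        slab = self._slabs.get(key)
        if slab is not None:
            self._slabs.move_to_end(key)     # mark as most recently used
        return slab

    def put(self, key, slab):
        if len(slab) > self.capacity:        # also covers capacity == 0
            return
        old = self._slabs.pop(key, None)
        if old is not None:
            self.used -= len(old)
        self._slabs[key] = slab
        self.used += len(slab)
        while self.used > self.capacity:     # evict least recently used first
            _, evicted = self._slabs.popitem(last=False)
            self.used -= len(evicted)

    def clear(self):
        self._slabs.clear()
        self.used = 0                        # capacity is left unchanged


cache = TileCache(10)
cache.put((0, "x"), b"12345")
cache.put((0, "y"), b"12345")
cache.get((0, "x"))                          # touch x so y becomes the oldest
cache.put((1, "x"), b"123")                  # over budget: y is evicted
print(cache.used, cache.get((0, "y")))
```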
- units¶
Per-column units (TUNITn), as a dict. Same semantics as TableHDU.units.
- verify_checksum()¶
Verify the stored ZHECKSUM over the full HDU.
Returns True / False / None (None means the card is absent).
- verify_datasum()¶
Verify the stored ZDATASUM against the equivalent uncompressed table bytes.
Returns True / False / None (None means the card is absent).
- write(data, *, names=None)¶
Compress and write data to the table.
Same signature and input forms as TableHDU.write(): structured ndarray, {name: ndarray} dict, or list of per-column ndarrays + names=. Encodes each (tile, column) per the per-column ZCTYPn algorithm the file was created with, streams compressed blobs to the heap, fills the descriptor table, and updates PCOUNT.
Notes
Mid-write I/O failures taint the file (close + reopen to recover). See TableHDU.write() for the per-form validation rules.
- ztile_rows¶
Rows per tile used at compression time (ZTILELEN).
The last tile may contain fewer rows if ZNAXIS2 is not a multiple of ZTILELEN.
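The relationship between nrows (ZNAXIS2), ztile_rows (ZTILELEN) and n_tiles is a ceiling division. A short sketch follows; tile_layout is a hypothetical helper, and treating an empty table as zero tiles is an assumption.

```python
def tile_layout(nrows, ztile_rows):
    """Return (n_tiles, last_tile_rows) for nrows split into ztile_rows tiles."""
    if ztile_rows <= 0:
        raise ValueError("ztile_rows must be positive")
    if nrows == 0:
        return 0, 0                            # assumption: empty table, no tiles
    n_tiles = -(-nrows // ztile_rows)          # ceiling division
    last_tile_rows = nrows - (n_tiles - 1) * ztile_rows
    return n_tiles, last_tile_rows


print(tile_layout(250, 100))
print(tile_layout(200, 100))
```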