TableHDU

Binary table HDU (BINTABLE). Subclass of rustfits.HDU.

class rustfits.TableHDU

Bases: HDU

__getitem__(key, /)

Return self[key].

__len__()

Return len(self).

__setitem__(key, value, /)

Set self[key] to value.

add_checksum()

Compute and store both DATASUM and CHECKSUM cards.

CHECKSUM is the 16-character ASCII encoding of the ones’ complement of the 32-bit checksum over the header and data. The value is chosen so that the checksum of the complete HDU, with the CHECKSUM card included, comes out to 0xFFFFFFFF, per the FITS Checksum Convention. This is the call most users want: it writes both cards atomically.

See add_datasum() for the manual-refresh contract.
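Both cards rest on the same arithmetic: a 32-bit ones’ complement sum (addition with end-around carry) over big-endian words. The plain-Python sketch below is an illustration, not rustfits’ implementation. The helper name is hypothetical, and the sketch leaves out the 16-character ASCII encoding that the real CHECKSUM card applies to the complement. It shows why storing the complement makes the whole-HDU sum land on 0xFFFFFFFF.

```python
import struct


def ones_complement_sum(data, running=0):
    """32-bit ones' complement sum of big-endian words, continuing from `running`."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 (FITS blocks are 2880 bytes)")
    total = running
    for (word,) in struct.iter_unpack(">I", data):
        total += word
    while total >> 32:  # end-around carry: fold overflow back into the low bits
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


hdu_bytes = b"SIMPLE  " + bytes(range(24))  # toy header + data, 32 bytes
s = ones_complement_sum(hdu_bytes)
complement = ~s & 0xFFFFFFFF  # the 32-bit value that CHECKSUM stores in ASCII form
# Folding the complement back in drives the whole-HDU sum to 0xFFFFFFFF.
assert ones_complement_sum(complement.to_bytes(4, "big"), running=s) == 0xFFFFFFFF
```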

add_datasum()

Compute and store the DATASUM checksum card.

DATASUM is the 32-bit ones’ complement checksum of the HDU’s data section, stored as an unsigned decimal integer string, per the FITS Checksum Convention. Call it after any write that changes the data: write(), append(), __setitem__, insert_column(), delete_column(), or repack(). rustfits does NOT refresh checksums automatically on mutation; the user opts in explicitly because computing a checksum can be expensive on large data sections.

See also

add_checksum

Also compute the full HDU CHECKSUM card.

verify_datasum

Compare the stored DATASUM against the current data bytes.

append(data, *, names=None)

Append rows to the end of the table.

Increases NAXIS2 in the header and grows the data section to fit the new rows. If the HDU is not the last one on disk, the file tail is shifted forward and every later HDU’s offsets are bumped in lockstep; previously issued handles remain valid.
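The offset bookkeeping can be pictured with a toy model. The sketch assumes each HDU is tracked as a start offset plus an unpadded byte size, with every HDU padded to the 2880-byte FITS block size; growing one HDU then shifts every later HDU by the same block-rounded amount. The helper names are hypothetical, and rustfits’ internal structures may differ.

```python
BLOCK = 2880  # FITS files are organised in 2880-byte blocks


def padded(nbytes):
    """Round a byte count up to a whole number of FITS blocks."""
    return -(-nbytes // BLOCK) * BLOCK


def grow_hdu(offsets, sizes, idx, extra):
    """Return (new_offsets, new_sizes) after HDU `idx` grows by `extra` bytes."""
    new_sizes = list(sizes)
    new_sizes[idx] = sizes[idx] + extra
    # Only a change in the *padded* size moves anything on disk.
    shift = padded(new_sizes[idx]) - padded(sizes[idx])
    # Every later HDU moves by the same amount, in lockstep.
    new_offsets = [off + shift if i > idx else off for i, off in enumerate(offsets)]
    return new_offsets, new_sizes


# Growing HDU 1 from 4000 to 6000 bytes crosses a block boundary: HDU 2 moves 2880 bytes.
print(grow_hdu([0, 5760, 11520], [5760, 4000, 2880], 1, 2000))
```

Growth that still fits inside the existing padding shifts nothing, which is why small appends to a non-last HDU can be cheap.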

Parameters:
  • data (numpy.ndarray, dict, or list/tuple of ndarrays) – Same three input forms as write(): a structured ndarray, a {name: ndarray} dict, or a list/tuple of per-column ndarrays paired with names=. Length defines the number of new rows.

  • names (list of str, optional) – Required for the list/tuple form; ignored otherwise.

Notes

Validate-then-mutate: input is fully validated (columns, dtypes, shapes) before any file or header bytes are touched, so a dtype mismatch can’t leave the file half-grown.

Mid-write I/O failures taint the file — subsequent reads and writes will raise until the user closes and reopens.

See also

extend

Alias of append, kept for symmetry with ImageHDU.extend().

write

Overwrite all rows in place.

appending()

No-op batched-append context manager.

Uncompressed table appends already go straight to disk (no partial-trailing-tile re-encode tax to amortize), so this context does nothing on enter or exit — it exists for API symmetry with CompressedTableHDU.appending(), where the context does meaningful work. Generic code that iterates HDUs of mixed compressed / uncompressed types can use the pattern uniformly:

for hdu in fits:
    with hdu.appending():
        for batch in batches:
            hdu.append(batch)
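In Python, a context manager that does nothing on enter or exit is available as contextlib.nullcontext. The class below is a hypothetical stand-in rather than rustfits code; it shows the uniform loop running unchanged against an HDU whose appending() context is a no-op.

```python
import contextlib


class UncompressedTableSketch:
    """Hypothetical stand-in for a table HDU whose appends go straight to storage."""

    def __init__(self):
        self.rows = []

    def append(self, batch):
        self.rows.extend(batch)  # written immediately; nothing to buffer

    def appending(self):
        # Nothing to set up or flush, so hand back a do-nothing context manager.
        return contextlib.nullcontext(self)

    extending = appending  # alias mirroring the image-side name


hdu = UncompressedTableSketch()
with hdu.appending():
    for batch in ([1, 2], [3]):
        hdu.append(batch)
```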

colnames

Column names in on-disk order, as a tuple.

Names are returned with their on-disk case preserved verbatim. Lookup against this list (e.g. by read()’s columns= argument) is case-insensitive throughout the API. Returned as a tuple so the value is immutable from the caller side.

Return type:

tuple of str

delete_column(key)

Remove a column from the table.

Parameters:

key (str or int) – Column name (case-insensitive) or a 0-based integer index. Negative indices wrap from the end.
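The key rules (case-insensitive names, 0-based integers with negative wrap) can be sketched as a small resolver. This is an illustration under assumptions: the helper name is hypothetical, matching uses str.casefold(), and errors are reported as ValueError to mirror read()’s documented behaviour for unknown columns.

```python
def resolve_column(colnames, key):
    """Map a column name or 0-based index (negative wraps) to a column index."""
    n = len(colnames)
    if isinstance(key, int):
        idx = key + n if key < 0 else key  # -1 is the last column
        if not 0 <= idx < n:
            raise ValueError(f"column index {key} out of range for {n} columns")
        return idx
    if isinstance(key, str):
        wanted = key.casefold()
        for i, name in enumerate(colnames):  # on-disk case is preserved in colnames
            if name.casefold() == wanted:
                return i
        raise ValueError(f"no column named {key!r}")
    raise TypeError("column key must be str or int")
```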

Notes

Works on both fixed and VLA columns. For a VLA column, the descriptor bytes are removed from each row, but the heap cells the column referenced are left in place; they become orphans that repack() reclaims. Other existing VLA columns are preserved across the delete, and the heap is relocated backward to sit after the new (narrower) main rows.

Rows are shuffled in 1 MiB strips, processed front to back, so peak memory stays bounded. Mid-write I/O failures taint the file (close and reopen to recover).
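A minimal in-memory sketch of the front-to-back strip shuffle, assuming the rows sit in a bytearray rather than a file and the deleted column occupies one contiguous byte range in every row. The helper name and signature are hypothetical.

```python
def delete_byte_field(buf, nrows, row_width, field_off, field_width, strip_bytes=1 << 20):
    """Drop bytes [field_off, field_off + field_width) from every row of `buf`, in place.

    Returns the new row width. A row's destination never lies after its source,
    so processing strips front to back never overwrites rows not yet read.
    """
    new_width = row_width - field_width
    rows_per_strip = max(1, strip_bytes // row_width)  # about strip_bytes per strip
    for first in range(0, nrows, rows_per_strip):
        last = min(first + rows_per_strip, nrows)
        strip = bytes(buf[first * row_width:last * row_width])  # read one strip
        out = bytearray()
        for r in range(last - first):
            row = strip[r * row_width:(r + 1) * row_width]
            out += row[:field_off] + row[field_off + field_width:]
        buf[first * new_width:last * new_width] = out  # write it back, shifted down
    del buf[nrows * new_width:]  # the freed tail is no longer part of the table
    return new_width


rows = bytearray(b"AAxxB" b"CCyyD" b"EEzzF")  # three 5-byte rows, 2-byte column at offset 2
delete_byte_field(rows, 3, 5, 2, 2, strip_bytes=5)  # one row per strip
```

An insert that widens rows would need the opposite direction (back to front), since destinations then lie after sources.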

dtype

The numpy structured dtype the table reads into.

Reflects the dtype of a default read (scale=True): columns using the TSCAL/TZERO unsigned-integer trick (or its signed-byte counterpart) appear as u2 / u4 / u8 / i1, and other scaled columns as f8. Useful for inspecting the column layout (names, per-cell shapes, types) without paying for an actual read.
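The unsigned trick is exact because TSCAL is 1 and TZERO is an integer, so value = stored + TZERO needs no floating point; that is why these columns surface as integer dtypes rather than f8. A plain-Python sketch of the per-value decode, using the standard TZERO offsets (the table and helper names are hypothetical):

```python
import struct

# Target dtype -> (struct code of the stored big-endian type, TZERO).
TZERO_TRICK = {
    "u2": (">h", 32768),                # stored as signed 16-bit (TFORM I)
    "u4": (">i", 2147483648),           # stored as signed 32-bit (TFORM J)
    "u8": (">q", 9223372036854775808),  # stored as signed 64-bit (TFORM K)
    "i1": (">B", -128),                 # stored as unsigned 8-bit (TFORM B)
}


def decode_scaled(kind, raw):
    """Apply value = stored * 1 + TZERO with exact integer arithmetic."""
    code, tzero = TZERO_TRICK[kind]
    (stored,) = struct.unpack(code, raw)
    return stored + tzero
```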

Returns:

Structured dtype with one field per column.

Return type:

numpy.dtype

extend(data, *, names=None)

Alias for append().

Kept for symmetry with ImageHDU.extend() so generic code that iterates HDUs and calls .extend(...) on each continues to work. The primary table-side name is append() because that’s the natural verb for adding rows to a table.

extending()

Alias for appending(). Mirrors CompressedImageHDU.extending() for parity with the image side, so generic code that iterates HDUs of any type can use with hdu.extending(): uniformly.

extname

EXTNAME header value, or None when the keyword is absent.

EXTNAME is the user-visible name of the HDU (e.g. 'SCI', 'CATALOG'). Combined with extver, it is the standard way to identify an HDU without relying on its position in the file.

extver

EXTVER header value, defaulting to 1 when absent.

Per the FITS standard, multiple HDUs may share an EXTNAME and are distinguished by EXTVER. Returns 1 rather than None for the absent case so callers can compare/select without handling Optional[int].
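Selecting by name and version then needs no special case for a missing EXTVER. A sketch over plain dict stand-ins for headers (the helper names are hypothetical, and EXTNAME is compared exactly here):

```python
def extver_of(header):
    # An absent EXTVER counts as 1, as the FITS standard specifies.
    return header.get("EXTVER", 1)


def find_hdu(headers, extname, extver=1):
    """Return the index of the first HDU whose (EXTNAME, EXTVER) matches."""
    for i, hdr in enumerate(headers):
        if hdr.get("EXTNAME") == extname and extver_of(hdr) == extver:
            return i
    raise KeyError((extname, extver))


headers = [{}, {"EXTNAME": "SCI"}, {"EXTNAME": "SCI", "EXTVER": 2}, {"EXTNAME": "ERR"}]
```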

has_data

True iff this HDU has a non-empty data section.

Works uniformly across image and table HDUs: the test is NAXIS > 0 AND every NAXISn > 0. For images that means “at least one pixel”; for tables it means “at least one row of at least one column”.

Useful for picking the first HDU worth reading in a file (primary HDUs are often empty stubs):

hdu = next(h for h in fits if h.has_data)
arr = hdu.read()

Edge case: a VLA table with NAXIS2=0 but PCOUNT>0 (heap-only) returns False — no main rows means there’s nothing to interpret the heap through, which is the right answer for the “is this HDU worth reading?” question.
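The test can be written directly against the header keywords. A sketch using a plain dict as a hypothetical stand-in for the header; PCOUNT is deliberately not consulted, which produces the heap-only edge case described above:

```python
def has_data(header):
    """NAXIS > 0 and every NAXISn > 0; the heap size (PCOUNT) is ignored."""
    naxis = header.get("NAXIS", 0)
    return naxis > 0 and all(header.get(f"NAXIS{n}", 0) > 0 for n in range(1, naxis + 1))
```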

header

The HDU’s FITSHeader.

Returns a live view of this HDU’s header cards. Mutations via the header object (__setitem__, __delitem__, update, add_comment, add_history, add_blank) write through to disk immediately, following the disk-write-before-commit ordering documented on FITSHeader.

Reads are cheap; mutations may grow the reserved header blocks in place if the new card list exceeds the current allotment.
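Whether a mutation has to grow the header follows from the FITS layout: 80-byte cards, 36 cards per 2880-byte block, and room always reserved for the END card. A back-of-the-envelope sketch (hypothetical helpers, not rustfits code):

```python
CARD = 80                       # bytes per header card
CARDS_PER_BLOCK = 2880 // CARD  # 36 cards fit in one 2880-byte block


def header_blocks(ncards):
    """Blocks needed for `ncards` cards plus the mandatory END card."""
    return -(-(ncards + 1) // CARDS_PER_BLOCK)


def needs_growth(current_blocks, ncards_after_edit):
    """True if the edited card list no longer fits in the reserved blocks."""
    return header_blocks(ncards_after_edit) > current_blocks
```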

index

The HDU’s 0-based position in its file.

Stable for the lifetime of the FITS object — even when an earlier HDU grows and shifts this HDU’s bytes forward, the index is unchanged because the HDU is still at the same position in the file’s HDU list.

insert_column(name, data, *, position=None, after=None, before=None, unit=None, inner_dtype=None, heap_format=None, bit_packed=False)

Insert a new column into the table.

Parameters:
  • name (str) – Column name (becomes TTYPEn). Must not duplicate an existing column (case-insensitive check).

  • data (numpy.ndarray) – Column values, with shape (NAXIS2,) + per_cell_shape. For fixed-width columns the dtype determines the FITS TFORM letter (i2 / i4 / i8 / u1 / u2 / u4 / u8 / f4 / f8 / c8 / c16 / b1, plus S / U strings); u2 / u4 / u8 use the unsigned-int trick and emit a matching TZERO. For VLA columns, pass an object-dtype array holding one inner ndarray per row and set inner_dtype=.

  • position (int, optional) – 0-based column index in the result, from 0 to ncols inclusive. ncols appends at the end (also the default when none of position / after / before is set). Mutually exclusive with after and before.

  • after (str or int, optional) – Insert after this column. Accepts a name (case-insensitive) or a 0-based integer index (negative wraps). Mutually exclusive with position and before.

  • before (str or int, optional) – Insert before this column. Same rules as after. Mutually exclusive with position and after.

  • unit (str, optional) – TUNITn string.

  • inner_dtype (str, optional) – Required when data is Object dtype (VLA insert). Inner element dtype as a string: 'f4' / 'i4' / '?' etc. Maps to the FITS inner-element letter.

  • heap_format ({'P', 'Q'}, optional) – For VLA columns only. 'P' (default) uses 8-byte descriptors with a 4 GB heap ceiling; 'Q' uses 16-byte descriptors with no practical ceiling.

  • bit_packed (bool, optional) – For boolean columns only. If True, emit an X (or PX / QX for VLA) bit-packed column instead of the default L (one byte per bool). Default False.
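The dtype-to-letter choice follows the TFORMn codes of the FITS binary-table format (B, I, J, K, E, D, C, M, L, X, A, and P/Q for descriptors). A minimal sketch of the mapping, assuming a plain dtype string as input. The helper is hypothetical; U strings are left out because their byte width depends on encoding choices not documented here, and rustfits’ exact rules may differ.

```python
# Fixed-width dtype string -> (TFORM letter, TZERO or None).
FIXED_TFORM = {
    "u1": ("B", None), "i2": ("I", None), "i4": ("J", None), "i8": ("K", None),
    "u2": ("I", 32768), "u4": ("J", 2147483648), "u8": ("K", 9223372036854775808),
    "f4": ("E", None), "f8": ("D", None), "c8": ("C", None), "c16": ("M", None),
    "b1": ("L", None),
}


def tform_for(dtype, repeat=1, bit_packed=False, vla=False, heap_format="P", max_len=0):
    """Return (TFORMn, TZEROn or None) for a column; a sketch, not rustfits' rules."""
    if dtype.startswith("S"):  # fixed-width byte string -> rA
        return f"{int(dtype[1:])}A", None
    letter, tzero = FIXED_TFORM[dtype]
    if bit_packed:
        if dtype != "b1":
            raise ValueError("bit_packed applies only to boolean columns")
        letter = "X"  # one bit per value instead of one byte
    if vla:
        if heap_format not in ("P", "Q"):
            raise ValueError("heap_format must be 'P' or 'Q'")
        return f"1{heap_format}{letter}({max_len})", tzero  # cell data lives in the heap
    return f"{repeat}{letter}", tzero
```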

Raises:

ValueError – Duplicate name; more than one of position / after / before set; unknown position; dtype mismatch; row count mismatch with NAXIS2; inner_dtype / heap_format set on non-object input; or the file uses a non-default THEAP (see repack() for the same limitation).

Notes

Strip-based row shuffler bounds peak memory at ~1 MiB regardless of table size. Existing VLA columns are preserved across the insert; their heap is relocated forward to sit after the new (wider) main rows. Mid-write I/O failures taint the file (close + reopen to recover).

iter(*, chunksize=None, columns=None, scale=True)

Iterate over table rows or row-chunks.

hdu.iter() is equivalent to for row in hdu — one row per iteration as a numpy scalar record. Passing chunksize switches to yielding structured arrays instead.

Parameters:
  • chunksize (int, optional) – None (default) yields one row per iteration as a numpy scalar record (the same single-element value hdu[i] returns). A positive integer yields a structured numpy.ndarray of up to chunksize rows per iteration; the final chunk may be shorter. 0 is rejected.

  • columns (list of str, optional) – Restrict iteration to these columns (case-insensitive), forwarded to read(). Each yielded record or chunk then carries only the named fields. This is the supported way to iterate a column subset; even a single-column iter(columns=["x"]) yields 1-field records, so use row["x"] to get the value.

  • scale (bool, default True) – Apply TSCALn / TZEROn scaling, forwarded to read().

Returns:

Yields numpy scalar records (chunksize=None) or structured numpy.ndarray chunks (chunksize=N).

Return type:

iterator

Notes

The row count is snapshotted when the iterator is created; rows added via append() mid-iteration are not seen. Closing the file mid-iteration makes the next batch read raise the usual closed-file error. The internal read buffer is auto-sized to an ~8 MiB byte budget (rows = budget / row_width) and is not currently user-configurable — for a huge-row table, drive a manual hdu[lo:hi] loop instead.

Works identically on CompressedTableHDU, decoding only the tiles each batch touches.
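The snapshot and buffer-sizing rules can be modelled in a few lines. This is a hypothetical pure-Python sketch in which read_rows(lo, hi) stands in for the real batched read; it is not rustfits’ implementation.

```python
def iter_chunks(read_rows, nrows, chunksize=None, row_width=1, budget=8 * 1024 * 1024):
    """Iterate single rows (chunksize=None) or blocks of up to `chunksize` rows."""
    # Validate eagerly, when the iterator is created rather than on first next().
    if chunksize is not None and chunksize <= 0:
        raise ValueError("chunksize must be a positive integer")
    # Row mode still reads in batches of about `budget` bytes, never fewer than one row.
    step = chunksize if chunksize is not None else max(1, budget // row_width)

    def generate():
        # `nrows` was fixed when iter_chunks() was called, so later appends are not seen.
        for lo in range(0, nrows, step):
            block = read_rows(lo, min(lo + step, nrows))
            if chunksize is None:
                yield from block  # one row per iteration
            else:
                yield block  # the final block may be shorter

    return generate()


table = list(range(10))
reader = lambda lo, hi: table[lo:hi]  # hypothetical stand-in for a batched read
```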

ncols

Number of columns in the table (TFIELDS).

nrows

Number of rows in the table.

Reads the NAXIS2 header keyword. Equivalent to len(hdu); the len() form matches the numpy (len(arr)) and pandas (len(df)) idiom, and nrows pairs with ncols.

read(*, rows=None, columns=None, scale=True, mask_null=False)

Read rows from the table into a numpy structured array.

Parameters:
  • rows (slice, list of int, or None, optional) – Rows to read. None (default) reads every row in file order. A slice or iterable of ints selects a subset; negative indices are supported. Iterables are deduped with first-occurrence-wins ordering.

  • columns (list of str, or None, optional) – Column names to read. None (default) reads every column in file order. A list selects and reorders; matching is case-insensitive against the table’s column names.

  • scale (bool, optional) – If True (default), apply TSCAL / TZERO scaling: the unsigned-int trick promotes to the matching unsigned dtype with no precision loss, and other scaling produces f8. If False, return raw stored values in the on-disk (TFORMn) dtype.

  • mask_null (bool, optional) – If True, return a numpy.ma.MaskedArray with per-field bool masks set True where the stored integer equals TNULLn. The mask compare is in stored-int space (pre-scaling), so it composes correctly with the TSCAL / TZERO paths. Only applies to integer fixed-width columns; variable-length columns with TNULL are rejected. Default False.
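The rows= rules can be sketched as a normalization step that runs to completion before any read. This is a hypothetical helper, not rustfits code. It assumes slices follow Python’s clamping semantics and that deduplication happens after negative indices are wrapped, so -1 and nrows - 1 count as the same row.

```python
def normalize_rows(rows, nrows):
    """Validate rows=None / slice / iterable of ints into a list of row indices."""
    if rows is None:
        return list(range(nrows))  # every row, in file order
    if isinstance(rows, slice):
        return list(range(*rows.indices(nrows)))  # Python slice semantics
    seen, out = set(), []
    for r in rows:
        idx = r + nrows if r < 0 else r  # negative indices wrap
        if not 0 <= idx < nrows:
            raise ValueError(f"row index {r} out of range for {nrows} rows")
        if idx not in seen:  # first occurrence wins
            seen.add(idx)
            out.append(idx)
    return out
```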

Returns:

Structured array of shape (n_selected,) with one field per selected column. Dtype reflects the scale choice (scaled values for True, raw stored dtype for False).

Return type:

numpy.ndarray or numpy.ma.MaskedArray

Raises:

ValueError – If a row index is out of range, a column name is unknown, or mask_null=True is requested on a variable-length column carrying TNULL.

Notes

Both the rows= and columns= subsets validate fully before any file I/O happens, so an invalid selection leaves the file untouched.

Examples

Read the whole table:

arr = hdu.read()

Read three columns from rows 100 through 199 (the slice end is exclusive):

arr = hdu.read(rows=slice(100, 200),
               columns=["RA", "DEC", "MAG"])

Read with masking, assuming the FLAG column has TNULL=-99 and at least one null cell:

arr = hdu.read(mask_null=True)
assert arr["FLAG"].mask.any()  # mask is True where the stored value equals TNULL

read_column(name, *, rows=None, as_bytes=False, scale=True, mask_null=False)

Read a single column into a plain (non-structured) ndarray.

Equivalent to hdu.read(columns=[name])[name] but skips the structured-array packaging — useful when you only want one column’s data.

Parameters:
  • name (str) – Column name. Case-insensitive against the table’s TTYPEn values.

  • rows (slice, list of int, or None, optional) – Same semantics as read()’s rows=.

  • as_bytes (bool, optional) – Only meaningful for A (character) columns. If True, return the on-disk bytes as an S<n> array with no decoding, no NUL truncation, and no trailing-space stripping; useful when a column holds non-ASCII bytes that the default U decode would reject. Default False.

  • scale (bool, optional) – Same as read()’s scale=.

  • mask_null (bool, optional) – If True and this column carries TNULL, return a numpy.ma.MaskedArray. Default False.
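A sketch of what the default decode does to a single A cell, and why as_bytes=True is the escape hatch. The helper is hypothetical and assumes an ASCII decode.

```python
def decode_a_cell(raw):
    """Default-style decode of a FITS character cell: stop at NUL, drop trailing spaces."""
    text = raw.split(b"\x00", 1)[0].rstrip(b" ")
    return text.decode("ascii")  # raises UnicodeDecodeError on non-ASCII bytes
```

With as_bytes=True the raw cell bytes come back untouched, so a cell like b"caf\xe9    " survives instead of raising.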

Returns:

Array of shape (n_selected,) + field_shape, where field_shape is empty for scalar columns, (repeat,) for vector columns, or the TDIM shape for subarray columns.

Return type:

numpy.ndarray or numpy.ma.MaskedArray

repack()

Rebuild the VLA heap, reclaiming orphan cells.

VLA writes (__setitem__ on a variable-length column) always append new cell bytes to the end of the heap, leaving the old bytes as orphans that no descriptor references. repack() walks every live descriptor, streams the referenced bytes into a compact new heap, and rewrites the descriptors to point into it. If the heap shrinks, the on-disk file shrinks too: for the last HDU the file is truncated (set_len), and for any other HDU the trailing HDUs are shifted backward in lockstep.

No-op for tables without VLA columns or with an already-compact heap.
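The compaction step can be sketched in memory. This hypothetical helper assumes descriptors are (count, offset) pairs with the count already in bytes (real P/Q descriptors count elements) and that no two descriptors share a cell.

```python
def repack_heap(heap, descriptors):
    """Copy every live cell into a compact heap; return (new_heap, new_descriptors)."""
    new_heap = bytearray()
    new_desc = []
    for nbytes, offset in descriptors:
        new_desc.append((nbytes, len(new_heap)))  # cell now starts at the current end
        new_heap += heap[offset:offset + nbytes]
    # Bytes that no descriptor referenced (orphans) were simply never copied.
    return bytes(new_heap), new_desc


# Cell 0 was rewritten, so its old bytes b"aaa" at offset 0 are orphans.
heap = b"aaa" + b"bb" + b"cccc"
new_heap, new_descs = repack_heap(heap, [(4, 5), (2, 3)])
```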

Raises:

ValueError – If the file uses a non-default THEAP (a heap offset other than NAXIS1 * NAXIS2). rustfits never emits such files itself; the limitation only blocks repacking files written by other tools with a custom heap offset. Workaround: copy the data into a fresh HDU via FITS.create_table_hdu() followed by write().

shape

Shape of the table, equivalent to (hdu.nrows,).

units

Per-column units (TUNITn), as a dict.

Maps each column name (case preserved) to its TUNITn string, or None when TUNITn is unset for that column. Dict iteration follows on-disk column order.

Informational only — nothing in the read path consumes units.

Returns:

{column_name: unit_or_None}.

Return type:

dict

verify_checksum()

Verify the stored CHECKSUM over the full HDU.

Returns:

True if the stored CHECKSUM matches the current header + data; False if it doesn’t; None if the CHECKSUM card is absent.

Return type:

bool or None

verify_datasum()

Verify the stored DATASUM against the current data.

Returns:

True if the stored DATASUM matches the current data section; False if it doesn’t; None if the DATASUM card is absent.

Return type:

bool or None
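The tri-state result maps naturally onto a small function. A sketch, assuming the header is a dict and DATASUM holds the decimal string form of the 32-bit ones’ complement sum (the helper names are hypothetical):

```python
import struct


def ones_complement_sum(data):
    total = sum(word for (word,) in struct.iter_unpack(">I", data))
    while total >> 32:  # end-around carry
        total = (total & 0xFFFFFFFF) + (total >> 32)
    return total


def verify_datasum(header, data):
    """None if DATASUM is absent, else whether it matches the current data bytes."""
    stored = header.get("DATASUM")
    if stored is None:
        return None
    return int(stored) == ones_complement_sum(data)
```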

write(data, *, names=None)

Bulk-write data into the table’s data section.

Overwrites all NAXIS2 rows; to add new rows instead, use append(). Accepts three input forms, all normalized through the same per-column strip-write kernel:

Parameters:
  • data (numpy.ndarray, dict, or list/tuple of ndarrays) –

    • Structured ndarray — field names must match the HDU’s columns (extras, missing, or duplicates rejected); field order may differ from HDU order. len(data) must equal NAXIS2.

    • Dict {name: ndarray} — one entry per HDU column; extras / missing rejected. Each value is a per-column ndarray with shape (NAXIS2,) + per_cell_shape.

    • List or tuple of ndarrays with names=[...] — parallel sequences; same per-column model as dict.

  • names (list of str, optional) – Required only when data is a list/tuple of ndarrays. Ignored for the structured-ndarray and dict forms.
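The three input forms can be reduced to one {column: values} mapping before anything is written, which is what makes validate-then-mutate possible. A sketch with plain lists standing in for ndarrays; a structured array is detected by duck typing on dtype.names, per-cell shape checks are omitted, and names are matched case-insensitively as colnames describes. The helper is hypothetical.

```python
def normalize_columns(data, hdu_colnames, nrows, names=None):
    """Return {on-disk column name: values}, raising ValueError before any write."""
    if isinstance(data, dict):
        pairs = list(data.items())
    elif isinstance(data, (list, tuple)):
        if names is None or len(names) != len(data):
            raise ValueError("list/tuple input needs names= of the same length")
        pairs = list(zip(names, data))
    elif getattr(getattr(data, "dtype", None), "names", None):
        pairs = [(field, data[field]) for field in data.dtype.names]  # structured array
    else:
        raise ValueError("expected a structured array, a dict, or a list/tuple with names=")

    by_folded = {c.casefold(): c for c in hdu_colnames}
    out = {}
    for name, values in pairs:
        col = by_folded.get(name.casefold())
        if col is None:
            raise ValueError(f"unknown column {name!r}")
        if col in out:
            raise ValueError(f"column {name!r} given more than once")
        if len(values) != nrows:
            raise ValueError(f"column {name!r} has {len(values)} rows, expected {nrows}")
        out[col] = values
    missing = [c for c in hdu_colnames if c not in out]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return out  # only after this succeeds would a writer touch the file
```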

Raises:

ValueError – Field name mismatch, missing/extra columns, length mismatch with NAXIS2, or per-cell shape mismatch.

Notes

Validate-then-mutate: any dtype/shape error is raised BEFORE the file is touched, so an invalid input leaves the table unchanged.

See also

append

Add new rows to the table.

__setitem__

Modify a subset of rows / columns / cells.