Tables

This page covers TableHDU — writing tables from numpy structured arrays, reading rows and columns, the column-subset objects, in-place edits, append and schema-edit operations, and variable-length (VLA) columns.

For tile-compressed tables, see Compression. The Python surface is the same; the on-disk encoding differs.

Tables written by rustfits — fixed columns, variable-length columns (numeric and string PA), and bit-packed X / PX columns — round-trip bit-exactly through astropy and fitsio (one astropy parser limitation on PX/QX is noted in Known limitations).

Writing a table

The shortest path is rustfits.write(), which auto-detects a table from a structured ndarray or {name: array} dict:

import numpy as np
import rustfits

cat = np.zeros(1000, dtype=[
    ("ra", "f8"), ("dec", "f8"), ("flag", "i4"),
])
cat["ra"] = np.random.uniform(0, 360, size=1000)
cat["dec"] = np.random.uniform(-90, 90, size=1000)

rustfits.write("cat.fits", cat)

For multi-HDU files, the list-of-arrays form with a separate names=[...] argument, or any type-specific knobs (compress=, units=, var_dtypes=, bit_columns=, …), open a FITS object directly and call write_table():

with rustfits.FITS("out.fits", "w+") as fits:
    fits.write_table(cat, extname="cat")
    fits.write_table({"x": np.arange(10), "y": np.arange(10) * 2})

Pass units={"ra": "deg", "dec": "deg"} to attach informational TUNITn cards.

Allocating then filling

Use the lower-level create_table_hdu() + write() pair when you want to allocate the table first and fill (or extend) it later:

with rustfits.FITS("out.fits", "w+") as fits:
    fits.create_table_hdu(cat.dtype, nrows=1000, extname="cat")
    # ... build the rows, then:
    fits["cat"].write(cat)

Reading a table

A whole-table read returns a structured ndarray:

with rustfits.FITS("cat.fits") as fits:
    tab = fits[1].read()
    print(tab.dtype.names)
    print(tab["ra"][:5])

Read just the columns or rows you need:

with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    sub = hdu.read(columns=["ra", "dec"])    # column subset
    head = hdu.read(rows=slice(0, 100))      # first 100 rows
    picks = hdu.read(rows=[0, 5, 10, 17])    # fancy rows

    data_slice = hdu[10:30]                  # slice of rows
    subcols = hdu[['ra', 'dec']][50:200]     # column subset and slice

rows= accepts a slice (with arbitrary step, including negative) or an iterable of ints (negative indices wrap; duplicates are deduped, output order preserved).
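To pin those selection rules down, here is a small pure-Python model of how a rows= value could be normalized into row indices. normalize_rows is a hypothetical helper, not part of rustfits, and it assumes that when duplicates are dropped each row keeps the position of its first occurrence.

```python
def normalize_rows(rows, nrows):
    """Hypothetical helper: turn a rows= selector into a list of row indices.

    Slices may use any step, including negative; iterables of ints may hold
    negative indices (which wrap); duplicates are dropped, keeping the first
    occurrence so the requested order is otherwise preserved.
    """
    if isinstance(rows, slice):
        # slice.indices() clips start/stop/step to a table of length nrows
        return list(range(*rows.indices(nrows)))
    out, seen = [], set()
    for r in rows:
        i = int(r)
        j = i + nrows if i < 0 else i   # negative indices wrap, as in numpy
        if not 0 <= j < nrows:
            raise IndexError(f"row {i} out of range for {nrows} rows")
        if j not in seen:               # dedupe, keep first occurrence
            seen.add(j)
            out.append(j)
    return out
```

For example, normalize_rows([5, -1, 5, 0], 10) keeps rows 5, 9 and 0 in that order.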

By default TSCAL/TZERO scaling and the unsigned-int trick are applied; pass scale=False for raw stored values. For columns with a TNULLn integer sentinel, mask_null=True returns a numpy.ma.MaskedArray.
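The scaling follows the FITS rule physical = TZERO + TSCAL * stored. The sketch below applies that rule with numpy, including the unsigned-int trick (TSCAL = 1 and TZERO = 2**(bits-1) on a signed column) and TNULL masking against the raw stored integers. to_physical is a hypothetical helper that illustrates the convention; it is not how rustfits implements it.

```python
import numpy as np

def to_physical(stored, tscal=1.0, tzero=0.0, tnull=None):
    """Hypothetical helper: apply FITS TSCALn/TZEROn scaling to a stored column."""
    stored = np.asarray(stored)
    nbytes = stored.dtype.itemsize
    is_int = stored.dtype.kind == "i"
    if is_int and tscal == 1 and tzero == 2 ** (8 * nbytes - 1):
        # Unsigned-int trick: adding 2**(bits-1) to a signed integer equals
        # reinterpreting it as unsigned and flipping the top bit, so the
        # result is exact and stays an integer type.
        utype = np.dtype(f"u{nbytes}")
        native = stored.astype(f"i{nbytes}")      # native byte order copy
        physical = native.view(utype) ^ utype.type(1 << (8 * nbytes - 1))
    else:
        physical = tzero + tscal * stored.astype(np.float64)
    if tnull is not None and is_int:
        # TNULLn is compared with the raw stored integers, before scaling.
        return np.ma.MaskedArray(physical, mask=(stored == tnull))
    return physical
```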

Iterating over rows

For a table that doesn’t fit comfortably in memory, iterate instead of reading the whole thing. for row in hdu yields one row at a time as a numpy scalar record (the same value hdu[i] returns):

with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    for row in hdu:
        use(row["ra"], row["dec"])   # use() stands in for your per-row processing

Rows are read from disk in internally buffered batches, not one read per row, so iteration stays memory-bounded even on a huge table. The buffer is sized automatically; there is no knob to tune.
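The batching idea can be modeled in a few lines of plain Python: read a block of rows, hand them out one at a time, then read the next block. read_rows and batch are hypothetical stand-ins; rustfits sizes its buffer automatically and exposes neither.

```python
from typing import Callable, Iterator, Sequence

def iter_rows(read_rows: Callable[[int, int], Sequence], nrows: int,
              batch: int = 4096) -> Iterator:
    """Yield rows one at a time while reading them in batches.

    read_rows(start, stop) is a hypothetical bulk read of rows [start, stop);
    only one batch is held in memory at any moment.
    """
    for start in range(0, nrows, batch):
        block = read_rows(start, min(start + batch, nrows))
        yield from block
```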

For vectorized work, iterate in chunks with iter(). Each iteration yields a structured ndarray of up to chunksize rows (the last chunk may be shorter):

with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    total = 0.0
    for chunk in hdu.iter(chunksize=100_000):
        total += chunk["flux"].sum()    # assumes the table has a "flux" column

iter forwards columns= and scale= to read(), so you can stream just the columns you need:

for chunk in hdu.iter(chunksize=100_000, columns=["ra", "dec"]):
    ...

Each yielded record or chunk then carries only the named fields; this is how to iterate over a column subset. A single-column iter(columns=["ra"]) still yields one-field records, so access the value with row["ra"].

The same surface works on a tile-compressed CompressedTableHDU, decoding only the tiles each batch touches. Note that the row count is fixed when the iterator is created, so rows appended mid-loop are not seen.
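Both behaviors, chunks of up to chunksize rows with a shorter final chunk and a row count fixed at creation, can be modeled with a plain list standing in for the table. iter_chunks is hypothetical and only illustrates the semantics described above.

```python
def iter_chunks(table, chunksize):
    """Hypothetical model of chunked iteration over a list-backed table."""
    nrows = len(table)   # captured now, when the iterator is created

    def gen():
        for start in range(0, nrows, chunksize):
            # The last chunk may be shorter; rows past nrows are never visited.
            yield table[start:min(start + chunksize, nrows)]

    return gen()
```

Capturing nrows in the outer function rather than inside the generator body makes the snapshot happen when the iterator is created, not on the first next() call.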

Column subsets

Indexing a table with a column name, or a list of names, returns a lazy subset object:

with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    ra_col = hdu["ra"]              # SingleColumnSubset
    sub    = hdu[["ra", "dec"]]     # ColumnSubset (structured)

No data are read until rows are specified. Subset objects support slicing, indexing, read(), and write() — they’re a thin selector over the parent HDU, not a snapshot:

ra_all = ra_col[:]                  # plain ndarray
ra_first_100 = ra_col[:100]
ra_picks = ra_col[[0, 5, 10]]
one_value = ra_col[5]               # scalar (single-cell read)

tab = sub[:]                        # structured ndarray
head = sub[:10]
head_via_read = sub.read(rows=slice(0, 10))   # equivalent

Single-row indexing on the parent table returns a scalar, 0-d structured record (numpy.void):

row = hdu[0]              # scalar record with field access
row["ra"], row["dec"]

first_row = hdu[0:1]      # structured ndarray of length 1
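numpy draws the same distinction on an in-memory structured array, which makes it easy to check which kind of object you are holding. This sketch uses plain numpy only, with no file involved:

```python
import numpy as np

cat = np.zeros(3, dtype=[("ra", "f8"), ("dec", "f8")])
cat["ra"] = [10.0, 20.0, 30.0]

row = cat[0]          # numpy.void: a 0-d record with field access
first_row = cat[0:1]  # structured ndarray of shape (1,)
```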

How subsets relate to the parent table

A subset is a lazy selector, not a snapshot. Constructing hdu["ra"] does no I/O — it just remembers the parent HDU and the column name. The actual read happens when you slice or iterate the subset:

col = hdu["ra"]              # no I/O — returns a subset handle
first = col[0]               # READ: scalar from row 0
batch = col[100:200]         # READ: ndarray of 100 values
col[0] = 123.4               # WRITE: one cell

Two consequences worth knowing:

  • Subsets share the parent’s open file handle. If the parent FITS handle closes (its with block exits, or you call fits.close()), later accesses on a subset object you kept around raise IOError, because the file is no longer open. Use subsets only inside the same with block as their parent.

  • Each access is a fresh read. Subsets don’t cache. Two calls to col[:] on the same subset re-read from disk each time, so if another writer mutated the file in between you’d see the new bytes on the second call. (Not a concern in single-process workflows; just a property of “view, not snapshot.”)

The slicing surface (subset[i], subset[a:b], subset[[i,j,k]]) is the shorthand; subset.read(rows=...) and subset.write(data, rows=...) are the discoverable named forms, and both take the same kwargs as HDU.read() / HDU.write().
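A minimal pure-Python model of the lazy-selector design: the subset stores only its parent and a column name, and every index or assignment becomes a fresh read() or write() on the parent. InMemoryTable and ColumnView are hypothetical classes for illustration; rustfits' real subset classes and its HDU read()/write() signatures may differ.

```python
class InMemoryTable:
    """Hypothetical stand-in for a table HDU, backed by a dict of lists.

    read() counts its calls so you can see that nothing is cached.
    """

    def __init__(self, **columns):
        self.columns = {name: list(values) for name, values in columns.items()}
        self.reads = 0

    @staticmethod
    def _indices(rows, n):
        if rows is None:
            return range(n)
        if isinstance(rows, slice):
            return range(*rows.indices(n))
        return rows  # an iterable of ints

    def read(self, columns, rows=None):
        self.reads += 1
        out = {}
        for name in columns:
            col = self.columns[name]
            if isinstance(rows, int):
                out[name] = col[rows]          # single cell
            else:
                out[name] = [col[i] for i in self._indices(rows, len(col))]
        return out

    def write(self, data, rows=None):
        for name, values in data.items():
            col = self.columns[name]
            if isinstance(rows, int):
                col[rows] = values             # single cell
            else:
                for i, v in zip(self._indices(rows, len(col)), values):
                    col[i] = v


class ColumnView:
    """Hypothetical lazy single-column selector: remembers (parent, name) only."""

    def __init__(self, parent, name):
        self.parent = parent   # no I/O at construction
        self.name = name

    def read(self, rows=None):
        return self.parent.read([self.name], rows=rows)[self.name]

    def write(self, data, rows=None):
        self.parent.write({self.name: data}, rows=rows)

    # The slicing surface is shorthand for the named read()/write() forms.
    def __getitem__(self, rows):
        return self.read(rows=rows)

    def __setitem__(self, rows, data):
        self.write(data, rows=rows)
```

Because ColumnView holds no data of its own, every col[:] reflects the parent's current contents, which is the "view, not snapshot" property described above.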

Writing into a table

The __getitem__ surface is mirrored by __setitem__: row selections, fancy rows, whole-column writes, and the subset objects all accept assignment with the same shape they read.

with rustfits.FITS("cat.fits", "r+") as fits:
    hdu = fits[1]

    # Single-row write (record or shape-(1,) structured array).
    hdu[0] = hdu[0]                              # no-op

    # Slice write.
    hdu[100:200] = np.zeros(100, dtype=hdu.dtype)

    # Fancy-row write.
    hdu[[1, 3, 5]] = np.zeros(3, dtype=hdu.dtype)

    # Whole-column write.
    hdu["flag"] = np.zeros(len(hdu), dtype="i4")

    # Multi-column subset write.
    hdu[["ra", "dec"]] = np.zeros(len(hdu), dtype=[
        ("ra", "f8"), ("dec", "f8"),
    ])

    # Single-cell write via the subset — symmetric with
    # `hdu["ra"][5]` on read.
    hdu["ra"][5] = 123.4

    # Multiple rows of one column.
    hdu["ra"][[0, 1, 2]] = [10.0, 11.0, 12.0]

    # Column-subset row range.
    hdu[["ra", "dec"]][0:10] = np.zeros(10, dtype=[
        ("ra", "f8"), ("dec", "f8"),
    ])

Appending rows

append() (alias extend()) grows the table by adding rows. It accepts the same three input forms as write: a structured ndarray, a dict, or a list of arrays plus names=[...].

new_rows = np.zeros(50, dtype=hdu.dtype)
hdu.append(new_rows)

If the table isn’t the last HDU in the file, the HDUs after it are shifted toward the end of the file to make room; offsets held by any cached handles update transparently.
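How far the later HDUs move depends on FITS block padding: every data unit is padded to a multiple of 2880 bytes, so an append only shifts what follows when the table's padded size grows. A back-of-the-envelope sketch, assuming a fixed-width table with no VLA heap and an unchanged header; the helpers are hypothetical and not part of rustfits:

```python
import math

BLOCK = 2880  # FITS files are organized in 2880-byte blocks

def padded(nbytes: int) -> int:
    """Size of a data unit after padding to a whole number of FITS blocks."""
    return math.ceil(nbytes / BLOCK) * BLOCK

def shift_after_append(row_bytes: int, nrows: int, new_rows: int) -> int:
    """Bytes that later HDUs move when new_rows are appended.

    Assumes a fixed-width table (no heap) and an unchanged header size.
    """
    return padded(row_bytes * (nrows + new_rows)) - padded(row_bytes * nrows)
```

For the ra/dec/flag catalog (20-byte rows, 1000 rows), appending 50 rows adds one 2880-byte block, while appending 5 rows fits in the existing padding and shifts nothing.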

If your table is compressed and you plan to build it piecewise with many appends, it is much more efficient to wrap the appends in an appending() context:

with rustfits.FITS(fname, 'r+') as fits:     # fname, nchunks, chunksize: your values
    hdu = fits['data']
    with hdu.appending():
        for i in range(nchunks):
            new_rows = np.zeros(chunksize, dtype=hdu.dtype)
            hdu.append(new_rows)

The appending() context buffers the appended rows, avoiding repeated decompress/recompress cycles when each chunk is smaller than the compressed tile size.
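The benefit comes from batching appends into whole tiles. The model below is a hypothetical sketch of that idea, not rustfits' implementation: rows accumulate in memory and are handed to a flush callback one full tile at a time, plus one final partial tile when the context exits. BufferedAppender, buffered_appends, flush, and tile_rows are all illustrative names.

```python
from contextlib import contextmanager

class BufferedAppender:
    """Hypothetical append buffer for a tile-compressed table."""

    def __init__(self, flush, tile_rows):
        self.flush = flush          # callable that compresses and writes one tile
        self.tile_rows = tile_rows
        self.pending = []

    def append(self, rows):
        self.pending.extend(rows)
        # Emit only complete tiles; a partial tile waits for more rows.
        while len(self.pending) >= self.tile_rows:
            tile = self.pending[:self.tile_rows]
            self.pending = self.pending[self.tile_rows:]
            self.flush(tile)

    def close(self):
        if self.pending:            # final partial tile
            self.flush(self.pending)
            self.pending = []

@contextmanager
def buffered_appends(flush, tile_rows):
    buf = BufferedAppender(flush, tile_rows)
    try:
        yield buf
    finally:
        buf.close()                 # flush the remainder, even if the loop raised
```

Seven 30-row appends with 100-row tiles produce three flushes (100, 100, and 10 rows) instead of seven partial-tile rewrites. Flushing in finally is one design choice; a real implementation might discard buffered rows on error instead.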

Adding and removing columns

insert_column() and delete_column() rewrite the table’s schema in place. Insert can append, or position the new column by index or relative to an existing column:

hdu.insert_column("mag", np.zeros(len(hdu), dtype="f4"))
hdu.insert_column("z", np.zeros(len(hdu), dtype="f4"),
                  after="mag")
hdu.insert_column("flag2", np.zeros(len(hdu), dtype="i4"),
                  position=0)        # at the start

hdu.delete_column("flag2")
hdu.delete_column(-1)                # by index; negative wraps

Both work on VLA columns too; see below.
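The placement rules can be modeled on a plain list of column names. insert_name and delete_name are hypothetical helpers that mirror the calls above, applied to the ra/dec/flag catalog from earlier; they track names only, not data.

```python
def insert_name(names, new, position=None, after=None):
    """Hypothetical model of where insert_column() places a new column."""
    names = list(names)
    if after is not None:
        idx = names.index(after) + 1   # right after an existing column
    elif position is not None:
        idx = position                 # explicit index
    else:
        idx = len(names)               # default: append at the end
    names.insert(idx, new)
    return names

def delete_name(names, which):
    """Hypothetical model of delete_column(): by name or by (possibly negative) index."""
    names = list(names)
    if isinstance(which, int):
        del names[which]               # negative indices wrap, as for lists
    else:
        names.remove(which)
    return names
```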

Variable-length columns

VLA (variable-length array) columns store a different-length ndarray per row. Declare them at create time with the sidecar var_dtypes={col: inner_dtype} (the numpy field stays as Object dtype):

dtype = np.dtype([("id", "i4"), ("samples", "O")])
data = np.empty(3, dtype=dtype)
data["id"] = [10, 20, 30]
data["samples"][0] = np.array([1.0, 2.0, 3.0], dtype="f4")
data["samples"][1] = np.array([0.5], dtype="f4")
data["samples"][2] = np.array([], dtype="f4")

with rustfits.FITS("vla.fits", "w+") as fits:
    fits.write_table(data, var_dtypes={"samples": "f4"})

Reading returns Object-dtype cells (one ndarray per row):

tab = rustfits.read("vla.fits")
print(tab["samples"][0])      # array([1., 2., 3.], dtype=float32)
print(tab["samples"][2])      # empty array, dtype f4

String VLA columns work the same way with var_dtypes={col: "S"} (or "U"); cells are read as Python str (or bytes if you pass as_bytes=True).
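On disk, the FITS standard stores a VLA column as a per-row descriptor pointing into a heap that follows the fixed-width rows; for the 32-bit P format each descriptor is an (element count, byte offset) pair. A minimal sketch with the standard struct module, packing the float32 samples column from the example above; pack_vla is hypothetical and models only the descriptor/heap layout:

```python
import struct

def pack_vla(cells):
    """Pack float32 VLA cells into (count, offset) descriptors plus a heap.

    FITS data are big-endian, hence the '>' in the struct format.
    """
    heap = bytearray()
    descriptors = []
    for cell in cells:
        # Count is in elements; offset is in bytes from the start of the heap.
        descriptors.append((len(cell), len(heap)))
        heap += struct.pack(f">{len(cell)}f", *cell)
    return descriptors, bytes(heap)
```

The empty third cell gets a zero count; its offset is never dereferenced, and this sketch simply records the current end of the heap.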

VLA writes through __setitem__ follow an always-append-and-orphan model: the new cell is appended to the heap, and the old cell's bytes stay behind as unreferenced orphans. Call repack() to reclaim them:

hdu["samples"][0] = np.array([99.0], dtype="f4")   # appends to heap
hdu.repack()                                       # reclaim orphans
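A hypothetical model of the append-and-orphan strategy, with a bytearray as the heap. Overwriting a cell never touches the old bytes (a longer value could otherwise spill into its neighbor); it appends the new payload and repoints the descriptor, and repack() rebuilds the heap from the cells still referenced. To stay short, VlaHeap records byte counts where a real FITS descriptor records element counts.

```python
class VlaHeap:
    """Hypothetical model of the append-and-orphan VLA write strategy."""

    def __init__(self):
        self.heap = bytearray()
        self.descriptors = []   # one (nbytes, offset) pair per row

    def append_row(self, payload: bytes):
        self.descriptors.append((len(payload), len(self.heap)))
        self.heap += payload

    def overwrite(self, row: int, payload: bytes):
        # Append the new bytes and repoint the descriptor; the old bytes
        # remain in the heap, unreferenced (an orphan).
        self.descriptors[row] = (len(payload), len(self.heap))
        self.heap += payload

    def cell(self, row: int) -> bytes:
        n, off = self.descriptors[row]
        return bytes(self.heap[off:off + n])

    def repack(self):
        # Rebuild the heap from live cells only, in row order.
        live = [self.cell(r) for r in range(len(self.descriptors))]
        self.heap = bytearray()
        self.descriptors = []
        for payload in live:
            self.append_row(payload)
```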

Repr and accessors

Lightweight metadata without reading any rows:

hdu.nrows           # int
hdu.ncols
hdu.colnames        # tuple of names, case preserved
hdu.dtype           # numpy structured dtype
hdu.units           # dict, informational
hdu.extname         # EXTNAME or None
len(hdu)            # == nrows

ASCII tables

FITS defines an older text-based table extension (XTENSION='TABLE', distinct from the binary XTENSION='BINTABLE' everything else on this page uses). ASCII tables store row data as fixed-width text and are rare in modern files; pipelines almost always pick binary tables instead. rustfits supports them as a first-class HDU type:

import numpy as np
import rustfits

cat = np.zeros(100, dtype=[
    ("id", "i8"), ("flux", "f4"), ("name", "S8"),
])
cat["id"] = np.arange(100, dtype="i8")
cat["flux"] = np.random.uniform(size=100).astype("f4")
cat["name"] = [f"obj{i:04d}".encode() for i in range(100)]

with rustfits.FITS("cat.fits", "w+") as fits:
    fits.create_ascii_table_hdu(cat.dtype, nrows=len(cat))
    fits[1].write(cat)

After construction, the access surface matches binary tables one-for-one: read() with rows= / columns=, hdu[rows] / hdu["col"] / hdu[["a","b"]] and their __setitem__ counterparts, the column-subset objects, append() / extend(), insert_column / delete_column, add_checksum / verify_checksum, iter() / for row in hdu, and the appending() / extending() context managers. Read the sections above; just substitute create_ascii_table_hdu() for create_table_hdu() at construction. Files written by rustfits round-trip bit-exactly through astropy and fitsio.

What’s different from binary tables:

  • Narrower dtype mapping. Signed and unsigned ints map to I20 (unsigned via the TZERO=2**63 trick); f4 → E15.7 and f8 → D25.17; S<w> / U<w> → A<w>. Other numpy dtypes (b1, i1, complex) are rejected. Override the format per column with formats={"col": "F12.4"} on create, or format="..." on insert_column(). Read back the current per-column TFORM strings via formats, which mirrors the formats= create kwarg so it round-trips through create_ascii_table_hdu().

  • No variable-length (VLA / Object dtype) columns. The format has no heap, so VLAs aren’t representable.

  • No bit-packed (X) columns, no subarray cells, and no complex (C / M) columns. The FITS standard’s ASCII-table format letters are only A, I, F, E, and D.

  • No tile compression. ASCII tables can’t be compressed; no FITS convention exists for it.

  • TBCOL packs columns flush. rustfits matches cfitsio’s convention of leaving no space byte between columns.
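The default TFORM choices and flush TBCOL packing can be reproduced in a few lines. This is a sketch of the mapping described in the bullets above, not rustfits' code; ascii_tform, tform_width, and tbcols are hypothetical helpers, applied here to the id/flux/name catalog from the example earlier in this section.

```python
import re
import numpy as np

def ascii_tform(dt) -> str:
    """Default ASCII-table TFORM for a numpy dtype (hypothetical helper)."""
    dt = np.dtype(dt)
    if dt.kind == "i" and dt.itemsize == 1:
        raise ValueError("i1 has no ASCII-table format")
    if dt.kind in "iu":
        return "I20"
    if dt.kind == "f" and dt.itemsize == 4:
        return "E15.7"
    if dt.kind == "f" and dt.itemsize == 8:
        return "D25.17"
    if dt.kind == "S":
        return f"A{dt.itemsize}"
    if dt.kind == "U":
        return f"A{dt.itemsize // 4}"   # numpy stores 4 bytes per character
    raise ValueError(f"{dt} has no ASCII-table format")   # b1, complex, ...

def tform_width(tform: str) -> int:
    # Field width is the integer right after the letter: "E15.7" -> 15, "A8" -> 8.
    return int(re.match(r"[AIFED](\d+)", tform).group(1))

def tbcols(tforms):
    """1-based starting columns, packed flush (no space byte between fields)."""
    cols, pos = [], 1
    for tf in tforms:
        cols.append(pos)
        pos += tform_width(tf)
    return cols, pos - 1   # second value: characters per row under flush packing
```

With flush packing the three fields start at columns 1, 21, and 36, and each row is 43 characters wide.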