Tables¶
This page covers TableHDU — writing tables
from numpy structured arrays, reading rows and columns, the
column-subset objects, in-place edits, append and schema-edit
operations, and variable-length (VLA) columns.
For tile-compressed tables, see Compression. The Python surface is the same; the on-disk encoding differs.
Tables written by rustfits — fixed columns, variable-length
columns (numeric and string PA), and bit-packed X /
PX columns — round-trip bit-exactly through astropy and
fitsio (one astropy parser limitation on PX/QX is noted
in Known limitations).
Writing a table¶
The shortest path is rustfits.write(), which auto-detects
a table from a structured ndarray or {name: array} dict:
import numpy as np
import rustfits
cat = np.zeros(1000, dtype=[
    ("ra", "f8"), ("dec", "f8"), ("flag", "i4"),
])
cat["ra"] = np.random.uniform(0, 360, size=1000)
cat["dec"] = np.random.uniform(-90, 90, size=1000)
rustfits.write("cat.fits", cat)
For multi-HDU files, or for the list-of-arrays form with a
separate names=[...] argument, or for any type-specific
knobs (compress=, units=, var_dtypes=,
bit_columns=, …), open FITS directly
and use write_table():
with rustfits.FITS("out.fits", "w+") as fits:
    fits.write_table(cat, extname="cat")
    fits.write_table({"x": np.arange(10), "y": np.arange(10) * 2})
Pass units={"ra": "deg", "dec": "deg"} to attach informational
TUNITn cards.
When you have an open handle but don’t want to care whether the
value is an image or a table — copying HDUs across files, say —
write() is the minimal, type-agnostic method.
Like the top-level rustfits.write() it accepts only the
universal kwargs (extname, header) and auto-detects the
HDU type; reach for write_table only when you need a
type-specific knob:
with rustfits.FITS("out.fits", "w+") as fits:
    fits.write(cat, extname="cat")     # structured ndarray → table
    fits.write({"x": np.arange(10)})   # dict → table
Allocating then filling¶
Use the lower-level create_table_hdu() +
write() pair when you want to allocate
the table first and fill (or extend) it later:
with rustfits.FITS("out.fits", "w+") as fits:
    fits.create_table_hdu(cat.dtype, nrows=1000, extname="cat")
    # ... build the rows, then:
    fits["cat"].write(cat)
Reading a table¶
A whole-table read returns a structured ndarray:
with rustfits.FITS("cat.fits") as fits:
    tab = fits[1].read()
    print(tab.dtype.names)
    print(tab["ra"][:5])
Read just the columns or rows you need:
with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    sub = hdu.read(columns=["ra", "dec"])    # column subset
    head = hdu.read(rows=slice(0, 100))      # first 100 rows
    picks = hdu.read(rows=[0, 5, 10, 17])    # fancy rows
    data_slice = hdu[10:30]                  # slice of rows
    subcols = hdu[['ra', 'dec']][50:200]     # column subset and slice
rows= accepts a slice (with arbitrary step, including
negative) or an iterable of ints (negative indices wrap;
duplicates are deduped, output order preserved).
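The selector rules above can be modeled in plain Python. This is only an illustrative sketch of the described semantics, not rustfits's implementation: `normalize_rows` is a hypothetical helper, and it assumes that "output order preserved" means the first occurrence of a duplicate wins.

```python
from typing import Iterable, List, Union


def normalize_rows(rows: Union[slice, Iterable[int]], nrows: int) -> List[int]:
    """Resolve a row selector to concrete row indices (illustrative model)."""
    if isinstance(rows, slice):
        # slice.indices() handles arbitrary steps, including negative ones.
        return list(range(*rows.indices(nrows)))

    resolved: List[int] = []
    seen = set()
    for r in rows:
        i = r + nrows if r < 0 else r  # negative indices wrap
        if not 0 <= i < nrows:
            raise IndexError(f"row {r} out of range for {nrows} rows")
        if i not in seen:  # assumption: the first occurrence wins
            seen.add(i)
            resolved.append(i)
    return resolved


print(normalize_rows(slice(None, None, -3), 10))  # [9, 6, 3, 0]
print(normalize_rows([5, -1, 5, 0], 10))          # [5, 9, 0]
```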
By default TSCAL/TZERO scaling and the unsigned-int trick are
applied; pass scale=False for raw stored values. For columns
with a TNULLn integer sentinel, mask_null=True returns a
numpy.ma.MaskedArray.
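For readers unfamiliar with the unsigned-int trick, here is a small standalone sketch of the FITS convention for a 16-bit column: the file stores a signed big-endian integer and TZERO=32768 shifts it back into the unsigned range. The helper names are hypothetical, and the `None` substitution only stands in for the masking that a MaskedArray provides.

```python
import struct

TZERO_U16 = 32768  # FITS offset for keeping uint16 values in a signed 16-bit column


def encode_u16(value: int) -> bytes:
    """Store an unsigned 16-bit value as a signed big-endian integer."""
    stored = value - TZERO_U16         # now fits in a signed 16-bit integer
    return struct.pack(">h", stored)   # FITS data are big-endian


def decode_u16(raw: bytes, scale: bool = True) -> int:
    """Return the physical value, or the raw stored value when scale=False."""
    (stored,) = struct.unpack(">h", raw)
    return stored + TZERO_U16 if scale else stored


def mask_nulls(stored_values, tnull):
    """Replace the TNULL sentinel with None (a stand-in for a masked entry)."""
    return [None if v == tnull else v for v in stored_values]


raw = encode_u16(65535)
print(decode_u16(raw))               # 65535
print(decode_u16(raw, scale=False))  # 32767
print(mask_nulls([3, -999, 7], -999))  # [3, None, 7]
```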
Iterating over rows¶
For a table that doesn’t fit comfortably in memory, iterate
instead of reading the whole thing. for row in hdu yields one
row at a time as a numpy scalar record (the same value hdu[i]
returns):
with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    for row in hdu:
        use(row["ra"], row["dec"])
Rows are read from disk in internally-buffered batches — not one read per row — so this stays memory-bounded even on a huge table. The buffer is auto-sized (no knob to tune).
For vectorized work, iterate in chunks with
iter(). Each iteration yields a
structured ndarray of up to chunksize rows (the last chunk may
be shorter):
with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    total = 0.0  # running sum across chunks
    for chunk in hdu.iter(chunksize=100_000):
        total += chunk["flux"].sum()
iter forwards columns= and scale= to
read(), so you can stream just the columns
you need:
for chunk in hdu.iter(chunksize=100_000, columns=["ra", "dec"]):
    ...
Each yielded record or chunk then carries only the named fields —
this is the way to iterate a column subset (a single-column
iter(columns=["ra"]) still yields one-field records, so reach
the value with row["ra"]).
The same surface works on a tile-compressed
CompressedTableHDU, decoding only the tiles each
batch touches. Note that the row count is fixed when the iterator
is created, so rows appended mid-loop are not seen.
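The chunking behaviour described in this section can be pictured as a generator over row ranges. The sketch below is a plain-Python model rather than the library's code; `chunk_ranges` is a hypothetical name. It shows why the last chunk may be shorter and why rows appended mid-loop are not visited: the row count is captured once, up front.

```python
from typing import Iterator, Tuple


def chunk_ranges(nrows: int, chunksize: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) row ranges that cover nrows rows."""
    if chunksize <= 0:
        raise ValueError("chunksize must be positive")
    # nrows is captured once, so rows appended later are not visited.
    for start in range(0, nrows, chunksize):
        yield start, min(start + chunksize, nrows)


print(list(chunk_ranges(250_000, 100_000)))
# [(0, 100000), (100000, 200000), (200000, 250000)]
```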
Column subsets¶
Indexing a table with a column name returns a lazy subset object:
with rustfits.FITS("cat.fits") as fits:
    hdu = fits[1]
    ra_col = hdu["ra"]          # SingleColumnSubset
    sub = hdu[["ra", "dec"]]    # ColumnSubset (structured)
No data are read until rows are specified. Subset objects support slicing,
indexing, read(), and write() — they’re a thin selector over the parent
HDU, not a snapshot:
ra_all = ra_col[:] # plain ndarray
ra_first_100 = ra_col[:100]
ra_picks = ra_col[[0, 5, 10]]
one_value = ra_col[5] # scalar (single-cell read)
tab = sub[:] # structured ndarray
head = sub[:10]
head_via_read = sub.read(rows=slice(0, 10)) # equivalent
Single-row indexing on the parent table returns a scalar, 0-d structured record
(numpy.void):
row = hdu[0] # scalar record with field access
row["ra"], row["dec"]
first_row = hdu[0:1] # structured ndarray of length 1
How subsets relate to the parent table¶
A subset is a lazy selector, not a snapshot. Constructing
hdu["ra"] does no I/O — it just remembers the parent HDU and
the column name. The actual read happens when you slice or
iterate the subset:
col = hdu["ra"] # no I/O — returns a subset handle
first = col[0] # READ: scalar from row 0
batch = col[100:200] # READ: ndarray of 100 values
col[0] = 123.4 # WRITE: one cell
Two consequences worth knowing:
Subsets share the parent’s open file handle. If the parent FITS handle closes (its with block exits, or you call fits.close()), subsequent accesses on a subset object you kept around raise IOError because the file is no longer open. Keep the subset’s use inside the same with block as the parent.
Each access is a fresh read. Subsets don’t cache. Two calls to col[:] on the same subset re-read from disk each time, so if another writer mutated the file in between you’d see the new bytes on the second call. (Not a concern in single-process workflows; just a property of “view, not snapshot.”)
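Both consequences follow from the selector pattern itself, which can be shown with a toy model. `ToyTable` and `ToyColumnSelector` below are hypothetical stand-ins, not rustfits classes; the point is only that the selector stores a parent reference and a name, and every index operation goes back to the parent.

```python
class ToyTable:
    """Hypothetical stand-in for an open table; counts simulated disk reads."""

    def __init__(self, columns):
        self.columns = columns  # {name: list of values}
        self.reads = 0
        self.closed = False

    def read_column(self, name, rows):
        if self.closed:
            raise IOError("file is closed")
        self.reads += 1
        return self.columns[name][rows]


class ToyColumnSelector:
    """Remembers the parent and column name; does no I/O until indexed."""

    def __init__(self, parent, name):
        self.parent = parent
        self.name = name

    def __getitem__(self, rows):
        return self.parent.read_column(self.name, rows)


table = ToyTable({"ra": [10.0, 20.0, 30.0]})
col = ToyColumnSelector(table, "ra")
print(table.reads)             # 0: constructing the selector read nothing
first, everything = col[0], col[:]
print(table.reads)             # 2: one read per access, no caching
table.columns["ra"][0] = 99.0  # simulate another writer changing the data
print(col[0])                  # 99.0: the next access sees the new value
table.closed = True
try:
    col[0]
except IOError as exc:
    print("closed:", exc)
```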
The slicing surface (subset[i], subset[a:b],
subset[[i,j,k]]) is the shorthand; subset.read(rows=...)
and subset.write(data, rows=...) are the discoverable
named forms, and both take the same kwargs as
HDU.read() / HDU.write().
Writing into a table¶
The __getitem__ surface is mirrored by __setitem__: row
selections, fancy rows, whole-column writes, and the subset
objects all accept assignment with the same shape they read.
with rustfits.FITS("cat.fits", "r+") as fits:
    hdu = fits[1]
    # Single-row write (record or shape-(1,) structured array).
    hdu[0] = hdu[0]  # no-op
    # Slice write.
    hdu[100:200] = np.zeros(100, dtype=hdu.dtype)
    # Fancy-row write.
    hdu[[1, 3, 5]] = np.zeros(3, dtype=hdu.dtype)
    # Whole-column write.
    hdu["flag"] = np.zeros(len(hdu), dtype="i4")
    # Multi-column subset write.
    hdu[["ra", "dec"]] = np.zeros(len(hdu), dtype=[
        ("ra", "f8"), ("dec", "f8"),
    ])
    # Single-cell write via the subset, symmetric with
    # `hdu["ra"][5]` on read.
    hdu["ra"][5] = 123.4
    # Multiple rows of one column.
    hdu["ra"][[0, 1, 2]] = [10.0, 11.0, 12.0]
    # Column-subset row range.
    hdu[["ra", "dec"]][0:10] = np.zeros(10, dtype=[
        ("ra", "f8"), ("dec", "f8"),
    ])
Appending rows¶
append() (alias extend) grows the
table along its rows. It accepts the same three input forms as
write: structured ndarray, dict, or list+names.
new_rows = np.zeros(50, dtype=hdu.dtype)
hdu.append(new_rows)
If the table isn’t the last HDU on disk, later HDUs shift forward; offsets on any cached handles update transparently.
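How far later HDUs move can be estimated from the FITS layout, in which every data unit is padded to a multiple of 2880 bytes. The sketch below is plain arithmetic under that rule and ignores any heap area; the function names are hypothetical and nothing here calls rustfits.

```python
BLOCK = 2880  # FITS files are organized in 2880-byte blocks


def padded(nbytes: int) -> int:
    """Round a data-unit size up to a whole number of FITS blocks."""
    return -(-nbytes // BLOCK) * BLOCK


def shift_after_append(row_bytes: int, nrows: int, added: int) -> int:
    """Bytes by which following HDUs move when `added` rows are appended."""
    old = padded(row_bytes * nrows)
    new = padded(row_bytes * (nrows + added))
    return new - old


# 20-byte rows (f8 + f8 + i4) in a 1000-row table
print(shift_after_append(20, 1000, 50))  # 2880
print(shift_after_append(20, 1000, 5))   # 0
```

The second call returns 0 because five extra rows still fit inside the padding of the existing last block.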
If your table is compressed and you plan to build it piecewise with many
appends, it is much more efficient to use an
appending() context:
with rustfits.FITS(fname, 'r+') as fits:
    hdu = fits['data']
    with hdu.appending():
        for i in range(nchunks):
            new_rows = np.zeros(chunksize, dtype=hdu.dtype)
            hdu.append(new_rows)
The appending() context buffers writes to
avoid repeated decompress-and-recompress cycles when writing chunks smaller
than the compressed tile size.
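A rough cost model helps show why buffering matters. The sketch below counts tile compressions under two assumptions that are not taken from the rustfits source: an unbuffered append recompresses every tile it touches, and a buffered run compresses each tile exactly once. Both function names are hypothetical.

```python
def compress_calls_unbuffered(chunk_rows: int, nchunks: int, tile_rows: int) -> int:
    """Count tile compressions when every append is written immediately."""
    calls = 0
    total = 0
    for _ in range(nchunks):
        first_tile = total // tile_rows        # tile holding the first new row
        total += chunk_rows
        last_tile = (total - 1) // tile_rows   # tile holding the last new row
        calls += last_tile - first_tile + 1    # each touched tile is recompressed
    return calls


def compress_calls_buffered(chunk_rows: int, nchunks: int, tile_rows: int) -> int:
    """Count tile compressions when rows are held and each tile is written once."""
    total = chunk_rows * nchunks
    return -(-total // tile_rows)  # ceiling division


# 50 appends of 100 rows into 1000-row tiles
print(compress_calls_unbuffered(100, 50, 1000))  # 50
print(compress_calls_buffered(100, 50, 1000))    # 5
```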
Adding and removing columns¶
insert_column() and
delete_column() rewrite the table’s
schema in place. Insert can append, or position the new column
by index or relative to an existing column:
hdu.insert_column("mag", np.zeros(len(hdu), dtype="f4"))
hdu.insert_column("z", np.zeros(len(hdu), dtype="f4"),
                  after="mag")
hdu.insert_column("flag2", np.zeros(len(hdu), dtype="i4"),
                  position=0)  # at the start
hdu.delete_column("flag2")
hdu.delete_column(-1) # by index; negative wraps
Both work on VLA columns too; see below.
Variable-length columns¶
VLA (variable-length array) columns store a different-length
ndarray per row. Declare them at create time with the sidecar
var_dtypes={col: inner_dtype} (the numpy field stays as
Object dtype):
dtype = np.dtype([("id", "i4"), ("samples", "O")])
data = np.empty(3, dtype=dtype)
data["id"] = [10, 20, 30]
data["samples"][0] = np.array([1.0, 2.0, 3.0], dtype="f4")
data["samples"][1] = np.array([0.5], dtype="f4")
data["samples"][2] = np.array([], dtype="f4")
with rustfits.FITS("vla.fits", "w+") as fits:
    fits.write_table(data, var_dtypes={"samples": "f4"})
Reading returns Object-dtype cells (one ndarray per row):
tab = rustfits.read("vla.fits")
print(tab["samples"][0]) # array([1., 2., 3.], dtype=float32)
print(tab["samples"][2]) # empty array, dtype f4
String VLA columns work the same way with
var_dtypes={col: "S"} (or "U"); cells are read as
Python str (or bytes if you pass as_bytes=True).
VLA writes through __setitem__ follow the always-append-and-orphan
model: the new cell is appended to the heap and the old cell becomes
a heap orphan. Call repack() to reclaim the orphaned space:
hdu["samples"][0] = np.array([99.0], dtype="f4") # appends to heap
hdu.repack() # reclaim orphans
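The append-and-orphan model is easier to see with a toy heap. The class below is a hypothetical illustration, not rustfits internals: each row keeps a (count, offset) descriptor into a byte heap, in the spirit of FITS variable-length descriptors, and cells are big-endian float32.

```python
import struct


class ToyHeap:
    """Toy model of a VLA heap with one (count, offset) descriptor per row."""

    def __init__(self):
        self.heap = bytearray()
        self.descriptors = {}  # row -> (element count, byte offset)

    def write_cell(self, row, values):
        # Always append; an earlier cell for this row is left behind as an orphan.
        offset = len(self.heap)
        self.heap += struct.pack(f">{len(values)}f", *values)
        self.descriptors[row] = (len(values), offset)

    def read_cell(self, row):
        count, offset = self.descriptors[row]
        return list(struct.unpack_from(f">{count}f", self.heap, offset))

    def repack(self):
        # Rebuild the heap from live cells only, dropping orphaned bytes.
        live = [(row, self.read_cell(row)) for row in sorted(self.descriptors)]
        self.heap = bytearray()
        self.descriptors = {}
        for row, values in live:
            self.write_cell(row, values)


heap = ToyHeap()
heap.write_cell(0, [1.0, 2.0, 3.0])
heap.write_cell(1, [0.5])
heap.write_cell(0, [99.0])   # overwrite: the old 12 bytes become orphans
print(len(heap.heap))        # 20
heap.repack()
print(len(heap.heap))        # 8
print(heap.read_cell(0))     # [99.0]
```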
Repr and accessors¶
Lightweight metadata without reading any rows:
hdu.nrows # int
hdu.ncols
hdu.colnames # tuple of names, case preserved
hdu.dtype # numpy structured dtype
hdu.units # dict, informational
hdu.extname # EXTNAME or None
len(hdu) # == nrows
ASCII tables¶
FITS defines an older text-based table extension
(XTENSION='TABLE', distinct from the binary
XTENSION='BINTABLE' everything else on this page uses).
ASCII tables store row data as fixed-width text and are rare
in modern files; pipelines almost always pick binary tables
instead. rustfits supports them as a first-class HDU type:
import numpy as np
import rustfits
cat = np.zeros(100, dtype=[
    ("id", "i8"), ("flux", "f4"), ("name", "S8"),
])
cat["id"] = np.arange(100, dtype="i8")
cat["flux"] = np.random.uniform(size=100).astype("f4")
cat["name"] = [f"obj{i:04d}".encode() for i in range(100)]
with rustfits.FITS("cat.fits", "w+") as fits:
    fits.create_ascii_table_hdu(cat.dtype, nrows=len(cat))
    fits[1].write(cat)
After construction, the access surface matches binary tables
one-for-one: read() with rows= / columns=,
hdu[rows] / hdu["col"] / hdu[["a","b"]] and their
__setitem__ counterparts, the column-subset objects,
append() / extend(), insert_column /
delete_column, add_checksum / verify_checksum,
iter() / for row in hdu, and the
appending() / extending() context managers. Read the
sections above; just substitute
create_ascii_table_hdu() for
create_table_hdu() at construction.
Files written by rustfits round-trip bit-exactly through
astropy and fitsio.
What’s different from binary tables:
Narrower dtype mapping. Signed / unsigned ints map to I20 (unsigned via the TZERO=2**63 trick); f4 → E15.7, f8 → D25.17; S<w> / U<w> → A<w>. Other numpy dtypes (b1, i1, complex) are rejected. Per-column overrides via formats={"col": "F12.4"} on create or format="..." on insert_column(). Read back the current per-column TFORM strings via formats (mirrors the formats= create kwarg, so it round-trips through create_ascii_table_hdu()).
No variable-length (VLA / Object dtype) columns. The format has no heap, so VLAs aren’t representable.
No bit-packed (X) columns, no subarray cells, no complex (C / M) columns. The FITS standard’s ASCII-table letters are only A / I / F / E / D.
No tile compression. ASCII tables can’t be compressed; no FITS convention exists for it.
TBCOL packs flush. rustfits matches cfitsio’s convention of no inter-column space byte.
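To make the flush packing concrete, the sketch below derives TBCOL values (1-based starting columns) and the row width from a list of ASCII-table TFORM strings. It is a standalone illustration with hypothetical helper names, and it only parses the five letters listed above.

```python
import re

_WIDTH = re.compile(r"^[AIFED](\d+)(?:\.\d+)?$")


def field_width(tform: str) -> int:
    """Width in characters of an ASCII-table TFORM such as 'I20' or 'E15.7'."""
    m = _WIDTH.match(tform.strip())
    if m is None:
        raise ValueError(f"not an ASCII-table TFORM: {tform!r}")
    return int(m.group(1))


def flush_tbcols(tforms):
    """Return (TBCOL list, row width) for columns packed with no gap."""
    tbcols, col = [], 1  # TBCOL is 1-based
    for tform in tforms:
        tbcols.append(col)
        col += field_width(tform)
    return tbcols, col - 1


# id (i8), flux (f4), name (S8) under the default mapping described above
print(flush_tbcols(["I20", "E15.7", "A8"]))  # ([1, 21, 36], 43)
```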