Special file drivers
Beyond a plain filesystem path, rustfits.FITS understands a
small set of driver prefixes that select where the bytes live. The
prefix is part of the filename string — the same convention cfitsio
and fitsio use — so existing muscle memory carries over.
Today the in-memory, gzip (read + write-back), and remote
(http / https / ftp / ftps) read drivers are
implemented.
| Filename | Backend | Use |
|---|---|---|
| plain path | on disk | the default; streaming reads, ~1 MiB peak RSS |
| mem:// (alias memkeep://) | in memory | build or parse a FITS file with no disk access |
| path ending in .gz | in memory (gunzipped) | read + write a gzipped FITS file |
| http:// or https:// URL | in memory (downloaded) | read a FITS file from a URL (read-only) |
| ftp:// or ftps:// URL | in memory (downloaded) | read a FITS file from an FTP server (read-only) |
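The prefix rules in the table can be pictured with a small sketch. `classify_driver` is a hypothetical helper written for this page, not part of rustfits, and the real dispatch may differ in detail; it only mirrors the table (scheme prefix first, then a case-insensitive `.gz` suffix) using the standard library.

```python
from urllib.parse import urlsplit

# Hypothetical helper for illustration only; rustfits does this internally
# and its actual implementation may differ.
REMOTE_SCHEMES = {"http", "https", "ftp", "ftps"}
MEMORY_PREFIXES = ("mem://", "memkeep://")


def classify_driver(filename: str) -> str:
    """Return the backend label the table associates with a filename string."""
    if filename.startswith(MEMORY_PREFIXES):
        return "in memory"
    scheme = urlsplit(filename).scheme.lower()
    if scheme in REMOTE_SCHEMES:
        return "in memory (downloaded)"
    if filename.lower().endswith(".gz"):
        return "in memory (gunzipped)"
    return "on disk"


for name in ("image.fits", "mem://", "image.fits.gz",
             "https://example.org/data/image.fits"):
    print(f"{name!r:45} -> {classify_driver(name)}")
```

Checking the scheme before the suffix means a remote URL ending in `.gz` is still classified as a download; the gunzip step then happens after the transfer.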
In-memory files
mem:// (and its alias memkeep://) opens an empty FITS file
backed by an in-memory buffer instead of a disk file. You build HDUs
into it exactly as you would on disk, then extract the finished file
with to_bytes():
import numpy as np
import rustfits
data = np.arange(12, dtype="i4").reshape(3, 4)
with rustfits.FITS("mem://", "w+") as fits:
    fits.write_image(data)
    blob = fits.to_bytes()  # -> Python bytes
# `blob` is a complete FITS file: send it over a socket, store it
# in a database, hand it to astropy, or write it to disk.
The two spellings mem:// and memkeep:// are aliases — they
do the same thing. cfitsio distinguishes them (free-the-buffer vs
keep-it on close); that distinction doesn’t apply here, because the
buffer is owned by the FITS object and
to_bytes() copies it out regardless. Both names
are accepted so a cfitsio/fitsio user’s existing code keeps working.
Parsing bytes you already have
The reverse direction — you hold FITS bytes (a database blob, an HTTP
response body, astropy’s serialization) and want to read them
without touching disk — is rustfits.FITS.from_bytes():
with rustfits.FITS.from_bytes(blob) as fits:
    image = fits[0].read()
from_bytes copies the input into a private buffer, so the
returned FITS is completely independent of the
original object — mutating one never affects the other.
Modes and read-only files
To create in memory, use FITS("mem://", "w+") — the buffer
starts empty. To read existing bytes,
from_bytes() takes mode="r" (default) or
"r+" for in-place edits of the private copy; mode="w+" is
rejected, since it would discard the bytes you just passed.
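The copy-on-input and mode rules above can be pictured with a toy stand-in. `MemoryFile` is a hypothetical class written for this sketch, not rustfits's real type, and the `ValueError` is an assumption: the text only says that `"w+"` is rejected, not which exception is raised.

```python
class MemoryFile:
    """Hypothetical stand-in for an in-memory FITS buffer (not rustfits's type)."""

    def __init__(self, data: bytes = b"", mode: str = "w+"):
        self.mode = mode
        self._buf = bytearray(data)  # bytearray(...) always copies its input

    @classmethod
    def from_bytes(cls, blob, mode: str = "r") -> "MemoryFile":
        if mode not in ("r", "r+"):
            # "w+" would throw away the bytes just passed in. The exception
            # type is an assumption; the source only says "rejected".
            raise ValueError(f"from_bytes does not accept mode {mode!r}")
        return cls(bytes(blob), mode)

    def to_bytes(self) -> bytes:
        return bytes(self._buf)  # copy out, leaving the buffer untouched


source = bytearray(b"SIMPLE")
mem = MemoryFile.from_bytes(source)
source[0:1] = b"X"      # mutate the caller's object afterwards
print(mem.to_bytes())   # the private copy is unaffected by that mutation
```

Copying on the way in and on the way out is what makes the two objects independent, at the cost of holding a second copy of the data while both are alive.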
One caveat: an in-memory buffer has no operating-system permission
layer, so read-only mode is advisory for in-memory files. Writing
to a buffer opened "r" is not rejected the way a disk file would
be. This is harmless — the writes only touch the private in-memory
copy, never any external bytes — but worth knowing if you rely on
"r" to prevent accidental mutation.
to_bytes on disk files
to_bytes() also works on an ordinary disk-backed
file: it flushes pending writes and returns the whole file as
bytes. Note this loads the entire file into memory, unlike the
streaming read paths — fine for modest files, but not how you’d read a
multi-gigabyte image. Call it before close(),
which drops the buffer.
Round-trips are byte-exact
A file built in memory is byte-for-byte identical to the same file
written to disk; the only difference is the storage backend. So
in-memory files interoperate cleanly with astropy, fitsio, and any
other FITS reader — the bytes from to_bytes() are
a valid FITS file by construction:
with rustfits.FITS("mem://", "w+") as fits:
    fits.write_image(data)
    blob = fits.to_bytes()
# Writing `blob` to disk yields the same file as
# FITS("out.fits", "w+") + write_image(data) would have.
with open("out.fits", "wb") as fh:
    fh.write(blob)
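A quick structural sanity check on such a blob needs only the standard library. The sketch below relies on two facts from the FITS standard: files are made of 2880-byte blocks, and the first 80-character header card is `SIMPLE = T`. `looks_like_fits` and `minimal_header` are hypothetical names; the hand-built header stands in for bytes that `to_bytes()` would return, so the example runs without rustfits. It is not a full validator.

```python
BLOCK = 2880   # FITS files are a sequence of 2880-byte blocks
CARD = 80      # each header record ("card") is 80 ASCII characters


def looks_like_fits(blob: bytes) -> bool:
    """Cheap structural check: non-empty, block-aligned, starts with SIMPLE = T."""
    if len(blob) == 0 or len(blob) % BLOCK != 0:
        return False
    first = blob[:CARD].decode("ascii", errors="replace")
    return first.startswith("SIMPLE  =") and first[10:30].strip() == "T"


def minimal_header() -> bytes:
    """Hand-built primary header with no data, padded with spaces to one block."""
    cards = [
        "SIMPLE  = " + "T".rjust(20),
        "BITPIX  = " + "8".rjust(20),
        "NAXIS   = " + "0".rjust(20),
        "END",
    ]
    header = "".join(card.ljust(CARD) for card in cards)
    return header.ljust(BLOCK).encode("ascii")


blob = minimal_header()
print(len(blob), looks_like_fits(blob))
```

A check like this is useful in tests that pass FITS bytes between libraries: it catches truncated or non-FITS payloads before a full parse is attempted.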
When to use it
- Serialize without a temp file: produce FITS bytes to send over a network, store in a database, or pass to another library.
- Parse bytes you already hold: from_bytes reads a blob directly instead of spilling it to a temp file first.
- Tests: build fixtures in memory without touching the filesystem.
The trade-off is memory: the whole file lives in RAM, which gives up rustfits’s usual streaming property (peak RSS ~1 MiB above the output array on disk reads). That’s inherent to in-memory files; for large files, work from a path on disk.
Gzipped files
Opening a path ending in .gz supports reading and writing a gzipped FITS
file. When reading, rustfits gunzips the whole file into an in-memory buffer
and then parses it exactly like any in-memory file.
with rustfits.FITS("image.fits.gz") as fits:  # read-only
    image = fits[0].read()
Writing is supported too: open a .gz with "w+" (truncate /
create) or "r+" (edit in place). rustfits builds the file in the
in-memory buffer and, when you close() it,
recompresses the buffer and writes the gzip stream back to the
.gz path.
with rustfits.FITS("image.fits.gz", "w+") as fits:
    fits.write_image(image)  # recompressed + saved on close
with rustfits.FITS("image.fits.gz", "r+") as fits:
    fits[0].header["HISTORY"] = "edited in place"
The write-back is atomic: rustfits compresses to a temporary file
in the same directory and renames it over the target, so an
interrupted write (out of disk space, I/O error) leaves the original
.gz intact rather than half-written.
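The temp-file-then-rename pattern described above can be sketched with the standard library. This illustrates the general technique rather than rustfits's actual code; `atomic_write_gzip` is a hypothetical name, and the atomicity of the final rename is a property of `os.replace` on POSIX filesystems.

```python
import gzip
import os
import tempfile


def atomic_write_gzip(path: str, payload: bytes) -> None:
    """Compress payload and replace path in one step, or leave path untouched."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives next to the target so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(gzip.compress(payload))
            tmp.flush()
            os.fsync(tmp.fileno())   # make the bytes durable before the rename
        os.replace(tmp_path, path)   # rename over the target
    except BaseException:
        # Any failure leaves the original file as it was; remove the partial temp.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

Because the target is only ever touched by the final rename, a reader sees either the old complete file or the new complete file, never a partially written one.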
A few details:
- The new bytes reach disk at close (or sync(), see below). As a safety net, if you forget to close a written .gz, a finalizer flushes it when the object is garbage-collected; still, prefer the context manager so errors surface and timing is deterministic.
- sync() forces the current buffer to disk durably mid-session (recompress + atomic write + fsync), so you don't have to close to checkpoint.
- Opening r+ / w+ behaves like a plain-disk open: w+ creates and claims the file immediately, and a permission/path error is raised at FITS(...) time, not deferred to close.
- A .gz opened r+ is rewritten on close only if you actually mutated it; opening to read leaves the on-disk file (bytes, mtime) untouched.
- Because a gzip stream is not seekable and FITS needs random access, the decompressed file is held in RAM, the same caveat as mem://. Fine for typical files; for very large data prefer an uncompressed path on disk. Per-HDU (tile) compression is almost always the better choice than a whole-file .gz; see Known limitations.
- Multi-member gzip streams are decoded in full (not truncated to the first member).
- to_bytes() on a .gz-opened file returns the decompressed bytes (the in-memory representation), not the gzip stream.
- Detection is by the .gz extension (case-insensitive). cfitsio's .Z (LZW) and .zip whole-file formats are not supported; only gzip is.
- The top-level rustfits.read() / rustfits.read_header() handle .gz paths too, since they open via FITS.
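Two of the details above, case-insensitive `.gz` detection and full decoding of multi-member streams, can be demonstrated with Python's own `gzip` module. This is only an analogy for the behaviour described: `is_gzip_path` is a hypothetical helper and rustfits's decoder is a separate implementation.

```python
import gzip


def is_gzip_path(path: str) -> bool:
    """Case-insensitive .gz check, matching the detection rule described above."""
    return path.lower().endswith(".gz")


# Two gzip members concatenated into one stream, as `cat a.gz b.gz` would produce.
stream = gzip.compress(b"first member|") + gzip.compress(b"second member")

# gzip.decompress reads every member in the stream, not just the first one.
decoded = gzip.decompress(stream)
print(decoded)
print(is_gzip_path("IMAGE.FITS.GZ"), is_gzip_path("image.fits.Z"))
```

A decoder that stopped after the first member would silently return a truncated file, which is why the full-stream behaviour is worth stating explicitly.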
Remote files
A http://, https://, ftp://, or ftps:// URL is fetched
whole and parsed in memory — download-then-open:
url = "https://example.org/data/image.fits"
with rustfits.FITS(url) as fits:  # read-only
    image = fits[0].read()
# or the one-liner:
image = rustfits.read(url)
# FTP works the same way (anonymous login by default):
image = rustfits.read("ftp://archive.example.org/pub/vela.fits")
Details:
- Read-only. "r+" and "w+" raise before any network request (there is no write-back to a URL).
- Whole file in RAM. The entire file is downloaded into memory and parsed there, so this pays the full transfer even for a one-tile read, and peak RSS is the file size (same caveat as mem://). Range-based partial reads, which pull only the bytes a slice needs, are a planned follow-up.
- A URL whose path ends in .gz is gunzipped after download, just like a local .gz path.
- The GIL is released during the transfer, so other Python threads keep running while a download is in flight.
- HTTP schemes: http and https (TLS handled by rustls).
- FTP schemes: ftp and ftps (explicit AUTH TLS). Login is anonymous unless the URL carries credentials (ftp://user:pass@host/path); the port defaults to 21. Transfers are forced to binary mode so FITS bytes aren't mangled.
- cfitsio's root:// / gsiftp:// are not supported; see Deferred drivers below.
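The URL rules listed above (mode check before any network request, anonymous FTP login by default, port 21, gunzip on a `.gz` path) can be sketched with `urllib.parse`. `plan_remote_open` is a hypothetical helper, not a rustfits function. The `ValueError`, the HTTP default ports, and the empty anonymous password are assumptions of this sketch, not documented behaviour.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical helper for illustration; names and exception type are assumptions.
# FTP/FTPS port 21 comes from the text; 80/443 are the usual HTTP defaults.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "ftps": 21}


def plan_remote_open(url: str, mode: str = "r") -> dict:
    """Validate mode and extract connection details without touching the network."""
    if mode != "r":
        raise ValueError(f"remote files are read-only, got mode {mode!r}")
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError(f"unsupported scheme {scheme!r}")
    plan = {
        "scheme": scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS[scheme],
        "path": parts.path,
        "gunzip": parts.path.lower().endswith(".gz"),
    }
    if scheme in ("ftp", "ftps"):
        # Anonymous login unless the URL carries credentials.
        # The empty anonymous password is an assumption of this sketch.
        plan["user"] = unquote(parts.username) if parts.username else "anonymous"
        plan["password"] = unquote(parts.password) if parts.password else ""
    return plan


print(plan_remote_open("ftp://archive.example.org/pub/vela.fits"))
```

Doing the mode check first mirrors the documented behaviour that `"r+"` and `"w+"` fail before any bytes are requested.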
Deferred drivers
cfitsio supports a few more storage backends that rustfits has not implemented yet. They aren’t hard to add on top of the existing backend abstraction — they’re deferred for lack of a concrete user need, not for technical reasons. If you have a use case for any of these, please open an issue — a real request is exactly what moves one off this list.
| Driver | Status / workaround |
|---|---|
| | Deferred. In Python the byte API already covers pipelines: |
| | Deferred. To share a file between processes via RAM today, pair |
| root:// | Deferred. Needs the XRootD client library. Download the file with an XRootD tool first, then open the local copy. |
| gsiftp:// | Deferred. Needs a GridFTP client. Fetch with a grid tool first, then open the local copy. |
Everything above plugs into the same internal backend abstraction the in-memory, gzip, and remote drivers already use, so adding one is mostly a matter of wiring up the byte source.