IO format
SkyKatana persists a SkyMaskPipe object to a directory containing one JSON metadata file plus one FITS table per stage. The stage tables use a compact bit-packed boolean encoding designed for high-throughput streaming I/O, low peak RAM usage, and small files.
Example directory layout
<outdir>/
├─ metadata.json
├─ footmask.fits
├─ propmask.fits
├─ starmask.fits
└─ … (one <stage>.fits per discovered stage)
Stages are discovered dynamically from the in-memory object (attributes ending in mask that are valid HealSparse maps).
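The discovery rule can be illustrated with a minimal sketch. It is not SkyKatana's implementation: `FakeMap`, `FakePipe`, `discover_stages`, and the `nside_cov` attribute are hypothetical stand-ins, and the validity check is passed in as a callable so that the example does not depend on HealSparse.

```python
class FakeMap:
    """Stand-in for a HealSparse map; only here to keep the sketch self-contained."""


def discover_stages(obj, is_valid_map):
    """Return {name: map} for attributes whose name ends in 'mask' and pass the check."""
    return {
        name: value
        for name, value in vars(obj).items()
        if name.endswith("mask") and is_valid_map(value)
    }


class FakePipe:
    """Hypothetical pipeline object with two populated stages."""

    def __init__(self):
        self.footmask = FakeMap()
        self.propmask = FakeMap()
        self.starmask = None   # name matches, but it is not a valid map: skipped
        self.nside_cov = 32    # hypothetical scalar attribute, not a stage


stages = discover_stages(FakePipe(), lambda value: isinstance(value, FakeMap))
print(sorted(stages))
```

Only `footmask` and `propmask` are reported here, because `starmask` fails the validity check and `nside_cov` does not end in `mask`.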
metadata.json
The JSON captures:
- format: "skymaskpipe-bitpack-fits-stream"
- version: internal on-disk format version
- class: class name (SkyMaskPipe)
- stages: mapping from stage name → { "filename": "<stage>.fits" }
- scalars: selected scalar attributes saved from the pipeline
- params: JSON-safe copy of per-stage parameter dictionaries
This structure is created during write() and later restored by read().
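As a rough illustration, a metadata document with the keys listed above could be assembled and round-tripped as follows. The version number and the empty `scalars` and `params` entries are placeholders chosen for this sketch, not values taken from SkyKatana.

```python
import json

stage_names = ["footmask", "propmask", "starmask"]

metadata = {
    "format": "skymaskpipe-bitpack-fits-stream",
    "version": 1,            # placeholder; the real value is internal to the library
    "class": "SkyMaskPipe",
    "stages": {name: {"filename": f"{name}.fits"} for name in stage_names},
    "scalars": {},           # selected scalar attributes (left empty in this sketch)
    "params": {},            # JSON-safe per-stage parameter dictionaries (left empty)
}

# Serialize as write() might, then parse it back as read() might.
text = json.dumps(metadata, indent=2)
restored = json.loads(text)
print(restored["stages"]["footmask"]["filename"])
```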
Stage FITS format (bit-packed)
Each stage is a single-table FITS file with one row per coverage pixel that has at least one “child” sparse pixel set to True.
Table schema
- COVPIX (type K): coverage pixel index (NESTED)
- ENC (type B): row occupancy flag (always 1 for non-empty rows)
- PACKED (type PB()): a byte array holding the little-endian bit-packed bitmap of set child pixels within that coverage row
The table header stores geometry and encoding:
- NSIDE_COV, NSIDE_SPA: coverage and sparse resolutions
- DTYPE = 'bool'
- ENCOD = 'BITPACK'
- NFINE: (nside_sparse / nside_coverage)^2, the number of children per coverage pixel
- BITORD = 'L': little-endian bit order within each byte
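To make the header values concrete, the sketch below derives them from the two resolutions. The function name `stage_header` and the example resolutions are assumptions made for illustration; the header is modelled as a plain dictionary rather than an actual FITS header.

```python
def stage_header(nside_coverage, nside_sparse):
    """Build the header keys described above as a plain dict (illustrative only)."""
    if nside_sparse % nside_coverage != 0:
        raise ValueError("nside_sparse must be a multiple of nside_coverage")
    ratio = nside_sparse // nside_coverage
    return {
        "NSIDE_COV": nside_coverage,
        "NSIDE_SPA": nside_sparse,
        "DTYPE": "bool",
        "ENCOD": "BITPACK",
        "NFINE": ratio * ratio,   # children per coverage pixel
        "BITORD": "L",            # little-endian bit order within each byte
    }


header = stage_header(32, 4096)
print(header["NFINE"])  # 16384
```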
Why bit-packing?
Storing a dense boolean array per coverage pixel is wasteful when occupancy is sparse. Instead, SkyKatana writes only as many bytes as are needed to reach the highest set child offset, with one bit set per active child. Packing reduces I/O volume and memory pressure while preserving exact geometry.
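A small worked example shows the scale of the saving. It assumes a dense boolean array costs one byte per child (as a NumPy `bool` array would) and uses example resolutions of 32 and 4096; `packed_nbytes` is a helper introduced only for this sketch.

```python
def packed_nbytes(max_offset):
    """Bytes needed so that bit number max_offset exists (8 bits per byte)."""
    return max_offset // 8 + 1


nfine = (4096 // 32) ** 2                      # 16384 children per coverage pixel
dense_bytes = nfine                            # one byte per child, dense layout
full_packed_bytes = packed_nbytes(nfine - 1)   # row whose last child is set
short_row_bytes = packed_nbytes(100)           # row whose highest set child is 100

print(dense_bytes, full_packed_bytes, short_row_bytes)  # 16384 2048 13
```

Even a fully spanned row is eight times smaller than the dense layout, and rows whose set children cluster at low offsets shrink much further.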
Algorithms
Writing (write() → _write_stage_fits_bitpack)
- Discover stages in the object (e.g., footmask, propmask, …).
- For each stage, iterate valid sparse pixels grouped by coverage pixel.
- For each row:
  - Compute child offsets off = fine_pix - (covpix * NFINE) (sorted).
  - Pack offsets into bytes without allocating a full row bitmap.
- Stream in batches to FITS (rows_per_batch), appending COVPIX, ENC=1, and PACKED arrays; flush at batch boundaries.
- After all stages, write metadata.json and atomically move the temp directory into place.
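The per-row packing step can be sketched in pure Python as below. This is an illustration of the technique under the stated bit order (bit k of byte j represents child offset 8*j + k), not the body of `_write_stage_fits_bitpack`; `pack_offsets` and the sample pixel values are assumptions.

```python
def pack_offsets(offsets):
    """Pack child offsets into bytes, little-endian bit order within each byte."""
    if not offsets:
        return b""
    # Size the buffer by the highest offset only, not by NFINE.
    buf = bytearray(max(offsets) // 8 + 1)
    for off in offsets:
        buf[off >> 3] |= 1 << (off & 7)
    return bytes(buf)


nfine = 16384
covpix = 5
fine_pix = [covpix * nfine + 9, covpix * nfine + 0, covpix * nfine + 3]

# off = fine_pix - (covpix * NFINE), sorted
offsets = sorted(p - covpix * nfine for p in fine_pix)
packed = pack_offsets(offsets)
print(offsets, packed.hex())  # [0, 3, 9] 0902
```

Offsets 0 and 3 land in the first byte (0b00001001) and offset 9 in the second (0b00000010), so this row occupies two bytes instead of a 16384-entry bitmap.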
Reading (read() → _read_stage_fits_bitpack_fast)
- Load metadata.json, restore scalar attributes, enumerate stages, and choose a thread pool size.
- For each stage (in parallel workers):
  - Open FITS, validate encoding, and recreate an empty HealSparse map with bit_packed=True.
  - Block streaming: read rows in chunks (io_block_rows). Before expanding, estimate how many pixels the block will produce using byte-level popcount. A population count (popcount) returns the number of bits set to 1 in a binary word; a byte-level popcount applies this to each 8-bit byte.
  - Expand PACKED → child pixel IDs:
    - For row i, compute base = covpix[i] * NFINE.
    - Iterate bytes; for each set bit, append base + bit_index to the buffer.
    - After a buffer fill or end of block, update the HealSparse map.
- Attach each loaded map back onto the newly constructed SkyMaskPipe instance under its recorded stage name.
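The popcount estimate and the expansion step can be sketched as follows. This is a pure-Python illustration, not the optimized reader: the 256-entry lookup table, `count_pixels`, `expand_row`, and the sample rows are assumptions, and the HealSparse update is replaced by collecting pixel IDs in a list.

```python
# Byte-level popcount via a 256-entry lookup table.
POPCOUNT = [bin(b).count("1") for b in range(256)]


def count_pixels(packed_rows):
    """Number of set bits across all rows, i.e. pixels the block will produce."""
    return sum(POPCOUNT[byte] for row in packed_rows for byte in row)


def expand_row(covpix, packed, nfine):
    """Expand one PACKED byte string into absolute child pixel IDs."""
    base = covpix * nfine
    out = []
    for byte_index, byte in enumerate(packed):
        bit = 0
        while byte:
            if byte & 1:
                # bit_index within the row is byte_index * 8 + bit
                out.append(base + byte_index * 8 + bit)
            byte >>= 1
            bit += 1
    return out


nfine = 16384
block = [(5, bytes([0b00001001, 0b00000010])), (7, bytes([0b10000000]))]

expected = count_pixels(packed for _, packed in block)
pixels = [pix for covpix, packed in block for pix in expand_row(covpix, packed, nfine)]
print(expected, pixels)  # 4 [81920, 81923, 81929, 114695]
```

Knowing the count before expansion lets a reader size its output buffer once per block rather than growing it as pixels are appended.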
Notes
- The read path is idempotent w.r.t. geometry, i.e. reading the same on-disk directory multiple times produces exactly the same in-memory result, without side-effects that accumulate or alter data.
- The writer enforces nside_sparse % nside_coverage == 0 to ensure NFINE is an integer.
Performance characteristics
- Streaming on both write and read minimizes peak memory.
- Bit-packing reduces on-disk footprint and I/O.
- Parallel stage loading hides latency across multiple FITS files.
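Parallel stage loading might look like the sketch below, which uses the standard-library thread pool. `load_stage` is a hypothetical stand-in for the per-stage FITS reader, and the pool-size heuristic (capped at 8 workers) is an assumption rather than SkyKatana's actual choice.

```python
from concurrent.futures import ThreadPoolExecutor


def load_stage(name, filename):
    """Hypothetical stand-in for the per-stage FITS reader."""
    return f"map loaded from {filename}"


def load_all_stages(stages, max_workers=None):
    """Load every stage concurrently and return {stage name: loaded map}."""
    workers = max_workers or max(1, min(len(stages), 8))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            name: pool.submit(load_stage, name, info["filename"])
            for name, info in stages.items()
        }
        # result() re-raises any exception raised inside a worker.
        return {name: future.result() for name, future in futures.items()}


stages = {
    "footmask": {"filename": "footmask.fits"},
    "propmask": {"filename": "propmask.fits"},
}
loaded = load_all_stages(stages)
print(loaded["footmask"])
```

Threads suit this workload when most of the time is spent waiting on file reads; results are keyed by stage name, so the order in which workers finish does not matter.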
Forward/backward compatibility
A simple on-disk format version (_BITPACK_FITS_VERSION) is stored in metadata.json. This lets the layout evolve while readers can still validate the version or branch on it.
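A reader-side check along these lines is one way to validate or branch on the version. The set of supported versions and the function name `check_metadata` are hypothetical; only the format string comes from the metadata description above.

```python
EXPECTED_FORMAT = "skymaskpipe-bitpack-fits-stream"
SUPPORTED_VERSIONS = {1}   # hypothetical; the real version numbers are internal


def check_metadata(meta):
    """Validate format and version before any stage file is opened."""
    if meta.get("format") != EXPECTED_FORMAT:
        raise ValueError(f"unexpected format: {meta.get('format')!r}")
    version = meta.get("version")
    if version not in SUPPORTED_VERSIONS:
        raise ValueError(f"unsupported on-disk version: {version!r}")
    return version   # callers can branch on this to select a reader


version = check_metadata({"format": EXPECTED_FORMAT, "version": 1})
print(version)  # 1
```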