anu_ctlab_io
Python I/O for the ANU CTLab array storage format(s).
Package Contents
- class anu_ctlab_io.Dataset(data, *, dimension_names, voxel_unit, voxel_size, datatype=None, history=None, dataset_id=None)
Bases: AbstractDataset
A Dataset, containing the data and metadata read from one of the ANU CTLab file formats.
Datasets are the primary interface to the anu_ctlab_io package, and should generally be constructed by users via the Dataset.from_path classmethod. Note that the relevant extra (netcdf or zarr) must be installed.
The initializer of this class should only be used when manually constructing a Dataset, which is not the primary usage of this library.
- Parameters:
data (dask.array.Array)
dimension_names (tuple[str, Ellipsis])
voxel_unit (anu_ctlab_io._voxel_properties.VoxelUnit)
voxel_size (tuple[numpy.float32, numpy.float32, numpy.float32])
datatype (anu_ctlab_io._datatype.DataType | None)
history (anu_ctlab_io._parse_history.History | None)
dataset_id (str | None)
- classmethod from_path(path, *, filetype='auto', parse_history=True, **kwargs)
Creates a Dataset from the data at the given path.
The data at path must be in one of the ANU mass data storage formats, and the optional extras required for the specific file format must be installed.
- Parameters:
path (pathlib.Path | str) – The path to read data from.
filetype (str)
parse_history (bool)
kwargs (Any)
- Return type:
Dataset
- to_path(path, *, filetype='auto', dataset_id='auto', **kwargs)
Writes the Dataset to the given path.
The data will be written in one of the ANU mass data storage formats, and the optional extras required for the specific file format must be installed.
- Parameters:
path (pathlib.Path | str) – The path to write data to.
filetype (str) – The format to write (“NetCDF”, “zarr”, or “auto”). If “auto”, format is inferred from path extension. When inferring, NetCDF is assumed for paths ending in .nc or _nc, and Zarr for paths ending in .zarr. If datatype is present in filename (e.g., “tomo_output”), NetCDF is assumed.
dataset_id (str | None) – Dataset identifier to write to file metadata. Options:
  - “auto” (default): Use self.dataset_id if available, otherwise generate new
  - str: Use this exact value
  - None: Generate new (legacy behavior)
kwargs (Any) – Additional keyword arguments passed to the format-specific writer.
- Return type:
None
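The inference rules for filetype="auto" can be illustrated with a short sketch. The function infer_filetype below is hypothetical and is not the library's implementation; the order in which the rules are checked, the substring match on datatype names, and the ValueError for unrecognised paths are all assumptions.

```python
from pathlib import Path

# Values of the DataType enum documented on this page.
DATATYPE_NAMES = (
    "proju16", "projf32", "tomo_float", "tomo", "float16",
    "float64", "segmented", "distance_map", "labels", "rgba8",
)


def infer_filetype(path):
    """Guess the storage format from a path (hypothetical helper)."""
    name = Path(path).name  # final path component, trailing slash ignored
    if name.endswith((".nc", "_nc")):
        return "NetCDF"
    if name.endswith(".zarr"):
        return "zarr"
    # Assumption: a datatype name anywhere in the filename implies NetCDF.
    if any(datatype in name for datatype in DATATYPE_NAMES):
        return "NetCDF"
    raise ValueError(f"cannot infer a file type from {name!r}")
```

Passing an explicit filetype to to_path avoids relying on inference when a path does not follow these naming patterns.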
- property voxel_size: tuple[numpy.float32, numpy.float32, numpy.float32]
The voxel size of the data in the dataset’s native unit.
- Return type:
tuple[numpy.float32, numpy.float32, numpy.float32]
- voxel_size_with_unit(voxel_unit)
Get the voxel size of the data converted to a target unit.
- Parameters:
voxel_unit (anu_ctlab_io._voxel_properties.VoxelUnit) – The unit to convert the voxel size to.
- Returns:
The voxel size as a tuple of three float32 values.
- Raises:
ValueError – If unit conversion is requested but the source or target unit is VOXEL.
- Return type:
tuple[numpy.float32, numpy.float32, numpy.float32]
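As a rough illustration of the conversion described above, the sketch below scales a voxel size between physical units and rejects the dimensionless voxel unit. It uses plain strings and Python floats instead of VoxelUnit and numpy.float32, and the name convert_voxel_size is hypothetical. Returning the input unchanged when source and target match is an assumption drawn from the wording "if unit conversion is requested".

```python
# Length of one unit expressed in metres.
METRES_PER_UNIT = {
    "m": 1.0,
    "cm": 1e-2,
    "mm": 1e-3,
    "um": 1e-6,
    "nm": 1e-9,
    "angstrom": 1e-10,
}


def convert_voxel_size(size, source_unit, target_unit):
    """Convert a (z, y, x) voxel size between units (hypothetical helper)."""
    if source_unit == target_unit:
        return tuple(size)  # nothing to convert
    if "voxel" in (source_unit, target_unit):
        raise ValueError("cannot convert to or from the 'voxel' unit")
    scale = METRES_PER_UNIT[source_unit] / METRES_PER_UNIT[target_unit]
    return tuple(value * scale for value in size)
```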
- property voxel_unit: anu_ctlab_io._voxel_properties.VoxelUnit
The unit the data’s voxel size is in.
- Return type:
anu_ctlab_io._voxel_properties.VoxelUnit
- property dimension_names: tuple[str, Ellipsis]
The names of the data’s dimensions. Usually ("z", "y", "x").
- Return type:
tuple[str, Ellipsis]
- property history: anu_ctlab_io._parse_history.History
The history metadata associated with the Dataset.
- Return type:
anu_ctlab_io._parse_history.History
- property mask_value: anu_ctlab_io._datatype.StorageDType | None
The mask value being used by the data.
- Return type:
anu_ctlab_io._datatype.StorageDType | None
- property data: dask.array.Array
The data contained within the Dataset.
This is a Dask Array.
- Return type:
dask.array.Array
- property dataset_id: str | None
The dataset identifier from the source file, if available.
- Return type:
str | None
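The three dataset_id options accepted by to_path (“auto”, an explicit string, or None) interact with this property. The sketch below shows one way that resolution could work. resolve_dataset_id is a hypothetical name, and the identifier generator is a placeholder: the format of identifiers the library generates is not documented here.

```python
import uuid


def resolve_dataset_id(requested, current_id, generate=lambda: uuid.uuid4().hex):
    """Pick the identifier to write (hypothetical helper).

    requested  -- the dataset_id argument: "auto", an explicit string, or None
    current_id -- the dataset's existing dataset_id, or None
    generate   -- placeholder generator; the real ID format is an assumption
    """
    if requested is None:
        return generate()  # legacy behaviour: always generate a new ID
    if requested == "auto":
        return current_id if current_id is not None else generate()
    return requested  # explicit string: use exactly this value
```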
- property mask: dask.array.Array
The masked areas of the Dataset, as a boolean array.
This has the same dimensions as the data, and will be all-zero if no mask value exists.
- Return type:
dask.array.Array
- property masked_data: dask.array.Array
The data contained within the Dataset, as a masked array.
This performs better than manually creating a masked_array from mask when the loaded datatype has no mask (i.e., OME-Zarr data), because it creates a masked array with nomask in these situations.
- Return type:
dask.array.Array
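The difference between mask and masked_data can be sketched with NumPy masked arrays. This is an illustration only: the library returns Dask arrays, the helper names build_mask and build_masked are hypothetical, and the mask value 65535 is chosen for the example rather than taken from any particular DataType.

```python
import numpy as np


def build_mask(data, mask_value):
    """Boolean array marking masked voxels; all False if there is no mask value."""
    if mask_value is None:
        return np.zeros(data.shape, dtype=bool)
    return data == mask_value


def build_masked(data, mask_value):
    """Masked array that skips allocating a mask when no mask value exists."""
    if mask_value is None:
        # nomask is a shared sentinel, so no per-voxel boolean array is created.
        return np.ma.masked_array(data, mask=np.ma.nomask)
    return np.ma.masked_array(data, mask=(data == mask_value))


data = np.array([[1, 65535], [3, 4]], dtype=np.uint16)
masked = build_masked(data, 65535)   # one voxel hidden
unmasked = build_masked(data, None)  # no mask array allocated
```

Reductions such as masked.sum() ignore the masked voxel, while unmasked behaves like the plain array.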
- add_to_history(key, value)
Add an entry to the dataset’s history metadata.
This method mutates the dataset in-place by adding a new history entry. The history will be automatically serialized when writing to NetCDF or Zarr formats.
- Parameters:
key (str) – The history key/identifier. Convention is to use timestamps (e.g., “20260128_150530_crop”) but any string is valid.
value (anu_ctlab_io._parse_history.HistoryValue) – The history entry value. Can be a dict with operation details (recommended) or a string. Dicts will be serialized to structured format.
- Return type:
None
Example:
    from anu_ctlab_io import Dataset

    ds = Dataset.from_path("data.nc")
    ds.add_to_history("20260128_crop", {
        "operation": "crop",
        "z_range": [10, 50],
        "reason": "Focus on region of interest"
    })
    ds.to_path("cropped.nc")  # History is preserved
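The timestamp convention for history keys (for example “20260128_150530_crop”) is straightforward to produce with the standard library. The helper make_history_key below is a hypothetical convenience and is not part of the package.

```python
from datetime import datetime


def make_history_key(operation, when=None):
    """Build a key like "20260128_150530_crop" (hypothetical helper)."""
    when = when or datetime.now()
    # YYYYMMDD_HHMMSS followed by a short operation label.
    return f"{when:%Y%m%d_%H%M%S}_{operation}"
```

Because the timestamp comes first, keys built this way sort chronologically as plain strings.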
- update_history(entries)
Update the dataset’s history with multiple entries at once.
This method mutates the dataset in-place by adding multiple history entries. Equivalent to calling add_to_history multiple times.
- Parameters:
entries (dict[str, anu_ctlab_io._parse_history.HistoryValue]) – Dictionary of history entries to add. Keys are history identifiers, values are the entry data (dicts or strings).
- Return type:
None
Example:
    ds.update_history({
        "20260128_150530_crop": {"operation": "crop", "z_range": [10, 50]},
        "20260128_150545_filter": {"operation": "gaussian_filter", "sigma": 2.0}
    })
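The stated equivalence between update_history and repeated add_to_history calls can be pictured with a minimal stand-in. HistoryLog is a hypothetical class backed by a plain dict, not the library's History type; how the real implementation treats duplicate keys is not documented here, and overwriting is an assumption of this sketch.

```python
class HistoryLog:
    """Hypothetical stand-in for a dataset's history metadata."""

    def __init__(self):
        self.entries = {}

    def add_to_history(self, key, value):
        # Assumption: a repeated key overwrites the earlier entry.
        self.entries[key] = value

    def update_history(self, entries):
        # Batch form: delegate to add_to_history once per entry.
        for key, value in entries.items():
            self.add_to_history(key, value)


crop = {"operation": "crop", "z_range": [10, 50]}
blur = {"operation": "gaussian_filter", "sigma": 2.0}

one_by_one = HistoryLog()
one_by_one.add_to_history("20260128_150530_crop", crop)
one_by_one.add_to_history("20260128_150545_filter", blur)

batched = HistoryLog()
batched.update_history({
    "20260128_150530_crop": crop,
    "20260128_150545_filter": blur,
})
```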
- classmethod from_modified(source, *, data=None, voxel_size=None, voxel_unit=None, dimension_names=None, datatype=None, history_entry=None, history_key=None, dataset_id_suffix=None)
Create a new Dataset from a modified version of an existing one.
This factory method creates a new Dataset instance with selected attributes modified, while preserving the rest from the source. Optionally adds a history entry documenting the modification. This follows an immutable pattern where the source dataset is not modified.
- Parameters:
source (Dataset) – The source Dataset to create a modified copy from.
data (dask.array.Array | None) – New data array. If None, uses source’s data.
voxel_size (tuple[numpy.float32, numpy.float32, numpy.float32] | None) – New voxel size. If None, uses source’s voxel_size.
voxel_unit (anu_ctlab_io._voxel_properties.VoxelUnit | None) – New voxel unit. If None, uses source’s voxel_unit.
dimension_names (tuple[str, Ellipsis] | None) – New dimension names. If None, uses source’s dimension_names.
datatype (anu_ctlab_io._datatype.DataType | None) – New datatype. If None, uses source’s datatype.
history_entry (anu_ctlab_io._parse_history.HistoryValue | None) – History entry to add documenting the modification. If provided, a new history entry is added with the given key.
history_key (str | None) – Key for the history entry. If None and history_entry is provided, auto-generates a timestamp-based key like “20260128_150530_modification”.
dataset_id_suffix (str | None) – Suffix to append to the dataset_id. If provided, the new dataset’s dataset_id will be “{source.dataset_id}_{suffix}”. If source has no dataset_id, this parameter is ignored.
- Returns:
New Dataset instance with modifications applied.
- Return type:
Dataset
Example:
    ds = Dataset.from_path("data.nc")

    # Create cropped version with automatic history and modified dataset_id
    cropped = Dataset.from_modified(
        ds,
        data=ds.data[10:50, :, :],
        history_entry={"operation": "crop", "z_range": [10, 50]},
        history_key="20260128_crop",
        dataset_id_suffix="cropped"
    )
    # Result: dataset_id becomes "20250314_012913_tomoLoRes_SS_cropped"

    # Chain modifications
    scaled = Dataset.from_modified(
        cropped,
        voxel_size=(0.1, 0.1, 0.1),
        history_entry={"operation": "rescale", "new_voxel_size": [0.1, 0.1, 0.1]},
        dataset_id_suffix="scaled"
    )
    # Result: dataset_id becomes "20250314_012913_tomoLoRes_SS_cropped_scaled"
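The immutable pattern and the dataset_id_suffix rule can be sketched without the library. Record is a hypothetical frozen dataclass standing in for Dataset, and modified mimics only two of from_modified's parameters; neither reflects the actual implementation.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical stand-in for Dataset with two of its attributes."""
    dataset_id: Optional[str]
    voxel_size: Tuple[float, float, float]


def modified(source, *, voxel_size=None, dataset_id_suffix=None):
    """Return a changed copy; the source record is left untouched."""
    new_id = source.dataset_id
    # The suffix is ignored when the source has no dataset_id.
    if new_id is not None and dataset_id_suffix is not None:
        new_id = f"{new_id}_{dataset_id_suffix}"
    new_size = source.voxel_size if voxel_size is None else voxel_size
    return replace(source, dataset_id=new_id, voxel_size=new_size)


original = Record("20250314_012913_tomoLoRes_SS", (1.0, 1.0, 1.0))
cropped = modified(original, dataset_id_suffix="cropped")
scaled = modified(cropped, voxel_size=(0.1, 0.1, 0.1), dataset_id_suffix="scaled")
```

Chaining accumulates suffixes in order, which keeps the provenance of each derived dataset visible in its identifier.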
- class anu_ctlab_io.DataType(*args, **kwds)
Bases: enum.Enum
An Enum representing the datatypes produced by MANGO.
This is used when parsing metadata to construct a Dataset, and generally should not need to be constructed by a user (use the Dataset.from_path classmethod instead).
When needed, DataTypes should be constructed via either the infer_from_path or the from_basename classmethods.
- PROJU16 = 'proju16'
- PROJF32 = 'projf32'
- TOMO_FLOAT = 'tomo_float'
- TOMO = 'tomo'
- FLOAT16 = 'float16'
- FLOAT64 = 'float64'
- SEGMENTED = 'segmented'
- DISTANCE_MAP = 'distance_map'
- LABELS = 'labels'
- RGBA8 = 'rgba8'
- property dtype: numpy.typing.DTypeLike
The numpy dtype appropriate for storing data of the DataType.
Because of a historical decision in MANGO, the datatype listed in ANU CTLab NetCDFs is not guaranteed to have the correct signed/unsigned type: for some MANGO datatypes, data recorded in the NetCDF as a signed integer type is really an unsigned integer stored in a signed integer of the same width. The dtype is the real datatype of the data, regardless of whether a loaded NetCDF exhibits this behaviour (trust this value, not the NetCDF header).
- Return type:
numpy.typing.DTypeLike
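The signedness issue can be demonstrated with NumPy alone. The sketch below reinterprets values recorded as a signed 16-bit integer as the unsigned values they represent; the 16-bit width and the sample values are chosen for illustration and do not describe any specific MANGO datatype.

```python
import numpy as np

# Values as they would appear if read with the signed type from the header.
stored = np.array([-1, 0, 100, -32768], dtype=np.int16)

# Reinterpret the same bytes as unsigned; no values are copied or rescaled.
actual = stored.view(np.uint16)
```

A view is used rather than astype so that the bit pattern is preserved exactly and no extra memory is allocated.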
- property mask_value: StorageDType | None
The mask value of the DataType.
This value is corrected for signedness if required (see dtype).
- Return type:
StorageDType | None
- type anu_ctlab_io.StorageDType = np.uint8 | np.uint16 | np.uint32 | np.uint64 | np.float16 | np.float32 | np.float64
- class anu_ctlab_io.VoxelUnit(*args, **kwds)
Bases: enum.Enum
The unit of size of a voxel.
- M = 'm'
- CM = 'cm'
- MM = 'mm'
- UM = 'um'
- NM = 'nm'
- ANGSTROM = 'angstrom'
- VOXEL = 'voxel'
- classmethod from_str(string)
Create a VoxelUnit from the string name of the unit.
Accepts a wide range of standard representations of each unit, and is case insensitive.
- Parameters:
string (str)
- Return type:
VoxelUnit
- to_full_name()
Return the full unit name for OME-Zarr metadata.
OME-Zarr specification requires full unit names (e.g., “millimeter”) rather than abbreviated forms (e.g., “mm”).
- Returns:
Full unit name as string.
- Return type:
str
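Case-insensitive parsing and full-name lookup can be sketched with a small enum. Unit is a hypothetical stand-in for VoxelUnit, the alias table is illustrative rather than the set of spellings the library actually accepts, and the full names follow the unit names listed in the OME-Zarr (OME-NGFF) specification. How the library handles VOXEL in to_full_name is not documented here, so this sketch covers physical units only.

```python
from enum import Enum


class Unit(Enum):
    """Hypothetical stand-in mirroring the VoxelUnit members."""
    M = "m"
    CM = "cm"
    MM = "mm"
    UM = "um"
    NM = "nm"
    ANGSTROM = "angstrom"
    VOXEL = "voxel"


# Illustrative aliases only; the spellings the library accepts may differ.
_ALIASES = {
    Unit.M: ("m", "meter", "metre"),
    Unit.CM: ("cm", "centimeter", "centimetre"),
    Unit.MM: ("mm", "millimeter", "millimetre"),
    Unit.UM: ("um", "micron", "micrometer", "micrometre"),
    Unit.NM: ("nm", "nanometer", "nanometre"),
    Unit.ANGSTROM: ("angstrom",),
    Unit.VOXEL: ("voxel",),
}
_LOOKUP = {alias: unit for unit, aliases in _ALIASES.items() for alias in aliases}

_FULL_NAMES = {
    Unit.M: "meter",
    Unit.CM: "centimeter",
    Unit.MM: "millimeter",
    Unit.UM: "micrometer",
    Unit.NM: "nanometer",
    Unit.ANGSTROM: "angstrom",
}


def unit_from_str(text):
    """Parse a unit name, ignoring case and surrounding whitespace."""
    try:
        return _LOOKUP[text.strip().lower()]
    except KeyError:
        raise ValueError(f"unknown unit: {text!r}") from None


def full_name(unit):
    """Return the spelled-out unit name for a physical unit."""
    return _FULL_NAMES[unit]
```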