anu_ctlab_io

Python I/O for the ANU CTLab array storage format(s).


Package Contents

class anu_ctlab_io.Dataset(data, *, dimension_names, voxel_unit, voxel_size, datatype=None, history=None, dataset_id=None)

Bases: AbstractDataset

A Dataset, containing the data and metadata read from one of the ANU CTLab file formats.

Datasets are the primary interface to the anu_ctlab_io package. Users should generally construct them with the Dataset.from_path classmethod. The optional extra for the file format being read (netcdf or zarr) must be installed.

Use the initializer of this class only when constructing a Dataset manually, which is not the primary use of this library.

Parameters:
  • data (dask.array.Array)

  • dimension_names (tuple[str, Ellipsis])

  • voxel_unit (anu_ctlab_io._voxel_properties.VoxelUnit)

  • voxel_size (tuple[numpy.float32, numpy.float32, numpy.float32])

  • datatype (anu_ctlab_io._datatype.DataType | None)

  • history (anu_ctlab_io._parse_history.History | None)

  • dataset_id (str | None)

classmethod from_path(path, *, filetype='auto', parse_history=True, **kwargs)

Creates a Dataset from the data at the given path.

The data at path must be in one of the ANU mass data storage formats, and the optional extras required for the specific file format must be installed.

Parameters:
  • path (pathlib.Path | str) – The path to read data from.

  • filetype (str)

  • parse_history (bool)

  • kwargs (Any)

Return type:

Dataset

to_path(path, *, filetype='auto', dataset_id='auto', **kwargs)

Writes the Dataset to the given path.

The data will be written in one of the ANU mass data storage formats, and the optional extras required for the specific file format must be installed.

Parameters:
  • path (pathlib.Path | str) – The path to write data to.

  • filetype (str) – The format to write (“NetCDF”, “zarr”, or “auto”). If “auto”, the format is inferred from the path: NetCDF is assumed for paths ending in .nc or _nc, and Zarr for paths ending in .zarr. If a datatype name is present in the filename (e.g., “tomo_output”), NetCDF is assumed.

  • dataset_id (str | None) – Dataset identifier to write to file metadata. Options:

      ◦ “auto” (default): use self.dataset_id if available, otherwise generate a new identifier.

      ◦ str: use this exact value.

      ◦ None: generate a new identifier (legacy behavior).

  • kwargs (Any) – Additional keyword arguments passed to the format-specific writer.
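As a rough illustration of the filetype="auto" rules listed above, the hypothetical helper infer_filetype below encodes them as a decision function. It is a sketch, not the library's implementation. It assumes that "a datatype name is present in the filename" means the filename starts with one of the DataType values documented further down, and that unrecognised paths raise ValueError.

```python
from __future__ import annotations

from pathlib import Path

# DataType values as documented for anu_ctlab_io.DataType.
DATATYPE_BASENAMES = (
    "proju16", "projf32", "tomo_float", "tomo", "float16",
    "float64", "segmented", "distance_map", "labels", "rgba8",
)


def infer_filetype(path: str | Path) -> str:
    """Hypothetical sketch of the documented filetype="auto" rules."""
    name = Path(path).name  # trailing slashes are dropped: "out.zarr/" -> "out.zarr"
    if name.endswith((".nc", "_nc")):
        return "NetCDF"
    if name.endswith(".zarr"):
        return "zarr"
    # Assumption: a MANGO-style filename starts with its datatype name.
    if any(name.startswith(basename) for basename in DATATYPE_BASENAMES):
        return "NetCDF"
    raise ValueError(f"cannot infer file format from {str(path)!r}")
```

Because the extension checks run first in this sketch, a name such as "tomo_output.zarr" is treated as Zarr; the precedence used by anu_ctlab_io itself may differ.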

Return type:

None

property voxel_size: tuple[numpy.float32, numpy.float32, numpy.float32]

The voxel size of the data in the dataset’s native unit.

Return type:

tuple[numpy.float32, numpy.float32, numpy.float32]

voxel_size_with_unit(voxel_unit)

Get the voxel size of the data converted to a target unit.

Parameters:

voxel_unit (anu_ctlab_io._voxel_properties.VoxelUnit) – The unit to convert the voxel size to.

Returns:

The voxel size as a tuple of three float32 values.

Raises:

ValueError – If unit conversion is requested but the source or target unit is VOXEL.

Return type:

tuple[numpy.float32, numpy.float32, numpy.float32]
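The conversion performed by voxel_size_with_unit can be pictured as scaling by a power of ten between metric units, with VOXEL excluded because it has no physical length. The sketch below is a hypothetical stand-in (convert_voxel_size is not part of anu_ctlab_io) that works on the VoxelUnit string values. It assumes that asking for the unit the data is already in counts as "no conversion requested", even for "voxel".

```python
from __future__ import annotations

import numpy as np

# Power of ten relative to one metre, keyed by the VoxelUnit string values.
_EXPONENT_TO_METRES = {"m": 0, "cm": -2, "mm": -3, "um": -6, "nm": -9, "angstrom": -10}


def convert_voxel_size(
    voxel_size: tuple[float, float, float], source_unit: str, target_unit: str
) -> tuple[np.float32, np.float32, np.float32]:
    """Hypothetical sketch of converting a voxel size between units."""
    if source_unit == target_unit:
        # No conversion requested (assumption: allowed even for "voxel").
        return tuple(np.float32(v) for v in voxel_size)
    if "voxel" in (source_unit, target_unit):
        raise ValueError("cannot convert to or from voxel units")
    exponent = _EXPONENT_TO_METRES[source_unit] - _EXPONENT_TO_METRES[target_unit]
    factor = 10.0 ** exponent
    # Return float32 values, matching the documented return type.
    return tuple(np.float32(v * factor) for v in voxel_size)
```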

property voxel_unit: anu_ctlab_io._voxel_properties.VoxelUnit

The unit the data’s voxel size is in.

Return type:

anu_ctlab_io._voxel_properties.VoxelUnit

property dimension_names: tuple[str, Ellipsis]

The names of the data’s dimensions. Usually ("z", "y", "x").

Return type:

tuple[str, Ellipsis]

property history: anu_ctlab_io._parse_history.History

The history metadata associated with the Dataset.

Return type:

anu_ctlab_io._parse_history.History

property mask_value: anu_ctlab_io._datatype.StorageDType | None

The mask value being used by the data.

Return type:

anu_ctlab_io._datatype.StorageDType | None

property data: dask.array.Array

The data contained within the Dataset.

This is a Dask Array.

Return type:

dask.array.Array

property dataset_id: str | None

The dataset identifier from the source file, if available.

Return type:

str | None

property mask: dask.array.Array

The masked areas of the Dataset, as a boolean array.

This has the same dimensions as the data, and will be all False if no mask value exists.

Return type:

dask.array.Array

property masked_data: dask.array.Array

The data contained within the Dataset, as a masked array.

This is more efficient than manually building a masked array from mask when the loaded datatype has no mask (i.e., OME-Zarr data), because in that case it creates the masked array with nomask instead of a full boolean mask.

Return type:

dask.array.Array
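The relationship between mask_value, mask and masked_data can be sketched with NumPy arrays standing in for the Dask arrays the Dataset actually returns. This is a hypothetical helper (mask_and_masked is not part of anu_ctlab_io), and it assumes that the masked voxels are exactly those equal to the mask value.

```python
import numpy as np


def mask_and_masked(data: np.ndarray, mask_value=None):
    """Hypothetical sketch: return (mask, masked_data) for a NumPy array."""
    if mask_value is None:
        # No mask value: mask is an all-False boolean array with the data's shape...
        mask = np.zeros(data.shape, dtype=bool)
        # ...but the masked array uses the nomask sentinel rather than
        # carrying that full boolean array around.
        masked = np.ma.masked_array(data, mask=np.ma.nomask)
    else:
        # Assumption: voxels equal to the mask value are the masked ones.
        mask = data == mask_value
        masked = np.ma.masked_array(data, mask=mask)
    return mask, masked
```

Skipping the all-False array in the nomask case is the saving the masked_data description refers to.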

add_to_history(key, value)

Add an entry to the dataset’s history metadata.

This method mutates the dataset in-place by adding a new history entry. The history will be automatically serialized when writing to NetCDF or Zarr formats.

Parameters:
  • key (str) – The history key/identifier. Convention is to use timestamps (e.g., “20260128_150530_crop”) but any string is valid.

  • value (anu_ctlab_io._parse_history.HistoryValue) – The history entry value. Can be a dict with operation details (recommended) or a string. Dicts will be serialized to structured format.

Return type:

None

Example:

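from anu_ctlab_io import Dataset

ds = Dataset.from_path("data.nc")
# Record an operation in the history; this does not change ds.data itself
ds.add_to_history("20260128_crop", {
    "operation": "crop",
    "z_range": [10, 50],
    "reason": "Focus on region of interest"
})
ds.to_path("cropped.nc")  # History is preserved
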
update_history(entries)

Update the dataset’s history with multiple entries at once.

This method mutates the dataset in-place by adding multiple history entries. Equivalent to calling add_to_history multiple times.

Parameters:

entries (dict[str, anu_ctlab_io._parse_history.HistoryValue]) – Dictionary of history entries to add. Keys are history identifiers, values are the entry data (dicts or strings).

Return type:

None

Example:

from anu_ctlab_io import Dataset

ds = Dataset.from_path("data.nc")
ds.update_history({
    "20260128_150530_crop": {"operation": "crop", "z_range": [10, 50]},
    "20260128_150545_filter": {"operation": "gaussian_filter", "sigma": 2.0}
})

classmethod from_modified(source, *, data=None, voxel_size=None, voxel_unit=None, dimension_names=None, datatype=None, history_entry=None, history_key=None, dataset_id_suffix=None)

Create a new Dataset from a modified version of an existing one.

This factory method creates a new Dataset instance with selected attributes modified, while preserving the rest from the source. Optionally adds a history entry documenting the modification. This follows an immutable pattern where the source dataset is not modified.

Parameters:
  • source (Dataset) – The source Dataset to create a modified copy from.

  • data (dask.array.Array | None) – New data array. If None, uses source’s data.

  • voxel_size (tuple[numpy.float32, numpy.float32, numpy.float32] | None) – New voxel size. If None, uses source’s voxel_size.

  • voxel_unit (anu_ctlab_io._voxel_properties.VoxelUnit | None) – New voxel unit. If None, uses source’s voxel_unit.

  • dimension_names (tuple[str, Ellipsis] | None) – New dimension names. If None, uses source’s dimension_names.

  • datatype (anu_ctlab_io._datatype.DataType | None) – New datatype. If None, uses source’s datatype.

  • history_entry (anu_ctlab_io._parse_history.HistoryValue | None) – History entry to add documenting the modification. If provided, a new history entry is added with the given key.

  • history_key (str | None) – Key for the history entry. If None and history_entry is provided, auto-generates a timestamp-based key like “20260128_150530_modification”.

  • dataset_id_suffix (str | None) – Suffix to append to the dataset_id. If provided, the new dataset’s dataset_id will be “{source.dataset_id}_{suffix}”. If source has no dataset_id, this parameter is ignored.

Returns:

New Dataset instance with modifications applied.

Return type:

Dataset

Example:

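import numpy as np
from anu_ctlab_io import Dataset

ds = Dataset.from_path("data.nc")

# Create a cropped version (along the first axis, usually z) with a history entry and a modified dataset_id
cropped = Dataset.from_modified(
    ds,
    data=ds.data[10:50, :, :],
    history_entry={"operation": "crop", "z_range": [10, 50]},
    history_key="20260128_crop",
    dataset_id_suffix="cropped"
)
# If ds.dataset_id is "20250314_012913_tomoLoRes_SS",
# cropped.dataset_id is "20250314_012913_tomoLoRes_SS_cropped"

# Chain modifications; history_key is omitted, so a timestamp-based key is generated
scaled = Dataset.from_modified(
    cropped,
    voxel_size=(np.float32(0.1), np.float32(0.1), np.float32(0.1)),
    history_entry={"operation": "rescale", "new_voxel_size": [0.1, 0.1, 0.1]},
    dataset_id_suffix="scaled"
)
# scaled.dataset_id is "20250314_012913_tomoLoRes_SS_cropped_scaled"
# (only the voxel_size metadata changes; the data array is reused, not resampled)
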
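The two naming rules described for from_modified (the dataset_id_suffix rule and the auto-generated history key) can be expressed as small functions. Both helpers below are hypothetical sketches written from the parameter descriptions, not functions exported by anu_ctlab_io; the strftime pattern is an assumption inferred from the example key “20260128_150530_modification”.

```python
from __future__ import annotations

from datetime import datetime


def derived_dataset_id(source_id: str | None, suffix: str | None) -> str | None:
    """Hypothetical sketch of the dataset_id_suffix rule."""
    if source_id is None or suffix is None:
        # No source id to extend, or no suffix requested: keep the source id.
        return source_id
    return f"{source_id}_{suffix}"


def default_history_key(now: datetime | None = None) -> str:
    """Hypothetical sketch of a key like "20260128_150530_modification"."""
    now = now or datetime.now()
    return now.strftime("%Y%m%d_%H%M%S") + "_modification"
```

Passing an explicit datetime keeps the key reproducible, which is convenient when testing pipelines that record history.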
class anu_ctlab_io.DataType(*args, **kwds)

Bases: enum.Enum

An Enum representing the datatypes produced by MANGO.

This is used when parsing metadata to construct a Dataset, and generally should not need to be constructed by a user (use the Dataset.from_path classmethod instead).

When needed, DataTypes should be constructed via either the infer_from_path or the from_basename classmethods.

PROJU16 = 'proju16'
PROJF32 = 'projf32'
TOMO_FLOAT = 'tomo_float'
TOMO = 'tomo'
FLOAT16 = 'float16'
FLOAT64 = 'float64'
SEGMENTED = 'segmented'
DISTANCE_MAP = 'distance_map'
LABELS = 'labels'
RGBA8 = 'rgba8'

property is_discrete: bool

Whether the DataType is discrete.

Return type:

bool

property dtype: numpy.typing.DTypeLike

The numpy dtype appropriate for storing data of the DataType.

Because of a historical decision in MANGO, the datatype recorded in ANU CTLab NetCDFs is not guaranteed to have the correct signedness: for some MANGO datatypes, data that the NetCDF header declares as a signed integer type is actually unsigned integer data stored in a signed integer type. dtype gives the real datatype of the data whether or not a loaded NetCDF exhibits this behaviour, so trust this value rather than the NetCDF header.

Return type:

numpy.typing.DTypeLike

property mask_value: StorageDType | None

The mask value of the DataType.

This value is corrected for signedness if required (see dtype).

Return type:

StorageDType | None
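The signedness correction described for dtype and mask_value amounts to reinterpreting the stored bits with the unsigned type. The NumPy sketch below illustrates that idea; the int16/uint16 pair and the value -1 are illustrative assumptions, since the affected MANGO datatypes and their actual mask values are not listed here.

```python
import numpy as np

# Suppose the NetCDF header declares int16, but the values are really uint16.
stored = np.array([-1, -2, 100], dtype=np.int16)

# Reinterpret the same bytes with the real dtype (a view: no copy, no arithmetic).
true_values = stored.view(np.uint16)

# The same correction applied to a single stored value, such as a mask value.
true_mask_value = np.array([-1], dtype=np.int16).view(np.uint16)[0]
```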

classmethod infer_from_path(path)

Create a DataType object by inferring it from the path to the data being loaded.

Relies on MANGO’s standardised file naming.

Return type:

DataType

Parameters:

path (str | pathlib.Path)
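One way to infer a datatype from MANGO-style filenames is a longest-prefix match against the DataType values, so that “tomo_float…” is not mistaken for “tomo”. The sketch below assumes that the filename starts with the datatype name; DataTypeSketch and infer_datatype are hypothetical stand-ins, not the library's infer_from_path.

```python
from __future__ import annotations

from enum import Enum
from pathlib import Path


class DataTypeSketch(Enum):
    """Local stand-in mirroring the documented DataType values."""
    PROJU16 = "proju16"
    PROJF32 = "projf32"
    TOMO_FLOAT = "tomo_float"
    TOMO = "tomo"
    FLOAT16 = "float16"
    FLOAT64 = "float64"
    SEGMENTED = "segmented"
    DISTANCE_MAP = "distance_map"
    LABELS = "labels"
    RGBA8 = "rgba8"


def infer_datatype(path: str | Path) -> DataTypeSketch:
    """Hypothetical: pick the longest datatype name that prefixes the filename."""
    name = Path(path).name
    # Try longer names first so "tomo_float" wins over "tomo".
    for member in sorted(DataTypeSketch, key=lambda m: len(m.value), reverse=True):
        if name.startswith(member.value):
            return member
    raise ValueError(f"no known datatype prefix in {name!r}")
```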

classmethod from_basename(basename)

Create a DataType object from its name as a string.

E.g., DataType.from_basename("tomo")

Return type:

DataType

Parameters:

basename (str)

type anu_ctlab_io.StorageDType = np.uint8 | np.uint16 | np.uint32 | np.uint64 | np.float16 | np.float32 | np.float64
class anu_ctlab_io.VoxelUnit(*args, **kwds)

Bases: enum.Enum

The unit of size of a voxel.

M = 'm'
CM = 'cm'
MM = 'mm'
UM = 'um'
NM = 'nm'
ANGSTROM = 'angstrom'
VOXEL = 'voxel'

classmethod from_str(string)

Create a VoxelUnit from the string name of the unit.

Accepts a wide range of standard representations of each unit, and is case-insensitive.

Parameters:

string (str)

Return type:

VoxelUnit
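Case-insensitive parsing of many spellings is typically done by normalising the input and looking it up in an alias table. The sketch below shows that pattern; the alias table and the ValueError on unknown input are assumptions, and the real from_str may accept different spellings. VoxelUnitSketch and unit_from_str are hypothetical names.

```python
from __future__ import annotations

from enum import Enum


class VoxelUnitSketch(Enum):
    """Local stand-in mirroring the documented VoxelUnit values."""
    M = "m"
    CM = "cm"
    MM = "mm"
    UM = "um"
    NM = "nm"
    ANGSTROM = "angstrom"
    VOXEL = "voxel"


# Hypothetical alias table: lower-cased spelling -> VoxelUnit value.
_ALIASES = {
    "m": "m", "meter": "m", "metre": "m",
    "cm": "cm", "centimeter": "cm", "centimetre": "cm",
    "mm": "mm", "millimeter": "mm", "millimetre": "mm",
    "um": "um", "µm": "um", "micrometer": "um", "micrometre": "um", "micron": "um",
    "nm": "nm", "nanometer": "nm", "nanometre": "nm",
    "angstrom": "angstrom", "å": "angstrom",
    "voxel": "voxel", "voxels": "voxel",
}


def unit_from_str(string: str) -> VoxelUnitSketch:
    """Normalise whitespace and case, then look the spelling up."""
    key = string.strip().lower()
    try:
        return VoxelUnitSketch(_ALIASES[key])
    except KeyError:
        raise ValueError(f"unrecognised voxel unit: {string!r}") from None
```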

to_full_name()

Return the full unit name for OME-Zarr metadata.

The OME-Zarr specification requires full unit names (e.g., “millimeter”) rather than abbreviated forms (e.g., “mm”).

Returns:

Full unit name as string.

Return type:

str
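A minimal sketch of the abbreviation-to-full-name mapping, using the space-unit names from the OME-Zarr (NGFF) axes metadata. The helper is hypothetical, and raising ValueError for “voxel” is an assumption: the documentation above does not say how to_full_name handles VOXEL.

```python
# Full unit names as used by OME-Zarr "axes" metadata, keyed by VoxelUnit value.
_FULL_NAMES = {
    "m": "meter",
    "cm": "centimeter",
    "mm": "millimeter",
    "um": "micrometer",
    "nm": "nanometer",
    "angstrom": "angstrom",
}


def to_full_name(unit: str) -> str:
    """Hypothetical sketch: map a VoxelUnit value to its OME-Zarr unit name."""
    try:
        return _FULL_NAMES[unit]
    except KeyError:
        # Assumption: "voxel" has no OME-Zarr space-unit equivalent.
        raise ValueError(f"no OME-Zarr unit name for {unit!r}") from None
```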