anu_ctlab_io
============

.. py:module:: anu_ctlab_io

.. autoapi-nested-parse::

   Python I/O for the ANU CTLab array storage format(s).


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/anu_ctlab_io/netcdf/index
   /autoapi/anu_ctlab_io/raw/index
   /autoapi/anu_ctlab_io/zarr/index


Package Contents
----------------

.. py:class:: Dataset(data, *, dimension_names, voxel_unit, voxel_size, datatype = None, history = None, dataset_id = None)

   Bases: :py:obj:`AbstractDataset`

   A :any:`Dataset`, containing the data and metadata read from one of the ANU CTLab file formats.

   :any:`Dataset`\ s are the primary interface to the :py:mod:`anu_ctlab_io` package, and should generally be constructed by users via the :any:`Dataset.from_path` classmethod. Note that the relevant extra (:any:`netcdf` or :any:`zarr`) must be installed.

   The initializer of this class should only be used when manually constructing a :any:`Dataset`, which is not the primary usage of this library.

   .. py:method:: from_path(path, *, filetype = 'auto', parse_history = True, **kwargs)
      :classmethod:

      Creates a :any:`Dataset` from the data at the given ``path``.

      The data at ``path`` must be in one of the ANU mass data storage formats, and the optional extras required for the specific file format must be installed.

      :param path: The ``path`` to read data from.
      :rtype: :any:`Dataset`

   .. py:method:: to_path(path, *, filetype = 'auto', dataset_id = 'auto', **kwargs)

      Writes the :any:`Dataset` to the given ``path``.

      The data will be written in one of the ANU mass data storage formats, and the optional extras required for the specific file format must be installed.

      :param path: The ``path`` to write data to.
      :param filetype: The format to write (``"NetCDF"``, ``"zarr"``, or ``"auto"``). If ``"auto"``, the format is inferred from the path extension. When inferring, NetCDF is assumed for paths ending in ``.nc`` or ``_nc``, and Zarr for paths ending in ``.zarr``. If a datatype is present in the filename (e.g., ``"tomo_output"``), NetCDF is assumed.
      :param dataset_id: Dataset identifier to write to file metadata. Options:

         - ``"auto"`` (default): use ``self.dataset_id`` if available, otherwise generate a new identifier
         - ``str``: use this exact value
         - ``None``: generate a new identifier (legacy behavior)

      :param kwargs: Additional keyword arguments passed to the format-specific writer.

   .. py:property:: voxel_size
      :type: tuple[numpy.float32, numpy.float32, numpy.float32]

      The voxel size of the data in the dataset's native unit.

   .. py:method:: voxel_size_with_unit(voxel_unit)

      Get the voxel size of the data converted to a target unit.

      :param voxel_unit: The unit to convert the voxel size to.
      :return: The voxel size as a tuple of three float32 values.
      :raises ValueError: If unit conversion is requested but the source or target unit is VOXEL.

   .. py:property:: voxel_unit
      :type: anu_ctlab_io._voxel_properties.VoxelUnit

      The unit the data's voxel size is in.

   .. py:property:: dimension_names
      :type: tuple[str, Ellipsis]

      The names of the data's dimensions. Usually ``("z", "y", "x")``.

   .. py:property:: history
      :type: anu_ctlab_io._parse_history.History

      The history metadata associated with the :any:`Dataset`.

   .. py:property:: mask_value
      :type: anu_ctlab_io._datatype.StorageDType | None

      The mask value being used by the data.

   .. py:property:: data
      :type: dask.array.Array

      The data contained within the :any:`Dataset`. This is a Dask Array.

   .. py:property:: dataset_id
      :type: str | None

      The dataset identifier from the source file, if available.

   .. py:property:: mask
      :type: dask.array.Array

      The masked areas of the :any:`Dataset`, as a boolean array.

      This has the same dimensions as the data, and will be all-zero if no mask value exists.

   .. py:property:: masked_data
      :type: dask.array.Array

      The data contained within the :any:`Dataset`, as a masked array.

      This performs better than manually creating a ``masked_array`` using ``mask`` when the loaded datatype has no mask (e.g., OME-Zarr data), because it creates a masked array with ``nomask`` in these situations.

   .. py:method:: add_to_history(key, value)

      Add an entry to the dataset's history metadata.

      This method mutates the dataset in-place by adding a new history entry. The history will be automatically serialized when writing to NetCDF or Zarr formats.

      :param key: The history key/identifier. Convention is to use timestamps (e.g., ``"20260128_150530_crop"``) but any string is valid.
      :param value: The history entry value. Can be a dict with operation details (recommended) or a string. Dicts will be serialized to a structured format.

      Example::

          from anu_ctlab_io import Dataset

          ds = Dataset.from_path("data.nc")
          ds.add_to_history("20260128_crop", {
              "operation": "crop",
              "z_range": [10, 50],
              "reason": "Focus on region of interest"
          })
          ds.to_path("cropped.nc")  # History is preserved

   .. py:method:: update_history(entries)

      Update the dataset's history with multiple entries at once.

      This method mutates the dataset in-place by adding multiple history entries. Equivalent to calling :any:`add_to_history` multiple times.

      :param entries: Dictionary of history entries to add. Keys are history identifiers, values are the entry data (dicts or strings).

      Example::

          ds.update_history({
              "20260128_150530_crop": {"operation": "crop", "z_range": [10, 50]},
              "20260128_150545_filter": {"operation": "gaussian_filter", "sigma": 2.0}
          })

   .. py:method:: from_modified(source, *, data = None, voxel_size = None, voxel_unit = None, dimension_names = None, datatype = None, history_entry = None, history_key = None, dataset_id_suffix = None)
      :classmethod:

      Create a new Dataset from a modified version of an existing one.

      This factory method creates a new Dataset instance with selected attributes modified, while preserving the rest from the source. Optionally adds a history entry documenting the modification. This follows an immutable pattern where the source dataset is not modified.

      :param source: The source Dataset to create a modified copy from.
      :param data: New data array. If None, uses source's data.
      :param voxel_size: New voxel size. If None, uses source's voxel_size.
      :param voxel_unit: New voxel unit. If None, uses source's voxel_unit.
      :param dimension_names: New dimension names. If None, uses source's dimension_names.
      :param datatype: New datatype. If None, uses source's datatype.
      :param history_entry: History entry to add documenting the modification. If provided, a new history entry is added with the given key.
      :param history_key: Key for the history entry. If None and history_entry is provided, auto-generates a timestamp-based key like ``"20260128_150530_modification"``.
      :param dataset_id_suffix: Suffix to append to the dataset_id. If provided, the new dataset's dataset_id will be ``"{source.dataset_id}_{suffix}"``. If source has no dataset_id, this parameter is ignored.
      :return: New Dataset instance with modifications applied.

      Example::

          from anu_ctlab_io import Dataset

          ds = Dataset.from_path("data.nc")

          # Create cropped version with automatic history and modified dataset_id
          cropped = Dataset.from_modified(
              ds,
              data=ds.data[10:50, :, :],
              history_entry={"operation": "crop", "z_range": [10, 50]},
              history_key="20260128_crop",
              dataset_id_suffix="cropped"
          )
          # Result: dataset_id becomes "20250314_012913_tomoLoRes_SS_cropped"

          # Chain modifications
          scaled = Dataset.from_modified(
              cropped,
              voxel_size=(0.1, 0.1, 0.1),
              history_entry={"operation": "rescale", "new_voxel_size": [0.1, 0.1, 0.1]},
              dataset_id_suffix="scaled"
          )
          # Result: dataset_id becomes "20250314_012913_tomoLoRes_SS_cropped_scaled"


.. py:class:: DataType(*args, **kwds)

   Bases: :py:obj:`enum.Enum`

   An ``Enum`` representing the datatypes produced by MANGO.

   This is used when parsing metadata to construct a :any:`Dataset`, and generally should not need to be constructed by a user (use the :any:`Dataset.from_path` classmethod instead). When needed, :any:`DataType`\ s should be constructed via either the :any:`infer_from_path` or the :any:`from_basename` classmethods.

   .. py:attribute:: PROJU16
      :value: 'proju16'

   .. py:attribute:: PROJF32
      :value: 'projf32'

   .. py:attribute:: TOMO_FLOAT
      :value: 'tomo_float'

   .. py:attribute:: TOMO
      :value: 'tomo'

   .. py:attribute:: FLOAT16
      :value: 'float16'

   .. py:attribute:: FLOAT64
      :value: 'float64'

   .. py:attribute:: SEGMENTED
      :value: 'segmented'

   .. py:attribute:: DISTANCE_MAP
      :value: 'distance_map'

   .. py:attribute:: LABELS
      :value: 'labels'

   .. py:attribute:: RGBA8
      :value: 'rgba8'

   .. py:property:: is_discrete
      :type: bool

      Whether the :any:`DataType` is discrete.

   .. py:property:: dtype
      :type: numpy.typing.DTypeLike

      The numpy ``dtype`` appropriate for storing data of the :any:`DataType`.

      Because of a historical decision in MANGO, the datatype listed in ANU CTLab NetCDFs is not guaranteed to have the correct signed/unsigned type: for some MANGO datatypes, data recorded in the NetCDF as a signed integer type is really an unsigned integer stored in a signed integer. The :any:`dtype` is the real datatype of the data, regardless of whether a loaded NetCDF exhibits this behaviour (trust this value, not the NetCDF header).

   .. py:property:: mask_value
      :type: StorageDType | None

      The mask value of the :any:`DataType`.

      This value is corrected for signedness if required (see :any:`dtype`\ ).

   .. py:method:: infer_from_path(path)
      :classmethod:

      Create a :any:`DataType` object by inferring it from the path to the data being loaded. Relies on MANGO's standardised file naming.

      :rtype: :any:`DataType`

   .. py:method:: from_basename(basename)
      :classmethod:

      Create a :any:`DataType` object from its name as a string, e.g., ``DataType.from_basename("tomo")``.

      :rtype: :any:`DataType`


.. py:type:: StorageDType
   :canonical: np.uint8 | np.uint16 | np.uint32 | np.uint64 | np.float16 | np.float32 | np.float64


.. py:class:: VoxelUnit(*args, **kwds)

   Bases: :py:obj:`enum.Enum`

   The unit of size of a voxel.

   .. py:attribute:: M
      :value: 'm'

   .. py:attribute:: CM
      :value: 'cm'

   .. py:attribute:: MM
      :value: 'mm'

   .. py:attribute:: UM
      :value: 'um'

   .. py:attribute:: NM
      :value: 'nm'

   .. py:attribute:: ANGSTROM
      :value: 'angstrom'

   .. py:attribute:: VOXEL
      :value: 'voxel'

   .. py:method:: from_str(string)
      :classmethod:

      Create a VoxelUnit from the string name of the unit.

      Accepts a wide range of standard representations of each unit, and is case-insensitive.

   .. py:method:: to_full_name()

      Return the full unit name for OME-Zarr metadata.

      The OME-Zarr specification requires full unit names (e.g., "millimeter") rather than abbreviated forms (e.g., "mm").

      :return: Full unit name as string.
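The ``filetype`` and ``dataset_id`` rules documented for ``Dataset.to_path`` can be illustrated with a small standalone sketch. It does not import or reproduce the library's code: ``infer_filetype`` and ``resolve_dataset_id`` are hypothetical helper names, the ``generate`` callback stands in for however the library creates new identifiers, and the reading of "datatype present in filename" as "the basename starts with a ``DataType`` value" is an assumption, as is raising ``ValueError`` when nothing matches.

```python
from pathlib import PurePosixPath
from typing import Callable, Optional

# String values of the DataType enum members documented on this page.
DATATYPE_NAMES = (
    "proju16", "projf32", "tomo_float", "tomo", "float16", "float64",
    "segmented", "distance_map", "labels", "rgba8",
)


def infer_filetype(path: str) -> str:
    """Hypothetical helper: choose a format name from the path alone."""
    name = PurePosixPath(path).name
    if name.endswith((".nc", "_nc")):
        return "NetCDF"
    if name.endswith(".zarr"):
        return "zarr"
    # Assumption: a filename "contains a datatype" when its basename starts
    # with one of the DataType values, e.g. "tomo_output".
    if name.startswith(DATATYPE_NAMES):
        return "NetCDF"
    # Assumption: unmatched paths are rejected rather than given a default.
    raise ValueError(f"cannot infer a file type from {path!r}")


def resolve_dataset_id(
    requested: Optional[str],
    current: Optional[str],
    generate: Callable[[], str],
) -> str:
    """Hypothetical helper mirroring the three documented dataset_id options."""
    if requested is None:
        return generate()  # legacy behavior: always a new identifier
    if requested == "auto":
        return current if current is not None else generate()
    return requested  # any other string is used exactly as given
```

For example, ``infer_filetype("scan.zarr")`` selects Zarr, while ``resolve_dataset_id("auto", None, make_id)`` falls back to the generator because there is no existing identifier to reuse (``make_id`` being any zero-argument callable of your choosing).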
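The signedness caveat described for ``DataType.dtype`` is easier to see with concrete numbers. The sketch below uses only the standard library to reinterpret the bits of a signed 16-bit integer as an unsigned one. The 16-bit width and the helper name ``as_unsigned_16`` are chosen for illustration only; this page does not say which MANGO datatypes are affected, and the library's own correction may be implemented differently.

```python
import struct


def as_unsigned_16(stored: int) -> int:
    """Reinterpret the 16 bits of a signed integer as an unsigned integer."""
    # Pack as little-endian int16, then unpack the same two bytes as uint16.
    return struct.unpack("<H", struct.pack("<h", stored))[0]


# A value that a header typed as int16 would report as negative...
print(as_unsigned_16(-1))    # 65535
# ...while values below 32768 are unchanged by the reinterpretation.
print(as_unsigned_16(1000))  # 1000
```

This is why the ``dtype`` property, rather than the type recorded in the NetCDF header, is the one to rely on when interpreting values and mask values.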
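As a rough illustration of the conversion that ``Dataset.voxel_size_with_unit`` describes, the following standalone sketch converts a voxel size between the metric units listed in ``VoxelUnit`` and rejects conversions involving ``'voxel'``. It is not the library's implementation: ``convert_voxel_size`` and ``EXPONENTS`` are hypothetical names, plain Python floats are used instead of ``numpy.float32``, and treating a same-unit request as "no conversion" is an assumption.

```python
# Power of ten relating each physical unit to the metre.
EXPONENTS = {"m": 0, "cm": -2, "mm": -3, "um": -6, "nm": -9, "angstrom": -10}


def convert_voxel_size(size, source_unit, target_unit):
    """Hypothetical helper: rescale a (z, y, x) voxel size between units."""
    # Assumption: asking for the unit the data is already in is not a conversion.
    if source_unit == target_unit:
        return tuple(size)
    # 'voxel' has no physical length, so it cannot be converted either way.
    if "voxel" in (source_unit, target_unit):
        raise ValueError("cannot convert to or from the 'voxel' unit")
    scale = 10.0 ** (EXPONENTS[source_unit] - EXPONENTS[target_unit])
    return tuple(v * scale for v in size)


# 1 mm is 1000 um, so each component is multiplied by 1000.
print(convert_voxel_size((1.0, 2.0, 0.5), "mm", "um"))
```

With the real API, the equivalent call would pass a ``VoxelUnit`` member (for example one obtained via ``VoxelUnit.from_str``) to ``voxel_size_with_unit``.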