anu_ctlab_io
============

.. py:module:: anu_ctlab_io

.. autoapi-nested-parse::

   Python I/O for the ANU CTLab array storage format(s).


Submodules
----------

.. toctree::
   :maxdepth: 1

   /autoapi/anu_ctlab_io/netcdf/index
   /autoapi/anu_ctlab_io/raw/index
   /autoapi/anu_ctlab_io/zarr/index


Package Contents
----------------

.. py:class:: Dataset(data, *, dimension_names, voxel_unit, voxel_size, datatype = None, history = None, dataset_id = None)

   Bases: :py:obj:`AbstractDataset`


   A :any:`Dataset`, containing the data and metadata read from one of the ANU CTLab file formats.

   :any:`Dataset`\ s are the primary interface to the :py:mod:`anu_ctlab_io` package, and should generally be
   constructed by users via the :any:`Dataset.from_path` classmethod. Note that the relevant extra (:any:`netcdf` or :any:`zarr`)
   must be installed.

   The initializer of this class should only be used when manually constructing a :any:`Dataset`, which is not
   the primary usage of this library.


   .. py:method:: from_path(path, *, filetype = 'auto', parse_history = True, **kwargs)
      :classmethod:


      Creates a :any:`Dataset` from the data at the given ``path``.

      The data at ``path`` must be in one of the ANU mass data storage formats, and the optional extras required for the specific
      file format must be installed.

      :param path: The ``path`` to read data from.
      :rtype: :any:`Dataset`


   .. py:method:: to_path(path, *, filetype = 'auto', dataset_id = 'auto', **kwargs)

      Writes the :any:`Dataset` to the given ``path``.

      The data will be written in one of the ANU mass data storage formats, and the optional extras required for the specific
      file format must be installed.

      :param path: The ``path`` to write data to.
      :param filetype: The format to write (``"NetCDF"``, ``"zarr"``, or ``"auto"``). If ``"auto"``, the format is
          inferred from ``path``: NetCDF is assumed for paths ending in ``.nc`` or ``_nc``, and Zarr for paths
          ending in ``.zarr``. NetCDF is also assumed if the filename contains a datatype name
          (e.g., ``tomo_output``).
      :param dataset_id: Dataset identifier to write to the file metadata. Options:

          - ``"auto"`` (default): use ``self.dataset_id`` if available, otherwise generate a new one.
          - ``str``: use this exact value.
          - ``None``: generate a new identifier (legacy behavior).
      :param kwargs: Additional keyword arguments passed to the format-specific writer.
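
      The ``filetype="auto"`` rules can be summarised as a small decision function. The following is a
      minimal sketch under stated assumptions, not the library's implementation: ``infer_filetype`` and
      ``DATATYPE_NAMES`` are hypothetical names, the extension checks are assumed to take precedence over
      the datatype-name check, and raising ``ValueError`` for an unrecognised path is an assumption:

      .. code-block:: python

          from pathlib import PurePath

          # Hypothetical: the documented DataType values, used for the fallback rule.
          DATATYPE_NAMES = (
              "proju16", "projf32", "tomo_float", "tomo", "float16",
              "float64", "segmented", "distance_map", "labels", "rgba8",
          )


          def infer_filetype(path: str) -> str:
              """Sketch of filetype="auto" inference (not the library's code)."""
              name = PurePath(path).name  # drops any trailing "/" on directory paths
              if name.endswith(".zarr"):
                  return "zarr"
              if name.endswith((".nc", "_nc")):
                  return "NetCDF"
              # Fallback: a MANGO datatype name anywhere in the name implies NetCDF.
              if any(datatype in name for datatype in DATATYPE_NAMES):
                  return "NetCDF"
              # Assumption: unrecognised paths are rejected rather than guessed.
              raise ValueError(f"cannot infer filetype from {path!r}")

      For example, ``infer_filetype("tomo_nc/")`` and ``infer_filetype("tomo_output")`` both give
      ``"NetCDF"``, while ``infer_filetype("scan.zarr")`` gives ``"zarr"``.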


   .. py:property:: voxel_size
      :type: tuple[numpy.float32, numpy.float32, numpy.float32]


      The voxel size of the data in the dataset's native unit.


   .. py:method:: voxel_size_with_unit(voxel_unit)

      Get the voxel size of the data converted to a target unit.

      :param voxel_unit: The unit to convert the voxel size to.
      :return: The voxel size as a tuple of three float32 values.
      :raises ValueError: If unit conversion is requested but the source or target unit is ``VoxelUnit.VOXEL``.
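
      The conversion amounts to scaling by a power of ten. The sketch below is hypothetical: it
      re-declares the documented :any:`VoxelUnit` values locally, uses standard SI factors, returns plain
      Python floats rather than the ``float32`` values the real method returns, and assumes that a
      same-unit request (including ``VOXEL`` to ``VOXEL``) is returned unchanged rather than raising:

      .. code-block:: python

          from enum import Enum


          class VoxelUnit(Enum):
              # Values as documented for anu_ctlab_io.VoxelUnit.
              M = "m"
              CM = "cm"
              MM = "mm"
              UM = "um"
              NM = "nm"
              ANGSTROM = "angstrom"
              VOXEL = "voxel"


          # Power of ten relative to one metre (standard SI factors).
          _EXPONENT = {
              VoxelUnit.M: 0,
              VoxelUnit.CM: -2,
              VoxelUnit.MM: -3,
              VoxelUnit.UM: -6,
              VoxelUnit.NM: -9,
              VoxelUnit.ANGSTROM: -10,
          }


          def convert_voxel_size(size, source, target):
              """Hypothetical sketch of the voxel_size_with_unit conversion rule."""
              if source == target:
                  return tuple(float(s) for s in size)  # nothing to convert
              if VoxelUnit.VOXEL in (source, target):
                  raise ValueError("cannot convert to or from VoxelUnit.VOXEL")
              factor = 10.0 ** (_EXPONENT[source] - _EXPONENT[target])
              return tuple(float(s) * factor for s in size)

      For instance, ``(1.0, 2.0, 0.5)`` in ``MM`` becomes ``(1000.0, 2000.0, 500.0)`` in ``UM``.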


   .. py:property:: voxel_unit
      :type: anu_ctlab_io._voxel_properties.VoxelUnit


      The unit the data's voxel size is in.


   .. py:property:: dimension_names
      :type: tuple[str, Ellipsis]


      The names of the data's dimensions. Usually ``("z", "y", "x")``.


   .. py:property:: history
      :type: anu_ctlab_io._parse_history.History


      The history metadata associated with the :any:`Dataset`.


   .. py:property:: mask_value
      :type: anu_ctlab_io._datatype.StorageDType | None


      The value used to mark masked voxels in the data, or ``None`` if the data has no mask value.


   .. py:property:: data
      :type: dask.array.Array


      The data contained within the :any:`Dataset`.

      This is a `Dask Array <https://docs.dask.org/en/stable/array.html>`_.


   .. py:property:: dataset_id
      :type: str | None


      The dataset identifier from the source file, if available.


   .. py:property:: mask
      :type: dask.array.Array


      The masked areas of the :any:`Dataset`, as a boolean array.

      This has the same dimensions as the data, and will be all ``False`` if no mask value exists.


   .. py:property:: masked_data
      :type: dask.array.Array


      The data contained within the :any:`Dataset`, as a masked array.

      This is faster than manually building a ``numpy.ma.masked_array`` from ``mask`` when the loaded
      data has no mask value (i.e., OME-Zarr data), because in that case the masked array is created
      with ``numpy.ma.nomask`` instead of an element-wise mask.
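
      The relationship between ``mask`` and ``masked_data`` can be illustrated with NumPy's masked
      arrays. This sketch uses eager NumPy arrays in place of the Dask arrays the real properties return;
      ``masked_view`` is a hypothetical helper, and deriving the mask by equality with the mask value is
      an assumption about how the library builds it:

      .. code-block:: python

          import numpy as np


          def masked_view(data, mask_value):
              """Hypothetical sketch of how mask and masked_data relate (not the library's code)."""
              if mask_value is None:
                  # No mask value: skip the element-wise comparison entirely.
                  return np.ma.masked_array(data, mask=np.ma.nomask)
              # Boolean mask with the same shape as the data.
              return np.ma.masked_array(data, mask=(data == mask_value))


          volume = np.array([[0, 5], [7, 0]], dtype=np.uint16)
          masked = masked_view(volume, 0)       # zeros are masked
          unmasked = masked_view(volume, None)  # mask is the nomask sentinel

      With ``nomask`` there is no boolean array the size of the data to allocate or compare, which is
      where the performance difference comes from.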


   .. py:method:: add_to_history(key, value)

      Add an entry to the dataset's history metadata.

      This method mutates the dataset in-place by adding a new history entry.
      The history will be automatically serialized when writing to NetCDF or Zarr formats.

      :param key: The history key/identifier. Convention is to use timestamps
          (e.g., "20260128_150530_crop") but any string is valid.
      :param value: The history entry value. Can be a dict with operation details
          (recommended) or a string. Dicts will be serialized to a structured format.

      Example::

          from anu_ctlab_io import Dataset

          ds = Dataset.from_path("data.nc")
          ds.add_to_history("20260128_crop", {
              "operation": "crop",
              "z_range": [10, 50],
              "reason": "Focus on region of interest",
          })
          ds.to_path("cropped.nc")  # the new history entry is written along with the data


   .. py:method:: update_history(entries)

      Update the dataset's history with multiple entries at once.

      This method mutates the dataset in-place by adding multiple history entries.
      Equivalent to calling :any:`add_to_history` multiple times.

      :param entries: Dictionary of history entries to add. Keys are history identifiers,
          values are the entry data (dicts or strings).

      Example::

          from anu_ctlab_io import Dataset

          ds = Dataset.from_path("data.nc")
          ds.update_history({
              "20260128_150530_crop": {"operation": "crop", "z_range": [10, 50]},
              "20260128_150545_filter": {"operation": "gaussian_filter", "sigma": 2.0},
          })


   .. py:method:: from_modified(source, *, data = None, voxel_size = None, voxel_unit = None, dimension_names = None, datatype = None, history_entry = None, history_key = None, dataset_id_suffix = None)
      :classmethod:


      Create a new Dataset from a modified version of an existing one.

      This factory method creates a new Dataset instance with selected attributes
      modified, while preserving the rest from the source. Optionally adds a history
      entry documenting the modification. This follows an immutable pattern where
      the source dataset is not modified.

      :param source: The source Dataset to create a modified copy from.
      :param data: New data array. If None, uses source's data.
      :param voxel_size: New voxel size. If None, uses source's voxel_size.
      :param voxel_unit: New voxel unit. If None, uses source's voxel_unit.
      :param dimension_names: New dimension names. If None, uses source's dimension_names.
      :param datatype: New datatype. If None, uses source's datatype.
      :param history_entry: History entry to add documenting the modification.
          If provided, a new history entry is added with the given key.
      :param history_key: Key for the history entry. If None and history_entry is provided,
          auto-generates a timestamp-based key like "20260128_150530_modification".
      :param dataset_id_suffix: Suffix to append to the dataset_id. If provided,
          the new dataset's dataset_id will be ``"{source.dataset_id}_{dataset_id_suffix}"``.
          If source has no dataset_id, this parameter is ignored.
      :return: New Dataset instance with modifications applied.
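
      The ``history_key`` and ``dataset_id_suffix`` rules can be sketched as two small helpers. Both
      ``make_history_key`` and ``derive_dataset_id`` are hypothetical; the key format follows the
      documented example ``"20260128_150530_modification"``, and keeping the source's ``dataset_id``
      unchanged when no suffix is given is an assumption:

      .. code-block:: python

          from datetime import datetime


          def make_history_key(label="modification", when=None):
              """Hypothetical: timestamp-based key in YYYYmmdd_HHMMSS_label form."""
              if when is None:
                  when = datetime.now()
              return f"{when:%Y%m%d_%H%M%S}_{label}"


          def derive_dataset_id(source_id, suffix):
              """Hypothetical sketch of the dataset_id_suffix rule."""
              if suffix is None or source_id is None:
                  return source_id  # no suffix, or no source id to append it to
              return f"{source_id}_{suffix}"

      Passing an explicit ``when`` keeps the key reproducible, e.g. in tests.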

      Example::

          from anu_ctlab_io import Dataset

          ds = Dataset.from_path("data.nc")

          # Create cropped version with automatic history and modified dataset_id
          cropped = Dataset.from_modified(
              ds,
              data=ds.data[10:50, :, :],
              history_entry={"operation": "crop", "z_range": [10, 50]},
              history_key="20260128_crop",
              dataset_id_suffix="cropped"
          )
          # If ds.dataset_id is "20250314_012913_tomoLoRes_SS", then
          # cropped.dataset_id is "20250314_012913_tomoLoRes_SS_cropped"

          # Chain modifications
          scaled = Dataset.from_modified(
              cropped,
              voxel_size=(0.1, 0.1, 0.1),
              history_entry={"operation": "rescale", "new_voxel_size": [0.1, 0.1, 0.1]},
              dataset_id_suffix="scaled"
          )
          # scaled.dataset_id is cropped.dataset_id + "_scaled",
          # e.g. "20250314_012913_tomoLoRes_SS_cropped_scaled"


.. py:class:: DataType(*args, **kwds)

   Bases: :py:obj:`enum.Enum`


   An ``Enum`` representing the datatypes produced by MANGO.

   This is used when parsing metadata to construct a :any:`Dataset`, and generally should not need
   to be constructed by a user (use the :any:`Dataset.from_path` classmethod instead).

   When needed, :any:`DataType`\ s should be constructed via either the :any:`infer_from_path` or
   the :any:`from_basename` classmethods.


   .. py:attribute:: PROJU16
      :value: 'proju16'


   .. py:attribute:: PROJF32
      :value: 'projf32'


   .. py:attribute:: TOMO_FLOAT
      :value: 'tomo_float'


   .. py:attribute:: TOMO
      :value: 'tomo'


   .. py:attribute:: FLOAT16
      :value: 'float16'


   .. py:attribute:: FLOAT64
      :value: 'float64'


   .. py:attribute:: SEGMENTED
      :value: 'segmented'


   .. py:attribute:: DISTANCE_MAP
      :value: 'distance_map'


   .. py:attribute:: LABELS
      :value: 'labels'


   .. py:attribute:: RGBA8
      :value: 'rgba8'


   .. py:property:: is_discrete
      :type: bool


      Whether the :any:`DataType` is discrete.


   .. py:property:: dtype
      :type: numpy.typing.DTypeLike


      The numpy ``dtype`` appropriate for storing data of the :any:`DataType`.

      Because of a historical decision in MANGO, the datatype listed in ANU CTLab NetCDFs is not
      guaranteed to have the correct signedness: for some MANGO datatypes, data recorded in the NetCDF
      as a signed integer type is really an unsigned integer stored in a signed integer.
      The :any:`dtype` is the real datatype of the data, regardless of whether a loaded NetCDF
      exhibits this behaviour (trust this value, not the NetCDF header).
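
      Reading an unsigned value out of a signed integer of the same width is a two's-complement
      reinterpretation of the same bits. The hypothetical ``as_unsigned`` helper below shows the
      arithmetic for a single value and assumes equal widths; for whole NumPy arrays the equivalent is a
      zero-copy view, e.g. ``arr.view(np.uint16)`` on an ``int16`` array:

      .. code-block:: python

          def as_unsigned(stored: int, bits: int) -> int:
              """Reinterpret a two's-complement signed integer as unsigned of the same width."""
              if not -(1 << (bits - 1)) <= stored < (1 << (bits - 1)):
                  raise ValueError(f"{stored} does not fit in a signed {bits}-bit integer")
              return stored % (1 << bits)  # same bit pattern, unsigned reading

      So a value stored as ``-1`` in a 16-bit signed header type is really ``65535``.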


   .. py:property:: mask_value
      :type: StorageDType | None


      The mask value of the :any:`DataType`.

      This value is corrected for signedness if required (see :any:`dtype`\ ).


   .. py:method:: infer_from_path(path)
      :classmethod:


      Create a :any:`DataType` object by inferring it from the path to the data being loaded.

      Relies on MANGO's standardised file naming.

      :rtype: :any:`DataType`


   .. py:method:: from_basename(basename)
      :classmethod:


      Create a :any:`DataType` object from its name as a string.

      E.g., ``DataType.from_basename("tomo")``

      :rtype: :any:`DataType`
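
      The two constructors can be sketched with a plain :py:class:`enum.Enum`. This is a hypothetical
      model, not the library's code: it copies the documented values, assumes ``from_basename`` is a
      lookup by value, and assumes (since MANGO's naming convention is not described here) that a file
      name *starts* with its datatype name. Longer names are tried first so that, for example,
      ``tomo_float...`` is not mistaken for ``tomo``:

      .. code-block:: python

          from enum import Enum
          from pathlib import PurePath


          class DataType(Enum):
              # Values as documented for anu_ctlab_io.DataType.
              PROJU16 = "proju16"
              PROJF32 = "projf32"
              TOMO_FLOAT = "tomo_float"
              TOMO = "tomo"
              FLOAT16 = "float16"
              FLOAT64 = "float64"
              SEGMENTED = "segmented"
              DISTANCE_MAP = "distance_map"
              LABELS = "labels"
              RGBA8 = "rgba8"


          def from_basename(basename: str) -> DataType:
              # Assumption: lookup by value; Enum raises ValueError for unknown names.
              return DataType(basename)


          def infer_from_path(path: str) -> DataType:
              # Hypothetical naming rule: the file name starts with a datatype name.
              name = PurePath(path).name
              for datatype in sorted(DataType, key=lambda d: len(d.value), reverse=True):
                  if name.startswith(datatype.value):
                      return datatype
              raise ValueError(f"no MANGO datatype found in {path!r}")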


.. py:type:: StorageDType
   :canonical: np.uint8 | np.uint16 | np.uint32 | np.uint64 | np.float16 | np.float32 | np.float64


.. py:class:: VoxelUnit(*args, **kwds)

   Bases: :py:obj:`enum.Enum`


   The unit of size of a voxel.


   .. py:attribute:: M
      :value: 'm'


   .. py:attribute:: CM
      :value: 'cm'


   .. py:attribute:: MM
      :value: 'mm'


   .. py:attribute:: UM
      :value: 'um'


   .. py:attribute:: NM
      :value: 'nm'


   .. py:attribute:: ANGSTROM
      :value: 'angstrom'


   .. py:attribute:: VOXEL
      :value: 'voxel'


   .. py:method:: from_str(string)
      :classmethod:


      Create a :any:`VoxelUnit` from the string name of the unit.

      Accepts a wide range of standard representations of each unit, and is case-insensitive.
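
      A case-insensitive parser of this kind is typically a normalised lookup table. In the sketch below
      the alias table and ``voxel_unit_from_str`` are hypothetical; the spellings the library actually
      accepts may differ. ``str.casefold`` is used rather than ``str.lower`` so that, for example, the
      micro sign ``µ`` folds to the Greek letter ``μ``:

      .. code-block:: python

          from enum import Enum


          class VoxelUnit(Enum):
              # Values as documented for anu_ctlab_io.VoxelUnit.
              M = "m"
              CM = "cm"
              MM = "mm"
              UM = "um"
              NM = "nm"
              ANGSTROM = "angstrom"
              VOXEL = "voxel"


          # Hypothetical alias table; the library's accepted spellings may differ.
          _ALIASES = {
              VoxelUnit.M: ("m", "meter", "metre", "meters", "metres"),
              VoxelUnit.CM: ("cm", "centimeter", "centimetre", "centimeters", "centimetres"),
              VoxelUnit.MM: ("mm", "millimeter", "millimetre", "millimeters", "millimetres"),
              VoxelUnit.UM: ("um", "µm", "μm", "micron", "microns", "micrometer", "micrometre"),
              VoxelUnit.NM: ("nm", "nanometer", "nanometre", "nanometers", "nanometres"),
              VoxelUnit.ANGSTROM: ("angstrom", "angstroms", "å"),
              VoxelUnit.VOXEL: ("voxel", "voxels"),
          }
          # Normalised (casefolded) spelling -> unit.
          _LOOKUP = {alias.casefold(): unit for unit, aliases in _ALIASES.items() for alias in aliases}


          def voxel_unit_from_str(text: str) -> VoxelUnit:
              """Hypothetical sketch: case-insensitive lookup of a unit spelling."""
              try:
                  return _LOOKUP[text.strip().casefold()]
              except KeyError:
                  raise ValueError(f"unrecognised voxel unit: {text!r}") from None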


   .. py:method:: to_full_name()

      Return the full unit name for OME-Zarr metadata.

      OME-Zarr specification requires full unit names (e.g., "millimeter")
      rather than abbreviated forms (e.g., "mm").

      :return: Full unit name as string.
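
      A minimal sketch of the mapping, keyed by the :any:`VoxelUnit` value strings. ``to_full_name`` here
      is a hypothetical stand-in; the full names are the spellings used for space units in the OME-Zarr
      (OME-NGFF) specification, and rejecting ``"voxel"`` (which has no physical length) is an
      assumption about how the library handles that case:

      .. code-block:: python

          # VoxelUnit value -> OME-Zarr space unit name.
          _FULL_NAMES = {
              "m": "meter",
              "cm": "centimeter",
              "mm": "millimeter",
              "um": "micrometer",
              "nm": "nanometer",
              "angstrom": "angstrom",
          }


          def to_full_name(unit_value: str) -> str:
              """Hypothetical sketch of VoxelUnit.to_full_name."""
              try:
                  return _FULL_NAMES[unit_value]
              except KeyError:
                  # Assumption: "voxel" has no OME-Zarr spatial unit, so it is rejected.
                  raise ValueError(f"no OME-Zarr unit name for {unit_value!r}") from None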