pyDive.arrays.h5_ndarray module

Note

This module has a shortcut: pyDive.h5.

class pyDive.arrays.h5_ndarray.h5_ndarray(shape, dtype=<type 'float'>, distaxes='all', target_offsets=None, target_ranks=None, no_allocation=False, **kwargs)

Represents a cluster-wide, multidimensional, homogeneous array of fixed-size elements. Cluster-wide means that its elements are distributed across IPython.parallel engines. The distribution is done in one or multiple dimensions along user-specified axes. The user can optionally specify which engine maps to which index range, or keep the default, which pursues a uniform distribution across all engines.
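As a rough illustration of what a uniform distribution along one axis could look like, the sketch below splits an index range into contiguous, near-equal parts. The function name uniform_ranges and the rule that leading engines take the remainder are assumptions made for this example; the text above does not state how pyDive itself divides an axis.

```python
def uniform_ranges(axis_length, num_engines):
    """Split range(axis_length) into num_engines contiguous, near-equal parts.

    Hypothetical helper, not part of pyDive. Returns one (start, stop)
    index range per engine.
    """
    base, remainder = divmod(axis_length, num_engines)
    ranges = []
    start = 0
    for engine in range(num_engines):
        # Assumption: the first `remainder` engines take one extra element.
        stop = start + base + (1 if engine < remainder else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


print(uniform_ranges(10, 4))  # [(0, 3), (3, 6), (6, 8), (8, 10)]
```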

This h5_ndarray class is auto-generated from its local counterpart: pyDive.arrays.local.h5_ndarray.h5_ndarray.

The implementation is based on IPython.parallel and on local pyDive.arrays.local.h5_ndarray.h5_ndarray arrays. Every special operation that pyDive.arrays.local.h5_ndarray.h5_ndarray implements (“__add__”, “__le__”, ...) is also available for h5_ndarray.

Note that array slicing is a cheap operation since no memory is copied. However, this can easily leave you with two arrays of the same size but with different element distributions. Therefore, call dist_like() before working manually on their local arrays. Every cluster-wide array operation first equalizes the distribution of all involved arrays, so an explicit call to dist_like() is rarely needed in practice.
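To see how two equally sized arrays can end up distributed differently, the following pure-Python sketch compares the per-engine sizes of a slice (which leaves the data where it is) with those of a freshly created array of the same length. Both helper functions are hypothetical, and the even split for the fresh array is an assumption; no pyDive calls are involved.

```python
def even_sizes(length, num_engines):
    """Per-engine sizes of a freshly created, evenly split 1D array (assumed rule)."""
    base, remainder = divmod(length, num_engines)
    return [base + (1 if i < remainder else 0) for i in range(num_engines)]


def sliced_sizes(parent_ranges, start, stop):
    """Per-engine sizes of parent[start:stop] when no data is moved."""
    return [max(0, min(hi, stop) - max(lo, start)) for lo, hi in parent_ranges]


# A parent array of 8 elements on 2 engines: engine 0 holds [0, 4), engine 1 holds [4, 8).
parent_ranges = [(0, 4), (4, 8)]

view_sizes = sliced_sizes(parent_ranges, 0, 6)  # slice parent[0:6]
fresh_sizes = even_sizes(6, 2)                  # new array of length 6

print(view_sizes)   # [4, 2]
print(fresh_sizes)  # [3, 3]
```

Both arrays have six elements, but their local parts do not line up, which is the situation dist_like() is meant to resolve before any manual work on local arrays.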

If you try to access an attribute that is only available for the local array, the request is forwarded to an internal local copy of the whole distributed array (see: gather()). This internal copy is created only when it is first needed and is kept until __setitem__ is called, i.e. until the array’s content is modified.
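The caching behaviour described above can be sketched with a small stand-in class. LazyGatherProxy is hypothetical and uses plain Python lists in place of engines; it only illustrates the pattern of forwarding unknown attributes to a lazily gathered copy and discarding that copy on __setitem__, and is not pyDive's implementation.

```python
class LazyGatherProxy:
    """Hypothetical stand-in for a distributed array; not pyDive code."""

    def __init__(self, chunks):
        self._chunks = [list(c) for c in chunks]  # one list per simulated engine
        self._gathered = None                     # cached local copy, built on demand
        self.gather_calls = 0                     # counts how often a copy was built

    def gather(self):
        """Concatenate all chunks into one local list."""
        self.gather_calls += 1
        return [x for chunk in self._chunks for x in chunk]

    def __getattr__(self, name):
        # Called only when normal lookup fails: forward to the local copy.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._gathered is None:
            self._gathered = self.gather()
        return getattr(self._gathered, name)

    def __setitem__(self, index, value):
        # Write into the owning chunk (non-negative indices only) ...
        for chunk in self._chunks:
            if index < len(chunk):
                chunk[index] = value
                break
            index -= len(chunk)
        else:
            raise IndexError("index out of range")
        # ... and drop the cached copy, since it is now stale.
        self._gathered = None


arr = LazyGatherProxy([[1, 2], [3, 4]])
first = arr.count(3)     # list-only attribute: triggers one gather
position = arr.index(4)  # served from the cached copy
arr[0] = 3               # modifies content and invalidates the cache
second = arr.count(3)    # triggers a second gather
```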

__init__(shape, dtype=<type 'float'>, distaxes='all', target_offsets=None, target_ranks=None, no_allocation=False, **kwargs)

Creates an instance of h5_ndarray. This is a low-level way of instantiating an array; prefer the factory functions (“empty”, “zeros”, “open”, ...) instead.

Parameters:
  • shape (ints) – shape of array
  • dtype – datatype of a single element
  • distaxes (ints) – distributed axes. Accepts a single integer too. Defaults to ‘all’ meaning each axis is distributed.
  • target_offsets (list of lists) – For each distributed axis there is an inner list in the outer list. Each inner list contains the offsets of the local arrays along that axis.
  • target_ranks (ints) – linear list of engine ranks holding the local arrays. The last distributed axis is iterated over first.
  • no_allocation (bool) – if True, no instance of pyDive.arrays.local.h5_ndarray.h5_ndarray is created on the engines. Useful for manual instantiation of the local array.
  • kwargs – additional keyword arguments are forwarded to the constructor of the local array.
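The relationship between target_offsets and target_ranks can be sketched as follows. This example assumes that "the last distributed axis is iterated over first" means the last axis varies fastest, which matches the order produced by itertools.product; the concrete offsets and ranks are made up for illustration.

```python
import itertools

# Two distributed axes: 2 parts along the first axis, 3 along the second.
target_offsets = [[0, 4], [0, 3, 6]]

# One engine rank per local array, in linear order.
target_ranks = [0, 1, 2, 3, 4, 5]

# Assumed ordering: the last axis varies fastest, as in itertools.product.
block_offsets = list(itertools.product(*target_offsets))
rank_to_offset = dict(zip(target_ranks, block_offsets))

for rank, offset in rank_to_offset.items():
    print(rank, offset)
```

Under this reading, ranks 0 to 2 hold the blocks starting at offset 0 along the first axis, and ranks 3 to 5 hold those starting at offset 4.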
load()

Load array from file into main memory of all engines in parallel.

Returns: pyDive.ndarray instance
pyDive.arrays.h5_ndarray.open(filename, datapath, distaxes='all')

Create a pyDive.h5.h5_ndarray instance, or a structure of pyDive.h5.h5_ndarray instances, from file.

Parameters:
  • filename – name of the HDF5 file.
  • datapath – path within the HDF5 file to a single dataset or an HDF5 group.
  • distaxes (ints) – distributed axes. Defaults to ‘all’ meaning each axis is distributed.
Returns:

pyDive.h5.h5_ndarray instance / structure of pyDive.h5.h5_ndarray instances

pyDive.arrays.h5_ndarray.open_dset(filename, dataset_path, distaxes='all')

Create a pyDive.h5.h5_ndarray instance from file.

Parameters:
  • filename – name of the HDF5 file.
  • dataset_path – path within the HDF5 file to a single dataset.
  • distaxes (ints) – distributed axes. Defaults to ‘all’ meaning each axis is distributed.
Returns:

pyDive.h5.h5_ndarray instance