pypret.io package

A subpackage that provides Python object persistence in HDF5 files.

It was written to make it easy to store arbitrary nested Python structures in the exchangeable HDF5 format. Its main purpose is to add persistence to existing numerical or data analysis code with little effort.

While the files themselves are plain HDF5 and can be read in any language that supports HDF5, the format is not compatible with Matlab’s own file format. If you need Matlab compatibility, look at the hdf5storage package.

Usage

The module exports a save() function that stores arbitrary structures of Python and NumPy data types. For example:

>>> import numpy as np
>>> from pypret import io
>>> x = {'data': [1, 2, 3], 'xrange': np.arange(5, dtype=np.uint8)}
>>> io.save(x, "test.hdf5")

This function should suffice for most needs as long as only standard types are used. The load() function loads these files and restores the structure and the types of the data:

>>> io.load("test.hdf5")
{'data': [1, 2, 3], 'xrange': array([0, 1, 2, 3, 4], dtype=uint8)}

Custom Objects

If you use objects merely as simple containers without functionality, consider using the SimpleNamespace class from the types module of the standard library. The advantage is that io knows how to handle it:

import numpy as np
from types import SimpleNamespace
from pypret import io

a = SimpleNamespace(name="my object", data=np.arange(5))
a.data2 = np.arange(10)
io.save(a, "namespace.hdf5")

If your objects are containers with methods but without a custom __init__(), the simplest way is to inherit from (or mix in) the IO class:

from pypret import io

class Data(io.IO):
    x = 1

    def squared(self):
        return self.x * self.x

By default, the IO class stores and loads all instance attributes. More control is possible by setting the following _io_* class attributes on your custom class:

_io_store : list of str or None, optional
Names of the instance attributes that are stored exclusively. Acts as a whitelist. If None, all instance attributes are stored. Default is None.
_io_store_not : list of str or None, optional
Names of the instance attributes that are not stored. Acts as a blacklist. If None, no blacklisting is done.

To add an attribute to the set of stored attributes, call the _io_add_to_storage(key) method on your instance. The IO class initializes loaded instances without calling __init__(). Instead, it calls __new__() on the class and afterwards the _post_init() method, which subclasses can implement. A fully working example of a class is the following (reduced from pypret.fourier.FourierTransform):

import numpy as np
from pypret import io

class Grid(io.IO):
    _io_store = ['N', 'dx', 'x0']

    def __init__(self, N, dx, x0=0.0):
        # This is _not_ called upon loading from storage
        self.N = N
        self.dx = dx
        self.x0 = x0
        self._post_init()

    def _post_init(self):
        # this is called upon loading from storage
        # calculate the grids
        n = np.arange(self.N)
        self.x = self.x0 + n * self.dx

In this example, loading reproduces the object exactly, while only the three parameters N, dx and x0 are stored; the grid x is recomputed by _post_init().
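The loading path described above (create the instance with __new__(), restore the stored attributes, then call _post_init()) can be mimicked without pypret. In this sketch restore() is a hypothetical helper standing in for what IO does internally, the stored dictionary stands in for the HDF5 file, and the init_calls counter is added only to show that __init__() is skipped. It is not pypret's implementation.

```python
import numpy as np


class Grid:
    _io_store = ["N", "dx", "x0"]
    init_calls = 0  # illustration only: counts __init__() calls

    def __init__(self, N, dx, x0=0.0):
        Grid.init_calls += 1
        self.N = N
        self.dx = dx
        self.x0 = x0
        self._post_init()

    def _post_init(self):
        # recompute the derived grid from the stored parameters
        n = np.arange(self.N)
        self.x = self.x0 + n * self.dx


def restore(cls, stored):
    """Hypothetical loader: rebuild an object without calling __init__()."""
    obj = cls.__new__(cls)          # allocate the instance, skip __init__()
    for key, value in stored.items():
        setattr(obj, key, value)    # restore the stored attributes
    obj._post_init()                # rebuild everything else
    return obj


g = Grid(5, 0.5, x0=1.0)
stored = {k: getattr(g, k) for k in Grid._io_store}  # what would be written
h = restore(Grid, stored)
print(h.x, Grid.init_calls)
```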

To implement your own storage interface for a custom object, inherit from IO and implement your own to_dict() and from_dict() methods. The default implementations in IO show their expected behavior.
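A minimal sketch of such a pair, assuming to_dict() returns a plain dictionary of attributes and from_dict() is a classmethod that builds a new instance from it (as the from_dict(attrs) signature in the interface listing suggests). Gaussian is a hypothetical class; in pypret it would inherit from io.IO, and the exact dictionary format IO expects should be checked against its default implementation.

```python
import math


class Gaussian:
    """Hypothetical example class; in pypret it would inherit from io.IO."""

    def __init__(self, center, width):
        self.center = center
        self.width = width
        # derived attribute, cheap to recompute and therefore not stored
        self.norm = 1.0 / (width * math.sqrt(2.0 * math.pi))

    def to_dict(self):
        # store only the defining parameters
        return {"center": self.center, "width": self.width}

    @classmethod
    def from_dict(cls, attrs):
        # rebuild through __init__() so derived attributes are recomputed
        return cls(**attrs)


original = Gaussian(0.5, 2.0)
restored = Gaussian.from_dict(original.to_dict())
print(restored.center, restored.width, restored.norm == original.norm)
```

Routing from_dict() through __init__() is only one design choice; the default IO loader instead skips __init__() and relies on _post_init().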

File Format

The file format used by this module is a straightforward mapping of Python types to the HDF5 data structure. Dictionaries and objects are mapped to HDF5 groups; NumPy arrays use h5py’s type translation. Iterables are converted to groups by introducing artificial keys of the form idx_%d. This is inefficient, so large numerical arrays should be stored as NumPy arrays rather than as Python lists. The type information is stored in an HDF5 attribute __class__. Additionally, scalars carry the attribute __dtype__ and strings the attribute __encoding__.
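The mapping can be visualized with a small pure-Python sketch that lists the HDF5 paths a nested value would produce. layout() and type_name() are hypothetical helpers, not pypret code; the sketch assumes 0-based idx_%d numbering and treats the root value as a group. The __class__ strings follow the "module.name" pattern visible in the TypeHandler casting table (for example 'builtins.list').

```python
# Illustrative sketch (not pypret's code): dicts and iterables become
# groups, list items get artificial idx_%d keys, everything else becomes
# a dataset, and every node records its type as "module.name".


def type_name(val):
    t = type(val)
    return f"{t.__module__}.{t.__name__}"


def layout(val, path=""):
    """Map a nested value to {hdf5_path: (node_kind, __class__ value)}."""
    nodes = {}
    name = path or "/"
    if isinstance(val, dict):
        nodes[name] = ("group", type_name(val))
        for key, item in val.items():
            nodes.update(layout(item, f"{path}/{key}"))
    elif isinstance(val, (list, tuple)):
        nodes[name] = ("group", type_name(val))
        for i, item in enumerate(val):
            nodes.update(layout(item, f"{path}/idx_{i}"))
    else:
        nodes[name] = ("dataset", type_name(val))
    return nodes


for p, info in layout({"data": [1, 2], "label": "x"}).items():
    print(p, info)
```

Every element of a list becomes its own node, which is why large numerical lists are expensive in this format.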

Consequently, nested structures of Python types stored with this package are not well suited for data exchange, since their layout depends on these package-specific attributes. Dictionaries of numerical data, on the other hand, can easily be opened with any program that supports HDF5.

Public interface

class pypret.io.options.HDF5Options

A class that handles the correct HDF5 options for different data sets.

This is necessary because native HDF5 compression actually increases the file size for small arrays (< 300 bytes). The class selects HDF5 options per dataset through its __call__ method. It can be subclassed to support more sophisticated selection strategies.
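A minimal sketch of size-dependent option selection in the spirit of HDF5Options. SizeBasedOptions, its parameters, and the assumption that __call__ receives the data and returns keyword arguments are all hypothetical. The returned keys (compression, compression_opts, shuffle) are real keyword arguments of h5py's create_dataset(), so the result could be passed on as group.create_dataset(name, data=arr, **opts(arr)).

```python
import numpy as np


class SizeBasedOptions:
    """Hypothetical options selector: compress only larger arrays."""

    def __init__(self, min_bytes=300, level=4):
        self.min_bytes = min_bytes  # below this, compression tends to grow the file
        self.level = level          # gzip level, 0 to 9

    def __call__(self, data):
        data = np.asarray(data)
        if data.nbytes < self.min_bytes:
            return {}  # small dataset: no filters at all
        return {"compression": "gzip",
                "compression_opts": self.level,
                "shuffle": True}


opts = SizeBasedOptions()
print(opts(np.zeros(10)))    # 80 bytes
print(opts(np.zeros(1000)))  # 8000 bytes
```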

__init__()

Initialize self. See help(type(self)) for accurate signature.

copy()
pypret.io.save(val, path, archive=False, options=<pypret.io.options.HDF5Options object>)

Saves an object in an HDF5 file.

Parameters:
  • val (object) – Any Python value that is made up of storable instances: built-in types, NumPy data types and types with custom handlers.
  • path (str or Path instance) – Save path of the HDF5 file. Existing files will be overwritten!
  • archive (bool, optional) – If True, the whole HDF5 file is compressed. This is useful when dealing with (many) small HDF5 files, as those contain significant overhead.
  • options (HDF5Options instance, optional) – The HDF5 options that will be used for saving. Defaults to the global options instance DEFAULT_OPTIONS.
pypret.io.load(path, obj=None, archive=None)

Reads a possibly compressed HDF5 file.

If archive is None, python-magic is used to detect whether the file is compressed.
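pypret relies on python-magic for this detection. As a swapped-in, dependency-free alternative, the sketch below checks the 8-byte HDF5 format signature at the start of the file. It assumes the file has no user block (HDF5 also allows the signature at offsets 512, 1024, 2048, ...), and gzip serves only as a stand-in for whatever compression archive=True actually applies; is_plain_hdf5() is a hypothetical name.

```python
import gzip
import os
import tempfile

HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"  # HDF5 format signature


def is_plain_hdf5(path):
    """True if the file starts with the HDF5 signature (no user block assumed)."""
    with open(path, "rb") as f:
        return f.read(len(HDF5_SIGNATURE)) == HDF5_SIGNATURE


with tempfile.TemporaryDirectory() as tmp:
    plain = os.path.join(tmp, "plain.hdf5")
    packed = os.path.join(tmp, "packed.hdf5.gz")
    payload = HDF5_SIGNATURE + bytes(16)  # fake HDF5 header for the demo
    with open(plain, "wb") as f:
        f.write(payload)
    with gzip.open(packed, "wb") as f:  # stand-in for a compressed archive
        f.write(payload)
    results = {"plain": is_plain_hdf5(plain), "packed": is_plain_hdf5(packed)}

print(results)
```

With such a check, a loader could treat any file that fails it as an archive, which is roughly the role python-magic plays when archive is None.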

class pypret.io.IO

Provides an interface for saving to and loading from an HDF5 file.

This class can be mixed in to easily add persistence to your existing Python classes. By default, all attributes of an object are stored. Upon loading, these attributes are restored and __init__ is not called.

Often a better way is to store only the necessary attributes by listing their names in the private attribute _io_store. Then you have to override the _post_init() method, which initializes your object from these stored attributes. This method is usually also called at the end of the original __init__, so it should not mean extra effort.

Lastly, you can override load_from_dict to implement a completely custom loader.

classmethod from_dict(attrs)
classmethod load(path)
classmethod load_from_group(group)
save(path, archive=False, options=<pypret.io.options.HDF5Options object>)
save_to_group(g, name)
to_dict()
update(path)
update_from_dict(attrs)
update_from_group(group)

Custom handlers

Implements functions that handle the serialization of types and classes.

Type handlers store and load objects of exactly that type. Instance handlers also work for subclasses of that type.

The instance handlers are processed in the order in which they are stored. This means that if an object is an instance of several handled classes, no error is raised; the object is handled by the first matching handler in the OrderedDict.
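These dispatch rules can be sketched with a plain dict for type handlers and an OrderedDict for instance handlers. All names here (register_type, register_instance, find_handler and the example classes) are hypothetical, and the handlers are plain strings for brevity; the sketch only mirrors the exact-type and first-match behavior described above.

```python
from collections import OrderedDict

type_handlers = {}                # exact type -> handler
instance_handlers = OrderedDict()  # parent class -> handler, in insertion order


def register_type(t, handler):
    type_handlers[t] = handler


def register_instance(t, handler):
    instance_handlers[t] = handler


def find_handler(val):
    handler = type_handlers.get(type(val))  # exact type match only
    if handler is not None:
        return handler
    for cls, handler in instance_handlers.items():
        if isinstance(val, cls):  # first matching parent class wins
            return handler
    raise TypeError(f"no handler for {type(val).__name__}")


class Base:
    pass


class Child(Base):
    pass


class Other:
    pass


class Both(Child, Other):
    pass


register_type(int, "int handler")
register_instance(Base, "Base handler")
register_instance(Other, "Other handler")

print(find_handler(3), find_handler(Both()), find_handler(Other()))
```

Both() matches Base and Other, but no error is raised: Base was stored first, so its handler wins.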

class pypret.io.handlers.Handler
classmethod create_dataset(data, level, name, **kwargs)
classmethod create_group(level, name, options)
classmethod get_type(level)
classmethod is_dataset()
classmethod is_group()
level_type = 'dataset'
classmethod load_from_level(level, obj=None)

The loader that has to be implemented by subclasses.

classmethod save_to_level(val, level, options, name)

A generic wrapper around the custom save method that each handler implements. It creates a dataset or a group depending on the level_type class attribute and sets the __class__ attribute correctly. For more flexibility, subclasses can override this method.
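The template-method idea behind save_to_level() can be sketched with nested dictionaries standing in for h5py groups. Handler.save() is an assumed name for the per-handler hook, the options argument is omitted, and ComplexHandler and RangeHandler are hypothetical; the sketch is not pypret's code.

```python
class Handler:
    level_type = "dataset"  # or "group"

    @classmethod
    def save(cls, val):
        # hook that each concrete handler implements (assumed name)
        raise NotImplementedError

    @classmethod
    def save_to_level(cls, val, level, name):
        # create a dataset or a group, depending on level_type
        node = {"kind": cls.level_type, "attrs": {}}
        if cls.level_type == "dataset":
            node["data"] = cls.save(val)
        else:
            node["members"] = cls.save(val)
        # record the type so a loader can pick the right handler later
        t = type(val)
        node["attrs"]["__class__"] = f"{t.__module__}.{t.__name__}"
        level[name] = node
        return node


class ComplexHandler(Handler):
    @classmethod
    def save(cls, val):
        return [val.real, val.imag]


class RangeHandler(Handler):
    level_type = "group"

    @classmethod
    def save(cls, val):
        return {"start": val.start, "stop": val.stop, "step": val.step}


root = {}  # stands in for an h5py group
ComplexHandler.save_to_level(1 + 2j, root, "z")
RangeHandler.save_to_level(range(0, 10, 2), root, "r")
print(root)
```

Subclasses only decide what to write; creating the node and tagging it with __class__ stays in one place, which is what makes overriding save_to_level() the exception rather than the rule.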

class pypret.io.handlers.TypeHandler

Handles data of a specific type or class.

casting = {'builtins.NoneType': <class 'NoneType'>, 'builtins.bool': <class 'bool'>, 'builtins.bytes': <class 'bytes'>, 'builtins.complex': <class 'complex'>, 'builtins.dict': <class 'dict'>, 'builtins.float': <class 'float'>, 'builtins.int': <class 'int'>, 'builtins.list': <class 'list'>, 'builtins.str': <class 'str'>, 'builtins.tuple': <class 'tuple'>, 'numpy.bool_': <class 'numpy.bool_'>, 'numpy.complex128': <class 'numpy.complex128'>, 'numpy.complex64': <class 'numpy.complex64'>, 'numpy.datetime64': <class 'numpy.datetime64'>, 'numpy.float16': <class 'numpy.float16'>, 'numpy.float32': <class 'numpy.float32'>, 'numpy.float64': <class 'numpy.float64'>, 'numpy.int16': <class 'numpy.int16'>, 'numpy.int32': <class 'numpy.int32'>, 'numpy.int64': <class 'numpy.int64'>, 'numpy.int8': <class 'numpy.int8'>, 'numpy.ndarray': <class 'numpy.ndarray'>, 'numpy.timedelta64': <class 'numpy.timedelta64'>, 'numpy.uint16': <class 'numpy.uint16'>, 'numpy.uint32': <class 'numpy.uint32'>, 'numpy.uint64': <class 'numpy.uint64'>, 'numpy.uint8': <class 'numpy.uint8'>, 'types.SimpleNamespace': <class 'types.SimpleNamespace'>}
classmethod register(t)
types = []
class pypret.io.handlers.InstanceHandler

Handles all instances of a specific (parent) class.

If an object is an instance of several classes for which a handler exists, no error is raised (in contrast to TypeHandler). Rather, the first match in the global instance_saver_handlers OrderedDict is used.

casting = {'pypret.fourier.FourierTransform': <class 'pypret.fourier.FourierTransform'>, 'pypret.fourier.FourierTransformBase': <class 'pypret.fourier.FourierTransformBase'>, 'pypret.io.io.IO': <class 'pypret.io.io.IO'>, 'pypret.material.BaseMaterial': <class 'pypret.material.BaseMaterial'>, 'pypret.material.SellmeierF1': <class 'pypret.material.SellmeierF1'>, 'pypret.material.SellmeierF2': <class 'pypret.material.SellmeierF2'>, 'pypret.mesh_data.MeshData': <class 'pypret.mesh_data.MeshData'>, 'pypret.pnps.BasePNPS': <class 'pypret.pnps.BasePNPS'>, 'pypret.pnps.CollinearPNPS': <class 'pypret.pnps.CollinearPNPS'>, 'pypret.pnps.DSCAN': <class 'pypret.pnps.DSCAN'>, 'pypret.pnps.FROG': <class 'pypret.pnps.FROG'>, 'pypret.pnps.IFROG': <class 'pypret.pnps.IFROG'>, 'pypret.pnps.MIIPS': <class 'pypret.pnps.MIIPS'>, 'pypret.pnps.NoncollinearPNPS': <class 'pypret.pnps.NoncollinearPNPS'>, 'pypret.pnps.TDP': <class 'pypret.pnps.TDP'>, 'pypret.pulse.Pulse': <class 'pypret.pulse.Pulse'>, 'pypret.retrieval.nlo_retriever.BFGSRetriever': <class 'pypret.retrieval.nlo_retriever.BFGSRetriever'>, 'pypret.retrieval.nlo_retriever.DERetriever': <class 'pypret.retrieval.nlo_retriever.DERetriever'>, 'pypret.retrieval.nlo_retriever.LMRetriever': <class 'pypret.retrieval.nlo_retriever.LMRetriever'>, 'pypret.retrieval.nlo_retriever.NLORetriever': <class 'pypret.retrieval.nlo_retriever.NLORetriever'>, 'pypret.retrieval.nlo_retriever.NMRetriever': <class 'pypret.retrieval.nlo_retriever.NMRetriever'>, 'pypret.retrieval.retriever.BaseRetriever': <class 'pypret.retrieval.retriever.BaseRetriever'>, 'pypret.retrieval.step_retriever.COPRARetriever': <class 'pypret.retrieval.step_retriever.COPRARetriever'>, 'pypret.retrieval.step_retriever.GPARetriever': <class 'pypret.retrieval.step_retriever.GPARetriever'>, 'pypret.retrieval.step_retriever.GPDSCANRetriever': <class 'pypret.retrieval.step_retriever.GPDSCANRetriever'>, 'pypret.retrieval.step_retriever.PCGPARetriever': <class 'pypret.retrieval.step_retriever.PCGPARetriever'>, 'pypret.retrieval.step_retriever.PIERetriever': <class 'pypret.retrieval.step_retriever.PIERetriever'>, 'pypret.retrieval.step_retriever.StepRetriever': <class 'pypret.retrieval.step_retriever.StepRetriever'>}
instances = []
classmethod register(t)