Serializer

The serializer is the core component that converts a Gordo config file into Python objects, which together form a full ML model capable of being served on Kubernetes.

Keys such as dataset and model within the YAML config represent objects that the serializer (de)serializes to achieve this.

gordo.serializer.serializer.dump(obj: object, dest_dir: Union[os.PathLike, str], metadata: dict = None)

Serialize an object into a directory. The object must be pickle-able.

Parameters
  • obj – The object to dump. Must be pickle-able.

  • dest_dir (Union[os.PathLike, str]) – The directory in which to save the model.

  • metadata (Optional[dict]) – Any additional metadata to serialize to a file alongside the model. If present, it can be loaded again with load_metadata().

Return type

None

Example

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>> from tempfile import TemporaryDirectory
>>> pipe = Pipeline([
...     ('pca', PCA(3)),
...     ('model', KerasAutoEncoder(kind='feedforward_hourglass'))])
>>> with TemporaryDirectory() as tmp:
...     serializer.dump(obj=pipe, dest_dir=tmp)
...     pipe_clone = serializer.load(source_dir=tmp)
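
As a rough mental model of the directory round trip above, the following stdlib-only sketch pickles an object into a directory and writes the optional metadata as JSON next to it. It assumes a pickle-based layout; the helper names sketch_dump and sketch_load and the file name model.pkl are hypothetical, and only metadata.json is named elsewhere in this documentation. It is not gordo's implementation.

```python
import json
import os
import pickle
import tempfile
from typing import Any, Optional, Union

# Hypothetical file name for the pickled object; gordo's real layout may differ.
MODEL_FILE = "model.pkl"
METADATA_FILE = "metadata.json"


def sketch_dump(obj: Any, dest_dir: Union[os.PathLike, str], metadata: Optional[dict] = None) -> None:
    """Pickle obj into dest_dir and, if given, write metadata next to it as JSON."""
    os.makedirs(dest_dir, exist_ok=True)
    with open(os.path.join(dest_dir, MODEL_FILE), "wb") as f:
        pickle.dump(obj, f)
    if metadata is not None:
        with open(os.path.join(dest_dir, METADATA_FILE), "w") as f:
            # default=str turns values JSON cannot encode into strings.
            json.dump(metadata, f, default=str)


def sketch_load(source_dir: Union[os.PathLike, str]) -> Any:
    """Unpickle the object previously written by sketch_dump."""
    with open(os.path.join(source_dir, MODEL_FILE), "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    original = {"kind": "feedforward_hourglass", "n_components": 3}
    sketch_dump(original, tmp, metadata={"name": "example-machine"})
    clone = sketch_load(tmp)
    with open(os.path.join(tmp, METADATA_FILE)) as f:
        saved_metadata = json.load(f)
```

Because everything goes through pickle, the same "must be pickle-able" restriction stated above applies to the sketch.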

gordo.serializer.serializer.dumps(model: Union[sklearn.pipeline.Pipeline, gordo.machine.model.base.GordoBase]) → bytes

Dump a model into a bytes representation that can be loaded with gordo.serializer.loads.

Parameters

model (Union[Pipeline, GordoBase]) – A gordo model/pipeline

Returns

Serialized model which supports loading via serializer.loads()

Return type

bytes

Example

>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>>
>>> model = KerasAutoEncoder('feedforward_symmetric')
>>> serialized = serializer.dumps(model)
>>> assert isinstance(serialized, bytes)
>>>
>>> model_clone = serializer.loads(serialized)
>>> assert isinstance(model_clone, KerasAutoEncoder)
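
The bytes round trip can be pictured with the sketch below, assuming dumps and loads are thin wrappers around pickle (an assumption, not confirmed here). The helpers sketch_dumps and sketch_loads are hypothetical, and a types.SimpleNamespace stands in for KerasAutoEncoder so the example runs without gordo or Keras installed.

```python
import pickle
from types import SimpleNamespace
from typing import Any


def sketch_dumps(model: Any) -> bytes:
    """Serialize a model into bytes (assumed pickle-based)."""
    return pickle.dumps(model)


def sketch_loads(bytes_object: bytes) -> Any:
    """Rebuild a model from bytes produced by sketch_dumps."""
    return pickle.loads(bytes_object)


# Stand-in for a gordo model such as KerasAutoEncoder('feedforward_symmetric').
model = SimpleNamespace(kind="feedforward_symmetric")
serialized = sketch_dumps(model)
assert isinstance(serialized, bytes)

model_clone = sketch_loads(serialized)
assert isinstance(model_clone, SimpleNamespace)
assert model_clone.kind == model.kind
```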

gordo.serializer.serializer.load(source_dir: Union[os.PathLike, str]) → Any

Load an object from a directory saved by gordo.serializer.pipeline_serializer.dump.

This takes a directory that is either top-level, meaning it contains a subdirectory following the naming scheme “n_step=<int>-class=<path.to.Class>”, or a directory that itself follows that naming scheme. Returns the deserialized object.

Parameters

source_dir (Union[os.PathLike, str]) – Location of the top-level directory where the pipeline was saved.

Return type

Union[GordoBase, Pipeline, BaseEstimator]
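
The "top-level or step directory" rule described above can be sketched as follows. This assumes step directories are named exactly n_step=<int>-class=<path.to.Class> and that, when several exist, the one with the lowest step number is wanted; the helper resolve_step_dir and the lowest-step choice are hypothetical and are not taken from gordo.

```python
import os
import re
import tempfile

# Matches names such as "n_step=000-class=sklearn.pipeline.Pipeline".
STEP_DIR_PATTERN = re.compile(r"^n_step=(\d+)-class=([\w.]+)$")


def resolve_step_dir(source_dir: str) -> str:
    """Return the step directory, given either the top-level dir or the step dir itself."""
    name = os.path.basename(os.path.normpath(source_dir))
    if STEP_DIR_PATTERN.match(name):
        return source_dir  # Already pointing at a step directory.

    step_dirs = []
    for entry in os.listdir(source_dir):
        match = STEP_DIR_PATTERN.match(entry)
        if match and os.path.isdir(os.path.join(source_dir, entry)):
            step_dirs.append((int(match.group(1)), entry))
    if not step_dirs:
        raise FileNotFoundError(f"No n_step=<int>-class=<path> directory in {source_dir!r}")
    _, first = min(step_dirs)  # Lowest step number wins (assumption).
    return os.path.join(source_dir, first)


with tempfile.TemporaryDirectory() as top:
    step = os.path.join(top, "n_step=000-class=sklearn.pipeline.Pipeline")
    os.makedirs(step)
    from_top = resolve_step_dir(top)
    from_step = resolve_step_dir(step)
```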

gordo.serializer.serializer.load_metadata(source_dir: Union[os.PathLike, str]) → dict

Load the metadata.json file saved by serializer.dump and return its contents as a dict.

Parameters

source_dir (Union[os.PathLike, str]) – Directory of the saved model. As with serializer.load(source_dir), this can be either the top-level directory or the first directory inside the serialized model.

Return type

dict

Raises

FileNotFoundError – If a ‘metadata.json’ file isn’t found in or above the supplied source_dir
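
The "in or above" lookup can be sketched like this, assuming "above" means exactly one parent directory, which would cover both accepted forms of source_dir. The helper sketch_load_metadata is hypothetical and only illustrates the search order.

```python
import json
import os
import tempfile
from typing import Union


def sketch_load_metadata(source_dir: Union[os.PathLike, str]) -> dict:
    """Load metadata.json from source_dir or, failing that, its parent directory."""
    candidates = [
        os.path.join(source_dir, "metadata.json"),
        os.path.join(source_dir, os.pardir, "metadata.json"),
    ]
    for path in candidates:
        if os.path.exists(path):
            with open(path) as f:
                return json.load(f)
    raise FileNotFoundError(f"metadata.json not found in or above {source_dir!r}")


with tempfile.TemporaryDirectory() as top:
    with open(os.path.join(top, "metadata.json"), "w") as f:
        json.dump({"name": "example-machine"}, f)
    step = os.path.join(top, "n_step=000-class=sklearn.pipeline.Pipeline")
    os.makedirs(step)
    from_top = sketch_load_metadata(top)
    from_step = sketch_load_metadata(step)  # Found one level up.
```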

gordo.serializer.serializer.loads(bytes_object: bytes) → gordo.machine.model.base.GordoBase

Load a GordoBase model from bytes produced by gordo.serializer.dumps.

Parameters

bytes_object (bytes) – Bytes to be loaded, should be the result of serializer.dumps(model)

Returns

A custom gordo model, scikit-learn pipeline, or other scikit-learn-like object.

Return type

Union[GordoBase, Pipeline, BaseEstimator]

From Definition

The ability to take a ‘raw’ representation of an object in dict form and load it into a Python object.

gordo.serializer.from_definition.from_definition(pipe_definition: Union[str, Dict[str, Dict[str, Any]]]) → Union[sklearn.pipeline.FeatureUnion, sklearn.pipeline.Pipeline]

Construct a Pipeline or FeatureUnion from a definition.

Example

>>> import yaml
>>> from gordo import serializer
>>> raw_config = '''
... sklearn.pipeline.Pipeline:
...         steps:
...             - sklearn.decomposition.PCA:
...                 n_components: 3
...             - sklearn.pipeline.FeatureUnion:
...                 - sklearn.decomposition.PCA:
...                     n_components: 3
...                 - sklearn.pipeline.Pipeline:
...                     - sklearn.preprocessing.MinMaxScaler
...                     - sklearn.decomposition.TruncatedSVD:
...                         n_components: 2
...             - sklearn.ensemble.RandomForestClassifier:
...                 max_depth: 3
... '''
>>> config = yaml.safe_load(raw_config)
>>> scikit_learn_pipeline = serializer.from_definition(config)

Parameters
  • pipe_definition – List of steps for the Pipeline / FeatureUnion

  • constructor_class – The class to place the list of transformers into: either sklearn.pipeline.Pipeline or sklearn.pipeline.FeatureUnion.

Returns

pipeline

Return type

sklearn.pipeline.Pipeline
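
To make the recursive idea concrete, here is a stdlib-only sketch that builds objects from nested {dotted.path: params} definitions. It uses standard-library classes instead of scikit-learn so it runs anywhere. Its rules are simplifying assumptions, not gordo's actual dispatch logic: a list value is passed as a single positional argument, a dict value as keyword arguments, and bare strings are left untouched (so a bare class name such as sklearn.preprocessing.MinMaxScaler in the example above is not handled). The helpers locate, is_definition and build are hypothetical.

```python
import importlib
from typing import Any


def locate(import_path: str) -> Any:
    """Resolve 'package.module.Name' to the object it names."""
    module_path, _, name = import_path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)


def is_definition(value: Any) -> bool:
    """A definition is a one-key dict whose key looks like a dotted import path."""
    if not (isinstance(value, dict) and len(value) == 1):
        return False
    key = next(iter(value))
    return isinstance(key, str) and "." in key


def build(value: Any) -> Any:
    """Recursively turn definitions into objects; other values pass through."""
    if is_definition(value):
        (import_path, params), = value.items()
        cls = locate(import_path)
        if params is None:
            return cls()
        if isinstance(params, list):
            # e.g. a list of steps handed to a container class
            return cls([build(item) for item in params])
        return cls(**{key: build(val) for key, val in params.items()})
    if isinstance(value, list):
        return [build(item) for item in value]
    return value


definition = {
    "collections.deque": [
        {"fractions.Fraction": {"numerator": 1, "denominator": 3}},
        {"types.SimpleNamespace": {"step": {"datetime.timedelta": {"hours": 2}}, "label": "auto"}},
    ]
}
result = build(definition)
```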

gordo.serializer.from_definition.import_locate(import_path: str) → Any
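
This entry has no description. Judging only by its name and signature, import_locate presumably resolves a dotted import path to the Python object it names. Assuming that, one common approach is to import the longest importable module prefix and then walk the remaining attributes. The function sketch_import_locate below is hypothetical and is not gordo's code.

```python
import importlib
from typing import Any


def sketch_import_locate(import_path: str) -> Any:
    """Import the longest module prefix of import_path, then getattr the remainder."""
    parts = import_path.split(".")
    for split in range(len(parts), 0, -1):
        module_name = ".".join(parts[:split])
        try:
            obj = importlib.import_module(module_name)
        except ImportError:
            continue  # Not a module; try a shorter prefix.
        for attr in parts[split:]:
            obj = getattr(obj, attr)  # AttributeError if the name does not exist.
        return obj
    raise ImportError(f"Could not import anything from {import_path!r}")
```

Walking attributes after the module prefix lets the same function handle nested names such as classmethods, not only top-level classes.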

gordo.serializer.from_definition.load_params_from_definition(definition: dict) → dict

Deserialize each value in a dictionary. This can be used to prepare kwargs for methods.

Parameters

definition (dict) –
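
A minimal sketch of the kwargs use case, assuming that any value shaped like {dotted.path: kwargs} should be replaced by the object it describes while all other values are kept as they are. The helpers _locate and sketch_load_params and the target function schedule are hypothetical; the sketch only handles keyword-style definitions, not lists of steps.

```python
import importlib
from typing import Any


def _locate(import_path: str) -> Any:
    """Resolve 'package.module.Name' to the object it names."""
    module_path, _, name = import_path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)


def sketch_load_params(definition: dict) -> dict:
    """Replace every {dotted.path: kwargs} value with the object it describes."""
    params = {}
    for key, value in definition.items():
        if isinstance(value, dict) and len(value) == 1:
            (path, kwargs), = value.items()
            if isinstance(path, str) and "." in path:
                params[key] = _locate(path)(**sketch_load_params(kwargs or {}))
                continue
        params[key] = value
    return params


def schedule(delta, label):
    """Hypothetical method that receives the prepared kwargs."""
    return f"{label}: every {delta.total_seconds():.0f} s"


kwargs = sketch_load_params({"delta": {"datetime.timedelta": {"minutes": 5}}, "label": "auto"})
summary = schedule(**kwargs)
```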

Into Definition

The ability to take a Python object, such as a scikit-learn pipeline and convert it into a primitive dict, which can then be inserted into a YAML config file.

gordo.serializer.into_definition.into_definition(pipeline: sklearn.pipeline.Pipeline, prune_default_params: bool = False) → dict

Convert an instance of sklearn.pipeline.Pipeline into a dict definition that can be reconstructed with gordo.serializer.from_definition.

Parameters
  • pipeline (sklearn.pipeline.Pipeline) – Instance of pipeline to decompose

  • prune_default_params (bool) – Whether to prune parameters whose values in the current transformer instances equal those transformers' default values.

Returns

Definitions for the pipeline, which can be reconstructed with gordo.serializer.from_definition().

Return type

dict

Example

>>> import yaml
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo.serializer.into_definition import into_definition
>>>
>>> pipe = Pipeline([('pca', PCA(4)), ('ae', KerasAutoEncoder(kind='feedforward_model'))])
>>> pipe_definition = into_definition(pipe)  # It is now a standard python dict of primitives.
>>> print(yaml.dump(pipe_definition))
sklearn.pipeline.Pipeline:
  memory: null
  steps:
  - sklearn.decomposition._pca.PCA:
      copy: true
      iterated_power: auto
      n_components: 4
      random_state: null
      svd_solver: auto
      tol: 0.0
      whiten: false
  - gordo.machine.model.models.KerasAutoEncoder:
      kind: feedforward_model
  verbose: false
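
The single-object case of this conversion, including prune_default_params, can be sketched with the standard library alone. The sketch assumes the object exposes a scikit-learn style get_params(deep=...) and that "default" means the default declared in its __init__ signature; the class Scaler and the function sketch_into_definition are hypothetical, and unlike the real function the sketch does not recurse into nested steps.

```python
import inspect
from typing import Any, Dict


class Scaler:
    """Hypothetical estimator-like class exposing a scikit-learn style get_params()."""

    def __init__(self, feature_range=(0, 1), copy=True):
        self.feature_range = feature_range
        self.copy = copy

    def get_params(self, deep: bool = False) -> Dict[str, Any]:
        return {"feature_range": self.feature_range, "copy": self.copy}


def sketch_into_definition(obj: Any, prune_default_params: bool = False) -> Dict[str, Any]:
    """Return {dotted.import.path: params} for a single estimator-like object."""
    cls = type(obj)
    import_path = f"{cls.__module__}.{cls.__qualname__}"
    params = obj.get_params(deep=False)
    if prune_default_params:
        # Defaults declared in __init__; 'self' has no default and is skipped.
        defaults = {
            name: p.default
            for name, p in inspect.signature(cls.__init__).parameters.items()
            if p.default is not inspect.Parameter.empty
        }
        params = {k: v for k, v in params.items() if k not in defaults or defaults[k] != v}
    return {import_path: params}


full = sketch_into_definition(Scaler(copy=False))
pruned = sketch_into_definition(Scaler(copy=False), prune_default_params=True)
```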

gordo.serializer.into_definition.load_definition_from_params(params: dict) → dict

Recursively decompose each value in params into its definition.

Parameters

params (dict) –

Return type

dict
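
The recursive decomposition can be sketched as below, assuming any value with a scikit-learn style get_params() is turned into a {dotted.path: params} entry, lists and tuples are walked element-wise (and emitted as lists, which YAML handles cleanly), and everything else is kept as-is. The class Step and the helpers _decompose and sketch_load_definition_from_params are hypothetical.

```python
from typing import Any, Dict


class Step:
    """Hypothetical estimator-like class with a scikit-learn style get_params()."""

    def __init__(self, n_components: int = 2, inner: Any = None):
        self.n_components = n_components
        self.inner = inner

    def get_params(self, deep: bool = False) -> Dict[str, Any]:
        return {"n_components": self.n_components, "inner": self.inner}


def _decompose(value: Any) -> Any:
    """Turn one value into its definition form."""
    if hasattr(value, "get_params"):
        cls = type(value)
        path = f"{cls.__module__}.{cls.__qualname__}"
        return {path: sketch_load_definition_from_params(value.get_params(deep=False))}
    if isinstance(value, (list, tuple)):
        return [_decompose(item) for item in value]
    return value


def sketch_load_definition_from_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively decompose every value in params."""
    return {key: _decompose(value) for key, value in params.items()}


params = {"steps": [Step(3), Step(inner=Step(1))], "verbose": False}
definition = sketch_load_definition_from_params(params)
```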