Serializer

The serializer is the core component that converts a Gordo config file into Python objects, which together construct a full ML model that can be served on Kubernetes.

Keys such as dataset and model in the YAML config represent objects that the serializer (de)serializes to accomplish this.

gordo.serializer.serializer.dump(obj: object, dest_dir: Union[os.PathLike, str], metadata: dict = None)

Serialize an object into a directory. The object must be pickle-able.

Parameters

obj – The object to dump. Must be pickle-able.
dest_dir (Union[os.PathLike, str]) – The directory in which to save the model.
metadata (dict, optional) – Any additional metadata to save alongside the model. It is serialized to a file together with the model and can be loaded again with load_metadata().

Returns

Return type

None

Example

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>> from tempfile import TemporaryDirectory
>>> pipe = Pipeline([
...     ('pca', PCA(3)),
...     ('model', KerasAutoEncoder(kind='feedforward_hourglass'))])
>>> with TemporaryDirectory() as tmp:
...     serializer.dump(obj=pipe, dest_dir=tmp)
...     pipe_clone = serializer.load(source_dir=tmp)
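
The directory round trip above can also be pictured without gordo installed. The following is a minimal sketch using only the standard library. The helper names sketch_dump and sketch_load and the file name model.pkl are hypothetical and do not describe gordo's actual on-disk layout; metadata.json is the file name documented for load_metadata(). It only illustrates the contract: a pickle-able object goes into a directory and optional metadata is written beside it.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


def sketch_dump(obj: Any, dest_dir: str, metadata: Optional[dict] = None) -> None:
    """Pickle ``obj`` into ``dest_dir`` and write optional metadata beside it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "model.pkl" is an assumed file name used by this sketch only.
    with open(dest / "model.pkl", "wb") as f:
        pickle.dump(obj, f)
    if metadata is not None:
        with open(dest / "metadata.json", "w") as f:
            json.dump(metadata, f)


def sketch_load(source_dir: str) -> Any:
    """Load the object written by ``sketch_dump``."""
    with open(Path(source_dir) / "model.pkl", "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    # A plain dict stands in for a pipeline; any pickle-able object would do.
    sketch_dump({"n_components": 3}, tmp, metadata={"owner": "example"})
    restored = sketch_load(tmp)
    saved_metadata = json.loads((Path(tmp) / "metadata.json").read_text())
```

The restored object is equal to the one that was dumped, and the metadata survives as ordinary JSON.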

gordo.serializer.serializer.dumps(model: Union[sklearn.pipeline.Pipeline, gordo.machine.model.base.GordoBase]) → bytes

Dump a model into a bytes representation suitable for loading from gordo.serializer.loads

Parameters: model (Union[Pipeline, GordoBase]) – A gordo model/pipeline
Returns: Serialized model which supports loading via serializer.loads()
Return type: bytes

Example

>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>>
>>> model = KerasAutoEncoder('feedforward_symmetric')
>>> serialized = serializer.dumps(model)
>>> assert isinstance(serialized, bytes)
>>>
>>> model_clone = serializer.loads(serialized)
>>> assert isinstance(model_clone, KerasAutoEncoder)
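
The same bytes round trip can be tried with the standard library alone. This sketch assumes a pickle-style encoding; the text here does not state which encoding dumps() uses internally, and the OrderedDict is only a stand-in for a model.

```python
import pickle
from collections import OrderedDict

# Stand-in for a model: any pickle-able object works for this sketch.
model_like = OrderedDict(kind="feedforward_hourglass", epochs=5)

serialized = pickle.dumps(model_like)  # object -> bytes
clone = pickle.loads(serialized)       # bytes -> equal but distinct object
```

The clone compares equal to the original but is a separate object, which is the property the doctest above checks with isinstance.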

gordo.serializer.serializer.load(source_dir: Union[os.PathLike, str]) → Any

Load an object from a directory in which it was saved by gordo.serializer.serializer.dump.

This takes a directory that is either top-level, meaning it contains a subdirectory named according to the scheme “n_step=<int>-class=<path.to.Class>”, or is itself a directory named according to that scheme. Returns the deserialized object.

Parameters: source_dir (Union[os.PathLike, str]) – Location of the top-level directory in which the pipeline was saved
Returns
Return type: Union[GordoBase, Pipeline, BaseEstimator]
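
To make the directory naming scheme concrete, here is a minimal sketch that parses a name of the form “n_step=<int>-class=<path.to.Class>”. The function parse_step_dirname and the regular expression are illustrative assumptions, not part of gordo, and the zero-padded directory name in the example is made up.

```python
import re
from typing import Optional, Tuple

# Pattern for the documented scheme "n_step=<int>-class=<path.to.Class>".
STEP_DIR_PATTERN = re.compile(r"^n_step=(?P<step>\d+)-class=(?P<cls>[\w.]+)$")


def parse_step_dirname(name: str) -> Optional[Tuple[int, str]]:
    """Return (step index, dotted class path), or None if ``name`` does not match."""
    match = STEP_DIR_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group("step")), match.group("cls")


parsed = parse_step_dirname("n_step=000-class=sklearn.pipeline.Pipeline")
not_a_step = parse_step_dirname("metadata.json")
```

A loader that accepts either kind of directory could use such a check to decide whether it was handed the step directory itself or the top-level directory that contains it.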

gordo.serializer.serializer.load_metadata(source_dir: Union[os.PathLike, str]) → dict

Load the metadata.json file that was saved during serializer.dump. Returns the loaded metadata as a dict, or an empty dict if no file was found.

Parameters: source_dir (Union[os.PathLike, str]) – Directory of the saved model. As with serializer.load(source_dir), this source_dir can be the top level or the first directory inside the serialized model.
Returns
Return type: dict
Raises: FileNotFoundError – If a ‘metadata.json’ file isn’t found in or above the supplied source_dir
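
The "in or above" lookup can be sketched as follows. This is an illustration under stated assumptions, not gordo's code: find_metadata is a hypothetical helper, and it assumes "above" means exactly one directory level up.

```python
import json
import tempfile
from pathlib import Path


def find_metadata(source_dir: str) -> dict:
    """Look for metadata.json in ``source_dir``, then in its parent directory."""
    source = Path(source_dir)
    # Assumption: only the directory itself and its direct parent are checked.
    for candidate in (source / "metadata.json", source.parent / "metadata.json"):
        if candidate.is_file():
            return json.loads(candidate.read_text())
    raise FileNotFoundError(f"No metadata.json found in or above {source_dir}")


with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    step_dir = top / "n_step=000-class=sklearn.pipeline.Pipeline"
    step_dir.mkdir()
    (top / "metadata.json").write_text(json.dumps({"owner": "example"}))

    from_top = find_metadata(str(top))        # found in the directory itself
    from_step = find_metadata(str(step_dir))  # found one level above
```

Both calls return the same dict, which mirrors the statement that source_dir may be either the top level or the first directory inside the serialized model.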

gordo.serializer.serializer.loads(bytes_object: bytes) → gordo.machine.model.base.GordoBase

Load a GordoBase model from bytes dumped from gordo.serializer.dumps

Parameters: bytes_object (bytes) – Bytes to be loaded, should be the result of serializer.dumps(model)
Returns: Custom gordo model, scikit learn pipeline or other scikit learn like object.
Return type: Union[GordoBase, Pipeline, BaseEstimator]

From Definition

The ability to take a ‘raw’ representation of an object in dict form and load it into a Python object.

gordo.serializer.from_definition.from_definition(pipe_definition: Union[str, Dict[str, Dict[str, Any]]]) → Union[sklearn.pipeline.FeatureUnion, sklearn.pipeline.Pipeline]

Construct a Pipeline or FeatureUnion from a definition.

Example

>>> import yaml
>>> from gordo import serializer
>>> raw_config = '''
... sklearn.pipeline.Pipeline:
...         steps:
...             - sklearn.decomposition.PCA:
...                 n_components: 3
...             - sklearn.pipeline.FeatureUnion:
...                 - sklearn.decomposition.PCA:
...                     n_components: 3
...                 - sklearn.pipeline.Pipeline:
...                     - sklearn.preprocessing.MinMaxScaler
...                     - sklearn.decomposition.TruncatedSVD:
...                         n_components: 2
...             - sklearn.ensemble.RandomForestClassifier:
...                 max_depth: 3
... '''
>>> config = yaml.safe_load(raw_config)
>>> scikit_learn_pipeline = serializer.from_definition(config)

Parameters

pipe_definition – List of steps for the Pipeline / FeatureUnion
constructor_class – What to place the list of transformers into: either sklearn.pipeline.Pipeline or sklearn.pipeline.FeatureUnion

Returns

pipeline

Return type

sklearn.pipeline.Pipeline
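
The idea behind a definition, a dict whose key is a dotted class path and whose value holds the constructor parameters, can be sketched with standard-library classes. The build function below is a hypothetical illustration and not gordo's implementation. It handles only the dict form and resolves names with pydoc.locate; bare class-path strings such as the MinMaxScaler entry in the example above are not handled.

```python
from pydoc import locate
from typing import Any


def build(definition: Any) -> Any:
    """Recursively turn {"dotted.path.Class": {params}} into an instance."""
    if isinstance(definition, list):
        return [build(item) for item in definition]
    if isinstance(definition, dict):
        if len(definition) == 1:
            path, params = next(iter(definition.items()))
            # Only keys that look like dotted import paths are treated as classes.
            target = locate(path) if isinstance(path, str) and "." in path else None
            if callable(target) and isinstance(params, dict):
                return target(**{key: build(value) for key, value in params.items()})
        # Anything else is an ordinary mapping: build each of its values.
        return {key: build(value) for key, value in definition.items()}
    return definition  # primitives pass through unchanged


delta = build({"datetime.timedelta": {"hours": 1, "minutes": 30}})
nested = build({"name": "example", "window": {"datetime.timedelta": {"days": 2}}})
```

Because the function recurses into parameter values, a definition can nest other definitions, which is what allows a pipeline definition to contain further pipelines or feature unions.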

gordo.serializer.from_definition.import_locate(import_path: str) → Any

gordo.serializer.from_definition.load_params_from_definition(definition: dict) → dict

Deserialize each value from a dictionary. Could be used for preparing kwargs for methods

Parameters: definition (dict) –

Into Definition

The ability to take a Python object, such as a scikit-learn pipeline and convert it into a primitive dict, which can then be inserted into a YAML config file.

gordo.serializer.into_definition.into_definition(pipeline: sklearn.pipeline.Pipeline, prune_default_params: bool = False) → dict

Convert an instance of sklearn.pipeline.Pipeline into a dict definition capable of being reconstructed with gordo.serializer.from_definition

Parameters

pipeline (sklearn.pipeline.Pipeline) – Instance of pipeline to decompose
prune_default_params (bool) – Whether to prune parameters whose values in the current transformer instances equal the transformers' default values.

Returns

Definitions for the pipeline, which can be reconstructed with gordo.serializer.from_definition()

Return type

dict

Example

>>> import yaml
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo.serializer.into_definition import into_definition
>>>
>>> pipe = Pipeline([('pca', PCA(4)), ('ae', KerasAutoEncoder(kind='feedforward_model'))])
>>> pipe_definition = into_definition(pipe)  # It is now a standard python dict of primitives.
>>> print(yaml.dump(pipe_definition))
sklearn.pipeline.Pipeline:
  memory: null
  steps:
  - sklearn.decomposition._pca.PCA:
      copy: true
      iterated_power: auto
      n_components: 4
      random_state: null
      svd_solver: auto
      tol: 0.0
      whiten: false
  - gordo.machine.model.models.KerasAutoEncoder:
      kind: feedforward_model
  verbose: false
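
The effect of prune_default_params can be illustrated with a small stand-alone sketch. The Scaler class and the describe function are hypothetical and are not gordo or scikit-learn code. The sketch assumes that every constructor argument is stored as an attribute of the same name and compares each value against the constructor default via inspect.signature.

```python
import inspect
from typing import Any, Dict


class Scaler:
    """Tiny stand-in for an estimator; hypothetical, not a gordo or sklearn class."""

    def __init__(self, low: float = 0.0, high: float = 1.0, clip: bool = False):
        self.low = low
        self.high = high
        self.clip = clip


def describe(obj: Any, prune_default_params: bool = False) -> Dict[str, Dict[str, Any]]:
    """Return {"module.Class": {param: value}} for ``obj``."""
    cls = type(obj)
    params = {}
    for name, parameter in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        value = getattr(obj, name)
        if prune_default_params and value == parameter.default:
            continue  # skip values that match the constructor default
        params[name] = value
    return {f"{cls.__module__}.{cls.__qualname__}": params}


scaler = Scaler(high=10.0)
full = describe(scaler)
pruned = describe(scaler, prune_default_params=True)
```

With pruning enabled only high remains, because low and clip still hold their default values. The resulting definition is shorter and easier to read in a YAML config.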

gordo.serializer.into_definition.load_definition_from_params(params: dict) → dict

Recursively decompose each of the values in params into the definition.

Parameters: params (dict) –
Returns
Return type: dict