Serializer¶
The serializer is the core component used in the conversion of a Gordo config file into Python objects which interact in order to construct a full ML model capable of being served on Kubernetes.
Things like the dataset and model keys within the YAML config represent objects which will be (de)serialized by the serializer to complete this goal.
-
gordo.serializer.serializer.
dump
(obj: object, dest_dir: Union[os.PathLike, str], metadata: dict = None)[source]¶ Serialize an object into a directory. The object must be pickle-able.
- Parameters
obj – The object to dump. Must be pickle-able.
dest_dir (Union[os.PathLike, str]) – The directory in which to save the model.
metadata (dict) – Optional metadata, serialized to a file together with the model and loaded again by load_metadata().
- Returns
- Return type
None
Example
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>> from tempfile import TemporaryDirectory
>>> pipe = Pipeline([
...     ('pca', PCA(3)),
...     ('model', KerasAutoEncoder(kind='feedforward_hourglass'))])
>>> with TemporaryDirectory() as tmp:
...     serializer.dump(obj=pipe, dest_dir=tmp)
...     pipe_clone = serializer.load(source_dir=tmp)
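The directory round trip described above can be pictured with a standard-library sketch. This is not gordo's implementation: the helper names dump_to_dir and load_from_dir and the file name model.pkl are assumptions made for illustration, and only the idea (a pickled object with an optional metadata file beside it) comes from the description above.

```python
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any, Optional


def dump_to_dir(obj: Any, dest_dir: str, metadata: Optional[dict] = None) -> None:
    """Pickle obj into dest_dir and write optional metadata next to it."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    # "model.pkl" is a hypothetical file name chosen for this sketch.
    with open(dest / "model.pkl", "wb") as f:
        pickle.dump(obj, f)
    if metadata is not None:
        with open(dest / "metadata.json", "w") as f:
            json.dump(metadata, f)


def load_from_dir(source_dir: str) -> Any:
    """Load the object pickled by dump_to_dir."""
    with open(Path(source_dir) / "model.pkl", "rb") as f:
        return pickle.load(f)


with tempfile.TemporaryDirectory() as tmp:
    dump_to_dir({"n_components": 3}, tmp, metadata={"author": "example"})
    restored = load_from_dir(tmp)
    with open(Path(tmp) / "metadata.json") as f:
        restored_metadata = json.load(f)
```

Keeping the metadata in a separate JSON file means it can be read without unpickling the model itself.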
-
gordo.serializer.serializer.
dumps
(model: Union[sklearn.pipeline.Pipeline, gordo.machine.model.base.GordoBase]) → bytes[source]¶ Dump a model into a bytes representation suitable for loading from
gordo.serializer.loads
- Parameters
model (Union[Pipeline, GordoBase]) – A gordo model/pipeline
- Returns
Serialized model which supports loading via
serializer.loads()
- Return type
bytes
Example
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo import serializer
>>>
>>> model = KerasAutoEncoder('feedforward_symmetric')
>>> serialized = serializer.dumps(model)
>>> assert isinstance(serialized, bytes)
>>>
>>> model_clone = serializer.loads(serialized)
>>> assert isinstance(model_clone, KerasAutoEncoder)
-
gordo.serializer.serializer.
load
(source_dir: Union[os.PathLike, str]) → Any[source]¶ Load an object from a directory, saved by
gordo.serializer.pipeline_serializer.dump
This takes a directory, which is either top-level, meaning it contains a subdirectory named according to the scheme “n_step=<int>-class=<path.to.Class>”, or a directory following that naming scheme directly. Returns the deserialized object.
- Parameters
source_dir (Union[os.PathLike, str]) – Location of the top-level directory in which the pipeline was saved
- Returns
- Return type
Union[GordoBase, Pipeline, BaseEstimator]
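To make the directory naming scheme concrete, here is a small sketch that recognises names of the form “n_step=<int>-class=<path.to.Class>”. The regular expression and the helper parse_step_dir are assumptions for illustration, not part of gordo, and the zero-padded step number in the sample name is likewise only an example.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "n_step=000-class=sklearn.pipeline.Pipeline".
STEP_DIR_PATTERN = re.compile(r"^n_step=(?P<step>\d+)-class=(?P<cls>[\w.]+)$")


def parse_step_dir(name: str) -> Optional[Tuple[int, str]]:
    """Return (step number, class path) or None if the name does not match."""
    match = STEP_DIR_PATTERN.match(name)
    if match is None:
        return None
    return int(match.group("step")), match.group("cls")


parsed = parse_step_dir("n_step=000-class=sklearn.pipeline.Pipeline")
not_a_step = parse_step_dir("metadata.json")
```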
-
gordo.serializer.serializer.
load_metadata
(source_dir: Union[os.PathLike, str]) → dict[source]¶ Load the given metadata.json which was saved during the
serializer.dump
call. Returns the loaded metadata as a dict, or an empty dict if no file was found.
- Parameters
source_dir (Union[os.PathLike, str]) – Directory of the saved model. As with serializer.load(source_dir), this source_dir can be the top level, or the first directory inside the serialized model.
- Returns
- Return type
dict
- Raises
FileNotFoundError – If a ‘metadata.json’ file isn’t found in or above the supplied
source_dir
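The lookup “in or above the supplied source_dir” might look roughly like the following. This sketch assumes “above” means the direct parent only, and it follows the Raises entry by raising FileNotFoundError when nothing is found; find_metadata is a hypothetical helper, not gordo's function.

```python
import json
import tempfile
from pathlib import Path


def find_metadata(source_dir: str, filename: str = "metadata.json") -> dict:
    """Look for filename in source_dir, then in its direct parent."""
    start = Path(source_dir)
    for candidate_dir in (start, start.parent):
        candidate = candidate_dir / filename
        if candidate.is_file():
            with open(candidate) as f:
                return json.load(f)
    raise FileNotFoundError(f"No {filename} in or directly above {start}")


with tempfile.TemporaryDirectory() as tmp:
    top = Path(tmp)
    nested = top / "step_dir"
    nested.mkdir()
    (top / "metadata.json").write_text(json.dumps({"name": "demo"}))
    found_from_top = find_metadata(str(top))
    found_from_nested = find_metadata(str(nested))

with tempfile.TemporaryDirectory() as other:
    # Two levels deep, so neither the directory nor its parent holds the file.
    lonely = Path(other) / "x" / "y"
    lonely.mkdir(parents=True)
    try:
        find_metadata(str(lonely))
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```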
-
gordo.serializer.serializer.
loads
(bytes_object: bytes) → gordo.machine.model.base.GordoBase[source]¶ Load a GordoBase model from bytes dumped from
gordo.serializer.dumps
- Parameters
bytes_object (bytes) – Bytes to be loaded, should be the result of serializer.dumps(model)
- Returns
Custom gordo model, scikit-learn pipeline or other scikit-learn-like object.
- Return type
Union[GordoBase, Pipeline, BaseEstimator]
From Definition¶
The ability to take a ‘raw’ representation of an object in dict form and load it into a Python object.
-
gordo.serializer.from_definition.
from_definition
(pipe_definition: Union[str, Dict[str, Dict[str, Any]]]) → Union[sklearn.pipeline.FeatureUnion, sklearn.pipeline.Pipeline][source]¶ Construct a Pipeline or FeatureUnion from a definition.
Example
>>> import yaml
>>> from gordo import serializer
>>> raw_config = '''
... sklearn.pipeline.Pipeline:
...     steps:
...         - sklearn.decomposition.PCA:
...             n_components: 3
...         - sklearn.pipeline.FeatureUnion:
...             - sklearn.decomposition.PCA:
...                 n_components: 3
...             - sklearn.pipeline.Pipeline:
...                 - sklearn.preprocessing.MinMaxScaler
...                 - sklearn.decomposition.TruncatedSVD:
...                     n_components: 2
...         - sklearn.ensemble.RandomForestClassifier:
...             max_depth: 3
... '''
>>> config = yaml.safe_load(raw_config)
>>> scikit_learn_pipeline = serializer.from_definition(config)
- Parameters
pipe_definition – List of steps for the Pipeline / FeatureUnion
constructor_class – What to place the list of transformers into, either sklearn.pipeline.Pipeline/FeatureUnion
- Returns
pipeline
- Return type
sklearn.pipeline.Pipeline
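The core idea behind loading from a definition, resolving a dotted import path to a class and calling it with the given parameters, can be sketched with the standard library alone. The helpers locate and build are hypothetical and deliberately non-recursive; how gordo handles nested steps is not shown here.

```python
import importlib
from typing import Any, Dict, Union


def locate(path: str) -> Any:
    """Resolve a dotted path such as 'fractions.Fraction' to the named object."""
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def build(definition: Union[str, Dict[str, Dict[str, Any]]]) -> Any:
    """Instantiate a class from either a bare path or a {path: params} mapping."""
    if isinstance(definition, str):
        # A bare path means "construct with default arguments".
        return locate(definition)()
    # A mapping is expected to hold exactly one {path: params} entry.
    ((path, params),) = definition.items()
    return locate(path)(**(params or {}))


fraction = build({"fractions.Fraction": {"numerator": 3, "denominator": 4}})
counter = build("collections.Counter")
```

Standard-library classes stand in for scikit-learn transformers here so that the sketch runs without extra dependencies.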
Into Definition¶
The ability to take a Python object, such as a scikit-learn pipeline, and convert it into a primitive dict, which can then be inserted into a YAML config file.
-
gordo.serializer.into_definition.
into_definition
(pipeline: sklearn.pipeline.Pipeline, prune_default_params: bool = False) → dict[source]¶ Convert an instance of
sklearn.pipeline.Pipeline
into a dict definition capable of being reconstructed with gordo.serializer.from_definition
- Parameters
pipeline (sklearn.pipeline.Pipeline) – Instance of pipeline to decompose
prune_default_params (bool) – Whether to omit parameters whose values in the current transformer instances are the same as their defaults.
- Returns
Definitions for the pipeline, which can be reconstructed with
gordo.serializer.from_definition()
- Return type
dict
Example
>>> import yaml
>>> from sklearn.pipeline import Pipeline
>>> from sklearn.decomposition import PCA
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> from gordo.serializer.into_definition import into_definition
>>>
>>> pipe = Pipeline([('pca', PCA(4)), ('ae', KerasAutoEncoder(kind='feedforward_model'))])
>>> pipe_definition = into_definition(pipe)  # It is now a standard python dict of primitives.
>>> print(yaml.dump(pipe_definition))
sklearn.pipeline.Pipeline:
  memory: null
  steps:
  - sklearn.decomposition._pca.PCA:
      copy: true
      iterated_power: auto
      n_components: 4
      random_state: null
      svd_solver: auto
      tol: 0.0
      whiten: false
  - gordo.machine.model.models.KerasAutoEncoder:
      kind: feedforward_model
  verbose: false
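The effect of prune_default_params can be illustrated with a small standard-library sketch. The Scaler class and the describe helper are hypothetical stand-ins, and the sketch assumes each constructor parameter is stored on the instance under the same name, as scikit-learn estimators conventionally do. It is not gordo's implementation.

```python
import inspect
from typing import Any, Dict


class Scaler:
    """Hypothetical stand-in for a transformer with constructor parameters."""

    def __init__(self, low: float = 0.0, high: float = 1.0, clip: bool = False):
        self.low = low
        self.high = high
        self.clip = clip


def describe(obj: Any, prune_default_params: bool = False) -> Dict[str, Dict[str, Any]]:
    """Return {import path: constructor params}, optionally dropping defaults."""
    cls = type(obj)
    path = f"{cls.__module__}.{cls.__qualname__}"
    params = {}
    for name, param in inspect.signature(cls.__init__).parameters.items():
        if name == "self":
            continue
        value = getattr(obj, name)
        # Skip values that match the constructor default when pruning.
        if prune_default_params and value == param.default:
            continue
        params[name] = value
    return {path: params}


full = describe(Scaler(high=10.0))
pruned = describe(Scaler(high=10.0), prune_default_params=True)
```

With pruning enabled, only the parameter that differs from its default (high) remains, which keeps the resulting YAML short.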