Models

Models are a collection of scikit-learn-like models built to fulfill specific needs; one example is the KerasAutoEncoder.

Other scikit-learn compliant models can be used within the config files without any additional configuration.
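
For instance, a plain scikit-learn estimator can be referenced by its import path. The snippet below is a minimal sketch (not taken from this reference), assuming gordo.serializer.from_definition accepts a single-estimator mapping of import path to keyword arguments, as it does for config files:

>>> from gordo.serializer import from_definition
>>> definition = {"sklearn.decomposition.PCA": {"n_components": 2}}  # hypothetical config entry
>>> model = from_definition(definition)  # builds the estimator from its import path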

Base Model

The base model is designed to be inherited by any other model which needs to be implemented within Gordo due to special model requirements, e.g. PyTorch, Keras, etc.

class gordo.machine.model.base.GordoBase(**kwargs)[source]

Bases: abc.ABC

Initialize the model

abstract get_metadata()[source]

Get model specific metadata, if any

abstract get_params(deep=False)[source]

Return a dict containing all parameters used to initialize the object

abstract score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None)[source]

Score the model; must implement the correct default scorer based on model type
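
As a rough illustration of this interface, a custom model only needs to provide the abstract methods above. The class below is a toy sketch written for this page (not part of Gordo), and assumes the three abstract methods shown here are the full abstract interface:

>>> import numpy as np
>>> from gordo.machine.model.base import GordoBase
>>> class ColumnMeanModel(GordoBase):
...     """Toy model: scores new data against the column-wise mean of the training data."""
...     def __init__(self, **kwargs):
...         self.kwargs = kwargs
...     def fit(self, X, y=None):
...         self.means_ = np.asarray(X).mean(axis=0)
...         return self
...     def get_metadata(self):
...         return {"model": "ColumnMeanModel", "params": dict(self.kwargs)}
...     def get_params(self, deep=False):
...         return dict(self.kwargs)
...     def score(self, X, y, sample_weight=None):
...         # Negative mean squared error against the stored training means
...         return -float(np.mean((np.asarray(y) - self.means_) ** 2))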

Custom Gordo models

These models are already implemented and ready to be used within config files by simply specifying their full path, for example gordo.machine.model.models.KerasAutoEncoder.
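
A hedged sketch of how such a model might be placed inside a pipeline definition follows; the exact config layout and the feedforward_hourglass builder name are assumptions, not taken from this page, and mirror the style of the KerasRawModelRegressor example further down:

>>> import yaml
>>> from gordo.serializer import from_definition
>>> definition = yaml.safe_load('''
... sklearn.pipeline.Pipeline:
...   steps:
...     - sklearn.preprocessing.MinMaxScaler
...     - gordo.machine.model.models.KerasAutoEncoder:
...         kind: feedforward_hourglass
... ''')
>>> model = from_definition(definition)  # scaler followed by a Keras autoencoder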

class gordo.machine.model.models.KerasAutoEncoder(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)[source]

Bases: gordo.machine.model.models.KerasBaseEstimator, sklearn.base.TransformerMixin

Subclass of the KerasBaseEstimator to allow fitting to just X without requiring y.

Initializes a scikit-learn API compatible Keras model with a pre-registered builder function name or a builder function passed directly.

Parameters
  • kind (Union[callable, str]) – The structure of the model to build, as designated by a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, one may pass a builder function directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters via **kwargs.

  • kwargs (dict) – Any additional args which are passed to the factory building function and/or any additional args to be passed to Keras’ fit() method

score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None) → float[source]

Returns the explained variance score between the autoencoder’s input and output

Parameters
  • X (Union[np.ndarray, pd.DataFrame]) – Input data to the model

  • y (Union[np.ndarray, pd.DataFrame]) – Target

  • sample_weight (Optional[np.ndarray]) – sample weights

Returns

score – Returns the explained variance score

Return type

float
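
A hedged sketch of the builder-function route described above: the builder below is an illustration written for this example (not part of Gordo), and is assumed to receive n_features as its first argument and to return a compiled tensorflow.keras model:

>>> import numpy as np
>>> from tensorflow import keras
>>> from gordo.machine.model.models import KerasAutoEncoder
>>> def tiny_autoencoder(n_features: int, **kwargs) -> keras.models.Model:
...     """Hypothetical builder: a minimal dense autoencoder, compiled and ready to fit."""
...     model = keras.models.Sequential([
...         keras.layers.Dense(2, activation="tanh", input_shape=(n_features,)),
...         keras.layers.Dense(n_features, activation="linear"),
...     ])
...     model.compile(optimizer="adam", loss="mse")
...     return model
>>> X = np.random.random((32, 4))
>>> model = KerasAutoEncoder(kind=tiny_autoencoder, epochs=1, verbose=0)
>>> model = model.fit(X, X)                   # autoencoder: the target is the input itself
>>> reconstruction_score = model.score(X, X)  # explained variance of the reconstruction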

class gordo.machine.model.models.KerasBaseEstimator(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)[source]

Bases: tensorflow.keras.wrappers.scikit_learn.KerasRegressor, gordo.machine.model.base.GordoBase, sklearn.base.BaseEstimator

Initializes a scikit-learn API compatible Keras model with a pre-registered builder function name or a builder function passed directly.

Parameters
  • kind (Union[callable, str]) – The structure of the model to build, as designated by a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, one may pass a builder function directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters via **kwargs.

  • kwargs (dict) – Any additional args which are passed to the factory building function and/or any additional args to be passed to Keras’ fit() method

classmethod extract_supported_fit_args(kwargs)[source]

Filter kwargs, keeping only the fit-related ones (see supported_fit_args)

Parameters

kwargs (dict) –
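
A hedged illustration of this filtering, assuming it simply keeps the keys listed in supported_fit_args further down:

>>> from gordo.machine.model.models import KerasBaseEstimator
>>> mixed_kwargs = {"epochs": 10, "batch_size": 64, "encoding_dim": 8}  # 'encoding_dim' is a made-up builder arg
>>> fit_kwargs = KerasBaseEstimator.extract_supported_fit_args(mixed_kwargs)  # expected to keep only 'epochs' and 'batch_size'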

fit(X: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], **kwargs)[source]

Fit the model to X given y.

Parameters
  • X (Union[np.ndarray, pd.DataFrame, xr.DataArray]) – Input data as a numpy array, pandas dataframe, or xarray data array

  • y (Union[np.ndarray, pd.DataFrame, xr.DataArray]) – Target data as a numpy array, pandas dataframe, or xarray data array

  • sample_weight (np.ndarray) – array like - weight to assign to samples

  • kwargs – Any additional kwargs to supply to keras fit method.

Returns

self

Return type

KerasAutoEncoder

classmethod from_definition(definition: dict)[source]

Handler for gordo.serializer.from_definition

Parameters

definition (dict) –

get_metadata()[source]

Get metadata for the KerasBaseEstimator. Includes a dictionary with the key “history”, whose value is a dictionary containing a key “params” that points to another dictionary of various parameters. The metrics are listed in that params dictionary under “metrics”, and for each metric there is a key in the history dictionary whose value is a list of values for that metric per epoch.

Returns

Metadata dictionary, including a history object if present

Return type

Dict
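
A hedged sketch of navigating that structure after fitting; the metric name 'loss' is an assumption that depends on how the underlying Keras model was compiled, and model stands for any fitted KerasBaseEstimator (e.g. the autoencoder from the earlier sketch):

>>> metadata = model.get_metadata()
>>> history = metadata["history"]                # training history, as described above
>>> metric_names = history["params"]["metrics"]  # e.g. ['loss']
>>> loss_per_epoch = history["loss"]             # one value per training epoch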

static get_n_features(X: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray]) → Union[int, tuple][source]
static get_n_features_out(y: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray]) → Union[int, tuple][source]
get_params(**params)[source]

Gets the parameters for this estimator

Parameters

params – ignored (exists for API compatibility).

Returns

Parameters used in this estimator

Return type

Dict[str, Any]

into_definition() → dict[source]

Handler for gordo.serializer.into_definition

Returns

Return type

dict
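
Together with from_definition above, this allows an estimator's configuration to round-trip through a plain dict; a hedged sketch, where model stands for any KerasBaseEstimator instance:

>>> definition = model.into_definition()               # plain dict, safe to dump to YAML
>>> rebuilt = type(model).from_definition(definition)  # rebuilt with the same kind and kwargs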

load_kind(kind)[source]
static parse_module_path(module_path) → Tuple[Optional[str], str][source]
predict(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]
Parameters
  • X (np.ndarray) – Input data

  • kwargs (dict) – kwargs which are passed to Keras’ predict method

Returns

results

Return type

np.ndarray

property sk_params

Parameters used for scikit-learn kwargs

supported_fit_args = ['batch_size', 'epochs', 'verbose', 'callbacks', 'validation_split', 'shuffle', 'class_weight', 'initial_epoch', 'steps_per_epoch', 'validation_batch_size', 'max_queue_size', 'workers', 'use_multiprocessing']
class gordo.machine.model.models.KerasLSTMAutoEncoder(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)[source]

Bases: gordo.machine.model.models.KerasLSTMBaseEstimator

Parameters
  • kind (Union[Callable, str]) – The structure of the model to build, as designated by a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, one may pass a builder function directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters via **kwargs.

  • lookback_window (int) – Number of timestamps (lags) used to train the model.

  • batch_size (int) – Number of training examples used per gradient update (batch).

  • epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.

  • verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.

  • kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.

property lookahead

Steps ahead in y the model should target
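
A hedged usage sketch, reusing the registered lstm_model builder that also appears in the KerasLSTMForecast example further down:

>>> import numpy as np
>>> from gordo.machine.model.models import KerasLSTMAutoEncoder
>>> X_train = np.random.random((100, 3))
>>> model = KerasLSTMAutoEncoder(kind="lstm_model", lookback_window=5, epochs=1, verbose=0)
>>> model = model.fit(X_train, X_train.copy())
>>> out = model.predict(X_train)  # rows: n_samples - lookback_window, per the base class predict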

class gordo.machine.model.models.KerasLSTMBaseEstimator(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)[source]

Bases: gordo.machine.model.models.KerasBaseEstimator, sklearn.base.TransformerMixin

Abstract base class for training a many-to-one LSTM autoencoder or an LSTM 1-step forecast model

Parameters
  • kind (Union[Callable, str]) – The structure of the model to build, as designated by a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, one may pass a builder function directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters via **kwargs.

  • lookback_window (int) – Number of timestamps (lags) used to train the model.

  • batch_size (int) – Number of training examples used per gradient update (batch).

  • epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.

  • verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.

  • kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.

fit(X: numpy.ndarray, y: numpy.ndarray, **kwargs) → gordo.machine.model.models.KerasLSTMForecast[source]

This fits a one step forecast LSTM architecture.

Parameters
  • X (np.ndarray) – 2D numpy array of dimension n_samples x n_features. Input data to train.

  • y (np.ndarray) – 2D numpy array representing the target

  • kwargs (dict) – Any additional args to be passed to Keras fit_generator method.

Returns

self

Return type

KerasLSTMForecast

get_metadata()[source]

Add number of forecast steps to metadata

Returns

metadata – Metadata dictionary, including forecast steps.

Return type

dict

abstract property lookahead

Steps ahead in y the model should target

predict(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]
Parameters

X (np.ndarray) – Data to predict/transform. 2D numpy array of dimension n_samples x n_features where n_samples must be > lookback_window.

Returns

results – 2D numpy array of dimension (n_samples - lookback_window) x 2*n_features. The first half of the array (results[:, :n_features]) corresponds to X offset by lookback_window+1 (i.e., X[lookback_window:,:]) whereas the second half corresponds to the predicted values of X[lookback_window:,:].

Return type

np.ndarray

Example

>>> import numpy as np
>>> from gordo.machine.model.factories.lstm_autoencoder import lstm_model
>>> from gordo.machine.model.models import KerasLSTMForecast
>>> #Define train/test data
>>> X_train = np.array([[1, 1], [2, 3], [0.5, 0.6], [0.3, 1], [0.6, 0.7]])
>>> X_test = np.array([[2, 3], [1, 1], [0.1, 1], [0.5, 2]])
>>> #Initiate model, fit and transform
>>> lstm_ae = KerasLSTMForecast(kind="lstm_model",
...                             lookback_window=2,
...                             verbose=0)
>>> model_fit = lstm_ae.fit(X_train, y=X_train.copy())
>>> model_transform = lstm_ae.predict(X_test)
>>> model_transform.shape
(2, 2)
score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None) → float[source]

Returns the explained variance score between 1 step forecasted input and true input at next time step (note: for LSTM X is offset by lookback_window).

Parameters
  • X (Union[np.ndarray, pd.DataFrame]) – Input data to the model.

  • y (Union[np.ndarray, pd.DataFrame]) – Target

  • sample_weight (Optional[np.ndarray]) – Sample weights

Returns

score – Returns the explained variance score.

Return type

float

class gordo.machine.model.models.KerasLSTMForecast(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)[source]

Bases: gordo.machine.model.models.KerasLSTMBaseEstimator

Parameters
  • kind (Union[Callable, str]) – The structure of the model to build, as designated by a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, one may pass a builder function directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters via **kwargs.

  • lookback_window (int) – Number of timestamps (lags) used to train the model.

  • batch_size (int) – Number of training examples used per gradient update (batch).

  • epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.

  • verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.

  • kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.

property lookahead

Steps ahead in y the model should target

class gordo.machine.model.models.KerasRawModelRegressor(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)[source]

Bases: gordo.machine.model.models.KerasAutoEncoder

Create a scikit-learn like model with an underlying tensorflow.keras model from a raw config.

Examples

>>> import yaml
>>> import numpy as np
>>> config_str = '''
...   # Arguments to the .compile() method
...   compile:
...     loss: mse
...     optimizer: adam
...
...   # The architecture of the model itself.
...   spec:
...     tensorflow.keras.models.Sequential:
...       layers:
...         - tensorflow.keras.layers.Dense:
...             units: 4
...         - tensorflow.keras.layers.Dense:
...             units: 1
... '''
>>> config = yaml.safe_load(config_str)
>>> model = KerasRawModelRegressor(kind=config)
>>>
>>> X, y = np.random.random((10, 4)), np.random.random((10, 1))
>>> model.fit(X, y, verbose=0)
KerasRawModelRegressor(kind: {'compile': {'loss': 'mse', 'optimizer': 'adam'},
 'spec': {'tensorflow.keras.models.Sequential': {'layers': [{'tensorflow.keras.layers.Dense': {'units': 4}},
                                                            {'tensorflow.keras.layers.Dense': {'units': 1}}]}}})
>>> out = model.predict(X)

Initializes a scikit-learn API compatible Keras model with a pre-registered builder function name or a builder function passed directly.

Parameters
  • kind (Union[callable, str]) – The structure of the model to build, as designated by a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, one may pass a builder function directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters via **kwargs.

  • kwargs (dict) – Any additional args which are passed to the factory building function and/or any additional args to be passed to Keras’ fit() method

load_kind(kind)[source]
gordo.machine.model.models.create_keras_timeseriesgenerator(X: numpy.ndarray, y: Optional[numpy.ndarray], batch_size: int, lookback_window: int, lookahead: int) → tensorflow.keras.preprocessing.sequence.TimeseriesGenerator[source]

Provides a keras.preprocessing.sequence.TimeseriesGenerator for use with LSTMs, but with the added ability to specify the lookahead of the target in y.

If lookahead == 0 then each generated sample in X will have as its last element the same value as the corresponding y. If lookahead is 1 then the values in y are shifted one step into the future compared to the last value in the samples in X, and similarly for larger values.

Parameters
  • X (np.ndarray) – 2d array of values, each row being one sample.

  • y (Optional[np.ndarray]) – array representing the target.

  • batch_size (int) – How big should the generated batches be?

  • lookback_window (int) – How far back each sample should see; a value of 1 means each sample contains a single measurement

  • lookahead (int) – How much is Y shifted relative to X

Returns

A TimeseriesGenerator yielding batchX-batchY pairs, where batchX is a batch of X-values and batchY the corresponding y-values. Each batch consists of batch_size pairs of samples (or y-values), and each sample is a list of length lookback_window.

Return type

TimeseriesGenerator

Examples

>>> import numpy as np
>>> X, y = np.random.rand(100,2), np.random.rand(100, 2)
>>> gen = create_keras_timeseriesgenerator(X, y,
...                                        batch_size=10,
...                                        lookback_window=20,
...                                        lookahead=0)
>>> len(gen) # 9 = (100-20+1)/10
9
>>> len(gen[0]) # batchX and batchY
2
>>> len(gen[0][0]) # batch_size=10
10
>>> len(gen[0][0][0]) # a single sample, lookback_window = 20,
20
>>> len(gen[0][0][0][0]) # n_features = 2
2

Utils

Shared utility functions used by models and other components interacting with the models.

gordo.machine.model.utils.make_base_dataframe(tags: Union[List[gordo_dataset.sensor_tag.SensorTag], List[str]], model_input: numpy.ndarray, model_output: numpy.ndarray, target_tag_list: Union[List[gordo_dataset.sensor_tag.SensorTag], List[str], None] = None, index: Optional[numpy.ndarray] = None, frequency: Optional[datetime.timedelta] = None) → pandas.core.frame.DataFrame[source]

Construct a dataframe which has a MultiIndex column consisting of the top-level keys ‘model-input’ and ‘model-output’. Takes care of aligning model output when its length differs from the model input, as well as setting column names based on the passed tags and target_tag_list.

Parameters
  • tags (List[Union[str, SensorTag]]) – Tags which will be assigned to model-input and/or model-output if the shapes match.

  • model_input (np.ndarray) – Original input given to the model

  • model_output (np.ndarray) – Raw model output

  • target_tag_list (Optional[Union[List[SensorTag], List[str]]]) – Tags to be assigned to model-output; if not provided and the model output matches the model input, tags will be used.

  • index (Optional[np.ndarray]) – The index which should be assigned to the resulting dataframe; it will be clipped to the length of model_output, should the model output be shorter than its input.

  • frequency (Optional[datetime.timedelta]) – The spacing of the time between points.

Returns

Return type

pd.DataFrame
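
A hedged sketch of the resulting frame, using plain string tags (SensorTag objects are also accepted per the signature):

>>> import numpy as np
>>> from gordo.machine.model.utils import make_base_dataframe
>>> model_input = np.random.random((5, 2))
>>> model_output = np.random.random((5, 2))
>>> df = make_base_dataframe(tags=["tag-1", "tag-2"], model_input=model_input, model_output=model_output)
>>> top_levels = df.columns.get_level_values(0).unique()  # 'model-input' and 'model-output'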

gordo.machine.model.utils.metric_wrapper(metric, scaler: Optional[sklearn.base.TransformerMixin] = None)[source]

Ensures that a given metric works properly when the model itself returns a y which is shorter than the target y, and allows scaling the data before applying the metric.

Parameters
  • metric – Metric which must accept y_true and y_pred of the same length

  • scaler (Optional[TransformerMixin]) – Transformer which will be applied to y and y_pred before the metric is calculated. Must have a transform method, so for most scalers it must already be fitted on y.
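
A hedged sketch wrapping a standard scikit-learn metric, with the scaler already fitted on y as required above:

>>> import numpy as np
>>> from sklearn.metrics import mean_squared_error
>>> from sklearn.preprocessing import MinMaxScaler
>>> from gordo.machine.model.utils import metric_wrapper
>>> y_true = np.random.random((10, 2))
>>> y_pred = np.random.random((8, 2))                 # model output shorter than the target
>>> scaler = MinMaxScaler().fit(y_true)
>>> scaled_mse = metric_wrapper(mean_squared_error, scaler=scaler)
>>> value = scaled_mse(y_true, y_pred)                # y_true is aligned to y_pred and both are scaled first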