Models¶

This module provides a collection of scikit-learn-like models, each built to fulfill a specific need. One example is the KerasAutoEncoder.

Other scikit-learn compliant models can be used within the config files without any additional configuration.

Base Model¶

The base model is designed to be inherited by any model that must be implemented within Gordo because of special requirements, e.g. models built on PyTorch or Keras.

class gordo.machine.model.base.GordoBase(**kwargs)[source]¶

Bases: abc.ABC

Initialize the model

abstract get_metadata()[source]¶: Get model specific metadata, if any

abstract get_params(deep=False)[source]¶: Return a dict containing all parameters used to initialize the object

abstract score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None)[source]¶: Score the model; must implement the correct default scorer based on model type
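As a rough illustration of the contract above, the following sketch defines a stand-in abstract base with the same three abstract methods and a toy subclass that implements them. `BaseSketch` and `MeanModel` are hypothetical names used only here; the sketch does not import Gordo, and the negative mean squared error scorer is an arbitrary choice for the example.

```python
from abc import ABC, abstractmethod
from statistics import mean


class BaseSketch(ABC):
    """Hypothetical stand-in mirroring the three abstract methods listed above."""

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    @abstractmethod
    def get_metadata(self): ...

    @abstractmethod
    def get_params(self, deep=False): ...

    @abstractmethod
    def score(self, X, y, sample_weight=None): ...


class MeanModel(BaseSketch):
    """Toy model that always predicts the mean of the targets seen in fit()."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def get_metadata(self):
        # Model-specific metadata, if any.
        return {"fitted_mean": getattr(self, "mean_", None)}

    def get_params(self, deep=False):
        # The parameters the object was initialized with.
        return dict(self.kwargs)

    def score(self, X, y, sample_weight=None):
        # Negative mean squared error, so that a higher score is better.
        return -mean((target - self.mean_) ** 2 for target in y)
```

Instantiating `BaseSketch` directly raises `TypeError`, because abstract methods must be overridden before a subclass can be created.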

Custom Gordo models¶

These models are already implemented and ready to be used within config files by specifying their full path, for example: gordo.machine.model.models.KerasAutoEncoder

class gordo.machine.model.models.KerasAutoEncoder(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)[source]¶

Bases: gordo.machine.model.models.KerasBaseEstimator, sklearn.base.TransformerMixin

Subclass of the KerasBaseEstimator to allow fitting to just X without requiring y.

Initializes a scikit-learn API compatible Keras model from either a pre-registered builder function or a builder function passed directly.

Parameters

kind (Union[callable, str]) – The structure of the model to build, given as the name of a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters through **kwargs
kwargs (dict) – Any additional args which are passed to the factory building function and/or any additional args to be passed to Keras’ fit() method
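The two ways of supplying kind can be pictured with a small registry. This is a minimal sketch under stated assumptions: `register_builder`, `tiny_autoencoder`, and `build_from_kind` are hypothetical names, the registry is not Gordo's own, and a plain dict stands in for the Keras model a real builder would return.

```python
from typing import Any, Callable, Dict

# Hypothetical registry; Gordo's own register_model_builder may work differently.
_BUILDERS: Dict[str, Callable[..., Any]] = {}


def register_builder(func):
    """Store a builder under its function name so it can be looked up by string."""
    _BUILDERS[func.__name__] = func
    return func


@register_builder
def tiny_autoencoder(n_features: int, **kwargs):
    """Builder: n_features comes first, everything else arrives via **kwargs."""
    encoding_dim = kwargs.get("encoding_dim", max(1, n_features // 2))
    # A plain dict stands in for the Keras model a real builder would return.
    return {"layers": [n_features, encoding_dim, n_features]}


def build_from_kind(kind, n_features, **kwargs):
    """Resolve kind whether it is a registered name or a callable."""
    builder = _BUILDERS[kind] if isinstance(kind, str) else kind
    return builder(n_features, **kwargs)
```

With this in place, `build_from_kind("tiny_autoencoder", 4)` and `build_from_kind(tiny_autoencoder, 4)` resolve to the same builder.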

score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None) → float[source]¶

Returns the explained variance score between auto encoder’s input vs output

Parameters

X (Union[np.ndarray, pd.DataFrame]) – Input data to the model
y (Union[np.ndarray, pd.DataFrame]) – Target
sample_weight (Optional[np.ndarray]) – sample weights

Returns

score – Returns the explained variance score

Return type

float
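For orientation, explained variance for a single output column can be computed as below. This is a plain-Python sketch of the textbook formula, not Gordo's code; `explained_variance` is a name chosen for this example. For an autoencoder, y_pred would be the reconstruction of the input.

```python
from statistics import pvariance


def explained_variance(y_true, y_pred):
    """1 - Var(y_true - y_pred) / Var(y_true) for a single output column."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    return 1.0 - pvariance(residuals) / pvariance(y_true)


y = [1.0, 2.0, 3.0, 4.0]
perfect = explained_variance(y, y)               # reconstruction equals input
mean_only = explained_variance(y, [2.5] * 4)     # predicting the mean everywhere
```

Because only the variance of the residuals enters the formula, a reconstruction that is off by a constant still scores as highly as a perfect one.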

class gordo.machine.model.models.KerasBaseEstimator(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)[source]¶

Bases: tensorflow.keras.wrappers.scikit_learn.KerasRegressor, gordo.machine.model.base.GordoBase, sklearn.base.BaseEstimator

Initializes a scikit-learn API compatible Keras model from either a pre-registered builder function or a builder function passed directly.

Parameters

kind (Union[callable, str]) – The structure of the model to build, given as the name of a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters through **kwargs
kwargs (dict) – Any additional args which are passed to the factory building function and/or any additional args to be passed to Keras’ fit() method

classmethod extract_supported_fit_args(kwargs)[source]¶

Filtering only fit related kwargs

Parameters: kwargs (dict) –
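The filtering idea can be sketched with a dict comprehension. The list below is a subset of the documented supported_fit_args, and `filter_fit_args` is a hypothetical helper written for this example rather than the method itself.

```python
# Subset of the documented supported_fit_args, used here for illustration.
SUPPORTED_FIT_ARGS = ["batch_size", "epochs", "verbose", "callbacks",
                      "validation_split", "shuffle"]


def filter_fit_args(kwargs):
    """Keep only the entries whose key is a supported fit argument."""
    return {key: value for key, value in kwargs.items()
            if key in SUPPORTED_FIT_ARGS}


# 'encoding_dim' is a made-up builder argument and is therefore dropped.
fit_args = filter_fit_args({"epochs": 5, "verbose": 0, "encoding_dim": 8})
```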

fit(X: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], **kwargs)[source]¶

Fit the model to X given y.

Parameters

X (Union[np.ndarray, pd.DataFrame, xr.Dataset]) – numpy array or pandas dataframe
y (Union[np.ndarray, pd.DataFrame, xr.Dataset]) – numpy array or pandas dataframe
sample_weight (np.ndarray) – array like - weight to assign to samples
kwargs – Any additional kwargs to supply to keras fit method.

Returns

self

Return type

‘KerasAutoEncoder’

classmethod from_definition(definition: dict)[source]¶

Handler for gordo.serializer.from_definition

Parameters: definition (dict) –

get_metadata()[source]¶

Get metadata for the KerasBaseEstimator. Includes a dictionary with the key “history”. That key’s value is a dictionary with a key “params” pointing to another dictionary with various parameters. The metrics are defined in the params dictionary under “metrics”. For each of the metrics there is a key whose value is a list of values for this metric per epoch.

Returns: Metadata dictionary, including a history object if present
Return type: Dict
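One reading of the structure described above is sketched below, assuming the per-metric lists sit next to “params” inside “history”. The numbers are illustrative only and do not come from a real training run; `final_epoch_metrics` is a hypothetical helper.

```python
# Illustrative values only; a real training run produces its own numbers.
metadata = {
    "history": {
        "params": {"epochs": 3, "metrics": ["loss"]},
        "loss": [0.9, 0.5, 0.3],  # one value per epoch
    }
}


def final_epoch_metrics(metadata):
    """Return the last recorded value of every metric named in params."""
    history = metadata.get("history", {})
    names = history.get("params", {}).get("metrics", [])
    return {name: history[name][-1] for name in names if name in history}


last = final_epoch_metrics(metadata)
```

The helper returns an empty dict when no history is present, which matches the “if present” wording of the return description.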

static get_n_features(X: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray]) → Union[int, tuple][source]¶

static get_n_features_out(y: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray]) → Union[int, tuple][source]¶

get_params(**params)[source]¶

Gets the parameters for this estimator

Parameters: params – ignored (exists for API compatibility).
Returns: Parameters used in this estimator
Return type: Dict[str, Any]

into_definition() → dict[source]¶

Handler for gordo.serializer.into_definition

Returns
Return type: dict

load_kind(kind)[source]¶

static parse_module_path(module_path) → Tuple[Optional[str], str][source]¶

predict(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]¶

Parameters

X (np.ndarray) – Input data
kwargs (dict) – kwargs which are passed to Keras’ predict method

Returns

results

Return type

np.ndarray

property sk_params¶: Parameters used for scikit learn kwargs

supported_fit_args = ['batch_size', 'epochs', 'verbose', 'callbacks', 'validation_split', 'shuffle', 'class_weight', 'initial_epoch', 'steps_per_epoch', 'validation_batch_size', 'max_queue_size', 'workers', 'use_multiprocessing']¶

class gordo.machine.model.models.KerasLSTMAutoEncoder(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)[source]¶

Bases: gordo.machine.model.models.KerasLSTMBaseEstimator

Parameters

kind (Union[Callable, str]) – The structure of the model to build, given as the name of a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters through **kwargs.
lookback_window (int) – Number of timestamps (lags) used to train the model.
batch_size (int) – Number of training examples in each batch, i.e. per gradient update.
epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.
verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.
kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.

property lookahead¶: Steps ahead in y the model should target

class gordo.machine.model.models.KerasLSTMBaseEstimator(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)[source]¶

Bases: gordo.machine.model.models.KerasBaseEstimator, sklearn.base.TransformerMixin

Abstract base class for training a many-to-one LSTM autoencoder or an LSTM one-step forecast model.

Parameters

kind (Union[Callable, str]) – The structure of the model to build, given as the name of a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters through **kwargs.
lookback_window (int) – Number of timestamps (lags) used to train the model.
batch_size (int) – Number of training examples in each batch, i.e. per gradient update.
epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.
verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.
kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.

fit(X: numpy.ndarray, y: numpy.ndarray, **kwargs) → gordo.machine.model.models.KerasLSTMForecast[source]¶

This fits a one step forecast LSTM architecture.

Parameters

X (np.ndarray) – 2D numpy array of dimension n_samples x n_features. Input data to train.
y (np.ndarray) – 2D numpy array representing the target
kwargs (dict) – Any additional args to be passed to Keras fit_generator method.

Returns

The fitted model

Return type

KerasLSTMForecast

get_metadata()[source]¶

Add number of forecast steps to metadata

Returns: metadata – Metadata dictionary, including forecast steps.
Return type: dict

abstract property lookahead¶: Steps ahead in y the model should target

predict(X: numpy.ndarray, **kwargs) → numpy.ndarray[source]¶

Parameters: X (np.ndarray) – Data to predict/transform. 2D numpy array of dimension n_samples x n_features where n_samples must be > lookback_window.
Returns: results – 2D numpy array of dimension (n_samples - lookback_window) x 2*n_features. The first half of the array (results[:, :n_features]) corresponds to X offset by lookback_window+1 (i.e., X[lookback_window:,:]) whereas the second half corresponds to the predicted values of X[lookback_window:,:].
Return type: np.ndarray

Example

>>> import numpy as np
>>> from gordo.machine.model.factories.lstm_autoencoder import lstm_model
>>> from gordo.machine.model.models import KerasLSTMForecast
>>> #Define train/test data
>>> X_train = np.array([[1, 1], [2, 3], [0.5, 0.6], [0.3, 1], [0.6, 0.7]])
>>> X_test = np.array([[2, 3], [1, 1], [0.1, 1], [0.5, 2]])
>>> #Initiate model, fit and transform
>>> lstm_ae = KerasLSTMForecast(kind="lstm_model",
...                             lookback_window=2,
...                             verbose=0)
>>> model_fit = lstm_ae.fit(X_train, y=X_train.copy())
>>> model_transform = lstm_ae.predict(X_test)
>>> model_transform.shape
(2, 2)

score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None) → float[source]¶

Returns the explained variance score between 1 step forecasted input and true input at next time step (note: for LSTM X is offset by lookback_window).

Parameters

X (Union[np.ndarray, pd.DataFrame]) – Input data to the model.
y (Union[np.ndarray, pd.DataFrame]) – Target
sample_weight (Optional[np.ndarray]) – Sample weights

Returns

score – Returns the explained variance score.

Return type

float

class gordo.machine.model.models.KerasLSTMForecast(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)[source]¶

Bases: gordo.machine.model.models.KerasLSTMBaseEstimator

Parameters

kind (Union[Callable, str]) – The structure of the model to build, given as the name of a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters through **kwargs.
lookback_window (int) – Number of timestamps (lags) used to train the model.
batch_size (int) – Number of training examples in each batch, i.e. per gradient update.
epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.
verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.
kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.

property lookahead¶: Steps ahead in y the model should target

class gordo.machine.model.models.KerasRawModelRegressor(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)[source]¶

Bases: gordo.machine.model.models.KerasAutoEncoder

Create a scikit-learn like model with an underlying tensorflow.keras model from a raw config.

Examples

>>> import yaml
>>> import numpy as np
>>> from gordo.machine.model.models import KerasRawModelRegressor
>>> config_str = '''
...   # Arguments to the .compile() method
...   compile:
...     loss: mse
...     optimizer: adam
...
...   # The architecture of the model itself.
...   spec:
...     tensorflow.keras.models.Sequential:
...       layers:
...         - tensorflow.keras.layers.Dense:
...             units: 4
...         - tensorflow.keras.layers.Dense:
...             units: 1
... '''
>>> config = yaml.safe_load(config_str)
>>> model = KerasRawModelRegressor(kind=config)
>>>
>>> X, y = np.random.random((10, 4)), np.random.random((10, 1))
>>> model.fit(X, y, verbose=0)
KerasRawModelRegressor(kind: {'compile': {'loss': 'mse', 'optimizer': 'adam'},
 'spec': {'tensorflow.keras.models.Sequential': {'layers': [{'tensorflow.keras.layers.Dense': {'units': 4}},
                                                            {'tensorflow.keras.layers.Dense': {'units': 1}}]}}})
>>> out = model.predict(X)

Initializes a scikit-learn API compatible Keras model from either a pre-registered builder function or a builder function passed directly.

Parameters

kind (Union[callable, str]) – The structure of the model to build, given as the name of a builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and take any additional parameters through **kwargs
kwargs (dict) – Any additional args which are passed to the factory building function and/or any additional args to be passed to Keras’ fit() method

load_kind(kind)[source]¶

gordo.machine.model.models.create_keras_timeseriesgenerator(X: numpy.ndarray, y: Optional[numpy.ndarray], batch_size: int, lookback_window: int, lookahead: int) → tensorflow.keras.preprocessing.sequence.TimeseriesGenerator[source]¶

Provides a keras.preprocessing.sequence.TimeseriesGenerator for use with LSTMs, but with the added ability to specify the lookahead of the target in y.

If lookahead is 0, the last element of each generated sample in X corresponds to the same time step as its Y. If lookahead is 1, the values in Y are shifted so that each is one step in the future compared to the last value in its sample in X, and similarly for larger values.

Parameters

X (np.ndarray) – 2d array of values, each row being one sample.
y (Optional[np.ndarray]) – array representing the target.
batch_size (int) – Number of samples in each generated batch.
lookback_window (int) – How far back each sample sees. A value of 1 means that each sample contains a single measurement.
lookahead (int) – How many steps Y is shifted relative to X.

Returns

3d matrix with a list of batchX-batchY pairs, where batchX is a batch of X-values, and correspondingly for batchY. A batch consists of batch_size samples (or y-values), and each sample is a list of length lookback_window.

Return type

TimeseriesGenerator

Examples

>>> import numpy as np
>>> from gordo.machine.model.models import create_keras_timeseriesgenerator
>>> X, y = np.random.rand(100,2), np.random.rand(100, 2)
>>> gen = create_keras_timeseriesgenerator(X, y,
...                                        batch_size=10,
...                                        lookback_window=20,
...                                        lookahead=0)
>>> len(gen) # 9 = ceil((100-20+1)/10)
9
>>> len(gen[0]) # batchX and batchY
2
>>> len(gen[0][0]) # batch_size=10
10
>>> len(gen[0][0][0]) # a single sample, lookback_window = 20
20
>>> len(gen[0][0][0][0]) # n_features = 2
2
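The alignment between windows of X and targets in y described above can be sketched in plain Python. This is not the generator's implementation, only an illustration of the indexing; `make_windows` and `make_batches` are hypothetical helpers.

```python
def make_windows(X, y, lookback_window, lookahead):
    """Pair each window of rows from X with the row of y it should target."""
    pairs = []
    last_start = len(X) - lookback_window - lookahead
    for start in range(last_start + 1):
        end = start + lookback_window        # exclusive end of the window
        target = y[end - 1 + lookahead]      # lookahead=0 -> last row of the window
        pairs.append((X[start:end], target))
    return pairs


def make_batches(pairs, batch_size):
    """Split the pairs into consecutive batches; the last one may be shorter."""
    return [pairs[i:i + batch_size] for i in range(0, len(pairs), batch_size)]


X = [[i, i] for i in range(100)]
pairs = make_windows(X, X, lookback_window=20, lookahead=0)
batches = make_batches(pairs, batch_size=10)
```

With 100 rows and a lookback window of 20, this yields 81 samples in 9 batches, which agrees with the length reported in the doctest above.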

Model Extensions:

Utils¶

Shared utility functions used by models and other components interacting with the models.

gordo.machine.model.utils.make_base_dataframe(tags: Union[List[gordo_dataset.sensor_tag.SensorTag], List[str]], model_input: numpy.ndarray, model_output: numpy.ndarray, target_tag_list: Union[List[gordo_dataset.sensor_tag.SensorTag], List[str], None] = None, index: Optional[numpy.ndarray] = None, frequency: Optional[datetime.timedelta] = None) → pandas.core.frame.DataFrame[source]¶

Construct a dataframe which has a MultiIndex column consisting of top level keys ‘model-input’ and ‘model-output’. Takes care of aligning the model output if its length differs from that of the model input, as well as setting column names based on the passed tags and target_tag_list.

Parameters

tags (List[Union[str, SensorTag]]) – Tags which will be assigned to model-input and/or model-output if the shapes match.
model_input (np.ndarray) – Original input given to the model
model_output (np.ndarray) – Raw model output
target_tag_list (Optional[Union[List[SensorTag], List[str]]]) – Tags to be assigned to model-output. If not given and the model output matches the model input, tags will be used instead.
index (Optional[np.ndarray]) – The index to assign to the resulting dataframe. It is clipped to the length of model_output if the model outputs fewer rows than its input.
frequency (Optional[datetime.timedelta]) – The spacing of the time between points.

Returns

Return type

pd.DataFrame

gordo.machine.model.utils.metric_wrapper(metric, scaler: Optional[sklearn.base.TransformerMixin] = None)[source]¶

Ensures that a given metric works properly when the model returns a y which is shorter than the target y, and allows the data to be scaled before the metric is applied.

Parameters

metric – Metric which must accept y_true and y_pred of the same length
scaler (Optional[TransformerMixin]) – Transformer which will be applied to y and y_pred before the metric is calculated. It must have a transform method, so most scalers must already be fitted on y.
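A wrapper of this kind might look roughly like the sketch below. It assumes that a shorter prediction lines up with the last rows of the target, which is not stated above; `metric_wrapper_sketch`, `HalveScaler`, and `mean_absolute_error` are hypothetical names written for this example.

```python
import functools


def metric_wrapper_sketch(metric, scaler=None):
    """Wrap a metric so it tolerates a y_pred shorter than y_true."""

    @functools.wraps(metric)
    def wrapped(y_true, y_pred, *args, **kwargs):
        if scaler is not None:
            y_true = scaler.transform(y_true)
            y_pred = scaler.transform(y_pred)
        # Assumption: a shorter y_pred lines up with the last rows of y_true.
        y_true = y_true[len(y_true) - len(y_pred):]
        return metric(y_true, y_pred, *args, **kwargs)

    return wrapped


class HalveScaler:
    """Hypothetical already-fitted scaler exposing only transform()."""

    def transform(self, values):
        return [value / 2 for value in values]


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_pred)


plain = metric_wrapper_sketch(mean_absolute_error)
scaled = metric_wrapper_sketch(mean_absolute_error, scaler=HalveScaler())
```

Calling `plain([1, 2, 3, 4], [3, 5])` compares the prediction against the trailing targets `[3, 4]` only, and `scaled` applies the same comparison after halving both inputs.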