Models
Models are a collection of Scikit-Learn-like models, each built to fulfill a specific need. One example is the KerasAutoEncoder.
Other scikit-learn-compliant models can be used in the config files without any additional configuration.
Base Model
The base model is designed to be inherited by any other model that must be implemented within Gordo because of special model requirements, e.g. PyTorch or Keras models.
Custom Gordo models
This group of models is already implemented and ready to be used within config files by specifying the model's full path, for example:
gordo.machine.model.models.KerasAutoEncoder
class gordo.machine.model.models.KerasAutoEncoder(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)
Bases: gordo.machine.model.models.KerasBaseEstimator, sklearn.base.TransformerMixin
Subclass of KerasBaseEstimator that allows fitting to X alone, without requiring y.
Initializes a Scikit-Learn API compatible Keras model from either a pre-registered builder function or a builder function passed directly.
- Parameters
kind (Union[callable, str]) – The structure of the model to build, as designated by any builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and accept any additional parameters through **kwargs.
kwargs (dict) – Any additional arguments, which are passed to the factory building function and/or to Keras’ fit() method.
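Both forms of kind resolve to a builder that takes n_features first and receives everything else through **kwargs. The pure-Python sketch below shows that dispatch without Keras. The registry dict, the register_builder decorator and the tiny_hourglass builder are hypothetical stand-ins, not Gordo's register_model_builder. The builder also returns a plain dict instead of a Keras model.

```python
from typing import Any, Callable, Dict, Union

# Hypothetical registry (not Gordo's): maps a name to a builder function.
_BUILDERS: Dict[str, Callable[..., Any]] = {}


def register_builder(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator that records a builder under its function name."""
    _BUILDERS[func.__name__] = func
    return func


@register_builder
def tiny_hourglass(n_features: int, **kwargs: Any) -> Dict[str, Any]:
    # Hypothetical builder: a real one would return a compiled Keras model.
    encoding_dim = kwargs.get("encoding_dim", max(1, n_features // 2))
    return {"n_features": n_features, "encoding_dim": encoding_dim}


def resolve_kind(kind: Union[str, Callable[..., Any]]) -> Callable[..., Any]:
    """Return the builder for `kind`: callables are used as-is, strings are looked up."""
    if callable(kind):
        return kind
    try:
        return _BUILDERS[kind]
    except KeyError:
        raise ValueError(f"No builder registered under {kind!r}") from None


# Both forms end up calling a function that takes n_features first.
by_name = resolve_kind("tiny_hourglass")(4, encoding_dim=2)
by_func = resolve_kind(tiny_hourglass)(4)
```

Because extra parameters travel through **kwargs, the same builder can be configured by name from a config file or called directly from Python.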
score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None) → float
Returns the explained variance score between the autoencoder’s input and output.
- Parameters
X (Union[np.ndarray, pd.DataFrame]) – Input data to the model
y (Union[np.ndarray, pd.DataFrame]) – Target
sample_weight (Optional[np.ndarray]) – sample weights
- Returns
score – Returns the explained variance score
- Return type
float
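The explained variance score compares the variance of the residual with the variance of the target. The NumPy sketch below follows the definition used by scikit-learn's explained_variance_score: 1 - Var(y - y_pred) / Var(y), averaged uniformly over columns. Whether Gordo computes it exactly this way, including sample weighting and zero-variance columns, is an assumption. The sketch ignores both.

```python
import numpy as np


def explained_variance(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Uniform column average of 1 - Var(y_true - y_pred) / Var(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residual_var = np.var(y_true - y_pred, axis=0)
    target_var = np.var(y_true, axis=0)
    # This sketch assumes every target column has non-zero variance.
    per_column = 1.0 - residual_var / target_var
    return float(np.mean(per_column))


X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 7.0], [4.0, 8.0]])
perfect = explained_variance(X, X)         # identical input and reconstruction
shifted = explained_variance(X, X + 10.0)  # constant offset: residual variance is 0
noisy = explained_variance(X, X * 0.5)     # residual keeps 25% of the variance
```

Unlike R², explained variance does not penalize a constant bias in the reconstruction. In the example, shifted therefore still scores 1.0.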
class gordo.machine.model.models.KerasBaseEstimator(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)
Bases: tensorflow.keras.wrappers.scikit_learn.KerasRegressor, gordo.machine.model.base.GordoBase, sklearn.base.BaseEstimator
Initializes a Scikit-Learn API compatible Keras model from either a pre-registered builder function or a builder function passed directly.
- Parameters
kind (Union[callable, str]) – The structure of the model to build, as designated by any builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and accept any additional parameters through **kwargs.
kwargs (dict) – Any additional arguments, which are passed to the factory building function and/or to Keras’ fit() method.
classmethod extract_supported_fit_args(kwargs)
Filters kwargs, keeping only those related to fit.
- Parameters
kwargs (dict) – Keyword arguments to filter.
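The filtering step can be pictured as a dict comprehension over the supported_fit_args list documented for this class. The sketch below is a plain-Python illustration with a hypothetical function name, not Gordo's implementation. The encoding_dim argument is a made-up builder parameter used only to show what gets dropped.

```python
from typing import Any, Dict

# Copied from KerasBaseEstimator.supported_fit_args as documented in this section.
SUPPORTED_FIT_ARGS = [
    "batch_size", "epochs", "verbose", "callbacks", "validation_split", "shuffle",
    "class_weight", "initial_epoch", "steps_per_epoch", "validation_batch_size",
    "max_queue_size", "workers", "use_multiprocessing",
]


def filter_fit_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep only keyword arguments listed as supported fit args."""
    return {key: value for key, value in kwargs.items() if key in SUPPORTED_FIT_ARGS}


# Builder-only arguments (here the hypothetical encoding_dim) are dropped.
fit_kwargs = filter_fit_kwargs({"encoding_dim": 3, "epochs": 10, "batch_size": 64})
```

This split lets one kwargs dict feed both the model builder and Keras' fit() without fit() rejecting unknown arguments.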
fit(X: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], **kwargs)
Fit the model to X given y.
- Parameters
X (Union[np.ndarray, pd.DataFrame, xr.DataArray]) – Input data as a numpy array, pandas DataFrame or xarray DataArray.
y (Union[np.ndarray, pd.DataFrame, xr.DataArray]) – Target data as a numpy array, pandas DataFrame or xarray DataArray.
sample_weight (np.ndarray) – Array-like weights to assign to samples.
kwargs – Any additional kwargs to supply to the Keras fit method.
- Returns
self
- Return type
KerasAutoEncoder
classmethod from_definition(definition: dict)
Handler for gordo.serializer.from_definition.
- Parameters
definition (dict) –
get_metadata()
Gets metadata for the KerasBaseEstimator. The metadata includes a dictionary under the key “history”. That dictionary has a key “params” pointing to another dictionary with various parameters, and the metrics are listed in this params dictionary under “metrics”. For each metric there is also a key whose value is a list of that metric's values, one per epoch.
- Returns
Metadata dictionary, including a history object if present
- Return type
Dict
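A literal dictionary makes the described layout easier to follow. In the sketch below, every number is an illustrative placeholder rather than real training output. It also assumes the per-epoch metric lists sit next to "params" inside "history", which is one reading of the description above. The helper metric_per_epoch is hypothetical.

```python
from typing import Any, Dict, List

# Illustrative placeholder values only; the layout follows the description above.
metadata: Dict[str, Any] = {
    "history": {
        "params": {"epochs": 3, "metrics": ["loss", "val_loss"]},
        "loss": [0.91, 0.47, 0.32],
        "val_loss": [0.95, 0.55, 0.40],
    }
}


def metric_per_epoch(meta: Dict[str, Any]) -> Dict[str, List[float]]:
    """Hypothetical helper: collect each metric named in params['metrics'] with its values."""
    history = meta["history"]
    return {name: history[name] for name in history["params"]["metrics"]}


curves = metric_per_epoch(metadata)
final_losses = {name: values[-1] for name, values in curves.items()}
```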
static get_n_features(X: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray]) → Union[int, tuple]
static get_n_features_out(y: Union[numpy.ndarray, pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray]) → Union[int, tuple]
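These two static methods are documented only by their Union[int, tuple] return type. One plausible reading, shown below as a guess rather than Gordo's code, is that they return an int for 2D tabular input and the trailing shape as a tuple for higher-dimensional input. The function name n_features_of is hypothetical.

```python
from typing import Union

import numpy as np


def n_features_of(X: np.ndarray) -> Union[int, tuple]:
    """Hypothetical: an int for 2D input, the trailing shape tuple otherwise."""
    return X.shape[1] if X.ndim == 2 else X.shape[1:]


tabular = n_features_of(np.zeros((100, 5)))        # 2D: number of columns
windowed = n_features_of(np.zeros((100, 20, 5)))   # 3D: (steps, features)
```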
get_params(**params)
Gets the parameters for this estimator.
- Parameters
params – ignored (exists for API compatibility).
- Returns
Parameters used in this estimator
- Return type
Dict[str, Any]
into_definition() → dict
Handler for gordo.serializer.into_definition.
- Return type
dict
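into_definition and from_definition are the two halves of a serialization round trip. The toy class below illustrates that round trip under a stated assumption: a definition is a dict mapping the class's import path to its constructor parameters. This mirrors how models are written in Gordo config files, but the exact format produced by gordo.serializer is not confirmed here. ToyEstimator is entirely hypothetical.

```python
from typing import Any, Dict


class ToyEstimator:
    """Hypothetical stand-in for a Gordo model, used only to illustrate the round trip."""

    def __init__(self, kind: str = "toy", epochs: int = 1):
        self.kind = kind
        self.epochs = epochs

    def get_params(self) -> Dict[str, Any]:
        return {"kind": self.kind, "epochs": self.epochs}

    def into_definition(self) -> Dict[str, Any]:
        # Assumed definition format: {"<module>.<ClassName>": {constructor parameters}}.
        path = f"{type(self).__module__}.{type(self).__qualname__}"
        return {path: self.get_params()}

    @classmethod
    def from_definition(cls, definition: Dict[str, Any]) -> "ToyEstimator":
        [(path, params)] = definition.items()
        # A real deserializer would import the class named by `path`;
        # this sketch only checks that the definition targets this class.
        if path.rsplit(".", 1)[-1] != cls.__qualname__:
            raise ValueError(f"Definition is for {path!r}, not {cls.__qualname__!r}")
        return cls(**params)


original = ToyEstimator(epochs=5)
definition = original.into_definition()
clone = ToyEstimator.from_definition(definition)
```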
predict(X: numpy.ndarray, **kwargs) → numpy.ndarray
- Parameters
X (np.ndarray) – Input data
kwargs (dict) – kwargs which are passed to Keras’ predict method
- Returns
results
- Return type
np.ndarray
property sk_params
Parameters used for scikit-learn kwargs.
supported_fit_args = ['batch_size', 'epochs', 'verbose', 'callbacks', 'validation_split', 'shuffle', 'class_weight', 'initial_epoch', 'steps_per_epoch', 'validation_batch_size', 'max_queue_size', 'workers', 'use_multiprocessing']
class gordo.machine.model.models.KerasLSTMAutoEncoder(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)
Bases: gordo.machine.model.models.KerasLSTMBaseEstimator
- Parameters
kind (Union[Callable, str]) – The structure of the model to build, as designated by any builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and accept any additional parameters through **kwargs.
lookback_window (int) – Number of timestamps (lags) used to train the model.
batch_size (int) – Number of training examples per batch.
epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.
verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.
kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.
property lookahead
Steps ahead in y the model should target.
class gordo.machine.model.models.KerasLSTMBaseEstimator(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)
Bases: gordo.machine.model.models.KerasBaseEstimator, sklearn.base.TransformerMixin
Abstract base class for training a many-to-one LSTM autoencoder and a one-step LSTM forecast model.
- Parameters
kind (Union[Callable, str]) – The structure of the model to build, as designated by any builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and accept any additional parameters through **kwargs.
lookback_window (int) – Number of timestamps (lags) used to train the model.
batch_size (int) – Number of training examples per batch.
epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.
verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.
kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.
fit(X: numpy.ndarray, y: numpy.ndarray, **kwargs) → gordo.machine.model.models.KerasLSTMForecast
This fits a one-step forecast LSTM architecture.
- Parameters
X (np.ndarray) – 2D numpy array of dimension n_samples x n_features. Input data to train.
y (np.ndarray) – 2D numpy array representing the target.
kwargs (dict) – Any additional args to be passed to the Keras fit_generator method.
- Returns
The fitted model.
- Return type
KerasLSTMForecast
get_metadata()
Adds the number of forecast steps to the metadata.
- Returns
metadata – Metadata dictionary, including forecast steps.
- Return type
dict
abstract property lookahead
Steps ahead in y the model should target.
predict(X: numpy.ndarray, **kwargs) → numpy.ndarray
- Parameters
X (np.ndarray) – Data to predict/transform. 2D numpy array of dimension n_samples x n_features where n_samples must be > lookback_window.
- Returns
results – 2D numpy array of dimension (n_samples - lookback_window) x 2*n_features. The first half of the array (results[:, :n_features]) corresponds to X offset by lookback_window+1 (i.e., X[lookback_window:,:]) whereas the second half corresponds to the predicted values of X[lookback_window:,:].
- Return type
np.ndarray
Example
>>> import numpy as np
>>> from gordo.machine.model.factories.lstm_autoencoder import lstm_model
>>> from gordo.machine.model.models import KerasLSTMForecast
>>> # Define train/test data
>>> X_train = np.array([[1, 1], [2, 3], [0.5, 0.6], [0.3, 1], [0.6, 0.7]])
>>> X_test = np.array([[2, 3], [1, 1], [0.1, 1], [0.5, 2]])
>>> # Initiate model, fit and transform
>>> lstm_ae = KerasLSTMForecast(kind="lstm_model",
...                             lookback_window=2,
...                             verbose=0)
>>> model_fit = lstm_ae.fit(X_train, y=X_train.copy())
>>> model_transform = lstm_ae.predict(X_test)
>>> model_transform.shape
(2, 2)
score(X: Union[numpy.ndarray, pandas.core.frame.DataFrame], y: Union[numpy.ndarray, pandas.core.frame.DataFrame], sample_weight: Optional[numpy.ndarray] = None) → float
Returns the explained variance score between the one-step forecast of the input and the true input at the next time step (note: for LSTMs, X is offset by lookback_window).
- Parameters
X (Union[np.ndarray, pd.DataFrame]) – Input data to the model.
y (Union[np.ndarray, pd.DataFrame]) – Target
sample_weight (Optional[np.ndarray]) – Sample weights
- Returns
score – Returns the explained variance score.
- Return type
float
class gordo.machine.model.models.KerasLSTMForecast(kind: Union[Callable, str], lookback_window: int = 1, batch_size: int = 32, **kwargs)
Bases: gordo.machine.model.models.KerasLSTMBaseEstimator
- Parameters
kind (Union[Callable, str]) – The structure of the model to build, as designated by any builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and accept any additional parameters through **kwargs.
lookback_window (int) – Number of timestamps (lags) used to train the model.
batch_size (int) – Number of training examples per batch.
epochs (int) – Number of epochs to train the model. An epoch is an iteration over the entire data provided.
verbose (int) – Verbosity mode. Possible values are 0, 1, or 2 where 0 = silent, 1 = progress bar, 2 = one line per epoch.
kwargs (dict) – Any arguments which are passed to the factory building function and/or any additional args to be passed to the intermediate fit method.
property lookahead
Steps ahead in y the model should target.
class gordo.machine.model.models.KerasRawModelRegressor(kind: Union[str, Callable[[int, Dict[str, Any]], tensorflow.keras.models.Model]], **kwargs)
Bases: gordo.machine.model.models.KerasAutoEncoder
Creates a scikit-learn-like model with an underlying tensorflow.keras model from a raw config.
Examples
>>> import yaml
>>> import numpy as np
>>> config_str = '''
... # Arguments to the .compile() method
... compile:
...   loss: mse
...   optimizer: adam
...
... # The architecture of the model itself.
... spec:
...   tensorflow.keras.models.Sequential:
...     layers:
...       - tensorflow.keras.layers.Dense:
...           units: 4
...       - tensorflow.keras.layers.Dense:
...           units: 1
... '''
>>> config = yaml.safe_load(config_str)
>>> model = KerasRawModelRegressor(kind=config)
>>>
>>> X, y = np.random.random((10, 4)), np.random.random((10, 1))
>>> model.fit(X, y, verbose=0)
KerasRawModelRegressor(kind: {'compile': {'loss': 'mse', 'optimizer': 'adam'}, 'spec': {'tensorflow.keras.models.Sequential': {'layers': [{'tensorflow.keras.layers.Dense': {'units': 4}}, {'tensorflow.keras.layers.Dense': {'units': 1}}]}}})
>>> out = model.predict(X)
Initializes a Scikit-Learn API compatible Keras model from either a pre-registered builder function or a builder function passed directly.
- Parameters
kind (Union[callable, str]) – The structure of the model to build, as designated by any builder function registered with gordo.machine.model.register.register_model_builder. Alternatively, a builder function may be passed directly to this argument. Such a function should accept n_features as its first argument and accept any additional parameters through **kwargs.
kwargs (dict) – Any additional arguments, which are passed to the factory building function and/or to Keras’ fit() method.
gordo.machine.model.models.create_keras_timeseriesgenerator(X: numpy.ndarray, y: Optional[numpy.ndarray], batch_size: int, lookback_window: int, lookahead: int) → tensorflow.keras.preprocessing.sequence.TimeseriesGenerator
Provides a keras.preprocessing.sequence.TimeseriesGenerator for use with LSTMs, with the added ability to specify the lookahead of the target in y.
If lookahead == 0, the last element of each generated sample in X corresponds to the same time step as its target in y. If lookahead == 1, y is shifted so that each target is one step in the future relative to the last value of its sample in X, and similarly for larger values.
- Parameters
X (np.ndarray) – 2d array of values, each row being one sample.
y (Optional[np.ndarray]) – array representing the target.
batch_size (int) – Number of samples in each generated batch.
lookback_window (int) – Number of time steps each sample spans. 1 means that each sample contains a single measurement.
lookahead (int) – Number of steps y is shifted relative to X.
- Returns
A sequence of (batchX, batchY) pairs, where batchX is a batch of X-samples and batchY the corresponding batch of y-values. Each batch consists of batch_size pairs of samples and targets, and each sample is a sequence of length lookback_window.
- Return type
TimeseriesGenerator
Examples
>>> import numpy as np
>>> X, y = np.random.rand(100, 2), np.random.rand(100, 2)
>>> gen = create_keras_timeseriesgenerator(X, y,
...                                        batch_size=10,
...                                        lookback_window=20,
...                                        lookahead=0)
>>> len(gen)  # 9 = ceil((100 - 20 + 1) / 10)
9
>>> len(gen[0])  # batchX and batchY
2
>>> len(gen[0][0])  # batch_size = 10
10
>>> len(gen[0][0][0])  # a single sample, lookback_window = 20
20
>>> len(gen[0][0][0][0])  # n_features = 2
2
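The pairing rule for lookahead can be reproduced without Keras. The NumPy sketch below is an illustration, not Gordo's implementation, which delegates to TimeseriesGenerator. Each window of lookback_window rows of X is paired with the row of y that lies lookahead steps after the window's last row. The function name make_windows is hypothetical.

```python
import math
from typing import Tuple

import numpy as np


def make_windows(X: np.ndarray, y: np.ndarray, lookback_window: int,
                 lookahead: int) -> Tuple[np.ndarray, np.ndarray]:
    """Hypothetical helper: pair each window of X with the y row `lookahead` steps after
    the window's last row (lookahead=0 targets the same step as the last row)."""
    if lookahead < 0:
        raise ValueError("lookahead can not be negative")
    n_windows = len(X) - lookback_window + 1 - lookahead
    if n_windows < 1:
        raise ValueError("X is too short for this lookback_window/lookahead")
    windows = np.stack([X[i:i + lookback_window] for i in range(n_windows)])
    targets = np.stack([y[i + lookback_window - 1 + lookahead] for i in range(n_windows)])
    return windows, targets


X = np.random.rand(100, 2)
# Autoencoder-style pairing (lookahead=0), as in the example above.
windows, targets = make_windows(X, X, lookback_window=20, lookahead=0)
n_batches = math.ceil(len(windows) / 10)  # batch_size = 10
# One-step forecast pairing (lookahead=1).
f_windows, f_targets = make_windows(X, X, lookback_window=20, lookahead=1)
```

With lookahead=0, this produces 81 windows and 9 batches, matching the counts in the example above. With lookahead=1, 4 input rows and lookback_window=2 give 2 windows, which is consistent with the row count in the KerasLSTMForecast example.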
Utils
Shared utility functions used by models and other components interacting with the models.
gordo.machine.model.utils.make_base_dataframe(tags: Union[List[gordo_dataset.sensor_tag.SensorTag], List[str]], model_input: numpy.ndarray, model_output: numpy.ndarray, target_tag_list: Union[List[gordo_dataset.sensor_tag.SensorTag], List[str], None] = None, index: Optional[numpy.ndarray] = None, frequency: Optional[datetime.timedelta] = None) → pandas.core.frame.DataFrame
Constructs a dataframe with MultiIndex columns whose top-level keys are ‘model-input’ and ‘model-output’. Takes care of aligning the model output when its length differs from the model input, as well as setting column names based on the passed tags and target_tag_list.
- Parameters
tags (List[Union[str, SensorTag]]) – Tags which will be assigned to model-input and/or model-output if the shapes match.
model_input (np.ndarray) – Original input given to the model.
model_output (np.ndarray) – Raw model output.
target_tag_list (Optional[Union[List[SensorTag], List[str]]]) – Tags to be assigned to model-output. If not assigned but the model output matches the model input, tags will be used.
index (Optional[np.ndarray]) – The index which should be assigned to the resulting dataframe. It will be clipped to the length of model_output, should the model output fewer rows than its input.
frequency (Optional[datetime.timedelta]) – The spacing of the time between points.
- Return type
pd.DataFrame
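The sketch below shows the resulting column layout with pandas for the autoencoder case only, where input and output share the same tags. It assumes that alignment keeps the last len(model_output) rows, since windowed models drop leading samples. That direction is an assumption, not confirmed by the text. target_tag_list and frequency are left out, and base_dataframe_sketch is a hypothetical name.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def base_dataframe_sketch(tags: List[str], model_input: np.ndarray, model_output: np.ndarray,
                          index: Optional[np.ndarray] = None) -> pd.DataFrame:
    """Hypothetical simplification: align the input to the (possibly shorter) output and
    place both under 'model-input' / 'model-output' top-level column keys."""
    n_in, n_out = len(model_input), len(model_output)
    full_index = pd.RangeIndex(n_in) if index is None else pd.Index(index)
    # Assumption: leading rows are the ones a windowed model cannot predict.
    aligned_index = full_index[n_in - n_out:]
    input_df = pd.DataFrame(model_input[n_in - n_out:], columns=tags, index=aligned_index)
    output_df = pd.DataFrame(model_output, columns=tags, index=aligned_index)
    return pd.concat([input_df, output_df], axis=1, keys=["model-input", "model-output"])


model_input = np.arange(10.0).reshape(5, 2)
model_output = model_input[2:] * 0.9  # e.g. a model that drops the first 2 rows
df = base_dataframe_sketch(["tag-a", "tag-b"], model_input, model_output)
```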
gordo.machine.model.utils.metric_wrapper(metric, scaler: Optional[sklearn.base.TransformerMixin] = None)
Ensures that a given metric works properly when the model itself returns a y which is shorter than the target y, and allows scaling the data before applying the metric.
- Parameters
metric – Metric which must accept y_true and y_pred of the same length
scaler (Optional[TransformerMixin]) – Transformer which will be applied to y and y_pred before the metric is calculated. It must have a transform method, so most scalers must already be fitted on y.
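The wrapper's job can be sketched in a few lines of NumPy. It trims y_true to the length of y_pred, optionally transforms both with an already-fitted scaler, and then calls the metric. This is a hypothetical re-creation, not Gordo's code. Keeping the last rows of y_true is an assumption based on windowed models dropping leading samples. FixedScaler and mean_abs_error are stand-ins for a fitted scikit-learn scaler and metric.

```python
from typing import Callable, Optional

import numpy as np


def wrap_metric(metric: Callable[[np.ndarray, np.ndarray], float],
                scaler: Optional[object] = None) -> Callable[[np.ndarray, np.ndarray], float]:
    """Hypothetical re-creation: trim y_true to y_pred's length, optionally scale, then score."""

    def wrapped(y_true: np.ndarray, y_pred: np.ndarray) -> float:
        y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
        # Assumption: the model's output lines up with the LAST rows of y_true.
        y_true = y_true[len(y_true) - len(y_pred):]
        if scaler is not None:
            y_true = scaler.transform(y_true)
            y_pred = scaler.transform(y_pred)
        return metric(y_true, y_pred)

    return wrapped


class FixedScaler:
    """Hypothetical already-'fitted' scaler exposing transform()."""

    def __init__(self, scale: float):
        self.scale = scale

    def transform(self, y: np.ndarray) -> np.ndarray:
        return np.asarray(y, dtype=float) / self.scale


def mean_abs_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


y_true = np.array([[0.0], [10.0], [20.0], [30.0]])
y_pred = np.array([[12.0], [22.0]])  # a model that returned 2 rows for 4 inputs
plain = wrap_metric(mean_abs_error)(y_true, y_pred)
scaled = wrap_metric(mean_abs_error, scaler=FixedScaler(2.0))(y_true, y_pred)
```

Scaling before scoring puts tags with very different ranges on a common scale, so one large-valued tag does not dominate an averaged metric.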