Anomaly Models
Models which implement an .anomaly(X, y) method and can be served under the model server's /anomaly/prediction endpoint.
AnomalyDetectorBase
The base class for all other anomaly detector models.
- class gordo.machine.model.anomaly.base.AnomalyDetectorBase(**kwargs)
Bases: sklearn.base.BaseEstimator, gordo.machine.model.base.GordoBase
Initialize the model.
- abstract anomaly(X: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], frequency: Optional[datetime.timedelta] = None) → Union[pandas.core.frame.DataFrame, xarray.core.dataset.Dataset]
Takes X, y and optionally frequency, and returns a dataframe containing anomaly score(s).
DiffBasedAnomalyDetector
Calculates the absolute prediction differences between y and yhat, as well as the total error per row between both matrices, taken as the norm of their difference via numpy.linalg.norm(..., axis=1).
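The two quantities described above can be illustrated with plain NumPy. This is a minimal sketch on made-up arrays, not the library's implementation: the values of y and yhat are hypothetical, and the scaling step that the detector applies before computing errors is left out.

```python
import numpy as np

# Hypothetical target (y) and model output (yhat): 3 samples, 2 features each.
y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
yhat = np.array([[1.0, 2.0], [0.0, 0.0], [5.0, 7.0]])

# Absolute prediction difference: one value per sample and per feature.
abs_diff = np.abs(yhat - y)

# Total error: one value per sample, the Euclidean norm of each row
# of the difference (axis=1 reduces over the feature columns).
total_error = np.linalg.norm(yhat - y, axis=1)

print(abs_diff)
print(total_error)
```

The first result keeps the shape of the inputs, which makes it possible to see which feature deviates, while the second collapses each row into a single score.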
- class gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector(base_estimator: sklearn.base.BaseEstimator = tensorflow.keras.wrappers.scikit_learn.KerasRegressor, scaler: sklearn.base.TransformerMixin = MinMaxScaler(), require_thresholds: bool = True, shuffle: bool = False, window: Optional[int] = None, smoothing_method: Optional[str] = None)
Bases: gordo.machine.model.anomaly.base.AnomalyDetectorBase
Estimator which wraps a base_estimator and provides a diff error based approach to anomaly detection.
It trains a scaler to the target after training, purely for error calculations. The underlying base_estimator is trained with the original, unscaled y.
Threshold calculation is based on a rolling statistic of the validation errors on the last fold of cross-validation.
- Parameters
base_estimator (sklearn.base.BaseEstimator) – The model whose normal .fit and .predict methods will be used. Defaults to gordo.machine.model.models.KerasAutoEncoder with kind='feedforward_hourglass'.
scaler (sklearn.base.TransformerMixin) – Used for transforming the model output and the original y to calculate the difference/error in model output vs expected. Defaults to MinMaxScaler(), per the class signature.
require_thresholds (bool) – Requires calculating thresholds_ via a call to cross_validate(). If this is set (default True) but cross_validate() was not called before calling anomaly(), an AttributeError will be raised.
shuffle (bool) – Whether to shuffle the data in .fit so that the model, if relevant, will be trained on a sample of data across the time range and not just the last elements according to the model argument validation_split.
window (int) – Window size for smoothed thresholds.
smoothing_method (str) – Method to be used together with window to smooth metrics. Must be one of: 'smm': simple moving median, 'sma': simple moving average or 'ewma': exponential weighted moving average.
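The three smoothing methods named above can be sketched with pandas. The helper smooth() below is hypothetical and only shows what each abbreviation means; in particular, mapping window to the span of the exponential weighting is an assumption, not something the parameter description states.

```python
import pandas as pd


def smooth(errors: pd.Series, window: int, smoothing_method: str) -> pd.Series:
    """Hypothetical helper: smooth a series of errors with one of the named methods."""
    if smoothing_method == "smm":
        # Simple moving median over the last `window` values.
        return errors.rolling(window).median()
    if smoothing_method == "sma":
        # Simple moving average over the last `window` values.
        return errors.rolling(window).mean()
    if smoothing_method == "ewma":
        # Exponentially weighted moving average; span=window is an assumption.
        return errors.ewm(span=window).mean()
    raise ValueError(f"Unknown smoothing_method: {smoothing_method!r}")


# Made-up error values with one spike at position 2.
errors = pd.Series([1.0, 2.0, 9.0, 4.0, 5.0])
smm = smooth(errors, 3, "smm")
sma = smooth(errors, 3, "sma")
ewma = smooth(errors, 3, "ewma")
```

On this series the moving median is less affected by the spike than the moving average, which is one reason to prefer 'smm' when single outliers should not drive a threshold.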
- anomaly(X: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], frequency: Optional[datetime.timedelta] = None) → Union[pandas.core.frame.DataFrame, xarray.core.dataset.Dataset]
Create an anomaly dataframe from the base provided dataframe.
- Parameters
X (pd.DataFrame) – Dataframe representing the data to go into the model.
y (pd.DataFrame) – Dataframe representing the target output of the model.
- Returns
A superset of the original base dataframe with added anomaly specific features
- Return type
pd.DataFrame
- cross_validate(*, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, numpy.ndarray], cv=TimeSeriesSplit(max_train_size=None, n_splits=3), **kwargs)
Runs time-series cross-validation on the model and updates the model's threshold values based on the cross-validation folds.
- Parameters
X (Union[pd.DataFrame, np.ndarray]) – Input data to the model
y (Union[pd.DataFrame, np.ndarray]) – Target data
kwargs (dict) – Any additional kwargs to be passed to sklearn.model_selection.cross_validate()
- Returns
- Return type
dict
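To make the default cv argument concrete, the sketch below shows how scikit-learn's TimeSeriesSplit(n_splits=3) partitions time-ordered samples. The eight-row array is made up for illustration, and the snippet does not call the detector itself.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Eight hypothetical samples, already ordered in time.
X = np.arange(8).reshape(-1, 1)

cv = TimeSeriesSplit(n_splits=3)

# Each fold trains on an initial stretch of the data and validates
# on the block that immediately follows it.
folds = [(train.tolist(), test.tolist()) for train, test in cv.split(X)]

for train, test in folds:
    print("train:", train, "validate:", test)
```

Every validation block lies after its training data, and the last fold validates on the most recent samples, which is the fold the class description refers to for threshold calculation.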
- class gordo.machine.model.anomaly.diff.DiffBasedKFCVAnomalyDetector(base_estimator: sklearn.base.BaseEstimator = tensorflow.keras.wrappers.scikit_learn.KerasRegressor, scaler: sklearn.base.TransformerMixin = MinMaxScaler(), require_thresholds: bool = True, shuffle: bool = True, window: int = 144, smoothing_method: str = 'smm', threshold_percentile: float = 0.99)
Bases: gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector
Estimator which wraps a base_estimator and provides a diff error based approach to anomaly detection.
It trains a scaler to the target after training, purely for error calculations. The underlying base_estimator is trained with the original, unscaled y.
Threshold calculation is based on a percentile of the smoothed validation errors as calculated from cross-validation predictions.
- Parameters
base_estimator (sklearn.base.BaseEstimator) – The model whose normal .fit and .predict methods will be used. Defaults to gordo.machine.model.models.KerasAutoEncoder with kind='feedforward_hourglass'.
scaler (sklearn.base.TransformerMixin) – Used for transforming the model output and the original y to calculate the difference/error in model output vs expected. Defaults to MinMaxScaler(), per the class signature.
require_thresholds (bool) – Requires calculating thresholds_ via a call to cross_validate(). If this is set (default True) but cross_validate() was not called before calling anomaly(), an AttributeError will be raised.
shuffle (bool) – Whether to shuffle the data in .fit so that the model, if relevant, will be trained on a sample of data across the time range and not just the last elements according to the model argument validation_split.
window (int) – Window size for smooth metrics and threshold calculation.
smoothing_method (str) – Method to be used together with window to smooth metrics. Must be one of: 'smm': simple moving median, 'sma': simple moving average or 'ewma': exponential weighted moving average.
threshold_percentile (float) – Percentile of the validation data to be used to calculate the threshold.
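The interaction of window, smoothing_method and threshold_percentile can be sketched as follows. This is an illustration under stated assumptions rather than the detector's actual code: the validation errors are random numbers, the smoothing is a simple moving median ('smm'), and the threshold is taken as the 0.99 quantile of the smoothed values.

```python
import numpy as np
import pandas as pd

# Hypothetical per-sample validation errors (random, for illustration only).
rng = np.random.default_rng(0)
validation_errors = pd.Series(rng.random(1000))

# Values taken from the class signature defaults.
window = 144
threshold_percentile = 0.99

# 'smm': simple moving median; the first window - 1 entries are NaN and dropped.
smoothed = validation_errors.rolling(window).median().dropna()

# Threshold: the chosen percentile of the smoothed validation errors.
threshold = smoothed.quantile(threshold_percentile)

print(f"threshold = {threshold:.4f}")
```

Smoothing before taking the percentile means that a single extreme error does not set the threshold on its own; only errors that persist across the window move it.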
- cross_validate(*, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, numpy.ndarray], cv=KFold(n_splits=5, random_state=0, shuffle=True), **kwargs)
Runs K-fold cross-validation on the model and updates the model's threshold values based on a percentile of the validation metrics.
- Parameters
X (Union[pd.DataFrame, np.ndarray]) – Input data to the model
y (Union[pd.DataFrame, np.ndarray]) – Target data
kwargs (dict) – Any additional kwargs to be passed to sklearn.model_selection.cross_validate()
- Returns
- Return type
dict
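The default cv here is a shuffled KFold. The sketch below, on a made-up ten-row array, shows a property of that splitter that seems relevant to percentile-based thresholds: every sample lands in exactly one validation fold, so validation predictions are available across the whole data range. It uses scikit-learn only and does not call the detector.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten hypothetical samples.
X = np.arange(10).reshape(-1, 1)

# Same splitter settings as the default cv argument in the signature.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# Collect the validation indices of every fold.
test_folds = [test for _, test in cv.split(X)]
test_indices = np.concatenate(test_folds)

for i, test in enumerate(test_folds):
    print(f"fold {i}: validate on {test.tolist()}")
```

Unlike the time-series split, the validation folds here are not ordered in time, so this scheme fits cases where the threshold should reflect errors over the full period rather than only the most recent block.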