Anomaly Models
Models which implement an .anomaly(X, y) method and can be served under the
model server /anomaly/prediction endpoint.
AnomalyDetectorBase
The base class for all other anomaly detector models.
-
class
gordo.machine.model.anomaly.base.AnomalyDetectorBase(**kwargs)
Bases: sklearn.base.BaseEstimator, gordo.machine.model.base.GordoBase
Initialize the model.
-
abstract
anomaly(X: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], frequency: Optional[datetime.timedelta] = None) → Union[pandas.core.frame.DataFrame, xarray.core.dataset.Dataset]
Takes X, y and optionally frequency, and returns a dataframe containing anomaly score(s).
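The contract described above can be sketched with a stand-in abstract class. This is only an illustration: AnomalyDetectorSketch and NaiveDetector are hypothetical names, the "anomaly-score" column is made up, and nothing here is imported from gordo.

```python
from abc import ABC, abstractmethod
from datetime import timedelta
from typing import Optional

import numpy as np
import pandas as pd


class AnomalyDetectorSketch(ABC):
    """Stand-in for the interface; not gordo's actual base class."""

    @abstractmethod
    def anomaly(
        self,
        X: pd.DataFrame,
        y: pd.DataFrame,
        frequency: Optional[timedelta] = None,
    ) -> pd.DataFrame:
        """Return a dataframe containing anomaly score(s)."""


class NaiveDetector(AnomalyDetectorSketch):
    """Hypothetical detector that treats X itself as the prediction of y."""

    def anomaly(self, X, y, frequency=None):
        # Mean absolute difference per row; the column name is invented here.
        scores = np.abs(y.to_numpy() - X.to_numpy()).mean(axis=1)
        return pd.DataFrame({"anomaly-score": scores}, index=y.index)


X = pd.DataFrame({"a": [1.0, 2.0], "b": [1.0, 2.0]})
y = pd.DataFrame({"a": [1.0, 4.0], "b": [1.0, 2.0]})
result = NaiveDetector().anomaly(X, y)
```

Any concrete detector only needs to provide anomaly() with this call shape; the stand-in base class cannot be instantiated on its own.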
DiffBasedAnomalyDetector
Calculates the absolute prediction differences between y and yhat, as well
as the overall error between both matrices, computed per row via numpy.linalg.norm(..., axis=1).
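A minimal sketch of the two error measures, assuming small made-up arrays for y and yhat. The scaling step that the detector applies before computing errors is left out here, and the variable names are illustrative rather than gordo's.

```python
import numpy as np

# Hypothetical targets (y) and model outputs (yhat): 3 samples, 2 tags each.
y = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
yhat = np.array([[1.0, 2.0], [0.0, 0.0], [5.0, 7.0]])

# Per-tag error: element-wise absolute difference, same shape as y.
tag_errors = np.abs(y - yhat)

# Per-sample error: Euclidean norm of each row of the difference.
total_errors = np.linalg.norm(y - yhat, axis=1)
```

With axis=1 the norm collapses the tag dimension, so total_errors has one value per sample while tag_errors keeps one value per sample and tag.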
-
class
gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector(base_estimator: sklearn.base.BaseEstimator = tensorflow.keras.wrappers.scikit_learn.KerasRegressor, scaler: sklearn.base.TransformerMixin = MinMaxScaler(), require_thresholds: bool = True, shuffle: bool = False, window: Optional[int] = None, smoothing_method: Optional[str] = None)
Bases: gordo.machine.model.anomaly.base.AnomalyDetectorBase
Estimator which wraps a base_estimator and provides a diff error based approach to anomaly detection.
It trains a scaler to the target after training, purely for error calculations. The underlying base_estimator is trained with the original, unscaled, y.
Threshold calculation is based on a rolling statistic of the validation errors on the last fold of cross-validation.
- Parameters
base_estimator (sklearn.base.BaseEstimator) – The model whose normal .fit and .predict methods will be used. Defaults to gordo.machine.model.models.KerasAutoEncoder with kind='feedforward_hourglass'.
scaler (sklearn.base.TransformerMixin) – Defaults to sklearn.preprocessing.MinMaxScaler(), as shown in the signature. Used for transforming model output and the original y to calculate the difference/error in model output vs expected.
require_thresholds (bool) – Requires calculating thresholds_ via a call to cross_validate(). If this is set (default True), but cross_validate() was not called before calling anomaly(), an AttributeError will be raised.
shuffle (bool) – Whether to shuffle the data in .fit so that the model, if relevant, will be trained on a sample of data across the time range and not just the last elements according to model arg validation_split.
window (int) – Window size for smoothed thresholds.
smoothing_method (str) – Method to be used together with window to smooth metrics. Must be one of: 'smm': simple moving median, 'sma': simple moving average or 'ewma': exponential weighted moving average.
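The three smoothing methods named above can be illustrated with pandas on a made-up error series. This is a sketch, not gordo's implementation; in particular, passing the window as span to ewm() is an assumption about how the window maps onto the exponential average.

```python
import pandas as pd

# Made-up error series with one spike at position 2.
errors = pd.Series([1.0, 2.0, 9.0, 4.0, 5.0])
window = 3

# 'smm': simple moving median, robust to the single spike.
smm = errors.rolling(window).median()

# 'sma': simple moving average, pulled up by the spike.
sma = errors.rolling(window).mean()

# 'ewma': exponentially weighted moving average (span=window is assumed).
ewma = errors.ewm(span=window).mean()
```

The rolling variants produce NaN until a full window is available, whereas the exponentially weighted average yields a value from the first sample onward.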
-
anomaly(X: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], y: Union[pandas.core.frame.DataFrame, xarray.core.dataarray.DataArray], frequency: Optional[datetime.timedelta] = None) → Union[pandas.core.frame.DataFrame, xarray.core.dataset.Dataset]
Create an anomaly dataframe from the base provided dataframe.
- Parameters
X (pd.DataFrame) – Dataframe representing the data to go into the model.
y (pd.DataFrame) – Dataframe representing the target output of the model.
- Returns
A superset of the original base dataframe with added anomaly specific features
- Return type
pd.DataFrame
-
cross_validate(*, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, numpy.ndarray], cv=TimeSeriesSplit(max_train_size=None, n_splits=3), **kwargs)
Run time-series cross-validation on the model and update the model's threshold values based on the cross-validation folds.
- Parameters
X (Union[pd.DataFrame, np.ndarray]) – Input data to the model
y (Union[pd.DataFrame, np.ndarray]) – Target data
kwargs (dict) – Any additional kwargs to be passed to sklearn.model_selection.cross_validate()
- Returns
- Return type
dict
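To show what the default cv=TimeSeriesSplit(n_splits=3) does to the data, the sketch below splits eight made-up samples and lists the resulting folds. It only demonstrates the scikit-learn splitter; how gordo turns the validation errors into thresholds is not reproduced here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Eight made-up, time-ordered samples.
X = np.arange(8).reshape(-1, 1)

cv = TimeSeriesSplit(n_splits=3)
folds = [(train.tolist(), test.tolist()) for train, test in cv.split(X)]

for train, test in folds:
    # Training data always precedes validation data in time.
    print("train:", train, "validate:", test)
```

Each fold trains on an expanding prefix of the series and validates on the block that follows it, so the last fold validates on the most recent samples.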
-
class
gordo.machine.model.anomaly.diff.DiffBasedKFCVAnomalyDetector(base_estimator: sklearn.base.BaseEstimator = tensorflow.keras.wrappers.scikit_learn.KerasRegressor, scaler: sklearn.base.TransformerMixin = MinMaxScaler(), require_thresholds: bool = True, shuffle: bool = True, window: int = 144, smoothing_method: str = 'smm', threshold_percentile: float = 0.99)
Bases: gordo.machine.model.anomaly.diff.DiffBasedAnomalyDetector
Estimator which wraps a base_estimator and provides a diff error based approach to anomaly detection.
It trains a scaler to the target after training, purely for error calculations. The underlying base_estimator is trained with the original, unscaled, y.
Threshold calculation is based on a percentile of the smoothed validation errors as calculated from cross-validation predictions.
- Parameters
base_estimator (sklearn.base.BaseEstimator) – The model whose normal .fit and .predict methods will be used. Defaults to gordo.machine.model.models.KerasAutoEncoder with kind='feedforward_hourglass'.
scaler (sklearn.base.TransformerMixin) – Defaults to sklearn.preprocessing.MinMaxScaler(), as shown in the signature. Used for transforming model output and the original y to calculate the difference/error in model output vs expected.
require_thresholds (bool) – Requires calculating thresholds_ via a call to cross_validate(). If this is set (default True), but cross_validate() was not called before calling anomaly(), an AttributeError will be raised.
shuffle (bool) – Whether to shuffle the data in .fit so that the model, if relevant, will be trained on a sample of data across the time range and not just the last elements according to model arg validation_split.
window (int) – Window size for smoothed metrics and threshold calculation.
smoothing_method (str) – Method to be used together with window to smooth metrics. Must be one of: 'smm': simple moving median, 'sma': simple moving average or 'ewma': exponential weighted moving average.
threshold_percentile (float) – Percentile of the validation data to be used to calculate the threshold.
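The percentile-based threshold can be sketched as follows, using the default values from the signature (window=144, smoothing_method='smm', threshold_percentile=0.99) on randomly generated errors. The error values and variable names are made up, and this is an approximation of the idea rather than gordo's own code.

```python
import numpy as np
import pandas as pd

# Made-up validation errors standing in for cross-validation output.
rng = np.random.default_rng(0)
validation_errors = pd.Series(rng.random(1000))

window = 144
threshold_percentile = 0.99

# 'smm' smoothing: rolling median, dropping the incomplete leading windows.
smoothed = validation_errors.rolling(window).median().dropna()

# Threshold: the chosen percentile of the smoothed errors.
threshold = smoothed.quantile(threshold_percentile)
```

Smoothing first means isolated spikes in the raw errors have little influence on the threshold, which is taken from the upper tail of the smoothed series.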
-
cross_validate(*, X: Union[pandas.core.frame.DataFrame, numpy.ndarray], y: Union[pandas.core.frame.DataFrame, numpy.ndarray], cv=KFold(n_splits=5, random_state=0, shuffle=True), **kwargs)
Run K-fold cross-validation on the model and update the model's threshold values based on a percentile of the validation metrics.
- Parameters
X (Union[pd.DataFrame, np.ndarray]) – Input data to the model
y (Union[pd.DataFrame, np.ndarray]) – Target data
kwargs (dict) – Any additional kwargs to be passed to sklearn.model_selection.cross_validate()
- Returns
- Return type
dict
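For comparison with the time-series split, the sketch below runs the default cv=KFold(n_splits=5, shuffle=True, random_state=0) over ten made-up samples and collects the validation indices. It illustrates only the scikit-learn splitter, under the assumption that out-of-fold predictions for every sample are what feed the percentile calculation.

```python
import numpy as np
from sklearn.model_selection import KFold

# Ten made-up samples.
X = np.arange(10).reshape(-1, 1)

cv = KFold(n_splits=5, shuffle=True, random_state=0)

validated = []
for train_idx, test_idx in cv.split(X):
    # Each fold validates on a disjoint, shuffled subset of the samples.
    validated.extend(test_idx.tolist())
```

Every sample lands in exactly one validation fold, so validation errors are available across the whole time range rather than only for the final block.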