ML Server

The ML Server is responsible for giving different “views” into the model being served.

Server

This module contains code for generating the Gordo server Flask application.

Running this module starts the application with Flask’s development web server. To run it under Gunicorn instead, for example with gevent async workers, use the run_server() function.

class gordo.server.server.Config

Bases: object

Server config

gordo.server.server.adapt_proxy_deployment(wsgi_app: Callable) → Callable

Decorator that fixes routing issues that arise when the server runs behind an Envoy proxy on Kubernetes.

Parameters: wsgi_app (typing.Callable) – The underlying WSGI application of a Flask app, such as the app.wsgi_app attribute used in the example below.

Notes

Special note about deploying behind Ambassador, or prefixed proxy paths in general:

When deployed on Kubernetes with Ambassador, there is a prefix in front of the server, i.e.:

/gordo/v0/some-project-name/some-target

The server itself only knows about routes to the right of that prefix, such as /metadata or /predictions, when in reality the full path is:

/gordo/v0/some-project-name/some-target/metadata

This is solved by deriving the application’s assigned prefix from two request values: HTTP_X_ENVOY_ORIGINAL_PATH, which is the full path including the prefix, and PATH_INFO, which is the relative path the server actually knows about.

This function wraps the WSGI app itself to map the current full path to the assigned route function.

For example, /metadata maps to the metadata route function by default; with the wrapper in place, /gordo/v0/some-project-name/some-target/metadata maps to the same metadata route function.

Return type: Callable

Example

>>> app = Flask(__name__)
>>> app.wsgi_app = adapt_proxy_deployment(app.wsgi_app)
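
The prefix handling described in the notes can be sketched as a small WSGI middleware. This is a minimal illustration, not the Gordo implementation: the name prefix_aware, the echo_app stand-in, and the choice to expose the prefix through SCRIPT_NAME are all assumptions made for the example.

```python
from typing import Callable, Iterable


def prefix_aware(wsgi_app: Callable) -> Callable:
    """Hypothetical middleware: derive the proxy prefix and store it in SCRIPT_NAME."""

    def wrapper(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # Full path as seen by the proxy, without any forwarded query string.
        full_path = environ.get("HTTP_X_ENVOY_ORIGINAL_PATH", "").split("?", 1)[0]
        # Relative path the application itself knows about.
        path_info = environ.get("PATH_INFO", "")
        if full_path and path_info and full_path.endswith(path_info):
            # Whatever precedes PATH_INFO in the full path is the assigned prefix.
            environ["SCRIPT_NAME"] = full_path[: len(full_path) - len(path_info)]
        return wsgi_app(environ, start_response)

    return wrapper


def echo_app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Stand-in WSGI app that reports the prefix and route it was given."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = f"{environ.get('SCRIPT_NAME', '')}|{environ['PATH_INFO']}"
    return [body.encode()]


environ = {
    "HTTP_X_ENVOY_ORIGINAL_PATH": "/gordo/v0/some-project-name/some-target/metadata",
    "PATH_INFO": "/metadata",
}
result = b"".join(prefix_aware(echo_app)(environ, lambda status, headers: None))
```

Here the application still routes on /metadata, while the prefix is available separately for anything that needs to build full URLs.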

gordo.server.server.build_app(config: Optional[Dict[str, Any]] = None, prometheus_registry: Optional[prometheus_client.registry.CollectorRegistry] = None): Build the app and any associated routes

gordo.server.server.create_prometheus_metrics(project: Optional[str] = None, registry: Optional[prometheus_client.registry.CollectorRegistry] = None) → gordo.server.prometheus.metrics.GordoServerPrometheusMetrics

gordo.server.server.enable_prometheus()

gordo.server.server.run_cmd(cmd): Run a shell command and handle CalledProcessError and OSError.

Note

This function is separated from run_server() so that the command invocation can be tested, since the subprocess call can break depending on how it is parameterized. For example, calling it without redirecting stderr to stdout will cause a segmentation fault when the executable does not exist.
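
A minimal sketch of this pattern with the standard library follows. The helper name run_cmd_sketch and its integer return values are assumptions for illustration; the actual function may log or re-raise instead.

```python
import subprocess
import sys
from typing import List


def run_cmd_sketch(cmd: List[str]) -> int:
    """Run cmd and return an exit status instead of raising (hypothetical helper)."""
    try:
        # Redirect stderr to stdout, as the note above recommends.
        subprocess.check_call(cmd, stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as exc:
        # The command started but exited with a non-zero status.
        return exc.returncode
    except OSError:
        # The command could not be started, e.g. the executable does not exist.
        # 127 mirrors the shell convention for "command not found"; it is a
        # choice made for this sketch only.
        return 127
    return 0


ok = run_cmd_sketch([sys.executable, "-c", "pass"])
failed = run_cmd_sketch([sys.executable, "-c", "import sys; sys.exit(3)"])
missing = run_cmd_sketch(["no-such-executable-for-this-sketch"])
```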

gordo.server.server.run_server(host: str, port: int, workers: int, log_level: str, config_module: Optional[str] = None, worker_connections: Optional[int] = None, threads: Optional[int] = None, worker_class: str = 'gthread', server_app: str = 'gordo.server.server:build_app()')

Run the application with the Gunicorn server. The worker type is set by worker_class (default 'gthread'); pass a gevent worker class to use gevent async workers.

Parameters

host (str) – The host to run the server on.
port (int) – The port to run the server on.
workers (int) – The number of worker processes for handling requests.
log_level (str) – The log level for the gunicorn webserver. Valid log level names can be found in the [gunicorn documentation](http://docs.gunicorn.org/en/stable/settings.html#loglevel).
config_module (str) – The config module. It is passed to gunicorn with the python: [prefix](https://docs.gunicorn.org/en/stable/settings.html#config).
worker_connections (int) – The maximum number of simultaneous clients per worker process.
threads (int) – The number of worker threads for handling requests.
worker_class (str) – The type of workers to use.
server_app (str) – The application to run
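
As a rough illustration of how these parameters could map onto a gunicorn command line, the sketch below assembles an argument list. The helper build_gunicorn_cmd is hypothetical and is not part of Gordo; the flags used are standard gunicorn options, but the order and exact set that run_server() passes are not confirmed by this page.

```python
from typing import List, Optional


def build_gunicorn_cmd(
    host: str,
    port: int,
    workers: int,
    log_level: str,
    config_module: Optional[str] = None,
    worker_connections: Optional[int] = None,
    threads: Optional[int] = None,
    worker_class: str = "gthread",
    server_app: str = "gordo.server.server:build_app()",
) -> List[str]:
    """Hypothetical helper: translate run_server-style arguments into gunicorn flags."""
    cmd = [
        "gunicorn",
        "--bind", f"{host}:{port}",
        "--log-level", log_level,
        "--workers", str(workers),
        "--worker-class", worker_class,
    ]
    if config_module:
        # gunicorn accepts a module name when it is given the "python:" prefix.
        cmd += ["--config", f"python:{config_module}"]
    if worker_connections is not None:
        cmd += ["--worker-connections", str(worker_connections)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    cmd.append(server_app)
    return cmd


cmd = build_gunicorn_cmd("0.0.0.0", 5555, workers=2, log_level="info", threads=8)
```

Optional settings are only added when provided, so gunicorn's own defaults apply otherwise.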

Views

A collection of implemented views into the Model being served.

Views:

Base
Anomaly

Utils

Shared utility functions and decorators which are used by the Views

gordo.server.utils.dataframe_from_dict(data: dict) → pandas.core.frame.DataFrame

The inverse of dataframe_to_dict(). Reconstructs a MultiIndex column dataframe from a previously serialized one.

Expects data to be a nested dictionary where each top-level key has a value that can be loaded with pandas.DataFrame.from_dict().

Parameters: data (dict) – Data to be loaded into a MultiIndex column dataframe
Returns: MultiIndex column dataframe.
Return type: pandas.DataFrame

Examples

>>> serialized = {
... 'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
...              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
... 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
...              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}
... }
>>> dataframe_from_dict(serialized)
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7

gordo.server.utils.dataframe_from_parquet_bytes(buf: bytes) → pandas.core.frame.DataFrame

Convert bytes representing a parquet table into a pandas dataframe.

Parameters: buf (bytes) – Bytes representing a parquet table. Can be the direct result of gordo.server.utils.dataframe_into_parquet_bytes().
Return type: pandas.DataFrame

gordo.server.utils.dataframe_into_parquet_bytes(df: pandas.core.frame.DataFrame, compression: str = 'snappy') → bytes

Convert a dataframe into bytes representing a parquet table.

Parameters

df (pd.DataFrame) – DataFrame to be compressed
compression (str) – Compression to use, passed to pyarrow.parquet.write_table()

Return type: bytes

gordo.server.utils.dataframe_to_dict(df: pandas.core.frame.DataFrame) → dict

Convert a dataframe, which may have a pandas.MultiIndex as columns, into a dict where each key is a top-level column name and each value holds the columns under that top-level name. If it is a simple dataframe, pandas.DataFrame.to_dict() is used.

This allows json.dumps() to be performed on the result, whereas pandas.DataFrame.to_dict() would convert such a multi-level column dataframe into a dict keyed by tuple objects, which are not JSON serializable. Each top-level value can still be loaded with pandas.DataFrame.from_dict().

Parameters: df (pandas.DataFrame) – Dataframe expected to have columns of type pandas.MultiIndex 2 levels deep.
Returns: Nested dictionary representing the dataframe, keyed by top-level column name.
Return type: dict

Examples

>>> import pprint
>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_tuples((f"feature{i}", f"sub-feature-{ii}") for i in range(2) for ii in range(2))
>>> index = pd.date_range('2019-01-01', '2019-02-01', periods=2)
>>> df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=columns, index=index)
>>> df  
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
>>> serialized = dataframe_to_dict(df)
>>> pprint.pprint(serialized)
{'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}}
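
The round trip between the two helpers can be approximated with plain pandas calls. This is a sketch under assumptions, not the library's code: to_nested_dict and from_nested_dict are hypothetical names, and the index is given as strings up front so that no date formatting is involved.

```python
import pandas as pd


def to_nested_dict(df: pd.DataFrame) -> dict:
    """Sketch: serialize a dataframe with 2-level columns into a nested dict."""
    if not isinstance(df.columns, pd.MultiIndex):
        return df.to_dict()
    top_levels = df.columns.get_level_values(0).unique()
    # df[top] selects the sub-frame whose columns are the second level.
    return {top: df[top].to_dict() for top in top_levels}


def from_nested_dict(data: dict) -> pd.DataFrame:
    """Sketch: rebuild the 2-level column dataframe from the nested dict."""
    frames = {top: pd.DataFrame.from_dict(sub) for top, sub in data.items()}
    # Concatenating a dict of frames along axis=1 uses its keys as the top column level.
    return pd.concat(frames, axis=1)


columns = pd.MultiIndex.from_tuples(
    [(f"feature{i}", f"sub-feature-{j}") for i in range(2) for j in range(2)]
)
df = pd.DataFrame(
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    columns=columns,
    index=["2019-01-01", "2019-02-01"],
)
serialized = to_nested_dict(df)
restored = from_nested_dict(serialized)
```

Keying the outer dict by top-level column name is what avoids tuple keys, which is the property the description relies on for JSON serialization.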

gordo.server.utils.extract_X_y(method)

For a given Flask view, attempts to extract an ‘X’ and ‘y’ from the request and assign them to Flask’s ‘g’ global request context.

If it fails to extract ‘X’ and (optionally) ‘y’ from the request, it does not run the function but returns a BadRequest response notifying the client of the failure.

Parameters: method (Callable) – The Flask route to decorate. It returns its own response object and is expected to use flask.g.X and/or flask.g.y.
Returns: Either a flask.Response with status code 400 if it fails to extract the X and optionally the y, or the result of the decorated method, which is also expected to return some sort of flask.Response object.
Return type: flask.Response

gordo.server.utils.find_path_in_dict(path: List[str], data: dict) → Any

Find a path in dict recursively

Examples

>>> find_path_in_dict(["parent", "child"], {"parent": {"child": 42}})
42

Parameters

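path (List[str]) – Keys to follow, in order, starting from the top level of data.
data (dict) – The (possibly nested) dictionary to search.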
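
A recursive lookup of this kind can be sketched in a few lines. The function name find_path_sketch is hypothetical, and the behaviour on a missing key (a plain KeyError from the dict lookup) is an assumption; the real function may raise a different error.

```python
from typing import Any, List


def find_path_sketch(path: List[str], data: dict) -> Any:
    """Follow the keys in path through nested dicts and return the final value."""
    if not path:
        # Nothing left to follow: the current node is the result.
        return data
    head, *rest = path
    # Assumption: a missing key simply propagates KeyError.
    return find_path_sketch(rest, data[head])


value = find_path_sketch(["parent", "child"], {"parent": {"child": 42}})
```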

gordo.server.utils.load_metadata(directory: str, name: str) → dict

Load metadata from a directory for a given model by name.

Parameters

directory (str) – Directory to look for the model’s metadata
name (str) – Name of the model to load metadata for, this would be the sub directory within the directory parameter.

Return type: dict

gordo.server.utils.load_model

Load a given model from the directory by name.

Parameters

directory (str) – Directory to look for the model
name (str) – Name of the model to load, this would be the sub directory within the directory parameter.

Return type: BaseEstimator

gordo.server.utils.metadata_required(f): Decorate a view that has gordo_name as a URL parameter; sets g.metadata to that model’s metadata

gordo.server.utils.model_required(f): Decorate a view that has gordo_name as a URL parameter; sets g.model to the loaded model and g.metadata to that model’s metadata

gordo.server.utils.parse_iso_datetime(datetime_str: str) → datetime.datetime

Model IO

The general model input/output operations applied by the views

gordo.server.model_io.get_model_output(model: sklearn.pipeline.Pipeline, X: numpy.ndarray) → numpy.ndarray

Get the raw output from the current model given X. Tries the model’s predict method first and falls back to transform, raising an error if both fail.

Parameters

model (sklearn.pipeline.Pipeline) – The model to produce output from.
X (np.ndarray) – 2d array of sample(s)
Returns: The raw output of the model in numpy array form.
Return type: np.ndarray
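
The predict-then-transform fallback can be sketched without scikit-learn by using duck-typed stand-ins. Everything here is illustrative: get_output_sketch, Summer, and Doubler are hypothetical, and treating AttributeError as the failure signal and raising ValueError at the end are assumptions about behaviour this page does not specify.

```python
from typing import Any, List


def get_output_sketch(model: Any, X: List[List[float]]) -> Any:
    """Try model.predict(X); fall back to model.transform(X); fail if neither works."""
    try:
        return model.predict(X)
    except AttributeError:
        pass  # Assumption: no usable predict, so try transform next.
    try:
        return model.transform(X)
    except AttributeError as exc:
        # Assumption: the error type raised when both attempts fail.
        raise ValueError("Model supports neither predict nor transform") from exc


class Summer:
    """Stand-in estimator that only implements predict."""

    def predict(self, X: List[List[float]]) -> List[float]:
        return [sum(row) for row in X]


class Doubler:
    """Stand-in transformer that only implements transform."""

    def transform(self, X: List[List[float]]) -> List[List[float]]:
        return [[2 * value for value in row] for row in X]


predicted = get_output_sketch(Summer(), [[1, 2], [3, 4]])
transformed = get_output_sketch(Doubler(), [[1, 2]])
```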