ML Server
The ML Server is responsible for giving different “views” into the model being served.
Server
This module contains code for generating the Gordo server Flask application.
Running this module will run the application using Flask’s development webserver.
Gunicorn can be used to run the application as gevent async workers by using the run_server() function.
gordo.server.server.adapt_proxy_deployment(wsgi_app: Callable) → Callable
Decorator for fixing behind-proxy issues when deployed on Kubernetes behind the Envoy proxy.
- Parameters
wsgi_app (typing.Callable) – The underlying WSGI application of a flask app, for example
Notes
Special note about deploying behind Ambassador, or prefixed proxy paths in general:
When deployed on Kubernetes/Ambassador there is a prefix in front of the server, e.g. /gordo/v0/some-project-name/some-target
The server itself only knows about routes to the right of such a prefix, such as /metadata or /predictions, when in reality the full path is /gordo/v0/some-project-name/some-target/metadata
This is solved by getting the current application’s assigned prefix, where HTTP_X_ENVOY_ORIGINAL_PATH is the full path, including the prefix, and PATH_INFO is the actual relative path the server knows about.
This function wraps the WSGI app itself to map the current full path to the assigned route function, i.e.
/metadata -> metadata route function, by default, but updates
/gordo/v0/some-project-name/some-target/metadata -> metadata route function
- Returns
- Return type
Callable
Example
>>> from flask import Flask
>>> from gordo.server.server import adapt_proxy_deployment
>>> app = Flask(__name__)
>>> app.wsgi_app = adapt_proxy_deployment(app.wsgi_app)
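The prefix handling described in the notes above can be sketched as a small WSGI wrapper. This is a minimal illustration and not Gordo's actual implementation: the name `adapt_prefix_sketch`, the `demo_app` application, and the choice to record the prefix in `SCRIPT_NAME` are all assumptions made for the example.

```python
from typing import Callable


def adapt_prefix_sketch(wsgi_app: Callable) -> Callable:
    """Hypothetical stand-in illustrating the prefix handling described above."""

    def wrapper(environ: dict, start_response: Callable):
        # Full path as seen by the proxy, with any query string dropped.
        full_path = environ.get("HTTP_X_ENVOY_ORIGINAL_PATH", "").split("?", 1)[0]
        # Relative path the server itself knows about.
        relative_path = environ.get("PATH_INFO", "")

        if full_path and relative_path and full_path.endswith(relative_path):
            # Whatever precedes the relative path is the proxy prefix.
            prefix = full_path[: -len(relative_path)].rstrip("/")
            if prefix:
                # Assumption: expose the prefix through the standard WSGI key.
                environ["SCRIPT_NAME"] = prefix
        return wsgi_app(environ, start_response)

    return wrapper


def demo_app(environ: dict, start_response: Callable):
    """Tiny WSGI app that echoes the prefix and the relative path."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    body = f"{environ.get('SCRIPT_NAME', '')}|{environ['PATH_INFO']}"
    return [body.encode()]
```

Calling the wrapped `demo_app` with both environment keys set returns the prefix and the relative route separately; without the Envoy header the environment is passed through unchanged.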
gordo.server.server.build_app(config: Optional[Dict[str, Any]] = None, prometheus_registry: Optional[prometheus_client.registry.CollectorRegistry] = None)
Build app and any associated routes
gordo.server.server.create_prometheus_metrics(project: Optional[str] = None, registry: Optional[prometheus_client.registry.CollectorRegistry] = None) → gordo.server.prometheus.metrics.GordoServerPrometheusMetrics
gordo.server.server.run_cmd(cmd)
Run a shell command and handle CalledProcessError and OSError types
Note
This function is abstracted from run_server() in order to test the calling of commands that would allow the subprocess call to break, depending on how it is parameterized. For example, calling this without sending stderr to stdout will cause a segmentation fault when calling an executable that does not exist.
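The pattern described in the note, running a command with stderr folded into stdout while handling both error types, might look roughly like the sketch below. The helper name `run_cmd_sketch`, the captured-output return value, and the log-and-re-raise behaviour are assumptions for illustration and are not taken from the Gordo source.

```python
import subprocess
import sys
from typing import List


def run_cmd_sketch(cmd: List[str]) -> str:
    """Hypothetical helper: run a command and return its combined output."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # send the child's stderr to its stdout
            check=True,  # raise CalledProcessError on a non-zero exit code
        )
    except subprocess.CalledProcessError as exc:
        # The executable started but exited with a failure code.
        print(f"Command failed with exit code {exc.returncode}: {cmd}")
        raise
    except OSError as exc:
        # The executable could not be started at all, e.g. it does not exist.
        print(f"Command could not be started: {exc}")
        raise
    return result.stdout.decode()


if __name__ == "__main__":
    print(run_cmd_sketch([sys.executable, "-c", "print('hello')"]))
```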
gordo.server.server.run_server(host: str, port: int, workers: int, log_level: str, config_module: Optional[str] = None, worker_connections: Optional[int] = None, threads: Optional[int] = None, worker_class: str = 'gthread', server_app: str = 'gordo.server.server:build_app()')
Run the application with the Gunicorn server, using workers of the type given by worker_class ('gthread' by default).
- Parameters
host (str) – The host to run the server on.
port (int) – The port to run the server on.
workers (int) – The number of worker processes for handling requests.
log_level (str) – The log level for the gunicorn webserver. Valid log level names can be found in the [gunicorn documentation](http://docs.gunicorn.org/en/stable/settings.html#loglevel).
config_module (str) – The config module. Will be passed to gunicorn with the [python: prefix](https://docs.gunicorn.org/en/stable/settings.html#config).
worker_connections (int) – The maximum number of simultaneous clients per worker process.
threads (int) – The number of worker threads for handling requests.
worker_class (str) – The type of workers to use.
server_app (str) – The application to run
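As a rough illustration of how these parameters could map onto a gunicorn command line, the sketch below assembles an argument list without running it. The function name `build_gunicorn_cmd_sketch` is hypothetical, and the exact set of flags that run_server() passes is an assumption. The flags themselves (`--bind`, `--log-level`, `--workers`, `--worker-class`, `--config`, `--worker-connections`, `--threads`) are standard gunicorn options.

```python
from typing import List, Optional


def build_gunicorn_cmd_sketch(
    host: str,
    port: int,
    workers: int,
    log_level: str,
    config_module: Optional[str] = None,
    worker_connections: Optional[int] = None,
    threads: Optional[int] = None,
    worker_class: str = "gthread",
    server_app: str = "gordo.server.server:build_app()",
) -> List[str]:
    """Hypothetical helper: translate the parameters into gunicorn arguments."""
    cmd = [
        "gunicorn",
        "--bind", f"{host}:{port}",
        "--log-level", log_level,
        "--workers", str(workers),
        "--worker-class", worker_class,
    ]
    # Optional settings are only added when a value was supplied.
    if config_module is not None:
        cmd.extend(["--config", f"python:{config_module}"])
    if worker_connections is not None:
        cmd.extend(["--worker-connections", str(worker_connections)])
    if threads is not None:
        cmd.extend(["--threads", str(threads)])
    # The application to serve goes last.
    cmd.append(server_app)
    return cmd


if __name__ == "__main__":
    print(" ".join(build_gunicorn_cmd_sketch("0.0.0.0", 5555, 2, "info", threads=8)))
```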
Utils
Shared utility functions and decorators which are used by the Views
gordo.server.utils.dataframe_from_dict(data: dict) → pandas.core.frame.DataFrame
The inverse of the procedure done by multi_lvl_column_dataframe_from_dict(). Reconstructs a MultiIndex column dataframe from a previously serialized one.
Expects data to be a nested dictionary where each top level key has a value capable of being loaded by pandas.core.DataFrame.from_dict()
- Parameters
data (dict) – Data to be loaded into a MultiIndex column dataframe
- Returns
MultiIndex column dataframe.
- Return type
pandas.core.DataFrame
Examples
>>> serialized = {
... 'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
...              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
... 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
...              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}
... }
>>> dataframe_from_dict(serialized)
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
gordo.server.utils.dataframe_from_parquet_bytes(buf: bytes) → pandas.core.frame.DataFrame
Convert bytes representing a parquet table into a pandas dataframe.
- Parameters
buf (bytes) – Bytes representing a parquet table. Can be the direct result from gordo.server.utils.dataframe_into_parquet_bytes()
- Returns
- Return type
pandas.DataFrame
gordo.server.utils.dataframe_into_parquet_bytes(df: pandas.core.frame.DataFrame, compression: str = 'snappy') → bytes
Convert a dataframe into bytes representing a parquet table.
- Parameters
df (pd.DataFrame) – DataFrame to be compressed
compression (str) – Compression to use, passed to pyarrow.parquet.write_table()
- Returns
- Return type
bytes
gordo.server.utils.dataframe_to_dict(df: pandas.core.frame.DataFrame) → dict
Convert a dataframe, which can have a pandas.MultiIndex as columns, into a dict where each key is the top level column name, and the value is the array of columns under the top level name. If it’s a simple dataframe, pandas.core.DataFrame.to_dict() will be used.
This allows json.dumps() to be performed, where pandas.DataFrame.to_dict() would convert such a multi-level column dataframe into keys of tuple objects, which are not JSON serializable. However, this ends up working with pandas.DataFrame.from_dict()
- Parameters
df (pandas.DataFrame) – Dataframe expected to have columns of type pandas.MultiIndex, 2 levels deep.
- Returns
Nested dictionary representing the dataframe, keyed by top level column name.
- Return type
dict
Examples
>>> import pprint
>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_tuples((f"feature{i}", f"sub-feature-{ii}") for i in range(2) for ii in range(2))
>>> index = pd.date_range('2019-01-01', '2019-02-01', periods=2)
>>> df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=columns, index=index)
>>> df
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
>>> serialized = dataframe_to_dict(df)
>>> pprint.pprint(serialized)
{'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}}
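The reason for nesting by top level column name can be shown without pandas. In the sketch below, a plain dictionary with tuple keys stands in for the shape that a two-level column structure takes when its columns become dictionary keys; `flat` and `nest_by_top_level` are hypothetical names used only for this illustration.

```python
import json

# Stand-in for a two-level column structure: each key is a
# (top level, sub level) tuple, each value maps index labels to values.
flat = {
    ("feature0", "sub-feature-0"): {"2019-01-01": 0, "2019-02-01": 4},
    ("feature0", "sub-feature-1"): {"2019-01-01": 1, "2019-02-01": 5},
    ("feature1", "sub-feature-0"): {"2019-01-01": 2, "2019-02-01": 6},
    ("feature1", "sub-feature-1"): {"2019-01-01": 3, "2019-02-01": 7},
}


def nest_by_top_level(flat_columns: dict) -> dict:
    """Group tuple-keyed columns under their top level name."""
    nested: dict = {}
    for (top, sub), column in flat_columns.items():
        nested.setdefault(top, {})[sub] = column
    return nested


# Tuple keys are rejected by the JSON encoder.
try:
    json.dumps(flat)
    tuple_keys_ok = True
except TypeError:
    tuple_keys_ok = False

# After nesting, every key is a string and serialization succeeds.
nested = nest_by_top_level(flat)
payload = json.dumps(nested)
```

The nested result has the same shape as the `serialized` dictionary in the example above.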
gordo.server.utils.extract_X_y(method)
For a given flask view, will attempt to extract an ‘X’ and ‘y’ from the request and assign it to flask’s ‘g’ global request context.
If it fails to extract ‘X’ and (optionally) ‘y’ from the request, it will not run the function but return a BadRequest response notifying the client of the failure.
- Parameters
method (Callable) – The flask route to decorate, which will return its own response object and will want to use flask.g.X and/or flask.g.y
- Returns
Will return a flask.Response with status code 400 if it fails to extract the X and optionally the y. Otherwise will run the decorated method, which is also expected to return some sort of flask.Response object.
- Return type
flask.Response
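The decorator behaviour described above can be sketched without Flask. In this illustration a `SimpleNamespace` stands in for `flask.g`, a plain dict stands in for the parsed request body, and a `(body, status)` tuple stands in for `flask.Response`. All of these stand-ins, along with the names `extract_X_y_sketch` and `predict_view`, are assumptions and do not reflect how Gordo parses requests.

```python
import functools
from types import SimpleNamespace
from typing import Any, Callable, Dict, Tuple

# Stand-in for flask.g; a real view would use Flask's request context.
g = SimpleNamespace()


def extract_X_y_sketch(method: Callable) -> Callable:
    """Hypothetical decorator: put X (required) and y (optional) on `g`."""

    @functools.wraps(method)
    def wrapper(payload: Dict[str, Any], *args, **kwargs) -> Tuple[dict, int]:
        X = payload.get("X") if isinstance(payload, dict) else None
        if X is None:
            # Do not run the view; report the failure to the client instead.
            return {"message": 'Cannot predict without "X"'}, 400
        g.X = X
        g.y = payload.get("y")  # stays None when the client sent no y
        return method(*args, **kwargs)

    return wrapper


@extract_X_y_sketch
def predict_view() -> Tuple[dict, int]:
    """Example view that relies on g.X and g.y having been set."""
    return {"rows": len(g.X), "has_y": g.y is not None}, 200
```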
gordo.server.utils.find_path_in_dict(path: List[str], data: dict) → Any
Find a path in dict recursively
Examples
>>> find_path_in_dict(["parent", "child"], {"parent": {"child": 42}})
42
- Parameters
path (List[str]) –
data (dict) –
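A recursive lookup of this kind could be written as in the sketch below. It is one possible implementation under the name `find_path_in_dict_sketch`, not the Gordo source, and letting a missing key surface as `KeyError` is an assumption.

```python
from typing import Any, List


def find_path_in_dict_sketch(path: List[str], data: dict) -> Any:
    """Follow the keys in `path` one level at a time through nested dicts."""
    if not path:
        # Nothing left to follow: the current value is the result.
        return data
    head, *rest = path
    # A missing key raises KeyError here (assumed behaviour).
    return find_path_in_dict_sketch(rest, data[head])
```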
gordo.server.utils.load_metadata(directory: str, name: str) → dict
Load metadata from a directory for a given model by name.
- Parameters
directory (str) – Directory to look for the model’s metadata
name (str) – Name of the model to load metadata for, this would be the sub directory within the directory parameter.
- Returns
- Return type
dict
gordo.server.utils.load_model
Load a given model from the directory by name.
- Parameters
directory (str) – Directory to look for the model
name (str) – Name of the model to load, this would be the sub directory within the directory parameter.
- Returns
- Return type
BaseEstimator
gordo.server.utils.metadata_required(f)
Decorate a view which has gordo_name as a URL parameter, setting g.metadata to that model’s metadata.
Model IO
The general model input/output operations applied by the views
gordo.server.model_io.get_model_output(model: sklearn.pipeline.Pipeline, X: numpy.ndarray) → numpy.ndarray
Get the raw output from the current model given X. Will try to predict and then transform, raising an error if both fail.
- Parameters
X (np.ndarray) – 2d array of sample(s)
- Returns
The raw output of the model in numpy array form.
- Return type
np.ndarray
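The predict-then-transform fallback might be sketched as follows. The stand-in model classes, the name `get_model_output_sketch`, the use of `AttributeError` as the trigger for falling back, and the `ValueError` raised at the end are all assumptions; the actual exception types used by Gordo are not stated here.

```python
from typing import Any, List


class PredictOnly:
    """Stand-in model exposing only predict()."""

    def predict(self, X: List[List[float]]) -> List[float]:
        return [sum(row) for row in X]


class TransformOnly:
    """Stand-in model exposing only transform()."""

    def transform(self, X: List[List[float]]) -> List[List[float]]:
        return [[2 * value for value in row] for row in X]


def get_model_output_sketch(model: Any, X: List[List[float]]) -> Any:
    """Try predict() first, then transform(); raise if neither is available."""
    try:
        return model.predict(X)
    except AttributeError:
        # Assumption: a missing predict() is what triggers the fallback.
        try:
            return model.transform(X)
        except AttributeError as exc:
            raise ValueError("Model supports neither predict nor transform") from exc
```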