ML Server

The ML Server is responsible for giving different “views” into the model being served.


This module contains code for generating the Gordo server Flask application.

Running this module will run the application using Flask’s development webserver. Gunicorn can be used to run the application as gevent async workers by using the run_server() function.

class gordo.server.server.Config[source]

Bases: object

Server config

gordo.server.server.adapt_proxy_deployment(wsgi_app: Callable) → Callable[source]

Decorator specific to fixing behind-proxy-issues when on Kubernetes and using Envoy proxy.


wsgi_app (typing.Callable) – The underlying WSGI application of a flask app, for example


Special note about deploying behind Ambassador, or prefixed proxy paths in general:

When deployed on kubernetes/ambassador there is a prefix in-front of the server. ie:


The server itself only knows about routes to the right of such a prefix: such as /metadata or /predictions when in reality, the full path is:


This is solved by getting the current application’s assigned prefix, where HTTP_X_ENVOY_ORIGINAL_PATH is the full path, including the prefix, and PATH_INFO is the actual relative path the server knows about.

This function wraps the WSGI app itself to map the current full path to the assigned route function.

For example, /metadata maps to the metadata route function by default; after wrapping, /gordo/v0/some-project-name/some-target/metadata also maps to the metadata route function.
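The prefix handling described above can be sketched as a small WSGI wrapper. This is a minimal illustration and not Gordo's implementation: the name strip_proxy_prefix and the choice to store the derived prefix in SCRIPT_NAME are assumptions made for the example.

```python
from typing import Callable


def strip_proxy_prefix(wsgi_app: Callable) -> Callable:
    """Hypothetical wrapper: derive the proxy prefix and expose it as SCRIPT_NAME."""

    def wrapper(environ, start_response):
        # Full path as seen by the proxy, query string removed.
        full_path = environ.get("HTTP_X_ENVOY_ORIGINAL_PATH", "").split("?", 1)[0]
        # Relative path the server knows about, such as /metadata.
        path_info = environ.get("PATH_INFO", "")
        if full_path and path_info and full_path.endswith(path_info):
            # Whatever precedes the known route is the proxy prefix.
            prefix = full_path[: len(full_path) - len(path_info)]
            environ["SCRIPT_NAME"] = prefix.rstrip("/")
        return wsgi_app(environ, start_response)

    return wrapper
```

A WSGI framework that honours SCRIPT_NAME can then route on PATH_INFO alone while still generating URLs that include the prefix.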


Return type



>>> app = Flask(__name__)
>>> app.wsgi_app = adapt_proxy_deployment(app.wsgi_app)
gordo.server.server.build_app(config: Optional[Dict[str, Any]] = None, prometheus_registry: Optional[prometheus_client.registry.CollectorRegistry] = None)[source]

Build app and any associated routes

gordo.server.server.create_prometheus_metrics(project: Optional[str] = None, registry: Optional[prometheus_client.registry.CollectorRegistry] = None) → gordo.server.prometheus.metrics.GordoServerPrometheusMetrics[source]

Run a shell command and handle CalledProcessError and OSError types


This function is abstracted from run_server() so that the command invocation can be tested on its own, since the subprocess call can break depending on how it is parameterized. For example, calling it without sending stderr to stdout will cause a segmentation fault when calling an executable that does not exist.
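The error handling described above might look like the following sketch. The helper name run_checked and its integer return convention are hypothetical; only the use of subprocess with stderr redirected to stdout, and the two exception types, come from the description.

```python
import subprocess
import sys


def run_checked(cmd):
    """Hypothetical helper: return the exit code, or -1 if the command could not start."""
    try:
        # Send stderr to stdout, as the note above recommends.
        subprocess.check_call(cmd, stderr=subprocess.STDOUT)
        return 0
    except subprocess.CalledProcessError as exc:
        # The command started but exited with a non-zero status.
        return exc.returncode
    except OSError:
        # The command could not be started, e.g. the executable does not exist.
        return -1


if __name__ == "__main__":
    print(run_checked([sys.executable, "-c", "pass"]))
```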

gordo.server.server.run_server(host: str, port: int, workers: int, log_level: str, config_module: Optional[str] = None, worker_connections: Optional[int] = None, threads: Optional[int] = None, worker_class: str = 'gthread', server_app: str = 'gordo.server.server:build_app()')[source]

Run the application with the Gunicorn server, using the worker type given by worker_class (default 'gthread').

  • host (str) – The host to run the server on.

  • port (int) – The port to run the server on.

  • workers (int) – The number of worker processes for handling requests.

  • log_level (str) – The log level for the gunicorn webserver. Valid log level names can be found in the gunicorn documentation.

  • config_module (str) – The config module. Will be passed with the python: prefix.

  • worker_connections (int) – The maximum number of simultaneous clients per worker process.

  • threads (int) – The number of worker threads for handling requests.

  • worker_class (str) – The type of workers to use.

  • server_app (str) – The application to run
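To make the parameters above concrete, the sketch below shows one way they could map onto a gunicorn command line. The function build_gunicorn_cmd is hypothetical and is not part of Gordo; the flags used (--bind, --workers, --log-level, --worker-class, --config, --worker-connections, --threads) are standard gunicorn options.

```python
from typing import List, Optional


def build_gunicorn_cmd(
    host: str,
    port: int,
    workers: int,
    log_level: str,
    config_module: Optional[str] = None,
    worker_connections: Optional[int] = None,
    threads: Optional[int] = None,
    worker_class: str = "gthread",
    server_app: str = "gordo.server.server:build_app()",
) -> List[str]:
    """Hypothetical mapping from run_server()-style arguments to gunicorn flags."""
    cmd = [
        "gunicorn",
        "--bind", f"{host}:{port}",
        "--log-level", log_level,
        "--workers", str(workers),
        "--worker-class", worker_class,
    ]
    # Optional settings are only added when a value was supplied.
    if config_module is not None:
        cmd += ["--config", f"python:{config_module}"]
    if worker_connections is not None:
        cmd += ["--worker-connections", str(worker_connections)]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    cmd.append(server_app)
    return cmd
```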


A collection of implemented views into the Model being served.



Shared utility functions and decorators which are used by the Views

gordo.server.utils.dataframe_from_dict(data: dict) → pandas.core.frame.DataFrame[source]

The inverse of the procedure done by multi_lvl_column_dataframe_from_dict(). Reconstructs a MultiIndex column dataframe from a previously serialized one.

Expects data to be a nested dictionary where each top-level key has a value capable of being loaded by pandas.DataFrame.from_dict().
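One way to rebuild such a frame is to load each top-level value separately and concatenate the pieces along the columns, using the top-level keys as the outer column level. The sketch below assumes that approach; rebuild_multiindex_frame is a hypothetical name, and index parsing (for example into datetimes) is left out.

```python
import pandas as pd


def rebuild_multiindex_frame(data: dict) -> pd.DataFrame:
    """Hypothetical sketch: one sub-frame per top-level key, joined column-wise."""
    keys = list(data)
    # Each value is a plain {column: {index: value}} mapping that from_dict can load.
    frames = [pd.DataFrame.from_dict(data[key]) for key in keys]
    # The keys become the outer level of a two-level column MultiIndex.
    return pd.concat(frames, axis=1, keys=keys)
```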


data (dict) – Data to be loaded into a MultiIndex column dataframe


MultiIndex column dataframe.

Return type



>>> serialized = {
... 'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
...              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
... 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
...              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}
... }
>>> dataframe_from_dict(serialized)
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
gordo.server.utils.dataframe_from_parquet_bytes(buf: bytes) → pandas.core.frame.DataFrame[source]

Convert bytes representing a parquet table into a pandas dataframe.


buf (bytes) – Bytes representing a parquet table. Can be the direct result of gordo.server.utils.dataframe_into_parquet_bytes().


Return type


gordo.server.utils.dataframe_into_parquet_bytes(df: pandas.core.frame.DataFrame, compression: str = 'snappy') → bytes[source]

Convert a dataframe into bytes representing a parquet table.

  • df (pd.DataFrame) – DataFrame to be compressed

  • compression (str) – Compression to use, passed to pyarrow.parquet.write_table()


Return type


gordo.server.utils.dataframe_to_dict(df: pandas.core.frame.DataFrame) → dict[source]

Convert a dataframe, which may have a pandas.MultiIndex as columns, into a dict where each key is the top-level column name and the value is the array of columns under that top-level name. If it’s a simple dataframe, pandas.DataFrame.to_dict() will be used.

This allows json.dumps() to be performed; pandas.DataFrame.to_dict() alone would convert such a multi-level column dataframe into a dict keyed by tuple objects, which are not JSON serializable. Each top-level value in the result can still be loaded with pandas.DataFrame.from_dict().
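The tuple-key problem can be reproduced with plain dictionaries, without pandas. The sketch below uses a hypothetical helper, nest_tuple_keys, to turn (top, sub) keys into nested dictionaries that json.dumps() accepts; it illustrates the idea and is not the function documented here.

```python
import json


def nest_tuple_keys(flat: dict) -> dict:
    """Hypothetical helper: {(top, sub): column} -> {top: {sub: column}}."""
    nested: dict = {}
    for (top, sub), column in flat.items():
        nested.setdefault(top, {})[sub] = column
    return nested


# Shape of what to_dict() yields for two-level columns: tuple keys.
flat = {
    ("feature0", "sub-feature-0"): {"2019-01-01": 0},
    ("feature0", "sub-feature-1"): {"2019-01-01": 1},
}

try:
    json.dumps(flat)
    tuple_keys_ok = True
except TypeError:
    # json only accepts str, int, float, bool or None as keys.
    tuple_keys_ok = False

payload = json.dumps(nest_tuple_keys(flat))
```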


df (pandas.DataFrame) – Dataframe expected to have columns of type pandas.MultiIndex 2 levels deep.


List of records representing the dataframe in a ‘flattened’ form.

Return type



>>> import pprint
>>> import pandas as pd
>>> import numpy as np
>>> columns = pd.MultiIndex.from_tuples((f"feature{i}", f"sub-feature-{ii}") for i in range(2) for ii in range(2))
>>> index = pd.date_range('2019-01-01', '2019-02-01', periods=2)
>>> df = pd.DataFrame(np.arange(8).reshape((2, 4)), columns=columns, index=index)
>>> df  
                feature0                    feature1
           sub-feature-0 sub-feature-1 sub-feature-0 sub-feature-1
2019-01-01             0             1             2             3
2019-02-01             4             5             6             7
>>> serialized = dataframe_to_dict(df)
>>> pprint.pprint(serialized)
{'feature0': {'sub-feature-0': {'2019-01-01': 0, '2019-02-01': 4},
              'sub-feature-1': {'2019-01-01': 1, '2019-02-01': 5}},
 'feature1': {'sub-feature-0': {'2019-01-01': 2, '2019-02-01': 6},
              'sub-feature-1': {'2019-01-01': 3, '2019-02-01': 7}}}

For a given flask view, attempts to extract an ‘X’ and ‘y’ from the request and assign them to flask’s ‘g’ global request context.

If it fails to extract ‘X’ and (optionally) ‘y’ from the request, it will not run the function but return a BadRequest response notifying the client of the failure.


method (Callable) – The flask route to decorate. It returns its own response object and is expected to use flask.g.X and/or flask.g.y.


Returns a flask.Response with status code 400 if it fails to extract X (and, optionally, y). Otherwise runs the decorated method, which is also expected to return some sort of flask.Response object.
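The decorator behaviour described above can be sketched in a few lines of Flask. This is a simplified, hypothetical version: it is named require_X, reads only a JSON body, and stores the raw values on flask.g, whereas the documented decorator may accept other payload formats and convert the data.

```python
import functools

from flask import Flask, g, jsonify, request


def require_X(method):
    """Hypothetical sketch: put X (and optional y) on flask.g or answer with 400."""

    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        payload = request.get_json(silent=True)
        if not isinstance(payload, dict) or "X" not in payload:
            # Do not run the view; tell the client what is missing.
            return jsonify({"message": 'Cannot predict without "X"'}), 400
        g.X = payload["X"]
        g.y = payload.get("y")  # y is optional
        return method(*args, **kwargs)

    return wrapper


app = Flask(__name__)


@app.route("/echo", methods=["POST"])
@require_X
def echo():
    # The view only runs when g.X has been set by the decorator.
    return jsonify({"rows": len(g.X), "has_y": g.y is not None})
```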

Return type


gordo.server.utils.find_path_in_dict(path: List[str], data: dict) → Any[source]

Find a path in dict recursively


>>> find_path_in_dict(["parent", "child"], {"parent": {"child": 42}})
  • path (List[str]) –

  • data (dict) –
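A recursive lookup of this kind could be written as below. The function find_path is a hypothetical re-implementation for illustration; in particular, raising KeyError for a missing key is an assumption, since the error behaviour is not described here.

```python
from typing import Any, List


def find_path(path: List[str], data: dict) -> Any:
    """Hypothetical sketch: follow the keys in path through nested dictionaries."""
    if not path:
        # Nothing left to traverse: data is the value that was asked for.
        return data
    head, *rest = path
    if not isinstance(data, dict) or head not in data:
        raise KeyError(f"{head!r} not found")
    return find_path(rest, data[head])
```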

gordo.server.utils.load_metadata(directory: str, name: str) → dict[source]

Load metadata from a directory for a given model by name.

  • directory (str) – Directory to look for the model’s metadata

  • name (str) – Name of the model to load metadata for; this is the subdirectory within the directory parameter.
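The directory layout implied by these parameters can be illustrated with a short sketch. Both the helper name load_metadata_sketch and the file name metadata.json are assumptions made for the example; the actual on-disk format is not specified in this section.

```python
import json
import os


def load_metadata_sketch(directory: str, name: str, filename: str = "metadata.json") -> dict:
    """Hypothetical sketch: read <directory>/<name>/<filename> as JSON."""
    # filename is an assumed default, not taken from the documentation.
    path = os.path.join(directory, name, filename)
    with open(path) as f:
        return json.load(f)
```

If the model's subdirectory or file does not exist, open() raises FileNotFoundError, which a caller could translate into a 404 response.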


Return type



Load a given model from the directory by name.

  • directory (str) – Directory to look for the model

  • name (str) – Name of the model to load; this is the subdirectory within the directory parameter.


Return type



Decorate a view which has gordo_name as a url parameter and will set g.metadata to that model’s metadata


Decorate a view which has gordo_name as a url parameter and will set g.model to be the loaded model and g.metadata to that model’s metadata

gordo.server.utils.parse_iso_datetime(datetime_str: str) → datetime.datetime[source]

Model IO

The general model input/output operations applied by the views

gordo.server.model_io.get_model_output(model: sklearn.pipeline.Pipeline, X: numpy.ndarray) → numpy.ndarray[source]

Get the raw output from the current model given X. Will try to predict and then transform, raising an error if both fail.


X (np.ndarray) – 2d array of sample(s)


The raw output of the model in numpy array form.
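The predict-then-transform fallback can be sketched with duck typing. The function raw_model_output is a hypothetical illustration; the use of AttributeError to detect a missing predict method and ValueError as the final error are assumptions, not documented behaviour.

```python
def raw_model_output(model, X):
    """Hypothetical sketch: try predict(), fall back to transform()."""
    try:
        return model.predict(X)
    except AttributeError:
        # No predict method (or it raised AttributeError): try transform instead.
        try:
            return model.transform(X)
        except Exception as exc:
            raise ValueError(f"Model could not predict or transform: {exc}") from exc


class OnlyTransforms:
    """Toy stand-in for a transformer-only pipeline."""

    def transform(self, X):
        return [[value * 2 for value in row] for row in X]
```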

Return type
