
AzureML Inference Server

The inference server is the component that serves inference requests for deployed models. Requests made to the HTTP server run user-provided code that interfaces with the user's models.

This server is used with most images in the Azure ML ecosystem and is considered the primary component of the base image, as it contains the Python assets required for inferencing.
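For illustration, the user-provided code is typically a scoring script defining an `init` function (run once at startup) and a `run` function (run per request). The sketch below uses a stand-in "model" rather than a real artifact loaded from AZUREML_MODEL_DIR, so it runs standalone:

```python
import json

model = None


def init():
    """Runs once when the server starts; load the model here."""
    global model
    # Stand-in "model" that doubles its inputs; a real script would
    # deserialize a model artifact from AZUREML_MODEL_DIR instead.
    model = lambda xs: [x * 2 for x in xs]


def run(raw_data):
    """Runs per request with the deserialized request payload."""
    data = json.loads(raw_data)["data"]
    return {"result": model(data)}
```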

Inference Server contents

  • Scoring Endpoints:

    • The table below details the request types that can be made on the /score path and how input can be sent with each. A GET request sends input via query parameters, while a POST request sends input in the request body, which is deserialized as JSON.

| Request Type | Query Parameters | Request Body | Raw Data |
| --- | --- | --- | --- |
| GET | ✓ | | ✓ |
| POST | | ✓ | ✓ |
| OPTIONS | | | ✓ |
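As a rough sketch of the distinction (standard library only; the /score URL and helper name are illustrative, not server code):

```python
import json
from urllib.parse import urlparse, parse_qs


def extract_input(method, url, body=None):
    """Illustrates how input reaches the run function: GET input comes
    from query parameters, POST input from a JSON request body."""
    if method == "GET":
        # parse_qs returns lists; unwrap single-valued parameters.
        return {k: v[0] if len(v) == 1 else v
                for k, v in parse_qs(urlparse(url).query).items()}
    if method == "POST":
        return json.loads(body)
    return None
```

For example, `extract_input("GET", "/score?name=world")` and `extract_input("POST", "/score", '{"name": "world"}')` yield the same dictionary.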
  • Raw Data:

    • Required Setup: @rawhttp decorator on the run function. More info here.

    • All request types can use raw data, enabled by adding the @rawhttp decorator to the user's run function. This allows raw data (such as binary data) to be sent to the run function and used by the model. The run function can then use AMLResponse directly to build an HTTP response and return the model's output.
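A sketch of a raw-data scoring script is below. The import paths are the ones provided by the azureml-inference-server-http package; the stand-ins in the except branch are only there so the sketch can run outside the server:

```python
try:
    # Real imports when running under the AzureML inference server.
    from azureml_inference_server_http.api.aml_request import rawhttp
    from azureml_inference_server_http.api.aml_response import AMLResponse
except ImportError:
    # Minimal stand-ins so this sketch runs standalone.
    def rawhttp(func):
        return func

    class AMLResponse:
        def __init__(self, body, status_code):
            self.body, self.status_code = body, status_code


def init():
    pass


@rawhttp
def run(request):
    # `request` is the raw HTTP request; binary payloads are available
    # via get_data() instead of being deserialized to JSON first.
    if request.method == "POST":
        payload = request.get_data()  # raw bytes
        return AMLResponse(f"received {len(payload)} bytes", 200)
    return AMLResponse("method not allowed", 405)
```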

  • Health:

    • The “/” endpoint is the health check endpoint. A GET request to this endpoint is expected to return the plain-text string “Healthy”. The cluster checks this endpoint frequently to determine whether the service is healthy.
  • Schema / Discoverability:

    • Swagger schemas can be generated to describe the input the model expects and the output it produces. This enables discoverability: users can retrieve the schema and understand the data shape requirements of the model.
    • Swagger schema generation only works for JSON data. For raw data or non-JSON structured data (e.g. XML), Swagger generation will not work.
    • Required setup:
      • The @input_schema and @output_schema decorators, specifying the data types, must be placed above the run function.

      • The inference-schema package must be included in the dependencies. The azureml-inference-server-http package includes this dependency by default.

      • More info here.
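A sketch of the required decorators is below. The imports are from the inference-schema package; the identity stand-ins in the except branch exist only so the sketch runs without it, and the sample payloads are illustrative:

```python
try:
    # Real imports when the inference-schema package is available.
    from inference_schema.schema_decorators import input_schema, output_schema
    from inference_schema.parameter_types.standard_py_parameter_type import (
        StandardPythonParameterType,
    )
except ImportError:
    # Identity stand-ins so this sketch runs without the package.
    def input_schema(name, param_type):
        return lambda func: func

    def output_schema(param_type):
        return lambda func: func

    def StandardPythonParameterType(sample):
        return sample


# Sample objects define the JSON shapes used to generate the Swagger schema.
@input_schema("data", StandardPythonParameterType({"values": [1.0, 2.0]}))
@output_schema(StandardPythonParameterType({"result": [2.0, 4.0]}))
def run(data):
    return {"result": [v * 2 for v in data["values"]]}
```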

  • Logging and Metrics:

    • Application Insights: There are several cases where relevant information is logged to Application Insights. These are logged only if Application Insights is enabled with AML_APP_INSIGHTS_ENABLED.
      • Request Log: For every endpoint other than the “/” health check endpoint, a log entry is created with the following information. This is logged to the ‘requests’ table from base images:

        • Request id
        • Response value
        • Client Request Id
        • Container Id
        • Request path
        • URL of request, including params
        • Duration
        • Success (True or false)
        • Start time
        • Response code
        • Http method
      • Model Data Log: When the scoring function runs, a log entry is created about the model data with the following information from base images. This output appears in the ‘traces’ table. For this logging to take place, model data collection (MDC) must also be enabled with AML_MODEL_DC_STORAGE_ENABLED:

        • Container Id
        • Request Id
        • Client Request Id
        • Workspace Name
        • Service Name
        • Models
        • Input
        • Prediction
      • Exception Log: Writes telemetry when an exception is encountered. The following details are logged, along with other information, to the ‘exceptions’ table from base images:

        • Container ID
        • Request ID
        • Client Request Id
      • Print Hook:

        • This class intercepts stdout/stderr output, prepends a comma-separated prefix, sends the modified message to syslog, and then sends the unmodified message back to the original destination.
        • All messages printed within the user run function have the request id automatically prepended as follows:
          • 04c6f58d-510f-4e3a-933e-60ac20f2707d,User run function invoked.
        • Within the init function, a series of zeroes is prepended instead:
          • 00000000-0000-0000-0000-000000000000,User init function invoked.
        • These messages are visible in the trace logs under STDOUT.
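The prefixing behavior of the print hook can be sketched as follows. This is a simplified stand-in, not the server's actual implementation; the syslog side is omitted:

```python
import io
import sys


class PrintHook:
    """Prepends '<request id>,' to every non-blank write to a stream,
    mimicking how the server tags user print output."""

    ZERO_ID = "00000000-0000-0000-0000-000000000000"

    def __init__(self, stream, request_id=None):
        self.stream = stream
        self.request_id = request_id or self.ZERO_ID

    def write(self, message):
        if message.strip():
            self.stream.write(f"{self.request_id},{message}")
        else:
            # Pass through bare newlines and whitespace untouched.
            self.stream.write(message)

    def flush(self):
        self.stream.flush()


# Demonstration: capture prefixed output in a buffer.
buffer = io.StringIO()
sys.stdout = PrintHook(buffer, "04c6f58d-510f-4e3a-933e-60ac20f2707d")
print("User run function invoked.")
sys.stdout = sys.__stdout__
```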

Runtime Environment:

The following variables come from the WSGI environment created for each request and are used within the base image codebase. They are populated as defined by the WSGI standard, as part of the Gunicorn dependency. Read more about the WSGI environment variables here.
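For reference, the WSGI naming convention behind these variable names can be sketched with a small (hypothetical) helper:

```python
def header_to_wsgi_key(header_name):
    """HTTP request headers appear in the WSGI environ dict upper-cased,
    with dashes replaced by underscores and an HTTP_ prefix."""
    return "HTTP_" + header_name.upper().replace("-", "_")
```

For example, the x-ms-request-id header becomes the HTTP_X_MS_REQUEST_ID environ key.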

  • Request ID: If x-ms-request-id is specified as a request header, the runtime environment will contain the following:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| HTTP_X_MS_REQUEST_ID | Generated UUID | Used as the unique id for logs related to the request |

This header will be deprecated in the future.

  • Request ID: If x-request-id is specified as a request header, the runtime environment will contain the following:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| HTTP_X_REQUEST_ID | Generated UUID | Used as the unique id for logs related to the request |

If x-request-id is not specified in the header, the value of x-ms-request-id is currently copied into it by default. It is used as a single ID across MIR-FD/ScoringFE, envoy-on-vm, and user_container to track an individual request. This ID is generated by the AzureML service, which ensures it is unique for every request.

Logging to AppInsights:

In the table below, columns with the same letter will see the same value.

| Request x-request-id | Request x-ms-request-id | Request x-ms-client-request-id | Response x-request-id | Response x-ms-request-id | Response x-ms-client-request-id | AppInsights Request ID | AppInsights Client Request Id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | - | - | A | A | - | A | - |
| A | - | - | A | A | - | A | - |
| - | B | - | A | B | B | A | B |
| - | - | C | A | C | C | A | C |
| A | B | - | A | B | B | A | B |
| - | B | C | A | B | C | A | C |
| A | - | C | A | C | C | A | C |
| A | B | C | A | B | C | A | C |
  • Client Request ID: If x-ms-client-request-id is specified as a request header, the runtime environment will contain the following:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| HTTP_X_CLIENT_REQUEST_ID | EMPTY | Users can use this id to associate and track their own end-to-end scenario |

For example: call service A, then call an AzureML endpoint, then call service B. In all three calls the client can use the same x-ms-client-request-id to track this end-to-end scenario for further investigation.

  • Trace ID: If the trace-id is specified as a request header, this ID will be passed back in the response
  • Server Version: Unless sending of the server version in the response is disabled, the x-ms-server-version header is included in the response.
  • Logging: Several of the environment variables from the runtime environment are used for logging.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| REQUEST_METHOD | GET | Request method |
| QUERY_STRING | None | Query parameters |
| PATH_INFO | / | URL path of the target within the application |
| HTTP_HOST | SERVER_NAME/unknown | Host name, while SERVER_NAME is the server name. These are generally equivalent; however, the WSGI docs recommend using HTTP_HOST over SERVER_NAME for URL reconstruction |
  • Separately, some environment variables are defined in the Gunicorn run script. These are set at the start of each Gunicorn process.

| Variable | Value | Purpose |
| --- | --- | --- |
| LD_LIBRARY_PATH | AZUREML_CONDA_ENVIRONMENT_PATH (path to the Conda environment) | Defines the run-time shared library search path |
| PYTHONPATH | AML_SERVER_ROOT | Adds additional directories where Python will look for modules and packages |

Server Configuration:

Environment variables (dynamic configuration):

There are many environment variables that allow for dynamic configuration. They are broken up below by which part of the server they modify. All of these variables can be set in the Dockerfile or during deployment, depending on the use case.

  • Server Setup: These environment variables define the paths to the app and server, and are read before the initial app is built.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| AML_APP_ROOT | /var/azureml-app | Root directory for the app |
| AML_SERVER_ROOT | Real path of the current directory | Root directory for the server |
| AML_ENTRY_SCRIPT | None | Path to the entry script file |
| AML_SOURCE_DIRECTORY | None | Path to the source directory |
| AZUREML_MODEL_DIR | None | Directory of the model |
| SERVER_VERSION_LOG_RESPONSE_ENABLED | None | Disables sending the server version as part of the response |
| AML_CORS_ORIGINS | None | Enables CORS for the specified origins |
  • AML Blueprint Setup: The following environment variables are used when configuring the Blueprint for the server.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| SERVICE_NAME | ML Service | Name of the service (used for Swagger schema generation) |
| SERVICE_PATH_PREFIX | None | Prefix for the service path (used for Swagger schema generation) |
| SERVICE_VERSION | 1.0 | Version of the service (used for Swagger schema generation) |
| SCORING_TIMEOUT_MS | 1 hour | Dictates how long the scoring function will run before timing out |
  • Gunicorn Configuration:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| WORKER_COUNT | 1 | Number of Gunicorn workers to create |
| WORKER_TIMEOUT | 300 | Amount of time the master waits for a worker to contact it before the worker is killed |
| WORKER_PRELOAD | False | Indicates whether "preload_app" is set to true in Gunicorn, meaning the application code is loaded before workers are forked and shared memory is used |
  • Logging: The following environment variables are used for logging purposes.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| AZUREML_LOG_LEVEL | INFO | Sets the logging level |
| AML_DBG_MODEL_INFO | None | Debug model logging takes place if this is true |
| AML_APP_INSIGHTS_ENABLED | None | Enables Application Insights |
| AML_APP_INSIGHTS_KEY | None | Key for the user's Application Insights resource |
| AML_APP_INSIGHTS_ENDPOINT | https://dc.services.visualstudio.com/v2/track | Endpoint of Application Insights |
| AML_MODEL_DC_STORAGE_ENABLED | None | Enables model data collection |
| HOSTNAME | None | Container name |
| WORKSPACE_NAME | None | User workspace name |

Network configuration:

  • Gunicorn: This is the static network configuration for the base image.

  • Server socket: 127.0.0.1:31311. This defines where Gunicorn will run the server.

  • Nginx:

    • The server listens on port 5001 and sets the proxied server to port 31311.

| Variable | Value | Purpose |
| --- | --- | --- |
| proxy_pass | http://127.0.0.1:31311 | Sets the protocol and address of a proxied server |
| proxy_connect_timeout | 1000s | Defines a timeout for establishing a connection with a proxied server |
| proxy_read_timeout | 1000s | Defines a timeout for reading a response from the proxied server |
| client_max_body_size | 100m | Sets the maximum allowed size of the client request body |

CORS Support:

Cross-origin resource sharing is a way to allow resources on a webpage to be requested from another domain. CORS works via HTTP headers sent with the client request and returned with the service response. For more information on CORS and valid headers, see Cross-origin resource sharing in Wikipedia.

Users can specify the domains allowed access through the AML_CORS_ORIGINS environment variable, as a comma-separated list of domains, such as www.microsoft.com, www.bing.com. While discouraged, users can also set it to * to allow access from all domains. CORS is disabled if this environment variable is not set.
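A sketch of how such a comma-separated origin list might be parsed (illustrative only; not the server's actual parsing code):

```python
def parse_cors_origins(value):
    """Splits an AML_CORS_ORIGINS-style string into a list of allowed
    origins. Returns None (CORS disabled) when unset or empty."""
    if not value:
        return None
    origins = [origin.strip() for origin in value.split(",")]
    return [o for o in origins if o]
```

With `AML_CORS_ORIGINS="www.microsoft.com, www.bing.com"`, this yields the two-domain allow list; with the variable unset, it yields None.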

The existing approach of using @rawhttp to specify CORS headers is not affected, and can still be used if you need more granular control of CORS (such as specifying other CORS headers). See here for an example.

Load Server Config from JSON:

The server supports loading its configuration from a JSON file. The config file can be specified using:

  1. The env variable AZUREML_CONFIG_FILE (the absolute path to the json configuration file).
  2. CLI parameter --config_file.

Note:

  1. All the paths mentioned in config.json (the configuration file) should be absolute paths.
  2. Priority: CLI > environment variable > config file

config.json will be searched for in the locations below by default if the config file is not provided explicitly via the environment variable or CLI parameter:

  1. AML_APP_ROOT directory
  2. Directory containing the scoring script
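The precedence rule (CLI > environment variable > config file) can be sketched with a small hypothetical helper:

```python
def resolve_setting(key, cli_args, environ, config_file):
    """Returns the value for `key`, preferring CLI arguments, then
    environment variables, then the JSON config file."""
    for source in (cli_args, environ, config_file):
        if key in source and source[key] is not None:
            return source[key]
    return None
```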

The config file supports only the keys below:

| Key | Required | Default Value |
| --- | --- | --- |
| AML_APP_ROOT | No | "/var/azureml-app" |
| AZUREML_SOURCE_DIRECTORY | No | |
| AZUREML_ENTRY_SCRIPT | Yes | |
| SERVICE_NAME | No | "ML service" |
| WORKSPACE_NAME | No | "" |
| SERVICE_PATH_PREFIX | No | "" |
| SERVICE_VERSION | No | "1.0" |
| SCORING_TIMEOUT_MS | No | 3600 * 1000 |
| AZUREML_LOG_LEVEL | No | "INFO" |
| AML_APP_INSIGHTS_ENABLED | No | False |
| AML_APP_INSIGHTS_KEY | No | None |
| AML_MODEL_DC_STORAGE_ENABLED | No | False |
| APP_INSIGHTS_LOG_RESPONSE_ENABLED | No | "True" |
| AML_CORS_ORIGINS | No | None |
| AZUREML_MODEL_DIR | No | False |
| HOSTNAME | No | "Unknown" |
| AZUREML_DEBUG_PORT | No | None |

The code for the config can be found here: config.py.

Sample config.json:

```json
{
    "AZUREML_ENTRY_SCRIPT": "/mnt/d/tests/manual/default_score.py",
    "AML_CORS_ORIGINS": "www.microsoft.com",
    "SCORING_TIMEOUT_MS": 6000,
    "AML_APP_INSIGHTS_ENABLED": true
}
```
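For illustration, a config file like the sample above can be loaded with the standard json module (search-path and precedence logic omitted; this is not the server's config.py):

```python
import json

sample = """
{
    "AZUREML_ENTRY_SCRIPT": "/mnt/d/tests/manual/default_score.py",
    "AML_CORS_ORIGINS": "www.microsoft.com",
    "SCORING_TIMEOUT_MS": 6000,
    "AML_APP_INSIGHTS_ENABLED": true
}
"""

config = json.loads(sample)
# JSON `true` becomes Python True; numbers stay numeric.
```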