
AzureML Inference Server

The inference server is the component that serves inference requests for deployed models. Requests made to the HTTP server run user-provided code that interfaces with the user's models.

This server is used with most images in the Azure ML ecosystem and is considered the primary component of the base image, as it contains the Python assets required for inferencing.
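For illustration, the user-provided code is typically a scoring script defining an `init` function (run once at startup) and a `run` function (run per request). The sketch below uses a stand-in "model" rather than a real artifact loaded from AZUREML_MODEL_DIR, so it runs standalone:

```python
import json

model = None


def init():
    """Runs once when the server starts; load the model here."""
    global model
    # Stand-in "model" that doubles its inputs; a real script would
    # deserialize a model artifact from AZUREML_MODEL_DIR instead.
    model = lambda xs: [x * 2 for x in xs]


def run(raw_data):
    """Runs per request with the deserialized request payload."""
    data = json.loads(raw_data)["data"]
    return {"result": model(data)}
```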

Inference Server contents

  • Scoring Endpoints:

    • The table below details the request types that can be made on the /score path and how input can be sent with each. A GET request sends input via query parameters, while a POST request sends input in the request body, which is deserialized as JSON.

| Request Type | Query Parameters | Request Body | Raw Data |
| --- | --- | --- | --- |
| GET | ✓ | | ✓ |
| POST | | ✓ | ✓ |
| OPTIONS | | | ✓ |
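As a rough sketch of the distinction (standard library only; the /score URL and helper name are illustrative, not server code):

```python
import json
from urllib.parse import urlparse, parse_qs


def extract_input(method, url, body=None):
    """Illustrates how input reaches the run function: GET input comes
    from query parameters, POST input from a JSON request body."""
    if method == "GET":
        # parse_qs returns lists; unwrap single-valued parameters.
        return {k: v[0] if len(v) == 1 else v
                for k, v in parse_qs(urlparse(url).query).items()}
    if method == "POST":
        return json.loads(body)
    return None
```

For example, `extract_input("GET", "/score?name=world")` and `extract_input("POST", "/score", '{"name": "world"}')` yield the same dictionary.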
  • Raw Data:

    • Required Setup: @rawhttp decorator on the run function. More info here.

    • All request types can use raw data, enabled by adding the @rawhttp decorator to the user's run function. This allows raw data (such as binary data) to be sent to the run function and used by the model. The run function can then use AMLResponse directly to build an HTTP response and return the model's output.
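A sketch of a raw-data scoring script is below. The import paths are the ones provided by the azureml-inference-server-http package; the stand-ins in the except branch are only there so the sketch can run outside the server:

```python
try:
    # Real imports when running under the AzureML inference server.
    from azureml_inference_server_http.api.aml_request import rawhttp
    from azureml_inference_server_http.api.aml_response import AMLResponse
except ImportError:
    # Minimal stand-ins so this sketch runs standalone.
    def rawhttp(func):
        return func

    class AMLResponse:
        def __init__(self, body, status_code):
            self.body, self.status_code = body, status_code


def init():
    pass


@rawhttp
def run(request):
    # `request` is the raw HTTP request; binary payloads are available
    # via get_data() instead of being deserialized to JSON first.
    if request.method == "POST":
        payload = request.get_data()  # raw bytes
        return AMLResponse(f"received {len(payload)} bytes", 200)
    return AMLResponse("method not allowed", 405)
```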

  • Health:

    • The “/” endpoint is the health check endpoint. A GET request to this endpoint is expected to return the plain-text string “Healthy”. The cluster checks this endpoint frequently to determine whether the service is healthy.
  • Schema / Discoverability:

    • Swagger schemas can be generated to describe the input the model expects and the output it produces. This enables discoverability: users can retrieve the schema and understand the data shape requirements of the model.
    • Swagger schema generation only works for JSON data. For raw data or non-JSON structured data (e.g. XML), Swagger generation will not work.
    • Required setup:
      • The @input_schema and @output_schema decorators, specifying the data types, must be placed above the run function.

      • The inference-schema package must be included in the dependencies. The azureml-inference-server-http package includes this dependency by default.

      • More info here.
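A sketch of the required decorators is below. The imports are from the inference-schema package; the identity stand-ins in the except branch exist only so the sketch runs without it, and the sample payloads are illustrative:

```python
try:
    # Real imports when the inference-schema package is available.
    from inference_schema.schema_decorators import input_schema, output_schema
    from inference_schema.parameter_types.standard_py_parameter_type import (
        StandardPythonParameterType,
    )
except ImportError:
    # Identity stand-ins so this sketch runs without the package.
    def input_schema(name, param_type):
        return lambda func: func

    def output_schema(param_type):
        return lambda func: func

    def StandardPythonParameterType(sample):
        return sample


# Sample objects define the JSON shapes used to generate the Swagger schema.
@input_schema("data", StandardPythonParameterType({"values": [1.0, 2.0]}))
@output_schema(StandardPythonParameterType({"result": [2.0, 4.0]}))
def run(data):
    return {"result": [v * 2 for v in data["values"]]}
```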

  • Logging and Metrics:

    • Application Insights: There are several cases where relevant information is logged to Application Insights. These are logged only if Application Insights is enabled with AML_APP_INSIGHTS_ENABLED.
      • Request Log: For every endpoint other than the “/” health check endpoint, a log entry is created with the following information. This is logged to the ‘requests’ table from base images:

        • Request id
        • Response value
        • Client Request Id
        • Container Id
        • Request path
        • URL of request, including params
        • Duration
        • Success (True or false)
        • Start time
        • Response code
        • Http method
      • Model Data Log: When the scoring function runs, a log entry is created about the model data with the following information from base images. This output appears in the ‘traces’ table. For this logging to take place, model data collection (MDC) must also be enabled with AML_MODEL_DC_STORAGE_ENABLED:

        • Container Id
        • Request Id
        • Client Request Id
        • Workspace Name
        • Service Name
        • Models
        • Input
        • Prediction
      • Exception Log: Writes telemetry when an exception is encountered. The following details are logged, along with other information, to the ‘exceptions’ table from base images:

        • Container ID
        • Request ID
        • Client Request Id
      • Print Hook:

        • This class intercepts stdout/stderr output, prepends a comma-separated prefix, sends the modified message to syslog, and then sends the unmodified message back to the original destination.
        • All messages printed within the user run function have the request id automatically prepended as follows:
          • 04c6f58d-510f-4e3a-933e-60ac20f2707d,User run function invoked.
        • Within the init function, a series of zeroes is prepended instead:
          • 00000000-0000-0000-0000-000000000000,User init function invoked.
        • These messages are visible in the trace logs under STDOUT.
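The prefixing behavior of the print hook can be sketched as follows. This is a simplified stand-in, not the server's actual implementation; the syslog side is omitted:

```python
import io
import sys


class PrintHook:
    """Prepends '<request id>,' to every non-blank write to a stream,
    mimicking how the server tags user print output."""

    ZERO_ID = "00000000-0000-0000-0000-000000000000"

    def __init__(self, stream, request_id=None):
        self.stream = stream
        self.request_id = request_id or self.ZERO_ID

    def write(self, message):
        if message.strip():
            self.stream.write(f"{self.request_id},{message}")
        else:
            # Pass through bare newlines and whitespace untouched.
            self.stream.write(message)

    def flush(self):
        self.stream.flush()


# Demonstration: capture prefixed output in a buffer.
buffer = io.StringIO()
sys.stdout = PrintHook(buffer, "04c6f58d-510f-4e3a-933e-60ac20f2707d")
print("User run function invoked.")
sys.stdout = sys.__stdout__
```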

Runtime Environment:

The following variables come from the WSGI environment created for each request and are used within the base image codebase. They are populated as defined by the WSGI standard, as part of the Gunicorn dependency. Read more about the WSGI environment variables here.
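For reference, the WSGI naming convention behind these variable names can be sketched with a small (hypothetical) helper:

```python
def header_to_wsgi_key(header_name):
    """HTTP request headers appear in the WSGI environ dict upper-cased,
    with dashes replaced by underscores and an HTTP_ prefix."""
    return "HTTP_" + header_name.upper().replace("-", "_")
```

For example, the x-ms-request-id header becomes the HTTP_X_MS_REQUEST_ID environ key.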

  • Request ID: If x-ms-request-id is specified as a request header, the runtime environment will contain the following:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| HTTP_X_MS_REQUEST_ID | Generated UUID | Used as the unique id for logs related to the request |

This header will be deprecated in the future.

  • Request ID: If x-request-id is specified as a request header, the runtime environment will contain the following:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| HTTP_X_REQUEST_ID | Generated UUID | Used as the unique id for logs related to the request |

If x-request-id is not specified in the header, the value of x-ms-request-id is currently copied into it by default. It is used as a single ID across MIR-FD/ScoringFE, envoy-on-vm, and user_container to track an individual request. This ID is generated by the AzureML service, which ensures it is unique for every request.

Logging to AppInsights:

In the table below, columns with the same letter will see the same value.

| Request x-request-id | Request x-ms-request-id | Request x-ms-client-request-id | Response x-request-id | Response x-ms-request-id | Response x-ms-client-request-id | AppInsights Request ID | AppInsights Client Request Id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| - | - | - | A | A | - | A | - |
| A | - | - | A | A | - | A | - |
| - | B | - | A | B | B | A | B |
| - | - | C | A | C | C | A | C |
| A | B | - | A | B | B | A | B |
| - | B | C | A | B | C | A | C |
| A | - | C | A | C | C | A | C |
| A | B | C | A | B | C | A | C |
  • Client Request ID: If x-ms-client-request-id is specified as a request header, the runtime environment will contain the following:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| HTTP_X_CLIENT_REQUEST_ID | EMPTY | Users can use this id to associate and track their own end-to-end scenario |

For example: call service A, then call an AzureML endpoint, then call service B. In all three calls the client can use the same x-ms-client-request-id to track this end-to-end scenario for further investigation.

  • Trace ID: If the trace-id is specified as a request header, this ID will be passed back in the response
  • Server Version: Unless sending of the server version in the response is disabled, the x-ms-server-version header is included in the response.
  • Logging: Several of the environment variables from the runtime environment are used for logging.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| REQUEST_METHOD | GET | Request method |
| QUERY_STRING | None | Query parameters |
| PATH_INFO | / | URL path of the target within the application |
| HTTP_HOST | SERVER_NAME/unknown | Host name, while SERVER_NAME is the server name. These are generally equivalent; however, the WSGI docs recommend using HTTP_HOST over SERVER_NAME for URL reconstruction |
  • Separately, some environment variables are defined in the Gunicorn run script. These are set at the start of each Gunicorn process.

| Variable | Value | Purpose |
| --- | --- | --- |
| LD_LIBRARY_PATH | AZUREML_CONDA_ENVIRONMENT_PATH (path to the Conda environment) | Defines the run-time shared library search path |
| PYTHONPATH | AML_SERVER_ROOT | Adds additional directories where Python will look for modules and packages |

Server Configuration:

Environment variables (dynamic configuration):

There are many environment variables that allow for dynamic configuration. They are broken up below by which part of the server they modify. All of these variables can be set in the Dockerfile or during deployment, depending on the use case.

  • Server Setup: These environment variables define the paths to the app and server, and are read before the initial app is built.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| AML_APP_ROOT | /var/azureml-app | Root directory for the app |
| AML_SERVER_ROOT | Real path of the current directory | Root directory for the server |
| AML_ENTRY_SCRIPT | None | Path to the entry script file |
| AML_SOURCE_DIRECTORY | None | Path to the source directory |
| AZUREML_MODEL_DIR | None | Directory of the model |
| SERVER_VERSION_LOG_RESPONSE_ENABLED | None | Disables sending the server version as part of the response |
| AML_CORS_ORIGINS | None | Enables CORS for the specified origins |
  • AML Blueprint Setup: The following environment variables are used when configuring the Blueprint for the server.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| SERVICE_NAME | ML Service | Name of the service (used for Swagger schema generation) |
| SERVICE_PATH_PREFIX | None | Prefix for the service path (used for Swagger schema generation) |
| SERVICE_VERSION | 1.0 | Version of the service (used for Swagger schema generation) |
| SCORING_TIMEOUT_MS | 1 hour | Dictates how long the scoring function will run before timing out |
  • Gunicorn Configuration:

| Variable | Default Value | Purpose |
| --- | --- | --- |
| WORKER_COUNT | 1 | Number of Gunicorn workers to create |
| WORKER_TIMEOUT | 300 | Amount of time the master waits for a worker to contact it before the worker is killed |
| WORKER_PRELOAD | False | Indicates whether "preload_app" is set to true in Gunicorn, meaning the application code is loaded before workers are forked and shared memory is used |
  • Logging: The following environment variables are used for logging purposes.

| Variable | Default Value | Purpose |
| --- | --- | --- |
| AZUREML_LOG_LEVEL | INFO | Sets the logging level |
| AML_DBG_MODEL_INFO | None | Debug model logging takes place if this is true |
| AML_APP_INSIGHTS_ENABLED | None | Enables Application Insights |
| AML_APP_INSIGHTS_KEY | None | Key for the user's Application Insights resource |
| AML_APP_INSIGHTS_ENDPOINT | https://dc.services.visualstudio.com/v2/track | Endpoint of Application Insights |
| AML_MODEL_DC_STORAGE_ENABLED | None | Enables model data collection |
| HOSTNAME | None | Container name |
| WORKSPACE_NAME | None | User workspace name |

Network configuration:

  • Gunicorn: This is the static network configuration for the base image.

  • Server socket: 127.0.0.1:31311. This defines where Gunicorn will run the server.

  • Nginx:

    • The server listens on port 5001 and sets the proxied server to port 31311.

| Variable | Value | Purpose |
| --- | --- | --- |
| proxy_pass | http://127.0.0.1:31311 | Sets the protocol and address of a proxied server |
| proxy_connect_timeout | 1000s | Defines a timeout for establishing a connection with a proxied server |
| proxy_read_timeout | 1000s | Defines a timeout for reading a response from the proxied server |
| client_max_body_size | 100m | Sets the maximum allowed size of the client request body |

CORS Support:

Cross-origin resource sharing is a way to allow resources on a webpage to be requested from another domain. CORS works via HTTP headers sent with the client request and returned with the service response. For more information on CORS and valid headers, see Cross-origin resource sharing in Wikipedia.

Users can specify the domains allowed access through the AML_CORS_ORIGINS environment variable, as a comma-separated list of domains, such as www.microsoft.com, www.bing.com. While discouraged, users can also set it to * to allow access from all domains. CORS is disabled if this environment variable is not set.
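A sketch of how such a comma-separated origin list might be parsed (illustrative only; not the server's actual parsing code):

```python
def parse_cors_origins(value):
    """Splits an AML_CORS_ORIGINS-style string into a list of allowed
    origins. Returns None (CORS disabled) when unset or empty."""
    if not value:
        return None
    origins = [origin.strip() for origin in value.split(",")]
    return [o for o in origins if o]
```

With `AML_CORS_ORIGINS="www.microsoft.com, www.bing.com"`, this yields the two-domain allow list; with the variable unset, it yields None.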

The existing approach of using @rawhttp to specify CORS headers is not affected, and can still be used if you need more granular control of CORS (such as specifying other CORS headers). See here for an example.

Load Server Config from JSON:

The server supports loading its configuration from a JSON file. The config file can be specified using:

  1. The env variable AZUREML_CONFIG_FILE (the absolute path to the json configuration file).
  2. CLI parameter --config_file.

Note:

  1. All the paths mentioned in config.json (the configuration file) should be absolute paths.
  2. Priority: CLI > environment variable > config file

config.json will be searched for in the locations below by default if the config file is not provided explicitly via the environment variable or CLI parameter:

  1. AML_APP_ROOT directory
  2. Directory containing the scoring script
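The precedence rule (CLI > environment variable > config file) can be sketched with a small hypothetical helper:

```python
def resolve_setting(key, cli_args, environ, config_file):
    """Returns the value for `key`, preferring CLI arguments, then
    environment variables, then the JSON config file."""
    for source in (cli_args, environ, config_file):
        if key in source and source[key] is not None:
            return source[key]
    return None
```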

The config file supports only the keys below:

| Key | Required | Default Value |
| --- | --- | --- |
| AML_APP_ROOT | No | "/var/azureml-app" |
| AZUREML_SOURCE_DIRECTORY | No | |
| AZUREML_ENTRY_SCRIPT | Yes | |
| SERVICE_NAME | No | "ML service" |
| WORKSPACE_NAME | No | "" |
| SERVICE_PATH_PREFIX | No | "" |
| SERVICE_VERSION | No | "1.0" |
| SCORING_TIMEOUT_MS | No | 3600 * 1000 |
| AZUREML_LOG_LEVEL | No | "INFO" |
| AML_APP_INSIGHTS_ENABLED | No | False |
| AML_APP_INSIGHTS_KEY | No | None |
| AML_MODEL_DC_STORAGE_ENABLED | No | False |
| APP_INSIGHTS_LOG_RESPONSE_ENABLED | No | "True" |
| AML_CORS_ORIGINS | No | None |
| AZUREML_MODEL_DIR | No | False |
| HOSTNAME | No | "Unknown" |
| AZUREML_DEBUG_PORT | No | None |

The code for the config can be found here: config.py.

Sample config.json:

```json
{
    "AZUREML_ENTRY_SCRIPT": "/mnt/d/tests/manual/default_score.py",
    "AML_CORS_ORIGINS": "www.microsoft.com",
    "SCORING_TIMEOUT_MS": 6000,
    "AML_APP_INSIGHTS_ENABLED": true
}
```
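For illustration, a config file like the sample above can be loaded with the standard json module (search-path and precedence logic omitted; this is not the server's config.py):

```python
import json

sample = """
{
    "AZUREML_ENTRY_SCRIPT": "/mnt/d/tests/manual/default_score.py",
    "AML_CORS_ORIGINS": "www.microsoft.com",
    "SCORING_TIMEOUT_MS": 6000,
    "AML_APP_INSIGHTS_ENABLED": true
}
"""

config = json.loads(sample)
# JSON `true` becomes Python True; numbers stay numeric.
```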