The inference server is the component that serves inference requests for deployed models. Requests made to the HTTP server run user-provided code that interfaces with the user's models.
This server is used with most images in the Azure ML ecosystem and is considered the primary component of the base image, as it contains the Python assets required for inferencing.
Scoring Endpoints:
- The table below details the request types that can be made on the /score path and how input values can be sent with each. A GET request sends input through query parameters, while a POST request sends input through a request body that can be deserialized to JSON.
| Request Type | Query Parameters | Request Body | Raw Data |
|---|---|---|---|
| GET | ☑ | ☒ | ☑ |
| POST | ☒ | ☑ | ☑ |
| OPTIONS | ☒ | ☒ | ☑ |
Raw Data:
- Required setup: the @rawhttp decorator on the user's run function. More info here.
- All of the request types can use raw data, enabled by applying the @rawhttp decorator to the user's run function. This allows raw data (such as binary data) to be sent to the run function and used by the model. The run function can then use AMLResponse directly to build an HTTP response and respond with the output of the model.
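The dispatch behavior described above can be sketched in pure Python. This is an illustrative stand-in, not the server's actual code: the real server passes a Flask request object to a @rawhttp-decorated run function and responses are built with AMLResponse; here, FakeRequest models that interface and a (body, status) tuple stands in for AMLResponse.

```python
# Illustrative sketch of how a @rawhttp-style run function can dispatch on
# the request method. FakeRequest stands in for the Flask request object the
# server passes in; a (body, status) tuple stands in for AMLResponse.
class FakeRequest:
    def __init__(self, method, args=None, data=b""):
        self.method = method          # "GET", "POST", or "OPTIONS"
        self.args = args or {}        # query parameters
        self._data = data             # raw request body

    def get_data(self):
        return self._data

def run(request):
    if request.method == "GET":
        # GET sends input via query parameters
        return (request.args.get("input", ""), 200)
    if request.method == "POST":
        # POST can carry raw binary data, e.g. an image
        return (request.get_data(), 200)
    # OPTIONS is typically answered empty, e.g. for CORS preflight
    return ("", 200)
```

With raw data enabled this way, the function receives the request untouched, so binary payloads survive the trip to the model.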
Health:
- The “/” endpoint is the health check endpoint. A GET request to this endpoint is expected to return the plain-text string “Healthy”. The cluster checks this endpoint frequently to determine whether the service is healthy.
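A minimal liveness probe against the health endpoint might look like this. The host and port (Nginx's 5001, from the network configuration section below) and the helper names are assumptions for illustration:

```python
import urllib.request

def fetch_health(base_url="http://127.0.0.1:5001", timeout=5):
    """Fetch the body of the "/" health endpoint, or None on failure."""
    try:
        with urllib.request.urlopen(base_url + "/", timeout=timeout) as resp:
            return resp.read().decode()
    except OSError:
        return None

def is_healthy(body):
    # The service is considered healthy iff it returns exactly "Healthy".
    return body == "Healthy"
```

A caller would combine the two: `is_healthy(fetch_health())`.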
Schema / Discoverability:
- Swagger schemas can be generated to understand the input type to feed the model and the output type generated by the model. This allows for discoverability, as users can receive the schema and understand the data shape requirements of the model.
- Swagger schema generation only works for JSON data. For raw data or non-JSON structured data (e.g. XML), Swagger generation will not work.
- Required setup:
  - The @input_schema and @output_schema decorators must be specified, with the data types, above the run function.
  - The inference-schema package must be included in the dependencies. The azureml-inference-server-http package includes this dependency by default.
  - More info here.
Logging and Metrics:
- Application Insights: There are several cases where telemetry is logged to Application Insights with relevant information. All of these are logged only if Application Insights is enabled with AML_APP_INSIGHTS_ENABLED.
- Request Log: For every endpoint other than the “/” health check endpoint, a log entry is created with the following information. This is logged in the ‘requests’ table from base images:
- Request id
- Response value
- Client Request Id
- Container Id
- Request path
- URL of request, including params
- Duration
- Success (true or false)
- Start time
- Response code
- Http method
Model Data Log: When the scoring function is run, a log entry is created about the model data with the following information from base images. This output is seen in the ‘trace’ table. For this logging to take place, Model Data Collection (MDC) must also be enabled with AML_MODEL_DC_STORAGE_ENABLED:
- Container Id
- Request Id
- Client Request Id
- Workspace Name
- Service Name
- Models
- Input
- Prediction
Exception Log: Writes telemetry when an exception is encountered. The following details are logged. This output can be seen in the ‘exceptions’ table from the base image, along with other information:
- Container ID
- Request ID
- Client Request Id
Print Hook:
- This class intercepts stdout/stderr output, prepends a comma-separated prefix, sends the modified message to syslog, and then sends the unmodified message back to the original destination.
- Messages printed within the user run function have the request id automatically prepended as follows:
04c6f58d-510f-4e3a-933e-60ac20f2707d,User run function invoked.
- Within the init function, a series of zeroes is prepended instead:
00000000-0000-0000-0000-000000000000,User init function invoked.
- This is visible in the ‘trace’ logs under STDOUT.
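The prefixing behavior can be sketched as follows. This is a simplified model of the hook, not the server's implementation: every non-empty write to stdout inside the run function gets `<request id>,` prepended, while bare newlines pass through untouched.

```python
import io
import sys

class PrintHook:
    """Illustrative stdout interceptor that prepends a comma-separated prefix."""
    def __init__(self, dest, request_id):
        self.dest = dest
        self.request_id = request_id

    def write(self, message):
        if message.strip():
            # Prepend the request id to meaningful output
            self.dest.write(f"{self.request_id},{message}")
        else:
            self.dest.write(message)  # pass bare newlines through unchanged

    def flush(self):
        self.dest.flush()

buf = io.StringIO()
original, sys.stdout = sys.stdout, PrintHook(buf, "04c6f58d-510f-4e3a-933e-60ac20f2707d")
print("User run function invoked.")
sys.stdout = original
```

After the `print` call, `buf` holds the prefixed line in the same shape as the trace-log examples above.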
The following variables come from the environment created by individual requests and are then used within the base image codebase. These are created as defined by the WSGI standard and are part of the Gunicorn dependencies. Read more about the WSGI environment variables here.
- Request ID: If x-ms-request-id is specified as a request header, the runtime environment will contain the following:
| Variable | Default Value | Purpose |
|---|---|---|
| HTTP_X_MS_REQUEST_ID | Generated UUID | Used as the unique id for logs related to the request |
This header will be deprecated in the future.
- Request ID: If x-request-id is specified as a request header, the runtime environment will contain the following:
| Variable | Default Value | Purpose |
|---|---|---|
| HTTP_X_REQUEST_ID | Generated UUID | Used as the unique id for logs related to the request |
If x-request-id is not specified in the header, the value of x-ms-request-id is currently copied into it by default. It is used as a single ID across MIR-FD/ScoringFE, envoy-on-vm, and the user container to track an individual request. This ID is generated by the AzureML service, which ensures it is unique for every request.
Logging to AppInsights:
In the table below, the columns with the same letter will see the same value.
| Request: x-request-id | Request: x-ms-request-id | Request: x-ms-client-request-id | Response: x-request-id | Response: x-ms-request-id | Response: x-ms-client-request-id | AppInsights: Request ID | AppInsights: Client Request Id |
|---|---|---|---|---|---|---|---|
| - | - | - | A | A | - | A | - |
| A | - | - | A | A | - | A | - |
| - | B | - | A | B | B | A | B |
| - | - | C | A | C | C | A | C |
| A | B | - | A | B | B | A | B |
| - | B | C | A | B | C | A | C |
| A | - | C | A | C | C | A | C |
| A | B | C | A | B | C | A | C |
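The precedence encoded in the table rows can be reconstructed as follows. This is an illustrative reading of the table, not the server's actual code; the function and key names are assumptions:

```python
import uuid

def resolve_ids(headers):
    # x-request-id: taken from the request, else a fresh UUID is generated.
    x_request_id = headers.get("x-request-id") or str(uuid.uuid4())
    x_ms = headers.get("x-ms-request-id")
    client = headers.get("x-ms-client-request-id")
    # Response x-ms-request-id falls back to the client request id, then to
    # the resolved x-request-id.
    resp_ms = x_ms or client or x_request_id
    # Response x-ms-client-request-id falls back to x-ms-request-id, and may
    # be absent (None) if neither header was sent.
    resp_client = client or x_ms
    return {
        "x-request-id": x_request_id,
        "x-ms-request-id": resp_ms,
        "x-ms-client-request-id": resp_client,
        "appinsights Request ID": x_request_id,
        "appinsights Client Request Id": resp_client,
    }
```

For example, sending all three headers (the last table row) yields the request's own values in the response, with AppInsights recording x-request-id and x-ms-client-request-id.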
- Client Request ID: If x-ms-client-request-id is specified as a request header, the runtime environment will contain the following:
| Variable | Default Value | Purpose |
|---|---|---|
| HTTP_X_CLIENT_REQUEST_ID | EMPTY | Users can use this id to associate and track their own end to end scenario |
For example: call service A, then call an AzureML endpoint, then call service B. In all three calls the client can use the same x-ms-client-request-id to track this end-to-end scenario for further investigation.
- Trace ID: If trace-id is specified as a request header, this ID will be passed back in the response.
- Server Version: Unless sending of the server version as part of the response is disabled, x-ms-server-version is sent in the response.
- Logging: Several of the environment variables from the runtime environment are used for logging.
| Variable | Default Value | Purpose |
|---|---|---|
| REQUEST_METHOD | GET | Request method |
| QUERY_STRING | None | Query Parameters |
| PATH_INFO | / | URL path of target within application |
| HTTP_HOST | SERVER_NAME/unknown | Host name. Generally equivalent to SERVER_NAME, but the WSGI docs recommend using HTTP_HOST over SERVER_NAME for URL reconstruction |
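The HTTP_HOST-over-SERVER_NAME recommendation can be illustrated with a small URL-reconstruction helper over a WSGI environ dict; the function name is illustrative:

```python
def reconstruct_url(environ):
    # Rebuild the request URL from WSGI environ variables, preferring
    # HTTP_HOST over SERVER_NAME as the WSGI docs recommend.
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "unknown")
    url = f"{scheme}://{host}{environ.get('PATH_INFO', '/')}"
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url
```

This mirrors how the variables in the table above combine into the "URL of request, including params" field of the request log.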
- Separately, we also have some environment variables defined during the run script of Gunicorn. These are set during the start of each Gunicorn process.
| Variable | Value | Purpose |
|---|---|---|
| LD_LIBRARY_PATH | AZUREML_CONDA_ENVIRONMENT_PATH (Path to Conda environment) | Defines the run-time shared library loader |
| PYTHONPATH | AML_SERVER_ROOT | Adds additional directories where Python will look for modules and packages |
There are many environment variables that allow for dynamic configuration. The environment variables below are grouped by the part of the server they modify. All of these variables can be set in the Dockerfile or during deployment, depending on the use case.
- Server Setup: These environment variables define the paths to the app and server that are defined before the initial app is built.
| Variable | Default Value | Purpose |
|---|---|---|
| AML_APP_ROOT | /var/azureml-app | Root directory for the app |
| AML_SERVER_ROOT | Real path of the current directory | Root directory for the server |
| AML_ENTRY_SCRIPT | None | Path to entry script file |
| AML_SOURCE_DIRECTORY | None | Path to source directory |
| AZUREML_MODEL_DIR | None | Directory of the model |
| SERVER_VERSION_LOG_RESPONSE_ENABLED | None | Disable sending the server version as part of response |
| AML_CORS_ORIGINS | None | Enable CORS for the specified origins |
- AML Blueprint setup: The following environment variables are used when configuring the Blueprint for the server
| Variable | Default Value | Purpose |
|---|---|---|
| SERVICE_NAME | ML Service | Name of the service (used for Swagger schema generation) |
| SERVICE_PATH_PREFIX | None | Prefix for the service path (used for Swagger schema generation) |
| SERVICE_VERSION | 1.0 | Version of the service (used for Swagger schema generation) |
| SCORING_TIMEOUT_MS | 1 Hour | Dictates how long the scoring function will run before timing out. |
- Gunicorn configuration:
| Variable | Default Value | Purpose |
|---|---|---|
| WORKER_COUNT | 1 | Number of Gunicorn workers to create |
| WORKER_TIMEOUT | 300 | Amount of time master waits for the worker to contact it before the worker is killed |
| WORKER_PRELOAD | False | Indicates whether "preload_app" is set to true in Gunicorn, which means that the application code is loaded before workers are forked and that shared memory is used. |
- Logging: The following environment variables are used for logging purposes.
| Variable | Default Value | Purpose |
|---|---|---|
| AZUREML_LOG_LEVEL | INFO | Sets the Logging level |
| AML_DBG_MODEL_INFO | None | Debug Model logging will take place if this is true |
| AML_APP_INSIGHTS_ENABLED | None | Enables Appinsights |
| AML_APP_INSIGHTS_KEY | None | Key to user AppInsights |
| AML_APP_INSIGHTS_ENDPOINT | https://dc.services.visualstudio.com/v2/track | Endpoint of AppInsights |
| AML_MODEL_DC_STORAGE_ENABLED | None | Enables Model Data Collection |
| HOSTNAME | None | Container name |
| WORKSPACE_NAME | None | User workspace name |
Gunicorn: This is the static network configuration for the base image.
- Server socket: 127.0.0.1:31311. This defines where the server will be run by Gunicorn.

Nginx:
- The server listens on port 5001 and sets the proxied server to port 31311.
| Variable | Value | Purpose |
|---|---|---|
| proxy_pass | http://127.0.0.1:31311 | Sets the protocol and address of a proxied server. |
| proxy_connect_timeout | 1000s | Defines a timeout for establishing a connection with a proxied server. |
| proxy_read_timeout | 1000s | Defines a timeout for reading a response from the proxied server. |
| client_max_body_size | 100m | Sets the maximum allowed size of the client request body. |
Cross-origin resource sharing is a way to allow resources on a webpage to be requested from another domain. CORS works via HTTP headers sent with the client request and returned with the service response. For more information on CORS and valid headers, see Cross-origin resource sharing in Wikipedia.
Users can specify the domains allowed access through the AML_CORS_ORIGINS environment variable, as a comma-separated list of domains, such as www.microsoft.com, www.bing.com. While discouraged, users can also set it to * to allow access from all domains. CORS is disabled if this environment variable is not set.
Existing usage of @rawhttp as a way to specify CORS headers is not affected, and can still be used if you need more granular control of CORS (such as the need to specify other CORS headers). See here for an example.
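The AML_CORS_ORIGINS semantics described above can be sketched as a small origin check; the function name and return convention are illustrative, not the server's implementation:

```python
def allowed_origin(origin, env):
    """Return the value to echo in Access-Control-Allow-Origin, or None if
    the origin is not allowed (or CORS is disabled). Illustrative sketch."""
    raw = env.get("AML_CORS_ORIGINS")
    if raw is None:
        return None                      # CORS disabled entirely
    if raw.strip() == "*":
        return "*"                       # discouraged: allow all domains
    # Comma-separated list of allowed domains, whitespace-tolerant
    allowed = {o.strip() for o in raw.split(",")}
    return origin if origin in allowed else None
```

A None result means the response carries no CORS header and the browser blocks the cross-origin read.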
The server supports loading its config from a JSON file. The config file can be specified using:
- The environment variable AZUREML_CONFIG_FILE (the absolute path to the JSON configuration file).
- The CLI parameter --config_file.
Note:
- All paths mentioned in config.json (the configuration file) should be absolute paths.
- Priority: CLI > environment variable > config file.
If the config file is not provided explicitly via the environment variable or CLI parameter, config.json will be searched for in the following locations by default:
- The AML_APP_ROOT directory
- The directory containing the scoring script
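The lookup order above (CLI parameter first, then the AZUREML_CONFIG_FILE environment variable, then the default search locations) can be sketched as follows; the function and parameter names are illustrative:

```python
import json
import os

def load_config(cli_path=None, env=None, search_dirs=()):
    """Resolve and load config.json per the priority: CLI > env var > search."""
    env = env if env is not None else os.environ
    path = cli_path or env.get("AZUREML_CONFIG_FILE")
    if path is None:
        # Default search locations, e.g. AML_APP_ROOT and the scoring-script
        # directory, checked in order.
        for directory in search_dirs:
            candidate = os.path.join(directory, "config.json")
            if os.path.exists(candidate):
                path = candidate
                break
    if path is None:
        return {}
    with open(path) as f:
        return json.load(f)
```

Note this sketch only covers file resolution; merging defaults for missing keys (per the table below) would layer on top.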
The config file supports only the following keys:
| Key | Required | Default Value |
|---|---|---|
| AML_APP_ROOT | No | "/var/azureml-app" |
| AZUREML_SOURCE_DIRECTORY | No | |
| AZUREML_ENTRY_SCRIPT | Yes | |
| SERVICE_NAME | No | "ML service" |
| WORKSPACE_NAME | No | "" |
| SERVICE_PATH_PREFIX | No | "" |
| SERVICE_VERSION | No | "1.0" |
| SCORING_TIMEOUT_MS | No | 3600 * 1000 |
| AZUREML_LOG_LEVEL | No | "INFO" |
| AML_APP_INSIGHTS_ENABLED | No | False |
| AML_APP_INSIGHTS_KEY | No | None |
| AML_MODEL_DC_STORAGE_ENABLED | No | False |
| APP_INSIGHTS_LOG_RESPONSE_ENABLED | No | "True" |
| AML_CORS_ORIGINS | No | None |
| AZUREML_MODEL_DIR | No | False |
| HOSTNAME | No | "Unknown" |
| AZUREML_DEBUG_PORT | No | None |
The code for the config can be found here: config.py.
Sample config.json:
{
    "AZUREML_ENTRY_SCRIPT": "/mnt/d/tests/manual/default_score.py",
    "AML_CORS_ORIGINS": "www.microsoft.com",
    "SCORING_TIMEOUT_MS": 6000,
    "AML_APP_INSIGHTS_ENABLED": true
}