Troubleshooting with a local model deployment
Try a local model deployment as a first step in troubleshooting deployment to Azure Container Instances (ACI) or Azure Kubernetes Service (AKS). Using a local web service makes it easier to spot and fix common Azure Machine Learning Docker web service deployment errors.
Prerequisites
- An Azure subscription. Try the trial subscription.
- Option A (Recommended) - Debug locally on Azure Machine Learning Compute Instance
- An Azure Machine Learning workspace with a compute instance running
- Option B - Debug locally on your compute
- The Azure Machine Learning SDK.
- The Azure CLI.
- The CLI extension for Azure Machine Learning.
- Have a working Docker installation on your local system.
- To verify your Docker installation, run docker run hello-world from a terminal or command prompt. For information on installing Docker, or troubleshooting Docker errors, see the Docker Documentation.
- Option C - Enable local debugging with Azure Machine Learning inference HTTP server.
- The Azure Machine Learning inference HTTP server (preview) is a Python package that lets you easily validate your entry script (score.py) in a local development environment. If there's a problem with the scoring script, the server returns an error along with the location where the error occurred.
- The server can also be used when creating validation gates in a continuous integration and deployment pipeline. For example, start the server with the candidate script and run the test suite against the local endpoint.
Azure Machine Learning inference HTTP server
The local inference server allows you to quickly debug your entry script (score.py). If the underlying scoring script has a bug, the server will fail to initialize or serve the model. Instead, it will return an exception along with the location where the issue occurred. Learn more about the Azure Machine Learning inference HTTP server.
Install the azureml-inference-server-http package from the pypi feed:
python -m pip install azureml-inference-server-http
Start the server and set score.py as the entry script:
azmlinfsrv --entry_script score.py
Send a scoring request to the server using curl:
curl -p 127.0.0.1:5001/score
Note
See the frequently asked questions about the Azure Machine Learning inference HTTP server.
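As noted in the prerequisites, the server can act as a validation gate in a continuous integration pipeline. The following is a minimal sketch using the requests package, assuming the server is already running locally (for example, started with azmlinfsrv --entry_script score.py on its default port 5001); the payload shape is hypothetical and depends on what your score.py expects:
import json
import requests

def test_local_scoring_endpoint():
    # Hypothetical payload; adjust to the schema your score.py expects
    payload = json.dumps({"data": [[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]]})
    response = requests.post(
        "http://127.0.0.1:5001/score",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    assert response.status_code == 200
Run the test with a framework such as pytest; a non-zero exit code can then fail the pipeline stage.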
Debug locally
You can find a sample local deployment notebook in the MachineLearningNotebooks repo to explore a runnable example.
Warning
Local web service deployments are not supported for production scenarios.
To deploy locally, modify your code to use LocalWebservice.deploy_configuration() to create a deployment configuration. Then use Model.deploy() to deploy the service. The following example deploys a model (contained in the model variable) as a local web service:
APPLIES TO: Python SDK azureml v1
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import LocalWebservice
# Create inference configuration based on the environment definition and the entry script
myenv = Environment.from_conda_specification(name="env", file_path="myenv.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=myenv)
# Create a local deployment, using port 8890 for the web service endpoint
deployment_config = LocalWebservice.deploy_configuration(port=8890)
# Deploy the service
service = Model.deploy(
ws, "mymodel", [model], inference_config, deployment_config)
# Wait for the deployment to complete
service.wait_for_deployment(True)
# Display the port that the web service is available on
print(service.port)
If you are defining your own conda specification YAML, list azureml-defaults version >= 1.0.45 as a pip dependency. This package is needed to host the model as a web service.
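If you'd rather define the environment in code than hand-write the YAML, a minimal sketch using the SDK's CondaDependencies class might look like the following (the scikit-learn entry is only a placeholder for your model's actual dependencies):
from azureml.core.environment import Environment
from azureml.core.conda_dependencies import CondaDependencies

# Placeholder package list; azureml-defaults is required to host the model as a web service
myenv = Environment(name="env")
myenv.python.conda_dependencies = CondaDependencies.create(
    pip_packages=["azureml-defaults>=1.0.45", "scikit-learn"]
)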
At this point, you can work with the service as normal. The following code demonstrates sending data to the service:
import json
test_sample = json.dumps({'data': [
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
[10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
]})
test_sample = bytes(test_sample, encoding='utf8')
prediction = service.run(input_data=test_sample)
print(prediction)
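As an alternative to service.run(), you can send the same payload over HTTP to the local endpoint. The sketch below assumes the service exposes the default /score route on the port reported by service.port:
import requests

# Post the JSON payload directly to the local scoring endpoint (assumed /score route)
uri = "http://localhost:{}/score".format(service.port)
headers = {"Content-Type": "application/json"}
response = requests.post(uri, data=test_sample, headers=headers)
print(response.json())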
For more information on customizing your Python environment, see Create and manage environments for training and deployment.
Update the service
During local testing, you may need to update the score.py file to add logging or attempt to resolve any problems that you've discovered. To reload changes to the score.py file, use reload(). For example, the following code reloads the script for the service, and then sends data to it. The data is scored using the updated score.py file:
Important
The reload method is only available for local deployments. For information on updating a deployment to another compute target, see how to update your webservice.
service.reload()
print(service.run(input_data=test_sample))
Note
The script is reloaded from the location specified by the InferenceConfig object used by the service.
To change the model, Conda dependencies, or deployment configuration, use update(). The following example updates the model used by the service:
service.update([different_model], inference_config, deployment_config)
Delete the service
To delete the service, use delete().
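For example:
# Remove the deployed local web service
service.delete()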
Inspect the Docker log
You can print out detailed Docker engine log messages from the service object. You can view the log for ACI, AKS, and local deployments. The following example demonstrates how to print the logs.
# if you already have the service object handy
print(service.get_logs())
# if you only know the name of the service (note there might be multiple services with the same name but different version number)
print(ws.webservices['mysvc'].get_logs())
If you see the line Booting worker with pid: <pid> occurring multiple times in the logs, there isn't enough memory to start the worker. You can address the error by increasing the value of memory_gb in deployment_config.
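For an ACI or AKS deployment, for example, a sketch of a deployment configuration with more memory might look like this (the values are only illustrative):
from azureml.core.webservice import AciWebservice

# Illustrative values; choose cpu_cores and memory_gb to fit your model's needs
deployment_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=4)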
Next steps
Learn more about deployment: