Creating managed online endpoints in Azure ML

Topic: Azure ML

Introduction

Suppose you’ve trained a machine learning model to accomplish some task, and you’d now like to provide that model’s inference capabilities as a service. Maybe you’re writing an application of your own that will rely on this service, or perhaps you want to make the service available to others. This is the purpose of endpoints — they provide a simple web-based API for feeding data to your model and getting back inference results.

Azure ML currently supports three types of endpoints: batch endpoints, Kubernetes online endpoints, and managed online endpoints. I’m going to focus on managed online endpoints in this post, but let me start by explaining how the three types differ.

Diagram showing an overview of the types of endpoints.

Batch endpoints are designed to handle large requests, working asynchronously and generating results that are held in blob storage. Because compute resources are only provisioned when the job starts, the latency of the response is higher than with online endpoints. However, that can result in substantially lower costs. Online endpoints, on the other hand, are designed to quickly process smaller requests and provide near-immediate responses. Compute resources are provisioned at the time of deployment and are always up and running, which, depending on your scenario, may mean higher costs than batch endpoints. However, you get real-time responses, which is critical to many scenarios. If you want to deploy an online endpoint, you have two options: Kubernetes online endpoints allow you to manage your own compute resources using Kubernetes, while managed online endpoints rely on Azure to manage compute resources, OS updates, scaling, and security. For more information about the different endpoint types and which one is right for you, check out the documentation.

In this post, I’ll show you how to work with managed online endpoints. We’ll start by getting familiar with our PyTorch model. We’ll then write a scoring function that loads the model and performs predictions based on user input. After that, we’ll explore several different options for creating managed online endpoints that call our scoring function. And finally, I’ll demonstrate a couple of ways to invoke our endpoints.

The code for this project can be found on GitHub.

Throughout this post, I’ll assume you’re familiar with machine learning concepts like training and prediction, but I won’t assume familiarity with Azure.

Azure ML setup

Here’s how you can set up Azure ML to follow the steps in this post.

  • You need to have an Azure subscription. You can get a free subscription to try it out.
  • Create a resource group.
  • Create a new machine learning workspace by following the “Create the workspace” section of the documentation. Keep in mind that you’ll be creating a “machine learning workspace” Azure resource, not a “workspace” Azure resource, which is entirely different!
  • If you have access to GitHub Codespaces, click on the “Code” button in this GitHub repo, select the “Codespaces” tab, and then click on “New codespace.”
  • Alternatively, if you plan to use your local machine:
    • Install the Azure CLI by following the instructions in the documentation.
    • Install the ML extension to the Azure CLI by following the “Installation” section of the documentation.
  • In a terminal window, log in to Azure by executing az login --use-device-code.
  • Set your default subscription by executing az account set -s "<YOUR_SUBSCRIPTION_NAME_OR_ID>". You can verify your default subscription by executing az account show, or by looking at ~/.azure/azureProfile.json.
  • Set your default resource group and workspace by executing az configure --defaults group="<YOUR_RESOURCE_GROUP>" workspace="<YOUR_WORKSPACE>". You can verify your defaults by executing az configure --list-defaults or by looking at ~/.azure/config.
  • You can now open the Azure Machine Learning studio, where you’ll be able to see and manage all the machine learning resources we’ll be creating.
  • Although not essential to run the code in this post, I highly recommend installing the Azure Machine Learning extension for VS Code.

You’re now ready to start working with Azure ML!

Training and saving the models

To keep this post simple and focused on endpoints, I provide the already-trained model in the GitHub project, under the model directory. This way you can go straight to learning about Azure ML endpoints without having to run any training code.

If you want to re-create the model provided, you first need to create and activate the conda environment. If you’re running this project on Codespaces, there’s nothing to do — the conda environment is created and activated automatically when the container is created. If you’re running the code locally, you’ll need to execute the following commands from the root of the GitHub repo:

conda env create -f environment.yml
conda activate aml-managed-endpoint

You can then run src/train.py, which saves the model using the following code:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/src/train.py
  ...
  torch.save(model.state_dict(), path)
  ...

For a full explanation of the PyTorch training code, check out my PyTorch blog post.

If you’d like to train on Azure, you can look at the documentation on how to do that.

Creating the model on Azure

Before we can deploy our ML model, we need to create an Azure ML resource that brings it to the cloud. There are a few different ways to create this resource — my preferred way is to use YAML files, so this is the method I’ll show in this post.

Let’s start by looking at the YAML file for our model, cloud/model.yml. You can see that this file starts by specifying a schema, which is super helpful because it enables VS Code to make suggestions and highlight any mistakes we make. The attributes in this file make it clear that an Azure ML model consists of a name, a version, and a path to the location where we saved the trained model files locally:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/model.yml
$schema: https://azuremlschemas.azureedge.net/latest/model.schema.json
name: model-managed
version: 1
path: "../model/weights.pth"

How do you select the correct schema when creating a new resource? You can always copy the schemas from my blog or from the documentation, but the easiest way is to use the Azure Machine Learning extension for VS Code. If you have it installed, you can select the Azure icon in VS Code’s left navigation pane, log in to Azure, expand your subscription and ML workspace, select “Models,” and click the “+” button to create a YAML file with the correct model schema and attributes.

Screenshot showing how to create a new model using the Azure Machine Learning extension for VS Code.

Now that you have the YAML file containing the model specifications, you can create the model resource in the cloud. If you have the Azure ML extension installed, you can do so by right clicking anywhere on the open YAML file, and selecting “Azure ML: Execute YAML.” Alternatively, you can run the following CLI command in the terminal:

az ml model create -f cloud/model.yml

If you go to the Azure ML studio and use the left navigation to open the “Models” page, you’ll see your newly created model listed there.
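
If you’d rather stay in Python, the Azure ML Python SDK v2 (the azure-ai-ml package) can register the same model. Here’s a rough sketch of what that looks like; the subscription, resource group, and workspace values are placeholders you’d replace with your own:

# Sketch: registering the model with the Azure ML Python SDK v2 instead of
# the CLI. The subscription, resource group, and workspace are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import Model
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>",
                     "<RESOURCE_GROUP>", "<WORKSPACE>")

model = Model(
    name="model-managed",
    version="1",
    path="model/weights.pth",  # local path to the trained weights
)
ml_client.models.create_or_update(model)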

In order to deploy our model as an Azure ML endpoint, we’ll use deployment and endpoint YAML files to specify the details of the configuration. I’ll show bits and pieces of these YAML files throughout the rest of this post as I present each setting. We’ll create four endpoints with different configurations to help you understand the range of alternatives available to you. If you look at the deployment YAML file for endpoint 1, for example, you’ll notice that it refers to the model resource we just created:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/deployment.yml
...
model: azureml:model-managed@latest
...

In this case, we want the deployment to always pick up the latest version of the model, so we add the @latest tag at the end of the model reference. If you want a specific version of the model, you can use the following syntax instead:

...
model: azureml:model-managed:1
...

Creating the scoring files

When invoked, an endpoint calls a scoring file, which we need to provide. This scoring file follows a prescribed structure: it must contain an init() function that’s called when the endpoint is created or updated, and a run(...) function that’s called every time the endpoint is invoked. Let’s look at these in more detail.

First we’ll take a look at the init() function:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/src/score.py
import json
import logging
import os

import numpy as np
import torch
from torch import Tensor, nn

from neural_network import NeuralNetwork

model = None
device = None

def init():
    logging.info('Init started')

    global model
    global device

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    logging.info('Device: %s', device)

    model_path = os.path.join(os.getenv('AZUREML_MODEL_DIR'), 'weights.pth')

    model = NeuralNetwork().to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()

    logging.info('Init completed')

In our simple scenario, the init() function’s main task is to load the model. Because we saved just the weights, we need to instantiate a new version of the NeuralNetwork class before we can load the saved weights into it. Notice the use of the AZUREML_MODEL_DIR environment variable, which gives us the path to the model root folder on Azure. Notice also that since we’re using PyTorch, we need to ensure that both the loaded weights and the neural network we instantiate are on the same device (GPU or CPU).
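
The NeuralNetwork class itself comes from the repo (score.py imports it from neural_network.py). Its exact architecture doesn’t matter for deployment purposes, but to make the scoring code easier to picture, here’s a minimal sketch of what such a Fashion MNIST classifier might look like. This is illustrative only, not necessarily the architecture used in the repo:

# Illustrative sketch of a Fashion MNIST classifier like the one score.py
# instantiates. The actual class lives in the repo's neural_network.py and
# may differ from this sketch.
import torch
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.flatten = nn.Flatten()
        self.layers = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),  # 10 Fashion MNIST classes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch_size, 1, 28, 28); the output is raw logits.
        return self.layers(self.flatten(x))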

I find it useful to add logging.info(...) calls at the beginning and end of the function to make sure that it’s being called as expected. When we cover invoking the endpoint, I’ll show you where to look for the logs. I also like to add a logging.info(...) call that tells me whether the code is running on GPU or CPU, as a sanity check.

Now let’s look at the run(...) function:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/src/score.py
labels_map = {
    0: 'T-Shirt',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle Boot',
}

def predict(trained_model: nn.Module, x: Tensor) -> torch.Tensor:
    with torch.no_grad():
        y_prime = trained_model(x)
        probabilities = nn.functional.softmax(y_prime, dim=1)
        predicted_indices = probabilities.argmax(1)
    return predicted_indices
  
def run(raw_data):
    logging.info('Run started')

    x = json.loads(raw_data)['data']
    x = np.array(x).reshape((1, 1, 28, 28))
    x = torch.from_numpy(x).float().to(device)

    predicted_index = predict(model, x).item()
    predicted_name = labels_map[predicted_index]

    logging.info('Predicted name: %s', predicted_name)

    logging.info('Run completed')
    return predicted_name

Notice that run(...) takes a raw_data parameter as input, which contains the data we specify when invoking the endpoint. In our scenario, we’ll be passing in a JSON dictionary with a data key corresponding to a 28 × 28 matrix containing an image with float pixel values between 0.0 and 1.0. Our run(...) function loads the JSON, transforms it into a tensor of the format that our predict(...) function expects, calls the predict(...) function, converts the predicted int into a human-readable name, and returns that name.
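
Before deploying, I find it helpful to smoke-test the scoring script locally. This isn’t required, but here’s a sketch of how you might do it. It assumes you run it from the src directory, with the trained weights in ../model and the sample request (which we’ll generate later in this post) in ../sample-request:

# Sketch: exercising score.py locally before deploying. Assumes it runs from
# the src directory, with weights in ../model/weights.pth and the sample
# request in ../sample-request/sample_request.json.
import os

# Azure ML sets AZUREML_MODEL_DIR for us in the cloud; locally we fake it.
os.environ['AZUREML_MODEL_DIR'] = '../model'

import score

score.init()
with open('../sample-request/sample_request.json', encoding='utf-8') as file:
    raw_data = file.read()
print(score.run(raw_data))  # e.g. "Shirt"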

If you look at the deployment YAML files, you’ll see that they all refer to a Python scoring file:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/deployment.yml
...
code_configuration:
  code: ../../src/
  scoring_script: score.py
...

Creating the environments

An Azure Machine Learning environment specifies the runtime where we can run training and prediction code on Azure, along with any additional configuration. You can learn more about the three different environment options available to you in my blog post about environments on Azure ML. Managed online endpoints support all three environment types, and I’ll demonstrate the first two in this section.

The first endpoint demonstrates the use of a curated environment that assumes the underlying compute contains no GPU:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/deployment.yml
...
environment: azureml:AzureML-pytorch-1.7-ubuntu18.04-py37-cpu-inference@latest
...

And the second endpoint demonstrates the use of a system-managed environment extended by a conda file, which assumes the underlying compute contains a GPU:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-2/deployment.yml
...
environment:
  conda_file: score-conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04:latest
...
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-2/score-conda.yml
name: managed-endpoint-score
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python==3.9.5
  - pytorch==1.8.1
  - pip
  - pip:
    - azureml-defaults

Choosing the compute

Now we’ll choose the machine where we’ll be deploying the environments and inference code for our endpoints. You can learn more about this topic in my blog post about compute on Azure ML.

Endpoint 1 of this project relies on a curated environment that runs on the CPU, so there’s no point in paying for a VM with a GPU. For this endpoint, I chose a “Standard_DS3_v2” VM because a small size is enough for my purposes. Endpoint 2 relies on a base image environment that requires GPU support, so we’ll pair it with a GPU VM — I chose a “Standard_NC6s_v3” VM, which is also small. Our scenario doesn’t require a GPU for scoring, but I decided to show both options here because your scenario might be different.

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/deployment.yml
...
instance_type: Standard_DS3_v2
...
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-2/deployment.yml
...
instance_type: Standard_NC6s_v3
...

Choosing the instance count

As the name implies, the instance_count setting determines how many machines you want running for your deployment. Since this is just a demo, we’ll set it to 1 for all endpoints.

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/deployment.yml
...
instance_count: 1
...

You might want to set it to a higher number in production. You can also configure auto-scaling to adjust the instance count in real time, based on traffic.

Choosing the authentication mode

There are two authentication modes you can choose from: key authentication never expires, while aml_token authentication expires after an hour. The project for this post uses key authentication for all of its endpoints except for endpoint 3, which demonstrates how to use aml_token. The authentication mode can be set in the endpoint YAML in the following way:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/endpoint.yml
...
auth_mode: key
...
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-3/endpoint.yml
...
auth_mode: aml_token
...

The difference between key and aml_token will become clear when we invoke the endpoints.

Notice that this setting affects all deployments in the endpoint, which is why it’s set in the endpoint.yml file rather than in the deployment.yml file. The “Ensuring a safe rollout” section later in this post explains the practical differences between a deployment and an endpoint.

Creating the endpoints

At this point, you’ve learned about every single line of YAML code in all endpoint and deployment specification files of the accompanying project. Let’s look at the complete endpoint and deployment files for our first endpoint:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/endpoint.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: endpoint-managed-1
auth_mode: key
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-1/deployment.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: endpoint-managed-1
model: azureml:model-managed@latest
code_configuration:
  code: ../../src/
  scoring_script: score.py
environment: azureml:AzureML-pytorch-1.7-ubuntu18.04-py37-cpu-inference@latest
instance_type: Standard_DS3_v2
instance_count: 1

The name of an endpoint needs to be unique within a region. You can change the name of your endpoints in the YAML specification files, or you can pass a unique name to the CLI command at creation time, as shown below. You can create endpoints 1, 2 and 3 using the following CLI commands:

az ml online-endpoint create -f cloud/endpoint-X/endpoint.yml --name <ENDPOINTX>
az ml online-deployment create -f cloud/endpoint-X/deployment.yml --all-traffic --endpoint-name <ENDPOINTX>

You can now go to the Azure ML studio, click on “Endpoints,” and in the “Real-time endpoints” page, you’ll see the list of endpoints you created.
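
If you’d rather create these resources from Python than from the CLI, here’s a rough SDK v2 equivalent of the two CLI commands above. It’s a sketch under a few assumptions: the model and environment reference strings mirror the YAML syntax, and the names and paths are placeholders:

# Sketch: creating endpoint 1 and its deployment with the Python SDK v2.
# Names, paths, and the model/environment reference strings are assumptions
# that mirror the YAML shown above.
from azure.ai.ml import MLClient
from azure.ai.ml.entities import (CodeConfiguration, ManagedOnlineDeployment,
                                  ManagedOnlineEndpoint)
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>",
                     "<RESOURCE_GROUP>", "<WORKSPACE>")

endpoint = ManagedOnlineEndpoint(name="endpoint-managed-1", auth_mode="key")
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name="endpoint-managed-1",
    model="azureml:model-managed@latest",
    code_configuration=CodeConfiguration(code="src/", scoring_script="score.py"),
    environment="azureml:AzureML-pytorch-1.7-ubuntu18.04-py37-cpu-inference@latest",
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()

# Equivalent of --all-traffic: route 100% of requests to the blue deployment.
endpoint.traffic = {"blue": 100}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()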

Ensuring a safe rollout

Let’s imagine a scenario where we used a managed online endpoint to deploy our PyTorch model on a machine with a CPU, but our team now decides that we need to use a GPU instead. We change the deployment to use a GPU, and that works fine in our internal testing. But this endpoint is already in use by clients, and we don’t want to disrupt the service. Rolling the new deployment out to all clients at once is a risky move that may reveal issues and cause instability.

That’s where Azure ML’s safe rollout feature comes in. Instead of making an abrupt switch, we can use a “blue-green” deployment approach, where we roll out the new version of the code to a small subset of clients, and tune the size of that subset as we go. After ensuring that the clients calling the new version of the code encounter no issues for a while, we can increase the percentage of clients, until we’ve completed the switch.

Endpoint 4 in the accompanying project will enable this scenario by specifying two deployments:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-4/endpoint.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: endpoint-managed-4
auth_mode: key
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-4/deployment-blue.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: endpoint-managed-4
model: azureml:model-managed@latest
code_configuration:
  code: ../../src/
  scoring_script: score.py
environment: azureml:AzureML-pytorch-1.7-ubuntu18.04-py37-cpu-inference@latest
instance_type: Standard_DS3_v2
instance_count: 1
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/cloud/endpoint-4/deployment-green.yml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: green
endpoint_name: endpoint-managed-4
model: azureml:model-managed@latest
code_configuration:
  code: ../../src/
  scoring_script: score.py
environment:
  conda_file: score-conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.0.3-cudnn8-ubuntu18.04:latest
instance_type: Standard_NC6s_v3
instance_count: 1

You can create the endpoint and deployments for endpoint 4 using CLI commands similar to the ones in the previous section. When you’re ready to adjust their traffic allocation, you can do that with an additional command, as shown below:

az ml online-endpoint create -f cloud/endpoint-4/endpoint.yml --name <ENDPOINT4>
az ml online-deployment create -f cloud/endpoint-4/deployment-blue.yml --all-traffic --endpoint-name <ENDPOINT4>
az ml online-deployment create -f cloud/endpoint-4/deployment-green.yml --endpoint-name <ENDPOINT4>
az ml online-endpoint update --name <ENDPOINT4> --traffic "blue=90 green=10"
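
If you’re managing resources from Python instead, the same traffic split can be expressed with a sketch like the one below; the subscription, workspace, and endpoint names are placeholders:

# Sketch: shifting 10% of endpoint 4's traffic to the green deployment using
# the Python SDK v2. Subscription, resource group, workspace, and endpoint
# names are placeholders.
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential

ml_client = MLClient(DefaultAzureCredential(), "<SUBSCRIPTION_ID>",
                     "<RESOURCE_GROUP>", "<WORKSPACE>")

endpoint = ml_client.online_endpoints.get("endpoint-managed-4")
endpoint.traffic = {"blue": 90, "green": 10}
ml_client.online_endpoints.begin_create_or_update(endpoint).result()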

For more information about safe rollout, check out the documentation.

Creating the sample request

Before we can invoke the endpoints, we need to create a file containing input data for our prediction code. Recall that in our scenario, the run(...) function takes in the JSON representation of a single image encoded as a 28 × 28 matrix, and returns the class that the image belongs to as a string, such as “Shirt.”

We can easily get an image file from our dataset for testing, but we still need to convert it into JSON. You can find code to create a JSON sample request in src/create_sample_request.py. This code loads Fashion MNIST data, gets an image from the dataset, creates a matrix of shape 28 × 28 containing the image’s pixel values, and adds it to a JSON dictionary with key data.

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/src/create_sample_request.py
import json
import os
from pathlib import Path

from train import _get_data

DATA_PATH = 'aml-managed-endpoint/data'
SAMPLE_REQUEST = 'aml-managed-endpoint/sample-request'


def create_sample_request() -> None:
    """Creates a sample request to be used in prediction."""
    batch_size = 64
    (_, test_dataloader) = _get_data(batch_size)

    (x_batch, _) = next(iter(test_dataloader))
    x = x_batch[0, 0, :, :].cpu().numpy().tolist()

    os.makedirs(name=SAMPLE_REQUEST, exist_ok=True)
    with open(Path(SAMPLE_REQUEST, 'sample_request.json'),
              'w',
              encoding='utf-8') as file:
        json.dump({'data': x}, file)


def main() -> None:
    create_sample_request()


if __name__ == '__main__':
    main()

Here’s a bit of the generated sample_request.json file:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/sample-request/sample_request.json
{"data": [[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.42352941632270813, 0.43921568989753723, 0.46666666865348816, 0.3921568691730499, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.16
...

I’ve checked in the sample request JSON, so you only need to run this code if you want to re-generate it.

Invoking the endpoints using the CLI

We’re ready to invoke the endpoints!

Let’s first invoke them using the CLI. The only two pieces of information we need to pass to the invocation are the name of the endpoint and the request file, as you can see below. (Replace <ENDPOINTX> with the name of the endpoint you’d like to invoke.)

az ml online-endpoint invoke --request-file sample-request/sample_request.json -n <ENDPOINTX> 
"\"Shirt\""

Let’s take a look at the logs for this endpoint, by going to the Azure ML studio, clicking on the endpoint name, and then “Deployment logs.”

Screenshot of the logs for an endpoint invocation.

If you scroll down a bit, you’ll find the logging we added to the init() function of the scoring file. I invoked the endpoint twice, so I can also see the logging of the run(...) function printed twice.

Screenshot of the logs for an endpoint invocation showing our custom logging.

Invoking the endpoints using REST

We can also invoke the endpoint directly over REST (representational state transfer). Let’s now come back to the two different authentication modes, key and aml_token, and see how we can invoke endpoints created with each of these alternatives.

Let’s first consider the key authentication mode, which we used for endpoint 1. To find the REST scoring URI for this endpoint and its authentication key, we go to the Azure ML studio, select “Endpoints,” click on the name of the endpoint, and then select the “Consume” tab.

Screenshot showing the REST scoring URI and key for the endpoint created using key authentication.

The bearer token used in the request can be found in the same panel, under “Authentication.” In key authentication mode, our key never expires, so we don’t need to worry about refreshing it. We can execute the following curl command to do a POST that invokes the endpoint:

curl --location \
     --request POST https://<ENDPOINT1>.westus2.inference.ml.azure.com/score \
     --header "Authorization: Bearer NXdYObRnl2KhCE7ldFzgIUAevDupm6ZB" \
     --header "Content-Type: application/json" \
     --data @sample-request/sample_request.json
"Shirt"%

Make sure you replace <ENDPOINT1> with the name of your endpoint.

Similar to the CLI invocation, we get a string back, such as “Shirt.”
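
If you’d rather invoke the endpoint from Python than from curl, here’s a small sketch using the requests library. The scoring URI and key are placeholders that you’d copy from the “Consume” tab:

# Sketch: invoking a key-authenticated endpoint over REST from Python.
# The scoring URI and key are placeholders; copy yours from the studio's
# "Consume" tab, or retrieve them with the az ml online-endpoint commands
# used in the scripts later in this post.
import json

import requests

SCORING_URI = 'https://<ENDPOINT1>.westus2.inference.ml.azure.com/score'
KEY = '<YOUR_PRIMARY_KEY>'

with open('sample-request/sample_request.json', encoding='utf-8') as file:
    payload = json.load(file)

response = requests.post(SCORING_URI,
                         json=payload,
                         headers={'Authorization': f'Bearer {KEY}'},
                         timeout=30)
response.raise_for_status()
print(response.json())  # e.g. "Shirt"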

Now let’s consider endpoint 3, which was created using aml_token authentication mode.

Screenshot showing the REST scoring URI for the endpoint created using aml_token authentication.

As you can see, just like in the previous endpoint, the Azure ML studio gives us a REST scoring URI. And even though it doesn’t give us a token, it tells us what we need to do to get one. Let’s follow the instructions and execute the following command:

az ml online-endpoint get-credentials --name <ENDPOINT3>

You’ll get a JSON dictionary with key accessToken and a long string value, which we’ll abbreviate as <TOKEN>. We can now use it to invoke the endpoint:

curl --location --request POST https://<ENDPOINT3>.westus2.inference.ml.azure.com/score \
     --header "Authorization: Bearer <TOKEN>" \
     --header "Content-Type: application/json" \
     --data @sample-request/sample_request.json
"Shirt"%

Tokens expire after one hour, and you can refresh them by executing the same get-credentials call I show above.

The GitHub project for this post contains shell scripts that you can use to invoke these two endpoints. Feel free to reuse them in your project; just make sure to change the endpoint name and the location of the request file. Here are the contents of these files:

https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/rest/invoke_key.sh
ENDPOINT_NAME=endpoint-managed-1

SCORING_URI=$(az ml online-endpoint show --name $ENDPOINT_NAME --query scoring_uri -o tsv)
echo "SCORING_URI: $SCORING_URI"

PRIMARY_KEY=$(az ml online-endpoint get-credentials --name $ENDPOINT_NAME --query primaryKey -o tsv)
echo "PRIMARY_KEY: $PRIMARY_KEY"

OUTPUT=$(curl --location \
     --request POST $SCORING_URI \
     --header "Authorization: Bearer $PRIMARY_KEY" \
     --header "Content-Type: application/json" \
     --data @sample-request/sample_request.json)
echo "OUTPUT: $OUTPUT"
https://github.com/bstollnitz/aml-managed-endpoint/blob/master/aml-managed-endpoint/rest/invoke_aml_token.sh
ENDPOINT_NAME=endpoint-managed-3

SCORING_URI=$(az ml online-endpoint show --name $ENDPOINT_NAME --query scoring_uri -o tsv)
echo "SCORING_URI: $SCORING_URI"

ACCESS_TOKEN=$(az ml online-endpoint get-credentials --name $ENDPOINT_NAME --query accessToken -o tsv)
echo "PRIMARY_KEY: $ACCESS_TOKEN"

OUTPUT=$(curl --location \
     --request POST $SCORING_URI \
     --header "Authorization: Bearer $ACCESS_TOKEN" \
     --header "Content-Type: application/json" \
     --data @sample-request/sample_request.json)
echo "OUTPUT: $OUTPUT"

You can now invoke the endpoints by simply running these scripts:

rest/invoke_key.sh
rest/invoke_aml_token.sh

Conclusion

In this post, you’ve seen how to create and invoke managed online endpoints using Azure ML. There are many ways to create Azure ML resources; here I showed how to use YAML files to specify the details for each resource, and how to use VS Code or the CLI to create them in the cloud. I then presented the main concepts you need to know to make the right choices when creating these YAML files. And finally, I showed different ways to invoke an endpoint. I hope that you learned something new, and that you’ll try these features on your own!

The complete code for this post can be found on GitHub.

Thank you to Sethu Raman from the Azure ML team at Microsoft for reviewing the content in this post.