Bea Stollnitz - Mixing methods for creating Azure ML resources

Introduction

Azure ML supports four methods for creating resources: the Azure ML SDK, the Azure ML CLI, the Azure ML Studio, and the REST APIs. Most of the time, data scientists and ML engineers will work only with the first three. If you need a refresher on these methods, I recommend reading my introductory post.

My post that covers basic training and deployment in Azure ML demonstrates how to accomplish a single basic scenario using each of those methods. But it turns out that you don’t need to pick just one method for a particular project — you can mix all three methods! That’s what we’re going to cover in this post.

The scenario we’ll use here is the same one that I explored in my post about pipelines and components — if you’re not familiar with these topics you might want to read that post first.

You can find the code that goes along with this post on GitHub. The README file for the project contains details about Azure and project setup. You’ll also find there a reference list of all the commands explained in this post.

Training and inference on your development machine

As usual, we’ll start by training and testing the model on our development machine. You can run the training code by clicking “Run and Debug” in VS Code’s left navigation, selecting the “Train locally” run configuration, and pressing F5. Then repeat for the “Test locally” run configuration. You can analyze the logged metrics by running the following command:

mlflow ui

The test step should give you an accuracy of about 84 to 85%. Once we’re satisfied with the test accuracy, we can move on to deploying this code in the cloud.

Creating the Azure ML resources

Every Azure ML project involves creating a set of cloud resources. My post about pipelines and components explains the scenario in this article in detail, and shows that we need to create the following cloud resources:

A compute cluster
A dataset
Two components
A pipeline
A model
An endpoint
A deployment

In that post, I explain in detail how to create all of these resources using the CLI, and I also link to a project where I create all of these resources using the SDK. Once you know those two methods, using the studio should be straightforward. In this project, I will create each of these resources using a different method, to demonstrate that you can mix resource creation methods in a single project. Here’s a diagram highlighting the methods I chose when creating these resources:

Diagram showing my choice of creation method for each resource.

Anything you can do with one creation method can also be accomplished using the other methods. Therefore, you can choose any possible combination of creation methods for the resources in your project!

Compute cluster

We’ll create our compute cluster using the Azure ML Studio. The main disadvantage of using the studio is that any operation we perform there is not easily repeatable. However, since I can reuse this compute cluster for all my future projects, I’m not concerned about repeatability. Also, creating a compute cluster involves many non-trivial decisions, and the user interface of the studio is particularly helpful in guiding us through those decisions.

To create a compute cluster, click on “Compute” in the left navigation, then on “Compute clusters” in the top menu, and then on “New.”

Screenshot of Azure ML Studio UI for creating a compute cluster

A new window opens that guides you through the choices you need to make to select your compute. What I like the most about this page is that it tells me exactly how much available quota I have in my subscription for each type of VM, and it tells me the cost of the VM per hour.

Screenshot of Azure ML Studio UI for creating a compute cluster

You can choose any machine you’d like from this list. I selected “Dedicated,” “CPU,” “Select from all options,” and then I picked “Standard_DS4_v2.”

After pressing “Next,” a new window opens that allows you to choose a compute name, the minimum and maximum number of nodes, and a few other settings.

Screenshot of Azure ML Studio UI for creating a compute cluster

Again, you can give the cluster any unique name you’d like, and choose however many machines your budget allows. For this project, I chose to name the cluster “cluster-cpu” and configured it to scale between 0 and 4 nodes.

You can confirm that your cluster was created by clicking on “Compute” in the left navigation, “Compute clusters,” and then looking for the name of your cluster on that list.
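For comparison, here's a minimal sketch of how the same cluster could be created with the SDK instead of the Studio. This isn't part of the project for this post: it assumes the azure-ai-ml package is installed, and the helper names cluster_settings and create_cluster are hypothetical.

```python
"""Sketch: creating the same compute cluster with the Azure ML SDK v2."""

from typing import Any, Dict


def cluster_settings() -> Dict[str, Any]:
    """Returns the settings chosen in the Studio walkthrough above."""
    return {
        "name": "cluster-cpu",
        "size": "Standard_DS4_v2",
        "min_instances": 0,
        "max_instances": 4,
    }


def create_cluster(ml_client: Any) -> Any:
    """Creates (or updates) the cluster and waits for the operation to end.

    `ml_client` is expected to be an azure.ai.ml.MLClient.
    """
    # Imported here so the settings can be inspected on a machine that
    # doesn't have the Azure ML SDK installed.
    from azure.ai.ml.entities import AmlCompute

    cluster = AmlCompute(**cluster_settings())
    return ml_client.compute.begin_create_or_update(cluster).result()
```

You would call create_cluster with the same kind of MLClient that pipeline_job.py builds later in this post. Unlike the Studio, this route doesn't show you the available quota or the hourly cost of each VM.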

Data

I also decided to create the dataset using the Azure ML Studio. Creating a dataset using the studio UI is straightforward, so I don’t anticipate having to re-create the resource multiple times before I get it right. Also, I intend to reuse this data for multiple projects.

I clicked on “Data” from the left navigation, then on “Create,” and then on “From local files.”

Screenshot of Azure ML Studio UI for creating data.

Then I gave it the name “data-fashion-mnist,” changed the “Dataset type” to “File,” clicked “Next,” then “Browse,” “Browse folder,” and then I selected the “data” folder that was generated in the project when I ran the “train.py” file on my dev machine. Then I clicked “Upload,” “Next,” and “Create.”

You can verify that your data resource was created by going to “Data” in the left navigation and looking for the name you gave your data on that list. If you click on that name and then “Explore,” you’ll see all the files uploaded to the cloud listed there. This is what this list looks like for Fashion MNIST:

Screenshot of Azure ML Studio UI that shows Fashion MNIST.
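If you'd rather have a repeatable way of registering the data, the SDK can do the same job. The sketch below is an illustration only, not code from the project. It assumes that the "data" folder sits in the current directory, and that a "uri_folder" data asset is an acceptable stand-in for the "File" type I chose in the Studio. The helper names are hypothetical.

```python
"""Sketch: registering the Fashion MNIST folder with the Azure ML SDK v2."""

from typing import Any, Dict


def data_settings(local_folder: str = "data") -> Dict[str, str]:
    """Returns the name and version used elsewhere in this post."""
    return {
        "name": "data-fashion-mnist",
        "version": "1",
        "path": local_folder,
        # Assumption: a folder asset, matching the "uri_folder" input type
        # that the pipeline uses.
        "type": "uri_folder",
    }


def register_data(ml_client: Any, local_folder: str = "data") -> Any:
    """Uploads the local folder and registers it as a data asset.

    `ml_client` is expected to be an azure.ai.ml.MLClient.
    """
    # Imported here so the settings can be inspected without the SDK.
    from azure.ai.ml.entities import Data

    data = Data(**data_settings(local_folder))
    return ml_client.data.create_or_update(data)
```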

Components and pipeline

I decided to create the components and pipeline using the SDK. The pipeline in this post only requires two components, so it could also be easily created using the CLI or the studio.

Diagram of pipeline.

However, as I add complexity to my project, it may become cumbersome to keep extending the YAML definition of the pipeline, or to keep repeating manual steps in the UI. Using the SDK is probably the best option to define large and complex pipelines.

Here’s the full code used to create the components and pipeline, which you can find in pipeline_job.py:

"""Creates and runs an Azure ML pipeline."""

import logging
from pathlib import Path
from typing import Dict

from azure.ai.ml import MLClient, Input, load_component
from azure.identity import DefaultAzureCredential
from azure.ai.ml.dsl import pipeline

COMPUTE_NAME = "cluster-cpu"
DATA_NAME = "data-fashion-mnist"
DATA_VERSION = "1"
EXPERIMENT_NAME = "aml_pipeline_mixed"
TRAIN_PATH = Path(Path(__file__).parent, "train.yml")
TEST_PATH = Path(Path(__file__).parent, "test.yml")


def main() -> None:
    logging.basicConfig(level=logging.INFO)
    credential = DefaultAzureCredential()
    ml_client = MLClient.from_config(credential=credential)

    # Making sure the compute exists on Azure ML. If it doesn't, we get an error
    # here.
    ml_client.compute.get(name=COMPUTE_NAME)

    # Getting the data set, which should already be created on Azure ML.
    data = ml_client.data.get(name=DATA_NAME, version=DATA_VERSION)

    # We'll use the components directly, without registering them first.
    train_component = load_component(source=TRAIN_PATH)
    test_component = load_component(source=TEST_PATH)

    # Create and submit pipeline.
    @pipeline(default_compute=COMPUTE_NAME,
              experiment_name=EXPERIMENT_NAME,
              display_name="train_test_fashion_mnist")
    def pipeline_func(data_dir: Input) -> Dict:
        train_job = train_component(data_dir=data_dir)
        # Ignoring pylint because "test_job" shows up in the Studio UI.
        test_job = test_component(  # pylint: disable=unused-variable
            data_dir=data_dir,
            model_dir=train_job.outputs.model_dir)

        return {
            "model_dir": train_job.outputs.model_dir,
        }

    pipeline_job = pipeline_func(
        data_dir=Input(type="uri_folder", path=data.id))

    pipeline_job = ml_client.jobs.create_or_update(pipeline_job)
    ml_client.jobs.stream(pipeline_job.name)


if __name__ == "__main__":
    main()

This code assumes that the compute and data are already registered in Azure ML, which we did using the Studio. It also assumes that YAML files already exist for each component, so we just need to load them. We could register the components with Azure ML before using them in the pipeline, or use them directly, which is what this code does. The pipeline has one input, which is the folder where the data is located, and returns one output, which is the folder where it saves the trained model.

The GitHub project for my earlier blog post on pipelines shows how to use the SDK to create components without an external YAML file, and it registers the components with Azure ML before consuming them in the pipeline. Check it out to see a different way of working with components.

Press F5 to execute the pipeline_job.py file. It will stream information about the pipeline execution for a while, and stop streaming when the pipeline execution is done. While it runs, you can also go to the Studio, click on “Jobs,” and then click on the experiment named “aml_pipeline_mixed” to follow the job execution. Its status will be set to “Completed” when done.

Model

When the pipeline has completed, it will print its run ID. For example:

Execution Summary
=================
RunId: upbeat_door_hggh1wmbq0

This is an important piece of information, because it will allow you to refer to the trained model produced as output of this job. We’ll use the CLI to register this model with Azure ML, by executing the following commands:

run_id=upbeat_door_hggh1wmbq0
az ml model create --name model-pipeline-mixed --version 1 --path "azureml://jobs/$run_id/outputs/model_dir" --type mlflow_model

We specify the name and version of the model we’re creating, and the path to the trained model. Notice the special syntax used to refer to that model, which says that we want the “model_dir” output of the job execution with the run ID we just obtained. Notice also that we explicitly state that the model we’re registering was saved using MLflow — you can look at train.py to see the code where I save it.
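To make the path syntax concrete, here's a small sketch that builds the same URI in Python and shows roughly how the registration might look with the SDK. The helper names are hypothetical. I'm also assuming that the SDK accepts the same URI format as the CLI command above, so treat register_model as a starting point rather than tested project code.

```python
"""Sketch: building the job output URI and registering the model."""

from typing import Any


def job_output_uri(run_id: str, output_name: str) -> str:
    """Builds the URI that points at a named output of a job."""
    return f"azureml://jobs/{run_id}/outputs/{output_name}"


def register_model(ml_client: Any, run_id: str) -> Any:
    """Registers the "model_dir" output of the job as an MLflow model.

    `ml_client` is expected to be an azure.ai.ml.MLClient.
    """
    # Imported here so the URI helper works without the SDK installed.
    from azure.ai.ml.entities import Model

    model = Model(
        name="model-pipeline-mixed",
        version="1",
        # Assumption: same URI format as the --path argument of the CLI.
        path=job_output_uri(run_id, "model_dir"),
        type="mlflow_model",
    )
    return ml_client.models.create_or_update(model)
```

In pipeline_job.py, the run ID is also available as pipeline_job.name once the job has been submitted, so you wouldn't need to copy it from the console output.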

For this scenario, we don’t need to download the model to our dev machine because we can deploy it directly from the cloud. But you can download the trained model if you want, using the following command:

az ml job download --name $run_id --output-name "model_dir"
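The SDK offers a similar download operation. Here's a minimal sketch, with a hypothetical helper name and with the current directory as an assumed download location:

```python
"""Sketch: downloading a named job output with the Azure ML SDK v2."""

from typing import Any


def download_model(ml_client: Any,
                   run_id: str,
                   output_name: str = "model_dir",
                   download_path: str = ".") -> None:
    """Downloads one named output of the job to the local machine.

    `ml_client` is expected to be an azure.ai.ml.MLClient, and `run_id` is
    the ID printed at the end of the pipeline run.
    """
    ml_client.jobs.download(name=run_id,
                            output_name=output_name,
                            download_path=download_path)
```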

As usual, you can go to the Studio, click on “Models,” and verify that a model named “model-pipeline-mixed” is listed.

Endpoint and deployment

We’ll also use the CLI to create the endpoint and deployment. The CLI is my preferred method for resource creation because it naturally separates the machine learning logic (stored in Python files) from the cloud logic (stored in YAML files, or executed in CLI commands). It also makes it really easy to repeat commands, just like the SDK.

You could certainly create the endpoint and deployment by running a command with several inline properties, like we did for the model. But this time we’ll set those properties in a YAML file. Here are the contents for endpoint.yml, the YAML file for the endpoint:

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: endpoint-pipeline-mixed
auth_mode: key

And here’s deployment.yml, the YAML file for the deployment:

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: blue
endpoint_name: endpoint-pipeline-mixed
model: azureml:model-pipeline-mixed@latest
instance_type: Standard_DS4_v2
instance_count: 1

To create the endpoint and deployment, execute the following commands:

az ml online-endpoint create -f cloud/endpoint.yml
az ml online-deployment create -f cloud/deployment.yml --all-traffic
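Since this post is about mixing methods, here's a rough sketch of what the same two steps might look like with the SDK. It mirrors the two YAML files above, but it isn't code from the project: the helper names are hypothetical, and I'm assuming that version "1" of the model is the one to deploy, where the YAML file uses the "@latest" label.

```python
"""Sketch: creating the endpoint and deployment with the Azure ML SDK v2."""

from typing import Any, Dict

ENDPOINT_NAME = "endpoint-pipeline-mixed"
DEPLOYMENT_NAME = "blue"


def endpoint_settings() -> Dict[str, Any]:
    """Mirrors endpoint.yml."""
    return {"name": ENDPOINT_NAME, "auth_mode": "key"}


def deployment_settings() -> Dict[str, Any]:
    """Mirrors deployment.yml, except for the model, resolved separately."""
    return {
        "name": DEPLOYMENT_NAME,
        "endpoint_name": ENDPOINT_NAME,
        "instance_type": "Standard_DS4_v2",
        "instance_count": 1,
    }


def create_endpoint_and_deployment(ml_client: Any) -> None:
    """Creates the endpoint, then the deployment, then routes all traffic.

    `ml_client` is expected to be an azure.ai.ml.MLClient.
    """
    # Imported here so the settings can be inspected without the SDK.
    from azure.ai.ml.entities import (ManagedOnlineDeployment,
                                      ManagedOnlineEndpoint)

    endpoint = ManagedOnlineEndpoint(**endpoint_settings())
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    # Assumption: version "1" is the model registered earlier in this post.
    model = ml_client.models.get(name="model-pipeline-mixed", version="1")
    deployment = ManagedOnlineDeployment(model=model, **deployment_settings())
    ml_client.online_deployments.begin_create_or_update(deployment).result()

    # Rough equivalent of the --all-traffic flag in the CLI command.
    endpoint.traffic = {DEPLOYMENT_NAME: 100}
    ml_client.online_endpoints.begin_create_or_update(endpoint).result()
```

Notice that the cloud settings now live in Python code rather than in YAML files, which is the separation I mentioned liking about the CLI.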

You can verify that they were created by clicking on the endpoint in the Studio, and making sure that a deployment named “blue” is mentioned on that page and set to 100%.

You can now invoke the endpoint with the following CLI command:

az ml online-endpoint invoke --name endpoint-pipeline-mixed --request-file test_data/images_azureml.json
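The same invocation can also be made from Python. The following is a minimal sketch with a hypothetical helper name, which assumes you run it from the folder that contains test_data:

```python
"""Sketch: invoking the endpoint with the Azure ML SDK v2."""

from typing import Any

ENDPOINT_NAME = "endpoint-pipeline-mixed"
REQUEST_FILE = "test_data/images_azureml.json"


def invoke_endpoint(ml_client: Any,
                    endpoint_name: str = ENDPOINT_NAME,
                    request_file: str = REQUEST_FILE) -> str:
    """Sends the request file to the endpoint and returns the raw response.

    `ml_client` is expected to be an azure.ai.ml.MLClient.
    """
    return ml_client.online_endpoints.invoke(endpoint_name=endpoint_name,
                                             request_file=request_file)
```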

If you’re interested in learning more about the result you get back, check out my blog post about basic training in the cloud.

Once you’re done with the endpoint, I recommend cleaning it up to avoid getting charged:

az ml online-endpoint delete --name endpoint-pipeline-mixed -y

Conclusion

Hopefully this post made it clear that you can mix and match Azure ML resource creation methods in a single project. Understanding how to use each method and their relative strengths will enable you to be more productive when working with Azure ML.