Topic: Azure ML: from beginner to pro
When training a machine learning model or deploying it to an endpoint, you’ll need to choose an appropriate machine to run it on. I’ll use the term “compute” to refer to the virtual machine (or set of machines) that runs your code in the cloud. The goal of this blog post is to give you an overview of all the compute options available to you in Azure ML, so that you can choose an appropriate option for your scenario. I’ll assume that you’re already familiar with the basic concepts of Azure ML, and that you have some experience using it for your own projects.
Throughout this post, I’ll discuss the following three major compute types available in Azure ML: compute instances, compute clusters, and attached compute.
I’ll also briefly mention the available VM sizes, including how to get more quota for a particular VM size.
Let’s dive right in.
A compute instance is a virtual machine (VM) that is set up for running machine learning code during development (not in production).
Creating a compute instance is easy: just go to the Azure ML Studio, click on “Compute” in the left menu, and then click “New” within the “Compute instances” tab.
You’ll then be asked to give your instance a name and select a machine type and size.
There are many ways to use a compute instance during development, and I’ll cover a few common scenarios in this section.
I like to use VS Code for development. Though I can certainly run VS Code on my local machine during the development phase, I can also run VS Code on an Azure ML compute instance. This is especially useful if I’m using a machine that’s underpowered for ML development. To get started, navigate to the “Compute” section in Azure ML Studio, and click on the link labeled “VS Code” to the right of your compute instance. This will launch VS Code in a window on your local machine, but connected to the files on your remote compute instance.
If you go to the terminal within VS Code, you’ll notice that it opens within the ~/cloudfiles/code directory:
This is the same directory that you’ll see if you go to the “Notebooks” section of the Azure ML Studio. You can work in this directory if you need access to the files you keep there. I typically keep all my Azure ML code organized in GitHub repos though, so I prefer to clone my repos into the ~/localfiles directory instead, because it’s faster than working in ~/cloudfiles. Keep in mind that if you’re cloning your repo into the compute instance for the first time, you’ll need to provide GitHub authentication information, which you can do by configuring SSH or by creating a personal access token.
Once you have your repo cloned into the compute instance, you can open the repo folder in VS Code using the “Open Folder” command in the “File” menu. You can run and debug your code as you would locally. And once you’re ready to create Azure ML resources, you can execute az ml commands as usual.
I’ve described how to use VS Code on a compute instance because that’s my preferred development environment. If you’re more comfortable working in JupyterLab, Jupyter notebooks, or RStudio, you can also launch those applications just as easily.
Another way to make use of a compute instance is by running a notebook right within the Azure ML Studio. You can create a notebook by navigating to the Studio, clicking on “Notebooks” in the left menu, and then clicking on the “+” icon. Then when you’re ready to run your code, you’ll need to select a compute instance in the top taskbar:
The compute instance you created earlier will be displayed in the drop-down, and you’ll be able to run your notebook on that VM.
A third way to use a compute instance is to refer to it directly in the YAML files you pass to Azure CLI commands. For example, when creating a command job, you could refer to your compute instance:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
...
compute: azureml:instance-cpu
I don’t recommend this approach during early development, since it’s faster to iterate using VS Code (either locally or connected to a remote compute instance). And I also don’t recommend this approach for production, since (unlike a compute cluster) a compute instance can’t scale to accommodate increased traffic. You could potentially use this approach somewhere in between early development and production, to test your YAML files more quickly on a compute instance than you could on a compute cluster.
A compute cluster is a set of VMs that can scale up automatically based on traffic. Unlike compute instances, compute clusters are meant to be used in production.
You can create a compute cluster in the Azure ML Studio much as you created a compute instance: select “Compute” in the left menu, then the “Compute clusters” tab, then the “New” button.
You’ll then be asked to give your cluster a name and select a machine tier, type, and size.
Instead of using the Azure ML Studio, you can also use a YAML file and the CLI to create a compute cluster. I prefer this option because I can keep all the YAML files for production-level resources within the relevant project. Here’s a typical YAML file for a compute cluster:
$schema: https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json
name: cluster-cpu
type: amlcompute
size: Standard_DS4_v2
min_instances: 0
max_instances: 4
This YAML file says that I want a cluster of zero to four machines of size “Standard_DS4_v2”; the actual number of machines will depend on the amount of usage. Assuming this YAML is saved in a file named cluster-cpu.yml, I can create the compute cluster with the following command:
az ml compute create -f cluster-cpu.yml
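If you need several clusters that differ only in name, size, or instance limits, you could generate these YAML files with a small script instead of writing each one by hand. Here’s a minimal sketch using only the Python standard library. The function name render_cluster_yaml is my own invention for illustration (it isn’t part of the Azure ML SDK or CLI), and the fields it writes are just the ones shown in the YAML above.

```python
SCHEMA_URL = "https://azuremlschemas.azureedge.net/latest/amlCompute.schema.json"


def render_cluster_yaml(name: str, size: str, min_instances: int, max_instances: int) -> str:
    """Return the text of a compute cluster YAML file like the one shown above.

    Hypothetical helper for illustration; it only formats text and does not
    talk to Azure.
    """
    if min_instances < 0 or max_instances < min_instances:
        raise ValueError("expected 0 <= min_instances <= max_instances")
    lines = [
        f"$schema: {SCHEMA_URL}",
        f"name: {name}",
        "type: amlcompute",
        f"size: {size}",
        f"min_instances: {min_instances}",
        f"max_instances: {max_instances}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write the same file used in this post, ready for `az ml compute create -f`.
    with open("cluster-cpu.yml", "w", encoding="utf-8") as f:
        f.write(render_cluster_yaml("cluster-cpu", "Standard_DS4_v2", 0, 4))
```

The generated file can then be passed to the same az ml compute create command. The bounds check is there to catch an obviously invalid pair of limits before the CLI does.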
Once I’ve created a compute cluster, I can use it for training or for deployment using batch endpoints. This is how I might use it in the YAML file for a training job:
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
...
compute: azureml:cluster-cpu
...
And I can use it similarly in the YAML file for a batch endpoint’s deployment:
$schema: https://azuremlschemas.azureedge.net/latest/batchDeployment.schema.json
...
compute: azureml:cluster-cpu
...
When creating a managed online endpoint, we don’t create a separate compute cluster resource. Instead, we specify the minimum and maximum number of machines right in the YAML for the deployment:
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
...
instance_type: Standard_DS3_v2
instance_count: 1
scale_settings:
  min_instances: 0
  max_instances: 4
So, technically, managed online endpoints don’t use compute clusters. However, behind the scenes, they use Azure ML compute in a very similar way, with similar scaling capabilities.
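One way to think about the min_instances and max_instances settings that appear in both the cluster YAML and the deployment YAML is as a clamp on however many machines the current workload would like to have. The sketch below is a simplified mental model under that assumption, not Azure’s actual scaling algorithm; the function name and the notion of “requested nodes” are hypothetical.

```python
def target_node_count(requested_nodes: int, min_instances: int, max_instances: int) -> int:
    """Clamp the number of nodes the workload asks for to the configured bounds.

    Simplified illustration only: real scaling also involves timing, idle
    periods, and quota, none of which are modeled here.
    """
    if not 0 <= min_instances <= max_instances:
        raise ValueError("expected 0 <= min_instances <= max_instances")
    return max(min_instances, min(requested_nodes, max_instances))


if __name__ == "__main__":
    # With bounds of 0 and 4, as in the YAML files in this post:
    for requested in (0, 2, 7):
        print(requested, "->", target_node_count(requested, 0, 4))
```

With a minimum of zero, an idle workload maps to zero machines, and a burst that asks for seven machines is capped at four.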
The third and last category is called “attached computes,” which refers to any compute target that you manage yourself outside of Azure ML. There are several options available here.
The most commonly used attached compute is Azure Arc-enabled Kubernetes. This is a good option if you want to use Kubernetes to create a cluster of VMs that scale according to traffic. You can let Azure create the Kubernetes cluster, or you can create the cluster yourself from machines that you already own, on-premises. Unlike the compute cluster option, Azure won’t manage the cluster for you — you’re expected to configure scaling based on traffic, perform OS updates, and handle security yourself.
To attach an Azure Arc-enabled Kubernetes cluster, go to the Azure ML Studio, select “Compute” in the left menu, then the “Attached computes” tab. Click on the “New” dropdown and select “Kubernetes.”
At the time of writing, Azure ML only supports this compute option for Kubernetes online endpoints. You can find out more about it in the documentation.
Note that there’s another way of using Kubernetes in Azure ML: Azure Kubernetes Service (AKS). This option (which appears in the Studio under “Inference clusters”) is an older approach, and it supports only a subset of the capabilities offered by Azure Arc-enabled Kubernetes — so I discourage you from adopting AKS at this point.
When creating a compute instance or compute cluster, you’ll need to specify the size of the virtual machine (or machines) you want to use. If you’re using the Azure ML Studio to create these resources, you’ll see a list of supported and recommended machine sizes, information about which sizes fit within your quota, and the price for each size. I showed this UI in screenshots earlier in this post.
If you’re using the SDK or CLI to create a compute instance or compute cluster, you can still rely on Azure ML Studio for your VM size selection. Alternatively, you can consult the compute target documentation if you’re creating a compute target for use in training or batch endpoints, or the managed online endpoint documentation if you’re creating a managed endpoint.
You can check the compute quota you have and how much is in use by going to the Azure Portal, clicking on your subscription, and selecting “Usage + quotas” on the left menu. You should see a table like the following:
As you can see, the table indicates that I have a quota of 50 vCPUs of the NCSv3 family, 24 of which are in use at the moment.
If your subscription does not have enough quota for the virtual machine you’re trying to create, you’ll get a helpful error message, and you’ll need to request a quota increase. From within the Azure ML Studio, click on the question mark on the top right of the page, and click on “New support request.”
On the “Problem description” page, select the “Service and subscription limits (quotas)” issue type, and the “Machine Learning Service: Virtual Machine Quota” quota type.
Then click “Next.” Screen 2 is skipped, and you’re taken right to screen 3, “Additional details.” There, click on “Enter details,” choose the location where you want your compute to run, and select the VM series whose quota you want to increase. For example, if you don’t have enough quota for the “Standard_NC6s_v3” GPU virtual machines, you’ll need to ask for an increase in the quota for the “NCSv3” series.
The support request will ask you how many vCPUs you want access to. The NCSv3 family of machines comes in three flavors: small (Standard_NC6s_v3) which uses 6 vCPUs, medium (Standard_NC12s_v3) which uses 12 vCPUs, and large (Standard_NC24s_v3) which uses 24 vCPUs. So, for example, if you plan to scale up to 3 small NCSv3 machines in your cluster, you’ll need to request 18 vCPUs.
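The arithmetic above is easy to script if you want to check a few cluster configurations before filing a request. Here’s a minimal sketch that uses the three NCSv3 sizes listed in this post, plus the quota of 50 vCPUs with 24 in use from the earlier example. The function names are hypothetical, and the model is deliberately simple: it assumes that every machine in the cluster counts fully against the family quota.

```python
# vCPU counts per VM size, as listed in this post for the NCSv3 family.
NCSV3_VCPUS = {
    "Standard_NC6s_v3": 6,
    "Standard_NC12s_v3": 12,
    "Standard_NC24s_v3": 24,
}


def vcpus_needed(vm_size: str, max_instances: int) -> int:
    """vCPUs consumed when a cluster is scaled out to max_instances machines."""
    if max_instances < 0:
        raise ValueError("max_instances must be non-negative")
    return NCSV3_VCPUS[vm_size] * max_instances


def quota_shortfall(vm_size: str, max_instances: int, quota_limit: int, quota_in_use: int) -> int:
    """Extra vCPUs needed beyond what is currently free (0 if it already fits)."""
    free = quota_limit - quota_in_use
    return max(0, vcpus_needed(vm_size, max_instances) - free)


if __name__ == "__main__":
    print(vcpus_needed("Standard_NC6s_v3", 3))               # 18, as in the text
    print(quota_shortfall("Standard_NC6s_v3", 3, 50, 24))    # 0: fits in the 26 free vCPUs
    print(quota_shortfall("Standard_NC24s_v3", 3, 50, 24))   # 46: 72 needed, 26 free
```

A shortfall of zero means the cluster fits within the quota that’s currently free; anything above zero is the minimum increase you’d need to ask for.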
In this post, I described the main types of compute resources you can use to run your Azure ML code: compute instances (used for development), compute clusters (used for training and batch endpoints in production), and attached compute like Azure Arc-enabled Kubernetes (allowing you to manage the compute resources yourself).
I hope that you learned something in this post, and that this knowledge will be useful in your Azure ML projects.
Thank you to Daniel Schneider from the Azure ML team at Microsoft for sharing his knowledge with me, and inspiring this post.