Topic: Azure ML
When creating Azure ML resources such as jobs and endpoints, you often need to make decisions regarding the environment used to run your code in the cloud. The environment specifies the software runtime and libraries that you want installed and how you want them configured. You’ll need to select an environment when training your model in the cloud (for example, when creating a command job), or when deploying your trained model for consumption (for example, with a managed online endpoint).
This post provides you with a general overview of the range of environment choices, and details for the most common scenarios. I’ll assume that you’re already familiar with the basic concepts of Azure ML, and that you have some experience using Azure ML for your own projects.
Azure ML supports three different types of environments: curated environments, system-managed environments, and custom environments based on your own Docker image.
Regardless of which type you choose, one thing to keep in mind when selecting an environment is that it should match the virtual machine type you choose for your compute. So, for example, if you select a machine without a GPU, it’s pointless to specify an environment that installs CUDA. For more information on how to choose virtual machines, see my post on Choosing the compute for Azure ML resources.
Let’s take a look at each of the environment types.
Curated environments are prebuilt Docker images provided by Microsoft, and they’re the easiest to get started with. In addition to Ubuntu and optional GPU support, they include different versions of TensorFlow and PyTorch, as well as many other popular frameworks and packages. A curated environment is a great choice if its pre-installed packages cover your needs because it will deploy more quickly than the other kinds of environments.
The full list of prebuilt Docker images available for training and inference can be found in the documentation. It is also listed in the Azure ML studio, under “Environments,” and then “Curated environments.”
In addition, if you’re using the Azure ML extension for VS Code, you can find the same list without leaving VS Code. Click on the Azure icon in the left navigation pane, expand your subscription and ML workspace, then expand “Environments” and “Azure ML Curated Environments.”
Once you’ve selected a curated environment that has the packages you need, you can refer to it in your YAML file. For example, if you want to use a curated environment to train a PyTorch model on a compute cluster with GPUs, you might add the following to your command job YAML file:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
...
environment: azureml:AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu@latest
...
```
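For context, here's a minimal sketch of a complete command job spec that uses this environment. The code directory, training script, experiment name, and compute cluster name (`gpu-cluster`) are hypothetical placeholders, so adjust them to match your project:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
# Hypothetical names: adjust the code path, script, and compute target to your project.
code: src
command: python train.py
environment: azureml:AzureML-pytorch-1.10-ubuntu18.04-py38-cuda11-gpu@latest
compute: azureml:gpu-cluster
experiment_name: pytorch-training
```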
Or if you want to use a curated environment to deploy a PyTorch model on a compute instance without a GPU, you might add the following to your deployment YAML file:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
...
environment: azureml:AzureML-pytorch-1.9-ubuntu18.04-py37-cpu-inference@latest
...
```
Notice that I added an `@latest` tag to the end of the environment name. This is super useful: it gives me the latest version of the environment without my having to keep track of version numbers!
If you want to be specific about the curated environment version number, you can specify it using the following syntax:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
...
environment: azureml:AzureML-pytorch-1.9-ubuntu18.04-py37-cpu-inference:3
...
```
Also, notice that the inference curated environments end with the word “inference,” while the training ones do not. That makes it easy to determine which environments are available for each task.
System-managed environments are Docker images provided by Microsoft that contain just the basics: Ubuntu, and optionally CUDA and cuDNN. Keep in mind that these images don’t contain Python or any machine learning frameworks, so when using a system-managed environment, you’ll typically provide an additional conda file. I recommend choosing this method if you need packages that aren’t included in the curated environments. A full list of available base images can be found in this GitHub repo.
As an example, if I want to deploy a managed online endpoint on a machine with a GPU, I might choose to extend the following base image with a conda file:
```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
...
environment:
  conda_file: score-conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu20.04:latest
...
```
Notice that, similarly to curated environments, I can specify that I want the latest available version of this image by adding `:latest` to the end of the `mcr` path. Alternatively, I could have pinned a specific version number instead.
By looking at the name of the image, you can guess that it's based on Ubuntu 20.04, and that it comes with CUDA 11.1 and cuDNN 8 installed. Unlike curated environments, there's no mention of Python or PyTorch, because these are not installed by default. If you need those dependencies to run your code, you can add them to your `score-conda.yml` file, as shown below:
```yaml
channels:
  - pytorch
  - conda-forge
  - defaults
dependencies:
  - python==3.9.5
  - pytorch==1.8.1
  - pip
  - pip:
    - azureml-defaults
```
Notice that in addition to Python and PyTorch, I also add the `azureml-defaults` package, which is needed when the environment is used for deployment. You can customize this conda file with any other packages you like.
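To show how the base image and conda file fit into a full deployment, here's a minimal sketch of a complete managed online deployment spec. The deployment name, endpoint name, model path, scoring script, and VM size are all hypothetical placeholders:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
# Hypothetical names: the endpoint, model path, and scoring script are placeholders.
name: blue
endpoint_name: my-endpoint
model:
  path: model
code_configuration:
  code: src
  scoring_script: score.py
environment:
  conda_file: score-conda.yml
  image: mcr.microsoft.com/azureml/openmpi4.1.0-cuda11.1-cudnn8-ubuntu20.04:latest
instance_type: Standard_NC6s_v3
instance_count: 1
```

Since the base image includes CUDA, it's paired here with a GPU instance type, in keeping with the advice earlier about matching the environment to the compute.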
The third type of environment is based on a Docker image that you create and customize yourself, giving you full control of its contents and configuration. You can take a look at this sample in the documentation for more information.
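As a rough sketch, a custom environment can be registered from your own Docker build context with an environment YAML along these lines. The environment name and directory are hypothetical; the directory is expected to contain your Dockerfile and any files it copies in:

```yaml
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
# Hypothetical: docker-context/ holds your Dockerfile and its supporting files.
name: my-custom-env
build:
  path: docker-context
```

Once registered, you can refer to this environment by name in your job or deployment YAML, just like the other environment types.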
In this post, I presented an overview of the different types of environments supported by Azure ML. I explained the options I prefer and use most often, and provided links to the documentation so you can dig deeper into any of these topics. I hope that you found this information useful.
Thank you to Shivani Sambare from the Azure ML team at Microsoft for reviewing the content in this post.