A component is a self-contained set of code that performs one step in a machine learning pipeline, such as data preprocessing, model training, or model scoring. A component is analogous to a function: it has a name and parameters, expects certain inputs, and returns some value. Any Python script can be wrapped as a component by following the component spec.
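To make the function analogy concrete, here is a minimal sketch of a script shaped the way a pipeline step typically is: named parameters come in on the command line, and the step produces an output. The argument names (`--training-data`, `--model-output`) and the `run_step` helper are hypothetical and chosen for illustration; this is not the gallery's actual script, and the component spec that would wrap it is not shown here.

```python
import argparse


def parse_args(argv=None):
    """Parse the parameters a pipeline step would receive on the command line."""
    parser = argparse.ArgumentParser(description="Hypothetical training step")
    parser.add_argument("--training-data", required=True, help="Path to the input data")
    parser.add_argument("--model-output", required=True, help="Path for the output model")
    parser.add_argument("--learning-rate", type=float, default=0.1)
    parser.add_argument("--max-depth", type=int, default=5)
    return parser.parse_args(argv)


def run_step(args):
    """Stand-in for the real work: returns a summary instead of training a model."""
    return {
        "input": args.training_data,
        "output": args.model_output,
        "learning_rate": args.learning_rate,
        "max_depth": args.max_depth,
    }


# Example invocation with explicit arguments (the paths are placeholders).
example_args = parse_args(["--training-data", "data.csv", "--model-output", "model.json"])
summary = run_step(example_args)
print(summary)
```

Like a function, the step is fully described by its name, its parameters, its inputs, and its outputs, which is the information a component spec would need to capture.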
Azure Machine Learning Gallery contains a rich set of components and pipelines for common machine learning tasks. It can accelerate AI adoption by letting enterprises and individuals build on the best work of the community instead of starting from scratch.
In this tutorial, you will learn how to build a machine learning pipeline with existing components from the gallery in two steps:
- Register the component to your Azure Machine Learning workspace.
- Build a pipeline using the registered component and built-in modules in Azure Machine Learning designer.
! NOTE:
Components are equivalent to modules in the Azure Machine Learning studio UI.
This tutorial will use Automobile Price Prediction as an example. The related components can be found under components/automobile-price-prediction.
To use components from this gallery to build a pipeline, you must first register them in your Azure Machine Learning workspace.
This tutorial explains how to register a component from the gallery, using the sample component XGBRegressorTraining in the folder components/automobile-price-prediction.
- Go to https://ml.azure.com and select your workspace.
! NOTE:
If you have never opened the designer in your workspace, open it before you continue with the following steps. This ensures that the required data types are registered in the workspace so that you can register components successfully.
- Add &flight=cm to the end of your workspace URL to enable the components feature. You will see the Modules tab under the Assets blade in the left navigation bar.
- Click Create -> From YAML file. Choose GitHub repo as the source. Fill in the URL of the XGBRegressorTraining component's YAML spec file (https://github.com/Azure/AzureMachineLearningGallery/blob/main/components/automobile-price-prediction/xgboost-regressor-training/XGBRegressorTraining.spec.yaml).
! NOTE:
If you have created components in your workspace before, click New Module -> From YAML file to create a new component.
- Follow the wizard to finish the creation.
After creation, you will see the component on the component asset page, where you can manage it.
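The step that appends `flight=cm` to the workspace URL can be sketched in Python as follows. The helper name `add_flight_flag` and the example URL are hypothetical; the only detail taken from the tutorial is the `flight=cm` query parameter itself. The sketch handles URLs with or without an existing query string and avoids adding the flag twice.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_flight_flag(url, flag="cm"):
    """Return `url` with a single `flight=<flag>` query parameter appended."""
    parts = urlsplit(url)
    # Keep existing parameters, dropping any earlier `flight` entry.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "flight"
    ]
    query.append(("flight", flag))
    # safe="/" leaves slashes inside parameter values unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


# Placeholder URL; a real workspace URL carries its own parameters.
flagged_url = add_flight_flag("https://ml.azure.com/?wsid=example")
print(flagged_url)
```

In practice you can simply type the parameter into the browser's address bar; the helper only makes explicit that the flag is an ordinary query parameter.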
Azure Machine Learning designer is the UI for building machine learning pipelines. It provides an easy drag-and-drop interface to build, test, and manage your machine learning pipelines.
- Open a new pipeline in the designer. You can find the registered component under the Custom Module category in the designer's module palette.
- Find the Automobile price data (Raw) dataset under Sample datasets in the asset library to the left of the canvas and drag it to the canvas. You can then right-click the dataset and click Visualize to preview the data.
- Drag the following components/modules to the canvas and configure the parameters in the right panel of each component/module as follows:

| Module | Parameter |
| --- | --- |
| Select Columns in Dataset | Click Edit column, and select Include Column types -> Numeric. This is because the XGBRegressor component can only process numeric features. |
| Clean Missing Data | Click Edit column, and select Include All columns. This cleans missing data in the dataset. |
| | Cleaning mode: select Remove entire row to remove rows with missing values. |
| Split Data | Splitting mode: set to Split Rows by default. |
| | Fraction of rows in the first output dataset: the fraction of the input data to send to the first output. |
| XGBRegressorTraining | Label_Col: enter price, the label column name. |
| | Model_FileName: enter the output model file name, e.g. xgb_modelfile.json. |
| | Learning_rate: the learning rate of XGBRegressor, 0.1 by default. |
| | Max_depth: the maximum tree depth for base learners, 5 by default. |

- Connect them to build the pipeline as shown below.
- Submit a run. Select a compute target and submit the run.
- Check the result of the run.
If the run finishes successfully, each component's output is stored in the workspace's default blob storage. To access the output in the storage account, right-click the component and select View Output. The output is a JSON file containing the trained XGBRegressor.
If the run fails, check the 70_driver_log under Outputs + Logs to troubleshoot.
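To clarify what the preprocessing modules in the pipeline do to the data, the following is a rough standard-library sketch of the three steps: keep numeric columns, remove rows with missing values, and split rows by a fraction. It is not how the designer modules are implemented. The sample rows are illustrative values rather than the real dataset, the function names are hypothetical, and the shuffled split is an assumption about the Split Data behavior. The training parameters are listed only as a dictionary that mirrors the values given in the tutorial.

```python
import random

# Illustrative rows only; the real dataset has many more rows and columns.
rows = [
    {"make": "audi", "horsepower": 102, "price": 13950},
    {"make": "bmw", "horsepower": 101, "price": 16430},
    {"make": "mazda", "horsepower": None, "price": 5195},
    {"make": "volvo", "horsepower": 114, "price": 16845},
    {"make": "toyota", "horsepower": 62, "price": None},
    {"make": "honda", "horsepower": 76, "price": 6529},
]


def is_numeric(value):
    """True for ints and floats, excluding booleans."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def select_numeric_columns(data):
    """Keep only columns whose non-missing values are all numeric."""
    numeric = [
        col for col in data[0]
        if all(row[col] is None or is_numeric(row[col]) for row in data)
    ]
    return [{col: row[col] for col in numeric} for row in data]


def remove_rows_with_missing(data):
    """Drop every row that contains at least one missing (None) value."""
    return [row for row in data if all(v is not None for v in row.values())]


def split_rows(data, fraction, seed=0):
    """Shuffle rows, then put `fraction` of them in the first output."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


numeric_rows = select_numeric_columns(rows)
clean_rows = remove_rows_with_missing(numeric_rows)
train_rows, test_rows = split_rows(clean_rows, fraction=0.75)

# Parameter values as given in the tutorial for XGBRegressorTraining.
training_params = {
    "Label_Col": "price",
    "Model_FileName": "xgb_modelfile.json",
    "Learning_rate": 0.1,
    "Max_depth": 5,
}
print(len(train_rows), len(test_rows))
```

Selecting numeric columns first matters because the XGBRegressor component can only process numeric features, and removing incomplete rows afterward ensures the label column has a value for every training row.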
This tutorial walks you through how to use existing components from the gallery to build a machine learning pipeline. Follow the second part of the tutorial to learn how to create a component with your own code.




