In this article, we’ll learn to train a linear regression model using the low-code functionality offered by Designer in Azure Machine Learning. We’ll go through a step-by-step process to create a pipeline, import and prepare a dataset, and then use linear regression to train the model. We’ll then score and evaluate our model. This article is a part of the Azure Machine Learning Series.
- Azure Machine Learning - Create Workspace for Machine Learning
- Azure Machine Learning – Create Compute Instance and Compute Cluster
- Azure Machine Learning - Writing Python Script in Notebook
- Azure Machine Learning - Model Training
- Azure Machine Learning - Linear Regression
Let us go step by step through the machine learning workflow using Designer in Azure Machine Learning.
Step 1
Follow the Azure Machine Learning - Create Workspace for Machine Learning article to create a machine learning workspace. Once you’ve done that, you can open Azure Machine Learning Studio and see its welcome page.
Create Pipeline
Step 2
On the left panel, click on Designer.
Here, select Easy-to-use prebuilt components.
Step 3
This creates a new pipeline, and we can see its empty canvas.
Create and Set Compute Instance
Step 4
Click the Settings icon.
On the left, under Select compute type, select Compute instance.
Now, click on Create Azure ML compute instance.
Step 5
Here, select the General purpose category. This option supports workload types such as ML model training, Automated Machine Learning and pipeline runs, and provides 6 cores, 14 GB of RAM and 28 GB of storage. It costs around $0.29 per hour, though prices vary by region.
Now, click Create.
Once the creation process starts, we can see its status in Settings.
Once the instance has been created and is running, the status indicator turns green.
Importing Dataset
Step 6
Since we are only exploring right now, let us use a sample dataset available in Azure ML itself. Among the 16 sample datasets available, Automobile Price Data is well suited to linear regression, because the value we want to predict, price, is a continuous number. Select it, then drag and drop it onto the canvas.
Visualizing Data
Step 7
Right-click the Automobile Price Data component and choose Preview data.
Here, we can see that the dataset has 205 rows and 26 columns. Click the Maximize button for a larger view.
Step 8
As we explore the data, we can see that the normalized-losses column has far more NaN values than the other columns; NaN marks missing data. Missing data can cause problems during training and make our model less accurate.
As we explore further, we can also see some missing values in the price column.
Clicking a column shows its statistics along with a visualization. Here, price has only 4 missing values, while normalized-losses has 41.
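The per-column missing-value count that the preview reports can be reproduced in a few lines. The following is a minimal plain-Python sketch, assuming the data is held as a list of dictionaries with None standing in for NaN; the rows and the count_missing helper are hypothetical illustrations, not the real Automobile Price Data or an Azure ML API.

```python
# Hypothetical toy rows shaped like a few Automobile Price Data columns.
# None stands in for a missing (NaN) cell; these values are made up.
rows = [
    {"make": "audi", "normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"make": "bmw", "normalized-losses": None, "horsepower": 101, "price": 16430},
    {"make": "dodge", "normalized-losses": None, "horsepower": 68, "price": None},
    {"make": "honda", "normalized-losses": 104, "horsepower": 76, "price": 6855},
]


def count_missing(rows):
    """Return a dict mapping each column name to its number of missing cells."""
    counts = {}
    for row in rows:
        for column, value in row.items():
            # Make sure every column appears, even with zero missing values.
            counts.setdefault(column, 0)
            if value is None:
                counts[column] += 1
    return counts


print(count_missing(rows))
# {'make': 0, 'normalized-losses': 2, 'horsepower': 0, 'price': 1}
```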
Data Preparation
Step 9
Now that we have found the missing values, we know we have to prepare the data. For price, we could fill in the mean value, which might not substantially change the model, or simply drop the few affected rows. For normalized-losses, 41 of the 205 values (20%) are missing, so either option would be inaccurate: filling in the mean would invent a fifth of the column, and dropping those rows would throw away a fifth of the data. Hence, we clean the data by removing just a few rows for price and the entire normalized-losses column.
Remove Column
Step 10
Now, in the component palette, find Select Columns in Dataset under Data Transformation and drag it onto the canvas.
Connect the output of Automobile Price Data to the input of Select Columns in Dataset.
Step 11
In the component details panel on the right-hand side, click Edit column.
This opens the dialog where we choose the columns.
Under Include, select All columns.
Next, click the + button to add another rule.
Here, select Exclude and choose Column names.
Then select normalized-losses.
Once this is done, click Save.
In the Comment field, describe what the component does. Here I’ve added: Column normalized-losses excluded.
The canvas now shows Automobile Price Data connected to Select Columns in Dataset.
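Conceptually, the rule configured in Select Columns in Dataset (include all columns, then exclude normalized-losses) keeps every column except one. Here is a minimal plain-Python sketch of that projection, assuming rows stored as dictionaries; select_columns and the toy rows are hypothetical and not part of Azure ML.

```python
def select_columns(rows, exclude):
    """Keep every column except those named in `exclude`.

    Mirrors the rule "Include: All columns" followed by "Exclude: Column names".
    """
    return [
        {column: value for column, value in row.items() if column not in exclude}
        for row in rows
    ]


# Hypothetical toy rows; None marks a missing value.
rows = [
    {"normalized-losses": 164, "horsepower": 102, "price": 13950},
    {"normalized-losses": None, "horsepower": 68, "price": None},
]

print(select_columns(rows, exclude={"normalized-losses"}))
# [{'horsepower': 102, 'price': 13950}, {'horsepower': 68, 'price': None}]
```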
Cleaning
Step 12
As discussed above, since the normalized-losses column had far too many missing values, we excluded the entire column. The price column, however, had only 4 missing values, which we can handle by cleaning the missing data.
In the component palette on the left-hand side, find Clean Missing Data under Data Transformation.
Drag and drop this component onto the main canvas.
Next, connect the output of Select Columns in Dataset to the input of Clean Missing Data.
Step 13
Now, click Edit column in the component details panel on the right-hand side.
Here, under Include, select All columns and click Save.
Step 14
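Now, under Cleaning mode, select Remove entire row and add a comment. In this mode, any row that has a missing value in one of the selected columns is dropped from the output.
The canvas now shows Automobile Price Data, Select Columns in Dataset and Clean Missing Data connected in sequence.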
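The Remove entire row cleaning mode can be pictured as a row filter. Because all columns are selected, a row missing any value, not only price, is removed. This is a minimal plain-Python sketch under the assumption that missing cells are None; remove_rows_with_missing and the toy rows are hypothetical, not the component's implementation.

```python
def remove_rows_with_missing(rows, columns=None):
    """Drop every row that has a missing (None) value in any checked column.

    columns=None checks all columns, like selecting "All columns".
    """
    kept = []
    for row in rows:
        checked = row.keys() if columns is None else columns
        if all(row[column] is not None for column in checked):
            kept.append(row)
    return kept


# Hypothetical toy rows, after normalized-losses has already been excluded.
rows = [
    {"horsepower": 102, "price": 13950},
    {"horsepower": 68, "price": None},
    {"horsepower": None, "price": 7000},
    {"horsepower": 76, "price": 6855},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "->", len(cleaned))  # 4 -> 2
```

Passing columns=["price"] would check only price and keep the row whose horsepower is missing, which is why the column selection in Edit column matters.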
Training our Model
Step 15
The dataset must be divided into training and testing data, so that the model is evaluated on rows it did not see during training. For this, we use the Split Data component under Data Transformation.
We now connect the left output of Clean Missing Data to Split Data.
Next, we set Fraction of rows in the first output dataset to 0.7, i.e. 70% of the data is used for training and 30% for testing, and we add a comment for this too.
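A 0.7 fraction means roughly 7 out of every 10 rows go to the first output. The sketch below shows one plain-Python way to do such a split; shuffling with a fixed seed and sizing the first output with round() are assumptions, and the component's exact row-assignment logic may differ. split_rows is a hypothetical helper.

```python
import random


def split_rows(rows, fraction=0.7, seed=42):
    """Shuffle rows with a fixed seed, then split them into two lists.

    The first list receives round(len(rows) * fraction) rows (the training set),
    the second receives the rest (the test set).
    """
    shuffled = list(rows)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(10))  # ten hypothetical row ids
train, test = split_rows(rows, fraction=0.7)
print(len(train), len(test))  # 7 3
```

A fixed seed makes the split reproducible, so re-running the pipeline trains and tests on the same rows.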
Step 16
Now, from Machine Learning Algorithms, drag and drop Linear Regression onto the canvas.
Similarly, add the Train Model component to the canvas.
Arrange and connect the components as follows. The output of Linear Regression goes to the left input of Train Model.
The left output of Split Data (the 70% training set) goes to the right input of Train Model.
Now, since this is a linear regression model, let's rename our pipeline to Train Linear Regression Model.
Step 17
Now, click Train Model and select Edit column.
Here, select Column names, choose price as the label column, that is, the value the model learns to predict, and click Save.
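The Linear Regression component learns price from all the remaining feature columns. As a much simpler illustration of what "training" means, the sketch below fits ordinary least squares with a single numeric feature. Using horsepower as the only feature, the toy numbers and the fit_simple_linear_regression helper are all assumptions for illustration; this is not how the component is implemented.

```python
def fit_simple_linear_regression(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together, around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical training rows: horsepower as the single feature, price as the label.
horsepower = [70, 100, 130]
price = [7000, 10000, 13000]
intercept, slope = fit_simple_linear_regression(horsepower, price)
print(intercept, slope)  # 0.0 100.0
```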
Scoring Model
Step 18
Now, search for Score Model in the search bar, then drag and drop the Score Model component onto the canvas.
Connect the left input of Score Model to the output of Train Model, and its right input to the right output of Split Data, which holds the 30% test set.
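Scoring applies the trained model to rows it has not seen and appends the prediction as a new column. A minimal plain-Python sketch for the one-feature case, assuming an intercept and slope already learned during training; score_rows, the coefficients and the test rows are hypothetical.

```python
def score_rows(rows, intercept, slope, feature):
    """Return copies of `rows` with a 'Scored Labels' prediction added to each."""
    return [
        {**row, "Scored Labels": intercept + slope * row[feature]}
        for row in rows
    ]


# Hypothetical test rows and coefficients (intercept 0.0, slope 100.0).
test_rows = [
    {"horsepower": 90, "price": 9500},
    {"horsepower": 120, "price": 11800},
]
for row in score_rows(test_rows, intercept=0.0, slope=100.0, feature="horsepower"):
    print(row["price"], row["Scored Labels"])
# 9500 9000.0
# 11800 12000.0
```

Keeping the actual price next to the prediction is what lets the next component compare the two.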
Evaluation
Step 19
Now, search for Evaluate Model in the search bar. Drag and drop Evaluate Model onto the canvas and connect the output of Score Model to its left input.
Submission
Step 20
Our entire workflow is now complete, and we can finally run the pipeline. To do this, click the Submit button.
In the dialog, select Create new to create a new experiment.
Now, let us name our experiment.
When I typed a name containing a space, the dialog displayed the naming rules: the name must be between 1 and 250 characters long, must start with a letter or a number, and may use only “-” and “_” as special characters.
We’ve named it learn-linear-regression.
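If you generate experiment names from scripts, the rule above can be checked up front. This regular expression is our own reading of the rule as shown in the Submit dialog, not Azure's official validator; EXPERIMENT_NAME_PATTERN and is_valid_experiment_name are hypothetical names.

```python
import re

# Our reading of the rule shown in the Submit dialog: 1 to 250 characters,
# starting with a letter or digit, with "-" and "_" as the only special characters.
EXPERIMENT_NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9_-]{0,249}")


def is_valid_experiment_name(name):
    """Return True if `name` satisfies the naming rule above."""
    return EXPERIMENT_NAME_PATTERN.fullmatch(name) is not None


for name in ["learn-linear-regression", "Learning with space", "_learn"]:
    print(name, is_valid_experiment_name(name))
# learn-linear-regression True
# Learning with space False
# _learn False
```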
Now, click Submit.
The pipeline starts running.
As the run progresses, components that have finished show Completed, and the one currently being processed shows Running.
Eventually, the Evaluate Model component finalizes.
Finally, the pipeline has run successfully, and every component shows Completed.
Scored Labels
Step 21
Now, let us check the scored labels produced by Score Model.
For this, right-click the Score Model component and select Scored dataset under Preview data.
Here, we can see that a Scored Labels column has been added. This is the price predicted by our linear regression model.
We can also confirm that the missing values have been cleared, and explore the details of Scored Labels in the visualization.
Evaluation Result
Step 22
Now, right-click the Evaluate Model component and select Evaluation result under Preview data.
Here, we can see five metrics: Mean Absolute Error, Root Mean Squared Error, Relative Absolute Error, Relative Squared Error and Coefficient of Determination.
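To make the evaluation metrics concrete, here is a minimal plain-Python sketch using the standard textbook definitions: MAE and RMSE average the absolute and squared errors, RAE and RSE compare those errors with a baseline that always predicts the mean actual value, and the coefficient of determination equals 1 minus RSE. We assume Evaluate Model follows these definitions; the regression_metrics helper and the three-value example are hypothetical.

```python
import math


def regression_metrics(actual, predicted):
    """Compute MAE, RMSE, RAE, RSE and the coefficient of determination."""
    n = len(actual)
    mean_actual = sum(actual) / n
    errors = [a - p for a, p in zip(actual, predicted)]
    abs_error = sum(abs(e) for e in errors)
    sq_error = sum(e * e for e in errors)
    # Baseline: errors of always predicting the mean of the actual values.
    abs_baseline = sum(abs(a - mean_actual) for a in actual)
    sq_baseline = sum((a - mean_actual) ** 2 for a in actual)
    rse = sq_error / sq_baseline
    return {
        "Mean Absolute Error": abs_error / n,
        "Root Mean Squared Error": math.sqrt(sq_error / n),
        "Relative Absolute Error": abs_error / abs_baseline,
        "Relative Squared Error": rse,
        "Coefficient of Determination": 1 - rse,
    }


# Hypothetical actual prices and Scored Labels.
metrics = regression_metrics([10, 20, 30], [12, 18, 33])
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower error metrics are better, while a coefficient of determination closer to 1 means the model explains more of the variation in price than simply predicting the average.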
With this, we have successfully created and run the entire machine learning pipeline from data preparation, model training and testing to scoring and evaluation.
Exploration
We can now check the usage of dedicated cores under Usage + quotas of the ML workspace in the Azure portal.
Delete Resources
Step 23
To avoid incurring further charges, it is vital to delete the resources once you no longer need the ML model.
You can delete an individual resource, such as the workspace, through its Delete option, or delete the entire resource group to remove all the resources it contains.
To delete the resource group, click Delete, retype the resource group name and finally click the Delete button. This removes the entire resource group, including all the components that were created to run the machine learning pipeline.
Conclusion
Thus, in this article, we learned step by step how to create a machine learning pipeline using Designer in Azure Machine Learning Studio. We went through data preparation, model training, testing, scoring and evaluation, and saw how the low-code Designer feature enables machine learning in Microsoft Azure.