In this article, we'll work through a hands-on exercise to build a machine learning model that predicts automobile prices. The previous article introduced Azure Machine Learning and walked through creating a Machine Learning Workspace in Azure, along with compute instances and a compute cluster. This article builds on that one by designing a complete machine learning project: a model for Automobile Price Prediction. The machine learning workflow is explained and discussed in detail along the way.
Step 1
Let us start by creating the ML Workspace and a Compute Instance or Cluster. You can learn the step-by-step process from Azure Machine Learning – Create ML Workspace and Compute Cluster.
Step 2
From the menu icon on the top-left, choose Pipelines.
We now create a new pipeline by selecting New pipeline, as shown below.
Select the Compute Cluster or Instance you want to use.
Save the draft under the name Automobile Predictions, the machine learning project we are going to work on.
We’ll now be presented with a blank canvas on which to create our machine learning model through a drag-and-drop process, a complete low-code approach enabled by Azure.
Step 3
We can explore the various datasets that are freely available in Azure Machine Learning Studio.
We also have a range of widely used algorithms, such as Boosted Decision Tree Regression, Neural Network Regression, Linear Regression, and many more.
Classification and Clustering algorithms are likewise freely available to choose from for our project in Azure Machine Learning Studio.
Now, to start the project, we begin with the dataset we want to use. For our Automobile Price Prediction, we use Automobile price data (Raw), which can be found under the Sample datasets section.
Simply drag and drop the dataset onto the main canvas. It’ll now look similar to the image below.
Step 4
Data Preparation
To preview the data, left-click on the Data tab and select Preview Data.
We can see that there are 205 rows and 26 columns. The dataset includes normalized-losses, make, fuel-type, length, width, num-of-cylinders, stroke, price, and many more columns.
On the right side, we can also view statistics for the selected data, including the mean, median, standard deviation, and number of missing values, along with a visualized graph.
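The per-column statistics described above can be reproduced by hand. The following is a minimal sketch using only the Python standard library; the values are invented for illustration and are not rows from the Azure sample dataset, and whether the Studio reports a sample or population standard deviation is an assumption here.

```python
import statistics

# Toy values standing in for one numeric column; None marks a missing cell.
# These numbers are illustrative only, not taken from the sample dataset.
column = [164.0, None, 158.0, 192.0, None, 188.0, 121.0]


def summarize(values):
    """Return per-column statistics similar to those shown in a data preview."""
    present = [v for v in values if v is not None]
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        # Sample standard deviation (assumption: the Studio may use another convention).
        "std_dev": statistics.stdev(present),
    }


summary = summarize(column)
print(summary)
```

Missing cells are counted separately and left out of the mean, median, and standard deviation, which is why a column with many gaps deserves a closer look before training.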
Step 5
Under normalized-losses, we can see there are 41 missing values out of 205 rows (about 20%), which is a large number compared to the other columns, where only a few values are missing. This is why we need to prepare our data.
First, we exclude that column. In the search bar, look for Select Columns Transform.
Select it and drag and drop it onto the main canvas.
We now connect Automobile price data to Select Columns Transform. This connection is needed before we can configure any operation that transforms our data.
Now, to select the columns in the dataset, click on Edit column.
Here, we choose to add all columns and then remove normalized-losses specifically, so that it is excluded from every later operation; it has a large number of missing values, while the remaining columns have only a few.
After this, click on Save.
Now, if we preview the data, we can see that normalized-losses has been removed.
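In code, excluding a column amounts to copying every row without that key. This is a rough sketch of the idea, not what the Designer runs internally; the rows and the helper name `drop_columns` are hypothetical.

```python
# Illustrative rows only; the real dataset has 205 rows and 26 columns.
rows = [
    {"normalized-losses": None, "make": "make-a", "fuel-type": "gas", "price": 10000},
    {"normalized-losses": 150, "make": "make-b", "fuel-type": "diesel", "price": 12000},
]


def drop_columns(records, excluded):
    """Return copies of the records without the excluded column names."""
    return [{k: v for k, v in r.items() if k not in excluded} for r in records]


selected = drop_columns(rows, {"normalized-losses"})
print(list(selected[0].keys()))
```

The original rows are left untouched, which mirrors how each module on the canvas produces a new output rather than modifying its input.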
Step 6
We know there are still a few missing values in the dataset. These would get in the way of our main goal, price prediction. To deal with them, we search for Clean Missing Data.
Drag and drop Clean Missing Data onto the canvas and connect it to the module above.
Now, click on Clean Missing Data and select Edit column to choose the columns to clean.
Under By Name, click on Add all in Available Columns and remove normalized-losses. Once done, save it.
Now, under Cleaning mode, we can see different options, such as replacing missing values with the mean, median, or mode. We also have options to remove the entire row or the entire column. Removing a few rows will not affect our model much, as only a few rows have missing data. Hence, we select the option to remove the entire row.
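The row-removal choice can be sketched as a simple filter: keep a row only if none of its cells is missing. The data and the function name below are hypothetical, and `None` is assumed to be the marker for a missing value.

```python
def remove_rows_with_missing(records):
    """Keep only records in which every cell has a value (None = missing)."""
    return [r for r in records if all(v is not None for v in r.values())]


# Invented rows for illustration; two of the four contain a missing cell.
rows = [
    {"make": "a", "stroke": 3.4, "price": 10000},
    {"make": "b", "stroke": None, "price": 12000},
    {"make": "c", "stroke": 2.8, "price": None},
    {"make": "d", "stroke": 3.1, "price": 9000},
]

cleaned = remove_rows_with_missing(rows)
print(len(rows), "->", len(cleaned))
```

Dropping rows is reasonable when few rows are affected; replacing values with the mean, median, or mode would instead keep every row at the cost of introducing estimated values.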
Step 7
Now, we need to split the data for training and testing. In the search bar, search for Split.
We drag and drop Split Data onto the canvas and connect it to Clean Missing Data.
It is a common convention to use 70-75% of the data for model training and the remainder for testing. Here, we set Fraction of rows in the first output dataset to 0.7 and Random seed to 123.
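A seeded random split can be sketched as follows. This is only an illustration of the idea, assuming a shuffle followed by a cut; it will not reproduce the exact rows that Split Data selects. The row count of 193 is the number this article reports after cleaning.

```python
import random


def split_rows(records, fraction, seed):
    """Shuffle a copy of the records reproducibly, then cut it in two."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


# Integers stand in for the 193 rows that remain after cleaning.
rows = list(range(193))
train_rows, test_rows = split_rows(rows, fraction=0.7, seed=123)
print(len(train_rows), len(test_rows))
```

Fixing the seed means the same rows land in each output on every run, so results from different runs of the pipeline stay comparable.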
Step 8
Every machine learning model is trained with a specific algorithm. Automobile price prediction is a regression problem. We discussed Regression and Linear Regression in a previous article, Why do AI Engineers need Calculus?.
Here, we search for regression and drag and drop Linear Regression onto the canvas.
Step 9
Now, we need to train the model. We drag and drop Train Model onto the canvas. Connect the first output of Split Data (the training portion) to Train Model, and connect Linear Regression to Train Model as well.
For Linear Regression, we leave most of the settings unchanged and only set the random number seed to 123.
Now, in Train Model, we select price as the Label column. The label is the value the regression model learns to predict, which in this project is the price of the automobile.
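To make the idea of training concrete, here is a minimal sketch of fitting a straight line by ordinary least squares with a single feature. It is a simplification: the pipeline trains on many columns, and this is not presented as the module's internal method. The length and price values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical feature/label pairs (length -> price), invented for illustration.
length = [160, 170, 180, 190, 200]
price = [8000, 10000, 12000, 14000, 16000]

intercept, slope = fit_line(length, price)
print(intercept, slope)
```

The label column plays the role of `ys` here: it is the quantity the fitted coefficients are chosen to reproduce as closely as possible.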
Once done, the canvas will look something like this.
Step 10
Now we need to score and evaluate the model. Search for Score Model. We connect the output of Train Model to Score Model, and connect the second output dataset from Split Data to Score Model too. This held-out data is used to test how well our model performs.
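Scoring can be thought of as applying the trained model to rows it has not seen and storing each prediction next to the actual value. The sketch below assumes a one-feature linear model with made-up coefficients and test rows; the column name Scored Labels follows what the scored dataset shows in this walkthrough.

```python
def score(records, intercept, slope, feature):
    """Return copies of the records with a 'Scored Labels' prediction added."""
    return [
        {**r, "Scored Labels": intercept + slope * r[feature]}
        for r in records
    ]


# Hypothetical coefficients and held-out rows, invented for illustration.
intercept, slope = -24000.0, 200.0
test_rows = [
    {"length": 165, "price": 9500},
    {"length": 185, "price": 12500},
]

scored = score(test_rows, intercept, slope, "length")
for row in scored:
    print(row["price"], row["Scored Labels"])
```

Keeping the actual price alongside the prediction is what makes the later evaluation step possible.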
Step 11
Finally, we evaluate the model. Search for evaluate and connect the output of Score Model to Evaluate Model.
The finished pipeline should look like the following.
Now, we click on Submit to run our pipeline.
Step 12
Set the experiment name to TestPricing and the description to Automobile Predictions, then click Submit.
We’ll get a notification that the experiment is now running.
Step 13
Lastly, when the experiment has completed, all the modules will be shown in green, and a notification will pop up in the canvas.
We can explore the datasets from Preview data. For example, in Clean Missing Data, we can see that we now have 193 rows where we initially had 205. We have also removed one column, normalized-losses, which had a large number of missing values.
If we preview the result datasets, we can see that dataset 1 has 135 rows, i.e. 70% of the 193 cleaned rows, and dataset 2 has the remaining 58 rows, which we used for testing and evaluation.
Step 14
Now, if we check the scored dataset, we can see that there is a new column, Scored Labels. This is the price predicted by the model we trained in this project.
When we view the evaluation result, we can see various metrics. Most importantly, the Coefficient of Determination is 0.867. The closer it is to 1, the better the model fits the data. This shows that our experiment was successful: we have built a model that explains most of the variation in price on the test data. We can now use this model to predict the price of other automobiles described by the same columns.
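Assuming the coefficient reported above is the coefficient of determination (R²), it can be computed from the actual and predicted prices as one minus the ratio of the residual sum of squares to the total sum of squares. The numbers below are invented for illustration and are unrelated to the 0.867 result.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical actual prices and model predictions for four test rows.
actual = [9500, 12500, 11000, 14000]
predicted = [9000, 13000, 11500, 13500]

r2 = r_squared(actual, predicted)
print(round(r2, 3))
```

A value of 1 means the predictions match the actual prices exactly, while a value of 0 means the model does no better than always predicting the average price.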
Step 15
Clean Up Resources
Once you are done with the resources, and if you will not be using them for even a few days, delete them. Azure charges for storage and for some resources even after you have stopped the service.
We can select the instances and click on Stop or Delete from the Compute page.
Conclusion
Thus, in this article, we went through a step-by-step tutorial to build a machine learning model for Automobile Price Prediction using Linear Regression. We used the low-code functionality provided by Azure and its sample automobile dataset, and we scored and evaluated our predictions, obtaining a Coefficient of Determination of 0.867, which indicates a good fit for such a model.