Developing A Recommender Solution With Azure Machine Learning

Chervine Bhiwoo

Introduction

Machine learning uses computers to run predictive models that learn from existing data to forecast future behaviors, outcomes, and trends.

Azure Machine Learning is a powerful cloud-based predictive analytics service that makes it possible to quickly create and deploy predictive models as analytics solutions. It provides not only tools for building predictive models but also a fully managed service you can use to publish those models as ready-to-consume web services.

Read Predictive Analytics with Microsoft Azure Machine Learning for more details on the mechanisms of Azure Machine Learning.

Scope

This article shows how Azure Machine Learning can be used to develop a recommender solution.

Ever wondered how websites like Amazon and eBay provide useful suggestions and recommendations? This article is for you!

Designing the Experiment

The experiment is developed in the following steps:

Add the dataset

In Azure Machine Learning, you can use an existing dataset or load a new one from an Azure database, Azure Blob Storage, a data feed reader, a web service, or a Hive query.

In this example, the Movie Ratings sample dataset is used.

Figure 1: Movie Rating

The Movie Ratings sample has the following columns:

Figure 2: Movie Rating Sample

Exclude the columns that are not needed

To do so, add the Project Columns module to the experiment.

Figure 3: Project Column

Next, in the properties pane on the right, click "Launch column selector" and select the fields that are needed. Here, the Timestamp column is excluded.

Figure 4: Select Column
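
The effect of Project Columns can be pictured as keeping a chosen subset of fields from every row. The sketch below is a minimal illustration in plain Python, not the module itself; it assumes a ratings table laid out as UserId, MovieId, Rating, Timestamp, and the rows, values, and the helper name project_columns are hypothetical.

```python
import csv
import io

# Hypothetical ratings in an assumed UserId, MovieId, Rating, Timestamp layout.
RAW = """UserId,MovieId,Rating,Timestamp
1,10,4,1381620027
1,20,5,1379466669
2,10,3,1389230173
"""

def project_columns(rows, keep):
    """Return copies of `rows` that contain only the columns named in `keep`."""
    return [{col: row[col] for col in keep} for row in rows]

rows = list(csv.DictReader(io.StringIO(RAW)))
# Drop Timestamp by listing only the columns to keep.
projected = project_columns(rows, ["UserId", "MovieId", "Rating"])
print(projected[0])  # {'UserId': '1', 'MovieId': '10', 'Rating': '4'}
```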

Split the Data

Next, partition the data into two distinct sets:

Train Data: Used to "train" the recommender; the algorithm learns from this data in order to make predictions.

Test Data: Used to validate the results of the recommender.

Drag the Split module onto the experiment and connect it as shown below.

Figure 5: Result

There is no fixed rule for how much data to use for training versus testing; the choice is a judgment call.

Enter the ratio as a decimal number between 0 and 1; it represents the fraction of rows sent to the first output dataset.

For example, if you type 0.75 as the value, the dataset would be split using the ratio 75:25, with 75% of the rows sent to the first output dataset, and 25% sent to the second output dataset.

Figure 6: Splitting Mode
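
The 75:25 example can be expressed as a shuffled row split. This is a minimal sketch of a plain random split only; the Split module offers other splitting modes, and its exact shuffling and rounding behavior is not reproduced here. The function name split_rows and the fixed seed are illustrative assumptions.

```python
import random

def split_rows(rows, fraction, seed=0):
    """Shuffle a copy of `rows` and send round(fraction * n) of them to the first set."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    cut = round(fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train, test = split_rows(range(100), 0.75)
print(len(train), len(test))  # 75 25
```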

Add the Train Matchbox Recommender

The Train Matchbox Recommender module trains a recommendation model based on the Matchbox recommender engine, which learns people's preferences by observing how they rate items such as movies, content, or other products.

This is where learning occurs.

Figure 7: Learning occurs
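
To make "learning preferences from ratings" concrete, the sketch below fits latent factor vectors for users and items so that their dot product approximates each observed rating. It is plain matrix factorization trained with stochastic gradient descent, a deliberately simpler technique swapped in for illustration; it is not Matchbox, which is a Bayesian model. The toy ratings, hyperparameters, and the names train_mf, predict, and rmse are all hypothetical.

```python
import math
import random

def train_mf(ratings, n_factors=2, lr=0.02, reg=0.02, epochs=1000, seed=0):
    """Learn user/item factor vectors whose dot product approximates each rating.

    `ratings` is a list of (user, item, rating) triples. Plain SGD matrix
    factorization, used as a simple stand-in for Matchbox, not Matchbox itself.
    """
    rng = random.Random(seed)
    # Sorted so that initialization is reproducible for a fixed seed.
    users = sorted({u for u, _, _ in ratings})
    items = sorted({i for _, i, _ in ratings})
    P = {u: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.gauss(0.0, 0.1) for _ in range(n_factors)] for i in items}
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for k in range(n_factors):
                pk, qk = pu[k], qi[k]
                # Gradient step on squared error plus L2 regularization.
                pu[k] += lr * (err * qk - reg * pk)
                qi[k] += lr * (err * pk - reg * qk)
    return P, Q

def predict(P, Q, user, item):
    return sum(a * b for a, b in zip(P[user], Q[item]))

def rmse(P, Q, ratings):
    return math.sqrt(sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Hypothetical toy ratings: u1 and u2 favour A/B, u3 and u4 favour C/D.
RATINGS = [
    ("u1", "A", 5), ("u1", "B", 5), ("u1", "C", 1),
    ("u2", "A", 4), ("u2", "C", 1), ("u2", "D", 1),
    ("u3", "B", 1), ("u3", "C", 5), ("u3", "D", 5),
    ("u4", "A", 1), ("u4", "D", 5),
]
P, Q = train_mf(RATINGS)
print(round(rmse(P, Q, RATINGS), 3))  # training error on this toy set
```

Once trained, predict can estimate a rating for a user and item pair that was never observed, which is the role the scoring step plays in the experiment.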

Add the Score Matchbox Recommender

The Score Matchbox Recommender module uses the trained recommendation model to score predictions for a dataset.

Figure 8: Recommendation model
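
The scored output used later in this article lists related movies for each movie. Matchbox derives these from its trained model; the sketch below uses a much simpler, swapped-in technique instead, ranking items by the cosine similarity of their raw user-rating vectors. The data and the function name related_items are hypothetical.

```python
import math
from collections import defaultdict

def related_items(ratings, item, top_n=3):
    """Rank other items by cosine similarity of their user-rating vectors to `item`."""
    by_item = defaultdict(dict)  # item -> {user: rating}
    for user, it, rating in ratings:
        by_item[it][user] = rating

    def cosine(a, b):
        common = set(a) & set(b)
        dot = sum(a[u] * b[u] for u in common)
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    target = by_item.get(item, {})
    scores = [(other, cosine(target, vec)) for other, vec in by_item.items() if other != item]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # highest similarity first
    return [other for other, _ in scores[:top_n]]

# Hypothetical ratings: users who rate A highly also rate B highly.
RATINGS = [
    ("u1", "A", 5), ("u1", "B", 4), ("u1", "C", 1),
    ("u2", "A", 4), ("u2", "B", 5),
    ("u3", "A", 1), ("u3", "C", 5),
]
print(related_items(RATINGS, "A"))  # ['B', 'C']
```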

Add the Evaluate Recommender

The Evaluate Recommender module measures the accuracy of the recommender model's predictions.

Figure 9: Recommender model predictions
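
Assuming the scored output contains predicted ratings, two common ways to summarize accuracy are mean absolute error (MAE) and root-mean-square error (RMSE). The sketch below computes both on hypothetical values; which metrics the Evaluate Recommender module actually reports may depend on the kind of prediction being evaluated, and the function name mae_rmse is illustrative.

```python
import math

def mae_rmse(actual, predicted):
    """Mean absolute error and root-mean-square error of two equal-length rating lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse

# Hypothetical test ratings versus predicted ratings.
mae, rmse = mae_rmse([4, 3, 5, 2], [3.5, 3, 4, 2])
print(mae, round(rmse, 3))  # 0.375 0.559
```

RMSE penalizes large misses more heavily than MAE, so a model with a few badly wrong predictions shows a wider gap between the two.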

At this point, the experiment looks as follows and can be executed by clicking the Run button.

Figure 10: Click Run Button

After the run completes, click the output of the Score Matchbox Recommender and select Visualize. All the movie IDs are displayed together with their respective "related" movies, as shown below.

Figure 11: Movie ID

However, IDs alone are not very useful for analysis; movie names would be far more meaningful than movie IDs.

Fortunately, the Join operator can map the IDs to names, as shown in the following steps.

Add the IMDB Movie Title Sample

This sample contains the movie names and their respective movie IDs.

Figure 12: IMDB Movie Title

Figure 13: Movie Name

Add the Metadata Editor and make it treat the values as strings

To do so, select all the columns in the column selector and set the data type to String in the right pane.

Figure 14: Column Selector
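
Casting to a single type matters because a join compares key values exactly, and the same ID stored as a number on one side and as text on the other does not match. The sketch below shows that mismatch in plain Python; the titles, the IDs, and the assumption that the other side holds text IDs are hypothetical.

```python
# Hypothetical title lookup keyed by integer movie IDs...
titles = {10: "Title A", 20: "Title B"}
# ...while the other side is assumed to hold the same ID as text.
item_id = "10"

print(titles.get(item_id))  # None, because 10 and "10" are different keys

# Converting every key to a string makes both sides comparable.
titles_by_str = {str(movie_id): name for movie_id, name in titles.items()}
print(titles_by_str.get(item_id))  # Title A
```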

Join the Movie IDs from the Metadata Editor with the items from the Score Matchbox Recommender

In the column selectors of the Join operator, select "Item" for the left side and "Movie Id" for the right side.

This joins the Item column from the Score Matchbox Recommender with the Movie ID column from the IMDB Movie Titles sample. When the experiment is run again, each movie name is listed together with all its related movie IDs, as in the following.

Figure 15: Movie List
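
The join on Item and Movie Id can be sketched as a hash-based inner join over rows represented as dictionaries. This is an illustration under stated assumptions: it uses inner-join semantics, so scored items without a matching title are dropped, and the rows, title strings, and the function name inner_join are hypothetical.

```python
def inner_join(left, right, left_key, right_key):
    """Inner join two lists of dicts where left[left_key] == right[right_key]."""
    index = {}
    for row in right:  # build a lookup table from the right-hand rows
        index.setdefault(row[right_key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[left_key], []):
            joined.append({**row, **match})  # combine the columns of both rows
    return joined

# Hypothetical scored output and title table (IDs already cast to strings).
scored = [
    {"Item": "1", "Related Item 1": "2"},
    {"Item": "2", "Related Item 1": "1"},
    {"Item": "9", "Related Item 1": "1"},  # no matching title, so it is dropped
]
titles = [{"Movie Id": "1", "Movie Name": "Title A"}, {"Movie Id": "2", "Movie Name": "Title B"}]

named = inner_join(scored, titles, "Item", "Movie Id")
for row in named:
    print(row["Movie Name"], row["Related Item 1"])
```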

The names of the related movies are needed as well. To get them, proceed with the step below.

Add another Join operator to join the result of the previous join with the Movie Titles sample.

Figure 16: Movie Title Sample

In the left column selector, select Related Item 1, and in the right column selector, select Movie ID.

This joins Related Item 1 with the Movie Titles sample to return the name of the first related movie.
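
Because the title table is joined a second time, its column names collide with columns already present in the result, which is why the related movie's name appears as Movie Name (2). The sketch below is a hypothetical inner join that appends " (2)" to colliding right-hand columns; the exact renaming rule used by the Join operator is an assumption here, and the rows and the name join_with_suffix are illustrative.

```python
def join_with_suffix(left, right, left_key, right_key, suffix=" (2)"):
    """Inner join; right-hand columns whose names already exist on the left get `suffix`."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    out = []
    for row in left:
        for match in index.get(row[left_key], []):
            merged = dict(row)
            for col, val in match.items():
                # Rename only the columns that would overwrite an existing one.
                merged[col + suffix if col in row else col] = val
            out.append(merged)
    return out

# Hypothetical result of the first join: each movie with its name and first related item.
first_join = [
    {"Item": "1", "Related Item 1": "2", "Movie Id": "1", "Movie Name": "Title A"},
    {"Item": "2", "Related Item 1": "1", "Movie Id": "2", "Movie Name": "Title B"},
]
titles = [{"Movie Id": "1", "Movie Name": "Title A"}, {"Movie Id": "2", "Movie Name": "Title B"}]

pairs = join_with_suffix(first_join, titles, "Related Item 1", "Movie Id")
for row in pairs:
    print(row["Movie Name"], "->", row["Movie Name (2)"])
```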

Run the experiment to obtain a list of movies and their related movies.

Figure 17: Related Movie

The experiment now produces a list of movies (Movie Name) together with their related movie (Movie Name (2)).

For example, it shows that people who liked Thor also liked Iron Man.