Predictive Analytics With Microsoft Azure Machine Learning

1.0 Machine Learning

1.1 Definition

Machine learning detects patterns in large amounts of existing data and uses those patterns to make predictions about new data.

Machine learning algorithms fall into two broad categories: supervised learning and unsupervised learning.

1.2 Supervised Learning

In supervised learning, the algorithm is given a data set in which the “right answer” is already known for every example. This known data set (called the training data set) includes input data and the corresponding response values. From it, the supervised learning algorithm seeks to build a model that can predict the response values for a new data set. Supervised learning includes two categories of algorithms: regression and classification.

1.2.1 Regression

Regression is used for continuous-response values, for example, predicting housing prices based on size.

  • From an existing data set, the algorithm plots the size of each house against its price.
  • From this data set, we now want to predict the price of a house of 900 square feet.

The algorithm detects the trend in the data and represents it as a straight line, which it then uses to make a forecast, as shown in the figure below.

Figure: a straight line fitted to the house data and used to make a forecast.
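The straight-line forecast described above can be sketched in a few lines of Python. This is only an illustration using ordinary least squares; the house sizes and prices are invented for the example, and the function names are hypothetical rather than anything Azure ML exposes.

```python
# Hypothetical training data: (size in square feet, price in thousands).
houses = [(600, 125.0), (800, 160.0), (1000, 205.0), (1200, 240.0), (1400, 285.0)]


def fit_line(points):
    """Return (slope, intercept) of the least-squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Covariance of x and y, and variance of x (both left unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_price(size, slope, intercept):
    """Read the forecast for a given size off the fitted line."""
    return slope * size + intercept


slope, intercept = fit_line(houses)
predicted = predict_price(900, slope, intercept)
print(f"Predicted price for 900 sq ft: {predicted:.1f}")
```

With real data the points will not lie exactly on a line, so the forecast is an estimate of the trend rather than an exact value.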

1.2.2 Classification

Classification is used for categorical response values, where the data can be separated into specific “classes”. Classification is used to predict a discrete output value (for example 0/1 or Yes/No).

Consider a case where we need to determine whether a tumor is malignant or benign based on its size.

From an existing data set, the algorithm learns how tumor size relates to the tumor type.

Now, if we need to forecast whether a tumor of size Z is dangerous, the algorithm places it relative to the known cases, as in the figure below, and finds that it is not harmful.

Figure: a tumor of size Z classified against the existing tumor data.
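As a rough illustration of the tumor example, the sketch below learns a single size cut-off from labelled data and uses it to classify new tumors. This one-feature threshold rule is an assumption made for the example; it is not necessarily the algorithm Azure ML would apply, and the sizes and labels are invented.

```python
# Hypothetical labelled data: (tumor size in cm, 1 = malignant, 0 = benign).
tumors = [(1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
          (3.5, 1), (4.0, 1), (4.5, 1), (5.0, 1)]


def fit_threshold(samples):
    """Pick the size cut-off that misclassifies the fewest training samples."""
    sizes = sorted(size for size, _ in samples)
    # Candidate cut-offs are the midpoints between neighbouring sizes.
    candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]

    def errors(cutoff):
        # A sample is wrong when "larger than cut-off" disagrees with its label.
        return sum((size > cutoff) != bool(label) for size, label in samples)

    return min(candidates, key=errors)


def classify(size, cutoff):
    """Return the predicted class for a tumor of the given size."""
    return "malignant" if size > cutoff else "benign"


cutoff = fit_threshold(tumors)
print(cutoff, classify(2.2, cutoff), classify(4.2, cutoff))
```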

1.3 Unsupervised Learning

In unsupervised learning, the algorithm is given a data set without response values and tries to identify structure in the data on its own. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data.

Examples

  1. Market Research

    Market researchers use cluster analysis to partition the general population of consumers into market segments and to better understand the relationships among groups of consumers and potential customers. The results feed into product positioning, new product development and the selection of test markets.

  2. Social Network Analysis

    Clustering may be used to recognize communities within large groups of people.

  3. Crime Analysis

    Cluster analysis can be used to identify areas where there are greater incidences of specific types of crime. By identifying these distinct areas or "hot spots" where a similar crime has happened over a period of time, it is possible to manage law enforcement resources more effectively.
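The grouping idea behind these examples can be sketched with k-means, one common clustering algorithm chosen here purely for illustration. The customer data and the starting centroids are invented; a real analysis would use more features and a more careful initialisation.

```python
import math

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 10), (2, 12), (1, 11), (9, 50), (10, 52), (8, 49)]


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, move each
    centroid to the mean of its assigned points, and repeat."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    # clusters holds the assignment made during the final pass.
    return centroids, clusters


# Start from two of the data points as initial centroids (an arbitrary choice).
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

No labels are supplied anywhere in this sketch: the two groups emerge only from how close the points are to one another.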

2.0 Introducing Azure Machine Learning (Azure ML)

Azure ML is Microsoft's cloud solution for predictive analytics. Traditionally, predictive analytics required complex software and high-performance computers that were not accessible to everybody. By using the power of cloud computing, Azure ML provides a fully managed solution that puts predicting future outcomes within reach of a much broader audience.

2.1 Creating an Azure ML Workspace

To start using Azure ML, you first need to create a workspace using the following procedure:

  1. Go to your Azure account and navigate to Machine Learning.

    If you do not have an Azure account, you can sign up for a free trial.

  2. In Machine Learning, click on New > Machine Learning > Quick Create, fill in the required information and click Create.

    Figure: the Quick Create form with the required information.

2.2 Accessing Azure ML Studio

To access Azure ML Studio, go to https://studio.azureml.net/Home/ and sign in with your Microsoft account (formerly known as a Live ID).

3.0 Overview of Azure ML Studio

Azure ML Studio is a browser-based workbench for authoring predictive analytics solutions. It has four sections, as described below.

Figure: overview of Azure ML Studio and its four sections.

4.0 Working with Experiments

For this experiment, we shall use a fictitious loan data set and try to predict, based on past data, whether someone will be able to repay a loan. The data set is as follows:

Figure: the loan data set.

4.1 Adding a new data set

To upload a new data set, go to Experiments > New > Data set > from local file.

Figure: adding a new data set from a local file.

4.2 Creating a new Experiment

To create a new Experiment, click on New > Experiment.

Figure: creating a new experiment.

4.2.1 Working with data sets

Clicking on New > Experiment brings you to a new canvas where you can add all the elements needed for your experiment.

  • From here, you may drag objects from the left pane and place them on the canvas.

  • From the saved data sets tab, you may browse both the sample data sets and the data sets you uploaded.

  • For the next example, drag the file loan_hist.csv that we just uploaded and place it on the canvas.

    Figure: dragging loan_hist.csv onto the canvas.

Once you select your data set, you may view the contents of the data set by clicking on Visualize.

Figure: visualizing the selected data set.

4.2.2 Split the data

The next step is to split the data into a "training" set and a "test" set.

Select the Split object on the left and drag it onto the canvas. Next, specify what percentage of the data will be used for training; the remainder becomes the test data.

The test data will be used to evaluate the accuracy of the trained model.

Figure: splitting the data into training and test data.
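Conceptually, the split step does something like the following sketch, which shuffles the rows and cuts them at a chosen fraction. The function name, the 70/30 ratio and the seed are assumptions for illustration, not the Split module's actual defaults.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows and split it into (training, test) lists."""
    shuffled = list(rows)
    # A fixed seed makes the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


rows = list(range(100))  # stand-in for 100 loan records
train, test = split_rows(rows)
print(len(train), len(test))
```

Shuffling before cutting matters when the file is sorted, for example by date or by outcome; otherwise the test data would not resemble the training data.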

4.2.3 Train model

The Train Model module is where the “learning” occurs. It takes two inputs: the training data set and an algorithm.

Since we need to answer a two-class question (will the loan be repaid or not?), which falls under classification, we shall use a classification algorithm.

Figure: selecting a classification algorithm.

The next step is to configure the Train Model module to tell it which field to predict.

To do so, click on the Train Model module and then click on "Launch column selector" on the right to select the required field.

In our case, we need to predict the field "Loan Paid?".

Figure: selecting the "Loan Paid?" field in the column selector.

4.2.4 Score and Evaluate Model

We'll use the test data that was separated out by the Split module to score the trained model. Once a second model has been added (see 4.2.5), we can compare the results of the two models to see which generated better results.

The Score Model module takes two inputs: the trained model and the test data.

To evaluate the scoring results, we'll use the Evaluate Model module.

The Evaluate Model module can take the output of up to two Score Model modules as inputs for comparison.

Figure: Score Model output used as input to the Evaluate Model module.

4.2.5 Adding another algorithm

To compare several algorithms, we can add more of them to the experiment. The procedure for adding another algorithm is as follows.

  1. Copy and paste the existing Train Model and Score Model modules.
  2. Remove the algorithm connector from the copied Train Model module.
  3. Add a new predictive algorithm and connect it to the new Train Model module.
  4. Connect the new Score Model module to the existing Evaluate Model module.

    Figure: the new Score Model module.

    Figure: the Evaluate Model module.

4.2.6 Running and evaluating the results

Click Run. Once the run has completed, click on the output port of the Evaluate Model module and click Visualize.

Figure: running the experiment and visualizing the results.

The Evaluate Model module produces a pair of curves and metrics that allow you to compare the results of the two scored models. You can view the results as Receiver Operating Characteristic (ROC) curves, Precision/Recall curves, or Lift curves. Additional data displayed includes a confusion matrix, cumulative AUC values and other metrics. You can change the threshold value by moving the slider left or right and see how it affects the set of metrics.

By examining these values you can decide which model is closest to giving you the results you're looking for. You can return and iterate on your experiment by changing values in the various models.
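To make the effect of the threshold slider concrete, the sketch below computes a confusion matrix and a few metrics at different thresholds. The labels and scores are invented, and the function names are hypothetical; Azure ML Studio performs these calculations for you.

```python
def confusion_matrix(labels, scores, threshold=0.5):
    """Count (true pos, false pos, true neg, false neg) at a score threshold."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted = score >= threshold
        if predicted and label:
            tp += 1
        elif predicted and not label:
            fp += 1
        elif not predicted and label:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(labels, scores, threshold=0.5):
    """Return accuracy, precision and recall at the given threshold."""
    tp, fp, tn, fn = confusion_matrix(labels, scores, threshold)
    accuracy = (tp + tn) / len(labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# Hypothetical test labels (1 = loan paid) and scored probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.3, 0.5, 0.7):
    print(threshold, metrics(labels, scores, threshold))
```

On this toy data, lowering the threshold raises recall at the cost of precision, which is the trade-off the slider lets you explore.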

5.0 Publishing and Creating a Web Service

Once we know which algorithm to use, we can put the experiment into production and create a web service so that applications can connect to it and pass data to it.

The following is the procedure to deploy a new web service for our experiment.

  1. Right-click on the model we need and click Save as Trained Model.

    Figure: saving the trained model.

  2. Create a new model for publishing.

  3. Add the Data Source, Train Model and Score Model.

    Also add a missing value scrubber to replace all the missing values in the data set.

    Figure: adding a missing value scrubber.

  4. Define the input and output ports of the Score Model module.

    Figure: setting the input and output ports.

  5. Run the experiment and click Publish Web Service.

    Figure: running the experiment and publishing the web service.

The web service is now created.

To place it into production, go to the web service, click on Settings and set it as ready for production.

Figure: the web service settings.

6.0 Testing and consuming the web service

6.1 Testing

On the web service page, click on the “Test” link as shown below.

Figure: the “Test” link on the web service page.

Fill in the form with the test values and click OK.

Figure: the test form filled in with test values.

Let's run the same test for someone with a lower income and a higher loan amount.

Figure: the test repeated with a lower income and a higher loan amount.

6.2 Consuming the web service from a client application

Azure ML generates sample code to consume your service from C#, R and Python.

To view the code, click on “API help page” on the web service page.

Figure: the API help page.

The following changes need to be made to the C# project.

  1. Replace the API key with your key.

    Figure: replacing the API key.

  2. Key in the required parameters.

    Figure: entering the required parameters.

  3. Run and verify the outcome.

    Figure: the outcome of the run.
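The same three steps look roughly as follows in Python. This is a sketch only: the URL, the API key, the column names and the JSON body layout are placeholders and assumptions, so copy the real values and request format from the sample code on your own API help page.

```python
import json
import urllib.request


def build_request(url, api_key, column_names, values):
    """Build (but do not send) a POST request for a scoring endpoint."""
    # Assumed body layout; check it against the sample on your API help page.
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": [values]}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # step 1: your own key
        },
        method="POST",
    )


request = build_request(
    "https://example.invalid/score",  # placeholder URL
    "YOUR_API_KEY",                   # placeholder key
    ["Income", "Loan Amount"],        # hypothetical column names (step 2)
    ["50000", "10000"],               # hypothetical test values (step 2)
)

# Step 3: send the request and inspect the response, for example with
# urllib.request.urlopen(request).read()
```

Building the request separately from sending it makes it easy to check the key and parameters before the service is called.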

Azure ML Free Tier

Azure ML is now available to try free of charge without a subscription or credit card. All you need to get going is a Microsoft account.
