1.0 Machine Learning
1.1 Definition
Machine learning detects patterns in large amounts of data and uses those patterns to predict outcomes when new data arrives.
Machine learning algorithms fall into two broad categories: supervised learning and unsupervised learning.
1.2 Supervised Learning
In supervised learning, the algorithm is given a data set in which the “right answer” is already known. This known data set, called the training data set, includes both input data and the corresponding response values. From it, the supervised learning algorithm seeks to build a model that can predict the response values for a new data set. Supervised learning includes two categories of algorithms: regression and classification.
1.2.1 Regression
Regression is used for continuous-response values, for example, predicting housing prices based on size.
- From an existing data set, we can plot a graph of houses against their respective prices.
- From this data set, we now want to predict the price of a house of 900 square feet.
The algorithm detects the trend in the data and represents it as a straight line, which it then uses to make a forecast, as shown in the figure below.
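The straight-line forecast can be sketched with ordinary least squares, one common way to fit such a line; the text does not say which fitting method the algorithm uses. The (size, price) pairs below are hypothetical and chosen to lie exactly on a line so the prediction for 900 square feet is easy to check by hand.

```python
# Hypothetical (size in square feet, price) pairs; not the data from the figure.
sizes = [600.0, 800.0, 1000.0, 1200.0, 1500.0]
prices = [150000.0, 190000.0, 230000.0, 270000.0, 330000.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x  # the line passes through the point of means
    return slope, intercept


slope, intercept = fit_line(sizes, prices)
predicted = slope * 900 + intercept  # forecast for a 900 sq ft house
print(f"Predicted price for 900 sq ft: {predicted:,.0f}")
```

With real, noisy prices the points would scatter around the fitted line rather than sit on it.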
1.2.2 Classification
Classification is used for categorical response values, where the data can be separated into specific “classes”. Classification predicts a discrete-valued output (for example 0/1 or Yes/No).
Consider a case where we need to determine whether a tumor is malignant or benign based on its size.
From an existing data set, the algorithm relates each tumor's size to its known outcome (malignant or benign).
Now, if we need to forecast whether a tumor of size Z is dangerous, the algorithm determines this as shown in the figure below and finds that it is not harmful.
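One very simple classifier that learns such a rule from labelled sizes is a one-feature threshold classifier (a decision stump). It is swapped in here purely as an illustration, not as the algorithm behind the figure; the tumor sizes, labels and the value used for Z are hypothetical.

```python
# Hypothetical training data: (tumor size in cm, label), where 1 = malignant and 0 = benign.
training = [(0.8, 0), (1.1, 0), (1.5, 0), (2.0, 0), (2.9, 1), (3.4, 1), (4.0, 1), (4.6, 1)]


def fit_threshold(samples):
    """Pick the size threshold that misclassifies the fewest training samples.

    Candidates are midpoints between consecutive sorted sizes; a tumor is
    predicted malignant when its size is at or above the threshold.
    """
    sizes = sorted(size for size, _ in samples)
    candidates = [(a + b) / 2 for a, b in zip(sizes, sizes[1:])]
    best_threshold, best_errors = None, len(samples) + 1
    for t in candidates:
        errors = sum((size >= t) != bool(label) for size, label in samples)
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


def classify(size, threshold):
    """Return 1 (malignant) or 0 (benign) for a tumor of the given size."""
    return 1 if size >= threshold else 0


threshold = fit_threshold(training)
z = 1.7  # hypothetical size of the new tumor, playing the role of "Z"
print("malignant" if classify(z, threshold) else "benign")
```

Real classifiers usually also output a probability rather than only a hard 0/1 answer.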
1.3 Unsupervised learning
In unsupervised machine learning, the data set contains no response values; the algorithm tries to identify structure in the data on its own. The most common unsupervised learning method is cluster analysis, which is used in exploratory data analysis to find hidden patterns or groupings in data.
Examples
- Market Research
Market researchers use cluster analysis to partition the general population of consumers into market segments and to better understand the relationships among different groups of consumers and potential customers. The results support market segmentation, product positioning, new product development and the selection of test markets.
- Social Network Analysis
Clustering may be used to recognize communities within large groups of people.
- Crime Analysis
Cluster analysis can be used to identify areas where there are greater incidences of specific types of crime. By identifying these distinct areas, or "hot spots", where similar crimes have occurred over a period of time, it is possible to manage law enforcement resources more effectively.
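Cluster analysis of the crime-analysis kind can be sketched with k-means, a widely used clustering algorithm; the text does not name a specific algorithm. This version seeds the centroids with a deterministic farthest-point heuristic (an assumption made so the result is reproducible; k-means++ is a common randomized alternative). The incident coordinates are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Start at the first point, then repeatedly add the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iterations=100):
    """Lloyd's k-means on 2-D points; returns (centroids, clusters)."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (an empty cluster keeps its centroid).
        new_centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical incident locations (x, y), e.g. map coordinates of reported crimes.
incidents = [(1.0, 1.2), (1.3, 0.9), (0.8, 1.1), (8.0, 8.2), (8.3, 7.9), (7.9, 8.1)]
centroids, clusters = kmeans(incidents, k=2)
for centroid, members in zip(centroids, clusters):
    print(f"hot spot near ({centroid[0]:.2f}, {centroid[1]:.2f}): {len(members)} incidents")
```

The number of clusters k has to be chosen up front; in practice it is often tuned by trying several values.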
2.0 Introducing Azure Machine Learning (Azure ML)
Azure ML is Microsoft's cloud solution for predictive analytics. Traditionally, predictive analytics requires complex software and high-performance computers that are not accessible to everyone. By using the power of cloud computing, Azure ML provides a fully managed solution for predictive analytics that is accessible to a much broader audience, putting the prediction of future outcomes within reach.
2.1 Creating an Azure ML Workspace
To start using Azure ML, you first need to create a workspace using the following procedure:
- Go to your Azure account and navigate to Machine Learning.
If you do not have an Azure account, you can sign up for a free trial account.
- In Machine Learning, click New > Machine Learning > Quick Create, fill in the required information and click Create.
2.2 Accessing Azure ML Studio
To access Azure ML Studio, go to https://studio.azureml.net/Home/ and sign in with your Microsoft account (formerly Live ID).
3.0 Overview of Azure ML Studio
Azure ML Studio is a browser-based workbench for authoring predictive analytics solutions. It has four sections, as described below.
4.0 Working with Experiments
For this experiment, we will use a fictitious loan data set and try to predict, based on past data, whether someone will be able to repay their loan. The data set is as follows:
4.1 Adding a new data set
To upload a new data set, go to Experiments > New > Data set > from local file.
4.2 Creating a new Experiment
To create a new Experiment, click on New > Experiment.
4.2.1 Working with data sets
Clicking on New experiment will bring you to a new canvas where you can add all the elements needed for your experiment.
- From here, you can drag objects from the left pane and place them on the canvas.
- From the Saved Datasets tab, you can browse both the sample data sets and the data sets you have uploaded.
- For the next example, drag the file loan_hist.csv that we just uploaded and place it on the canvas.
Once you select your data set, you can view its contents by clicking Visualize.
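Part of what Visualize reports, such as the row count and per-column missing and distinct values, can be reproduced locally as a quick sanity check on a CSV file. This is a minimal sketch using Python's standard csv module; the column names and rows below are hypothetical, since the actual contents of loan_hist.csv are not reproduced here.

```python
import csv
import io

# Hypothetical rows in the general shape of loan_hist.csv; the real columns may differ.
SAMPLE_CSV = """Income,Loan Amount,Loan Paid?
52000,10000,Yes
31000,15000,No
,8000,Yes
75000,20000,Yes
"""


def summarize(csv_text):
    """Count rows and, per column, the missing (empty) and distinct non-empty values."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    summary = {}
    for column in reader.fieldnames:
        values = [row[column] for row in rows]
        present = [v for v in values if v != ""]
        summary[column] = {"missing": len(values) - len(present), "unique": len(set(present))}
    return len(rows), summary


row_count, summary = summarize(SAMPLE_CSV)
print("rows:", row_count)
for column, stats in summary.items():
    print(column, stats)
```

To inspect the real file, pass the text of loan_hist.csv instead of SAMPLE_CSV.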
4.2.2 Split the data
The next step is to split the data into "test" and "training" sets.
Select the Split module on the left and drag it onto the canvas. Next, specify what percentage of the data will be used for training and what percentage for testing.
The test data will be used to evaluate the accuracy of the trained model.
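The idea behind the split can be sketched as a seeded random shuffle followed by a cut at the chosen fraction. The 70/30 split, the seed and the function name split_rows are assumptions for illustration; they are not the Split module's own defaults or implementation.

```python
import random


def split_rows(rows, train_fraction=0.7, seed=42):
    """Shuffle a copy of rows with a fixed seed and cut it into (training, test)."""
    shuffled = list(rows)  # copy so the caller's list keeps its order
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


loan_records = list(range(1, 11))  # hypothetical stand-ins for 10 loan records
training, test = split_rows(loan_records, train_fraction=0.7)
print(len(training), "training rows,", len(test), "test rows")
```

Fixing the seed makes the split repeatable, so that different algorithms are compared on the same test rows.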
4.2.3 Train model
The Train Model module is where the “learning” occurs. It takes two inputs: the data set and an algorithm.
Since we need to answer a “two-class question”, which falls under classification, we will use a classification algorithm.
The next step is to configure Train Model to specify which field to "predict".
To do so, click Train Model and then click "Launch column selector" in the pane on the right to select the required field.
In our case, we need to predict the field "Loan Paid?".
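What Train Model does with the selected label column can be sketched with logistic regression, a standard two-class algorithm. It is swapped in here as a representative choice; the text does not specify which classification algorithm the experiment uses. The feature columns "Income" and "Loan Amount" and all row values are hypothetical, and features are standardized so that plain gradient descent converges reliably.

```python
import math

LABEL = "Loan Paid?"                  # the column picked in the column selector
FEATURES = ["Income", "Loan Amount"]  # hypothetical feature columns

# Hypothetical training rows (amounts in tens of thousands of dollars).
training_rows = [
    {"Income": 5.2, "Loan Amount": 1.0, "Loan Paid?": "Yes"},
    {"Income": 3.1, "Loan Amount": 1.5, "Loan Paid?": "No"},
    {"Income": 7.5, "Loan Amount": 2.0, "Loan Paid?": "Yes"},
    {"Income": 2.8, "Loan Amount": 1.8, "Loan Paid?": "No"},
    {"Income": 6.4, "Loan Amount": 1.2, "Loan Paid?": "Yes"},
    {"Income": 3.5, "Loan Amount": 2.5, "Loan Paid?": "No"},
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows, features, label, lr=0.5, epochs=2000):
    """Fit logistic regression by batch gradient descent; returns a function giving P(label == "Yes")."""
    n = len(rows)
    means = [sum(r[f] for r in rows) / n for f in features]
    stds = [math.sqrt(sum((r[f] - m) ** 2 for r in rows) / n) or 1.0
            for f, m in zip(features, means)]
    xs = [[(r[f] - m) / s for f, m, s in zip(features, means, stds)] for r in rows]
    ys = [1.0 if r[label] == "Yes" else 0.0 for r in rows]
    weights, bias = [0.0] * len(features), 0.0
    for _ in range(epochs):
        # Prediction error for every row, then one gradient step on weights and bias.
        errors = [sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
                  for x, y in zip(xs, ys)]
        weights = [w - lr * sum(e * x[j] for e, x in zip(errors, xs)) / n
                   for j, w in enumerate(weights)]
        bias -= lr * sum(errors) / n

    def predict_proba(row):
        x = [(row[f] - m) / s for f, m, s in zip(features, means, stds)]
        return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

    return predict_proba


model = train_logistic(training_rows, FEATURES, LABEL)
print(round(model({"Income": 6.0, "Loan Amount": 1.0}), 3))
```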
4.2.4 Score and Evaluate Model
We'll use the scoring data (the test portion separated out by the Split module) to score our trained models. We can then compare the results of the two models to see which performs better.
The Score Model module takes two inputs: the trained model and the test data.
To evaluate the two scoring results, we'll use the Evaluate Model module.
Evaluate Model can take up to two scored data sets as inputs for comparison.
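The score-then-evaluate flow can be sketched as two small functions: scoring attaches a probability and a 0/1 label to each test row, and evaluation compares those labels with the known answers. The two "trained models" here are hypothetical rule-of-thumb functions standing in for real models, and the 0.5 threshold and metric names are assumptions rather than the modules' exact output.

```python
LABEL = "Loan Paid?"  # 1 = repaid, 0 = not repaid in these hypothetical rows

# Hypothetical test rows.
test_rows = [
    {"Income": 6.0, "Loan Amount": 1.0, LABEL: 1},
    {"Income": 2.5, "Loan Amount": 2.0, LABEL: 0},
    {"Income": 4.5, "Loan Amount": 2.2, LABEL: 1},
    {"Income": 3.0, "Loan Amount": 1.2, LABEL: 0},
]


def model_a(row):
    """Hypothetical model: high income means likely to repay."""
    return 0.9 if row["Income"] > 4.0 else 0.2


def model_b(row):
    """Hypothetical model: small loan means likely to repay."""
    return 0.8 if row["Loan Amount"] < 1.5 else 0.3


def score(model, rows, threshold=0.5):
    """Score Model stand-in: copy each row and add a probability and a 0/1 label."""
    scored = []
    for row in rows:
        p = model(row)
        scored.append({**row, "Scored Probability": p, "Scored Label": int(p >= threshold)})
    return scored


def evaluate(scored):
    """Evaluate Model stand-in: accuracy, precision and recall against the true labels."""
    tp = sum(r["Scored Label"] == 1 and r[LABEL] == 1 for r in scored)
    fp = sum(r["Scored Label"] == 1 and r[LABEL] == 0 for r in scored)
    fn = sum(r["Scored Label"] == 0 and r[LABEL] == 1 for r in scored)
    tn = len(scored) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(scored),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


for name, model in [("A", model_a), ("B", model_b)]:
    print(name, evaluate(score(model, test_rows)))
```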
4.2.5 Adding another algorithm
To compare additional algorithms, we can add them to the experiment. The procedure for adding another algorithm is as follows:
- Copy and paste the existing Train Model and Score Model modules.
- Remove the algorithm connection from the copied Train Model.
- Connect a new predictive algorithm to the new Train Model.
- Connect the new Score Model to the existing Evaluate Model.
4.2.6 Running and evaluating the results
Click Run. Once the run completes, click the output port of Evaluate Model and click Visualize.
The Evaluate Model module produces a pair of curves and metrics that allow you to compare the results of the two scored models. You can view the results as Receiver Operating Characteristic (ROC) curves, Precision/Recall curves or Lift curves. Additional data displayed includes a confusion matrix, cumulative AUC values and other metrics. You can change the threshold value by moving the slider left or right and see how it affects the set of metrics.
By examining these values you can decide which model is closest to giving you the results you're looking for. You can return and iterate on your experiment by changing values in the various models.
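Two of the quantities shown by Evaluate Model, the confusion matrix at a given threshold and the area under the ROC curve (AUC), can be computed directly from scored probabilities. This sketch uses the pairwise definition of AUC (the probability that a randomly chosen positive row scores higher than a randomly chosen negative one, ties counting half); the labels and probabilities are hypothetical.

```python
def confusion_matrix(labels, probabilities, threshold):
    """Return (tp, fp, fn, tn) when rows scoring at or above threshold are predicted positive."""
    tp = fp = fn = tn = 0
    for y, p in zip(labels, probabilities):
        predicted = p >= threshold
        if predicted and y == 1:
            tp += 1
        elif predicted:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def roc_auc(labels, probabilities):
    """AUC as P(random positive outscores random negative); ties count 1/2. O(n^2), fine for small sets."""
    pos = [p for y, p in zip(labels, probabilities) if y == 1]
    neg = [p for y, p in zip(labels, probabilities) if y == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


labels = [1, 1, 0, 1, 0, 0]                     # hypothetical true answers
probabilities = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]  # hypothetical scored probabilities
print("AUC:", roc_auc(labels, probabilities))
# Moving the threshold, like the slider in Evaluate Model, trades false positives for false negatives.
for threshold in (0.3, 0.5, 0.7):
    tp, fp, fn, tn = confusion_matrix(labels, probabilities, threshold)
    print(threshold, "TP", tp, "FP", fp, "FN", fn, "TN", tn)
```

AUC does not depend on the threshold, which is why it is a convenient single number for comparing the two scored models.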
5.0 Publishing and creating a web service
Once we know which algorithm to use, we can put the experiment into production and create a web service so that applications can connect to it and pass data to it.
The following is the procedure to deploy a new web service for our experiment.
- Right-click the model we need and click Save as Trained Model.
- Create a new experiment for publishing.
- Add the Data Source, Train Model and Score Model.
Also add a missing value scrubber to replace all the missing values in the data set.
- Define the input and output ports of Score Model.
- Run the experiment and click Publish Web Service.
The web service is now created.
To place it into production, go to the web service, click on settings and set it as ready for production.
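The missing value scrubber step from the publishing procedure can be sketched as replacing each missing numeric value with a substitute. Mean substitution is an assumed strategy chosen for illustration, and scrub_missing is a hypothetical helper, not the module's own logic; the rows are hypothetical too.

```python
def scrub_missing(rows, columns):
    """Return copies of rows with None in each listed numeric column replaced by that column's mean.

    Mean substitution is an assumed strategy for illustration only.
    """
    cleaned = [dict(row) for row in rows]  # shallow copies: the input rows stay unchanged
    for column in columns:
        present = [row[column] for row in rows if row[column] is not None]
        fill_value = sum(present) / len(present) if present else 0.0
        for row in cleaned:
            if row[column] is None:
                row[column] = fill_value
    return cleaned


# Hypothetical incoming rows (amounts in tens of thousands of dollars).
incoming = [
    {"Income": 5.0, "Loan Amount": 1.0},
    {"Income": None, "Loan Amount": 2.0},
    {"Income": 7.0, "Loan Amount": None},
]
print(scrub_missing(incoming, ["Income", "Loan Amount"]))
```

Other common choices are the median, a fixed custom value, or dropping rows with missing values.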
6.0 Testing and consuming the web service
6.1 Testing
On the web service page, click on the “Test” link as shown below.
Fill in the form with the test values and click OK.
Let's run the same test for someone with a lower income and a higher loan amount.
6.2 Consuming the web service from a client application
Azure ML generates sample code to consume your service from C#, R and Python.
To view the code, click “API help page” on the service's page.
The following changes need to be made to the C# project:
- Replace the API key with your own key.
- Enter the required parameter values.
- Run and verify the outcome.
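The same steps can be sketched in Python with only the standard library: build a JSON POST carrying your API key, send it, and decode the reply. The body shape ("Inputs" containing "input1" with "ColumnNames" and "Values", plus "GlobalParameters") and the Bearer authorization header are assumptions modelled on the classic request/response sample code; check them against your service's API help page. The URL, key and column names are hypothetical placeholders.

```python
import json
import urllib.request

# Placeholders (hypothetical): copy the real request URL and API key from the API help page.
SERVICE_URL = "https://<your-service-endpoint>"
API_KEY = "<your-api-key>"


def build_request(url, api_key, column_names, rows):
    """Build, without sending, a JSON POST in the assumed request/response input shape."""
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": "Bearer " + api_key},
        method="POST",
    )


def call_service(request, timeout=30):
    """Send the request and decode the JSON reply (needs a real URL and key)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical input columns; use the ones your published service actually expects.
request = build_request(SERVICE_URL, API_KEY, ["Income", "Loan Amount"], [["52000", "10000"]])
# result = call_service(request)  # uncomment after filling in SERVICE_URL and API_KEY
```

Keeping request construction separate from sending makes it easy to inspect the payload before calling the live service.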
Azure ML Free Tier
Azure ML is now available to try free of charge without a subscription or credit card; all you need to get started is a Microsoft account.