Introduction
Building a complete machine learning model involves a series of distinct phases. The sequence of these phases is called the Machine Learning Workflow.
Machine Learning
Machine learning (ML) is a tool for turning information into knowledge. It is a type of artificial intelligence (AI) that allows software applications to become more accurate at predicting outcomes without being explicitly programmed to do so. Machine learning algorithms use historical data as input to predict new output values.
Machine Learning Workflow
- Specifying the Problem
- Data Preparation
- Selecting the Algorithm
- Training the Model
- Testing and Evaluating the Model
- Maintenance
Machine Learning Workflow
- Ask the right question
The ML workflow starts with defining a specific question or problem with a clear boundary. The right question tells you which data you need and how to prepare it, which algorithm to consider, how to test the model, and what outcome to expect from it. Some examples:
1. Suppose you need to predict an individual's credit risk based on the information they gave on a credit application. Credit risk assessment is a complex problem, but an ML solution can add a new dimension to the analysis.
2. Suppose you need a solution that predicts which tweets will get retweeted.
- Data preparation
This is the most important phase of a machine learning solution, and it depends entirely on phase 1, i.e. the problem. A precisely defined problem or question tells you which data you need and how to prepare it. Almost 60% of the overall time is spent on data preparation. Data preparation, in general, means transforming raw data into a format that can be modeled using machine learning algorithms. This phase includes a number of sub-steps such as data cleaning, filtering, manipulating, scaling and reduction, sampling, and splitting. The actions carried out for data cleaning or manipulation include adding columns or rows, cleaning missing data, editing metadata, joining data, removing duplicate rows, categorization, and many more. Another important point is that we always split the data into at least two parts, a training dataset and a testing dataset, and this split is also considered part of data preparation.
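A few of these sub-steps can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the column meanings (income, age, a default flag), the choice to drop incomplete rows, and the 25% test fraction are all hypothetical, and a real project would more likely use a data library for this work.

```python
import random

# Hypothetical raw records: (income, age, defaulted). None marks a missing value.
raw_rows = [
    (52000.0, 34, 0),
    (None, 45, 1),
    (61000.0, 29, 0),
    (52000.0, 34, 0),  # exact duplicate of the first row
    (38000.0, 51, 1),
    (45000.0, 40, 0),
    (72000.0, 38, 0),
]


def prepare(rows, test_fraction=0.25, seed=0):
    """Clean, scale, and split rows into (training, testing) lists."""
    # 1. Remove duplicate rows while keeping the original order.
    unique_rows = list(dict.fromkeys(rows))

    # 2. Clean missing data. Here we simply drop incomplete rows
    #    (an assumption; imputing values is another common choice).
    complete = [row for row in unique_rows if None not in row]

    # 3. Min-max scale every feature column to the range [0, 1].
    #    The last column is the label and is left untouched.
    #    Assumes each feature column has at least two distinct values.
    n_features = len(complete[0]) - 1
    lows = [min(row[i] for row in complete) for i in range(n_features)]
    highs = [max(row[i] for row in complete) for i in range(n_features)]
    scaled = [
        tuple((row[i] - lows[i]) / (highs[i] - lows[i]) for i in range(n_features))
        + (row[-1],)
        for row in complete
    ]

    # 4. Shuffle, then split into training and testing parts.
    rng = random.Random(seed)
    rng.shuffle(scaled)
    n_test = max(1, int(len(scaled) * test_fraction))
    return scaled[n_test:], scaled[:n_test]


train_rows, test_rows = prepare(raw_rows)
print(len(train_rows), "training rows,", len(test_rows), "testing rows")
```

The fixed seed makes the shuffle repeatable, so the same rows land in the training and testing sets on every run.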
- Selecting the Algorithm
The choice of algorithm depends entirely on the problem (Phase 1: the question) for which we are designing the ML model. Numerous well-established algorithms are available and ready to apply to machine learning solutions. They are grouped into categories based on the type of problem they solve: anomaly detection, classification, clustering, and regression. Each category contains many mature algorithms. Some examples are Linear Regression, Neural Network Regression, Two-Class Decision Forest, Multiclass Decision Jungle, K-Means Clustering, and PCA-Based Anomaly Detection. When building an ML solution, we do not design or create new algorithms; that is not part of the solution. Instead, we run trials with different established algorithms and pick the one that suits our problem.
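The idea of running trials with several established algorithms can be sketched as follows. The two candidates here (a majority-class baseline and a 1-nearest-neighbour classifier) are simple stand-ins chosen for brevity, not algorithms named in this article, and the tiny datasets are hypothetical.

```python
from collections import Counter


def fit_majority(train):
    """Baseline: always predict the most common training label."""
    most_common = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda x: most_common


def fit_nearest_neighbor(train):
    """1-nearest-neighbour: predict the label of the closest training point."""
    def predict(x):
        _, label = min(train, key=lambda pair: abs(pair[0] - x))
        return label
    return predict


def score(predict, rows):
    """Fraction of rows whose predicted label matches the actual label."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Hypothetical (feature, label) pairs.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.7, 1), (0.9, 1)]
validation = [(0.15, 0), (0.8, 1), (0.85, 1), (0.25, 0)]

# Try every candidate on the same data and keep the best-scoring one.
candidates = {"majority": fit_majority, "nearest_neighbor": fit_nearest_neighbor}
results = {name: score(fit(train), validation) for name, fit in candidates.items()}
best_name = max(results, key=results.get)
print(results, "->", best_name)
```

Keeping the candidates in a dictionary means another algorithm can be added to the trial without changing the comparison code.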
- Training the Model
This stage is also known as the fitting stage: the prepared and formatted data are passed to the selected algorithm to train the model. In other words, the model learns from the prepared training data.
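To make "fitting" concrete, here is a minimal sketch that trains a one-feature linear regression by ordinary least squares. The function name `fit_linear` and the training values are assumptions made for illustration; in practice a library implementation would normally be used.

```python
def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of x-y co-deviations.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data that follows y = 2x + 1 exactly.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

# "Training" means learning the parameters from the data.
slope, intercept = fit_linear(train_x, train_y)


def predict(x):
    """Use the learned parameters to predict a new value."""
    return slope * x + intercept


print(slope, intercept, predict(5.0))
```

The learned slope and intercept are the model: once they are fitted, the training data is no longer needed to make predictions.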
- Testing and Evaluating the Model
In an earlier stage (data preparation), the data were divided into two parts: the training dataset and the testing dataset. In this stage, the testing data are used to score the model and find out how well it performs. The test data are fed into the trained model, and its output is compared with the actual values to determine the accuracy level.
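The comparison of model output against actual values can be sketched like this. The `threshold_model` function is a hypothetical stand-in for a trained model, and the test data are invented for the example; accuracy is only one of several possible evaluation scores.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the actual labels."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def threshold_model(x):
    """Hypothetical stand-in for a trained classifier."""
    return 1 if x >= 0.5 else 0


# Held-out test data that the model did not see during training.
test_features = [0.1, 0.4, 0.6, 0.9, 0.7]
test_labels = [0, 0, 1, 1, 0]

# Feed the test data into the model, then compare with the actual labels.
predictions = [threshold_model(x) for x in test_features]
test_score = accuracy(predictions, test_labels)
print(f"accuracy: {test_score:.2f}")
```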
- Maintenance
This is also one of the crucial parts of maximizing model performance: new or recent data are fed back into the model and passed through all of the phases again.
Most of the phases are repeated depending on the result of testing and evaluation. If the evaluation score is below expectation, the process steps back to algorithm selection and continues with another algorithm. This makes machine learning a continuous process. Sometimes, depending on the evaluation, we may need to jump back as far as the data preparation phase.
Conclusion
The Machine Learning workflow is a set of defined steps carried out in a specific order. It starts with defining the problem and proceeds through data preparation, algorithm selection, model training, and testing and evaluation. The later phases are iterative, depending on the evaluation results. Maintenance also has great significance for machine learning performance.