Spam Detection For Text Messages In ASP.NET Core Using ML.NET

Habibul Rehman
Jul 13, 2020

Problem

This problem centres on developing an ASP.NET Core MVC web application that detects whether a message is spam, based on the message content supplied by the user. To solve it, we will build an ML model that takes one input column, 'Message', and predicts a 'Label' column that tells us whether the message is spam. To train our model, we will use the SMS Spam Collection Dataset downloaded from the UCI ML Repository.

As we are classifying the text messages into one of two categories, we will use binary classification for this problem.

Solution

To solve this problem, we will first build an estimator that defines the ML pipeline we want to use. Then we will train this estimator on existing data, and lastly, we'll consume the model in our ASP.NET Core web application to predict whether a few example messages are spam or not.

Prerequisites

ASP.NET Core (I'm using .NET Core 2.1)
Visual Studio (I'm using VS2019)
Microsoft.ML package (I'm using ML 1.3.1)

Let's start.

Before building our Machine Learning model, we need to set up our ASP.NET Core web application.

First, we will create an ASP.NET Core web application and set up the basic data structures that will be used later by the Machine Learning model.

Open Visual Studio and select ASP.NET Core.
Enter your project name. I'm going to enter Spam Detection and click the "Create" button.
Now, we will select the design pattern that we will use to develop our project. In my case, I'm going to select Web Application (MVC).
We have created our ASP.NET Core MVC web application successfully. Next, we will download the SMS Spam Collection Data Set from the UCI ML Repository and transform the data to match our requirements.
After downloading the data, extract the "zip" file.
We will add a new folder named "Data" and will place the "SMSSpamCollection" dataset file inside this.
There are two columns inside the dataset file: Label and Message. In the original dataset, the Label column has two values, spam and ham. We will replace spam and ham with true and false respectively. Our data will look like this (the first 5 records, shown for illustration).

false Go until jurong point, crazy.. Available only in bugis n great world la e buffet... Cine there got amore wat...
false Ok lar... Joking wif u oni...
true Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005. Text FA to 87121 to receive entry question(std txt rate)T&C's apply 08452810075over18's
false U dun say so early hor... U c already then say...
false Nah I don't think he goes to usf, he lives around here though
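
The label replacement described above can be done by hand in an editor, but it can also be scripted. The following is a minimal sketch, assuming the dataset is a tab-separated file with one "label<TAB>message" record per line. The class name LabelConverter and both method names are hypothetical and are not part of the original project.

```csharp
using System;
using System.IO;
using System.Linq;

public static class LabelConverter
{
    // Rewrites one "label<TAB>message" line, mapping spam -> true and ham -> false.
    public static string ConvertLine(string line)
    {
        // Split only on the first tab so tabs inside the message are preserved.
        var parts = line.Split(new[] { '\t' }, 2);
        if (parts.Length != 2)
        {
            return line; // leave malformed lines untouched
        }

        string label;
        switch (parts[0].Trim().ToLowerInvariant())
        {
            case "spam": label = "true"; break;
            case "ham": label = "false"; break;
            default: label = parts[0]; break; // unknown label: keep as-is
        }
        return label + "\t" + parts[1];
    }

    // Converts a whole file. The two paths are supplied by the caller
    // and must point to different files.
    public static void ConvertFile(string sourcePath, string targetPath)
    {
        File.WriteAllLines(targetPath, File.ReadLines(sourcePath).Select(ConvertLine));
    }
}
```

For example, LabelConverter.ConvertLine("ham\tOk lar... Joking wif u oni...") yields the second record shown above, with "false" in the first column.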

Now, we will implement the basic data structures that will be used to build our ML.NET model. But first, we will install the prerequisite package using the NuGet Package Manager. Open the NuGet Package Manager Console and run "Install-Package Microsoft.ML".
Now, we will create two new classes inside the Data folder, named "SpamInput" and "SpamPrediction" and both classes will be using "Microsoft.ML.Data".
Here is our "SpamInput" class snippet.
    public class SpamInput
    {
        [LoadColumn(0)]
        public string Label { get; set; }
        [LoadColumn(1)]
        public string Message { get; set; }
    }
And, here is our "SpamPrediction" class snippet.
    public class SpamPrediction
    {
        [ColumnName("PredictedLabel")]
        public string isSpam { get; set; }
    }
After implementing the basic data structures, we will move on to building our ML.NET model for SMS spam detection.
First of all, we will set up the MLContext which is a catalog of components in ML.NET.
    var mlContext = new MLContext();
Now, we will specify the schema for spam data and read it into DataView.
    var data = mlContext.Data.LoadFromTextFile<SpamInput>(path: TrainDataPath, hasHeader: true, separatorChar: '\t');
After specifying the data schema, we will configure data processing as a pipeline of data transformations.
    var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")
        .Append(mlContext.Transforms.Text.FeaturizeText("FeaturesText", new Microsoft.ML.Transforms.Text.TextFeaturizingEstimator.Options {
            WordFeatureExtractor = new Microsoft.ML.Transforms.Text.WordBagEstimator.Options { NgramLength = 2, UseAllLengths = true },
            CharFeatureExtractor = new Microsoft.ML.Transforms.Text.WordBagEstimator.Options { NgramLength = 3, UseAllLengths = false },
        }, "Message"))
        .Append(mlContext.Transforms.CopyColumns("Features", "FeaturesText"))
        .Append(mlContext.Transforms.NormalizeLpNorm("Features", "Features"))
        .AppendCacheCheckpoint(mlContext);
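To get a feel for what the NgramLength and UseAllLengths options select, here is a small illustration written against plain .NET collections. It is a simplified sketch, not ML.NET's actual featurizer: it skips the text normalization, tokenization details, and numeric encoding that FeaturizeText performs, and the NgramSketch class is a hypothetical name used only for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NgramSketch
{
    // Returns the n-grams over the given tokens, joined with the separator.
    // When useAllLengths is true, every length from 1 up to ngramLength is
    // produced; otherwise only n-grams of exactly ngramLength are produced.
    public static List<string> Ngrams(
        IReadOnlyList<string> tokens, int ngramLength, bool useAllLengths, string separator)
    {
        var result = new List<string>();
        int minLength = useAllLengths ? 1 : ngramLength;

        for (int n = minLength; n <= ngramLength; n++)
        {
            // Slide a window of size n across the token sequence.
            for (int start = 0; start + n <= tokens.Count; start++)
            {
                result.Add(string.Join(separator, tokens.Skip(start).Take(n)));
            }
        }
        return result;
    }
}
```

With the message "free entry now" split into words, Ngrams(words, 2, true, " ") returns the single words followed by the word pairs "free entry" and "entry now", which mirrors the word extractor settings above. Passing the individual characters with ngramLength 3 and useAllLengths false returns only three-character sequences such as "fre" and "ree", which mirrors the character extractor settings.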
Now, we will set up a training algorithm that will be used to train our ML.NET Model.
    var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(
            mlContext.BinaryClassification.Trainers.AveragedPerceptron(
                labelColumnName: "Label", numberOfIterations: 10, featureColumnName: "Features"),
            labelColumnName: "Label")
        .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
    var trainingPipeline = dataProcessPipeline.Append(trainer);
To train our ML.NET model, we will fit the training pipeline to the data.
    var model = trainingPipeline.Fit(data);
Last but not least, we will use the trained ML.NET model to classify a message.
    var predictor = mlContext.Model.CreatePredictionEngine<SpamInput, SpamPrediction>(model);
    // Predict
    var prediction = predictor.Predict(input);
I've created a new class named "SpamDetectionMLModel" inside the "ML Model" folder. It wraps the ML.NET code shown above, and we will use it as the interface to the model.
Here is how "SpamDetectionMLModel" looks.
    using System;
    using System.IO;
    using Microsoft.ML;
    using Microsoft.ML.Data;
    using Microsoft.ML.Transforms;
    using Spam_Detection.Data; // SpamInput and SpamPrediction

    public class SpamDetectionMLModel
    {
        private static string AppPath => Path.GetDirectoryName(Environment.GetCommandLineArgs()[0]);
        private static string TrainDataPath => Path.Combine(AppPath, "..", "..", "..", "Data", "SMSSpamCollection.csv");
        private MLContext mlContext;
        private ITransformer _model;
        private EstimatorChain<TransformerChain<KeyToValueMappingTransformer>> _trainingPipeline;
        private IDataView _data;

        public SpamDetectionMLModel()
        {
            mlContext = null;
            _model = null;
            _trainingPipeline = null;
            _data = null;
        }

        public void Build()
        {
            // Set up the MLContext, which is a catalog of components in ML.NET.
            mlContext = new MLContext();
            // Specify the schema for spam data and read it into DataView.
            _data = mlContext.Data.LoadFromTextFile<SpamInput>(path: TrainDataPath, hasHeader: true, separatorChar: '\t');
            // Data process configuration with pipeline data transformations
            var dataProcessPipeline = mlContext.Transforms.Conversion.MapValueToKey("Label", "Label")
                .Append(mlContext.Transforms.Text.FeaturizeText("FeaturesText", new Microsoft.ML.Transforms.Text.TextFeaturizingEstimator.Options
                {
                    WordFeatureExtractor = new Microsoft.ML.Transforms.Text.WordBagEstimator.Options { NgramLength = 2, UseAllLengths = true },
                    CharFeatureExtractor = new Microsoft.ML.Transforms.Text.WordBagEstimator.Options { NgramLength = 3, UseAllLengths = false },
                }, "Message"))
                .Append(mlContext.Transforms.CopyColumns("Features", "FeaturesText"))
                .Append(mlContext.Transforms.NormalizeLpNorm("Features", "Features"))
                .AppendCacheCheckpoint(mlContext);
            // Set the training algorithm
            var trainer = mlContext.MulticlassClassification.Trainers.OneVersusAll(mlContext.BinaryClassification.Trainers.AveragedPerceptron(labelColumnName: "Label", numberOfIterations: 10, featureColumnName: "Features"), labelColumnName: "Label")
                .Append(mlContext.Transforms.Conversion.MapKeyToValue("PredictedLabel", "PredictedLabel"));
            _trainingPipeline = dataProcessPipeline.Append(trainer);
        }

        public void Train()
        {
            // Train the model; Build() must have been called first.
            _model = _trainingPipeline.Fit(_data);
        }

        public SpamPrediction Predict(SpamInput input)
        {
            // Create a prediction engine from the trained model; Train() must have been called first.
            var predictor = mlContext.Model.CreatePredictionEngine<SpamInput, SpamPrediction>(_model);
            return predictor.Predict(input);
        }
    }
With the ML.NET model implemented, we will now implement a controller and a view that provide the user interface for consuming our SMS spam detection model.
First, we will create an empty controller named "SpamDetection" and use the following snippet inside it.
    public class SpamDetectionController : Controller
    {
        public IActionResult Predict()
        {
            return View();
        }

        [HttpPost]
        public IActionResult Predict(SpamInput input)
        {
            var model = new SpamDetectionMLModel();
            model.Build();
            model.Train();
            ViewBag.Prediction = model.Predict(input);
            return View();
        }
    }
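Note that the controller above builds and retrains the model on every POST request, which keeps the demo simple but repeats the training work each time. One possible alternative, sketched here under assumptions, is to register a single trained model with ASP.NET Core's built-in dependency injection and have the controller receive it through its constructor. The extension method name AddSpamDetection is hypothetical, the sketch assumes SpamDetectionMLModel is visible from this file's namespace, and it assumes that sharing one trained model across requests is acceptable for your workload. It is a variation on the controller, not part of the original project.

```csharp
using Microsoft.AspNetCore.Mvc;
using Microsoft.Extensions.DependencyInjection;
using Spam_Detection.Data; // SpamInput (namespace taken from the view's @model line)

public static class SpamDetectionServiceExtensions
{
    // Hypothetical helper: call services.AddSpamDetection() from
    // Startup.ConfigureServices. The factory runs once, the first time
    // the model is requested, so training happens a single time.
    public static IServiceCollection AddSpamDetection(this IServiceCollection services)
    {
        return services.AddSingleton(provider =>
        {
            var model = new SpamDetectionMLModel();
            model.Build();
            model.Train();
            return model;
        });
    }
}

public class SpamDetectionController : Controller
{
    private readonly SpamDetectionMLModel _model;

    // The trained model is injected by the framework.
    public SpamDetectionController(SpamDetectionMLModel model)
    {
        _model = model;
    }

    public IActionResult Predict()
    {
        return View();
    }

    [HttpPost]
    public IActionResult Predict(SpamInput input)
    {
        // Reuse the already trained model instead of retraining.
        ViewBag.Prediction = _model.Predict(input);
        return View();
    }
}
```

The trade-off is that the first request after startup pays the full training cost, while later requests only pay for creating a prediction engine and scoring the message.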
After controller implementation, we will implement our View. Create an empty view named "Predict" inside Views>SpamDetection folder. I've kept it simple.
    @model Spam_Detection.Data.SpamInput

    @{
        ViewData["Title"] = "Spam Detection";
    }

    <h1>Spam Prediction for Text Messages</h1>

    <div class="row">
        <div class="col-md-6">
            <form asp-action="Predict">
                <div asp-validation-summary="ModelOnly" class="text-danger"></div>
                <div class="form-group">
                    <label asp-for="Message" class="control-label"></label>
                    <textarea asp-for="Message" class="form-control"></textarea>
                    <span asp-validation-for="Message" class="text-danger"></span>
                </div>
                <div class="form-group">
                    <input type="submit" value="Predict" class="btn btn-primary" />
                </div>
            </form>
            @if (ViewBag.Prediction != null)
            {
                <h4>Is Spam : @ViewBag.Prediction.isSpam</h4>
            }
        </div>
    </div>

    @section Scripts {
        @{await Html.RenderPartialAsync("_ValidationScriptsPartial");}
    }

Conclusion

So we have built the solution to our problem.

We have created an ASP.NET Core MVC web application from the template.
We have downloaded the SMS Spam Collection data set for training our model.
We have implemented the basic data structures used by the ML.NET model.
We have built, trained, and consumed our ML.NET model for the SMS spam detection problem.
Lastly, we have built a user interface that allows users to enter a message and predict whether it is spam or not.

So here is the final look at our solution.
Now press F5 or select Debug > Start Debugging. Our application will start.

Demo

[Screenshot: Spam Detection For Text Messages In ASP.NET Core Using ML.NET]

Note

In this article, we learned how to develop an ASP.NET Core MVC Web Application for Spam Detection for Text Messages and how to build, train, and consume Spam Detection ML.NET Machine Learning model in ASP.NET Core application.

For more information about dataset attributes, please check out UCI ML Repository.

You can download the demo project from my GitHub repository heart disease prediction.