How to remove duplicate words from a string in C#

Introduction

Removing duplicate words from a string is a common programming task in C#. For instance, you may want to remove duplicate words from a user's search query before running the search. There are several ways to accomplish this. In this article, we will explore three methods to remove duplicate words from a string in C#, with examples and explanations.

Methods to remove duplicate words from a string

  • Using Regular Expressions
  • Using Split() and Distinct()
  • Using Dictionary

Method 1. Using Regular Expressions

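Regular expressions are a powerful tool for pattern matching in strings. We can use a regular expression to match and remove duplicate words that appear next to each other, such as "online online". Duplicates separated by other words are not matched by this approach. Here's how: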

using System;
using System.Text.RegularExpressions;

string input = "C# Corner is a popular online online community";

// Collapse each run of adjacent repeated words into a single word.
string output = Regex.Replace(input, @"\b(\w+)(?:\s+\1\b)+", "$1");

Console.WriteLine(output); // C# Corner is a popular online community

  • First, we import the System.Text.RegularExpressions namespace to use regular expressions.
  • Then, we define a string variable input that holds the string we want to remove duplicates from.
  • Next, we use the Regex.Replace() method to match and replace duplicate words in the input string.
  • The regular expression \b(\w+)(?:\s+\1\b)+ matches a word, captured as one or more word characters (\w+), followed one or more times by whitespace (\s+) and the same word again (\1). Repeating the group means a word that occurs three or more times in a row is still reduced to one copy. The \b anchors ensure that each match is a whole word, not just a part of a larger word.
  • Finally, we replace the whole run with a single copy of the word ($1) using the regular expression replacement syntax.
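
To make the behavior of the regular expression approach concrete, here is a minimal sketch. It assumes C# 9 or later (top-level statements), and the helper name CollapseAdjacentRepeats is hypothetical. The pattern uses a repeated group, (?:\s+\1\b)+, so that a run of three or more identical words collapses to one.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: collapses each run of adjacent repeated words to one word.
static string CollapseAdjacentRepeats(string text) =>
    Regex.Replace(text, @"\b(\w+)(?:\s+\1\b)+", "$1");

// A run of three identical words collapses to a single word.
Console.WriteLine(CollapseAdjacentRepeats("online online online community"));
// prints: online community

// Repeats separated by other words are left untouched.
Console.WriteLine(CollapseAdjacentRepeats("popular online popular online"));
// prints: popular online popular online
```

The second call shows the main limitation: words that repeat later in the string are not removed, so the split-based methods are a better fit for that case.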

Method 2. Using Split() and Distinct()

Another way to remove duplicate words from a string in C# is to use the Split() method to split the string into an array of words, then use the Distinct() method to remove duplicates, and finally join the array back into a string. Here's an example:

using System;
using System.Linq; // provides Distinct() and ToArray()

string input = "C# Corner is a popular online community popular online community";
string[] words = input.Split(' ');
string[] distinctWords = words.Distinct().ToArray(); // keeps the first occurrence of each word
string output = string.Join(" ", distinctWords);
Console.WriteLine(output); // C# Corner is a popular online community

  • First, we define a string variable input that holds the string we want to remove duplicates from.
  • Then, we use the Split() method to split the input string into an array of words, using a space character as the separator.
  • Next, we use the Distinct() method to remove duplicates from the array of words.
  • Finally, we join the distinct words back into a string using the string.Join() method, again using a space character as the separator.
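
One caveat with Split(' ') is that it produces empty entries when words are separated by more than one space, and it does not split on tabs or line breaks. The following is a minimal sketch of one way to handle that, assuming C# 9 or later; the helper name RemoveDuplicateWords and the choice of separators are illustrative assumptions.

```csharp
using System;
using System.Linq;

// Hypothetical helper: splits on common whitespace, drops empty entries,
// and keeps the first occurrence of each word.
static string RemoveDuplicateWords(string text)
{
    char[] separators = { ' ', '\t', '\r', '\n' };
    string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    return string.Join(" ", words.Distinct());
}

Console.WriteLine(RemoveDuplicateWords("popular  online   popular online"));
// prints: popular online
```

Because the words are rejoined with a single space, any extra whitespace in the input is normalized as a side effect.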

Method 3. Using Dictionary

We can also use a dictionary to remove duplicate words from a string in C#. Here's how:

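using System;
using System.Collections.Generic; // provides Dictionary<TKey, TValue>

string input = "C# Corner is a popular online community popular online community";
string[] words = input.Split(' ');

// Maps each distinct word to the number of times it occurs.
Dictionary<string, int> dict = new Dictionary<string, int>();

foreach (string word in words)
{
    // Register a word the first time it is seen.
    if (!dict.ContainsKey(word))
    {
        dict.Add(word, 0);
    }

    dict[word]++;
}

// Each word appears exactly once among the keys.
string output = string.Join(" ", dict.Keys);

Console.WriteLine(output);

  • First, we define a string variable input that holds the string we want to remove duplicates from.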
  • Then, we use the Split() method to split the input string into an array of words, using a space character as the separator.
  • Next, we define a dictionary dict that we will use to keep track of the word occurrences.
  • We iterate over each word in the words array using a foreach loop.
  • For each word, we check if it exists in the dictionary using the ContainsKey() method. If it doesn't exist, we add it to the dictionary with an initial count of 0 using the Add() method.
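  • Then, still inside the loop, we increment the count of the word in the dictionary by 1 using the ++ operator.
  • After all the words have been processed, we join the distinct words in the dictionary using the Keys property and the string.Join() method. Note that the documentation does not specify the order in which Dictionary keys are enumerated, so the word order of the output is not guaranteed.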
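
The enumeration order of Dictionary keys is documented as unspecified, so when the original word order matters, a different structure may be safer. The sketch below swaps in a HashSet<string> to track seen words and a List<string> to hold the result in input order. It is not the dictionary method itself, the helper name RemoveDuplicateWordsInOrder is hypothetical, and C# 9 or later is assumed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps the first occurrence of each word, in input order.
static string RemoveDuplicateWordsInOrder(string text)
{
    var seen = new HashSet<string>();
    var result = new List<string>();

    foreach (string word in text.Split(' '))
    {
        // HashSet<T>.Add returns false when the word is already present.
        if (seen.Add(word))
        {
            result.Add(word);
        }
    }

    return string.Join(" ", result);
}

Console.WriteLine(RemoveDuplicateWordsInOrder("C# Corner is a popular online community popular online community"));
// prints: C# Corner is a popular online community
```

The list carries the ordering guarantee here, while the set only answers the question of whether a word has been seen before.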

FAQs

Q- What is the difference between Method 2 and Method 3?
A- Method 2 uses the Split() and Distinct() methods to remove duplicate words, while Method 3 uses a dictionary to keep track of the word occurrences. Method 3 is more flexible and can be easily modified to perform other tasks such as counting the occurrences of each word.
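
As an illustration of that flexibility, the counting variation could look like the following minimal sketch. It assumes C# 9 or later, and the helper name CountWords is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: counts how many times each word occurs.
static Dictionary<string, int> CountWords(string text)
{
    var counts = new Dictionary<string, int>();

    foreach (string word in text.Split(' '))
    {
        // TryGetValue leaves current at 0 when the word is not yet in the dictionary.
        counts.TryGetValue(word, out int current);
        counts[word] = current + 1;
    }

    return counts;
}

Dictionary<string, int> wordCounts = CountWords("popular online community popular online");
Console.WriteLine(wordCounts["popular"]);   // prints: 2
Console.WriteLine(wordCounts["community"]); // prints: 1
```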

Q- Can I use these methods to remove duplicate characters from a string?
A- Not as written, because these methods work on whole words. To remove duplicate characters, you can call Distinct() on the string itself, which treats it as a sequence of characters, or loop over each character in the string and remove duplicates manually.
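
A minimal sketch of the character-level variant, assuming C# 9 or later; the helper name RemoveDuplicateChars is hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical helper: keeps the first occurrence of each character.
static string RemoveDuplicateChars(string text) =>
    new string(text.Distinct().ToArray());

Console.WriteLine(RemoveDuplicateChars("programming"));
// prints: progamin
```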

Q- Are these methods case-sensitive?
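A- Yes, all three methods are case-sensitive by default, so "Online" and "online" count as different words. To make them case-insensitive, you can convert the input to a single case with ToLower() or ToUpper() before processing, although this also changes the casing of the output. Alternatively, you can pass StringComparer.OrdinalIgnoreCase to Distinct() or to the Dictionary constructor, and RegexOptions.IgnoreCase to Regex.Replace(). These options compare words without regard to case while keeping the original casing of the first occurrence.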
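
The following sketch shows two case-insensitive variants that leave the casing of the first occurrence intact, one based on Distinct() and one on Regex.Replace(). It assumes C# 9 or later, and both helper names are hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Split-based approach: compare words without regard to case.
static string RemoveDuplicateWordsIgnoreCase(string text) =>
    string.Join(" ", text.Split(' ').Distinct(StringComparer.OrdinalIgnoreCase));

// Regex-based approach: RegexOptions.IgnoreCase also applies to the \1 backreference.
static string CollapseAdjacentRepeatsIgnoreCase(string text) =>
    Regex.Replace(text, @"\b(\w+)(?:\s+\1\b)+", "$1", RegexOptions.IgnoreCase);

Console.WriteLine(RemoveDuplicateWordsIgnoreCase("Online community online Community"));
// prints: Online community

Console.WriteLine(CollapseAdjacentRepeatsIgnoreCase("Online online community"));
// prints: Online community
```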

