Sometimes we are required to compare strings based on how the string sounds when spoken. For such a task SQL Server provides SOUNDEX and DIFFERENCE function. In this article we will read about both functions.
SOUNDEX
SOUNDEX converts an alphanumeric string to a four-character code that is based on how the string sounds when spoken. The first character of the code is the first character of character_expression, converted to upper case. The second through fourth characters of the code are numbers that represent the letters in the expression. SOUNDEX ignores the vowels(‘A’,’E’,’I’,’O’,’U’) from alphanumeric strings until that are not the first character of the string. If alphanumeric string is small and necessary to generate a four-character code then extra zeroes are added at the end of the string.
Syntax
Example
In the above example we generate SOUNDX code for strings. Now a question arises -- how does SQL Server generate SOUNDEX code for given alpha numeric string? As I described, that first letter of SOUNDEX code is the first letter of the alpha numeric string and the last 3 digits are assigned according this table.
Number |
Represent the Letters |
1 |
B,F,P,V |
2 |
C, G, J, K, Q, S, X, Z |
3 |
D, T |
4 |
L |
5 |
M,N |
6 |
R |
A, E, I, O, U, H, W, and Y letters are ignored from alpha numeric string.
You can find more information about “SOUNDEX system” here. Soundex
For “csharpcorner” we get “C612” code. In this code “C” is for first letter, 6 for “R”, 1 for “P” and 2 for “C”.
DIFFERENCE
The DIFFERENCE function compares two expressions and assigns a value between 0 and 4, with 0 being little to no similarity and 4 representing the same or very similar phrases. This value is derived from the number of characters in the SOUNDEX of each phrase that are the same.
Syntax
Example
Output
In the above example we get the value between 0 to 4 that represents the similarity betwen the SOUNDEX of two words.
Now we have knowledge of soundex but a question arises -- what is the use of soundex or where can we implement it in our project. Generally SOUNDEX is used in a search engine. SOUNDEX is used in FULL-Text search where we want to search similar words.
Let us create a table and insert some data into this.
- DECLARE @tbl_Soundex AS TABLE
- (
- Word varchar(max)
- );
-
- INSERT INTO @tbl_Soundex
- SELECT 'Symth'
- UNION ALL
- SELECT 'Alwar'
- UNION ALL
- SELECT 'csharpcorner'
- UNION ALL
- SELECT 'csharpcor'
- UNION ALL
- SELECT 'seesharpcorner'
- UNION ALL
- SELECT 'Choudhary'
-
- SELECT * FROM @tbl_Soundex ts
Table
Example 1
We have an entry for “Symth” in the table. We have a person who wants to search for “Symth” but he writes “Smith” instead of “Symth”. In this case we try to use the like operator then we don't get any result.
We can solve this problem by using the SOUNDEX operator because SOUNDS of both word (‘Smyth’,’Smith’) are the same.
After using the SOUNDEX function we find a match for ‘Smith’ into the table.
Example 2
My name is “Pankaj” and I used “Choudhary” as my surname. But many time I find that people use “Chaudhary” or “Chowdhry” instead of “Choudhary”. We have a table that contains the following data.
Now we want to search for all persons who use “Choudhary” as a surname. If we use the like operator then we only get a single record.
Above result is not the correct result because there is another person who uses different spellings for “Choudhary” but it sounds the same . In this we can use the SOUNDEX function to find out all similar results.
Conclusion
Use SOUNDEX and DIFFERENCE function when you want to perform searching on the SOUND or pronunciation of the words. Thanks for reading the article.