What is the KMP algorithm?

Jayeskumar M


What is the KMP algorithm for pattern matching?

I see the link I have posted was already shared. Good to know others also found it useful!

Guest User

KMP is an advanced pattern matching algorithm, and I have seen people get confused while learning it. Here is an article about the KMP algorithm that explains the concept and its time complexity in simple terms. I mean, it is simpler than most other KMP resources :)

Guest User

KMP is an advanced algorithm, and a lot of resources about it are available on the internet. I would recommend https://www.w3spot.com/2020/07/kmp-algorithm-explained-in-plain-english.html as a primer. It is about as simple as it gets if you follow the example there.

hantoniv kano

KMP (Knuth Morris Pratt) Pattern Searching
The Naive pattern searching algorithm doesn’t work well in cases where we see many matching characters followed by a mismatching character. Following are some examples.

txt[] = “AAAAAAAAAAAAAAAAAB”
pat[] = “AAAAB”

txt[] = “ABABABCABABABCABABABC”
pat[] = “ABABAC” (not a worst case, but a bad case for Naive)
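To see why the naive approach struggles on inputs like the first example, here is a minimal C++ sketch of the naive matcher that also counts character comparisons. It is only an illustration: the names naiveSearch and NaiveResult are hypothetical and are not part of the program later in this answer. For a text that is a long run of A's ending in B, searched for "AAAAB", every window matches four characters before failing on the fifth, so the total work is about (n - m + 1) * m comparisons, i.e. O(n * m).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of a naive search: every match index plus the number of
// character comparisons performed, so the cost can be inspected.
struct NaiveResult {
    std::vector<std::size_t> matches;
    std::size_t comparisons = 0;
};

// Naive (brute-force) matcher (hypothetical helper for illustration):
// try every window of txt and compare pat left to right, stopping at
// the first mismatch, then slide the window by exactly one position.
NaiveResult naiveSearch(const std::string& txt, const std::string& pat) {
    assert(!pat.empty());  // this sketch assumes a non-empty pattern
    NaiveResult result;
    const std::size_t n = txt.size();
    const std::size_t m = pat.size();
    for (std::size_t s = 0; s + m <= n; ++s) {  // s = window start
        std::size_t j = 0;
        while (j < m) {
            ++result.comparisons;
            if (txt[s + j] != pat[j]) break;  // mismatch: give up on this window
            ++j;
        }
        if (j == m) result.matches.push_back(s);  // whole pattern matched
    }
    return result;
}
```

For instance, with a text of 17 A's followed by a B, naiveSearch(txt, "AAAAB") reports a single match at index 13 after 70 comparisons (14 windows, 5 comparisons each). The matcher never reuses what it learned in one window when it moves to the next, which is exactly the waste KMP removes.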
The KMP matching algorithm uses the degenerating property of the pattern (the same sub-patterns appearing more than once in the pattern) and improves the worst-case complexity of the search to O(n), or O(n + m) including the preprocessing of the pattern, where n is the length of the text and m the length of the pattern. The basic idea behind KMP is: whenever we detect a mismatch (after some matches), we already know some of the characters in the text of the next window. We take advantage of this information to avoid matching characters that we know will match anyway. Let us consider the example below to understand this.

Matching Overview
txt = “AAAAABAAABA”
pat = “AAAA”

We compare the first window of txt with pat
txt = “AAAAABAAABA”
pat = “AAAA” [Initial position]
We find a match. This is the same as naive string matching.

In the next step, we compare the next window of txt with pat.
txt = “AAAAABAAABA”
pat = “AAAA” [Pattern shifted one position]
This is where KMP improves on the naive approach. In this second window, we only compare the fourth A of the pattern with the fourth character of the current text window to decide whether the window matches. Since we already know the first three characters will match, we skip comparing them.
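The walk-through above can be traced with a small C++ sketch. It assumes the lps table for "AAAA" is already known: {0, 1, 2, 3}, because every shorter run of A's is both a prefix and a suffix (the preprocessing part of this answer covers how such a table is computed). traceKMP, TraceResult and Comparison are hypothetical names used only for this illustration. The function logs every (text index, pattern index) comparison, so you can check that the text index never moves backwards.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// One character comparison: (index into txt, index into pat).
using Comparison = std::pair<std::size_t, std::size_t>;

struct TraceResult {
    std::vector<std::size_t> matches;     // start index of each match
    std::vector<Comparison> comparisons;  // every txt[i] vs pat[j] check, in order
};

// KMP search that records each comparison. The lps table is passed in
// precomputed (assumption of this sketch); for pat = "AAAA" it is {0, 1, 2, 3}.
TraceResult traceKMP(const std::string& txt, const std::string& pat,
                     const std::vector<std::size_t>& lps) {
    assert(!pat.empty() && lps.size() == pat.size());
    TraceResult r;
    std::size_t i = 0;  // index into txt; only ever increases
    std::size_t j = 0;  // index into pat; number of characters matched so far
    while (i < txt.size()) {
        r.comparisons.emplace_back(i, j);
        if (txt[i] == pat[j]) {
            ++i;
            ++j;
            if (j == pat.size()) {       // full match ending at txt[i - 1]
                r.matches.push_back(i - j);
                j = lps[j - 1];          // keep the part already known to match
            }
        } else if (j != 0) {
            j = lps[j - 1];              // fall back in pat, not in txt
        } else {
            ++i;                         // nothing matched: move on in txt
        }
    }
    return r;
}
```

On txt = "AAAAABAAABA" with pat = "AAAA", the trace finds matches at indices 0 and 1, and the only comparison made for the second window is txt[4] against pat[3], the single check described above.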

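Need for Preprocessing
An important question arises from the above explanation: how do we know how many characters can be skipped? To find out, we preprocess the pattern and build an integer array lps[] ("longest prefix suffix"), where lps[i] is the length of the longest proper prefix of pat[0..i] that is also a suffix of pat[0..i]. After a mismatch at pat[j] (with j > 0), the search continues from j = lps[j - 1], so lps[] tells us how many characters we can skip.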
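One way to check an lps[] table by hand is to compute it straight from its definition: lps[i] is the length of the longest proper prefix of pat[0..i] that is also a suffix of pat[0..i]. The helper below (lpsByDefinition, a hypothetical name) does exactly that by brute force in O(m^3) time, so it is only a sketch for verifying small examples, not a replacement for the linear-time computeLPSArray in the program that follows.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Brute-force lps straight from the definition (illustrative helper):
// lps[i] = length of the longest proper prefix of pat[0..i]
//          that is also a suffix of pat[0..i].
// Runs in O(m^3), which is fine for checking small patterns by hand.
std::vector<std::size_t> lpsByDefinition(const std::string& pat) {
    std::vector<std::size_t> lps(pat.size(), 0);
    for (std::size_t i = 0; i < pat.size(); ++i) {
        // A proper prefix of pat[0..i] has length at most i; try the longest first.
        for (std::size_t len = i; len > 0; --len) {
            // Compare prefix pat[0..len-1] with suffix pat[i-len+1..i].
            if (pat.compare(0, len, pat, i + 1 - len, len) == 0) {
                lps[i] = len;
                break;
            }
        }
    }
    return lps;
}
```

For "ABABCABAB" (the pattern in the driver program) this gives {0, 0, 1, 2, 0, 1, 2, 3, 4}, and for "AAACAAAA" (the example mentioned in the code comments) it gives {0, 1, 2, 0, 1, 2, 3, 3}.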

// The program can be converted to C# as well.
// C++ program for implementation of KMP pattern searching
// algorithm
#include <cstdio>
#include <cstring>
#include <vector>

void computeLPSArray(const char* pat, int M, int* lps);

// Prints occurrences of pat[] in txt[]
void KMPSearch(const char* pat, const char* txt)
{
    int M = static_cast<int>(strlen(pat));
    int N = static_cast<int>(strlen(txt));
    if (M == 0)
        return; // empty pattern: nothing to search for

    // create lps[] that will hold the longest prefix suffix
    // values for pattern (std::vector instead of a variable-length
    // array, which standard C++ does not allow)
    std::vector<int> lps(M);

    // Preprocess the pattern (calculate lps[] array)
    computeLPSArray(pat, M, lps.data());

    int i = 0; // index for txt[]
    int j = 0; // index for pat[]
    while (i < N) {
        if (pat[j] == txt[i]) {
            j++;
            i++;
        }

        if (j == M) {
            printf("Found pattern at index %d\n", i - j);
            j = lps[j - 1];
        }
        // mismatch after j matches
        else if (i < N && pat[j] != txt[i]) {
            // Do not match lps[0..lps[j-1]] characters,
            // they will match anyway
            if (j != 0)
                j = lps[j - 1];
            else
                i = i + 1;
        }
    }
}

// Fills lps[] for given pattern pat[0..M-1]
void computeLPSArray(const char* pat, int M, int* lps)
{
    // length of the previous longest prefix suffix
    int len = 0;
    lps[0] = 0; // lps[0] is always 0

    // the loop calculates lps[i] for i = 1 to M-1
    int i = 1;
    while (i < M) {
        if (pat[i] == pat[len]) {
            len++;
            lps[i] = len;
            i++;
        }
        else // (pat[i] != pat[len])
        {
            // This is tricky. Consider the example.
            // AAACAAAA and i = 7. The idea is similar
            // to search step.
            if (len != 0) {
                len = lps[len - 1];
                // Also, note that we do not increment
                // i here
            }
            else // if (len == 0)
            {
                lps[i] = 0;
                i++;
            }
        }
    }
}

// Driver program to test above function
int main()
{
    char txt[] = "ABABDABACDABABCABAB";
    char pat[] = "ABABCABAB";
    KMPSearch(pat, txt);
    return 0;
}

Output:
Found pattern at index 10
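If you want the matches back as data rather than printed, the same two steps can be written against std::string and std::vector. This is a sketch with hypothetical names (buildLps and kmpFindAll are not from the code above); the search loop is restructured as a for-loop over the text with an inner while for the fall-backs, a common alternative formulation of the same algorithm.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// lps table, same idea as computeLPSArray above but on std::string.
std::vector<std::size_t> buildLps(const std::string& pat) {
    std::vector<std::size_t> lps(pat.size(), 0);
    std::size_t len = 0;  // length of the current longest prefix suffix
    for (std::size_t i = 1; i < pat.size();) {
        if (pat[i] == pat[len]) {
            ++len;
            lps[i] = len;
            ++i;
        } else if (len != 0) {
            len = lps[len - 1];  // try the next shorter prefix suffix; i stays
        } else {
            lps[i] = 0;
            ++i;
        }
    }
    return lps;
}

// Returns the start index of every (possibly overlapping) occurrence of pat in txt.
std::vector<std::size_t> kmpFindAll(const std::string& txt, const std::string& pat) {
    std::vector<std::size_t> hits;
    if (pat.empty() || pat.size() > txt.size()) return hits;
    const std::vector<std::size_t> lps = buildLps(pat);
    std::size_t j = 0;  // number of pattern characters currently matched
    for (std::size_t i = 0; i < txt.size(); ++i) {
        while (j > 0 && txt[i] != pat[j]) j = lps[j - 1];  // fall back in pat only
        if (txt[i] == pat[j]) ++j;
        if (j == pat.size()) {
            hits.push_back(i + 1 - j);  // match ends at txt[i]
            j = lps[j - 1];             // continue so overlapping matches are found
        }
    }
    return hits;
}
```

With the driver's inputs, kmpFindAll("ABABDABACDABABCABAB", "ABABCABAB") returns {10}, matching the output above, and kmpFindAll("AAAAABAAABA", "AAAA") returns {0, 1}, because resetting j to lps[j - 1] after a match keeps overlapping occurrences.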

What's your opinion?

Archana Parmar

Knuth Morris Pratt (KMP) is an algorithm that checks the characters from left to right. When a pattern has a sub-pattern that appears more than once in the pattern, KMP uses that property to improve the time complexity, even in the worst case.

Roshan Rathod