The prefix function Π for a pattern gives an idea about how the pattern matches against shifts of itself. This information can help in avoiding some useless shifts that are performed in the naive algorithm for string matching. It (Π[k]) is defined as the length of the largest proper prefix of pattern P that is also a proper suffix of P. It enables avoiding backtracking on the text string ‘T’. The algorithm to calculate Π can be found in textbooks or on several websites. Here is the python implementation of the algorithm mentioned in the algorithm book by CLRS. The string matching KMP algorithm is present here.
def calculate_PI(p):
l = len(p)
pi = [0]
k = 0
for i in xrange(1,l):
while(k > 0 and p[k] != p[i]):
k = pi[k-1]
if (p[k] == p[i]):
k = k+1
pi.append(k)
print "pattern is: ", p
print "pi values are: ", pi
str = "civic"
calculate_PI(str)
str = "malyalam"
calculate_PI(str)
str = "ababababca"
calculate_PI(str)
str = "mississippi"
calculate_PI(str)
The output of the above code is as follows…
pattern is: civic
pi values are: [0, 0, 0, 0, 1]
pattern is: malyalam
pi values are: [0, 0, 0, 0, 0, 0, 0, 1]
pattern is: ababababca
pi values are: [0, 0, 1, 2, 3, 4, 5, 6, 0, 1]
pattern is: mississippi
pi values are: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
There is a bug in your code. It loops for string ‘aab’.
Thanks for letting me know. It should be “k = pi[k-1]” in the while loop. Check the modified code.
pattern is: aab
pi values are: [0, 1, 0]