In Rabin-Karp string matching algorithm, both pattern and text are represented in digits and then the remainder is calculated for the text numbers. The calculation of the remainder is performed as follows…
- Starting from the very first position of the text string, a number of the same size as pattern length is taken. If it is the very first number, get the remainder diving by a number (Usually take a prime number between 0 & 20), else do the following to calculate the remainder..
- rem = ((previous remainder – (digit to be dropped * 10 patlen-1) mod prime_num)*10 + next digit to be added) mod prime_num
- e.g. Suppose patlen = 5 and the text is 124327. If we already calculated the remainder for 12432, then digits to be dropped = 1 and next digit to be added =7
Here is the python code to calculate the remainder for the whole text.
def modulo (p,t):
n = len (t) # text length
m = len (p) # pattern length
rem = [] # list of remainders
q = 13 # prime number
for i in xrange (n-m+1):
if (i == 0):
x = int(t[0:m])%q #store as string
rem.append(str(x)) #store as string
else:
x = ((int(rem[i-1]) - (int(t[i-1])*(10**(m-1)))%q)*10 + int(t[i+m-1]))% q
rem.append(str(x))
print "Remainder list: ", rem
pat = "31415"
tex = "2359023141526739921"
modulo(pat,tex)
The output of the above code is as follows…
Remainder list: [‘8’, ‘9’, ‘3’, ’11’, ‘0’, ‘1’, ‘7’, ‘8’, ‘4’, ‘5’, ’10’, ’11’, ‘7’, ‘9’, ’11’]