Data Engineering and Cloud Computing Department
Reva University

Novel Approach for String Searching and Matching using American Standard Code for Information Interchange Value

String Searching and Matching



  • Abstract
  • Introduction
  • Hash Table
  • String Matching
  • Algorithm Design Proposed
  • Algorithm of the Proposed Work
  • Results and Discussion
  • Conclusion
  • Reference

Umma Khatuna Jannat

Abstract

String matching algorithms generally search a database for the search string and find all of its occurrences. This paper introduces a novel approach for string searching and matching to identify the correct occurrence of a given search string. The proposed work is based on calculating the sum of the ASCII values of the characters in the search string and comparing the search string only with the names in the database that have the same ASCII sum. This is implemented using hashing, which limits the search to only a few name strings. After locating the corresponding ASCII value, string matching is done by comparing the first and last characters of the search string with those of the name strings. If they match, two random positions in the search string are also compared. If all four positions match, the whole string is compared; otherwise the string is skipped from further comparisons. This method identifies the search string efficiently and reduces the number of comparisons.
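The ASCII sum described in the abstract can be illustrated with a short Python sketch. The function name `ascii_sum` and the sample names are assumptions made for illustration and do not come from the slides. The example also shows why the character checks are still needed: anagrams produce the same sum.

```python
def ascii_sum(text: str) -> int:
    """Sum of the ASCII codes of the characters in the lowercased text."""
    return sum(ord(ch) for ch in text.lower())


# 'a' = 97, 'm' = 109, 'y' = 121
print(ascii_sum("amy"))  # 327
print(ascii_sum("may"))  # 327, same sum because the letters are the same
```

Because "amy" and "may" share a sum, the sum only narrows the search to a small group of candidates; the character comparisons decide the actual match.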

Introduction

In general, string searching and matching form an important class of algorithms that try to locate the positions at which patterns occur in a larger text or string.

String matching is a basic and important research subject in computer science and plays a crucial role in text processing. String matching algorithms are implemented in a large number of software applications. They find one or all occurrences of a search string, also called a pattern, in an input string. When more than one search string is matched against the text simultaneously, it is called multiple pattern matching; otherwise it is called single pattern matching.

Hash Table

  • A hash table is a data structure that stores elements and allows insertions, lookups, and deletions to be performed.
  • A hash table is an alternative method for representing a dictionary.
  • In a hash table, a hash function is used to map keys into positions in a table. This act is called hashing.

Hash Table Operations

  • Search: compute f(k) and see if a pair exists
  • Insert: compute f(k) and place it in that position
  • Delete: compute f(k) and delete the pair in that position

In the ideal situation, hash table search, insert, or delete takes constant time.
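The three operations can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the presented work: the class name `HashTable`, the table size, and the use of separate chaining to handle collisions are all assumptions.

```python
class HashTable:
    """Minimal hash table; separate chaining is an assumed collision policy."""

    def __init__(self, size: int = 8) -> None:
        self.size = size
        self.buckets = [[] for _ in range(size)]

    def _f(self, key: int) -> int:
        # Hash function f(k): map an integer key to a table position.
        return key % self.size

    def insert(self, key: int, value) -> None:
        bucket = self.buckets[self._f(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:              # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))

    def search(self, key: int):
        for k, v in self.buckets[self._f(key)]:
            if k == key:
                return v
        return None                   # no pair exists for this key

    def delete(self, key: int) -> bool:
        bucket = self.buckets[self._f(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = HashTable()
table.insert(327, "amy")
print(table.search(327))  # amy
print(table.delete(327))  # True
print(table.search(327))  # None
```

With few collisions each bucket holds at most a handful of pairs, so every operation touches only one short list.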


String Matching

String matching algorithms are an important class of string algorithms that try to find a place where one or several strings (also called patterns) are found within a larger string or text.
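As a point of reference, the simplest way to find every place where a pattern occurs is a brute-force scan that tries each starting position in the text. The sketch below is a generic illustration of that idea, not the proposed algorithm; the function name `find_all` is an assumption.

```python
def find_all(pattern: str, text: str) -> list[int]:
    """Return every start index at which pattern occurs in text (brute force)."""
    positions = []
    n, m = len(text), len(pattern)
    if m == 0:
        return positions
    for start in range(n - m + 1):
        # Compare the pattern against the window beginning at `start`.
        if text[start:start + m] == pattern:
            positions.append(start)
    return positions


print(find_all("ana", "bananas"))  # [1, 3]
```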

Algorithm Design Proposed

Figure 1: Work flow diagram

Algorithm of the Proposed Work

1) Start
2) Store the input search string in s
3) S ← s.toLowerCase()
4) Get the ASCII value of each character in the search string and compute their sum.
5) Using the modulo hash function, navigate to the corresponding ASCII index.
6) F ← getFirstcharacter(S)
7) Using the modulo hash function, go to the index F
8) L ← getlastcharacter(S)
9) Match L with the last character of the name string in that block. If true, then do steps 10-12; else skip that string and go on to the next string in that block.
10) Get 2 random positions: M ← getmthcharacter(S), N ← getnthcharacter(S)
11) If S[M] = namestring[M] and S[N] = namestring[N] is true
12) Then compare all places of the string excluding positions F, L, M, and N.
13) Else go to step 8 and continue the process until a match is found or until the end of the block
14) Stop
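The steps above can be sketched in Python roughly as follows. This is an interpretation under stated assumptions rather than the authors' implementation: the table size `TABLE_SIZE`, the nested dictionary used for the ASCII-sum block and the first-character index, the added length check, and the function names are all hypothetical. For simplicity the final step compares the whole string instead of skipping positions F, L, M, and N.

```python
import random
from collections import defaultdict

TABLE_SIZE = 101  # assumed table size; the slides do not specify one


def ascii_sum(text: str) -> int:
    return sum(ord(ch) for ch in text)


def build_index(names):
    """Group lowercased names by (ASCII sum mod TABLE_SIZE), then by first character."""
    index = defaultdict(lambda: defaultdict(list))
    for name in names:
        key = name.lower()
        if key:
            index[ascii_sum(key) % TABLE_SIZE][key[0]].append(key)
    return index


def search(index, query, rng=random):
    """Return (found, number of full-string comparisons performed)."""
    s = query.lower()                                    # step 3
    if not s:
        return False, 0
    block = index.get(ascii_sum(s) % TABLE_SIZE, {})     # steps 4-5
    candidates = block.get(s[0], [])                     # steps 6-7
    full_comparisons = 0
    for name in candidates:
        # Length check is an added guard (assumption) so positions M and N are valid.
        if len(name) != len(s) or name[-1] != s[-1]:     # steps 8-9
            continue
        m, n = rng.randrange(len(s)), rng.randrange(len(s))  # step 10
        if name[m] != s[m] or name[n] != s[n]:           # step 11
            continue
        full_comparisons += 1
        if name == s:                                    # step 12 (whole string)
            return True, full_comparisons
    return False, full_comparisons


index = build_index(["Amy", "May", "Mya", "Bob"])
print(search(index, "May"))  # (True, 1)
print(search(index, "yam"))  # (False, 0)
```

In the usage example, "yam" has the same ASCII sum as the stored names, but no stored name in that block starts with "y", so no full comparison is needed.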


Results and Discussion

This algorithm is implemented using a database with 5500 names. It reduces the number of comparisons by a large amount: only a small fraction of the 5500 strings, 0.218% of them, needs to be compared to find the correct match.
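The reported percentage can be turned back into a string count with a line of arithmetic. The variable names below are illustrative, and the result is derived from the two figures stated on the slide rather than quoted from it.

```python
total_names = 5500
fraction_percent = 0.218

# Number of name strings that 0.218% of 5500 corresponds to.
compared = total_names * fraction_percent / 100
print(round(compared, 2))  # 11.99
```

So the stated percentage corresponds to roughly 12 of the 5500 name strings.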

Figure 2: Performance with various number of name strings

In Figure 3, large, medium, and small represent the length of the search string used, and the chart represents the maximum number of comparisons to be made using the various algorithms.

Figure 3: Boyer Moore, Brute Force and the proposed algorithm based on the number of comparison operations.

Conclusion

In this work, a novel approach for string searching and matching is proposed, based on the American Standard Code for Information Interchange value. The number of comparisons is reduced significantly, to a maximum value of the worst case and the best case. It has a maximum time complexity . In the future, this algorithm can be enhanced by dividing the names in the block into four quarters and performing the match, so that we can reduce the factor .

Reference

[1] A. Hume and D. Sunday, "Fast String Searching," Journal of Software: Practice and Experience, Vol. 21, No. 11, pp. 1221-1248, 1991.
[2] A.M. Alshahrani and M.I. Khalil, "Exact and Like String Matching Algorithm for Web and Networks Security," Computer and Information Technology (WCCIT), pp. 1-4, 2013.
[3] T.H. Cormen, C.E. Leiserson, R.L. Rivest, and C. Stein, "Introduction to Algorithms".
[4] R.S. Boyer and J.S. Moore, "A Fast String Searching Algorithm," Comm. ACM, Vol. 20, No. 10, pp. 762-772, 1977.
[5] D. Knuth, J.H. Morris, and V. Pratt, "Fast Pattern Matching in Strings," SIAM Journal on Computing, Vol. 6, No. 2, pp. 323-350, 1977.
[6] R. Cole, "Tight Bounds on the Complexity of the Boyer-Moore String Matching Algorithm," Proc. ACM-SIAM Symposium on Discrete Algorithms, pp. 224-233, 1991.
[7] Z. Galil, "On Improving the Worst-case Running Time of the Boyer-Moore String Matching Algorithm," Comm. ACM, Vol. 22, No. 9, pp. 505-508, 1979.
[8] V. Gupta, M. Singh, and K.B. Vinod, "Pattern Matching Algorithms for Intrusion Detection and Prevention System: A Comparative Analysis," Intl. Conf. Advances in Computing, Communications and Informatics, pp. 50-54, 2014.

Thank you!