User:Narenkumar123

Minimum Common String Partition Problem

The Minimum Common String Partition problem (MCSP) takes two related strings A and B and asks for a common partition with as few blocks as possible, that is, a way of cutting A into substrings (blocks) that can be rearranged to form B. Two strings are said to be related if every letter occurs the same number of times in A as in B. The k-MCSP problem is the restriction in which each letter occurs at most k times in each string. The problem is studied here through an approximation algorithm: we analyze the greedy algorithm of Chrobak, Kolman and Sgall,[1] whose approximation ratio is O(n^0.69).

Algorithm
1. Initially there are two strings A and B that are related to each other. All of their letters are unmarked, and the block lists P and Q are empty.

2. While there are unmarked letters in A and B, repeat steps 3 to 5.

3. Let S be a longest common substring of A and B that does not contain marked letters.[1]

4. Let S^A and S^B be occurrences of S in A and B respectively.

5. Designate the letters of S^A as a block of P in A and the letters of S^B as a block of Q in B, then mark all letters in S^A and S^B.

6. Output P and Q.
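
The steps above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation from [1]: the function name greedy_partition, the dynamic-programming search for the longest unmarked common substring, and the tie-breaking rule (the first longest match found scanning A from the left) are all assumptions made for this example.

```python
def greedy_partition(a, b):
    """Return (P, Q): the blocks cut from a and from b, in left-to-right order."""
    if sorted(a) != sorted(b):
        raise ValueError("strings are not related")
    n = len(a)
    marked_a = [False] * n
    marked_b = [False] * n
    cuts_a, cuts_b = [], []  # (start, length) of every block

    while not all(marked_a):
        # Longest common substring over unmarked letters only.
        # cur[j + 1] is the length of the longest unmarked common
        # substring ending at a[i] and b[j].
        best_len, best_i, best_j = 0, 0, 0
        prev = [0] * (n + 1)
        for i in range(n):
            cur = [0] * (n + 1)
            if not marked_a[i]:
                for j in range(n):
                    if not marked_b[j] and a[i] == b[j]:
                        cur[j + 1] = prev[j] + 1
                        if cur[j + 1] > best_len:  # ties: keep the first one found
                            best_len = cur[j + 1]
                            best_i = i - best_len + 1
                            best_j = j - best_len + 1
            prev = cur
        # Mark the chosen occurrences S^A and S^B and record them as blocks.
        for k in range(best_len):
            marked_a[best_i + k] = True
            marked_b[best_j + k] = True
        cuts_a.append((best_i, best_len))
        cuts_b.append((best_j, best_len))

    blocks_a = [a[s:s + m] for s, m in sorted(cuts_a)]
    blocks_b = [b[s:s + m] for s, m in sorted(cuts_b)]
    return blocks_a, blocks_b


p, q = greedy_partition("cdabcdabceab", "abceabcdabcd")
print(p)  # ['c', 'd', 'abcdabc', 'e', 'ab']
print(q)  # ['ab', 'c', 'e', 'abcdabc', 'd']
```

Because the strings are related and the same letters are marked in both, an unmarked common substring of length at least one always exists, so the loop terminates. Each round costs O(n^2) time in this sketch, which is chosen for clarity rather than speed.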

Example
For example, if A = cdabcdabceab and B = abceabcdabcd, then GREEDY first marks the common substring abcdabc, then ab, and then the three single-letter substrings c, d and e, so the resulting partition is (c, d, abcdabc, e, ab), (ab, c, e, abcdabc, d), while the optimal partition is (cdabcd, abceab), (abceab, cdabcd). As illustrated by this example, the common partition computed by GREEDY is not necessarily optimal.
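
Whether two block sequences really form a common partition can be checked mechanically. The sketch below is an illustration only; the helper name is_common_partition is hypothetical. It tests that the blocks concatenate to A and to B and that both sides use the same multiset of blocks, and it is applied to a five-block and a two-block partition of the strings cdabcdabceab and abceabcdabcd.

```python
from collections import Counter


def is_common_partition(a, b, blocks_a, blocks_b):
    """True if blocks_a cuts a, blocks_b cuts b, and the blocks match up."""
    cuts_ok = "".join(blocks_a) == a and "".join(blocks_b) == b
    nonempty = all(blocks_a) and all(blocks_b)
    # Counter compares the blocks as multisets, ignoring their order.
    return cuts_ok and nonempty and Counter(blocks_a) == Counter(blocks_b)


a, b = "cdabcdabceab", "abceabcdabcd"
five_blocks = (["c", "d", "abcdabc", "e", "ab"], ["ab", "c", "e", "abcdabc", "d"])
two_blocks = (["cdabcd", "abceab"], ["abceab", "cdabcd"])
print(is_common_partition(a, b, *five_blocks))  # True
print(is_common_partition(a, b, *two_blocks))   # True
```

Both partitions are valid; they differ only in size, which is the quantity MCSP asks to minimize.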

Description
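The MCSP problem is related to sorting by reversals, a problem that arises in the comparison of DNA sequences. The classical edit distance between two strings is the minimum number of insertions, deletions and substitutions required to convert one string into the other. If moving a whole substring is instead counted as a single operation, the number of moves required to convert the string A into B is, up to a constant factor, the number of blocks in a minimum common partition. The problem is NP-hard, and the restricted problems 2-MCSP and 4-MCSP have been approximated with the greedy algorithm. A common partition of the two strings is denoted by ∏, and the blocks of ∏ are the substrings into which A and B are cut; in k-MCSP each letter occurs at most k times in A and in B.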

Applications
The main applications of the string partition problem are DNA sequencing, string matching problems, parallel matching and finding common sequences.

Simple Terms
A common partition corresponds to a bijection from the letters of A to the letters of B that maps every letter of A to an occurrence of the same letter in B. Within a block, the consecutive letters of S^A are mapped to consecutive letters of S^B.

Let ∏ be a minimum common partition of A and B. In the first step, GREEDY is guaranteed to find a substring at least as long as the longest block of ∏. In the second step, GREEDY is no longer guaranteed to find a substring as long as the next longest block of ∏, since this block may overlap the substring S1 that is already marked in A and B. The greedy choices may therefore cut across the blocks of ∏ and damage them, which is why GREEDY does not always produce the optimal solution.

"Let Σ denote the set of all letters that occur in A. A duo is an ordered pair of letters xy ∈ Σ^2 that occur consecutively in A or B (that is, there exists an i such that x = a_i and y = a_{i+1}, or x = b_i and y = b_{i+1}). A specific duo is an occurrence of a duo in A or B."[2] The difference is that a duo is just a pair of letters whereas a specific duo is a pair of letters together with its position.

"A match is a pair (a_i a_{i+1}, b_j b_{j+1}) of specific duos, one from A and the other one from B, such that a_i = b_j and a_{i+1} = b_{j+1}. Two matches (a_i a_{i+1}, b_j b_{j+1}) and (a_k a_{k+1}, b_l b_{l+1}), i ≤ k, are in conflict if either i = k and j ≠ l, or i+1 = k and j+1 ≠ l, or i+1 < k and {j, j+1} ∩ {l, l+1} ≠ ∅".[2]
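
The notions of specific duo, match and conflict can be made concrete with a short sketch. It is an illustration under stated assumptions: positions are 0-indexed, a match is represented simply as a pair (i, j) of duo start positions in A and B, the conflict test follows the usual reading of the definition (equality and inequality of positions, non-empty intersection in the last case), and the function names are hypothetical.

```python
def specific_duos(s):
    """All specific duos of s as (position, two-letter string) pairs."""
    return [(i, s[i:i + 2]) for i in range(len(s) - 1)]


def matches(a, b):
    """All matches (i, j): the duo at position i of a equals the duo at j of b."""
    return [(i, j)
            for i in range(len(a) - 1)
            for j in range(len(b) - 1)
            if a[i:i + 2] == b[j:j + 2]]


def in_conflict(m1, m2):
    """True if two matches cannot both be kept in one common partition."""
    (i, j), (k, l) = sorted([m1, m2])  # ensure i <= k
    if i == k:
        return j != l                   # same duo of A sent to two places
    if i + 1 == k:
        return j + 1 != l               # overlapping duos must stay adjacent
    return bool({j, j + 1} & {l, l + 1})  # disjoint in A, so disjoint in B


print(specific_duos("abab"))            # [(0, 'ab'), (1, 'ba'), (2, 'ab')]
print(in_conflict((0, 0), (0, 2)))      # True
print(in_conflict((0, 0), (1, 1)))      # False
```

Two matches that are not in conflict can be realized by the same letter-to-letter mapping, which is why conflict-free sets of matches are the objects of interest when looking for small partitions.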

Related problems and Algorithms
The restricted problems 2-MCSP and 3-MCSP have also been studied as approximation problems, including with the greedy solution.

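In the 2-MCSP problem the two strings are described in terms of their duos, and each letter appears at most twice in each of the strings A and B. A letter that occurs only once can be handled by replacing it with two copies of itself, so that every letter can be assumed to occur exactly twice.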

For each letter a ∈ Σ, there are exactly two ways to map the two occurrences of a in A onto the two occurrences of a in B: either the first a from A is mapped on the first a in B and the second a from A on the second a in B, or the other way round.

In the first case, we say that a is mapped straight, and in the other case that a is mapped across.
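
Since every letter with two occurrences is mapped either straight or across, a small 2-MCSP instance can be solved exactly by trying every combination. The sketch below is an illustrative brute force and not an algorithm from the cited papers; the name min_partition_2mcsp is hypothetical, letters occurring once are mapped directly, and the running time is exponential in the number of letters that occur twice. It relies on the fact that a letter-to-letter mapping defines a common partition whose blocks end exactly where two neighbouring letters of A are not mapped to neighbouring letters of B.

```python
from itertools import product


def min_partition_2mcsp(a, b):
    """Return (size, choice) for a minimum common partition of a 2-MCSP instance."""
    pos_a, pos_b = {}, {}
    for i, ch in enumerate(a):
        pos_a.setdefault(ch, []).append(i)
    for j, ch in enumerate(b):
        pos_b.setdefault(ch, []).append(j)
    if sorted(a) != sorted(b) or any(len(p) > 2 for p in pos_a.values()):
        raise ValueError("need related strings with at most two occurrences per letter")
    if not a:
        return 0, {}

    doubles = sorted(ch for ch, p in pos_a.items() if len(p) == 2)
    base = [None] * len(a)
    for ch, p in pos_a.items():
        if len(p) == 1:                 # a single occurrence has only one image
            base[p[0]] = pos_b[ch][0]

    best_size, best_choice = None, None
    for choice in product(("straight", "across"), repeat=len(doubles)):
        image = list(base)
        for ch, how in zip(doubles, choice):
            (i1, i2), (j1, j2) = pos_a[ch], pos_b[ch]
            if how == "across":
                j1, j2 = j2, j1
            image[i1], image[i2] = j1, j2
        # A block ends wherever neighbours in a are not neighbours in b.
        breaks = sum(1 for i in range(len(a) - 1) if image[i + 1] != image[i] + 1)
        if best_size is None or breaks + 1 < best_size:
            best_size, best_choice = breaks + 1, dict(zip(doubles, choice))
    return best_size, best_choice


print(min_partition_2mcsp("abcab", "ababc"))
# (2, {'a': 'across', 'b': 'across'})
```

In this usage example both letters are mapped across, which corresponds to the partition (abc, ab) of A and (ab, abc) of B.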

Lower Bounds and Upper Bounds on the Approximation Ratio
The upper bound on the approximation ratio of the algorithm is O(n^0.69). With this bound the number of partitions is a/b, and the reference number of partitions is at least a/b. The lower bound on the approximation ratio is Ω(n^0.43). In the upper bound analysis there are three common substrings, with 5^i as the number of character occurrences in both strings. For 4-MCSP the approximation ratio is bounded from below by Ω(log n), which is obtained by dividing the multisets into separate strings.