User:Tvign

The longest common subsequence problem (LCS) is a computer science problem to find the longest subsequence common to a set of sequences. This paper will address the special problem of comparing the words or whole lines of 2 text files where the goal is to discover how the second file differs from the first. It is useful in comparing versions of text files to determine what changes were made in subsequent versions. We will call this special case of the problem to have a bias for the first sequence.

Using Bias
The problem can be solved in polynomial time with memory for the two sequences and memory for a binding vector for each sequence. The binding vectors are vectors that remember which element of one sequence is common to the other. A matrix of common sequence lengths is not required as in the classical LCS solution.

Although in the classical problem, the LCS is not unique, it is unique in this special case of bias.

Suffixes
Let us define a suffix of a sequence as the sequence with the top cut off. Let X be the sequence {GCAT},  Then there are 4 suffixes, S1, S2, S3, S4 which are subsequences of X where the subscript denotes the number of elements in the subsequence. The possible suffixes of X are
 * S1 = {T}
 * S2 = {AT}
 * S3 = {CAT}
 * S4 = {GCAT} = X

CS function
The CS function is a function that yields a common subsequence, but not necessarily the longest. It is a guess of how the second sequence is a modification of the first. Given two sequences, $$X = (x_1, x_2,...,x_m)$$ and $$Y = (y_1, y_2,...,y_n)$$, their binding vectors, Xbind[1..m] and Ybind[1..n], and their corresponding suffixes, $$X_1, X_2,...,X_m$$ and $$Y_1, Y_2...,Y_n$$ the CS function is given by

CS\left(X_{i},Y_{j}\right) = \begin{cases} \empty & \mbox{ if }\ i = 0 \mbox{ or } j = 0 \\ \textrm{ ( } x_i, CS\left(X_{i-1},Y_{e-1}\right)), Xbind_i=e,Ybind_e=i & \mbox{ if } x_i = y_e,1 \le e \le j \\ \mbox { }CS(X_{i-1},Y_{j}) & \mbox{ if } x_i \notin Y_j \\ \end{cases} $$

LCS function
Now, the longest common subsequence can be found using the CS function and is

LCS\left(X_{i},Y_{j}\right) = \begin{cases} \textrm{ ( }CS\left(X_{i},Y_{j}\right)) & \mbox{ if } CS\left(X_{i},Y_{j}\right)) = CS\left(Y_{j},X_{i}\right)) \\ \mbox { }\left(y_j,LCS(X_{e-1},Y_{j-1})\right) & \mbox{ if } y_j = x_e,Xbind_e \ne j, 1 \le e \le j \\ \end{cases} $$

Example
Given the two sequences, $$Version 1 = (BCDFKLP)$$ and $$Version 2 = (ABDEKN)$$ the longest common sequence is $$(BDK)$$. A Difference Report of the two sequences side by side commenting on the additions, changes and deletions made to the left (bias) sequence, might be

Version 1   Version 2 -   -                A  Added B           B   C  Deleted D           D   F            E  Changed K           K   L            N  Changed P Deleted Now, consider the Versions reversed. $$Version 1 = (ABDEKN)$$ and $$Version 2 = (BCDFKLP)$$. Although the same longest common sequence is found, the additions, changes and deletions are different since the bias has changed.

Version 1   Version 2 -   -   A  Deleted B           B                C  Added D           D   F            E  Changed K           K   L            N  Changed P Added

Non-Recursive Code for LCS with Bias
function forwardTrace( X[1..m], Y[1..n]) )  array Xbind[1..m], Ybind[1..n]   Xbind, Ybind := Zero Vector   mtop, ntop := 1   j := 1   while j ≤ n      j, j0 := ntop      Find some CS(X,Y) scanning the first sequence X      for i := mtop..m         j := j0         while j ≤ n and Y[j] ≠ X[i]            j := j+1         end         if j ≤ n            Xbind[i] := j            Ybind[j] := i            j0 := j+1         else            Xbind[i] := 0         end       next i      i, i0 := mtop      Scan the second sequence to see if CS(Y,X) = CS(X,Y), i.e., symmetrical      isSymmetrical := true       j = ntop      while j ≤ n and isSymmetrical         i := i0         while i ≤ m and X[i] ≠ Y[j]            i := i+1         end         if i ≤ m            if Xbind[i] = 0 or ( Ybind[j] > 0 and Xbind(Ybind[j]) ≠ j ) CS(Y,X) ≠ CS(X,Y) at this j so… erase all bindings from here down for Ybind for j2 := j+1..n                 if Ybind[j2] > 0 Xbind[Ybind[j2]] := 0 Ybind[j2] := 0 end next j2              Set new bindings Ybind[j] := i              Xbind[Ybind[j]] := i               mtop := i+1 ntop := j+1 isSymmetrical := false end end j := j+1 end end return {Xbind, Ybind} ''For languages that cannot return sets, include the arrays as parameters by reference. ''

Print the Differences
This function will print the two sequences side by side commenting on the additions, changes and deletions made to the left (first) sequence, the bias sequence. Appropriate column formatting may be added.

function PrintDiff( X[1..m], Y[1..n], Xbind[1..m], Ybind[1..n] ) i, j := 1 while i ≤ m and j ≤ m     if Xbind[i] = 0 and Ybind[j] > 0 Print X[i] + " Deleted" i := i+1 else if Xbind[i] > 0 and Ybind[y] = 0 Print Space(10) + Y[j] + " Added" j := j+1 else if Xbind[i] = 0 and Ybind[j] = 0 Print X[i] + " " + Y[j] + " Changed" i := i+1 j := j+1 else Print X[i] + " " + Y[j] i := i+1 j := j+1 end end if i ≤ m     for i := i..m         Print X[i] + " Deleted" end end if j ≤ n     for j := j..n         Print Space(10) + Y[j] + " Added" end end return 0