Longest sequences

These are dynamic programming problems in which we look for common sequences of subelements shared by two or more elements, most commonly characters within strings.

Longest common substring

Given two strings $a_{1}$, $a_{2}$, return the longest substring present in both.

Solution

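We’ll first solve the longest common suffix problem. Let $a_{1}, a_{2}$ be any strings such that $a_{1} y$ and $a_{2} y$ have the longest common suffix $y$. Now, what is the longest common suffix of $a_{1} y b_{1}$ and $a_{2} y b_{2}$, where $b_{1}, b_{2}$ are single characters?

If $b_{1} = b_{2}$, then $y b_{1} = y b_{2}$ is the longest common suffix: it grows by one character.
Otherwise, the two strings have no common suffix at all (its length drops to 0), although $y$ remains a common substring found earlier. Let’s define the elements of our table.
A two-dimensional table $T$, where $i$ indexes the first string and $j$ the second, and $T[i, j]$ is the length of the longest common suffix of their prefixes of lengths $i$ and $j$: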

$$T[i, j] = \begin{cases} T[i - 1, j - 1] + 1 & \text{if } x_{i} = y_{j} \\ 0 & \text{otherwise} \end{cases}$$

Pseudocode

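def lcsubstring(x, y):
    # dp[i][j] = length of the longest common suffix of x[:i] and y[:j]
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    maximum = 0
    maxpos = (0, 0)  # (i, j) where the best common suffix ends

    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # matching characters extend the common suffix by one
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # mismatched last characters: no common suffix
                dp[i][j] = 0
            if maximum < dp[i][j]:
                maximum = dp[i][j]
                maxpos = (i, j)

    # the best match is the `maximum` characters of x ending just before maxpos[0]
    end = maxpos[0]
    return x[end - maximum:end]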

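Thinking in terms of suffixes is easier than thinking in terms of substrings. The table stores common-suffix lengths, and every common substring is a common suffix of some pair of prefixes, so the largest entry in the table (tracked as maximum, ending at maxpos) is the length of the longest common substring.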
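To make the reduction concrete, here is a deliberately slow sketch that takes the longest common suffix over every pair of prefixes directly, without the table. The names common_suffix_len and lcsubstring_by_suffixes are hypothetical helpers for illustration; it is meant as a cross-check of the idea, not as a replacement for the table version.

```python
def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix of a and b, by direct comparison."""
    k = 0
    while k < len(a) and k < len(b) and a[-1 - k] == b[-1 - k]:
        k += 1
    return k


def lcsubstring_by_suffixes(x: str, y: str) -> str:
    """Longest common substring as the longest common suffix over all prefix pairs.

    Every prefix pair is scanned from scratch, so this is much slower than the
    table version; it only illustrates why the suffix view works.
    """
    best_len, best_end = 0, 0
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            k = common_suffix_len(x[:i], y[:j])
            if k > best_len:
                best_len, best_end = k, i  # match ends just before index i of x
    return x[best_end - best_len:best_end]


print(lcsubstring_by_suffixes("xabcdy", "zabcdw"))  # abcd
```

The table version computes the same quantity, but reuses $T[i - 1, j - 1]$ instead of rescanning each suffix.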

Longest common subsequence

Take the example $x =$ “abcbdab”, $y =$ “bdcaba”. Subsequences need not be contiguous, so “bcba” and “bdab” are both common subsequences of this pair, and both are longest. Let $X = x_{1}, .., x_{m}$, $Y = y_{1}, .., y_{n}$, and let $Z = z_{1}, .., z_{k}$ be a longest common subsequence of $X$ and $Y$. If $x_{m} = y_{n}$, then $z_{k} = x_{m} = y_{n}$ and $z_{1}, .., z_{k - 1}$ is a longest common subsequence of $x_{1}, .., x_{m - 1}$ and $y_{1}, .., y_{n - 1}$. If $x_{m} \neq y_{n}$, then $z_{k} \neq x_{m}$ or $z_{k} \neq y_{n}$; in the first case $Z$ is still a longest common subsequence of $x_{1}, .., x_{m - 1}$ and $Y$, and in the second of $X$ and $y_{1}, .., y_{n - 1}$.

Solution

Let’s define $T[i, j]$ as the length of the longest common subsequence of $x_{1}, .., x_{i}$ and $y_{1}, .., y_{j}$. If $i$ or $j$ is zero, one prefix is empty, so the only common subsequence is the empty one and $T[i, j] = 0$. If the two last letters are equal, that letter ends a longest common subsequence, so we take the LCS of both prefixes without it and add one. Otherwise, the LCS is the larger of the two LCSs obtained by dropping the last character of one string or the other.

$$T[i, j] = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0 \\ T[i - 1, j - 1] + 1 & \text{if } x_{i} = y_{j} \\ \max(T[i - 1, j], T[i, j - 1]) & \text{if } x_{i} \neq y_{j} \end{cases}$$

We take the max because, when the last letters differ, at most one of them can end the LCS, so we drop one of them and keep whichever choice gives the longer result.
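The recurrence can also be evaluated top-down, recursing on the last characters exactly as in the case analysis and memoizing each $(i, j)$ pair. This is a sketch using functools.lru_cache; lcs_length_topdown and solve are hypothetical names, not from the original.

```python
from functools import lru_cache


def lcs_length_topdown(x: str, y: str) -> int:
    """LCS length via the last-character case analysis, memoized."""

    @lru_cache(maxsize=None)
    def solve(i: int, j: int) -> int:
        # solve(i, j) = T[i, j] = LCS length of x[:i] and y[:j]
        if i == 0 or j == 0:
            return 0  # an empty prefix has only the empty subsequence
        if x[i - 1] == y[j - 1]:
            # equal last letters: keep it and recurse on both shorter prefixes
            return solve(i - 1, j - 1) + 1
        # different last letters: drop one of them, keep the better result
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(x), len(y))


print(lcs_length_topdown("abcbdab", "bdcaba"))  # 4
```

Each of the $(|x| + 1)(|y| + 1)$ states is computed once, but the recursion can go about $|x| + |y|$ calls deep, so Python's default recursion limit makes the bottom-up table the safer choice for long strings.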

Pseudocode

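def lcs(x, y):
    # dp[i][j] = LCS length of x[:i] and y[:j]; row 0 and column 0 stay 0
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]

    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # equal last letters extend the LCS of both shorter prefixes
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                # otherwise drop the last letter of one string or the other
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    return dp[len(x)][len(y)]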

The runtime is $O(nm)$, with constant work per table cell, where $n = |x|$ and $m = |y|$.
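The table only gives the length, but an actual subsequence such as “bcba” can be recovered by walking back from the bottom-right cell: a matching pair of letters contributes that letter and moves diagonally, otherwise we move toward the neighbour holding the larger value. A minimal sketch; lcs_string is a hypothetical name, and on ties it prefers dropping a letter of $x$, so other tie-breaking rules may return a different LCS of the same length.

```python
def lcs_string(x: str, y: str) -> str:
    """Build the LCS table, then walk back from dp[m][n] to recover one LCS."""
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])

    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])  # this letter is part of the LCS
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1  # dropping x's last letter keeps the optimum (ties go here)
        else:
            j -= 1  # dropping y's last letter keeps the optimum
    return "".join(reversed(out))  # letters were collected back to front


print(lcs_string("abcbdab", "bdcaba"))  # bcba
```

The walk takes at most $n + m$ steps, so recovering the string does not change the $O(nm)$ total.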

Longest palindromic subsequence

Suppose the input is a single string $x = a_{1}, .., a_{n}$ and we want its longest palindromic subsequence. Its length equals that of $\mathrm{lcs}(x, x^{R})$, where $x^{R}$ is $x$ reversed.
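A short sketch of that reduction, reusing a standard bottom-up LCS table; lcs_length and lps_length_via_lcs are hypothetical names. Only the length carries over: a particular LCS of $x$ and $x^{R}$ is not guaranteed to be a palindrome itself, which is one reason to use the direct interval table below when the palindrome is needed.

```python
def lcs_length(x: str, y: str) -> int:
    """Bottom-up LCS length, as in the LCS section."""
    dp = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(x)][len(y)]


def lps_length_via_lcs(x: str) -> int:
    # LPS length of x equals the LCS length of x and its reverse
    return lcs_length(x, x[::-1])


print(lps_length_via_lcs("bbbab"))  # 4 ("bbbb")
```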

Solution

We can construct an $n \times n$ table $dp$ where $dp[i][j]$ is the length of the LPS of $a_{i}, .., a_{j}$. The base cases are that an empty range has an LPS of length 0 and a single character is a palindrome, so $dp[i][i] = 1$ for all $i$. If $a_{i} = a_{j}$, these two characters can be the ends of a palindrome wrapped around the LPS of $a_{i + 1}, .., a_{j - 1}$; otherwise at least one of them is unused, so we drop one end and keep the better result. This gives the recurrence:

$$T[i, j] = \begin{cases} 2 + T[i + 1, j - 1] & \text{if } a_{i} = a_{j} \\ \max(T[i + 1, j], T[i, j - 1]) & \text{if } a_{i} \neq a_{j} \end{cases}$$

Pseudocode

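def lps(x):
    n = len(x)
    if n == 0:
        return 0
    # dp[i][j] = LPS length of x[i..j] (inclusive); entries with j < i stay 0
    dp = [[0] * n for _ in range(n)]

    for i in range(n):
        dp[i][i] = 1  # a single character is a palindrome

    # s = j - i: shorter ranges are filled before the longer ones that use them
    for s in range(1, n):
        for i in range(n - s):
            j = i + s
            if x[i] == x[j]:
                dp[i][j] = 2 + dp[i + 1][j - 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])

    return dp[0][n - 1]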

The runtime is $O (n^{2})$ because it’s a two-dimensional $n \times n$ table.
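As with LCS, the palindrome itself can be read off the filled table: starting from the full range, equal end characters are both kept, otherwise we shrink toward the side with the larger value, and a single leftover character becomes the centre. A sketch under those assumptions; lps_string is a hypothetical name, and when several palindromes share the maximum length it returns just one of them.

```python
def lps_string(x: str) -> str:
    """Fill the interval table, then walk it from (0, n - 1) to rebuild one LPS."""
    n = len(x)
    if n == 0:
        return ""
    dp = [[0] * n for _ in range(n)]
    for i in range(n):
        dp[i][i] = 1
    for s in range(1, n):
        for i in range(n - s):
            j = i + s
            if x[i] == x[j]:
                dp[i][j] = 2 + dp[i + 1][j - 1]
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j - 1])

    left = []    # outer-to-inner characters of the left half
    middle = ""  # centre character, only for odd-length results
    i, j = 0, n - 1
    while i <= j:
        if i == j:
            middle = x[i]
            break
        if x[i] == x[j]:
            left.append(x[i])  # both ends belong to the palindrome
            i += 1
            j -= 1
        elif dp[i + 1][j] >= dp[i][j - 1]:
            i += 1
        else:
            j -= 1
    half = "".join(left)
    return half + middle + half[::-1]


print(lps_string("bbbab"))  # bbbb
```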

Raquent.in