General and affine gap penalties
Lecturer: Alexei Drummond

Global alignment using general gap scores

  • We can also easily modify the Needleman–Wunsch algorithm to compute optimal alignments with general gap scoring.
  • Let $s(x_i , y_i)$ be the substitution score for characters $x_i$ and $y_i$.
  • If $x_1, x_2, \dots , x_n$ and $y_1, y_2, \dots, y_m$ are two strings then we can compute the best alignment $F(n,m)$ using:
\begin{align*} F(i,j) = \max\left\{\begin{array}{l} F(i-1,j-1) + s(x_i,y_j)\\ F(i,j-k) + \gamma(k), \; k= 1,\dots,j\\ F(i-k,j) + \gamma(k), \; k= 1,\dots,i \end{array}\right. \end{align*}
$$F(k,0) = F(0,k) = \gamma(k)$$

But, the running time is now cubic: $\Theta(nm(n + m))$ !

Global alignment using affine gap scores

Affine gap scores are a specific gap score function: $\gamma(k) = -d - (k-1)e$

Global alignment using affine gap scores

  • Let $M(i,j)$ be the best score under the constraint that $x_i$ aligned to $y_j$.
  • Let $I_x(i, j)$ be the best score with $x_i$ aligned to a gap.
  • Let $I_y(i, j)$ be the best score with $y_j$ aligned to a gap.

We get a quadradic-time algorithm by introducing a constant factor more subproblems.

i.e, 3 tables of size $O(mn)$ $0 \le i \le n$ and $0 \le j \le m$.

Global alignment using affine gap scores

An efficient set of recursions to compute affine gap scores for strings $x_1,x_2,\dots,x_n$ and $y_1,y_2,\dots,y_m$ is now $\max\{M(n,m),I_x(n,m),I_y(n,m)\}$ where:

\begin{align*} M(i,j) = \max\left\{\begin{array}{l} M(i-1,j-1) + s(x_i,y_j)\\ I_x(i-1,j-1) + s(x_i,y_j)\\ I_y(i-1,j-1) + s(x_i,y_j) \end{array}\right. \end{align*}
\begin{align*} I_x(i,j) = \max\left\{\begin{array}{l} M(i-1,j) - d\\ I_x(i-1,j) - e \end{array}\right. \end{align*}
\begin{align*} I_y(i,j) = \max\left\{\begin{array}{l} M(i,j-1) - d\\ I_y(i,j-1) - e \end{array}\right. \end{align*}
$$I_x(k,0) = I_y(0,k) = \gamma(k) = -d - (k-1)e,\;\; k>0$$