Given sequences $X^{(1)},\ldots,X^{(N)}$ of lengths $n_1,\ldots,n_N$, seek $A^{(1)},\ldots,A^{(N)}$ of length $n\geq\max\{n_i\}$ such that
A-CTCAT A-GTC-T ACGTC-T
Thought to be the most biologically appropriate, but
Sum of pairs is almost always used in practice.
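As a rough illustration of sum-of-pairs scoring, the sketch below adds up a pairwise score over every column and every pair of rows. The match, mismatch and gap values are placeholder assumptions, not the scoring scheme used in these notes, and gap-gap pairs are scored as zero by convention.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned characters (illustrative values only)."""
    if a == "-" and b == "-":
        return 0                      # gap-gap pairs contribute nothing
    if a == "-" or b == "-":
        return gap                    # residue aligned against a gap
    return match if a == b else mismatch

def sum_of_pairs(alignment, **scoring):
    """Sum pair_score over all columns and all pairs of rows."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return sum(
        pair_score(a, b, **scoring)
        for column in zip(*alignment)           # walk the alignment column by column
        for a, b in combinations(column, 2)     # every unordered pair of rows
    )

# The three-row alignment shown earlier in this section.
alignment = ["A-CTCAT", "A-GTC-T", "ACGTC-T"]
print(sum_of_pairs(alignment))  # 3 with the placeholder scores above
```

With $N$ rows each column contributes $N(N-1)/2$ pair scores, so the cost of scoring a fixed alignment grows quadratically in the number of sequences.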
Define $F(i_1,i_2,\ldots,i_N)$ to be the score of the best alignment up to the subsequences ending in $x^{(1)}_{i_1}, x^{(2)}_{i_2}, \ldots, x^{(N)}_{i_N}$.
We can then find the following recurrence relation:
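A minimal sketch of such a recurrence, under the assumption that each column is scored by sum of pairs with a linear gap penalty (the scoring values are placeholders): every cell $F(i_1,\ldots,i_N)$ is the best over the $2^N-1$ ways of choosing which sequences contribute a residue to the final column, with the remaining sequences contributing a gap.

```python
from itertools import combinations, product

def column_score(column, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of a single alignment column (illustrative values)."""
    total = 0
    for a, b in combinations(column, 2):
        if a == "-" and b == "-":
            continue
        total += gap if "-" in (a, b) else (match if a == b else mismatch)
    return total

def msa_score(seqs):
    """Optimal global alignment score via exhaustive N-dimensional DP."""
    n = len(seqs)
    # Each move says which sequences advance by one residue (1) or take a gap (0).
    moves = [d for d in product((0, 1), repeat=n) if any(d)]
    F = {(0,) * n: 0}
    # product() yields index tuples in lexicographic order, so every
    # predecessor idx - move has been filled in before idx is visited.
    for idx in product(*(range(len(s) + 1) for s in seqs)):
        if not any(idx):
            continue
        best = None
        for d in moves:
            prev = tuple(i - k for i, k in zip(idx, d))
            if min(prev) < 0:
                continue                          # move would step off the table
            column = [s[i - 1] if k else "-" for s, i, k in zip(seqs, idx, d)]
            cand = F[prev] + column_score(column)
            if best is None or cand > best:
                best = cand
        F[idx] = best
    return F[tuple(len(s) for s in seqs)]

print(msa_score(["ACTCAT", "AGTCT", "ACGTCT"]))
```

The table has $\prod_i (n_i+1)$ cells and each cell examines $2^N-1$ predecessors, which is why this approach is only feasible for a handful of short sequences.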
Even aside from the time cost, the space requirement for this algorithm is prohibitive. Storing $F$ for an alignment of five sequences of 100 characters each requires $101^5\simeq 10^{10}$ numbers. Assuming 32-bit integers, this equates to $\sim$39 GB of memory!
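The memory figure can be checked with a few lines of arithmetic; the only assumption is 4 bytes per table entry, with gigabytes taken as $2^{30}$ bytes.

```python
cells = 101 ** 5                   # (100 + 1)^5 entries in the table F
bytes_needed = cells * 4           # 4 bytes per 32-bit integer
gib = bytes_needed / 2 ** 30       # convert bytes to binary gigabytes

print(cells)                       # 10510100501
print(round(gib, 2))               # 39.15
```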
A big improvement, but still impractical for most alignment problems.
Really need approximation methods.
Different techniques:
Decisions:
Distance matrix:
\begin{equation*} d = \begin{array}{ccccc} & A & B & C & D \\ A & - & 4 & 8 & 8 \\ B & & - & 8 & 8 \\ C & & & - & 6 \\ D & & & & - \end{array} \end{equation*}
Distance matrix:
\begin{equation*} d = \begin{array}{cccc} & E & C & D \\ E & - & 8 & 8 \\ C & & - & 6 \\ D & & & - \end{array} \end{equation*}
Distance matrix:
\begin{equation*} d = \begin{array}{ccc} & E & F \\ E & - & 8 \\ F & & - \end{array} \end{equation*}
Progressive alignment algorithm published in 1987 (Feng and Doolittle, J. Mol. Evol.).
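Returning to the distance matrices above: the sequence of reductions (A and B merged into E, then C and D merged into F) is consistent with average-linkage clustering such as UPGMA. The notes do not state which linkage rule was used, so that choice is an assumption of this sketch, as are the function name and the bracketed cluster labels.

```python
def upgma_merge_order(names, dist):
    """Repeatedly merge the closest pair of clusters (average linkage).

    `dist` maps frozenset({a, b}) -> distance between clusters a and b.
    Returns the merges in order as (left, right, distance) tuples.
    """
    size = {name: 1 for name in names}   # number of leaves in each cluster
    dist = dict(dist)                    # work on a copy
    merges = []
    while len(size) > 1:
        # Closest pair; ties are broken alphabetically so the result is deterministic.
        pair = min(dist, key=lambda p: (dist[p], sorted(p)))
        a, b = sorted(pair)
        new = f"({a},{b})"
        merges.append((a, b, dist[pair]))
        for c in size:
            if c in (a, b):
                continue
            dac = dist.pop(frozenset((a, c)))
            dbc = dist.pop(frozenset((b, c)))
            # Size-weighted average of the distances to the two merged clusters.
            dist[frozenset((new, c))] = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
        del dist[pair]
        size[new] = size.pop(a) + size.pop(b)
    return merges

d = {
    frozenset("AB"): 4, frozenset("AC"): 8, frozenset("AD"): 8,
    frozenset("BC"): 8, frozenset("BD"): 8, frozenset("CD"): 6,
}
for step in upgma_merge_order("ABCD", d):
    print(step)
```

Here `(A,B)` plays the role of E and `(C,D)` the role of F, and the final merge joins them at distance 8, matching the last matrix.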
Overview:
Employs many other heuristics.
I.e. "hill climbing": make a small change to the current solution, keep it if the score improves, and repeat until converging to a local optimum.
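A toy sketch of the hill-climbing idea, assuming a sum-of-pairs objective with placeholder scores and a deliberately simple move (slide one gap one position sideways within a row). This is not the Barton-Sternberg procedure, only an illustration of accepting small changes that improve the score until none remain.

```python
from itertools import combinations

def sp_score(rows, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment (illustrative scoring values)."""
    total = 0
    for col in zip(*rows):
        for a, b in combinations(col, 2):
            if a == "-" and b == "-":
                continue
            total += gap if "-" in (a, b) else (match if a == b else mismatch)
    return total

def neighbours(rows):
    """Yield alignments that differ by sliding one gap one position sideways."""
    for r, row in enumerate(rows):
        for j in range(len(row) - 1):
            if (row[j] == "-") != (row[j + 1] == "-"):   # exactly one is a gap
                swapped = row[:j] + row[j + 1] + row[j] + row[j + 2:]
                yield rows[:r] + [swapped] + rows[r + 1:]

def hill_climb(rows):
    """Accept the first improving neighbour until no neighbour improves."""
    rows = list(rows)
    score = sp_score(rows)
    improved = True
    while improved:
        improved = False
        for cand in neighbours(rows):
            s = sp_score(cand)
            if s > score:                 # strictly better: accept and restart
                rows, score, improved = cand, s, True
                break
    return rows, score

print(hill_climb(["AC-GT", "ACG-T"]))  # (['ACG-T', 'ACG-T'], 4)
```

Because only strictly improving moves are accepted, the search always terminates, but the result depends on the starting alignment and need not be the global optimum.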
E.g. Barton-Sternberg (1987) multiple alignment:
Is there a better alternative?