Introduction - Developing Darwin's Computer
Lecturer: Alexei Drummond

Darwin's Computer

Detail from the only illustration in the Origin of Species (Darwin, 1859)

https://www.evogeneao.com/en/explore/tree-of-life-explorer

Descent with modification in populations

15 possible trees of 4 individuals/species

105 possible trees of 5 individuals/species

945 possible trees of 6 individuals/species

How many trees are there?

| $n$ | #trees | Comment |
|---|---|---|
| 4 | 15 | enumerable by hand |
| 5 | 105 | enumerable by hand on a rainy day |
| 6 | 945 | enumerable by computer |
| 7 | 10395 | still searchable very quickly on computer |
| 8 | 135135 | about the number of hairs on your head |
| 9 | 2027025 | greater than the population of Auckland |
| 10 | 34459425 | $\approx$ upper limit for exhaustive search |
| 20 | $8.2 \times 10^{21}$ | $\approx$ upper limit of branch-and-bound searching |
| 48 | $3.21 \times 10^{70}$ | $\approx$ the number of particles in the Universe |
| 136 | $2.11 \times 10^{267}$ | number of trees to choose from in the “Out of Africa” data (Vigilant et al. 1991) |
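
The counts listed above can be checked with the standard double-factorial formulas for binary trees on $n$ labelled tips. The sketch below is only an illustration; the function names are made up here and are not part of any package mentioned in the lecture. The rows up to $n = 20$ agree with the rooted count $(2n-3)!!$, while the values given for $n = 48$ and $n = 136$ appear to match the unrooted count $(2n-5)!!$.

```python
def count_rooted_trees(n: int) -> int:
    """Number of rooted binary tree topologies on n labelled tips: (2n-3)!!."""
    if n < 2:
        raise ValueError("need at least two tips")
    count = 1
    # The k-th tip can be attached to a rooted tree with k-1 tips in 2k-3 ways
    # (on any of its 2k-4 branches, or above the root).
    for k in range(3, n + 1):
        count *= 2 * k - 3
    return count


def count_unrooted_trees(n: int) -> int:
    """Number of unrooted binary tree topologies on n labelled tips: (2n-5)!!."""
    if n < 3:
        raise ValueError("need at least three tips")
    # Each unrooted tree can be rooted on any of its 2n-3 branches.
    return count_rooted_trees(n) // (2 * n - 3)


for n in (4, 5, 6, 7, 8, 9, 10, 20):
    print(n, count_rooted_trees(n))
for n in (48, 136):
    print(n, f"{count_unrooted_trees(n):.2e}")
```

Python integers have arbitrary precision, so even the 268-digit count for $n = 136$ is computed exactly.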

Computing Darwin's Tree of Life

Bayesian reconstruction of phylogenetic trees

Yang & Rannala (1997), Mau, Newton & Larget (1998)

What we really want is the probability of the tree given the data.

We can compute that from the likelihood using Bayes' theorem:

$$p(T|D) = \frac{\Pr(D|T) \; p(T)}{\Pr(D)}$$

This is known as the posterior probability of the tree. Using the Markov chain Monte Carlo (MCMC) algorithm, we can produce a sample of trees that have high posterior probability.
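
To make the idea concrete, here is a minimal sketch of a Metropolis sampler on a toy tree space. It assumes only three candidate trees (the rooted topologies of three taxa) and uses made-up unnormalised posterior values; it is not how BEAST is implemented, where tree space is far too large to enumerate and proposals modify the current tree locally. Because the toy space is tiny, the sampled frequencies can be compared with the exact posterior obtained by normalising over all trees.

```python
import random
from collections import Counter

# Hypothetical unnormalised posterior values Pr(D|T) * p(T) for the three
# rooted topologies of three taxa; the numbers are made up for illustration.
UNNORMALISED_POSTERIOR = {
    "((A,B),C)": 6.0,
    "((A,C),B)": 3.0,
    "((B,C),A)": 1.0,
}


def exact_posterior(weights):
    """Bayes' theorem by brute force: divide each weight by the sum over all trees."""
    total = sum(weights.values())
    return {tree: w / total for tree, w in weights.items()}


def metropolis_sample(weights, n_steps, seed=1):
    """Metropolis sampler with a symmetric proposal (uniform over all trees)."""
    rng = random.Random(seed)
    trees = list(weights)
    current = rng.choice(trees)
    visits = Counter()
    for _ in range(n_steps):
        proposal = rng.choice(trees)
        # Accept with probability min(1, ratio). Only the ratio of
        # unnormalised posteriors is needed, never Pr(D).
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        visits[current] += 1
    return {tree: visits[tree] / n_steps for tree in trees}


exact = exact_posterior(UNNORMALISED_POSTERIOR)
sampled = metropolis_sample(UNNORMALISED_POSTERIOR, n_steps=50_000)
for tree in UNNORMALISED_POSTERIOR:
    print(f"{tree}: exact {exact[tree]:.3f}, MCMC {sampled[tree]:.3f}")
```

The sampler never evaluates the normalising constant $\Pr(D)$, which is what makes the approach usable when the number of trees is astronomically large.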

The posterior distribution on Darwin's Finches

This posterior probability distribution was computed using Markov chain Monte Carlo as implemented in the BEAST software package (Drummond et al., 2012).

The posterior distribution of larger trees

Produced using DensiTree (Bouckaert, 2012)

Elaborations of the phylogenetic model

Basic model: (posterior proportional to likelihood × prior)

$$p(T|D) \propto \Pr(D|T) \; p(T)$$

Substitution model:

$$p(T,\color{green}{Q}|D) \propto \Pr(D|T,\color{green}{Q}) \; p(T) \; \color{green}{p(Q)}$$

Substitution model and tree branching process prior:

$$p(T,\color{green}{Q},\color{blue}{\Theta}|D) \propto \Pr(D|T,\color{green}{Q}) \; p(T|\color{blue}{\Theta}) \; \color{green}{p(Q)} \; \color{blue}{p(\Theta)}$$

Elaborations of the model: Evolution is happening right now!

Modelling phylogenetic data sampled through time

Drummond and Rodrigo (2000), Drummond et al (2002)

  • Rapidly evolving microbes
  • Ancient DNA
  • Cancer
  • Somatic evolution
  • Languages
  • et cetera
$$p(T,\color{green}{Q},\color{blue}{\Theta},\color{red}{\mu}|D) \propto \Pr\{D|\color{red}{\mu} T,\color{green}{Q}\} \; p(T|\color{blue}{\Theta}) \; \color{green}{p(Q)} \; \color{blue}{p(\Theta)} \; \color{red}{p(\mu)}$$
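
The likelihood term $\Pr\{D|\mu T, Q\}$ depends on the rate and the tree only through their product. The sketch below illustrates this for the simplest case: two aligned sequences separated by a total time `time`, under the Jukes-Cantor (JC69) substitution model. JC69 is an assumption made here for illustration; the lecture does not commit to a particular $Q$, and the function names are hypothetical.

```python
import math


def jc69_transition_prob(same: bool, rate: float, time: float) -> float:
    """JC69 probability that a site ends in the same (or one specific other) base."""
    decay = math.exp(-4.0 * rate * time / 3.0)
    return 0.25 + 0.75 * decay if same else 0.25 - 0.25 * decay


def pairwise_log_likelihood(seq_a: str, seq_b: str, rate: float, time: float) -> float:
    """log Pr{D | mu * t} for two aligned sequences separated by total time `time`."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    log_lik = 0.0
    for a, b in zip(seq_a, seq_b):
        # 0.25 is the stationary frequency of the first base under JC69.
        log_lik += math.log(0.25 * jc69_transition_prob(a == b, rate, time))
    return log_lik


seq_a = "ACGTACGT"
seq_b = "ACGTACGA"
# A slow rate over a long time and a fast rate over a short time give the
# same likelihood, because only the product rate * time enters the formula.
print(pairwise_log_likelihood(seq_a, seq_b, rate=0.01, time=10.0))
print(pairwise_log_likelihood(seq_a, seq_b, rate=0.1, time=1.0))
```

In this toy setting the sequences alone cannot separate $\mu$ from time; known sampling dates provide the extra information needed to estimate the rate.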

A calibrated phylogenetic inference

Origin of the HIV epidemic in the Americas, Gilbert et al (2007)

A phylogenetic reconstruction of samples of HIV-1 virus. Each tip represents a single infected individual from whom a blood sample has been taken.

Estimation of

  • evolutionary trees
  • divergence times
  • evolutionary rates
  • population history
  • infectious disease parameters
  • migration rates
  • and more…

Also: Drummond & Rambaut, 2007; Drummond et al., 2012; Bouckaert et al., 2014; Suchard et al., 2018; Baele et al., Nat Methods, 2025

More than 42,000 scientific studies have cited the six BEAST papers, over 3,200 of them last year alone (2025).

Phylodynamics

Statistical phylogenetics and population genetics models

Mathematical epidemiology, non-linear dynamical models

Goal: the integration of immunological, genomic and epidemiological data in a single coherent predictive model.

Coalescent theory: $p(g|\Theta)$

$\Theta$ $g$
$$p(g,\color{green}{Q},\color{blue}{\Theta}|D) \propto \Pr\{D|\mu g,\color{green}{Q}\} \; p(g|\color{blue}{\Theta}) \; \color{green}{p(Q)} \; \color{blue}{p(\Theta)}$$
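
As a rough illustration of the coalescent prior $p(g|\Theta)$, the sketch below evaluates the log density of a genealogy under the constant-size Kingman coalescent. It assumes that all tips are sampled at the same time and that time is scaled so that each pair of lineages coalesces at rate $1/\Theta$; the function name and the representation of $g$ as a list of inter-coalescent intervals are choices made here, not taken from the lecture.

```python
import math


def coalescent_log_density(intervals, theta):
    """log p(g | Theta) under the constant-size coalescent.

    intervals[i] is the length of time during which there were n - i lineages,
    where n = len(intervals) + 1 is the number of sampled tips.
    """
    if theta <= 0:
        raise ValueError("theta must be positive")
    n = len(intervals) + 1
    log_p = 0.0
    for i, duration in enumerate(intervals):
        k = n - i                  # lineages present during this interval
        pairs = k * (k - 1) / 2    # pairs of lineages that could coalesce
        # A specific pair coalesces at rate 1/theta, and no pair coalesces
        # during `duration` with probability exp(-pairs * duration / theta).
        log_p += -math.log(theta) - pairs * duration / theta
    return log_p


# Three tips: 0.5 time units with three lineages, then 1.0 with two lineages.
print(coalescent_log_density([0.5, 1.0], theta=2.0))
```

Larger values of $\Theta$ make long waiting times between coalescent events more probable, which is how the shape of the genealogy carries information about population size.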

Trees inside trees

  • Modelling diversity and divergence
  • Central to understanding diverse areas including speciation genomics and host-symbiont co-evolution is building up a statistically rigorous model of the patterns of divergence and diversity spanning multiple species/hosts, multiple genes / gene families and multiple individuals per species/host.
$$p(T,\color{purple}{S}|D) \propto \Pr\{D|T\} \; p(T|\color{purple}{S}) \; \color{purple}{p(S)}$$
$$p(t_1,t_2,\ldots,t_N,\color{purple}{S}|D) \propto \color{purple}{p(S)} \prod_{i=1}^{N} \Pr\{D_i|t_i\} \; p(t_i|\color{purple}{S})$$

Phylogenies and the fossil record

An example of node dating with BEAST from Subramanian et al (Biology Letters, 2013)

Final Perspectives

  • Evolutionary biology has become a multidisciplinary analytical science, with major input from computer scientists, statisticians, mathematicians and physicists.
  • Evolutionary biology is not just an historical science. Rapidly evolving natural systems, low-cost high-throughput sequencing and high-throughput automated experimental evolution platforms all add up to the potential to close the loop between experimental and theoretical evolutionary biology.
  • A common set of evolutionary modelling principles can inform us on diverse questions including human population history, the spread of viral epidemics and the age of fossils.

Recommended Reading

Decoding Genomes (Stadler et al., 2024)
  • Chapter 1: Introduction