Birth-death tree priors
Lecturer: Alexei Drummond

The Coalescent Approximation Revisited

  • Probability of distinct parents among $k$ sampled lineages in previous generation is:
    \begin{align*} 1-p_{c}(k) &= \left(1-\frac{1}{N}\right)\left(1-\frac{2}{N}\right)\ldots\left(1-\frac{k-1}{N}\right)\\ & = 1 - \frac{1}{N}\sum_{j=1}^{k-1}j + O(N^{-2}) \end{align*}
  • Assumption: $k\ll N$ so that $p_c(k)\simeq \binom{k}{2}/N$. (No multifurcation.)
  • Probability that the first coalescence occurs after $m$ generations without one:
    $$P(m)=(1-p_c(k))^m p_c(k)$$
  • Since $N\gg k$, $p_c(k)\ll 1$ and thus $P(m)\simeq e^{-mp_c(k)}p_c(k)$. Converting to time $t=mg$, where $g$ is the generation time, gives the density
    $$P(t)=e^{-(t/g)p_c(k)}p_c(k)\frac{dm}{dt}=e^{-t\binom{k}{2}/Ng}\binom{k}{2}\frac{1}{Ng}$$
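The quality of these two approximations can be checked numerically. The following is a minimal sketch, not part of the lecture material; the function names are hypothetical and chosen only for illustration. It compares the exact per-generation coalescence probability with $\binom{k}{2}/N$, and the geometric waiting-time distribution with its exponential approximation.

```python
from math import comb, exp


def p_coal_exact(k: int, N: int) -> float:
    """Exact probability that k lineages do NOT all have distinct parents."""
    p_distinct = 1.0
    for j in range(1, k):
        p_distinct *= 1.0 - j / N
    return 1.0 - p_distinct


def p_coal_approx(k: int, N: int) -> float:
    """First-order approximation binom(k, 2) / N, valid for k << N."""
    return comb(k, 2) / N


def waiting_geometric(m: int, p: float) -> float:
    """P(m) = (1 - p)^m * p: m generations without coalescence, then one."""
    return (1.0 - p) ** m * p


def waiting_exponential(m: int, p: float) -> float:
    """Exponential approximation exp(-m p) * p, valid for p << 1."""
    return exp(-m * p) * p


if __name__ == "__main__":
    for k, N in [(5, 10_000), (5, 100), (50, 100)]:
        print(k, N, p_coal_exact(k, N), p_coal_approx(k, N))
```

For $k=50$, $N=100$ the approximate value exceeds 1, which illustrates why the assumption $k\ll N$ is needed.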

The linear birth-death-sampling process

  • Generalization, by Tanja Stadler (2009), of the Yule model for speciation to include extinction and sampling.
  • Also applied to populations, in particular measurably evolving pathogens.

Equivalent to a population evolving under the following (forward-time) reactions: \begin{align*} X & \overset{\lambda}{\longrightarrow} 2X\\ X & \overset{\mu}{\longrightarrow} 0 \end{align*} In addition, a linear sampling process with per-individual rate $\psi$ probabilistically generates samples, but does not otherwise affect the population. (No implicit removal on sampling!)
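One way to get a feel for this process is to simulate the population size forward in time. The sketch below uses a standard Gillespie-style stochastic simulation, which is a choice made here for illustration and is not described in the lecture; `simulate_bds` and its parameters are hypothetical names. It assumes at least one of the rates is positive. Sampling events are recorded but leave the population size unchanged, matching the "no removal on sampling" remark above.

```python
import random


def simulate_bds(lam, mu, psi, t_max, n0=1, seed=None, max_pop=100_000):
    """Simulate population size under linear birth (lam), death (mu), sampling (psi).

    Returns (trajectory, sample_times), where trajectory is a list of
    (time, population size) pairs and sample_times lists the sampling times.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    trajectory = [(0.0, n0)]
    sample_times = []
    per_capita = lam + mu + psi
    while 0 < n < max_pop:
        # Waiting time to the next event is exponential with the total rate.
        t += rng.expovariate(n * per_capita)
        if t >= t_max:
            break
        # Choose the event type in proportion to its rate.
        u = rng.random() * per_capita
        if u < lam:
            n += 1                      # birth: X -> 2X
        elif u < lam + mu:
            n -= 1                      # death: X -> 0
        else:
            sample_times.append(t)      # sampling: population unchanged
        trajectory.append((t, n))
    return trajectory, sample_times


if __name__ == "__main__":
    traj, samples = simulate_bds(lam=2.0, mu=1.0, psi=0.5, t_max=3.0, seed=42)
    print("final population size:", traj[-1][1])
    print("number of samples:", len(samples))
```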

The reconstructed birth-death-sampling tree

  • Left-hand tree is full tree, right-hand tree is equivalent "reconstructed" tree.
  • How can we compute the probability of such a tree under the BDS process?

Flavour of derivation

Let $p_0(t)$ be the probability that an individual alive at time $t$ has no sampled descendants. Then: \begin{align*} p_0(t+\Delta t) \simeq{} & p_0(t)(1-\Delta t (\lambda + \mu + \psi))\\ & + \Delta t\, \mu + \Delta t\, \lambda p_0(t)^2 \end{align*} and so $$\dot{p}_0(t) = -(\lambda + \psi + \mu)p_0(t) + \mu + \lambda p_0(t)^2$$
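The equation for $p_0$ can be integrated numerically. The sketch below is an illustration only. It takes the quadratic term with the positive sign implied by the difference equation ($+\lambda p_0^2$), uses a hand-written fourth-order Runge-Kutta step, and assumes the initial condition $p_0(0)=1$ (no sampling at time zero), which is not stated in the text above. The function names are hypothetical. For large $t$ the solution should approach the smaller root of $\lambda p^2-(\lambda+\mu+\psi)p+\mu=0$.

```python
from math import sqrt


def p0_rhs(p, lam, mu, psi):
    """Right-hand side of dp0/dt, with the +lam * p^2 term."""
    return -(lam + mu + psi) * p + mu + lam * p * p


def integrate_p0(t_end, lam, mu, psi, p_init=1.0, steps=10_000):
    """Integrate p0 from 0 to t_end with classical RK4 (p_init=1 is an assumption)."""
    h = t_end / steps
    p = p_init
    for _ in range(steps):
        k1 = p0_rhs(p, lam, mu, psi)
        k2 = p0_rhs(p + 0.5 * h * k1, lam, mu, psi)
        k3 = p0_rhs(p + 0.5 * h * k2, lam, mu, psi)
        k4 = p0_rhs(p + h * k3, lam, mu, psi)
        p += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return p


def p0_stationary(lam, mu, psi):
    """Smaller root of lam*p^2 - (lam+mu+psi)*p + mu = 0 (requires lam > 0)."""
    b = lam + mu + psi
    return (b - sqrt(b * b - 4.0 * lam * mu)) / (2.0 * lam)


if __name__ == "__main__":
    lam, mu, psi = 2.0, 1.0, 0.5
    for t in (0.5, 1.0, 5.0, 50.0):
        print(t, integrate_p0(t, lam, mu, psi))
    print("stationary value:", p0_stationary(lam, mu, psi))
```

With $\psi=0$ the right-hand side vanishes at $p_0=1$, so an individual never has sampled descendants, as expected when nothing is sampled.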

Let $g_e(t)$ be the probability that the sampled tree below time $t$ on edge $e$ evolved as observed. Then: $$\dot{g}_e(t) = -(\lambda + \psi + \mu)g_e(t) + 2\lambda p_0(t)g_e(t)$$ where

$$g_e(s)=\left\{\begin{array}{ll} \lambda g_{e_1}(s_1)g_{e_2}(s_2) & \text{ if $e$ has two sampled desc.}\\ \psi g_{e_1}(s_1) & \text{ if $e$ has one sampled desc.}\\ \psi p_0(s_1) & \text{ if $e$ has no sampled desc.} \end{array}\right.$$
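Because the equation for $g_e$ depends on $p_0$, the two can be integrated together along an edge. The following minimal sketch assumes that the edge is integrated from the time $s$ at which the boundary value $g_e(s)$ is known towards the root, and that the $p_0$ equation carries the $+\lambda p_0^2$ term implied by its difference equation. `integrate_edge` is a hypothetical helper, not an API of any existing package.

```python
from math import exp


def rhs(p, g, lam, mu, psi):
    """Coupled right-hand sides for p0 and g_e."""
    total = lam + mu + psi
    dp = -total * p + mu + lam * p * p
    dg = -total * g + 2.0 * lam * p * g
    return dp, dg


def integrate_edge(p_start, g_start, duration, lam, mu, psi, steps=2000):
    """Advance (p0, g_e) along an edge of the given duration with classical RK4.

    p_start and g_start are the values at the end of the edge where the
    boundary condition is applied.
    """
    h = duration / steps
    p, g = p_start, g_start
    for _ in range(steps):
        k1p, k1g = rhs(p, g, lam, mu, psi)
        k2p, k2g = rhs(p + 0.5 * h * k1p, g + 0.5 * h * k1g, lam, mu, psi)
        k3p, k3g = rhs(p + 0.5 * h * k2p, g + 0.5 * h * k2g, lam, mu, psi)
        k4p, k4g = rhs(p + h * k3p, g + h * k3g, lam, mu, psi)
        p += h * (k1p + 2.0 * k2p + 2.0 * k3p + k4p) / 6.0
        g += h * (k1g + 2.0 * k2g + 2.0 * k3g + k4g) / 6.0
    return p, g


if __name__ == "__main__":
    lam, mu, psi = 2.0, 1.0, 0.5
    # Boundary value psi * p0 for an edge with no sampled descendants,
    # taking p0 = 1 at the start purely for illustration.
    p_end, g_end = integrate_edge(1.0, psi * 1.0, 1.5, lam, mu, psi)
    print(p_end, g_end)
```

Two limiting cases give simple checks: with $\lambda=0$ the edge value decays as $e^{-(\mu+\psi)t}$, and with $\psi=0$ (so $p_0\equiv 1$) it grows as $e^{(\lambda-\mu)t}$.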

Comparison with the coalescent

  • Birth-death trees comparable to coalescent trees under exponential growth.
  • Coalescent approximation breaks down when the population size is not large compared to the number of ancestral lineages, i.e. when the assumption $k\ll N$ fails.
  • Figure on right shows CDFs for coalescent times under different models including coalescent (blue) and birth-death model (black).

Non-linear birth-death tree priors

  • If complete population trajectory (sequence of birth/death events) is known, probability of tree is easy:
Here $p_c=1/\binom{N}{2}$ and $p_{nc}=1-\binom{k}{2}/\binom{N}{2}$.
  • Use particle filtering to simulate conditioned trajectories, estimate tree probability.
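The factors $p_c$ and $p_{nc}$ above suggest how a tree probability could be accumulated along a known trajectory. The sketch below is one plausible reading and not the formula from the lecture: it assumes that, looking backward in time, each birth event merges a uniformly chosen pair among the $N$ individuals present, so that it contributes $p_c$ if it coincides with a specific coalescence in the tree and $p_{nc}$ otherwise, and that death events contribute no factor. The function name and the event encoding are hypothetical.

```python
from math import comb, log


def log_tree_prob_given_births(birth_events):
    """Log-probability of a tree given the birth events of a known trajectory.

    birth_events: iterable of (N, k, is_coalescence) where
      N = population size just after the birth (forward time),
      k = number of tree lineages alive at that time (k <= N),
      is_coalescence = True if the birth coincides with a tree coalescence.
    Assumes the per-event factors simply multiply (an assumption of this sketch).
    """
    logp = 0.0
    for N, k, is_coalescence in birth_events:
        pairs = comb(N, 2)
        if is_coalescence:
            prob = 1.0 / pairs                   # p_c
        else:
            prob = 1.0 - comb(k, 2) / pairs      # p_nc
        if prob <= 0.0:
            return float("-inf")                 # trajectory incompatible with tree
        logp += log(prob)
    return logp


if __name__ == "__main__":
    # Backward in time: 3 individuals with 2 tree lineages (no coalescence),
    # then 2 individuals with 2 tree lineages (coalescence).
    events = [(3, 2, False), (2, 2, True)]
    print(log_tree_prob_given_births(events))
```

In a particle filter, a quantity of this kind would be evaluated for each simulated trajectory and averaged over particles to estimate the tree probability.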

Using this particle-filter estimate of the tree probability within MCMC lets us jointly infer the tree and the nonlinear birth-death trajectory.

Summary

  • Birth-death processes (and their corresponding branching processes) can be used to describe the generative process behind tree formation.
  • In certain limits they yield the coalescent distribution for the sampled genealogy.
  • Stadler et al. have shown that they can be used directly and without approximation to derive model-based tree priors that:
    • properly describe the effects of stochastic fluctuations in small populations, and
    • explicitly take sampling processes into consideration.
  • Extension to nonlinear birth-death process descriptions requires numerical methods such as particle filtering or numerical integration of master equations.

Further reading