2. The Phylogenetic Posterior
Standard application of Bayes' theorem gives the posterior:
$$P(T,\mu,\theta|D) = \frac{P(D|T,\mu, \theta)P(T,\mu,\theta)}{P(D)}$$
Factorizing the Posterior
Most phylogenetic models make two key assumptions:
- $\theta$ only affects the probability of the data via the tree $T$
- $\mu$ has no effect on the tree branching process
These assumptions lead to the factorized form:
Phylogenetic Posterior
$$P(T,\mu,\theta|D) = \frac{1}{P(D)}P(D|T,\mu)P(T|\theta)P(\mu,\theta)$$
Where:
- $P(T|\theta)$ is the "tree prior" parameterized by $\theta$
- $P(\mu,\theta)$ are the parameter priors
- $P(D)$ is the marginal likelihood (evidence)
Questions to Consider
- What is $P(D)$ and why is it difficult to compute?
- Is the tree prior really a prior? (i.e., does it depend on the data?)
The Neutrality Assumption
The factorization above implicitly assumes that our alignment could have been produced through the following process:
Important: Separating the process of tree generation from that of sequence evolution implies neutrality. This is a fundamental assumption in most phylogenetic analyses.
3. Tree Priors
Tree priors allow us to specify a generative model for the genealogy. While this model may not involve genetic evolution, it may depend on:
- Speciation rates
- Population sizes
- Migration rates
- Extinction rates
- Sampling processes
Note: A huge fraction of phylogenetics inference research is focused on developing and efficiently implementing tree priors!
The Yule Model for Speciation
The simplest tree prior is the Yule model (Yule, 1924):
Yule Process
- Constant rate branching process
- Branching occurs uniformly across extant lineages at rate $\lambda$ per lineage per unit time
- Chemical kinetics formalism: $X \overset{\lambda}{\longrightarrow} 2X$
The number of lineages $k$ evolves under the master equation:
$$\dot{p}(k,t) = \lambda (k-1)p(k-1,t) - \lambda k p(k,t)$$
Birth-Death(-Sampling) Priors
A more realistic model accounts for both speciation and extinction:
Birth-Death Process
- Generalization of Yule process to account for extinction and sampling
- Linear speciation, extinction, and sampling rates
- Prior is over sampled trees: assumes unobserved full tree
- Groundwork largely due to Tanja Stadler (ETH Zürich)
Coalescent Priors
For modeling gene trees within populations:
Coalescent Process
- Original theory due to Kingman (1980)
- Models the shape of gene trees within a species
- Can be obtained as the limit of population genetic models (Wright-Fisher, Moran)
- Provides crucial link between tree shape and population size
Multi-species Coalescent Priors
Hierarchical models that embed gene trees within species trees:
- Species tree described using a birth-death prior
- Gene trees described by coalescent priors for populations which subdivide at speciation events
- Accounts for incomplete lineage sorting
4. MCMC in Tree Space
Basic MCMC Theory Review
MCMC works by simulating a stochastic trajectory according to:
$$p_{i+1}(x)=\int p_i(x')W(x|x')\mathrm{d}x'$$
Where the transition kernel is:
$$W(x'|x) = q(x'|x)\alpha(x'|x) + \left(1-\int q(x''|x)\alpha(x''|x)\mathrm{d}x''\right)\delta(x'-x)$$
Detailed Balance and Acceptance Probability
A sufficient condition for $\pi(x)$ to be the equilibrium distribution is detailed balance:
$$\pi(x')q(x|x')\alpha(x|x')=\pi(x)q(x'|x)\alpha(x'|x)$$
This is satisfied by the Metropolis-Hastings acceptance probability:
Metropolis-Hastings Acceptance
$$\alpha(x'|x)=\min\left[1, \frac{\pi(x')}{\pi(x)}\times\frac{q(x|x')}{q(x'|x)}\right]$$
Key Properties for Tree Space
- By tuning the acceptance probability, any probability distribution can be targeted
- Only the ratio $\pi(x')/\pi(x)$ appears: normalization need not be known
- The only requirement for $q(x'|x)$ is irreducibility: random walks must explore the entire state space
- For high-dimensional state spaces, $W(x'|x)$ is decomposed into several distinct operators: $W(x'|x)=\sum_j W_j(x'|x)$
- The ratio $q(x|x')/q(x'|x)$ is the Hastings ratio
5. Proposal Distributions for Trees
We need to identify a set of proposals $q_j(x'|x)$ when $x$ is a point in the space of rooted time trees.
Common Tree Operators
- Subtree Slide
- Move a subtree up or down along a branch, changing node heights
- Wilson-Balding
- Prune a subtree and regraft it elsewhere, sampling new node times
- Narrow/Wide Exchange
- Swap two subtrees that share a grandparent (narrow) or any two subtrees (wide)
- Node Height Changes
- Modify the height of internal nodes while maintaining time consistency
- Tree Scaling
- Scale all node heights by a common factor
Challenge: Each operator must maintain the constraint that parent nodes are older than their children (time consistency).
Hastings Ratios for Tree Operators
Computing the Hastings ratio for tree operators can be complex:
- Must account for changes in the number of possible attachment points
- Node height proposals may have different forward/reverse probabilities
- Some operators (e.g., uniform tree scaling) have trivial Hastings ratios
Example: Subtree Slide Hastings Ratio
For a subtree slide that moves node $v$ from time $t$ to $t'$:
- If heights are proposed uniformly in valid ranges
- Forward range: $(t_{\text{child}}, t_{\text{parent}})$
- Reverse range: $(t'_{\text{child}}, t'_{\text{parent}})$
- Hastings ratio: $\frac{t'_{\text{parent}} - t'_{\text{child}}}{t_{\text{parent}} - t_{\text{child}}}$
6. Convergence and Mixing
Stopping Criteria
Two critical questions for phylogenetic MCMC:
- How can we tell when the chain has reached equilibrium?
- How do we know when we've collected enough samples?
Effective Sample Size (ESS)
One approach is to compute ESS for each parameter and tree summary statistics:
- ESS estimates the number of independent samples
- Common statistics to monitor:
- Tree height (age of root)
- Tree length (sum of branch lengths)
- Model parameters (substitution rates, etc.)
- Rule of thumb: ESS > 200 for all parameters
Multiple Independent Runs
Best Practice: Always run multiple independent chains with different starting points. Convergence to the same distribution is necessary (but not sufficient) for equilibrium.
Assessing Tree Convergence
Trees present special challenges for convergence assessment:
- Cannot plot simple traces of "the tree"
- Need specialized statistics for tree space
- Common approaches:
- Compare split frequencies between runs
- Average standard deviation of split frequencies (ASDSF)
- Tree distance metrics between samples
AWTY (Are We There Yet?)
Software that applies multiple statistics to assess tree convergence:
Key diagnostics:
- Split frequency correlations between runs
- Cumulative split frequencies over time
- Tree space visualization using multidimensional scaling
7. Post-processing MCMC Output
Parameter Samples
Logs of individual parameters can be considered samples from marginal posteriors:
Common analyses:
- Marginal density estimation
- 95% credible intervals
- Mean/median estimates
- Correlation between parameters
Tree Samples
Summary Statistics
Extract specific information from sampled trees:
- Age of MRCA for clades of interest
- Posterior probabilities of clades
- Branch length distributions
- Tree balance statistics
Summary Trees
Different approaches to produce a single "summary" tree:
- Strict Consensus
- Include only clades that appear in ALL sampled trees
- Majority Rule Consensus
- Include clades that appear in >50% of sampled trees
- Maximum Clade Credibility (MCC) Tree
- The sampled topology for which the product of posterior clade probabilities is maximized
In BEAST: Summary trees are produced using the MCC tree method via the program TreeAnnotator.
Caution: No summary tree algorithm is perfect! They may be misleading when the posterior is multimodal. Consider reporting uncertainty using posterior probabilities.