In the last lecture, the multispecies coalescent nested gene trees inside a species tree. This lecture asks where the species tree itself comes from, and what its shape records about speciation and extinction.
About 300,000 flowering-plant species share the planet with fewer than twenty living horsetail species. Comparable imbalances appear across the tree of life.
Two processes set the balance: the rate at which lineages split (speciation) and the rate at which they die out (extinction).
This lecture builds the model that links those rates to the shape of a phylogeny, and asks how much of the history we can read back out.
Backward: the coalescent
Start from sampled lineages in the present and merge them back to common ancestors. This is the natural view within a population, where we sample the survivors.
Forward: the birth-death process
Start from one lineage in the past. At any moment it can split (a birth: speciation) or die (a death: extinction). Run the process forward and a tree of species grows.
Both describe the same kind of object, a rooted tree, from opposite ends of time. For species, where extinction matters, the forward birth-death view is the more natural starting point.
Every lineage splits independently at a constant rate $\lambda$. More lineages mean more splits, so expected diversity grows exponentially: starting from a single lineage, $E[\,n(t)\,]=e^{\lambda t}$. Individual histories scatter around that expectation.
[Figure: pure birth (no extinction). One simulated realisation against the deterministic expectation, log scale.]
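The scatter around the pure-birth expectation can be checked with a short simulation. This is a minimal sketch, not the code behind the lecture figure. The function name `simulate_yule` and the values of `lam`, `t_max` and `reps` are illustrative assumptions.

```python
import math
import random

def simulate_yule(lam, t_max, rng):
    """Number of lineages at time t_max under pure birth, starting from one lineage."""
    n, t = 1, 0.0
    while True:
        # With n lineages, the waiting time to the next split is Exponential(n * lam).
        t += rng.expovariate(n * lam)
        if t > t_max:
            return n
        n += 1

rng = random.Random(1)                  # fixed seed so the run is repeatable
lam, t_max, reps = 1.0, 3.0, 20000      # illustrative values, not from the lecture
counts = [simulate_yule(lam, t_max, rng) for _ in range(reps)]
mean_n = sum(counts) / reps
expected = math.exp(lam * t_max)

print(f"simulated mean n(t) = {mean_n:.2f}")
print(f"expected e^(lam*t)  = {expected:.2f}")
print(f"range across runs   = {min(counts)} to {max(counts)}")
```

The mean over many runs should sit close to $e^{\lambda t}$, while the range across individual runs shows how far a single history can stray from it.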
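Each lineage now also goes extinct at a constant rate $\mu$. Two combinations summarise the process: the net diversification rate $r=\lambda-\mu$ and the relative extinction rate $\varepsilon=\mu/\lambda$.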
A lineage that goes extinct leaves no living descendants, so it is absent from a tree built only from species alive today.
What we estimate is the reconstructed tree: the full birth-death tree with every extinct branch pruned away. This is the macroevolutionary parallel of the coalescent, which likewise sees only the survivors of a population.
The signal of extinction is therefore indirect. It must be read from the spacing of the branching events that remain.
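To see how much history the living species hide, one can simulate the full process and count what is lost. This is a minimal sketch under stated assumptions: constant rates, one starting lineage, and the usual definitions $r=\lambda-\mu$ and $\varepsilon=\mu/\lambda$. The survival probability it checks against, $(1-\varepsilon)/(1-\varepsilon e^{-rT})$, is the standard constant-rate result. All names and parameter values are illustrative.

```python
import math
import random

def simulate_birth_death(lam, mu, t_max, rng):
    """Return (living lineages at t_max, extinction events), starting from one lineage."""
    n, extinctions, t = 1, 0, 0.0
    while n > 0:
        # Any of the n lineages may split or die; total event rate is n * (lam + mu).
        t += rng.expovariate(n * (lam + mu))
        if t > t_max:
            break
        if rng.random() < lam / (lam + mu):
            n += 1                      # speciation
        else:
            n -= 1                      # extinction
            extinctions += 1
    return n, extinctions

rng = random.Random(2)
lam, mu, t_max, reps = 1.0, 0.5, 4.0, 20000   # illustrative values
r, eps = lam - mu, mu / lam

runs = [simulate_birth_death(lam, mu, t_max, rng) for _ in range(reps)]
survivors = [(n, x) for n, x in runs if n > 0]

surv_frac = len(survivors) / reps
surv_theory = (1 - eps) / (1 - eps * math.exp(-r * t_max))
mean_n = sum(n for n, _ in runs) / reps
hidden = sum(x for _, x in survivors) / len(survivors)

print(f"clades surviving to the present: {surv_frac:.3f} (theory {surv_theory:.3f})")
print(f"mean living lineages, all runs:  {mean_n:.2f} (theory {math.exp(r * t_max):.2f})")
print(f"mean extinctions hidden inside a surviving clade: {hidden:.2f}")
```

The last line is the point: even clades that survive contain extinction events that leave no tip in a tree of living species.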
A lineage-through-time plot counts lineages in the reconstructed tree as it grows toward the present, here scaled to the final diversity. As $\varepsilon$ increases, the curve bends upward near the present.
All curves share the same net rate $r$, and so the same slope deep in the tree. Extinction steepens only the recent past: the pull of the present. Recent lineages have had little time to go extinct, so they are over-represented among the survivors.
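The pull of the present can be made quantitative. A lineage alive at time $t$ has descendants at the present $T$ with probability $(1-\varepsilon)/(1-\varepsilon e^{-r(T-t)})$, where $r=\lambda-\mu$ and $\varepsilon=\mu/\lambda$. So the expected reconstructed count is $e^{rt}$ times that probability, divided by the probability that the clade survives at all. The sketch below assumes this standard constant-rate result; the function names and parameter values are illustrative. It evaluates the slope of the log curve deep in the tree and at the present.

```python
import math

def expected_ltt(t, T, r, eps):
    """Expected lineages in the reconstructed tree at time t (origin 0, present T),
    conditioned on the clade surviving to T."""
    def p_survive(span):
        # Probability that one lineage leaves at least one descendant after `span`.
        return (1.0 - eps) / (1.0 - eps * math.exp(-r * span))
    return math.exp(r * t) * p_survive(T - t) / p_survive(T)

def log_slope(t, T, r, eps, h=1e-5):
    """Backward-difference slope of log expected lineages at time t."""
    return (math.log(expected_ltt(t, T, r, eps))
            - math.log(expected_ltt(t - h, T, r, eps))) / h

T, r = 20.0, 0.5                        # illustrative values
for eps in (0.0, 0.5, 0.9):
    deep = log_slope(1.0, T, r, eps)    # early in the history
    recent = log_slope(T, T, r, eps)    # at the present
    lam = r / (1.0 - eps)
    print(f"eps={eps:.1f}  deep slope={deep:.3f}  recent slope={recent:.3f}  lambda={lam:.3f}")
```

Deep in the tree the slope is $r$ whatever the value of $\varepsilon$. At the present it rises to $\lambda=r/(1-\varepsilon)$, which is why the upturn is the only place where extinction leaves a mark.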
Fitting a birth-death model to a dated tree of living species returns $\lambda$ and $\mu$, and so $r$ and $\varepsilon$. The net rate $r$ is recovered well; teasing $\lambda$ and $\mu$ apart is harder, since the data carry little direct information about extinction.
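As a rough illustration of such a fit, the sketch below writes down a constant-rate likelihood for the node ages of a dated tree, in terms of $r=\lambda-\mu$ and $\varepsilon=\mu/\lambda$, and maximises it over a grid. It assumes conditioning on the crown age and on survival of both crown lineages. The node ages are made up for illustration, and the function name `bd_loglik` is hypothetical rather than taken from any package.

```python
import math

def bd_loglik(ages, r, eps):
    """Log-likelihood, up to an additive constant, of node ages under constant rates.
    ages: node ages before the present; ages[0] is the crown age.
    Assumes conditioning on the crown age and on survival of both crown lineages."""
    n_tips = len(ages) + 1
    ll = (n_tips - 2) * math.log(r) + n_tips * math.log(1.0 - eps)
    ll += r * sum(ages[1:])
    ll -= 2.0 * sum(math.log(math.exp(r * x) - eps) for x in ages)
    return ll

# Made-up node ages for an 11-tip tree (illustration only, not real data).
ages = [10.0, 8.1, 6.5, 5.2, 4.0, 3.1, 2.3, 1.6, 1.0, 0.5]
r_grid = [i / 1000 for i in range(1, 1001)]

def best_r(eps):
    """Grid-search maximiser of the likelihood in r for a fixed eps."""
    return max(r_grid, key=lambda r: bd_loglik(ages, r, eps))

# With eps = 0 the model is pure birth, where the estimate has the closed form
# (tips - 2) / total branch length.
total_branch_length = 2 * ages[0] + sum(ages[1:])
yule_mle = (len(ages) + 1 - 2) / total_branch_length
print(f"eps=0.0  best r={best_r(0.0):.3f}  closed form={yule_mle:.3f}")

for eps in (0.3, 0.6, 0.9):
    r_hat = best_r(eps)
    print(f"eps={eps:.1f}  best r={r_hat:.3f}  logL={bd_loglik(ages, r_hat, eps):.3f}")
```

Comparing the maximised log-likelihood across values of $\varepsilon$ gives a feel for how strongly, or how weakly, a given set of node ages discriminates between extinction levels.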
A constant-rate model is the baseline. The biologically interesting questions are where and when rates change.
These questions connect directly to the macroevolution and adaptive-radiation lectures later in the course.
Fossils are direct observations of lineages that, in most cases, have gone extinct. They carry exactly the information the reconstructed tree of living species lacks.
The fossilised birth-death process adds a third rate, the recovery of fossil samples through time, and treats living tips, fossils, and their ages within one generative model.
With morphological and molecular data together, this supports total-evidence dating: estimating the timing of the tree and its rates from extinct and living species jointly.
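The generative side of the fossilised birth-death process is easy to sketch: alongside speciation and extinction, each lineage leaves fossils at a constant rate. The symbol `psi` for that rate, the function name, and all parameter values are assumptions made for this illustration. The sketch also assumes that leaving a fossil does not remove the lineage. It is a minimal count-level simulation, not an inference method.

```python
import math
import random

def simulate_fbd(lam, mu, psi, t_max, rng):
    """Return (living lineages at t_max, fossils sampled), starting from one lineage."""
    n, fossils, t = 1, 0, 0.0
    total = lam + mu + psi              # per-lineage rate of any event
    while n > 0:
        t += rng.expovariate(n * total)
        if t > t_max:
            break
        u = rng.random() * total
        if u < lam:
            n += 1                      # speciation
        elif u < lam + mu:
            n -= 1                      # extinction
        else:
            fossils += 1                # fossil sampled; the lineage carries on
    return n, fossils

rng = random.Random(3)
lam, mu, psi, t_max, reps = 1.0, 0.5, 0.2, 4.0, 20000   # illustrative values
r = lam - mu

runs = [simulate_fbd(lam, mu, psi, t_max, rng) for _ in range(reps)]
mean_fossils = sum(f for _, f in runs) / reps
# Fossils accrue at rate psi per lineage, so the expected count is psi times the
# expected total lineage duration: psi * (e^(r*T) - 1) / r.
expected_fossils = psi * (math.exp(r * t_max) - 1.0) / r
extinct_with_fossils = sum(1 for n, f in runs if n == 0 and f > 0) / reps

print(f"mean fossils per run = {mean_fossils:.2f} (expected {expected_fossils:.2f})")
print(f"runs extinct today but known from fossils = {extinct_with_fossils:.3f}")
```

The second number counts clades that a tree of living species would miss entirely but that fossils still record.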
The birth-death process is the prior on the species tree. The multispecies coalescent and the substitution model then build the data on top of it. One nested generative story runs from speciation to sequence.
Bayesian inference runs this story in reverse: from observed sequences, back through gene trees, to the species tree and the speciation and extinction rates that shaped it.