# Ancestral Reconstruction using Discrete Phylogeography

by Walter Xie, Remco Bouckaert, and Alexei Drummond

It guides you through a discrete phylogeography analysis of a H5N1 epidemic in South China. This analysis will use the model developed by Lemey et al., 2009 that implements ancestral reconstruction of discrete states (locations) in a Bayesian statistical framework, and employs the Bayesian stochastic search variable selection (BSSVS) to identify the most parsimonious description of the phylogeographic diffusion process.

The additional benefit using this model is that we can summarise the phylogeographic inferences from an analysis, and use a virtual globe software to visualize the spatial and temporal information.

The programs used in this tutorial are listed below.

## The NEXUS alignment

The data is in a file called h5n1.nex.
By clicking the name of the file, it will be opened on your web
browser, after which you can download the data by right-clicking on
the main window, “Save Page As”, and saving the file as `h5n1.nex`

in the desired folder.

The data is a subset of original dataset Wallace et al., 2007, and it consists of 43 influenza A H5N1 hemagglutinin and neuraminidase gene sequences isolated from a variety of hosts 1996 - 2005 across sample locations.

## Constructing the scripts in LPhy Studio

The software LPhy Studio is used to specify and visualise models as well as simulate data from models defined in LPhy scripts.

The `data`

block is used to input and store the data, which will be processed by the models defined later, and which also allows you to reuse the another dataset by simply replacing the current data.
In this block, we normally include the constants for models, the alignment loaded from a NEXUS file, and the meta data regarding to the information of taxa that we have known.

The `model`

block is to define and also describe your models and parameters
in the Bayesian phylogenetic analysis.
Therefore, your result could be easily reproduced by other researchers.

Please make sure the tab above the command console is set to `data`

,
when you intend to type or copy and paste the data block scripts into the console.
In addition, make sure to switch the tab to `model`

,
when you intend to type or copy and paste the model block scripts into the console.

When you write your LPhy scripts, please be aware that `data`

and `model`

have been reserved and cannot be used as the variable name.

The LPhy scripts to define this analysis is listed below.

## Code

data {

options = {ageDirection=“forward”, ageRegex=“_(\d+)$”};

D = readNexus(file=“data/H5N1.nex”, options=options);

L = D.nchar();

taxa = D.taxa();

D_trait = extractTrait(taxa=taxa, sep=“_”, i=2);

K = D_trait.canonicalStateCount();

dim = K*(K-1)/2;

}

model {

π ~ Dirichlet(conc=[2.0, 2.0, 2.0, 2.0]);

κ ~ LogNormal(meanlog=1.0, sdlog=1.25);

γ ~ LogNormal(meanlog=0.0, sdlog=2.0);

r ~ DiscretizeGamma(shape=γ, ncat=4, replicates=L);

Θ ~ LogNormal(meanlog=0.0, sdlog=1.0);

ψ ~ Coalescent(taxa=taxa, theta=Θ);

D ~ PhyloCTMC(Q=hky(kappa=κ, freq=π), mu=0.004, siteRates=r, tree=ψ);

π_trait ~ Dirichlet(conc=rep(element=3.0, times=K));

I ~ Bernoulli(minSuccesses=dim-2, p=0.5, replicates=dim);

R_trait ~ Dirichlet(conc=rep(element=1.0, times=dim));

Q_trait = generalTimeReversible(rates=select(x=R_trait, indicator=I), freq=π_trait);

μ_trait ~ LogNormal(meanlog=0, sdlog=1.25);

D_trait ~ PhyloCTMC(L=1, Q=Q_trait, dataType=“standard”, mu=μ_trait, tree=ψ);

}

## Graphical Model

For the details, please read the auto-generated narrative from LPhyStudio.

The code `D_trait = extractTrait(taxa=taxa, sep=“_”, i=2);`

in the data block uses the function
to extract the locations from the taxa names, and creates a trait alignment `D_trait`

to contain these locations mapped to each taxon.
Then the next line `K = D_trait.canonicalStateCount();`

count the number of unique locations
in the trait alignment.
Please note the method `canonicalStateCount()`

returns the number of canonical states **excluding** ambiguous states.

## Geographic Model

In this analysis, we have two parts mixed in the model section: the first part is modeling evolutionary history and demographic structure based on a nucleotide alignment, and the second part is defining how to sample the discrete states (locations) from the phylogeny $\psi$ shared with the 1st part.

For the nucleotide alignment, We fix a strict molecular clock to 0.004, to make the analysis converge a bit quicker.

The next is the geographic model. In the discrete phylogeography, the probability of transitioning to a new location through the time is computed by

\[P(t) = e^{\Lambda t}\]where $\Lambda$ is a $ K \times K $ infinitesimal rate matrix, and $K$ is the number of discrete locations (i.e. 5 locations here). Then,

\[\Lambda = \mu S \Pi\]where $\mu$ is an overall rate scalar, $S$ is a $ K \times K $ matrix of relative migration rates, and $\Pi = diag(\pi)$ where $\pi$ is the equilibrium trait frequencies. After the normalization, $\mu$ measures the number of migration events per unit time $t$. The detail is explained in Lemey et al., 2009.

So, assuming migration to be symmetric in this analysis, we define a vector variable `R_trait`

with the length of $ \frac{K \times (K-1)}{2} $ to store the off-diagonal entries of the unnormalised $S$. Another boolean vector `I`

with the same length determines which infinitesimal rates are zero,
which is performed by the function `select`

.
This implements the Bayesian stochastic search variable selection (BSSVS).

## Producing BEAST XML using LPhyBEAST

BEAST 2 reads instructions about the data and the model from a user-provided .xml, which can be produced in a variety of ways. Our goal with LPhy is to make the preparation of the .xml as painless, clear and precise as possible. In order to achieve that, we will use a companion application, LPhyBEAST, as a bridge between the LPhy script we typed above and the .xml.

LPhyBEAST is distributed as a BEAST 2 package,
we can use an application called `Package Manager`

, which is distributed with BEAST 2 together,
to install it.
To start LPhyBEAST, we have to use the script `lphybeast`

.
Some technical guides can help you to start.

In our h5n1.lphy script, the alignment file is assumed to locate under the folder `tutorials/data/`

.
So we need to go to the `tutorials`

folder, which is normally where the LPhy is installed,
run LPhyBEAST as below and check the end of message to find where is the generated XML.

Let us run LPhyBEAST now:

```
# BEAST_DIR="/Applications/BEAST2/"
$BEAST_DIR/bin/lphybeast -l 3000000 h5n1.lphy
```

## Running BEAST

After LPhyBEAST generates a BEAST 2 .xml file (e.g., h5n1.xml), we can point BEAST 2 to it, which will then start the inferential MCMC analysis.

BEAST 2 will write its outputs to disk into text files specified in the .xml file (specific paths can be passed in, but in their absence BEAST 2 will write the outputs in the same directory from where it was called).

BEAST 2 will also output the progress of the analysis and some summaries to the screen, like this:

```
BEAST v2.6.7, 2002-2020
Bayesian Evolutionary Analysis Sampling Trees
Designed and developed by
Remco Bouckaert, Alexei J. Drummond, Andrew Rambaut & Marc A. Suchard
Centre for Computational Evolution
University of Auckland
r.bouckaert@auckland.ac.nz
alexei@cs.auckland.ac.nz
Institute of Evolutionary Biology
University of Edinburgh
a.rambaut@ed.ac.uk
David Geffen School of Medicine
University of California, Los Angeles
msuchard@ucla.edu
Downloads, Help & Resources:
http://beast2.org/
Source code distributed under the GNU Lesser General Public License:
http://github.com/CompEvol/beast2
BEAST developers:
Alex Alekseyenko, Trevor Bedford, Erik Bloomquist, Joseph Heled,
Sebastian Hoehna, Denise Kuehnert, Philippe Lemey, Wai Lok Sibon Li,
Gerton Lunter, Sidney Markowitz, Vladimir Minin, Michael Defoin Platel,
Oliver Pybus, Tim Vaughan, Chieh-Hsi Wu, Walter Xie
Thanks to:
Roald Forsberg, Beth Shapiro and Korbinian Strimmer
Random number seed: 1659590653904
...
...
2850000 -5960.6708 -5834.2398 -126.4309 0.3211 0.1966 0.2226 0.2595 9.0333 0.3238 8.6857 0.1296 0.3150 0.1804 0.1649 0.2098 1 1 1 1 1 1 1 1 1 1 0.1884 0.0747 0.0422 0.1702 0.0805 0.0174 0.2373 0.0457 0.0291 0.1139 0.6191 1m16s/Msamples
3000000 -5957.8699 -5834.9067 -122.9631 0.3277 0.1965 0.2239 0.2517 7.6506 0.4290 5.9862 0.1501 0.3367 0.1051 0.2549 0.1529 1 1 1 1 1 1 1 1 1 1 0.1650 0.0716 0.0709 0.1302 0.3421 0.0130 0.0994 0.0303 0.0485 0.0286 0.2453 1m16s/Msamples
Operator Tuning #accept #reject Pr(m) Pr(acc|m)
BitFlipOperator(I.bitFlip) - 39782 90355 0.04340 0.30569
DeltaExchangeOperator(R_trait.deltaExchange) 0.57921 15922 104777 0.04031 0.13191
ScaleOperator(Theta.scale) 0.46729 7620 18512 0.00866 0.29160
ScaleOperator(gamma.scale) 0.40531 7005 18916 0.00866 0.27024
ScaleOperator(kappa.scale) 0.52022 6840 19112 0.00866 0.26356
ScaleOperator(mu_trait.scale) 0.27589 7746 18166 0.00866 0.29893
UpDownOperator(mu_traitUppsiDownOperator) 0.91311 40003 321915 0.12047 0.11053
DeltaExchangeOperator(pi.deltaExchange) 0.06532 9549 46452 0.01868 0.17051
DeltaExchangeOperator(pi_trait.deltaExchange) 0.54124 10870 57988 0.02285 0.15786
Exchange(psi.narrowExchange) - 60429 295287 0.11850 0.16988
ScaleOperator(psi.rootAgeScale) 0.77165 3697 22375 0.00866 0.14180
ScaleOperator(psi.scale) 0.90511 35312 319993 0.11850 0.09939 Try setting scaleFactor to about 0.951
SubtreeSlide(psi.subtreeSlide) 0.88122 70929 284431 0.11850 0.19960
Uniform(psi.uniform) - 149335 206383 0.11850 0.41981
Exchange(psi.wideExchange) - 1575 353485 0.11850 0.00444
WilsonBalding(psi.wilsonBalding) - 2051 353189 0.11850 0.00577
Tuning: The value of the operator's tuning parameter, or '-' if the operator can't be optimized.
#accept: The total number of times a proposal by this operator has been accepted.
#reject: The total number of times a proposal by this operator has been rejected.
Pr(m): The probability this operator is chosen in a step of the MCMC (i.e. the normalized weight).
Pr(acc|m): The acceptance probability (#accept as a fraction of the total proposals for this operator).
Total calculation time: 232.693 seconds
```

## Analysing the BEAST output

Run the program called *Tracer* to analyze the output of BEAST. When the
main window has opened, choose `Import Trace File...`

from the `File`

menu
and select the file that BEAST has created called `h5n1.log`

. You should now
see a window like in Figure 2.

Remember that MCMC is a stochastic algorithm so the actual numbers will
not be exactly the same.
On the left hand side is a list of the different quantities that BEAST has logged.
There are traces for the posterior (this is the log of the product of
the tree likelihood and the prior probabilities), and the continuous parameters.
Selecting a trace on the left brings up analyses for this trace on the right hand
side depending on tab that is selected. When first opened, the “posterior”” trace
is selected and various statistics of this trace are shown under the `Estimates`

tab. In the top right of the window is a table of calculated statistics for the
selected trace.

Tracer will plot a (marginal posterior) distribution for the selected parameter
and also give you statistics such as the mean and median. The **95% HPD lower**
or **upper** stands for *highest posterior density interval* and represents the most
compact interval on the selected parameter that contains 95% of the posterior
probability. It can be thought of as a Bayesian analog to a confidence interval.

## Obtaining an estimate of the phylogenetic tree

Run the program TreeAnnotator, and then choose 10% as the burn-in
percentage, while keeping “Maximum clade credibility tree” as the
“Target tree type”.
For “Node heights”, choose “Mean heights”.
Then load the tree log file that BEAST 2 generated (it will end in
“.trees” by default) as “Input Tree File”.
For this tutorial, the tree log file is called `h5n1_with_trait.trees`

.
Finally, for “Output File”, type `h5n1_with_trait.tree`

.

This setup will take the set of trees in the tree log file, and summarize it with a single maximum clade credibility (MCC) tree. The MCC tree is the tree that has the largest clade probability product across all nodes. Divergence times will reflect the mean ages of each node, and those times will be annotated with their 95% HPD intervals. TreeAnnotator will also display the posterior clade credibility of each node in the MCC tree.

More details on summarizing trees can be found in beast2.org/summarizing-posterior-trees/.

Note that TreeAnnotator only parses a tree log file into an output text file, but it will not allow you to visualize summary trees. Visualization has to be done with other programs (see next section).

## Distribution of root location

When you open the summary tree with locations `h5n1_with_trait.tree`

in a text editor,
and scroll to the most right and locate the end of the tree definition,
you can see the set of meta data for the root.
Looking for the last entries of `location.set`

and `location.set.prob`

,
you might find something like this:

```
location.set = {Guangdong,HongKong,Hunan,Guangxi,Fujian}
location.set.prob = {0.18656302054414214,0.6129927817878956,0.03220433092726263,0.1121599111604664,0.0560799555802332}
```

This means that we have the following distribution for the root location:

Location | Probability |
---|---|

Guangdong | 0.18656302054414214 |

HongKong | 0.6129927817878956 |

Hunan | 0.03220433092726263 |

Guangxi | 0.1121599111604664 |

Fujian | 0.0560799555802332 |

This distribution shows that the 95% HPD consists of all locations except Hunan, with a strong indication that HongKong might be the root with over 58% probability. It is quite typical that a lot of locations are part of the 95% HPD in discrete phylogeography.

## Viewing the Location Tree

We can visualise the tree in a program called *FigTree*.
Run this program, and open the summary tree file `h5n1_with_trait.tree`

by using the `Open`

command in the `File`

menu.
The tree should appear. You can now try selecting some of the options in the control panel on the left.
Try selecting `Appearance`

to set the branch `Colour by`

`location`

.
In addition, you can set the branch `Width by`

`location.prob`

according to the posterior support of estimated locations. Increasing the `Line Weight`

can make the branch width more different regarding to its posterior support. Finally, tick `Legend`

and select `location`

in the drop list of `Attribute`

.
You should end up with something like Figure 4.

Alternatively, you can load the posterior tree set `h5n1_with_trait.trees`

(note this is NOT the summary tree, but the complete set) into *DensiTree* and set it up as follows.

- Click
`Show`

to choose`Root Canal`

tree to guide the eye. - Click
`Grid`

to choose`Full grid`

option, type year 2005 in the`Origin`

text field and tick`Reverse`

to show the correct time scale. You also can reduce the`Digits`

to 0 which will rounding years in the x-axis (i.g. 2005, instead of 2005.22). - Go to
`Line Color`

, you can colour branches by`location`

. The final image look like Figure 5.

## The number of estimated transitions

Sometime, we want to visualise how the location states are changed through the phylogeny.
`StateTransitionCounter`

can count the number of branches in a tree or a set of trees that have a certain state at the parent and another at the node.

So, install the `Babel`

package and run the `StateTransitionCounter`

through BEAST application launcher.
The command line below will generate the output file `stc.out`

containing all counts from the logged posterior trees `h5n1_with_trait.trees`

,
after removing 10% burn-in.
Please make sure you install the latest version (Babel >= v0.3.2, BEASTLabs >= v1.9.6).

```
$BEAST2_PATH/bin/applauncher StateTransitionCounter -burnin 10 -in h5n1_with_trait.trees -tag location -out stc.out
```

Next, we need to use R to plot the histogram given the summary in `stc.out`

.
If you have a problem to generate it, you can download a prepared file stc.out.
You also need to download a script PlotTransitions.R,
which contains the functions to parse the file and plot the histograms.
The script requires to install R package `ggplot2`

and `tidyverse`

.

Run the following scripts in R:

```
#setwd(YOUR_WD)
source("PlotTransitions.R")
stc <- parseTransCount(input="stc.out", pattern = "Histogram", target="Hunan")
# only => Hunan
p <- plotTransCount(stc$hist[grepl("=>Hunan", stc$hist[["Transition"]]),])
ggsave( paste0("transition-distribution-hunan.png"), p, width = 6, height = 4)
```

The `stc`

contains a statistical summary of the estimated transition counts related to the target location,
here is `Hunan`

, and a table of the actual counts.
To plot a simple graph, we only pick up the transitions to Hunan in the next command,
and then save the graph a PNG file. The counts are normalised into probabilities.

The x-axis presents the number of estimated transitions in all migration events from one particular location to another which is separated by “=>”, and y-axis is the probability which is normalised from the total counts. As you can see, “Guangxi=>Hunan” (blue) has higher probability than other migration events. This type of visualisation will help you to quantify the uncertainty how the disease (H5N1) spread from/to the location(s) of interest, which is simulated from your model and given the data.

Because the posterior trees in this analysis are scaled to time, also known as “time tree”, we can convert this graph into daily transitions. More details and visualisations can be seen from Douglas et. al. 2020.

## Post processing geography

Start the application `spread`

, which can be used to analyze and visualize phylogeographic reconstructions resulting from Bayesian inference of spatio-temporal diffusion Bielejec et al., 2011.

Select the `Open`

button in the panel under `Load tree file`

,
and select the summary tree file `h5n1_with_trait.tree`

.
Change the `State attribute name`

to the name of the trait,

which is `location`

in this analysis.
Click the `Setup`

button. A dialog pops up where you can edit altitude and
longitude for the locations. Alternatively, you can load it from a tab-delimited
file. A file `locationCoordinates_H5N1.txt`

is prepared in Spread website.

Tip: to find latitude and longitude of locations, you can use Google maps,
switch on photo’s and select a photo at the location of the map. Click the
photo, then click `Show in Panoramio`

and a new page opens that contains the
locations where the photo was taken. An alternative is to use `Google Earth`

, and
point the mouse to the location. Google-Earth shows latitude and longitude of
the mouse location at the bottom of the screen.

Now, open the `Output`

tab in the panel on the left hand side. Here, you
can choose where to save the KML file (default `output.kml`

).
Select the `generate`

button to generate the KML file, and a world map
appears with the tree superimposed onto the area where the rabies epidemic
occurred.

If you have a problem to generate KML file, you can download a prepared output.kml.

The KML file can be read into `Google Earth`

. Then, the spread of the epidemic
can be animated through time. The coloured areas represent the 95% HPD
regions of the locations of the internal nodes of the summary tree.

## Programs used in this tutorial

The following software will be used in this tutorial:

- LPhy Studio - this software will specify and visualise models as well as simulate data from models defined in LPhy scripts. It is available for download from LPhy releases.
- LPhy BEAST - this software will construct an input file for BEAST. The installation guide and usage can be found from here.
- BEAST - this package contains the BEAST program, BEAUti, DensiTree, TreeAnnotator and other utility programs. This tutorial is written for BEAST v2.6.7 or higher version, which has support for multiple partitions. It is available for download from http://www.beast2.org.
- BEAST labs package - containing some generally useful stuff used by other packages.
- BEAST feast package - this is a small BEAST 2 package which contains additions to the core functionality.
- Tracer - this program is used to explore the output of BEAST (and other Bayesian MCMC programs). It graphically and quantitatively summarises the distributions of continuous parameters and provides diagnostic information. At the time of writing, the current version is v1.7.2. It is available for download from http://beast.community/tracer.
- FigTree - this is an application for displaying and printing molecular phylogenies, in particular those obtained using BEAST. At the time of writing, the current version is v1.4.3. It is available for download from http://beast.community/figtree.
- BEAST classic package - Phylogeography is a part of the BEAST-CLASSIC package. BEAST-CLASSIC requires the BEASTlabs package. You can install them from BEAST 2 package manager.
- Babel - A BEAST package containing tools for post-analysis. We will use
`StateTransitionCounter`

. - Spread - summarising the geographic spread in a KML file (available from http://www.kuleuven.ac.be/aidslab/phylogeography/SPREAD.html.
- Google-earth - displaying the KML file (just Google for it, if you have not already have it installed).

## Data

The number of replicates,*L*is the number of characters of alignment,

*D*. The alignment,

*D*is read from the Nexus file with a file name of "data/H5N1.nex" and

*options*. The

*options*are ageDirection="forward" and ageRegex="_(\d+)$". The

*taxa*is the list of taxa of alignment,

*D*. The int,

*K*is the number of states of alignment,

*D*. The alignment,

_{trait}*D*extracts the trait from

_{trait}*taxa*, with a sep of "_" and an i of 2. The object,

*dim*comes from the K*(K-1)/2.

## Model

The alignment,*D*is assumed to have evolved under a phylogenetic continuous time Markov process (Felsenstein; 1981) on phylogenetic time tree,

*ψ*, with a molecular clock rate of 0.004, an instantaneous rate matrix and siteRates,

*r*. An instantaneous rate matrix is the HKY model (Hasegawa

*et al*; 1985) with transition bias parameter,

*κ*and base frequency vector,

*π*. The base frequency vector,

*π*have a Dirichlet distribution prior with a concentration of [2.0, 2.0, 2.0, 2.0]. The transition bias parameter,

*κ*has a log-normal prior with a mean in log space of 1.0 and a standard deviation in log space of 1.25. The double,

*r*is assumed to come from a DiscretizeGamma with shape,

_{i}*γ*and a ncat of 4, for i in 0 to L - 1. The shape,

*γ*has a log-normal prior with a mean in log space of 0.0 and a standard deviation in log space of 2.0. The time tree,

*ψ*is assumed to come from a Kingman's coalescent tree prior (Kingman; 1982) with coalescent parameter,

*Θ*and

*taxa*. The coalescent parameter,

*Θ*has a log-normal prior with a mean in log space of 0.0 and a standard deviation in log space of 1.0. The alignment,

*D*is assumed to have evolved under a phylogenetic continuous time Markov process (Felsenstein; 1981) on phylogenetic time tree,

_{trait}*ψ*, with molecular clock rate,

*μ*, instantaneous rate matrix,

_{trait}*Q*, a length of 1 and a dataType of "standard". The instantaneous rate matrix,

_{trait}*Q*is assumed to come from the generalTimeReversible with rates and freq,

_{trait}*π*. The freq,

_{trait}*π*have a Dirichlet distribution prior with a concentration. A concentration is assumed to come from the rep with an element of 3.0 and times,

_{trait}*K*. The number is assumed to come from the select with x,

*R_trait*and indicator,

_{i}*I*. The indicator,

_{0}*I*is assumed to come from a Bernoulli with a p of 0.5, replicates,

*dim*and minSuccesses. The minSuccesses is calculated by dim-2. The x,

*R*have a Dirichlet distribution prior with a concentration. A concentration is assumed to come from the rep with an element of 1.0 and times,

_{trait}*dim*. The molecular clock rate,

*μ*has a log-normal prior with a mean in log space of 0 and a standard deviation in log space of 1.25.

_{trait}## Posterior

$$ \begin{split} P(\boldsymbol{\pi}, \kappa, \boldsymbol{r}, \gamma, \boldsymbol{\psi}, \Theta, \boldsymbol{\pi\textbf{}_{trait}}, \boldsymbol{I}, \boldsymbol{\textbf{R}_{trait}}, \mu\textrm{}_{trait} | \boldsymbol{D}, \boldsymbol{\textbf{D}_{trait}}) \propto &P(\boldsymbol{D} | \boldsymbol{\psi}, Q, \boldsymbol{r})P(\boldsymbol{\pi})P(\kappa)\\& \prod_{i=0}^{L - 1}P(\textrm{r}_i | \gamma)P(\gamma)P(\boldsymbol{\psi} | \Theta)\\& P(\Theta)P(\boldsymbol{\textbf{D}_{trait}} | \boldsymbol{\psi}, \mu\textrm{}_{trait}, \boldsymbol{\textbf{Q}_{trait}})\\& P(\boldsymbol{\pi\textbf{}_{trait}})P(\boldsymbol{I})P(\boldsymbol{\textbf{R}_{trait}})\\& P(\mu\textrm{}_{trait})\end{split} $$## Useful Links

- LinguaPhylo: https://linguaphylo.github.io
- BEAST 2 website and documentation: http://www.beast2.org/
- Join the BEAST user discussion: http://groups.google.com/group/beast-users

## References

- Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of molecular evolution, 17(6), 368-376. https://doi.org/10.1007/BF01734359
- Hasegawa, M., Kishino, H. & Yano, T. Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J Mol Evol 22, 160–174 (1985) https://doi.org/10.1007/BF02101694
- Kingman JFC. The Coalescent. Stochastic Processes and their Applications 13, 235–248 (1982) https://doi.org/10.1016/0304-4149(82)90011-4
- Lemey, P., Rambaut, A., Drummond, A. J. and Suchard, M. A. (2009). Bayesian phylogeography finds its roots. PLoS Comput Biol 5, e1000520.
- Wallace, R., HoDac, H., Lathrop, R. and Fitch, W. (2007). A statistical phylogeography of influenza A H5N1. Proceedings of the National Academy of Sciences 104, 4473.
- Bielejec F., Rambaut A., Suchard M.A & Lemey P. (2011). SPREAD: Spatial Phylogenetic Reconstruction of Evolutionary Dynamics. Bioinformatics, 27(20):2910-2912. doi:10.1093.
- Douglas, J., Mendes, F. K., Bouckaert, R., Xie, D., Jimenez-Silva, C. L., Swanepoel, C., … & Drummond, A. J. (2020). Phylodynamics reveals the role of human travel and contact tracing in controlling COVID-19 in four island nations. doi: https://doi.org/10.1101/2020.08.04.20168518