Mahesh Panchal
School of Biological Sciences,
University of Reading,
Whiteknights,
P.O. Box 228,
Reading RG6 6AJ,
UK.

My Research

Nested Clade Phylogeographic Analysis

Nested Clade Phylogeographic Analysis (NCPA) is a widely used research tool for making explicit inferences about the history and structure of a species. The author of the method, Prof. Alan Templeton, first described it in 1995, applying it to data on the Tiger Salamander. Over the years it has grown in popularity and has been known by various names, such as Nested Clade Analysis (NCA), Nested Cladistic Analysis (NCA), and Nested Geographic Clade Analysis (NGCA). Recently the method has come under criticism from several authors. One of its problems is that it does not allow for explicit calculation of the error in its results. Most assessments of NCPA have been empirical, the largest of which was published by Templeton in April 2004. There is, however, a need for simulations that test whether the method can recover a known history. Knowles and Maddison published results in 2002 from a set of small-scale simulations of NCPA, but flaws were highlighted in the methodology, meaning a successful set of simulations has yet to be published.

This is where my research kicks in. One of the problems with the Knowles and Maddison simulations was that they were on a small scale. This was because conducting NCPA many times was extremely laborious, owing to the lack of automation for the complete analysis. An aim of my research is to automate NCPA as much as possible so that simulations can be carried out on a much larger scale. The task of automating NCPA has highlighted several problems that need to be addressed, and solutions to them have been put in place in order to produce the software (Panchal and Beaumont, Evolution 61(6):1466-1480).

Approximate Bayesian Computation

This is no longer part of my research; however, I think it is a good method to work with. It is a model-based alternative to NCPA. The ABC framework is much more flexible than NCPA and so can be used to test other hypotheses as well.

The idea behind the ABC framework is to have a model whose parameters you want to estimate (these can also include a choice among various demographic models). You assign prior distributions to the parameters, select point values from these priors, and use them to simulate data under the desired model. You then choose a set of summary statistics that capture the information about the model (e.g. number of segregating sites, number of haplotypes, and so on), and calculate these summary statistics on the simulated data. You then repeat the process of randomly selecting parameter values from the priors to generate more datasets. The number of datasets needed depends on the number of summary statistics. Once you have a sufficient number of datasets, you calculate the same set of summary statistics on your real data (the data you wish to investigate). Using some sort of rejection or regression method, you can then identify which simulated datasets are closest to the real dataset by comparing summary statistics. The parameter values associated with these "closest" datasets are then used to build posterior distributions of the parameters.
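The steps above can be sketched in a few lines of Python. This is only a minimal illustration of the rejection variant, not the software used in any published analysis. Everything specific in it is an assumption made for the example: the toy model (draws from a normal distribution with unknown mean `mu` and standard deviation 1), the uniform prior on `mu`, the two summary statistics (sample mean and sample standard deviation), the Euclidean distance, and the function names. A real analysis would replace `simulate` with a genetic simulator and `summarise` with statistics such as the number of segregating sites.

```python
import random
import statistics


def simulate(mu, n, rng):
    """Toy model (an assumption for this sketch): n draws from Normal(mu, 1)."""
    return [rng.gauss(mu, 1.0) for _ in range(n)]


def summarise(data):
    """Summary statistics chosen for the toy model: mean and standard deviation."""
    return (statistics.fmean(data), statistics.stdev(data))


def distance(s1, s2):
    """Euclidean distance between two vectors of summary statistics."""
    return sum((a - b) ** 2 for a, b in zip(s1, s2)) ** 0.5


def abc_rejection(observed, n_sims, keep_fraction, rng):
    """Keep the prior draws whose simulated summaries are closest to the observed ones."""
    target = summarise(observed)
    scored = []
    for _ in range(n_sims):
        mu = rng.uniform(-10.0, 10.0)            # draw a parameter value from the prior
        sim = simulate(mu, len(observed), rng)   # simulate a dataset under the model
        scored.append((distance(summarise(sim), target), mu))
    scored.sort()                                # smallest distance first
    n_keep = max(1, int(n_sims * keep_fraction))
    return [mu for _, mu in scored[:n_keep]]     # sample from the approximate posterior


rng = random.Random(1)
observed = simulate(3.0, 50, rng)  # stands in for the real data; true mu is 3
posterior = abc_rejection(observed, n_sims=5000, keep_fraction=0.02, rng=rng)
posterior_mean = statistics.fmean(posterior)
```

The list `posterior` holds the accepted values of `mu`, and a histogram of it approximates the posterior distribution. Keeping a fixed fraction of the simulations, rather than everything within a fixed distance, is one simple design choice; a regression adjustment of the accepted values would be a further step that is not shown here.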

Although this process is simple in theory, some aspects of it still need a lot of work. For example, which summary statistics are appropriate, and how many are needed to capture the information required? Is there a better way of selecting the "closest" datasets than a rejection or regression approach? There are other questions to be answered as well; however, the method does allow you to estimate the error present, and it provides a way to analyse datasets quickly and effectively.