Hi all,
I recently posted a question about incomplete lineage sorting and
hybridization, and here I forward all the relevant and excellent
answers I received (thanks a lot!!). I was glad to see that this is a
topic that many of you are thinking about.
Original post:
Dear all,
Are there any methods that can specifically distinguish
incomplete lineage sorting from introgressive hybridization in
sequence data? I'm studying two sister species that are well
separated in microsatellites, allozymes, morphology and ecology but
share all common haplotypes in one mitochondrial gene and two nuclear
introns (that also lack a deep divergence between any haplotypes).
MDIV gives maximum likelihood estimates of M (mNe) of up to 2 in some
populations and because of this divergence time estimates are
uncertain (non-zero ends on the posterior probability distributions).
Due to the high estimates of M we would like to argue that indeed the
sharing of haplotypes is mainly due to introgression. Nevertheless,
reviewers of the manuscript argue that this could still be due to
incomplete lineage sorting and claim that tests specifically designed
to distinguish between these two scenarios do exist (without
mentioning which). I haven't been able to find such a test in the
literature, so I ask for your help.
Thank you in advance.
-----
Answers:
-----
Dear Petri,
Your reviewer is wrong. The IM and MDIV packages, which implement
the models described by Nielsen and Wakeley (2001) and by
Hey and Nielsen (2004), are specifically designed to address exactly
this question. Do you have non-zero tails at both ends of your TDIV
distribution? The right tail in your TDIV posterior will always be
non-zero for analyses of single-locus data, no matter what your
estimate of M is; this is shown in the original Nielsen and Wakeley
paper. If you analyzed several of your data sets together using IM
instead of MDIV, this non-zero tail in TDIV MIGHT go away.
As far as demonstrating introgression rather than lineage sorting,
you could use the posterior distribution of M to calculate the
probability that it is different from some benchmark. In a paper I
published in Molecular Ecology (see refs below), I summed the area
under the curve of the posterior distribution to show that I could
reject the hypothesis that M was greater than or equal to 1; there is a
boundary condition at zero, so I couldn't calculate the probability
that M was actually different from zero, but I was able to reject
panmixia. Depending on the shape of your posterior distribution you
MAY be able to reject the hypothesis that M is less than one.
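A minimal sketch of the "summing under the curve" step, assuming the posterior for M is available as equal-width bins (midpoints and heights). The function name, the input format and the numbers are hypothetical, chosen for illustration; this is not the output format of IM or MDIV and not the exact procedure of the paper.

```python
def tail_probability(m_values, densities, threshold):
    """Return P(M >= threshold) from a binned posterior.

    m_values  : bin midpoints for M (hypothetical input format)
    densities : unnormalised posterior heights at those midpoints
    Assumes equal-width bins, so the bin width cancels on normalising.
    """
    if len(m_values) != len(densities):
        raise ValueError("m_values and densities must have equal length")
    total = sum(densities)
    if total <= 0:
        raise ValueError("posterior has no mass")
    # Sum the posterior mass in all bins at or above the threshold.
    tail = sum(d for m, d in zip(m_values, densities) if m >= threshold)
    return tail / total


# Made-up posterior, for illustration only (not real data).
m_grid = [0.25, 0.75, 1.25, 1.75, 2.25]
post = [1.0, 3.0, 4.0, 1.5, 0.5]

p_ge_1 = tail_probability(m_grid, post, 1.0)
p_lt_1 = 1.0 - p_ge_1
print(f"P(M >= 1) = {p_ge_1:.2f}, P(M < 1) = {p_lt_1:.2f}")
```

A hypothesis such as "M is less than one" would be rejected only if its posterior mass falls below whatever cut-off one has chosen in advance (for example 0.05).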
If you look at the IM discussion board
(http://groups.google.com/group/Isolation-with-Migration), you can find
a recent exchange I had with Jody Hey about using this procedure to
test demographic hypotheses:
http://groups.google.com/group/Isolation-with-Migration/browse_thread/thread/64dd7f6187960c1f
Hey, J., and R. Nielsen. 2004. Multilocus Methods for Estimating
Population Sizes, Migration Rates and Divergence Time, With
Applications to the Divergence of Drosophila pseudoobscura and D.
persimilis. Genetics 167:747-760.
Nielsen, R., and J. Wakeley. 2001. Distinguishing migration from
isolation: a Markov Chain Monte Carlo approach. Genetics 158: 885-896.
Smith, C. I., and B. D. Farrell. 2005. Phylogeography of the longhorn
cactus beetle Moneilema appressum Leconte (Coleoptera: Cerambycidae):
Was the differentiation of the Madrean sky-islands driven by
Pleistocene climate changes? Mol Ecol 14:3049-3065.
(Petri's comment: this is what I will do first and foremost with my
data! Thanks a lot, Chris!)
-----
Hi Petri,
There are a couple of recent papers that identify introgression as
alleles from particular genes that are inconsistent with (most)
other genes. They do this using IM or IMa, programs that estimate
divergence times and ancestral population sizes under a coalescent
model (which will account for incomplete lineage sorting). The latter
also includes a parameter for secondary gene flow following
divergence. The logic is that if a set of genes or alleles gives a
significantly more recent divergence time estimate than the combined
dataset, or the dataset without those alleles, they are likely to
have been introduced by introgression. Jody Hey, who co-authored
these programs, has a Google discussion group as well. The papers are
below. I'd be
interested in any other suggestions you get. Good luck!
Peters et al. 2007. Evolution 61(8): 1992-2006
Carling & Brumfield. 2008. Genetics 178: 363-377.
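A minimal sketch of the comparison described above, assuming one has two independent sets of posterior samples of divergence time: one from the candidate genes and one from the remaining loci. The function name and the sample values are hypothetical, and this is not taken from the cited papers; it only illustrates how "significantly more recent" could be quantified.

```python
from itertools import product


def prob_younger(t_subset, t_rest):
    """Estimate P(t_subset < t_rest) from two independent sample sets.

    Every sample from the candidate loci is compared with every sample
    from the remaining loci; the result is the fraction of pairs in
    which the candidate divergence time is the more recent one.
    """
    if not t_subset or not t_rest:
        raise ValueError("both sample sets must be non-empty")
    wins = sum(1 for a, b in product(t_subset, t_rest) if a < b)
    return wins / (len(t_subset) * len(t_rest))


# Made-up posterior samples, for illustration only (not real data).
t_candidate = [0.2, 0.3, 0.25, 0.4]
t_other = [0.9, 1.1, 0.35, 1.3]

p_younger = prob_younger(t_candidate, t_other)
print(f"P(candidate loci diverged more recently) = {p_younger:.4f}")
```

A value close to 1 would be consistent with the candidate alleles having arrived by introgression after the split; a value near 0.5 gives no support either way.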
-----
Hi Petri,
Check out my paper attached. The method has its problems but might
be ok in some situations. Also, you should check out the models of
Wakeley and Hey and the IM program.
Cheers,
Thomas
-----
Dear Petri,
I think there is an important ambiguity that exists in the distinction
between lineage sorting and introgression you should think about.
Introgression is in fact part and parcel of the lineage sorting process
until lineage sorting has gone to completion. Subsequent introgression
would reintroduce previously lost allelic lineages into one or both gene
pools. This would, in effect, re-establish lineage sorting processes,
although I recognize this is not what you or others mean when trying
to draw this distinction. The question is really between primary and
secondary lineage sorting, where introgression would be the trigger for
secondary lineage sorting and merely a possible component of primary
lineage sorting.
I would essentially describe your current task as trying to determine
whether or not primary lineage sorting had ever gone to completion
for the loci you are studying. It is, of course, irrelevant whether it
did or did not go to completion for any other loci, and I would object
as a reviewer if you tried to extend your conclusions about lineage
sorting at these loci to the rest of the genome.
Cheers,
Guy
-----
Dear Petri,
There are attempts at disentangling these two processes leading to
reticulated species relationships, like that of Sang and Zhong (2000,
Syst. Biol.), based on expected coalescence times. However, these
methods are flawed and do not really allow one to distinguish
hybridisation from lineage sorting. My conclusion to this dilemma was
that this distinction is really hard to make unless you bring
geography into play.
Hybridisation is expected to show a geographic pattern; stochastic
lineage sorting is not. Please have a look at the paper by
Gómez-Zurita and Vogler (2006, J. Mol. Evol. 62: 421-433) [I attached
it to my mail, but your server returned it to me because it exceeded
attachment size limitations.]
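A minimal sketch of one way to check for such a geographic pattern, assuming a per-population frequency of shared haplotypes and a distance from each population to the contact zone. The variable names, the data and the choice of a one-sided permutation test on Pearson's r are all assumptions made for illustration, not the method of the paper mentioned above.

```python
import random


def pearson_r(x, y):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p(dist, shared, n_perm=2000, seed=0):
    """One-sided permutation test: does sharing decline with distance?

    Returns the observed r and the fraction of random reshufflings of
    the frequencies that give a correlation at least as negative.
    """
    rng = random.Random(seed)
    r_obs = pearson_r(dist, shared)
    shuffled = list(shared)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(dist, shuffled) <= r_obs:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return r_obs, (hits + 1) / (n_perm + 1)


# Made-up data, for illustration only: distance to the contact zone (km)
# and frequency of shared haplotypes in six hypothetical populations.
dist_km = [0, 10, 20, 30, 40, 50]
shared_freq = [0.48, 0.41, 0.30, 0.22, 0.08, 0.03]

r_obs, p_value = permutation_p(dist_km, shared_freq)
print(f"r = {r_obs:.3f}, one-sided permutation p = {p_value:.4f}")
```

A strong decline of sharing away from the contact zone would point towards hybridisation; the absence of any spatial trend would be what stochastic lineage sorting predicts.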
Hope it helps.
Txus
-----
hi Petri
Ryan Garrick and I came up with an approach to this, in Garrick RC,
Sands CJ, Rowell DM, Hillis DM, Sunnucks P (2007). Catchments catch
all: long-term population history of a giant springtail from the
southeast Australian highlands - a multigene approach. Molecular
Ecology, 16, 1865-1882.
I remember that Thomas R Buckley et al. had a paper in 2006 in Syst
Biol that took another approach to this.
Paul Sunnucks
-----
Petri Kemppainen,
PhD student
Tjärnö Marine Biological Laboratory,
45296, Strömstad, Sweden
Tel: +46 526 686 83
Fax: +46 526 686 07
Mob: +46 709 360 124
petri.kemppainen@marecol.gu.se