Dear all,
A few days ago, I posted the following query:
"I often use the Partition Homogeneity Test (= ILD) in PAUP* to test
data homogeneity for phylogenetic analyses, but as the number of taxa
increases, this rapidly becomes impossible to use. This is related to
the use of parsimony in this test, even if the principle of PHT should
allow the use of other (and faster to compute) optimality criteria.
Unfortunately, PAUP requires the use of parsimony in PHT.
Is anyone aware of another (good) method to test data homogeneity, with
associated software or a PAUP routine?"
Many people seem to have the same problem. Here are the putative
solutions I received:
- See attachment [Zelwer M and Daubin V; 2004. Detecting phylogenetic
incongruence using BioNJ: an improvement of the ILD test. Mol Phyl Evol
33: 687-693].
Rob Cruickshank
- The test was made for and justified based on the parsimony criterion,
of course. To make it tractable for a large dataset, you can use an
abbreviated heuristic search to speed up the analysis of the random
partitions if necessary (i.e., limit the number of trees swapped using
the nchuck command, for instance, or use a smaller number of
repetitions using the nrep=5 command, or even use the parsimony
ratchet, though implementing that might be a chore). This approach
risks not finding the very best tree for each random replicate, which
would have the effect of increasing the variance of your p-value
around the "true" value that you would find using exact searches.
However, this approach shouldn't introduce a strong bias that would
drive the p-value strongly up or down, because the failure to find the
best tree would be equally likely to affect the partitioned and
unpartitioned length estimates for each replicate, giving the ILD for
each replicate a more-or-less equal chance of increasing or
decreasing.
You should not use an abbreviated search for your test partition,
however, since the accuracy of this length difference is critical. You
want this estimate to be as precise as possible.
As for switching to other criteria, the ILD concept could be adapted to
likelihood, but this would be much more time consuming, so it wouldn't
help you. Huelsenbeck and Bull have their likelihood-based
nonparametric bootstrap for incongruence, but it is far more
computationally intensive than the parsimony ILD. I don't know of a
clear justification for using an ILD-like test in a distance context,
and I would be very wary of such an approach. The ILD measures conflict
between and within data subsets, which has a direct relationship to
length difference. I don't know if we can assume that differences in
total branch lengths (for the minimum evolution criterion) or
least-squares fit or other such distance measures should be distributed
in a similar way under the null hypothesis of no incongruence, which is
what is required for the ILD test to be valid.
Joe Thornton
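[Note: to make the role of the random partitions concrete, here is a
minimal Python sketch of how a permutation p-value for the ILD could be
computed once the tree lengths are in hand. It is not PAUP's own code.
The function name ild_p_value and all the lengths are hypothetical, and
the convention of counting the original partition as one member of the
null distribution is an assumption.]

```python
def ild_p_value(observed_sum, random_sums):
    """Permutation p-value for an ILD-style test (illustrative sketch).

    observed_sum: summed tree lengths of the original data partitions.
    random_sums:  summed tree lengths, one value per random partition.

    The combined-data tree length is the same for every partition of the
    same characters, so a larger ILD corresponds to a smaller summed
    length. A random partition is "as extreme" as the original one when
    its summed length is less than or equal to the observed sum.
    """
    as_extreme = sum(1 for s in random_sums if s <= observed_sum)
    # Assumption: the original partition is counted as one replicate.
    return (as_extreme + 1) / (len(random_sums) + 1)


# Hypothetical summed lengths, for illustration only.
random_sums = [1012, 1015, 1009, 1013, 1011, 1014, 1010, 1016, 1012]
p_incongruent = ild_p_value(1005, random_sums)
p_congruent = ild_p_value(1012, random_sums)
```

In this sketch, a heuristic search that misses the shortest trees in
some replicates would inflate individual values in random_sums, which
is the source of the extra variance discussed above.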
- Did you try Winclada + Nona (http://www.cladistics.com/)? These
programs are faster than PAUP.
Sophie Quérouil
[My translation]
- 1. I don't know of a faster optimality criterion than parsimony. If
you're thinking of neighbor-joining, that method doesn't have an
optimality criterion, and therefore the test can't be performed on it.
I suppose you could construct a neighbor-joining tree and then evaluate
the topology under some optimality criterion, either least squares or
parsimony. Parsimony would still be the fastest to compute.
2. Are you aware of recent literature showing that the ILD isn't a
reliable test? In particular, I'm thinking of Barker, F. K., and F. M.
Lutzoni. 2002. Spurious rejection of phylogenetic congruence by the ILD
test: A simulation study. Syst. Biol. 51:625-637. But there are other
similar papers.
John Harshman
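[Note: evaluating a fixed topology (for instance a neighbor-joining
tree) under parsimony can be sketched in Python with the Fitch
algorithm, as below. This is an illustration only, assuming unordered,
equally weighted characters and a rooted binary tree written as nested
tuples. The taxon names, sequences, and function names are
hypothetical.]

```python
def fitch_site(node, seqs, site):
    """Return (state set, number of changes) for one site below node."""
    if isinstance(node, str):
        # A leaf: its state set is the observed character state.
        return {seqs[node][site]}, 0
    left_set, left_cost = fitch_site(node[0], seqs, site)
    right_set, right_cost = fitch_site(node[1], seqs, site)
    common = left_set & right_set
    if common:
        # The children agree on at least one state: no extra change.
        return common, left_cost + right_cost
    # The children disagree: take the union and count one change.
    return left_set | right_set, left_cost + right_cost + 1


def fitch_length(tree, seqs):
    """Parsimony length of a fixed tree, summed over all sites."""
    n_sites = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, i)[1] for i in range(n_sites))


# Hypothetical four-taxon example.
tree = (("A", "B"), ("C", "D"))
seqs = {"A": "AAG", "B": "AAG", "C": "ACT", "D": "GCT"}
length = fitch_length(tree, seqs)
```

Scoring one given tree this way is fast. The expensive part of the ILD
test is the search over topologies, which this sketch does not do.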
- The attached paper by Waddell, Kishino and Ota [2000. Rapid
Evaluation of the Phylogenetic Congruence of Sequence Data Using
Likelihood Ratio Tests. Mol Biol Evol 17(12): 1988-1992] describes a
homogeneity test that can use RELL. RELL is, in this sort of
application, very fast. The required parts are available in PAUP (the
site likelihoods or probabilities of data patterns) but an R script
would be best to put it all together.
Peter Waddell
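[Note: the RELL idea (resampling estimated log-likelihoods) can be
sketched in a few lines of Python. Site log-likelihoods are computed
once per tree and then bootstrapped, so no likelihood is re-optimized.
This is a generic RELL illustration, not the congruence test of
Waddell, Kishino and Ota itself. The function name rell_support and the
numbers are hypothetical. In practice the site log-likelihoods would be
exported from a program such as PAUP.]

```python
import random


def rell_support(site_lnl, n_boot=1000, seed=0):
    """Proportion of bootstrap replicates in which each tree is best.

    site_lnl: one list of per-site log-likelihoods for each tree, all
              of the same length and in the same site order.
    """
    rng = random.Random(seed)
    n_trees = len(site_lnl)
    n_sites = len(site_lnl[0])
    wins = [0] * n_trees
    for _ in range(n_boot):
        # Resample site indices with replacement.
        sample = [rng.randrange(n_sites) for _ in range(n_sites)]
        # Sum the stored site log-likelihoods; nothing is re-estimated.
        totals = [sum(tree[i] for i in sample) for tree in site_lnl]
        wins[totals.index(max(totals))] += 1
    return [w / n_boot for w in wins]


# Hypothetical site log-likelihoods for two trees over three sites.
site_lnl = [[-1.0, -2.0, -1.5],
            [-1.2, -2.5, -1.9]]
support = rell_support(site_lnl)
```

The speed comes from reusing the stored site values in every replicate,
which is why RELL is so fast in this sort of application.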
I sincerely thank all those who answered (with or without a suggestion!).
Yves
Yves Desdevises
Laboratoire Arago, Université Pierre et Marie Curie
UMR CNRS 7628 : Modèles en biologie cellulaire et évolutive
BP 44, 66651 Banyuls-sur-Mer Cedex, France
http://www.obs-banyuls.fr
Tél. : (33) (0)4 68 88 73 13 / (33) (0)6 17 27 17 97
Fax : (33) (0)4 68 88 73 98
Email : desdevises@obs-banyuls.fr
Web : http://desdevises.free.fr