Dear all,

A few days ago, I posted the following query:

"I often use the Partition Homogeneity Test (= ILD) in PAUP* to test data homogeneity for phylogenetic analyses, but as the number of taxa increases, it rapidly becomes impossible to use. This is related to the use of parsimony in this test, even though the principle of the PHT should allow the use of other (and faster to compute) optimality criteria. Unfortunately, PAUP* requires the use of parsimony in the PHT. Is anyone aware of another (good) method to test data homogeneity, with associated software or a PAUP* routine?"

Many people seem to have the same problem. Here are the putative solutions I received:

- See attachment [Zelwer M, Daubin V. 2004. Detecting phylogenetic incongruence using BioNJ: an improvement of the ILD test. Mol Phylogenet Evol 33: 687-693].
Rob Cruickshank

- The test was made for, and justified based on, the parsimony criterion, of course. To make it tractable for a large dataset, you can use an abbreviated heuristic search to speed up the analysis of the random partitions if necessary (e.g., limit the number of trees swapped using the nchuck command, use a smaller number of repetitions with nrep=5, or even use the parsimony ratchet, though implementing that might be a chore). This approach risks not finding the very best tree for each random replicate, which would increase the variance of your p-value around the "true" value you would obtain with exact searches. However, it shouldn't introduce a strong bias that would push the p-value strongly up or down, because failing to find the best tree is equally likely to affect the partitioned and unpartitioned length estimates for each replicate, giving the ILD for each replicate a more-or-less equal chance of increasing or decreasing. You should not use an abbreviated search for your test partition, however, since the accuracy of this length difference is critical.
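The ILD logic discussed in this thread (the length of the combined data minus the summed lengths of the separately analysed partitions, compared against random partitions of the same sizes) can be sketched in Python as follows. This is a minimal illustration, not PAUP*'s implementation: it assumes a hypothetical four-taxon alignment, so an exact search simply means scoring the three possible unrooted trees with Fitch parsimony. With real data, `best_length` is the step where an exact or (for random partitions) abbreviated heuristic search would go. The function names and the p-value convention (counting the observed partition as one of the replicates) are illustrative choices.

```python
import random

# The three unrooted topologies for four taxa, written as splits AB|CD, AC|BD, AD|BC.
QUARTETS = (
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
)


def fitch_join(left, right):
    """Fitch step: return (state set, added steps) for two child state sets."""
    common = left & right
    if common:
        return common, 0
    return left | right, 1


def site_length(seqs, site, quartet):
    """Parsimony steps for one site on a quartet, rooted on its central edge."""
    (p, q), (r, s) = quartet
    left, c1 = fitch_join({seqs[p][site]}, {seqs[q][site]})
    right, c2 = fitch_join({seqs[r][site]}, {seqs[s][site]})
    _, c3 = fitch_join(left, right)
    return c1 + c2 + c3


def best_length(seqs, sites):
    """Exact most-parsimonious length: with 4 taxa, try all 3 trees."""
    return min(
        sum(site_length(seqs, i, quartet) for i in sites) for quartet in QUARTETS
    )


def ild_test(seqs, part_a, part_b, nreps=999, seed=1):
    """ILD permutation test for two character partitions (site indices)."""
    all_sites = list(part_a) + list(part_b)
    # The combined length is the same for every random partition of these sites.
    combined = best_length(seqs, all_sites)

    def length_difference(a, b):
        return combined - (best_length(seqs, a) + best_length(seqs, b))

    observed = length_difference(part_a, part_b)
    rng = random.Random(seed)
    as_extreme = 0
    for _ in range(nreps):
        shuffled = all_sites[:]
        rng.shuffle(shuffled)
        # Random partition with the same sizes as the real one.
        rand_a, rand_b = shuffled[: len(part_a)], shuffled[len(part_a):]
        if length_difference(rand_a, rand_b) >= observed:
            as_extreme += 1
    # Count the observed partition as one of the replicates.
    p_value = (as_extreme + 1) / (nreps + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical toy alignment: sites 0-5 group A with B, sites 6-11 group A with C.
    seqs = {
        "A": "AAAAAAAAAAAA",
        "B": "AAAAAAGGGGGG",
        "C": "GGGGGGAAAAAA",
        "D": "GGGGGGGGGGGG",
    }
    observed, p = ild_test(seqs, range(0, 6), range(6, 12))
    print(f"ILD = {observed} steps, P = {p:.3f}")
```

In this toy example the combined data need 18 steps against 6 + 6 for the two partitions analysed separately, so the observed length difference is 6; only 2 of the 924 equal-size splits of the 12 sites reach a difference that large, so the p-value comes out small. Because the combined length is fixed, "difference at least as large" is the same as "summed partition lengths no larger", which is why only the partition searches need repeating in each replicate.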
You want the estimate for the real partition to be as precise as possible.

As for switching to other criteria, the ILD concept could be adapted to likelihood, but this would be much more time-consuming, so it wouldn't help you. Huelsenbeck and Bull have their likelihood-based nonparametric bootstrap test for incongruence, but it is far more computationally intensive than the parsimony ILD. I don't know of a clear justification for using an ILD-like test in a distance context, and I would be very wary of such an approach. The ILD measures conflict between and within data subsets, which is directly related to the length difference. I don't know whether we can assume that differences in total branch length (for the minimum evolution criterion), least-squares fit, or other such distance measures would be distributed in a similar way under the null hypothesis of no incongruence, which is what is required for the ILD test to be valid.
Joe Thornton

- Did you try Winclada + Nona (http://www.cladistics.com/)? These programs are faster than PAUP*.
Sophie Quérouil [my translation]

- 1. I don't know of a faster optimality criterion than parsimony. If you're thinking of neighbor-joining, that method doesn't have an optimality criterion, so the test can't be performed with it. I suppose you could construct a neighbor-joining tree and then evaluate its topology under some optimality criterion, either least squares or parsimony. Parsimony would still be the fastest to compute.
2. Are you aware of recent literature showing that the ILD isn't a reliable test? In particular, I'm thinking of Barker FK, Lutzoni FM. 2002. Spurious rejection of phylogenetic congruence by the ILD test: a simulation study. Syst Biol 51: 625-637. But there are other similar papers.
John Harshman

- The attached paper by Waddell, Kishino and Ota [2000. Rapid evaluation of the phylogenetic congruence of sequence data using likelihood ratio tests. Mol Biol Evol 17(12): 1988-1992] describes a homogeneity test that can use RELL. RELL is, in this sort of application, very fast. The required parts are available in PAUP* (the site likelihoods or probabilities of data patterns), but an R script would be best to put it all together.
Peter Waddell

I sincerely thank all those who answered (with or without a proposal!).

Yves

Yves Desdevises
Laboratoire Arago, Université Pierre et Marie Curie
UMR CNRS 7628 : Modèles en biologie cellulaire et évolutive
BP 44, 66651 Banyuls-sur-Mer Cedex, France
http://www.obs-banyuls.fr
Tel.: (33) (0)4 68 88 73 13 / (33) (0)6 17 27 17 97
Fax: (33) (0)4 68 88 73 98
Email: desdevises@obs-banyuls.fr
Web: http://desdevises.free.fr