I recently sent out a message asking whether bootstrapping across individuals or across loci is appropriate for gauging confidence in the relationships between populations, but I see from some of the responses that my question could have been posed better. I thank everyone who has replied so far, but I'd like to restate the question.

I am working in a population genetics framework and have genotyped many individuals from each of several populations at a dozen microsatellite loci. I know which pairs of populations are significantly differentiated, as determined through the non-parametric permutation approach of Excoffier et al. (1992). My initial query really had to do with constructing a population dendrogram showing the relationships between populations. The program POPULATIONS (Olivier Langella) offers bootstrapping across individuals or across loci using 15 different differentiation metrics. My question, then, is which of these two options is more appropriate (or indeed whether both are statistically valid). To reiterate, I am hesitant about bootstrapping across individuals because they are not statistically independent (they are related). Please read the earlier message (below) for more ruminations.

Thanks.
Joseph. (josephwb@umich.edu)

Initial query:

I am wondering which bootstrap technique (individuals versus loci) is more appropriate for gauging confidence in the genetic differentiation between populations. I am hesitant about bootstrapping over individuals because, to me, they do not appear to be independent (i.e. they are related). I would like to know which method is more appropriate for my needs (determining confidence in the relationships between populations), and also perhaps see examples of where you would use each method.

My ruminations to date:

I suppose both are valid (or else they wouldn't both be available in software written by people much more learned than I), but the choice will depend on the question you ask.
Bootstrapping across individuals (which I am not entirely cool with, since the data points are not independent: they are related, at least to some degree, which, to me anyway, will necessarily bias estimates in favour of differentiation) would enable you to gauge confidence in how well your samples support differentiation between populations. Taken this way, you assume that the distribution of genetic variation in your samples is representative of the populations as a whole.

In contrast, bootstrapping across loci gauges how well the genome supports differentiation. Here, obviously, you assume that your loci are representative of the distribution of differentiation between populations across the genome as a whole. A major potential problem here is that a handful of loci may not accurately portray this distribution (but clearly this is an implicit problem with every population genetics study under the sun). The reliability of bootstrapping when sample sizes are small (i.e. a handful of loci) may be pertinent here, though I am not up on the bootstrapping literature. You can imagine that a locus that is quite unrepresentative may bias your results (e.g. two populations have a particular allele at high frequency, due solely to homoplasy). In that case, going the individual route might be more appropriate, as the real variance is better approximated with the much larger sample size of individuals. At the other extreme, if you know your samples are good (for example, you have sampled all individuals), then bootstrapping loci would seem more appropriate to me, as the variance due to sampling individuals is zero.

I guess the thing to do is bootstrap whichever sample distribution (individuals or loci) you think best approximates the variance of the respective real distribution (population or genome). It may very well be that both distributions are equally well approximated, and would therefore deliver similar bootstrap proportions.
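To make the two resampling schemes described above concrete, here is a minimal Python sketch. It is not how POPULATIONS implements its bootstrap; every function name here is hypothetical, the distance is a simple allele-frequency stand-in rather than one of the 15 metrics in that program, and the genotypes are simulated toy data. Instead of building a full dendrogram, it reports the proportion of replicates in which a given pair of populations comes out as the closest pair.

```python
import random
from collections import Counter
from itertools import combinations

# Data layout (an assumption of this sketch): a population is a list of
# individuals; an individual is a list of loci; a locus is a pair of alleles.

def allele_freqs(individuals, locus):
    """Allele frequencies at one locus within one population."""
    counts = Counter(a for ind in individuals for a in ind[locus])
    total = sum(counts.values())
    return {a: c / total for a, c in counts.items()}

def distance(pop_a, pop_b):
    """Stand-in distance: half the summed absolute allele-frequency
    difference per locus, averaged over loci (0 = identical, 1 = disjoint)."""
    n_loci = len(pop_a[0])
    total = 0.0
    for locus in range(n_loci):
        p = allele_freqs(pop_a, locus)
        q = allele_freqs(pop_b, locus)
        total += 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0))
                           for a in set(p) | set(q))
    return total / n_loci

def resample_individuals(pops, rng):
    """Bootstrap over individuals: redraw each population with replacement."""
    return {name: rng.choices(inds, k=len(inds)) for name, inds in pops.items()}

def resample_loci(pops, rng):
    """Bootstrap over loci: redraw locus indices with replacement and apply
    the same draw to every individual in every population."""
    n_loci = len(next(iter(pops.values()))[0])
    picked = rng.choices(range(n_loci), k=n_loci)
    return {name: [[ind[i] for i in picked] for ind in inds]
            for name, inds in pops.items()}

def closest_pair(pops):
    """The pair of population names (in sorted order) with the smallest distance."""
    return min(combinations(sorted(pops), 2),
               key=lambda pr: distance(pops[pr[0]], pops[pr[1]]))

def bootstrap_support(pops, resampler, pair, n_reps=200, seed=1):
    """Proportion of bootstrap replicates in which `pair` is the closest pair."""
    rng = random.Random(seed)
    hits = sum(closest_pair(resampler(pops, rng)) == pair for _ in range(n_reps))
    return hits / n_reps

def toy_population(rng, n_inds, n_loci, alleles, weights):
    """Simulated diploid genotypes (made-up data, for illustration only)."""
    return [[tuple(rng.choices(alleles, weights=weights, k=2))
             for _ in range(n_loci)] for _ in range(n_inds)]

if __name__ == "__main__":
    rng = random.Random(0)
    alleles = [100, 102, 104]
    pops = {
        "A": toy_population(rng, 30, 12, alleles, [0.6, 0.3, 0.1]),
        "B": toy_population(rng, 30, 12, alleles, [0.5, 0.3, 0.2]),
        "C": toy_population(rng, 30, 12, alleles, [0.1, 0.3, 0.6]),
    }
    pair = ("A", "B")
    print("over individuals:", bootstrap_support(pops, resample_individuals, pair))
    print("over loci:", bootstrap_support(pops, resample_loci, pair))
```

Note that the individual resampler treats each individual as an independent draw, which is exactly the assumption questioned above, and the locus resampler has only a dozen units to draw from, so its proportions may be coarse.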
Long story short, the two tests seem complementary. Despite reservations, I think I would report bootstrap proportions across individuals. I would defend this by saying that 1) I assume my microsatellites are representative because they are putatively randomly distributed across the genome, or 2) the results are true given the data used (which happens to also be the only data available), and are open to debate with new data.

Any input on this problem would be greatly appreciated.

Thanks.
Joseph. (josephwb@umich.edu)

Joseph W. Brown
Graduate Student, Mindell Lab
Department of Ecology and Evolutionary Biology
3015 Ruthven Museums Building
Museum of Zoology, Bird Division
University of Michigan, Ann Arbor 48109-1079
Email: josephwb@umich.edu
Fax: (734) 763-4080
Homepage: http://www.ummz.lsa.umich.edu/students/josephwb/index.htm
Biology 162 Laboratory: http://www-personal.umich.edu/~josephwb/Biol162/Lab.html
Queen's Conservation Genetics Group: http://biology.queensu.ca/~cgg