Thanks to everyone for the rapid responses to my question. As requested by quite a few people, here are the replies. Regards, Tim

----- Original Message -----
> Does anyone know of software for testing for a difference in allelic
> richness between two samples, preferably with a permutation test that
> resamples individuals, as opposed to populations within samples (i.e.,
> FSTAT)?

Tim, I don't have software or know of anything in particular, though there is probably something in the ecological arena, where they have the "species richness" analogue to contend with. As far as I can see, there are two aspects to the problem. The first is whether what we see in two samples is comparable or really different. The second is how well the two measures capture what is really there. The second problem is probably the more challenging and interesting, but the first is fairly easy to attack.

Imagine a pair of populations of sizes N1 and N2, respectively, preferably but not necessarily the same. In the first population, allelic richness is K alleles, with counts of n11, n12, ..., n1K, summing to N1. In the second population, we see L alleles, with counts of n21, n22, ..., n2L, summing to N2. The question is whether K and L are really different, or whether the apparent difference is just a consequence of drawing two samples of sizes N1 and N2 from the same "pool".

If we take the view that "the data are the data, and anything else is extrapolation," then we might conduct a resampling exercise using only the data at hand: randomly shuffle the N = N1 + N2 alleles between the two populations, without replacement, allocating N1 and N2 of them to the two pseudo-populations, on the null premise that both populations are drawing from the same pool. We simply count K and L from the two (constructed) pseudo-populations, or perhaps (K/N1) and (L/N2) if N1 and N2 are very different, compute the difference (or some derivative translation of that difference), and tally it. We then shuffle again, record the criterion of interest, and do it again. With (say) 999 random trials, to which we add the observed outcome (on the premise that it too is the result of a random draw from a common pool), we have constructed a null distribution of our chosen criterion, against which to compare the actual result. The whole thing is conditioned on the data we actually have, and never mind what we have not seen.

I suspect, on the basis of accumulated experience, that the common alleles will be recovered in both populations, and that the precise allocation of the rarer alleles will be unpredictable. Typical sampling stuff. You may find a program that does this, but my guess would be that you could set it up as an Excel routine.
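A minimal R sketch of the shuffle described above, for a single locus; the perm_richness_test() function and its argument names are illustrative only, not taken from any of the packages mentioned in these replies:

## Permutation test for a difference in allelic richness at one locus.
## alleles1 and alleles2 are vectors of allele labels, one entry per gene copy sampled.
perm_richness_test <- function(alleles1, alleles2, n_perm = 999) {
  obs  <- length(unique(alleles1)) - length(unique(alleles2))  # observed K - L
  pool <- c(alleles1, alleles2)                                # the common "pool" under the null
  N1   <- length(alleles1)
  null_diffs <- replicate(n_perm, {
    shuffled <- sample(pool)                                   # shuffle without replacement
    length(unique(shuffled[1:N1])) - length(unique(shuffled[-(1:N1)]))
  })
  ## Two-sided p-value, counting the observed outcome as one more draw from the common pool.
  p <- (sum(abs(null_diffs) >= abs(obs)) + 1) / (n_perm + 1)
  list(observed_difference = obs, p_value = p)
}

## Example with made-up allele labels:
## perm_richness_test(c("A","A","B","C","C","D"), c("A","A","A","B","B","C"))

To resample whole individuals rather than single gene copies, as the original question asks, the same idea applies with rows of a genotype table shuffled between the two groups instead.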
Many years ago, the Michigan team published a paper on thousands of genetic samples from Hiroshima and Nagasaki (Neel JV, Satoh C, Smouse PE, Asakawa J, Takahashi N, Goriki K, Fujita M, Kageoka T and Hazama R. 1988. Protein variants in Hiroshima and Nagasaki: tales of two cities. Am. J. Hum. Genet. 43:870-893), the point of which was not "allelic richness" per se, but the samples were so large that they were instructive about the sampling aspects of the problem. The common alleles were shared, but the really rare alleles, say those in frequencies < 0.001, were virtually unique to the two cities. Of course, with our more typical sample sizes of (say) 50-100 individuals, we would never have seen these alleles at all. When we subsampled, the relatively common alleles survived, but the really rare alleles (p < 0.001) and most of the unusual alleles (say p < 0.01) dropped out of the sample. A typical rarefaction result.

Several years later, working with colleagues on mtDNA haplotypes from marine fish, we encountered a similar problem, but there the sample sizes were on the order of N = 100 per population, and the total (N1 + N2 + ... + N17) was on the order of 1000 or so. There, haplotypic richness was at least part of the issue (Brown BL, Epifanio JM, Smouse PE and Kobak CJ. 1996. Temporal stability of mtDNA haplotype frequencies in American shad stocks: to pool or not to pool across years? Can. J. Fish. Aquat. Sci. 53:2274-2283), and we discovered that by doubling the total sample size, we added large numbers of rare haplotypes, the majority of which had not been seen in the previous sample. The common haplotypes were recovered again, and in roughly the same frequencies. Some of the uncommon haplotypes, seen perhaps once or twice in the first sample, showed up again (once or twice) in the second sample.

Today, this is well-traveled territory. There are standard measures of expected allelic richness with increasing sample size, but they usually require some way to determine the parametric frequencies of the alleles/species not (yet) seen in the finite sample we have. With genetic markers, Ewens and colleagues have devised ways to work out the expected form of the allelic spectrum, against which to compare observations, and (with a little bit of theoretical faith) to predict what we do not actually see, but which is almost surely out there.

The way you posed the question suggests that you are interested (at this point) in the first problem, but one could take the view that we should be sampling with replacement (bootstrapping) from the only partially described allelic spectrum distribution. We would then be allowing that we might have sampled other alleles than those actually recovered. Let me know what you hear back on this one. I am also interested, and I'm sure I'm not the only one.

Peter Smouse [Smouse@AESOP.Rutgers.edu]

I would suggest writing a little program in R if you do not find ready-made software. The function sample() is very easy to use and will let you do the resampling and then estimate allelic richness.

Sophie Gerber [gerber@pierroton.inra.fr]
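In the same spirit as the sample()-based approach Sophie mentions, here is a small, illustrative R sketch of rarefaction, i.e. estimating the expected number of distinct alleles at a common (smaller) sample size by repeated subsampling without replacement; the rarefied_richness() name is made up for this example:

## Expected number of distinct alleles in a subsample of g gene copies,
## estimated by repeatedly drawing g copies without replacement and averaging.
rarefied_richness <- function(alleles, g, n_rep = 1000) {
  stopifnot(g <= length(alleles))
  mean(replicate(n_rep, length(unique(sample(alleles, size = g)))))
}

## To put two samples on the same footing, rarefy both to the smaller sample size:
## g <- min(length(alleles1), length(alleles2))
## rarefied_richness(alleles1, g)
## rarefied_richness(alleles2, g)

The exact expectation can also be obtained analytically (see the Kalinowski 2004 rarefaction reference below), but a simulation like this is easy to combine with the permutation sketch above.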
As far as we know, there is no software developed for such a test. However, some authors have developed programs to do those permutations for their own use:

- Degen, B., R. Streiff and B. Ziegenhagen (1999). "Comparative study of genetic variation and differentiation of two pedunculate oak (Quercus robur) stands using microsatellite and allozyme loci." Heredity 83(5): 597-603.
- Glaubitz, J. C., J. C. Murrell and G. F. Moran (2003). "Effects of native forest regeneration practices on genetic diversity in Eucalyptus consideniana." Theoretical and Applied Genetics 107(3): 422-431.
- Glaubitz, J. C., H. X. Wu and G. F. Moran (2003). "Impacts of silviculture on genetic diversity in the native forest species Eucalyptus sieberi." Conservation Genetics 4(3): 275-287.

We have written our own program for two of our studies (articles currently submitted and in preparation). We could send you the source code in Fortran, and you would then need to adapt it to your sample sizes and compile it.

Also, you might want to consider other options for testing genetic diversity. Many studies on bottlenecks use Wilcoxon tests or even t-tests, which you can do with any statistical package:

- Spencer, C. C., J. E. Neigel and P. L. Leberg (2000). "Experimental evaluation of the usefulness of microsatellite DNA for detecting demographic bottlenecks." Molecular Ecology 9(10): 1517-1528.

And for a discussion of different tests for allelic richness, you might want to read:

- Kalinowski, S. T. (2004). "Counting alleles with rarefaction: private alleles and hierarchical sampling designs." Conservation Genetics 5(4): 539-543.

We hope this information is useful to you. Please do not hesitate to contact us for further discussion or if you are interested in our program. Finally, we are interested in this topic, so we would be grateful if you could forward us any other answers you get.

Concetta Burgarella, PhD [cr1burco@uco.es] & Miguel de Navascués, PhD [m.navascues@gmail.com]

I've attached a couple of papers by a friend of mine that may be of interest. Here is a link to the associated software: http://www.montana.edu/kalinowski/kalinowski_software.htm

James Rhydderch [James.Rhydderch@noaa.gov]

Hi Tim, by now I'm sure others have told you the same thing, but in case not, check out HP-Rare by Steve Kalinowski.

Devon Pearse [Devon.Pearse@noaa.gov]

So it sounds like you're talking about running F statistics with a permutation test? If I understand your question correctly, you can do that with Jerome Goudet's 'hierfstat' package: http://cran.r-project.org/src/contrib/Descriptions/hierfstat.html
'hierfstat' is a generalized setup, which means you can mould your F-statistic analysis and permutation tests to any number of hierarchical levels; I seem to recall that this includes an option for resampling at the individual level. The package runs on the R statistical software system, so you'll have to familiarize yourself with that: http://www.r-project.org/
Neither programme is the easiest to use, but they have more than enough documentation to make them quite approachable. They're also free, so you certainly can't complain about that.

*** Dr. Murray Cox, Arizona Research Laboratories - Biotechnology, 1041 East Lowell Street, University of Arizona, Bioscience West, Room 246B, Tucson, AZ 85721, USA. Tel: (520) 621-9791. URL: www.u.arizona.edu/~mpcox/ ***

I don't know of specific software, but I do know of a workaround: I have used the Monte Carlo procedure in PopTools (an Excel add-in) to do something very similar in the past. Find it here: http://www.cse.csiro.au/poptools/

Chester Sands [cjsan@bas.ac.uk]

You may check SPAGeDi, by Hardy & Vekemans; it has many analyses and permutations at the individual level. Although conceived to perform autocorrelation analyses, it also lets you define groups, so it can permute individuals among groups.

Xavier Turon, Dept. of Animal Biology (Invertebrates), Fac. of Biology, Univ. of Barcelona, 645 Diagonal Ave, 08028 Barcelona. e-mail: xturon@ub.edu, phone: 34-93-4021441, fax: 34-93-4035740

Tim Jones