Thanks to everyone for their rapid response to my question. As requested
by quite a few people, here are the responses.
Regards,
Tim
----- Original Message -----
>
> Does anyone know of software for testing difference in allelic
> richness between two samples, preferably with a permutation test that
> resamples individuals as opposed to populations within a sample (i.e.,
> as in FSTAT)?
>
>
Tim, I don't have software or know of anything particular, though there
is probably something in the Ecological arena, where they have the
"species richness" analogue to contend with. There are, as far as I can
see, two aspects to the problem. The first is whether what we see in two
samples is comparable or really different. The second is how well the
two measures capture what is really there. The second problem is
probably the more challenging and interesting,
but the first is fairly easy to attack. Imagine a pair of populations of
sizes N1 and N2, respectively, preferably but not necessarily the same.
In the first population, allelic richness is K alleles, with counts of
n11, n12, . . . , n1K, summing to N1. In the second population, we see L
alleles, with counts of n21, n22, . . . , n2L, summing to N2. Now, the
question is whether K and L are really different, or whether the
apparent difference is just a consequence of drawing two samples, of
sizes N1 and N2 from the same "pool".
If we take the view that "the data are the data, and anything else is
extrapolation," then we might be inclined to conduct a resampling
exercise, using only the data at hand, randomly shuffling the N = N1 +
N2 alleles between the two populations, without replacement, and in
numbers N1 and N2 allocated to the two pseudo-populations, on the null
premise that both populations are drawing from the same pool. We simply
count K and L from the two (constructed) pseudo-populations, or perhaps
(K/N1) and (L/N2), if N1 and N2 are very different, and compute the
difference (or some derivative translation of that difference), and
tally it. We then shuffle again, record the criterion of interest, and
do it again. With (say) 999 random trials, to which we add the observed
data outcome (on the premise that it too is the result of a random draw
from a common pool), we have constructed a null distribution of our
chosen criterion, against which to compare the actual result. The whole
thing is conditioned on the data we actually have, and never mind what
we have not seen.
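The shuffling scheme described above is easy to script. Here is a minimal sketch in Python (function names are hypothetical; one locus, alleles coded as labels; the criterion is taken to be the absolute difference in allelic counts, |K - L|):

```python
import random

def allelic_richness(alleles):
    """Number of distinct alleles observed in a sample (K or L)."""
    return len(set(alleles))

def permutation_test(pop1, pop2, n_trials=999, seed=1):
    """Shuffle the pooled N = N1 + N2 alleles between two
    pseudo-populations of sizes N1 and N2, without replacement, on the
    null premise that both samples draw from the same pool.  Tally how
    often the shuffled |K - L| matches or exceeds the observed one,
    counting the observed outcome itself as one draw from the null."""
    rng = random.Random(seed)
    observed = abs(allelic_richness(pop1) - allelic_richness(pop2))
    pool = list(pop1) + list(pop2)
    n1 = len(pop1)
    at_least = 1  # the observed data count as one draw from the pool
    for _ in range(n_trials):
        rng.shuffle(pool)
        diff = abs(allelic_richness(pool[:n1]) - allelic_richness(pool[n1:]))
        if diff >= observed:
            at_least += 1
    return at_least / (n_trials + 1)
```

The `>=` comparison with the observed criterion gives the conventional (tally + 1)/(trials + 1) permutation p-value; substituting the ratios (K/N1) and (L/N2) for the raw counts, as suggested when N1 and N2 differ greatly, is a one-line change.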
I suspect, on the basis of accumulated experience, that the common
alleles will be recovered in both populations, and that the precise
allocation of the rarer alleles will be unpredictable. Typical sampling
stuff. You may find a program that does this, but my guess would be that
you could set it up as an Excel routine. Many years ago, the Michigan
team published a paper on thousands of genetic samples from Hiroshima
and Nagasaki,
Neel JV, Satoh C, Smouse PE, Asakawa J, Takahashi N, Goriki K, Fujita M,
Kageoka T and Hazama R. 1988. Protein variants in Hiroshima and
Nagasaki: Tales of two cities. Am. J. Hum. Genet. 43:870-893,
the point of which was not "allelic richness" per se, but the samples
were so large that they were instructive about the sampling aspects of
the problem. The common alleles were shared, but the really rare
alleles, say those in frequencies < 0.001, were virtually unique to the
two cities. Of course, with our more typical sample sizes of (say)
50-100 individuals, we would never have seen these alleles at all. When
we subsampled, the relatively common alleles survived, but the really
rare alleles (p < 0.001) and most of the unusual alleles (say p < 0.01)
dropped out of the sample. Typical rarefaction result.
Several years later, working with colleagues on mtDNA haplotypes from
marine fish, we encountered a similar problem, but there the sample
sizes were (for single populations) on the order of N = 100 per
population, and the total (N1 + N2 + . . . + N17) was on the order of
1000 or so. There, haplotypic richness was at least part of the issue,
Brown BL, Epifanio JM, Smouse PE and Kobak CJ. 1996. Temporal stability
of mtDNA haplotype frequencies in American shad stocks: to pool or not
to pool across years? Can. J. Fish. Aquat. Sci. 53:2274-2283,
and we discovered that by doubling the total sample size, we added large
numbers of rare haplotypes, the majority of which had not been seen in
the previous sample. The common haplotypes were recovered again, and in
roughly the same frequencies. Some of the uncommon haplotypes, seen
perhaps once or twice in the first sample, showed up again (once or
twice) in the second sample.
Today, this is well-traveled territory. There are standard measures of
expected allelic richness, with increasing sample size, but they usually
require some way to determine the parametric frequencies of the
alleles/species not (yet) seen in the finite sample we have. With
genetic markers, Ewens and colleagues have devised ways to work out the
expected form of the allelic spectrum, against which to compare
observations, and (with a little bit of theoretical faith) to predict
what we do not actually see, but which is almost surely out there.
The way you posed the question suggests that you are interested (at this
point) in the first problem, but one could take the view that we should
be sampling with replacement (bootstrapping) from the only partially
described allelic spectrum distribution. We would then be taking the
view that we might have sampled other alleles than those actually
recovered.
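This bootstrap alternative can be sketched the same way. In the sketch below (hypothetical names again), the allelic spectrum is simply the naive plug-in estimate from the pooled sample, with no Ewens-style correction for the alleles not yet seen, and the draws are with replacement:

```python
import random

def bootstrap_richness_test(pop1, pop2, n_trials=999, seed=1):
    """Bootstrap version: instead of shuffling the observed alleles
    without replacement, draw new samples of sizes N1 and N2 *with*
    replacement from the pooled allele frequencies, on the premise
    that we might have sampled other alleles than those recovered.
    The spectrum here is the naive plug-in estimate from the pooled
    data; corrections for unseen alleles are not applied."""
    rng = random.Random(seed)
    pool = list(pop1) + list(pop2)
    observed = abs(len(set(pop1)) - len(set(pop2)))
    at_least = 1  # the observed outcome counts as one draw
    for _ in range(n_trials):
        b1 = rng.choices(pool, k=len(pop1))  # with replacement
        b2 = rng.choices(pool, k=len(pop2))
        if abs(len(set(b1)) - len(set(b2))) >= observed:
            at_least += 1
    return at_least / (n_trials + 1)
```

Replacing the plug-in spectrum with one extended to predicted unseen alleles would address the second, harder problem described above.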
Let me know what you hear back on this one. I am also interested, and
I'm sure I'm not the only one. - Peter
Peter Smouse [Smouse@AESOP.Rutgers.edu]
I would suggest you write a little program in R if you do not find
ready-made software. The function sample() is very easy to use and will
allow you to do the resampling and then estimate allelic richness.
Sophie Gerber [gerber@pierroton.inra.fr]
As far as we know there is no software developed for such a test. However,
some authors have developed programs to do those permutations for their own
use:
- Degen, B., R. Streiff and B. Ziegenhagen (1999). "Comparative study of
genetic variation and differentiation of two pedunculate oak (Quercus
robur) stands using microsatellite and allozyme loci." Heredity 83(5):
597-603.
- Glaubitz, J. C., J. C. Murrell and G. F. Moran (2003). "Effects of
native forest regeneration practices on genetic diversity in Eucalyptus
consideniana." Theoretical and Applied Genetics 107(3): 422-431.
- Glaubitz, J. C., H. X. Wu and G. F. Moran (2003). "Impacts of
silviculture on genetic diversity in the native forest species
Eucalyptus sieberi." Conservation Genetics 4(3): 275-287.
We have written our own program for two of our studies (articles currently
submitted and in preparation). We could send you the source code in
Fortran and then you would need to play with it to adjust for your
sample sizes and compile it.
Also, you might want to consider other options for testing genetic
diversity. Many studies on bottlenecks use Wilcoxon tests or even
t-tests, which you can do with any statistical package:
- Spencer, C. C., J. E. Neigel and P. L. Leberg (2000). "Experimental
evaluation of the usefulness of microsatellite DNA for detecting
demographic bottlenecks." Molecular Ecology 9(10): 1517-1528.
And for a discussion on different tests for allelic richness you might
want to read:
- Kalinowski, S. T. (2004). "Counting alleles with rarefaction: private
alleles and hierarchical sampling designs." Conservation Genetics 5(4):
539-543.
We hope this information is useful to you. Please do not hesitate to
contact us for further discussion or if you are interested in our
program. Finally, we are interested in this topic, so we would be
grateful if you could forward us the other answers you get.
Concetta Burgarella, PhD
cr1burco@uco.es
&
Miguel de Navascués, PhD
m.navascues@gmail.com
I've attached a couple of papers by a friend of mine that may be of
interest. Here is a link to the associated software:
http://www.montana.edu/kalinowski/kalinowski_software.htm
James Rhydderch [James.Rhydderch@noaa.gov]
Hi Tim, By now I'm sure others have told you the same thing, but in
case not, check out HP-Rare by Steve Kalinowski.
Devon Pearse [Devon.Pearse@noaa.gov]
So it sounds like you're talking about running F statistics with a
permutation test? If I understand your question correctly, you can do
that with Jerome Goudet's 'hierfstat' package:
http://cran.r-project.org/src/contrib/Descriptions/hierfstat.html
'hierfstat' is a generalized setup, which means you can mould your F
statistic analysis and permutation tests to any number of hierarchical
levels. I seem to recall that this includes an option for resampling at
the individual level.
The package runs on the R statistical software system, so you'll have to
familiarize yourself with that: http://www.r-project.org/
Neither programme is the easiest to use, but they have more than enough
documentation to make them quite approachable. They're also free, so you
certainly can't complain about that....
***
Dr. Murray Cox
Arizona Research Laboratories - Biotechnology
1041 East Lowell Street, University of Arizona
Bioscience West, Room 246B
Tucson, AZ 85721, USA
Tel: (520) 621-9791
URL: www.u.arizona.edu/~mpcox/
***
I don't know of specific software, but I do know of a work around. I
have used the Montecarlo procedure in PopTools (an excel addin) to do
something very similar in the past. Find it here:
http://www.cse.csiro.au/poptools/
Chester Sands [cjsan@bas.ac.uk]
You may check SPAGeDi, by Hardy & Vekemans; it has many analyses and
permutations at the individual level. Although conceived to perform
autocorrelation analyses, it also lets you define groups, so it can
permute individuals among groups.
Xavier Turon
Dept. of Animal Biology (Invertebrates)
Fac. of Biology
Univ. of Barcelona
645, Diagonal Ave
08028 Barcelona
e-mail: xturon@ub.edu
phone: 34-93-4021441
fax: 34-93-4035740
Tim Jones