Here are the responses to my recent enquiry about how best to estimate the
magnitude of selection coefficients from longitudinal data on allele frequency.
My original query is given below.
Firstly, many thanks to all who responded.
I make no comment on the individual responses but would note that we intend to
track the spread of insecticide resistance in large geographic regions; the
presumed intense selection and (unfortunately) large mosquito population size
make me inclined to ignore the effects of drift.
*****************original question*****************
I want to estimate selection coefficients from a longitudinal series of diploid
genotype frequencies. The gene in question encodes insecticide resistance so
selection is relatively intense.
I intend to use max. likelihood to fit initial allele frequency, selection
coefficient and dominance (i.e. P(0), s and h in standard terminology). Given
these three parameters we can predict genotype frequencies at any given time
point and use the multinomial distribution to get LL of obtaining the observed
number of the three genotypes.
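
A minimal sketch of the fit described above, not a tested implementation. It
assumes discrete generations, relative fitnesses of 1+s, 1+hs and 1 for the
RR, RS and SS genotypes, genotypes sampled in Hardy-Weinberg proportions at
the predicted allele frequency, and no drift. All function names and the
sample counts are made up for illustration, and a coarse grid search stands in
for a proper optimiser.

```python
import math

def next_freq(p, s, h):
    """Frequency of the resistance allele R after one generation of selection.
    Assumed relative fitnesses: RR = 1 + s, RS = 1 + h*s, SS = 1."""
    q = 1.0 - p
    w_rr, w_rs, w_ss = 1.0 + s, 1.0 + h * s, 1.0
    w_bar = p * p * w_rr + 2 * p * q * w_rs + q * q * w_ss
    return p * (p * w_rr + q * w_rs) / w_bar

def predicted_freq(p0, s, h, t):
    """Allele frequency after t generations, starting from p0."""
    p = p0
    for _ in range(t):
        p = next_freq(p, s, h)
    return p

def genotype_probs(p):
    """Assumed Hardy-Weinberg proportions (RR, RS, SS) at allele frequency p."""
    q = 1.0 - p
    return (p * p, 2 * p * q, q * q)

def log_likelihood(p0, s, h, samples):
    """Multinomial log-likelihood, up to a constant that does not depend on
    the parameters. samples: list of (generation, n_RR, n_RS, n_SS)."""
    ll = 0.0
    for t, *counts in samples:
        probs = genotype_probs(predicted_freq(p0, s, h, t))
        for n, prob in zip(counts, probs):
            if n > 0:
                ll += n * math.log(prob)
    return ll

def grid_fit(samples, p0_grid, s_grid, h_grid):
    """Return (log-likelihood, p0, s, h) for the best point on a coarse grid."""
    return max((log_likelihood(p0, s, h, samples), p0, s, h)
               for p0 in p0_grid for s in s_grid for h in h_grid)

# Made-up genotype counts: (generation, n_RR, n_RS, n_SS)
example_samples = [(0, 1, 18, 181), (5, 4, 47, 149),
                   (10, 20, 86, 94), (15, 60, 100, 40)]
best = grid_fit(example_samples,
                p0_grid=[0.01, 0.05, 0.1],
                s_grid=[0.0, 0.25, 0.5, 0.75],
                h_grid=[0.0, 0.5, 1.0])
print("log-likelihood, p0, s, h:", best)
```

In practice the grid would be replaced by a numerical optimiser, and
confidence limits could be read off the likelihood surface.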
So I have three questions:
(1) Is this a sensible way of doing it, or are there hidden pitfalls
and/or is there a better way?
(2) Has anyone done it before and published the method so we can cite
it? It seems like it should be a standard type of analysis but we
haven't been able to track a previous one down yet.
(3) Better still, is there a public access programme that we can
download? If not, we'll write our own and make it available.
Comment from myself (you always think of something straight after you've sent
the message):
Why check fit to the three genotype frequencies? I'll track allele frequency, so
why not count the alleles in the three genotypes at each time point and then
use the binomial to get LL from the predicted allele frequency? This also
avoids having to assume the genotypes are in HW, which may be problematic if
selection has already taken place (because, by definition, selection will cause
deviations from HW).
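
The allele-counting variant can be sketched as follows. This is an
illustration only: the function names are hypothetical, the fitness scheme
(1+s, 1+hs, 1 for RR, RS, SS) is an assumption, and the binomial treats the 2N
sampled alleles as independent draws, which is itself an assumption of the
sketch.

```python
import math

def next_freq(p, s, h):
    """One generation of selection; assumed fitnesses RR = 1 + s,
    RS = 1 + h*s, SS = 1."""
    q = 1.0 - p
    w_bar = p * p * (1.0 + s) + 2 * p * q * (1.0 + h * s) + q * q
    return p * (p * (1.0 + s) + q * (1.0 + h * s)) / w_bar

def allele_counts(n_rr, n_rs, n_ss):
    """Convert genotype counts into (R alleles, S alleles)."""
    return 2 * n_rr + n_rs, 2 * n_ss + n_rs

def binomial_log_likelihood(p0, s, h, samples):
    """Binomial log-likelihood of the observed allele counts.
    samples: list of (generation, n_RR, n_RS, n_SS)."""
    ll = 0.0
    for t, n_rr, n_rs, n_ss in samples:
        # Predicted allele frequency after t generations from p0
        p = p0
        for _ in range(t):
            p = next_freq(p, s, h)
        k, k_s = allele_counts(n_rr, n_rs, n_ss)
        n = k + k_s
        # log of the binomial coefficient, via log-gamma
        log_choose = (math.lgamma(n + 1) - math.lgamma(k + 1)
                      - math.lgamma(n - k + 1))
        ll += log_choose + k * math.log(p) + (n - k) * math.log(1.0 - p)
    return ll

# Made-up counts: (generation, n_RR, n_RS, n_SS)
example_samples = [(0, 1, 18, 181), (10, 20, 86, 94)]
print(binomial_log_likelihood(0.05, 0.5, 0.5, example_samples))
```

The same search over P(0), s and h applies; only the likelihood term changes.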
A sensible procedure, but with one trap. Remember to condition on your
observations, that is, update the initial allele frequency every time you make
an observation, and evaluate s and h based on each interval of observation.
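
One possible reading of this advice, sketched under stated assumptions: each
interval starts from the allele frequency observed at the earlier sample
rather than from a fitted P(0), and the later sample in that interval is
scored with a binomial. The function names, the fitness scheme (1+s, 1+hs, 1)
and the counts are hypothetical; this is not necessarily the procedure meant
above.

```python
import math

def next_freq(p, s, h):
    """One generation of selection; assumed fitnesses RR = 1 + s,
    RS = 1 + h*s, SS = 1."""
    q = 1.0 - p
    w_bar = p * p * (1.0 + s) + 2 * p * q * (1.0 + h * s) + q * q
    return p * (p * (1.0 + s) + q * (1.0 + h * s)) / w_bar

def interval_log_likelihood(s, h, samples):
    """Sum of per-interval binomial log-likelihoods (constants dropped).
    samples: list of (generation, R allele count, total alleles), sorted by
    generation. Observed frequencies must lie strictly between 0 and 1."""
    ll = 0.0
    for (t0, k0, n0), (t1, k1, n1) in zip(samples, samples[1:]):
        p = k0 / n0                      # condition on the earlier observation
        for _ in range(t1 - t0):
            p = next_freq(p, s, h)
        ll += k1 * math.log(p) + (n1 - k1) * math.log(1.0 - p)
    return ll

# Made-up allele counts: (generation, R alleles, total alleles sampled)
example_samples = [(0, 20, 400), (10, 126, 400), (20, 300, 400)]
print(interval_log_likelihood(0.5, 0.5, example_samples))
```

Scoring each pair of consecutive samples separately would also give the
per-interval estimates of s and h mentioned above.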
A. G. Clark used this procedure around 1980 in publications in Genetics and
Heredity, as far as I remember.
/Freddy
Freddy Bugge Christiansen,
Bioinformatics Research Center (BiRC),
University of Aarhus,
C.F. Møllers Alle, Bldg. 1110,
DK-8000 Århus C.
Dear Ian Hastings,
Regarding your question on Evoldir, could you possibly post the most interesting
answers?
I am working at IGC, Portugal in Evolutionary Genetics and I will have data on
experimentally evolved populations for which I'll need the same kind of
estimations.
Best regards
Ivo Chelo
Ivo M. Chelo, PhD.
Evolutionary Genetics
IGC Instituto Gulbenkian de Ciência
Lisbon, Portugal
Dear Ian,
I am currently working on a very similar issue, and I would be glad to know if
you got some feedback on your question. I have a piece of code in R to estimate
p(0) and the relative fitness of 3 possible genotypes using maximum likelihood,
and it seems to work (although it is very experimental). The algorithm relies on
the assumption that the population size is infinite (no drift), so that all
stochasticity comes from sampling. Accounting both for drift and sampling
requires much more complex stats, including random effects etc., but it is not
impossible (in theory).
In case you are comfortable with R, let me know if you are interested in beta
testing my code. Otherwise, I am also curious about existing software that can
do the job.
Cheers,
Arnaud Le Rouzic
CEES
Dept. of Biology, P.O. Box 1066 Blindern
0316 Oslo
Norway
I've done it using Bayesian methods: see the attached manuscript. I can send
you the BUGS code if you're interested. It should be easy to adapt to your
problem.
Bob
[ms is O'Hara 2005 Proc. Roy Soc]
--
Bob O'Hara
Department of Mathematics and Statistics P.O. Box 68 (Gustaf Hällströmin katu
2b)
FIN-00014 University of Helsinki
Finland
Hi Ian,
John Novembre just made me aware of your posting. You might want to take a look
at: http://www.genetics.org/cgi/content/full/179/1/497
If that method seems to apply to your situation I believe John has a program
that he probably wouldn't mind sharing with you.
Best wishes,
Rasmus Nielsen
Ian -- A major issue with which you will have to be concerned is sample size.
Several years ago, Joel Kingsolver and several coauthors published a series of
papers on estimates of selection.
In one paper, which I don't seem to have here, they analyzed the relationship
between selection estimates and the sample size used in the analysis. They found
that at small sample sizes there was a wide range of estimates, including very
high ones. As the sample size increased, the maximum value of selection tended
to decrease. This indicated that high selection estimates were artifacts of
sampling error. One of the papers Kingsolver et al. published at the time,
though not the one that demonstrated the effect of sample size, was:
Kingsolver, J. G., H. E. Hoekstra, J. M. Hoekstra, D. Berrigan, S. N. Vignieri,
C. E. Hill, A. Hoang, P. Gibert, and P. Beerli. 2001. The strength of
phenotypic selection in natural populations. Am. Natur. 157:245-261.
Good luck with your work.
-- Mike Bell
Michael A. Bell, Professor
Department of Ecology and Evolution
Stony Brook University
Stony Brook, NY 11794-5245, USA
phone: 1-631-632-8574
fax: 1-631-689-6682
Hi Ian,
This software package might be helpful to you:
PGEToolbox - Matlab toolbox for Population Genetics and Evolution
http://www.bioinformatics.org/pgetoolbox/
The function STNPDFH calculates the stationary distribution of the frequency X
of a newly arisen mutation under selection with dominance factor h. Another
function, STNPDFSMPL, computes the frequency spectrum for the mutation.
Reference: Scott Williamson (2004), Population Genetics of Polymorphism and
Divergence for Diploid Selection Models With Arbitrary Dominance.
Best regards,
James J. Cai, Ph.D.
Petrov Lab
Department of Biology
Stanford University
Stanford, CA 94305, USA
e-mail: jamescai@stanford.edu
Dear Ian
I'm writing regarding your evoldir question. You may want to look at this
paper, which does exactly what you want to do:
Lenormand, T., and M. Raymond. 2000. Analysis of clines with variable selection
and variable migration. American Naturalist 155:70-82.
(You can find the pdf on my website.)
Pierrick Labbé and I have also submitted a paper doing this on a long
historical cline series to Genetics (it's under review). I'm sure Pierrick
could share the ms with you if you're interested (see his email in Cc).
One big pitfall is that you want to make sure that the frequency change is due
to selection and not something else (in particular dispersal), but there are
many other pitfalls...
Best
thomas
Thomas Lenormand
CEFE - UMR 5175
1919 route de Mende
F-34293 Montpellier cedex 5
Ian --
1. The proposed analysis is fine -- an excellent strategy.
2. Not sure where it is first mentioned in the literature. Let me
give you a little perspective. Up to the 1960s methods could not
be published unless they were doable by desk-calculator and had
closed-form expressions for the estimates. This is not true of
the ML selection curve analyses.
3. It became clear to multiple people by the 1970s that your proposed
approach was the right one.
4. I would guess the first publication mentioning this should be
in the early 1970s -- but I can't think offhand where (Evolution?
Genetics?). Some phrases like "population cage" and "selection
curve" may be important -- and you should look in population
genetics texts of the era and any books (Endler's book on
natural selection in the wild? Brian FJ Manly's 1985 book?)
J.F.
----
Joe Felsenstein joe@gs.washington.edu
Department of Genome Sciences and Department of Biology, University of
Washington, Box 355065, Seattle, WA 98195-5065 USA
Dear Ian,
I am not sure whether the dominance coefficient can be estimated accurately
through changes in frequency only, without measuring fitnesses for each
genotype.
As for the selection coefficient, you can always estimate an "efficient"
selection coefficient (assuming no dominance for instance) from the allele
frequency change. One easy way of doing this is by taking the slope of the curve
log(p/(1-p)) as a function of time, as we propose in Chevin & Hospital (2008)
(latest Genetics issue). This definition goes back to Fisher, and has been used
in experimental evolution (see Lenski et al 1991).
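
The slope calculation can be sketched as an ordinary least-squares fit of
log(p/(1-p)) against time. The function names and the frequencies below are
made up for illustration, and how the slope maps onto s under a particular
fitness scheme is left aside here.

```python
import math

def logit(p):
    """log(p / (1 - p)); p must lie strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def logit_slope(times, freqs):
    """Least-squares slope of logit(allele frequency) against time."""
    ys = [logit(p) for p in freqs]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    return sxy / sxx

# Made-up allele frequencies at four sampling times (in generations)
example_times = [0, 5, 10, 15]
example_freqs = [0.05, 0.14, 0.31, 0.55]
print("slope per generation:", logit_slope(example_times, example_freqs))
```

Samples in which the allele is absent or fixed (p = 0 or 1) cannot be
log-transformed and would need separate handling.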
You may also need to account for the changes in frequency attributable to
genetic drift. If you are not aware of the effective population size for this
species, it can be estimated jointly with s using the method proposed by
Bollback, York & Nielsen (2008).
Hope this helps.
Cheers,
Luis.
--
Luis-Miguel CHEVIN
Doctorant,
UMR de Génétique Végétale du Moulon
& laboratoire Ecologie Systématique Evolution, bât 360, Université Paris Sud XI.
01 69 15 70 49
URL : http://www.ese.u-psud.fr/bases/upresa/pages/chevin/index.html
Ian Hastings
Liverpool School of Tropical Medicine
Pembroke Place,
Liverpool L3 5QA
0151 705 3183 (office)
0151 705 3147 (group secretary)
Email: hastings@liverpool.ac.uk