Dear Evoldir members,

Thanks to all of you who answered my question about working with multiple-copy genes for population genetics. Below I have compiled the mails I received in response. In short, the conclusion seems to be that even low-copy genes are a pain in the neck, and that the best approach is to develop locus-specific primers that amplify just one of the paralogs. The problem of false recombinants appearing during PCR amplification further complicates the issue. There are useful references in the answers.

Best regards,
Xavier

The original question:

Hi everyone,

We have been working on population genetics with several invertebrate groups. In an effort to develop new sequence-based markers beyond the usual mitochondrial genes, we have been sequencing nuclear genes (both introns and exons). In many cases we find that our target genes are multiple-copy genes. Although the number of copies is low (nothing like rRNA genes, for instance) and there is probably some concerted evolution, when we clone the amplification products we end up with several alleles in most individuals.

My question is: what can we do with these data? Cloning represents a lot of effort, and I wonder whether it is worth it. This kind of data violates the assumptions of most analyses for phylogeny, phylogeography or population genetics. The concepts of homozygosity and heterozygosity break down. Nor can the data be treated as polyploid, because the different copies are paralogous; hence, different copies may follow different evolutionary models. To further complicate the issue, the different copies are often close together on the chromosome, so recombination is possible both within and between genes. We can calculate some sort of "haplotype frequency" based on the different sequence types found in the clones, but can we then legitimately use these estimates in population genetics programs? Is there any program that can handle this information, or any published reference?
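The "haplotype frequency" idea in the question can be made concrete. Below is a minimal Python sketch, assuming each individual's clones have already been collapsed into sequence-type labels. The function name and the rule of giving every individual equal weight are hypothetical illustrations, not a validated estimator for paralogous copies, and they do not solve the violated assumptions raised above.

```python
from collections import Counter, defaultdict


def sequence_type_frequencies(clones_by_individual):
    """Estimate the frequency of each sequence type in a sample of individuals.

    clones_by_individual maps an individual ID to the list of sequence-type
    labels (or raw sequences) recovered from its clones. Each individual gets
    equal weight: its clone counts are turned into within-individual
    proportions, which are then averaged across individuals. This weighting is
    an illustrative assumption, not an established estimator for paralogs.
    """
    totals = defaultdict(float)
    n_individuals = 0
    for clones in clones_by_individual.values():
        if not clones:
            continue  # skip individuals with no successful clones
        n_individuals += 1
        n_clones = len(clones)
        for seq_type, count in Counter(clones).items():
            totals[seq_type] += count / n_clones
    # If no individual had clones, totals is empty and so is the result.
    return {seq_type: total / n_individuals for seq_type, total in totals.items()}


# Hypothetical example: two individuals, three sequence types (A, B, C).
clones = {"ind1": ["A", "A", "B"], "ind2": ["B", "C", "C", "C"]}
print(sequence_type_frequencies(clones))
# frequencies: A ≈ 0.333, B ≈ 0.292, C = 0.375 (they sum to 1)
```

Weighting each individual equally keeps individuals with many clones from dominating the estimate; whether such frequencies are meaningful input for standard population genetics programs is exactly the open question raised above.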
Any hint will be appreciated. I will of course compile and post the answers to the whole list.

Best regards

The answers:

Hi Xavier,

Depending on whether you can distinguish (at the sequence level) different alleles per locus and different loci in each PCR, my suggestion would be to run the primers you have already designed on a few individuals, clone and sequence all alleles/loci from each, and from there design gene-specific primers. Then use these in all your individuals.

Good luck!
--mark
Mark A. Chapman
mchapman@plantbio.uga.edu
www.theburkelab.org
University of Georgia, Department of Plant Biology, Miller Plant Sciences Bldg., Athens, GA 30602
http://darwinawards.com/
www.bbc.co.uk/littlebritain

Dear Xavier,

Are you sure the loci are multiple-copy? Or are you simply getting more than the expected 2 alleles per individual? That also happens because of cloning artifacts: chimeras between alleles can be created during amplification and then isolated during cloning and sequencing. Are you using a PCR+1 protocol for cloning? In my experience (I too have cloned non-rDNA nuclear loci), cloning artifacts are extremely common; they happened every time until I started using PCR+1. Once I did that, the number of alleles became 2. The drawback is that PCR+1 is a lot of work, but for population genetics it is a must.

Briefly, in PCR+1 you start by using a lot of one of the primers and 1/10 as much of the other. (By the way, you have to use a proofreading polymerase; we use Easy-A because it also generates the 3' A overhangs needed for TA cloning, and then we use the TOPO TA kit.) You then do one last PCR cycle in which you add the primer that was in short supply, but modified so that it carries a restriction enzyme (RE) cut site (a 10 bp tail containing a restriction site is added). After that last cycle, you clone the PCR products, amplify a batch of clones (using primers in the vector), clean them, and cut them with the restriction enzyme. Only sequence the ones that do cut.
That means they were generated in that last cycle using the new primer, and are not the result of priming by unfinished fragments (which is what generates the chimeras).

Dina
--
Dina M. Fonseca, PhD
Associate Professor, Center for Vector Biology, Rutgers University
180 Jones Avenue, New Brunswick, NJ 08901
Phone: (732) 932 3146
Fax: (732) 932 9257
email: dinafons@rci.rutgers.edu

Why don't you try building gene trees for each gene, using the inferred haplotypes? The pseudogenes should form a cluster separate from the functional genes, and you should be able, at least for the exons, to figure out which copies are functional and which are pseudogenes by looking for indels and stop codons. If the pseudogenes are distant enough, you will probably not have to do much cloning, just enough to confirm your findings. Also, if you find fixed differences between the pseudogene and the target gene, you can design allele-specific primers to get rid of the pseudogene copies.

I wish you the best of luck,
Magdalena Zarowiecki
NHM London

See Martin and Burg, Syst Biol. 2002 Aug;51(4):570-87. This paper describes the issue of paralogy among gene copies and its influence on, and usefulness for, phylogenetic inference.

A Andrew Martin
Dept of Ecology and Evolutionary Biology
University of Colorado
Boulder, CO 80309

Hi Xavier,

I'm sorry I can't resolve your doubts (I don't think these are questions with much consensus...), but it so happens that as I read your e-mail I have on my desk, still unread, a review that I think might, just might, help you address your problems...

Nei M, Rooney AP. Concerted and birth-and-death evolution of multigene families. Annu Rev Genet. 2005;39:121-52. Review. PMID: 16285855

Regards,
David Álvarez

Hi Xavier,

I suspect these sorts of difficulties are why 'EPICs' have not progressed as far as we first thought they would, all those years ago. Multiple gene families are a big problem.
You are talking about analytical solutions, but I think the best ones are lab-based. Two obvious ones come to mind:

(1) Avoid multiple gene families. Good progress has been made in recent years by Simon Jarman: Jarman SN, Ward RD, Elliott NG (2002) Oligonucleotide primers for the amplification of coelomate introns. Marine Biotechnology, 4, 347-355. There is also a more recent paper on crustaceans, if that is your group.

(2) Use your multiple sequences to redesign primers that amplify only single loci. You will need to validate that you have isolated single loci, preferably with family material, but population material can be effective for this as long as you get 'the right answer' about what is allelic with what in your pool of multiple alleles. (Actually they are probably not alleles in the strict sense, because they probably come from different loci.)

There are some related tips you might find useful in: Garrick RC, Sunnucks P (2006) Development and application of three-tiered nuclear DNA genetic markers for basal Hexapods using single-stranded conformation polymorphism coupled with targeted DNA sequencing. BMC Genetics, 7.

Paul
--
Dr Paul Sunnucks
Senior Lecturer in Zoology, School of Biological Sciences
Monash University, Clayton Campus 3800, Victoria, Australia
email Paul.Sunnucks@sci.monash.edu.au
phone +61 3 99059593
http://www.biolsci.monash.edu.au/staff/sunnucks/index.html

Hello Xavier,

Differences in GC% at third codon positions tend to keep paralogs from recombining. It would be interesting to check this with your multiple-copy genes. For more, please see: http://post.queensu.ca/~forsdyke/book03.htm

Sincerely,
Donald Forsdyke, Department of Biochemistry, Queen's University, Canada

Dear Xavier,

We are also working with nuclear genes and, as in your case, we find more than two alleles in each individual.
I don't know whether you have verified that more than one copy of the genes is present, but in our case we know there is only one copy, so we assume that when there are more than two alleles it is due to PCR, cloning or sequencing errors (we don't know at which stage of the process they arise). What we try to do is obtain at least three sequences of each allele, and we treat mutations that appear only once as sequencing errors. Another problem we have often found with autosomal genes is recombinants: alleles appear that are a mixture of the two true ones. I don't know whether you have taken these things into account (I imagine so), but just in case I thought it might help. In any case, good luck, and I hope you get answers that help.

Kind regards,
Ramiro
---------
Dr. Ramiro Morales-Hojas
Evolução Molecular, IBMC
Rua do Campo Alegre 823, Porto 4150-180, Portugal
e-mail: rmhojas@ibmc.up.pt
----

Hi Xavier,

My simple suggestion is that you design locus-specific primers for selective amplification of each locus. Of course, just by looking at your clone sequences it may be difficult to work out which sequences are allelic and which are not, but you could get a hint from the amount of divergence between sequences (allelic copies should be more similar, unless the duplication is recent). Or, if you have families, you could follow the inheritance of specific sequence types to determine which belong to the same locus. I don't know if this will be practical in your system, but I've used it in a tetraploid plant species (which has at least two copies of each gene). Perhaps there are easier ways around the problem.

Good luck!
Alf Ceplitis

Dear Xavier,

Besides the points you brought up, you might also have to consider recombination generated during your experiment, in the PCR step. Such recombinant products may further complicate, or even distort, your subsequent analysis.
You may find the following paper interesting: Meyerhans A, Vartanian JP, Wain-Hobson S: DNA recombination during PCR. Nucleic Acids Res 1990, 18:1687-1691. I attached the PDF. We work on repetitive elements and run into these problems from time to time as well.

Regards,
Jens Mayer

Hi Xavier,

I'm writing about the mail you sent to Evoldir on multiple-copy genes. We have a paper in press in the Journal of Biogeography (copy attached) in which we used families of ITS paralogs as a phylogenetic tool. Of course, you can never claim with absolute certainty that you have sequenced all possible copies of a gene, but by probability the copy you recover will be the most frequent one, and therefore the one favored by evolution (concerted or not). I hope the paper is of some use to you, and if not, perhaps the references it cites will be.

Good luck,
RAFA
Dr. Rafael Rubio de Casas

P.S. The paper mentioned is: G. Besnard, R. Rubio de Casas and P. Vargas. Plastid and nuclear DNA polymorphism reveals historical processes of isolation and reticulation in the olive tree complex (Olea europaea). J. Biogeogr. (in press)

--
Xavier Turon
Dept. of Animal Biology (Invertebrates), Fac. of Biology, Univ. of Barcelona
645, Diagonal Ave, 08028 Barcelona
e-mail: xturon@ub.edu
phone: 34-93-4021441
fax: 34-93-4035740
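Several replies above (Dina Fonseca, Ramiro Morales-Hojas, Jens Mayer) point out that PCR can generate recombinant (chimeric) sequences from two true templates. As a rough in-silico screen to complement the lab-based fixes, here is a minimal Python sketch that flags a clone as a possible single-crossover chimera of two other sequences from the same individual. It assumes the three sequences are already aligned to equal length. The function names are hypothetical, and the check ignores sequencing error, multiple breakpoints and genuine (in vivo) recombination, so it is not a substitute for a dedicated recombination or chimera-detection method.

```python
def informative_sites(parent_a, parent_b):
    """Positions where the two aligned candidate parents differ."""
    return [i for i, (a, b) in enumerate(zip(parent_a, parent_b)) if a != b]


def is_single_crossover_chimera(query, parent_a, parent_b):
    """Naive check: does `query` match parent_a on one side of a single
    breakpoint and parent_b on the other (in either order)?

    Only sites where the parents differ are informative. Assumes all three
    sequences are already aligned to the same length (gaps written as '-').
    """
    if not (len(query) == len(parent_a) == len(parent_b)):
        raise ValueError("sequences must be aligned to the same length")
    labels = []
    for i in informative_sites(parent_a, parent_b):
        if query[i] == parent_a[i]:
            labels.append("A")
        elif query[i] == parent_b[i]:
            labels.append("B")
        else:
            # A state found in neither parent cannot be explained by
            # recombination between these two sequences alone.
            return False
    if "A" not in labels or "B" not in labels:
        return False  # matches one parent throughout (or parents identical)
    # Exactly one switch between consecutive labels means one breakpoint.
    switches = sum(1 for x, y in zip(labels, labels[1:]) if x != y)
    return switches == 1


# Hypothetical aligned clones from one individual.
a = "ACGTACGTAC"
b = "ACTTACCTAG"
print(is_single_crossover_chimera("ACGTACCTAG", a, b))  # True: A then B
print(is_single_crossover_chimera("ACTTACGTAG", a, b))  # False: B, A, B
```

A clone flagged this way is only a candidate artifact: a real recombinant allele would show the same mosaic pattern, which is why protocols such as PCR+1 attack the problem at the bench.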