Thanks to everyone who wrote to me suggesting possible approaches to ambiguous mtDNA sequences (heteroplasmy or other). Here are some of the answers:

1) Have you considered the possibility that you have amplified Numts (nuclear copies of mitochondrial DNA fragments) along with your mitochondrial DNA? I have found Numts in beetle genomic DNA before. One way to get around the problem is to amplify a larger mtDNA fragment (for example, half of the mitochondrial genome). Most (but not all!) Numts consist of relatively short fragments. Therefore, amplifying a larger fragment often gets rid of the nuclear copies.

Patrick

2) Hi,

most probably you're not facing heteroplasmy (which is extremely rare in animals) but rather co-amplification of nuclear pseudogenes (numts). It's a pretty nasty job to get rid of them, mainly playing around with primers and PCR conditions, or doing long PCR reactions, taking advantage of the fact that nuclear insertions are rather short. You should really consider cloning at least a few samples to get clean seqs; a sequence analysis checking nonsynonymous substitutions, indels, damaged ORFs, GC content,... should guide you to the authentic mtDNA seq and may help in primer development. See the refs below, mainly the one by Thalmann et al., for some hints.

Which beetle are you working on and which primers did you use?

regards,
Wolfgang

Bensasson D., Zhang D.X., Hartl D.L., and Hewitt G.M. (2001) Mitochondrial pseudogenes: evolution's misplaced witnesses. Trends in Ecology and Evolution 16, 314-321.

*Thalmann O., Hebler J., Poinar H.N., Pääbo S., and Vigilant L. (2004) Unreliable mtDNA data due to nuclear insertions: a cautionary tale from analysis of humans and other great apes. Molecular Ecology 13, 321-335.*

Arctander P. (1995) Comparison of a mitochondrial gene and a corresponding nuclear pseudogene. Proceedings of the Royal Society of London, B 262, 13-19.

Pons J., and Vogler A.P. (2005) Complex Pattern of Coalescence and Fast Evolution of a Mitochondrial rRNA Pseudogene in a Recent Radiation of Tiger Beetles. Molecular Biology and Evolution 22, 991-1000.

Sunnucks P., and Hales D.F. (1996) Numerous Transposed Sequences of Mitochondrial Cytochrome Oxidase I-II in Aphids of the Genus /Sitobion/ (Hemiptera: Aphididae). Molecular Biology and Evolution 13, 510-524.

Zhang D.X. and Hewitt G.M. (1996) Nuclear integrations: challenges for mitochondrial DNA markers. Trends in Ecology and Evolution 11, 247-251.

Dr. Wolfgang Arthofer
Universität für Bodenkultur, Wien
Institut für Forstentomologie, Forstpathologie und Forstschutz
Hasenauerstrasse 38, A-1190 Wien, Austria
wolfgang.arthofer@boku.ac.at
http://ifff.boku.ac.at

3) Hi, Fotini

Are you sure that the results are not caused by the presence of pseudogenes in the nucleus? Are the problems present in both genes or only one of them, and in cases where there is only a single difference, have you checked that it does not result in a stop codon, which would clearly indicate that it was a pseudogene?

Kind regards,
Søren

4) Answer: Your problem may be with nuclear copies of the mtDNA rather than with mtDNA heteroplasmy. If you can determine the nuclear sequence, you may be able to "subtract" it from the hetero sequence. There are also other possible solutions. See:

Sorenson, M.D. & T.W. Quinn. 1998. Numts: A challenge for avian systematics and population biology. The Auk 115: 214-221.

Sorenson, M.D. & R.C. Fleischer. 1996. Multiple independent transpositions of mitochondrial DNA control region sequences to the nucleus. Proceedings of the National Academy of Sciences USA 93: 15239-15243.

Question: Thank you very much for your response. I have already read your papers but I don't think that I have amplified Numts. When I checked for amino acid changes at these sites I only found synonymous mutations and no stop codons. Additionally, DNA was extracted from tissue very rich in muscle (beetle legs). What do you think about it?
Is it enough to exclude the Numts case?

Answer: In general, I'd say that the above is not sufficient evidence. The example highlighted in our 1998 paper is one where a numt was amplified from muscle tissue. Even if you have a very high ratio of mtDNA to nuclear copies, you can still amplify the nuclear copy if the primers you are using match the nuclear sequence better. As for the patterns of change, I would expect most substitutions between mtDNA and a recently evolved numt to be 3rd position transitions. The reason is that the numt evolves more slowly (given better DNA repair mechanisms in the nucleus), such that most of the changes represent subsequent substitutions in the mtDNA copy.

As a next step, I would first ask about your primers. Are they "universal" primers developed from other species? Do the primers incorporate degenerate sites to accommodate likely variation at 3rd positions or are the primers a single sequence? If they are universal primers with no degenerate sites, it is particularly in this situation that you risk preferential amplification of a low copy number nuclear copy.

Hope that helps.

Best wishes,
Mike

Question: My primers are used for different insects but they are not degenerate at 3rd positions. I am not sure I completely understood your last sentence. You mean that if they are universal primers with no degenerate sites they would preferentially amplify numts? I thought the opposite!

Answer: This is a somewhat difficult effect to explain, but consider designing primers based on known sequences from a diverse (or not so diverse) set of insect species. The COI sequences will vary among species mostly at 3rd positions. If you use a "majority rules" approach to deciding on the base at each variable position, your primer will approximate the ancestral COI sequence.
Given that mtDNA evolves more rapidly than nuclear DNA, a numt will often be more similar to the ancestral sequence than the current mtDNA sequence is, and therefore a "consensus" primer may preferentially amplify the nuclear sequence. It may also just be bad luck. Suppose that your nuclear and mtDNA sequences differ at one or a few third positions and, just by chance, the numt matches the primer better. You have now lost the advantage of having many more mtDNA copies than nuclear copies in your extract. If, on the other hand, you use a primer with degenerate sites, then in your PCR reaction there will be primer molecules that match the mtDNA and also primer molecules that match the nuclear DNA - by using the degenerate primer, you have eliminated the possibility of preferential amplification and now the difference in copy number determines your result. I discuss this logic in the 1998 paper. In general, I very strongly recommend primers with degenerate sites - particularly for mtDNA protein-coding genes!

If you get good primers and/or amplify across the same region with an entirely different set of primers (i.e., in different locations from your current primers) and if you still get double peaks, I would be more inclined to believe the possibility of heteroplasmy, but until then, I think numts are the more plausible explanation.

One last comment: it has been found in some species that portions of the mtDNA are tandemly duplicated many times in the nucleus, such that the difference in copy number is not so great. We recently sequenced an entire mtDNA genome copy in the nucleus of a bird species - even though our extract was from muscle tissue, we encountered double sequences with most of the primer pairs we used (~20 primer pairs to work around the genome). We eventually separated the two sequences by designing mtDNA- and numt-specific primers.

Oh, one more comment. I very much appreciate your careful approach to your data.
I think that many researchers ignore or gloss over problems like this and you should be commended for not doing the same.

Best wishes,
Mike

5) You can use SeqScape software for detecting heteroplasmy in your analyses. Applied Biosystems provides this type of software. If you need any info, pls don't hesitate to contact me.

Mahesh

Mahesh S. Dharne
National Center For Cell Science
Molecular Biology Unit, University campus, Pune, (M.S.), India
Tel # 91(20)5690922, Fax # 91(20)5692259
Personal link: http://mbu.nccs.tripod.com/id6.html
web: http://www.nccs.res.in
Contact no: +91 9923188900

6) Dear Fotini,

Unfortunately, I do not think that the absence of stop codons is enough. I know one case in a beetle species where a Numt did not contain a single stop codon. Plus, I seem to remember reading in the literature that some Numts are still active in the nuclear genome, i.e., they still code for proteins (this would need to be checked).

Cheers,
Patrick

7) Hi Fotini,

The method you are looking for is called haplotype subtraction, originally proposed by Clark in 1990. The paper is in Mol. Biol. Evol. 7(2): 111-122. This is the abstract:

"Direct sequencing of genomic DNA from diploid individuals leads to ambiguities on sequencing gels whenever there is more than one mismatching site in the sequences of the two orthologous copies of a gene. While these ambiguities cannot be resolved from a single sample without resorting to other experimental methods (such as cloning in the traditional way), population samples may be useful for inferring haplotypes.
For each individual in the sample that is homozygous for the amplified sequence, there are no ambiguities in the identification of the allele's sequence. The sequences of other alleles can be inferred by taking the remaining sequence after "subtracting off" the sequencing ladder of each known site. Details of the algorithm for extracting allelic sequences from such data are presented here, along with some population-genetic considerations that influence the likelihood for success of the method. The algorithm also applies to the problem of inferring haplotype frequencies of closely linked restriction-site polymorphisms."

If you want the whole pdf please let me know.

Best
--mark

------
Mark A. Chapman
mchapman@plantbio.uga.edu
www.theburkelab.org
-----
University of Georgia
Department of Plant Biology
Miller Plant Sciences Bldg.
Athens, GA 30602

8) Dear Fotini,

Can you design haplotype-specific primers for PCR? That is, do your haplotypes fall into 2 (or more) distinct groups? I work on a system in mussels where heteroplasmy is normal - Doubly Uniparental Inheritance in bivalve mollusks has male- and female-specific mtDNA. The mt genomes are different enough to amplify one or the other using primers specific to one or the other.

Is there any chance your beetles have DUI? Probably not, but it would be the first non-mollusk to show it.

Brian

9) Dear Fotini,

This is not really an answer to your question, but maybe just something to keep in mind. I am finishing a LONG PH.D. (insert exclamation marks here!), and what started out as possible heteroplasmy turned into a nightmare in my project. I work on seedsnipes (birds of the Andes) and I found multiple nuclear copies of mitochondrial genes, true heteroplasmy (possible hybridization and paternal leakage of mitochondria), and variation among individuals and populations even within the same species with respect to these copies.
I just published a paper that may be of interest, at least because to me it is something that we should probably never underestimate again. I have attached a copy --- you don't have to clone (and it may not really tell you whether you actually have real mtDNA copies), but there are other methods now available to try to figure out what the individual sequences are. I don't know if you or a colleague run SSCP analyses, but bands can also be cut out from these gels so that you can sequence individual DNA strands (i.e. haplotypes or alleles).

I hope the information is useful. Best wishes in your research.

Gabriela Ibarguchi

- - - - - - - - - - - - -
Gabriela Ibarguchi
Department of Biology, Queen's University
Kingston, Ontario, Canada, K7L 3N6
ibarguch@biology.queensu.ca or gibarguchi@biology.ca
tel (613) 533-6000 ext. 75539, fax (613) 533-6617
http://www.ibarguchi.ca

10) hi Fotini

are you sure they are heteroplasmic sequences, and not nuclear copies of mtDNA? The latter are extremely common (I know to my cost, having wasted a year of a project about aphids - see Sunnucks & Hales (1996) in Mol Biol Evol for lots of details about finding nuclear copies).

I would strongly recommend against trying to use any sort of numerical analysis to sort out the sequences rather than finding a molecular biology solution.

(1) If the extra copies are nuclear: we often find that diluting the template DNA in PCR to the lowest concentration that still gives sequenceable PCR product will get rid of nuclear copies (eg Ryan Garrick used this approach on Collembola - see Garrick et al 2004 in Mol Ecol), and leave mtDNA behind (because of its generally higher relative concentration). If that does not work, see 2b.

(2) If the extra copies really are heteroplasmic:

(a) The dilution technique may work anyway - you will be able to make a rarer mtDNA go away, and thus by deduction, separate the two sequences.

(b) You might need to employ a molecular method of separating them.
One cloning-free approach is to run them on SSCP gels and reamplify from stabbed bands. This often works well - see Sunnucks et al 2000 in Mol Ecol for a protocol that works easily on most templates, and again, Ryan Garrick's work is a great example of how to apply this approach to separating sequences.

Paul

--
Dr Paul Sunnucks
Senior Lecturer in Zoology
School of Biological Sciences
Monash University
Clayton Campus
3800 Victoria
Australia
email Paul.Sunnucks@sci.monash.edu.au
phone + 61 3 99059593
http://www.biolsci.monash.edu.au/staff/sunnucks/index.html

11) You should not attempt to produce a consensus sequence. Your phylogeny is a phylogeny of sequences, not necessarily organisms, so you should retain their separate identity. In fact, it is probably a pseudogene/numt and therefore potentially very useful in rooting a tree and providing independent corroboration of phylogeny. There are many papers on numts providing useful reviews of uses and approaches to identifying them (54 hits on ISI WoS; e.g. 2001 review in TREE 16(6):314-321). I would try changing primers to amplify the two sequences independently, and build them into a single tree (the nuclear clade should come out independently and show much less differentiation). True heteroplasmy generally involves only small differences, or indels, except in the special case of mussels.

good luck,
graham

12) I can't help with non-cloning methods, but I can relate my experiences with heteroplasmy. I also had a considerable amount in the bees I studied (about 1/5 of species, and up to 1.5% difference between haplotypes). Be aware that not all of the polymorphisms will show up in the regular chromatogram. I had problems with cloning so I was only able to get at most two clones from them, but I found both double peaks in the original that were identical in the clones, and differences between the clones that did not appear as polymorphisms in the original.
However, at sites of the latter kind I did find that the very small peaks that are ignored as part of "noise" were always the same color as the alternate base. In other words, polymorphisms and heteroplasmy may well be more common than we think, and may be obscured by the way PCR works. This was also shown where I had to reamplify one species where the first try gave poor sequencing length; in the first try there were only 2 polymorphisms, but there were over 20 in the second.

Karl

Karl Magnacca, UC-Berkeley
ESPM Dept., 137 Mulford Hall #3114
510-642-4148
http://nature.berkeley.edu/~magnacca
http://nature.berkeley.edu/ogradylab

13) Hello,

Please find attached in a separate email the .pdf of a paper describing a method that you could use if your mitochondrial haplotypes had variable lengths (Flot et al. 2006 Mol Ecol Notes). However, since you apparently have only a few double peaks in each chromatogram, it may be that all your haplotypes have identical lengths, in which case the method described in this paper does not work. You could try, however, to design primers so as to include in your PCR products some length-variable regions adjacent to your regions of interest, in order to be able to use the method. If this is impossible to do, have a look at the papers cited in the introduction of the attached article for other existing methods - you may be able to sort your double peaks by using haplotype-specific PCR primers, for instance.

Best wishes,
Jean-François

Jean-François Flot
UMR 7138 "Systématique, Adaptation, Évolution" (USM 603)
Département Systématique et Évolution
Muséum National d'Histoire Naturelle
Case postale N°26
57 rue Cuvier
75231 Paris Cedex 05
France

14) Dear Fotini,

I'm also working with beetles, and we've found heteroplasmic sites in our COI sequences as well. To analyse them we've used IUPAC codes, and we've had no problems when using WinClada & NONA, POY and MrBayes.
So if you'd like to consider those sites as polymorphic, maybe you could try out one of these programs (either parsimony or Bayesian estimation)? They are all freely available on the internet.

Best regards,
Helena Koivulehto

15) It sounds like an interesting, but slightly tricky problem. I'm not sure what the answer is. For example, are you sure that these individuals have only 2 DNA sequences, and not more?

If you are sure that there are only 2 (but I don't know how you would be!) then maybe PHASE would work OK if there is reason to expect that the different haplotypes might be related by a tree-like structure, since this is the main assumption underlying PHASE (actually PHASE allows for recombination, which probably does not make so much sense in mitochondria(?), but you could specify no recombination using the -MS option).

You might also look into methods people use for estimating haplotypes from DNA pools, and from malaria, where sometimes individuals are infected with multiple strains (and you don't know how many). I'm not intimately familiar with that stuff, but you might try searching for haplotype and malaria, and/or pool.

Matthew

On 2/22/07, Fotini KOUTROUMPA wrote:

Dear Mr Stephens,

I am writing to you about your software PHASE, which I would need for the analyses in my PhD project on the population genetics of a European beetle species. I sequenced two mitochondrial DNA fragments from the genes cytochrome oxidase I (COI) and II (COII). My current problem concerns heteroplasmy (i.e. the presence of two kinds of DNA in the mitochondria of one individual, or the presence of two kinds of mitochondria in one individual). Indeed, half of the analysed individuals showed double peaks at 1 to 8 nucleotide sites. This was confirmed by the reverse sequences. I have tried to perform analyses by coding these ambiguous sites using the IUPAC code. However, software such as PAUP has difficulties while the analyses are running.
I have thought of inferring the haplotypes of the sequences concerned using your Bayesian method implemented in PHASE. I need your confirmation that it can be used for mitochondrial DNA data, which unfortunately are not genotypic data (no presence of two alleles). In case PHASE is not suited to such analyses, are you aware of suitable analyses and/or software?

Thank you very much for your valuable help.

Best regards,
Fotini Koutroumpa

Fotini KOUTROUMPA
Laboratoire de Biologie des Ligneux et des Grandes Cultures
UPRES EA 1207
Université d'Orléans
Chartres str BP 6759
45 067 ORLEANS Cedex 2
FRANCE
fotini.koutroumpa@univ-orleans.fr
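
Postscript: for anyone who wants to try the "subtraction" idea mentioned in answers 4 and 7 on IUPAC-coded reads, here is a minimal Python sketch. It assumes that exactly two sequences of equal length are present, that double peaks are coded with two-base IUPAC symbols, and that one of the two sequences is already known (for example from an individual without double peaks). The function name `subtract_haplotype` and the toy sequences are made up for illustration; this is not Clark's full algorithm and it is not PHASE.

```python
# Two-base IUPAC ambiguity codes and the pair of bases each one stands for.
IUPAC_PAIRS = {
    "R": {"A", "G"},
    "Y": {"C", "T"},
    "S": {"C", "G"},
    "W": {"A", "T"},
    "K": {"G", "T"},
    "M": {"A", "C"},
}


def subtract_haplotype(mixed, known):
    """Return the second sequence implied by `mixed` once `known` is removed.

    `mixed` is a direct-sequencing read in which double peaks are coded with
    two-base IUPAC symbols; `known` is one unambiguous sequence assumed to be
    present in the mix. Both are hypothetical, aligned, equal-length inputs.
    """
    if len(mixed) != len(known):
        raise ValueError("sequences must be aligned and of equal length")

    other = []
    for pos, (m, k) in enumerate(zip(mixed.upper(), known.upper()), start=1):
        if m in "ACGT":
            # Single peak: both sequences must carry the same base here.
            if m != k:
                raise ValueError(f"site {pos}: known has {k}, read shows only {m}")
            other.append(m)
        elif m in IUPAC_PAIRS:
            pair = IUPAC_PAIRS[m]
            if k not in pair:
                raise ValueError(f"site {pos}: {k} is not one of the two peaks ({m})")
            # Double peak: the other sequence carries the remaining base.
            (remaining,) = pair - {k}
            other.append(remaining)
        else:
            raise ValueError(f"site {pos}: unsupported symbol {m!r}")
    return "".join(other)


# Toy example (made-up sequences, not real COI data).
mixed_read = "ATGRCCYTA"
known_seq = "ATGACCCTA"
print(subtract_haplotype(mixed_read, known_seq))  # prints ATGGCCTTA
```

The sketch raises an error as soon as the known sequence is incompatible with the read, which is one cheap way to notice that more than two sequences may be present. As several of the answers above stress, it does not tell you whether the second sequence is a numt or a true mitochondrial haplotype; that still needs a molecular check such as different primers, dilution, SSCP or cloning.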