Hello Evoldir members,

A few weeks ago I posted a question about MHC and cloning artefacts. I got many really good answers, which made it possible for me to solve the problem, so many thanks to all of you who answered and took the time; it was really helpful! Below my question I have summarised most of the answers. Please feel free to write to me if you have any further suggestions or questions!

Best regards / Erik
email: hageri@kth.se

My question was:

Hello, I read the discussion about MHC and cloning artefacts and found it very interesting, as I have a similar problem with very puzzling results. I study MHC in dogs (DLA), and to resolve the alleles in heterozygous individuals I use cloning (the GeneJET cloning kit K1221 from Fermentas). The problem is that I sometimes get more than two alleles per individual, sometimes as many as six different alleles in a single individual! This has happened in around 50 % of the samples in each cloning round; the incorrect alleles differ every time and are not confined to particular individuals. Some of these alleles do not correspond to the diploid sequence at all (for example, where the direct diploid ABI sequence shows a T/G double peak, the cloned sequence shows a clear C).

Things I have already done to avoid problems (some are the same as the suggestions in the earlier discussion):

* HPLC-purified primers. However, the differences between the alleles are not where the primers anneal but between the annealing sites, so the artefacts cannot be due to imperfect primer synthesis.
* I use high-fidelity Platinum Taq DNA polymerase from Invitrogen, which should generate few misincorporated bases. In any case, the differences between the alleles are far too many to be explained by polymerase errors (at least according to the error rate I know of, around 0.0006).
* The PCR product should be pure, as I use a nested PCR with two different primer pairs, which should increase specificity.
* Contamination should not be a problem, as I work in a dedicated lab room that is more or less DNA-free. The direct diploid ABI sequences show only two peaks at the polymorphic sites and a single peak at all other sites, with no sign of contamination between samples.

So, with that said, what should I do? I read one comment that mentioned recombination in E. coli as a possible cause and recommended a low- or non-recombinant strain to avoid it. I use a strain called RRIM15. At least one of the artefact alleles looks like a recombinant of two other alleles. Does anyone know how common this problem is, and whether it really can cause as many artefacts as in my case? Could recombination be the problem? Or what other factor could cause it?

While I am writing, I will add a related question: how important is it to purify the PCR product when working with MHC genes, and could purification improve sequence quality and reduce background signal?

I will be very grateful for any suggestions that could help me. I will post a summary of the answers on Evoldir in a few weeks.

The answers were:

1. The problems you describe are most likely PCR errors introduced by the polymerase. Even the best proofreading polymerase introduces enough errors to show up in clones, and the fact that the artefacts are not reproducible from the same sample points to this. I had similar findings when cloning single-copy genes:
   1. base substitutions, normally found in only single clones ("background noise");
   2. other substitutions that were shared among clones and matched the double peaks in the direct sequence (the real "alleles"); and
   3. in very rare cases, chimeric clones that start with the sequence of one allele and end with the other. (Recombination during PCR: if, in some rare cases, a primer extension does not reach the reverse-primer site, the incomplete fragment can in theory anneal to a template of the other allele in the next cycle and be extended, producing a mixture of both alleles.)
   The only ways I see to reduce such errors are (1) to reduce the number of PCR cycles to a minimum and (2) to avoid nested PCR (design more specific primers if necessary). If you try this and find that the errors decrease, you can be sure the problems were due to PCR errors.

2. I would hazard a guess that you may have a series of tandem duplications, although I don't know the dog genome sequence that well. Even if genomic sequence is available, for complex regions such as the MHC an initial draft genome sequence is not very reliable for inferring this kind of issue, especially if it is not from your specific animal. I would use some technique to extend into the 5' or 3' genomic sequence flanking your genes to see what is happening. Also, try PCR with a very long extension time and see whether you get a really large product. You could also design primers in your introns, which might be more specific to one of your copies, or design PCR primers whose 3' end falls on one of your SNPs, so that they are allele-specific. That way you could have primers specific for each allele; if you can amplify each of them specifically, that would be a pretty good indication that you have gene duplicates.
   Cloned PCR products will recombine, and I can't even imagine the mess if you have more than two alleles in the mix. I generally sequence 4 clones to start with to infer one haplotype, or go up to 8 if I am not sure; with possible tandem duplications you will certainly have to go even higher.
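To see why 4 to 8 clones can still be misleading, here is a rough probability sketch in Python. It assumes, as a simplification, that each clone is an independent draw carrying allele A, allele B, or an artefact, with the two real alleles equally represented and a fixed artefact fraction; the function name and the artefact rates tried are hypothetical, and PCR bias or duplicated loci would break these assumptions. It gives the chance that each real allele turns up at least `min_each` times among `n_clones` clones.

```python
from math import comb


def prob_both_alleles_seen(n_clones: int, min_each: int, artefact_rate: float = 0.0) -> float:
    """Probability that each of two real alleles appears at least `min_each`
    times among `n_clones` clones from one heterozygous individual.

    Hypothetical toy model, not measured:
    - clones are independent draws;
    - alleles A and B are equally represented among non-artefact clones;
    - a fixed fraction `artefact_rate` of clones are artefacts.
    """
    p_allele = (1.0 - artefact_rate) / 2.0
    total = 0.0
    for a in range(min_each, n_clones + 1):          # clones carrying allele A
        for b in range(min_each, n_clones - a + 1):  # clones carrying allele B
            c = n_clones - a - b                     # remaining clones are artefacts
            # Multinomial probability of exactly (a, b, c) clones of each kind.
            total += (comb(n_clones, a) * comb(n_clones - a, b)
                      * p_allele ** a * p_allele ** b * artefact_rate ** c)
    return total


if __name__ == "__main__":
    for n in (4, 8, 12):
        for rate in (0.0, 0.2, 0.5):
            p = prob_both_alleles_seen(n, min_each=2, artefact_rate=rate)
            print(f"{n:2d} clones, artefact rate {rate:.1f}: P = {p:.3f}")
```

The exact enumeration is used instead of a closed form because clone numbers are small; the printed table only shows how quickly the odds of seeing both alleles twice fall as the assumed artefact fraction rises.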
   Use direct PCR sequencing and multiple clones to get agreement at polymorphic sites, and use the clone information for phase. I would not use clone information from PCR samples that might contain mixed alleles from duplicated loci, as the potential for recombination is a painful realisation. Thus, make your PCR locus-specific: PCR these, clone, and get things working until you can convince yourself that you have, when you want to, one specific locus. Then move on to the others.

3. I have the same problem as you when I clone. However, I usually don't get quite as many variants, and usually not as many as 50 % of the samples are problematic. I usually have no trouble determining which alleles are the real ones if I pick 12 colonies per sample. On average perhaps 2-3 clones are problematic; for some samples more, for others none at all. In the problematic clones I often see that the sequence follows one variant up to a certain point (the position varies) and after that follows the sequence of the other allele. I use Invitrogen's TOPO TA Cloning kit with TOP10 cells. I purify my PCR products with the QIAquick PCR purification kit before ligation.

4. Take a look at these:
   Borriello F, Krauter KS (1990) Reactive site polymorphism in the murine protease inhibitor gene family is delineated using a modification of the PCR reaction (PCR+1). Nucleic Acids Research, 18, 5481–5487.
   L’Abbe D, Belmaaza A, Decary F, Chartrand P (1992) Elimination of heteroduplex artifacts when sequencing HLA genes amplified by polymerase chain reaction (PCR). Immunogenetics, 35, 395–397.
   Longeri M, Zanotti M, Damiani G (2002) Recombinant DRB sequences produced by mismatch repair of heteroduplexes during cloning in Escherichia coli. European Journal of Immunogenetics, 29, 517–523.
   Meyerhans A, Vartanian JP, Wain-Hobson S (1990) DNA recombination during PCR. Nucleic Acids Research, 18, 1687.
   Tombline G, Bellizzi D, Sgaramella V (1996) Heterogeneity of primer extension products in asymmetric PCR is due to both cleavage by a structure-specific exo/endonuclease activity of DNA polymerases and to premature stops. Proceedings of the National Academy of Sciences of the USA, 93, 2724.
   Triggs-Raine BL, Gravel RA (1990) Diagnostic heteroduplexes: simple detection of carriers of a 4-bp insertion mutation in Tay-Sachs disease. American Journal of Human Genetics, 46, 183–184.

5. I read with interest about your situation cloning DRB alleles. I have worked with canids, and I have cloned and sequenced MHC alleles from several pinniped species. From your email, I can see five possible explanations:
   1. Although unlikely, the particular individuals or groups you are studying may in fact have multiple copies of a particular MHC locus. I don't recall this happening in dogs, but we did detect it in one of four Antarctic seal species; see Lehman N, Decker DJ, Stewart BS (2004) Divergent patterns of variation in major histocompatibility complex class II alleles among Antarctic phocid pinnipeds. Journal of Mammalogy, 85(6), 1215-1244.
   2. More likely, you are generating recombinants during PCR. I would suggest increasing the extension times in the PCR (to as much as 5-10 minutes!) to rule out this possibility. We have recently published on this issue; see Yu W, Rusterholtz KJ, Krummel AT, Lehman N (2006) Detection of high levels of recombination generated during PCR amplification of RNA templates. BioTechniques, 40, 499-507.
   3. It is also possible, though again unlikely (but very interesting), that you have caught actual (i.e., biological) recombinants in certain individuals, which is a mechanism by which MHC alleles can diversify during evolution.
   4. Perhaps your PCR annealing temperature is too low and you are cross-amplifying a different MHC locus in some cases.
      Try raising the annealing temperature and see whether you converge on a maximum of two alleles per individual.
   5. Of course, the least interesting possibility is PCR contamination. When you say your PCR set-up area is "more or less DNA-free", make sure it is more "more" and less "less"!

6. From your email, it is not clear how different these "alleles" are. MHC should be very diverse, so there should be many differences between any two real alleles, right? I think that even if you use a proofreading (error-correcting) polymerase, you might get mutations when growing the colonies. I'm not working with MHC but with other genes, and if you count a single-base difference as a different "allele", then I can also commonly get 6 alleles per individual when I have sequenced 8 clones. I generally assume that the small differences among clones are due to mutation at some stage (PCR or E. coli cell division), and yes, the mutation rate is far higher than the Taq manufacturers would lead us to expect: I sometimes get up to 6 PCR or cloning mutations in a 1000 bp sequence. In addition to PCR mutations, you may also be getting PCR chimeras, where one clone was amplified partly from one real allele and partly from the other. It seems that you either have to believe this is some weird MHC-specific phenomenon, where new alleles are created in vivo through an immunity-related process, or you have to assume that you are getting cloning errors, and combine clones to take the consensus sequence.

7. As another person who has amplified and sequenced MHC genes in several species, I can tell you that I have also encountered the situation you describe. Another possible explanation is that you are amplifying pseudogenes along with your functional locus. In many of the species characterised so far, the MHC region contains numerous loci, some of them non-functional. One way to explore this would be to isolate RNA, run RT-PCR, and then clone and sequence these products.

8. In my view, the solution is not to clone. The artefacts associated with cloning have been appreciated for some time, and cloning MHC class I and II genes is effectively a worst-case scenario because of high heterozygosity and multiple loci. However, cloning is not necessary: allelic sequences can readily be isolated from heterozygous individuals using single-strand conformation polymorphism (SSCP) analysis or similar techniques. Individual sequences can be recovered by taking punches from an acrylamide SSCP gel with a standard Pasteur pipette (using a non-UV light system such as the Dark Reader and a fluorescent stain such as SYBR Gold). After the punch has been resuspended in 500 microliters of ddH2O overnight, a single microliter can be used to re-amplify the target template with the original primers. Twenty-five cycles are sufficient for direct sequencing; keep the number of cycles to a minimum so that contaminating sequences do not become common enough to affect the sequencing. In my hands this technique is reliable and repeatable. The approach works because artefacts generated during PCR are heterogeneous and therefore do not appear as discrete bands on the SSCP gel (and artefacts specific to cloning, such as cloning of heteroduplexes, are avoided entirely). Re-amplified sequences can be run on an SSCP gel to check purity, and their banding patterns can be compared with those of the original PCR sample to ensure that all sequences present in the original sample are accounted for.

9. These artefacts in MHC amplification are indeed a known obstacle. To my knowledge there are two major sources of artefacts:
   1- Incomplete allele strands (due to a short extension time or declining primer concentration) anneal to non-complementary allele strands and form heteroduplexes, which the polymerase then extends. The results are "alleles" made up of two or three parts of different alleles (I have even found recognisable recombinants of three alleles).
      Check Zylstra et al. 1998, Immunology and Cell Biology, for information.
   2- The so-called MutHLS mismatch repair system, which E. coli (like most other cells) possesses. If heteroduplexes form in the last cycle of PCR, they are ligated and transformed into E. coli, where MutHLS recognises the mismatches and 'repairs' them. Normally this makes sense for E. coli: after replication of the original bacterial plasmid, one strand is methylated and the new one is not, so MutHLS recognises the 'original' strand and repairs the new one to match it. In our case of transformation, both strands are unmethylated (PCR products carry no methylation), so MutHLS cannot tell which strand is correct. It therefore 'repairs' at random and produces sequences that don't look like recombinants. Each such product should occur in only one or two clones, as the likelihood of getting the same random repair product again is quite low (that's how I found it). Check Thompson et al. 2002, Nucleic Acids Research, for information.
   The first measure to avoid artefacts is to lower the PCR cycle number (I currently use 25 cycles). This alone helps a lot and decreases the number of recombinants dramatically. Then you can do a 'reconditioning PCR' as proposed by Thompson et al.: a new PCR with the original primer concentration but only 3-5 cycles, using a dilution of the normal PCR product as template. This reduces the number of heteroduplexes before cloning and so avoids the mismatch-repair products. Using these two measures, a low cycle number and reconditioning PCR, I get good results and no more alleles than expected.
   All this assumes that you are not amplifying more than one locus in your dogs. If you are not sure, you should definitely clone a subsample and pick a large number of clones to get a clear picture of what to expect. Even if you get artefacts, the original alleles should still be in the majority.
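The screening logic running through several of these answers (real alleles recur across clones and agree with the double peaks of the direct sequence, while single-clone variants, single-breakpoint chimeras and random mosaics are suspect) can be sketched in Python as below. This is a hypothetical illustration, not any respondent's pipeline: it assumes the clone sequences are already aligned and trimmed to the length of the direct (diploid) sequence written with IUPAC ambiguity codes, that only one locus is amplified (so at most two alleles are called), and that the `min_count` threshold is a free choice.

```python
from collections import Counter

# Bases allowed by each IUPAC code in the direct (diploid) sequence;
# e.g. K marks a G/T double peak.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "N": "ACGT",
}


def consistent_with_direct(clone: str, direct: str) -> bool:
    """True if every clone base is allowed by the direct-sequence call."""
    if len(clone) != len(direct):
        raise ValueError("clone and direct sequence must be aligned to equal length")
    return all(base in IUPAC[code] for base, code in zip(clone, direct))


def is_single_breakpoint_chimera(clone: str, a: str, b: str) -> bool:
    """True if the clone follows one allele up to some point and the other after it."""
    for first, second in ((a, b), (b, a)):
        for cut in range(1, len(clone)):
            if clone[:cut] == first[:cut] and clone[cut:] == second[cut:]:
                return True
    return False


def classify_clones(clones, direct, min_count=2):
    """Call up to two alleles by clone frequency and label every other variant."""
    counts = Counter(clones)
    # most_common() sorts by frequency; ties keep first-seen order.
    alleles = [seq for seq, n in counts.most_common()
               if n >= min_count and consistent_with_direct(seq, direct)][:2]
    labels = {}
    for seq in counts:
        if seq in alleles:
            labels[seq] = "allele"
        elif not consistent_with_direct(seq, direct):
            labels[seq] = "inconsistent with direct sequence"
        elif len(alleles) < 2:
            labels[seq] = "unresolved"
        elif is_single_breakpoint_chimera(seq, alleles[0], alleles[1]):
            labels[seq] = "single-breakpoint chimera"
        else:
            labels[seq] = "other mosaic"
    return alleles, labels


if __name__ == "__main__":
    # Toy data: two alleles, one chimera, one mosaic, and one clone with a C
    # where the direct sequence shows a T/G (K) double peak.
    direct = "AKGYTM"
    clones = ["ATGCTA"] * 5 + ["AGGTTC"] * 4 + ["ATGTTC", "ATGTTA", "ACGCTA"]
    alleles, labels = classify_clones(clones, direct)
    print("alleles:", alleles)
    for seq, label in labels.items():
        print(seq, label)
```

Ranking by frequency before labelling reflects the expectation that the original alleles remain in the majority; an "other mosaic" is only one plausible signature of random mismatch repair (multiple crossovers would look the same), so labels are hints for rechecking, not verdicts.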
   Finally, I would recommend purification, but you don't need an expensive kit. Ethanol precipitation works just as well if you have a clean fragment. If your PCR product is not so clean, I would recommend gel purification, especially after reconditioning PCR, as the primer concentration is then very high and you might end up with only primer clones.