Dear Brian,

Here are all the answers that I want to post.

Thanks,
Junghwa An

I have tons of experience aligning very diverse sequences from lentiviruses and other viruses (as well as the complete mitochondrial genomes of vertebrates). Clustal is a "toy" compared to HMMER, MAAFT and other tools for multiple sequence alignment. Write back, and I will give you help.

Brian Foley, PhD
HIV Databases
http://hiv.lanl.gov
(505) 665-1970

Sorry, I saw the typo just as I hit the send button: http://align.bmr.kyushu-u.ac.jp/mafft/software/

Brian Foley

Hi Annie,

It can be a hard job to align control region sequences between different species! You did not say which species you work with. I worked mainly with sequences of fishes, and there it is hardly possible to align between different species. I worked on intraspecific phylogenies and tried to find outgroups. Even within the same genus, e.g. Sprattus or Mullus, I could not align sequences. For some groups it is still possible, though, and I always do this to design new primers (where I look for conserved elements). I can recommend a paper that deals with the appropriateness of the control region for interspecific phylogenies of fishes (they also describe how they aligned):

Lee W-J, Conroy J, Howell WH and Kocher TD (1995) Structure and Evolution of Teleost Mitochondrial Control Regions. Journal of Molecular Evolution, 41, 54-66.

To align sequences manually, which you will most probably have to do because of duplications of parts of the sequences and many large indels, I always start with the central conserved region. If you have sequences of the tRNA-Pro at the 5' end and/or the tRNA-Phe at the 3' end, it is not as good to start from there. You should be aware that the two ends are the most variable regions and very often are not homologous, due to large indels right at the beginning/end of the control region.
It's also good to search for conserved sequence blocks and fix those. Have a look at this publication:

Saccone C, Pesole G and Sbisa E (1991) The main regulatory region of mammalian mitochondrial DNA: Structure-function model and evolutionary pattern. Journal of Molecular Evolution 33, 83-91.

When you have many different species you should also be aware of the strong variation within a species, which can strongly influence your alignment. I would not recommend this region for interspecific phylogenies unless you have no other choice.

When I had to deal with outgroup sequences for intraspecific phylogenies, I deleted the hypervariable (5' or 3') ends and tried to use the central conserved region only, but there was not enough variation within the species to evaluate the "oldest haplotype", as the result depended on how much I deleted. All my phylogenies stayed unrooted. Maybe this is what you can do for your interspecific phylogenies: just use the central conserved region.

Another thing that might help is downloading additional sequences of related species from GenBank and using them to improve the alignment.

I hope this will help you somehow, and if you have any questions don't hesitate to write back.

All the best,
Paul

Hello,

The control region, and especially its HVR, can be really problematic to align in divergent taxa. It can contain repetitive sequences or may simply have accumulated many substitutions. In mammals it's removed from alignments even in closely related (recently diverged) taxa. In fish it's more conserved and can actually be used in phylogenetic reconstruction at the genus level (see Doukakis's papers). Douzery has published on the cervid control region.

I would remove it from the analysis. The rest of the mitochondrial genome sequence should be more informative at this scale anyway. In cases of difficult-to-align sequences (indels, varying length, etc.), a simultaneous alignment and phylogeny method can be employed to integrate across alignment and tree space.
POY, BAli-Phy, and StatAlign are programs implementing these methods.

Best,
sergios

Sergios-Orestis Kolokotronis, PhD
Coordinator, DNA Barcoding Initiative for Conservation
Sackler Institute for Comparative Genomics
American Museum of Natural History
Central Park West at 79th Street
New York, NY 10024 -USA-
tel +1 212 313 7648
koloko@amnh.org
http://koloko.net

Hi Annie,

You can exclude sites (across all taxa) if some are too variable to be aligned. Once you know which sites (bp) you are excluding, you can just add that command to your tree-building command set in PAUP or MrBayes or whatever you are using. You should report this in your methods.

Good luck,
Kathryn

Hi,

Try BAli-Phy and BEAST. They might be good for D-loop stuff. Read attached: Rokas, then Wong, then Lunter, then Redelings. BEAST will do what Lunter says; read about BEAST in the Drummond paper. Then, for a brief overview, see the lecture notes by Drummond (the PowerPoint file). There are discussion groups for BEAST and BAli-Phy.

Mark Schultz
Charles Darwin University
C/- Arafura Timor Research Facility
PO Box 41775
Casuarina NT 0811
Australia
Ph: +61 (0)8 8920 9292
Fax: +61 (0)8 8920 9222

Within one clade or group of mammals (family Bovidae) I think you would want to use as much of the data as possible. But it depends on whether you want to get the correct branching order and clades within a genus and species, or whether you want to determine exactly when the Caprinae (goats and sheep) last shared a common ancestor with other Bovidae (cattle, buffalo, antelope, etc.). If you are looking deep in the phylogenetic tree, there are enough differences in the very easily alignable sites to separate deer from cattle from sheep. If you are looking mainly at the tips of the branches, the leaves, then most of the easily alignable regions are identical between individuals of the same species, and you want to keep as many sites as possible.
http://www.biomath.ucla.edu/msuchard/bali-phy/index.php has a good discussion of how our bias in doing alignment ends up giving us a bias in the resulting trees. So Redelings et al. developed a method of comparing alignments as well as trees made from those alignments.

One of the major problems in phylogenetic analyses is that we don't have good mathematical models for insertion/deletion in comparison to the models we have for single-base changes. There are many types of insertion/deletion events, so we would have to treat each type differently. A 10-base deletion is usually a single event, not 10 separate single-base deletions. And exactly how we do the alignment can turn one 10-base deletion into two 5-base deletions, or a 4-base deletion and a 6-base deletion, etc.

So in general, I recommend getting rid of all regions where a bias (human or program) in the alignment procedure will bias the resulting trees. But I think Redelings et al. have a good idea, to look at many alignments.

Brian Foley
HIV Databases

Hi Annie,

I put quite some effort into looking at alignment issues in introns, which will be the same sort of problem as you are experiencing. There are a number of solutions and programs summarised in the Sys Biol and Evol Bioinformatics papers on my website below, if they are of any help:
http://biology.bangor.ac.uk/%7Ebssa0d/scpublications.htm

Cheers and good luck!
Si

--
Si Creer
Post Doctoral Research Fellow
Molecular Ecology and Fisheries Genetics Group
School of Biological Sciences
University Wales, Bangor
Bangor
Gwynedd LL57 2UW
UK
e-mail: s.creer@bangor.ac.uk
Tel: +1248 382302
Fax: +1248 371644
Home Page: http://biology.bangor.ac.uk/~bssa0d/
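
For anyone who wants to try two of the suggestions in this thread on their own alignment, here is a minimal Python sketch. It is not taken from PAUP, MrBayes, BAli-Phy or any other program mentioned above; the function names `gap_events` and `drop_columns` and the toy sequences are made up for illustration. The first function counts each contiguous run of gaps as one indel event, which shows how the same 10 missing bases can appear as one event or two depending on where the aligner places them. The second removes a chosen set of columns from every sequence, as one might do before tree building when a region is too variable to align.

```python
import re


def gap_events(aligned_seq, gap="-"):
    """Return (start, length) for each contiguous run of gap characters.

    Each run is treated as one indel event, not as `length` separate events.
    """
    pattern = re.escape(gap) + "+"
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(pattern, aligned_seq)]


def drop_columns(alignment, excluded):
    """Remove the 0-based column indices in `excluded` from every sequence.

    `alignment` maps a (hypothetical) sequence name to its aligned string.
    """
    excluded = set(excluded)
    return {name: "".join(base for i, base in enumerate(seq)
                          if i not in excluded)
            for name, seq in alignment.items()}


# The same ungapped sequence (ACGTACGT) with its 10 gap positions
# placed in two different ways.
one_run = "ACGT----------ACGT"
two_runs = "ACGT-----AC-----GT"

events_one = gap_events(one_run)    # one 10-base event
events_two = gap_events(two_runs)   # two 5-base events

# Excluding columns 2 and 3 (0-based) across all taxa.
toy = {"taxon_a": "ACGTAC", "taxon_b": "AC--AC"}
trimmed = drop_columns(toy, range(2, 4))
```

Whichever columns are excluded in a real analysis should be reported in the methods, as Kathryn notes.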