Dear all,

I have received a number of useful answers to my question about breaking up alignments based on a sequence distance threshold (my original question is below this message), which I summarise here.

Two programs were suggested independently by several people. These programs are usually employed for analysing communities of soil organisms, and can divide sequences into clusters based on a similarity threshold set by the user:

1. Robin Floyd and Simon Creer both kindly pointed me towards the Perl script MOTU_define, which can be found on the website of Mark Blaxter's lab at the University of Edinburgh (http://www.nematodes.org/bioinformatics/MOTU/index.shtml). MOTU_define uses BLAST-based similarity matching and single linkage clustering.

2. Sujeevan Ratnasingham, Simon Creer and Richard Waterman suggested the program Dotur, by Patrick Schloss from the University of Massachusetts (http://schloss.micro.umass.edu/software/dotur.html). However, this program has recently been incorporated into the new open-source, expandable software Mothur (http://schloss.micro.umass.edu/mothur/Main_Page), which can do many cool things, including sequence clustering. It allows you to choose between single linkage (nearest neighbor), complete linkage (furthest neighbor) and average linkage (average neighbor / UPGMA) clustering methods.

In addition, Lee Taylor suggested using the genome assembly program Cap3.

I hope this helps anyone else interested in sequence clustering.

Best,
Robin van Velzen

-----Original Message-----
From: evoldir@evol.biology.mcmaster.ca [mailto:evoldir@evol.biology.mcmaster.ca]
Sent: Wednesday 22 July 2009 7:25
To: Velzen, Robin van
Subject: Other: breaking up alignments based on distances?

Dear EvolDir members,

I have a large (>500 individuals) partial COI DNA sequence alignment which I would like to break up into subsets based on a pairwise sequence distance threshold.
For example: break up the alignment into subsets containing individuals with <10% pairwise sequence distance. Ideally, I would like to do this without having to produce a distance tree. Does anyone know of a program or script that could do this for me? Any suggestions are most welcome.

Thanks!
Robin van Velzen

Robin van Velzen - PhD student
Robin.vanVelzen@wur.nl
Biosystematics Group, Wageningen University
Generaal Foulkesweg 37, 6703 BL Wageningen, The Netherlands
Tel: +31 (0)317 483425
FAX: +31 (0)317 484917
http://www.bis.wur.nl/UK/
www.nationaalherbarium.nl
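
For anyone who wants to see what threshold-based single linkage clustering amounts to, here is a minimal Python sketch. It is not taken from MOTU_define, Dotur or Mothur and does not reproduce their behaviour. It assumes the sequences are already aligned to equal length, uses a simple uncorrected p-distance that skips sites with gaps or ambiguous bases, and links any two individuals whose distance is below the threshold. The function names (p_distance, single_linkage_clusters) and the toy sequences are made up for illustration.

```python
from itertools import combinations


def p_distance(a, b):
    """Uncorrected p-distance: proportion of differing sites.

    Sites where either sequence has a gap or an ambiguous base
    ('-', 'N', '?') are skipped. This is an illustrative choice,
    not the distance measure used by any of the programs above.
    """
    compared = 0
    diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N?" or y in "-N?":
            continue
        compared += 1
        if x != y:
            diffs += 1
    # If no sites can be compared, treat the pair as maximally distant.
    return diffs / compared if compared else 1.0


def single_linkage_clusters(seqs, threshold=0.10):
    """Group aligned sequences by single linkage at a distance threshold.

    seqs maps an individual's name to its aligned sequence. Two
    individuals end up in the same cluster if they are connected by a
    chain of pairs that each differ by less than `threshold`.
    """
    parent = {name: name for name in seqs}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) < threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in seqs:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Hypothetical toy alignment (20 sites per individual).
    alignment = {
        "ind1": "ACGTACGTACGTACGTACGT",
        "ind2": "ACGTACGTACGTACGTACGA",
        "ind3": "TTTTACGTACGGGGGTACGT",
        "ind4": "TTTTACGTACGGGGGTACGA",
    }
    for cluster in single_linkage_clusters(alignment, threshold=0.10):
        print(cluster)
```

Note that single linkage chains: two individuals in the same cluster can differ by more than the threshold if intermediate sequences connect them. If every pair within a subset must be below the threshold, complete linkage (furthest neighbor) is the method to look at instead. Comparing all pairs is quadratic in the number of individuals, which should still be manageable for an alignment of around 500 sequences.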