Dear Evoldir members,

please find copied below all the answers we received about open source software for automated AFLP data scoring and individual genotyping.

Thanks,
Alessia Cariani

FROM Roland Schultheiss [Roland.Schultheiss@bio.uni-giessen.de]

Hi Alessia,

we're having the same issue at the moment in our lab. Hence, I would highly appreciate it if you could share the answers you receive with the list. Until now, one of the best solutions we have found is the R package "RawGeno" (http://www2.unine.ch/webdav/site/ebolab/shared/Programs/Biology08Poster.pdf). However, our experience with the package is still limited... By the way: have you considered "GeneMarker" from SoftGenetics (http://www.softgenetics.com/GeneMarker.html)?

Best regards,
Roland

FROM Ben Holt [b.holt@uea.ac.uk]

This is the business: http://www.shef.ac.uk/molecol/software~/aflpscore.html

FROM Yael Kisel [y.kisel06@imperial.ac.uk]

Hi Alessia,

I've just been investigating AFLP scoring software myself - I'm doing AFLPs on a bunch of orchid species and don't have time left in my PhD for scoring it all by hand! Or the patience =)

Here is my short list of (free) programs to try -

RawGeno - an R package - very quick and easy to use, takes files generated by PeakScanner, which is a free ABI program for looking at .fsa files ... but I still don't totally understand how it does the scoring, I plan to check on that more.

AFLPScore - another R-based program, which uses files generated by GeneMapper (not free, but maybe you could make the right files in PeakScanner) - I haven't tried this yet, but it also calculates error rates ...

tinyFLP - a standalone small program that is very fast ...

and there is an R package called AFLPdat for managing AFLP data in R, which sounds really useful but which I also haven't tried yet (I really have just started on analysis).

I'll be really interested to hear what responses you get!
best and good luck,
Yael

FROM Wolfgang Arthofer [Wolfgang.Arthofer@uibk.ac.at]

Hi Alessia,

in June 2009 I published a free package for automated binning of AFLP data; see ref

Arthofer W. (2009) tinyFLP and tinyCAT: software for automatic peak selection and scoring of AFLP data tables. Molecular Ecology Resources, doi: 10.1111/j.1755-0998.2009.02751.x.

The software works with exported PeakScanner tables, so it's mandatory that your AFLPs were run on an ABI platform. It has no fancy interface, but creates reliable 0/1 matrices and infiles for MrBayes and Genalex very fast. Currently, I am working on an additional program to find optimal scoring parameters for a given dataset; this program will be available sometime in early 2010. If you want to test the program you can download it at http://sourceforge.net/projects/tinyflp/ (use the 'View all files' button that will appear!); I would appreciate any feedback on whether the software was helpful.

regards,
Wolfgang

FROM Licia Colli [licia.colli@unicatt.it]

Try Genographer! It can read ABI data and return a 0/1 matrix.

FROM Jérôme Vrancken [jerome.vrancken@uclouvain.be]

Hi;

To my knowledge, this is the only one that is free, user-friendly and that works.

Genographer: http://hordeum.oscs.montana.edu/genographer/

FROM "Kai N. Stölting" [kai.stoelting@access.uzh.ch]

In general, there are several other relevant papers in BMC Genomics on the optimization of AFLP experiments. Have a look.

Eukaryotic transcriptomics in silico: Optimizing cDNA-AFLP efficiency
Kai N Stolting <kai.stoelting@access.uzh.ch>, Gerrit Gort <gerrit.gort@wur.nl>, Christian Wust <christian.wuest@math.uzh.ch> and Anthony B Wilson <tony.wilson@zm.uzh.ch>
BMC Genomics 2009, 10:565 doi:10.1186/1471-2164-10-565
Published: 30 November 2009

Evaluating the impact of scoring parameters on the structure of intra-specific genetic variation using RawGeno, an R package for automating AFLP scoring
Nils Arrigo (1) <nils.arrigo@unine.ch>, Jarek W Tuszynski (2) <jaroslaw.w.tuszynski@saic.com>, Dorothee Ehrich (3) <dorothee.ehrich@ib.uit.no>, Tommy Gerdes (4) <tommy.gerdes@rh.regionh.dk> and Nadir Alvarez (5) <nadir.alvarez@unine.ch>

(1) Laboratory of Evolutionary Botany, Institute of Biology, University of Neuchâtel, 11 rue Emile-Argand, CH-2000 Neuchâtel, Switzerland
(2) Science Applications International Corporation (SAIC), 1710 SAIC Drive Suite 3155 McLean, VA 22102, USA
(3) Department of Biology, University of Tromsø, N-9037 Tromsø, Norway
(4) Chromosome Laboratory, Department of Clinical Genetics, Rigshospitalet, Blegdamsvej 9, Copenhagen, Denmark
(5) Laboratory of Evolutionary Entomology, Institute of Biology, University of Neuchâtel, 11 rue Emile-Argand, CH-2000 Neuchâtel, Switzerland

BMC Bioinformatics 2009, 10:33 doi:10.1186/1471-2105-10-33

The electronic version of this article is the complete one and can be found online at: http://www.biomedcentral.com/1471-2105/10/33

Received: 18 August 2008
Accepted: 26 January 2009
Published: 26 January 2009

Background

Since the transfer and application of modern sequencing technologies to the analysis of amplified fragment-length polymorphisms (AFLP), evolutionary biologists have included an increasing number of samples and markers in their studies.
Although justified in this context, the use of automated scoring procedures may result in technical biases that weaken the power and reliability of further analyses.

Results

Using a new scoring algorithm, RawGeno, we show that scoring errors - in particular "bin oversplitting" (i.e. when variant sizes of the same AFLP marker are not considered as homologous) and "technical homoplasy" (i.e. when two AFLP markers that differ slightly in size are mistakenly considered as being homologous) - induce a loss of discriminatory power, decrease the robustness of results and, in extreme cases, introduce erroneous information in genetic structure analyses. In the present study, we evaluate several descriptive statistics that can be used to optimize the scoring of the AFLP analysis, and we describe a new statistic, the information content per bin (I_bin), that represents a valuable estimator during the optimization process. This statistic can be computed at any stage of the AFLP analysis without requiring the inclusion of replicated samples. Finally, we show that downstream analyses are not equally sensitive to scoring errors. Indeed, although a reasonable amount of flexibility is allowed during the optimization of the scoring procedure without causing considerable changes in the detection of genetic structure patterns, notable discrepancies are observed when estimating genetic diversities from differently scored datasets.

Conclusion

Our algorithm appears to perform as well as a commercial program in automating AFLP scoring, at least in the context of population genetics or phylogeographic studies. To our knowledge, RawGeno is the only freely available public-domain software for fully automated AFLP scoring, from electropherogram files to user-defined working binary matrices. RawGeno was implemented in an R CRAN package (with a user-friendly GUI) and can be found at http://sourceforge.net/projects/rawgeno

"alessia.cariani"
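
To make the two scoring errors named in the abstract concrete, here is a minimal Python sketch of how bin width alone can produce either of them. It is an illustration only: it uses naive fixed-width binning, which is not the algorithm of RawGeno, tinyFLP or any other program mentioned above, and the function name, sample names and fragment sizes are all made up for the example.

```python
from typing import Dict, List, Tuple


def score_presence_absence(
    peaks: Dict[str, List[float]], bin_width: float
) -> Tuple[List[int], Dict[str, List[int]]]:
    """Assign fragment sizes (bp) to fixed-width bins and score each bin 0/1.

    Hypothetical helper for illustration; fixed-width binning is a stand-in,
    not the method used by any of the packages discussed in this thread.
    """
    # A peak of a given size falls into bin number floor(size / bin_width).
    occupied = sorted(
        {int(size // bin_width) for sizes in peaks.values() for size in sizes}
    )
    # One row per sample, one column per occupied bin: 1 = peak present.
    matrix = {
        sample: [
            int(any(int(size // bin_width) == b for size in sizes))
            for b in occupied
        ]
        for sample, sizes in peaks.items()
    }
    return occupied, matrix


# Made-up peak tables. The fragment near 102 bp is assumed to be one marker
# sized slightly differently in each run; the fragments at 150.2 and 151.6 bp
# are assumed to be two distinct markers.
peaks = {
    "ind1": [102.2, 150.2],
    "ind2": [102.6, 151.6],
    "ind3": [102.4],
}

for width in (0.25, 1.0, 4.0):
    bins, matrix = score_presence_absence(peaks, width)
    print(f"bin width {width} bp: {len(bins)} bins")
    for sample, row in matrix.items():
        print(f"  {sample}: {row}")
```

With these made-up sizes, a 0.25 bp width splits the single marker near 102 bp into three bins (bin oversplitting), a 4 bp width merges the two distinct markers near 150 bp into one bin (technical homoplasy), and a 1 bp width recovers the three assumed markers. Real data also have the problem that fixed bin boundaries can cut through a marker, which is one reason dedicated scoring software does not bin this naively.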