Thank you to everyone who responded to my inquiry. The answers are
below. Of all the suggestions, I chose GenAlEx 6 (thanks to Peter
Smouse) and have already tried it. It is a module for Excel that runs on
both Mac and PC, and it is easy and intuitive to use. Based on my three
days of experience, I can recommend it for such analyses, and perhaps
for many more.
Yuval
Here is the original question:
>Dear EvolDir members:
>I have genotype data for >50 microsatellite loci from ~400 samples of
three Helianthus species: one is a homoploid hybrid species, and the
other two are parental. I want to plot the individuals based on their
genotypes, in order to test the relationships between the species. I am
not interested in the phylogeny, just in the relative distances between
individuals in multi-dimensional scale (k=no. of loci). The goal is to
examine the relative position (in k dimensions) of the hybrid species
relative to its parents. The problem with microsatellite data is that
it is di-allelic data, i.e., for each character (=locus) I have two
states for each individual, which could be identical (homozygote) or
different (heterozygote), and are not independent. I assume (for now)
that each locus is independent of the other loci (unrealistic, but this
is corrected in a different analysis).
>Is anyone aware of a method, or better still software, that does
things like that? Freeware is preferable, of course.
>
ANSWERS:
Data modification advice -
You could calculate a genetic distance called DPS, based on the
proportion of alleles shared between individuals. This is an
individual-based genetic distance suitable for codominant markers. The
computer program MSA (Microsatellite Analyzer) by Dieringer & Schlotterer
can do this. The software was published in Molecular Ecology
Notes. Once you have the individual-based distance matrix, you could
analyze it with one of many multivariate techniques, e.g.
MDS (multidimensional scaling) or PCA (principal
component analysis). Both methods can be carried out with
general stats software packages, e.g. SPSS or JMP.
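As a rough illustration of the distance described above, here is a minimal Python sketch. It assumes one common formulation (distance = 1 minus the proportion of shared alleles, averaged over the loci typed in both individuals); MSA's own definition and options may differ. The data layout (one tuple of two allele sizes per locus, None for a missing genotype), the function names and the toy samples are all hypothetical.

```python
from collections import Counter

def shared_alleles(g1, g2):
    """Number of alleles (0, 1 or 2) shared by two diploid genotypes at one locus."""
    return sum((Counter(g1) & Counter(g2)).values())

def dps_distance(ind1, ind2):
    """1 - proportion of shared alleles over the loci typed in both individuals."""
    shared = 0
    compared = 0
    for g1, g2 in zip(ind1, ind2):
        if g1 is None or g2 is None:  # skip loci with missing data
            continue
        shared += shared_alleles(g1, g2)
        compared += 2  # two allele copies per diploid locus
    if compared == 0:
        raise ValueError("no loci typed in both individuals")
    return 1.0 - shared / compared

def distance_matrix(individuals):
    """Symmetric matrix of pairwise distances between individuals."""
    n = len(individuals)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dps_distance(individuals[i], individuals[j])
    return d

# Hypothetical toy data: three individuals, two loci, alleles as fragment sizes.
samples = [
    [(120, 124), (88, 88)],
    [(120, 120), (88, 90)],
    [(126, 128), (92, 94)],
]
matrix = distance_matrix(samples)
for row in matrix:
    print(row)
```

The resulting square matrix is the kind of input that MDS or PCO routines in general statistics packages expect.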
(Rodney Dyer:)
I have software on my server (Multivariate Genotypes) at:
http://dyerlab.bio.vcu.edu/wiki/index.php/Software
that takes diploid multilocus data and turns it into multivariate
normal data. You can look at the 2GenerV paper (pdf #10 on my
publications page) to get an overview of how this works, or I would be
happy to discuss it with you directly if you like.
Look at this page:
http://dyerlab.bio.vcu.edu/wiki/index.php/Software#Multivariate_Genotypes
There you'll have your data ready for PCA, CDA and related methods.
You may use PCO, which is like a PCA but on genetic distances. The
GenAlEx software performs this kind of analysis, as does the ecodist
package for R.
Freeware -
(Peter Smouse:)
Dear Registered User of GenAlEx
We are pleased to advise that the official release of GenAlEx 6 is now
available: http://www.anu.edu.au/BoZo/GenAlEx/
This version includes all of the features listed in Peakall and Smouse
(2006) :
Peakall, R., Smouse, P.E., 2006. GENALEX 6: genetic analysis in Excel.
Population genetic software for teaching and research. Molecular Ecology
Notes 6, 288-295.
http://www.blackwell-synergy.com/doi/abs/10.1111/j.1471-8286.2005.01155.x
Molecular Ecology Notes http://www.blackwell-synergy.com/loi/men
Please note that the current program and documentation files
supersede all previous versions. Therefore, it is strongly recommended
that you update your program and documentation. Further information
about updating GenAlEx is provided in the 'Read me' file when you
download the new documentation.
We thank the many users of our beta releases of GenAlEx 6 for their
positive and supportive feedback and bug reports.
Enjoy!
Rod Peakall and Peter Smouse
April 7, 2006
There's a program called GenAlEx, a Microsoft Excel add-in that
will create PCO plots for both dominant and codominant data. I'm not
sure of the exact details (like the non-independence you mentioned) but
you could look into it. It's free at:
http://www.anu.edu.au/BoZo/GenAlEx/
It was pretty easy to use as well, which is always helpful with these things!
GenAlEx (http://www.anu.edu.au/BoZo/GenAlEx/) can do a PCA based on inter-individual
genetic distances, and it's free.
This is just a very simple way of doing it :-)
Make a table with one column for each allele (50 loci with an average
of 10 alleles per locus would make a table with about 500 columns) and
for each individual (each row) you put in 0 or 1 for absence/presence
of that allele. This large 0/1 matrix could be analyzed by several
multivariate methods, like principal coordinate analysis (PCoA) or
principal component analysis (PCA). Both of these methods can be easily
used in the freeware PAST (http://folk.uio.no/ohammer/past/). At least,
the score plots would give you some ideas of the distance between
the different species. If your major aim is to have measures of the
distances, you may use the given coordinates for the components you
chose to focus on.
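The table described above can be built along these lines. This is only a sketch in Python: the genotype layout (one tuple of two allele sizes per locus), the function names and the toy data are hypothetical, and the export format that PAST expects is not shown.

```python
def allele_columns(individuals):
    """Sorted list of (locus_index, allele) pairs observed anywhere in the data."""
    seen = set()
    for ind in individuals:
        for locus, genotype in enumerate(ind):
            for allele in genotype:
                seen.add((locus, allele))
    return sorted(seen)

def presence_absence(individuals):
    """Return the column labels and one 0/1 row per individual."""
    columns = allele_columns(individuals)
    rows = []
    for ind in individuals:
        carried = {(locus, a) for locus, g in enumerate(ind) for a in g}
        rows.append([1 if col in carried else 0 for col in columns])
    return columns, rows

# Hypothetical toy data: three individuals, two loci.
samples = [
    [(120, 124), (88, 88)],
    [(120, 120), (88, 90)],
    [(126, 128), (92, 94)],
]
columns, rows = presence_absence(samples)
print(columns)
for row in rows:
    print(row)
```

Note that with plain presence/absence coding a homozygote and a heterozygote both receive a 1 for an allele they carry, so allele dosage is not represented in the matrix.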
I have used the following software, with good results in plotting and
identifying hybrid individuals:
1) Genetix, which performs Factorial Correspondence Analysis:
http://www.genetix.univ-montp2.fr/genetix/genetix.htm
2) PCAGEN: http://www2.unil.ch/popgen/softwares/pcagen.htm
I have done this for microsats using a number of techniques.
1) Bayesian assignment software, such as "Structure" by Pritchard et al. or the
very new "Structurama" by Huelsenbeck, is well suited to this question and
freely available.
2) Non-metric multidimensional scaling is another approach. This method is
analogous to PCA except the number of axes you select determines the placement
of each individual in multivariate space.
First I calculate a pairwise distance matrix. Then I input the matrix into a
non-metric multidimensional scaling program. I use NTSYS for this. An iterative
process is implemented in order to minimize the "stress", which, as I
understand it, is a measure of goodness of fit. Models with low stress are better.
The one major problem with NTSYS is that it can only handle a limited number of
individuals. I know there are a number of programs out there for NMMDS, and I
am pretty sure SAS has such a module. So all you have to do is generate a
genetic distance matrix; then you should be able to use that file in whichever
program you end up using.
PCAgen (Goudet - http://www.unil.ch/dee/page6767_en.html#3), with a 2D
interface, 2-digit coding of msat data, and a statistical test of significance
of the components (broken-stick).
Genetix (Belkhir - www.univ-montp2.fr/~genetix/genetix.htm), with a 3D
interface (quite a good one), easy data import from Fstat, Genepop or text;
includes other stats on allele frequencies; in French (but still manageable).
$$$$$ware -
It sounds like you might want to try principal coordinates analysis. It
could be done using the software NTSYS and the modules: SIMGEND, DCENTER,
EIGEN, MOD3D, in that order. A number of distance coefficients might be
used, but offhand I'd recommend "BAND", which is a simple band sharing
coefficient that is applicable to codominant data like microsatellites.
The output would be a 2, 3, or k-dimensional plot (at your choosing) with
clusters (or perhaps not, depending on the structure of the data)
corresponding to the two parental species and the homoploid hybrid
derivative.
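For readers without NTSYS, the core of a principal coordinates analysis can be sketched in plain Python. This assumes the DCENTER and EIGEN steps correspond to the usual double-centering and eigendecomposition of classical PCoA; it is not NTSYS's implementation. The eigenvectors are found by simple power iteration with deflation, which stops as soon as the dominant remaining eigenvalue is not positive, so for real data a proper symmetric eigensolver from a numerical library is the safer choice. All function names and the toy distance matrix are hypothetical.

```python
import math

def double_center(dist):
    """Gower-centre the matrix of -0.5 * squared distances."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]  # equals column means (symmetric input)
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean
             for j in range(n)] for i in range(n)]

def top_eigenpair(m, iterations=500):
    """Dominant eigenvalue and unit eigenvector of a symmetric matrix (power iteration)."""
    n = len(m)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-constant start vector
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:  # nothing left to extract
            return 0.0, [0.0] * n
        v = [x / norm for x in w]
    mv = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * mv[i] for i in range(n)), v  # Rayleigh quotient

def pcoa(dist, n_axes=2):
    """Coordinates of each individual on up to n_axes principal coordinates."""
    b = double_center(dist)
    n = len(b)
    axes = []
    for _ in range(n_axes):
        value, vector = top_eigenpair(b)
        if value <= 1e-9:  # stop when no positive axis is found
            break
        axes.append([math.sqrt(value) * x for x in vector])
        # Deflate: remove the axis just extracted before looking for the next one.
        b = [[b[i][j] - value * vector[i] * vector[j] for j in range(n)]
             for i in range(n)]
    return [[axis[i] for axis in axes] for i in range(n)]

# Hypothetical toy distances: four points at the corners of a 3 x 4 rectangle.
toy = [
    [0.0, 3.0, 4.0, 5.0],
    [3.0, 0.0, 5.0, 4.0],
    [4.0, 5.0, 0.0, 3.0],
    [5.0, 4.0, 3.0, 0.0],
]
coords = pcoa(toy, n_axes=2)
for point in coords:
    print(point)
```

Plotting the first two or three columns of the coordinates gives the kind of scatter described above, in which individuals from the parental species and the hybrid may or may not form separate clusters.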
NTSys does various multidimensional scalings based on genetic distances
created in, e.g., the freeware Microsat or other programs. NTSys itself
is not freeware, but it can be purchased at a favourable price if you are
an academic.
