Dear all,

I recently posted this question: "I'm working on the population genetics of reef fishes. I've used microsatellite loci to detect population structure and I've found significant Fst values among different sampling locations. However, when running STRUCTURE I failed to find any partition among the samples. Does anybody have a clue why this is happening?" The answers are listed below.

1) With sufficiently polymorphic data, Fst tests are powerful enough to detect weak differentiation between a priori groups. STRUCTURE tries to find structure in a dataset without a priori groups, which is much more challenging, and it will often fail when the Fst between real groups is too low (e.g. Fst < 0.03).

2) Your experience is not unique. For example, we recently published a paper in Molecular Ecology on our work on collared lizards. We found significant isolation by distance and a significant Fst, but STRUCTURE gave us nothing. We then used BAPS, a different Bayesian assignment program that also incorporates spatial information, and got beautiful results. In this case, I had translocated these populations into an area in which they had gone extinct, and we have followed their entire colonization and dispersal history in detail for over 30 years. BAPS reconstructed this known history very accurately. The BAPS results are in our paper (Neuwald, J. L., and A. R. Templeton. 2013. Genetic restoration in the eastern collared lizard under prescribed woodland burning. Molecular Ecology 22:3666-3679). Right now I'm in Israel working on an endangered salamander. When we applied STRUCTURE to our data, we got just two divisions (the Galilee and Mt. Carmel, which are isolated from one another and show extreme genetic differentiation). However, when we applied BAPS, in addition to this major subdivision, it subdivided the Galilee into 10 subpopulations, all of which made excellent sense given the topography of the area and our previous studies on dispersal.
Using these 10 subpopulations, we obtained highly significant results with Fst and AMOVA, all completely invisible to STRUCTURE. I have also used STRUCTURE in my work in human genetics, and have found that I can get just about any result I want by playing with K, which is notoriously hard to estimate in a statistically meaningful fashion. I truly do not understand the popularity of STRUCTURE. I advise you simply to avoid it and to use other programs such as BAPS. A non-parametric alternative, used mostly in the human genetics literature, is the program Awclust (http://awclust.sourceforge.net/docs/index.html).

3) My guess is that you are looking at two different scales in your data (I am also not sure what parameters you used in STRUCTURE): you may have local structure (individuals within a site may be more closely related, so that you detect significant Fst between populations), while there is still enough migration (in the sense of genetic exchange) between groups to lead STRUCTURE to treat your population as panmictic. I would suggest that you have a look at Gauffre B, Estoup A, Bretagnolle V, Cosson JF (2008) Spatial genetic structure of a small rodent in a heterogeneous landscape. Mol Ecol 17:4619-4629, and perhaps at the papers that cite it.

4) Personally, I only use Bayesian clustering when I am desperate, e.g. when I suspect that a strong FIS comes from a Wahlund effect but have no clue about its origin. The assumptions of panmixia and linkage equilibrium (the latter being impossible to reach in real populations), together with the fact that I really do not understand what this kind of software actually does, make me quite reluctant. I prefer older methods that are directly connected to demography in a way I can understand. In your case, you might have a continuous (or nearly continuous) increase of differentiation with some factor(s), the most obvious being geographic distance. You might also have several hierarchical levels of structure.
All of these factors might prevent STRUCTURE from finding anything. Try to study isolation by distance: if it works, you will get much more information than STRUCTURE will ever give you. It is also a good idea to check that all your loci behave similarly (for both FST and FIS). If one or two loci behave unusually compared with all the others, this might be the signature of technical problems or non-neutral factors that may also disturb STRUCTURE.

5) Fst can become significant even for very small values if the sample size is large. In fact, as in most statistical applications, if you increase your sample size enough you will eventually get significant results, even if they are biologically irrelevant. It is difficult to evaluate your question without having seen your STRUCTURE results or knowing your run settings. Maybe you ran STRUCTURE incorrectly, e.g. your runs may have been too short? Can you ask experienced STRUCTURE users around you? Or are you experienced yourself? I don't know. STRUCTURE is also known to be not that good at picking up subtle population divisions. Check out DAPC (discriminant analysis of principal components); maybe it helps more (the paper can be downloaded here: https://dl.dropboxusercontent.com/u/40499866/Jombart-T._Discriminant-analysis-of-principal-components-A-new-method-for-the-analysis-of-genetically-structured-populations_2010.pdf). You can also investigate the hypothesis of panmixia with migrate-n. You can download one of my papers, in which population structure was an issue and was confusing, here: https://dl.dropboxusercontent.com/u/40499866/kraus-Global%20lack%20of%20flyway%20structure%20in%20a%20cosmopolitan%20bird%20revealed%20by%20a%20genome%20wide%20survey%20of%20single%20nucleotide%20polymorphisms.pdf

6) Your result may not be that unexpected. When you calculate Fst you supply much more information than STRUCTURE has: the population designation of each individual.
Try running STRUCTURE in the mode where you provide training samples for each population. If you use, say, half of your data as training samples, you may find that the rest fall neatly into their populations.

7) Because of how p-values for FST are usually calculated, it is possible to get a 'significant' FST when in reality there is little or no population structure. You should interpret the FST value itself rather than the p-value. If you have information about sampling locations, another option is to use the STRUCTURE option that treats it as a prior. Doing this will pick up more subtle structure within your sample.

8) If by 'significant' you mean that you get p-values below, say, 0.05, this doesn't mean there is real structure. P-value testing combined with Fst-like measures is notorious for type I errors (see here). Additionally, I have read somewhere that STRUCTURE can't detect differentiation where Fst < 0.01; however, this figure may not always hold for microsatellites, since they often misbehave when combined with Fst (see below). I would recommend that you first make sure your microsatellite loci are suitable for use with Fst and don't suffer from the well-known problem of negative bias resulting from high diversity (see here). After that, the significance of genetic differentiation is much more appropriately tested with a bootstrapping method that yields 95% confidence intervals. If you want more information about how to do such analyses, some colleagues and I have written an R package, diveRsity (with an associated web app, http://glimmer.rstudio.com/kkeenan/diveRsity-online/), which allows you to calculate Fst, Gst, G'st and Jost's D, compare the relationships among these statistics and calculate 95% confidence intervals for each.

9) It's somewhat counterintuitive, but Fst can be more sensitive than STRUCTURE at detecting differentiation.
One might imagine that a genotypic approach, capturing recent information from an array of markers, would have more power for detecting differentiation than allele-frequency-based approaches, but that may not be true. You can test this with simulations for your situation, as Katherine Harrisson did using EASYPOP in Harrisson KA, Pavlova A, Amos JN, Takeuchi N, Lill A, Radford JQ, Sunnucks P (2012) Fine-scale effects of habitat loss and fragmentation despite large-scale gene flow for some regionally declining woodland bird species. Landscape Ecology, 27, 813-827.

10) This type of result is not unexpected. When you calculate Fst, you provide substantially more information than you provide to STRUCTURE: the population from which each observation came. Lacking this information, STRUCTURE has to integrate over all possible allocations of individuals to the specified number of populations, with corresponding uncertainty about the allele frequencies in each population. You can, however, use STRUCTURE differently to include some population allocation information. For example, STRUCTURE has settings that use a subset of the data, with their population allocations, as a training dataset, and then classify the other individuals according to the proportion of their genome from each population. If you do this, I anticipate that STRUCTURE will allocate a large proportion of the test data to the appropriate population.

11) This is common, which is why you need to use a number of methods to investigate population structure: population trees, PCA plots, FST, etc. STRUCTURE does not perform well when there is isolation by distance, so this could be an issue. Finally, I have heard that if there are many unique alleles in each population, this can obscure the structure. One thing to try is to use long runs (burn-in, etc.) and the LOCPRIOR model.
12) Bayesian clustering algorithms are better able to partition samples when FST values are high and genetic differentiation among populations is strong. As differentiation gets weaker, Bayesian clustering algorithms have less variance to work with and are less able to identify population structure correctly. I wrote a simulation paper in which I evaluated three common (non-spatial) Bayesian clustering algorithms (STRUCTURE, BAPS, PARTITION) to determine their relative utility for detecting population structure as the level of differentiation decreased. I have attached it here, though others have also noted similar phenomena. Clustering algorithms that take spatial data into account will likely be better at detecting structure at lower levels of differentiation, but may not be appropriate for your study system. I hope you find this information helpful, and good luck with your research.

13) I recently had a similar problem with a dataset I was working on, and the STRUCTURE manual mentions that this is a common phenomenon. Have you tried running it with the LOCPRIOR model selected? It essentially uses your own "populations", i.e. where you collected each individual, to help the algorithm find structure in the data. The manual has a section on the LOCPRIOR model and it is pretty straightforward. It did in fact improve the results of my analysis. Let me know if you have any other questions.

14) As you probably know, STRUCTURE might not find structure if it is weak. Even if Fst is significant, if its value is low the signal might not be strong enough to be detected by STRUCTURE. Have you tried using sampling locations as prior information? As described in the manual, this can help the clustering when the signal is relatively weak, without leading to spurious results.

15) We've had similar things happen with some of our data sets (see the results of the analysis of wingless alleles in the attached paper).
When you calculate Fst values using Genepop or similar programs, you are assigning individuals to populations a priori (without reference to the data). STRUCTURE assigns individuals to populations on the basis of the data themselves. This is a very useful attribute of STRUCTURE, but it comes at a cost: a loss of statistical power for detecting differences among populations when they are only moderately differentiated from each other. This loss of power is particularly evident when the sample sizes of some of the populations are small.

16) In my view, the lack of partitions in your samples could be due to an isolation-by-distance pattern. Could that be the case? In addition, a significant FST value does not necessarily imply the existence of genetic structure (e.g. FST could be very low even if it is significant). The first thing I would do is perform an MDS and a PCA (probably examining more than two components) to see how the samples are distributed in a plot. Secondly, you could test for isolation by distance with a Mantel test, and then use DAPC (Discriminant Analysis of Principal Components) or sPCA (spatial Principal Component Analysis), both implemented in the R package "adegenet".

17) I think it depends on how many values of K you are looking at and on the parameters you set, such as the burn-in and the number of iterations. Good practice is pretty computer-intensive. The K values tested should go up to the number of groups plus one. A burn-in of 10,000 is sufficient, but 100,000 is best. Use at least 100,000 MCMC repetitions after burn-in, but 1 million is best. 10-20 runs per K are also suggested. Also, when you average across your runs, be sure to use CLUMPP to align them so that you avoid problems such as label switching. I would also suggest that you email Vikram Chhatre, who has written a program for automating STRUCTURE analyses (webpage: http://www.crypticlineage.net/index.html). There is also a Google discussion group for STRUCTURE, which may be of some help:
https://groups.google.com/forum/#!forum/structure-software

I suggest you read Gilbert et al. (2008) in Molecular Ecology, "Recommendations for utilizing and reporting population genetic analyses: the reproducibility of genetic clustering using the program STRUCTURE."

Jessy Castellanos Gell
Genética para la Conservación
Centro de Investigaciones Marinas
Calle 16 No. 114 entre 1ra. y 3ra.
Miramar, Playa, Ciudad de la Habana
CP 10300, CUBA
Tel. (537) 203 06 17
jessy@cim.uh.cu
jessy@fbio.uh.cu
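Several answers above (5, 7 and 8) point out that a 'significant' Fst p-value says little about how large the differentiation actually is. The Python sketch below illustrates that point under simplified, hypothetical assumptions that are not taken from the original question: a single biallelic locus instead of microsatellites, two samples drawn from populations with allele frequencies of 0.50 and 0.55 (arbitrary values), Nei's GST as a stand-in for the Fst estimators implemented in programs such as Genepop or diveRsity, a simple label-permutation test (not the exact test Genepop performs) and a percentile bootstrap that resamples individuals within each sample. All function names are hypothetical and are not part of any of the programs mentioned.

```python
import random


def gst(p1, p2):
    """Nei's GST for one biallelic locus and two equally weighted samples.

    p1 and p2 are the sample frequencies of allele A in each population.
    """
    # Mean expected heterozygosity within the two samples (HS).
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled sample (HT).
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    return 0.0 if ht == 0 else (ht - hs) / ht


def allele_freq(genotypes):
    """Frequency of allele A; each genotype is its count of A alleles (0, 1 or 2)."""
    return sum(genotypes) / (2 * len(genotypes))


def simulate(n, p, rng):
    """Draw n diploid genotypes in Hardy-Weinberg proportions with allele frequency p."""
    return [int(rng.random() < p) + int(rng.random() < p) for _ in range(n)]


def permutation_p(pop1, pop2, n_perm, rng):
    """One-sided permutation p-value for GST, shuffling individuals between samples."""
    observed = gst(allele_freq(pop1), allele_freq(pop2))
    pooled = pop1 + pop2  # new list, so the input samples are not reordered
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        a, b = pooled[:len(pop1)], pooled[len(pop1):]
        if gst(allele_freq(a), allele_freq(b)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def bootstrap_ci(pop1, pop2, n_boot, rng, level=0.95):
    """Percentile bootstrap interval for GST, resampling individuals within each sample."""
    stats = sorted(
        gst(allele_freq(rng.choices(pop1, k=len(pop1))),
            allele_freq(rng.choices(pop2, k=len(pop2))))
        for _ in range(n_boot)
    )
    k = round(n_boot * (1 - level) / 2)  # number of values cut from each tail
    return stats[k], stats[n_boot - 1 - k]


if __name__ == "__main__":
    rng = random.Random(1)
    for n in (50, 2000):  # individuals per sample
        pop1 = simulate(n, 0.50, rng)
        pop2 = simulate(n, 0.55, rng)
        est = gst(allele_freq(pop1), allele_freq(pop2))
        p = permutation_p(pop1, pop2, 499, rng)
        lo, hi = bootstrap_ci(pop1, pop2, 1000, rng)
        print(f"n={n}: GST={est:.4f}  p={p:.3f}  95% CI=({lo:.4f}, {hi:.4f})")
```

For the simulated populations the true GST is only about 0.0025 ((0.55 - 0.50)^2 / 2 = 0.00125, divided by the pooled heterozygosity 0.49875). With 50 individuals per sample the permutation p-value will usually not be significant, whereas with 2,000 per sample it will usually fall below 0.05 even though the estimate stays tiny, so the bootstrap interval is the more informative summary of how weak the differentiation is. Note that this uncorrected GST is never negative and is biased upward in small samples, which is one reason bias-corrected estimators are generally preferred.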