Dear All,

Many thanks to all of you who replied to my post on Evoldir last week. The responses were very useful and, as many of you asked me to forward the replies, I have summarised them below. For privacy's sake I have left out people's names. Thank you once again,

Serinde

My original post:

I am running outlier analyses on a set of 25 markers but find discrepancies between different methods, which I find hard to explain. The dataset comprises both putatively neutral microsatellites and candidate SNP loci in genes of interest, for populations that have been divergently selected over 6 generations (50 individuals per population). Fdist and Schlotterer's method are fairly consistent in the identification of outlier loci, which also correspond to patterns of allelic divergence and to results obtained from exploratory analyses (AMOVA, correspondence analysis). The output from BayeScan, however, is very different: rather than indicating divergent selection at the loci identified by Fdist and Schlotterer's methods (and others), other loci are identified as being under balancing selection. There therefore appears to be a bias towards balancing selection rather than diversifying selection at the expected loci, which doesn't make biological sense in the case of my data. Analysing microsatellites and SNPs independently does not radically change the findings. I would like to know if anyone has observed similar discrepancies between BayeScan and other outlier analyses, or could think of ways in which such a shift (identification of different loci under balancing selection rather than the same loci under divergent selection) could be explained. Any suggestions would be extremely helpful. Thank you very much in advance.

The responses:

-----

1) The methods are really different, so you shouldn't be shocked if they yield different results.

2) The Beaumont & Nichols FDIST/Lositan test is very intuitive and understandable, so if it does confirm likely divergent selection at those candidate loci for which you already had biological knowledge, then that is a strong indication!

3) Consider that in most cases these tests are conducted on samples from wild populations, with large Ne and very little knowledge about them. You may have a lot of knowledge about your populations, and over only 6 generations. Also, if your populations are small and there is a strong genetic drift signal in your data, maybe that affects the results of BayeScan. I remember some experiments on Drosophila showing that when lab populations undergo strong genetic drift, the signal of selection may not be so obvious. Maybe FDIST picks it up and BayeScan doesn't.

4) Do you have 2 or more populations? If you have more, you could try running the tests on pairs of populations.

5) You could also try multivariate ordination techniques (like PCA or FCA), separately for microsatellites and SNPs, and plot your individuals alongside the alleles, to see if there is any decisive pattern of association between the alleles you expect to be under selection and the individuals that have been exposed to that selective pressure (I suspect that if there is a real signal of selection, you should see some striking scattering); see the sketch after this reply.

So, those were my two cents - good luck with your research!
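A minimal sketch of what such an ordination might look like in Python, assuming the genotypes sit in a (hypothetical) file genotypes.csv with one row per individual, a "population" column, and one column of allele counts (0/1/2) per allele. This only illustrates the idea, not the exact method the respondent had in mind:

    # Sketch: PCA of individuals on allele counts, with allele loadings overlaid,
    # to see whether the candidate alleles separate the divergently selected lines.
    # File name, column layout and labels below are assumptions, not real data.
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.decomposition import PCA

    data = pd.read_csv("genotypes.csv")              # hypothetical input file
    pops = data["population"]                        # population / treatment label
    X = data.drop(columns=["population"]).fillna(0)  # one column of 0/1/2 counts per allele

    pca = PCA(n_components=2)
    scores = pca.fit_transform(X)                    # coordinates of individuals
    loadings = pca.components_.T                     # one row per allele column

    fig, ax = plt.subplots()
    for pop in pops.unique():                        # individuals coloured by population
        sel = (pops == pop).to_numpy()
        ax.scatter(scores[sel, 0], scores[sel, 1], label=str(pop), alpha=0.6)
    scale = abs(scores).max()                        # stretch loadings to the score scale
    for name, (lx, ly) in zip(X.columns, loadings * scale):
        ax.annotate(name, (lx, ly), fontsize=7)      # allele labels at their loadings
    ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.1%})")
    ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.1%})")
    ax.legend()
    plt.show()

If the selected and control individuals separate mainly along an axis whose strongest loadings are the candidate alleles, that would support the Fdist/Schlotterer signal.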
-----

I saw your message on Evoldir. Sorry I can't be of much use, but I'd be interested in the responses you get, so if you don't post a summary on Evoldir, perhaps you could forward the replies to me? One thought did occur: perhaps your results are not as different from each other as they appear; it may just be that BayeScan's assessment of divergent selection has shifted down for all loci. In that case outlier loci look like loci under ordinary drift only, and loci under drift look to be under balancing selection. Perhaps a problem with your prior settings? Good luck in figuring out the difference and which program does the better job in your case.

-----

It is a general issue that BayeScan seems to be more conservative than alternative tests (see attached, if you haven't already), but be aware of the major flaw in their discussion of the hierarchical model. I don't know how large your Fst values are, but for marine fish (approx. Fst = 0.0001-0.04) we also often get candidates for balancing selection; they are most likely false positives, and as long as the neutral background Fst is relatively low I don't think it is possible to reliably detect true balancing selection with these types of tests. Thus, I wouldn't worry too much about that result. The lack of outliers in BayeScan may simply be due to low power in your dataset, but I guess that if you couple these observations with time-series plots of Fst for the other outliers, that could reinforce the results (does that make sense?).

-----

How large are your Fst values over time? Does it make sense that the signal is simply too low for BayeScan to detect? A simulation trial could verify that (like the Narum & Hess 2011 paper).

-----

I attach a paper that starts to discuss such differences (Manel et al., BMC Evolutionary Biology 2009, 9:288), but they are not easy to explain. In other analyses I also very often find differences, probably due to the software's assumptions about demographic structure.

-----

We had a similar problem when applying BayesFst, which uses the same underlying model as BayeScan (although with different outputs and explicit tests of positive selection on each locus), in the Eveno et al. paper (2008, Mol. Biol. Evol. 25: 417-437). We interpreted this as reflecting the fact that if mutation rates are really different between SSRs and SNPs (which is the case a priori, but may be more or less pronounced depending on the species), the subdivided model used in BayeScan might be affected, and this could potentially lead to problems of convergence and of fitting the different parameters. Did you test whether your runs had actually converged correctly? What do you mean when you say that using SNPs only did not change the results radically? In that case, a low number of loci might also be insufficient for BayeScan, which is fitted using the original data, whereas Fdist2 uses independently simulated data and then compares the neutral envelope to the observed results. Be aware also that the Fdist2 analysis might not be completely adequate for SNP data (see Eveno et al.) because of their low mutation rate; this is also visible in one of the figures in the Beaumont & Nichols paper where they test the robustness of their model (inflation of the upper bound for low theta).

-----

Check out this paper on outliers: Nunes et al. 2011, Molecular Ecology 20: 193-205.

-----

I would be careful about any conclusions drawn from few markers, whether considering each of them separately or all together. There is a lot of variation in terms of divergence (e.g. outliers) even when the underlying demography is well known and informative (see attached paper; Orozco-ter Wengel et al. 2011, Mol Ecol 20: 1108-1121). I would suggest performing a set of simulations under a demographic scenario that broadly fits your data and then looking at the distribution of simulated Fst values you get. The simulations could be performed with software like ms (Dick Hudson) and then converted into microsatellite data with ms2ms (Pidugu and Schloetterer) so that they fit the stepwise mutation model. Finally, calculate Fst for your couple of thousand simulated markers with MSA or any other such software to get an idea of the distribution of Fst under the null of your demography and plain neutrality.
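The ms/ms2ms/MSA route above is the thorough way to do this; just to illustrate the idea, here is a rough Python sketch of a null Fst distribution under pure drift, matched loosely to the design in the original post (two populations of 50 diploids drifting independently for 6 generations). The number of populations, the starting frequencies and the biallelic (SNP-like) coding are assumptions; microsatellites would still need the stepwise-mutation treatment described above.

    # Rough null distribution of Fst under drift alone: two replicate populations
    # of 50 diploids founded from a common base and drifting for 6 generations.
    # All settings are illustrative assumptions, not fitted to any real data set.
    import numpy as np

    rng = np.random.default_rng(1)
    n_loci, n_gen, n_dip = 10_000, 6, 50     # loci, generations, diploids per population
    two_n = 2 * n_dip                        # gene copies sampled each generation

    p0 = rng.uniform(0.05, 0.95, n_loci)     # shared starting allele frequencies
    p1, p2 = p0.copy(), p0.copy()
    for _ in range(n_gen):                   # binomial resampling = genetic drift
        p1 = rng.binomial(two_n, p1) / two_n
        p2 = rng.binomial(two_n, p2) / two_n

    # Nei-style Fst per locus, (Ht - Hs) / Ht, from the final allele frequencies
    p_bar = (p1 + p2) / 2
    ht = 2 * p_bar * (1 - p_bar)
    hs = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    fst = np.where(ht > 0, (ht - hs) / np.where(ht > 0, ht, 1), 0.0)

    print("mean neutral Fst:", round(float(fst.mean()), 4))
    print("95th percentile :", round(float(np.quantile(fst, 0.95)), 4))
    print("99th percentile :", round(float(np.quantile(fst, 0.99)), 4))

Observed loci sitting beyond, say, the 99th percentile of such a distribution are the ones worth treating as candidate outliers; if none of the observed values approach that tail, that would support the "signal too weak for BayeScan" explanation.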
-----

I got completely different results from BayeScan and LOSITAN on the same set of microsatellites and populations. I am appending to my response the set of responses that I received to a similar query on Evoldir. The bottom line? I think it may be the case that none of these approaches are very reliable, but BayeScan was certainly much more conservative, finding only one outlier (which wasn't one of the six that LOSITAN identified).

What follows are the replies that I received to my Evoldir query regarding apparently non-neutral microsatellite loci. For privacy's sake, I have dropped people's names. Thanks to all who provided responses.

My original post: "Most analyses of population genetic variation and structure assume the neutrality of the markers used. There are a number of software tools that test for selection on loci, with LOSITAN (which uses the FST-outlier method; Vitalis et al. 2001; Beaumont 2005) being my preferred tool. In a data set of 31 microsatellite loci, as many as six test as being significantly under either balancing or positive selection across my samples. I would just like to get a feel for whether the community at large, faced with such a scenario, would 1) advocate dropping those loci from the data set, 2) leave them in, or 3) present analyses both with and without those loci included. My question is asked from the standpoint of presentation in publication - I intend to analyze both the full and the trimmed data set for my own interest in how non-neutral loci affect population genetic analyses."

Responses:

I would be cautious about using only the classical Fst method (for example LOSITAN) for detecting loci under selection. The classical methods are based on simulating an FST null distribution across all loci and, from this, identifying loci that lie outside the credibility region and are therefore assumed to be under selection. These methods apply a simple demographic model, such as a coalescent-based approach assuming genetic drift to be the contributor to differentiation among populations; outliers are therefore taken as evidence of selection. Newer methods (such as BAYESCAN ver. 2.01; Foll and Gaggiotti 2008) extend the classical approach to include dynamic processes such as gene flow, or are based on the detection of LD among pairs of loci. The demographic models are more advanced and describe ecological scenarios more realistically, including migration among subpopulations. The degree of differentiation (FST) is decomposed into a locus-specific component (alpha), shared by all populations, and a population-specific component (beta), shared by all loci. Selection is inferred when alpha is necessary to explain the observed pattern of diversity. For testing for loci under selection I have used both LOSITAN and BAYESCAN; the latter seemed more realistic once population structure and history were taken into consideration.
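For reference, if I remember Foll and Gaggiotti (2008) correctly, the decomposition mentioned above is applied on a logistic scale, roughly

    log( Fst_ij / (1 - Fst_ij) ) = alpha_i + beta_j

where alpha_i is the locus-specific effect and beta_j the population-specific effect; posterior support for alpha_i > 0 is read as diversifying selection and for alpha_i < 0 as balancing selection, which is why BayeScan reports candidates in both directions.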
I guess I'd say your results are probably evidence that the method used to identify "positive selection" is junk (like most such methods). Almost certainly false positives...

I would look into how your loci segregate your samples. You could do PCAs or correspondence analyses of the allele distributions among your samples and see how they segregate for each marker; then you can compare what your presumably neutral and your presumably selected markers are doing. If segregation is the same among all markers (i.e. throughout the genome), then I would say you do not have selection, just some markers that are particularly good at picking up the biological signal, and thus, in my opinion, all markers should be included in the estimation of differentiation. If your neutral markers all segregate your samples one way, but your outlier loci do it another way (or several other ways), then you may have selection, and I would report your neutral differentiation as well as the differentiation due to the "selected" markers.

It depends on the impact on the results and the amount of data you have. I definitely prefer to check it. Often the impact on the final conclusions is very limited and can be noted verbally. (Well, dendrograms are notoriously unstable, but one does not need them anyway.) You might want to do some more realistic simulations to explore other confounding factors such as sampling from a spatially structured population, founding events and so on.

I would suggest excluding these 6 loci, since you will have many left. I would also check what each non-neutral locus says. I would also check the repeat motif: trinucleotide loci are more likely to be non-neutral than dinucleotide loci. I would also, if possible, try to find out where these loci come from in the genome (in a coding sequence, etc.). I would strongly recommend at least reporting the outlier results and the subsequent results with and without them.

I would definitely analyse the data with and without. In my experience, this will likely change things, but it might affect only some of the populations analysed (which is reasonable if any such locus is indeed affected by some local selective force specific to some area of your study). If the loci "potentially under selection" do change the picture quite a bit, you would need to check with BLAST whether they are anywhere near a genomic region that has some functional implication. Please also note that under certain conditions, when a population is quickly expanding its range at the distribution margins, some rare alleles can suddenly increase in frequency as a result of peripheral founder events at the front wave of the expansion (see recent papers by Excoffier & co. about the "allele surfing" hypothesis). Reviewers will almost certainly ask you to consider that.

The idea of comparing the analyses with and without the selected loci seems interesting (just be aware of the smaller number of loci, and thus lower power, in the "neutral" fraction compared with the full set). Also, I have a feeling that outlier analyses such as LOSITAN are quite prone to giving false positive hits.

serindevanwijk@gmail.com