Dear Evoldir members,

Below are the responses I received to my recent question about how best to deal with disconnected parsimony networks when performing NCA with relatively highly divergent intraspecific mtDNA sequence data (up to 12% uncorrected p). Thanks to everyone who offered advice; your comments were very helpful and informative.

Cheers,
Ryan

1. Your data seem quite divergent. I am not sure what your problem is with the constrained MP approach, or with the reduced confidence setting approach, but at worst you could claim that you have nine clades at the last nesting level.

David Posada

2. This is a common "problem" with accompanying misconceptions. I would first ask whether the different networks represent groups of allopatric haplotypes. If so, any further efforts to join them would seem rather pointless to me. Moreover, it should be understood that the network NCA approach is really only meant for closely related lineages, and the inference key for "almost" exclusively intraspecific interactions. Of course, every gene has a different mutation rate, and we rarely have a good idea what it is, so each network constructed with a different gene reflects a different level of temporal resolution, a point which I think most people understand, but few comment on or take into consideration when drawing inferences or comparing data sets.

If your data set is really about one species, with some level of interaction among many of the populations in the recent past (e.g. the last few 100,000 years or so), then it would seem that COI in this species is mutating too fast to provide the best data set for NCA across the species range. By trying to remove 3rd positions, etc., you are reducing the genetic resolution (in a temporal sense), and that should help; but if it does not, and/or it results in more and more reticulate branches, then it would seem to me that you simply have too much homoplasy.
If these different networks represent allopatric populations that have been isolated for long periods of time, then there seems little point in wanting to "connect" them to support inferences other than "fragmentation", which can simply be concluded in any case (along with, perhaps, the notion that you have different species). I see no problem in conducting the analysis on your larger networks independently (the ones with 40 and 10), and simply inferring long-term fragmentation for the other, smaller networks, which I presume are located on peripheral edges of the distribution and/or in specific refugia or areas that have undergone bottlenecks accompanied by isolation.

Lastly, if these different networks do not represent allopatric groups of haplotypes, it is more complicated, and you first need to establish through other genetic marker systems whether you are indeed looking at one or more species before considering other approaches to the problem.

steven.weiss

3. In the new version of TCS (1.18), you can fix the connection limit to a defined number of steps, instead of using the 95 or 90% connection limit. So you can just increase this number of steps until all your haplotypes are connected in a single network.

Best wishes,
Patrick

4. I think you might need to just set TCS's connection limit (not a confidence limit) much lower. My understanding is that if you have 12% difference, then to get them into one clade you need to go below that, e.g. below an 88% connection limit. You might need to go lower, though. From what you're saying, I'm not entirely certain (my ignorance here) whether 12% is the maximum distance between two animals. It's that max you should use. Also be sure that the sequences are all aligned exactly.

Good luck!
Ruth

5. Could you constrain the groups found in PAUP and get that to recover the best phylogeny of the disconnected groups? If I were forced to analyse the data that way, I would do it.
However, realize that you have some deep diversity issues which probably violate assumptions of NCA, and/or some other process like cryptic speciation might be in play.

Joe

6. I have never encountered your problem... my major problems with networks have always been intragenomic variation and homoplasy. I attach a paper that might be useful. But in your case I think I would approach it this way: I would construct the network in the classic way, i.e. at 95% confidence, because you don't want to lose power. If the result is some number of disconnected networks, I think you have to deal with this by discussing it in your paper: what could explain this pattern? You should definitely calculate the nearest connection(s) (fewest mutational steps) between your disconnected networks and haplotypes, and then join them together in the shortest way in order to get the whole picture. You will probably get loops that you will also have to deal with... but that might be easy, as there are some rules to resolve loops. Always state the maximum number of mutations between two connected haplotypes under the 95% confidence limit. You can probably present both networks: the first one, the 95% network with the disconnected networks and haplotypes, and the second one, the result of joining everything in the shortest way.

I think it's a mistake to try to change your results, as what you get is what you have. I believe you have to deal with it and try to get the most out of it by discussing different possibilities and approaches. Hopefully there will be someone out there with the same problem who will be more helpful ;) I'll be grateful if you can send me, or send to the list, the feedback you get.

Best of luck!
Sandra

7. I've dealt with the same issues, so here are my thoughts on your problem. I think the biggest problem in trying to link together very genetically divergent haplotype networks is the rooting designation.
If one of them is paraphyletic with respect to the other, then maybe you can say something about the internal-tip relationship between the two, but if they represent reciprocally monophyletic gene lineages, then they are both tip clades, and under that scenario there is nothing that NCA can tell you because it requires an internal-tip comparison. NCA can be very sensitive to the internal-tip designation of clades. I've worked with similar situations at higher nesting levels, and the resulting inference can differ dramatically depending on which clade is designated internal and which is designated a tip. Inferences can flip between allopatric fragmentation and range expansion, so you've got to have a lot of confidence in the internal-tip relationships to be able to use NCA in these situations.

I guess the other thing to consider is that if your haplotype clades are that divergent (up to 12%), then you've got to wonder how much historical population-level information still exists about the causal factors that separated the most divergent haplotype clades.

Dave Weisrock

8. What is your motivation for trying to connect networks at this level of divergence? If my math is correct, this equals approximately 69 mutational steps, suggesting that some of your networks may be separated to nearly this degree. It is very unlikely that you will be able to confidently connect networks at this level. There are likely many equally parsimonious connections, perhaps resulting in numerous loops. You may be able to determine that certain higher-level nested groups are more closely related to other nested groups and potentially make a connection at this level, but determining interior-tip status may be difficult. I would also guess that if networks are separated by a comparable number of mutational steps (considering a fragment of ~600 bp), NCA would infer fragmentation or an inconclusive result (e.g., IBD or fragmentation if sampling is insufficient).

One further issue is sampling.
I am not an entomologist, but it seems that springtails would not disperse too far. In which case, your sampling density may still be too coarse (but of course, I don't know your study organisms). Depending on your question, it may not be entirely necessary to try to connect all networks. One solution is to conduct an NCPA on each network independently to test specific hypotheses, while using other information to infer processes between networks.

Hope this is helpful. I would appreciate it if you could either send me a copy of the responses you receive or post them on the Evol. Directory.

Matthew E. Gifford

Ryan Garrick
Department of Genetics
Biological Sciences Building 1
La Trobe University
Bundoora, VIC 3086
AUSTRALIA
E-mail: r.garrick@latrobe.edu.au
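
Note on the arithmetic in responses 6 and 8: the conversion from percent divergence to mutational steps, and the "nearest connection" between two disconnected networks, can both be checked with a few lines of code. The following is only a minimal sketch in Python, not part of TCS, PAUP, or any NCA software. The function names, the toy sequences, and the fragment lengths of 575 and 600 bp are illustrative assumptions; it also assumes that sequences are already aligned to equal length and that every differing site counts as one step.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count differing sites between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def steps_from_p(p: float, length_bp: int) -> int:
    """Convert an uncorrected p-distance into a number of differing sites."""
    return round(p * length_bp)


def nearest_connection(net_a: dict, net_b: dict) -> tuple:
    """Return (steps, name_a, name_b) for the closest haplotype pair
    between two networks, each given as {haplotype_name: sequence}."""
    return min(
        (hamming(seq_a, seq_b), name_a, name_b)
        for (name_a, seq_a), (name_b, seq_b) in product(net_a.items(), net_b.items())
    )


# 12% uncorrected p on hypothetical fragment lengths (assumed, not from the data set)
print(steps_from_p(0.12, 575))  # 69
print(steps_from_p(0.12, 600))  # 72

# Toy haplotypes (made up) for two disconnected networks
network_a = {"A1": "ACGTACGT", "A2": "ACGTACGA"}
network_b = {"B1": "TCGTACGA", "B2": "TTTTACGA"}
print(nearest_connection(network_a, network_b))  # (1, 'A2', 'B1')
```

Reporting the smallest number of steps between each pair of disconnected networks alongside the maximum number of steps allowed under the 95% limit makes it clear how far beyond that limit any manual join would be.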