Background

Recombination has a profound impact on virus evolution, but characterizing recombination patterns in molecular sequences remains a challenge. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively little attention in recombination detection.

Here, we extend a recombination detection method based on quartet mapping to allow the identification of recombinant sequences without prior specification of query and reference sequences. Through simulations, we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences.

Methods

We describe the developments of the method in several subsections. The procedure we present here is an extension of a previously developed visual recombination detection (VisRD) method. We propose a screening method that assesses whether all combinations of four sequences (quartets) in a sequence alignment jointly provide evidence of recombination. When this is the case, we classify taxa or groups of taxa based on their contribution to this recombination signal. The method can be used to sequentially prune putative recombinants until no significant evidence of recombination can be found in the alignment.
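The screening and pruning procedure described above can be illustrated with a minimal Python sketch. It is not the VisRD implementation: `quartet_score` is a hypothetical, user-supplied function that returns a recombination signal for one quartet, the mean over quartets stands in for the overall statistic, and the fixed `threshold` is a placeholder for a proper significance test.

```python
from itertools import combinations


def taxon_contributions(taxa, quartet_score):
    """Mean quartet score over all quartets that contain each taxon."""
    totals = {t: 0.0 for t in taxa}
    counts = {t: 0 for t in taxa}
    for quartet in combinations(taxa, 4):
        score = quartet_score(quartet)
        for t in quartet:
            totals[t] += score
            counts[t] += 1
    return {t: totals[t] / counts[t] for t in taxa if counts[t]}


def overall_statistic(taxa, quartet_score):
    """Mean score over all quartets (an assumed stand-in for the test statistic)."""
    scores = [quartet_score(q) for q in combinations(taxa, 4)]
    return sum(scores) / len(scores) if scores else 0.0


def prune_recombinants(taxa, quartet_score, threshold):
    """Remove the top-contributing taxon until the signal drops to the threshold."""
    remaining = list(taxa)
    removed = []
    while (len(remaining) > 4
           and overall_statistic(remaining, quartet_score) > threshold):
        contrib = taxon_contributions(remaining, quartet_score)
        worst = max(remaining, key=lambda t: contrib[t])
        remaining.remove(worst)
        removed.append(worst)
    return removed, remaining


# Toy example: only quartets that include the taxon "R" carry any signal.
toy_taxa = ["A", "B", "C", "D", "E", "R"]
toy_score = lambda quartet: 1.0 if "R" in quartet else 0.0
removed, remaining = prune_recombinants(toy_taxa, toy_score, threshold=0.1)
```

In this toy case the loop removes only "R", after which no quartet carries signal and pruning stops. Note that the number of quartets grows as n choose 4, so exhaustive enumeration of this kind is practical only for alignments of moderate size.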

We start by briefly explaining the previously developed visual recombination detection method based on quartet scanning. After describing a new distance-based quartet mapping approach, we present measures for the phylogenetic inhomogeneity of quartets, classification measures for taxa and taxa groups, and an overall test statistic for recombination. We then describe how null distributions can be obtained for this test statistic. In addition to the VisRD method, we briefly explain the alternative triplet approaches used in our comparisons. We conclude the methods section by providing details on the simulated and empirical data used in this study.
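To make the idea of distance-based quartet mapping and phylogenetic inhomogeneity concrete, the following Python sketch assigns one of the three possible quartet topologies to each alignment window and summarizes how often the windows disagree. It rests on several assumptions that are not taken from the method itself: uncorrected p-distances, the four-point condition (the split with the smallest distance sum is taken as supported), non-weighted window votes, and an inhomogeneity score defined as one minus the fraction of windows supporting the most frequent topology.

```python
from collections import Counter


def p_distance(s1, s2):
    """Proportion of differing sites, skipping positions with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def quartet_topology(a, b, c, d):
    """Return 0, 1 or 2 for the split ab|cd, ac|bd or ad|bc with the smallest
    distance sum (four-point condition); ties go to the lowest index."""
    sums = (
        p_distance(a, b) + p_distance(c, d),
        p_distance(a, c) + p_distance(b, d),
        p_distance(a, d) + p_distance(b, c),
    )
    return min(range(3), key=lambda i: sums[i])


def window_topologies(seqs, window, step):
    """Supported topology in each sliding window along four aligned sequences."""
    a, b, c, d = seqs
    return [
        quartet_topology(a[i:i + window], b[i:i + window],
                         c[i:i + window], d[i:i + window])
        for i in range(0, len(a) - window + 1, step)
    ]


def inhomogeneity(topologies):
    """One minus the fraction of windows that support the most frequent topology."""
    if not topologies:
        return 0.0
    most_common_count = Counter(topologies).most_common(1)[0][1]
    return 1.0 - most_common_count / len(topologies)


# Toy quartet: the first half groups a with b, the second half groups a with d.
toy_quartet = [
    "AAAAAAAA" + "GGGGGGGG",  # a
    "AAAAAAAA" + "TTTTTTTT",  # b
    "CCCCCCCC" + "TTTTTTTT",  # c
    "CCCCCCCC" + "GGGGGGGG",  # d
]
toy_topologies = window_topologies(toy_quartet, window=8, step=8)
toy_score = inhomogeneity(toy_topologies)
```

A quartet whose windows all support the same split scores 0 under this definition, whereas a change of supported topology along the alignment, as in the toy quartet, raises the score. Real data would call for model-corrected distances and for the null distributions mentioned above rather than a raw score.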

Results

Analysis of phylogenetic simulations reveals that identifying the descendants of relatively old recombination events is a challenging task for all available methods, and that quartet scanning performs relatively well compared to triplet-based methods. The use of quartet scanning is further demonstrated by analysis of well-established and putative recombinant strains of HIV-1.

Consistent with recent findings, we provide evidence that the putative circulating recombinant form CRF02_AG is a ‘pure’ lineage, whereas its putative parental lineage, subtype G, has a recombinant origin. We also demonstrate HIV-1 intrasubtype recombination, confirm the hybrid origin of SIV in chimpanzees, and further unravel the recombinant history of SIV lineages in a primate immunodeficiency virus dataset.

Conclusion

Quartet scanning is a valuable addition to triplet-based methods for identifying recombinant sequences without prior specification of query and reference sequences. The new method is available in the VisRD v.3.0 package.