Literature DB >> 35035904

Classification of small ruminant lentivirus subtype A2, subgroups 1 and 2 based on whole genome comparisons and complex recombination patterns.

Aaron M Dickey¹, Timothy P L Smith¹, Michael L Clawson¹, Michael P Heaton¹, Aspen M Workman¹.

Abstract

Background: Small ruminant lentiviruses (SRLVs) cause a multisystemic chronic wasting disease in sheep across much of the world. SRLV subtype A2 is prevalent in North America and further classified into multiple subgroups based on variation in the group antigens gene (gag) and envelope (env) genes. In sheep, the ovine transmembrane protein 154 (TMEM154) gene is associated with SRLV susceptibility. Ewes with at least one copy of TMEM154 encoding a full-length protein with glutamate at position 35 (E35; haplotypes 2 and 3), are highly susceptible to SRLV infection while ewes with any combination of TMEM154 haplotypes which encodes lysine (K35; haplotype 1), or truncated proteins (haplotypes 4 and 6) are several times less so. A2 subgroups 1 and 2 are associated with host TMEM154 genotypes; subgroup 1 with the K35/K35 genotype and subgroup 2 with the E35/E35 genotype.
Methods: Sequence variation within and among full-length assemblies of SRLV subtype A2 subgroups 1 and 2 was analyzed to identify genome-scale recombination patterns and subgroup-specific variants.
Results: Consensus viral genomes were assembled from 23 infected sheep, including animals of assorted TMEM154 genotypes comprised of haplotypes 1, 2, or 3. Viral genome analysis identified viral subgroups 1 and 2 among the samples, and revealed additional sub-structure within subgroup 2 based on models predicting complex patterns of recombination between the two subgroups in several genomes. Animals with evidence of dual subgroup infection also possessed the most diverse quasi-species and the most highly recombined consensus genomes. After accounting for recombination, 413 subgroup diagnostic single nucleotide polymorphisms (SNPs) were identified. Conclusions: The viral subgroup framework developed to classify SRLV consensus genomes along a continuum of recombination suggests that animals with the TMEM154 E35/K35 genotype may represent a reservoir for producing viral genomes representing recombination between A2 subgroups 1 and 2. Copyright:

Entities: Chemical

Keywords: Small ruminant lentivirus; ovine progressive pneumonia virus; quasispecies; recombination

Mesh：

Year: 2020 PMID： 35035904 PMCID： PMC8749911 DOI： 10.12688/f1000research.27898.2

Source DB: PubMed Journal: F1000Res ISSN： 2046-1402

Introduction

Small ruminant lentiviruses (SRLV) are a genetically diverse group of lentiviruses belonging to the family Retroviridae. SRLV infect sheep, goats, and wild ruminants worldwide causing a lifelong persistent infection clinically characterized by wasting, interstitial pneumonia with labored breathing, indurative mastitis, arthritis, and/or more rarely, encephalitis ( Blacklaws, 2012). Disease progression is typically slow, and the genetic background of both the host and virus influence the clinical course and outcome of infection ( Heaton ; Sider ). A recent meta-analysis of published data estimates global flock and individual SRLV prevalence from 2011–2020 to be 31% and 11% respectively ( de Miguel ). The SRLV genome consists of two identical positive-sense single-stranded RNA subunits approximately 9 kb in length. The viral genome, which is integrated into host cells in the form of a provirus, contains three structural genes ( gag, pol, and env) and three regulatory genes ( tat, vif, and rev) flanked by non-coding long terminal repeat regions (LTRs). Phylogenetic analysis based on partial group antigens gene ( gag) and polymerase ( pol) gene sequences divides these viruses into five major genotype groups, A-E, which are further divided into different subtypes ( Shah ). The groups differ by 25–37% and the subtypes differ by 15–27% in nucleotide composition at these genomic loci ( Ramírez ). Genotypes A and B are distributed worldwide, while genotypes C-E are geographically restricted ( Gjerset ; Grego ; Shah ). The comprehensive set of haplotypes from the same viral species in a single host is known as a quasispecies ( Eigen ). The viral quasispecies arises from the interplay of three evolutionary factors throughout the duration of a chronic infection. These three factors are mutation, recombination, and selection. Point mutations and small indels are introduced into the SRLV viral genome due to its low fidelity, error prone reverse transcriptase enzyme. Selection can be driven by the host immune system and antiviral medications. Work with related lentiviruses has revealed selection pressure on mutants can produce variants diverging by up to 5% from the founder strain in a few years, though this rate does not remain constant over time ( Lee ; Shankarappa ). Co-infection (simultaneous infection) or superinfection (sequential infection) can lead to more dramatic genetic changes through recombination, but these two types of dual infection are difficult to distinguish in the absence of a detailed infection history. Regardless of the timing, when diverse viral subtypes infect the same host cell, the reverse transcriptase readily switches between viral genomes, with estimates of three to nine recombination events per replication cycle ( Jetzt ). This process allows rapid emergence of new viral strains that may exhibit novel phenotypes (reviewed in Ramírez ). In vivo recombination has been documented between genotypes A and B and among genotype A and B subtypes ( Andrésdóttir, 2003; Fras ; Pisoni ; Ramírez ; Sider ). However, recombination in SRLVs has not previously been characterized at the whole-genome level. The virus-host interaction is a continuous co-evolutionary process. In sheep, genetic variation in the host transmembrane protein 154 ( TMEM154) gene associates with SRLV susceptibility ( Heaton ; Leymaster ; Leymaster ; Molaee ; Molaee ; Yaman ). Twelve TMEM154 haplotypes have been identified ( Heaton ), and ewes homozygous for haplotype 1, which encodes a lysine at position 35 (K35), had a decreased risk of SRLV infection compared to those with one or two copies of haplotype 2 or 3 (ancestral), both of which encode a glutamate at position 35 (E35). Two distinct SLRV A2 subgroups have been identified that infected sheep in association with their TMEM154 E35K genotypes and specific diplotypes ( Sider ). SLRV A2 subgroup 1 viruses were significantly more likely to infect sheep with either TMEM154 diplotypes 1,1 or 1,4. TMEM154 haplotype 4 contains a rare frameshift mutation (A4Δ53) and does not produce mRNA encoding functional amino acids downstream of amino acid position 4 of the precursor protein. Consequently, subgroup 1 associated with hemizygous or homozygous K35 genotypes. SLRV A2 subgroup 2 viruses were more likely to infect sheep with one or two copies of either haplotype 2 or 3, and that could also have one copy of haplotype 4 ( Sider ). Consequently, subgroup 2 viruses associated with hemizygous or homozygous E35 genotypes. While it has been proposed that TMEM154 E35K hemizygosity or homozygosity could be a factor in SRLV A2 subgroup associations, the biology remains obscured. SRLV A2 subgroups 1 and 2, and their associations with TMEM154 E35K are not well understood. The subgroups were previously defined by sequence variation in a partial region of the gag and the transmembrane region of the envelope gene ( env), which were not thought to have critical roles in potential SRLV TMEM154 interactions ( Sider ). Due to relatively short sequenced regions of SRLV A2 subgroups 1 and 2 genomes and because of extensive recombination detected within the sequences, the cutoff between groups 1 and 2 was not clear ( Sider ). In this study, we obtained full-length consensus SRLV genomes from a cross-section of sheep belonging to the flock which was part of the original TMEM154 association studies ( Heaton ; Heaton ; Leymaster ; Leymaster ; Sider ). These sheep had all been genotyped as containing haplotypes 1, 2, or 3. Ovine TMEM154 haplotypes 4-12 were not represented in this study. The goals of this study were to 1) obtain full-length consensus genomes for members of SLRV A2 subgroups 1 and 2; 2) identify subgroup 1 and 2 specific variants, while accounting for recombination, and use these variants to estimate levels of intra-host sequence variation; 3) search for subgroup-specific functional viral variants relative to host TMEM154 E35K genotypes.

Methods

Ethics statement

All animal procedures were reviewed and approved by the United States Department of Agriculture (USDA), Agricultural Research Service (ARS), U.S. Meat Animal Research Center (USMARC) Animal Care and Use Committee prior to their implementation (Experiment Number 96). The source flock’s history of disease surveillance is also relevant when requesting reference samples described in any report. Since first stocking sheep in 1966, USMAC has not had a known case of scrapie. Until 2002, surveillance consisted of monitoring sheep for possible signs of scrapie and submitting brain samples to the USDA Animal and Plant Health Inspection Service (APHIS) National Veterinary Services Laboratory in Ames, IA for testing. All tests have been negative. Since April 2002, USMARC has voluntarily participated in the APHIS Scrapie Flock Certification Program, is in compliance with the National Scrapie Eradication Program, and is certified as scrapie-free. With regards to other transmissible diseases, it is recognized that the USMARC flock of 2000 to 4000 breeding ewes is located in a bluetongue medium incidence area and is known to have some prevalence of contagious ecthyma (sore mouth), foot rot, paratuberculosis (Johne’s disease), ovine progressive pneumonia (visna-maedi), and pseudotuberculosis caseous lymphadenitis.

Study population

Lungs from 22 sheep at the US Meat Animal Research Center in Nebraska that were a part of the original study for association of A2 subgroups 1 and 2 with TMEM154 haplotypes ( Heaton ; Sider ) were used in this study ( Table 1). These sheep were all genotyped as containing haplotypes 1, 2, or 3. SRLV seropositive sheep were originally diagnosed with clinical ovine progressive pneumonia (OPP) by gross morphology and histopathology of both lung and mediastinal lymph node. In addition, colostrum from one seropositive ewe (201373037) showing no clinical signs of disease was used in this study ( Table 1).

Table 1.

Sheep and virus information for 23 SRLV A2 strains.

Animal ID / Viral Strain	GenBank Accession Number	Breed	TMEM154 Diplotype	Mean Genome Coverage ± Standard Deviation	Submitted Genome Length	Subgroup ^**
200303038	MT993897	MARCIII	1,2	489.0±426.1	9192	1
200303332	MT993898	MARCIII	1,1	52.3±20.3 ^*	9171	1
200303013	MT993896 ^***	MARCIII	1,1	920.0±373.0	9206	1
200103515	MT993899	MARC III	1,1	71.3±19.4 ^*	9166	1
200050064	MT993900	ROMANOV- DORSET-SU	1,2	482.8±185.7	9202	r/d
200323455	MT993901	MARCIII	1,1	1287.9±345.8	9194	1
200103342	MT993902	MARCIII	1,1	351.6±124.9	9207	1
199835918	MT993903	RAMBOUILLET	1,2	1344.8±440.8	9192	1
200023230	MT993904	MARCIII	1,1	1636.7±1223.3	9193	1
201373037	MT993905	KATHADIN X RAMBOUILLET	1,1	76.3±31.0 ^*	9166	1
200117502	MT993906	RAMBOUILLET	1,3	395.8±252.6	9185	r/d
200216049	MT993907	FINN	1,3	540.2±280.9	9200	r/d
200212120	MT993908	POLLED DORSET	1,2	927.8±611.7	9202	r/d
200312013	MT993909	POLLED DORSET	2,2	651.0±410.0	9191	r/d
200312088	MT993910	POLLED DORSET	1,2	279.3±195.5	9199	r/d
199906011	MT993911 ^***	TEXEL	2,2	692.7±417.4	9189	2a
200106932	MT993912	TEXEL	2,3	1230.6±1086.1	9191	2a
200106929	MT993913	TEXEL	2,2	580.3±521.8	9201	2a
200016283	MT993914	FINN	1,2	2014.5±710.3	9206	2b
200335185	MT993915	RAMBOUILLET	1,2	171.8±74.7	9203	2b
200177363	MT993916	DORSET X ROMANOV	1,3	1801.5±1564.7	9215	2b
199916128	MT993917	FINN	1,1	227.6±67.3 ^*	9172	2b
199916193	MT993918	FINN	1,1	2661.1±1329.2	9204	r/d

*No long reads: coverage includes only short reads, 10 to 26 nucleotides not fully resolved on 5' ends.

**The population to which a consensus genome was assigned in the subgroup 1 vs 2a vs 2b recombination model ( Figure 3A). r/d: recombinant/dual infections.

***The short read + long read consensuses reported here are identical (200303013) and slightly different (199906011) from the long-read consensuses reported by Workman (KY358787 and KY358788 respectively). See results.

*No long reads: coverage includes only short reads, 10 to 26 nucleotides not fully resolved on 5' ends. **The population to which a consensus genome was assigned in the subgroup 1 vs 2a vs 2b recombination model ( Figure 3A). r/d: recombinant/dual infections.

Figure 3.

Twenty-three SRLV subtype A2 genomes ‘painted’ with recipient genomic ‘chunks’ derived from subgroup-specific donor genomes.

The recombination models utilized were ( A) subgroup1 vs subgroup 2a vs subgroup 2b and ( B) subgroup 1 vs subgroup 2b. The black boxes highlight large subgroup 1 recombination blocks in subgroup 2a genomes.

Generation of full-length SRLV genomes

Lung samples were homogenized using a gentleMACS dissociator (Miltenyl Biotec; San Diego, CA) in minimal essential medium (Gibco, Thermo Fisher Scientific, Waltham, MA). Homogenates were then subjected to two freeze/thaw cycles to further ensure cell lysis. Homogenates were clarified by centrifugation followed by sequential filtration through 0.45 and 0.2-µm syringe filters to remove cellular debris. Clarified samples (250 uL) were treated with 20 U RNase One (Promega, Madison, WI) and 30 U Turbo DNase (Ambion, Austin, TX) in 1x DNase buffer (Ambion) at 37°C for 90 minutes to degrade unprotected host and environmental nucleic acids. To ensure continuous DNase activity, 10 U of DNase was added to the sample every 30 minutes during the 90-minute incubation. Remaining nucleic acids were isolated using Trizol LS (Invitrogen, Carlsbad, CA) according to the manufacturer’s instructions. A final DNase treatment was performed to remove final traces of DNA from the RNA preparation. Colostrum (approximately 4 mL) was manually collected from a SRLV seropositive ewe within the first 24 hours after giving birth. Colostrum was diluted 1:2 with cold phosphate-buffered saline and centrifuged at 800 x g for 15 minutes at 4°C. The cream layer was skimmed from the top and 250 µL of milk was treated with nucleases and RNA was extracted as described above. RNA libraries were prepared as previously described ( Workman ; Workman ; Workman ). Briefly, 100 ng of total RNA were used as input material for Illumina TruSeq RNA Sample Preparation Kit (Illumina, San Diego, CA). Libraries were prepared as specified by the manufacturer’s protocol without the initial step of poly-A selection on oligo-dT beads. Libraries were sequenced on an Illumina MiSeq instrument with a 600-cycle kit (v3) to generate 2 × 300 bp paired-end reads. Index adapters were removed from raw sequence reads using cutadapt 1.9.1 ( Martin, 2011) or BBDuk 35.82 (Brian Bushnell within Geneious 11.1.4 ( Kearse ) (Biomatters, Auckland, New Zealand) and trimmed reads were screened against the UniVec_Core database to remove contaminating vector sequences. Overlapping paired-end reads were merged using BBMerge 8.9 (Bushnell within Geneious). The remaining RNA was used to generate long-read sequencing libraries according to a modified RNA Iso-Seq with poly(A) tails added to the 3’ ends to allow cDNA synthesis of subgenomic fragments. The resulting cDNA was amplified, size fractionated and the largest fragments were used to make SMRTbell templates, which were sequenced on a Pacific Biosciences RSII instrument. SMRT Analysis was used to generate error corrected circular consensus sequences (CCS) from the raw reads and adapters and poly(A) tails were removed with BBDuk. Reads greater than 1,000 nucleotides in length were assembled de novo with the Geneious assembler. All reads were then mapped to the de novo assembly, and the resulting consensus sequence was reported. Four strains lacked good quality long-read sequence data ( Table 1) so the MiSeq short reads were assembled using template-assisted assembly in Geneious following Workman with accessions KY358787 and KY358788, respectively, used as subgroup 1 and 2 references. These two reference strains were included in this study. To calculate total genome coverage for each sample, merged and unmerged paired-end reads plus long-read CCS’s were jointly mapped to the consensus genome using the Geneious mapper with 40% maximum allowable mismatches, a word length of 24, an index word length of 14, 10% allowable gaps and a maximum gap size of 50. Up to 26 nucleotides on the 5’ ends did not fully resolve in genomes with only short-read sequencing available ( Table 1). Genomes were manually annotated based on alignments with full-length SRLV genomes available in GenBank.

Phylogenetic analyses of full-length genomes

To augment the newly reported genome sequences, full-length SLRV and SLRV-like genomes were also downloaded from GenBank using the following query of the Nucleotide database on October 11, 2019: txid11660[ORGN] OR txid2169971[ORGN] OR txid11653[ORGN] OR txid11663[ORGN] AND ("8000"[SLEN] : "12000"[SLEN]). 79 unique genomes were aligned using MUSCLE 3.8.425 ( Edgar, 2004 in Geneious 11.1.5) and a neighbor net phylogenetic network was built using default settings in Splitstree5 ( Huson & Bryant, 2006).

A2 Subgroup diagnostic SNP identification accounting for recombination

Population assignment of each genome generated in this study was performed in fineSTRUCTURE version 4 and its companion program, ChromoPainter version 1 ( Lawson ). Methods were similar to those previously reported for studying population structure and recombination in Helicobacter pylori phrophages ( Yahara ) and Herpes Simplex Virus ( Forni ). To generate the alignment used in fineSTRUCTURE, MUSCLE was used followed by manual refinement. Gaps in the alignment were converted to presence/absence characters as-is with simple gap patterns reduced to a single character regardless of size. The recombination rate map used in fineSTRUCTURE and ChromoPainter was estimated using the LDhat 2.2a interval program ( McVean ). For LDhat, pre-computed likelihoods were generated using a population-scaled per-site mutation rate inferred from the data (0.07587), a grid size of 101 and a maximum population-scaled whole-genome recombination rate of 100. The variable rate estimation was run for 10 million iterations with the first half discarded as burn-in and a block penalty of 20. To avoid alignment edge inaccuracies, the final 10% of SNPs from the 3’ terminus were placed preceding the 5’ terminus and the first 10% of SNPs from the 5’ terminus were placed following the 3’ terminus, essentially simulating circular genomes. The recombination rate point estimates at these simulated edges were removed. The outputs of LDHat were population-scaled recombination rates ( p), which relate to the biochemical recombination rate ( r) according to the formula p=2N where N is the effective population size. N is difficult to estimate. Estimates for HIV, a related lentivirus which also produces chronic infections, vary by several orders of magnitude ( Pennings ). Computational estimates of N in viruses also require time-series sampling data ( Rousseau ). Thus, N was not estimated for this study and the ChromoPainter recombination outputs were interpreted as being scaled by 2N . The fineSTRUCTURE analysis was run using the linkage model, the variable recombination rate map estimated as described above, and specifications for haploid genomes. ChromoPainter detected shared ancestry by reconstructing each genome as a probabilistic mosaic of ‘chunks’ derived via recombination from all other input genomes (termed ‘all vs all painting’) and fineSTRUCTURE assigned the genomes to populations based on the quantity and lengths of these shared genomic chunks. The following settings were changed from the default in fineSTRUCTURE and/or ChromoPainter to ensure convergence of estimated parameters: ChromoPainter chunks-per-region parameter k=38, ChromoPainter samples s=10, Markov Chain Monte Carlo (MCMC) iterations=1e6 fineSTRUCTURE tree finding maximization steps=1e6 and fineSTRUCTURE independent MCMC runs=10. ChromoPainter iterations i=100 were run for estimating the global switch rate parameter and the global mutation rate. ChromoPainter initializes at the mutation rate of Li & Stephens, 2003 and was estimated in our analyses using the -iM (global mutation rate) and -im (strain mutation rate) flags according to program developer recommendations. In fineSTRUCTURE, Strain 201373037 was excluded from the estimation of the global switch rate parameter since it trended toward 0 and stalled the program. This indicated very closely related samples in the dataset (G. Hellenthal, personal communication). Inconsistency in assignment of individuals to populations was resolved by assigning all ambiguously assigned individuals to the largest of the potential populations. To model subgroup-specific recombination, ChromoPainter version 2 was run in ‘donor-mode’ using the population assignments and global switch-rate parameter (0.168355) from the fineSTRUCTURE analysis. Population specific mutation rates for the donor specific recombination models were calculated as the mean of the constituent strain-specific mutation rates estimated by ChromoPainter during all vs all painting. Donor specific recombination models were run for 500 iterations to ensure convergence of copy probability. In contrast to all vs all painting, donor specific painting assigns genomic chunks to recipient genomes based on donor populations comprised of multiple genomes. To increase genome-wide assignment probability of subgroup specific SNPs, consensus genomes with evidence of large recombination blocks were iteratively removed from each subgroup pool of donor genomes if average copy probability was increasing. This was done to eliminate the most obvious recombinants from the pools while retaining enough donor genomes to optimize the recombination model. The output donor subgroup-assignments for each SNP were used to identify subgroup-specific SNPs while accounting for recombination. Recombination rates have been estimated for several RNA retroviruses, and most estimates are in the range of 10 -3 to 10 -5 ( Tromas ). Thus, we also ran the diagnostic SNP identification model using a range of fixed recombination rates (10 -3 to 10 -8 Morgans-per-base pair) to confirm that diagnostic SNP count did not change when varying input recombination rate by several orders of magnitude. The possibility of a historical recombination event giving rise to subgroup 2a from an introgression of subgroup 1 sequence on a subgroup 2b background was investigated using RDP5 ( Martin ) using default settings and a full exploratory recombination scan. A PHI test ( Bruen ) for the presence of recombination was also conducted in RDP5.

Dual infection inference using diagnostic SNPs

The subgroup-specific SNP content was quantified for each strain by extracting intra-host SNPs meeting default statistical restrictions (Maximum Variant P-value of 10 -6, Minimum Strand-Bias P-value of 10 -5 when exceeding 65% bias) relative to their final alignment in Geneious 11.1.4. The percentage of subgroup 1, subgroup 2, and ‘other’ SNPs at each diagnostic locus was calculated for each consensus genome. Subgroup partial dual infections were inferred as contiguous or nearly contiguous SNP blocks bearing both subgroup diagnostic alleles. For visualization relative to the consensus genome, these inferred partial dual infections were limited to genome blocks or scaffolds ≥50 nucleotides long where variants diagnostic for both subgroups co-occurred at a frequency of ≥5% and ≥2 reads. The 5% threshold was chosen as it was a conservative value accounting for sequencing error and mis-mapping when calling quasispecies SNPs. Multiple putative dual infection blocks were extended or scaffolded together when separated by <50 nucleotides and 1 diagnostic SNP. We characterize these as being caused by dual infection with unknown underlying viral haplotypes containing SNPs diagnostic to both subgroups at these regions as this is the most parsimonious explanation. However, quasispeciation in the absence of dual infection is an increasingly possible explanation as the numbers of adjacent subgroup diagnostic SNPs in the characterized regions decrease.

Functional analyses and annotation

Once subgroup-specific SNPs were identified in the context of recombination, intra-host amino acid variation (functional quasispecies) at the subgroup diagnostic loci were identified by extracting variants from the alignments in Geneious 11.1.5 using the same statistical criteria applied to nucleotides and occurring at a frequency of ≥5%. Highly variable domains in gag and env previously shown to be important were analyzed in the context of subgroup assignment and host TMEM154 diplotype. Additionally, SignalP-5.0 ( Nielsen, 2017) was used to predict the env signal peptide cleavage site.

Results

Genomes

Coverage of the 23 genomes ranged from 52- to 2661-fold ( Table 1). Complete or near-complete genomes ranged from 9164 to 9215 nucleotides in length. The combined short read and long read consensus genome of strain 199906011 was slightly different from the long read only consensus (KY358788). The sites with differences had a high frequency of the minor allele in the quasispecies in the long read only consensus and so switched the identity of the minor allele in the combined short read and long read consensus. The combined short read and long read consensus of strain 200303013 was identical to the long read only consensus (KY358787). A phylogenetic network using full-length SRLV genomes was dominated by groups A and B ( Figure 1). Subtype A2 strains from the United States of America occupied a distinct cluster within the network. Additional clusters on the tree were generally represented by a single geographic region, with Italy representing the highest number of unique clusters. Recombination was evident in several clusters of the network including subtype A2.

Figure 1.

Splitstree neighbornet phylogenetic network of 79 SRLV genomes from genotypes A, B, C and E.

Colors correspond to genomes assigned to subgroup specific pools of donor genomes in recombination analyses ( Table 1). Two major clusters containing genotypes A and B are denoted by curly brackets. Genomes outside these two major clusters are labeled with the format: Genotype-Country Abbreviation. Genbank accession numbers for all genomes are provided in Extended data, Supplemental Table 1 ( Dickey & Workman, 2020a).

fineSTRUCTURE was then used to classify the 23 genomes from this study into discrete populations that could be used in recombination analyses. The program first computed the pairwise similarities and then performed clustering to summarize relationships between genomes. Six distinct populations were identified by fineSTRUCTURE ( Figure 2), including two distinct subgroup 2 populations identified in Figure 2 as subgroup 2a and subgroup 2b. Subgroup 2a is intermediate between subgroup1 and subgroup2b on principal component 1 ( Figure 2).

Figure 2.

fineSTRUCTURE population assignments of 23 SRLV subtype A2 genomes along the first three principal components.

The first three principal components (PCs) account respectively for 40, 18 and 11 percent of the variance in the data. ( A) PC1 vs PC2, ( B) PC1 vs PC3 and ( C) PC2 vs PC3.

Splitstree neighbornet phylogenetic network of 79 SRLV genomes from genotypes A, B, C and E.

fineSTRUCTURE population assignments of 23 SRLV subtype A2 genomes along the first three principal components.

The first three principal components (PCs) account respectively for 40, 18 and 11 percent of the variance in the data. ( A) PC1 vs PC2, ( B) PC1 vs PC3 and ( C) PC2 vs PC3.

Recombination models

Population assignments from fineSTRUCTURE were used for recombination models in ChromoPainterV2. However, due to the large number of fineSTRUCTURE identified populations, several recombination models were run. Five of the six identified populations had >1 consensus genome. Thus, we first ran a model with five potential populations of donor genomes, on the grounds that singletons could not comprise a population for this kind of analysis. This model indicated that the three most frequent populations (subgroup 1, subgroup 2a and subgroup 2b) accounted for >88% of the SNPs and were the majority donors to 22 genomes ( Extended data, Supplemental Figure 1 ( Dickey & Workman, 2020b)). These results suggested that fineSTRUCTURE populations 4 and 5 were not true populations, but would be better described as complex recombinants of already defined subgroups. Thus, the model was then run with subgroups 1, 2a, and 2b as the only three donor populations ( Figure 3A). Due to the relative location of the three populations along principal component 1 ( Figure 2), the model was also run with subgroup 2b and subgroup 1 as the only two donor populations ( Figure 3B). This model showed possible complex recombination blocks between subgroup 1 and subgroup 2b in subgroup 2a genomes ( Figure 3B). The putative recombination event giving rise to this pattern was detected using RDP5 by all eight constituent methods (multiple comparison corrected probability 5.6X10 -67). Additional details of the event are provided in Extended data, Supplemental Table 2 ( Dickey & Workman, 2020a). Subgroup 2a genomes also showed intermediate average pairwise percent divergences between subgroups 1 and 2b ( Table 3). All models showed many predicted recombination blocks spanning the consensus genomes ( Table 2, Figure 3 and Extended data, Supplemental Figure 1 ( Dickey & Workman, 2020b)). Finally, a recombination model was run with only 2 donor populations, subgroup 1 and subgroup 2 (as 2a+2b) ( Figure 4). This was done to identify and extract subgroup specific SNPs while accounting for recombination. Viral strains 200050064 and 199916128 were removed from the subgroup 1 and 2 donor pools respectively based on having the highest proportion of inter-subgroup recombination ( Table 2, Figure 3) and this improved the subgroup 1 vs subgroup 2 recombination model. Further removal of genomes as potential donors did not improve the model. The average number of alternate subgroup recombination blocks (ChromoPainter’s chunkcount) was 2-fold higher in subgroup 2 genomes than subgroup 1 ( Table 2). ChromoPainter’s chunklengths parameter averaged 3-fold higher in subgroup 2 and predicted population specific mutation rate averaged 3-fold higher in subgroup 2 consensus genomes than subgroup 1 ( Table 2). The PHI-test for recombination in RDP5 was significant (p-val < 0.00001).

Table 3.

Average pair-wise genetic distance within (on-diagonal) and among (off-diagonal) SLRV A2 subgroups characterized in this study.

	Subgroup 1	Subgroup 2a	Subgroup 2b
Subgroup 1	1.7%	13.7%	15.6%
Subgroup 2a	13.7%	5.1%	10.4%
Subgroup 2b	15.6%	10.4%	5.2%

Table 2.

Select ChromoPainterV2 calculated parameters for the subgroup 1 vs 2 recombination model ( Figure 4).

Recipient Genome	fineSTRUCTURE Assigned Population	Donor Status	Subgroup 1 Chunk counts	Subgroup 2 Chunk counts	Subgroup 1 Chunk lengths	Subgroup 2 Chunk lengths	Donor specific mutation rate ^*
200303038	Subgroup 1	Subgroup 1	52.4404	0	30218.2	0	0.0161771
200303332	Subgroup 1	Subgroup 1	51.8010	0	30218.2	0	0.0177648
200303013	Subgroup 1	Subgroup 1	60.0599	0	30218.2	0	0.0357224
200103515	Subgroup 1	Subgroup 1	60.6890	0	30218.2	0	0.0236503
200050064	Subgroup 1	Recipient only	48.2632	6.4329	28003.9	2214.3
200323455	Subgroup 1	Subgroup 1	55.0174	0	30218.2	0	0.0196987
200103342	Subgroup 1	Subgroup 1	60.1737	0.0740	30207.5	10.7	0.0160127
199835918	Subgroup 1	Subgroup 1	61.1799	0	30218.2	0	0.0294231
200023230	Subgroup 1	Subgroup 1	62.7606	0	30218.2	0	0.0128684
201373037	Subgroup 1	Subgroup 1	46.9864	0	30218.2	0	0.0480409
200117502	Pop 5	Recipient only	25.5460	16.0997	15790.2	14428.0
200216049	Pop 5	Recipient only	42.0507	43.0426	15384.2	14834.0
200212120	Pop 4	Recipient only	43.3061	36.3817	18021.3	12196.9
200312013	Pop 4	Recipient only	39.8241	22.9727	21460.6	8757.6
200312088	Pop 6	Recipient only	27.2590	65.5737	8656.4	21561.9
199906011	Subgroup 2a	Subgroup 2	2.8228	47.0027	582.9	29635.3	0.0628346
200106932	Subgroup 2a	Subgroup 2	1.0739	38.8866	469.4	29748.8	0.0525776
200106929	Subgroup 2a	Subgroup 2	1.4457	44.8749	339.5	29878.8	0.0684202
200016283	Subgroup 2b	Subgroup 2	0	63.3829	0	30218.2	0.0781928
200335185	Subgroup 2b	Subgroup 2	0	66.1020	0	30218.2	0.0929781
200177363	Subgroup 2b	Subgroup 2	0	70.5193	0	30218.2	0.0898418
199916128	Subgroup 2b	Subgroup 2	0	70.2713	0	30218.2	0.0938431
199916193	Subgroup 2b	Recipient only	6.1441	53.8583	5726.2	24492.0	-

*ChromopainterV2 mutationprobs parameter.

Figure 4.

Twenty-three SRLV subtype A2 genomes ‘painted’ with recipient genomic ‘chunks’ derived from subgroup-specific donor genomes.

The recombination model utilized was subgroup1 vs subgroup 2.

Twenty-three SRLV subtype A2 genomes ‘painted’ with recipient genomic ‘chunks’ derived from subgroup-specific donor genomes.

Select ChromoPainterV2 calculated parameters for the subgroup 1 vs 2 recombination model ( Figure 4).

ChromoPainter’s ‘chunkcounts’ parameter is defined as the number of genomic chunks from a population of donor genomes, assigned to the recipient genome via recombination. ‘Chunklengths’ are the combined lengths (in centimorgans X 2 N ) of these chunks. Donor specific mutation rate is the amount of mismatching across the recipient chunklengths divided by the number of loci. Donor status is the population of donor genomes to which a consensus genome was assigned in the final subgroup 1 vs subgroup 2 recombination model ( Figure 4). To improve recombination models, genomes with high inter-subgroup recombination were iteratively removed from the populations if model quality (as judged by average copy probability) was increasing. *ChromopainterV2 mutationprobs parameter. The recombination model utilized was subgroup1 vs subgroup 2. Subgroup diagnostic SNP inference accounting for recombination between subgroup 1 and subgroup 2 (as fineSTRUCTURE population 2a+2b) resulted in 413 diagnostic SNPs ( Extended data, Supplemental Table 3 ( Dickey & Workman, 2020a)). The frequency of alternate subgroup diagnostic alleles was 3-fold higher in subgroup 2 consensus genomes than subgroup 1 ( Table 4).

Table 4.

Subgroup specific intra-host genetic variation (quasispecies) for 23 SRLV A2 consensus genomes.

Donor status is the population of donor genomes to which a consensus genome was assigned in the final subgroup 1 vs subgroup 2 recombination model ( Figure 4).

Strain	Donor Status	Average % Subgroup 1 Across 413 Diagnostic Loci	Average % Subgroup 2 Across 413 Diagnostic Loci	Average % 'Other' Across 413 Diagnostic Loci	Genomic Regions Indicative of Dual Infection	Total Length of Genomic Regions Indicative of Dual Infection
200303038	Subgroup 1	99.79%	0.16%	0.05%	-	-
200303332	Subgroup 1	99.89%	0.02%	0.08%	-	-
200303013	Subgroup 1	99.55%	0.17%	0.28%	-	-
200103515	Subgroup 1	97.18%	1.75%	1.08%	-	-
200050064	Recipient only	91.70%	7.74%	0.56%	6	1427
200323455	Subgroup 1	99.25%	0.43%	0.32%	-	-
200103342	Subgroup 1	98.93%	0.73%	0.34%	-	-
199835918	Subgroup 1	98.89%	0.62%	0.49%	-	-
200023230	Subgroup 1	98.99%	0.50%	0.51%	-	-
201373037	Subgroup 1	98.78%	0.71%	0.51%	-	-
200117502	Recipient only	33.22%	66.24%	0.70%	9	3394
200216049	Recipient only	52.04%	47.62%	0.34%	13	3704
200212120	Recipient only	72.61%	25.48%	1.91%	11	4179
200312013	Recipient only	77.65%	21.69%	0.66%	8	1798
200312088	Recipient only	40.74%	58.20%	1.07%	14	2297
199906011	Subgroup 2	4.25%	95.12%	0.63%	1	158
200106932	Subgroup 2	1.00%	98.18%	0.82%	-	-
200106929	Subgroup 2	0.92%	98.49%	0.58%	-	-
200016283	Subgroup 2	0.93%	98.67%	0.40%	-	-
200335185	Subgroup 2	0.35%	99.27%	0.37%	-	-
200177363	Subgroup 2	1.62%	97.60%	0.79%	-	-
199916128	Subgroup 2	1.92%	96.40%	1.68%	2	241
199916193	Recipient only	19.47%	79.90%	0.62%	9	1896

Subgroup specific intra-host genetic variation (quasispecies) for 23 SRLV A2 consensus genomes.

Donor status is the population of donor genomes to which a consensus genome was assigned in the final subgroup 1 vs subgroup 2 recombination model ( Figure 4). Intra-host variation at subgroup-specific SNPs was analyzed to infer the presence of dual infections with subgroup 1 and subgroup 2 genomes. The parameters specified for predicting subgroup dual infections resulted in 73 genomic regions indicative of dual infection across nine consensus genomes (range: 1–14, average: 8.2), averaging 261.6 nucleotides in length (range: 55–1482) and comprising 2%–45% of the genome ( Extended data, Supplemental Table 4 ( Dickey & Workman, 2020a), Figure 5).

Figure 5.

Twenty-three SRLV subtype A2 genomic regions indicative of dual infection.

The background is the subgroup 1 vs subgroup 2 recombination model ( Figure 4). Genomic regions indicative of dual infection contained both subgroup diagnostic alleles at a frequency ≥5% for at least 2 consecutive diagnostic SNPs and 50 nucleotides.

Twenty-three SRLV subtype A2 genomic regions indicative of dual infection.

Functional variation

Of the 413 subgroup diagnostic SNPs identified, 106 were non-synonymous ( Extended data, Supplemental Tables 3 and 5 ( Dickey & Workman, 2020a)). A2 subgroup 1 and 2 specific variants were identified in all viral genes with frequencies ranging from 2.2 to 4.3%. Sequence analysis of the immunodominant epitope in the gag gene revealed two adjacent SNPs that resulted in a single amino acid change distinguishing subgroups 1 and 2 ( Extended data, Supplemental Table 5 ( Dickey & Workman, 2020a)). Analysis of the env gene variable regions V1-V5 ( Valas ) found no subgroup specific SNP in variable regions V1 and V2, five subgroup specific variants each in V3 and V4, and one in V5 ( Extended data, Supplemental Table 5 ( Dickey & Workman, 2020a)). Six subgroup defining variants were identified in the predicted env signal peptide.

Discussion

This study provides full-length or near-full-length consensus genomes from 21 new SRLV subtype A2 strains used in determining the viral subgroup association with TMEM154 E35K genotypes ( Heaton ; Sider ) in addition to the two subgroup representative strains from Workman . These genomes were analyzed for recombination and population structure using a chromosome ‘painting’ model. Several genomes showed complex recombination patterns. Furthermore, this model was used to identify 413 subgroup-specific SNPs while accounting for recombination. This information was used to quantify intra-host genetic diversity at diagnostic SNPs and estimated nine animals were dually infected with viral recombinants such that they have diagnostic SNPs from both virus subgroups for portions of their genome. Lastly, we analyzed important functional domains in the virus genome in the context of virus subgroup and host TMEM154 diplotypes focusing only on haplotypes 1, 2 and 3. The SRLV phylogenetic network contained subtype A2 as a distinct cluster ( Figure 1). Several genomes in this cluster are connected by many nodes indicating inter-subgroup recombination. The fineSTRUCTURE analysis identified two distinct subgroup 2 ‘populations’ of consensus genomes. These have been provisionally designated subgroup 2a and subgroup 2b. Subgroup 2a is intermediate between subgroup 1 and subgroup 2b both in terms of position along principal component 1 ( Figure 2) and in terms of genetic distance ( Table 3). The subgroup 2 genome reported by Workman (viral strain 199906011, GenBank KY358788.1) belongs to subgroup 2a ( Table 1). This is a case where it is difficult to distinguish, with certainty, the recombinant from the second viral donor, however there is some evidence that subgroup 2b may be a ‘purer’ representative of subgroup 2. Even when subgroup 2a genomes were included among the subgroup 2 donors, the only consensus genomes that resolved unambiguously as subgroup 2 were the four subgroup 2b strains ( Table 2). In contrast, subgroup 1 was more clearly delimited by the recombination model. Of the nine subgroup 1 donor genomes, only 200103342 did not resolve unambiguously as subgroup 1 ( Table 2). Subgroup 2 was more genetically diverse than subgroup 1 based on its higher mutation rates and increased recombination ( Figure 4, Table 2). Subgroup 2 also had higher intra-host genetic diversity based on the dual infection analysis ( Table 4). The consensus genomes with the highest intra-host diversity ( Figure 5, Table 4 and Extended data, Supplemental Table 4 ( Dickey & Workman, 2020a)) and highest recombination block count ( Figure 4, Table 2) did not conform to a good concept of ‘population’ ( Extended data, Supplemental Figure 1 ( Dickey & Workman, 2020b)) despite being identified as populations by fineSTRUCTURE analysis ( Figure 2). But these genomes could be parsimoniously modeled as complex recombinants of subgroups 1 and 2 ( Figure 4). More than one representative genome is required to properly distinguish subgroups from recombinant forms of subgroups 1 and 2. The subgroup 1 vs subgroup 2 recombination model used to identify diagnostic SNPs was informed by both recombination rate variation across the genome as well as differing mutation rates between subgroups. Models run with a constant recombination rate across the genome identified a slightly higher number of diagnostic SNPs (421 vs 413, see Methods). The dual infection analysis identified multiple genomic regions featuring modest frequencies of both subgroup diagnostic alleles ( Extended data, Supplementary Table 2 ( Dickey & Workman, 2020a)). These were identified in the most recombinant consensus genomes ( Figure 4). Because the dual infection regions did not span the entire genome, the underlying haplotypes were most likely recombinant as opposed to ‘pure’ subgroup sequences. The relatively small number of strains characterized in this study and a paucity of geographic variability biased our results and limited our ability to make recombination-based inference. While the subset of samples from the original TMEM154/A2 association studies chosen was a good starting place for modeling recombination, the addition of larger numbers of geographically diverse SRLV A2 genomes should improve the recombination model(s) substantially due to a larger pool of potential recombinant and parental genomes ( Yahara ). There are presently 79 unique full length SRLV genomes available with more than half of these published since 2019 ( Colitti ; present work) so the time has probably come to recharacterize SRLV diversity at the whole-genome level, expanding the current classification beyond partial gag/ pol sequence. A revised classification system will better facilitate outbreak tracing and identification of recombinants circulating beyond their local flocks. Such circulating recombinant forms (CRFs) have been extensively characterized for HIV ( Carr ; Leitner ) providing a possible model and framework for the SRLV research community to adapt. However, the current CRF framework for HIV utilizes consensus genomes so an accounting of the underlying haplotypes (quasispecies) contributing to these consensuses would benefit the genomic characterization of both lentiviruses. Fras have suggested that dual infection of small ruminant lentiviruses may be common, understudied, and underreported. Our results and those of Sider confirm that dual infection is common though none of our samples showed evidence of having been dually infected by pure subgroup 1 and 2 representatives. Hopefully, declining sequencing costs and increased availability of whole genome sequencing will foster greater reporting of this phenomenon. Our results also conform to those of Sider including the two subgroups identified and the existence of recombination. These results extend those of Sider from partial gag/ env to the complete genome while accounting for recombination. Recombination is also clearly delimited by the ChromoPainter models. The 1+2b=2a recombination block spanning the middle portion of env ( Figure 3B, individuals p and q) was also predicted by RDP5 ( Martin ) (Extended data, Supplemental Table 2 ( Dickey & Workman, 2020a)). While our results extend the existence of two subgroups across the entire SRLV genome, subgroup 2 has additional population sub-structuring ( Figure 2, Figure 3B). Subgroup 2a may represent a somewhat stable locally circulating recombinant of subgroup 1 and 2b ( Figure 3). The genomes identified as 2a were found exclusively in TMEM154 2,2 and 2,3 diplotype animals ( Table 1). Additionally, most strains with genomic regions indicative of dual infection were from TMEM154 susceptible 1,2 and 1,3 heterozygotes, i.e. animals with both an E and K at position 35. This suggests that animals that are E35K heterozygous due to TMEM 154 1,2 and 1,3 diplotypes may facilitate recombination between subgroups 1 and 2. Interestingly, two subtype specific functional variants were found in a region of the env gene variable region 4 (V4) which was recently identified to contain ’signature patterns’ associated with different clinical status in sheep and goats ( Mendez ). This region of V4 also contains targets of neutralizing antibodies and is predicted to play a role in virus entry ( Skraban ). Multiple amino acid changes were also observed in the N-terminus of env. None of the amino acids were predicted to change the env signal peptide cleavage site; however, it would be interesting to know if the five subgroup-specific amino acids affect post-translational modifications such as cleavage timing, folding, or glycosylation, phenomena documented to affect HIV fitness ( Asmal ; Snapp ; Upadhyay ). As more genomes are sequenced, and we learn more about the function of TMEM154 in the context of the virus lifecycle, it will be interesting to see which, if any, of these viral sequences are biologically responsible for TMEM154 associations.

Data availability

Underlying data

NCBI sequence accessions are provided in Table 1.

Extended data

Figshare: Supplemental Tables. https://doi.org/10.6084/m9.figshare.14991990 ( Dickey & Workman, 2020a). This project contains the following extended data: Supplemental Table 1. Genbank accession numbers and viral strains for 79 complete or near-complete SRLV genomes. The genomes are listed clockwise in the order they appear in Figure 1 of Dickey et al., 2020 starting at KY358788/199906011. Colors according to subgroup 1 vs 2a vs 2b recombination model (Figure 3A) can be seen in the downloaded file. Supplemental Table 2. Details on Recombination Detection Program 5 (RDP5) recombination event 3. Supplemental Table 3. Four hundred and thirteen diagnostic SNPs distinguishing SRLV A2 subgroups. Supplemental Table 4: Seventy-three genomic regions indicative of dual subgroup infection among 9 SRLV A2 genomes. Supplemental Table 5: Within-host amino acid variability (functional viral quasispecies) at 106 subgroup diagnostic loci. Colors according to subgroup 1 vs subgroup 2 recombination model ( Figure 4) can be seen in the downloaded file. Figshare: Supplemental Figure 1. https://doi.org/10.6084/m9.figshare.14991975 ( Dickey & Workman, 2020b). This file contains 23 SRLV subtype A2 genomes ‘painted’ with recipient genomic ‘chunks’ derived from populations of donor genomes. The five donor genome populations in the recombination model were determined using fineSTRUCTURE. Extended data are available under the terms of the Creative Commons Attribution 4.0 International license (CC-BY 4.0). In this manuscript, the authors assembled the full-length viral genome sequence of 23 subtype A2 SRLV strains using long-read sequencing technology, and try to identify recombination patterns. The authors compare the 79 unique full-length SRLV genomes. I am reviewing the version 2 of this manuscript that has improved the previous version. The article is well-written, the results of the study are well presented, well-supported, and certainly add to the current knowledge of the topic. Authors could include the prevalence of the different SRLV subtypes in the breed and the geographical area, if it is known. The authors state that ewes heterozygous for E35K (TMEM154 1,2 and 1,3 diplotypes) may facilitate recombination between subgroups 1 and 2, but they only have 23 full length viral genomes, and this should be confirmed using a bigger experimental design. In my opinion, the manuscript is acceptable for indexing. Is the work clearly and accurately presented and does it cite the current literature? Yes If applicable, is the statistical analysis and its interpretation appropriate? Yes Are all the source data underlying the results available to ensure full reproducibility? Yes Is the study design appropriate and is the work technically sound? Yes Are the conclusions drawn adequately supported by the results? Yes Are sufficient details of methods and analysis provided to allow replication by others? Yes Reviewer Expertise: Research focused in the application of genomic tools in animal science, mainly in ruminants, including animal breeding, nutrigenomics and nutrigenetics, and also in animal health. I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. The authors have addressed my concerns. Manuscript is now acceptable for indexing in this revised version. Is the work clearly and accurately presented and does it cite the current literature? Partly If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required. Are all the source data underlying the results available to ensure full reproducibility? Yes Is the study design appropriate and is the work technically sound? Yes Are the conclusions drawn adequately supported by the results? Partly Are sufficient details of methods and analysis provided to allow replication by others? Yes Reviewer Expertise: Genetic and anitigenic characterization of Small Ruminant Lentiviruses (SRLV) in sheep, goats and wildlife ruminants (molecular analysis, phylogeny, cross species infection, genetic recombination) I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard. In this manuscript the authors assembled full length viral genomes and then analyzed specific segments in order to identify recombination patterns by using chromosome painting model. General comments: It’s not clear in the introduction whether this chromosome painting model is used for recombination detection in any other virus? Perhaps any other RNA virus at all? I would highly appreciate if the authors could expand their introduction about this. I would suggest the authors to include RDP software in methods and at least 1 figure in the results to see where the recombination events take place within the genome and compare these results with the chromosome painting one. Currently, these results are as not shown so it’s not very clear whether these results are complementary. My major concern with the recombination events that the authors describe in the manuscript are related with the differentiation between mutation and recombination. They mention in the introduction that RNA viruses have more mutations due to the low fidelity of the reverse transcriptase how did they differentiate these events from true recombination events? Specific comments: Abstract - Methods – The goal. Could the author explain the method rather than the goal? In the first paragraph would be good to read 1 or 2 lines about the importance or impact of this pathogen over sheep population not just the disease that cause the virus. Perhaps include incidence or any statistics about the disease that cause the virus. This would help to put this research in context. As the genome contains more than one non-coding long terminal repeat regions. I wonder how did the authors manage to sequence that specific part as I know they could be problematic to determine sequence identity. Are these regions inverted? Why the authors did not amplify the virus previous to sequencing? Is there any cell culture that could be used for this purposes? Could this help with the four strains that had low quality long read sequence? RDP software is mentioned in the discussion but not in the methods. The split tree software has a tool to measure recombination called PHI test. Is there a reason why the authors did not include this in the manuscript? What does PC stands for in Figure 2? Is the work clearly and accurately presented and does it cite the current literature? Yes If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required. Are all the source data underlying the results available to ensure full reproducibility? Partly Is the study design appropriate and is the work technically sound? Yes Are the conclusions drawn adequately supported by the results? Yes Are sufficient details of methods and analysis provided to allow replication by others? Partly Reviewer Expertise: Virology; recombination; herpesvirus I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above. General comments: Li N, Stephens M (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165: 2213–2233 It’s not clear in the introduction whether this chromosome painting model is used for recombination detection in any other virus? Perhaps any other RNA virus at all? I would highly appreciate if the authors could expand their introduction about this. Response: The recombination modeling program utilized in these analyses has been previously used to analyze recombination in Helicobacter pylori prophages (Yahara et al., 2019) and Herpes Simplex Virus (Forni et al., 2020). This information has been added to the methods. The Yahara et al. reference may be most similar in initial approach. Ours may be the first use of ChromoPainter version 2 ‘donor mode’ to extract subgroup-diagnostic variants in viruses while accounting for recombination. I would suggest the authors to include RDP software in methods and at least 1 figure in the results to see where the recombination events take place within the genome and compare these results with the chromosome painting one. Currently, these results are as not shown so it’s not very clear whether these results are complementary. Response: We agree with the reviewer that our use of RDP belongs in the methods. We have attempted to do so in a way that does not distract from the primary focus of the manuscript, which was to identify subgroup-diagnostic variants, rather than recombination events. RDP was only used in this work to confirm a single putative event: the possible introgression of a large chunk of subgroup 1 sequence into a subgroup 2b background to create subgroup 2a (black box in Figure 3b). This event (approximate size, genomic position, and ChromoPainter assigned subgroup of RDP identified putative parental strains) was confirmed by RDP. This information has been added to the results and the details of the RDP event are provided in Extended data, Supplemental Table 2(Dickey & Workman, 2020a). My major concern with the recombination events that the authors describe in the manuscript are related with the differentiation between mutation and recombination. They mention in the introduction that RNA viruses have more mutations due to the low fidelity of the reverse transcriptase how did they differentiate these events from true recombination events? Response: All recombination analysis programs claim to account for mutation as this is a fundamental issue. ChromoPainter initializes at the mutation rate of Li and Stephens 2003, but we estimated it in our analyses by using the -iM flag according to program developer recommendations. Interestingly, ChromoPainter’s mutation rate estimates are 3-fold higher for subgroup 2 genomes compared to subgroup 1 (Table 2). Population specific mutation rate estimates were obtained by averaging the estimated rates from the fineSTRUCTURE run among members of the assigned population and incorporated into our increasingly refined recombination models using the -m flag. We have added this information and the Li and Stephens reference to the manuscript. Our subgroup-specific-colored blocks in figures 3-5 are best interpreted as regions of sequence assigned by the recombination model to those subgroups, not as events-per-se. We believe regions alternating back-and-forth between subgroups over short genomic spans (e.g. strains labeled k-r in figures 3-5), are sometimes best interpreted as having arisen from multiple recombination events over time. We have attempted to minimize the use of the term ‘event’ in the manuscript in service of this paradigm. Specific comments: Abstract - Methods – The goal. Could the author explain the method rather than the goal? Response: The methods section now reads: Methods: Sequence variation within and among full-length assemblies of SRLV subtype A2 subgroups 1 and 2 was analyzed to identify genome-scale recombination patterns. In the first paragraph would be good to read 1 or 2 lines about the importance or impact of this pathogen over sheep population not just the disease that cause the virus. Perhaps include incidence or any statistics about the disease that cause the virus. This would help to put this research in context. Response: This information has been added to the first paragraph of the introduction. As the genome contains more than one non-coding long terminal repeat regions. I wonder how did the authors manage to sequence that specific part as I know they could be problematic to determine sequence identity. Are these regions inverted? Response: We used PacBio long-read technology to help resolve the ends of the viral genome. We report some genomes as ‘near-complete’ because we did not obtain the complete LTR sequence for all genomes. Why did the authors not amplify the virus previous to sequencing? Is there any cell culture that could be used for this purposes? Could this help with the four strains that had low quality long read sequence? Response: It is true that the virus can be isolated in culture prior to sequencing; however, there were no commercially available cell lines for the isolation of SRLV at the time we initiated this project. While we could have developed appropriate cell lines, we chose not to isolate the virus first, as this can lead to the selection of a subpopulation of the infecting virus quasispecies. For these reasons, we sequenced the virus directly from the infected tissues. RDP software is mentioned in the discussion but not in the methods. Response: Please see the response to General comment 2. The split tree software has a tool to measure recombination called PHI test. Is there a reason why the authors did not include this in the manuscript? Response: SplitsTree5 does not have this analysis option, though we also remember it from SplitsTree4. We ran a PHI-test in RDP5 and now report it in the results. What does PC stand for in Figure 2? Response: Principal Component (PC), is now incorporated into the legend of Figure 2. In this article, the authors focused mainly on the analysis of recombination of 23 SRLV strains belonging to subtype A2. In this paper a lot of different analyses were performed, but the results and figures are described very poorly. This makes the article very hard to read. For example on Fig. 1 I see only 15 isolates not 23. Lack is information on accesion numbers or isolate names of sequences from GenBank present on the tree. Please indicate to which genotypes they belong. In Figure 2, there are three images that should be labeled as A, B and C and their description should be more detailed (in figure and in results). In the third picture I see three populations not six. I don't understand why all 23 sequences were used in the recombination analysis. The authors write that only subgroubs 2a, 2b and 1 was used in the reombination models but some genomes not belonged to these subgroups (Fine structure Pop4, 5 and 6, Figure2). In the discussion the results of the study are repeated. There are no analyses regarding the occurrence of TMEM154E35K genotypes and susceptibility/resistance of sheep infected with and SRLV A2 subgroups 1 and 2. Did the research confirm previous studies described by Sider et al., 2013? Is the work clearly and accurately presented and does it cite the current literature? Partly If applicable, is the statistical analysis and its interpretation appropriate? I cannot comment. A qualified statistician is required. Are all the source data underlying the results available to ensure full reproducibility? Yes Is the study design appropriate and is the work technically sound? Yes Are the conclusions drawn adequately supported by the results? Partly Are sufficient details of methods and analysis provided to allow replication by others? Yes Reviewer Expertise: Genetic and anitigenic characterization of Small Ruminant Lentiviruses (SRLV) in sheep, goats and wildlife ruminants (molecular analysis, phylogeny, cross species infection, genetic recombination) I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above. In this article, the authors focused mainly on the analysis of recombination of 23 SRLV strains belonging to subtype A2. In this paper a lot of different analyses were performed, but the results and figures are described very poorly. This makes the article very hard to read. For example on Fig. 1 I see only 15 isolates not 23. Response: Figure 1 is information dense, particularly the very similar subgroup 1 genomes denoted by the lavender-colored oval. However, we agree it is beneficial to list all genomes, so they are now listed in clockwise order in the new Supplemental Table 1. Lacking is information on accession numbers or isolate names of sequences from GenBank present on the tree. Response: An added benefit to the new supplemental table (see previous comment) is space to put the GenBank accession numbers of all strains in Figure 1. Genbank accession number, isolate name, county of origin, and virus genotype are now listed in Supplemental Table 1. Please indicate to which genotypes they belong. Response: This information is made clearer in the figure by curly brackets, additional annotations, and clarification in the legend. In Figure 2, there are three images that should be labeled as A, B and C and their description should be more detailed (in figure and in results). In the third picture I see three populations not six. Response: The figures sections have been relabeled A, B and C and additional details are provided in the results. We are not certain but think the reviewer may be referring to 3 clusters of strains, not six, since principal components 2 and 3 (PC2 and PC3) do not strongly discriminate subgroups 1 and 2b. We do not find this problematic since the first principal component strongly discriminates these two populations. I don't understand why all 23 sequences were used in the recombination analysis. The authors write that only subgroups 2a, 2b and 1 was used in the recombination models but some genomes not belonged to these subgroups (Fine structure Pop4, 5 and 6, Figure2). Response: Our initial recombination donor model had five donor populations and all 23 genomes as possible recipients (Supplemental Figure 1). fineSTRUCTURE population 6 was excluded from this analysis on the grounds that singletons could not comprise a population for this type of analysis. The result suggests fineSTRUCTURE populations 4 and 5 are probably not populations either, justifying our continued refinement of recombination models to those with fewer donor populations, which better characterize the subgroups for which there is sufficient evidence, and recombination patterns among them. Our explanation of this is that fineSTRUCTURE assigned ‘population’ status to groups of genomes that would be better described as complex recombinants of already defined subgroups, rather than true populations. Over-splitting of these types of datasets into too many ‘populations’ is a known problem. See: Falush D, van Dorp L, Lawson D, 2016 A tutorial on how (not) to over-interpret STRUCTURE/ADMIXTURE bar plots. bioRxiv 066431; 10.1101/066431. While, fewer potential donor genomes were used in the refined recombination models (Figures 3 and 4), all genomes were used as recipients, allowing us to characterize recombination. In the discussion the results of the study are repeated. Response: We believed this was beneficial to organizing points of discussion. There are no analyses regarding the occurrence of TMEM154E35K genotypes and susceptibility/resistance of sheep infected with and SRLV A2 subgroups 1 and 2. Did the research confirm previous studies described by Sider et al., 2013? Response: We generated complete genome sequences from a subset of genomes characterized in the Sider et al., 2013 paper. The goal was not to confirm or dispute the association, but rather, to generate complete genomes and develop a nucleotide polymorphism typing system for the two major subgroups while accounting for recombination. More complete SRLV sequences are required to identify which viral elements correlate with host adaptation.

44 in total

1. Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data.

Authors: Na Li; Matthew Stephens
Journal: Genetics Date: 2003-12 Impact factor: 4.562

2. A simple and robust statistical test for detecting the presence of recombination.

Authors: Trevor C Bruen; Hervé Philippe; David Bryant
Journal: Genetics Date: 2006-02-19 Impact factor: 4.562

3. Incidence of infection in 39-month-old ewes with TMEM154 diplotypes "1 1," "1 3," and "3 3" after natural exposure to ovine progressive pneumonia virus.

Authors: K A Leymaster; C G Chitko-McKown; M P Heaton
Journal: J Anim Sci Date: 2015-01 Impact factor: 3.159

4. High rate of recombination throughout the human immunodeficiency virus type 1 genome.

Authors: A E Jetzt; H Yu; G J Klarmann; Y Ron; B D Preston; J P Dougherty
Journal: J Virol Date: 2000-02 Impact factor: 5.103

5. Recombinant small ruminant lentivirus subtype B1 in goats and sheep of imported breeds in Mexico.

Authors: H Ramírez; I Glaria; X de Andrés; H A Martínez; M M Hernández; R Reina; E Iráizoz; H Crespo; E Berriatua; J Vázquez; B Amorena; D de Andrés
Journal: Vet J Date: 2010-10-06 Impact factor: 2.688

6. Evidence for recombination in the envelope gene of maedi-visna virus.

Authors: Valgerdur Andrésdóttir
Journal: Virus Genes Date: 2003-08 Impact factor: 2.332

7. Reduced lentivirus susceptibility in sheep with TMEM154 mutations.

Authors: Michael P Heaton; Michael L Clawson; Carol G Chitko-Mckown; Kreg A Leymaster; Timothy P L Smith; Gregory P Harhay; Stephen N White; Lynn M Herrmann-Hoesing; Michelle R Mousel; Gregory S Lewis; Theodore S Kalbfleisch; James E Keen; William W Laegreid
Journal: PLoS Genet Date: 2012-01-26 Impact factor: 5.917

8. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data.

Authors: Matthew Kearse; Richard Moir; Amy Wilson; Steven Stones-Havas; Matthew Cheung; Shane Sturrock; Simon Buxton; Alex Cooper; Sidney Markowitz; Chris Duran; Tobias Thierer; Bruce Ashton; Peter Meintjes; Alexei Drummond
Journal: Bioinformatics Date: 2012-04-27 Impact factor: 6.937

9. A new approach for Small Ruminant Lentivirus full genome characterization revealed the circulation of divergent strains.

Authors: Barbara Colitti; Elisabetta Coradduzza; Giantonella Puggioni; Maria Teresa Capucchio; Ramsés Reina; Luigi Bertolotti; Sergio Rosati
Journal: PLoS One Date: 2019-02-21 Impact factor: 3.240

10. Loss and recovery of genetic diversity in adapting populations of HIV.

Authors: Pleuni S Pennings; Sergey Kryazhimskiy; John Wakeley
Journal: PLoS Genet Date: 2014-01-23 Impact factor: 5.917