Robert P Ruggiero, Yann Bourgeois, Stéphane Boissinot.
Abstract
Vertebrate genomes differ considerably in size and structure. Among the features that show the most variation is the abundance of Long Interspersed Nuclear Elements (LINEs). Mammalian genomes contain hundreds of thousands of LINEs that belong to a single clade, L1, and in most species a single family is active at a time. In contrast, non-mammalian vertebrates (fish, amphibians and reptiles) contain multiple active families belonging to several clades, but each of them is represented by a small number of recently inserted copies. It is unclear why vertebrate genomes harbor such drastic differences in LINE composition. To address this issue, we conducted whole genome resequencing to investigate the population genomics of LINEs across 13 genomes of the lizard Anolis carolinensis sampled from two geographically and genetically distinct populations in the Eastern Florida and Gulf Atlantic regions of the United States. We used the Mobile Element Locator Tool to identify and genotype polymorphic insertions from five major clades of LINEs (CR1, L1, L2, RTE and R4) and the 41 subfamilies that constitute them. Across these groups we found large variation in the frequency of polymorphic insertions and in the observed length distributions of these insertions, suggesting that these groups vary in their activity and in how frequently they successfully generate full-length, potentially active copies. Though we found an abundance of polymorphic insertions (over 45,000), most of these were observed at low frequencies and typically appeared as singletons. Site frequency spectra for most LINEs showed a significant shift toward low frequency alleles compared to the spectra observed for total genomic single nucleotide polymorphisms. Using Tajima's D, FST and the mean number of pairwise differences in LINE insertion polymorphisms, we found evidence that negative selection is acting on LINE families in a length-dependent manner, its effects being stronger in the larger Eastern Florida population.
Our results suggest that a large effective population size and negative selection limit the expansion of polymorphic LINE insertions across these populations and that the probability of LINE polymorphisms reaching fixation is extremely low.
Keywords: Anolis carolinensis; LINE; genome resequencing; retrotransposon; selection; transposable element
Year: 2017 PMID: 28450881 PMCID: PMC5389967 DOI: 10.3389/fgene.2017.00044
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
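The abstract relies on three summary statistics of insertion polymorphism: the site frequency spectrum, the mean number of pairwise differences, and Tajima's D. The following is a minimal sketch of how they can be computed from a presence/absence matrix, using the standard Tajima (1989) constants. The matrix layout (rows are loci, columns are sampled chromosomes), the function names, and the toy data are assumptions made for illustration and are not the pipeline used in the study.

```python
from itertools import combinations
from math import sqrt


def site_frequency_spectrum(matrix):
    """Count loci by the number of sampled chromosomes carrying the insertion."""
    n = len(matrix[0])
    sfs = [0] * (n + 1)
    for locus in matrix:
        sfs[sum(locus)] += 1
    return sfs


def mean_pairwise_differences(matrix):
    """Average number of loci at which two sampled chromosomes differ."""
    n = len(matrix[0])
    pairs = list(combinations(range(n), 2))
    total = sum(locus[i] != locus[j] for locus in matrix for i, j in pairs)
    return total / len(pairs)


def tajimas_d(matrix):
    """Tajima's D from a 0/1 matrix (rows: loci, columns: chromosomes)."""
    n = len(matrix[0])
    # Segregating loci: insertion present in some but not all chromosomes.
    s = sum(1 for locus in matrix if 0 < sum(locus) < n)
    if s == 0:
        return 0.0
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    pi = mean_pairwise_differences(matrix)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical toy data: five loci, six chromosomes, every insertion a singleton.
toy = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 1],
]
print(site_frequency_spectrum(toy))
print(tajimas_d(toy))
```

In this toy sample every insertion is a singleton, so the spectrum is concentrated in the lowest frequency class and Tajima's D is negative, which is the direction of the shift the abstract reports for most LINEs.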
Long Interspersed Nuclear Element clades found in the A. carolinensis genome.
| Clades | Number of families | Number of RT hits | Total number of copies in published genome | Number of full-length copies in published genome | Length of full-length elements | Number of polymorphic insertions | Number of full-length polymorphic insertions |
|---|---|---|---|---|---|---|---|
| R4 | 2 | 7,682 | 3,000 | 994 | 3.8 Kb | 1,729 | 712 |
| RTE | 2 | 18,554 | 3,516 | 217 | 3.2–3.9 Kb | 3,367 | 1,782 |
| CR1 | 4 | 86,802 | 1,594 | 117 | 4.6–5.8 Kb | 27,802 | 2,578 |
| L2 | 17 | 38,607 | 3,800 | 380 | 4.8–6.3 Kb | 11,210 | 769 |
| L1 | 20 | 7,441 | 806 | 170 | 5.2–6.8 Kb | 2,508 | 1,089 |
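The clade table above can be used to compare how often a polymorphic insertion is full-length in each clade. The snippet below is a small sketch of that arithmetic; the counts are copied from the last two columns of the table, while the dictionary layout and function name are illustrative choices.

```python
# (polymorphic insertions, full-length polymorphic insertions), from the table above
clades = {
    "R4": (1729, 712),
    "RTE": (3367, 1782),
    "CR1": (27802, 2578),
    "L2": (11210, 769),
    "L1": (2508, 1089),
}


def full_length_fraction(polymorphic, full_length):
    """Share of polymorphic insertions that are full-length."""
    return full_length / polymorphic


fractions = {name: full_length_fraction(*counts) for name, counts in clades.items()}

# Print clades from the highest to the lowest full-length share.
for name, fraction in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {fraction:.1%}")
```

With these counts, more than half of the polymorphic RTE insertions are full-length, whereas the share is below 10% for CR1 and L2, consistent with the abstract's point that clades differ in how often they produce full-length copies.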
Origin of the samples sequenced, sequencing depth, and number of polymorphic insertions per individual.
| Sample | Clade | Locality | Latitude | Longitude | Depth | Number of polymorphic insertions present | Number of polymorphic full-length insertions present |
|---|---|---|---|---|---|---|---|
| AC_36_1 | Gulf-Atlantic | Blount, Tennessee | 35.53855 | -84.07625 | 15× | 7,557 | 839 |
| AC_38_4 | Gulf-Atlantic | Blount, Tennessee | 35.5558 | -84.00245 | 10× | 6,367 | 699 |
| AC_8_13 | Gulf-Atlantic | Thibodaux, Louisiana | 29.797883 | -90.8129 | 9× | 6,402 | 629 |
| AC_8_8 | Gulf-Atlantic | Thibodaux, Louisiana | 29.797883 | -90.8129 | 16× | 7,849 | 861 |
| AC_27_3 | Gulf-Atlantic | Darien, Georgia | 31.35295 | -81.447467 | 10× | 5,626 | 565 |
| AC_27_4 | Gulf-Atlantic | Darien, Georgia | 31.35295 | -81.447467 | 10× | 5,135 | 500 |
| CC3 | East Florida | Cocoa, Florida | 28.243611 | -80.870556 | 16× | 9,969 | 863 |
| CC8 | East Florida | Cocoa, Florida | 28.243611 | -80.870556 | 16× | 11,965 | 1,130 |
| SB3 | East Florida | South Bay, Florida | 26.683333 | -80.716884 | 12× | 11,839 | 1,069 |
| SB4 | East Florida | South Bay, Florida | 26.683333 | -80.716884 | 8× | 8,371 | 621 |
| TV8 | East Florida | Titusville, Florida | 28.5437777 | -80.9421666 | 8× | 8,557 | 740 |
| VB6 | East Florida | Vero Beach, Florida | 27.640278 | -80.59475 | 10× | 10,393 | 890 |
| VB7 | East Florida | Vero Beach, Florida | 27.640278 | -80.59475 | 9× | 10,451 | 924 |
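As a rough illustration of the difference between the two populations in the sample table above, the sketch below averages the number of polymorphic insertions per individual within each population. The values are copied from the table; the tuple layout and function name are assumptions, and the comparison does not correct for the differences in sequencing depth between samples.

```python
from statistics import mean

# (sample, population, sequencing depth, polymorphic insertions present)
samples = [
    ("AC_36_1", "Gulf-Atlantic", 15, 7557),
    ("AC_38_4", "Gulf-Atlantic", 10, 6367),
    ("AC_8_13", "Gulf-Atlantic", 9, 6402),
    ("AC_8_8", "Gulf-Atlantic", 16, 7849),
    ("AC_27_3", "Gulf-Atlantic", 10, 5626),
    ("AC_27_4", "Gulf-Atlantic", 10, 5135),
    ("CC3", "East Florida", 16, 9969),
    ("CC8", "East Florida", 16, 11965),
    ("SB3", "East Florida", 12, 11839),
    ("SB4", "East Florida", 8, 8371),
    ("TV8", "East Florida", 8, 8557),
    ("VB6", "East Florida", 10, 10393),
    ("VB7", "East Florida", 9, 10451),
]


def mean_insertions(population):
    """Mean number of polymorphic insertions per individual in one population."""
    return mean(count for _, pop, _, count in samples if pop == population)


for population in ("Gulf-Atlantic", "East Florida"):
    print(population, round(mean_insertions(population)))
```

On these numbers an East Florida individual carries on average about 10,200 polymorphic insertions and a Gulf-Atlantic individual about 6,500.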
Summary statistics for all LINE clades, families and subgroups considered in this study.
| Dataset | Subset | Mean number of differences in polymorphic insertions (All) | Mean number of differences in polymorphic insertions (Florida) | Mean number of differences in polymorphic insertions (Gulf-Atl) | Tajima’s D (Florida) | Tajima’s D (Gulf-Atl) | Number of polymorphic loci | % of private insertions (Florida) | % of private insertions (Gulf-Atl) | % of fixed differences | % of shared differences | Mean |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SNPs | | 0.21 | 0.22 | 0.36 | -0.62 | 0.47 | 314575 | 60.25 | 15.85 | 0.19 | 23.72 | 0.12 |
| L1 | All | 0.15 | 0.22 | 0.31 | -1.39∗∗∗ | -0.48∗∗ | 2508 | 65.67 | 19.46 | 0 | 14.87 | 0.04∗∗ |
| L1_AC1 to 16 | FL | 0.15 | 0.21 | 0.32 | -1.46∗∗∗ | -0.13 | 454 | 71.81 | 18.5 | 0 | 9.69 | 0.04∗ |
| L1_AC1 to 16 | TR | 0.18 | 0.25 | 0.30 | -0.95 | -0.5∗∗ | 1062 | 59.13 | 15.35 | 0 | 25.52 | 0.04∗∗∗ |
| L1_AC17 to 20 | FL | 0.11 | 0.17 | 0.28 | -2.06∗∗∗ | -0.78∗∗∗ | 635 | 68.82 | 27.09 | 0 | 4.09 | 0.03∗ |
| L1_AC17 to 20 | TR | 0.14 | 0.20 | 0.31 | -1.6∗∗∗ | -0.24∗ | 357 | 71.71 | 19.33 | 0 | 8.96 | 0.04 |
| L2 | All | 0.15 | 0.23 | 0.28 | -1.27∗∗∗ | -0.74∗∗∗ | 11210 | 61.06 | 23.76 | 0 | 15.18 | 0.05∗∗ |
| L2 | FL | 0.13 | 0.20 | 0.28 | -1.65∗∗∗ | -0.75∗∗∗ | 769 | 67.1 | 25.1 | 0 | 7.80 | 0.04∗∗∗ |
| L2 | TR | 0.15 | 0.23 | 0.28 | -1.24∗∗∗ | -0.74∗∗∗ | 10440 | 60.61 | 23.66 | 0 | 15.73 | 0.05∗∗ |
| CR1 | All | 0.15 | 0.22 | 0.31 | -1.31∗∗∗ | -0.29∗ | 27802 | 70.35 | 18.02 | 0.02 | 11.62 | 0.05 |
| CR1 | FL | 0.14 | 0.21 | 0.30 | -1.51∗∗∗ | -0.49∗∗ | 2578 | 68 | 23.27 | 0 | 8.73 | 0.05 |
| CR1 | TR | 0.16 | 0.22 | 0.31 | -1.29∗∗∗ | -0.27∗ | 25224 | 70.59 | 17.48 | 0.02 | 11.91 | 0.05 |
| R4 | All | 0.17 | 0.24 | 0.25 | -1.04∗ | -1.1∗∗∗ | 1729 | 49.1 | 20.76 | 0 | 30.13 | 0.03∗∗∗ |
| R4 | FL | 0.16 | 0.23 | 0.25 | -1.16∗∗ | -1.21∗∗∗ | 1017 | 47.79 | 20.94 | 0 | 31.27 | 0.02∗∗∗ |
| R4 | TR | 0.18 | 0.25 | 0.27 | -0.87 | -0.93∗∗∗ | 712 | 50.98 | 20.51 | 0 | 28.51 | 0.04∗∗∗ |
| RTE-1 | All | 0.11 | 0.18 | 0.23 | -1.91∗∗∗ | -1.42∗∗∗ | 2853 | 62.57 | 33.16 | 0 | 4.28 | 0.02∗∗∗ |
| RTE-1 | FL | 0.11 | 0.18 | 0.22 | -2.00∗∗∗ | -1.52∗∗∗ | 1774 | 61.72 | 35.17 | 0 | 3.10 | 0.02∗∗∗ |
| RTE-1 | TR | 0.12 | 0.19 | 0.24 | -1.77∗∗∗ | -1.23∗∗∗ | 1079 | 63.95 | 29.84 | 0 | 6.21 | 0.02∗∗∗ |
| RTEBovB | All | 0.25 | 0.31 | 0.34 | -0.08+ | 0.06 | 514 | 37.74 | 12.84 | 0 | 49.42 | 0.05∗∗∗ |
| RTEBovB | FL | 0.27 | 0.38 | 0.33 | 0.76 | -0.06 | 8 | 25 | 25 | 0 | 50.00 | 0.14 |
| RTEBovB | TR | 0.25 | 0.31 | 0.34 | -0.1 | 0.06 | 506 | 37.94 | 12.65 | 0 | 49.41 | 0.05 |
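The percentages of private insertions, fixed differences and shared differences in the summary table partition the polymorphic loci between the two populations, which is why they sum to roughly 100 in each row. The sketch below shows one way such a partition could be computed from per-population insertion counts. The category definitions, function names and toy counts are assumptions for illustration; the chromosome numbers (14 for Florida, 12 for Gulf-Atlantic) follow from the 7 and 6 diploid individuals sampled.

```python
from collections import Counter


def classify_locus(count_fl, n_fl, count_gulf, n_gulf):
    """Assign one locus to a category (assumed definitions, see lead-in)."""
    if count_fl == 0 and count_gulf == 0:
        raise ValueError("locus carries no insertion in either population")
    if count_fl > 0 and count_gulf > 0:
        return "shared"
    # Present in every chromosome of one population and absent from the other.
    if (count_fl == n_fl and count_gulf == 0) or (count_gulf == n_gulf and count_fl == 0):
        return "fixed"
    return "private_florida" if count_fl > 0 else "private_gulf"


def category_percentages(loci, n_fl, n_gulf):
    """Percentage of loci falling in each category."""
    counts = Counter(classify_locus(fl, n_fl, gulf, n_gulf) for fl, gulf in loci)
    total = sum(counts.values())
    return {category: 100 * k / total for category, k in counts.items()}


# Hypothetical toy loci: (copies observed in Florida, copies observed in Gulf-Atlantic)
toy_loci = [(1, 0), (2, 0), (0, 1), (3, 2), (14, 0)]
percentages = category_percentages(toy_loci, n_fl=14, n_gulf=12)
print(percentages)
```

Because each locus receives exactly one label, the percentages always sum to 100, matching the structure of the four percentage columns in the table.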
Copy numbers of L1, L2 and RTE families.
| L1 families | Copy number | L2 families | Copy number | RTE families | Copy number |
|---|---|---|---|---|---|
| L1AC01 | 68 | L2AC01 | 507 | RTE-1 | 2853 |
| L1AC02 | 18 | L2AC02 | 336 | RTEBovB | 514 |
| L1AC03 | 0 | L2AC03 | 301 | ||
| L1AC04 | 43 | L2AC04 | 504 | ||
| L1AC05 | 27 | L2AC05 | 276 | ||
| L1AC06 | 87 | L2AC06 | 569 | ||
| L1AC07 | 532 | L2AC07 | 543 | ||
| L1AC08 | 95 | L2AC08 | 1424 | ||
| L1AC09 | 82 | L2AC09 | 1661 | ||
| L1AC10 | 0 | L2AC10 | 131 | ||
| L1AC11 | 90 | L2AC11 | 720 | ||
| L1AC12 | 52 | L2AC12 | 206 | ||
| L1AC13 | 103 | L2AC13 | 948 | ||
| L1AC14 | 85 | L2AC14 | 256 | ||
| L1AC15 | 181 | L2AC15 | 1177 | ||
| L1AC16 | 53 | L2AC16 | 388 | ||
| L1AC17 | 763 | L2AC17 | 1263 | ||
| L1AC18 | 0 | ||||
| L1AC19 | 23 | ||||
| L1AC20 | 206 | ||||
Summary of parameters (in demographic units) estimated with fastsimcoal2.5.
| Parameter | 2.5% | Maximum likelihood estimate | 97.5% |
|---|---|---|---|
| Ancestral size (Gulf) | 379795 | 1422722 | 8838592 |
| Ancestral size (Florida) | 366002 | 751115 | 1756393 |
| Ancestral size (All) | 564492 | 1167977 | 1488644 |
| Current size (Florida) | 1959085 | 3316203 | 4603720 |
| Current size (Gulf) | 101238 | 235789 | 351645 |
| Time since size change (Gulf) | 57331 | 274157 | 559121 |
| Time since size change (Florida) | 275163 | 802462 | 1110215 |
| Migration rate (Gulf from Florida) | 2.96E-07 | 3.94E-07 | 5.51E-07 |
| Migration rate (Florida from Gulf) | 2.19E-07 | 3.38E-07 | 9.00E-07 |
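The maximum likelihood estimates in the parameter table can be turned into unit-free ratios that summarize the inferred demography. The sketch below computes the ratio of current population sizes and the size change of each population relative to its ancestral size. The values are copied from the table; the variable names are illustrative, and only ratios are computed because the exact units of the estimates are not spelled out here.

```python
# Maximum likelihood estimates copied from the table above
ancestral_size = {"Florida": 751115, "Gulf": 1422722}
current_size = {"Florida": 3316203, "Gulf": 235789}

# How many times larger the current Florida population is than the Gulf one.
florida_to_gulf = current_size["Florida"] / current_size["Gulf"]

# Current size relative to ancestral size: above 1 is growth, below 1 is contraction.
size_change = {
    population: current_size[population] / ancestral_size[population]
    for population in current_size
}

print(f"Florida / Gulf current size: {florida_to_gulf:.1f}")
for population, ratio in size_change.items():
    print(f"{population} current / ancestral: {ratio:.2f}")
```

Under these point estimates the Florida population is roughly 14 times larger than the Gulf population, having expanded from its ancestral size while the Gulf population contracted. The confidence intervals in the table are wide, so the ratios are best read as orders of magnitude.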
Comparison of the mean pairwise divergence for complete and truncated elements in the two populations.
| Clade | Florida, complete | Florida, truncated | W summary statistics | Gulf, complete | Gulf, truncated | W summary statistics | ||
|---|---|---|---|---|---|---|---|---|
| CR1 | ||||||||
| L1 (AC 1 to 16) | 0.322 | 0.296 | 30660 | 0.06041 | ||||
| L1 (AC 17 to 20) | 0.276 | 0.314 | 8754 | 0.05928 | ||||
| L2 | 0.278 | 0.280 | 517370 | 0.8827 | ||||
| R4 | ||||||||
| RTE01 | ||||||||