Abstract
Piwi-interacting RNAs (piRNAs) are a recently discovered class of 24- to 30-nt noncoding RNAs whose best-understood function is to repress transposable elements (TEs) in animal germ lines. In humans, TE-derived sequences comprise ∼45% of the genome and there are several active TE families, including LINE-1 and Alu elements, which are a significant source of de novo mutations and intrapopulation variability. In the "ping-pong model," piRNAs are thought to alternately cleave sense and antisense TE transcripts in a positive feedback loop. Because piRNAs are poorly conserved between closely related species, including human and chimpanzee, we took a population genomics approach to study piRNA function and evolution. We found strong statistical evidence that piRNA sequences are under selective constraint in African populations. We then mapped the piRNA sequences to human TE sequences and found strong correlations between the age of each LINE-1 and Alu subfamily and the number of piRNAs mapping to the subfamily. This result supports the idea that piRNAs function as repressors of TEs in humans. Finally, we observed a significant depletion of piRNA matches in the reverse transcriptase region of the consensus human LINE-1 element but not of the consensus mouse LINE-1 element. This result suggests that reverse transcriptase might have an endogenous role specific to humans. Overall, our results elucidate the function and evolution of piRNAs in humans and highlight the utility of population genomics analysis for studying this rapidly evolving genetic system.
Year: 2011 PMID: 21613236 PMCID: PMC3199439 DOI: 10.1093/molbev/msr141
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
P Values from Wilcoxon Tests for Individual HapMap Phase 3 Populations
| Population | Number of piRNA SNPs | P value |
| ASW | 248 | |
| CEU | 212 | 0.316 |
| CHB | 202 | 0.219 |
| CHD | 197 | 0.114 |
| GIH | 213 | 0.0651 |
| JPT | 199 | 0.0318 |
| LWK | 246 | |
| MEX | 231 | 0.219 |
| MKK | 230 | |
| TSI | 218 | 0.140 |
| YRI | 245 | |
Note.—The P values shown were not corrected for multiple hypothesis testing. P values significant at the 5% threshold after Bonferroni correction are shown in bold. The population names are from the HapMap Project Phase 3. ASW, African Americans; CEU, Europeans; CHB, Chinese in Beijing; CHD, Chinese in Denver; GIH, Gujarati Indians; JPT, Japanese; LWK, Luhya; MEX, Mexicans; MKK, Masai; TSI, Tuscans; YRI, Yorubans.
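The Bonferroni criterion mentioned in the note can be checked against the uncorrected P values that survive in the table. The sketch below assumes the correction is over 11 tests, one per population (the note does not state the number of tests), and uses only the seven P values listed above; the variable names are illustrative.

```python
# Uncorrected Wilcoxon P values as listed in the table above.
# The four populations whose P values are not shown are omitted.
p_values = {
    "CEU": 0.316, "CHB": 0.219, "CHD": 0.114, "GIH": 0.0651,
    "JPT": 0.0318, "MEX": 0.219, "TSI": 0.140,
}

N_TESTS = 11   # assumption: one test per HapMap Phase 3 population
ALPHA = 0.05

# Bonferroni: compare each P value with alpha divided by the number of tests.
threshold = ALPHA / N_TESTS

nominal = sorted(pop for pop, p in p_values.items() if p < ALPHA)
corrected = sorted(pop for pop, p in p_values.items() if p < threshold)

print(f"Bonferroni threshold: {threshold:.5f}")
print("Significant before correction:", nominal)
print("Significant after correction:", corrected)
```

Under these assumptions, JPT falls below 0.05 before correction, but none of the listed values passes the corrected threshold of roughly 0.0045.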
Figure. Derived allele frequency distributions for different classes of functional sites in the following HapMap phase 3 populations: ASW (African ancestry in Southwest USA), YRI (Yoruba in Ibadan, Nigeria), CHB (Han Chinese in Beijing, China), and CEU (Utah residents with Northern and Western European ancestry from the CEPH collection). An excess of SNPs in piRNAs with low derived allele frequency relative to intergenic SNPs is a signature of selective constraint on piRNA sequences. The error bars were computed by bootstrapping samples of SNPs.
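The bootstrapped error bars described in the caption can be sketched as follows. This is a minimal illustration on synthetic derived allele frequencies, not the authors' pipeline; the 0.1 frequency cutoff, the number of replicates, and the function name `bootstrap_low_daf_fraction` are assumptions introduced for the example.

```python
import random
import statistics

def bootstrap_low_daf_fraction(dafs, cutoff=0.1, n_boot=1000, seed=0):
    """Return (estimate, standard error) for the fraction of SNPs with
    derived allele frequency below `cutoff`, resampling SNPs with replacement."""
    rng = random.Random(seed)
    n = len(dafs)
    estimate = sum(d < cutoff for d in dafs) / n
    replicates = []
    for _ in range(n_boot):
        sample = rng.choices(dafs, k=n)  # resample SNPs with replacement
        replicates.append(sum(d < cutoff for d in sample) / n)
    return estimate, statistics.stdev(replicates)

# Synthetic data (hypothetical, for illustration only): a "constrained" class
# skewed toward low derived allele frequencies and a flatter "neutral" class.
data_rng = random.Random(1)
constrained = [data_rng.betavariate(0.3, 2.0) for _ in range(250)]
neutral = [data_rng.betavariate(0.6, 1.2) for _ in range(250)]

for label, dafs in (("constrained", constrained), ("neutral", neutral)):
    est, se = bootstrap_low_daf_fraction(dafs)
    print(f"{label}: low-DAF fraction = {est:.3f} +/- {se:.3f}")
```

The standard deviation of the bootstrap replicates serves as the error bar for each bin; the same resampling would be repeated per frequency bin and per site class to reproduce a full spectrum.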
Percentage of Bases of LINE-1 Subfamilies That Match piRNAs
| LINE-1 Subfamily | Number of Bases in LINE-1s | Percentage of Bases Matching piRNAs (1 mismatch, 1 indel), % | Percentage of Bases Matching piRNAs (1 mismatch, 0 indel), % | Percentage of Bases Matching piRNAs (0 mismatch, 0 indel), % |
| LINE-1 HS (human specific) | 3,458,046 | 15.96 | 11.87 | 4.23 |
| LINE-1 PA2 (7.6 Ma) | 9,493,804 | 16.56 | 11.35 | 3.83 |
| LINE-1 PA3 (12.5 Ma) | 18,923,178 | 12.71 | 8.91 | 3.23 |
| LINE-1 PA4 (18.0 Ma) | 18,340,583 | 11.18 | 7.17 | 2.23 |
| LINE-1 PA5 (20.4 Ma) | 15,765,637 | 11.55 | 6.92 | 1.64 |
| LINE-1 PA6 (26.8 Ma) | 10,819,404 | 9.26 | 5.90 | 1.48 |
| LINE-1 PA7 (31.4 Ma) | 19,129,677 | 6.14 | 3.55 | 0.96 |
| LINE-1 PA8 (40.9 Ma) | 6,561,140 | 7.06 | 4.23 | 1.00 |
| LINE-1 (all) | 504,651,578 | 3.09 | 1.63 | 0.37 |
Note.—All LINE-1 subfamilies annotated in RepeatMasker (http://www.repeatmasker.org) are listed from youngest to oldest (top to bottom). The age of each LINE-1 subfamily was taken from Khan et al. (2006). The number of bases contained in TEs from each subfamily (column 2) and the percentage of bases that match piRNAs are shown at different matching stringencies (columns 3–5). There is a strong correlation between the age of the subfamily and the percentage of bases that match piRNAs.
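The strength of the trend in the table can be quantified with a rank correlation. The sketch below computes Spearman's rho between subfamily age and the percentage of bases matching piRNAs at the strictest setting (0 mismatch, 0 indel), using the values from the table above. LINE-1 HS is left out because the table gives no numeric age for it, and the choice of Spearman's coefficient is an assumption: the source does not state which statistic the authors used.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# PA2 through PA8, from the table above.
age_ma = [7.6, 12.5, 18.0, 20.4, 26.8, 31.4, 40.9]
pct_exact = [3.83, 3.23, 2.23, 1.64, 1.48, 0.96, 1.00]

rho = spearman(age_ma, pct_exact)
print(f"Spearman rho = {rho:.3f}")
```

For these seven subfamilies the coefficient is about -0.96: the only departure from a strictly decreasing trend is the PA7/PA8 pair.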
FCorrelation between the age of human LINE-1 subfamilies and number of bases in the consensus sequence of that subfamily matching piRNAs.
Percentage of Bases of Alu Subfamilies That Match piRNAs
| Alu Subfamily | Number of Bases in Alu Subfamily | Percentage of Bases Matching piRNAs (1 mismatch, 1 indel), % | Percentage of Bases Matching piRNAs (1 mismatch, 0 indel), % | Percentage of Bases Matching piRNAs (0 mismatch, 0 indel), % |
| AluYg6 (2 Ma) | 162,316 | 33.99 | 21.86 | 13.86 |
| AluYb9 (5 Ma) | 9,126,467 | 41.17 | 20.77 | 17.29 |
| AluYb8 (5–15 Ma) | 8,802,284 | 50.87 | 31.30 | 20.23 |
| AluYa5 (5–15 Ma) | 1,168,599 | 30.46 | 19.92 | 17.66 |
| AluY (25 Ma) | 39,622,226 | 29.78 | 17.57 | 9.05 |
| AluSg (31 Ma) | 23,605,918 | 28.64 | 15.39 | 4.70 |
| AluSx (37 Ma) | 97,504,435 | 28.07 | 14.04 | 3.69 |
| AluSq (44 Ma) | 26,932,423 | 32.23 | 18.16 | 4.97 |
| Alus (all) | 307,703,885 | 27.01 | 14.35 | 4.20 |
Note.—All Alu subfamilies annotated in RepeatMasker (http://www.repeatmasker.org) are listed from youngest to oldest (top to bottom). The age of the Alu subfamilies was compiled from data in Kapitanov and Jurka (1995), Batzer and Deininger (2002), and Salem et al. (2003). The number of bases contained in TEs from each subfamily (column 2) and the percentage of bases that match piRNAs are shown at different matching stringencies (columns 3–5). The bold horizontal lines demarcate major transitions in Alu evolution (Batzer and Deininger 2002). The correlation between the age of the subfamily and the percentage of bases that match piRNAs is discernable across the major groups of Alus. However, within groups, the correlation is weaker than the correlation for LINE-1 elements, perhaps because of the greater uncertainty in the ages of the Alu subfamilies.
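The group-level trend noted above can be illustrated by pooling subfamilies into the AluY and AluS groups and computing a base-weighted percentage for each. Grouping by name prefix is an assumption made for this sketch (the lines that demarcate the groups in the original table are not reproduced here), and only the exact-match column (0 mismatch, 0 indel) is used.

```python
# (subfamily, number of bases, % of bases matching piRNAs at 0 mismatch, 0 indel)
alu_rows = [
    ("AluYg6", 162_316, 13.86),
    ("AluYb9", 9_126_467, 17.29),
    ("AluYb8", 8_802_284, 20.23),
    ("AluYa5", 1_168_599, 17.66),
    ("AluY", 39_622_226, 9.05),
    ("AluSg", 23_605_918, 4.70),
    ("AluSx", 97_504_435, 3.69),
    ("AluSq", 26_932_423, 4.97),
]

def weighted_pct(rows, prefix):
    """Base-weighted mean percentage over subfamilies whose name starts with `prefix`."""
    selected = [(bases, pct) for name, bases, pct in rows if name.startswith(prefix)]
    total_bases = sum(bases for bases, _ in selected)
    return sum(bases * pct for bases, pct in selected) / total_bases

young = weighted_pct(alu_rows, "AluY")  # younger group (assumed grouping)
old = weighted_pct(alu_rows, "AluS")    # older group (assumed grouping)
print(f"AluY group: {young:.2f}%  AluS group: {old:.2f}%")
```

With this grouping the pooled exact-match percentage is about 12% for AluY and about 4% for AluS, even though the ordering within each group is not monotonic in age.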
Percentage of Bases of Mouse LINE-1 Subfamilies That Match Mouse piRNAs
| Activity | Subfamily | 0 Mismatch, % | 1 Mismatch, % | 2 Mismatch, % |
| Inactive | L1MdF | 1.73 | 6 | 13.2 |
| Inactive | L1MdF2 | 4.6 | 15.1 | 21.8 |
| Inactive | L1MdF3 | 4.85 | 14.2 | 21.6 |
| Active | L1MdGf | 5.8 | 16.8 | 22.52 |
| Active | L1MdT | 10.3 | 18.9 | 26.5 |
| Active | L1MdA | 9.45 | 17.5 | 24.0 |
RepeatMasker (http://www.repeatmasker.org) annotates the six youngest LINE-1 subfamilies in the mouse genome (UCSC genome version mm9) as L1MdF, L1MdF2, L1MdF3, L1MdT, L1MdGf, and L1MdA. The F-subfamilies annotated as L1MdF, L1MdF2, and L1MdF3 summarize a more complex phylogeny of up to 17 subfamilies of mouse-specific LINE-1s. LINE-1 elements belonging to the subfamilies L1MdT, L1MdGf, and L1MdA have been reported to be currently active (Naas et al. 1998; Hardies et al. 2000; Goodier et al. 2001).
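The percentages in the table can be read as the share of subfamily bases covered by at least one piRNA match at a given mismatch tolerance. A minimal sketch of that bookkeeping, using brute-force Hamming-distance matching on toy sequences, is shown below. The function names and sequences are hypothetical, only the sense strand is scanned, and a real analysis would presumably use a short-read aligner rather than this quadratic scan.

```python
def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def covered_fraction(target, reads, max_mismatch):
    """Fraction of bases in `target` covered by at least one read that
    aligns without gaps with at most `max_mismatch` mismatches."""
    covered = [False] * len(target)
    for read in reads:
        k = len(read)
        for start in range(len(target) - k + 1):
            if hamming(target[start:start + k], read) <= max_mismatch:
                for i in range(start, start + k):
                    covered[i] = True
    return sum(covered) / len(target)

# Toy data (hypothetical): a 40-nt "consensus" and three short "piRNAs".
# The first read matches exactly, the second needs 1 mismatch, the third needs 2.
consensus = "ACGTTGCAAGGCTTACGATCGGATCCTAGCTAGGCTAACG"
pirnas = ["TTGCAAGG", "GATCGCAT", "CTTGCTAC"]

fractions = [covered_fraction(consensus, pirnas, k) for k in (0, 1, 2)]
for k, frac in zip((0, 1, 2), fractions):
    print(f"{k} mismatch: {100 * frac:.1f}% of bases covered")
```

Coverage can only grow as the tolerance is relaxed, which is the pattern seen across the three columns of the table.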
Figure. Density of piRNA matches to the consensus sequence of human-specific LINE-1 elements. To smooth the density, the plots were made using kernel density estimation with a Gaussian kernel instead of histograms. The blue (green) line shows the density of sense (antisense) piRNA matches to the LINE-1 element. The ∼1-kb region in the coding region of ORF2 that is depleted of piRNA matches is also depleted across all primate-specific LINE-1s in humans. There are 1,134 bases in LINE-1s that match piRNAs.
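The smoothing step described in the caption can be sketched with a hand-rolled Gaussian kernel density estimate. The match positions below are synthetic, and the bandwidth, element length, and depleted interval are illustrative assumptions rather than values from the paper.

```python
import math
import random

def gaussian_kde(points, bandwidth):
    """Return a function x -> estimated density using a Gaussian kernel."""
    n = len(points)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(
            math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points
        )

    return density

# Synthetic match positions on a 6,000-bp element (hypothetical),
# with no matches placed in an arbitrary 1-kb window.
ELEMENT_LENGTH = 6000
DEPLETED = (3000, 4000)
rng = random.Random(42)
positions = []
while len(positions) < 500:
    pos = rng.uniform(0, ELEMENT_LENGTH)
    if not (DEPLETED[0] <= pos < DEPLETED[1]):
        positions.append(pos)

kde = gaussian_kde(positions, bandwidth=100.0)
for x in (1000, 3500, 5000):
    print(f"density at {x} bp: {kde(x):.2e}")
```

Unlike a histogram, the estimate does not depend on bin edges, but the bandwidth plays a similar role: a larger value would blur the depleted window, while a smaller one would make the curve noisy.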