Alexej Abyzov, Rebecca Iskow, Omer Gokcumen, David W Radke, Suganthi Balasubramanian, Baikang Pei, Lukas Habegger, Charles Lee, Mark Gerstein.
Abstract
In primates and other animals, reverse transcription of mRNA followed by genomic integration creates retroduplications. Expressed retroduplications are either "retrogenes" coding for functional proteins or expressed "processed pseudogenes," which can function as noncoding RNAs. To date, little is known about how retroduplications vary across the human population, that is, whether a given retroduplication is present or absent in different individuals. We have developed new methodologies that allow us to identify "novel" retroduplications (i.e., those not present in the reference genome), to find their insertion points, and to genotype them. Using these methods, we catalogued and analyzed 174 retroduplication variants in almost one thousand humans sequenced as part of Phase 1 of the 1000 Genomes Project. The accuracy of our data set was corroborated by (1) multiple lines of sequencing evidence for retroduplication (e.g., depth of coverage in exons vs. introns), (2) experimental validation, and (3) the fact that we can reconstruct a correct phylogenetic tree of human subpopulations based solely on retroduplications. We also show that parent genes of retroduplication variants tend to be expressed at the M-to-G1 transition in the cell cycle and that genes expressed at this transition have more copies of fixed retroduplications than genes expressed at other times. These findings suggest that cell division is coupled to retrotransposition and, perhaps, is even a requirement for it.
Year: 2013 PMID: 24026178 PMCID: PMC3847774 DOI: 10.1101/gr.154625.113
Source DB: PubMed Journal: Genome Res ISSN: 1088-9051 Impact factor: 9.043
Summary of predicted novel retroduplications
Figure 1. Approach for novel retroduplication discovery. (A) If an analyzed genome carries a retroduplication that is absent from the reference genome, sequencing reads originating from that retroduplication can be used to discover it. (B) Reads aligned to the reference genome provide three lines of evidence for a novel retroduplication: reads clustering around the insertion point, increased read depth in exons, and mapping of unaligned reads to a splice-junction library. (C) The existence of a novel retroduplication of the SKA3 gene in the CEU trio is supported by all three lines of evidence. The retroduplication is polymorphic, as it is absent from the mother's genome. (D) PCR validation strategy. Two primer sets test for the presence of a splice junction and of the insertion point, respectively. (E) PCR validates the novel SKA3 retroduplication in the daughter's and father's genomes but not in the mother's. (F) The novel SKA3 retroduplication is polymorphic in the CEU population, as PCR across the insertion point yields a product in only some of the individuals tested.
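The exon-versus-intron depth signal in panel B can be illustrated with a short computation. The Python sketch below is not the authors' pipeline and covers only this one line of evidence; the per-base depth list, the half-open interval format, and the `min_ratio` cutoff of 1.25 are hypothetical choices for illustration. The reasoning behind it: a heterozygous retrocopy in a diploid genome adds one extra copy of exonic (but not intronic) sequence, so exonic depth should be roughly 3/2 of intronic depth (about 2 for a homozygous retrocopy).

```python
from statistics import mean


def exon_intron_depth_ratio(depth, exons, introns):
    """Mean per-base read depth over exons divided by mean depth over introns.

    depth: per-base read depths along the parent gene (hypothetical input, 0-based).
    exons, introns: half-open (start, end) intervals indexing into `depth`.
    """
    def region_mean(intervals):
        values = [depth[i] for start, end in intervals for i in range(start, end)]
        return mean(values) if values else 0.0

    exon_mean = region_mean(exons)
    intron_mean = region_mean(introns)
    if intron_mean == 0:
        # No intronic coverage: any exonic coverage counts as an unbounded excess.
        return float("inf") if exon_mean > 0 else 0.0
    return exon_mean / intron_mean


def has_exon_depth_excess(depth, exons, introns, min_ratio=1.25):
    # min_ratio is an illustrative cutoff, not a value from the paper; a heterozygous
    # retrocopy in a diploid genome is expected to give a ratio near 1.5.
    return exon_intron_depth_ratio(depth, exons, introns) >= min_ratio


# Toy gene (made-up depths): exon at depth 15, intron at depth 10, exon at depth 15.
toy_depth = [15] * 10 + [10] * 10 + [15] * 10
print(exon_intron_depth_ratio(toy_depth, [(0, 10), (20, 30)], [(10, 20)]))  # 1.5
```

In practice this signal would be combined with the other two lines of evidence (clustered reads at the insertion point and splice-junction hits), since depth alone is noisy.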
Prediction of retroduplications in CEU and YRI trios
Figure 2. Frequency of novel retroduplications by population. Most novel retroduplications are discovered in only one population (owing to conservative calling) but, as genotyping shows, are present in a few more. The phylogenetic tree was constructed from the overlap of novel retroduplications between populations. Except in one case, populations separate perfectly by continental group. The outlier clustering of the Finnish population (FIN) is likely due to distinct properties of its data that allowed the discovery of more unique retroduplications (see text). Admixed populations and the Iberian population (with just a few sequenced individuals) were excluded from the phylogenetic analysis.
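The caption states only that the tree was built from the overlap of novel retroduplications between populations. One simple way to turn such overlaps into a tree, sketched here in Python as an assumption rather than the authors' procedure, is to compute a Jaccard distance between the sets of retroduplications called in each pair of populations and then cluster them with UPGMA (average linkage). The function names and the retroduplication identifiers in the example are hypothetical toy data.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """1 - |A & B| / |A | B| for two sets of retroduplication identifiers."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def upgma(calls):
    """Average-linkage (UPGMA) clustering of populations.

    calls: dict mapping population label -> set of retroduplication IDs.
    Returns the tree as nested tuples of population labels.
    """
    # Each cluster is keyed by its (hashable) subtree and maps to its leaf labels.
    clusters = {name: [name] for name in calls}
    leaf_dist = {
        frozenset((x, y)): jaccard_distance(calls[x], calls[y])
        for x, y in combinations(calls, 2)
    }

    def average_distance(t1, t2):
        # UPGMA distance: mean pairwise distance between the leaves of two clusters.
        l1, l2 = clusters[t1], clusters[t2]
        total = sum(leaf_dist[frozenset((x, y))] for x in l1 for y in l2)
        return total / (len(l1) * len(l2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters into a new subtree.
        t1, t2 = min(combinations(clusters, 2), key=lambda pair: average_distance(*pair))
        clusters[(t1, t2)] = clusters.pop(t1) + clusters.pop(t2)
    return next(iter(clusters))


# Toy data: two European and two African populations with made-up retroduplication IDs.
toy_calls = {
    "CEU": {"r1", "r2", "r3"},
    "TSI": {"r1", "r2", "r4"},
    "YRI": {"r5", "r6", "r7"},
    "LWK": {"r5", "r6", "r8"},
}
print(upgma(toy_calls))  # (('CEU', 'TSI'), ('YRI', 'LWK'))
```

Jaccard distance is used here because it normalizes for the number of calls per population; that matters given that FIN yielded more unique calls than other populations.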
Figure 3. Enrichment of parent genes for expression at different phases of the cell cycle. The list of periodic genes was produced previously (Whitfield et al. 2002) and downloaded from Cyclebase (Gauthier et al. 2010). (A) Parent genes of retroduplication variants (RDVs) and of recent retroduplications in the reference genome are significantly enriched (P = 0.012 and P = 0.008, respectively; denoted by [*]) for expression in the M and M/G1 cell cycle phases. M/G1 denotes genes whose phase assignment is uncertain, owing to measurement imprecision, but close to the M-to-G1 transition. Cell division occurs during M/G1 (red horizontal bar). Because of a saturation effect (see text), the enrichment of parent genes for expression during M or M/G1 is not obvious when all known retroduplications are analyzed. (B) Average number of retroduplications in the reference genome per gene (y-axis) for periodic genes with peak expression at each cell cycle phase (x-axis). Genes expressed in the M and M/G1 phases generate significantly more retroduplications than genes expressed during other phases (P = 0.0047), suggesting that cell cycle timing directly relates to retroduplication frequency.
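Panel A reports enrichment P-values, but the caption does not name the test. A one-sided hypergeometric test is one common way to ask whether parent genes are over-represented among M and M/G1 genes. The Python sketch below assumes that test, which may differ from what the authors used, and relies only on the standard library. Here N is the number of periodic genes, K the number of those peaking in M or M/G1, n the number of parent genes among the periodic genes, and k the number of those parent genes peaking in M or M/G1. The function name and all inputs are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(N, K, n, k):
    """Return P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: periodic genes in total; K: periodic genes peaking in M or M/G1;
    n: parent genes among the periodic genes; k: parent genes peaking in M or M/G1.
    """
    if not (0 <= K <= N and 0 <= n <= N):
        raise ValueError("require 0 <= K <= N and 0 <= n <= N")
    # Sum the probabilities of observing k, k+1, ..., min(K, n) M-phase parent genes.
    # math.comb returns 0 when its second argument exceeds the first, so impossible terms vanish.
    favourable = sum(
        comb(K, i) * comb(N - K, n - i) for i in range(max(k, 0), min(K, n) + 1)
    )
    return favourable / comb(N, n)


# Toy example: 10 periodic genes, 5 peak in M or M/G1, and all 5 parent genes fall in that set.
print(hypergeom_upper_tail(10, 5, 5, 5))  # 1/252, about 0.00397
```

For genome-scale counts the exact integer arithmetic in `math.comb` stays precise, though a library routine such as SciPy's hypergeometric survival function would be the usual choice.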