Djie Tjwan Thung, Joep de Ligt, Lisenka E M Vissers, Marloes Steehouwer, Mark Kroon, Petra de Vries, Eline P Slagboom, Kai Ye, Joris A Veltman, Jayne Y Hehir-Kwa.
Abstract
Mobile elements are major drivers in changing genomic architecture and can cause disease. The detection of mobile elements is hindered by the low mappability of their highly repetitive sequences. We have developed an algorithm, called Mobster, to detect non-reference mobile element insertions in next-generation sequencing data from both whole genome and whole exome studies. Mobster uses discordant read pairs and clipped reads in combination with consensus sequences of known active mobile elements. Mobster has a low false discovery rate and a high recall rate for both L1 and Alu elements. Mobster is available at http://sourceforge.net/projects/mobster.
Year: 2014 PMID: 25348035 PMCID: PMC4228151 DOI: 10.1186/s13059-014-0488-x
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Overview of the Mobster algorithm. (A) In the first phase, discordant ends (long red arrows) and clipped ends (short red arrows) are extracted from the BAM file when, respectively, the mate or the unclipped end is mapped uniquely to the reference. Subsequently, these reads are mapped to the mobilome and checked for a polyA/T tail. (B) After mapping, all mates and unclipped sequences (anchors) belonging to unambiguously mapped Alu, L1, SVA, or HERV-K reads are identified. Anchors of clipped reads are clustered separately from anchors of discordant reads. (1) For clipped clusters, anchors should be: (i) supportive of the same ME family or the same polyA/T stretch; (ii) clipped on the same side; and (iii) clipped within a few bp of each other. (2) The 5′ clipped cluster (arrow pointing to the right), consisting of right-clipped reads, and the 3′ clipped cluster (arrow pointing to the left), consisting of left-clipped reads, are indicative of the same MEI event when: (i) they support the same ME family, or one cluster supports an ME family and the other supports a polyA/T tail; and (ii) they overlap by a maximum of 50 bp (allowing for TSDs) or are separated by a maximum of 20 bp (allowing for target site deletions). (3, 4) Discordant pair anchors are clustered when they: (i) map to the same strand; (ii) support the same ME family; and (iii) have start positions within a specified neighborhood distance (4). (5) Forward strand anchors form 5′ discordant clusters; reverse strand anchors form 3′ discordant clusters. The 5′ and 3′ discordant clusters are indicative of the same MEI event when they overlap by at most 50 bp or are within a user-defined window size. When possible, clipped clusters are merged with discordant clusters.
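The pairing rule for clipped clusters in step (2) of the caption can be illustrated with a minimal Python sketch. This is not Mobster's actual code: the `ClippedCluster` class, the `POLY_A` label, the function names, and the single-coordinate model of a cluster (one consensus clipping position per cluster) are all assumptions made for illustration. Only the 50 bp overlap and 20 bp separation limits and the family/polyA compatibility rule come from the caption.

```python
from dataclasses import dataclass

MAX_TSD_OVERLAP = 50   # bp; maximum overlap, allowing for target site duplications
MAX_DELETION_GAP = 20  # bp; maximum separation, allowing for target site deletions
POLY_A = "polyA/T"     # hypothetical label for clusters supported only by a polyA/T tail


@dataclass(frozen=True)
class ClippedCluster:
    """Hypothetical summary of a clipped cluster (not Mobster's data model)."""
    side: str      # "5prime" (right-clipped reads) or "3prime" (left-clipped reads)
    position: int  # assumed consensus clipping coordinate on the reference
    support: str   # ME family name (e.g. "Alu", "L1") or POLY_A


def supports_compatible(a: ClippedCluster, b: ClippedCluster) -> bool:
    """Same ME family, or one ME family paired with a polyA/T cluster."""
    if a.support == POLY_A and b.support == POLY_A:
        return False  # at least one cluster must name an ME family
    return a.support == b.support or POLY_A in (a.support, b.support)


def same_insertion(five: ClippedCluster, three: ClippedCluster) -> bool:
    """Apply the overlap/separation rule to a 5' and a 3' clipped cluster."""
    if five.side != "5prime" or three.side != "3prime":
        raise ValueError("expected one 5' cluster and one 3' cluster")
    if not supports_compatible(five, three):
        return False
    # Assumed geometry: with a TSD, the right-clipped reads end downstream of
    # where the left-clipped reads start, so the offset is positive (overlap).
    # With a target site deletion the offset is negative (a gap).
    offset = five.position - three.position
    if offset >= 0:
        return offset <= MAX_TSD_OVERLAP
    return -offset <= MAX_DELETION_GAP
```

For example, under these assumptions an Alu-supporting 5′ cluster clipped at position 1,015 and a polyA/T-supporting 3′ cluster clipped at position 1,000 overlap by 15 bp and would be treated as the same MEI event, whereas two clusters 100 bp apart would not.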
Characteristics of the three different experimental datasets used to test MEI identification with Mobster
| | | | |
|---|---|---|---|
| Number of samples | 2 | 4 | 300 |
| Average depth of coverage | 40X | 95X | 40X |
| Read length | 100 bp | 90 bp | 50 bp |
| Sequencing platform | Illumina | Illumina | SOLiD |
Percentage of PCR validated events recalled from CEU trio by the different algorithms
| | | |
|---|---|---|
| No. of PCR events | 1029 | 99 |
| Mobster | | |
| Tangram | 98.1 | 85.5 |
| RetroSeq | 97.8 | 83.8 |
| alu-detect | 95.1ᵃ | NA |
| Tea | 91.1 | 80.7 |
ᵃ Recall calculated based on NA12878 (408 PCR validated Alu events).
Values in bold depict best performing algorithm.
Figure 2. Characteristics of detected MEI events. (A) The predicted MEI events in the MZ twins show target site duplication sizes and target site deletion sizes characteristic of retrotransposition. (B) pMEI predictions in whole genome and whole exome paired-end datasets show a similar distribution pattern in mobile family origin, with Alu being inserted most frequently in both datasets.
Gene components affected by MEI events in healthy individuals sequenced with WGS
| | |
|---|---|
| Genic | 436 (36.9%) |
| Coding gene exonic | 1 (0.1%) |
| Coding gene intronic | 395 (33.4%) |
| Coding gene UTR5 | 4 (0.3%) |
| Coding gene UTR3 | 1 (0.1%) |
| Non-coding gene exonic | 3 (0.3%) |
| Non-coding gene intronic | 32 (2.7%) |
| Non-genic | 745 (63.1%) |
| 1 kb downstream TSS | 6 (0.5%) |
| 1 kb upstream TSS | 5 (0.4%) |
| Intergenic | 734 (62.2%) |
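
The percentages in this table follow from the row counts. As a small worked check (a sketch only; the dictionary and function names are illustrative and not part of the study), the counts sum to 1,181 events, and each percentage is the row count divided by that total, rounded to one decimal place:

```python
# Row counts copied from the table above; variable names are illustrative.
genic = {
    "Coding gene exonic": 1,
    "Coding gene intronic": 395,
    "Coding gene UTR5": 4,
    "Coding gene UTR3": 1,
    "Non-coding gene exonic": 3,
    "Non-coding gene intronic": 32,
}
non_genic = {
    "1 kb downstream TSS": 6,
    "1 kb upstream TSS": 5,
    "Intergenic": 734,
}

genic_total = sum(genic.values())
non_genic_total = sum(non_genic.values())
total = genic_total + non_genic_total


def percentage(count: int) -> float:
    """Share of all MEI events, in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, count in {**genic, **non_genic}.items():
    print(f"{label}: {count} ({percentage(count)}%)")
```

The genic and non-genic subtotals computed this way agree with the 436 and 745 events listed in the table.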
Validation of MEI detection in WGS and WES paired-end data
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| WGS | | | | | | | | |
| chr1 | 60,470,596 | | Intronic | | Heterozygous | Duplication | Yes | |
| chr1 | 83,201,791 | L1 | Intergenic | | Heterozygous | Duplication | Yes | Yes |
| chr1 | 93,167,519 | | Intronic | | Homozygous | Unknown | Yes | |
| chr1 | 142,803,597 | L1 | Intergenic | | Homozygous reference | Duplication | Yes | No |
| chr3 | 103,171,382 | | Intergenic | | Heterozygous | Deletion | Yes | |
| chr4 | 80,883,493 | | Intronic | | Heterozygous | Duplication | Yes | Yes |
| chr8 | 53,791,040 | | Intergenic | | Heterozygous | Duplication | Yes | |
| chr8 | 132,672,106 | | Intergenic | | Heterozygous | Duplication | Yes | Yes |
| chr10 | 130,625,059 | L1 | Intergenic | | Heterozygous | Duplication | Yes | |
| chr17 | 43,660,608 | SVA | Intergenic | | Heterozygous | Unknown | Yes | |
| chr20 | 29,638,569 | L1 | Upstream | | Heterozygous | Duplication | Yes | |
| WES | | | | | | | | |
| chr1 | 93,167,519 | | Intronic | | Homozygous | Unknown | Yes | |
| chr2 | 11,426,360 | | Intronic | | Homozygous | Unknown | Yes | |
| chr3 | 50,879,159 | | Exonic | | Homozygous | Unknown | Yes | |
| chr5 | 173,036,482 | L1 | Exonic | | NA | Unknown | Yes | No |
| chr6 | 52,712,717 | | Intergenic | | Homozygous | Unknown | Yes | |
| chr9 | 68,415,155 | | Intergenic | | NA | Unknown | Yesᶜ | |
| chr11 | 428,014 | | Intronic | | Homozygous | Unknown | Yes | |
| chr11 | 112,084,617 | L1 | Intronic | | Heterozygous | Unknown | Yes | |
| chr17 | 61,565,890 | | Intronic | | Heterozygous | Unknown | Yes | |
| chr19 | 52,888,055 | | Exonic | | Homozygous | Duplication | Yes | |
ᵃ Overlap with gene component is determined based on Mobster’s predicted insertion window.
ᵇ Not overlapping dbRIP or in silico MEI predictions [13,14,17] within a 50 bp window.
ᶜ 454 validation by Stewart et al.
On average, 1,181 MEI events were detected per WGS sample, of which 4.5% were novel. Ten of the 11 randomly selected MEI events could be validated. MEI detection in WES produced on average 42 events per exome, of which 4.8% were novel. Nine of the 10 randomly selected MEI events from the WES predictions could be validated.
TSD = target site duplication.
Figure 3. Validation of MEI events detected. (A) Validation of Alu events; bp values in brackets correspond to the expected PCR product size of the wild-type allele. 1: 100 bp marker, 2: WES event10 homozygous MEI insertion (178 bp). (B) Sanger trace of first breakpoint. (C) Schema representing exonic Alu insertion in ZNF880. (D) Single-end exome sequencing reveals a novel processed pseudogene (UQCR10) insertion into the exon of C1orf194. 1: 100 bp marker, 2: homozygous insertion, 3: heterozygous insertion. (E) Sanger trace representing distal breakpoint of insertion. The distal breakpoint has been mapped to chromosome 1 between positions 109,650,634 and 109,650,635. (F) Schema representing the retrotransposition event.