Natalie C Fonville, Karthik Raja Velmurugan, Hongseok Tae, Zalman Vaksman, Lauren J McIver, Harold R Garner.
Abstract
The human genome is 99% complete. This study contributes to filling the 1% gap by enriching previously unknown repeat regions called microsatellites (MST). We devised a Global MST Enrichment (GME) kit to enrich and next-generation sequence 2 colorectal cell lines and 16 normal human samples, illustrating its utility in identifying contigs from reads that do not map to the genome reference. Analysis of these samples yielded 790 novel extra-referential concordant contigs, each observed in more than one sample. We searched for evidence of functional elements in the concordant contigs in two ways: (1) BLAST-ing each contig against normal RNA-Seq samples, and (2) checking for predicted functional elements using GlimmerHMM. Of the 790 concordant contigs, 37 had an exact match to at least one RNA-Seq read; 15 aligned to more than 100 RNA-Seq reads. Of the 249 concordant contigs predicted by GlimmerHMM to have functional elements, 6 had at least one exact RNA-Seq match. BLAST-ing these novel contigs against all publicly available sequences confirmed that they were found in human and chimpanzee BAC and FOSMID clones sequenced as part of the original human genome project. These extra-referential contigs predominantly contained pentameric repeats, especially two motifs: AATGG and GTGGA.
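
The abstract defines a concordant contig as one observed in more than one sample but does not say how contigs were matched across samples. The following is a minimal sketch of that filtering step, assuming exact sequence identity (up to strand) as a stand-in for whatever alignment-based comparison was actually used. The function names, the `min_samples` parameter, and the toy sequences are all hypothetical.

```python
from collections import defaultdict

# Translation table for complementing DNA bases (assumes only A/C/G/T).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of a sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def concordant_contigs(contigs_by_sample, min_samples=2):
    """Keep contigs that occur in at least `min_samples` different samples.

    Matching is by exact identity of the canonical sequence, which is an
    assumption made for this sketch, not the published procedure.
    """
    seen_in = defaultdict(set)
    for sample, contigs in contigs_by_sample.items():
        for contig in contigs:
            seen_in[canonical(contig)].add(sample)
    return {
        contig: sorted(samples)
        for contig, samples in seen_in.items()
        if len(samples) >= min_samples
    }


# Toy input (made up for illustration, not data from the study).
toy_samples = {
    "sample_A": ["AATGGAATGG", "GTGGAGTGGA"],
    "sample_B": ["CCATTCCATT"],  # reverse complement of AATGGAATGG
    "sample_C": ["ACGTTGCA"],
}
shared = concordant_contigs(toy_samples)
print(shared)
```

In this toy input only the AATGG-repeat contig is reported, because it appears in `sample_A` on one strand and in `sample_B` on the other.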
Year: 2016 PMID: 27278669 PMCID: PMC4899811 DOI: 10.1038/srep27722
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Frequency of MST motif classes in unmapped reads relative to those found in the reference genome (HG19) and the whole genome.
The majority of novel microsatellite loci captured using our method contained pentameric repeats. For comparison, most loci found in the reference genome are not pentameric, and analysis of whole-genome data confirms that much of the missing genome is associated with pentamer repeat regions.
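
To make the notion of a pentameric repeat concrete, here is a small sketch that locates tandem runs of a given motif, such as AATGG or GTGGA, in a sequence. It is an illustration only: the source does not describe the MST-calling software, and both the `min_copies` threshold and the toy sequence are assumptions introduced here.

```python
import re


def find_motif_runs(seq, motif, min_copies=3):
    """Find uninterrupted tandem runs of `motif` on the forward strand.

    Returns a list of (start, end, copies) tuples using 0-based,
    end-exclusive coordinates. `min_copies` is an illustrative threshold.
    """
    motif = motif.upper()
    # e.g. motif="AATGG", min_copies=3 gives the pattern (?:AATGG){3,}
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_copies},}}")
    runs = []
    for match in pattern.finditer(seq.upper()):
        copies = (match.end() - match.start()) // len(motif)
        runs.append((match.start(), match.end(), copies))
    return runs


# Toy sequence (made up): 4 copies of AATGG, a spacer, then 3 copies of GTGGA.
toy_seq = "CC" + "AATGG" * 4 + "TTT" + "GTGGA" * 3 + "A"

print(find_motif_runs(toy_seq, "AATGG"))
print(find_motif_runs(toy_seq, "GTGGA"))
```

The sketch scans one strand and one motif phase at a time. A fuller caller would also consider the reverse complement and cyclic rotations of each motif.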
Mapping of enrichment reads.
| Sample & Enrichment | Total Reads | Mapped % | Exome Overlap % | Mapped with MST % | Known MST Loci Called % | Unmapped % | Unmapped with MST % |
|---|---|---|---|---|---|---|---|
| DLD-1 - Exome | 107763705 | 99.9 | 53.7 | 7.3 | 44.4 | 0.1 | 1.6 |
| DLD-1 - GME | 99869840 | 99.6 | 2.9 | 15.7 | 61.7 | 0.4 | 2.8 |
| DLD-1 - Comb | 86571454 | 99.8 | 52.3 | 12.1 | 41.1 | 0.2 | 4.2 |
| SW403 - Exome | 99392535 | 99.9 | 57.3 | 6.8 | 41.2 | 0.1 | 1.8 |
| SW403 - GME | 93042396 | 99.1 | 2.3 | 67.4 | 11.7 | 0.9 | 3.8 |
| SW403 - Comb | 95846052 | 99.7 | 51.7 | 12.0 | 27.3 | 0.3 | 3.3 |
| Normal-1-GME | 107184846 | 99.2 | 1.8 | 86.7 | 33.3 | 0.8 | 1.1 |
| Normal-2-GME | 86039208 | 98.4 | 1.9 | 87.5 | 26.3 | 1.6 | 0.5 |
| Normal-3-GME | 77776824 | 98.1 | 1.9 | 88.6 | 23.3 | 1.9 | 0.4 |
| Normal-4-GME | 76888422 | 99.0 | 2.0 | 89.1 | 23.1 | 1.0 | 0.9 |
| Normal-5-GME | 88088498 | 97.6 | 2.1 | 87.1 | 26.1 | 2.4 | 0.5 |
| Normal-6-GME | 87529722 | 97.7 | 2.0 | 87.7 | 25.9 | 2.3 | 0.4 |
| Normal-7-GME | 85362982 | 98.8 | 2.0 | 86.5 | 26.7 | 1.2 | 0.7 |
| Normal-8-GME | 69912104 | 98.3 | 1.8 | 88.7 | 24.1 | 1.7 | 0.6 |
| Normal-9-GME | 89072202 | 99.1 | 1.8 | 83.5 | 36.8 | 0.9 | 0.7 |
| Normal-10-GME | 88599848 | 99.1 | 1.9 | 87.0 | 30.6 | 0.9 | 0.7 |
| Normal-11-GME | 67477542 | 99.1 | 1.9 | 87.1 | 27.0 | 0.9 | 1.1 |
| Normal-12-GME | 74895624 | 98.4 | 2.1 | 87.6 | 27.3 | 1.6 | 0.5 |
| Normal-13-GME | 102597820 | 98.3 | 2.0 | 89.1 | 29.3 | 1.7 | 0.5 |
| Normal-14-GME | 39497576 | 98.5 | 1.8 | 89.6 | 22.4 | 1.5 | 0.6 |
| Normal-15-GME | 97582298 | 99.3 | 1.8 | 88.1 | 29.3 | 0.7 | 1.3 |
| Normal-16-GME | 91613940 | 98.1 | 1.8 | 86.7 | 32.0 | 1.9 | 0.6 |
Novel contigs and MSTs from unmapped reads.
| Sample & Enrichment | Novel Contigs | % Contigs with MSTs | % Contigs with GLS | % Contigs with both MST and GLS | Total MSTs | % Novel MSTs |
|---|---|---|---|---|---|---|
| DLD1-Exome | 224 | 8 | 16 | 2 | 48 | 35 |
| DLD1-GME | 1469 | 20 | 23 | 7 | 510 | 71 |
| DLD1-Comb | 515 | 36 | 25 | 9 | 297 | 79 |
| SW403-Exome | 162 | 19 | 16 | 3 | 55 | 62 |
| SW403-GME | 372 | 46 | 34 | 19 | 227 | 92 |
| SW403-Comb | 267 | 38 | 34 | 15 | 153 | 88 |
| Normal-1-GME | 312 | 52 | 34 | 20 | 265 | 78 |
| Normal-2-GME | 278 | 55 | 28 | 18 | 244 | 82 |
| Normal-3-GME | 261 | 54 | 30 | 17 | 239 | 74 |
| Normal-4-GME | 289 | 53 | 30 | 17 | 255 | 75 |
| Normal-5-GME | 257 | 54 | 34 | 18 | 249 | 70 |
| Normal-6-GME | 316 | 50 | 33 | 17 | 253 | 79 |
| Normal-7-GME | 275 | 52 | 36 | 20 | 250 | 71 |
| Normal-8-GME | 255 | 56 | 31 | 18 | 219 | 82 |
| Normal-9-GME | 245 | 52 | 36 | 18 | 210 | 74 |
| Normal-10-GME | 221 | 54 | 31 | 17 | 197 | 75 |
| Normal-11-GME | 248 | 52 | 33 | 20 | 219 | 79 |
| Normal-12-GME | 265 | 51 | 38 | 21 | 221 | 76 |
| Normal-13-GME | 308 | 51 | 32 | 18 | 243 | 79 |
| Normal-14-GME | 198 | 52 | 29 | 16 | 192 | 68 |
| Normal-15-GME | 351 | 52 | 34 | 19 | 279 | 81 |
| Normal-16-GME | 307 | 51 | 28 | 16 | 254 | 79 |
| Total | 7395 | 42 | 29 | 14 | 5079 | 77 |
| Concordant | 790 | 42 | 32 | 16 | 533 | 100 |
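
The Total row can be cross-checked against the per-sample rows. The sketch below, using values copied from the table above, sums the Novel Contigs and Total MSTs columns and compares a contig-weighted mean of "% Contigs with MSTs" with the reported 42%. The per-sample percentages are rounded in the table, so the weighted mean is only approximate, and reading the Total percentage as a contig-weighted figure is an inference from this arithmetic rather than something the source states.

```python
# Per-sample values copied from the table, in row order
# (6 cell-line rows followed by Normal-1 to Normal-16).
novel_contigs = [224, 1469, 515, 162, 372, 267,
                 312, 278, 261, 289, 257, 316, 275, 255,
                 245, 221, 248, 265, 308, 198, 351, 307]
total_msts = [48, 510, 297, 55, 227, 153,
              265, 244, 239, 255, 249, 253, 250, 219,
              210, 197, 219, 221, 243, 192, 279, 254]
pct_with_mst = [8, 20, 36, 19, 46, 38,
                52, 55, 54, 53, 54, 50, 52, 56,
                52, 54, 52, 51, 51, 52, 52, 51]

contig_sum = sum(novel_contigs)
mst_sum = sum(total_msts)

# Weight each sample's percentage by its number of novel contigs.
weighted_pct = sum(n * p for n, p in zip(novel_contigs, pct_with_mst)) / contig_sum

# Unweighted mean across the 22 rows, for comparison.
simple_pct = sum(pct_with_mst) / len(pct_with_mst)

print(contig_sum, mst_sum)
print(round(weighted_pct, 1), round(simple_pct, 1))
```

The column sums reproduce the Total row (7395 contigs, 5079 MSTs). The contig-weighted percentage rounds to 42, while the unweighted mean of the row percentages is noticeably higher, which is consistent with the Total percentage being computed over all contigs rather than averaged across samples.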
Alignment analysis of concordant contigs with normal lymphoblastoid RNA-Seq samples.
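| # | RNA-Seq sample 1 | RNA-Seq sample 2 | RNA-Seq sample 3 | RNA-Seq sample 4 | RNA-Seq sample 5 | RNA-Seq sample 6 | RNA-Seq sample 7 | RNA-Seq sample 8 | RNA-Seq sample 9 | RNA-Seq sample 10 | Total aligned reads | Contig length | BLAST hit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|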
| 1 | 9441 | 17648 | 208 | 31 | 6957 | 15064 | 7134 | 2005 | 12173 | 7 | 70668 | 370 | HS clone. Chr21 |
| 2 | 24 | 76 | 77 | 7 | 26 | 33 | 19 | 63 | 32 | 11 | 368 | 615 | HS FOSMID clone. Chr7 |
| 3 | 24 | 67 | 75 | 7 | 21 | 33 | 20 | 61 | 32 | 11 | 351 | 628 | HS FOSMID clone. Chr7 |
| 4 | 24 | 69 | 75 | 7 | 21 | 33 | 19 | 60 | 32 | 11 | 351 | 624 | HS FOSMID clone. Chr7 |
| 5 | 55 | 38 | 28 | 11 | 4 | 48 | 0 | 48 | 46 | 26 | 304 | 531 | HS FOSMID clone. Chr17 |
| 6 | 25 | 67 | 43 | 4 | 23 | 23 | 16 | 52 | 29 | 14 | 296 | 310 | PT BAC clone. Chr7 |
| 7 | 18 | 25 | 38 | 5 | 57 | 36 | 35 | 7 | 30 | 24 | 275 | 303 | HS BAC clone. Chr17 |
| 8 | 22 | 66 | 42 | 4 | 19 | 18 | 13 | 52 | 22 | 13 | 271 | 314 | PT BAC clone. Chr7 |
| 9 | 21 | 66 | 41 | 4 | 16 | 18 | 13 | 51 | 21 | 13 | 264 | 323 | PT BAC clone. Chr7 |
| 10 | 19 | 61 | 39 | 4 | 18 | 12 | 11 | 44 | 16 | 11 | 235 | 289 | PT BAC clone. Chr7 |
The top 10 out of 37 concordant contigs that had RNA-Seq hits are presented in this table.
HS: Homo sapiens; PT: Pan troglodytes; Chr: chromosome.