Sung-Chou Li, Wen-Ching Chan, Chun-Hung Lai, Kuo-Wang Tsai, Chun-Nan Hsu, Yuh-Shan Jou, Hua-Chien Chen, Chun-Hong Chen, Wen-Chang Lin.
Abstract
BACKGROUND: Un-MAppable Reads Solution (UMARS) is a user-friendly web service focusing on retrieving valuable information from sequence reads that cannot be mapped back to reference genomes. Recently, next-generation sequencing (NGS) technology has emerged as a powerful tool for generating high-throughput sequencing data and has been applied to many kinds of biological research. In a typical analysis, adaptor-trimmed NGS reads are first mapped back to reference sequences, including genomes or transcripts. However, a fraction of NGS reads fail to map back to the reference sequences. Such un-mappable reads are usually attributed to sequencing errors and discarded without further consideration.
Year: 2011 PMID: 21342592 PMCID: PMC3044317 DOI: 10.1186/1471-2105-12-S1-S9
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Collection of exon-exon junctions. By our definition, the EEJs can be continuous or discrete. The former represent known alternative splicing products. The latter, however, represent novel alternative splicing isoforms.
Figure 2. Flowchart of the UMARS service. The UMARS service can be divided into two subservices: one for discovering viral genomic regions (UMARS:Vir) and one for discovering novel alternative splicing exon-exon junctions (UMARS:EEJ). Un-mappable reads must be processed by NRP to solve the redundancy problem before being uploaded to UMARS.
Figure 3. Interface of the UMARS service. UMARS:Vir and UMARS:EEJ have individual parameters. Users must adjust the parameters according to their specific data source to obtain optimal results. In the UMARS:Vir service, users must set the “Parameter Criteria” parameter to either Standard or Loose. Specifying Standard outputs only the viruses with CN >= 100 and RN >= 10 (see Table 1), while specifying Loose outputs all viruses with CN >= 1 and RN >= 1. This is an empirical criterion to reduce random matches.
Summary of viruses detected from L1 un-mappable reads.
| Viral acc. | CN | RN | Name | Host |
|---|---|---|---|---|
| NC_007605.1 | 105,325 (86,191) | 629 (469) | Human herpesvirus 4 type 1 | human |
| NC_006146.1 | 8,556 (7,311) | 157 (34) | Macacine herpesvirus 4 | rhesus |
| NC_001798.1 | 2,332 (243) | 348 (21) | Human herpesvirus 2 | human |
| NC_005261.2 | 8,247 (200) | 304 (18) | Bovine herpesvirus 5 | bovine |
| NC_007653.1 | 7,458 (172) | 363 (24) | Papiine herpesvirus 2 | baboon |
| NC_001806.1 | 1,773 (147) | 278 (13) | Human herpesvirus 1 | human |
| NC_006560.1 | 6,107 (118) | 332 (14) | Cercopithecine herpesvirus 2 | monkey |
| NC_004812.1 | 5,961 (107) | 364 (7) | Macacine herpesvirus 1 | rhesus |
CN denotes the copy number of total reads mappable to the virus; RN denotes the number of distinct genomic regions mapped by reads. The values in parentheses denote the corresponding values when no mismatch was allowed in the mapping procedure. Because of their high similarity, most of the reads of type 1 EBV were also mapped back to type 2 EBV.
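The Standard and Loose criteria quoted in the Figure 3 caption amount to a simple threshold filter on CN and RN. The following is a minimal sketch of that filter, not the UMARS implementation: the names `VirusHit`, `CRITERIA`, and `filter_hits` are hypothetical, and the input rows are the no-mismatch values (the numbers in parentheses) from the table above.

```python
from typing import NamedTuple


class VirusHit(NamedTuple):
    accession: str
    cn: int  # copy number of reads mappable to the virus
    rn: int  # number of distinct genomic regions mapped by reads


# (min CN, min RN) thresholds as quoted in the Figure 3 caption.
CRITERIA = {"Standard": (100, 10), "Loose": (1, 1)}


def filter_hits(hits, criteria="Standard"):
    """Keep hits whose CN and RN both reach the chosen thresholds."""
    min_cn, min_rn = CRITERIA[criteria]
    return [h for h in hits if h.cn >= min_cn and h.rn >= min_rn]


# No-mismatch CN and RN values copied from the table above.
NO_MISMATCH_HITS = [
    VirusHit("NC_007605.1", 86191, 469),
    VirusHit("NC_006146.1", 7311, 34),
    VirusHit("NC_001798.1", 243, 21),
    VirusHit("NC_005261.2", 200, 18),
    VirusHit("NC_007653.1", 172, 24),
    VirusHit("NC_001806.1", 147, 13),
    VirusHit("NC_006560.1", 118, 14),
    VirusHit("NC_004812.1", 107, 7),
]

standard = filter_hits(NO_MISMATCH_HITS, "Standard")
loose = filter_hits(NO_MISMATCH_HITS, "Loose")
print([h.accession for h in standard])
```

Under these thresholds, the no-mismatch values for NC_004812.1 (RN = 7) would not pass Standard, although its mismatch-tolerant values (RN = 364) would.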
EBV genomic regions mapped by reads.
| Category | Intergenic | Protein coding | pre-miRNA |
|---|---|---|---|
| CN | 16,887 | 1,061 | 87,788 |
| RN | 193 | 57 | 379 |
The regions mapped by reads were classified into three categories according to the annotation of RefSeq 40 and miRBase 15. pre-miRNA regions dominated over other regions in terms of copy number and region number.
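To put a number on "dominated", the shares of each category can be computed directly from the table above. This is only an arithmetic sketch; the dictionary layout and the helper `shares` are illustrative and not part of UMARS.

```python
# CN and RN per annotation category, copied from the table above.
EBV_REGIONS = {
    "Intergenic": {"CN": 16887, "RN": 193},
    "Protein coding": {"CN": 1061, "RN": 57},
    "pre-miRNA": {"CN": 87788, "RN": 379},
}


def shares(metric):
    """Return each category's fraction of the column total for CN or RN."""
    total = sum(row[metric] for row in EBV_REGIONS.values())
    return {name: row[metric] / total for name, row in EBV_REGIONS.items()}


cn_share = shares("CN")
rn_share = shares("RN")
for name in EBV_REGIONS:
    print(f"{name}: CN {cn_share[name]:.1%}, RN {rn_share[name]:.1%}")
```

With these figures, pre-miRNA regions account for roughly 83% of the copy number and 60% of the mapped regions.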
Summary of EEJs detected from L2 un-mappable reads.
| EEJ category | Continuous | Discrete |
|---|---|---|
| | 18,228 (95) | 17,687 (1,209) |
| | 2,038 (14) | 700 (69) |
| | 2,986 (14) | 1,269 (83) |
| | 581,260 (4,746) | 34,469 (1,642) |
The values in parentheses denote the corresponding values when no mismatch was allowed in the mapping procedure. Continuous EEJs dominated over discrete ones in terms of read copy number and EST abundance.
Figure 4. PCR experimental validation of the detected EEJ, NM_007108 (4:2-4). The marker lane is labeled M and contains a 50 bp ladder. The central bright band in the marker lane corresponds to 350 bp. (a) The UCSC Genome Browser shows that four ESTs match the detected EEJ. (b, c) The PCR result showed that the expected EEJ (d transcript) can be experimentally detected. c and d denote continuous and discrete transcripts, respectively. (d) The sequencing result confirmed the authenticity of the detected EEJs.
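EEJ labels such as 4:2-4 can be handled programmatically. The sketch below rests on an assumption that the source does not state explicitly: a label is read as `exon_count:upstream_exon-downstream_exon`, and a junction is taken to be continuous when the two exons are adjacent and discrete otherwise. The names `EEJ` and `parse_eej` are hypothetical.

```python
import re
from typing import NamedTuple


class EEJ(NamedTuple):
    exon_count: int   # assumed: total exons in the transcript
    upstream: int     # assumed: exon on the 5' side of the junction
    downstream: int   # assumed: exon on the 3' side of the junction

    @property
    def kind(self):
        # Assumption: adjacent exons give a continuous junction.
        if self.downstream - self.upstream == 1:
            return "continuous"
        return "discrete"


_LABEL = re.compile(r"^(\d+):(\d+)-(\d+)$")


def parse_eej(label):
    """Parse a label like '4:2-4' under the assumed notation."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised EEJ label: {label!r}")
    total, up, down = map(int, match.groups())
    if not 1 <= up < down <= total:
        raise ValueError(f"exon indices out of range: {label!r}")
    return EEJ(total, up, down)


print(parse_eej("4:2-4").kind)
```

Under this reading, 4:2-4 joins exon 2 to exon 4 and is classified as discrete, which agrees with the "d transcript" named in the caption.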
Detected EEJs from CLSTN1.
| Gene name | Accession | EEJ pattern | Mapping pattern | Read ID | CN | MM | seq |
|---|---|---|---|---|---|---|---|
| CLSTN1 | NM_001009566 | 19:6-8 | 14 + 5 | NR12 | 32 | 0 | cctgggtggcaagggtgcg |
| CLSTN1 | NM_001009566 | 19:6-8 | 14 + 5 | NR541 | 6 | 1 | cctgggtggcatgggtgcg |
| CLSTN1 | NM_001009566 | 19:6-8 | 14 + 5 | NR3652 | 4 | 1 | cctgggtggcaaggatgcg |
| CLSTN1 | NM_014944 | 18:5-7 | 14 + 5 | NR548 | 32 | 0 | cctgggtggcaagggtgcg |
| CLSTN1 | NM_014944 | 18:5-7 | 14 + 5 | NR695 | 6 | 1 | cctgggtggcatgggtgcg |
| CLSTN1 | NM_014944 | 18:5-7 | 14 + 5 | NR47 | 4 | 1 | cctgggtggcaaggatgcg |
NM_001009566 and NM_014944 are splicing isoforms of CLSTN1. The EEJ (19:6-8) from NM_001009566 is identical to the EEJ (18:5-7) from NM_014944.
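The Mapping pattern and MM columns can be checked against the read sequences. The sketch below assumes that "14 + 5" means the first 14 nt of a read fall in the upstream exon and the last 5 nt in the downstream exon, and that MM counts mismatches against the junction sequence. Because NR12 has MM = 0, its sequence stands in for the junction reference here. The function names are hypothetical.

```python
def split_read(seq, pattern):
    """Split a read at the junction according to a pattern like '14 + 5'."""
    left_len, right_len = (int(part) for part in pattern.split("+"))
    if left_len + right_len != len(seq):
        raise ValueError("pattern does not match read length")
    return seq[:left_len], seq[left_len:]


def mismatches(a, b):
    """Hamming distance between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Read ID -> (sequence, CN), copied from the NM_001009566 rows above.
READS = {
    "NR12": ("cctgggtggcaagggtgcg", 32),
    "NR541": ("cctgggtggcatgggtgcg", 6),
    "NR3652": ("cctgggtggcaaggatgcg", 4),
}

# Assumption: the MM = 0 read equals the reference junction sequence.
reference = READS["NR12"][0]
mm = {read_id: mismatches(seq, reference) for read_id, (seq, _) in READS.items()}
total_cn = sum(cn for _, cn in READS.values())

print(split_read(reference, "14 + 5"))
print(mm, total_cn)
```

The computed distances reproduce the MM column (0, 1, 1), and the three reads together contribute a copy number of 42 to this junction.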