Antonio Marco, Sam Griffiths-Jones.
Abstract
MOTIVATION: Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs.
Year: 2011 PMID: 22171334 PMCID: PMC3268249 DOI: 10.1093/bioinformatics/btr686
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Effects of color-space encoding on the first nucleotide of sequenced reads. (A) cDNA sequences are linked to a P1 adapter. The first color produced is determined by the last base of the adapter and the first base of the read. If we remove the first color, the first base is lost during color-to-base decoding. The first base is kept if we do not remove the first color. (B) Effect of missing the first nucleotide (red) when sequencing long (fragmented) sequences. (C) Effect of missing the first nucleotide when sequencing microRNAs.
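The decoding issue described in Fig. 1A can be illustrated with a minimal sketch. It assumes the standard SOLiD dinucleotide code, in which a color is the XOR of the 2-bit codes of two adjacent bases. The function names, the example read and the adapter's last base (`T`) are assumptions made for illustration and are not taken from the paper.

```python
# Standard SOLiD dinucleotide code: with A=0, C=1, G=2, T=3,
# the color of two adjacent bases is the XOR of their codes.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = "ACGT"


def encode(anchor_base, seq):
    """Return the colors produced when seq is read after anchor_base."""
    colors = []
    prev = BASE_TO_BITS[anchor_base]
    for base in seq:
        cur = BASE_TO_BITS[base]
        colors.append(str(prev ^ cur))
        prev = cur
    return "".join(colors)


def decode(anchor_base, colors):
    """Return the bases that follow anchor_base, given the colors."""
    bases = []
    prev = BASE_TO_BITS[anchor_base]
    for color in colors:
        prev ^= int(color)
        bases.append(BITS_TO_BASE[prev])
    return "".join(bases)


ADAPTER_LAST_BASE = "T"  # assumption: last base of the P1 adapter
read = "CTAGGTAA"        # hypothetical read, for illustration only

colors = encode(ADAPTER_LAST_BASE, read)
print(colors)  # 22320130

# Keeping the first color: the adapter base anchors the decoding,
# so every base of the read is recovered.
print(decode(ADAPTER_LAST_BASE, colors))  # CTAGGTAA

# Dropping the first color: the remaining colors only describe the
# transitions from base 1 onwards, so decoding starts at base 2.
# The anchor (base 1) must then come from elsewhere, for example
# the reference, and it is absent from the decoded read.
print(decode(read[0], colors[1:]))  # TAGGTAA
```

For a read of n bases, the trimmed color string decodes to n − 1 bases. That is a small loss for a long fragment but a proportionally larger one for a 19–25 nt small RNA.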
Linker fragments detected at 3′ end of sequenced reads using Bowtie
| Color mismatches | Reads matching 3′ linkers (%) | Reads matching 5′ linkers (%) |
|---|---|---|
| 0 | 15 540 742 (23.17) | 1827 (0.00) |
| 1 | 24 626 724 (36.72) | 2745 (0.00) |
| 2 | 31 983 586 (47.69) | 3815 (0.01) |
| 3 | 38 392 776 (57.24) | 5775 (0.01) |
Reads mapped with the sequential trimming and mapping strategy
| Experiment | Description | Total reads | Mapped reads (%) | Small RNA set (19–25 nt) (%) | Expected FP in small RNAs (%) |
|---|---|---|---|---|---|
| GEO:GSM639446 | | 67 070 132 | 45 838 144 (68.34) | 21 547 990 (32.13) | 39 267 (0.06) |
| GEO:GSM639447 | | 52 620 004 | 30 382 986 (57.74) | 14 120 332 (26.83) | 35 162 (0.07) |
| SRA:SRR039230 | | 36 796 459 | 28 909 134 (78.57) | 15 204 339 (41.32) | 19 059 (0.05) |
Fig. 2. Comparison of reads mapped with RNA2MAP and with a sequential trimming strategy. (A) Reads mapped by both strategies to known microRNAs in miRBase. (B) Reads mapped by both strategies to the microRNAs newly discovered by Chen et al. The inset in both graphs shows a zoomed view of the shaded area.
Novel microRNAs discovered in Apis mellifera and T. castaneum
| Name | Chr | Str | Start | End | Reads |
|---|---|---|---|---|---|
| ame-mir-6000 | LG11 | − | 11 441 637 | 11 441 734 | 48 |
| ame-mir-6001 | LG13 | − | 2 650 488 | 2 650 555 | 2973 |
| ame-mir-6002 | LG16 | − | 2 944 352 | 2 944 431 | 15 |
| ame-mir-6003 | LG2 | − | 705 902 | 7 059 571 | 35 |
| ame-mir-6004 | LG5 | + | 7 573 377 | 7 573 464 | 26 |
| ame-mir-6005 | LG6 | − | 6 509 945 | 6 510 107 | 180 |
| ame-mir-6006 | LG9 | − | 62 686 | 62 773 | 19 |
| ame-mir-2765 | LG9 | + | 5 203 815 | 5 203 903 | 162 |
| tca-mir-6007 | CHR1 | + | 8 492 377 | 8 492 463 | 867 |
| tca-mir-6008 | CHR2 | + | 14 782 252 | 14 782 347 | 27 |
| tca-mir-6009 | CHR2 | + | 186 871 | 186 960 | 44 |
| tca-mir-6010 | CHR2 | + | 11 831 158 | 11 831 242 | 33 |
| tca-mir-6011 | CHR3 | − | 31 219 647 | 31 219 723 | 41 |
| tca-mir-6012 | CHR3 | − | 9 600 253 | 9 600 425 | 258 |
| tca-mir-6013 | CHR4 | + | 11 124 335 | 11 124 402 | 17 |
| tca-mir-6014 | CHR4 | + | 3 369 897 | 3 369 978 | 46 |
| tca-mir-6015 | CHR4 | + | 11 485 945 | 11 486 036 | 23 |
| tca-mir-6016 | CHR7 | − | 17 010 368 | 17 010 442 | 57 |
| tca-mir-6017 | CHR7 | − | 10 450 348 | 10 450 502 | 252 |
| tca-mir-6018 | CHR8 | − | 247 709 | 247 801 | 42 |
| tca-mir-927b | CHR9 | − | 16 099 288 | 16 099 383 | 224 |
| tca-mir-9e | CHR9 | − | 11 062 | 11 172 | 3696 |
Chr, chromosome/linkage group; Str, strand; Start, first nucleotide position; End, last nucleotide position; Reads, total number of reads.