Parsa Hosseini, Arianne Tremblay, Benjamin F Matthews, Nadim W Alkharouf.
Abstract
BACKGROUND: An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, and gigabytes of reads remain after sequence alignment. Translating such reads into meaningful annotation is therefore of great importance. It is easy to be flooded with such a volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, supports INDEL detection, SNP reporting, and allele calling. Extracting a measure of gene expression from this analysis in the form of tag counts, and furthermore annotating the reads, is therefore of significant value.
Year: 2010 PMID: 20598141 PMCID: PMC2908109 DOI: 10.1186/1756-0500-3-183
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Reads per chromosome. The sequenced reads in each CASAVA folder are concatenated, and their number is calculated and presented visually before annotation is performed. Dataset from Tremblay, A. (2010).
Figure 2. Defining the necessary columns for tag-counting and analysis. Both files must contain a column with matching values. In the figure, these columns are 'Accession' and 'AccessionNumber'.
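The column matching described in the Figure 2 caption can be illustrated with a minimal Python sketch. This is not TASE's own code: the function names, the accession values (`ACC001` and so on), the descriptions, and the `ReadID` and `Description` columns are all hypothetical. Only the column names 'Accession' and 'AccessionNumber' come from the caption.

```python
import csv
import io
from collections import Counter

# Hypothetical read table: one row per aligned read, keyed by 'Accession'.
READS_CSV = """ReadID,Accession
r1,ACC001
r2,ACC002
r3,ACC001
r4,ACC003
r5,ACC001
r6,ACC002
"""

# Hypothetical annotation table, keyed by 'AccessionNumber'.
ANNOTATION_CSV = """AccessionNumber,Description
ACC001,hypothetical kinase
ACC002,hypothetical transporter
"""


def count_tags(reads_file, key="Accession"):
    """Count how many reads carry each accession value (the tag count)."""
    return Counter(row[key] for row in csv.DictReader(reads_file))


def annotate(tag_counts, annotation_file, key="AccessionNumber",
             field="Description"):
    """Attach a description to each tag count by matching the key columns."""
    descriptions = {row[key]: row[field]
                    for row in csv.DictReader(annotation_file)}
    return [(acc, n, descriptions.get(acc, "unannotated"))
            for acc, n in tag_counts.most_common()]


if __name__ == "__main__":
    counts = count_tags(io.StringIO(READS_CSV))
    for accession, n, description in annotate(counts,
                                              io.StringIO(ANNOTATION_CSV)):
        print(accession, n, description)
```

Accessions with no match in the annotation file are kept and labelled "unannotated" in this sketch; whether TASE keeps or drops such reads is not stated here.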
Figure 3. Generated output. The resulting output is displayed visually and saved locally.
Number of reads per lane
| Chromosome | Chrom size (bp) | Lane1 | Lane2 | Lane3 | Lane4 | Lane5 | Lane6 | Lane7 | Lane8 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 55,915,595 | 44087 | 56460 | 144310 | 125574 | CONTROL | 73721 | 77785 | 90255 |
| 2 | 51,656,713 | 58000 | 74229 | 190042 | 165634 | CONTROL | 96910 | 102557 | 117524 |
| 3 | 47,781,076 | 50677 | 64715 | 164496 | 142812 | CONTROL | 84010 | 88910 | 101738 |
| 4 | 49,243,852 | 50726 | 64908 | 165724 | 144360 | CONTROL | 84645 | 89600 | 102509 |
| 5 | 41,936,504 | 44080 | 56686 | 144930 | 125480 | CONTROL | 73587 | 78373 | 89551 |
| 6 | 50,722,821 | 61200 | 77619 | 199609 | 173280 | CONTROL | 100638 | 106572 | 122915 |
| 7 | 44,683,157 | 50451 | 64728 | 165386 | 144206 | CONTROL | 84790 | 89232 | 102316 |
| 8 | 46,995,532 | 75392 | 95643 | 245844 | 214476 | CONTROL | 125271 | 133156 | 152202 |
| 9 | 46,843,750 | 48079 | 61227 | 157808 | 138132 | CONTROL | 80633 | 84986 | 97566 |
| 10 | 50,969,635 | 61068 | 77456 | 198859 | 172297 | CONTROL | 101868 | 107309 | 122479 |
| 11 | 39,172,790 | 60306 | 77010 | 195521 | 170430 | CONTROL | 100450 | 105876 | 120835 |
| 12 | 40,113,140 | 43131 | 55977 | 141515 | 123466 | CONTROL | 72493 | 76784 | 88282 |
| 13 | 44,408,971 | 72442 | 92603 | 235446 | 204937 | CONTROL | 119710 | 126250 | 146480 |
| 14 | 49,711,204 | 49088 | 63027 | 161172 | 140375 | CONTROL | 82670 | 86971 | 99934 |
| 15 | 50,939,160 | 57040 | 73027 | 186523 | 162741 | CONTROL | 95461 | 100196 | 114936 |
| 16 | 37,397,385 | 41074 | 52851 | 134554 | 117510 | CONTROL | 68635 | 72143 | 82947 |
| 17 | 41,906,774 | 56466 | 72499 | 186013 | 161964 | CONTROL | 94513 | 99838 | 113529 |
| 18 | 62,308,140 | 59460 | 76866 | 195838 | 170495 | CONTROL | 100206 | 105912 | 121360 |
| 19 | 50,589,441 | 45240 | 57579 | 146309 | 127167 | CONTROL | 74770 | 79226 | 90339 |
| 20 | 46,773,167 | 46146 | 58890 | 150392 | 131338 | CONTROL | 76533 | 80716 | 92677 |
| - | |||||||||
| - | |||||||||
| - | |||||||||
Lanes 1 and 2 contained 1 pM of cDNA, lanes 3 and 4 contained 4 pM, and lanes 6, 7 and 8 contained 2 pM; lane 5 was the control. The number of reads is roughly proportional to the cDNA concentration. The number of reads per lane that aligned to the soybean genome is provided, as is the number of reads that had functional annotation. Figure 1 presents the lane 2 data from this table graphically.
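As a quick arithmetic check on the table above, the per-chromosome counts for a lane can be summed and normalised by the stated cDNA concentration. The sketch below copies the Lane2 and Lane4 columns and the concentrations from the caption. The variable and function names are illustrative and not part of TASE.

```python
# Per-chromosome read counts (chromosomes 1-20) copied from the table above.
LANE2 = [56460, 74229, 64715, 64908, 56686, 77619, 64728, 95643, 61227, 77456,
         77010, 55977, 92603, 63027, 73027, 52851, 72499, 76866, 57579, 58890]
LANE4 = [125574, 165634, 142812, 144360, 125480, 173280, 144206, 214476,
         138132, 172297, 170430, 123466, 204937, 140375, 162741, 117510,
         161964, 170495, 127167, 131338]

# cDNA concentration per lane in pM, as stated in the table caption.
CDNA_PM = {"Lane2": 1, "Lane4": 4}


def lane_total(counts):
    """Total reads in a lane: the sum over all chromosomes."""
    return sum(counts)


def reads_per_pm(counts, concentration_pm):
    """Reads normalised by cDNA concentration."""
    return lane_total(counts) / concentration_pm


if __name__ == "__main__":
    for name, counts in (("Lane2", LANE2), ("Lane4", LANE4)):
        print(name, lane_total(counts), reads_per_pm(counts, CDNA_PM[name]))
    print("Lane4 / Lane2 read ratio:", lane_total(LANE4) / lane_total(LANE2))
```

The two totals, 1,374,000 and 3,056,674, match the "Total #/reads" values reported for lanes 2 and 4 in the performance table.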
Performance testing TASE
| No. of lanes analyzed | Lane(s) | No. of chromosomes with reads | Total no. of reads | Read concatenation (min:sec) | Tag counting and annotation (min:sec) |
|---|---|---|---|---|---|
| 1 | 2 | 20 | 1,374,000 | 0:59 | 3:17 |
| 1 | 4 | 20 | 3,056,674 | 2:02 | 4:56 |
| 4 | 1,3,7,8 | 80 | 8,647,210 | 8:04 | 11:52 |
| 8 | Entire flow cell | 160 | 14,869,398 | 12:13 | 15:55 |
Numerous tests were performed to measure the efficiency of TASE using datasets of varying sizes.
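To compare the runs in the table above on a common scale, the min:sec timings can be converted to seconds and expressed as reads processed per second. The sketch below uses only the figures from the table. The helper names are illustrative, and treating the two timed stages as strictly sequential is an assumption.

```python
# (label, total reads, read concatenation time, tag counting/annotation time)
# copied from the performance table above.
RUNS = [
    ("lane 2", 1_374_000, "0:59", "3:17"),
    ("lane 4", 3_056_674, "2:02", "4:56"),
    ("lanes 1,3,7,8", 8_647_210, "8:04", "11:52"),
    ("entire flow cell", 14_869_398, "12:13", "15:55"),
]


def to_seconds(min_sec):
    """Convert a 'min:sec' string such as '12:13' to seconds."""
    minutes, seconds = min_sec.split(":")
    return int(minutes) * 60 + int(seconds)


def reads_per_second(total_reads, concat_time, annotate_time):
    """Overall throughput, assuming the two stages run one after the other."""
    elapsed = to_seconds(concat_time) + to_seconds(annotate_time)
    return total_reads / elapsed


if __name__ == "__main__":
    for label, reads, concat_time, annotate_time in RUNS:
        rate = reads_per_second(reads, concat_time, annotate_time)
        print(f"{label}: {rate:,.0f} reads/s")
```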