Manhong Dai, Robert C Thompson, Christopher Maher, Rafael Contreras-Galindo, Mark H Kaplan, David M Markovitz, Gil Omenn, Fan Meng.
Abstract
BACKGROUND: While the accuracy and precision of deep sequencing data are significantly better than those of the earlier generation of hybridization-based high-throughput technologies, the digital nature of deep sequencing output often leads to unwarranted confidence in its reliability.
Year: 2010 PMID: 21143816 PMCID: PMC3005923 DOI: 10.1186/1471-2164-11-S4-S7
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Sample overview of full-slide SOLiD runs. Figures 1A and 1B show the distribution of panel-average quality scores across two different full-slide SOLiD runs. The black areas in the four corners are regions that the sequencing assay does not use, by design. Heat map scales are set automatically according to the quality score range of each run. The pattern of low-quality regions varies from run to run, even in data from the same sequencing core. Figures 1C and 1D show, respectively, the distribution of the color code 0 percentage and of the genomic hit count across panels for the sample illustrated in Fig. 1B. Numbers on the left and lower edges of each figure are the row and column numbers of the panels. Numbers on the right side of the heat map scale bar are the values associated with the heat map colors.
Figure 2. SOLiD color code bias during the sequencing-by-ligation process. Each line represents the fluctuation of the percentage of SOLiD color code 0 over the course of sequencing for one of the 29 panel columns of the run shown in Figure 1B. The y-axis is the percentage of color code 0 at each sequencing cycle shown on the x-axis.
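The per-cycle percentage plotted in Figure 2 can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `color_zero_percent_by_cycle` and the input layout (a dict from panel column number to color-space read strings made of the digits 0 to 3, with `.` for a missing call and the primer base already removed) are assumptions made for the example.

```python
def color_zero_percent_by_cycle(reads_by_column):
    """Return {column: [percent of color code 0 at each cycle]}.

    reads_by_column is a hypothetical input layout: it maps a panel column
    number to a list of color-space strings (digits 0-3, '.' = no call).
    """
    result = {}
    for column, reads in reads_by_column.items():
        n_cycles = max((len(r) for r in reads), default=0)
        zero = [0] * n_cycles   # color code 0 calls per cycle
        total = [0] * n_cycles  # all valid color calls per cycle
        for read in reads:
            for cycle, color in enumerate(read):
                if color in "0123":  # skip missing calls such as '.'
                    total[cycle] += 1
                    if color == "0":
                        zero[cycle] += 1
        result[column] = [
            100.0 * z / t if t else float("nan")
            for z, t in zip(zero, total)
        ]
    return result


percents = color_zero_percent_by_cycle({1: ["0120", "0.33", "1003"]})
print(percents[1])
```

With unbiased color calls each of the four codes would be expected near 25% at every cycle, so one line per column, plotted against cycle number, makes any drift visible.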
Figure 3. Small but consistent spatial base percentage gradient in Illumina sequencing. Figures 3A, 3B and 3C show, respectively, the average percentage of the A base, the average percentage of the C base and the genome hit count in each tile within an Illumina lane for the same sample. The x-axis labels and the y-axis labels on the left side are the column and row numbers of the tiles. The y-axis labels on the right side of the heat map scale are the values associated with the heat map colors.
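A per-tile base percentage of the kind shown in Figures 3A and 3B could be computed roughly as follows. The sketch assumes each read is available as a `(tile_row, tile_col, sequence)` tuple; that layout and the function name `base_percent_per_tile` are hypothetical, and uncalled bases (`N`) are excluded from the denominator as a simplifying choice.

```python
from collections import Counter


def base_percent_per_tile(reads, base):
    """Return {(row, col): percent of `base` among called bases in that tile}.

    `reads` is an iterable of (tile_row, tile_col, sequence) tuples; this
    input layout is an assumption made for the example.
    """
    base = base.upper()
    hits = Counter()    # occurrences of `base` per tile
    called = Counter()  # A/C/G/T calls per tile (N is ignored)
    for row, col, seq in reads:
        seq = seq.upper()
        hits[(row, col)] += seq.count(base)
        called[(row, col)] += sum(seq.count(b) for b in "ACGT")
    return {
        tile: 100.0 * hits[tile] / called[tile]
        for tile in called
        if called[tile]
    }


reads = [(1, 1, "AACG"), (1, 1, "ANTT"), (2, 1, "CCCC")]
print(base_percent_per_tile(reads, "A"))
```

Arranging the resulting values by row and column gives the grid behind a heat map; a smooth gradient across tiles for the same sample would point to a spatial effect rather than to the library itself.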
Figure 4. Quality control for paired-end sequencing. The y-axis shows the number of pairs in each of the following categories: 1) good pair: the reads from both ends of a fragment map to the same chromosome, and their distance and orientation are consistent with the reference genome; 2) unpaired on the forward strand: orphan reads from one end of the sequencing; unpaired on the reverse strand: orphan reads from the other end of the sequencing. We separate the reads from the two ends because, for some technologies, the reading efficiency and accuracy differ between the two ends; 3) different chromosome: the two ends of the same fragment map to different chromosomes of the reference genome; 4) wrong orientation: although the two ends map to the same chromosome, their relative orientation differs from that in the reference genome; 5) < defined range: paired-end reads whose distance is shorter than the expected library fragment range; and 6) > defined range: paired-end reads whose distance is longer than the expected library fragment range. In the above example, more than one third of the pairs have a shorter-than-expected distance, indicating a library quality issue.
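The categories in Figure 4 amount to a short decision procedure, sketched below in Python. Several details are assumptions rather than the authors' method: the `Alignment` record and `classify_pair` function are hypothetical, the expected orientation is taken to be a `+` read upstream of a `-` read, the distance is measured between leftmost mapped positions, and the extra "unmapped" label for pairs with no mapped end does not appear in the caption.

```python
from collections import Counter
from typing import NamedTuple, Optional


class Alignment(NamedTuple):
    chrom: str
    pos: int      # leftmost mapped position on the reference
    strand: str   # '+' or '-'


def classify_pair(forward: Optional[Alignment], reverse: Optional[Alignment],
                  min_dist: int, max_dist: int) -> str:
    """Assign one read pair to a Figure 4 category (None = end not mapped)."""
    if forward is None and reverse is None:
        return "unmapped"  # not a caption category; added for completeness
    if reverse is None:
        return "unpaired on the forward strand"
    if forward is None:
        return "unpaired on the reverse strand"
    if forward.chrom != reverse.chrom:
        return "different chromosome"
    # Assumed convention: opposite strands, '+' read upstream of '-' read.
    plus, minus = (forward, reverse) if forward.strand == "+" else (reverse, forward)
    if forward.strand == reverse.strand or plus.pos > minus.pos:
        return "wrong orientation"
    distance = minus.pos - plus.pos
    if distance < min_dist:
        return "< defined range"
    if distance > max_dist:
        return "> defined range"
    return "good pair"


pairs = [
    (Alignment("chr1", 100, "+"), Alignment("chr1", 300, "-")),
    (Alignment("chr1", 100, "+"), Alignment("chr1", 120, "-")),
    (Alignment("chr1", 100, "+"), Alignment("chr2", 300, "-")),
    (Alignment("chr1", 100, "+"), None),
]
counts = Counter(classify_pair(f, r, min_dist=150, max_dist=500) for f, r in pairs)
print(counts)
```

Tallying the labels with a `Counter` gives the per-category counts that a bar chart like Figure 4 would display; a large "< defined range" count would flag the kind of library issue described in the caption.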