Jérôme Mariette, Céline Noirot, Christophe Klopp.
Abstract
BACKGROUND: The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing platforms, enabling the sequencing of large genomes, the analysis of variations, and the study of transcriptomes. A recently reported bias causes multiple reads to be produced, at random, from a single DNA fragment within a run. This bias directly degrades how accurately the reads measure the representation of each fragment. Other cleaning steps are usually performed on the reads before assembly or alignment.
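The duplicate bias described above can be illustrated with a minimal Python sketch. It assumes one simple heuristic, which is not necessarily the one used by the authors: reads that share an identical first prefix_len bases are treated as copies of the same DNA fragment, and only the longest read of each group is kept. The function name remove_prefix_duplicates and the default prefix_len=20 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def remove_prefix_duplicates(reads: Dict[str, str], prefix_len: int = 20) -> Dict[str, str]:
    """Keep the longest read of each group sharing the same first prefix_len bases.

    ASSUMPTION (hypothetical heuristic): an identical prefix is used as a
    simple proxy for "several reads produced from one DNA fragment".
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for read_id, seq in reads.items():
        groups[seq[:prefix_len]].append(read_id)
    kept: Dict[str, str] = {}
    for ids in groups.values():
        best = max(ids, key=lambda rid: len(reads[rid]))  # longest copy wins
        kept[best] = reads[best]
    return kept
```

An exact prefix match keeps the example short; a practical cleaner would need to tolerate sequencing errors inside the compared region.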
Year: 2011 PMID: 21615897 PMCID: PMC3117718 DOI: 10.1186/1756-0500-4-149
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Paired-end cleaning strategy. Reads with no linker (a) are retained as single reads. If multiple linkers are present in the same read (b), the read is discarded. When the linker is only partially found, meaning that the number of mismatches is below a threshold, reads where the linker lies at the beginning or at the end (c) are saved as single reads, and the others (d) are deleted. Reads where the entire linker is present and not too close to either end (e) are saved as paired-end reads. Otherwise, sequences are saved as single reads only if the linker is located far enough from one end (g); the others (f) are deleted.
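The decision rules of Figure 1 can be sketched in Python as follows. This is one reading of the caption under stated assumptions, not PyroCleaner's actual code: an entire linker means an exact match, a partial linker means a best match with between 1 and max_mismatches mismatches, "close to an end" means a flank shorter than min_flank bases, and whenever a single read is kept it is the longer flank with the linker removed. The function names and the defaults max_mismatches=2 and min_flank=30 are hypothetical.

```python
from typing import List, Tuple


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_linker_hits(read: str, linker: str, max_mismatches: int) -> List[Tuple[int, int]]:
    """Return non-overlapping (start, mismatches) linker hits."""
    n = len(linker)
    candidates = []
    for i in range(len(read) - n + 1):
        d = hamming(read[i:i + n], linker)
        if d <= max_mismatches:
            candidates.append((d, i))
    hits: List[Tuple[int, int]] = []
    # Greedy: accept the fewest-mismatch windows first, then drop any window
    # overlapping an accepted hit (shifted copies of the same linker).
    for d, i in sorted(candidates):
        if all(abs(i - j) >= n for j, _ in hits):
            hits.append((i, d))
    return hits


def classify_read(read: str, linker: str, max_mismatches: int = 2,
                  min_flank: int = 30) -> Tuple[str, List[str]]:
    """Return (Figure 1 case letter, sequences to keep)."""
    hits = find_linker_hits(read, linker, max_mismatches)
    if not hits:
        return "a", [read]                      # (a) no linker: single read
    if len(hits) > 1:
        return "b", []                          # (b) several linkers: discard
    start, mismatches = hits[0]
    left, right = read[:start], read[start + len(linker):]
    longer = max(left, right, key=len)
    # ASSUMPTION: "close to an end" means a flank shorter than min_flank.
    near_end = min(len(left), len(right)) < min_flank
    if mismatches > 0:                          # partial (approximate) linker
        return ("c", [longer]) if near_end else ("d", [])
    if not near_end:
        return "e", [left, right]               # (e) paired-end read
    if len(longer) >= min_flank:
        return "g", [longer]                    # (g) keep the long flank only
    return "f", []                              # (f) both flanks too short
```

Sorting candidates by mismatch count before suppressing overlaps ensures that a single linker is never counted twice because of shifted, near-matching windows, so only genuinely separate linker occurrences trigger case (b).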
Figure 2. Duplication profile before and after pyrocleaning. Simulated datasets were produced from the E. coli K12 genome: 500-bp sequences were picked at random positions along the genome, from both strands. The number of simulated sequences (Sim run1/Sim run2) equals the number of sequences produced in the experimental runs (454 run1/454 run2).
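The simulation described in the Figure 2 caption can be sketched as below, assuming the genome is already loaded as a plain uppercase A/C/G/T string (the E. coli K12 sequence itself is not included). Fragments are sampled uniformly along the genome and reverse-complemented with probability 0.5 to cover both strands. The function names and the duplication_profile helper are hypothetical additions: the helper groups simulated reads by start position and strand, which shows how many duplicates arise from random sampling alone.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def simulate_reads(genome: str, n_reads: int, read_len: int = 500,
                   seed: int = 0) -> List[Tuple[int, str, str]]:
    """Pick n_reads fragments of read_len bp uniformly along the genome.

    Each fragment comes from the forward (+) or reverse (-) strand with equal
    probability. Returns (start, strand, sequence) tuples, where start is the
    0-based forward-strand coordinate of the fragment.
    """
    if len(genome) < read_len:
        raise ValueError("genome is shorter than read_len")
    rng = random.Random(seed)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(genome) - read_len + 1)
        fragment = genome[start:start + read_len]
        if rng.random() < 0.5:
            reads.append((start, "+", fragment))
        else:
            reads.append((start, "-", reverse_complement(fragment)))
    return reads


def duplication_profile(reads: List[Tuple[int, str, str]]) -> Dict[int, int]:
    """Map group size -> number of groups, grouping reads by (start, strand)."""
    group_sizes = Counter((start, strand) for start, strand, _ in reads)
    return dict(Counter(group_sizes.values()))
```

To mirror the caption, n_reads would be set to the number of reads in the corresponding 454 run.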