Simone Scalabrin, Michele Morgante, Alberto Policriti.
Abstract
BACKGROUND: The construction of a whole-genome physical map has been an essential component of numerous genome projects initiated since the inception of the Human Genome Project. Physical maps have proved useful in whole-genome shotgun projects for post-assembly validation and, more recently, in the assembly step itself to constrain BAC positions. Fingerprinting is usually the method of choice for the construction of physical maps. A clone fingerprint is composed of true peaks, which represent real fragments, and background peaks, which mainly come from E. coli genomic DNA, partial digestions, star activity by-products, and machine background. High-throughput fingerprinting produces thousands of BAC clone fingerprints per day. Background peak removal has therefore become an important issue and needs to be automated, especially for capillary-electrophoresis-based fingerprints.
Year: 2009 PMID: 19405935 PMCID: PMC2689866 DOI: 10.1186/1471-2105-10-127
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Example of an electrochromatogram. The x and y axes represent fragment size and peak intensity, respectively. Background is composed of low-signal peaks (in this example, those with intensity lower than 500).
Figure 2. Alternative representation of peak heights. The scenarios range from fingerprints with a clear gap between true peaks and background, through intermediate cases, to fingerprints with only background peaks (empty clones).
Distribution of fingerprint scenarios
| 0.16 | 0.80 | 0.03 |
| 0.62 | 0.33 | 0.04 |
| 0.48 | 0.47 | 0.04 |
| 0.81 | 0.17 | 0.02 |
Distribution of the scenarios of Fig. 2 in a set of 82,176 fingerprints produced at IGA, Udine, Italy, within the TriticeaeGenome Project. Rejected clones number 1,207 (1%) and correspond to scenarios d and e. The remaining clones are divided into scenarios a, b, and c for each dye. The variability between scenarios a and b across the different dyes is striking. See Additional file 2 for details.
Figure 3. Application of the second method to the cases of Fig. 2. Red and blue dots are, respectively, above and below hl and ll (marked by a black line). Here the first two peaks are considered artifacts and are ignored in the computation of ha. Furthermore, only five peaks are used to determine the initial ha (red line), and peaks ranked below the 60th are assumed to be background and are used to compute la (blue line). Note the variety of cases according to the ratio between ha and la. The green line represents a tentative threshold based on the ratio between ha and la.
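The procedure described in the Figure 3 caption can be illustrated with a minimal Python sketch. Only the skip of the first two peaks, the use of five peaks for ha, and the use of peaks ranked below the 60th for la come from the caption. The function name `estimate_threshold`, the use of arithmetic means for ha and la, and the geometric mean as the tentative threshold are assumptions made for illustration; the caption does not state how the threshold is derived from the ratio between ha and la.

```python
from math import sqrt
from statistics import mean


def estimate_threshold(heights, skip=2, n_high=5, bg_rank=60):
    """Return (ha, la, threshold) for the peak heights of one dye.

    heights: peak intensities of a single fingerprint, in any order.
    skip:    number of tallest peaks treated as artifacts (2 in the caption).
    n_high:  number of peaks used for the initial ha (5 in the caption).
    bg_rank: peaks ranked after this position are taken as background (60).
    """
    ranked = sorted(heights, reverse=True)
    high = ranked[skip:skip + n_high]   # peaks used for ha
    low = ranked[bg_rank:]              # peaks assumed to be background
    if not high or not low:
        raise ValueError("not enough peaks to estimate ha and la")
    ha = mean(high)  # assumption: ha is an arithmetic mean
    la = mean(low)   # assumption: la is an arithmetic mean
    # Assumption: the geometric mean stands in for the ratio-based
    # threshold, whose exact formula is not given in the caption.
    threshold = sqrt(ha * la)
    return ha, la, threshold


# Synthetic example: 2 artifact peaks, 58 true peaks, 20 background peaks.
peaks = [9000, 8000] + [2000] * 5 + [1500] * 53 + [100] * 20
ha, la, threshold = estimate_threshold(peaks)
kept = [h for h in peaks if h > threshold]
```

With a large ratio between ha and la, as in this synthetic input, any threshold between the two levels separates true peaks from background. Fingerprints with a ratio close to 1 correspond to the harder scenarios of Fig. 2.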
FPB and GenoProfiler background removal
| Method | Contig | Clones | Qs | Score | Clone type |
| FPB | 1 | 22 | 2 | 0.784 | Chloroplast |
| FPB | 2 | 28 | 16 | 0.515 | Centromere |
| FPB | 3 | 6 | 1 | 0.780 | rDNA |
| GenoProfiler1 | 1 | 150 | 105 | 0.420 | Wrong assembly |
| GenoProfiler2 | 1 | 22 | 5 | 0.767 | Chloroplast |
| GenoProfiler2 | 2 | 22 | 12 | 0.556 | Centromere |
Comparison of contigs produced by FPC (tolerance 0.4 bp, cutoff 1e-20) on a set of 1000 clones from the grapevine physical map project (about 0.2× coverage), processed with FPB and GenoProfiler. Clones were chosen at random from a set of 30,000 clones for which BAC-end sequences were available. Only contigs with at least 3 clones are presented in the table. For GenoProfiler, we used both default parameters and manually optimized parameters, labeled GenoProfiler1 and GenoProfiler2, respectively. Qs is the number of so-called "questionable clones", and score is a probabilistic value reflecting the quality of contigs in FPC. The last column of the table specifies which kind of clones compose the corresponding contig (from an independent inspection of the BAC-end sequences of the selected clones). With GenoProfiler, FPC produced a wrong assembly with default parameters and an incomplete assembly with manually optimized parameters.