Katharina Fietz, Jeff A Graves, Morten Tange Olsen.
Abstract
Genetic data can provide a powerful tool for those interested in the biology, management and conservation of wildlife, but they can also lead to erroneous conclusions if appropriate controls are not applied at every step of the analytical process. This applies particularly to data deposited in public repositories such as GenBank, whose utility relies heavily on the assumption of high data quality. Here we report an in-depth reassessment and comparison of GenBank and chromatogram mtDNA sequence data generated in a previous study of Baltic grey seals. By re-editing the original chromatogram data, we found that approximately 40% of the grey seal mtDNA haplotype sequences posted in GenBank contained errors. Re-analysis of the edited chromatogram data yielded results and conclusions broadly similar to those of the original study. However, a significantly different outcome was obtained with the uncorrected dataset based on the GenBank haplotypes. We therefore suggest disregarding the existing GenBank data and instead using the corrected haplotypes reported here. Our study serves as an illustrative example that reiterates the importance of quality control at every step of a research project, from data generation to interpretation and submission to an online repository. Errors made at any step may lead to biased results and conclusions, and could affect management decisions.
Year: 2013 PMID: 23977362 PMCID: PMC3745392 DOI: 10.1371/journal.pone.0072853
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Overview of haplotype duplicates in GenBank.
| Duplicate set | Original HT IDs | GenBank accession numbers |
|---|---|---|
| 1 | 31, 32, 37, 40 | AM287245, AM287246, AM287251, AM287254 |
| 2 | 19, 21, 29 | AM287233, AM287235, AM287243 |
| 3 | 5, 28 | AM287219, AM287242 |
| 4 | 16, 25 | AM287230, AM287239 |
| 5 | 11, 17 | AM287225, AM287231 |
| 6 | 7, 9 | AM287221, AM287223 |
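Duplicate sets like those above can be found mechanically by grouping repository entries whose sequences are identical. The sketch below is a minimal illustration, not the authors' procedure; the input format (a dict of haplotype ID to aligned sequence) and the function names `find_duplicate_haplotypes` and `count_redundant` are hypothetical, and the sequences are toy strings rather than grey seal data.

```python
from collections import defaultdict


def find_duplicate_haplotypes(records):
    """Group haplotype IDs whose sequences are identical.

    records: dict mapping a haplotype ID to its aligned sequence
    (hypothetical input format; any hashable, sortable IDs work).
    Returns one sorted list of IDs per set of two or more identical
    sequences, in order of first appearance.
    """
    by_sequence = defaultdict(list)
    for hap_id, seq in records.items():
        # Upper-case and trim so case or whitespace differences
        # do not hide an exact duplicate.
        key = seq.strip().upper()
        by_sequence[key].append(hap_id)
    return [sorted(ids) for ids in by_sequence.values() if len(ids) > 1]


def count_redundant(groups):
    """Entries that could be dropped: every member of a set except one."""
    return sum(len(group) - 1 for group in groups)


# Toy sequences, not real grey seal data.
toy = {5: "ACGT", 28: "acgt", 7: "ACGA", 9: "ACGA", 3: "TTTT"}
print(find_duplicate_haplotypes(toy))  # [[5, 28], [7, 9]]
```

Applied to the six duplicate sets in the table, `count_redundant` gives 3 + 2 + 1 + 1 + 1 + 1 = 9 redundant entries, which agrees with the nine duplicated GenBank haplotypes listed among the main findings.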
The distribution of grey seal haplotypes based on the re-edited chromatograms.
|
|
|
|
|
|
|
|
|---|---|---|---|---|---|---|
|
| 4 | 5 | 3 | 12 | 14 | 5, 28 |
|
| 4 | 4 | 3 | 11 | 2 | 19, 21, 29 |
|
| 3 | 3 | 3 | 9 | 3 | New |
|
| 1 | 3 | 5 | 9 | 5 | 23 |
|
| 3 | 3 | 6 | 3 | 27 | |
|
| 2 | 1 | 2 | 5 | 16 | 38 |
|
| 3 | 1 | 4 | 3 | New | |
|
| 1 | 1 | 1 | 3 | New | |
|
| 2 | 1 | 3 | 14 | ||
|
| 1 | 1 | 1 | 3 | New | |
|
| 1 | 1 | 1 | 3 | 2 | New |
|
| 2 | 1 | 3 | 11, 17 | ||
|
| 1 | 2 | 3 | 2 | 16, 25 | |
|
| 2 | 2 | 1 | New | ||
|
| 2 | 2 | New | |||
|
| 1 | 1 | 2 | 24 | ||
|
| 2 | 2 | 2 | New | ||
|
| 1 | 1 | 2 | 4 | 7, 9 | |
|
| 1 | 1 | 2 | 18 | ||
|
| 2 | 2 | New | |||
|
| 1 | 1 | 12 | |||
|
| 1 | 1 | 15 | |||
|
| 1 | 1 | 20 | |||
|
| 1 | 1 | 1 | New | ||
|
| 1 | 1 | 1 | New | ||
|
| 1 | 1 | 1 | New | ||
|
| 1 | 1 | New | |||
|
| 1 | 1 | 1 | New | ||
|
| 1 | 1 | 22 | |||
|
| 1 | 1 | New | |||
|
| 1 | 1 | New | |||
|
| 1 | 1 | New | |||
|
| 1 | 1 | New | |||
|
| 1 | 1 | 1 | New | ||
|
| 1 | 1 | 3 | |||
|
| 3 | 31, 32, 37, 40 | ||||
|
| 20 | 34 | ||||
|
| 4 | 35 | ||||
Haplotypes 36-38 were identical to haplotypes originally posted in GenBank and supported by our unpublished data, but not by the re-edited chromatograms.
Old HT ID corresponds to the original haplotypes posted in GenBank by Graves et al. [8].
HT = Haplotype; BB = Bay of Bothnia; EST = Estonia; STA = Stockholm Archipelago.
Figure 1. Workflow for quality control of the original GenBank haplotypes and the re-edited chromatograms.
HT = Haplotype; GB = GenBank.
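One step of such a workflow, matching each re-edited haplotype to any identical GenBank haplotype (or labelling it "New") and listing GenBank haplotypes that no re-edited sequence supports, can be sketched as follows. This is a hypothetical illustration assuming both datasets are available as dicts of ID to aligned sequence; `cross_check` is not a function from the study.

```python
def cross_check(genbank, reedited):
    """Compare GenBank haplotypes with haplotypes from re-edited chromatograms.

    genbank:  dict of original haplotype ID -> aligned sequence (hypothetical)
    reedited: dict of new haplotype ID -> aligned sequence (hypothetical)
    Returns (matches, unsupported):
      matches maps each new ID to the sorted GenBank IDs with an identical
      sequence, or to "New" when no GenBank entry matches;
      unsupported lists GenBank IDs not matched by any re-edited sequence.
    """
    # Index GenBank IDs by sequence; duplicated entries share one key.
    index = {}
    for old_id, seq in genbank.items():
        index.setdefault(seq.upper(), []).append(old_id)

    matches = {}
    supported = set()
    for new_id, seq in reedited.items():
        old_ids = index.get(seq.upper())
        if old_ids:
            matches[new_id] = sorted(old_ids)
            supported.update(old_ids)
        else:
            matches[new_id] = "New"
    unsupported = sorted(set(genbank) - supported)
    return matches, unsupported


# Toy data only.
gb = {1: "AAAA", 2: "AAAT", 3: "AAAT", 4: "CCCC"}
new = {1: "AAAT", 2: "GGGG", 3: "aaaa"}
print(cross_check(gb, new))  # ({1: [2, 3], 2: 'New', 3: [1]}, [4])
```

Exact string identity is a deliberately strict criterion here; a real comparison would also need a policy for ambiguity codes and alignment gaps, which this sketch does not attempt.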
Number of haplotypes, haplotype diversity, nucleotide diversity (π), and 95% confidence intervals (CI) for each sample site estimated for the original, the re-edited and the erroneous datasets.
| Dataset | Site | N | No. of HT | Haplotype diversity (95% CI) | π (95% CI) |
|---|---|---|---|---|---|
| Original | BB | 40 | 18 | 0.968 (0.941, 0.995) | 0.015 (0.000, 0.043) |
| Original | EST | 40 | 26 | 0.965 (0.940, 0.990) | 0.016 (0.000, 0.034) |
| Original | STA | 34 | 23 | 0.943 (0.904, 0.982) | 0.015 (0.000, 0.031) |
| Re-edited | BB | 39 | 24 | 0.968 (0.942, 0.993) | 0.018 (0.000, 0.036) |
| Re-edited | EST | 36 | 21 | 0.957 (0.925, 0.990) | 0.018 (0.000, 0.036) |
| Re-edited | STA | 28 | 14 | 0.939 (0.894, 0.984) | 0.016 (0.000, 0.033) |
| Erroneous | BB | 40 | 21 | 0.958 (0.931, 0.985) | 0.017 (0.015, 0.019) |
| Erroneous | EST | 40 | 21 | 0.958 (0.931, 0.985) | 0.060 (0.058, 0.062) |
| Erroneous | STA | 28 | 11 | 0.910 (0.859, 0.961) | 0.017 (0.015, 0.018) |
Degrees of freedom = 2.
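The two diversity measures in the table can be illustrated with their textbook definitions: Nei's unbiased haplotype diversity, h = n/(n − 1) · (1 − Σ pᵢ²), where pᵢ is the frequency of haplotype i among n sequences, and nucleotide diversity π as the mean proportion of differing sites over all pairs of sequences. The sketch below assumes equal-length aligned sequences without missing data and uses raw p-distances. The text does not state which software or substitution model produced the reported values, so this sketch may not reproduce them exactly, and it does not compute the confidence intervals.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2).

    haplotypes: one haplotype label (or sequence) per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def nucleotide_diversity(sequences):
    """Mean proportion of differing sites over all pairs of sequences.

    Assumes aligned sequences of equal length with no missing data and
    uses uncorrected p-distances (no substitution model).
    """
    n = len(sequences)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    total = 0
    for a, b in combinations(sequences, 2):
        total += sum(x != y for x, y in zip(a, b))
    n_pairs = n * (n - 1) / 2
    return total / (n_pairs * length)


# Toy sequences, not real grey seal data.
toy = ["AAAA", "AAAA", "AAAT", "AATT"]
print(round(haplotype_diversity(toy), 3))   # 0.833
print(round(nucleotide_diversity(toy), 3))  # 0.292
```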
Genetic differentiation among Baltic grey seal breeding sites estimated for the re-edited and the erroneous datasets.
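| Dataset | Site | BB | EST | STA |
|---|---|---|---|---|
| Re-edited | BB | * | 0.997 | 0.245 |
| Re-edited | EST | 0.000 | * | 0.602 |
| Re-edited | STA | 0.006 | 0.000 | * |
| Erroneous | BB | * | 0.807 | 0.294 |
| Erroneous | EST | 0.000 | * | 0.558 |
| Erroneous | STA | 0.004 | 0.000 | * |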
Pairwise FST values for the three sample sites are shown below the diagonal and P-values above it. Pairwise FST values for the original dataset were estimated but not reported in Graves et al. [8] and are therefore not included in this table.
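To show how a pairwise FST and its permutation P-value relate, the sketch below uses Hudson et al.'s (1992) sequence-based estimator, FST = 1 − H_within / H_between, where H_within is the mean number of pairwise differences within each site (averaged over the two sites) and H_between the mean between sites. The P-value is the share of random reassignments of individuals to the two sites that give an FST at least as large as the observed one. This is a swapped-in, hedged illustration: the text does not say which estimator or software produced the tabulated values, and `hudson_fst` and `permutation_p_value` are hypothetical names.

```python
import random
from itertools import combinations


def _diffs(a, b):
    """Number of differing sites between two aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def _mean_diff_within(seqs):
    # Requires at least two sequences in the sample.
    pairs = list(combinations(seqs, 2))
    return sum(_diffs(a, b) for a, b in pairs) / len(pairs)


def _mean_diff_between(pop1, pop2):
    total = sum(_diffs(a, b) for a in pop1 for b in pop2)
    return total / (len(pop1) * len(pop2))


def hudson_fst(pop1, pop2):
    """Hudson et al. (1992) FST = 1 - H_within / H_between."""
    h_within = (_mean_diff_within(pop1) + _mean_diff_within(pop2)) / 2
    h_between = _mean_diff_between(pop1, pop2)
    if h_between == 0:  # all sequences identical: no differentiation
        return 0.0
    return 1 - h_within / h_between


def permutation_p_value(pop1, pop2, n_perm=1000, seed=1):
    """Share of label permutations with FST >= the observed value."""
    rng = random.Random(seed)
    observed = hudson_fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    k = len(pop1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if hudson_fst(pooled[:k], pooled[k:]) >= observed:
            hits += 1
    # Count the observed arrangement itself so P is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy samples: two sites fixed for different sequences.
site_a = ["AAAA"] * 3
site_b = ["TTTT"] * 3
print(hudson_fst(site_a, site_b))  # 1.0
```

Estimators of this kind can return negative values when sites are less differentiated than expected by chance; such values are often reported as zero.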
Figure 2. Haplotype networks displaying the distribution of haplotypes in the re-edited (A; blue) and in the erroneous (B; red) dataset.
The size of each coloured circle and the number inside it represent the frequency of that haplotype. Haplotypes shared by both networks are connected by lines. A black dot denotes a haplotype not observed in either dataset that lies on a parsimonious path between observed haplotypes. A white dot indicates a haplotype present in one network but missing from the other.
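A haplotype network links haplotypes by the number of mutational steps separating them. The sketch below builds the simplest such network, a minimum spanning tree over pairwise Hamming distances (Prim's algorithm). It is a simplified stand-in: unlike the networks described above, it does not infer unobserved intermediate haplotypes (the black dots), and the text does not name the network method the authors used. The input format and `minimum_spanning_network` are hypothetical.

```python
def hamming(a, b):
    """Mutational steps between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x != y for x, y in zip(a, b))


def minimum_spanning_network(haplotypes):
    """Prim's algorithm over pairwise Hamming distances.

    haplotypes: dict of haplotype ID -> aligned sequence (hypothetical input).
    Returns a list of (id_a, id_b, steps) edges. Ties are broken by the
    dict's insertion order, so the result is deterministic.
    """
    ids = list(haplotypes)
    if not ids:
        return []
    in_tree = {ids[0]}
    edges = []
    while len(in_tree) < len(ids):
        best = None
        for a in ids:
            if a not in in_tree:
                continue
            for b in ids:
                if b in in_tree:
                    continue
                d = hamming(haplotypes[a], haplotypes[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        in_tree.add(best[1])
        edges.append(best)
    return edges


# Toy haplotypes, not real grey seal data.
toy = {"H1": "AAAA", "H2": "AAAT", "H3": "AATT", "H4": "TTTT"}
print(minimum_spanning_network(toy))
# [('H1', 'H2', 1), ('H2', 'H3', 1), ('H3', 'H4', 2)]
```

The cubic-time loop is adequate for a few dozen haplotypes, as in this study; larger datasets would call for a priority-queue implementation.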
Main findings of the analyses of the respective datasets.
| Dataset | Main findings |
|---|---|
| GenBank haplotypes | Only 40 haplotypes listed in GenBank (46 reported in Graves et al.) |
| | Nine of the haplotypes were duplicates |
| | Sixteen of the haplotypes were supported by the re-edited dataset |
| Re-edited dataset compared to the original dataset | Fewer haplotypes and a slightly smaller dataset |
| | Slightly greater differentiation between STA and the other breeding sites |
| | Lower proportion of unique haplotypes in BB and EST |
| Erroneous dataset compared to the re-edited and original datasets | Very different types and frequencies of mtDNA haplotypes |
| | Higher nucleotide diversity in EST |
| | Slightly lower haplotype diversity in STA |
| | Much lower proportion of unique haplotypes in STA |
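The "proportion of unique haplotypes" per breeding site can be computed once each site's haplotypes are known. The sketch below assumes that "unique" means a haplotype observed at only one of the sampled sites (a private haplotype). That interpretation is an assumption, since the table does not define the term, and `private_haplotype_proportion` and its input format are hypothetical.

```python
def private_haplotype_proportion(samples):
    """Share of each site's distinct haplotypes found at no other site.

    samples: dict of site -> list of haplotype IDs observed there
    (hypothetical input; 'unique' is assumed to mean private to one site).
    """
    distinct = {site: set(haps) for site, haps in samples.items()}
    result = {}
    for site, haps in distinct.items():
        # Union of haplotypes seen at every other site.
        others = set().union(*(h for s, h in distinct.items() if s != site))
        private = haps - others
        result[site] = len(private) / len(haps) if haps else 0.0
    return result


# Toy haplotype assignments, not the study's data.
toy = {"BB": [1, 1, 2, 3], "EST": [2, 4], "STA": [3, 5, 5]}
print(private_haplotype_proportion(toy))
# {'BB': 0.3333333333333333, 'EST': 0.5, 'STA': 0.5}
```

Because the measure uses distinct haplotypes rather than individuals, a site with many rare private haplotypes scores high even if most of its animals carry shared haplotypes.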