Aleksey V Zimin, David R Kelley, Michael Roberts, Guillaume Marçais, Steven L Salzberg, James A Yorke.
Abstract
We analyzed the whole genome sequence coverage in two versions of the Bos taurus genome and identified all regions longer than five kilobases (Kbp) that are duplicated within chromosomes with >99% sequence fidelity in both copies. We call these regions High Fidelity Duplications (HFDs). The two assemblies were Btau 4.2, produced by the Human Genome Sequencing Center at Baylor College of Medicine, and UMD Bos taurus 3.1 (UMD 3.1), produced by our group at the University of Maryland. We found that Btau 4.2 has a far greater number of HFDs, 3,111 versus only 69 in UMD 3.1. Read coverage analysis shows that 39 million base pairs (Mbp) of sequence in HFDs in Btau 4.2 appear to be a result of a mis-assembly and therefore cannot be qualified as segmental duplications. UMD 3.1 has only 0.41 Mbp of sequence in HFDs that are due to a mis-assembly.
Year: 2012 PMID: 22880081 PMCID: PMC3411808 DOI: 10.1371/journal.pone.0042680
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Histogram of the percentage of HFDs that belong to (i) set B2U1, duplicated in Btau 4.2 and single copy in UMD Bos taurus 3.1 (solid line), and (ii) set B1U2, single copy in Btau 4.2 and duplicated in UMD Bos taurus 3.1 (dashed line).
The area under each curve integrates to 100%. The histograms were computed by mapping the WGS reads to both assemblies. The average WGS read coverage of the assemblies is 5.9. The solid vertical line is placed at 5.9/ln(2) ≈ 8.5, the coverage at which it is equally likely that a region occurs in two copies versus one. 47 of the 69 regions (68%) in B1U2 lie to the right of the line and are thus more likely to be true segmental duplications. 94% of the 3,111 HFDs in Btau 4.2 (set B2U1) are more likely to be unique in the genome and thus probably represent assembly errors in Btau 4.2.
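The 5.9/ln(2) threshold can be reproduced with a short calculation. The sketch below assumes read depth follows a Poisson distribution with mean c for a single-copy region and 2c for a two-copy region. That model is an assumption made here for illustration and is not stated in the caption, although it does yield the same c/ln(2) crossover. The function names are hypothetical.

```python
import math


def equal_likelihood_coverage(mean_coverage: float) -> float:
    """Coverage at which one-copy and two-copy models are equally likely.

    Assumes Poisson read depth: setting Pois(k; c) = Pois(k; 2c)
    gives exp(c) = 2**k, hence k = c / ln(2).
    """
    return mean_coverage / math.log(2)


def log_likelihood_ratio(observed_coverage: float, mean_coverage: float) -> float:
    """log P(k | two copies) - log P(k | one copy) under the Poisson assumption.

    The k! and c**k terms cancel, leaving k * ln(2) - c.
    Positive values favour two copies, negative values favour one.
    """
    return observed_coverage * math.log(2) - mean_coverage


def more_likely_duplicated(observed_coverage: float, mean_coverage: float) -> bool:
    """True when the observed coverage lies to the right of the threshold."""
    return log_likelihood_ratio(observed_coverage, mean_coverage) > 0


mean_cov = 5.9  # average WGS read coverage reported in the caption
threshold = equal_likelihood_coverage(mean_cov)
print(f"threshold coverage: {threshold:.2f}")

# Share of B1U2 regions to the right of the line, from the counts in the caption.
print(f"B1U2 fraction right of the line: {47 / 69:.0%}")
```

Under this model, a region with observed coverage near 12 (about twice the mean) would be classified as duplicated, while one near 6 would be classified as single copy.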