Krishnamoorthy Srikanth1, Jong-Eun Park1, Dajeong Lim1, Jihye Cha1, Sang-Rae Cho2, In-Cheol Cho3, Woncheoul Park1.
Abstract
Until recently, genome-scale phasing was limited by the short read lengths of sequence data. Although long-read sequencing can overcome this limitation, it requires extensive error correction. The emergence of technologies such as 10X Genomics linked-read sequencing and Hi-C, which use short-read sequencers along with library preparation protocols that facilitate long-read assemblies, has greatly reduced the complexity of genome-scale phasing. Moreover, these methods make it possible to accurately assemble the phased genome of an individual sample. Therefore, in this study, we compared three phasing strategies, which combined two sample preparation methods with the Long Ranger pipeline of 10X Genomics and the HapCut2 software, namely 10X-LG, 10X-HapCut2, and HiC-HapCut2, and assessed their performance and accuracy. We found that 10X-LG had the best phasing performance among the methods analyzed: it had the highest phasing rate (89.6%), longest adjusted N50 (1.24 Mb), and lowest switch error rate (0.07%). Moreover, the phasing accuracy and yield of 10X-LG stayed over 90% for distances up to 4 Mb and 550 Kb, respectively, which were considerably higher than those of 10X-HapCut2 and HiC-HapCut2. The results of this study will serve as a good reference for future benchmarking studies and also for reference-based imputation in Hanwoo.
Keywords: 10X genomics; Hanwoo; Hi-C; Phasing; SNPs; genome; haplotypes
Year: 2020 PMID: 32245072 PMCID: PMC7140831 DOI: 10.3390/genes11030332
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Figure 1. Schematic illustration of the two sample preparations and the three phasing approaches carried out in this study.
Summary of sequencing data from the two platforms.
| | 10X Genomics | Hi-C |
|---|---|---|
| Total Reads | 790,643,590 | 441,889,616 |
| Mapped Reads | 756,645,916 (95.7%) | 438,995,090 (99.2%) |
| Q30 (%) | 92.50% | 93.00% |
| Mean Depth | 37.9X | 21.2X |
Summary of single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (indels) identified in this study.
| | 10X Genomics SNPs | 10X Genomics Indels | Hi-C SNPs | Hi-C Indels |
|---|---|---|---|---|
| | 8,670,477 | 1,749,472 | 7,132,127 | 1,753,086 |
| | 2,590,042 (30%) | 507,761 (29%) | 2,170,128 (30%) | 417,903 (24%) |
| | 6,080,435 (70%) | 1,241,711 (71%) | 4,397,837 (70%) | 1,335,183 (76%) |
Summary of phasing performance: Metrics shown are total number of SNPs phased, percentage of SNPs phased, Quality adjusted N50 (QAN50), and the switch error rate.
| Sequencing Platform-Phasing Method | Metric | Value |
|---|---|---|
| 10X-LG | Total SNPs Phased | 7,766,580 |
| | % of SNPs Phased | 89.57 |
| | QAN50 (bp) | 1,249,365 |
| | SER * (%) | 0.07 |
| 10X-HapCut2 | Total SNPs Phased | 5,836,541 |
| | % of SNPs Phased | 67.31 |
| | QAN50 (bp) | 541,912 |
| | SER * (%) | 0.16 |
| HiC-HapCut2 | Total SNPs Phased | 3,687,511 |
| | % of SNPs Phased | 51.65 |
| | QAN50 (bp) | 1,034,586 |
| | SER * (%) | 0.24 |
* Switch Error Rate.
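As a rough illustration of the switch error rate reported above, the following minimal sketch counts how often the relative phase of two adjacent heterozygous SNPs disagrees with a reference phasing. The function name, the 0/1 haplotype encoding, and the single-block input are assumptions made for illustration; this is not the implementation used by Long Ranger or HapCut2.

```python
from typing import Sequence


def switch_error_rate(truth: Sequence[int], called: Sequence[int]) -> float:
    """Fraction of adjacent SNP pairs whose relative phase differs from truth.

    Each sequence holds, for every heterozygous SNP in one phase block,
    the haplotype (0 or 1) that carries the alternate allele.
    (Hypothetical encoding chosen for this sketch.)
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    if len(truth) < 2:
        return 0.0  # no adjacent pairs, so no switches can occur

    switches = 0
    for i in range(1, len(truth)):
        truth_same = truth[i] == truth[i - 1]
        called_same = called[i] == called[i - 1]
        # A switch occurs when the two phasings disagree on whether
        # neighbouring SNPs lie on the same haplotype.
        if truth_same != called_same:
            switches += 1
    return switches / (len(truth) - 1)
```

Comparing relative phase between neighbours, rather than absolute labels, makes the measure insensitive to which haplotype is arbitrarily labelled 0 or 1.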
Figure 2. Comparison of phasing performance based on pairwise haplotype assignment: (a) Phasing accuracy shows the effect of distance on the probability that SNPs in the same phasing block are correctly phased. (b) Phasing yield shows the effect of distance between a pair of SNPs on the probability that they are phased in the same phasing block.
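The pairwise phasing accuracy described in panel (a) can be sketched as follows, assuming all SNPs belong to one phasing block and that a reference phasing is available. The function name, the distance binning, and the 0/1 haplotype encoding are illustrative assumptions, not the procedure used in the study.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Sequence


def pairwise_accuracy_by_distance(
    positions: Sequence[int],
    truth: Sequence[int],
    called: Sequence[int],
    bin_size: int,
) -> Dict[int, float]:
    """Per distance bin, the fraction of SNP pairs with correct relative phase.

    positions: genomic coordinates of SNPs in a single phase block.
    truth, called: haplotype label (0 or 1) per SNP (hypothetical encoding).
    Returns a mapping from bin start distance to accuracy in that bin.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    # Enumerates all pairs, which is quadratic in the number of SNPs;
    # acceptable for a sketch, too slow for whole-genome data.
    for i, j in combinations(range(len(positions)), 2):
        b = abs(positions[j] - positions[i]) // bin_size
        total[b] += 1
        # The pair is correct if both phasings agree on whether the two
        # SNPs sit on the same haplotype.
        if (truth[i] == truth[j]) == (called[i] == called[j]):
            correct[b] += 1
    return {b * bin_size: correct[b] / total[b] for b in sorted(total)}
```

Plotting the returned values against the bin start distances gives a curve of the same kind as the accuracy panel.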