Hui-Su Kim, Sungwon Jeon, Yeonkyung Kim, Changjae Kim, Jihun Bhak, Jong Bhak.
Abstract
BACKGROUND: KOREF is the Korean reference genome, which was constructed with various sequencing technologies including long reads, short reads, and optical mapping methods. It is also the first East Asian multiomic reference genome accompanied by extensive clinical information, time-series and multiomic data, and parental sequencing data. However, it was still not a chromosome-scale reference. Here, we updated the previous KOREF assembly to a new chromosome-level haploid assembly of KOREF, KOREF_S1v2.1. Oxford Nanopore Technologies (ONT) PromethION, Pacific Biosciences HiFi-CCS, and Hi-C technology were used to build the most accurate East Asian reference assembled so far.
Keywords: Hi-C; KOREF_S1; Korean reference; ONT PromethION; PacBio HiFi; hybrid assembly
Year: 2022 PMID: 35333300 PMCID: PMC8952264 DOI: 10.1093/gigascience/giac022
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Flow chart of the KOREF reference genome assembly.
Statistics of the KOREF_S1v2.1 assembly
| Statistic | Wtdbg2_paternal (contig) | Flye_paternal (contig) | Wtdbg2_maternal (contig) | Flye_maternal (contig) | Paternal (scaffold) | Maternal (scaffold) |
|---|---|---|---|---|---|---|
| Sequence No. | 3,059 | 2,973 | 2,426 | 2,475 | 2,230 | 2,616 |
| Total length (bp) | 2,652,350,533 | 2,820,210,305 | 2,691,371,348 | 2,885,670,065 | 2,821,407,033 | 2,886,600,011 |
| N50 (bp) | 15,085,508 | 19,472,363 | 15,312,743 | 25,861,606 | 141,044,433 | 150,051,441 |
| Longest (bp) | 70,969,653 | 87,371,841 | 70,444,093 | 109,786,075 | 235,665,501 | 234,237,609 |
| Gaps (%) | 0 | 0 | 0 | 0 | 0.048 | 0.037 |
| GC content (%) | 40.90 | 40.92 | 40.84 | 40.86 | 40.92 | 40.88 |
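
The N50 rows above (and the NG50 column in the contig comparison table) follow the standard definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembly length (N50) or half of an assumed genome size (NG50). A minimal sketch of that calculation is given here; the function name `n50` and the toy lengths are illustrative and do not come from the KOREF pipeline.

```python
def n50(lengths, genome_size=None):
    """Return the N50 of a list of sequence lengths.

    If genome_size is given, half of that value is used as the target
    instead of half of the assembly length, which yields the NG50.
    Returns None when the sequences never reach the target.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = genome_size if genome_size is not None else sum(lengths)
    target = total / 2
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length
    return None  # assembly covers less than half of genome_size


# Toy example (hypothetical contig lengths, not KOREF data).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))                   # N50 of the toy set
print(n50(toy_lengths, genome_size=100))  # NG50 against an assumed genome size
```

Because NG50 is computed against a fixed genome size rather than each assembly's own total length, it is the fairer statistic when assemblies of different total size are compared.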
Statistics of the KOREF reference genome annotation
| Statistic | Value |
|---|---|
| Genes No. | 19,668 |
| Transcripts No. | 85,889 |
| Total length of transcripts (bp) | 110,601,598 |
| N50 (bp) | 1,983 |
| Length of longest transcripts (bp) | 107,976 |
| GC content (%) | 51.60 |
| Long non-coding RNAs No. | 46,973 |
| Pseudogenes No. | 17,535 |
Statistics of KOREF_S1v2.1 protein-coding genes using BUSCO
| BUSCO assessment (%) | KOREF_S1v2.1 protein-coding genes |
|---|---|
| Complete | 99.3 |
| Complete and single-copy | 40.9 |
| Complete and duplicated | 58.4 |
| Fragmented | 0.1 |
| Missing | 0.6 |
Comparison of contigs from HG00733, HG002, and KOREF assembly
| Dataset | Sequencing platform | Assembly | Size (Gb) | QV | NG50 (Mb) |
|---|---|---|---|---|---|
| HG00733 | PB HiFi | Hifiasm (trio) | 6.071 | 49.9 | 34.9 |
| HG002 | PB HiFi | Hifiasm (trio) | 5.967 | 51.6 | 43.0 |
| KOREF | PB HiFi | Hifiasm (trio) | 5.927 | 45.1 | 55.4 |
| KOREF | PromethION R9.4.1 | wtdbg2 (trio) | 5.527 | 33.8 | 9.3 |
| KOREF | PB HiFi—PromethION hybrid | Flye (trio) | 5.706 | 42.2 | 16.5 |
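
To make the QV column easier to interpret, the sketch below converts QV values to per-base error rates. It assumes the column follows the usual Phred convention, QV = -10 · log10(error rate); the record above does not say how QV was estimated, so treat the conversion as illustrative. The helper names are hypothetical, and the QV values are copied from the table.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability back to a Phred-scaled QV."""
    if not 0 < error_rate <= 1:
        raise ValueError("error_rate must be in (0, 1]")
    return -10 * math.log10(error_rate)


# QV values taken from the contig comparison table above.
contig_qv = {
    "HG00733 / Hifiasm (trio)": 49.9,
    "HG002 / Hifiasm (trio)": 51.6,
    "KOREF / Hifiasm (trio)": 45.1,
    "KOREF / wtdbg2 (trio)": 33.8,
    "KOREF hybrid / Flye (trio)": 42.2,
}

for name, qv in contig_qv.items():
    rate = qv_to_error_rate(qv)
    print(f"{name}: QV {qv} ~ 1 error per {1 / rate:,.0f} bases")
```

Under this convention every 10 QV points correspond to a tenfold change in error rate, so the gap between the wtdbg2 contigs (QV 33.8) and the Hifiasm contigs (QV 45.1) is more than an order of magnitude in per-base accuracy.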
Comparison between KOREF and other human genomes
| Statistic | KOREF_S1v2.1 | AK1_v2 | JG2.0.0 Beta | HuRef | CHM13 v1.1 | GRCh38.p13 | Ash1v2.0 | PR1 v3.0 |
|---|---|---|---|---|---|---|---|---|
| Scaffolds No. | 2,230 | 2,832 | 1,173 | 4,530 | 24 | 472 | 334 | 89 |
| Total length (bp) | 2,901,828,151 | 2,904,207,228 | 3,059,652,438 | 2,844,000,504 | 3,054,832,041 | 3,272,089,205 | 3,188,555,634 | 3,116,169,811 |
| Scaffold N50 (bp) | 150,051,441 | 44,846,623 | 152,668,378 | 143,733,266 | 154,259,566 | 67,794,783 | 146,254,838 | 149,697,505 |
| Phasing approach | De novo | De novo | De novo | Reference-guided | De novo | De novo | Reference-guided | De novo |
| Assembly level | Chromosome | Scaffold | Chromosome | Chromosome | Chromosome | Chromosome | Chromosome | Chromosome |
| Haplotype-resolved | Trio-binning | Read-based | No | No | Haploid cell line | No | No | No |
The PR1 v3.0 assembly used the CHM13 assembly as a reference genome to remove gaps.