Yun Gyeong Lee1, Sang Chul Choi1, Yuna Kang1, Kyeong Min Kim1, Chon-Sik Kang2, Changsoo Kim3.
Abstract
Whole-genome sequencing (WGS) has become a crucial tool for understanding genome structure and genetic variation. MinION sequencing from Oxford Nanopore Technologies (ONT) is well suited to WGS and has several advantages over other next-generation sequencing (NGS) platforms: it is relatively inexpensive and portable, requires only simple library preparation, can be monitored in real time, and has no theoretical limit on read length. Sorghum bicolor (L.) Moench is diploid (2n = 2x = 20) with a genome size of about 730 Mb, and its genome sequence is available in the Phytozome database, which makes sorghum a good reference. Plant genomes, however, are large and complex compared with those of animals or microorganisms, so complete genome sequencing is difficult for plant species. Because MinION produces long reads, it can overcome the weak assemblies obtained from short NGS reads by reducing gaps and spanning the repetitive sequences common in plant genomes. Here, we sequenced the genome of S. bicolor cv. BTx623 on the MinION platform and obtained 895,678 reads totaling 17.9 gigabytes (Gb) of long-read sequence data (ca. 25× coverage of the reference). We performed de novo assembly with two different tools and mapped the assembled contigs against the sorghum reference genome: Canu generated 6124 contigs (covering 45.9%), and Minimap and Miniasm with Racon generated 2661 contigs (covering 50%). Our results provide an optimal series of long-read sequencing analysis steps for plant species on the MinION platform and a guide for determining the total sequencing scale needed for optimal coverage across various genome sizes.
Keywords: Canu; sorghum; MinION; Miniasm; long-read sequencing
Year: 2019 PMID: 31390788 PMCID: PMC6724115 DOI: 10.3390/plants8080270
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
The statistics of the raw fastq file.
| Result | 1st | 2nd | 3rd |
|---|---|---|---|
| Total generated file size (Gb) | 2.83 | 11.71 | 3.34 |
| Total number of fastq files | 35 | 170 | 37 |
| Total read numbers | 136,769 | 679,658 | 146,883 |
| The shortest read length (bp) | 167 | 74 | 38 |
| The longest read length (bp) | 190,250 | 110,486 | 217,000 |
| The most abundant read length (bp) (no. of reads) | 908 (61) | 947 (111) | 1028 (69) |
| Q-score | 11.2 | 10.7 | 10.9 |
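
Read-length statistics like those in the table above can be derived directly from a FASTQ file. The following is a minimal sketch, not the authors' procedure; it assumes the common four-line-per-record FASTQ layout (no wrapped sequence lines), and the function name `fastq_length_stats` and the toy records are hypothetical.

```python
from collections import Counter
import io


def fastq_length_stats(handle):
    """Summarise read lengths from a 4-line-per-record FASTQ stream."""
    lengths = []
    for i, line in enumerate(handle):
        if i % 4 == 1:  # second line of each record holds the sequence
            lengths.append(len(line.strip()))
    if not lengths:
        raise ValueError("no reads found")
    # Most frequent read length and how many reads have it.
    modal_length, modal_count = Counter(lengths).most_common(1)[0]
    return {
        "reads": len(lengths),
        "shortest": min(lengths),
        "longest": max(lengths),
        "modal_length": modal_length,
        "modal_count": modal_count,
        "total_bases": sum(lengths),
    }


# Toy input with three made-up reads (illustration only, not study data).
toy = io.StringIO(
    "@r1\nACGT\n+\n!!!!\n"
    "@r2\nACGTAC\n+\n!!!!!!\n"
    "@r3\nACGT\n+\n!!!!\n"
)
stats = fastq_length_stats(toy)
print(stats)
```

For a real run, the same function could be applied to an opened FASTQ file instead of the in-memory toy stream.
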
Results of average depth and mapping rate for raw reads against reference genome.
| Result | 1st | 2nd | 3rd | Combined a |
|---|---|---|---|---|
| Average depth | 2.01 | 5.64 | 2.10 | 8.56 |
| Mapping rate (%) | 97.93 | 96.87 | 97.14 | 97.08 |
a Combined all three results.
Figure 1. The coverage graph generated with Mosdepth. The legend identifies the coverage curve for each result, and the numbers in parentheses indicate the average depth of coverage.
Summary of read data for the results of Canu.
| Step | Metric | 1st | 2nd | 3rd | Combined |
|---|---|---|---|---|---|
| Total loaded reads | No. of reads | 119,022 | 649,003 | 127,653 | 895,678 |
| | Total length (bp) | 1,495,987,647 | 6,216,312,936 | 1,767,114,081 | 9,479,414,664 |
| | Coverage | 2.04 | 8.51 | 2.42 | 12.63 |
| Expected corrected reads | No. of reads | 117,932 | 647,151 | 125,836 | 893,520 |
| | Total length (bp) | 1,333,102,902 | 6,187,551,664 | 1,359,065,630 | 8,029,184,425 |
| | Mean read length (bp) | 11,304 | 7,900 | 10,800 | 8,986 |
| | N50 length (bp) | 49,358 | 23,337 | 53,805 | 72,703 |
| After correction/Before trimming | No. of reads | 110,540 | 607,805 | 116,100 | 845,774 |
| | Total length (bp) | 1,235,198,760 | 5,658,532,542 | 1,549,842,529 | 8,673,782,926 |
| | Coverage | 1.68 | 7.75 | 2.12 | 11.56 |
| After trimming a | No. of reads | 68,176 | 403,755 | 56,719 | 566,533 |
| | Total bases (bp) | 411,454,770 | 2,794,594,634 | 376,245,841 | 4,739,533,665 |
| UniTigging/reads | No. of reads | 68,176 | 410,746 | 56,719 | 577,103 |
| | Total length (bp) | 424,463,809 | 2,844,670,276 | 381,679,172 | 4,833,385,452 |
| | Coverage | 0.58 | 3.89 | 0.52 | 6.44 |
| UniTigging/consensus | No. of sequences | 159 | 5,740 | 127 | 6,124 |
| | No. of repeats | 28 | 692 | 26 | 712 |
| | Length of repeats (bp) | 573,105 | 10,509,344 | 472,695 | 14,815,759 |
| | Total length (bp) | 3,088,777 | 178,246,454 | 3,256,717 | 344,366,012 |
| | Coverage | 0.004 | 0.237 | 0.004 | 0.459 |
| Unassembled | No. of sequences | 38,897 | 168,888 | 32,340 | 216,120 |
| | Total length (bp) | 259,436,098 | 1,180,881,063 | 252,418,869 | 1,832,920,246 |
a Trimmed reads output.
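
The coverage rows in the Canu summary express total sequence length as a multiple of the genome size. The sketch below shows that arithmetic, and the reverse calculation of how many bases a target coverage would require. It assumes the roughly 730 Mb genome size quoted in the abstract; the genome-size setting actually used by the assembler is not stated in this record, so the printed values need not match the table exactly. The function names are hypothetical.

```python
GENOME_SIZE_BP = 730_000_000  # assumption: "about 730 Mb" from the abstract


def fold_coverage(total_bases, genome_size_bp=GENOME_SIZE_BP):
    """Fold coverage = sequenced bases divided by genome size."""
    return total_bases / genome_size_bp


def bases_needed(target_coverage, genome_size_bp=GENOME_SIZE_BP):
    """Bases required to reach a target fold coverage for a given genome size."""
    return target_coverage * genome_size_bp


# Total loaded read lengths (bp) taken from the Canu summary table.
loaded_bp = {
    "1st": 1_495_987_647,
    "2nd": 6_216_312_936,
    "3rd": 1_767_114_081,
    "Combined": 9_479_414_664,
}
for run, bases in loaded_bp.items():
    print(f"{run}: {fold_coverage(bases):.2f}x")

# Example of scaling to another (hypothetical) genome size and target depth.
print(f"{bases_needed(30, 2_000_000_000):,} bp for 30x of a 2 Gb genome")
```
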
Summary of Miniasm assemblies with Minimap and Racon.
| Step | Metric | Round | 1st | 2nd | 3rd | Combined |
|---|---|---|---|---|---|---|
| Raw file | Total size (Gb) | | 2.83 | 11.71 | 3.34 | 17.9 |
| Minimap | File size (bytes) | | 607,227,298 | 5,089,824,937 | 546,909,744 | 13,226,110,131 |
| Miniasm | File size (bytes) | | 1,282,822 | 177,933,354 | 2,126,525 | 368,271,934 |
| | Total length (bp) | | 1,286,782 | 176,978,175 | 2,139,682 | 370,303,449 |
| Racon | Total length (bp) | 1 | 1,289,492 | 177,650,167 | 2,145,749 | 373,675,134 |
| | | 2 | 1,278,467 | 177,931,139 | 2,141,277 | 374,668,365 |
| | | 3 | 1,262,947 | 177,915,228 | 2,127,089 | 374,934,532 |
| | | 4 | 1,247,138 | 177,805,838 | 2,112,239 | 375,048,732 |
| | | 5 | 1,232,808 | 177,683,528 | 2,097,341 | 375,105,174 |
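
One way to read the Racon rows is to look at how much the total assembly length changes from one polishing round to the next. The sketch below does this for the combined data set, using only the totals from the table. The helper `round_deltas` is hypothetical, and a shrinking change in length is at most a rough hint that polishing is settling; it does not measure base accuracy.

```python
def round_deltas(lengths):
    """Return the change in total length between consecutive entries."""
    return [later - earlier for earlier, later in zip(lengths, lengths[1:])]


# Combined totals (bp) from the table: Miniasm output, then Racon rounds 1 to 5.
combined_bp = [
    370_303_449,
    373_675_134,
    374_668_365,
    374_934_532,
    375_048_732,
    375_105_174,
]
deltas = round_deltas(combined_bp)
for rnd, delta in enumerate(deltas, start=1):
    previous = combined_bp[rnd - 1]
    print(f"round {rnd}: {delta:+,} bp ({100 * delta / previous:+.3f}%)")
```
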
Figure 2. Five rounds of polishing with Racon after Miniasm assembly, plotted against each chromosome of the sorghum reference. The x-axis represents the reference chromosomes and the y-axis represents the 2261 contigs. Forward matches are displayed in red and reverse matches in blue.
Comparison between Canu and Miniasm using final assembly results.
| | Canu | Miniasm |
|---|---|---|
| Number of contigs | 6124 | 2661 |
| Assembled read length (bp) | 344,366,012 | 375,105,174 |
| N50 (bp) | 98,000,000 | 199,000,000 |
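
For readers unfamiliar with the N50 metric in the comparison above, the following sketch shows the standard definition: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly length. The contig lengths in the example are made up for illustration and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the cumulative sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


# Toy contig lengths (hypothetical): total is 30, so half is 15.
toy_contigs = [2, 10, 4, 8, 6]
print(n50(toy_contigs))
```

Because the metric depends only on the contig length distribution, it says nothing about whether the contigs are correct, which is why the assemblies were also mapped against the reference.
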