Young-Joon Ko1, Jung Sun Kim2, Sangsoo Kim1.
Abstract
Advances in next-generation sequencing technologies have led to the release of whole-genome sequences for many species. However, assembling a whole genome precisely remains difficult because of the inherent limitations of short-read sequencing. Plant genomes are particularly hard to assemble compared with those of microorganisms or animals, owing to whole-genome duplications, repeat insertions, and Numt insertions. In this study, we describe a new method for detecting misassembled sequence regions of Brassica rapa using genotyping-by-sequencing followed by MadMapper clustering. The candidate misassembled regions were cross-checked against BAC clone paired-end library sequences mapped to the reference genome. The results were further verified using gene synteny relations between Brassica rapa and Arabidopsis thaliana. We conclude that this method will help detect misassembled regions and can be applied to incompletely assembled reference genomes of a variety of species.
Keywords: BAC end library; gene synteny; genotyping-by-sequencing; misassembly; next-generation sequencing; reference genome
Year: 2017 PMID: 29307138 PMCID: PMC5769862 DOI: 10.5808/GI.2017.15.4.128
Source DB: PubMed Journal: Genomics Inform ISSN: 1598-866X
Fig. 1. Total workflow of misMM. GBS, genotyping-by-sequencing.
Fig. 2. Two-dimensional matrix graphs before and after the MadMapper block shuffling analysis. A01 through A10 indicate the Brassica rapa pseudomolecules.
Results of misassembled block detection analysis in Brassica rapa with misMM
| Block No. | MCB Chr No. | MCB start position | MCB end position | MCB block size (bp) | ADB Chr No. | ADB start position | ADB end position | ADB block size (bp) | Synteny relation | Count of BAC end |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | A01 | 10,335,503 | 10,336,457 | 955 | A07 | 2,718,763 | 2,760,427 | 41,665 | No gene | 1 |
| | | | | | | 2,970,361 | 3,340,395 | 370,035 | | |
| | | | | | | 4,284,326 | 5,685,009 | 1,400,684 | | |
| | | | | | | 5,782,516 | 8,114,350 | 2,331,835 | | |
| | | | | | | 8,306,623 | 8,390,460 | 83,838 | | |
| | | | | | | 8,462,236 | 9,063,378 | 601,143 | | |
| 2 | A01 | 11,453,104 | 11,488,558 | 35,455 | A04 | 3,271,457 | 4,978,203 | 1,706,747 | Related | 6 |
| | | | | | | 5,227,803 | 6,734,498 | 1,506,696 | | |
| | | | | | | 7,605,871 | 7,605,928 | 58 | | |
| 3 | A01 | 11,830,981 | - | 1 | A05 | 10,274,396 | 14,490,617 | 4,216,222 | Related | 1 |
| | A07 | 13,576,261 | - | 1 | | 14,602,065 | 14,704,957 | 102,893 | | |
| | A08 | 1,389,252 | 1,419,543 | 30,292 | | 14,946,890 | 15,735,698 | 788,809 | | |
| | | | | | | 6,968,479 | 7,090,412 | 121,934 | | |
| | | | | | | 7,231,217 | 7,782,948 | 551,732 | | |
| | | | | | | 7,825,594 | 8,040,473 | 214,880 | | |
| | | | | | | 8,683,679 | 9,511,317 | 827,639 | | |
| 4 | A01 | 17,853,386 | 17,853,417 | 32 | A03 | 28,233,583 | 28,599,515 | 365,933 | Related | 2 |
| | A01 | 21,422,470 | 21,756,693 | 334,224 | | 28,622,787 | 29,191,693 | 568,907 | | |
| | A02 | 26,385,973 | 26,386,023 | 51 | | 29,806,067 | 31,527,446 | 1,721,380 | | |
| 5 | A01 | 23,266,604 | 23,424,555 | 157,952 | A06 | 10,280,840 | 10,357,155 | 76,316 | Related | 13 |
| | A02 | 13,440,136 | 13,440,137 | 2 | | 10,732,633 | 14,236,176 | 3,503,544 | | |
| | A02 | 21,066,162 | 21,066,274 | 113 | | 14,450,457 | 14,559,524 | 109,068 | | |
| | | | | | | 8,950,753 | 10,162,388 | 1,211,636 | | |
| 6 | A01 | 8,706,169 | 8,950,670 | 244,502 | A09 | 11,293,419 | 11,528,445 | 235,027 | Related | 63 |
| | A06 | 19,457,789 | 19,703,630 | 245,842 | | 11,610,344 | 14,668,929 | 3,058,586 | | |
| | | | | | | 14,915,372 | 15,794,376 | 879,005 | | |
| | | | | | | 15,949,568 | 18,361,629 | 2,412,062 | | |
| | | | | | | 19,064,188 | 22,487,944 | 3,423,757 | | |
| | | | | | | 22,634,782 | 23,337,316 | 702,535 | | |
| | | | | | | 23,489,828 | 23,489,836 | 9 | | |
| 7 | A02 | 21,427,161 | - | 1 | A10 | 11,579,416 | - | 1 | No gene | 0 |
| | | | | | | 176,000 | 1,638,829 | 1,462,830 | | |
| | | | | | | 1,765,780 | 1,766,618 | 839 | | |
| | | | | | | 1,786,668 | 1,792,156 | 5,489 | | |
| | | | | | | 3,686,351 | 5,224,789 | 1,538,439 | | |
| | | | | | | 5,335,436 | 5,459,255 | 123,820 | | |
| | | | | | | 5,648,752 | 5,693,352 | 44,601 | | |
| 8 | A03 | 15,343,238 | - | 1 | A08 | 2,368,697 | 3,803,367 | 1,434,671 | Related | 2 |
| | | | | | | 4,037,929 | 4,357,798 | 319,870 | | |
| | | | | | | 4,787,505 | 4,835,708 | 48,204 | | |
| | | | | | | 5,188,553 | 6,046,948 | 858,396 | | |
| | | | | | | 7,117,019 | 7,501,164 | 384,146 | | |
| | | | | | | 7,559,836 | 8,722,062 | 1,162,227 | | |
| | | | | | | 9,000,508 | 9,002,057 | 1,550 | | |
| 9 | A05 | 8,144,773 | 8,162,600 | 17,828 | A08 | 19,141,593 | 19,320,715 | 179,123 | Related | 16 |
| | | 8,217,883 | 8,234,265 | 16,383 | | 19,394,784 | 19,497,679 | 102,896 | | |
| | | 8,250,451 | 8,352,384 | 101,934 | | 19,568,112 | 19,621,358 | 53,247 | | |
| | | | | | | 19,674,213 | 19,711,491 | 37,279 | | |
| 10 | A05 | 9,669,449 | 10,079,638 | 410,190 | A01 | 10,376,926 | 10,494,252 | 117,327 | Related | 23 |
| | | | | | | 10,687,155 | 11,394,090 | 706,936 | | |
| | | | | | | 11,519,211 | 11,744,579 | 225,369 | | |
| | | | | | | 11,900,084 | 16,836,976 | 4,936,893 | | |
| | | | | | | 16,848,291 | 17,125,316 | 277,026 | | |
| | | | | | | 17,226,336 | 17,789,202 | 562,867 | | |
| | | | | | | 17,860,877 | 18,575,071 | 714,195 | | |
| | | | | | | 9,305,241 | 9,707,259 | 402,019 | | |
| | | | | | | 9,711,107 | 10,267,937 | 556,831 | | |
| 11 | A07 | 2,319,220 | 2,321,114 | 1,895 | A01 | 24,324,484 | 24,353,432 | 28,949 | Related | 0 |
| | | | | | | 24,402,955 | 24,488,344 | 85,390 | | |
| | | | | | | 24,619,832 | 24,806,034 | 186,203 | | |
| | | | | | | 24,920,288 | 24,920,419 | 132 | | |
| 12 | A07 | 3,920,950 | 4,009,069 | 88,120 | A10 | 11,579,416 | - | 1 | Related | 0 |
| | A08 | 3,927,665 | - | 1 | | 176,000 | 1,638,829 | 1,462,830 | | |
| | | | | | | 1,765,780 | 1,766,618 | 839 | | |
| | | | | | | 1,786,668 | 1,792,156 | 5,489 | | |
| | | | | | | 3,686,351 | 5,224,789 | 1,538,439 | | |
| | | | | | | 5,335,436 | 5,459,255 | 123,820 | | |
| | | | | | | 5,648,752 | 5,693,352 | 44,601 | | |
| 13 | A07 | 8,271,542 | 8,274,604 | 3,063 | A03 | 12,032,914 | 12,032,953 | 40 | Related | 5 |
| | | | | | | 12,049,203 | 12,406,487 | 357,285 | | |
| | | | | | | 12,473,776 | 13,917,498 | 1,443,723 | | |
| | | | | | | 14,019,224 | 14,200,642 | 181,419 | | |
| | | | | | | 14,222,379 | 14,355,939 | 133,561 | | |
| 14 | A08 | 11,266,789 | - | 1 | A03 | 25,996,840 | 26,033,638 | 36,799 | Related | 0 |
| | | | | | | 26,067,147 | 27,037,677 | 970,531 | | |
| | | | | | | 27,139,966 | 27,943,662 | 803,697 | | |
| 15 | A05 | 8,552,907 | 8,593,005 | 40,099 | A02 | 11,596,619 | 13,185,910 | 1,589,292 | Related | 0 |
| | A08 | 1,584,456 | 1,594,851 | 10,396 | | 13,498,575 | 14,449,253 | 950,679 | | |
| | | | | | | 14,804,284 | 18,303,089 | 3,498,806 | | |
| | | | | | | 18,558,399 | 19,378,431 | 820,033 | | |
| | | | | | | 19,548,342 | 19,666,145 | 117,804 | | |
| | | | | | | 19,787,176 | - | 1 | | |
| 16 | A08 | 4,941,300 | 4,969,852 | 28,553 | A10 | 11,146,660 | 11,437,447 | 290,788 | Related | 0 |
| | | | | | | 11,664,229 | - | 1 | | |
| | | | | | | 11,764,627 | 11,814,368 | 49,742 | | |
| | | | | | | 11,928,253 | 12,032,475 | 104,223 | | |

MCB, misassembled candidate block; ADB, adjacent to destination block. Empty cells continue the entry above; "-" marks a single-position block.
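
The block sizes in the table above are consistent with inclusive coordinates, that is, size = end - start + 1, with single-position blocks counted as 1 bp. A minimal Python sketch of that arithmetic follows; the function name `block_size` is hypothetical and is not part of misMM.

```python
def block_size(start, end=None):
    """Inclusive length in bp of a block; end=None means a single position."""
    if end is None:
        return 1
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1

# Block 1, misassembled candidate block on A01
print(block_size(10_335_503, 10_336_457))  # 955
# Block 3, single-position candidate on A01
print(block_size(11_830_981))  # 1
```

Checking a few rows this way is a quick guard against coordinate columns that were shifted during extraction.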
Fig. 3. Example of a Brassica rapa genetic map made with the misMM pipeline. Red indicates misassembled candidate blocks.
Example of validation of BAC end library results
| BAC ends library ID | gi No. | Length (bp) | Identity (%) | Chr No. | Start position | End position |
|---|---|---|---|---|---|---|
| KBrB037L22F | 84732862 | 671 | 97.93 | A01 | 11,474,904 | 11,475,144 |
| KBrB037L22R | 84732863 | 671 | 99.4 | A04 | 4,869,416 | 4,870,085 |
| KBrB039C19R | 84733951 | 869 | 99.65 | A01 | 11,471,320 | 11,472,188 |
| KBrB039C19F | 84733950 | 822 | 99.76 | A04 | 4,884,036 | 4,884,855 |
| KBrB043O24F | 84737591 | 874 | 99.89 | A01 | 11,452,951 | 11,453,822 |
| KBrB043O24R | 84737592 | 816 | 100 | A04 | 4,884,025 | 4,884,840 |
| KBrB077H15F | 84762968 | 617 | 98.92 | A01 | 11,474,904 | 11,475,088 |
| KBrB077H15R | 84762969 | 646 | 100 | A04 | 4,884,386 | 4,885,031 |
| KBrB097P17F | 114827207 | 1,000 | 98.2 | A01 | 11,471,303 | 11,472,294 |
| KBrB097P17R | 114827208 | 937 | 98.16 | A04 | 4,883,252 | 4,884,169 |
| KBrH087A11R | 84341421 | 831 | 99.88 | A01 | 11,466,761 | 11,467,587 |
| KBrH087A11F | 84341072 | 844 | 99.63 | A04 | 4,977,838 | 4,978,643 |
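
The BAC end cross-check can be illustrated by pairing the forward (F) and reverse (R) ends of each clone and keeping the pairs whose two ends map to different chromosomes. The sketch below is only an illustration using the rows of the table above. The tuple layout and the function name `discordant_pairs` are assumptions, not the pipeline's actual interface.

```python
from collections import defaultdict

# (BAC end ID, chromosome, start, end), copied from the table above
hits = [
    ("KBrB037L22F", "A01", 11_474_904, 11_475_144),
    ("KBrB037L22R", "A04", 4_869_416, 4_870_085),
    ("KBrB039C19R", "A01", 11_471_320, 11_472_188),
    ("KBrB039C19F", "A04", 4_884_036, 4_884_855),
    ("KBrB043O24F", "A01", 11_452_951, 11_453_822),
    ("KBrB043O24R", "A04", 4_884_025, 4_884_840),
    ("KBrB077H15F", "A01", 11_474_904, 11_475_088),
    ("KBrB077H15R", "A04", 4_884_386, 4_885_031),
    ("KBrB097P17F", "A01", 11_471_303, 11_472_294),
    ("KBrB097P17R", "A04", 4_883_252, 4_884_169),
    ("KBrH087A11R", "A01", 11_466_761, 11_467_587),
    ("KBrH087A11F", "A04", 4_977_838, 4_978_643),
]

def discordant_pairs(hits):
    """Map clone ID -> (F chromosome, R chromosome) for pairs on different chromosomes."""
    by_clone = defaultdict(dict)
    for end_id, chrom, start, end in hits:
        # The last character of the ID is the read direction (F or R).
        clone, direction = end_id[:-1], end_id[-1]
        by_clone[clone][direction] = chrom
    return {
        clone: (ends["F"], ends["R"])
        for clone, ends in by_clone.items()
        if "F" in ends and "R" in ends and ends["F"] != ends["R"]
    }

pairs = discordant_pairs(hits)
print(len(pairs))  # 6
for clone, (f_chrom, r_chrom) in sorted(pairs.items()):
    print(clone, f_chrom, r_chrom)
```

All six clones link A01 to A04, which matches the BAC end count reported for block 2 in the results table.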
Information on genes included in example misassembled candidate block
| Chr No. | Type | Start point | End point | Gene ID |
|---|---|---|---|---|
| A01 | Gene | 11,455,026 | 11,470,735 | Bra033489 |
| A01 | Gene | 11,451,545 | 11,454,600 | Bra033490 |
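
Both genes fall on candidate block 2 (A01: 11,453,104 to 11,488,558 in the results table). As a hedged illustration of how such membership could be checked, the sketch below computes the overlap between each gene and the block using inclusive coordinates. The helper `overlap_bp` is a hypothetical name introduced here.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of bases shared by two inclusive intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

# Candidate block 2 on A01, from the results table
block = (11_453_104, 11_488_558)

# Genes from the table above
genes = {
    "Bra033489": (11_455_026, 11_470_735),
    "Bra033490": (11_451_545, 11_454_600),
}

for gene_id, (start, end) in genes.items():
    shared = overlap_bp(start, end, *block)
    gene_len = end - start + 1
    print(gene_id, shared, gene_len)
```

Bra033489 lies entirely inside the block, whereas Bra033490 starts upstream of it and overlaps it only partially.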
Example of protein ortholog list, sorted by Brassica rapa gene coordinates
| B. rapa gene ID | B. rapa Chr No. | B. rapa start position | B. rapa end position | A. thaliana gene ID | A. thaliana Chr No. | A. thaliana start position | A. thaliana end position | Comments |
|---|---|---|---|---|---|---|---|---|
| Bra033497 | A01 | 11,382,249 | 11,386,827 | AT4G15570 | Chr4 | 8,892,607 | 8,898,999 | - |
| Bra033496 | A01 | 11,388,925 | 11,390,027 | AT4G15563 | Chr4 | 8,890,879 | 8,892,526 | - |
| Bra033495 | A01 | 11,393,659 | 11,396,663 | AT4G15560 | Chr4 | 8,883,907 | 8,887,565 | - |
| Bra033494 | A01 | 11,410,610 | 11,412,043 | AT4G15550 | Chr4 | 8,877,590 | 8,879,327 | - |
| Bra033493 | A01 | 11,412,702 | 11,414,443 | AT4G15545 | Chr4 | 8,875,918 | 8,877,799 | - |
| Bra033492 | A01 | 11,445,862 | 11,446,743 | AT5G49420 | Chr5 | 20,034,674 | 20,036,170 | - |
| Bra033491 | A01 | 11,450,091 | 11,451,172 | AT4G14320 | Chr4 | 8,241,732 | 8,243,910 | - |
| Bra033490 | A01 | 11,451,545 | 11,454,600 | AT4G14330 | Chr4 | 8,244,194 | 8,247,444 | Misassembled candidate |
| Bra033489 | A01 | 11,455,026 | 11,470,735 | AT4G14350 | Chr4 | 8,256,086 | 8,260,787 | Misassembled candidate |
| Bra039534 | A01 | 11,504,946 | 11,505,630 | AT2G35280 | Chr2 | 14,859,378 | 14,860,200 | - |
| Bra039535 | A01 | 11,504,946 | 11,505,422 | AT2G35280 | Chr2 | 14,859,378 | 14,860,200 | - |
| Bra039536 | A01 | 11,504,994 | 11,505,630 | AT2G35280 | Chr2 | 14,859,378 | 14,860,200 | - |
| Bra039538 | A01 | 11,510,855 | 11,512,648 | AT3G59380 | Chr3 | 21,944,178 | 21,945,943 | - |
| Bra039539 | A01 | 11,514,776 | 11,515,144 | AT4G15530 | Chr4 | 8,864,828 | 8,870,967 | - |
| Bra039540 | A01 | 11,516,583 | 11,521,200 | AT4G15530 | Chr4 | 8,864,828 | 8,870,967 | - |
| Bra039541 | A01 | 11,521,728 | 11,523,067 | AT4G15520 | Chr4 | 8,862,815 | 8,864,618 | - |
Example of protein ortholog list, sorted by Arabidopsis thaliana gene coordinates
| B. rapa gene ID | B. rapa Chr No. | B. rapa start position | B. rapa end position | A. thaliana gene ID | A. thaliana Chr No. | A. thaliana start position | A. thaliana end position | Comments |
|---|---|---|---|---|---|---|---|---|
| Bra032781 | A04 | 4,968,584 | 4,971,945 | AT4G14290 | Chr4 | 8,225,481 | 8,230,281 | Included in ADB |
| Bra032782 | A04 | 4,962,259 | 4,963,651 | AT4G14305 | Chr4 | 8,235,093 | 8,236,715 | Included in ADB |
| Bra033490 | A01 | 11,451,545 | 11,454,600 | AT4G14330 | Chr4 | 8,244,194 | 8,247,444 | Included in MCB |
| Bra033489 | A01 | 11,455,026 | 11,470,735 | AT4G14350 | Chr4 | 8,256,086 | 8,260,787 | Included in MCB |
| Bra033487 | A04 | 4,917,814 | 4,918,407 | AT4G14380 | Chr4 | 8,285,766 | 8,286,772 | Included in ADB |
| Bra033486 | A04 | 4,915,949 | 4,917,119 | AT4G14385 | Chr4 | 8,286,986 | 8,288,800 | Included in ADB |
| Bra033483 | A04 | 4,882,642 | 4,883,358 | AT4G14440 | Chr4 | 8,306,745 | 8,307,753 | Included in ADB |
| Bra033482 | A04 | 4,873,477 | 4,873,797 | AT4G14450 | Chr4 | 8,309,474 | 8,310,058 | Included in ADB |
Alternative alignments due to genome triplication have been removed.
ADB, adjacent to destination block; MCB, misassembled candidate blocks.
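
The synteny argument in the second ortholog table can be sketched in code: when orthologs are ordered by Arabidopsis thaliana coordinates, the two A01 genes sit inside an otherwise continuous run of A04 genes. The Python sketch below flags such embedded runs. It is a simplified illustration restricted to a single A. thaliana region. The tuple layout and the function name `embedded_runs` are assumptions, not the authors' implementation.

```python
from itertools import groupby

# (B. rapa gene, B. rapa chromosome, A. thaliana chromosome, A. thaliana start),
# copied from the table sorted by A. thaliana coordinates
orthologs = [
    ("Bra032781", "A04", "Chr4", 8_225_481),
    ("Bra032782", "A04", "Chr4", 8_235_093),
    ("Bra033490", "A01", "Chr4", 8_244_194),
    ("Bra033489", "A01", "Chr4", 8_256_086),
    ("Bra033487", "A04", "Chr4", 8_285_766),
    ("Bra033486", "A04", "Chr4", 8_286_986),
    ("Bra033483", "A04", "Chr4", 8_306_745),
    ("Bra033482", "A04", "Chr4", 8_309_474),
]

def embedded_runs(orthologs):
    """Return (genes, own chromosome, flanking chromosome) for each run of
    B. rapa genes flanked on both sides by the same other chromosome."""
    ordered = sorted(orthologs, key=lambda r: (r[2], r[3]))
    # Collapse consecutive genes on the same B. rapa chromosome into runs.
    runs = [(chrom, [r[0] for r in rows])
            for chrom, rows in groupby(ordered, key=lambda r: r[1])]
    found = []
    for i in range(1, len(runs) - 1):
        left, right = runs[i - 1][0], runs[i + 1][0]
        if left == right:
            found.append((runs[i][1], runs[i][0], left))
    return found

for genes, own, flank in embedded_runs(orthologs):
    print(genes, "on", own, "embedded in", flank)
```

For this table the only embedded run is Bra033490 and Bra033489 on A01, flanked by A04 genes, in agreement with the MCB and ADB labels in the Comments column.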