Xiaohui Yang1,2, Yu Yang1, Jian Ling2, Jiantao Guan3, Xiao Guo1, Daofeng Dong1, Liping Jin2, Sanwen Huang2, Jun Liu3, Guangcun Li1,2.
Abstract
Traditional approaches for sequencing the insertion ends of bacterial artificial chromosome (BAC) libraries are laborious and expensive, and are currently among the bottlenecks limiting a better understanding of the genomic features of auto- or allopolyploid species. Here, we developed a highly efficient and low-cost BAC end analysis protocol, named BAC-anchor, to identify paired-end reads containing large internal gaps. Our approach focused mainly on identifying high-throughput sequencing reads that carry restriction enzyme cutting sites and on searching for large internal gaps based on the mapping locations of both ends of the reads. We sequenced and analysed eight libraries containing over 3 200 000 BAC end clones derived from the BAC library of the tetraploid potato cultivar C88 digested with two restriction enzymes, Cla I and Mlu I. About 25% of the BAC end reads carrying cutting sites generated a 60-100 kb internal gap in the potato DM reference genome, which was consistent with the mapping results of Sanger sequencing of the BAC end clones and indicated large differences between autotetraploid and haploid genotypes in potato. A total of 5341 Cla I- and 165 Mlu I-derived unique reads were distributed on different chromosomes of the DM reference genome and could be used to establish a physical map of target regions and to assemble the C88 genome. The reads that matched different chromosomes are especially significant for the further assembly of complex polyploid genomes. Our study provides an example of analysing high-coverage BAC end libraries at low sequencing cost and is a resource for further genome sequencing studies.
Keywords: BAC end library; BAC-anchor; autotetraploid potato; heterozygosity; whole-genome profiling
Year: 2019 PMID: 31254434 PMCID: PMC6953197 DOI: 10.1111/pbi.13203
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Figure 1. The pipelines of BAC end library construction and BAC‐anchor analysis. (a) Construction of BAC end libraries; (b) sequencing, mapping and identification of three specific types of uniquely mapped reads; (c) definition of possible gaps based on the Matchsamechr reads.
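The classification step in panels (b) and (c) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The recognition sequences used for Cla I (ATCGAT) and Mlu I (ACGCGT) are the standard ones and are not stated in this record. The names `Hit`, `split_at_site` and `classify` are hypothetical, the mapping positions are assumed to come from an external aligner, and treating a read with any unmapped flank as "No matched" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Standard recognition sequences (assumption: not given in the source record).
SITES = {"ClaI": "ATCGAT", "MluI": "ACGCGT"}


@dataclass
class Hit:
    """Hypothetical container for one flank's unique mapping position."""
    chrom: str
    pos: int  # 1-based coordinate on the reference genome


def split_at_site(read: str, site: str) -> Optional[Tuple[str, str]]:
    """Return the two flanks around the first cutting site, or None if absent."""
    i = read.upper().find(site)
    if i < 0:
        return None
    return read[:i], read[i + len(site):]


def classify(left: Optional[Hit], right: Optional[Hit]) -> Tuple[str, Optional[int]]:
    """Label a read by where its two flanks map; report the gap in bp if defined."""
    if left is None or right is None:
        return "No matched", None  # assumption: any unmapped flank counts here
    if left.chrom != right.chrom:
        return "Matchdifferentchr", None
    return "Matchsamechr", abs(right.pos - left.pos)


flanks = split_at_site("AAATCGATGGG", SITES["ClaI"])
label, gap = classify(Hit("chr01", 1_000), Hit("chr01", 81_000))
print(flanks, label, gap)
```

A Matchsamechr read whose gap falls in the 60–100 kb range would then be a candidate BAC end pair, while a gap of a few kilobases or less points to an ordinary insertion read.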
Statistics of different types of mapped reads
| Type of mapped reads | | Numbers of reads from | Numbers of reads from |
|---|---|---|---|
| Total reads carrying the enzyme cutting sites | | 33 859 069 | 24 915 443 |
| Unique reads carrying the enzyme cutting sites after removing the duplicates | | 9 819 130 | 6 228 861 |
| Matchsamechr unique reads | Total BAC end reads | 1 549 145 | 438 688 |
| Matchsamechr unique reads | Total insertion reads | 445 813 | 194 327 |
| Matchdifferentchr unique reads | | 2 433 257 | 1 061 139 |
| No matched reads | | 22 046 957 | 18 167 773 |
| Mapped reads/total unique reads | | 45.10% | 27.20% |
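The last row of the table can be reproduced from the rows above it. The sketch below assumes that "mapped reads" means the sum of the two Matchsamechr rows and the Matchdifferentchr row, divided by the deduplicated unique reads. This reading is inferred from the numbers, not stated in the source. The two columns are referred to only by position because their enzyme labels are not preserved in the header, and `mapped_fraction` is a hypothetical helper name.

```python
def mapped_fraction(bac_end: int, insertion: int, different_chr: int,
                    total_unique: int) -> float:
    """Share of deduplicated unique reads whose flanks both map to the reference."""
    return (bac_end + insertion + different_chr) / total_unique


# First and second read-count columns of the table above.
first_col = mapped_fraction(1_549_145, 445_813, 2_433_257, 9_819_130)
second_col = mapped_fraction(438_688, 194_327, 1_061_139, 6_228_861)

print(f"{first_col:.2%}  {second_col:.2%}")
```

Under this reading, both values round to the percentages reported in the table.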
Mapped gaps of reads before and after restriction enzyme cutting site alignment to the potato DM reference genome sequence
| Internal gap (kb) | BAC end reads, multi‐mapped | BAC end reads, unique mapped | BAC end reads, multi‐mapped | BAC end reads, unique mapped | Insertion reads, multi‐mapped | Insertion reads, unique mapped | Insertion reads, multi‐mapped | Insertion reads, unique mapped |
|---|---|---|---|---|---|---|---|---|
| 1–10 | 981 160 | 627 506 | 185 908 | 131 947 | 21 361 | 6519 | 27 372 | 1376 |
| 10–20 | 13 504 | 2848 | 494 | 5 | 9282 | 3779 | 7249 | 923 |
| 20–30 | 11 362 | 1306 | 16 448 | 2732 | 5403 | 303 | 427 | 206 |
| 30–40 | 6901 | 1231 | 416 | 163 | 2220 | 65 | 813 | 24 |
| 40–50 | 7680 | 2957 | 19 214 | 44 | 3267 | 47 | 1346 | 1 |
| 50–60 | 13 152 | 7679 | 15 363 | 2180 | 2277 | 26 | 345 | 19 |
| 60–70 | 43 461 | 35 264 | 20 709 | 19 262 | 2072 | 234 | 186 | 6 |
| 70–80 | 94 860 | 77 251 | 69 203 | 13 816 | 2810 | 7 | 402 | 3 |
| 80–90 | 128 563 | 62 421 | 3092 | 2179 | 33 891 | 459 | 166 | 3 |
| 90–100 | 28 841 | 19 425 | 17 827 | 9643 | 1692 | 9 | 69 | 4 |
| 100–110 | 9816 | 5388 | 1676 | 295 | 3796 | 6 | 31 | 0 |
| 110–120 | 5821 | 3585 | 218 | 24 | 1442 | 12 | 18 | 0 |
| 120–130 | 5454 | 3602 | 132 | 0 | 985 | 1 | 25 | 4 |
| 130–140 | 2114 | 1295 | 3024 | 536 | 1326 | 5 | 29 | 10 |
| 140–150 | 751 | 466 | 209 | 52 | 765 | 34 | 102 | 0 |
| Total | 1 353 440 | 852 224 | 353 933 | 182 878 | 92 589 | 11 506 | 38 580 | 2579 |
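As a worked check of the "about 25%" figure quoted in the abstract, the sketch below sums the 60–100 kb rows of the two "Unique mapped" columns that sit in the first four data columns of the table and divides by the column totals. Reading those four columns as the BAC end read columns is an assumption, and the variable names are illustrative only.

```python
# 60-70, 70-80, 80-90 and 90-100 kb rows of the two unique-mapped columns
# taken to be BAC end reads (first and second library, in table order).
window_first = [35_264, 77_251, 62_421, 19_425]
window_second = [19_262, 13_816, 2_179, 9_643]

total_first = 852_224   # "Total" row, first unique-mapped column
total_second = 182_878  # "Total" row, second unique-mapped column

share_first = sum(window_first) / total_first
share_second = sum(window_second) / total_second

print(f"{share_first:.1%}  {share_second:.1%}")
```

Both shares come out between roughly 23% and 25%, in line with the rounded figure given in the abstract.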
Figure 2. Mapped gap distribution of BAC end and genomic reads based on alignment to the potato DM genome sequence.
Distribution of BAC end reads containing a 60–100 kb internal gap on the chromosomes of the reference genome sequence
| Chromo. | Length (bp) | Cla I total | Cla I unique | Mlu I total | Mlu I unique | Non‐duplicate reads with < 100 kb distance between adjacent two BAC ends | Non‐duplicate reads with < 100 kb distance between adjacent two BAC ends |
|---|---|---|---|---|---|---|---|
| chr00 | – | 6940 | 283 | 5 | 2 | 253 | 30 |
| chr01 | 88 663 952 | 6413 | 471 | 960 | 12 | 343 | 128 |
| chr02 | 48 614 681 | 1921 | 174 | 433 | 4 | 114 | 60 |
| chr03 | 62 190 286 | 13 981 | 654 | 5073 | 34 | 563 | 91 |
| chr04 | 72 208 621 | 2315 | 232 | 0 | 0 | 143 | 89 |
| chr05 | 52 070 158 | 3202 | 255 | 939 | 7 | 175 | 80 |
| chr06 | 59 532 096 | 5414 | 367 | 11 772 | 19 | 266 | 101 |
| chr07 | 56 760 843 | 1275 | 140 | 2 | 2 | 91 | 49 |
| chr08 | 56 938 457 | 3278 | 166 | 353 | 9 | 110 | 56 |
| chr09 | 61 540 751 | 7194 | 444 | 22 | 15 | 366 | 78 |
| chr10 | 59 756 223 | 14 728 | 775 | 88 | 24 | 670 | 105 |
| chr11 | 45 475 667 | 7117 | 347 | 126 | 13 | 287 | 60 |
| chr12 | 61 165 649 | 83 632 | 1033 | 1146 | 24 | 937 | 96 |
| Total | – | 157 410 | 5341 | 20 919 | 165 | 4318 | 1023 |
Figure 3. Dense coverage area of BAC end reads with a 60–100 kb internal gap on chromosome 1 of the potato DM genome sequence.
Alignment category and interlength of Sanger‐sequenced BAC‐end clones on the DM reference genome sequence
| Alignment category | Interlength (kb) | Numbers of BAC‐end clones |
|---|---|---|
| Mapped same chromosome | <1 | 4 |
| | 1–10 | 4 |
| | 30–40 | 1 |
| | 40–50 | 1 |
| | 50–60 | 2 |
| | 60–70 | 3 |
| | 70–80 | 8 |
| | 80–90 | 10 |
| | 90–100 | 5 |
| | 100–110 | 2 |
| | 110–120 | 2 |
| | 120–130 | 0 |
| | 130–140 | 0 |
| | 140–150 | 0 |
| | >150 | 9 |
| Mapped different chromosome | | 46 |
| Only one end aligned | | 9 |
| No matched | | 3 |
| Total BAC‐end clones | | 109 |
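The Sanger counts above can be tallied the same way. The sketch below checks that the categories add up to the 109 clones and computes the share of clones whose two ends map 60–100 kb apart on one chromosome. The dictionary keys are shorthand labels, not identifiers from the source.

```python
# Clones with both ends on the same chromosome, by interlength bin (kb).
same_chr = {
    "<1": 4, "1-10": 4, "30-40": 1, "40-50": 1, "50-60": 2,
    "60-70": 3, "70-80": 8, "80-90": 10, "90-100": 5,
    "100-110": 2, "110-120": 2, "120-130": 0, "130-140": 0,
    "140-150": 0, ">150": 9,
}
# Remaining alignment categories from the table.
other = {"different chromosome": 46, "one end aligned": 9, "no match": 3}

total_clones = sum(same_chr.values()) + sum(other.values())
in_window = sum(same_chr[b] for b in ("60-70", "70-80", "80-90", "90-100"))

print(total_clones, in_window, f"{in_window / total_clones:.1%}")
```

In this tally, 26 of the 109 clones fall in the 60–100 kb window, which is close to the proportion the abstract reports for the high-throughput reads.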
Figure 4. (a) Survey of genomic homogeneity of the autotetraploid potato. (b) Two examples showing SNPs identified by high‐throughput sequencing and verified experimentally by Sanger sequencing. −, antisense strand; +, sense strand.