Jungmin Ha, Brian Abernathy, William Nelson, David Grant, Xiaolei Wu, Henry T Nguyen, Gary Stacey, Yeisoo Yu, Rod A Wing, Randy C Shoemaker, Scott A Jackson.
Abstract
Soybean is a model for the legume research community because of its importance as a crop, its densely populated genetic maps, and the availability of a genome sequence. Although a whole-genome shotgun sequence and bacterial artificial chromosome (BAC) libraries are available, a high-resolution, chromosome-based physical map linked to the sequence assemblies is still needed for whole-genome alignments and to facilitate map-based gene cloning. Three independent G. max BAC libraries, combined with genetic and gene-based markers, were used to construct a minimum tiling path (MTP) of BAC clones. A total of 107,214 clones were assembled into 1355 contigs using FPC (FingerPrinted Contigs), incorporating 4628 markers, and the contigs were aligned to the G. max reference genome sequence using BAC end-sequence (BES) information. Four different MTPs were made for G. max, covering 92.6% to 95.0% of the soybean draft genome sequence (gmax1.01). Because our purpose was to pick the most reliable and complete MTP, rather than the MTP with the fewest clones, the FPC map and the draft sequence were integrated, and clones with unpaired BES were added to build a high-quality physical map with the fewest possible gaps (http://soybase.org). A physical map was also constructed for the undomesticated ancestor of soybean (G. soja) to explore genome variation between G. max and G. soja: 66,028 G. soja clones were assembled into 1053 FPC contigs covering approximately 547 Mbp of the G. max genome sequence. These physical maps for G. max and its undomesticated ancestor, G. soja, will serve as a framework for ordering sequence fragments, comparative genomics, cloning genes, and evolutionary analyses of legume genomes.
Keywords: FingerPrinted Contig; genome evolution; genome structure; whole-genome sequencing
Year: 2012 PMID: 22413085 PMCID: PMC3291501 DOI: 10.1534/g3.111.001834
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Summary of soybean BAC libraries used in the FPC maps
| Species | Library | Restriction Enzyme | Avg. Insert Size (kb) | Genome Equivalents Coverage | No. of Clones | No. of Clones Fingerprinted |
|---|---|---|---|---|---|---|
| G. max | GM_WBa | | 150 | 5.4x | 40,320 | 35,145 |
| G. max | GM_WBb | | 150 | 12.0x | 91,160 | 61,379 |
| G. max | GM_WBc | | 131 | 10.9x | 92,160 | 37,658 |
| G. soja | GSS_Ba | | 150 | 12.5x | 92,160 | 81,247 |
BAC, bacterial artificial chromosome; FPC, FingerPrinted Contigs.
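The "Genome Equivalents Coverage" column follows the usual definition: number of clones × average insert size ÷ genome size. The Python sketch below illustrates that calculation. The genome size of 1.1 Gbp is an assumption (the table does not state the value the authors used), so the printed values come out close to, but not identical with, the tabulated coverage. The same snippet sums the fingerprinted clones of the three G. max libraries, which reproduces the 134,182 valid fingerprints listed for the G. max FPC assembly.

```python
# Genome-equivalent coverage of a BAC library:
#   coverage = (number of clones x average insert size) / genome size
# ASSUMPTION: ~1.1 Gbp soybean genome; the source table does not state the value used.
GENOME_SIZE_BP = 1.1e9

# (library, avg. insert size in kb, no. of clones, no. of clones fingerprinted), from the table
LIBRARIES = [
    ("GM_WBa", 150, 40_320, 35_145),
    ("GM_WBb", 150, 91_160, 61_379),
    ("GM_WBc", 131, 92_160, 37_658),
    ("GSS_Ba", 150, 92_160, 81_247),
]


def genome_equivalents(n_clones: int, insert_kb: float, genome_bp: float = GENOME_SIZE_BP) -> float:
    """Return the number of haploid genome equivalents represented by a library."""
    return n_clones * insert_kb * 1_000 / genome_bp


for name, insert_kb, n_clones, _ in LIBRARIES:
    print(f"{name}: {genome_equivalents(n_clones, insert_kb):.1f}x")

# The three G. max (GM_*) libraries together supply the fingerprints for the G. max map.
gmax_fingerprinted = sum(fp for name, _, _, fp in LIBRARIES if name.startswith("GM_"))
print(gmax_fingerprinted)
```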
Sequence coverage length (bp) of four different MTPs of G. max
| Scaffold | gmax1.01 (bp) | Gaps (1000-N, Arachne scaffolds) | FPC Clones/Paired BES | FPC Clones/Unpaired BES | All Clones/Paired BES | All Clones/Unpaired BES |
|---|---|---|---|---|---|---|
| Gm01 | 55,915,595 | 14 | 54,031,028 | 54,244,841 | 54,433,357 | 54,601,601 |
| Gm02 | 51,656,713 | 26 | 46,688,786 | 47,183,562 | 47,929,653 | 48,513,213 |
| Gm03 | 47,781,076 | 26 | 43,827,475 | 44,370,246 | 44,853,580 | 45,265,110 |
| Gm04 | 49,243,852 | 15 | 46,627,725 | 46,846,968 | 47,116,298 | 47,312,649 |
| Gm05 | 41,936,504 | 10 | 40,348,170 | 40,564,085 | 40,845,469 | 41,053,468 |
| Gm06 | 50,722,821 | 27 | 46,260,437 | 46,788,486 | 47,211,351 | 47,644,800 |
| Gm07 | 44,683,157 | 14 | 41,102,917 | 41,164,938 | 41,920,695 | 42,048,769 |
| Gm08 | 46,995,532 | 12 | 43,259,780 | 43,501,037 | 43,820,537 | 44,082,436 |
| Gm09 | 46,843,750 | 14 | 44,028,454 | 44,385,184 | 44,620,246 | 44,965,599 |
| Gm10 | 50,969,635 | 30 | 46,425,807 | 46,591,044 | 47,456,533 | 47,653,723 |
| Gm11 | 39,172,790 | 20 | 36,518,365 | 36,952,892 | 37,127,519 | 37,458,495 |
| Gm12 | 40,113,140 | 21 | 36,686,674 | 37,102,118 | 37,428,667 | 37,907,123 |
| Gm13 | 44,408,971 | 24 | 38,222,478 | 38,577,480 | 38,771,342 | 39,016,163 |
| Gm14 | 49,711,204 | 13 | 46,563,799 | 46,777,020 | 47,097,234 | 47,295,050 |
| Gm15 | 50,939,160 | 20 | 47,896,452 | 48,328,091 | 48,564,594 | 48,828,076 |
| Gm16 | 37,397,385 | 23 | 33,365,708 | 33,564,017 | 34,143,930 | 34,511,921 |
| Gm17 | 41,906,774 | 15 | 38,264,930 | 38,544,807 | 39,073,164 | 39,268,910 |
| Gm18 | 62,308,140 | 25 | 58,408,218 | 58,891,175 | 59,710,660 | 60,015,568 |
| Gm19 | 50,589,441 | 17 | 47,831,756 | 48,094,335 | 48,742,366 | 48,956,251 |
| Gm20 | 46,773,167 | 11 | 43,107,226 | 43,451,533 | 43,160,043 | 43,575,687 |
| Total | 950,068,807 | 377 | 879,466,185 | 885,923,859 | 894,027,238 | 899,974,612 |
| Additional coverage from unpaired BES | | | | 6,457,674 | | 5,947,374 |
MTP, minimum tiling path; FPC, FingerPrinted Contigs; BES, BAC end sequences.
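The "Total" row and the "Additional coverage from unpaired BES" row can be reproduced from the values in the table. The short Python check below is a sketch with the table values typed in by hand: it sums the per-chromosome gap column and subtracts each paired-BES total from the corresponding unpaired-BES total.

```python
# Per-chromosome numbers of gaps in gmax1.01 (Gm01..Gm20), copied from the table
GAPS = [14, 26, 26, 15, 10, 27, 14, 12, 14, 30, 20, 21, 24, 13, 20, 23, 15, 25, 17, 11]

# Total bp of gmax1.01 covered by each MTP (Total row of the table)
TOTAL_COVERAGE = {
    ("FPC", "paired"): 879_466_185,
    ("FPC", "unpaired"): 885_923_859,
    ("All", "paired"): 894_027_238,
    ("All", "unpaired"): 899_974_612,
}


def extra_from_unpaired(clone_set: str) -> int:
    """Base pairs gained by adding clones with unpaired BES to a clone set ("FPC" or "All")."""
    return TOTAL_COVERAGE[(clone_set, "unpaired")] - TOTAL_COVERAGE[(clone_set, "paired")]


print(sum(GAPS))                                            # 377
print(extra_from_unpaired("FPC"), extra_from_unpaired("All"))  # 6457674 5947374
```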
Figure 2 Representation of the integration of the G. max draft sequence with the physical maps of G. max and G. soja. Integrating the draft sequence with the physical maps allows gaps in the sequence to be spanned by physical-map clones placed by their BES, and gaps in the physical maps to be spanned by the sequence map. Adding clones with unpaired BES filled gaps present in both the sequence and the physical maps. Yellow bold lines indicate FPC contigs from both physical maps. The black bold line (Chr) represents a sequence scaffold from gmax1.01, and blue fragments represent shotgun sequences that are part of a sequence scaffold. Black and red lines represent BAC clones, and green boxes represent BESs; red lines indicate BAC clones from the MTP. Purple lines indicate clones with unpaired BESs, and the purple dotted line represents a gap that can be partially filled or spanned by adding clones with unpaired BESs.
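The gap-spanning idea in Figure 2 reduces to an interval test: a clone whose two BESs align on either side of a sequence gap bridges that gap. The Python sketch below illustrates only that test. The `AlignedClone` record, its field names, and the coordinates are hypothetical, and clones with unpaired BES (only one end aligned), which the caption says were used to fill remaining gaps, are not handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AlignedClone:
    """A BAC clone whose two BESs both aligned to the same scaffold (hypothetical record)."""
    name: str
    left_bes: int   # scaffold coordinate of the leftmost aligned BES (bp)
    right_bes: int  # scaffold coordinate of the rightmost aligned BES (bp)


def spans_gap(clone: AlignedClone, gap_start: int, gap_end: int) -> bool:
    """True if the clone's two aligned ends lie on opposite sides of the gap."""
    return clone.left_bes < gap_start and clone.right_bes > gap_end


def clones_spanning(clones: List[AlignedClone], gap_start: int, gap_end: int) -> List[str]:
    """Names of clones that bridge the gap [gap_start, gap_end]."""
    return [c.name for c in clones if spans_gap(c, gap_start, gap_end)]


# Hypothetical example: a 1000-N gap at 50,000-50,999 on one scaffold
clones = [
    AlignedClone("cloneA", 20_000, 160_000),  # flanks the gap on both sides
    AlignedClone("cloneB", 52_000, 190_000),  # lies entirely to the right of the gap
]
print(clones_spanning(clones, 50_000, 50_999))  # ['cloneA']
```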
Map improvement of G. max sequence by filling gaps
| | FPC map | gmax1.01 | gmax1.01 | MTP |
|---|---|---|---|---|
| No. of gaps | 1893 | 377 | 377 | 835 |
| No. of gaps filled | 148 | 126 | 152 | 160 |
| Gaps filled by | 4x draft sequence | FPC clones including unpaired BES | All fingerprinted clones including unpaired BES | Clones with unpaired BES |
FPC, FingerPrinted Contigs; MTP, minimum tiling path; BES, BAC end sequences.
Summary of clones and contigs used to construct the FPC maps
| | G. max | G. soja |
|---|---|---|
| Valid fingerprints for FPC assembly | 134,182 | 81,247 |
| Total number of clones assembled | 107,214 | 66,028 |
| Contigs containing: | | |
| >1000 clones | 2 | − |
| 999-800 clones | 5 | 3 |
| 799-600 clones | 15 | − |
| 599-400 clones | 29 | 2 |
| 399-200 clones | 96 | 7 |
| 199-100 clones | 105 | 52 |
| 99-50 clones | 195 | 244 |
| 49-25 clones | 271 | 511 |
| 24-10 clones | 382 | 939 |
| 9-3 clones | 350 | 892 |
| 2 clones | 272 | 159 |
| The number of singletons | 26,968 | 15,219 |
FPC, FingerPrinted Contigs.
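The contig-size classes and singleton counts in this table are internally consistent, which a few lines of Python can confirm. The sketch below treats "−" as zero and assumes that every valid fingerprint ends up either in a contig or as a singleton; the contig totals it produces are the denominators (1722 and 2809) used in the summary of the FPC maps.

```python
from typing import List, Tuple

# Number of contigs per size class, from ">1000 clones" down to "2 clones"
GMAX_CONTIGS = [2, 5, 15, 29, 96, 105, 195, 271, 382, 350, 272]
GSOJA_CONTIGS = [0, 3, 0, 2, 7, 52, 244, 511, 939, 892, 159]  # "−" treated as 0


def check_map(valid_fp: int, assembled: int, singletons: int,
              contig_counts: List[int]) -> Tuple[int, bool]:
    """Return (total contigs, whether assembled clones + singletons == valid fingerprints)."""
    return sum(contig_counts), assembled + singletons == valid_fp


print(check_map(134_182, 107_214, 26_968, GMAX_CONTIGS))  # (1722, True)
print(check_map(81_247, 66_028, 15_219, GSOJA_CONTIGS))   # (2809, True)
```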
Figure 1 Schematic of picking an MTP from the G. max FPC map and building chromosome-based pseudomolecules. BAC clones were aligned by fingerprinting to construct contigs, which were then used to build chromosome-based pseudomolecules from MTP clones. The yellow bar represents chromosome 14, and blue fragments represent FPC contigs. The middle panel is a screenshot from the FPC program showing part of contig 7308; each horizontal line represents a single BAC clone, and red lines represent clones used to construct the MTP. The bottom panel shows a schematic of FPC clones anchored to the sequence map (blue line at bottom), with positions in base pairs. Red lines indicate clones chosen for the MTP.
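Picking a tiling path from clones placed along a contig or sequence map is, at its core, an interval-cover problem. The Python sketch below uses the textbook greedy heuristic (repeatedly take the clone that overlaps the current path and reaches furthest right); it is not FPC's own MTP routine, and the `(name, start, end)` clone format and coordinates are hypothetical. The `min_overlap` parameter is an illustrative knob: requiring larger overlaps (the MTPs reported here average roughly 22 to 23.5 kb of overlap) trades a few extra clones for a more reliable path, which matches the stated preference for reliability over the minimal clone count.

```python
from typing import List, Optional, Tuple

# (clone name, start, end) in contig or sequence coordinates (bp); hypothetical format
Clone = Tuple[str, int, int]


def greedy_tiling_path(clones: List[Clone], min_overlap: int = 0) -> Tuple[List[str], int]:
    """Return (chosen clone names, left to right; number of gaps in the path)."""
    ordered = sorted(clones, key=lambda c: (c[1], -c[2]))
    path: List[str] = []
    gaps = 0
    covered_to: Optional[int] = None  # right end of the last chosen clone
    i = 0
    while i < len(ordered):
        # A candidate must overlap the path by >= min_overlap bp; at the start
        # of a new island, the clones starting at the leftmost position qualify.
        limit = ordered[i][1] if covered_to is None else covered_to - min_overlap
        best: Optional[Clone] = None
        while i < len(ordered) and ordered[i][1] <= limit:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None:
            # No clone overlaps enough: record a gap and start a new island.
            gaps += 1
            covered_to = None
        elif covered_to is None or best[2] > covered_to:
            path.append(best[0])
            covered_to = best[2]
        # Otherwise every candidate lay inside the current path and is skipped.
    return path, gaps


clones = [
    ("c1", 0, 150_000),
    ("c2", 20_000, 120_000),
    ("c3", 100_000, 260_000),
    ("c4", 140_000, 280_000),
    ("c5", 400_000, 550_000),
]
print(greedy_tiling_path(clones))                      # (['c1', 'c4', 'c5'], 1)
print(greedy_tiling_path(clones, min_overlap=20_000))  # (['c1', 'c3', 'c4', 'c5'], 1)
```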
Summary of FPC maps of G. max and G. soja
| | G. max | G. soja |
|---|---|---|
| The number of contigs aligned | 1355 (78% of 1722) | 1053 (37% of 2809) |
| Total physical length of assembled contigs, bp | 838,932,828 (87% of 967,233,029) | 547,374,187 (58% of 950,068,807) |
| Total number of CB bands included in the contigs | 607,788 (93% of 648,007) | 426,033 (52% of 815,128) |
| Average number of bands per BAC | 73.3 | 102.1 |
| The number of markers anchored | 4628 | − |
FPC, FingerPrinted Contigs; BAC, bacterial artificial chromosome.
Summary of markers anchored to the FPC map of G. max
| Marker | Contigs, avg. | 0 contigs | 1 contig | 2 contigs | >2 contigs | Clones, avg. | 1 clone | <5 clones | <10 clones | ≥10 clones |
|---|---|---|---|---|---|---|---|---|---|---|
| SSR | 1.5 | 417 | 2601 | 301 | 633 | 2.7 | 2698 | 451 | 631 | 172 |
| RFLP | 1.6 | 98 | 331 | 145 | 102 | 3.5 | 205 | 306 | 125 | 40 |
| MHM | 41 | 503 | 1181 | Total | 1725 | | | | | |
FPC, FingerPrinted Contigs; SSR, simple sequence repeat; RFLP, restriction fragment length polymorphism; MHM, multiple-hit markers.
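The SSR and RFLP rows tally each marker class twice, once by the number of contigs a marker hits and once by the number of clones, so both breakdowns should sum to the same marker count. The Python check below copies the values from the table; reading both breakdowns as covering the same set of markers is an assumption, since the table does not say so explicitly. It also confirms that the three MHM values add up to the row's stated total.

```python
# Marker counts per class: by number of contigs hit (0, 1, 2, >2)
# and by number of clones hit (1, <5, <10, >=10), copied from the table.
MARKERS = {
    "SSR": {"by_contigs": [417, 2601, 301, 633], "by_clones": [2698, 451, 631, 172]},
    "RFLP": {"by_contigs": [98, 331, 145, 102], "by_clones": [205, 306, 125, 40]},
}

for name, counts in MARKERS.items():
    n_contigs = sum(counts["by_contigs"])
    n_clones = sum(counts["by_clones"])
    print(name, n_contigs, n_clones, n_contigs == n_clones)
# SSR 3952 3952 True
# RFLP 676 676 True

# The three MHM values add up to the row's stated total
MHM = [41, 503, 1181]
print(sum(MHM))  # 1725
```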
The number and characteristics of G. max BAC clones used for picking the MTPs (number of clones in each MTP)
| Library | FPC Clones/Paired BES | FPC Clones/Unpaired BES | All Clones/Paired BES | All Clones/Unpaired BES |
|---|---|---|---|---|
| GM_WBa | 1422 | 1477 | 1019 | 1064 |
| GM_WBb | 3887 | 4034 | 3095 | 3218 |
| GM_WBc | 2035 | 2086 | 2969 | 3045 |
| Total | 7344 | 7597 | 7083 | 7327 |
| Gaps | 914 | 768 | 835 | 675 |
| Avg. overlap | 21,942 bp | 23,419 bp | 22,094 bp | 23,526 bp |
BAC, bacterial artificial chromosome; MTP, minimum tiling path; BES, BAC end sequences; FPC, FingerPrinted Contigs.
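The "Total" and "Gaps" rows can be cross-checked with a short Python sketch (values copied from the table). Summing each MTP column over the three libraries reproduces the totals, and the drop in gaps between the "All Clones/Paired BES" and "All Clones/Unpaired BES" MTPs equals the 160 gaps that the gap-filling table attributes to clones with unpaired BES.

```python
# Clones per library in each MTP, in the table's column order:
# FPC/Paired, FPC/Unpaired, All/Paired, All/Unpaired
MTP_CLONES = {
    "GM_WBa": (1422, 1477, 1019, 1064),
    "GM_WBb": (3887, 4034, 3095, 3218),
    "GM_WBc": (2035, 2086, 2969, 3045),
}
GAPS = (914, 768, 835, 675)

# zip(*...) turns the per-library rows into per-MTP columns
totals = [sum(column) for column in zip(*MTP_CLONES.values())]
print(totals)  # [7344, 7597, 7083, 7327]

# Gaps closed by adding clones with unpaired BES to each clone set
print(GAPS[0] - GAPS[1], GAPS[2] - GAPS[3])  # 146 160
```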
Alignment of G. soja BESs against the G. max genome sequence
| | No. Clones |
|---|---|
| Total Number of | 180,099 |
| Clones with unpaired BES | 2199 |
| Clones with paired BES | 88,905 |
| Clones where only one end aligned | 2675 |
| Clones where BES aligned to different chromosomes | 19,143 |
| Clones where BES aligned to same chromosome | 67,047 |
| 75 kbp < clones < 225 kbp | 59,899 |
| Clones < 75 kbp | 3352 |
| Clones > 225 kbp | 1965 |
| Clones with BES with expected orientation | 63,888 |
| Clones with BES in opposite direction | 1184 |
| Clones with BES same direction | 1975 |
BES, BAC end sequences.
Figure 3 Schematic of detecting rearrangements using mapped BES. (A) Potential translocation, where paired BESs map to different chromosomes (blue and yellow). (B) Size distribution used to detect insertions/deletions. The expected range is 75 kbp to 225 kbp; mapped BES pairs outside this range are predicted to carry either an insertion or a deletion. (C) Potential inversion: paired BESs that are in the expected orientation (opposite to each other) on top point in the same direction on the bottom.
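The three checks in Figure 3 can be expressed as a small classifier over a pair of BES alignments. The Python sketch below is illustrative only. The `BESHit` record and the strand convention (ends in the expected orientation appear as "+" on the left and "-" on the right, facing each other) are assumptions, the mapped span ignores BES alignment length, and mapping the outward-facing case to the table's "BES in opposite direction" category is also an assumption.

```python
from dataclasses import dataclass

# Expected mapped span of a paired G. soja BES on G. max (Figure 3B)
MIN_SPAN, MAX_SPAN = 75_000, 225_000


@dataclass
class BESHit:
    """Alignment of one BAC end sequence to the G. max genome (hypothetical record)."""
    chrom: str   # e.g. "Gm01"
    pos: int     # alignment coordinate on the chromosome (bp)
    strand: str  # "+" or "-"


def classify_pair(end1: BESHit, end2: BESHit) -> str:
    """Assign a paired BES alignment to one of the Figure 3 categories."""
    if end1.chrom != end2.chrom:
        return "potential translocation"  # Figure 3A
    if end1.strand == end2.strand:
        return "potential inversion"  # Figure 3C: both ends point the same way
    left, right = sorted((end1, end2), key=lambda h: h.pos)
    if not (left.strand == "+" and right.strand == "-"):
        # Ends face away from each other (assumed "opposite direction" category)
        return "unexpected orientation"
    span = right.pos - left.pos  # approximate mapped clone size
    if span < MIN_SPAN or span > MAX_SPAN:
        return "potential indel"  # Figure 3B
    return "expected"


print(classify_pair(BESHit("Gm01", 1_000_000, "+"), BESHit("Gm01", 1_150_000, "-")))
print(classify_pair(BESHit("Gm01", 1_000_000, "+"), BESHit("Gm01", 1_050_000, "-")))
```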