
Integration of the Draft Sequence and Physical Map as a Framework for Genomic Research in Soybean (Glycine max (L.) Merr.) and Wild Soybean (Glycine soja Sieb. and Zucc.).

Jungmin Ha, Brian Abernathy, William Nelson, David Grant, Xiaolei Wu, Henry T Nguyen, Gary Stacey, Yeisoo Yu, Rod A Wing, Randy C Shoemaker, Scott A Jackson.   

Abstract

Soybean is a model for the legume research community because of its importance as a crop, densely populated genetic maps, and the availability of a genome sequence. Even though a whole-genome shotgun sequence and bacterial artificial chromosome (BAC) libraries are available, a high-resolution, chromosome-based physical map linked to the sequence assemblies is still needed for whole-genome alignments and to facilitate map-based gene cloning. Three independent G. max BAC libraries combined with genetic and gene-based markers were used to construct a minimum tiling path (MTP) of BAC clones. A total of 107,214 clones were assembled into 1355 FPC (FingerPrinted Contigs) contigs, incorporating 4628 markers and aligned to the G. max reference genome sequence using BAC end-sequence information. Four different MTPs were made for G. max that covered from 92.6% to 95.0% of the soybean draft genome sequence (gmax1.01). Because our purpose was to pick the most reliable and complete MTP, and not the MTP with the minimal number of clones, the FPC map and draft sequence were integrated and clones with unpaired BES were added to build a high-quality physical map with the fewest gaps possible (http://soybase.org). A physical map was also constructed for the undomesticated ancestor of soybean (G. soja) to explore genome variation between G. max and G. soja. A total of 66,028 G. soja clones were assembled into 1053 FPC contigs covering approximately 547 Mbp of the G. max genome sequence. These physical maps for G. max and its undomesticated ancestor, G. soja, will serve as a framework for ordering sequence fragments, comparative genomics, cloning genes, and evolutionary analyses of legume genomes.


Keywords:  FingerPrinted Contig; genome evolution; genome structure; whole-genome sequencing

Year:  2012        PMID: 22413085      PMCID: PMC3291501          DOI: 10.1534/g3.111.001834

Source DB:  PubMed          Journal:  G3 (Bethesda)        ISSN: 2160-1836            Impact factor:   3.154


With recent advances in sequencing technology, whole-genome sequencing projects are becoming routine. Several years ago, the legume research community recommended soybean as the model genome for Phaseoloid legumes (Gepts ) because of its agronomic importance and existing genomic infrastructure. Shortly thereafter, physical mapping and whole-genome shotgun sequencing efforts for soybean were undertaken, resulting in a genome sequence for soybean (Glycine max) (Schmutz ), followed by the resequencing of its undomesticated ancestor, Glycine soja Sieb. and Zucc. (Kim ). Even with a genome sequence, a physical map may still be needed to place DNA sequences correctly on specific chromosomes, especially because current short-read sequencing technologies have difficulty reliably ordering complete chromosome assemblies: repetitive sequences, large gene families, and segmental duplications cannot be spanned by short sequence reads (Lewin ). Clone-based maps have been integral to several genome sequencing projects, contributing to gene cloning, the understanding of genome structure, and evolutionary studies. McPherson et al. illustrated the benefit of using a clone-based physical map in the assembly of the human genome sequence (McPherson ). A physical map also contributed to the sequencing of the Drosophila melanogaster genome (Hoskins 2000), and a combined strategy of physical mapping and sequencing was applied to the mouse genome (Bouck ; Pennisi 2000). To support the growing interest in map-based cloning of important genes, a physical map of Arabidopsis thaliana was constructed, leading to a deeper understanding of genome structure and evolution (Mozo ). Rice genome sequencing data were integrated with a physical map, and this integrated high-resolution physical map facilitated genome sequencing through a minimal tiling path of BAC clones (Chen ). 
To build a foundation for sequencing the maize genome, physical and genetic maps of maize were developed and anchored to each other, resulting in a useful tool for evolutionary studies of maize (Cone ; Wei ; Wei ). For soybean, physical maps were constructed using BAC libraries from cv. Forrest and cv. Faribault (Wu ,b). However, the community selected the cultivar Williams 82 for the reference genome sequence, so a high-quality physical map of Williams 82 was needed as a foundation to improve the usefulness of the whole-genome sequence for the research community. An initial physical map for Williams 82 was derived from two BAC libraries made with different restriction enzymes (Pampanwar ; Soderlund ; Warren 2006). This map consisted of 97,272 fingerprinted BAC clones comprising 1893 contigs and approximately 30,000 singletons. The physical map needed to be integrated with the genome sequence and oriented with the genetic map to identify genes underlying quantitative trait loci, which is important both for the genetic improvement of soybean and for understanding the molecular and genetic basis of traits (Jackson ). To improve the genetic anchoring of the G. max physical map, 3290 microsatellite (simple sequence repeat [SSR]) markers were identified from BAC end sequences (BES) of clones in the initial physical map, and 265 of these SSRs were genetically mapped (Shoemaker ). The genomes of G. max and G. soja have both been sequenced by whole-genome shotgun sequencing, G. max with traditional Sanger sequencing and G. soja with next-generation sequencing. In both cases, a physical map can be used to improve the genome sequence by spanning gaps and correcting misalignments. Wild soybean, G. soja, is a promising source of genes and alleles that were lost during domestication bottlenecks (Hyten ). Thus, a physical map of G. soja will be useful for cloning potentially valuable genes, for improving the quality of the G. soja genome sequence, and as a foundation for comparative evolutionary studies. 
For the G. max physical map, a minimum tiling path (MTP) can be picked using BESs aligned to the genome sequence. Traditionally, the main purpose of an MTP has been to select clones for sequencing efficiently, that is, to minimize the number of clones to be sequenced by selecting adjacent clones that overlap minimally. For G. max, for which whole-genome shotgun data are available, the primary purpose of the MTP is instead to provide a physical map anchored to the genome sequence, and thereby a framework for genomic research. A reliable MTP covering nearly the whole genome complements a shotgun genome sequence because it can be used to correct misalignments and to span gaps, which is important for finishing regions and cloning genes. For G. soja, the physical map provides an anchored, clone-based resource for moving between the two genomes, domesticated and undomesticated.

Materials and Methods

Source BAC libraries

The DNA source for the soybean BAC libraries was the cultivar Williams 82, which has been chosen by the soybean community as the standard genotype for genomic studies (Stacey ). Three different restriction enzymes, HindIII, BstyI, and EcoRI, were used to construct the three libraries, GM_WBa, GM_WBb, and GM_WBc, respectively (Table 1). The DNA for the G. soja BAC library, GSS_Ba, was from a single plant of accession PI468916, and HindIII was used to construct the library (Table 1).
Table 1 

Summary of soybean BAC libraries used in the FPC maps

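| Species | Library | Restriction enzyme | Avg. insert size (kb) | Genome equivalents (coverage) | No. of clones | No. of clones fingerprinted |
|---|---|---|---|---|---|---|
| G. max | GM_WBa | HindIII | 150 | 5.4x | 40,320 | 35,145 |
| G. max | GM_WBb | BstyI | 150 | 12.0x | 91,160 | 61,379 |
| G. max | GM_WBc | EcoRI | 131 | 10.9x | 92,160 | 37,658 |
| G. soja | GSS_Ba | HindIII | 150 | 12.5x | 92,160 | 81,247 |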

BAC, bacterial artificial chromosome; FPC, FingerPrinted Contigs.


Source of sequences

Assembly of shotgun-sequenced fragments in soybean presents substantial challenges because of the duplicated nature of the genome (Shoemaker ), its many repeat sequences, and domains shared among members of several gene families. Although the shotgun sequence assembly (gmax 1.01) has several-fold coverage of the entire genome, 377 gaps remain (Schmutz ). We integrated 950,068,807 bp of sequence from the 20 pseudomolecules with the physical map (Table 2).
Table 2 

Sequence coverage length of four different MTPs of G. max

| Scaffold | gmax1.01 length (bp) | Gaps (1000 N, Arachne scaffolds) | FPC clones/paired BES (bp) | FPC clones/unpaired BES (bp) | All clones/paired BES (bp) | All clones/unpaired BES (bp) |
|---|---|---|---|---|---|---|
| Gm01 | 55,915,595 | 14 | 54,031,028 | 54,244,841 | 54,433,357 | 54,601,601 |
| Gm02 | 51,656,713 | 26 | 46,688,786 | 47,183,562 | 47,929,653 | 48,513,213 |
| Gm03 | 47,781,076 | 26 | 43,827,475 | 44,370,246 | 44,853,580 | 45,265,110 |
| Gm04 | 49,243,852 | 15 | 46,627,725 | 46,846,968 | 47,116,298 | 47,312,649 |
| Gm05 | 41,936,504 | 10 | 40,348,170 | 40,564,085 | 40,845,469 | 41,053,468 |
| Gm06 | 50,722,821 | 27 | 46,260,437 | 46,788,486 | 47,211,351 | 47,644,800 |
| Gm07 | 44,683,157 | 14 | 41,102,917 | 41,164,938 | 41,920,695 | 42,048,769 |
| Gm08 | 46,995,532 | 12 | 43,259,780 | 43,501,037 | 43,820,537 | 44,082,436 |
| Gm09 | 46,843,750 | 14 | 44,028,454 | 44,385,184 | 44,620,246 | 44,965,599 |
| Gm10 | 50,969,635 | 30 | 46,425,807 | 46,591,044 | 47,456,533 | 47,653,723 |
| Gm11 | 39,172,790 | 20 | 36,518,365 | 36,952,892 | 37,127,519 | 37,458,495 |
| Gm12 | 40,113,140 | 21 | 36,686,674 | 37,102,118 | 37,428,667 | 37,907,123 |
| Gm13 | 44,408,971 | 24 | 38,222,478 | 38,577,480 | 38,771,342 | 39,016,163 |
| Gm14 | 49,711,204 | 13 | 46,563,799 | 46,777,020 | 47,097,234 | 47,295,050 |
| Gm15 | 50,939,160 | 20 | 47,896,452 | 48,328,091 | 48,564,594 | 48,828,076 |
| Gm16 | 37,397,385 | 23 | 33,365,708 | 33,564,017 | 34,143,930 | 34,511,921 |
| Gm17 | 41,906,774 | 15 | 38,264,930 | 38,544,807 | 39,073,164 | 39,268,910 |
| Gm18 | 62,308,140 | 25 | 58,408,218 | 58,891,175 | 59,710,660 | 60,015,568 |
| Gm19 | 50,589,441 | 17 | 47,831,756 | 48,094,335 | 48,742,366 | 48,956,251 |
| Gm20 | 46,773,167 | 11 | 43,107,226 | 43,451,533 | 43,160,043 | 43,575,687 |
| Total | 950,068,807 | 377 | 879,466,185 | 885,923,859 | 894,027,238 | 899,974,612 |
| Additional coverage from unpaired BES | | | | 6,457,674 | | 5,947,374 |

MTP, minimum tiling path; FPC, FingerPrinted Contigs; BES, BAC end sequences.


Source of MTP

Because the gmax 1.01 soybean assembly did not filter out clones with unusually long or short inserts, we limited BAC lengths to the range of 75 kb to 225 kb when picking MTPs. MTPs were picked from two different clone pools: one pool contained only the BAC clones used to construct the FingerPrinted Contigs (FPC) map (clone pool A), and the other contained all the BAC clones from the three BAC libraries (clone pool B). Two kinds of MTPs were picked from each clone pool using Dijkstra’s shortest path algorithm (Dijkstra 1983): one from only the BAC clones with paired BESs, and the other from BAC clones with both paired and unpaired BESs, in an attempt to extend coverage into sequence gaps (Figure 2).
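To illustrate how Dijkstra’s algorithm can pick a tiling path, the Python sketch below treats each clone as a node placed by its aligned BES pair. It is a hypothetical illustration, not the FPC MTP module. It assumes each clone is an interval (start, end) on one gap-free region of a pseudomolecule, and it links clone u to clone v when v starts inside u and ends beyond u. Path cost is an assumed tuple (number of clones, total overlap in bp) compared lexicographically, so the fewest clones wins and smaller overlap breaks ties. Only the 75 to 225 kb insert filter comes from the text; the names `Clone` and `pick_mtp` are hypothetical.

```python
import heapq
from typing import List, NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # left BES alignment coordinate on the pseudomolecule (bp)
    end: int    # right BES alignment coordinate (bp)


def pick_mtp(clones: List[Clone], min_insert: int = 75_000,
             max_insert: int = 225_000, min_overlap: int = 1) -> List[str]:
    """Dijkstra over clone overlaps; cost = (clones used, total overlap bp)."""
    # Drop clones whose BES-implied insert size is implausible.
    usable = [c for c in clones if min_insert <= c.end - c.start <= max_insert]
    if not usable:
        return []
    by_name = {c.name: c for c in usable}
    left = min(c.start for c in usable)
    right = max(c.end for c in usable)

    best = {}   # clone name -> lowest cost found so far
    prev = {}   # clone name -> predecessor on that lowest-cost path
    heap = []
    for c in usable:
        if c.start == left:  # the leftmost clones seed the search
            best[c.name] = (1, 0)
            heapq.heappush(heap, ((1, 0), c.name))

    done = set()
    while heap:
        cost, name = heapq.heappop(heap)
        if name in done:
            continue
        done.add(name)
        cur = by_name[name]
        if cur.end == right:  # first target popped has the lowest cost
            path = [name]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return path[::-1]
        for nxt in usable:
            overlap = cur.end - nxt.start
            # nxt must start inside cur and extend beyond cur's right end
            if nxt.start > cur.start and overlap >= min_overlap and nxt.end > cur.end:
                new_cost = (cost[0] + 1, cost[1] + overlap)
                if new_cost < best.get(nxt.name, (float("inf"), 0)):
                    best[nxt.name] = new_cost
                    prev[nxt.name] = name
                    heapq.heappush(heap, (new_cost, nxt.name))
    return []  # no clone path reaches the right end: the region contains a gap
```

For example, take clones A (0 to 150 kb), B (100 to 250 kb), C (140 to 290 kb), and D (280 to 430 kb), plus an overlong E (200 to 500 kb) that the insert filter drops. The sketch returns A, C, D, because B ends before D starts. Lexicographic tuple costs still satisfy Dijkstra’s non-negative-weight requirement, since every edge adds a non-negative clone count and overlap.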
Figure 2

Representation of integration of the G. max draft sequence and the physical maps of G. max and G. soja. By integrating the draft sequence and the physical maps, gaps in the sequence could be spanned using clones from the physical maps based on BES, and gaps in the physical map could be spanned by the sequence map. By adding clones with unpaired BES, gaps existing in both the sequence and the physical maps were filled. The yellow bold lines indicate FPC contigs from both physical maps. The black bold line (Chr) represents a sequence scaffold from gmax1.01, and blue fragments represent shotgun sequences that are part of a sequence scaffold. Black and red lines represent BAC clones, and green boxes represent BESs. Red lines indicate BAC clones from the MTP. Purple lines indicate the clones with unpaired BESs. The purple dotted line represents a gap that can be partially filled or spanned by adding clones with unpaired BESs.

Spanning gaps in the FPC map

To span the gaps in the preliminary FPC map, which had 1893 contigs, the map was integrated with a preliminary 4x sequence assembly from the Joint Genome Institute and the Stanford Human Genome Center. The contigs of this sequence assembly had an average length of 157,040 bp and a maximum length of 20,109,437 bp (Batzoglou ; Jaffe ). The integration was performed using the BSS and MTP modules of FPC as described in Nelson and Soderlund (2009). FPC automatically identified and performed 148 contig merges, each spanning a gap in the map (Table 3).
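One way to picture how BES alignments can merge FPC contigs is this: if BESs from two FPC contigs land next to each other on the same sequence scaffold, that scaffold spans the gap between them. The Python sketch below is a simplified, hypothetical illustration, not FPC’s BSS module, whose actual logic involves checks that are not reproduced here. It orders contigs on each scaffold by the median position of their BES hits, treats neighboring contigs as merge candidates, and counts merges with a union-find structure. Each successful union corresponds to one spanned gap (contig merge). The input tuple layout and function names are assumptions.

```python
from collections import defaultdict
from statistics import median


def merge_candidates(bes_hits):
    """bes_hits: iterable of (fpc_contig, scaffold, position_bp) BES alignments.

    Returns pairs of FPC contigs that sit next to each other on one scaffold.
    """
    positions = defaultdict(lambda: defaultdict(list))
    for contig, scaffold, pos in bes_hits:
        positions[scaffold][contig].append(pos)
    pairs = set()
    for contigs in positions.values():
        # Order contigs along the scaffold by the median of their BES hits.
        order = sorted(contigs, key=lambda c: median(contigs[c]))
        for a, b in zip(order, order[1:]):
            pairs.add(tuple(sorted((a, b))))
    return pairs


def merge_contigs(contigs, pairs):
    """Union-find over candidate pairs; returns (merges made, contigs left)."""
    parent = {c: c for c in contigs}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving keeps trees shallow
            c = parent[c]
        return c

    merges = 0
    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:  # a new join closes one gap between contigs
            parent[rb] = ra
            merges += 1
    return merges, len({find(c) for c in contigs})
```

With hits placing ctg1 and ctg2 on scaffold scaf1 and ctg3 and ctg4 on scaf2, five input contigs collapse to three after two merges.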
Table 3 

Map improvement of G. max sequence by filling gaps

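| Map | No. of gaps | No. of gaps filled | Filled by |
|---|---|---|---|
| FPC map | 1893 | 148 | 4x draft sequence |
| gmax1.01 | 377 | 126 | FPC clones, including clones with unpaired BES |
| gmax1.01 | 377 | 152 | All fingerprinted clones, including clones with unpaired BES |
| MTP | 835 | 160 | Clones with unpaired BES |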

FPC, FingerPrinted Contigs; MTP, minimum tiling path; BES, BAC end sequences.

There are many gaps, represented as runs of Ns, in the 8x soybean sequence (gmax1.01). A run of 1000 Ns marks a gap between scaffolds that was not spanned by the Arachne assembler, a run of 100 Ns marks a gap of unknown length, and any other number of Ns marks a gap of known size (Figure 2). We assumed that a BAC clone spanned at least part of a gap when one of its BESs aligned near the edge of a contig abutting the gap and the clone pointed into the sequence gap. Some of the larger gaps, with thousands of Ns, were spanned by BAC clones with paired and/or unpaired BESs, identified by BLAST searches against the physical map already integrated with the 8x draft sequence data. To increase the coverage of the MTP picked from the clones in the FPC map, the physical locations of the gaps on the FPC map were checked, and clones with unpaired BESs at the corresponding locations were added to the MTP.
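The gap conventions and the gap-spanning rule in the preceding paragraph can be made concrete with a short Python sketch. `find_gaps` classifies runs of N using the 1000-N and 100-N conventions described for gmax1.01. `points_into_gap` encodes the rule that a clone can span part of a gap when its BES lies near the gap edge and reads toward it. The function names and the 20-kb "near the edge" threshold are hypothetical, and a known-size gap that happens to be exactly 100 or 1000 bp long would be misclassified by this simple rule.

```python
import re
from typing import List, NamedTuple


class Gap(NamedTuple):
    start: int  # 0-based index of the first N
    end: int    # index just past the last N
    kind: str


def find_gaps(seq: str) -> List[Gap]:
    """Classify N runs using the gmax1.01 conventions described in the text."""
    gaps = []
    for m in re.finditer(r"N+", seq.upper()):  # upper() also catches soft-masked n
        length = m.end() - m.start()
        if length == 1000:
            kind = "scaffold gap (not spanned by Arachne)"
        elif length == 100:
            kind = "gap of unknown size"
        else:
            kind = "gap of known size"
        gaps.append(Gap(m.start(), m.end(), kind))
    return gaps


def points_into_gap(bes_pos: int, strand: str, gap: Gap,
                    max_dist: int = 20_000) -> bool:
    """True if a BES near the gap edge reads toward (into) the gap.

    bes_pos is the 5' end of the aligned read. A '+' read means the clone
    insert extends to its right, a '-' read to its left. max_dist is a
    hypothetical threshold for "near the edge".
    """
    if strand == "+":
        return 0 <= gap.start - bes_pos <= max_dist
    return 0 <= bes_pos - gap.end <= max_dist
```

A forward-strand BES 5 kb to the left of a gap qualifies, whereas a reverse-strand BES at the same position points away from the gap and does not.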

Results

BAC libraries

Three Glycine max cv. Williams 82 BAC libraries, GM_WBa, GM_WBb, and GM_WBc (http://genome.arizona.edu), were made with three different restriction enzymes, HindIII, BstyI, and EcoRI, respectively, to reduce the likelihood of missing parts of the genome because of cloning bias. All three libraries were used to construct the G. max physical map. A HindIII BAC library, GSS_Ba, was constructed for Glycine soja accession PI468916. The average insert sizes of GM_WBa, GM_WBb, GM_WBc, and GSS_Ba were 150, 150, 131, and 150 kb, representing 5.4, 12, 10.9, and 12x coverage of the respective genome. Subsets of each library were fingerprinted to construct the FPC maps (Table 1).
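The coverage figures follow from coverage = (number of clones × average insert size) / genome size. The Python sketch below shows that arithmetic together with the standard Poisson estimate, e^(-c), of the genome fraction absent from a library with random clone placement. The genome size of about 1.1 Gb is an assumption used only for illustration, so recomputed values land close to, but not necessarily exactly on, those in Table 1. The Poisson model also ignores the enzyme-specific cloning bias that motivated using three different restriction enzymes.

```python
import math


def genome_equivalents(n_clones: int, insert_bp: float, genome_bp: float) -> float:
    """Library coverage c = (number of clones x mean insert size) / genome size."""
    return n_clones * insert_bp / genome_bp


def fraction_unsampled(coverage: float) -> float:
    """Poisson estimate of the genome fraction in no clone, e^(-c).

    Assumes clones are placed uniformly at random, i.e. no cloning bias.
    """
    return math.exp(-coverage)


GENOME_BP = 1_115_000_000  # assumed soybean genome size (~1.1 Gb), for illustration
c = genome_equivalents(40_320, 150_000, GENOME_BP)  # GM_WBa clone count and insert size
print(f"{c:.1f}x coverage, {fraction_unsampled(c):.2%} expected unsampled")
```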

FPC maps for G. max and G. soja

Fingerprinted clones were clustered into contigs on the basis of their probability of coincidence score using the FPC software package (Soderlund , 2000). In total, 134,182 G. max and 81,247 G. soja BAC clones were used to construct the physical maps. Of these, 107,214 G. max clones and 66,028 G. soja clones were assembled into contigs, and 26,968 and 15,219 clones, respectively, remained as singletons (BACs not placed in any contig) (Table 4). Of the contigs, 1355 (78%) for G. max and 1053 (37%) for G. soja were ordered and oriented on the 20 soybean chromosomes (Schmutz ) using the alignment function of FPC (Nelson and Soderlund 2009). The aligned contigs spanned 838,932,828 bp of sequence for G. max (87% of 967,233,029 bp) and 547,374,187 bp for G. soja (58% of 950,068,807 bp), relative to gmax 1.01 (Figure 1). The denominators differ because the G. max alignment included the unanchored sequence scaffolds in gmax 1.01, whereas the G. soja alignment used only anchored scaffolds. In terms of the consensus FPC map, the aligned contigs included 607,788 and 426,033 consensus band (CB) units for G. max and G. soja, respectively (93% of 648,007 and 52% of 815,128 CB units; Table 5).
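The probability of coincidence score that FPC uses to decide whether two fingerprints overlap is commonly described as the Sulston score. A band matches another if they lie within a tolerance t on a gel of length L, so a single band matches with chance b = 2t/L. A band of the clone with fewer bands (n_low) then misses all n_high bands of the other clone with probability p = (1 - b)^n_high. The score is the binomial probability of at least M shared bands arising by chance. The Python sketch below follows that description. The tolerance and gel-length defaults are illustrative assumptions, not values from this study, and a given FPC version may compute the score differently.

```python
from math import comb


def sulston_score(bands_a: int, bands_b: int, shared: int,
                  tolerance: float = 7, gel_length: float = 3300) -> float:
    """Probability that two clones share >= `shared` bands by coincidence.

    tolerance and gel_length are illustrative values in band-size units.
    """
    n_low, n_high = sorted((bands_a, bands_b))
    b = 2 * tolerance / gel_length   # chance one band matches one other band
    p = (1 - b) ** n_high            # chance a band matches none of n_high bands
    return sum(comb(n_low, j) * (1 - p) ** j * p ** (n_low - j)
               for j in range(shared, n_low + 1))
```

For two clones with about 73 bands, close to the G. max average in Table 5, the score falls steeply as the number of shared bands rises. Smaller scores mean the shared bands are less likely to be chance matches, and clones are joined when the score falls below a user-chosen cutoff.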
Table 4 

Summary of clones and contigs used to construct the FPC maps

| | G. max | G. soja |
|---|---|---|
| Valid fingerprints for FPC assembly | 134,182 | 81,247 |
| Total number of clones assembled | 107,214 | 66,028 |
| Contigs containing: | | |
| >1000 clones | 2 | 0 |
| 999-800 clones | 5 | 3 |
| 799-600 clones | 15 | 0 |
| 599-400 clones | 29 | 2 |
| 399-200 clones | 96 | 7 |
| 199-100 clones | 105 | 52 |
| 99-50 clones | 195 | 244 |
| 49-25 clones | 271 | 511 |
| 24-10 clones | 382 | 939 |
| 9-3 clones | 350 | 892 |
| 2 clones | 272 | 159 |
| Number of singletons | 26,968 | 15,219 |

FPC, FingerPrinted Contigs.

Figure 1

Schematic of picking an MTP from the G. max FPC map and chromosome-based pseudomolecules. BAC clones were assembled into contigs by fingerprinting, and these contigs were used to build chromosome-based pseudomolecules, which were constructed from MTP clones. The yellow bar represents chromosome 14, and blue fragments represent FPC contigs. The middle panel is a screenshot from the FPC program showing part of contig 7308. Each horizontal line represents a single BAC clone, and red lines represent clones used to construct the MTP. The bottom panel shows a schematic of FPC clones anchored to the sequence map (blue line at bottom), with positions in base pairs. Red lines indicate clones chosen for the MTP.

Table 5 

Summary of FPC maps of G. max and G. soja

| | G. max | G. soja |
|---|---|---|
| Number of contigs aligned | 1355 (78% of 1722) | 1053 (37% of 2809) |
| Total physical length of assembled contigs, bp | 838,932,828 (87% of 967,233,029) | 547,374,187 (58% of 950,068,807) |
| Total number of CB bands included in the contigs | 607,788 (93% of 648,007) | 426,033 (52% of 815,128) |
| Average number of bands per BAC | 73.3 | 102.1 |
| Number of markers anchored | 4628 | |

FPC, FingerPrinted Contigs; BAC, bacterial artificial chromosome.


Genetic marker data for G. max

For a physical map to be useful in the assembly of a whole-genome sequence, it must be anchored to the genetic map (Jackson ). A genetically anchored physical map is helpful not only for gene cloning but also for a better understanding of genome structure that might confound a whole-genome sequencing strategy (Shoemaker ). Genetic markers and gene-based sequences from G. max were used to screen the BAC libraries (results available at http://www.soymap.org) to integrate the genetic and physical maps. The soybean genome sequence was then combined with the physical map using BES (Schmutz ) so that the FPC contigs could be further integrated with the sequence and genetic maps. In this study, 4628 genetic markers, consisting of 3952 SSR markers and 676 RFLP markers, were anchored to the G. max physical map. Of these markers, 1725 were multiple-hit markers (MHM), that is, markers anchored to more than one BAC clone: 1181 MHM were linked to two or more contigs, 503 MHM were anchored to multiple clones on one contig, and 41 MHM were anchored to multiple singletons. The average number of contigs hit was 1.5 for the 3952 SSR markers and 1.6 for the 676 RFLP markers. Of the 3952 SSR markers, 417 hit 0 contigs, 2601 hit 1 contig, 301 hit 2 contigs, and 633 hit more than 2 contigs. Of the 676 RFLP markers, 98 hit 0 contigs, 331 hit 1 contig, 145 hit 2 contigs, and 102 hit more than 2 contigs (Table 6). The many MHM arise primarily from the short sequences used to screen the BAC libraries and from the duplicated soybean genome; nevertheless, these data are useful for confirming clone order, contig integrity, and alignment to the sequence map.
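The marker categories above can be tallied mechanically once each marker’s clone hits are known. The Python sketch below assumes a hypothetical input mapping each marker to a list of (clone, contig) hits, with `None` as the contig for a singleton. It counts markers by the number of contigs hit (0, 1, 2, >2) and classifies multiple-hit markers (markers anchored to more than one BAC clone) by whether their clones fall on two or more contigs, one contig, or singletons only, mirroring the categories in Table 6.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

Hit = Tuple[str, Optional[str]]  # (clone, contig); contig is None for a singleton


def summarize_markers(hits: Dict[str, List[Hit]]):
    """Tally markers by contigs hit and classify multiple-hit markers (MHM)."""
    by_contigs = Counter()
    mhm = Counter()
    for clone_hits in hits.values():
        clones = {clone for clone, _ in clone_hits}
        contigs = {ctg for _, ctg in clone_hits if ctg is not None}
        n = len(contigs)
        by_contigs[str(n) if n <= 2 else ">2"] += 1
        if len(clones) > 1:  # MHM: anchored to more than one BAC clone
            if n >= 2:
                mhm["two or more contigs"] += 1
            elif n == 1:
                mhm["one contig"] += 1
            else:
                mhm["singletons only"] += 1
    return by_contigs, mhm
```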
Table 6 

Summary of markers anchored to the FPC map of G. max

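| Marker | Contigs hit: avg. | 0 | 1 | 2 | >2 | Clones hit: avg. | 1 | <5 | <10 | ≥10 |
|---|---|---|---|---|---|---|---|---|---|---|
| SSR | 1.5 | 417 | 2601 | 301 | 633 | 2.7 | 2698 | 451 | 631 | 172 |
| RFLP | 1.6 | 98 | 331 | 145 | 102 | 3.5 | 205 | 306 | 125 | 40 |

MHM by number of contigs hit: 0 (multiple singletons), 41; 1 (multiple clones on one contig), 503; 2 or more, 1181; total, 1725.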

FPC, FingerPrinted Contigs; SSR, simple sequence repeat. RFLP, restriction fragment length polymorphism; MHM, multiple-hit markers.


Minimum tiling path (MTP) for G. max

Four paradigms have been used to pick minimal tiling paths from FPC fingerprint maps. The first is a map-based approach, in which the fingerprints of clone pairs that appear to overlap minimally are analyzed in the FPC gel image display (Coulson ). The second is a BES-based approach, in which a seed clone is selected and sequenced; its sequence is used to query a BES database for a minimally overlapping clone, and the process is repeated iteratively (Venter ). The third is a hybrid of the first two: seed clones are selected and extended as in the second approach, but each overlap is verified with the map-based approach to reduce the risk of false-positive overlaps (Marra ). The fourth approach uses both BES and existing genomic sequence, estimating BAC overlaps from BES-to-sequence alignments more accurately than is possible from fingerprint overlaps alone. Functions implementing this approach are built into FPC (Nelson and Soderlund 2009). For soybean, genome sequence data (gmax 1.01) are already available. We integrated the sequence with the FPC map to build BAC-based pseudomolecules representing the 20 soybean chromosomes (http://soybase.org). Our MTP therefore does not need to be “minimal” in the sense of budget constraints for BAC sequencing; instead, we selected the most reliable BAC clones while attempting to minimize overlap between adjacent BACs (Figure 1). Two types of MTPs were picked from two different clone pools: (A) only the clones contained in the FPC map; and (B) all the clones from the three BAC libraries that had BESs, including clones that may have been excluded from the FPC map because of fingerprinting errors (hereafter clone pools A and B, respectively). With pool A, proximity in FPC provides additional confirmation that MTP clones overlap; however, a number of clones with BESs are absent from the FPC map because of fingerprinting failures. 
FPC provides only an approximation of where clones lie relative to one another in a contig, because there may be errors in the band calling of individual clones or in the determination of clone overlap. In addition, some clones never enter the FPC map, even though their BESs can still place them relative to the genome sequence. We therefore used not only the FPC clones but also clones that had BESs but were not in the FPC map, to improve the accuracy of the BAC-based maps. The MTP with only FPC clones consists of 1422 GM_WBa, 3887 GM_WBb, and 2035 GM_WBc BAC clones, with 914 gaps and an average overlap of 21.9 kbp between clones. The MTP with all the fingerprinted clones, including those not in FPC contigs, comprises 1019 GM_WBa, 3095 GM_WBb, and 2969 GM_WBc clones, with 835 gaps and an average overlap of 22.1 kbp (Table 7). To span gaps in the sequence scaffolds, clones with unpaired BESs were added to the MTPs. A BAC with an unpaired BES was added to an MTP only when its BES aligned near the edge of a contig and pointed toward the gap (Figure 2). In the MTP composed only of clones in the FPC map, 146 gaps were spanned by clones with unpaired BESs, and the average overlap increased by 1.5 kbp. In the MTP built from all three BAC libraries, 160 gaps were covered by clones with unpaired BESs, and the average overlap increased by 1.4 kbp (Tables 3 and 7).
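The gap counts and average overlaps reported for each MTP can be derived from BES-placed clone coordinates, in the spirit of the fourth, sequence-based paradigm. The Python sketch below assumes each MTP clone is a half-open (start, end) interval in bp on one pseudomolecule. It counts a gap wherever the next clone starts after the previous one ends, and averages the overlaps between the remaining adjacent pairs. The function name and input layout are hypothetical.

```python
from typing import List, Tuple


def mtp_stats(mtp: List[Tuple[int, int]]) -> Tuple[int, float]:
    """Count gaps and mean overlap between consecutive MTP clones.

    mtp holds (start, end) BES-derived coordinates in bp for one
    pseudomolecule, as half-open intervals.
    """
    mtp = sorted(mtp)
    gaps = 0
    overlaps = []
    for (_, end1), (start2, _) in zip(mtp, mtp[1:]):
        if start2 > end1:   # no shared sequence: a gap in the tiling path
            gaps += 1
        else:               # overlap estimated from BES positions
            overlaps.append(end1 - start2)
    mean_overlap = sum(overlaps) / len(overlaps) if overlaps else 0.0
    return gaps, mean_overlap
```

For four clones at 0 to 150, 130 to 280, 300 to 450, and 420 to 560 kb, this returns one gap and a mean overlap of 25 kb (20 kb and 30 kb).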
Table 7

The number and characteristics of G. max BAC clones used for picking MTP

| Library (no. of clones in MTP) | FPC clones/paired BES | FPC clones/unpaired BES | All clones/paired BES | All clones/unpaired BES |
|---|---|---|---|---|
| GM_WBa | 1422 | 1477 | 1019 | 1064 |
| GM_WBb | 3887 | 4034 | 3095 | 3218 |
| GM_WBc | 2035 | 2086 | 2969 | 3045 |
| Total | 7344 | 7597 | 7083 | 7327 |
| Gaps | 914 | 768 | 835 | 675 |
| Avg. overlap | 21,942 bp | 23,419 bp | 22,094 bp | 23,526 bp |

BAC, bacterial artificial chromosome; MTP, minimum tiling path; BES, BAC end sequences; FPC, FingerPrinted Contigs.


Alignment of G. soja BESs to G. max genome sequence

G. soja BESs were aligned to the G. max whole-genome sequence (gmax1.01) to detect structural differences between G. max and G. soja. Of 180,099 total BESs, 88,950 clones have paired end sequences, and 2199 clones have sequence for one end only (Table 8). Alignment of these BESs to the gmax1.01 genome showed that 2675 of the 88,905 clones had only one end aligned to the reference genome. A majority of the clones, 67,047, had BESs that aligned to the same chromosome; however, 19,143 clones had BESs that aligned to different chromosomes, indicative of potential rearrangements (Figure 3A).
Table 8

Alignment of G. soja BESs against the G. max genome sequence

| | No. Clones |
|---|---|
| Total number of G. soja BES | 180,099 |
| Clones with unpaired BES | 2199 |
| Clones with paired BES | 88,905 |
|   Clones where only one end aligned | 2675 |
|   Clones where BES aligned to different chromosomes | 19,143 |
|   Clones where BES aligned to same chromosome | 67,047 |
|     75 kbp < clones < 225 kbp | 59,899 |
|     Clones < 75 kbp | 3352 |
|     Clones > 225 kbp | 1965 |
|     Clones with BES with expected orientation | 63,888 |
|     Clones with BES in opposite direction | 1184 |
|     Clones with BES in same direction | 1975 |

BES, BAC end sequences.

Figure 3

Schematic of detecting rearrangements using mapped BES. (A) Potential translocation where paired BESs map to different chromosomes (blue and yellow). (B) Size distribution to show insertions/deletions. Expected range is 75 kbp to 225 kbp. Mapped pairs of BESs outside this range are predicted to have either insertions or deletions. (C) Potential inversion where paired BESs shown as expected on top (inverted relative to each other) are pointing the same direction on bottom.

By examining the distance and orientation of paired BESs, we were able to look for intrachromosomal rearrangements. When aligned to the genome, the two BESs of a pair should be inverted relative to each other (because they are sequenced from opposite ends of the cloning vector), and we expected the ends to lie 75 to 225 kbp apart (Figure 3B). Of the 67,047 clones whose paired BESs aligned to the same chromosome, 89.3% (59,899) were within the 75- to 225-kbp range, 2.9% (1965) were more than 225 kbp apart, and 5.0% (3352) were less than 75 kbp apart (supporting information, Figure S1). BAC clones whose paired BESs aligned more than 1.5 Mbp apart were excluded as potential artifacts: of the 3796 clones with paired BESs more than 225 kbp apart, 1965 fell within the 225-kbp to 1.5-Mbp range. Most clones fell within the distance expected from the insert size distribution of an average BAC library, although many clones indicated potential insertions or deletions. In terms of the orientation of BESs located on the same chromosome, 63,888 clones had the expected orientation (BESs pointing toward each other; Figure 3C). A total of 1184 clones had BESs pointing in the opposite direction (away from each other), and another 1975 clones had BESs pointing in the same direction, indicative of potential inversions (Table 8).
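The distance and orientation rules in the preceding paragraph amount to a small decision procedure per clone. The Python sketch below assumes each aligned BES is summarized by its chromosome, the position of the read’s 5' end, and its strand; the `BesHit` type and category labels are hypothetical. As in Table 8, the distance class and the orientation class are reported separately for same-chromosome pairs. The 75 kbp, 225 kbp, and 1.5 Mbp thresholds come from the text.

```python
from typing import NamedTuple, Optional, Tuple


class BesHit(NamedTuple):
    chrom: str
    pos: int     # 5' end of the aligned read on gmax1.01 (bp)
    strand: str  # "+" if the read aligns forward, "-" if reverse


def classify_pair(a: Optional[BesHit], b: Optional[BesHit],
                  min_insert: int = 75_000, max_insert: int = 225_000,
                  max_span: int = 1_500_000) -> Tuple[str, Optional[str]]:
    """Return (distance class, orientation class) for one G. soja clone."""
    if a is None or b is None:
        return ("only one end aligned", None)
    if a.chrom != b.chrom:
        return ("different chromosomes", None)   # potential translocation
    left, right = sorted((a, b), key=lambda h: h.pos)
    dist = right.pos - left.pos
    if dist > max_span:
        size = "excluded (>1.5 Mbp)"             # treated as an artifact
    elif dist > max_insert:
        size = ">225 kbp"                        # potential insertion/deletion
    elif dist < min_insert:
        size = "<75 kbp"                         # potential insertion/deletion
    else:
        size = "75-225 kbp"                      # expected insert range
    if left.strand == right.strand:
        orient = "same direction"                # potential inversion
    elif left.strand == "+":
        orient = "expected (toward each other)"
    else:
        orient = "opposite direction"            # potential inversion
    return (size, orient)
```

A forward read at 1.00 Mbp paired with a reverse read at 1.15 Mbp on the same chromosome falls in the expected class on both counts. Two forward reads 40 kbp apart are flagged both as too close and as pointing in the same direction.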

Discussion

The MTP with the fewest gaps and the most coverage over the G. max genome sequence

To increase the coverage of the physical map while maintaining reliability, three approaches were considered. First, the preliminary FPC map was integrated with the whole-genome draft sequence; that is, the draft sequence was aligned to the FPC contigs via BES alignments. A number of FPC contigs were merged with this approach, closing 148 gaps in the FPC map (Table 3). This was done using the preliminary 4x sequence assembly from the Joint Genome Institute–Stanford Human Genome Center, built with the Arachne assembler (Batzoglou ; Jaffe ); later assemblies did not yield additional FPC merges. Second, to increase coverage of the sequence map, clones with unpaired BESs were added to the draft sequence and to the MTP (Figure 2). The 8x draft sequence (gmax1.01), which consists of 20 scaffolds covering 950,068,807 bp, has 377 gaps marked by 1000 Ns that could not be spanned using paired BES information alone (Table 3). Nearly one-third of these sequence gaps (126 of 377, 33.4%) were spanned by BAC clones from clone pool A (only the clones contained in the FPC map), including clones with unpaired BESs, and an additional 26 gaps (152 of 377, 40.3%) were spanned by clones from clone pool B (all fingerprinted clones from the three BAC libraries). The MTP picked from all the fingerprinted clones (clone pool B) with paired BESs had 835 gaps, of which 160 were covered by adding 244 clones with unpaired BESs, adding 5,947,374 bp of coverage. For the MTP picked from only the FPC clones (clone pool A), clones with unpaired BESs added 6,457,674 bp of coverage (Table 2). 
Finally, four different MTPs were picked from the two BAC clone pools to maximize coverage and minimize gaps: (1) clone pool A (FPC clones), using BACs with paired BESs; (2) clone pool A, using BACs with both paired and unpaired BESs; (3) clone pool B (all three BAC libraries), using BACs with paired BESs; and (4) clone pool B, using BACs with both paired and unpaired BESs (Table 7). Compared with MTPs 1 and 2, MTPs 3 and 4 spanned ∼80 additional sequence gaps, with a similar average overlap length. Because only ∼60% of the clones in the three BAC libraries (134,182/223,640) were used to construct the FPC map, the larger pool offered more options for selecting clones with more sequence coverage and less overlap with adjacent clones. Thus, when all clones were used, the number of clones in the MTPs decreased and the coverage length increased: by 14,561,053 bp when only clones with paired BESs were used (MTP1 to MTP3), and by 14,050,753 bp when clones with both paired and unpaired BESs were used (MTP2 to MTP4). Compared with MTPs 1 and 3, MTPs 2 and 4 spanned ∼140 additional gaps, and the average overlap length increased by ∼1.4 kbp. Because clones with unpaired BESs were added only where clones with paired BESs could not be placed after the MTP had been picked, it is expected that both the total number of clones and the average overlap length increased. The total number of gaps covered by clones with unpaired BESs in both pools was 306, ∼1.8 times the number of gaps closed by picking MTPs from the larger pool of all three BAC libraries (79 + 93 = 172; Table 7). We conclude that the MTP selected from all three BAC libraries, using clones with paired and unpaired BESs, is the best, because it had the fewest gaps and the greatest coverage of the sequence map. 
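The comparisons among the four MTPs can be recomputed directly from the totals in Tables 2 and 7. The short Python check below makes that arithmetic explicit; the grouping into a "larger pool" comparison (MTP3 vs. MTP1, MTP4 vs. MTP2) and an "unpaired BES" comparison (MTP2 vs. MTP1, MTP4 vs. MTP3) is our own labeling for illustration.

```python
# Totals from Table 2 (bp covered) and Table 7 (gaps) for MTPs 1-4.
coverage = {1: 879_466_185, 2: 885_923_859, 3: 894_027_238, 4: 899_974_612}
gaps = {1: 914, 2: 768, 3: 835, 4: 675}

# Effect of the larger clone pool (pool B vs. pool A).
pool_gain_bp = (coverage[3] - coverage[1], coverage[4] - coverage[2])
pool_gaps_closed = (gaps[1] - gaps[3], gaps[2] - gaps[4])

# Effect of adding clones with unpaired BES (MTP2 vs. MTP1, MTP4 vs. MTP3).
unpaired_gaps_closed = (gaps[1] - gaps[2], gaps[3] - gaps[4])

ratio = sum(unpaired_gaps_closed) / sum(pool_gaps_closed)
print(pool_gain_bp, pool_gaps_closed, unpaired_gaps_closed, round(ratio, 1))
# prints: (14561053, 14050753) (79, 93) (146, 160) 1.8
```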
When users need to know the relative locations of clones, these can be inferred from the FPC map constructed using clones with both paired and unpaired BESs. This high-resolution, chromosome-anchored physical map will serve as an important tool for (1) improving the genome sequence by spanning gaps (in progress); (2) resolving assembly errors caused by repetitive sequences, large gene families, and segmental duplications; (3) map-based cloning; and (4) cloning sequences that are too large or repetitive for polymerase chain reaction-based cloning (http://soybase.org).

The physical map of G. soja parallel to the G. max genome sequence

FPC-based physical maps were originally made to assist clone-by-clone sequencing by identifying minimum tiling paths; indeed, the maize FPC map was used for this purpose as recently as 2009 (Schnable ). For whole-genome shotgun sequencing, physical maps can be used to close sequence gaps, to confirm the sequence assembly, and to provide an anchored, clone-based resource for further research. With the transition to “next-generation” sequencing technologies, BAC-based maps can be even more crucial for ordering sequence contigs/scaffolds and confirming assemblies (Mardis 2008; Shendure and Ji 2008). The genome of wild soybean, G. soja, was sequenced using the Illumina Genome Analyzer, yielding 48.8 Gbp of sequence, ∼52-fold coverage of the genome. The short reads (35 or 76 bp) were mapped to the gmax1.01 reference for assembly (Kim ). Although the mapped reads covered the reference genome ∼43-fold, structural differences between the two genomes were difficult to analyze because of the short read lengths and the short distances between paired reads (Findley ; Mahama ; Yang ). Putative chromosomal structural rearrangements between G. soja and G. max could be detected by aligning G. soja BESs against the G. max reference sequence (gmax1.01; Table 8). BAC clones whose paired BESs aligned to different chromosomes indicate potential translocations; however, this interpretation is complicated by recent polyploidy events that occurred in the genus Glycine. Insertions and deletions could be predicted from clones whose paired BESs aligned too far apart (>225 kbp) or too close together (<75 kbp) on a chromosome. Inversions were predicted from paired BESs that pointed in opposite directions (away from each other) or in the same direction, rather than in the expected orientation toward each other (Figure 3). The average insert size of paired BESs spanning between 75 kbp and 225 kbp was 146 kbp, consistent with the average insert size of the GSS_Ba G. soja library (150 kbp; Table 1).
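The BES-pair criteria described above can be expressed as a small classifier. This sketch assumes each BES has a single best alignment to gmax1.01, summarized as chromosome, leftmost coordinate, and strand; the `BESHit` type, the `classify_pair` function, and the label strings are hypothetical. The span is approximated as the distance between the two alignment start coordinates, and the 75 kbp and 225 kbp thresholds are taken from the text. Both long and short spans signal putative insertions or deletions; the labels stay neutral because the direction of an event depends on which genome is treated as the reference.

```python
from typing import NamedTuple

class BESHit(NamedTuple):
    chrom: str    # G. max chromosome the BES aligned to, e.g. "Gm01"
    pos: int      # leftmost aligned coordinate on that chromosome (bp)
    strand: str   # "+" or "-"

# Concordant span window for the G. soja BAC library, as given in the text.
MIN_SPAN_BP = 75_000
MAX_SPAN_BP = 225_000

def classify_pair(a: BESHit, b: BESHit) -> str:
    """Classify one BAC clone from the alignments of its two end sequences."""
    if a.chrom != b.chrom:
        # May also reflect homeologous hits left by polyploidy in Glycine.
        return "putative translocation"
    left, right = (a, b) if a.pos <= b.pos else (b, a)
    # Expected layout: left end on "+", right end on "-", pointing toward each other.
    if (left.strand, right.strand) != ("+", "-"):
        # Same direction, or pointing away from each other.
        return "putative inversion"
    span = right.pos - left.pos  # rough clone length implied by the reference
    if span > MAX_SPAN_BP:
        return "putative indel (span too long)"
    if span < MIN_SPAN_BP:
        return "putative indel (span too short)"
    return "concordant"

print(classify_pair(BESHit("Gm01", 1_000_000, "+"), BESHit("Gm01", 1_146_000, "-")))
# concordant
```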
For BES pairs spanning more than 225 kbp, the average span was ∼445 kbp, and for pairs spanning less than 75 kbp, it was ∼37 kbp (Figure S1). This approach underestimates structural variation, because small insertions or deletions fall within the normal variability of BAC insert sizes and are therefore missed. Nevertheless, we were able to calculate a rough estimate of how much of the genome might be in flux between the two species (Kim ). Considering only insertions and deletions, we estimate that at least 998 kbp is in flux between G. soja and the domesticated G. max. The estimated sizes of insertions and deletions were ∼300 kbp and ∼110 kbp, respectively, and deletions were 71% more frequent than insertions. A few hotspots for insertions, deletions, and inversions were detected on the G. max chromosomes (Figure S2). The importance of wild soybean (G. soja) as a genetic resource of potentially valuable genes for introgression into soybean cannot be overstated. This motivated the sequencing of G. soja accession IT182932, as well as 17 other wild soybean accessions (to 5x sequence coverage) (Kim ; Lam ). The sequence similarity between G. max and G. soja is ∼98%; however, this statistic does not capture structural differences. Reciprocal translocations, segmental duplications, and insertions/deletions complicate mapping G. soja sequence against the G. max reference, and short-read WGS does not currently capture this information. Thus, physical maps remain useful for investigating and describing the structural evolution that has occurred between these two genomes and for allowing researchers to shuttle effectively between the genomes to capture useful genetic information for crop improvement and basic genetics.
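The reported event sizes are consistent with a simple reading: the size of an event is roughly the difference between a discordant span and the typical concordant span of 146 kbp. That reading is our inference rather than a method stated in the text, and `estimated_indel_kbp` is a hypothetical helper.

```python
def estimated_indel_kbp(discordant_span_kbp: float, typical_span_kbp: float = 146.0) -> float:
    """Rough event size: how far a discordant BES-pair span departs from the
    typical concordant span (146 kbp for the G. soja library in the text).
    Assumed interpretation, not the authors' stated method."""
    return abs(discordant_span_kbp - typical_span_kbp)

print(estimated_indel_kbp(445))  # 299.0, close to the ~300 kbp reported
print(estimated_indel_kbp(37))   # 109.0, close to the ~110 kbp reported
```

Under the same assumption, a clone of typical size leaves the 75 to 225 kbp window only when an event exceeds roughly 79 kbp (225 - 146) or 71 kbp (146 - 75), which is why smaller insertions and deletions go undetected.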

1.  Shotgun sample sequence comparisons between mouse and human genomes.

Authors:  J B Bouck; M L Metzker; R A Gibbs
Journal:  Nat Genet       Date:  2000-05

2.  A BAC- and BIBAC-based physical map of the soybean genome.

Authors:  Chengcang Wu; Shuku Sun; Padmavathi Nimmakayala; Felipe A Santos; Khalid Meksem; Rachael Springman; Kejiao Ding; David A Lightfoot; Hong-Bin Zhang
Journal:  Genome Res       Date:  2004-01-12

3.  Construction and characterization of a soybean bacterial artificial chromosome library and use of multiple complementary libraries for genome physical mapping.

Authors:  C-C Wu; P Nimmakayala; F A Santos; R Springman; C Scheuring; K Meksem; D A Lightfoot; H-B Zhang
Journal:  Theor Appl Genet       Date:  2004-05-26

4.  Toward a physical map of the genome of the nematode Caenorhabditis elegans.

Authors:  A Coulson; J Sulston; S Brenner; J Karn
Journal:  Proc Natl Acad Sci U S A       Date:  1986-10

5.  Genome structure in soybean revealed by a genomewide genetic map constructed from a single population.

Authors:  Kiwoung Yang; Jung-Kyung Moon; Namhee Jeong; Kyoungwhan Back; Hwan Mook Kim; Soon-Chun Jeong
Journal:  Genomics       Date:  2008-05-16

Review 6.  Next-generation DNA sequencing methods.

Authors:  Elaine R Mardis
Journal:  Annu Rev Genomics Hum Genet       Date:  2008

7.  A physical map of the human genome.

Authors:  J D McPherson; M Marra; L Hillier; R H Waterston; A Chinwalla; J Wallis; M Sekhon; K Wylie; E R Mardis; R K Wilson; R Fulton; T A Kucaba; C Wagner-McPherson; W B Barbazuk; S G Gregory; S J Humphray; L French; R S Evans; G Bethel; A Whittaker; J L Holden; O T McCann; A Dunham; C Soderlund; C E Scott; D R Bentley; G Schuler; H C Chen; W Jang; E D Green; J R Idol; V V Maduro; K T Montgomery; E Lee; A Miller; S Emerling; R Gibbs; S Scherer; J H Gorrell; E Sodergren; K Clerc-Blankenburg; P Tabor; S Naylor; D Garcia; P J de Jong; J J Catanese; N Nowak; K Osoegawa; S Qin; L Rowen; A Madan; M Dors; L Hood; B Trask; C Friedman; H Massa; V G Cheung; I R Kirsch; T Reid; R Yonescu; J Weissenbach; T Bruls; R Heilig; E Branscomb; A Olsen; N Doggett; J F Cheng; T Hawkins; R M Myers; J Shang; L Ramirez; J Schmutz; O Velasquez; K Dixon; N E Stone; D R Cox; D Haussler; W J Kent; T Furey; S Rogic; S Kennedy; S Jones; A Rosenthal; G Wen; M Schilhabel; G Gloeckner; G Nyakatura; R Siebert; B Schlegelberger; J Korenberg; X N Chen; A Fujiyama; M Hattori; A Toyoda; T Yada; H S Park; Y Sakaki; N Shimizu; S Asakawa; K Kawasaki; T Sasaki; A Shintani; A Shimizu; K Shibuya; J Kudoh; S Minoshima; J Ramser; P Seranski; C Hoff; A Poustka; R Reinhardt; H Lehrach
Journal:  Nature       Date:  2001-02-15

8.  Genome duplication in soybean (Glycine subgenus soja).

Authors:  R C Shoemaker; K Polzin; J Labate; J Specht; E C Brummer; T Olson; N Young; V Concibido; J Wilcox; J P Tamulonis; G Kochert; H R Boerma
Journal:  Genetics       Date:  1996-09

9.  ARACHNE: a whole-genome shotgun assembler.

Authors:  Serafim Batzoglou; David B Jaffe; Ken Stanley; Jonathan Butler; Sante Gnerre; Evan Mauceli; Bonnie Berger; Jill P Mesirov; Eric S Lander
Journal:  Genome Res       Date:  2002-01

10.  The physical and genetic framework of the maize B73 genome.

Authors:  Fusheng Wei; Jianwei Zhang; Shiguo Zhou; Ruifeng He; Mary Schaeffer; Kristi Collura; David Kudrna; Ben P Faga; Marina Wissotski; Wolfgang Golser; Susan M Rock; Tina A Graves; Robert S Fulton; Ed Coe; Patrick S Schnable; David C Schwartz; Doreen Ware; Sandra W Clifton; Richard K Wilson; Rod A Wing
Journal:  PLoS Genet       Date:  2009-11-20
