Sebastian Beier, Axel Himmelbach, Christian Colmsee, Xiao-Qi Zhang, Roberto A Barrero, Qisen Zhang, Lin Li, Micha Bayer, Daniel Bolser, Stefan Taudien, Marco Groth, Marius Felder, Alex Hastie, Hana Šimková, Helena Staňková, Jan Vrána, Saki Chan, María Muñoz-Amatriaín, Rachid Ounit, Steve Wanamaker, Thomas Schmutzer, Lala Aliyeva-Schnorr, Stefano Grasso, Jaakko Tanskanen, Dharanya Sampath, Darren Heavens, Sujie Cao, Brett Chapman, Fei Dai, Yong Han, Hua Li, Xuan Li, Chongyun Lin, John K McCooke, Cong Tan, Songbo Wang, Shuya Yin, Gaofeng Zhou, Jesse A Poland, Matthew I Bellgard, Andreas Houben, Jaroslav Doležel, Sarah Ayling, Stefano Lonardi, Peter Langridge, Gary J Muehlbauer, Paul Kersey, Matthew D Clark, Mario Caccamo, Alan H Schulman, Matthias Platzer, Timothy J Close, Mats Hansson, Guoping Zhang, Ilka Braumann, Chengdao Li, Robbie Waugh, Uwe Scholz, Nils Stein, Martin Mascher.
Abstract
Barley (Hordeum vulgare L.) is a cereal grass mainly used as animal fodder and raw material for the malting industry. The map-based reference genome sequence of barley cv. 'Morex' was constructed by the International Barley Genome Sequencing Consortium (IBSC) using hierarchical shotgun sequencing. Here, we report the experimental and computational procedures to (i) sequence and assemble more than 80,000 bacterial artificial chromosome (BAC) clones along the minimum tiling path of a genome-wide physical map, (ii) find and validate overlaps between adjacent BACs, (iii) construct 4,265 non-redundant sequence scaffolds representing clusters of overlapping BACs, and (iv) order and orient these BAC clusters along the seven barley chromosomes using positional information provided by dense genetic maps, an optical map and chromosome conformation capture sequencing (Hi-C). Integrative access to these sequence and mapping resources is provided by the barley genome explorer (BARLEX).
Year: 2017 PMID: 28448065 PMCID: PMC5407242 DOI: 10.1038/sdata.2017.44
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Assembly workflow.
(a) Assembly of individual BAC clones from paired-end and mate-pair read data. (b) Data integration procedures for pseudomolecule construction.
BAC assembly and anchoring statistics.
| 6,993 | 6,983 (99.9%) | 6,410 (91.8%) | 7.6 | 81.2 | |
| 9,061 | 8,969 (99.0%) | 8,195 (91.4%) | 9.9 | 104.5 | |
| 8,841 | 8,807 (99.6%) | 8,303 (94.3%) | 7.7 | 87.5 | |
| 8,314 | 8,306 (99.9%) | 7,783 (93.7%) | 6.7 | 91.2 | |
| 8,426 | 8,358 (99.2%) | 7,573 (90.6%) | 9.7 | 72.2 | |
| 8,305 | 7,886 (95.0%) | 6,476 (82.1%) | 7.4 | 70.7 | |
| 8,576 | 7,970 (92.9%) | 6,842 (85.8%) | 8.5 | 65.5 | |
| 8,256 | 8,031 (97.3%) | 6,714 (83.6%) | 7.6 | 83.6 | |
| — | 21,765 | 20,397 (93.7%) | 14.5 | 33.7 | |
| 66,772 | 87,075 | 78,693 (90.4%) | 9.8 | 70.3 |
*Number and percentage of BAC clones that have been assigned genetic positions in the POPSEQ map.
†BAC clones in physical contigs that had not been assigned to chromosomes.
Summary statistics of the updated POPSEQ map of the Morex WGS assembly.
| 74,184 | 123.7 | |
| 130,436 | 202.6 | |
| 119,131 | 187.6 | |
| 96,642 | 170.6 | |
| 117,314 | 177.8 | |
| 121,384 | 168.4 | |
| 132,085 | 190.2 | |
| 791,176 | 1220.9 |
Cluster summary statistics after each step of the BAC overlap graph construction.
| 1 | BAC, FPC | 9,637 | 71,828 | 13,211 | 2,036 | 21 | 12.9 |
| 2 | BAC | 4,890 | 79,871 | 4,002 | 3,202 | 60 | 38.3 |
| 3 | BAC, OM | 4,843 | 79,884 | 3,989 | 3,202 | 61 | 38.8 |
| 4 | FPC, BES, OM | 4,653 | 79,884 | 3,989 | 3,202 | 65 | 41.2 |
| 5 | FPC, BES | 4,562 | 79,908 | 3,965 | 3,202 | 66 | 41.7 |
| 6 | BAC, OM | 4,486 | 79,918 | 3,955 | 3,202 | 66 | 42.4 |
| 7 | FPC, BAC | 4,485 | 79,919 | 3,954 | 3,202 | 66 | 42.4 |
| 8 | FPC, OM | 4,390 | 79,919 | 3,954 | 3,202 | 66 | 43.0 |
| 9 | exBAC | 4,382 | 80,010 | 3,938 | 3,127 | 66 | 43.1 |
| 10 | BAC, OM | 4,323 | 80,010 | 3,938 | 3,127 | 67 | 43.8 |
| 11 | FPC, OM | 4,259 | 80,010 | 3,938 | 3,127 | 69 | 45.2 |
| 12 | BES, FPC | 4,251 | 80,010 | 3,938 | 3,127 | 69 | 45.2 |
*Datasets used in each step (BAC, BAC sequence overlap; FPC, physical map; OM, optical map; BES, BAC end sequences; exBAC, previously excluded BAC assemblies). Consistency with the POPSEQ genetic map was checked in each step.
†An N50 value N indicates that half of all clusters contain at least N BACs.
‡Arithmetic mean of the number of BACs per cluster.
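The cluster statistics described in these footnotes can be illustrated with a minimal sketch. It assumes that a cluster is a connected component of a graph whose edges are validated BAC overlaps. The BAC names and overlap pairs are hypothetical toy data, and the function names are illustrative rather than taken from the authors' pipeline. Because the N50 footnote can be read in two ways, the sketch computes both the size-weighted N50 that is conventional for assemblies and the literal "half of all clusters" reading.

```python
from collections import Counter

def cluster_sizes(bacs, overlaps):
    """Group BACs into clusters (connected components) using union-find."""
    parent = {b: b for b in bacs}

    def find(x):
        # Walk up to the representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in overlaps:
        parent[find(a)] = find(b)

    # Count members per representative and report sizes, largest first.
    return sorted(Counter(find(b) for b in bacs).values(), reverse=True)

def mean_size(sizes):
    """Arithmetic mean of the number of BACs per cluster."""
    return sum(sizes) / len(sizes)

def n50_weighted(sizes):
    """Conventional N50: clusters with at least N BACs hold >= half of all BACs."""
    half = sum(sizes) / 2
    running = 0
    for s in sorted(sizes, reverse=True):
        running += s
        if running >= half:
            return s

def n50_by_cluster_count(sizes):
    """Literal footnote reading: at least half of all clusters contain >= N BACs."""
    ordered = sorted(sizes, reverse=True)
    return ordered[(len(ordered) + 1) // 2 - 1]

# Hypothetical toy input: ten BACs and six validated overlaps.
bacs = [f"b{i}" for i in range(1, 11)]
overlaps = [("b1", "b2"), ("b2", "b3"), ("b3", "b4"),
            ("b5", "b6"), ("b6", "b7"), ("b8", "b9")]

sizes = cluster_sizes(bacs, overlaps)
print(sizes, mean_size(sizes), n50_weighted(sizes), n50_by_cluster_count(sizes))
```

On real data the two N50 variants can differ substantially when many small clusters or singletons are present, so it is worth checking which definition a given table uses.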
Final cluster statistics.
| 389 | 605 | 324 | 415 | 549 | 768 | 943 | 242 | |
| 65 | 214 | 74 | 78 | 173 | 167 | 162 | 1190 | |
| 562.8 | 785.5 | 704 | 655.5 | 687.8 | 600.2 | 663.8 | 130.6 | |
| 555.9 | 760.3 | 695.8 | 648.4 | 668.2 | 581.1 | 646 | 28.9 | |
| 6.9 | 25.1 | 8.3 | 7.1 | 19.5 | 19.1 | 17.7 | 101.7 | |
| 2.5 | 2.1 | 3.6 | 2.5 | 2.0 | 1.1 | 1 | 0.1 |
Summary statistics of the GBS map.
| 346 | 195 | 133.3 | |
| 383 | 231 | 153.2 | |
| 385 | 231 | 154.9 | |
| 237 | 135 | 115.5 | |
| 474 | 265 | 173.3 | |
| 362 | 188 | 122.7 | |
| 450 | 253 | 143.9 | |
| 2,637 | 1,498 | 996.8 |
Figure 2. Collinearity between the Hi-C map and two genetic maps.
The positions of genetic markers in the Hi-C map (x-axis) are plotted against their genetic positions (y-axis) in a GBS map (top row) and a POPSEQ map (bottom row) of the Morex × Barke recombinant inbred lines.
Figure 3. Collinearity between the Hi-C map and a cytogenetic map of chromosome 3H.
Dots mark the positions of probes in the cytogenetic map (x-axis) and the Hi-C-derived pseudomolecule (y-axis). A linear regression line (red) was fitted with the R function lm(). Note that cytogenetic data is not available for distal regions because probes were designed only for non-recombining peri-centromeric regions[61].
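For readers who do not use R, the kind of fit that lm() performs for a single predictor is an ordinary least-squares line. The following is a minimal Python sketch of that calculation. The probe coordinates are hypothetical values chosen for illustration, not data from the figure, and `fit_line` is an illustrative helper name.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and co-deviation of x and y.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical probes: relative cytogenetic position (x) and
# pseudomolecule position in Mb (y).
cytogenetic = [0.40, 0.44, 0.49, 0.55, 0.61]
pseudomolecule_mb = [281.0, 306.5, 344.0, 383.5, 428.0]

slope, intercept = fit_line(cytogenetic, pseudomolecule_mb)
print(f"slope = {slope:.1f} Mb per unit, intercept = {intercept:.1f} Mb")
```

A slope that stays roughly constant across probes, with small residuals, is what collinearity between the two maps would look like in such a plot.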
Figure 4. Accessing sequence and positional information with the barley genome explorer (BARLEX).
The barley pseudomolecule data were imported into BARLEX, where they are directly linked to the IPK Barley BLAST server. Users can paste a nucleotide or amino acid sequence (1) into the BARLEX input query form and select a reference database such as the pseudomolecule sequence, the set of all BAC assemblies or the annotated genes (2). The sequence is then transferred to the IPK Barley BLAST server (3). The web page with the BLAST results (4) contains references to BARLEX information pages for different structural units (BAC sequence contigs, BAC, BAC cluster, chromosomal Hi-C map). For example, the pages of BAC sequence contigs visualize the repeat content based on genome-wide k-mer histograms (5) and are linked to a graph-based visualization (6) of the entire BAC assembly. Summary statistics and positional information of BAC clusters are presented in tables that can be searched, sorted and subsetted using user-defined criteria (7). Users can convert pseudomolecule coordinates (AGP positions) to intervals in the underlying BAC sequence assemblies (8).
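The coordinate conversion in step (8) can be sketched as a lookup in AGP-style records, which describe how each stretch of a pseudomolecule is built from a component sequence. This is a minimal illustration and not BARLEX code: the record names, coordinates and the `to_component` helper are hypothetical, and only the standard AGP notions of 1-based inclusive coordinates and "+"/"-" orientation are assumed.

```python
from typing import NamedTuple, Optional, Tuple

class AgpPart(NamedTuple):
    obj: str          # pseudomolecule (AGP "object") name
    obj_beg: int      # 1-based, inclusive start on the pseudomolecule
    obj_end: int      # 1-based, inclusive end on the pseudomolecule
    comp_id: str      # component, e.g. a BAC sequence contig
    comp_beg: int     # 1-based, inclusive start on the component
    comp_end: int     # 1-based, inclusive end on the component
    orientation: str  # "+" or "-"

def to_component(parts, obj: str, pos: int) -> Optional[Tuple[str, int]]:
    """Map a pseudomolecule position to (component id, component position)."""
    for p in parts:
        if p.obj == obj and p.obj_beg <= pos <= p.obj_end:
            offset = pos - p.obj_beg
            if p.orientation == "-":
                # Reverse-oriented components are read from their end.
                return p.comp_id, p.comp_end - offset
            return p.comp_id, p.comp_beg + offset
    return None  # position lies in a gap or outside the listed parts

# Hypothetical records: two components separated by a 100 bp gap.
parts = [
    AgpPart("chr_demo", 1, 1000, "BAC_A_contig1", 1, 1000, "+"),
    AgpPart("chr_demo", 1101, 1600, "BAC_B_contig3", 201, 700, "-"),
]

print(to_component(parts, "chr_demo", 500))
print(to_component(parts, "chr_demo", 1101))
print(to_component(parts, "chr_demo", 1050))
```

An interval can be converted by mapping both of its end points, provided they fall in the same component. Intervals that span several components or a gap would need to be split at the part boundaries.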