Yong Zhou, Dmytro Chebotarov, Dave Kudrna, Victor Llaca, Seunghee Lee, Shanmugam Rajasekar, Nahed Mohammed, Noor Al-Bader, Chandler Sobel-Sorenson, Praveena Parakkal, Lady Johanna Arbelaez, Natalia Franco, Nickolai Alexandrov, N Ruaraidh Sackville Hamilton, Hei Leung, Ramil Mauleon, Mathias Lorieux, Andrea Zuccolo, Kenneth McNally, Jianwei Zhang, Rod A Wing.
Abstract
As the human population grows from 7.8 billion to 10 billion over the next 30 years, breeders must do everything possible to create crops that are highly productive and nutritious while having a smaller environmental footprint. Rice will play a critical role in meeting this demand, and thus knowledge of the full repertoire of genetic diversity that exists in germplasm banks across the globe is required. To meet this demand, we describe the generation, validation and preliminary analyses of transposable element and long-range structural variation content of 12 near-gap-free reference genome sequences (RefSeqs) from representatives of 12 of 15 subpopulations of cultivated Asian rice. When combined with 4 existing RefSeqs that represent the 3 remaining rice subpopulations and the largest admixed population, this collection of 16 Platinum Standard RefSeqs (PSRefSeq) can be used as a template to map resequencing data to detect virtually all standing natural variation that exists in the pan-genome of cultivated Asian rice.
Year: 2020 PMID: 32265447 PMCID: PMC7138821 DOI: 10.1038/s41597-020-0438-2
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Phylogenetic tree with the accession selected for PSRefSeq sequencing for each of the K = 15 subpopulations and a single admixture group. Groups are colored according to the assignment from Admixture analysis. The subpopulation designation is in parentheses following the name.
Sample collection information for the 12 Oryza sativa accessions.
| Variety Name | Genetic Stock ID | Country of Origin | Subpopulation (K = 15) |
|---|---|---|---|
| CHAO MEO::IRGC 80273-1 | IRGC 132278 | Lao PDR | GJ-subtrp |
| Azucena | I1A41685 | Philippines | GJ-trop1 |
| KETAN NANGKA::IRGC 19961-2 | IRGC 128077 | Indonesia | GJ-trop2 |
| ARC 10497::IRGC 12485-1 | IRGC 117425 | India | cB |
| IR 64 | I1A42114 | Philippines | XI-1B1 |
| PR 106::IRGC 53418-1 | IRGC 127742 | India | XI-1B2 |
| LIMA::IRGC 81487-1 | IRGC 127564 | Indonesia | XI-3A |
| KHAO YAI GUANG::IRGC 65972-1 | IRGC 127518 | Thailand | XI-3B1 |
| GOBOL SAIL (BALAM)::IRGC 26624-2 | IRGC 132424 | Bangladesh | XI-2A |
| LIU XU::IRGC 109232-1 | IRGC 125827 | China | XI-3B2 |
| LARHA MUGAD::IRGC 52339-1 | IRGC 125619 | India | XI-2B |
| NATEL BORO::IRGC 34749-1 | IRGC 127652 | Bangladesh | cA2 |
Subpopulations: GJ = Geng-japonica where trop = tropical, subtrp = subtropical; cB = circum-Basmati; XI = Xian-indica; cA = circum-Aus.
Sequencing platforms used and data statistics for the 12 Oryza sativa genomes.
| Variety Name | Sequencing platform | Raw data (Gb) | Depth | Number of subreads (M) | Mean subread length (Kb) |
|---|---|---|---|---|---|
| CHAO MEO::IRGC 80273-1 | PacBio Sequel | 49.1 | 123× | 4.26 | 11.526 |
| Azucena | PacBio Sequel | 57.1 | 143× | 5.40 | 10.581 |
| KETAN NANGKA::IRGC 19961-2 | PacBio Sequel | 49.8 | 125× | 2.78 | 17.876 |
| ARC 10497::IRGC 12485-1 | PacBio Sequel | 44.7 | 112× | 4.06 | 11.026 |
| IR 64 | PacBio Sequel | 59.7 | 149× | 5.24 | 11.393 |
| PR 106::IRGC 53418-1 | PacBio Sequel | 42.2 | 105× | 2.08 | 20.317 |
| LIMA::IRGC 81487-1 | PacBio Sequel | 41.4 | 103× | 2.01 | 20.612 |
| KHAO YAI GUANG::IRGC 65972-1 | PacBio Sequel | 42.5 | 106× | 2.37 | 17.954 |
| GOBOL SAIL (BALAM)::IRGC 26624-2 | PacBio Sequel | 42.2 | 105× | 2.13 | 19.777 |
| LIU XU::IRGC 109232-1 | PacBio Sequel | 55.3 | 138× | 3.66 | 15.109 |
| LARHA MUGAD::IRGC 52339-1 | PacBio Sequel | 45.1 | 113× | 3.22 | 14.011 |
| NATEL BORO::IRGC 34749-1 | PacBio Sequel | 44.4 | 111× | 2.74 | 16.2 |
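The depth and mean subread length columns can be cross-checked against the raw data yield. The following is a minimal Python sketch of that arithmetic. It assumes a nominal genome size of about 400 Mb, a value inferred from the ratio of the tabulated columns rather than one stated here, and the function names are hypothetical.

```python
# Assumption: nominal genome size of 0.4 Gb, inferred from the table
# (raw data / depth), not stated in the source.
ASSUMED_GENOME_SIZE_GB = 0.4


def approx_depth(raw_data_gb, genome_size_gb=ASSUMED_GENOME_SIZE_GB):
    """Approximate fold coverage: total sequenced bases / genome size."""
    return raw_data_gb / genome_size_gb


def approx_mean_subread_kb(raw_data_gb, subreads_millions):
    """Approximate mean subread length in kb: total bases / number of subreads."""
    total_bases = raw_data_gb * 1e9
    n_subreads = subreads_millions * 1e6
    return total_bases / n_subreads / 1e3


# (raw data in Gb, reported depth, subreads in millions, reported mean length in kb)
# Three rows copied from the table above.
rows = {
    "CHAO MEO": (49.1, 123, 4.26, 11.526),
    "IR 64": (59.7, 149, 5.24, 11.393),
    "LIMA": (41.4, 103, 2.01, 20.612),
}

for name, (gb, depth, n_reads, mean_kb) in rows.items():
    print(name,
          round(approx_depth(gb), 1), "vs reported", depth,
          "|",
          round(approx_mean_subread_kb(gb, n_reads), 3), "vs reported", mean_kb)
```

Because the tabulated yields and subread counts are rounded, the recomputed values agree with the reported ones only approximately.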
Fig. 2 Genome assembly and validation pipeline.
De novo assembly, BUSCO evaluation and accession numbers in GenBank of the 12 Oryza sativa genomes.
| Variety Name | BioProject | BioSample | Genome size (bp) | #Contigs | Contig N50 (bp) | #Gaps | Scaffold N50 (bp) | BUSCO | Adjusted BUSCO | Genome Accession | SRP | Supplementary Files (Bionano optical map) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CHAO MEO::IRGC 80273-1 | PRJNA565484 | SAMN12748601 | 376,856,903 | 55 | 11,024,768 | 43 | 30,350,168 | 97.60% | 98.49% | VYIH00000000 | SRP226088 | SUPPF_0000003210 |
| Azucena | PRJNA424001 | SAMN08217222 | 379,627,553 | 28 | 22,940,949 | 16 | 30,954,872 | 97.80% | 98.69% | PKQC000000000 | SRP227255 | SUPPF_0000003212 |
| KETAN NANGKA::IRGC 19961-2 | PRJNA564615 | SAMN12718029 | 380,759,091 | 21 | 22,679,302 | 9 | 30,696,581 | 98.00% | 98.89% | VYIC00000000 | SRP226080 | SUPPF_0000003204 |
| ARC 10497::IRGC 12485-1 | PRJNA565479 | SAMN12748569 | 378,463,869 | 40 | 17,921,520 | 28 | 30,566,713 | 98.40% | 99.30% | VYID00000000 | SRP226093 | SUPPF_0000003206 |
| IR 64 | PRJNA509165 | SAMN10564385 | 386,698,898 | 104 | 7,352,909 | 92 | 31,218,896 | 95.70% | 96.57% | RWKJ00000000 | SRP227298 | SUPPF_0000003213 |
| PR 106::IRGC 53418-1 | PRJNA563359 | SAMN12672924 | 391,176,105 | 16 | 27,051,416 | 4 | 32,028,703 | 96.60% | 97.48% | VYIB00000000 | SRP226078 | SUPPF_0000003202 |
| LIMA::IRGC 81487-1 | PRJNA564572 | SAMN12715984 | 392,625,308 | 17 | 27,369,091 | 5 | 32,421,942 | 98.50% | 99.40% | VXJH00000000 | SRP226079 | SUPPF_0000003203 |
| KHAO YAI GUANG::IRGC 65972-1 | PRJNA565481 | SAMN12748590 | 393,737,720 | 19 | 21,823,919 | 7 | 32,080,718 | 98.60% | 99.50% | VYIF00000000 | SRP226086 | SUPPF_0000003208 |
| GOBOL SAIL (BALAM)::IRGC 26624-2 | PRJNA564763 | SAMN12721963 | 391,772,995 | 15 | 29,604,901 | 3 | 31,753,752 | 97.90% | 98.79% | VXJI00000000 | SRP226082 | SUPPF_0000003205 |
| LIU XU::IRGC 109232-1 | PRJNA577228 | SAMN13021815 | 392,033,263 | 17 | 30,913,760 | 5 | 32,301,089 | 98.40% | 99.30% | WGGU00000000 | SRP226085 | SUPPF_0000003211 |
| LARHA MUGAD::IRGC 52339-1 | PRJNA565480 | SAMN12748589 | 390,195,943 | 16 | 30,747,645 | 4 | 32,107,744 | 98.60% | 99.50% | VYIE00000000 | SRP226084 | SUPPF_0000003207 |
| NATEL BORO::IRGC 34749-1 | PRJNA565483 | SAMN12748600 | 383,720,936 | 16 | 27,825,079 | 4 | 31,305,988 | 98.10% | 98.99% | VYIG00000000 | SRP226087 | SUPPF_0000003209 |
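For readers unfamiliar with the contiguity metrics in this table, the sketch below shows the standard definition of N50 on a toy list of lengths, then checks one regularity in the table: in every row the number of contigs minus the number of gaps is 12. That is consistent with each of the 12 rice chromosomes being represented by a single scaffold, although that reading is an assumption rather than something stated here. The function name and toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, not taken from the paper).
toy_lengths = [6, 5, 4, 3, 2]
print("toy N50:", n50(toy_lengths))

# (#contigs, #gaps) pairs copied row by row from the table above.
contigs_gaps = [
    (55, 43), (28, 16), (21, 9), (40, 28), (104, 92), (16, 4),
    (17, 5), (19, 7), (15, 3), (17, 5), (16, 4), (16, 4),
]
differences = {contigs - gaps for contigs, gaps in contigs_gaps}
print("contigs minus gaps:", differences)
```

The same `n50` function applies to scaffold lengths as well as contig lengths; only the input list changes.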
Fig. 3 Bionano optical map validation of chromosome 1 for 12 de novo assemblies.
Abundance of the major TE classes in the 16 Oryza sativa genomes.
| Variety Name | Total (%) | LTR-RT (%) | LINEs (%) | SINEs (%) | DNA_TEs (%) | Unclassified (%) |
|---|---|---|---|---|---|---|
| NIPPONBARE | 46.07 | 23.55 | 1.52 | 0.41 | 16.18 | 4.41 |
| CHAO MEO::IRGC 80273-1 | 46.25 | 24.00 | 1.46 | 0.40 | 15.59 | 4.80 |
| Azucena | 47.07 | 24.48 | 1.47 | 0.40 | 15.82 | 4.89 |
| KETAN NANGKA::IRGC 19961-2 | 46.99 | 24.87 | 1.47 | 0.40 | 15.72 | 4.53 |
| ARC 10497::IRGC 12485-1 | 46.95 | 24.74 | 1.48 | 0.40 | 15.68 | 4.65 |
| PR 106::IRGC 53418-1 | 47.95 | 26.82 | 1.41 | 0.39 | 15.05 | 4.28 |
| Minghui 63 | 47.97 | 26.61 | 1.44 | 0.40 | 15.30 | 4.22 |
| IR 64 | 47.87 | 26.82 | 1.42 | 0.40 | 14.97 | 4.26 |
| Zhenshan 97 | 47.95 | 26.79 | 1.42 | 0.39 | 15.19 | 4.16 |
| LIMA::IRGC 81487-1 | 48.04 | 26.87 | 1.40 | 0.39 | 15.01 | 4.37 |
| KHAO YAI GUANG::IRGC 65972-1 | 48.27 | 27.27 | 1.40 | 0.39 | 14.87 | 4.34 |
| GOBOL SAIL (BALAM)::IRGC 26624-2 | 48.15 | 26.99 | 1.40 | 0.39 | 14.99 | 4.38 |
| LIU XU::IRGC 109232-1 | 46.92 | 27.06 | 1.26 | 0.32 | 14.31 | 3.97 |
| LARHA MUGAD::IRGC 52339-1 | 48.05 | 26.74 | 1.41 | 0.39 | 15.09 | 4.42 |
| N 22::IRGC 19379-1 | 47.79 | 25.95 | 1.44 | 0.39 | 15.20 | 4.81 |
| NATEL BORO::IRGC 34749-1 | 47.33 | 25.75 | 1.42 | 0.40 | 15.12 | 4.64 |
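A quick consistency check on this table is that the five class columns should add up to the Total column. The sketch below does this for four rows copied from the table. It assumes the values are percentages of each genome rounded to two decimals, so a small tolerance is allowed; the tolerance value and helper name are hypothetical.

```python
# (variety, total, LTR-RT, LINEs, SINEs, DNA_TEs, unclassified)
# Four rows copied from the table above.
rows = [
    ("NIPPONBARE", 46.07, 23.55, 1.52, 0.41, 16.18, 4.41),
    ("Azucena", 47.07, 24.48, 1.47, 0.40, 15.82, 4.89),
    ("IR 64", 47.87, 26.82, 1.42, 0.40, 14.97, 4.26),
    ("LIU XU", 46.92, 27.06, 1.26, 0.32, 14.31, 3.97),
]

# Assumed allowance for rounding of two-decimal values.
TOLERANCE = 0.05


def class_sum_gap(row):
    """Absolute difference between the reported total and the sum of the classes."""
    _, total, *classes = row
    return abs(total - sum(classes))


for row in rows:
    gap = class_sum_gap(row)
    print(row[0], "gap:", round(gap, 2), "within tolerance:", gap <= TOLERANCE)
```

Extending `rows` to all 16 genomes is a simple way to catch transcription errors when the table is copied into another format.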
| Measurement(s) | genome • DNA • sequence_assembly • sequence feature annotation • physical map |
| Technology Type(s) | DNA sequencing • PacBio Sequel System • sequence assembly process • transposable elements annotation • Optical Mapping • Illumina sequencing |
| Factor Type(s) | Oryza sativa cv. variety |
| Sample Characteristic - Organism | Oryza sativa |