Kyunghee Kim, Sang-Choon Lee, Junki Lee, Yeisoo Yu, Kiwoung Yang, Beom-Soon Choi, Hee-Jong Koh, Nomar Espinosa Waminal, Hong-Il Choi, Nam-Hoon Kim, Woojong Jang, Hyun-Seung Park, Jonghoon Lee, Hyun Oh Lee, Ho Jun Joh, Hyeon Ju Lee, Jee Young Park, Sampath Perumal, Murukarthick Jayakodi, Yun Sun Lee, Backki Kim, Dario Copetti, Soonok Kim, Sunggil Kim, Ki-Byung Lim, Young-Dong Kim, Jungho Lee, Kwang-Su Cho, Beom-Seok Park, Rod A Wing, Tae-Jin Yang.
Abstract
Cytoplasmic chloroplast (cp) genomes and nuclear ribosomal DNA (nR) are the primary sequences used to understand plant diversity and evolution. We introduce a high-throughput method to simultaneously obtain complete cp and nR sequences from Illumina whole-genome sequencing data. We applied the method to 30 rice specimens belonging to nine Oryza species. Concurrent phylogenomic analysis using cp and nR sequences from several specimens of the same Oryza AA-genome species provides insight into the evolution and domestication of cultivated rice, clarifying three ambiguous but important issues in the evolution of wild Oryza species. First, cp-based trees clearly classify each lineage but can be biased by inter-subspecies cross-hybridization events during speciation. Second, O. glumaepatula, a South American wild rice, includes two cytoplasm types, one of which is derived from a recent interspecies hybridization with O. longistaminata. Third, the Australian O. rufipogon-type rice is a perennial form of O. meridionalis.
Year: 2015 PMID: 26506948 PMCID: PMC4623524 DOI: 10.1038/srep15655
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Characterization of the 30 longest contigs derived from de novo genome assembly using 1x and 0.05x haploid genome equivalents of rice and ginseng, respectively.
(a) Classification based on best hit (Supplementary Tables S1 and S2 online). The number of contigs and the percent coverage of cp, nR, mitochondrial (mt) and other sequences are shown above the bars. (b,c) Alignment of five and three contigs covering the complete cp genome sequences of rice (b) and ginseng (c), respectively. Contig numbers are given under the contigs, and hit positions (in parentheses) under the reference cp genome sequences for rice (GU592207) and ginseng (NC_006290). Sequence errors identified in the initial contigs are marked by arrows. The overall structure of the cp genome is shown as colored bars: green for the large single-copy region (LSC), blue for the inverted repeats (IRs), and yellow for the small single-copy region (SSC). Mapping of raw reads (100x) is shown above the alignment. (d) Mapping of 10-kb mate-pair reads (2x depth) onto the assembled sequence. Purple and orange mates indicate pairs mapped within the expected 10-kb range.
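Panel (a) reports how much of each reference class is covered by contig hits, and panel (b) lists hit positions on the reference cp genome. The sketch below shows one way such a coverage percentage could be computed by merging overlapping hit intervals. It is not the authors' pipeline: the function names, the (start, end) input format and the toy coordinates are hypothetical, and hits falling in either inverted repeat are simply counted at whatever positions are reported.

```python
def merge_intervals(hits):
    """Merge overlapping or abutting hit intervals (1-based, inclusive).

    Reverse-strand hits reported as (end, start) are normalized first.
    """
    merged = []
    for start, end in sorted((min(s, e), max(s, e)) for s, e in hits):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous interval: extend it.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def percent_coverage(hits, ref_length):
    """Percentage of a reference sequence covered by at least one hit."""
    covered = sum(end - start + 1 for start, end in merge_intervals(hits))
    return 100.0 * covered / ref_length


# Hypothetical hits of three contigs on a 1,000-bp toy reference.
hits = [(1, 400), (380, 700), (900, 751)]
print(merge_intervals(hits))          # [(1, 700), (751, 900)]
print(percent_coverage(hits, 1000))   # 85.0
```

Merging before summing avoids double-counting bases where neighboring contigs overlap, which is common at contig ends.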
Figure 2. Assembly of complete 45S units.
(a–c) Schematic diagram of the method used to obtain a complete 45S unit. (a) A single draft contig contained the 45S transcription unit and occasionally part of the intergenic spacer (IGS). In this example, Ctg_173, assembled from a rice dataset, contained a partial IGS. (b) To obtain the full-length IGS, a hypothetical tandem array was constructed from two copies of the contig separated by a run of Ns. During gap-closing, the Ns were replaced by nucleotide sequences originating from IGS regions. (c) If the IGS remains partial, the length of the intervening N run is adjusted and gap-closing is repeated. Ultimately, a complete 45S unit with the full-length IGS can be obtained. (d) Structure of the complete 45S unit of Oryza species. (e) Read mapping on the assembled 45S units. The Os5 dataset was remapped to the assembled single contigs covering the entire 45S unit sequence (black line). The red line indicates GC content per 100-bp window.
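Steps (b) and (c) can be sketched as plain string operations. This is an illustration under stated assumptions, not the published workflow: the helper names are hypothetical, the gap-closing tool itself (not named in the caption) is left outside the sketch, and it is assumed to rewrite only the N run so that the trailing copy of the contig stays intact.

```python
def build_tandem_array(contig: str, n_gap: int) -> str:
    """Hypothetical helper: two copies of a draft 45S contig joined by n_gap Ns."""
    if n_gap < 1:
        raise ValueError("n_gap must be at least 1")
    return contig + "N" * n_gap + contig


def gap_is_closed(scaffold: str) -> bool:
    """True once gap-closing has replaced every N (assumes the contig has no Ns)."""
    return "N" not in scaffold.upper()


def extract_unit(closed: str, contig: str) -> str:
    """Return one complete 45S unit: the scaffold minus its trailing contig copy.

    Assumes the gap closer rewrote only the N run, so the second copy of the
    contig is still intact at the 3' end of the scaffold.
    """
    if not gap_is_closed(closed):
        raise ValueError("gap still open: adjust the N length and rerun gap-closing")
    if not closed.endswith(contig):
        raise ValueError("trailing contig copy was altered")
    return closed[: len(closed) - len(contig)]


# Toy example: "GGCC" stands in for the IGS sequence recovered by gap-closing.
contig = "ACGTTGCA"
print(build_tandem_array(contig, 4))                   # ACGTTGCANNNNACGTTGCA
print(extract_unit(contig + "GGCC" + contig, contig))  # ACGTTGCAGGCC
```

Because the two contig copies flank the gap, whatever fills the Ns is exactly the missing IGS, so one unit is the first copy plus the filled gap.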
Table 1. Statistics for assembly of cp and nR sequences from 30 Oryza accessions.
| No. | Genome size (Mbp) | WGS reads for cp assembly (Mbp)^c | Genome coverage (x) | Cp coverage (x) | Complete cp sequence (bp)^c | Complete 45S unit (bp)^a,c | Complete 5S unit (bp)^c | Estimated 45S copy number^b | Estimated 5S copy number^b |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 430 | 860 (SRR1178954) | 2.0 | 99 | 134,551 (KM088016) | 7,928 (KM036282) | 324 (KM036298) | 390 | 593 |
| 2 | 430 | 303 (SRR1182447) | 0.7 | 227 | 134,551 (KM088017) | 7,929 (KM036283) | 324 (KM036299) | 186 | 458 |
| 3 | 460 | 303 (SRR1182807) | 0.7 | 147 | 134,551 (KM103382) | 8,160 (KM036287) | 322 (KM036303) | 131 | 971 |
| 4 | 460 | 253 (SRR1182443) | 0.5 | 69 | 134,502 (KM103369) | 8,160 (KM036286) | 322 (KM036302) | 255 | 1,593 |
| 5 | 460 | 327 (SRR921498) | 0.7 | 251 | 134,502 (KM103367) | 8,166 (KM036284) | 322 (KM036300) | 140 | 746 |
| 6 | 460 | 303 (SRR921505) | 0.7 | 206 | 134,502 (KM103368) | 8,164 (KM036285) | 322 (KM036301) | 210 | 894 |
| 7 | 439 | 500 (ERX096841) | 1.1 | 128 | 134,510 (KM103372) | 8,004 (KM117266) | 303 (KM036304) | 275 | 495 |
| 8 | 380 | 1,442 (SRX480817) | 3.8 | 245 | 134,586 (Dataset S1) | 6,090 | 322 (Dataset S3) | 793 | 2,718 |
| 9 | 380 | 712 (SRX480820) | 1.9 | 685 | 134,572 (Dataset S1) | 5,816 | 499 (Dataset S3) | 310 | 354 |
| 10 | 395 | 571 (SRX809784) | 1.5 | 203 | 134,483 (Dataset S1) | 5,823 | 322 (Dataset S3) | 439 | 5,561 |
| 11 | 448 | 775 (SRR1264534) | 1.4 | 112 | 134,516 (KM088022) | 7,904 (KM036288) | 322 (KM036305) | 441 | 1,129 |
| 12 | 411 | 159 (SRX502175) | 0.4 | 91 | 134,585 (KM103378) | 7,835 (KM117256) | 325 (KM117247) | 667 | 4,418 |
| 13 | 411 | 437 (SRX502171) | 1.1 | 257 | 134,678 (KM103379) | 7,845 (KM117257) | 325 (KM117248) | 748 | 2,380 |
| 14 | 411 | 441 (SRX502173) | 1.1 | 319 | 134,678 (KM103380) | 7,845 (KM117258) | 325 (KM117249) | 771 | 1,291 |
| 15 | 411 | 163 (SRX502178) | 0.4 | 97 | 134,613 (KM103381) | 7,836 (KM117252) | 325 (KM117250) | 368 | 1,989 |
| 16 | 376 | 1,113 (SRX809864) | 3.0 | 1,277 | 134,598 (Dataset S1) | 5,888 | 325 (Dataset S3) | 3,769 | 5,984 |
| 17 | 411 | 343 (SRX502164) | 0.8 | 113 | 134,598 (KM103371) | 7,836 (KM117253) | 325 (KM117251) | 630 | 4,414 |
| 18 | 411 | 375 (SRR1264535) | 0.9 | 60 | 134,590 (KM088023) | 7,836 (KM036290) | 325 (KM036307) | 617 | 6,045 |
| 19 | 357 | 330 (SRX502311) | 0.9 | 343 | 134,586 (KM103377) | 7,836 (KM117254) | 325 (KM117246) | 198 | 1,820 |
| 20 | 357 | 188 (SRR1181643) | 0.5 | 50 | 134,598 (KM088021) | 7,836 (KM036289) | 325 (KM036306) | 455 | 3,670 |
| 21 | 370 | 647 (SRX809780) | 1.7 | 218 | 134,614 (Dataset S1) | 5,899 | 325 (Dataset S3) | 773 | 6,447 |
| 22 | 357 | 268 (SRX502309) | 0.8 | 257 | 134,598 (KM103370) | 7,836 (KM117255) | 325 (KM117245) | 358 | 2,374 |
| 23 | 400 | 1,536 (SRR1997915) | 3.9 | 1,338 | 134,606 (KR364802) | 6,427 | 440 (KR364807) | 1,461 | 528 |
| 24 | 366 | 397 (SRX809892) | 1.1 | 91 | 134,296 (Dataset S1) | 5,841 | 440 (Dataset S3) | 554 | 278 |
| 25 | 400 | 1,728 (SRR1997912) | 4.3 | 1,446 | 134,575 (KR364803) | 5,830 | 440 (KR364806) | 2,292 | 1,107 |
| 26 | 464 | 253 (SRR1264537) | 0.5 | 78 | 134,575 (KM103374) | 8,074 (KM036292) | 460 (KM036309) | 598 | 191 |
| 27 | 388 | 760 (SRX809898) | 2.0 | 270 | 134,555 (Dataset S1) | 5,839 | 499 (Dataset S3) | 1,367 | 731 |
| 28 | 435 | 1,000 (SRR1264536) | 2.3 | 109 | 134,556 (KM103373) | 8,190 (KM036291) | 499 (KM036308) | 461 | 2,525 |
| 29 | 352 | 563 (SRR1264538) | 1.6 | 187 | 134,558 (KM088024) | 7,844 (KM036293) | 302 (KM036310) | 200 | 69 |
| 30 | 423 | 250 (SRR1264539) | 0.6 | 60 | 134,604 (KM103375) | 7,745 (KM036294) | 326 (KM036311) | 307 | 1,317 |
a The lengths of the most abundant and longest representative nR units are given for each accession. The 45S transcription units of the Oryza species were 5,769–5,783 bp long. We cannot rule out the presence of other nR units in each species because the length of the IGS varies somewhat.
b Copy numbers of 45S and 5S units are based on the average depth of raw reads mapped to each sequence and were calculated for a 1x haploid genome equivalent of raw reads.
c SRA accession numbers of the reads and GenBank accession numbers of the assembled sequences.
d Length of 45S transcription units.
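The genome coverage of accession 1, for instance, equals its read amount divided by its genome size (860 Mbp / 430 Mbp = 2.0x), and footnote b describes copy number as the average read depth on an nR unit at 1x haploid genome equivalent. A minimal sketch of both calculations follows, assuming depth scales linearly with the amount of reads used; the function names and the per-base depths in the example are hypothetical.

```python
from statistics import fmean


def genome_coverage(read_mbp: float, genome_mbp: float) -> float:
    """Sequencing depth in haploid genome equivalents."""
    return read_mbp / genome_mbp


def estimate_copy_number(per_base_depth, genome_equivalents: float) -> float:
    """Mean read depth on a 45S or 5S unit, rescaled to 1x genome equivalent.

    Assumption: depth grows linearly with the amount of reads mapped, so a
    unit present in k copies per haploid genome shows roughly k-fold depth at 1x.
    """
    return fmean(per_base_depth) / genome_equivalents


cov = genome_coverage(860, 430)                    # accession 1 in Table 1
print(cov)                                         # 2.0
print(estimate_copy_number([100, 120, 110], cov))  # 55.0 (hypothetical depths)
```

Dividing by genome equivalents lets datasets of different size (0.4x to 4.3x in Table 1) be compared on the same 1x scale.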
Figure 3. Phylogenomic tree of cultivated rice in Asia and Africa with their putative ancestor species.
(a,b) Phylogenetic trees built from the complete cp genome (a) and 45S cistron sequences (b). O. sativa ssp. japonica and indica groups are denoted J and I, respectively. Two cultivars, M23 (no. 3, thick red line) and Tongil (no. 4, thick blue line), derived from japonica × indica and indica × japonica crosses, are denoted JxI and IxJ, respectively. Different species/subspecies are indicated by different label colors. Lines connect the positions of each accession/cultivar in the two trees. Numbers in colored circles correspond to the accessions listed in Table 1. The trees were generated by the neighbor-joining (NJ) method with Poisson correction in MEGA6. Bootstrap values from 1,000 replicates are shown on the branches; values below 50% are not shown. (c,d) Pedigrees of the two cultivars, M23 (c) and Tongil (d), bred by crossing O. sativa ssp. japonica and indica (ref. 38). Thick red and blue lines indicate the final maternal genotype backgrounds of M23 and Tongil, respectively.
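The Poisson correction converts the observed proportion of differing sites, p, into an estimated number of substitutions per site, d = -ln(1 - p), before the neighbor-joining step. The sketch below computes that distance for aligned sequences and assembles a pairwise matrix. Treating gaps and ambiguous bases by pairwise deletion is an assumption (the caption does not state the gap option), the sequences are toy examples, and the tree building itself is left to an NJ implementation such as MEGA6.

```python
import math
from itertools import combinations

VALID = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites; columns with gaps or ambiguous bases are skipped."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def poisson_distance(a: str, b: str) -> float:
    """Poisson-corrected distance d = -ln(1 - p), via log1p for accuracy at small p."""
    p = p_distance(a, b)
    return math.inf if p >= 1.0 else -math.log1p(-p)


def distance_matrix(alignment):
    """Pairwise Poisson distances for a {name: aligned_sequence} mapping."""
    return {
        (i, j): poisson_distance(alignment[i], alignment[j])
        for i, j in combinations(alignment, 2)
    }


aln = {"acc1": "ACGTACGTAC", "acc2": "ACGTACGTAA", "acc3": "ACG-ACGTAA"}
for pair, d in distance_matrix(aln).items():
    print(pair, round(d, 4))
```

For closely related accessions p is small and d is close to p; the correction matters more for the deeper splits among Oryza species.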
Figure 4. Phylogenomic tree of Oryza species.
(a,b) Phylogenetic trees built from the complete cp genome (a) and 45S cistron sequences (b). O. sativa ssp. japonica and indica groups are denoted J and I, respectively. Different species/subspecies are indicated by different label colors. Numbers in colored circles correspond to the accessions listed in Table 1. Dashed lines connect the positions of each accession/cultivar in the two trees; red lines highlight major differences between the trees. The trees were generated by Bayesian inference in BEAST (version 1.8.1), as described in Materials and Methods. Posterior probabilities (pp) above 0.5 are shown on the branches. Divergence times were calibrated using 9 million years ago (MYA) as the estimated time of speciation between the AA- and BB-genome Oryza species (ref. 29).
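BEAST estimates node ages by Bayesian MCMC under a clock model, which is not reproduced here. As a back-of-envelope illustration of how a single calibration point turns distances into time, the sketch below swaps in a strict molecular clock: the per-lineage rate is r = D_cal / (2 × 9 Myr), where D_cal is the AA–BB pairwise distance, and any other pair split T = D / (2r) Myr ago. The function names and the distances in the example are hypothetical.

```python
CALIBRATION_MYA = 9.0  # AA-BB genome split used for calibration (from the caption)


def clock_rate(calib_distance: float, calib_mya: float = CALIBRATION_MYA) -> float:
    """Per-lineage substitution rate (subs/site/Myr) under a strict clock.

    A pairwise distance accumulates along two lineages, hence the factor 2.
    """
    return calib_distance / (2.0 * calib_mya)


def divergence_time(distance: float, rate: float) -> float:
    """Time since two lineages split, in Myr: T = D / (2r)."""
    return distance / (2.0 * rate)


# Hypothetical distances (substitutions per site), for illustration only.
rate = clock_rate(0.018)
print(round(divergence_time(0.004, rate), 3))  # 2.0
```

Under a strict clock the result reduces to T = 9 × D / D_cal; relaxed-clock models in BEAST let the rate vary among branches instead.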