Hiroaki Sakai, Hiroshi Ikawa, Tsuyoshi Tanaka, Hisataka Numa, Hiroshi Minami, Masaki Fujisawa, Michie Shibata, Kanako Kurita, Ari Kikuta, Masao Hamada, Hiroyuki Kanamori, Nobukazu Namiki, Jianzhong Wu, Takeshi Itoh, Takashi Matsumoto, Takuji Sasaki.
Abstract
Here we present the genomic sequence of the African cultivated rice, Oryza glaberrima, and compare these data with the genome sequence of Asian cultivated rice, Oryza sativa. Using methylation filtration and subtractive hybridization of repetitive sequences, we obtained gene-enriched sequences of O. glaberrima that correspond to about 25% of the gene regions of the O. sativa (japonica) genome. Patterns of amino acid changes did not differ between the two species in terms of their biochemical properties. However, genes of O. glaberrima generally showed a larger nonsynonymous-to-synonymous substitution ratio (dN/dS), suggesting that O. glaberrima has undergone a genome-wide relaxation of purifying selection. We further investigated nucleotide substitutions around splice sites and found that eight genes of O. sativa experienced changes at splice sites after the divergence from O. glaberrima. These changes produced novel introns that partially truncated functional domains, suggesting that these newly emerged introns affect gene function. We also identified 2451 simple sequence repeats (SSRs) from the genomes of O. glaberrima and O. sativa. Trinucleotide repeats were the most common SSRs and were overrepresented in protein-coding sequences. Even so, we found that selection against indels of trinucleotide repeats was relatively weak in both African and Asian rice. Our genome-wide sequencing of O. glaberrima and in-depth analyses provide rice researchers with useful genomic resources for future breeding. They also offer new insights into the genomic evolution of the African and Asian rice species.
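The dN/dS comparison mentioned in the abstract can be illustrated with a deliberately simplified sketch. The source does not state which estimator the authors used, so this is illustration only and not their method.

The sketch makes the following assumptions:
- The input is pre-aligned, in-frame coding sequences.
- Synonymous and nonsynonymous sites are counted in a Nei-Gojobori-like way. Changes that create a stop codon count as nonsynonymous, which is one convention among several.
- Codons that differ at more than one position are skipped.
- A Jukes-Cantor correction is applied to the raw proportions.

The names `dn_ds`, `synonymous_sites` and `jukes_cantor` are hypothetical.

```python
import math

# Standard genetic code; codons enumerated in TCAG order, "*" marks stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO))


def synonymous_sites(codon):
    """Synonymous sites of a codon: for each of the three positions, the
    fraction of the three possible base changes that keep the amino acid.
    Changes that create a stop codon are counted as nonsynonymous here."""
    aa = CODE[codon]
    sites = 0.0
    for i in range(3):
        for base in BASES:
            if base != codon[i]:
                mutant = codon[:i] + base + codon[i + 1:]
                if CODE[mutant] == aa:
                    sites += 1 / 3
    return sites


def jukes_cantor(p):
    """Jukes-Cantor correction of a proportion of differing sites."""
    if p >= 0.75:
        return math.inf  # saturated: distance not estimable
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(seq1, seq2):
    """Return (dN, dS) for two aligned, in-frame coding sequences.

    Simplification (assumption): codons differing at more than one position,
    codons with non-ACGT characters and stop codons are skipped entirely."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn_sites = non_sites = syn_diffs = non_diffs = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        diffs = sum(x != y for x, y in zip(c1, c2))
        if diffs > 1:
            continue
        # Average the site counts of the two codons being compared.
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        syn_sites += s
        non_sites += 3 - s
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                syn_diffs += 1
            else:
                non_diffs += 1
    d_n = jukes_cantor(non_diffs / non_sites) if non_sites else math.nan
    d_s = jukes_cantor(syn_diffs / syn_sites) if syn_sites else math.nan
    return d_n, d_s
```

For example, `dn_ds("CTTGCT", "CTCGCT")` gives dN = 0 and dS > 0, because the only difference is a synonymous Leu codon change. A gene's dN/dS is then `d_n / d_s` whenever `d_s > 0`. Under this measure, a genome-wide shift toward larger values in one lineage is what relaxed purifying selection would predict.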
Year: 2011 PMID: 21323774 PMCID: PMC3568898 DOI: 10.1111/j.1365-313X.2011.04539.x
Source DB: PubMed Journal: Plant J ISSN: 0960-7412 Impact factor: 6.417
Statistics of the Oryza glaberrima genome sequence
| | Total | Methylation filtration | Subtractive hybridization |
|---|---|---|---|
| Total number of sequences | 437 642 | 219 203 | 218 439 |
| Total length of sequences (bp) | 206 317 153 | 107 419 702 | 98 897 451 |
| Fraction of repetitive elements (%) | 22.5 | 10.4 | 35.0 |
| Number of filtered sequences | 368 537 | 205 601 | 162 936 |
| Total length of filtered sequences (bp) | 175 584 862 | 101 381 318 | 74 203 544 |
| Number of mapped sequences | 256 801 | 156 909 | 99 892 |
| Coverage of the Oj genome (bp) | 69 083 576 | 45 795 595 | 30 226 496 |
Oj, Oryza sativa japonica.
Figure 1. Compositions (bp) of sequence types (a) in the total Oryza sativa japonica (Osj) genome and (b) in the Osj genomic regions covered by Oryza glaberrima sequences.
Statistics of mapped and unmapped reads
| Statistic | Value |
|---|---|
| No. of mapped sequences | 256 801 |
| No. of mapped contigs | 148 435 |
| N50 of the mapped contigs (bp) | 603 |
| No. of unmapped sequences | 111 736 |
| No. of unmapped contigs | 4598 |
| No. of unmapped singlets | 20 461 |
| Total length of unmapped contigs and singlets (bp) | 12 728 728 |
| Average length of the unmapped contigs (bp) | 766 |
| Average length of the unmapped singlets (bp) | 450 |
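The N50 reported for the mapped contigs is a standard assembly summary. It is the largest length L such that contigs of length at least L together contain at least half of all assembled bases. The minimal sketch below computes it from a list of contig lengths; `n50` is a hypothetical helper name, not taken from the source.

```python
def n50(lengths):
    """Largest length L such that contigs of length >= L hold at least
    half of all assembled bases."""
    if not lengths:
        raise ValueError("need at least one contig length")
    half = sum(lengths) / 2
    running = 0
    # Walk from the longest contig down until half of the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length


print(n50([100, 200, 300, 400]))  # 300, since 400 + 300 = 700 >= 500
```

The surrounding numbers are also internally consistent with simple bookkeeping:
- Unmapped count: 368 537 filtered sequences minus 256 801 mapped sequences leaves the 111 736 unmapped ones.
- Unmapped length: 4598 contigs × 766 bp plus 20 461 singlets × 450 bp is about 12.73 Mb, which matches the reported total up to rounding of the averages.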
Figure 2. Functional classification of Oryza sativa japonica (Osj) and Oryza glaberrima (Og) proteins. The classifications of mapped and unmapped Og sequences were derived from the Swiss-Prot proteins homologous to them (see Experimental procedures). Protein categories are based on the molecular-function categories of the Gene Ontology hierarchy.
The 10 most frequent domains among the unmapped sequences of Oryza glaberrima
| InterPro ID | Description | Mapped: no. of genes with the domain | Mapped: no. of genes without the domain | Unmapped: no. of genes with the domain | Unmapped: no. of genes without the domain | P-value |
|---|---|---|---|---|---|---|
| IPR000719 | Protein kinase, catalytic domain | 377 | 3364 | 85 | 446 | 9.14 × 10⁻⁵ |
| IPR001611 | Leucine-rich repeat | 111 | 3630 | 82 | 449 | 2.20 × 10⁻¹⁶ |
| IPR002182 | NB-ARC | 50 | 3691 | 33 | 498 | 1.99 × 10⁻¹⁰ |
| IPR003591 | Leucine-rich repeat, typical subtype | 29 | 3712 | 20 | 511 | 5.01 × 10⁻⁷ |
| IPR008271 | Serine/threonine-protein kinase, active site | 322 | 3419 | 70 | 461 | 1.13 × 10⁻³ |
| IPR011009 | Protein kinase-like domain | 387 | 3354 | 91 | 440 | 1.10 × 10⁻⁵ |
| IPR013210 | Leucine-rich repeat-containing N-terminal domain, type 2 | 52 | 3689 | 40 | 491 | 5.73 × 10⁻¹⁴ |
| IPR016040 | NAD(P)-binding domain | 188 | 3553 | 32 | 499 | 3.4 × 10⁻¹ |
| IPR017441 | Protein kinase, ATP binding site | 343 | 3398 | 78 | 453 | 1.52 × 10⁻⁴ |
| IPR017442 | Serine/threonine-protein kinase-like domain | 355 | 3386 | 81 | 450 | 9.83 × 10⁻⁵ |
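The last column of this table appears to be a P-value from a 2 × 2 comparison of how often a domain occurs among mapped versus unmapped genes. The source does not name the test. A two-sided Fisher's exact test is one plausible choice, sketched below in pure Python as an assumption rather than as the authors' procedure.

The function name `fisher_exact_two_sided` is hypothetical. Rows are taken as mapped/unmapped and columns as with/without the domain.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    row and column totals that is no more probable than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    p = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        px = prob(x)
        if px <= observed * (1 + 1e-7):  # small tolerance for floating-point ties
            p += px
    return min(p, 1.0)


# Leucine-rich repeat (IPR001611): mapped 111 with / 3630 without,
# unmapped 82 with / 449 without.
print(fisher_exact_two_sided(111, 3630, 82, 449))
```

Under this assumed test, leucine-rich repeat counts give a vanishingly small P-value. The NAD(P)-binding domain counts (188/3553 vs 32/499) are not significant, which agrees in direction with the table.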
Figure 3. Distribution of the number of nucleotide substitutions between (a) Oryza glaberrima (Og) and Oryza sativa japonica (Osj) on the 12 chromosomes, (b) Og and Osj on chromosome 8, and (c) Osj and Oryza sativa indica (Osi) on chromosome 8. Nucleotide substitutions were counted in 10-kb windows with 10-kb steps along the chromosomes. Dashed lines show approximate positions of the centromeres.
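Counting substitutions in 10-kb windows with 10-kb steps amounts to binning substitution coordinates into non-overlapping intervals along each chromosome. The sketch below assumes substitution positions have already been called from an alignment against the Osj reference (0-based coordinates). `count_in_windows` is a hypothetical helper name.

```python
from bisect import bisect_left


def count_in_windows(positions, chrom_length, window=10_000, step=10_000):
    """Count 0-based substitution positions in windows [start, start + window),
    with start = 0, step, 2 * step, ... below chrom_length; the last window
    is truncated at the chromosome end."""
    pos = sorted(positions)
    counts = []
    for start in range(0, chrom_length, step):
        end = min(start + window, chrom_length)
        # Number of sorted positions p with start <= p < end.
        counts.append(bisect_left(pos, end) - bisect_left(pos, start))
    return counts
```

With `step == window` the windows tile the chromosome without overlap, as in the caption. A smaller `step` would turn the same function into a sliding-window count.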
Number of introns with lineage-specific donor and/or acceptor site changes
| No. of introns | 13 (6:7) | 52 (25:27) | 153 (48:105) |
| No. of introns between protein-coding exons | 7 (6) | 33 (27) | 109 (107) |
| No. of introns without stop codons | 2 (1) | 7 (4) | 22 (19) |
Osj, Oryza sativa japonica; Osi, Oryza sativa indica; Og, Oryza glaberrima.
Numbers of donor (left) and acceptor (right) sites are in parentheses.
Numbers of introns that have GT/AG splice-site motifs are in parentheses.
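The intron counts rest on two checks:
- whether an intron carries the canonical GT donor and AG acceptor dinucleotides, and
- whether a splice-site change can be assigned to one lineage.

The sketch below makes a simplifying parsimony assumption: with three genomes, a change is attributed to the genome whose dinucleotide differs while the other two agree. The source does not spell out its exact criteria. The function names are hypothetical; the genome keys reuse the abbreviations Osj, Osi and Og from the table notes.

```python
def is_canonical_gt_ag(intron):
    """True if an intron sequence (5'->3' on the transcribed strand)
    begins with the GT donor and ends with the AG acceptor dinucleotide."""
    intron = intron.upper()
    return len(intron) >= 4 and intron.startswith("GT") and intron.endswith("AG")


def lineage_specific_change(motifs):
    """Given orthologous splice-site dinucleotides keyed by genome, return the
    genome whose motif differs while all other genomes agree, else None.
    Assumes the agreeing genomes carry the ancestral state."""
    if len(motifs) < 3:
        raise ValueError("need at least three genomes to polarize a change")
    for genome, motif in motifs.items():
        others = {m for g, m in motifs.items() if g != genome}
        if len(others) == 1 and motif not in others:
            return genome
    return None


print(lineage_specific_change({"Osj": "GC", "Osi": "GT", "Og": "GT"}))  # Osj
```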
Figure 4. Two examples of Oryza sativa japonica (Osj) genes that have nucleotide substitutions at splice-site motifs. Red arrows on the gene indicate the positions of the substitutions.
(a) AK069721 (Os10g0118000): a mutation in Osj generated a new intron that disrupted the O-methyltransferase domain.
(b) AK069386 (Os12g0482600): a mutation in Oryza sativa (Os) generated an altered acceptor site that disrupted the DUF502 superfamily domain.
Sequence alignments around the splice-site motifs are shown on the right. Gray and yellow boxes are untranslated regions and protein-coding regions, respectively. Blue boxes above or below the mRNAs indicate functional domains detected by NCBI BLAST searches.
Statistics of simple sequence repeats (SSRs)
| | Non-transcribed: Di- | Non-transcribed: Tri- | Non-transcribed: Tetra- | Protein-coding: Di- | Protein-coding: Tri- | Protein-coding: Tetra- |
|---|---|---|---|---|---|---|
| No. of total SSRs | 258 | 496 | 187 | 3 | 777 | 7 |
| No. of shared SSRs | 16 | 177 | 68 | 0 | 358 | 4 |
| No. of polymorphic SSRs | 242 | 319 | 119 | 3 | 419 | 3 |
| No. of SSRs with lineage-specific length polymorphisms | 3 | 8 | 11 | 0 | 22 | 0 |
| | 26 | 52 | 20 | 0 | 54 | 0 |
| | 74 | 139 | 46 | 1 | 175 | 1 |
| No. of SSRs with the same length but lineage-specific polymorphisms | 5 | 9 | 3 | 0 | 23 | 0 |
| | 5 | 11 | 6 | 0 | 33 | 1 |
| | 9 | 37 | 17 | 0 | 51 | 0 |
Osj, Oryza sativa japonica; Osi, Oryza sativa indica; Og, Oryza glaberrima.
Numbers of SSRs that are identical among the three genomes.
Numbers of SSRs whose length was different in one of the three genomes and same in the other two.
Numbers of SSRs whose length was identical among the three genomes but sequence was different in one of the three genomes.
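The SSR table combines two steps:
- detecting perfect di-, tri- and tetranucleotide repeats, and
- sorting orthologous loci into the categories defined in the table notes:
  - shared (identical in all three genomes),
  - lineage-specific length (one genome differs in length, the other two agree), and
  - same length but lineage-specific sequence.

The source does not give its detection thresholds. `unit_len`, `min_units` and every function name below are hypothetical, and imperfect repeats are not handled.

```python
import re


def _is_primitive(motif):
    """False for motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT"."""
    n = len(motif)
    return not any(n % d == 0 and motif == motif[:d] * (n // d) for d in range(1, n))


def find_ssrs(seq, unit_len, min_units):
    """Return (start, motif, n_units) for perfect tandem repeats of one unit length."""
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_units - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        if _is_primitive(m.group(1)):  # skip e.g. "AA" runs when unit_len=2
            hits.append((m.start(), m.group(1), len(m.group(0)) // unit_len))
    return hits


def _odd_one_out(values):
    """Key whose value differs while all other keys share a single value, else None."""
    for key, value in values.items():
        rest = {v for k, v in values.items() if k != key}
        if len(rest) == 1 and value not in rest:
            return key
    return None


def classify_locus(alleles):
    """Classify one orthologous SSR locus given genome -> repeat sequence (3 genomes)."""
    if len(set(alleles.values())) == 1:
        return ("shared", None)
    lengths = {g: len(s) for g, s in alleles.items()}
    if len(set(lengths.values())) == 1:
        odd = _odd_one_out(alleles)
        if odd is not None:
            return ("same length, lineage-specific sequence", odd)
        return ("polymorphic", None)
    odd = _odd_one_out(lengths)
    if odd is not None:
        return ("lineage-specific length", odd)
    return ("polymorphic", None)


print(find_ssrs("GATCAGCAGCAGCAGTT", 3, 4))  # [(3, 'CAG', 4)]
```

The primitive-motif check keeps a run such as `AAAAAA` from being reported as a dinucleotide repeat. Loci in which all three genomes differ fall into a generic "polymorphic" bin, which matches the table in that its polymorphic totals exceed the sum of the lineage-specific rows.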