Insights into cereal genomes from two draft genome sequences of rice.

Abstract

Draft genome sequences have been reported for two subspecies of rice. The drafts include the sequences of an estimated 99% of all rice genes and provide major advances in our understanding of the content and complexity of cereal genomes in general and the rice genome in particular.

Year: 2002 PMID: 12093379 PMCID: PMC139371 DOI: 10.1186/gb-2002-3-6-reviews1015

Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583

A third of the human population depends on rice as a staple food [1]. Rice is also an important model for other cereal crops that, along with rice, account for more than 60% of worldwide agricultural production [2]. Its small genome and long history of genetic studies led to rice being an early target for complete genome sequencing. Draft sequence has recently been reported for two subspecies of rice: japonica [3], which is widely grown in Japan, and indica [4], which is widely grown in China and elsewhere. These sequence data can now be analyzed and compared with the published genome sequence of the widely adopted model species for dicotyledonous flowering plants, Arabidopsis thaliana [5].

Gene content of rice

The rice draft genomic sequences were generated by whole-genome shotgun (WGS) sequencing, largely starting from plasmid clones of nuclear DNA. This strategy contrasts with that employed by the International Rice Genome Sequencing Project (IRGSP) [6], which produces high-quality sequence from large-insert clones that are individually selected from a complete physical map. WGS sequencing rapidly and cost-effectively generates sequence from nearly all of the genes in the genome, but the lengths of contiguous sequence are shorter, errors are more numerous and integration with genetic maps is poorer. The japonica WGS sequencing [3] was conducted to 6X redundancy (that is, each base was sequenced an average of six times). Repeated sequences (which are numerous in the rice genome) were removed, permitting the assembly of 42,109 contiguous sequences, or contigs, representing 390 megabases (Mb) of total sequence (thus, the mean length of contigs was about 9.3 kb). The indica WGS sequencing [4] was conducted to 4X redundancy. In this case, repeated sequences were masked, permitting the assembly of 127,550 sequence contigs, representing 361 Mb total length (the mean length of contigs was thus about 2.8 kb). The total size of the indica genome was estimated as 466 Mb [4] and that of japonica should be very similar. On the basis of comparisons with individually sequenced rice genes, finished sequence from the IRGSP, and other rice sequences, the japonica and indica sequences provide 99% and 92% coverage of genes, respectively [3,4]. The sequence probably contains relatively few mis-assemblies; this was tested for the indica data by using cDNA sequences to screen for artificial exon-sized rearrangements, and putative mis-assemblies were identified in only 1.1% of the genes tested [4]. Thus, although highly fragmented, the draft sequence generated by WGS allows us to gain early insights into many characteristics of the rice genome.
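The mean contig lengths follow directly from the reported contig counts and assembled totals. The short sketch below redoes that arithmetic and also expresses each assembly as a fraction of the estimated 466 Mb genome. The function and variable names are illustrative and do not come from the cited papers, and applying the indica genome-size estimate to japonica is an assumption based on the statement that the two should be very similar.

```python
# Assembly figures as reported in the text for the two WGS drafts.
ASSEMBLIES = {
    "japonica": {"contigs": 42_109, "assembled_mb": 390.0},
    "indica": {"contigs": 127_550, "assembled_mb": 361.0},
}

# Estimated indica genome size; assumed here to apply to japonica as well.
GENOME_SIZE_MB = 466.0


def mean_contig_kb(contigs: int, assembled_mb: float) -> float:
    """Mean contig length in kb, taking 1 Mb = 1,000 kb."""
    return assembled_mb * 1_000 / contigs


def assembled_fraction(assembled_mb: float, genome_mb: float = GENOME_SIZE_MB) -> float:
    """Fraction of the estimated genome represented in the assembled contigs."""
    return assembled_mb / genome_mb


if __name__ == "__main__":
    for name, stats in ASSEMBLIES.items():
        mean_kb = mean_contig_kb(stats["contigs"], stats["assembled_mb"])
        fraction = assembled_fraction(stats["assembled_mb"])
        print(f"{name}: mean contig {mean_kb:.1f} kb, "
              f"{fraction:.0%} of the estimated genome assembled")
```

Because repeated sequences were removed or masked before assembly, the assembled fraction understates how much of the gene space is represented, which is why gene coverage is reported separately.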
Automated gene prediction within the rice genome sequence is complicated by gradients in patterns of the usage of codons and amino acids. This phenomenon was analyzed in the indica sequence [4], where the GC content at the 5' ends of genes was found to be up to 25% higher than at the 3' ends. This difference extended approximately a kilobase from the 5' ends of genes. These gradients are not observed in A. thaliana, so assessing total gene numbers in rice is more difficult than in A. thaliana, and the numbers reported should be considered very much as approximations. Using various measures of estimation of true gene numbers, the range for japonica was reported as being 32,000-50,000 [3] and for indica as 46,000-56,000 [4]. Both estimates are considerably higher than the 25,500 or so genes predicted in the A. thaliana genome. They are also greater than for Caenorhabditis elegans and Drosophila melanogaster (which have around 19,100 and 13,500 genes, respectively) [7,8] and, probably, than for human (which has perhaps 38,000) [9]. The relatively larger number of genes encoded by plant genomes is likely to be a consequence of the frequency with which polyploidy occurs and the apparent selective advantage of polyploidy.
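The GC gradient described above can be made concrete with a small calculation. The following is a minimal sketch, not the method used in the indica analysis [4]: it compares the GC fraction of a window at the 5' end of a coding sequence with that of an equally sized window at the 3' end. The function names, the window size and the toy sequence are all hypothetical.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the A/C/G/T bases of seq (0.0 if there are none)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc_end_difference(seq: str, window: int) -> float:
    """GC fraction of the first `window` bases minus that of the last `window` bases.

    A positive value means the 5' end is more GC-rich than the 3' end.
    The sequence is assumed to be given in 5'-to-3' orientation.
    """
    if window <= 0 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    return gc_content(seq[:window]) - gc_content(seq[-window:])


if __name__ == "__main__":
    # Toy sequence: GC-rich at the 5' end, AT-rich at the 3' end.
    toy_gene = "GCGCGGCCGCATGCATATATTATAAT"
    print(f"5' minus 3' GC difference: {gc_end_difference(toy_gene, window=10):+.2f}")
```

A gene finder trained on uniform composition would tend to score the two ends of such a gene differently, which is one way to see why the gradient complicates automated prediction.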

The functions of rice genes

The proteins encoded by the predicted genes in both the indica and japonica sequences were analyzed and classified using the InterPro [10,11] and Gene Ontology (GO) [12,13] databases. The results are not directly comparable with the previously published classification of A. thaliana proteins [5], which used InterPro and PENDANT [14,15]. But when 25,426 A. thaliana genes were classified using GO, along with 53,398 predicted complete indica genes, allowing direct comparison of their functional classifications, the proportions of genes in each functional category were found to be almost identical between the two plants [4]. The major difference, as shown in Figure 1, is that a larger proportion of rice genes remain unclassified. The classified predicted proteins (10,893 from indica and 9,230 from A. thaliana) are very likely to represent the products of genuine genes, but what about the genes predicted to encode unclassifiable proteins? For the predicted A. thaliana proteins, 80-85% (around 21,000 genes) have homologous predicted genes in the rice indica and japonica sequences [3,4]. This includes about 8,000 proteins predicted to be in rice but not in D. melanogaster, C. elegans, Saccharomyces cerevisiae or sequenced bacterial genomes, so these probably represent the plant-specific set of genes. The approximately 4,000 predicted proteins of A. thaliana that did not have homologs in rice are either artefacts or are unique to dicotyledonous plants. When the predicted rice gene sequences are compared with the A. thaliana genome, less than half (about 30,000) have significant homologs. This figure therefore represents the minimum number of genes in rice, although it should be recognized that they are not necessarily all functional. The predicted rice proteins with no homologs in A. thaliana are either artefacts of the automated annotation (probably most of them) or are unique to monocotyledonous plants.
Although the number of genes in rice is relatively large, the number of distinct gene families that appear to be present (15,000 reported for japonica [3]) is similar to the number for A. thaliana, D. melanogaster and C. elegans (13,382, 10,736 and 14,177, respectively [5]). The increased number of genes per family in plants is clearly a consequence of gene duplication, but it is not necessarily the case that the functions of members of a family are redundant.

Figure 1

Comparison of the functional classifications of predicted rice and A. thaliana proteins (see text for further details; the figure is redrawn from [3,4]).

Comparative analyses

Analysis of the japonica sequence has revealed extensive gene duplication in rice [3]. The sequences of more than 2,000 mapped rice cDNA markers were used to identify homologs in the genome sequence. Using a threshold value of 80% identity over 100 base pairs, the mean number of homologous loci per cDNA was over 1.94. Evidence of extensive duplication of genomic segments was also detected. Most of the segments identified were very small (four markers or fewer), suggesting that extensive recombination and/or rearrangement has occurred since the duplication event(s). Dating of the duplications suggested that whole-genome duplication occurred 40 to 50 million years ago. The genome of A. thaliana also appears to be the result of whole-genome duplication (a tetraploidy event) in its ancestry, followed by extensive rearrangement [16]. This underlines the importance of polyploidy in plant evolution; it is a key feature of plant genomes that must be recognized when planning comparative analyses. There is extensive conservation in the japonica genome sequence [3] of genes previously identified in other cereals. The alignment of cereal genomes using sequence-based markers from cereals other than rice showed extensive conservation of gene order (conserved synteny), confirming previous results [17]. But can the draft rice sequence be used to assess the extent of conservation of synteny between the rice and A. thaliana genomes? Although the fragmented nature of WGS-derived sequence data makes such data poorly suited to the comparative analysis of distantly related genomes, conservation of genome structure was identified between japonica and A. thaliana [3]. At least 137 blocks of conserved synteny (defined as three or more A. thaliana genes from the same chromosome mapping to one rice bacterial artificial chromosome, or BAC, contig) were identified.
This finding supports the general applicability of previous data from a specific region of the rice genome for which conserved genome microstructure could be identified in A. thaliana [18]. Several of the syntenic blocks in japonica aligned to multiple regions of the A. thaliana genome, providing further confirmation for observations that the A. thaliana genome is the result of multiple rounds of duplication [18,19]. More extensive analyses were reported [3] for adjacent pairs of genes present in A. thaliana, for which the positions in the japonica sequence were assessed. This analysis showed that many pairs of genes show synteny, but most pairs had intervening genes in rice. This pattern of interspersion of conserved and non-conserved genes confirms previous similar findings in comparisons of the microstructure of the genomes of rice and A. thaliana [18,20] and provides support for the generality of this feature of plant genome microstructure [21].
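The block definition used for the rice and A. thaliana comparison (three or more A. thaliana genes from the same chromosome mapping to one rice BAC contig) can be sketched in a few lines. This is an illustration of the stated criterion only, not the pipeline used in [3], which may have applied further filters such as gene proximity. The input format, the function name and all identifiers in the toy data are hypothetical.

```python
from collections import defaultdict


def find_synteny_blocks(hits, min_genes=3):
    """Group homology hits into candidate blocks of conserved synteny.

    `hits` is an iterable of (rice_contig, arabidopsis_chromosome, arabidopsis_gene)
    tuples. A (contig, chromosome) pair is reported as a block when at least
    `min_genes` distinct A. thaliana genes from that chromosome map to that contig.
    """
    genes_by_pair = defaultdict(set)
    for contig, chromosome, gene in hits:
        genes_by_pair[(contig, chromosome)].add(gene)
    return {
        pair: sorted(genes)
        for pair, genes in genes_by_pair.items()
        if len(genes) >= min_genes
    }


if __name__ == "__main__":
    # Toy data with made-up identifiers.
    toy_hits = [
        ("contigA", "chr1", "geneA"),
        ("contigA", "chr1", "geneB"),
        ("contigA", "chr1", "geneC"),
        ("contigA", "chr4", "geneX"),  # a single hit does not form a block
        ("contigB", "chr2", "geneD"),
        ("contigB", "chr2", "geneD"),  # duplicate hits are counted once
        ("contigB", "chr2", "geneE"),
    ]
    for (contig, chromosome), genes in find_synteny_blocks(toy_hits).items():
        print(contig, chromosome, genes)
```

Under this criterion one rice contig can form blocks with several A. thaliana chromosomes at once, which is how a syntenic block aligning to multiple A. thaliana regions would show up in the output.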

Towards a high-quality rice genome sequence

The published draft rice genome sequences contain at least parts of almost all rice genes. The annotation of those genes is presently poor, and their number could be anywhere in the range 30,000 to 60,000. Future rice genome sequencing plans include the integration of the draft sequence data recently reported [3,4] with data being generated by the IRGSP [6], which already incorporates sequence data generated by Monsanto [22]. This will allow the most rapid progress towards the completion of a fully contiguous, high-quality genome sequence, which will be needed to underpin comparative genomics and gene discovery in cereals. Verification and functional analysis of the predicted genes will be important if they are to fulfil their potential for advancing our understanding of plant science and improving agricultural production worldwide.

1. InterPro--an integrated documentation resource for protein families, domains and functional sites.

Authors: R Apweiler; T K Attwood; A Bairoch; A Bateman; E Birney; M Biswas; P Bucher; L Cerutti; F Corpet; M D Croning; R Durbin; L Falquet; W Fleischmann; J Gouzy; H Hermjakob; N Hulo; I Jonassen; D Kahn; A Kanapin; Y Karavidopoulou; R Lopez; B Marx; N J Mulder; T M Oinn; M Pagni; F Servant; C J Sigrist; E M Zdobnov
Journal: Bioinformatics Date: 2000-12 Impact factor: 6.937

Review 2. Annotating eukaryote genomes.

Authors: S Lewis; M Ashburner; M G Reese
Journal: Curr Opin Struct Biol Date: 2000-06 Impact factor: 6.809

3. The use of the Monsanto draft rice genome sequence in research.

Authors: G F Barry
Journal: Plant Physiol Date: 2001-03 Impact factor: 8.340

4. Conservation of microstructure between a sequenced region of the genome of rice and multiple segments of the genome of Arabidopsis thaliana.

Authors: K Mayer; G Murphy; R Tarchini; R Wambutt; G Volckaert; T Pohl; A Düsterhöft; W Stiekema; K D Entian; N Terryn; K Lemcke; D Haase; C R Hall; A M van Dodeweerd; S V Tingey; H W Mewes; M W Bevan; I Bancroft
Journal: Genome Res Date: 2001-07 Impact factor: 9.043

5. Comparative genetics in the grasses.

Authors: M D Gale; K M Devos
Journal: Proc Natl Acad Sci U S A Date: 1998-03-03 Impact factor: 11.205

Review 6. International Rice Genome Sequencing Project: the effort to completely sequence the rice genome.

Authors: T Sasaki; B Burr
Journal: Curr Opin Plant Biol Date: 2000-04 Impact factor: 7.834

7. A draft sequence of the rice genome (Oryza sativa L. ssp. indica).

Authors: Jun Yu; Songnian Hu; Jun Wang; Gane Ka-Shu Wong; Songgang Li; Bin Liu; Yajun Deng; Li Dai; Yan Zhou; Xiuqing Zhang; Mengliang Cao; Jing Liu; Jiandong Sun; Jiabin Tang; Yanjiong Chen; Xiaobing Huang; Wei Lin; Chen Ye; Wei Tong; Lijuan Cong; Jianing Geng; Yujun Han; Lin Li; Wei Li; Guangqiang Hu; Xiangang Huang; Wenjie Li; Jian Li; Zhanwei Liu; Long Li; Jianping Liu; Qiuhui Qi; Jinsong Liu; Li Li; Tao Li; Xuegang Wang; Hong Lu; Tingting Wu; Miao Zhu; Peixiang Ni; Hua Han; Wei Dong; Xiaoyu Ren; Xiaoli Feng; Peng Cui; Xianran Li; Hao Wang; Xin Xu; Wenxue Zhai; Zhao Xu; Jinsong Zhang; Sijie He; Jianguo Zhang; Jichen Xu; Kunlin Zhang; Xianwu Zheng; Jianhai Dong; Wanyong Zeng; Lin Tao; Jia Ye; Jun Tan; Xide Ren; Xuewei Chen; Jun He; Daofeng Liu; Wei Tian; Chaoguang Tian; Hongai Xia; Qiyu Bao; Gang Li; Hui Gao; Ting Cao; Juan Wang; Wenming Zhao; Ping Li; Wei Chen; Xudong Wang; Yong Zhang; Jianfei Hu; Jing Wang; Song Liu; Jian Yang; Guangyu Zhang; Yuqing Xiong; Zhijie Li; Long Mao; Chengshu Zhou; Zhen Zhu; Runsheng Chen; Bailin Hao; Weimou Zheng; Shouyi Chen; Wei Guo; Guojie Li; Siqi Liu; Ming Tao; Jian Wang; Lihuang Zhu; Longping Yuan; Huanming Yang
Journal: Science Date: 2002-04-05 Impact factor: 47.728

8. A draft sequence of the rice genome (Oryza sativa L. ssp. japonica).

Authors: Stephen A Goff; Darrell Ricke; Tien-Hung Lan; Gernot Presting; Ronglin Wang; Molly Dunn; Jane Glazebrook; Allen Sessions; Paul Oeller; Hemant Varma; David Hadley; Don Hutchison; Chris Martin; Fumiaki Katagiri; B Markus Lange; Todd Moughamer; Yu Xia; Paul Budworth; Jingping Zhong; Trini Miguel; Uta Paszkowski; Shiping Zhang; Michelle Colbert; Wei-lin Sun; Lili Chen; Bret Cooper; Sylvia Park; Todd Charles Wood; Long Mao; Peter Quail; Rod Wing; Ralph Dean; Yeisoo Yu; Andrey Zharkikh; Richard Shen; Sudhir Sahasrabudhe; Alun Thomas; Rob Cannings; Alexander Gutin; Dmitry Pruss; Julia Reid; Sean Tavtigian; Jeff Mitchell; Glenn Eldredge; Terri Scholl; Rose Mary Miller; Satish Bhatnagar; Nils Adey; Todd Rubano; Nadeem Tusneem; Rosann Robinson; Jane Feldhaus; Teresita Macalma; Arnold Oliphant; Steven Briggs
Journal: Science Date: 2002-04-05 Impact factor: 47.728

Review 9. Genome sequence of the nematode C. elegans: a platform for investigating biology.

Authors:
Journal: Science Date: 1998-12-11 Impact factor: 47.728

10. The sequence of the human genome.

Authors: J C Venter; M D Adams; E W Myers; P W Li; R J Mural; G G Sutton; H O Smith; M Yandell; C A Evans; R A Holt; J D Gocayne; P Amanatides; R M Ballew; D H Huson; J R Wortman; Q Zhang; C D Kodira; X H Zheng; L Chen; M Skupski; G Subramanian; P D Thomas; J Zhang; G L Gabor Miklos; C Nelson; S Broder; A G Clark; J Nadeau; V A McKusick; N Zinder; A J Levine; R J Roberts; M Simon; C Slayman; M Hunkapiller; R Bolanos; A Delcher; I Dew; D Fasulo; M Flanigan; L Florea; A Halpern; S Hannenhalli; S Kravitz; S Levy; C Mobarry; K Reinert; K Remington; J Abu-Threideh; E Beasley; K Biddick; V Bonazzi; R Brandon; M Cargill; I Chandramouliswaran; R Charlab; K Chaturvedi; Z Deng; V Di Francesco; P Dunn; K Eilbeck; C Evangelista; A E Gabrielian; W Gan; W Ge; F Gong; Z Gu; P Guan; T J Heiman; M E Higgins; R R Ji; Z Ke; K A Ketchum; Z Lai; Y Lei; Z Li; J Li; Y Liang; X Lin; F Lu; G V Merkulov; N Milshina; H M Moore; A K Naik; V A Narayan; B Neelam; D Nusskern; D B Rusch; S Salzberg; W Shao; B Shue; J Sun; Z Wang; A Wang; X Wang; J Wang; M Wei; R Wides; C Xiao; C Yan; A Yao; J Ye; M Zhan; W Zhang; H Zhang; Q Zhao; L Zheng; F Zhong; W Zhong; S Zhu; S Zhao; D Gilbert; S Baumhueter; G Spier; C Carter; A Cravchik; T Woodage; F Ali; H An; A Awe; D Baldwin; H Baden; M Barnstead; I Barrow; K Beeson; D Busam; A Carver; A Center; M L Cheng; L Curry; S Danaher; L Davenport; R Desilets; S Dietz; K Dodson; L Doup; S Ferriera; N Garg; A Gluecksmann; B Hart; J Haynes; C Haynes; C Heiner; S Hladun; D Hostin; J Houck; T Howland; C Ibegwam; J Johnson; F Kalush; L Kline; S Koduru; A Love; F Mann; D May; S McCawley; T McIntosh; I McMullen; M Moy; L Moy; B Murphy; K Nelson; C Pfannkoch; E Pratts; V Puri; H Qureshi; M Reardon; R Rodriguez; Y H Rogers; D Romblad; B Ruhfel; R Scott; C Sitter; M Smallwood; E Stewart; R Strong; E Suh; R Thomas; N N Tint; S Tse; C Vech; G Wang; J Wetter; S Williams; M Williams; S Windsor; E Winn-Deen; K Wolfe; J Zaveri; K Zaveri; J F Abril; R Guigó; M J Campbell; K V 
Sjolander; B Karlak; A Kejariwal; H Mi; B Lazareva; T Hatton; A Narechania; K Diemer; A Muruganujan; N Guo; S Sato; V Bafna; S Istrail; R Lippert; R Schwartz; B Walenz; S Yooseph; D Allen; A Basu; J Baxendale; L Blick; M Caminha; J Carnes-Stine; P Caulk; Y H Chiang; M Coyne; C Dahlke; A Deslattes Mays; M Dombroski; M Donnelly; D Ely; S Esparham; C Fosler; H Gire; S Glanowski; K Glasser; A Glodek; M Gorokhov; K Graham; B Gropman; M Harris; J Heil; S Henderson; J Hoover; D Jennings; C Jordan; J Jordan; J Kasha; L Kagan; C Kraft; A Levitsky; M Lewis; X Liu; J Lopez; D Ma; W Majoros; J McDaniel; S Murphy; M Newman; T Nguyen; N Nguyen; M Nodell; S Pan; J Peck; M Peterson; W Rowe; R Sanders; J Scott; M Simpson; T Smith; A Sprague; T Stockwell; R Turner; E Venter; M Wang; M Wen; D Wu; M Wu; A Xia; A Zandieh; X Zhu
Journal: Science Date: 2001-02-16 Impact factor: 47.728
