
RiceVarMap: a comprehensive database of rice genomic variations.

Hu Zhao1, Wen Yao1, Yidan Ouyang1, Wanneng Yang1, Gongwei Wang1, Xingming Lian1, Yongzhong Xing1, Lingling Chen1, Weibo Xie2.   

Abstract

Rice Variation Map (RiceVarMap, http://ricevarmap.ncpgr.cn) is a database of rice genomic variations. The database provides comprehensive information on 6,551,358 single nucleotide polymorphisms (SNPs) and 1,214,627 insertions/deletions (INDELs) identified from sequencing data of 1479 rice accessions. The SNP genotypes of all accessions were imputed and evaluated, resulting in an overall missing data rate of 0.42% and an estimated accuracy greater than 99%. The SNP/INDEL genotypes of all accessions are available for online query and download. Users can search for SNPs/INDELs by SNP/INDEL identifiers, genomic regions, gene identifiers and keywords of gene annotation. Allele frequencies within various subpopulations and the effects of variations that may alter the protein sequence of a gene are also listed for each SNP/INDEL. The database also provides geographical details and phenotype images for various rice accessions. In particular, the database provides tools to construct haplotype networks and to design PCR primers that take surrounding known genomic variations into account. These data and tools are highly useful for exploring genetic variations and for evolutionary studies of rice and other species.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2014        PMID: 25274737      PMCID: PMC4384008          DOI: 10.1093/nar/gku894

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Single nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) are two major types of genomic variation in living organisms and have been widely used in basic research and industry. Researchers use genetic variations as markers to study the genetic basis of complex traits via linkage and association mapping (1–3). High-density markers increase the probability that some markers are located in or near the target genes, which is a great advantage for marker-assisted selection in breeding (4). In some cases, sufficient information on genetic variations may identify the causal genes directly. In addition, when genotype information of individuals in a population is available, various population genetic studies can be carried out, such as estimating linkage disequilibrium blocks, constructing haplotype networks (5), inferring population history (6) and identifying signatures of natural or artificial selection (7).

Rice (Oryza sativa L.) is both a major staple crop that feeds nearly half of the world's population and a model species in the plant research community. Although the draft genome sequences of both the indica and japonica subspecies of rice were released in 2002 (8,9) and the genome sequence of the japonica subspecies was finished in 2005 (10), the databases of genomic variations in rice available to bench researchers are still very limited. Most of the rice SNP data deposited in dbSNP (11) and other databases (12–14) were identified by comparing the draft genome sequences of the two rice subspecies. Recently, the OryzaSNP consortium interrogated 20 diverse rice varieties using resequencing microarrays and obtained 160 000 non-redundant SNPs (15). The data can be queried in the OryzaSNP database (http://oryzasnp.plantbiology.msu.edu). Huang et al. sequenced 950 rice accessions and identified 4 109 366 non-singleton SNPs (3). They further carried out a genome-wide association study (GWAS) for many important agronomic traits and provided a database (http://202.127.18.221/RiceHap2/) as supporting information for the corresponding publication. The HapRice database published this year provides allele frequencies of about 3300 SNPs in 253 rice accessions (16). Each of these newly emerging databases represents progress in some respects. However, a comprehensive database like HapMap (17) in the human research community is still much needed for rice. Such a database should include abundant high-quality genomic variations, detailed genotypes of a large number of rice accessions, allele frequencies in different subpopulations and comprehensive annotation. It should also offer a user-friendly query interface and useful visualization tools.

To achieve these objectives, we collected and analysed sequence data of 1479 rice accessions, identified 6 551 358 SNPs and 1 214 627 INDELs, and constructed a comprehensive database of rice genomic variations, RiceVarMap. Compared with existing databases, RiceVarMap provides not only the largest set of genomic variations and genotype data at present, but also characteristic phenotype images for various rice accessions, intuitive query interfaces and useful tools, such as tools for constructing haplotype networks and for designing polymerase chain reaction (PCR) primers that take surrounding known genomic variations into account. These data and tools are highly useful for exploring genetic variations and for evolutionary studies of rice and other species. Based on these data, we also investigated natural variations in rice metabolism using a GWAS approach (18) and designed two genotyping arrays for rice breeding (19,20). These data will be added to the database in the future.

DATA COLLECTION, PROCESSING AND EVALUATION

We collected sequencing data from two sets of rice germplasm comprising a total of 1479 accessions of cultivated rice (Oryza sativa L.). The first set consists of 529 accessions selected to represent both usefulness in rice improvement and the genetic diversity of the cultivated species (21,22). We sequenced the 529 accessions on the Illumina HiSeq 2000 as 90-bp paired-end reads, generating more than one gigabase of high-quality sequence per accession (>2.5× per genome, 6.7 billion reads in total). These raw data are available from NCBI under BioProject accession number PRJNA171289. The second set comprises the 950 rice accessions sequenced by Huang et al. (3), downloaded from the EBI European Nucleotide Archive (accession numbers ERP000106 and ERP000729), and consists of 4.6 billion 73-bp paired-end reads (∼1× per genome). Together, the two sets include both landraces and improved varieties from 73 countries and provide ∼2400-fold coverage of the rice genome. The detailed procedures of data analysis can be found in our published article (18) and on the RiceVarMap webpages. Briefly, the two sets of raw sequences were combined and aligned to the rice reference genome (Nipponbare, MSU version 6.1) (23) using the Burrows-Wheeler Aligner (BWA) (24). A total of 6 551 358 high-quality SNPs and 1 214 627 INDELs were identified using SAMtools and BCFtools (25). In the raw genotype calls from BCFtools, 47.1% of genotypes were missing because of the low sequencing coverage. We therefore performed imputation using an in-house modified k-nearest-neighbour algorithm (26,27), which reduced the overall missing data rate to 0.42%. A total of 6 428 770 SNPs with a genotype missing data rate below 20% (overall missing data rate of 0.38%) were used for GWAS (18).
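The depth figures quoted above can be checked with simple arithmetic: total sequenced bases divided by genome size. The following Python sketch does this under two stated assumptions: a genome size of about 389 Mb (not given in this article) and the rounded read counts from the text, so the results are only approximate.

```python
# Back-of-the-envelope check of the sequencing depth quoted in the text.
# ASSUMPTION: genome size of about 389 Mb; this value is not stated in the article.
GENOME_SIZE_BP = 389_000_000


def fold_coverage(n_reads, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


set1 = fold_coverage(6.7e9, 90)  # 529 accessions, 90-bp reads
set2 = fold_coverage(4.6e9, 73)  # 950 accessions, 73-bp reads
total = set1 + set2

print(f"set 1: {set1:.0f}x in total, {set1 / 529:.1f}x per accession")
print(f"set 2: {set2:.0f}x in total, {set2 / 950:.1f}x per accession")
print(f"combined: {total:.0f}x")
```

Under these assumptions the per-accession depths come out above 2.5× for the first set and close to 1× for the second, and the combined depth is in the region of 2400×, in line with the figures given in the text.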
To estimate the accuracy of the imputed genotypes, we genotyped 48 accessions using the Illumina Infinium array RiceSNP50 (19). The results suggested an accuracy of 99.3% (details of the evaluation can be found on the website). Thus, the missing data rate and accuracy of the imputed genotype data set were comparable to those of high-coverage sequencing. We further inferred the population structure of the 1479 accessions using ADMIXTURE (28). The accessions were accordingly classified into 809 indica, 547 japonica, 67 Aus and 56 intermediate-type accessions. Indica could be further divided into indica I, which comprises germplasm of South China origin, indica II, which contains germplasm from Southeast Asia, and an indica intermediate type. Japonica could likewise be divided into temperate japonica, tropical japonica and a japonica intermediate type. The population structure of the collected rice accessions is shown in Figure 1a. The allele frequencies of each SNP in the populations defined by each classification were calculated and stored in RiceVarMap.
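The per-subpopulation allele frequencies mentioned above can be illustrated with a minimal Python sketch. It is not the RiceVarMap pipeline: the function name, the data layout and the toy accession labels are hypothetical, and it assumes one homozygous allele call per accession, with missing calls excluded from the denominator.

```python
from collections import Counter, defaultdict


def allele_frequencies(genotypes, subpop_of):
    """Return {subpopulation: {allele: frequency}} for a single SNP.

    genotypes : {accession_id: allele or None}   (None = missing call)
    subpop_of : {accession_id: subpopulation label}
    ASSUMPTION: one homozygous allele per accession.
    """
    counts = defaultdict(Counter)
    for accession, allele in genotypes.items():
        if allele is None:
            continue  # missing genotypes do not contribute
        counts[subpop_of[accession]][allele] += 1
    return {
        pop: {allele: n / sum(c.values()) for allele, n in c.items()}
        for pop, c in counts.items()
    }


# Toy data with made-up accession labels.
genotypes = {"acc1": "A", "acc2": "A", "acc3": "G", "acc4": None, "acc5": "G"}
subpop_of = {"acc1": "indica", "acc2": "indica", "acc3": "indica",
             "acc4": "indica", "acc5": "japonica"}
print(allele_frequencies(genotypes, subpop_of))
```

The subpopulation sizes reported above are also internally consistent: 809 + 547 + 67 + 56 = 1479.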
Figure 1.

Contents and functions of RiceVarMap. (a) Population structure of the 1479 rice accessions included in RiceVarMap. Labels denote the names of subpopulations and the number of accessions in the subpopulation. (b) The functions of RiceVarMap. (c) Haplotype network of S5 constructed using the tool ‘Haplotype Network Analysis’ implemented in RiceVarMap. Each circle represents a haplotype and the size is proportionate to the number of accessions with that haplotype. Branch length represents the genetic distance between two haplotypes.


DATABASE FEATURES

RiceVarMap is free and open to the public and offers a comprehensive set of functions (Figure 1b), which are described below. In the database, each SNP or INDEL is labelled with a unique identifier (ID, e.g. sf0100000131, vf0136465397). The first letter of the ID indicates the type of polymorphism: ‘s’ for SNP and ‘v’ for INDEL. The second letter represents the version of the reference genome: ‘f’ for version 6.1 of Nipponbare. The number encodes the chromosome and the coordinate of the variation, e.g. sf0100000131 denotes an SNP on chromosome 1 at position 131 bp.
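The ID scheme can be decoded mechanically. The Python sketch below parses an ID into its parts; the fixed layout (two letters, a two-digit chromosome and an eight-digit position) is an assumption inferred from the example IDs in this article, and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class VariationID(NamedTuple):
    kind: str        # "SNP" or "INDEL"
    reference: str   # reference genome version
    chromosome: int
    position: int    # coordinate in bp


# ASSUMPTION: layout inferred from the examples in the text:
# type letter, reference letter, 2-digit chromosome, 8-digit position.
_ID_PATTERN = re.compile(r"^([sv])(f)(\d{2})(\d{8})$")
_KINDS = {"s": "SNP", "v": "INDEL"}
_REFERENCES = {"f": "Nipponbare MSU 6.1"}


def parse_variation_id(var_id: str) -> VariationID:
    """Split a RiceVarMap-style variation ID into its components."""
    match = _ID_PATTERN.match(var_id)
    if match is None:
        raise ValueError(f"unrecognised variation ID: {var_id!r}")
    kind, ref, chrom, pos = match.groups()
    return VariationID(_KINDS[kind], _REFERENCES[ref], int(chrom), int(pos))


# The example from the text: an SNP on chromosome 1 at 131 bp.
print(parse_variation_id("sf0100000131"))
```

Because the IDs are fixed-width, sorting them as plain strings also orders variations of the same type by chromosome and position.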

Search for SNPs/INDELs by region

Information on SNPs/INDELs can be queried by specifying genomic coordinates in the rice genome. Because the reference genome used by RiceVarMap is version 6.1 of Nipponbare from MSU, users should make sure that all coordinates correspond to this version before querying. The Basic Local Alignment Search Tool (BLAST) search in the ‘Tools’ menu can be used to obtain the correct genomic coordinates from a sequence. Furthermore, SNPs can be filtered by allele frequency ranges in different populations (at most three conditions can be combined).
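The logic of a region query with allele frequency filters can be sketched in a few lines of Python. This is an illustration on in-memory records, not the RiceVarMap interface: the record layout, the function name and the toy values are hypothetical, and coordinates are assumed to be inclusive positions on the MSU 6.1 reference.

```python
def filter_snps(snps, chrom, start, end, freq_ranges=()):
    """Select SNPs inside a region that satisfy every frequency filter.

    snps        : iterable of dicts with keys 'id', 'chrom', 'pos' and
                  'freq' ({population: allele frequency})
    freq_ranges : up to three (population, min_freq, max_freq) tuples
    """
    if len(freq_ranges) > 3:
        raise ValueError("at most three frequency filters can be combined")
    hits = []
    for snp in snps:
        if snp["chrom"] != chrom or not (start <= snp["pos"] <= end):
            continue  # outside the requested region
        if all(lo <= snp["freq"].get(pop, 0.0) <= hi
               for pop, lo, hi in freq_ranges):
            hits.append(snp)
    return hits


# Toy records with made-up frequencies.
snps = [
    {"id": "sf0100000131", "chrom": 1, "pos": 131,
     "freq": {"indica": 0.40, "japonica": 0.02}},
    {"id": "sf0100000500", "chrom": 1, "pos": 500,
     "freq": {"indica": 0.01, "japonica": 0.30}},
]
selected = filter_snps(snps, 1, 1, 1000,
                       [("indica", 0.05, 1.0), ("japonica", 0.0, 0.05)])
print([snp["id"] for snp in selected])
```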

Search for SNPs/INDELs within gene

SNPs/INDELs may strongly influence gene function. Users can search for SNPs/INDELs by gene identifiers, gene symbols or keywords of gene annotation; wildcard characters are accepted. To better explain the functional changes caused by SNPs/INDELs, we used SNP effector (29) to annotate them, and those with large-effect changes are highlighted on the result page. Moreover, users can define upstream and downstream regions of genes and retrieve the SNPs/INDELs in these regions. The retrieved SNPs/INDELs are displayed on the result page together with a graph of the gene structure.

Search for genotypes with SNP/INDEL ID

In this interface, users can fetch the genotypes of different accessions by entering SNP/INDEL IDs and selecting the corresponding accessions. If a genotype is the minor allele, it is displayed in red in the search results. The genotypes can be downloaded in CSV format.

Search for SNP/INDEL information with ID

This page gives detailed information on a single SNP/INDEL, including not only basic information (e.g. position, major allele, minor allele) but also allele frequencies in different populations and the predicted effects of the variation.

Search for polymorphic positions between two cultivars

This page lists the SNPs that are polymorphic between two cultivars, which makes it convenient to develop new molecular markers for subsequent bench work.
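The comparison behind this page amounts to finding positions where two cultivars carry different, non-missing alleles. A minimal Python sketch follows, with a hypothetical function name and made-up genotypes; it assumes one allele call per cultivar per SNP.

```python
def polymorphic_sites(geno_a, geno_b):
    """Return the sorted IDs of SNPs at which two cultivars differ.

    geno_a, geno_b : {snp_id: allele or None}   (None = missing call)
    Sites missing in either cultivar are skipped rather than counted.
    """
    shared = geno_a.keys() & geno_b.keys()
    return sorted(
        snp_id for snp_id in shared
        if geno_a[snp_id] is not None
        and geno_b[snp_id] is not None
        and geno_a[snp_id] != geno_b[snp_id]
    )


# Toy genotypes for two cultivars; the alleles are made up.
cultivar_1 = {"sf0605759642": "A", "sf0605759919": "C", "sf0605760007": None}
cultivar_2 = {"sf0605759642": "G", "sf0605759919": "C", "sf0605760007": "T"}
print(polymorphic_sites(cultivar_1, cultivar_2))
```

Only the first SNP is reported here: the second is identical in both cultivars and the third has a missing call in one of them.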

Search for information of the cultivars

Geographical information is provided for the 1479 rice accessions, and each cultivar is assigned a cultivar ID (e.g. C001, HP84 and W001). The classification and phenotype images of these rice accessions are available from this page. In addition, Google Maps is used to help users locate the origin of each accession.

Design primer by SNP/INDEL ID

RiceVarMap also provides tools to facilitate bench work and further analysis. This tool lets researchers pick PCR primers to validate SNPs/INDELs or to develop molecular markers. Primers are designed to flank the target SNP/INDEL and to avoid overlapping known SNPs/INDELs.

Design primer by region

This tool lets researchers pick PCR primers that amplify a target genomic region while avoiding overlap with known SNPs/INDELs.
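The constraint shared by both primer tools, that a primer should not sit on a known variation, can be illustrated as an interval check. The Python sketch below is a simplification and not the RiceVarMap implementation: function names are hypothetical, each known variation is treated as a single 1-based position (an INDEL would really span several bases), and primer scoring such as melting temperature or GC content is out of scope.

```python
from bisect import bisect_left


def overlaps_known_variant(primer_start, primer_end, variant_positions):
    """True if any known variant lies within [primer_start, primer_end].

    variant_positions must be a sorted list of positions on the same
    chromosome as the primer.
    """
    i = bisect_left(variant_positions, primer_start)
    return i < len(variant_positions) and variant_positions[i] <= primer_end


def pick_clean_primers(candidates, variant_positions):
    """Keep only candidate (start, end) primers free of known variants."""
    positions = sorted(variant_positions)
    return [(start, end) for start, end in candidates
            if not overlaps_known_variant(start, end, positions)]


# Toy example: three known variants and five candidate primer sites.
known = [131, 500, 742]
candidates = [(100, 120), (125, 145), (480, 499), (500, 520), (743, 760)]
print(pick_clean_primers(candidates, known))
```

The binary search keeps each check logarithmic in the number of known variants, which matters when millions of positions are stored per chromosome.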

Haplotype network analysis

Haplotype networks are frequently used in population genetic analysis. RiceVarMap provides a simple tool that generates a haplotype network using modified functions from the R package pegas (30). Users can download the graph in PDF format together with detailed haplotype information for further analysis.
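The two quantities a haplotype network is drawn from, the number of accessions per haplotype and the distance between haplotypes, can be sketched in Python. This does not reproduce the pegas-based network construction used by RiceVarMap; the function names and toy genotypes are hypothetical, Hamming distance is used as a simple stand-in for genetic distance, and accessions with a missing call at any selected SNP are skipped.

```python
from collections import Counter
from itertools import combinations


def haplotype_counts(genotypes, snp_ids):
    """Count accessions per haplotype over the selected SNPs.

    genotypes : {accession_id: {snp_id: allele or None}}
    Accessions with a missing call at any selected SNP are skipped.
    """
    counts = Counter()
    for alleles in genotypes.values():
        haplotype = tuple(alleles.get(snp_id) for snp_id in snp_ids)
        if None not in haplotype:
            counts[haplotype] += 1
    return counts


def pairwise_distances(haplotypes):
    """Hamming distance (number of differing sites) for each pair."""
    return {
        (a, b): sum(x != y for x, y in zip(a, b))
        for a, b in combinations(sorted(haplotypes), 2)
    }


# Toy genotypes over two SNPs; IDs and alleles are made up.
genotypes = {
    "acc1": {"s1": "A", "s2": "G"},
    "acc2": {"s1": "A", "s2": "G"},
    "acc3": {"s1": "A", "s2": "T"},
    "acc4": {"s1": "C", "s2": "T"},
    "acc5": {"s1": "A"},  # missing s2, so it is skipped
}
counts = haplotype_counts(genotypes, ["s1", "s2"])
print(counts)
print(pairwise_distances(counts))
```

In a network such as Figure 1c, the counts would set the circle sizes and the distances would inform the branch lengths.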

A CASE OF APPLICATION

Hybrid sterility is a common phenomenon between different populations; one of the best-known examples is the sterility of hybrids between the two rice subspecies, indica and japonica. The S5 locus, which is encoded by three tightly linked genes, has been well characterized as a regulator of fertility in indica-japonica hybrids (31,32). We used one of these genes, LOC_Os06g11010, a ‘killer’ of gametes, to demonstrate the functions of RiceVarMap. A total of 14 SNPs were found with the ‘Search for SNPs within gene’ function using the gene locus name as input. Five of them (sf0605759642, sf0605759919, sf0605760007, sf0605760352, sf0605760512) with minor allele frequencies greater than 0.05 in the whole population could be identified easily from the results, and all of them were non-synonymous mutations, suggesting that this gene might have undergone rapid evolution. We then copied the IDs of these SNPs and generated a haplotype network using the tool ‘Haplotype Network Analysis’ (Figure 1c). Five haplotypes were found in at least 10 accessions each, and nearly all aus accessions carried haplotype IV, a haplotype characterized as that of wide-compatibility accessions, which can overcome the sterility of the hybrid. Thus, accessions with this haplotype would be very useful for hybrid breeding. Finally, the tool ‘Design Primer by SNP/INDEL ID’ can be used to develop molecular markers based on these SNPs.

FUTURE DEVELOPMENT

We will improve and update the database in three respects. First, as more rice accessions are sequenced (6,33) and become publicly available, the database will be enlarged with more rice accessions, SNPs/INDELs and genotypes, and the reference genome will be updated as well. Second, we have identified thousands of significant loci regulating metabolism (18) and constructed high-throughput phenotyping platforms (34). We plan to add these data to RiceVarMap, making it a comprehensive database of rice genomic, metabolomic and phenomic variations. Third, we will make the database more user-friendly and more efficient in response to feedback on the first version of RiceVarMap.
REFERENCES (10 of 32 listed)

1.  dbSNP: the NCBI database of genetic variation.

Authors:  S T Sherry; M H Ward; M Kholodov; J Baker; L Phan; E M Smigielski; K Sirotkin
Journal:  Nucleic Acids Res       Date:  2001-01-01       Impact factor: 16.971

2.  Comparative population genomics of maize domestication and improvement.

Authors:  Matthew B Hufford; Xun Xu; Joost van Heerwaarden; Tanja Pyhäjärvi; Jer-Ming Chia; Reed A Cartwright; Robert J Elshire; Jeffrey C Glaubitz; Kate E Guill; Shawn M Kaeppler; Jinsheng Lai; Peter L Morrell; Laura M Shannon; Chi Song; Nathan M Springer; Ruth A Swanson-Wagner; Peter Tiffin; Jun Wang; Gengyun Zhang; John Doebley; Michael D McMullen; Doreen Ware; Edward S Buckler; Shuang Yang; Jeffrey Ross-Ibarra
Journal:  Nat Genet       Date:  2012-06-03       Impact factor: 38.330

3.  Genome-wide association studies of 14 agronomic traits in rice landraces.

Authors:  Xuehui Huang; Xinghua Wei; Tao Sang; Qiang Zhao; Qi Feng; Yan Zhao; Canyang Li; Chuanrang Zhu; Tingting Lu; Zhiwu Zhang; Meng Li; Danlin Fan; Yunli Guo; Ahong Wang; Lu Wang; Liuwei Deng; Wenjun Li; Yiqi Lu; Qijun Weng; Kunyan Liu; Tao Huang; Taoying Zhou; Yufeng Jing; Wei Li; Zhang Lin; Edward S Buckler; Qian Qian; Qi-Fa Zhang; Jiayang Li; Bin Han
Journal:  Nat Genet       Date:  2010-10-24       Impact factor: 38.330

4.  A haplotype map of the human genome.

Authors:  International HapMap Consortium
Journal:  Nature       Date:  2005-10-27       Impact factor: 49.962

5.  HapRice, an SNP haplotype database and a web tool for rice.

Authors:  Jun-ichi Yonemaru; Kaworu Ebana; Masahiro Yano
Journal:  Plant Cell Physiol       Date:  2013-12-13       Impact factor: 4.927

6.  A draft sequence of the rice genome (Oryza sativa L. ssp. japonica).

Authors:  Stephen A Goff; Darrell Ricke; Tien-Hung Lan; Gernot Presting; Ronglin Wang; Molly Dunn; Jane Glazebrook; Allen Sessions; Paul Oeller; Hemant Varma; David Hadley; Don Hutchison; Chris Martin; Fumiaki Katagiri; B Markus Lange; Todd Moughamer; Yu Xia; Paul Budworth; Jingping Zhong; Trini Miguel; Uta Paszkowski; Shiping Zhang; Michelle Colbert; Wei-lin Sun; Lili Chen; Bret Cooper; Sylvia Park; Todd Charles Wood; Long Mao; Peter Quail; Rod Wing; Ralph Dean; Yeisoo Yu; Andrey Zharkikh; Richard Shen; Sudhir Sahasrabudhe; Alun Thomas; Rob Cannings; Alexander Gutin; Dmitry Pruss; Julia Reid; Sean Tavtigian; Jeff Mitchell; Glenn Eldredge; Terri Scholl; Rose Mary Miller; Satish Bhatnagar; Nils Adey; Todd Rubano; Nadeem Tusneem; Rosann Robinson; Jane Feldhaus; Teresita Macalma; Arnold Oliphant; Steven Briggs
Journal:  Science       Date:  2002-04-05       Impact factor: 47.728

7.  Genomewide SNP variation reveals relationships among landraces and modern varieties of rice.

Authors:  Kenneth L McNally; Kevin L Childs; Regina Bohnert; Rebecca M Davidson; Keyan Zhao; Victor J Ulat; Georg Zeller; Richard M Clark; Douglas R Hoen; Thomas E Bureau; Renee Stokowski; Dennis G Ballinger; Kelly A Frazer; David R Cox; Badri Padhukasahasram; Carlos D Bustamante; Detlef Weigel; David J Mackill; Richard M Bruskiewich; Gunnar Rätsch; C Robin Buell; Hei Leung; Jan E Leach
Journal:  Proc Natl Acad Sci U S A       Date:  2009-07-13       Impact factor: 11.205

8.  A whole-genome SNP array (RICE6K) for genomic breeding in rice.

Authors:  Huihui Yu; Weibo Xie; Jing Li; Fasong Zhou; Qifa Zhang
Journal:  Plant Biotechnol J       Date:  2013-09-13       Impact factor: 9.803

9.  Plant phenomics and high-throughput phenotyping: accelerating rice functional genomics using multidisciplinary technologies.

Authors:  Wanneng Yang; Lingfeng Duan; Guoxing Chen; Lizhong Xiong; Qian Liu
Journal:  Curr Opin Plant Biol       Date:  2013-04-08       Impact factor: 7.834

10.  A map of rice genome variation reveals the origin of cultivated rice.

Authors:  Xuehui Huang; Nori Kurata; Xinghua Wei; Zi-Xuan Wang; Ahong Wang; Qiang Zhao; Yan Zhao; Kunyan Liu; Hengyun Lu; Wenjun Li; Yunli Guo; Yiqi Lu; Congcong Zhou; Danlin Fan; Qijun Weng; Chuanrang Zhu; Tao Huang; Lei Zhang; Yongchun Wang; Lei Feng; Hiroyasu Furuumi; Takahiko Kubo; Toshie Miyabayashi; Xiaoping Yuan; Qun Xu; Guojun Dong; Qilin Zhan; Canyang Li; Asao Fujiyama; Atsushi Toyoda; Tingting Lu; Qi Feng; Qian Qian; Jiayang Li; Bin Han
Journal:  Nature       Date:  2012-10-03       Impact factor: 49.962

