
The BIG Data Center: from deposition to integration to translation.

Abstract

Biological data are generated at unprecedentedly exponential rates, posing considerable challenges in big data deposition, integration and translation. The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, provides a suite of database resources, including (i) Genome Sequence Archive, a data repository specialized for archiving raw sequence reads, (ii) Gene Expression Nebulas, a data portal of gene expression profiles based entirely on RNA-Seq data, (iii) Genome Variation Map, a comprehensive collection of genome variations for featured species, (iv) Genome Warehouse, a centralized resource housing genome-scale data with particular focus on economically important animals and plants, (v) Methylation Bank, an integrated database of whole-genome single-base resolution methylomes and (vi) Science Wikis, a central access point for biological wikis developed for community annotations. The BIG Data Center is dedicated to constructing and maintaining biological databases through big data integration and value-added curation, conducting basic research to translate big data into big knowledge and providing freely open access to a variety of data resources in support of worldwide research activities in both academia and industry. All of these resources are publicly available and can be found at http://bigd.big.ac.cn.


Year: 2016 PMID: 27899658 PMCID: PMC5210546 DOI: 10.1093/nar/gkw1060

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Rapid advances in high-throughput sequencing technologies have provided formidable capacity for genome sequencing, producing biological data at an unprecedented, exponential rate and accumulating huge amounts of data at multiple omics levels (1). Addressing the most important and complex biological questions often requires giving researchers open access to a variety of data resources (2). China has become a powerhouse in generating vast quantities of biological data, yet it lacks a centralized data center committed to opening data and to making data well organized and publicly accessible to scientific communities worldwide (3). The BIG Data Center, established at Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, takes full advantage of the resources and experience of BIG and its partner institutions to provide sustainable and reliable services in aid of research activities throughout the world. Specifically, BIG participated actively in the Chinese part of the International Human Genome Project (4), has presided over several prestigious national research projects (e.g. the Chinese Superhybrid Rice Genome Project (5) and the Chicken (6), Silkworm (7), Date Palm (8), Common Carp (9), Cassava (10) and Rubber Tree (11) Genome Projects), pioneers the Chinese Population Precision Medicine Initiative (http://news.xinhuanet.com/english/2016-01/09/c_134993997.htm) and has rich experience in developing and maintaining biological databases. In addition, it is well equipped with facilities including both DNA sequencers and high-performance computing resources.
Therefore, the BIG Data Center is dedicated to constructing and maintaining biological databases by big data integration and value-added curation, performing basic research by development of advanced methods to aid translation of big data into big discovery and providing freely open access to a suite of featured data resources in support of worldwide activities in both academia and industry (http://bigd.big.ac.cn; Figure 1).

Figure 1.

The BIG Data Center's core data resources. A full list of data resources, which contains links to each resource, is available at http://bigd.big.ac.cn/databases.

GENOME SEQUENCE ARCHIVE

The Genome Sequence Archive (GSA; http://gsa.big.ac.cn) is a data repository specialized for archiving raw sequence reads. It supports data generated from a variety of sequencing platforms, ranging from Sanger sequencing machines to single-cell sequencing machines, and provides data storage and sharing services free of charge for scientific communities worldwide. In addition to raw sequencing data, GSA also accommodates secondary analysis files in accepted formats (such as BAM and VCF). Its user-friendly web interfaces simplify data entry, and submitted data are organized into two parts, Metadata and File: the former is further divided into BioProject, BioSample, Experiment and Run, and the latter contains raw sequence reads. Since its inception in August 2015, GSA has grown to house, as of October 2016, a total of 155 projects, 7140 samples, 7646 experiments and 8433 runs for more than 50 species, and it stores more than 100 TB of compressed sequence files.
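As an illustration of the Metadata part described above, the following minimal Python sketch models the four record types as nested data classes. The strict nesting (project, then sample, then experiment, then run), the field names and the accession strings are assumptions made for this example only; they are not GSA's actual schema or accession format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    accession: str                                   # hypothetical accession string
    files: List[str] = field(default_factory=list)   # raw read files (the "File" part)


@dataclass
class Experiment:
    accession: str
    platform: str                                    # e.g. sequencing platform name
    runs: List[Run] = field(default_factory=list)


@dataclass
class BioSample:
    accession: str
    species: str
    experiments: List[Experiment] = field(default_factory=list)


@dataclass
class BioProject:
    accession: str
    title: str
    samples: List[BioSample] = field(default_factory=list)


def count_records(project: BioProject) -> Dict[str, int]:
    """Count the samples, experiments and runs that belong to one project."""
    experiments = [e for s in project.samples for e in s.experiments]
    runs = [r for e in experiments for r in e.runs]
    return {
        "samples": len(project.samples),
        "experiments": len(experiments),
        "runs": len(runs),
    }


# Toy submission: one project, one sample, one experiment, two runs.
demo = BioProject(
    accession="PRJ_DEMO",
    title="toy project",
    samples=[
        BioSample("SAM_DEMO", "Oryza sativa", [
            Experiment("EXP_DEMO", "Illumina", [
                Run("RUN_DEMO_1", ["reads_1.fq.gz", "reads_2.fq.gz"]),
                Run("RUN_DEMO_2", ["reads.fq.gz"]),
            ])
        ])
    ],
)
summary = count_records(demo)
```

Counting over such a hierarchy is how totals like the project, sample, experiment and run figures quoted above could be derived.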

GENE EXPRESSION NEBULAS

High-throughput sequencing technologies provide a revolutionary way for transcriptome profiling, enable facile generation of large-scale RNA sequencing (RNA-Seq) data and accordingly facilitate high-resolution quantification of gene expression levels across a variety of tissues and treatments (12–14). Thus, gene expression profiling from RNA-Seq data is of fundamental significance for deciphering functional elements under diverse conditions and characterizing the dynamics of transcriptomic regulation. The Gene Expression Nebulas (GEN; http://bigd.big.ac.cn/gen) is a data portal of gene expression profiles based entirely on RNA-Seq data (that are retrieved from NCBI SRA (15)), which currently hosts two featured resources, namely, Mammalian Transcriptomic Database (16) and Rice Expression Database (RED) (17).

Mammalian transcriptomic database (MTD)

Mammalian transcriptomic database (MTD) (http://bigd.big.ac.cn/mtd) (16) is a mammalian transcriptomic database that is based on large quantities of RNA-Seq data across various tissues/cell lines. In the current version, it incorporates a wealth of transcriptomes from human, mouse, rat and pig, which are all obtained from NCBI SRA (15). MTD features easy-to-use web interfaces for exploration of transcriptomic profiling for genes or for a specific genomic region, characterization of detailed expression profiles at the levels of exon, transcript, and gene and visualization of transcriptomic data in an interactive manner powered by a genome browser. In addition, MTD allows users to search for genes or isoforms with customized transcriptional features, such as housekeeping genes, expression profiles of tissues/cell lines and isoforms undergoing an ‘exon skipped’ alternative splicing event. Moreover, it supports comparative transcriptomic analysis not only within a species but also across species, bearing the potential to reveal the dynamics of gene expression regulation. Together, MTD is a valuable resource for mammalian transcriptomic and evolutionary studies.

Rice expression database (RED)

RED (http://expression.ic4r.org), a committed project of Information Commons for Rice (IC4R) (17), is an integrated database hosting rice gene expression profiles derived entirely from high-quality RNA-Seq data. Unlike existing related databases, which are mostly based on microarray data and/or contain limited RNA-Seq data, RED contains a comprehensive collection of 284 high-quality RNA-Seq experiments obtained from NCBI SRA (15) and thus houses a large number of gene expression profiles that span a broad range of rice growth stages and cover a wide variety of biotic and abiotic treatments. Powered by AJAX (Asynchronous JavaScript and XML, a collection of web development technologies for creating highly interactive web applications) and Highcharts (a JavaScript-based library for setting up interactive charts in web pages), RED also features interactive search and display of expression profiles of genes of interest across different tissues and treatments. In addition, RED provides online tools for construction and visualization of gene co-expression networks, which requires only specifying the genes of interest. Ongoing efforts include integration of more high-quality RNA-Seq data and characterization of transcriptomic profiles by association with important agronomic traits in rice. Moreover, we plan to use RED as a framework to incorporate data from other plants, such as maize (Zea mays), rubber tree (Hevea brasiliensis), potato (Solanum tuberosum) and cotton (Gossypium raimondii), and to extend it into a generalized gene expression database for multiple economically important plants.
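To make the idea of a gene co-expression network concrete, the sketch below links two genes when the Pearson correlation of their expression profiles exceeds a cutoff. The choice of Pearson correlation, the 0.9 threshold, the function names and the toy expression values are all assumptions for illustration; the text does not state which measure or cutoff RED's online tool uses.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation signal
    return cov / (sx * sy)


def coexpression_edges(profiles, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Toy expression values across four conditions (made-up numbers).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 1.0, 3.0, 2.0],
}
edges = coexpression_edges(profiles)
```

In this toy input only geneA and geneB rise and fall together, so they form the single edge of the network.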

GENOME VARIATION MAP

The Genome Variation Map (GVM; http://bigd.big.ac.cn/gvm) is a data repository and retrieval system for genome variations, including single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels). Currently, GVM focuses on genome variations in human as well as in domesticated animals (e.g. dog) and cultivated plants (e.g. rice), which are of great importance for in-depth exploration of favorable traits (e.g. drought resistance in plants) and investigation of species domestication and evolution. The current version of GVM integrates variation data from several featured species, including human, dog, rice and sorghum.

Dog genome SNP database (DoGSD)

Dog genome SNP database (DoGSD) (http://bigd.big.ac.cn/dogsd) (18) is a Canidae-specific SNP database for domesticated dogs and gray wolves, comprising ∼19 million high-quality whole-genome SNPs from 77 individual samples (obtained from published studies (19–21) and NCBI dbSNP (22)). DoGSD integrates a comprehensive collection of SNP-related information, including SNP annotation, associated genes, synonymous or non-synonymous status, sample location and breed information, together with the population genetic statistic FST for online analysis. DoGSD provides friendly interfaces to browse detailed information for each SNP, to retrieve a list of SNPs for any given sample on a specific chromosome and to obtain SNP statistics for multiple samples of interest. As a sub-project of the Dog 10K Genomes Project (dog10K; http://dog10k.big.ac.cn), DoGSD is committed to incorporating more comprehensive variation data for the canine research community, serving as a critical resource for better understanding the evolutionary history of dogs, investigating genetic changes associated with domestication and relating genetic changes to phenotype.
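For readers unfamiliar with FST, the sketch below computes a textbook Wright-style FST for one biallelic SNP from the allele frequencies in two populations, as (HT − HS)/HT with equal population weights. This is only one of several FST estimators, and the text does not say which one DoGSD uses; the function name and example frequencies are hypothetical.

```python
def fst_biallelic(p1: float, p2: float) -> float:
    """Wright-style FST for one biallelic SNP in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)          # expected heterozygosity, pooled
    if h_t == 0:
        return 0.0                         # allele fixed everywhere: no differentiation
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2   # mean within-population value
    return (h_t - h_s) / h_t


# Made-up allele frequencies, e.g. in a dog group versus a wolf group.
fst_same = fst_biallelic(0.5, 0.5)     # identical frequencies
fst_fixed = fst_biallelic(1.0, 0.0)    # alternative alleles fixed
fst_mid = fst_biallelic(0.8, 0.2)      # intermediate differentiation
```

Values near 0 indicate little differentiation at the SNP, and values near 1 indicate that the populations carry largely different alleles.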

Rice variation database (RVD)

Because rice is not only a key model organism for plant studies but also the most widely consumed staple food for a large part of the global human population, thousands of rice accessions have been re-sequenced to date. Rice variation database (RVD) (http://variation.ic4r.org) (17) is built on a large collection of 5152 re-sequenced rice accessions, which come mainly from the published literature (23–27) and the 3K Rice Genomes Project (28), and accordingly includes ∼18 million SNPs called against the Os-Nipponbare-Reference-IRGSP-1.0 pseudomolecules with a unified, standard SNP-calling pipeline. RVD provides detailed annotations, including SNP consequence, gene function and results from association studies, and hyperlinks to external database resources. In addition, RVD is equipped with online analysis tools, namely RiceClustalW for multiple sequence alignment against specific rice accession(s), Population Genetic Analysis for computing population genetic parameters for any specific region, and Gene Haplotype Analysis for calculating gene haplotype diversity and structure.
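As a small worked example of what a haplotype diversity calculation can look like, the sketch below uses Nei's estimator, H = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sampled sequences. Whether RVD's Gene Haplotype Analysis tool uses this exact estimator is an assumption; the haplotype labels are made up.

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity for a list of haplotype labels or sequences."""
    n = len(haplotypes)
    if n < 2:
        return 0.0  # diversity is undefined for fewer than two samples; report 0
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Toy gene haplotypes observed in a handful of accessions.
h_two_types = haplotype_diversity(["hapA", "hapA", "hapB", "hapB"])
h_uniform = haplotype_diversity(["hapA", "hapA", "hapA"])
h_all_distinct = haplotype_diversity(["hapA", "hapB", "hapC", "hapD"])
```

The estimator is 0 when every accession carries the same haplotype and 1 when every accession carries a different one.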

Sorghum genome SNP database (SorGSD)

Sorghum is not only one of the most important crops but also a potential bio-energy feedstock. SorGSD (http://bigd.big.ac.cn/sorgsd) (29) is a sorghum genome SNP database that is of great significance for the genetic characterization of important quantitative traits in sorghum. SorGSD covers a diverse collection of 48 sorghum lines (30,31) that fall into four groups, namely improved varieties, landraces, wild and weedy sorghums, and a wild relative, Sorghum propinquum. In total, SorGSD includes ∼62.9 million SNPs identified from the whole-genome re-sequencing data of these individuals by mapping to the S. bicolor reference genome (v3). SorGSD provides a detailed summary of SNP information and relevant annotations for all individual accessions, such as allele information, gene information, SNP density and external links to other resources. In addition, it allows comparison of SNP data among two or more sorghum lines, offers easy-to-use visualization interfaces by integrating the GBrowse package and collects other sorghum-related resources and literature references, providing a valuable repository for sorghum genetic and molecular breeding studies.

Virtual Chinese genome database (VCGDB)

Virtual Chinese Genome Database (VCGDB) (http://bigd.big.ac.cn/vcg) (32) is a dynamic genome database of Chinese populations based on whole-genome sequencing data of 194 individuals that are publicly available in the 1000 Genomes Project (33). VCGDB presents two types of genetic variations: virtual and dynamic; virtual variations are those shared by all collected individuals, reflecting what is common to the Chinese populations, whereas dynamic variations are those that vary among individuals, revealing genetic differences specific to the individuals. As a result, VCGDB houses a large variety of dynamic genomic variations including 35 million single nucleotide variations (SNVs), 0.5 million indels and 29 million rare variations. In addition, a highly interactive user-friendly interface is provided in VCGDB to display the virtual and dynamic variations and a web search engine is also installed in VCGDB to support online real-time high-performance queries. Based on the ongoing project of the Chinese Population Precision Medicine Initiative we lead, VCGDB will incorporate more Chinese population genomes and provide a more precise Chinese reference genome.
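The distinction between virtual and dynamic variations defined above can be sketched with simple set operations: variants present in every individual form the virtual set, and all remaining observed variants form the dynamic set. The variant representation (chromosome, position, reference allele, alternative allele), the individual identifiers and the function name are hypothetical; this is not VCGDB's actual pipeline.

```python
def split_variations(calls):
    """Split variants into (virtual, dynamic) sets.

    calls maps an individual identifier to the set of variants called in
    that individual. Virtual variants are shared by all individuals;
    dynamic variants are observed in some individuals but not in all.
    """
    variant_sets = list(calls.values())
    if not variant_sets:
        return set(), set()
    virtual = set.intersection(*variant_sets)
    dynamic = set.union(*variant_sets) - virtual
    return virtual, dynamic


# Toy calls for three individuals; variants are (chrom, pos, ref, alt) tuples.
calls = {
    "ind1": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
    "ind2": {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A")},
    "ind3": {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T")},
}
virtual, dynamic = split_variations(calls)
```

Here the chr1:100 variant is shared by all three individuals and is therefore virtual, while the other two variants are dynamic.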

GENOME WAREHOUSE

The Genome Warehouse (GWH; http://bigd.big.ac.cn/gwh) is a centralized resource housing genome-scale data, with the purpose of archiving high-quality genome sequences and gene annotation information. Currently, GWH offers users open access to a featured collection of 26 genomes from economically important plants and animals, which are either publicly available in NCBI (34) or sequenced in-house; a representative example is the genome of Hevea brasiliensis (rubber tree) (11), which was sequenced by our institution and released recently. For each species, GWH contains detailed genome-related information including species metadata, genome assembly, sequence data and the corresponding annotations. Additionally, a ‘Tree View’ function depicts the evolutionary relationships of all species collected in GWH. For convenience, sequence data of individual genomes as well as their gene annotations are downloadable via File Transfer Protocol (FTP). Future directions of GWH include continuous integration of newly sequenced genomes and development of enhanced interfaces for data presentation and visualization.

METHYLATION BANK

The Methylation Bank (MethBank; http://bigd.big.ac.cn/methbank) (35) is a repository that integrates whole-genome single-base resolution methylomes and provides an interactive browser for visualization of high-resolution DNA methylation data. It incorporates high-quality whole-genome bisulfite sequencing methylome maps for five economically important crops (Oryza sativa, Glycine max, Manihot esculenta, Phaseolus vulgaris and Solanum lycopersicum) as well as two model animals (Danio rerio and Mus musculus) (36,37). Specifically, for quality control of all collected methylomes (those publicly available in NCBI SRA (15) up to May 2016), MethBank discards low-quality methylomes by considering genome coverage and bisulfite conversion rate and, as a result, retains 42 high-quality methylomes for O. sativa, 21 for G. max, 1 for M. esculenta, 1 for P. vulgaris, 7 for S. lycopersicum, 9 for D. rerio and 9 for M. musculus. MethBank features genome-wide profiling of methylation levels across chromosomes, identification of differentially methylated promoters (DMPs) between conditions, and visualization of methylation profiles for genes, regions and CpG islands across multiple samples. In addition, MethBank offers intuitive interfaces for data browsing and retrieval; it provides a genome-wide methylation view and retrieves gene methylation profiles and regional methylation levels across all collected samples. It is also equipped with interactive interfaces to facilitate searching for methylation levels of any given gene that is related to DMPs or highly methylated CpG islands. Beyond DNA methylation, evidence has accumulated that RNA methylation is closely related to various biological processes and human diseases. Therefore, our ongoing efforts aim not only to incorporate more types of DNA methylation data from diverse species but also to integrate a wide range of RNA methylation data.
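The quality-control step described above, which filters methylomes by genome coverage and bisulfite conversion rate, might be sketched as follows. The thresholds (0.8 and 0.99), the reading of genome coverage as a covered fraction, the field names and the sample values are all assumptions for illustration; the text does not give MethBank's actual cutoffs. The per-site methylation level is computed in the usual bisulfite-sequencing way, as methylated reads over total reads.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Methylome:
    sample: str
    genome_coverage: float    # assumed: fraction of the genome covered, 0..1
    conversion_rate: float    # bisulfite conversion rate, 0..1


def passes_qc(m: Methylome, min_coverage: float = 0.8,
              min_conversion: float = 0.99) -> bool:
    """Keep a methylome only if both metrics meet the (hypothetical) cutoffs."""
    return m.genome_coverage >= min_coverage and m.conversion_rate >= min_conversion


def methylation_level(methylated_reads: int, total_reads: int) -> Optional[float]:
    """Fraction of reads supporting methylation at one cytosine site."""
    if total_reads == 0:
        return None  # site not covered, so no level can be reported
    return methylated_reads / total_reads


# Made-up samples: only the first passes both filters.
methylomes = [
    Methylome("s1", genome_coverage=0.92, conversion_rate=0.995),
    Methylome("s2", genome_coverage=0.55, conversion_rate=0.996),
    Methylome("s3", genome_coverage=0.90, conversion_rate=0.970),
]
kept = [m.sample for m in methylomes if passes_qc(m)]
level = methylation_level(8, 10)
```

A low conversion rate inflates apparent methylation, which is why it is commonly filtered alongside coverage.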

SCIENCE WIKIS

Community curation, which harnesses community intelligence for biological knowledge curation, promises to be a solution to the deluge of biological data (39); expert curation, in contrast, depends heavily on dedicated experts and is vulnerable to funding cuts (38). A case in point is Wikipedia, an online encyclopedia that allows any user to create or edit any content and features community integration, huge coverage, up-to-date content and low maintenance cost. Inspired by its extraordinary success, Science Wikis (http://bigd.big.ac.cn/sciencewikis) are a series of biological databases wikified for community curation (40), among which LncRNAWiki and RiceWiki are two featured resources that exploit the full potential of worldwide scientific communities for big data collection, integration and management.

LncRNAWiki

LncRNAWiki (http://bigd.big.ac.cn/lncrnawiki) (41) is a wiki-based, open-content and publicly editable platform that relies on collective effort for community curation of human long non-coding RNAs (lncRNAs). In addition, it quantifies community curation efforts and provides explicit authorship based on quantitative contributions (41), which potentially attracts more people to share their knowledge and accordingly enables LncRNAWiki to serve as an up-to-date and comprehensive knowledgebase for human lncRNAs. As of September 2016, LncRNAWiki houses a total of 105 824 non-redundant lncRNAs that are integrated from GENCODE (42), NONCODE (43) and LNCipedia (44). Among them, 719 lncRNAs have been manually curated by the community based on the published literature, and 290 of these have been experimentally validated to be associated with cancer and other diseases. Moreover, considering the functional significance of lncRNA-encoded small proteins as reported in (45,46), we developed computational approaches for identification of small proteins in all collected human lncRNAs, identified 9387 lncRNAs potentially encoding small proteins and found that 2246 of them have higher confidence when protein instability, secondary structure and transmembrane helices are taken into account. All these identified lncRNAs, together with their associated small proteins, are incorporated into LncRNAWiki. Since July 2016, LncRNAWiki has been a member of RNAcentral (47), further facilitating data exchange and sharing between LncRNAWiki and other related databases.

RiceWiki

RiceWiki (http://wiki.ic4r.org) (48) is a wiki-based, publicly editable platform for community curation of rice genes. Compared with other relevant databases, RiceWiki features collective intelligence in knowledge integration and annotation and explicit authorship in terms of quantified community-curated contributions (49). Since its inception in 2014, RiceWiki has been continuously updated, expanded and enriched, with more than 400 genes now community-curated and over 3000 rice-related scientific articles covered. In addition, several MediaWiki extensions (pieces of code that provide customized functionality) have been developed and deployed in RiceWiki to help incorporate different types of rice-related data, such as RNA-Seq-based gene expression profiles from RED and rice-related literature from Rice Literature Miner (http://literature.ic4r.org). Furthermore, a lightweight BLAST module is implemented as a MediaWiki extension that enables community curators to conduct sequence alignment and facilitates gene annotation. Based on community curation, RiceWiki has the potential to cover a larger scope of rice-related knowledge and to function as a comprehensive and up-to-date encyclopedia that is constantly improved and broadly shared by the rice research community.

TRAINING

The need for personnel training across diverse biological disciplines is high, especially in the face of critical challenges posed by big data generated in life and health sciences. We engage in the Genomics and Bioinformatics Training (GBT; http://bigd.big.ac.cn/training/gbt) at various levels ranging from introductory to in-depth and provide GBT courses for researchers and biomedical professionals at postgraduate level and above. Since the first GBT in 2008, we have delivered more than 20 GBT events (2–3 per year) and trained more than 830 individuals, including graduate students and junior faculty members from the fields of biology, medicine, agriculture and forestry. In the big data era, where a range of data operations (including deposition, curation, processing, analysis and visualization) have become routine and increasingly daunting, we solicit feedback from all trainers as well as colleagues and peers to keep pace with practical needs and improve our training programs. In addition, we are open to suggestions and worldwide collaborations to make the training programs more useful and better targeted.

CONCLUDING REMARKS

The BIG Data Center provides free and open access to a variety of database resources in support of research activities in both academia and industry throughout the world. With the ultimate goal of advancing life and health sciences, it is dedicated to constructing and maintaining biological databases through value-added curation and to performing basic research that addresses critical challenges in big data deposition, integration and translation. The BIG Data Center, albeit relatively young, will grow to be indispensable for worldwide biological studies as more data are integrated and its associated services mature.

47 in total

1. Genome-wide association studies of 14 agronomic traits in rice landraces.

Authors: Xuehui Huang; Xinghua Wei; Tao Sang; Qiang Zhao; Qi Feng; Yan Zhao; Canyang Li; Chuanrang Zhu; Tingting Lu; Zhiwu Zhang; Meng Li; Danlin Fan; Yunli Guo; Ahong Wang; Lu Wang; Liuwei Deng; Wenjun Li; Yiqi Lu; Qijun Weng; Kunyan Liu; Tao Huang; Taoying Zhou; Yufeng Jing; Wei Li; Zhang Lin; Edward S Buckler; Qian Qian; Qi-Fa Zhang; Jiayang Li; Bin Han
Journal: Nat Genet Date: 2010-10-24 Impact factor: 38.330

2. Genome-wide association study of flowering time and grain yield traits in a worldwide collection of rice germplasm.

Authors: Xuehui Huang; Yan Zhao; Xinghua Wei; Canyang Li; Ahong Wang; Qiang Zhao; Wenjun Li; Yunli Guo; Liuwei Deng; Chuanrang Zhu; Danlin Fan; Yiqi Lu; Qijun Weng; Kunyan Liu; Taoying Zhou; Yufeng Jing; Lizhen Si; Guojun Dong; Tao Huang; Tingting Lu; Qi Feng; Qian Qian; Jiayang Li; Bin Han
Journal: Nat Genet Date: 2011-12-04 Impact factor: 38.330

3. Resequencing 50 accessions of cultivated and wild rice yields markers for identifying agronomically important genes.

Authors: Xun Xu; Xin Liu; Song Ge; Jeffrey D Jensen; Fengyi Hu; Xin Li; Yang Dong; Ryan N Gutenkunst; Lin Fang; Lei Huang; Jingxiang Li; Weiming He; Guojie Zhang; Xiaoming Zheng; Fumin Zhang; Yingrui Li; Chang Yu; Karsten Kristiansen; Xiuqing Zhang; Jian Wang; Mark Wright; Susan McCouch; Rasmus Nielsen; Jun Wang; Wen Wang
Journal: Nat Biotechnol Date: 2011-12-11 Impact factor: 54.908

4. The genomics of selection in dogs and the parallel evolution between dogs and humans.

Authors: Guo-dong Wang; Weiwei Zhai; He-chuan Yang; Ruo-xi Fan; Xue Cao; Li Zhong; Lu Wang; Fei Liu; Hong Wu; Lu-guang Cheng; Andrei D Poyarkov; Nikolai A Poyarkov; Shu-sheng Tang; Wen-ming Zhao; Yun Gao; Xue-mei Lv; David M Irwin; Peter Savolainen; Chung-I Wu; Ya-ping Zhang
Journal: Nat Commun Date: 2013 Impact factor: 14.919

Review 5. RNA-Seq: a revolutionary tool for transcriptomics.

Authors: Zhong Wang; Mark Gerstein; Michael Snyder
Journal: Nat Rev Genet Date: 2009-01 Impact factor: 53.242

6. Genome-wide patterns of genetic variation in sweet and grain sorghum (Sorghum bicolor).

Authors: Lei-Ying Zheng; Xiao-Sen Guo; Bing He; Lian-Jun Sun; Yao Peng; Shan-Shan Dong; Teng-Fei Liu; Shuye Jiang; Srinivasan Ramachandran; Chun-Ming Liu; Hai-Chun Jing
Journal: Genome Biol Date: 2011-11-21 Impact factor: 13.583

7. An update on LNCipedia: a database for annotated human lncRNA sequences.

Authors: Pieter-Jan Volders; Kenneth Verheggen; Gerben Menschaert; Klaas Vandepoele; Lennart Martens; Jo Vandesompele; Pieter Mestdagh
Journal: Nucleic Acids Res Date: 2014-11-05 Impact factor: 16.971

8. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

9. Whole-genome sequencing of six dog breeds from continuous altitudes reveals adaptation to high-altitude hypoxia.

Authors: Xiao Gou; Zhen Wang; Ning Li; Feng Qiu; Ze Xu; Dawei Yan; Shuli Yang; Jia Jia; Xiaoyan Kong; Zehui Wei; Shaoxiong Lu; Linsheng Lian; Changxin Wu; Xueyan Wang; Guozhi Li; Teng Ma; Qiang Jiang; Xue Zhao; Jiaqiang Yang; Baohong Liu; Dongkai Wei; Hong Li; Jianfa Yang; Yulin Yan; Guiying Zhao; Xinxing Dong; Mingli Li; Weidong Deng; Jing Leng; Chaochun Wei; Chuan Wang; Huaming Mao; Hao Zhang; Guohui Ding; Yixue Li
Journal: Genome Res Date: 2014-04-10 Impact factor: 9.043

10. A map of rice genome variation reveals the origin of cultivated rice.

Authors: Xuehui Huang; Nori Kurata; Xinghua Wei; Zi-Xuan Wang; Ahong Wang; Qiang Zhao; Yan Zhao; Kunyan Liu; Hengyun Lu; Wenjun Li; Yunli Guo; Yiqi Lu; Congcong Zhou; Danlin Fan; Qijun Weng; Chuanrang Zhu; Tao Huang; Lei Zhang; Yongchun Wang; Lei Feng; Hiroyasu Furuumi; Takahiko Kubo; Toshie Miyabayashi; Xiaoping Yuan; Qun Xu; Guojun Dong; Qilin Zhan; Canyang Li; Asao Fujiyama; Atsushi Toyoda; Tingting Lu; Qi Feng; Qian Qian; Jiayang Li; Bin Han
Journal: Nature Date: 2012-10-03 Impact factor: 49.962
