ChickVD: a sequence variation database for the chicken genome.

Jing Wang, Ximiao He, Jue Ruan, Mingtao Dai, Jie Chen, Yong Zhang, Yafeng Hu, Chen Ye, Shengting Li, Lijuan Cong, Lin Fang, Bin Liu, Songgang Li, Jian Wang, David W Burt, Gane Ka-Shu Wong, Jun Yu, Huanming Yang, Jun Wang.

Abstract

Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DNA from domestic breeds. Using the Red Jungle Fowl genome sequence as a reference, we identified 3.1 million non-redundant DNA sequence variants. To facilitate the application of our data to avian genetics and to provide a foundation for functional and evolutionary studies, we created the 'Chicken Variation Database' (ChickVD). A graphical MapView shows variants mapped onto the chicken genome in the context of gene annotations and other features, including genetic markers, trait loci, cDNAs, chicken orthologs of human disease genes and raw sequence traces. ChickVD also stores information on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn.

Year:  2005        PMID: 15608233      PMCID: PMC540046          DOI: 10.1093/nar/gki092

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Chicken (Gallus gallus) is a major food source and an important model organism for biomedical research, including the study of embryology and development (1,2), aging (3) and quantitative trait loci (QTL) analysis (4). A comprehensive study of sequence polymorphisms in the chicken can substantially advance our understanding of the phenotypic differences between individuals and strains and of the evolution of populations. Towards this end, the Beijing Genomics Institute (BGI) led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to identify and characterize extensive DNA sequence variation throughout the chicken genome. Sequence variation within the genomes of three breeds of domesticated chicken [a male broiler (Cornish) from the Roslin Institute, a female layer (White Leghorn) from the Swedish University of Agricultural Sciences and a female Silkie from the Chinese Agricultural University in Beijing] was identified by direct comparison with the reference genome sequence of the Red Jungle Fowl (RJF), assembled by the Washington University Genome Sequencing Center (WUGSC) (Chicken Genome Consortium, 2004). From these comparisons, 3.1 million unique, high-quality sequence variants were identified, primarily single nucleotide polymorphisms (SNPs). To facilitate the use of this information, we created the ‘Chicken Variation Database’ (ChickVD), an integrated information system for the storage, retrieval, visualization and analysis of chicken DNA sequence variation. To enhance the discovery of relationships between sequence variation and genes, we mapped each variant onto the RJF reference genome sequence in the context of gene annotations and other relevant features, such as genetic markers and QTLs. ChickVD therefore provides both a powerful information resource and an analysis workbench for applications in biological research, medicine and agriculture.

DATA SOURCE

The primary purpose of the chicken SNP project was to discover and map extensive sequence variation in the chicken genome, to increase the density of markers on the genetic map, and to facilitate studies on the genetic basis of phenotypic traits. We compared the sequence reads from the three chicken breeds to the 6.6X RJF reference genome sequence (Chicken Genome Consortium, 2004). For each of the breeds, we sampled ∼25% of the genome (a million sequence reads from each breed) using automated capillary sequencers (Amersham MegaBACE 1000). To minimize sequencing errors, we used the Phred quality score (Q-value) (5,6) to set minimum thresholds. For base substitutions, we used conservative thresholds of >Q25 at the variant site and >Q20 for the two flanking 5 bp regions. Even more stringent thresholds of Q30 and Q25 were used to score insertion–deletions (Indels). Using a genome-wide BlastN search, we were able to determine the sequence position on the RJF genome assembly and eliminate confusion between paralogs. Detailed alignments were performed using CrossMatch (http://www.phrap.org) for increased accuracy. A subset of polymorphisms was confirmed by PCR-based resequencing from the lines in which they were initially detected. To display each sequence variant in the context of its physical relationship to the nearest gene or other relevant features, we employed a number of methods to identify these target sequences. These included annotations within public databases, homology searches, prediction programs and information on experimentally derived genes. A non-redundant dataset of genes/cDNAs was integrated within ChickVD, including Ensembl gene annotations (7), GenBank genes with ‘complete CDS’ (8), full-length cDNAs (9) (http://pheasant.gsf.de/DEPARTMENT/DT40/dt40Transcript.html) and chicken orthologs of human disease genes (C. Webber and C. P. Ponting, unpublished data). 
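The quality thresholds described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the pipeline used to build ChickVD: the function names are hypothetical, the thresholds are treated as strict lower bounds for both substitutions and Indels, and each flanking base is assumed to need to pass individually (the text does not say whether the flank criterion applies per base or to an average).

```python
from typing import Sequence

FLANK = 5  # number of flanking bases checked on each side of the variant site


def phred_error_probability(q: float) -> float:
    """Estimated base-call error probability for a Phred score Q: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def passes_quality(quals: Sequence[int], pos: int,
                   site_min: int, flank_min: int, flank: int = FLANK) -> bool:
    """True if the site exceeds site_min and every flanking base exceeds flank_min.

    Assumption: thresholds are strict and applied to each flanking base.
    """
    if pos - flank < 0 or pos + flank >= len(quals):
        return False  # too close to the read end to evaluate both flanks
    if quals[pos] <= site_min:
        return False
    left = quals[pos - flank:pos]
    right = quals[pos + 1:pos + 1 + flank]
    return all(q > flank_min for q in left) and all(q > flank_min for q in right)


def passes_substitution(quals: Sequence[int], pos: int) -> bool:
    """Substitution thresholds from the text: >Q25 at the site, >Q20 in the flanks."""
    return passes_quality(quals, pos, site_min=25, flank_min=20)


def passes_indel(quals: Sequence[int], pos: int) -> bool:
    """Indel thresholds from the text: Q30 at the site, Q25 in the flanks."""
    return passes_quality(quals, pos, site_min=30, flank_min=25)
```

For scale, a Phred score of 20 corresponds to an estimated error probability of 1 in 100 and a score of 30 to 1 in 1000, which is why the stricter Indel thresholds reject many sites that would pass as substitutions.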
We used BLAT (10) to map these sequences onto the RJF reference genome and SIM4 (11) to determine the detailed exon–intron boundaries, and then classified coding SNPs as synonymous or non-synonymous substitutions. To associate the location of sequence variants with genetically mapped phenotypic traits, we included information on mapped QTLs, using data provided by a number of collaborating institutions (Roslin Institute, Department of Medical Biochemistry and Microbiology of Uppsala University, USDA-ARS Avian Disease and Oncology Laboratory, Animal Breeding and Genetics Group of Wageningen University) and public resources, such as ChickAce (https://acedb.asg.wur.nl/).
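The synonymous/non-synonymous classification mentioned above can be illustrated with a small Python sketch. It assumes the standard genetic code and an in-frame coding sequence that has already been spliced from its exons; the function name and argument layout are hypothetical and do not reflect ChickVD's own implementation.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def classify_coding_snp(cds: str, offset: int, alt: str) -> str:
    """Classify a substitution at 0-based `offset` within an in-frame CDS.

    Returns "synonymous" if the encoded amino acid is unchanged,
    otherwise "non-synonymous" (including changes to or from a stop codon).
    """
    frame = offset % 3
    codon_start = offset - frame
    ref_codon = cds[codon_start:codon_start + 3].upper()
    if len(ref_codon) != 3:
        raise ValueError("offset does not fall within a complete codon")
    alt_codon = ref_codon[:frame] + alt.upper() + ref_codon[frame + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"
```

For example, in the toy sequence ATGGCTTAA, changing the third base of the GCT codon to C leaves alanine unchanged, while changing its first base to A replaces alanine with threonine.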

DATA CONTENT

ChickVD contains 3.1 million DNA sequence variants; 2.8 million are categorized as SNPs and 0.3 million are Indels. Follow-up experiments indicate that over 90% of these SNPs are true SNPs, that over 70% are common SNPs that segregate in many chicken breeds, and that the mean nucleotide diversity is ∼5 SNP/kb (International Chicken Polymorphism Map Consortium, 2004). For convenience of data display, all types of DNA sequence variation (substitutions, insertions or deletions) are referred to as ‘SNPs’. SNP information is detailed in a ‘SNP Report’, which includes the type, allelic differences, flanking sequences and PCR-primer designs, location, associated genes, quality scores of sequence reads, functional site in the chicken genome (e.g. coding region, intron, untranslated regions), and details of any predicted or known functional outcomes (e.g. codon and deduced amino acid changes). As additional information for users, 1.5 million contig-covered (CtgCoV) regions from the broiler, layer and Silkie are aligned to the RJF reference genome, and 2.5 million raw sequence traces are available within ChickVD. For queries about gene-associated polymorphisms, we describe each gene/cDNA in a ‘Gene Report’, which includes gene structure, functional classification, Gene Ontology (12) and InterPro (13) annotations, gene-associated SNPs, and nucleotide/protein sequences. For chicken orthologs of human disease genes, hypertext links to the corresponding Ensembl entries for the human disease genes and to OMIM entries describing the diseases (14) are also provided. A collection of 606 QTLs for a wide range of traits is integrated and cross-referenced to markers, genetic maps, genes and SNPs mapped onto the same chromosomal region, and to PubMed (15) for literature sources. ChickVD also hosts a collection of 884 references, which focus on chicken sequence variation, QTL studies and basic chicken biology.
A summary of ChickVD content is shown in Table 1 and all data are freely available from our FTP site (http://chicken.genomics.org.cn/chicken/jsp/download.jsp).
Table 1. Data content of ChickVD: August 10, 2004

Data type                                     Data statistics
Sequence variations                           3 119 698
  SNPs                                        2 833 578
  Indels                                      286 120
Confirmed mRNA transcripts                    3868
  GenBank with ‘complete cds’                 1087
  Riken1 full-length cDNAs                    1707
  BBSRC full-length cDNAs                     1074
Ensembl gene annotations                      17 909
Chicken orthologs of human disease genes      995
Percentage length of CtgCoV regions on RJF genome assembly
  Broiler/RJF                                 25.6%
  Layer/RJF                                   25.0%
  Silkie/RJF                                  27.3%
Raw sequence traces                           2 544 985
  SNP-associated traces                       1 205 058
QTLs                                          606
References                                    884

DATABASE USAGE AND ACCESS

All data housed in ChickVD are uniquely mapped onto the RJF reference genome and graphically represented in MapView, an efficient visualization tool initially developed for our Rice Information System (BGI-RIS) (16) that allows users to browse SNPs in genomic and functional context. MapView is composed of four types of subviewer in a hierarchical architecture, namely ChroView, GeneView, SNPView and TraceView (Figure 1). ChroView is based on the RJF reference sequence, with QTLs marked along the chromosomes, and displays the density and statistics of genes and SNPs. ChroView also allows users to center the map on a specific chromosome location and to expand it for more detailed views of genes/cDNAs and SNPs via GeneView and SNPView, respectively. A factual report for each element contained in the visualization system is displayed on demand. TraceView lets users view the original trace files around a detected SNP. Q-values for each nucleotide and a position-location function facilitate further investigation of the SNPs.
Figure 1. Screenshots of the MapView system. SNP and QTL reports associated with a SNP are shown, with detailed views on information for SNPs in a selected chromosomal region.

The ChickVD online search tool is the entry point for querying SNPs and other data types in the database. Users can simply query SNPs by identifiers or genomic locations. The results can be further restricted to a specific chicken breed, a certain SNP type, a threshold of quality score or to a specific SNP functional class, such as a coding non-synonymous SNP. BLAST-based SNP search compares a user-supplied sequence against the flanking sequences that immediately surround the sequence variant. Advanced search interfaces for other data types (genes/cDNAs, QTLs, sequence traces and references) are also provided. For sequence searches, a key development is the implementation of SeqGetter, a search tool that allows extraction of all variations residing within genomic domains, as defined by flanking elements that can be other variants, chromosomal positions, genes, or even specific coding sequences, introns or regulatory domains.
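The kind of restricted query described above (a genomic interval narrowed by breed, SNP type, quality score or functional class) can be sketched as a simple filter. This is a hypothetical in-memory illustration, not the ChickVD search engine or SeqGetter itself; the `Variant` fields, the category labels and the function name are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Variant:
    variant_id: str   # hypothetical identifier
    chrom: str        # chromosome name on the reference assembly
    pos: int          # 1-based position on the reference assembly
    breed: str        # e.g. "broiler", "layer" or "silkie"
    kind: str         # e.g. "substitution", "insertion" or "deletion"
    quality: int      # Phred score at the variant site
    func_class: str   # e.g. "coding-nonsynonymous", "intron"


def variants_in_region(variants: Iterable[Variant], chrom: str, start: int, end: int,
                       breed: Optional[str] = None, kind: Optional[str] = None,
                       min_quality: int = 0,
                       func_class: Optional[str] = None) -> List[Variant]:
    """Return variants in [start, end] on `chrom`, sorted by position.

    Optional arguments narrow the result; None means "no restriction".
    """
    hits = [
        v for v in variants
        if v.chrom == chrom
        and start <= v.pos <= end
        and v.quality >= min_quality
        and (breed is None or v.breed == breed)
        and (kind is None or v.kind == kind)
        and (func_class is None or v.func_class == func_class)
    ]
    return sorted(hits, key=lambda v: v.pos)
```

In this sketch the interval boundaries are plain coordinates; in a SeqGetter-style query they would first be resolved from the flanking elements (other variants, genes, coding sequences and so on) to positions on the reference genome.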

SYSTEM DESIGN AND IMPLEMENTATION

ChickVD consists of three hardware components: a World Wide Web server, a database server and a sequence analysis/homology search server. The system is based on an Oracle9i relational database, and the front end consists of a set of JSP scripts running on a Tomcat web server. The search engine and MapView were developed using Java Servlets and JavaBeans. Java Applets are used for the TraceView functions. To handle the large amount of complex data, we developed standard sets of SNP-centric and QTL-centric XML formats that lay the foundation for our research work and allow ChickVD to accommodate the fast-accumulating data and to integrate new data types as they arise.
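The SNP-centric XML formats are not specified in this paper, so the following Python sketch only illustrates the general idea of serializing one variant record to XML. Every element and attribute name here is invented for the example and should not be read as the actual ChickVD schema.

```python
import xml.etree.ElementTree as ET


def snp_to_xml(snp_id: str, chrom: str, pos: int,
               ref: str, alt: str, breed: str) -> str:
    """Serialize one variant as an XML string (hypothetical element names)."""
    root = ET.Element("snp", id=snp_id)
    # Location on the reference assembly
    ET.SubElement(root, "location", chromosome=chrom, position=str(pos))
    # Reference allele and the allele observed in the sampled breed
    alleles = ET.SubElement(root, "alleles")
    ET.SubElement(alleles, "reference").text = ref
    ET.SubElement(alleles, "variant", breed=breed).text = alt
    return ET.tostring(root, encoding="unicode")
```

A record-per-variant layout of this kind makes it straightforward to add new child elements (for example, allele frequencies) without changing existing records, which is one plausible reason to prefer an extensible format for fast-accumulating data.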

FUTURE DEVELOPMENTS

Continued efforts will be made to update SNP data and improve data quality. ChickVD will be updated with each new release of the chicken genome, including annotations and additional QTLs, markers and references, as soon as they become available. We will also introduce a versioning system, with cross-references between versions, so that users will be able to retrieve data from different versions and trace changes to a given entity between them. New versions of ChickVD will integrate allele frequencies for each variation site from ongoing projects with collaborating institutions. To enhance data utility, we will extend data structures and establish systems to support haplotype information, which is expected to become the principal functional unit for chicken genetics. We will continue to enhance the user interfaces, improve the functionality of searches and data representation, and evolve the database infrastructure and data model to ensure data quality and consistency. A side-by-side comparative map viewer will make comparative analysis of the SNP map and chicken genes easier. We welcome comments and suggestions from the chicken research community to make ChickVD a user-friendly knowledge resource.
  16 in total

1.  The Gene Ontology (GO) database and informatics resource.

Authors:  M A Harris; J Clark; A Ireland; J Lomax; M Ashburner; R Foulger; K Eilbeck; S Lewis; B Marshall; C Mungall; J Richter; G M Rubin; J A Blake; C Bult; M Dolan; H Drabkin; J T Eppig; D P Hill; L Ni; M Ringwald; R Balakrishnan; J M Cherry; K R Christie; M C Costanzo; S S Dwight; S Engel; D G Fisk; J E Hirschman; E L Hong; R S Nash; A Sethuraman; C L Theesfeld; D Botstein; K Dolinski; B Feierbach; T Berardini; S Mundodi; S Y Rhee; R Apweiler; D Barrell; E Camon; E Dimmer; V Lee; R Chisholm; P Gaudet; W Kibbe; R Kishore; E M Schwarz; P Sternberg; M Gwinn; L Hannick; J Wortman; M Berriman; V Wood; N de la Cruz; P Tonellato; P Jaiswal; T Seigfried; R White
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

2.  GenBank: update.

Authors:  Dennis A Benson; Ilene Karsch-Mizrachi; David J Lipman; James Ostell; David L Wheeler
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

3.  The InterPro Database, 2003 brings increased coverage and new features.

Authors:  Nicola J Mulder; Rolf Apweiler; Teresa K Attwood; Amos Bairoch; Daniel Barrell; Alex Bateman; David Binns; Margaret Biswas; Paul Bradley; Peer Bork; Phillip Bucher; Richard R Copley; Emmanuel Courcelle; Ujjwal Das; Richard Durbin; Laurent Falquet; Wolfgang Fleischmann; Sam Griffiths-Jones; Daniel Haft; Nicola Harte; Nicolas Hulo; Daniel Kahn; Alexander Kanapin; Maria Krestyaninova; Rodrigo Lopez; Ivica Letunic; David Lonsdale; Ville Silventoinen; Sandra E Orchard; Marco Pagni; David Peyruc; Chris P Ponting; Jeremy D Selengut; Florence Servant; Christian J A Sigrist; Robert Vaughan; Evgueni M Zdobnov
Journal:  Nucleic Acids Res       Date:  2003-01-01       Impact factor: 16.971

4.  Genetics. Chicken genome--science nuggets to come soon.

Authors:  Dave Burt; Olivier Pourquie
Journal:  Science       Date:  2003-06-13       Impact factor: 47.728

5.  Online Mendelian Inheritance in Man (OMIM), a knowledgebase of human genes and genetic disorders.

Authors:  Ada Hamosh; Alan F Scott; Joanna Amberger; Carol Bocchini; David Valle; Victor A McKusick
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

6.  A computer program for aligning a cDNA sequence with a genomic DNA sequence.

Authors:  L Florea; G Hartzell; Z Zhang; G M Rubin; W Miller
Journal:  Genome Res       Date:  1998-09       Impact factor: 9.043

7.  BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics.

Authors:  Wenming Zhao; Jing Wang; Ximiao He; Xiaobing Huang; Yongzhi Jiao; Mingtao Dai; Shulin Wei; Jian Fu; Ye Chen; Xiaoyu Ren; Yong Zhang; Peixiang Ni; Jianguo Zhang; Songgang Li; Jian Wang; Gane Ka-Shu Wong; Hongyu Zhao; Jun Yu; Huanming Yang; Jun Wang
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

8.  A chicken model to study the embryology of cloacal exstrophy.

Authors:  Jörg Männer; Dietrich Kluth
Journal:  J Pediatr Surg       Date:  2003-05       Impact factor: 2.545

9.  A comprehensive collection of chicken cDNAs.

Authors:  Paul E Boardman; Juan Sanz-Ezquerro; Ian M Overton; David W Burt; Elizabeth Bosch; Willy T Fong; Cheryll Tickle; William R A Brown; Stuart A Wilson; Simon J Hubbard
Journal:  Curr Biol       Date:  2002-11-19       Impact factor: 10.834

10.  Database resources of the National Center for Biotechnology Information: update.

Authors:  David L Wheeler; Deanna M Church; Ron Edgar; Scott Federhen; Wolfgang Helmberg; Thomas L Madden; Joan U Pontius; Gregory D Schuler; Lynn M Schriml; Edwin Sequeira; Tugba O Suzek; Tatiana A Tatusova; Lukas Wagner
Journal:  Nucleic Acids Res       Date:  2004-01-01       Impact factor: 16.971

