Jing Yu, Sook Jung, Chun-Huai Cheng, Taein Lee, Ping Zheng, Katheryn Buble, James Crabb, Jodi Humann, Heidi Hough, Don Jones, J Todd Campbell, Josh Udall, Dorrie Main.
Abstract
Over the last eight years, the volume of whole genome, gene expression, SNP genotyping, and phenotype data generated by the cotton research community has increased exponentially. Efficient use and reuse of these large, complex datasets for knowledge discovery, translation, and application in crop improvement requires that they be curated, integrated with other types of data, and made available for access and analysis through efficient online search tools. Initiated in 2012, CottonGen is an online community database providing access to integrated, peer-reviewed cotton genomic, genetic, and breeding data, along with analysis tools. Used by cotton researchers worldwide and managed by experts with crop-specific knowledge, it continues to be the logical choice for integrating new data and providing the interfaces needed for information retrieval. The CottonGen repository contains colleague, gene, genome, genotype, germplasm, map, marker, metabolite, phenotype, publication, QTL, species, transcriptome, and trait data curated by the CottonGen team. The number of data entries housed in CottonGen has increased dramatically; for example, since 2014 there has been an 18-fold increase in genes/mRNAs, a 23-fold increase in whole genomes, and a 372-fold increase in genotype data. New tools include a genetic map viewer, a genome browser, a synteny viewer, a metabolite pathways browser, sequence retrieval, BLAST, and a breeding information management system (BIMS), as well as various search pages for new data types. CottonGen serves as the home of the International Cotton Genome Initiative, managing its elections and serving as a communication and coordination hub for the community. With its extensive curation and integration of data and online tools, CottonGen will continue to facilitate the use of its critical resources to empower research for cotton crop improvement.
Keywords: analysis; big data; bioinformatics; crop improvement; genotype; whole genome sequence
Year: 2021 PMID: 34961276 PMCID: PMC8705096 DOI: 10.3390/plants10122805
Source DB: PubMed Journal: Plants (Basel) ISSN: 2223-7747
Comparison of the number of CottonGen entries by data type between 15 August 2013 and 31 August 2021.
| Type | By 14 August 2013 | By 31 August 2021 | Data Details by 31 August 2021 |
|---|---|---|---|
| Genome |  | 46 | Whole genome assemblies and annotations: 30 diploid and 16 tetraploid genome sequences |
| Gene and mRNA | 119,971 genes | 1,874,940 genes and 2,528,191 mRNAs | Genes and mRNAs from whole genome assemblies and parsed from NCBI nucleotide sequences |
| Transcript | 149,916 | 214,180 RefTrans | RefTrans for |
| Marker | 26,089 | 587,004 | Including 459,825 SNPs (TAMU63K and NAU80K array SNPs and other SNPs) and 109,848 SSRs |
| Map | 49 | 115 | 130,533 loci from 110 genetic maps, 2 consensus maps, 2 bin maps, and 1 in silico map |
| QTL | 988 | 6772 | Including 4178 QTLs for quality traits, 1547 for agronomic traits, 273 for biotic stress traits, and 189 for biochemical traits |
| Species | 50 | 85 | Including the 4 cultivated species, 53 wild species, and 28 crosses or lab-made diploid, tetraploid, and hexaploid hybrids |
| Germplasm | 14,959 | 19,827 | Including collections and sub-collections from US-NCGC, US-GRIN, China, and Uzbekistan |
| Phenotype data | 118,302 | 539,975 | Phenotypic scores from the US regional breeders’ tests; trait evaluations from the US, Uzbekistan, and China germplasm collections; and data collected from various QTL studies |
| Genotype data | 68,640 | 25,532,891 | SNP genotype data from 25,213,321 measurements using 71,424 markers, SSR genotype data from 319,570 measurements using 2825 markers |
| Image | 0 | 45,211 | Including 44,998 NCGC digital characterizations |
| Publication | 10,731 | 16,066 | Including journal articles, conference proceedings, patents, book chapters, and theses/dissertations |
| Library | 181 | 181 | Including 135 cDNA, 41 genomic DNA, 2 SNP chip, and 2 unassigned libraries |
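The growth figures quoted in the abstract are simple ratios of the two count columns. As a minimal Python sketch, assuming the 2013 and 2021 counts are directly comparable, the snippet below recomputes the fold increase for every row that has a number in both columns and checks that the breakdowns in the details column add up to the 2021 totals. The genotype ratio comes out at about 372, consistent with the abstract. The Genome row (no 2013 count) and the Gene and mRNA row (2021 count split between genes and mRNAs) are left out; the names `COUNTS` and `fold_increase` are illustrative only.

```python
# Entry counts copied from the comparison table: (2013 value, 2021 value).
# Rows without a numeric value in both columns are left out.
COUNTS = {
    "Transcript": (149_916, 214_180),
    "Marker": (26_089, 587_004),
    "Map": (49, 115),
    "QTL": (988, 6_772),
    "Species": (50, 85),
    "Germplasm": (14_959, 19_827),
    "Phenotype data": (118_302, 539_975),
    "Genotype data": (68_640, 25_532_891),
    "Publication": (10_731, 16_066),
}


def fold_increase(before: int, after: int) -> float:
    """Ratio of the 2021 count to the 2013 count."""
    return after / before


folds = {name: fold_increase(b, a) for name, (b, a) in COUNTS.items()}

# Breakdowns in the details column that should add up to the 2021 totals.
genotype_parts = 25_213_321 + 319_570  # SNP + SSR genotype measurements
map_parts = 110 + 2 + 2 + 1            # genetic, consensus, bin, in silico maps
species_parts = 4 + 53 + 28            # cultivated, wild, hybrid species

# Print rows from largest to smallest growth.
for name, fold in sorted(folds.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<15} {fold:8.1f}x")
```

Genotype data tops the list at roughly 372x, while publications grew by about 1.5x over the same period.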
List of 30 diploid genome sequences available in CottonGen (by 31 August 2021).
| Genome Sequence Name | Germplasm Type | Pub Year (Ref.) |
|---|---|---|
|  | wild | 2012 |
|  | wild | 2012 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2021 |
|  | cultivar | 2014 |
|  | cultivar | 2018 |
|  | cultivar | 2020 |
|  | cultivar | 2021 |
|  | cultivar | 2020 |
|  | wild | 2021 |
|  | wild | 2019 |
|  | wild | 2021 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2021 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2019 |
|  | wild | 2021 |
|  | wild | 2020 |
|  | wild | 2019 |
|  | wild | 2021 |
|  | wild | 2019 |
List of 16 tetraploid genome sequences available in CottonGen (by 31 August 2021).
| Genome Sequence Name | Germplasm Type | Pub Year (ref.) |
|---|---|---|
|  | cultivar | 2015 |
|  | cultivar | 2015 |
|  | cultivar | 2017 |
|  | cultivar | 2018 |
|  | cultivar | 2019 |
|  | cultivar | 2019 |
|  | cultivar | 2019 |
|  | cultivar | 2020 |
|  | cultivar | 2020 |
|  | cultivar | 2015 |
|  | cultivar | 2018 |
|  | cultivar | 2019 |
|  | cultivar | 2020 |
|  | wild | 2020 |
|  | wild | 2020 |
|  | wild | 2020 |
Figure 1. Synteny Viewer in CottonGen. (A) The Synteny Viewer home page allows researchers to choose a chromosome of a genome and multiple genomes for comparison; researchers can also choose a synteny block ID. (B) A circular diagram and a table show the synteny blocks between a chromosome of a reference genome and all chromosomes of another genome being compared. (C) A bar diagram and a table show all the genes in a syntenic block. The table displays the E-value between matching genes, and the gene names link to the gene detail page. (D) A gene detail page with a resource sidebar and a hyperlink to JBrowse. (E) JBrowse view around the mRNA of interest, with tracks such as gene, mRNA, SNP, and SSR markers.
Figure 2. SNP Genotype search page in CottonGen. (A, left) Researchers can search SNP genotype data by dataset name, species, germplasm name, SNP name, genome location, and/or gene name, or upload a file with a list of germplasm names. (B, right) Search result table showing SNP name, genome location, allele, and the genotype data of all the chosen germplasm, ordered by SNP location in the genome. The red square highlights the options to download either the genotypes for all the markers displayed on the result page or only the genotypes of markers that are polymorphic in the chosen germplasm set.
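The "polymorphic only" download option amounts to a per-marker filter over the chosen germplasm. The Python sketch below is a hypothetical illustration, not CottonGen's code: it assumes genotype calls are held in a nested dict keyed by SNP name and then germplasm name (with `None` for a missing call), and it treats a SNP as polymorphic when the chosen germplasm show at least two distinct non-missing calls. The function name and all SNP and germplasm names are made up.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical layout, not CottonGen's schema:
# genotypes[snp_name][germplasm_name] -> call such as "AA" or "AG", or None if missing.
Genotypes = Dict[str, Dict[str, Optional[str]]]


def polymorphic_snps(genotypes: Genotypes, germplasm: Iterable[str]) -> List[str]:
    """Return SNPs with at least two distinct non-missing calls among the chosen
    germplasm, in the dict's insertion order (assumed to be genome order)."""
    chosen = list(germplasm)
    result = []
    for snp, calls in genotypes.items():
        # Germplasm without a call (absent or None) is ignored.
        observed = {calls.get(g) for g in chosen} - {None}
        if len(observed) > 1:
            result.append(snp)
    return result


genotypes = {
    "snp_1": {"G1": "AA", "G2": "GG", "G3": "AA"},
    "snp_2": {"G1": "CC", "G2": "CC", "G3": None},
    "snp_3": {"G1": "TT", "G2": "TT", "G3": "CT"},
}
print(polymorphic_snps(genotypes, ["G1", "G2"]))        # ['snp_1']
print(polymorphic_snps(genotypes, ["G1", "G2", "G3"]))  # ['snp_1', 'snp_3']
```

Under this definition a heterozygous call such as "CT" counts as different from "TT"; a stricter variant might compare individual alleles instead of whole calls.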
Figure 3. CottonGen BIMS. (A) The ‘Template List’ subsection of the ‘Data Import’ section provides downloadable templates in which researchers can enter various breeding data. (B) The ‘Search’ section allows researchers to search for and save lists of accessions using any combination of properties and trait cutoff values: accession name, trial, location, cross, data year, and trait values. The middle panel shows statistics on the filtered dataset for the chosen trait, and the right panel shows the number of accessions filtered so far. (C) A page with the search result table. Researchers can add columns to the table using ‘Column options’ and save or download the result table. (D) The ‘Data Analysis’ section allows researchers to choose multiple datasets, using categories or saved accession lists, and compare trait statistics between the datasets.
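The search-and-compare workflow in panels B and D comes down to filtering records by property values and trait cutoffs, then summarizing a trait for each resulting dataset. The Python sketch below is hypothetical: the `Accession` record, its fields, the function names, the trait names, and the sample values are invented for illustration and do not reflect BIMS's actual data model or the exact statistics it reports.

```python
import statistics
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Accession:
    # Hypothetical record; fields mirror search properties named in the caption.
    name: str
    trial: str
    location: str
    year: int
    traits: Dict[str, float] = field(default_factory=dict)


def filter_accessions(accessions: List[Accession], trait: str,
                      min_value: Optional[float] = None,
                      max_value: Optional[float] = None,
                      **props) -> List[Accession]:
    """Keep accessions that match every property in props (e.g. location="Loc1")
    and whose value for `trait` lies within the optional inclusive cutoffs."""
    kept = []
    for acc in accessions:
        if any(getattr(acc, key) != value for key, value in props.items()):
            continue
        value = acc.traits.get(trait)
        if value is None:
            continue  # accession was not scored for this trait
        if min_value is not None and value < min_value:
            continue
        if max_value is not None and value > max_value:
            continue
        kept.append(acc)
    return kept


def trait_summary(accessions: List[Accession], trait: str) -> Dict[str, float]:
    """Count, mean, sample standard deviation, and range of one trait."""
    values = [acc.traits[trait] for acc in accessions if trait in acc.traits]
    if not values:
        return {"n": 0}
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }


data = [
    Accession("L1", "T1", "Loc1", 2020, {"fiber_length": 29.5}),
    Accession("L2", "T1", "Loc1", 2020, {"fiber_length": 31.0}),
    Accession("L3", "T1", "Loc2", 2020, {"fiber_length": 32.5}),
    Accession("L4", "T2", "Loc1", 2021, {"lint_yield": 1500.0}),
]
hits = filter_accessions(data, "fiber_length", min_value=30.0, location="Loc1")
print([acc.name for acc in hits])  # ['L2']
print(trait_summary(data, "fiber_length"))
```

Comparing datasets as in panel D would then amount to calling `trait_summary` once per filtered list and placing the results side by side.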