Sook Jung, Taein Lee, Stephen Ficklin, Jing Yu, Chun-Huai Cheng, Dorrie Main.
Abstract
The Genome Database for Rosaceae (GDR) and CottonGen are comprehensive online data repositories that provide access to integrated genomic, genetic and breeding data for Rosaceae crops and Gossypium (cotton) through search, visualization and analysis tools. Both databases use Chado, an open-source, generic and ontology-driven database schema for biological data, as their primary data storage platform. Chado is highly normalized and uses ontologies to indicate the 'types' of data. This flexibility has allowed Chado to house the genomic, genetic and breeding data of GDR and CottonGen. These data include whole genome sequence and annotation, transcripts, molecular markers, genetic maps, Quantitative Trait Loci (QTL), Mendelian Trait Loci (MTL), traits, germplasm, pedigrees, large-scale phenotypic and genotypic data, ontologies and publications. We provide information about how to store these types of data in Chado, using GDR and CottonGen as example sites that were converted from a legacy infrastructure. Database URL: GDR (www.rosaceae.org), CottonGen (www.cottongen.org).
Year: 2016 PMID: 26989146 PMCID: PMC4795932 DOI: 10.1093/database/baw010
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Schematic diagram of how genomic features are stored in Chado using ontologies. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.
Figure 2. Schematic diagram of how the genetic map data of molecular markers and QTL are stored in Chado. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.
Storage of genetic markers in Chado. Genetic markers are stored in the feature table with a type_id of 'genetic_marker'.

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Marker name | Sequence | feature | uniquename | | | The unique name of the marker. |
| Source organism | Sequence | feature | organism_id | | | The organism to which this marker belongs. A foreign key to the organism table. |
| Type | Sequence | feature | type_id | genetic_marker | SO | All markers are of the SO type 'genetic_marker'. |

Properties of a genetic marker

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Alias | Sequence | featureprop | value | alias | In-house | A synonym or alias of the marker. |
| Marker type | Sequence | featureprop | value | marker_type | In-house | The actual marker type, such as SSR, SNP or RFLP. |
| Repeat motif | Sequence | featureprop | value | repeat_motif | In-house | For SSR markers, the repeat motif. |
| Restriction enzyme | Sequence | featureprop | value | restriction_enzyme | In-house | Restriction enzymes for restriction-site-associated markers, such as RFLP and AFLP. |
| Product length | Sequence | featureprop | value | product_length | In-house | The product size of PCR-based markers such as SSR. |
| Maximum length | Sequence | featureprop | value | max_length | In-house | Maximum length of the PCR products observed in the original study that developed the marker. |
| Minimum length | Sequence | featureprop | value | min_length | In-house | Minimum length of the PCR products observed in the original study that developed the marker. |
| Is codominant | Sequence | featureprop | value | is_codominant | In-house | Whether the marker is codominant. |
| PCR condition | Sequence | featureprop | value | PCR_condition | In-house | Thermocycling conditions of the PCR protocol for PCR-based markers. |
| Screening method | Sequence | featureprop | value | screening_method | In-house | Gel type and percentage (e.g. 2% agarose) for electrophoresis of the PCR product, or any other screening method for other types of markers. |
| Comments | Sequence | featureprop | value | comments | In-house | Additional comments about the genetic marker. |
| Source description | Sequence | featureprop | value | source | In-house | Whether the marker was developed from the sequence of an EST, BAC, cDNA, genomic clone or whole genome sequencing. |
| Alleles | Sequence | featureprop | value | allele | SO | The marker alleles, separated by a forward slash ('/'). |
| 5' flanking seq | Sequence | featureprop | value | five_prime_flanking_region | SO | The 5' flanking sequence of the marker. |
| 3' flanking seq | Sequence | featureprop | value | three_prime_flanking_region | SO | The 3' flanking sequence of the marker. |

Other data linked to a genetic marker

| Data type | Chado module | Table name | Linking table | Description |
|---|---|---|---|---|
| Source germplasm | Stock | stock | feature_stock* | The germplasm from which the marker was developed has a record in the stock table and is associated with the marker. |
| Contact | Contact | contact | feature_contact* | The individual who submitted the marker has a record in the contact table and is associated with the marker. |
| Reference | Publication | pub | feature_pub | Associates a publication stored in the pub table with the genetic marker. |

The Chado modules are for Chado version 1.2.
The vocabularies are: Sequence Ontology (SO) and In-house (terms added to the GDR/CottonGen internal vocabularies).
Tables with an asterisk (*) are custom tables.
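
The marker layout above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses an in-memory SQLite database with a trimmed-down subset of the Chado columns (the real schema has more columns and constraints), the vocabulary names `sequence` and `in_house`, the helper names, and the marker name `EXAMPLE_SSR_001` with its property values are all hypothetical placeholders rather than GDR or CottonGen records.

```python
import sqlite3

# Simplified subset of Chado columns (assumption); the real schema carries
# more columns and constraints than shown here.
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
    name TEXT NOT NULL,
    UNIQUE (cv_id, name)
);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    UNIQUE (organism_id, uniquename, type_id)
);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value TEXT,
    rank INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, type_id, rank)
);
"""


def open_db():
    """Create an in-memory database holding the simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def cvterm_id(conn, cv_name, term_name):
    """Return the cvterm_id for a term, creating the cv and term if missing."""
    conn.execute("INSERT OR IGNORE INTO cv (name) VALUES (?)", (cv_name,))
    cv_id = conn.execute(
        "SELECT cv_id FROM cv WHERE name = ?", (cv_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO cvterm (cv_id, name) VALUES (?, ?)",
        (cv_id, term_name),
    )
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE cv_id = ? AND name = ?",
        (cv_id, term_name),
    ).fetchone()[0]


def add_marker(conn, organism_id, uniquename, props):
    """Insert a feature typed 'genetic_marker' plus one featureprop per property."""
    marker_type = cvterm_id(conn, "sequence", "genetic_marker")  # hypothetical cv name
    cur = conn.execute(
        "INSERT INTO feature (organism_id, uniquename, type_id) VALUES (?, ?, ?)",
        (organism_id, uniquename, marker_type),
    )
    feature_id = cur.lastrowid
    for term_name, value in props.items():
        conn.execute(
            "INSERT INTO featureprop (feature_id, type_id, value) VALUES (?, ?, ?)",
            (feature_id, cvterm_id(conn, "in_house", term_name), value),
        )
    return feature_id


def marker_props(conn, uniquename):
    """Return {property term: value} for one genetic marker."""
    rows = conn.execute(
        """
        SELECT pt.name, p.value
        FROM feature f
        JOIN cvterm ft ON ft.cvterm_id = f.type_id AND ft.name = 'genetic_marker'
        JOIN featureprop p ON p.feature_id = f.feature_id
        JOIN cvterm pt ON pt.cvterm_id = p.type_id
        WHERE f.uniquename = ?
        """,
        (uniquename,),
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    db = open_db()
    add_marker(db, 1, "EXAMPLE_SSR_001", {"marker_type": "SSR", "repeat_motif": "(AG)12"})
    print(marker_props(db, "EXAMPLE_SSR_001"))
```

Because every property is a (type_id, value) pair, a new marker attribute only needs a new vocabulary term, not a new column. This is the sense in which the ontology-driven design is flexible.
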
Storage of genetic map data in Chado. Genetic maps are stored in the featuremap table.

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Map name | Map | featuremap | name | | | Name of the genetic map. |
| Map unit | Map | featuremap | unittype_id | cM, bin_unit | In-house | Units of the genetic map. |

Properties of a genetic map

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Map type | Map | featuremapprop* | type_id | map_type | In-house | Map type, such as genetic linkage map, in silico map or association map. |
| Analysis method | Map | featuremapprop* | type_id | analysis_method | In-house | Any analysis method used to build the map. |
| Software | Map | featuremapprop* | type_id | software | In-house | Any software used to build the map, such as MapMaker. |
| Comments | Map | featuremapprop* | type_id | comments | In-house | Any comments about the map. |
| Genome group | Map | featuremapprop* | type_id | genome_group | In-house | Cotton-specific data: the genome group (one of the eight groups of diploid cotton) that the map corresponds to. |
| Population type | Map | featuremapprop* | type_id | population_type | In-house | Type of the mapping population, such as F1, F2 or BC1. |

Other data linked to a genetic map

| Data type | Chado module | Table name | Linking table | Description |
|---|---|---|---|---|
| Population | Stock | stock | featuremap_stock* | Associates the mapping population stored in the stock table. |
| Contact | General | contact | featuremap_contact* | Associates the contact information stored in the contact table. |
| Organism | Organism | organism | featuremap_organism* | Associates the species information of the genetic map stored in the organism table. |
| Reference | Pub | pub | featuremap_pub* | Associates the publication stored in the pub table. |

The Chado modules are for Chado version 1.2.
The vocabularies are: Sequence Ontology (SO) and In-house (terms added to the GDR/CottonGen internal vocabularies).
Tables with an asterisk (*) are custom tables.
Storage of the positions of genetic markers and trait loci in genetic maps in Chado. The map position data are stored in the featurepos and featureposprop tables.

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Locus name | Map | featurepos | feature_id | | | A foreign key to the feature table. Refers to features of type 'marker_locus', 'QTL', 'heritable_morphological_marker' or 'bin'. |
| Map name | Map | featurepos | featuremap_id | | | A foreign key to the featuremap table. Refers to the genetic map. |
| Linkage group | Map | featurepos | map_feature_id | | | A foreign key to the feature table. Refers to features of type 'linkage_group'. |
| Start | Map | featureposprop* | value | start | In-house | The start position of the marker, QTL or bin in the linkage group. |
| Stop | Map | featureposprop* | value | stop | In-house | The stop position of the marker, QTL or bin in the linkage group. |
| QTL peak | Map | featureposprop* | value | qtl_peak | In-house | The peak position of the QTL. |
| Probability | Map | featureposprop* | value | probability | In-house | Probability of the QTL span. |
| Comments | Map | featureposprop* | value | comments | In-house | Any comments on the map position data. |

The Chado modules are for Chado version 1.2.
The vocabularies are: Sequence Ontology (SO) and In-house (terms added to the GDR/CottonGen internal vocabularies).
Tables with an asterisk (*) are custom tables.
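
To make the three foreign keys of featurepos concrete, the sketch below places two loci on a linkage group and lists them in map order. It rests on several assumptions: the schema is a reduced SQLite stand-in (the vocabulary is collapsed into a single cvterm table and columns not named in the table above are omitted), featureposprop is modelled as a plain (featurepos_id, type_id, value) table, and all map, linkage group and locus names are hypothetical.

```python
import sqlite3

# Reduced stand-in for the Map module tables (assumption). Only the columns
# named in the table above are kept.
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featuremap (
    featuremap_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    unittype_id INTEGER REFERENCES cvterm(cvterm_id)
);
CREATE TABLE featurepos (
    featurepos_id INTEGER PRIMARY KEY,
    featuremap_id INTEGER NOT NULL REFERENCES featuremap(featuremap_id),
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    map_feature_id INTEGER NOT NULL REFERENCES feature(feature_id)
);
CREATE TABLE featureposprop (
    featureposprop_id INTEGER PRIMARY KEY,
    featurepos_id INTEGER NOT NULL REFERENCES featurepos(featurepos_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value TEXT
);
"""


def term(conn, name):
    """Return the cvterm_id for a term name, creating the term if missing."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()[0]


def add_feature(conn, uniquename, type_name):
    """Insert a feature (locus, linkage group, ...) and return its feature_id."""
    type_id = term(conn, type_name)
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, type_id),
    )
    return cur.lastrowid


def add_map(conn, name, unit):
    """Insert a genetic map with its unit (e.g. 'cM') and return featuremap_id."""
    unit_id = term(conn, unit)
    cur = conn.execute(
        "INSERT INTO featuremap (name, unittype_id) VALUES (?, ?)", (name, unit_id)
    )
    return cur.lastrowid


def place_locus(conn, map_id, locus_id, group_id, **props):
    """Link a locus to a map and linkage group; store start/stop/etc. as props."""
    cur = conn.execute(
        "INSERT INTO featurepos (featuremap_id, feature_id, map_feature_id) "
        "VALUES (?, ?, ?)",
        (map_id, locus_id, group_id),
    )
    pos_id = cur.lastrowid
    for name, value in props.items():
        conn.execute(
            "INSERT INTO featureposprop (featurepos_id, type_id, value) "
            "VALUES (?, ?, ?)",
            (pos_id, term(conn, name), str(value)),
        )
    return pos_id


def loci_on_group(conn, map_name, group_name):
    """Return [(locus name, start position)] for one linkage group, in map order."""
    rows = conn.execute(
        """
        SELECT locus.uniquename, CAST(p.value AS REAL) AS start_pos
        FROM featurepos fp
        JOIN featuremap m ON m.featuremap_id = fp.featuremap_id
        JOIN feature locus ON locus.feature_id = fp.feature_id
        JOIN feature lg ON lg.feature_id = fp.map_feature_id
        JOIN featureposprop p ON p.featurepos_id = fp.featurepos_id
        JOIN cvterm t ON t.cvterm_id = p.type_id AND t.name = 'start'
        WHERE m.name = ? AND lg.uniquename = ?
        ORDER BY start_pos
        """,
        (map_name, group_name),
    )
    return rows.fetchall()
```

A typical call sequence would be `add_map`, then `add_feature` for the linkage group and each locus, then `place_locus(conn, map_id, locus_id, group_id, start=12.5, stop=12.5)`. Positions are stored as text in `value`, so the query casts them to numbers before ordering.
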
Storage of QTL in Chado. QTL are stored in the feature table with a type_id of 'QTL'.

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| QTL label | Sequence | feature | uniquename | | | Curator-defined label for the QTL. |
| Organism | Sequence | feature | organism_id | | | The organism to which the QTL belongs. A foreign key to the organism table. |
| Type | Sequence | feature | type_id | QTL | SO | All QTL are of the SO type 'QTL'. |

Properties of QTL

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Published symbol | Sequence | featureprop | value | published_symbol | In-house | Published QTL symbol. |
| Bayes factor | Sequence | featureprop | value | bayes_factor | In-house | Bayes factor as evidence for the reported QTL. |
| P value | Sequence | featureprop | value | P_value | In-house | P value as evidence for the reported QTL. |
| R square | Sequence | featureprop | value | R_square | In-house | The percentage of the total genetic variance explained by the locus. |
| LOD | Sequence | featureprop | value | LOD | In-house | LOD value as evidence for the reported QTL. |
| Additive effect | Sequence | featureprop | value | additive_effect | In-house | Additive effect of the QTL allele. |
| Dominance effect | Sequence | featureprop | value | dominance_effect | In-house | Dominance effect of the QTL allele. |
| Direction | Sequence | featureprop | value | direction | In-house | Direction of the QTL effect. |
| Screening method | Sequence | featureprop | value | screening_method | In-house | Any screening method used for the phenotyping. |
| Comments | Sequence | featureprop | value | comments | In-house | Any comments. |

Other data linked to QTL

| Data type | Chado module | Table name | Linking table | Description |
|---|---|---|---|---|
| Trait name | Sequence | cvterm | feature_cvterm | Trait Ontology term that is associated with the QTL. |
| Alias | Sequence | synonym | feature_synonym | Any alias for the QTL. |
| Source | Stock | stock | feature_stock | Parent germplasm with the desirable allele. |
| Reference | Pub | pub | feature_pub | Associates a publication stored in the pub table with the QTL. |
| Dataset | Project | project | feature_project* | Dataset that includes all the QTL reported in the publication. |
| Contact | Contact | contact | feature_contact* | The individual who submitted the QTL has a record in the contact table and is associated with the QTL. |
| Colocalized marker | Sequence | feature | feature_relationship | The colocalized marker is a separate record in the feature table (with type_id SO:genetic_marker). The relationship type_id is 'located_in'; the colocalized marker is the subject_id and the QTL is the object_id. |
| Neighboring marker | Sequence | feature | feature_relationship | The neighboring marker is a separate record in the feature table (with type_id SO:genetic_marker). The relationship type_id is 'adjacent_to'; the neighboring marker is the subject_id and the QTL is the object_id. |

The Chado modules are for Chado version 1.2.
The vocabularies are: Sequence Ontology (SO) and In-house (terms added to the GDR/CottonGen internal vocabularies).
Tables with an asterisk (*) are custom tables.
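
The subject/object convention for colocalized and neighboring markers is easy to get backwards, so a short sketch may help. It assumes a reduced SQLite stand-in for the feature and feature_relationship tables, with the vocabulary collapsed into one cvterm table. The helper names and the feature names used when calling them are hypothetical.

```python
import sqlite3

# Reduced stand-in for the Chado tables involved (assumption).
SCHEMA = """
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
CREATE TABLE feature (
    feature_id INTEGER PRIMARY KEY,
    uniquename TEXT NOT NULL,
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id)
);
CREATE TABLE feature_relationship (
    feature_relationship_id INTEGER PRIMARY KEY,
    subject_id INTEGER NOT NULL REFERENCES feature(feature_id),
    object_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    UNIQUE (subject_id, object_id, type_id)
);
"""

# Relationship terms named in the table above.
MARKER_RELATIONSHIPS = ("located_in", "adjacent_to")


def term(conn, name):
    """Return the cvterm_id for a term name, creating the term if missing."""
    conn.execute("INSERT OR IGNORE INTO cvterm (name) VALUES (?)", (name,))
    return conn.execute(
        "SELECT cvterm_id FROM cvterm WHERE name = ?", (name,)
    ).fetchone()[0]


def add_feature(conn, uniquename, type_name):
    """Insert a feature of the given type and return its feature_id."""
    type_id = term(conn, type_name)
    cur = conn.execute(
        "INSERT INTO feature (uniquename, type_id) VALUES (?, ?)",
        (uniquename, type_id),
    )
    return cur.lastrowid


def link_marker_to_qtl(conn, marker_id, qtl_id, relationship):
    """Record 'marker <relationship> QTL': marker is subject, QTL is object."""
    if relationship not in MARKER_RELATIONSHIPS:
        raise ValueError(f"unsupported relationship: {relationship}")
    conn.execute(
        "INSERT INTO feature_relationship (subject_id, object_id, type_id) "
        "VALUES (?, ?, ?)",
        (marker_id, qtl_id, term(conn, relationship)),
    )


def markers_for_qtl(conn, qtl_name):
    """Return {relationship: [marker names]} for one QTL."""
    rows = conn.execute(
        """
        SELECT rel.name, marker.uniquename
        FROM feature_relationship fr
        JOIN feature qtl ON qtl.feature_id = fr.object_id
        JOIN cvterm qtl_type
          ON qtl_type.cvterm_id = qtl.type_id AND qtl_type.name = 'QTL'
        JOIN feature marker ON marker.feature_id = fr.subject_id
        JOIN cvterm rel ON rel.cvterm_id = fr.type_id
        WHERE qtl.uniquename = ?
        ORDER BY rel.name, marker.uniquename
        """,
        (qtl_name,),
    )
    grouped = {}
    for relationship, marker_name in rows:
        grouped.setdefault(relationship, []).append(marker_name)
    return grouped
```

Reading a relationship row as the sentence "subject, relationship, object" (for example, "marker located_in QTL") matches the direction given in the table and keeps inserts and queries consistent.
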
Storage of MTL in Chado. MTL are stored in the feature table with a type_id of 'heritable_morphological_marker'.

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| MTL name | Sequence | feature | uniquename | | | Curator-defined label for the MTL. |
| Organism | Sequence | feature | organism_id | | | The organism to which the MTL belongs. A foreign key to the organism table. |
| Type | Sequence | feature | type_id | heritable_morphological_marker | SO | All MTL are of the SO type 'heritable_morphological_marker'. |

Properties of MTL

| Data type | Chado module | Table name | Field name | Vocabulary terms for type_id | Vocabulary | Description |
|---|---|---|---|---|---|---|
| Published symbol | Sequence | featureprop | value | published_symbol | In-house | Published MTL symbol. |
| Screening method | Sequence | featureprop | value | screening_method | In-house | Any screening method used for the phenotyping. |
| Description | Sequence | featureprop | value | description | In-house | Any description of the MTL. |
| Comments | Sequence | featureprop | value | comments | In-house | Any comments. |

Other data linked to MTL

| Data type | Chado module | Table name | Linking table | Description |
|---|---|---|---|---|
| Trait name | Sequence | cvterm | feature_cvterm | Trait Ontology term that is associated with the MTL. |
| Alias | Sequence | synonym | feature_synonym | Any alias for the MTL. |
| Source | Stock | stock | feature_stock | Parent germplasm with the desirable allele. |
| Reference | Pub | pub | feature_pub | Associates a publication stored in the pub table with the MTL. |
| Dataset | Project | project | feature_project* | Dataset that includes all the MTL reported in the publication. |
| Contact | Contact | contact | feature_contact* | The individual who submitted the MTL has a record in the contact table and is associated with the MTL. |
| Colocalized marker | Sequence | feature | feature_relationship | The colocalized marker is a separate record in the feature table (with type_id SO:genetic_marker). The relationship type_id is 'located_in'; the colocalized marker is the subject_id and the MTL is the object_id. |
| Neighboring marker | Sequence | feature | feature_relationship | The neighboring marker is a separate record in the feature table (with type_id SO:genetic_marker). The relationship type_id is 'adjacent_to'; the neighboring marker is the subject_id and the MTL is the object_id. |

The Chado modules are for Chado version 1.2.
The vocabularies are: Sequence Ontology (SO) and In-house (terms added to the GDR/CottonGen internal vocabularies).
Tables with an asterisk (*) are custom tables.
Figure 3. Schematic diagram of how stocks are stored in Chado. Hierarchical stocks, from samples and cultivars to populations, are stored in the stock table, and their relationships, including pedigree, are stored in the stock_relationship table. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.
Figure 4. Schematic diagram of how phenotypic data are stored in Chado. Datasets such as passport data and cross data, which do not have associated phenotypic or genotypic data, can also be stored in the nd_experiment table and linked to the stock table. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.
Figure 5. Schematic diagram of how coded phenotypic values are stored in Chado. The same data can be stored in two different code systems to enable comparison among datasets. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.
Figure 6. Schematic diagram of how genotypic data are stored in Chado. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.
Figure 7. Schematic diagram of how the relationship between genotype and phenotype is stored in Chado. The bold red fields represent foreign keys to the cvterm table, which houses vocabulary terms. Boxes in dark green represent the Chado modules shown in this diagram.