Ryan Barrett, Cynthia L Neben, Anjali D Zimmer, Gilad Mishne, Wendy McKennon, Alicia Y Zhou, Jeremy Ginsberg.
Abstract
Next-generation sequencing multi-gene panels have greatly improved the diagnostic yield and cost-effectiveness of genetic testing and are rapidly being integrated into the clinic for hereditary cancer risk. With this technology comes a dramatic increase in the volume, type and complexity of data. Yet these invaluable data are too often buried or inaccessible to researchers, especially those without strong analytical or programming skills. To effectively share comprehensive, integrated genotypic-phenotypic data, we built Color Data, a publicly available, cloud-based database that supports broad access and data literacy. The database comprises data from 50 000 individuals who were sequenced for 30 genes associated with hereditary cancer risk. It provides information on allele frequency and variant classification, as well as associated phenotypic information such as demographics and personal and family history. Our user-friendly interface allows researchers to execute their own filtered queries, and query results can be shared and/or downloaded. The rapid and broad dissemination of these research results will help increase the value of, and reduce the waste in, scientific resources and data. Furthermore, the database can scale quickly and support the integration of additional genes and human hereditary conditions. We hope that this database will help researchers and scientists explore genotype-phenotype correlations in hereditary cancer, identify novel variants for functional analysis and enable data-driven drug discovery and development.
Year: 2019 PMID: 30759220 PMCID: PMC6372842 DOI: 10.1093/database/baz013
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. High-level workflow of the database. The workflow is divided into four sub-processes, ‘Data Collection’, ‘Bioinformatics’, ‘Architecture’ and ‘User’, each shown as a rounded rectangle of a different color.
Criteria for inclusion and exclusion
| Input | Inclusion | Exclusion |
|---|---|---|
| Individual | Referred by health care provider order for a Color test | Participant in another research study |
| | Informed consent | >10 missing phenotype data points |
| | Sample passed internal quality control | |
| Phenotype data | Reported health history via online Color account | Reported event age > current individual age |
| | Reported age, gender, number of children, number of siblings (unless reported to be adopted) | For numeric data points: an absolute modified Z-score > 5, or a value above Q3 + 3 × IQR or below Q1 − 3 × IQR |
| Genotype data | Sequenced for 30 genes associated with hereditary cancer risk | For variants in |
| | Read depth ≥ 20; if variant sent for secondary confirmation, confirmed present; variant classification submitted to ClinVar | For variants in genes other than |
a If an individual has >10 phenotype data points missing, that individual is excluded from the database.
b SMAD4 has a common processed pseudogene, which may result in artifacts at lower allele fractions. Q, quartile. IQR, interquartile range.
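The numeric exclusion rule in the table above (absolute modified Z-score > 5, or a value outside Q1 − 3 × IQR to Q3 + 3 × IQR) can be sketched in Python as follows. The source does not define the modified Z-score or the quartile method, so this sketch assumes the common median/MAD form, 0.6745 × (x − median) / MAD, and the default quartile interpolation of `statistics.quantiles`. The function names are hypothetical and are not taken from the Color Data pipeline.

```python
import statistics


def modified_z_scores(values):
    """Modified Z-scores using the median and MAD (assumed definition)."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # The score is undefined when MAD is zero; this sketch returns 0.0
        # and leaves the decision to the IQR fences.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(values, z_cutoff=5.0, iqr_multiplier=3.0):
    """Return one boolean per value; True means the value would be excluded."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile method is an assumption
    iqr = q3 - q1
    low = q1 - iqr_multiplier * iqr
    high = q3 + iqr_multiplier * iqr
    scores = modified_z_scores(values)
    return [
        abs(z) > z_cutoff or v > high or v < low
        for v, z in zip(values, scores)
    ]


# Hypothetical example: number of siblings reported by ten individuals.
siblings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 40]
flags = flag_outliers(siblings)
```

In this made-up sample only the last value (40) is flagged, by both the modified Z-score test and the upper IQR fence.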
Filter categories and filter values
| Filter categories | Filter values |
|---|---|
| Gender | F, M |
| Age | 18–25, 26–30, 31–35, 36–40, 41–45, 46–50, 51–55, 56–60, 61–65, 66–70, 71–75, 76–80, 81–85, 86–89, ≥90 |
| Ethnicity | African, Ashkenazi Jewish, Asian (not specified), Caucasian, Chinese, Filipino, Hispanic, Indian, Japanese, Multiple ethnicities, Native American, Pacific Islander, Unknown |
| Personal cancer history | Breast, Colorectal, Gastric, Melanoma, No cancer, Ovarian, Pancreatic, Prostate, Uterine |
| Classification | Benign, Likely Benign, Likely Pathogenic, Pathogenic, VUS |
| Gene | |
| Variant | (Search by Nomenclature) |
| Zygosity | Heterozygous, Homozygous |
a Unknown includes information not reported.
b The CDKN2A locus encodes two gene products, p14ARF and p16INK4a.
c Filter values for ‘Variant’ can only be selected by typing text, with autocomplete, using HGVS nomenclature.
F, female. M, male. VUS, variant of uncertain significance.
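The ‘Age’ filter values listed above are fixed bins: 18–25, then 5-year bins from 26–30 through 81–85, then 86–89 and ≥90. A minimal Python sketch of mapping an age to its bin label is shown below. The function name `age_bin` is hypothetical; how the database itself assigns bins is not described in the source.

```python
def age_bin(age: int) -> str:
    """Map an age in whole years to the matching 'Age' filter label."""
    if age < 18:
        # The listed filter values start at 18.
        raise ValueError("age must be 18 or older")
    if age >= 90:
        return "≥90"
    if age <= 25:
        return "18–25"
    if age >= 86:
        return "86–89"
    # Regular 5-year bins: 26–30, 31–35, ..., 81–85.
    lower = 26 + 5 * ((age - 26) // 5)
    return f"{lower}–{lower + 4}"


# Hypothetical usage: group a few ages by filter value.
labels = [age_bin(a) for a in (18, 30, 31, 85, 88, 93)]
```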
Figure 2. Screenshots of query results for pathogenic variant frequency and age of cancer onset in women with breast cancer. (A, B) Filter by ‘Gender: F’ and ‘Cancer history: Breast’. (C, D) Filter by ‘Classification: Pathogenic or Likely Pathogenic’. (E) Filter by ‘Gene: BRCA1 or BRCA2’. (F) Remove ‘Gene: BRCA1 or BRCA2’ and filter by ‘Gene: PALB2’. Query URL: https://data.color.com/v1/#gender=F&cancer_history=Breast
Figure 3. Screenshots of query results for the Ashkenazi Jewish BRCA founder alleles. (A–E) Filter by ‘Variant: c.68_69delAG, c.5266dupC, or c.5946delT’. The Ashkenazi Jewish BRCA founder alleles are BRCA1 c.68_69delAG, BRCA1 c.5266dupC and BRCA2 c.5946delT. Query URL: https://data.color.com/v1/#variant=c.68_69delAG&variant=c.5266dupC&variant=c.5946delT
Figure 4. Screenshots of query results for the personal and family history of cancer in individuals with Lynch syndrome. (A, B) Filter by ‘Classification: Pathogenic or Likely Pathogenic’ and ‘Gene: MLH1, MSH2, PMS2, MSH6, or EPCAM’. (C) Remove ‘Gene: MSH2, PMS2, MSH6, or EPCAM’. (D) Remove ‘Gene: MLH1’ and filter by ‘Gene: PMS2’. (E) Filter by ‘Gene: MLH1, MSH2, PMS2, MSH6, or EPCAM’. Query URL: https://data.color.com/v1/#classification=Likely%20Pathogenic&classification=Pathogenic&gene=MSH6&gene=MLH1&gene=MSH2&gene=PMS2&gene=EPCAM
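The query URLs in the figure captions encode each filter as a `key=value` pair in the URL fragment, repeating the key when several values are selected. The Python sketch below builds and parses URLs of that shape using only the standard library. The base URL and the parameter names (`gender`, `cancer_history`, `classification`, `gene`, `variant`) are copied from the captions; the helper names are hypothetical, and whether the service accepts other parameters or is still reachable is not something the source confirms.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit

BASE_URL = "https://data.color.com/v1/"  # taken from the figure captions


def build_query_url(filters):
    """Build a shareable query URL from (filter, value) pairs.

    Repeated filters are expressed as repeated keys, and spaces are
    percent-encoded as %20, matching the URLs shown in the captions.
    """
    fragment = urlencode(list(filters), quote_via=quote)
    return f"{BASE_URL}#{fragment}"


def parse_query_url(url):
    """Return a dict mapping each filter name to its list of selected values."""
    return parse_qs(urlsplit(url).fragment)


# Reproduce the Lynch syndrome query from the Figure 4 caption.
lynch_url = build_query_url(
    [
        ("classification", "Likely Pathogenic"),
        ("classification", "Pathogenic"),
        ("gene", "MSH6"),
        ("gene", "MLH1"),
        ("gene", "MSH2"),
        ("gene", "PMS2"),
        ("gene", "EPCAM"),
    ]
)
lynch_filters = parse_query_url(lynch_url)
```

Keeping the filters as an ordered list of pairs, rather than a dictionary, preserves the order of repeated keys, so the generated string can match a caption URL character for character.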