Mireia Obón-Santacana1,2, Mireia Vilardell1, Anna Carreras1, Xavier Duran1, Juan Velasco1, Iván Galván-Femenía1, Teresa Alonso1, Lluís Puig3, Lauro Sumoy4, Eric J Duell5, Manuel Perucho4, Victor Moreno2,6,7, Rafael de Cid1.
Abstract
PURPOSE: The prevalence of chronic non-communicable diseases (NCDs) is increasing worldwide. NCDs are the leading cause of both morbidity and mortality, and it is estimated that by 2030 they will be responsible for 80% of deaths worldwide. The Genomes for Life (GCAT) project is a long-term prospective cohort study designed to integrate and assess the role of epidemiological, genomic and epigenomic factors in the development of major chronic diseases in Catalonia, a region in north-east Spain.
PARTICIPANTS: At the end of 2017, the GCAT Study will have recruited 20 000 participants aged 40-65 years. Participants who agreed to take part in the study completed a self-administered, computer-driven questionnaire and underwent blood pressure, heart rate and anthropometry measurements. For each participant, blood plasma, blood serum and white blood cells are collected at baseline. The GCAT Study has access to the electronic health records of the Catalan Public Healthcare System. Participants will be followed up every 2 years for at least 20 years after recruitment.
FINDINGS TO DATE: Among all GCAT participants, 59.2% are women and 83.3% of the cohort identified themselves as Caucasian/white. More than half of the participants have higher education levels, 72.2% are currently employed and 42.1% are classified as overweight (body mass index ≥25 and <30 kg/m²). We have genotyped 5459 participants, of whom 5000 have metabolome data. Further, the whole genome of 808 participants will be sequenced by the end of 2017.
FUTURE PLANS: The first follow-up study started in December 2017 and will end by March 2018. Residences of all subjects will be geocoded during the following year. Several genomic analyses are ongoing, and metabolomic and genomic integrations will be performed to identify underlying genetic variants, as well as environmental factors, that influence metabolites.
© Article author(s) (or their employer(s) unless otherwise stated in the text of the article) 2018. All rights reserved. No commercial use is permitted unless otherwise expressly granted.
Keywords: GWAS; WGS; Catalan population; complex inheritance; electronic health records; follow-up; genomics; lifestyle; medical history; non-communicable diseases; prospective cohort; Spanish cohort
Year: 2018 PMID: 29593016 PMCID: PMC5875652 DOI: 10.1136/bmjopen-2017-018324
Source DB: PubMed Journal: BMJ Open ISSN: 2044-6055 Impact factor: 2.692
Figure 1. Genomes for Life (GCAT) recruitment centres. Distribution of the 11 GCAT permanent recruitment centres across the Catalan territory.
List of self-reported diseases at baseline for both men and women
| ICD-9-CM* code | Conditions | Prevalent cases n (%) |
| 272.0 | Hypercholesterolaemia or Hypertriglyceridaemia | 3456 (18.73) |
| 995.3 | Allergies | 3132 (16.97) |
| 401.9 | Hypertension | 2771 (12.31) |
| 346.90 | Migraine disorders | 1614 (8.75) |
| 472.0 | Rhinitis | 1415 (7.67) |
| 311 | Depression disorder | 1193 (6.46) |
| 493.90 | Asthma | 985 (5.34) |
| 692.9 | Eczema | 872 (4.73) |
| 41.86 | | 855 (4.63) |
| 569.0 | Colon and/or rectal polyps | 727 (3.94) |
| 696.8 | Psoriasis | 688 (3.73) |
| 250.00 | Diabetes mellitus | 613 (3.32) |
| 733.00 | Osteoporosis | 578 (3.14) |
| 714.9 | Arthritis | 552 (2.99) |
| 199.1 | Cancer | 405 (2.19) |
| Z14 | Inborn genetic diseases | 172 (0.93) |
| 496 | Chronic obstructive pulmonary disease | 89 (0.48) |
| 558 | Chronic colitis | 53 (0.29) |
| 434 | Stroke | 51 (0.28) |
| 573.3 | Chronic hepatitis | 38 (0.21) |
| 410.90 | Myocardial infarction, heart attack | 36 (0.19) |
| 413 | Coronary heart disease/angina pectoris | 35 (0.19) |
| 710 | Lupus erythematosus | 30 (0.16) |
| 560.89 | Crohn’s disease | 19 (0.10) |
| 295.90 | Schizophrenia | 17 (0.09) |
| 331 | Alzheimer’s disease/dementia | 2 (0.01) |
| 332 | Parkinson’s disease | 1 (0.01) |
*The International Statistical Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM).
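The percentages in this table appear to be prevalent cases divided by the whole baseline cohort. The sketch below recomputes them for a few rows under one assumption that the source does not state: the denominator is 18 451, the sum of the male, female and missing gender counts in the baseline characteristics table. The function names are illustrative only.

```python
# Assumed denominator: sum of the gender rows in the baseline table
# (7471 male + 10 918 female + 62 missing). The source does not state it.
COHORT_SIZE = 7471 + 10918 + 62  # 18451

# (condition, prevalent cases, percentage as printed in the table)
ROWS = [
    ("Hypercholesterolaemia or hypertriglyceridaemia", 3456, 18.73),
    ("Allergies", 3132, 16.97),
    ("Hypertension", 2771, 12.31),
    ("Migraine disorders", 1614, 8.75),
    ("Diabetes mellitus", 613, 3.32),
]


def prevalence_pct(cases, total=COHORT_SIZE):
    """Prevalence as a percentage of the cohort, rounded to 2 decimals."""
    return round(100 * cases / total, 2)


def inconsistent_rows(rows, total=COHORT_SIZE, tol=0.011):
    """Return rows whose printed percentage differs from the recomputed one."""
    flagged = []
    for name, cases, printed in rows:
        recomputed = prevalence_pct(cases, total)
        if abs(recomputed - printed) > tol:
            flagged.append((name, printed, recomputed))
    return flagged


if __name__ == "__main__":
    for name, cases, printed in ROWS:
        print(f"{name}: printed {printed}, recomputed {prevalence_pct(cases)}")
    print("Rows to check:", inconsistent_rows(ROWS))
```

Under this assumed denominator, four of the five rows reproduce the printed values. The hypertension row is flagged (recomputed 15.02 against a printed 12.31), which may indicate a transcription error in either the count or the percentage and is worth checking against the published table.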
Suitability and processing of collected samples
| Study purpose | Fraction sample | Vacutainer tube | Volume (mL) | Transport T (°C) | Time to PMPPC | Aliquots n (T, °C) | Control assay* |
| Genomic/epigenomic | Buffy coat | EDTA | 10 | 4 | max 24 hours | 2 (−80) | SNP array, qPCR, PCR, STR |
| Genomic/epigenomic | Highly concentrated buffy coat | Blood bag | 480 | 18 | max 48 hours | 2 (−80) | SNP array, qPCR, PCR, STR |
| Proteomic/epigenomic | Plasma | PST | 4.5 | 4 | max 24 hours | 4 (−80) | – |
| Proteomic/epigenomic | Serum | SST | 5 | 4 | max 24 hours | 4 (−80) | Circulating microRNAs integrity analysis |
| Functional/cell line | DMSO blood | ACD | 6 | 18 | – | 2 (N2) | EBV cell transformation and immortalisation |
*Downstream analysis performed on collected samples to assess their suitability.
ACD, anticoagulant citrate dextrose; DMSO, dimethyl sulfoxide; EBV, Epstein-Barr virus; PMPPC, Program of Predictive and Personalized Medicine of Cancer; PST, plasma separation tube; qPCR, quantitative PCR; SNP, single nucleotide polymorphism; SST, serum separation tubes; STR, short tandem repeat.
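The handling constraints in this table can be written down as a small lookup structure. The following is a minimal sketch, not the GCAT laboratory's actual software: the class, field and function names are hypothetical, the transport column is read as a temperature in °C, and a missing time limit ("–") is treated as "no limit stated".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HandlingRule:
    """Hypothetical record for one row of the sample-processing table."""
    tube: str
    volume_ml: float
    transport_temp_c: int            # assumed to be degrees Celsius
    max_hours_to_lab: Optional[int]  # None where the table gives no limit


# Values transcribed from the table above, keyed by sample fraction.
RULES = {
    "Buffy coat": HandlingRule("EDTA", 10, 4, 24),
    "Highly concentrated buffy coat": HandlingRule("Blood bag", 480, 18, 48),
    "Plasma": HandlingRule("PST", 4.5, 4, 24),
    "Serum": HandlingRule("SST", 5, 4, 24),
    "DMSO blood": HandlingRule("ACD", 6, 18, None),
}


def arrived_in_time(fraction: str, hours_elapsed: float) -> bool:
    """True if the sample reached the lab within the tabulated time limit."""
    rule = RULES[fraction]
    if rule.max_hours_to_lab is None:
        return True  # no limit stated in the table
    return hours_elapsed <= rule.max_hours_to_lab


if __name__ == "__main__":
    print(arrived_in_time("Plasma", 20))   # within the 24-hour limit
    print(arrived_in_time("Plasma", 30))   # exceeds the 24-hour limit
```

Keeping the rules in one frozen structure makes it easy to audit them against the table and to reject a sample record before it enters downstream analysis.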
Summary of total omic data as of 2017
| Study purpose | Number of participants | Fraction sample | Platform | | Analysed |
| Metabolomic profile | 5000 | Plasma | NMR MS | – | 150 metabolites |
| Genotype | 5459 | Buffy coat | Infinium Multi-Ethnic Global (MEGAEX2) array | HiScan confocal scanner (Illumina) | 2×10⁶ SNPs, InDels |
| Whole genome sequencing | 808 | Buffy coat | Illumina TruSeq PCR free/Illumina paired-end SBS | HiSeq 4000 sequencer (Illumina) | 30× coverage |
| Subexome | 200 | Buffy coat | Agilent Sureselect/Illumina paired-end SBS | MiSeq (Illumina) | Custom multigene panel 126 genes 400× coverage |
| Epigenome | 150* | Whole blood | Methylation EPIC 850K array | HiScan confocal scanner (Illumina) | Differential methylation analysis at single-site and regional levels (genes, CpG islands, promoters, enhancers) |
EPIC, European Prospective Investigation into Cancer and Nutrition; InDels, insertions-deletions; MS, mass spectrometry; NMR, nuclear magnetic resonance; SBS, sequencing by synthesis; SNP, single nucleotide polymorphism.
*Current acquisition.
Summary of all available Genomes for Life (GCAT) data
| Data type | Number of participants | Details | Date of acquisition | Date available for research |
| Baseline assessment | Whole cohort | Questionnaire, physical measures, samples | 2014–2017 | 2018 |
| Repeat of baseline assessment | Whole cohort | Questionnaire follow-up every 2 years | 2018 | 2019 |
| Genotyping (baseline samples) | 5459 (GCATcore) | Dense genotyping array with 666 695 markers after quality control (see | 2016 | 2018 |
| Genotyping extended (baseline samples) | 5459 (GCATcore) | Dense genotyping map with 15 078 461 variants (see | 2017–2018 | 2018 |
| Food frequency web questionnaire (follow-up) | Whole cohort | Participants are invited by email to provide additional information about diet; estimates of nutrient intake | 2017–2018 | 2018 |
| Biochemical assay (baseline samples) | 6000 | Glycated haemoglobin (haemoglobin A1c) | 2016–2017 | 2018 |
| Metabolome (baseline samples) | 5000 (GCATcore) | Biomarkers with known disease association (lipids and vascular disease) | 2017–2018 | 2018 |
| Chronotype web questionnaire (follow-up) | Whole cohort | Participants are invited by email to provide additional information (ie, sleep behaviour, circadian rhythm, and work shift) | 2017–2018 | 2018 |
| Exposome (baseline) | Whole cohort | Map of environmental exposures acquired with geographical information system (GIS) technology | 2017–2018 | 2018 |
| Other web-based questionnaire data (follow-up) | Whole cohort | Participants are invited by email to provide additional information via web about working places. Information will be integrated with exposome assessment | 2017–2018 | 2018 |
| Exome | 200 (GCATcore) | Clinic custom exome of hereditary cancer in 126 hereditary cancer genes (400×) | 2017 | 2018 |
| Whole-genome sequencing | 808 | 30× whole genome sequencing from 1000 volunteers, 20% from GCATcore | 2017–2018 | 2018 |
| Epigenome | 150 | DNA methylation epigenomic profile using Infinium Methylation EPIC 850K beadarray assay | 2018 | 2019 |
| Health record linkage | | | | |
| Primary care | Whole cohort | ICD/ATC/OPCS procedures/laboratory | 2017–2018 | 2018 |
| Death registrations | Whole cohort | ICD-coded cause specific mortality | 2017–2018 | 2018 |
| Hospital inpatient | Whole cohort | ICD/ATC/OPCS procedures/laboratory | 2017–2018 | 2018 |
| Hospital outpatient | Whole cohort | ICD (few)/OPCS | 2018 | 2018 |
| Other | Whole cohort | National mental healthcare/national social healthcare | 2018 | 2018 |
ATC, Anatomical Therapeutic Chemical Classification System; EPIC, European Prospective Investigation into Cancer and Nutrition; ICD, International Statistical Classification of Diseases; OPCS, Classification of Interventions and Procedures.
The Genomes for Life (GCAT) Study: summary of baseline characteristics
| Characteristics | Values |
| Age | 51.03 (7.05) |
| Heart rate | 74.47 (11.12) |
| Diastolic blood pressure | 78.56 (9.71) |
| Systolic blood pressure | 123.54 (15.28) |
| Age at menarche (among women) | 12.38 (1.55) |
| Age at menopause (among women) | 48.56 (4.74) |
| Age at voice change (among men) | 14.7 (2.1) |
| Age at beard change (among men) | 16.0 (2.6) |
| Gender | |
| Male | 7471 (40.5) |
| Female | 10 918 (59.2) |
| Missing | 62 (0.3) |
| Marital status | |
| Married | 10 703 (58.0) |
| Divorced/separated | 2159 (11.7) |
| Domestic partner | 1142 (6.2) |
| Single | 1887 (10.2) |
| Widow/widower | 521 (2.8) |
| Missing | 2039 (11.1) |
| Education level | |
| Without studies | 73 (0.4) |
| Elementary education | 2104 (11.4) |
| Secondary education | 4519 (24.5) |
| Professional higher education | 2037 (11.0) |
| Secondary postdegree professional programme | 2594 (14.1) |
| College | 6772 (36.7) |
| Missing | 352 (1.9) |
| Ethnicity | |
| White, Caucasian | 15 363 (83.3) |
| Hispanic, Latin | 2803 (15.2) |
| Black | 14 (0.1) |
| Maghrebin | 14 (0.1) |
| Gipsy | 10 (0.1) |
| Asian | 1 (0.0) |
| Other | 18 (0.1) |
| Missing | 230 (1.2) |
| Working status | |
| Employed | 13 327 (72.2) |
| Not working/unemployed | 1796 (9.7) |
| Retired | 1255 (6.8) |
| Home maker | 1110 (6.0) |
| Student | 52 (0.3) |
| Occupational disability | 376 (2.0) |
| Volunteer or unpaid work | 126 (0.7) |
| Other | 206 (1.1) |
| Missing | 203 (1.1) |
| Smoking status | |
| current, ≤15 cig/day | 2469 (13.4) |
| current, 16–25 cig/day | 752 (4.1) |
| current, 26+ cig/day | 148 (0.8) |
| current, unknown | 318 (1.7) |
| former, quit ≤10 years | 2196 (11.9) |
| former, quit 11–20 years | 2392 (13.0) |
| former, quit 20+ years | 1973 (10.7) |
| former, unknown | 153 (0.8) |
| never | 7197 (39.0) |
| missing | 853 (4.6) |
| Alcohol consumption | |
| never or less than once a month | 4402 (23.9) |
| once per month | 1048 (5.7) |
| from 2 to 3 times per month | 2202 (11.9) |
| once per week | 3061 (16.6) |
| from 2 to 3 times per week | 3454 (18.7) |
| from 4 to 6 times per week | 1059 (5.7) |
| once per day | 1963 (10.6) |
| two or more times per day | 1036 (5.6) |
| missing | 226 (1.2) |
| Mediterranean Diet Adherence (PrediMed Score) | |
| Low | 2159 (11.7) |
| Medium | 12 904 (70.0) |
| High | 2893 (15.7) |
| Missing | 495 (2.7) |
| Health status | |
| Very good | 3124 (16.9) |
| Good | 13 080 (70.9) |
| Regular | 1960 (10.6) |
| Bad | 126 (0.7) |
| Very bad | 20 (0.1) |
| Missing | 141 (0.8) |
| Adopted | |
| Yes | 60 (0.3) |
| No | 18 243 (98.9) |
| Missing | 148 (0.8) |
| Body mass index | |
| Underweight | 47 (0.2) |
| Normal weight | 6083 (33.0) |
| Overweight | 7761 (42.1) |
| Obese | 4562 (24.7) |
| Missing | 89 (0.5) |
| Women related health | |
| Oral contraceptive use | |
| Never | 2351 (21.5) |
| Ever | 8404 (77.0) |
| Missing | 163 (1.5) |
| Hormone replacement therapy (HRT) use | |
| Never | 9280 (85.0) |
| Ever | 1317 (12.1) |
| Missing | 321 (2.9) |
| Men related health | |
| Prostate diseases | 660 (8.8) |
Continuous variables are presented as mean (SD) and categorical variables as n (%).
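The abstract defines overweight as a body mass index (BMI) of ≥25 and <30 kg/m², which matches the "Overweight" row above. The sketch below shows how the four BMI categories could be assigned. Only the overweight range comes from the source: the obese threshold (≥30) is inferred from it, and the underweight cut-off (<18.5) is an assumption based on the usual WHO convention. Function names are illustrative.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index in kg/m^2."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Map a BMI value to the categories used in the baseline table.

    Only the overweight range (>=25 and <30) is stated in the source;
    the 18.5 cut-off is an assumed WHO-style threshold.
    """
    if value < 18.5:
        return "Underweight"
    if value < 25:
        return "Normal weight"
    if value < 30:
        return "Overweight"
    return "Obese"


if __name__ == "__main__":
    example = bmi(80, 1.75)  # hypothetical participant
    print(round(example, 1), bmi_category(example))
```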
Figure 2. Number of variants included in the GCATcore by chromosome and type of genomic data (first GCATcore release August 2017). The legend shows between parentheses the total number of variants for raw data (genotyping before in silico imputation), imputed data, SNPs, InDels and SNPs in coding and non-coding regions. GCAT, Genomes for Life; InDels, insertions-deletions; SNPs, single nucleotide polymorphisms.