
The missing heritability in type 1 diabetes.

Haipeng Pang¹, Jian Lin¹, Shuoming Luo¹, Gan Huang¹, Xia Li¹, Zhiguo Xie¹, Zhiguang Zhou¹.

Abstract

Type 1 diabetes (T1D) is a complex autoimmune disease characterized by an absolute deficiency of insulin. It affects more than 20 million people worldwide and imposes an enormous financial burden on patients. The underlying pathogenic mechanisms of T1D are still obscure, but it is widely accepted that both genetics and the environment play an important role in its onset and development. Previous studies have identified more than 60 susceptibility loci associated with T1D, explaining approximately 80%-85% of the heritability. However, most identified variants confer only small increases in risk, which restricts their potential clinical application. In addition, a so-called 'missing heritability' phenomenon remains. While the gap between known and true heritability in T1D is small compared with that in other complex traits and disorders, further elucidation of T1D genetics has the potential to bring novel insights into its aetiology and provide new therapeutic targets. Many hypotheses have been proposed to explain the missing heritability, including variants that remain to be found (variants with small effect sizes, rare variants and structural variants) and interactions (gene-gene and gene-environment interactions, e.g. epigenetic effects). In this review, we introduce the possible sources of missing heritability and discuss the existing related knowledge in the context of T1D.


Keywords: gene-environment interactions; gene-gene interactions; missing heritability; rare variants; structural variants; type 1 diabetes


Year: 2022 PMID: 35603907 PMCID: PMC9545639 DOI: 10.1111/dom.14777

Source DB: PubMed Journal: Diabetes Obes Metab ISSN: 1462-8902 Impact factor: 6.408

INTRODUCTION

Currently, type 1 diabetes (T1D) is defined as an autoimmune‐mediated multifactorial disorder with a strong genetic component. Previous studies have identified more than 60 candidate loci for T1D, and different candidate genes are involved in different stages of the disease. For instance, some HLA (human leukocyte antigen) alleles and the PTPN22 (protein tyrosine phosphatase, non‐receptor type 22) rs2476601 variant are associated with the development of autoimmunity, whereas variants in CTLA‐4 (cytotoxic T lymphocyte‐associated protein 4), IFIH1 (interferon induced helicase C domain 1), SH2B3 (SH2B adaptor protein 3) and PTPN22 are related to the appearance of multiple autoantibodies. There are also considerable ethnic differences in T1D genetics: a recent study indicated that approximately one‐fifth of the susceptibility loci reported in Caucasians were non‐polymorphic or had comparatively low frequencies in the Chinese population, which might partly explain the lower T1D incidence in China. Before the advent of the genome‐wide association study (GWAS) era, only a few genetic loci were known to be associated with T1D. The HLA region was the first established risk locus for T1D and was identified by linkage studies. For many years, linkage studies have been used for genetic mapping of Mendelian traits and other traits that segregate in families, and this method has proven powerful for detecting risk factors with large effect sizes or diseases with a known mode of inheritance. However, from an evolutionary standpoint, risk variants with large effect sizes tend to be rare in the population because of negative selection, so the power of linkage studies is comparatively limited for complex diseases. By contrast, association studies are more powerful for detecting common alleles with comparatively small effect sizes. 
At first, association studies focused on candidate genes, and several genes, including INS (insulin), CTLA‐4, PTPN22 and IL2RA (interleukin 2 receptor α), were found to be associated with an increased risk of T1D. However, the candidate gene approach investigates only preselected loci and ignores the rest of the genome. The development of GWAS has dramatically improved the pace and efficiency of identifying T1D loci. GWAS represents a major improvement over candidate gene studies, in which genotyping is confined to a few functionally related loci and sample sizes are typically smaller. Because GWAS tests variants in a hypothesis‐free manner, a large number of additional T1D loci have been discovered. For instance, GWAS not only confirmed previously known T1D loci but also uncovered novel variants, such as those near the KIAA0350 gene and at UBASH3A (ubiquitin‐associated and SH3 domain containing A). These studies have provided valuable insights into the genetic architecture of T1D. The GWAS approach is based on the ‘common disease, common variant’ hypothesis, which assumes that common diseases result at least partly from common variants. However, most common variants confer only a small increase in disease risk and together explain only a small portion of the heritability of human biological traits and complex diseases. For instance, hundreds of independent variants have been associated with human height, an easily measured trait with high heritability, yet these loci explain less than 50% of the phenotypic variance. Similarly, more than 700 loci with small effect sizes have been identified for type 2 diabetes (T2D), explaining only 20% of its total heritability. In the context of T1D, by contrast, the identified loci explain approximately 80% of the heritability. 
However, this high explained heritability may be largely attributable to two major susceptibility regions, the HLA class II genes and the INS gene, which account for approximately 50% and 10% of T1D genetic susceptibility, respectively.

THE HERITABILITY OF T1D

Heritability refers to the proportion of phenotypic variance in a population that is attributable to genetic effects and represents the extent to which a trait or disease is genetically determined. In the traditional view, the total phenotypic variance ($V_P$) can be divided into a genetic component ($V_G$) and an environmental component ($V_E$), and broad‐sense heritability is then defined as the ratio $V_G/V_P$. Heritability is estimated by comparing the observed and expected phenotypic resemblance between relatives; one classic design compares the resemblance between monozygotic (MZ) and dizygotic (DZ) twins. Of note, confounding may bias heritability estimates. For instance, the estimate will be biased upward if the resemblance partly results from shared environmental effects. Previous studies have indicated that genetic factors play an important role in T1D susceptibility. The mean prevalence of T1D in siblings of patients is 6%, compared with 0.4% in the general population. In addition, after long‐term follow‐up, concordance rates for T1D exceed 50% in MZ twins and are 6%‐10% in DZ twins, emphasizing the importance of genetic predisposition in T1D. T1D heritability is estimated to exceed 50%. Notably, heritability describes genetic effects in a given population and environment, so estimates can vary among populations. For instance, twin studies estimated the additive genetic contribution to T1D at 72%‐88% in populations of European origin, whereas a family study estimated the heritability of T1D at 66.5% in East Asian populations. This discrepancy might be attributed to differences in environmental factors among populations. In addition, heritability estimates are largely based on childhood‐onset T1D; because concordance rates decline with increasing age at onset, heritability is likely to be lower for later‐onset T1D. 
The heritability estimates from different traits or diseases depend strongly on their genetic architecture. For instance, the estimated heritability is more than 50% for height and 30%‐70% for T2D.
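The quantities introduced in this section can be summarised compactly. The first line restates the variance decomposition above. The second is Falconer's classic twin approximation, added here only for illustration: it assumes purely additive genetic effects and equally shared environments for MZ and DZ twins, it estimates the narrow‐sense (additive) heritability $h^2$, and for a binary disease such as T1D the twin correlations $r_{MZ}$ and $r_{DZ}$ refer to the underlying liability scale, not to raw concordance rates. The third line applies the sibling recurrence risk ratio $\lambda_s$ to the prevalence figures quoted above, with $K_s$ the prevalence in siblings of patients and $K$ the population prevalence.

```latex
\begin{aligned}
V_P &= V_G + V_E, \qquad H^2 = \frac{V_G}{V_P} \\
h^2 &\approx 2\,(r_{MZ} - r_{DZ}) \\
\lambda_s &= \frac{K_s}{K} = \frac{0.06}{0.004} = 15
\end{aligned}
```

A $\lambda_s$ of 15 indicates strong familial clustering, but it is not itself a heritability estimate, since shared family environment also contributes to it.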

THE MISSING HERITABILITY OF T1D

Most of the heritability of T1D has been accounted for: single nucleotide polymorphism (SNP)‐based heritability explains 80%‐85% of the pedigree‐based heritability estimates. However, the remaining 15%‐20% of heritability has yet to be identified, and this discrepancy is commonly referred to as the missing heritability phenomenon. Given that individual differences in disease susceptibility are largely attributable to genetic factors, fully understanding the genetic component of T1D will contribute to improved prevention, diagnosis and treatment of this disease. Many explanations for the potential sources of missing heritability have been proposed, including large numbers of unmapped common variants with small effect sizes, rare and low‐frequency variants that are poorly detected by existing genotyping arrays, structural variants that are poorly captured by available arrays, and limited power to detect gene–gene and gene–environment interactions (e.g. epigenetic effects) (Figure 1).

FIGURE 1

The potential sources of missing heritability of type 1 diabetes. G‐G interaction, gene–gene interaction; G‐E interaction, gene–environment interaction

Genetic variants with small effect sizes

The first theory is that the GWAS approach fails to capture many variants with small effect sizes. A very stringent significance threshold is used to limit false positives, given the very large number of tests performed. However, many true associations may then be missed, especially for variants that have small effect sizes but still contribute to phenotypic variability and disease susceptibility. Therefore, the explained heritability could be increased by incorporating genetic variants with small effect sizes. For instance, it has been shown that 45% of the variance in human height can be explained when all SNPs are considered simultaneously, compared with 5% when only SNPs reaching genome‐wide significance are considered. Several solutions have been proposed to address this issue. For instance, a method was developed to estimate the genomic heritability of quantitative traits by fitting all SNPs simultaneously in a linear mixed model, and it showed that a substantial proportion of variation in liability is tagged by common SNPs for Crohn's disease (CD), bipolar disorder and T1D. Furthermore, user‐friendly software was developed to evaluate missing heritability by including all SNPs. In addition, a newer method, phenotype correlation–genotype correlation (PCGC) regression, was developed to estimate the contribution of common variants, and researchers found that it substantially increased the heritability explained by common variants for some common diseases, such as T1D. This hypothesis is further supported by the observation that more genetic variants are detected as sample sizes increase. For T1D, larger sample sizes, particularly in meta‐analyses of GWASs, have yielded an increasing number of risk loci. 
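Why a stringent threshold misses small effects, and why larger samples recover them, can be sketched numerically. This is an illustration under stated assumptions, not an analysis from the cited studies: the conventional genome‐wide level of 5 × 10⁻⁸ is treated as a Bonferroni correction for roughly one million independent tests, power uses a simple normal approximation, and the expected z statistic of 3 for a hypothetical small‐effect variant is an arbitrary choice. The expected test statistic grows with the square root of the sample size, so quadrupling the sample doubles it.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def critical_z(p_threshold: float) -> float:
    """Two-sided critical z value for a per-test p-value threshold."""
    return STD_NORMAL.inv_cdf(1 - p_threshold / 2)


def power(ncp: float, p_threshold: float) -> float:
    """Approximate two-sided power when the test statistic is N(ncp, 1) under the alternative."""
    z = critical_z(p_threshold)
    return STD_NORMAL.cdf(ncp - z) + STD_NORMAL.cdf(-ncp - z)


# Assumption: about one million independent common-variant tests genome-wide.
gw_threshold = bonferroni_threshold(0.05, 1_000_000)  # about 5e-8

# Hypothetical small-effect variant with expected z = 3 at the current sample size N;
# the expected z scales with sqrt(N), so 4N doubles it and 9N triples it.
for scale, label in [(1, "N"), (4, "4N"), (9, "9N")]:
    ncp = 3.0 * scale ** 0.5
    print(f"{label}: power at p<0.05 = {power(ncp, 0.05):.2f}, "
          f"at p<{gw_threshold:.0e} = {power(ncp, gw_threshold):.2f}")
```

With these assumptions the variant is usually detected at a nominal 0.05 level but rarely at genome‐wide significance until the sample is several times larger, which mirrors the growth in T1D loci as GWAS meta‐analyses expanded.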
In fact, given that large numbers of low‐odds‐ratio variants associated with T1D have been identified, polygenic risk scores (PRSs), which aggregate the effects of SNPs weighted by their estimated effect sizes, have been developed to measure and quantify the heritable risk of disease. In contrast to GWAS, which uses a very strict significance threshold, a PRS can include larger numbers of SNPs with more lenient association signals. In practice, the PRS can be used for T1D prediction. For example, a study predicting the progression of islet autoimmunity and T1D in high‐risk children indicated that the PRS could serve as an independent predictor of disease development. The PRS can also aid in discriminating between T1D and T2D, which is becoming increasingly difficult with the rising incidence of obesity. The Exeter group developed a T1D‐PRS that, combined with autoantibody testing, showed high discriminative ability between T1D and T2D. It should be noted that cases in that study were selected by age at diabetes onset within a given population, so its performance may not generalize to other populations or to patients diagnosed at different ages. Interestingly, the PRS may also contribute to the detection of missing heritability. A recent GWAS of T1D patients with low genetic risk scores identified 41 previously unreported loci, including two loci with common variants and 39 loci with rare variants. This strategy highlights the importance of stratifying patients when searching for missing heritability, because T1D itself is a heterogeneous and complex disease. In addition, some researchers have suggested that genetic elements such as the genome‐encoded T‐cell receptor (TCR) might have been overlooked for technical reasons. The TCR is the cognate partner of major histocompatibility complex (MHC) molecules, and the TCR genotype has been implicated in autoimmune diseases such as multiple sclerosis. T1D is a T‐cell–mediated autoimmune disease. 
Nevertheless, associations between TCR haplotypes and T1D remain understudied. Genome‐encoded TCRs have been observed to play an important role in T1D susceptibility in an MHC‐dependent fashion in non‐obese diabetic (NOD) mice and in multiple rat strains that model T1D. For example, rats expressing both a high‐risk class II MHC haplotype and TCR‐Vβ13a are highly susceptible to T1D, whereas in the absence of TCR‐Vβ13a, rats with a high‐risk MHC show low penetrance of T1D. In addition, depletion of Vβ13+ T cells has been reported to prevent the development of T1D. Therefore, germline variants within TCR regions may be viable candidates for T1D susceptibility and may explain part of the missing heritability. However, the exact role of the TCR in human T1D needs further investigation.
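In its simplest form, the PRS described earlier in this subsection is a weighted sum of risk‐allele counts. The sketch below uses simple p‐value thresholding only (no linkage disequilibrium pruning), and every variant identifier, log odds ratio and p‐value is hypothetical; published T1D scores are built from curated variant sets and weights that are not reproduced here. The two thresholds contrast a strict genome‐wide cut‐off with a more lenient one.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    rsid: str       # hypothetical variant identifier
    log_or: float   # per-allele effect size (log odds ratio) from GWAS summary statistics
    p_value: float  # GWAS association p-value


def select_variants(variants, p_threshold):
    """Keep variants whose GWAS p-value passes the chosen inclusion threshold."""
    return [v for v in variants if v.p_value < p_threshold]


def polygenic_risk_score(dosages, variants):
    """Weighted sum of risk-allele dosages (0, 1 or 2), weighted by log odds ratios."""
    return sum(v.log_or * dosages.get(v.rsid, 0) for v in variants)


# Hypothetical summary statistics, not real T1D variants.
gwas = [
    Variant("snp_a", 0.90, 1e-30),
    Variant("snp_b", 0.15, 2e-9),
    Variant("snp_c", 0.08, 1e-4),
    Variant("snp_d", 0.05, 0.01),
]
# Hypothetical individual: number of risk alleles carried at each variant.
person = {"snp_a": 1, "snp_b": 2, "snp_c": 1, "snp_d": 2}

strict = select_variants(gwas, 5e-8)   # genome-wide significant variants only
lenient = select_variants(gwas, 0.05)  # more lenient inclusion
print(round(polygenic_risk_score(person, strict), 2))   # 1.2
print(round(polygenic_risk_score(person, lenient), 2))  # 1.38
```

The lenient score adds the small contributions of variants that would not pass genome‐wide significance, which is the property that lets a PRS capture more heritable risk than the significant loci alone.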

Rare genetic variants

It has been suggested that rare genetic variants contribute to the missing heritability of common complex diseases. At present, there is considerable debate over the nature of genetic contributions to common disease susceptibility. In contrast to the traditional ‘common disease, common variant’ model, the ‘common disease, rare variant’ hypothesis argues that numerous rare genetic variants with comparatively high penetrance play a major role in the risk of common diseases. Population genetics theory suggests that strongly deleterious variants are rapidly removed from the population by negative selection, whereas mildly deleterious variants can persist at low frequencies. Population genetics studies have shown that most genetic variants with large functional effects tend to be rare and private, except for a small proportion of large‐effect variants that are common across populations. Indeed, recent deep‐sequencing studies have shown that rare and low‐frequency variants account for a surprisingly high proportion of the variants in different populations. Some researchers therefore believe that both common variants (minor allele frequency [MAF] > 5%) with low penetrance (small effect size) and rare variants (MAF < 1%) with high penetrance (large effect size) contribute to common complex diseases across the population. Rare genetic variants do not occur frequently enough to be captured by GWAS, and their effect sizes are not large enough to be detected by linkage analysis in family studies. Therefore, the identification of rare genetic variants is challenging with traditional approaches. However, the rapid development of next‐generation DNA sequencing has markedly enhanced the ability to detect rare variants. In addition, population biobanks have increased the power to detect disease associations by providing access to massive population cohorts. 
For example, a recent study used whole‐exome sequencing data from the UK Biobank combined with FinnGen data to assess associations of protein‐coding variants with multiple phenotypes and identified many novel disease associations, most notably in the rare and low‐frequency spectra. Rare variants have been suggested to explain a substantial proportion of the missing heritability of human physiological traits and disease susceptibility. For instance, in a whole‐genome sequencing (WGS) study of pulmonary arterial hypertension, including the identified rare variants increased the proportion of cases explained by genetics from the previously established 19.9% to 23.5%. In addition, recent research has suggested that rare variants, especially those in regions of low linkage disequilibrium, are an important source of the missing heritability of height and body mass index. However, some contradictory results have been obtained. Studies of T2D and related quantitative traits reflecting glycaemic control did not detect rare variant associations, in agreement with a large case‐control study in which whole‐genome and whole‐exome sequencing did not identify any rare variants associated with T2D. Some studies have investigated the role of rare and low‐frequency variants in T1D. Nejentsev et al. identified four rare variants by resequencing the exons and splice sites of 10 T1D candidate genes. All four were located in IFIH1, were associated with a lower risk of T1D and were predicted to alter the structure and expression of the encoded protein. The identification of these four rare variants pinpoints IFIH1 as a causal gene in a region previously identified by GWAS. Ge et al. identified rare deleterious variants in PTPN22 by deep sequencing of protein‐coding genes located in 49 previously reported T1D risk loci in multiply affected sibships of European ancestry. 
A major challenge in identifying rare variants is the limited resolution of traditional DNA sequencing technologies. WGS combined with imputation can enhance the ability to detect rare variants. For instance, by performing deep imputation of genotype data followed by GWAS testing, Forgetta et al. identified 27 independent variants, three of which were novel and had an MAF below 5%. This finding indicates that the identification of low‐frequency and rare variants can also lead to the discovery of T1D risk genes. In addition, a recently developed deep learning method for HLA imputation improves the accuracy of identifying low‐frequency and rare variants within MHC regions, which harbour extremely complex sequence variation and haplotypes. In conclusion, rare variants explain at least part of the missing heritability of T1D. Moreover, given that rare variants tend to be population specific and that existing studies have focused on European populations, future studies should pay more attention to other ethnic populations.
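The MAF cut‐offs used in this subsection (common > 5%, rare < 1%, low‐frequency in between) can be applied programmatically, and because individual rare variants are too infrequent to test one at a time, they are often aggregated per gene. The "burden" collapse shown here, which counts rare alleles per individual within a gene, is one standard rare‐variant strategy chosen for illustration; it is not the method of the studies cited above, and the genotype matrices are hypothetical.

```python
def maf_class(maf: float) -> str:
    """Classify a variant by minor allele frequency using the cut-offs quoted in the text."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("minor allele frequency must lie between 0 and 0.5")
    if maf > 0.05:
        return "common"
    if maf < 0.01:
        return "rare"
    return "low-frequency"


def burden_scores(genotypes, mafs, max_maf=0.01):
    """Per-individual count of rare minor alleles across the variants of one gene.

    genotypes: one list per individual of minor-allele counts (0, 1 or 2), one entry per variant.
    mafs: minor allele frequency of each variant, in the same order as the genotype columns.
    """
    rare_idx = [i for i, maf in enumerate(mafs) if maf < max_maf]
    return [sum(person[i] for i in rare_idx) for person in genotypes]


# Hypothetical gene with four variants and small case and control groups.
mafs = [0.30, 0.02, 0.004, 0.0008]
cases = [[1, 0, 1, 0], [2, 0, 0, 1], [0, 1, 1, 1]]
controls = [[1, 0, 0, 0], [0, 1, 0, 0], [2, 0, 0, 0]]

print([maf_class(m) for m in mafs])  # ['common', 'low-frequency', 'rare', 'rare']
print(burden_scores(cases, mafs))     # [1, 1, 2]
print(burden_scores(controls, mafs))  # [0, 0, 0]
```

In a real analysis the per‐gene scores would be compared between cases and controls with a regression or permutation test, with covariates such as ancestry included; a higher burden in cases would point to the gene as a rare‐variant susceptibility locus.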

Structural variants

Structural variants (SVs), especially copy number variants (CNVs), have been proposed as a potential source of missing heritability in complex diseases because previous association studies largely ignored them owing to the insufficient coverage of SNP genotyping arrays. In fact, SVs, and CNVs in particular, encompass more nucleotides in the genome than SNPs and represent an important form of variation. The rate at which new CNVs arise is 100 to 1000 times the rate of base‐pair substitutions, and these variations have a substantial effect on phenotypic variance. Therefore, it is plausible that SVs are important contributors to human diversity and disease susceptibility. SVs are large‐scale changes in the sequence or position of genomic segments, such as insertions, deletions, inversions, microsatellites and CNVs. SVs predominantly reside in non‐coding regions and do not directly change protein composition; however, they can modulate gene expression by affecting regulatory elements. In the context of T1D, a variable number tandem repeat (VNTR) located 596 bp upstream of the translational start site of the INS gene has been associated with T1D. By regulating insulin mRNA transcription in the thymus, the VNTR can influence the negative selection of insulin‐specific autoreactive T lymphocytes and thus immune tolerance. Several studies have evaluated the contributions of SVs to complex traits and disease susceptibility. CNVs, which are genomic segments larger than 1 kb present in a variable number of copies in the population, have gained attention as detection methods have improved. In 2010, the Wellcome Trust Case Control Consortium performed a large GWAS to assess the association between CNVs and eight common diseases, including T1D, in 16 000 cases and 3000 shared controls using a purpose‐designed array. 
The results indicated that most common CNVs were strongly correlated with SNPs genotyped by the HapMap project, and the authors concluded that common CNVs were unlikely to account for much of the heritability of complex diseases. However, the contributions of CNVs might be underestimated because allele dosage is ignored when analysing SNP‐array data. Another study explored the association between T1D and CNVs in low linkage disequilibrium with SNPs, using a custom comparative genomic hybridization array specifically designed to target untagged CNV loci, and did not identify novel T1D associations. Therefore, untagged CNVs are unlikely to contribute substantially to T1D heritability. Although common CNVs might fail to explain the missing heritability of T1D, a study suggested that rare CNVs could increase the burden of susceptibility to T1D. Future association studies of rare CNVs in large datasets could enable the identification of specific regions, thus providing insights into T1D pathogenesis. Some challenges remain for SV studies. For instance, given their variable nature and repetitive structure, many SVs remain poorly characterized by existing sequencing platforms. In addition, previous studies have mostly focused on large genomic elements, and smaller variable regions remain understudied.

Gene–gene interactions

Another theory to explain the missing heritability is the presence of gene–gene interactions, also called epistasis. The term ‘epistasis’ was first used to describe the masking of the effect of one variant by another variant at a separate locus. The concept has since been broadened to any statistical departure from the simple additive combination of two loci on a specific outcome scale. In a genetic association study, if the effect of one variant is altered or masked by another variant at a different locus, the power to detect the first variant is likely to be reduced, and their interaction will impede detection of the combined effects of the two variants. The situation becomes even more complicated when more than two loci are involved. Notably, epistasis in this sense refers to statistical interaction rather than biological or mechanistic interaction, in which different factors interact through direct physical or chemical reactions. Although identifying statistical epistasis cannot by itself elucidate the pathogenic mechanisms of complex diseases, it can improve the power to detect the genetic effects underlying phenotypes. For instance, in an analysis of real T1D data, the evidence for linkage at a single locus was stronger when its interaction with another locus was considered. It has been increasingly recognized that genetic interactions might account for a substantial proportion of the missing heritability. For instance, approximately 140 candidate loci for CD explain approximately 14% of the heritability of the disease, yet almost 80% of the missing heritability could be explained when genetic interactions are taken into account. Because heritability estimates assume that there are no interactions among disease‐causing variants, the missing heritability may result not only from as‐yet‐unidentified variants but also from ignored genetic interactions. 
However, other research has suggested that additive genetic effects explain a large proportion of the variance in continuous traits, whereas epistatic effects play only a comparatively small role. This may be because quantitative traits are influenced collectively by many genetic factors, each with a small effect, so that their combined effect is approximately additive. In complex diseases, by contrast, there is often a small number of major loci that can interact with each other through epistasis, which could explain part of the missing heritability. Some studies have investigated gene–gene interactions in T1D, but their results sometimes conflict. For example, Bergholdt et al. reported a statistical interaction between two genes involved in T‐cell activation, CBLB (casitas‐B‐lineage lymphoma b) and CTLA‐4, and found that the rs3772534 G allele of CBLB was overtransmitted to offspring with the G/G genotype of rs3087243 in CTLA‐4. However, a later study with a larger collection found no support for an interaction between rs3772534 and rs3087243. Similarly, contradictory results have been obtained concerning interactions between IL4R, IL4 and IL13. Given the inadequate sample sizes, some of these positive reports are probably false, because the false‐discovery rate is high in underpowered studies. Other studies have examined interactions between different HLA class II haplotypes in T1D and found that these interactions explain moderate but significant fractions of phenotypic variance. In addition, statistical interactions between HLA class II and both PTPN22 and CTLA‐4 have been shown in some sufficiently well‐powered studies. In conclusion, existing results indicate that gene–gene interactions could explain a fraction of the missing heritability in T1D. Future studies will need large sample sizes to enhance the power to detect genetic interactions in T1D.
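The point that epistasis is a departure from additivity "on a specific outcome scale" can be made concrete with a small numerical sketch. The two‐locus risks below are hypothetical, and the two interaction measures (the ratio of relative risks on the multiplicative scale, and the relative excess risk due to interaction, RERI, on the additive scale) are standard epidemiological measures chosen for illustration, not the methods used in the T1D studies cited above.

```python
def relative_risks(penetrance):
    """Relative risk of each two-locus carrier group versus carriers of neither risk allele."""
    baseline = penetrance[(0, 0)]
    return {group: risk / baseline for group, risk in penetrance.items()}


def multiplicative_interaction(rr):
    """Joint relative risk divided by the product of single-locus relative risks (1 = no interaction)."""
    return rr[(1, 1)] / (rr[(1, 0)] * rr[(0, 1)])


def additive_interaction(rr):
    """Relative excess risk due to interaction, RERI (0 = no interaction on the additive scale)."""
    return rr[(1, 1)] - rr[(1, 0)] - rr[(0, 1)] + 1


# Hypothetical disease risks, keyed by (carrier at locus A, carrier at locus B).
penetrance = {(0, 0): 0.01, (1, 0): 0.02, (0, 1): 0.03, (1, 1): 0.06}
rr = relative_risks(penetrance)

print(round(multiplicative_interaction(rr), 6))  # 1.0, no interaction on the multiplicative scale
print(round(additive_interaction(rr), 6))        # 2.0, positive interaction on the additive scale
```

The same four risks show no interaction on one scale and a clear interaction on the other, which is one reason why reported gene–gene interactions depend on the model and scale chosen and can be difficult to replicate.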

Gene–environment interactions

Gene–environment interactions have also been suggested as a possible explanation for the missing heritability of complex diseases (Figure 1). Although genetic factors are the major determinant of T1D risk, genetics alone cannot explain the dramatic changes in T1D epidemiology. The incidence of T1D has increased considerably over the past 30 years. This rising trend is most likely explained by changes in environmental factors, because the genetic makeup of a population remains almost unchanged over such a short period. Furthermore, the rising incidence of T1D has been accompanied by a decreasing proportion of patients carrying high‐risk HLA genotypes, suggesting increasing environmental pressure. In addition, age at onset has been associated with distinct clinical profiles of T1D. An immigrant study also showed that second‐generation immigrants to Sweden, a country with a high prevalence of T1D, have an increased risk of developing T1D. These studies show that environmental factors play an important role in T1D. It is often hypothesized that genetic factors determine the predisposition to T1D, while environmental factors trigger disease onset (Figure 2). Therefore, a better understanding of the environmental determinants of T1D would not only help reveal the underlying pathogenic mechanisms but also provide novel targets to prevent or delay the disease.

FIGURE 2

The pathogenesis of type 1 diabetes (T1D). Both genetic and environmental factors contribute to the onset and development of T1D. Epigenetics serves as a bridge between these two factors. G‐G interaction, gene–gene interaction

The involvement of both genetic and environmental factors in T1D is well established. However, most research has focused on identifying these factors in isolation. Including gene–environment interactions has been shown to improve the statistical power to identify gene–disease associations. In addition, for observational studies of adverse environmental exposures, which cannot be tested in randomized controlled trials, demonstrating the expected gene–environment interactions can provide further support for a causal inference. Furthermore, identifying gene–environment interactions will improve understanding of biological interactions at the molecular level. Evidence for gene–environment interactions in complex diseases falls into two categories. Direct evidence comes from statistical evaluation of gene–environment interactions. For instance, both NOD2 variants and cigarette smoking are well‐characterized risk factors for CD, and a case‐only study of their relationship found a significant negative interaction. However, this finding needs to be confirmed in epidemiological studies, and the potential mechanisms warrant further investigation. In addition to direct evidence, there is more indirect evidence for gene–environment interactions. A prominent example is the role of epigenetics in disease risk. Epigenetics, which mainly involves DNA methylation, histone modification and non‐coding RNA, refers to heritable changes in gene expression, and thus in cell function, that occur without alteration of the DNA sequence. 
Because epigenetic marks are malleable and can be affected by environmental exposures, epigenetics is considered a bridge between heritable and environmental factors (Figure 2). For instance, smoking has been shown to alter DNA methylation at various loci. In addition, a recent study indicated that the human microbiome could influence important traits by interacting with human genotypes. However, epigenetic contributions are systematically missed by conventional GWAS because epigenetic modifications do not alter the genomic sequence. Therefore, a model of epigenetic inheritance, as a supplement to Mendelian heredity, may explain the part of the missing heritability that DNA sequence‐based analyses cannot detect. In addition, epigenome‐wide association studies (EWASs) provide an efficient approach to systematically assess epigenetic variation related to traits or complex diseases. Furthermore, it has been suggested that the missing heritability might be associated with stochastic effects arising from genomic instability and environmental triggers rather than with mutations in particular sets of genes; genetically identical organisms in the same controlled environment can exhibit distinct phenotypes, a phenomenon that may be attributed to stochastic variation. Gene–environment interactions have been implicated in the onset and development of T1D. A case‐only study observed differences in birth month distributions among individuals carrying various HLA‐DQ genotypes. Because different birth seasons are associated with different rates of viral infection and different vitamin D levels, this study suggests that environmental factors modify the T1D risk conferred by HLA alleles. Furthermore, an EWAS identified 132 differentially methylated loci for T1D in monocytes from 15 pairs of MZ twins. 
Later, the same group performed an EWAS across 406 365 CpGs in 52 MZ twins discordant for T1D and identified a substantial enrichment of differentially variable CpG positions in patients with T1D compared with their healthy co‐twins and unrelated healthy individuals. In addition, T1D risk variants have been suggested to alter susceptibility to viral infections, thus affecting autoimmune responses. For instance, an in vitro study indicated that the T1D risk allele HLA‐DR4 was involved in the hyper‐responsiveness of T cells to Coxsackie B4 virus (CBV4) antigens, and multiple lines of evidence have suggested that CBV4 is associated with the onset of T1D. Therefore, it is plausible that interactions between infections and genes contribute to T1D risk and account for some of the missing heritability. In conclusion, existing studies indicate that gene–environment interactions contribute to the pathogenesis of T1D and might partially explain the missing heritability. However, large‐scale gene–environment interaction research faces significant practical and methodological challenges; for example, standardized measurement of environmental exposures across studies is difficult to achieve.
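The case‐only design mentioned in this subsection (the NOD2–smoking and HLA‐DQ birth‐month studies) estimates a gene–environment interaction from cases alone. A minimal sketch of the calculation follows, assuming, as the design requires, that genotype and exposure are independent in the source population; the counts and the exposure definition are hypothetical, and the confidence interval uses the standard Woolf (log odds ratio) approximation. Under that assumption, the odds ratio estimates departure from a multiplicative joint effect: values above 1 suggest a positive interaction and values below 1 a negative one.

```python
import math


def case_only_interaction_or(a: int, b: int, c: int, d: int) -> float:
    """Case-only estimate of the gene-environment interaction odds ratio.

    Counts are among cases only:
        a = carriers exposed,     b = carriers unexposed,
        c = non-carriers exposed, d = non-carriers unexposed.
    Valid only if genotype and exposure are independent in the source population.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive; consider a continuity correction")
    return (a * d) / (b * c)


def woolf_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Approximate 95% confidence interval for the odds ratio on the log scale (Woolf's method)."""
    log_or = math.log(case_only_interaction_or(a, b, c, d))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts among T1D cases, e.g. exposure = birth in a given season,
# carrier = a high-risk HLA-DQ genotype.
counts = dict(a=120, b=180, c=200, d=500)
or_ge = case_only_interaction_or(**counts)
low, high = woolf_ci(**counts)
print(f"interaction OR = {or_ge:.2f} (95% CI {low:.2f} to {high:.2f})")
```

The design is efficient because no controls are needed, but if genotype and exposure are correlated in the population (for example through population stratification), the estimate is biased, which is one reason such findings require confirmation in epidemiological studies.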

DISCUSSION

Identifying the genes that confer susceptibility to common diseases is a major challenge for genetic epidemiology. Over the past several years, however, technological progress, especially the development of GWASs, has allowed further characterization of the genetic components of common diseases. GWAS data show that the heritability explained by SNPs is lower than the heritability estimated using traditional epidemiological measures; this gap is the so‐called missing heritability. Several hypotheses have been put forward to explain this ‘dark matter’ of genomics, mainly invoking variants that remain to be found and gene–gene or gene–environment interactions. Previous GWASs have identified more than 60 loci associated with T1D, which explain 80%–85% of its heritability. However, complex diseases, including T1D, result from multiple genetic and environmental factors that interact through extremely complex networks. Heritability measures aim to quantify how much phenotypic variability is attributable to genetics versus the environment, but this distinction is difficult to make when the two interact, so these interactions should also be considered when exploring disease pathogenesis. In some cases, these hypotheses have helped account for part of the missing heritability. However, because this research area is still in its infancy, further efforts are needed to overcome numerous theoretical and practical obstacles. Although high‐throughput genotyping and sequencing technologies have enabled the identification of numerous genetic variants or loci related to complex diseases, GWAS alone has provided limited insight into the exact molecular mechanisms of disease development, mainly because the overwhelming majority of these polygenic determinants lie in non‐coding regions of the genome and their functional roles remain to be confirmed.
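To compare SNP‐based and total heritability for a binary disease such as T1D, SNP heritability estimated on the observed 0/1 case–control scale is commonly transformed to the liability scale, on which family‐ and twin‐based estimates are usually expressed. The Python sketch below assumes the transformation described by Lee et al. (2011; reference 4 in the list below), h2_L = h2_O × K²(1 − K)² / (z² × P(1 − P)), where K is the population prevalence, P the proportion of cases in the sample and z the standard normal density at the liability threshold. The function names and any input values are illustrative assumptions rather than figures from this review.

```python
from statistics import NormalDist


def observed_to_liability(h2_observed: float, prevalence: float, case_fraction: float) -> float:
    """Convert observed-scale (0/1) SNP heritability to the liability scale.

    prevalence (K): disease prevalence in the population.
    case_fraction (P): proportion of cases in the case-control sample.
    """
    if not (0.0 < prevalence < 1.0 and 0.0 < case_fraction < 1.0):
        raise ValueError("prevalence and case_fraction must lie strictly between 0 and 1")
    std_normal = NormalDist()
    threshold = std_normal.inv_cdf(1.0 - prevalence)  # liability threshold t
    z = std_normal.pdf(threshold)  # standard normal density at the threshold
    k, p = prevalence, case_fraction
    return h2_observed * (k * (1.0 - k)) ** 2 / (z ** 2 * p * (1.0 - p))


def missing_fraction(h2_total: float, h2_explained: float) -> float:
    """Share of the total heritability not accounted for by identified variants."""
    if h2_total <= 0.0:
        raise ValueError("h2_total must be positive")
    return (h2_total - h2_explained) / h2_total
```

For example, if identified loci explain 80%–85% of T1D heritability, as stated above, `missing_fraction` returns 0.15–0.20 of the total.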
In the post‐GWAS era, the functional annotation and mechanistic characterization of these loci are therefore the next major task. In addition, new analytical strategies may help elucidate the missing heritability. For example, previous studies mostly focused on the genome and ignored data derived from the transcriptome or epigenome, leaving gaps between genetic variation and phenotype. Alternative splicing (AS), which allows a single gene to generate multiple RNA and protein isoforms, can influence gene expression through post‐transcriptional regulation, and transcriptome analysis has indicated that AS changes might contribute to the development of T1D. Integrative multiomics analysis may offer a new route to understanding disease pathogenesis. For instance, a recent study combined large‐scale GWAS with single‐cell epigenomics to translate T1D risk variants into mechanistic insights; risk variants within multiple T1D signals overlapped with exocrine‐specific cis‐regulatory elements in the pancreas, suggesting that the exocrine pancreas might play a role in the pathogenesis of T1D. Genetic architecture also varies widely among diseases and biological traits. For instance, susceptibility to infectious diseases is often associated with variants with large effects, whereas complex phenotypes such as red blood cell counts, height and low‐density lipoprotein levels often result from the joint action of many loci with small effects. Diabetes illustrates this diversity: T1D risk is largely determined by the HLA region, whereas type 2 diabetes (T2D) depends on the combined effect of many susceptibility variants with small effect sizes. The contribution of epistasis also differs between conditions.
For instance, pervasive epistatic effects have been reported in autoimmune conditions, whereas small additive genetic effects play a more important role in continuous traits. Different strategies should therefore be considered when exploring missing heritability. The research concerning the missing heritability of T1D is summarized above, but other factors might also contribute. For instance, parent‐of‐origin effects, in which the phenotypic effect of an allele depends on the parent from whom it is inherited, have been documented in multiple diseases, including T1D; their contribution to heritability may be overlooked because they are difficult to detect. In addition, most GWASs have used additive allelic models, which can miss candidate genes with recessive effects. A recent GWAS meta‐analysis using a recessive model identified 51 loci associated with T2D, including five novel variants not reported by previous additive analyses. Recessive modelling may therefore provide another way to detect new genetic associations. The proportion of heritability explained by known loci is higher for T1D than for other common complex diseases. Nevertheless, fully understanding the genetics of T1D could further elucidate its pathogenesis and improve prediction and prevention. It has been suggested that by the time individuals with T1D become symptomatic, their beta cell mass has already fallen to 20%–30% of normal, which represents a very late phase of the disease. Early recognition of individuals at high risk for T1D would therefore offer an opportunity to prevent or even reverse disease progression. Among the risk factors for T1D, genetic risk is considered particularly important because it is present from birth and does not change over time.
Genetic screening of children could therefore help identify individuals at high risk of T1D and support the development of primary prevention. In addition, clinical diabetes is preceded by an asymptomatic phase characterized by the presence of islet autoantibodies, and screening for autoantibodies is an effective way to predict autoimmune progression. Combining genetic screening with autoantibody screening in children could therefore improve the identification of populations at high risk of T1D. Furthermore, because T1D is highly heterogeneous among patients, identifying the sources of its missing heritability could support the development of individualized medicine.
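The genetic screening discussed above is often operationalized as a weighted genetic risk score, in which each risk allele a child carries contributes the logarithm of its per‐allele odds ratio. The Python sketch below assumes independent, additive effects on the log‐odds scale, together with hypothetical variant labels, odds ratios and cut‐off; it ignores interactions such as HLA haplotype combinations, so it illustrates the arithmetic rather than any validated T1D score.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskVariant:
    label: str         # hypothetical variant label, not a real variant ID
    odds_ratio: float  # assumed per-allele odds ratio


def genetic_risk_score(dosages: dict[str, int], panel: list[RiskVariant]) -> float:
    """Weighted score: sum of log(OR) x risk-allele count (0, 1 or 2) over the panel."""
    score = 0.0
    for variant in panel:
        count = dosages[variant.label]  # raises KeyError if a variant was not genotyped
        if count not in (0, 1, 2):
            raise ValueError(f"{variant.label}: allele count must be 0, 1 or 2")
        score += math.log(variant.odds_ratio) * count  # protective alleles (OR < 1) lower the score
    return score


def flag_high_risk(scores: dict[str, float], cutoff: float) -> list[str]:
    """Return the sorted identifiers whose score reaches the cut-off."""
    return sorted(child for child, score in scores.items() if score >= cutoff)


# Usage with a hypothetical three-variant panel and two children.
panel = [RiskVariant("variant_A", 3.0), RiskVariant("variant_B", 1.3), RiskVariant("variant_C", 0.8)]
children = {
    "child_1": {"variant_A": 2, "variant_B": 1, "variant_C": 0},
    "child_2": {"variant_A": 0, "variant_B": 1, "variant_C": 2},
}
scores = {child: genetic_risk_score(d, panel) for child, d in children.items()}
print(flag_high_risk(scores, cutoff=1.0))
```

As the text notes, such a score would be combined with islet autoantibody screening rather than used alone to identify children at high risk.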
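The point made in this discussion about additive versus recessive GWAS models can be made concrete with the Cochran–Armitage trend test, a standard single‐marker test for case–control genotype counts whose genotype scores encode the assumed genetic model. The Python sketch below uses hypothetical genotype counts in which only homozygous carriers of the risk allele have increased risk, and compares additive scores (0, 1, 2) with recessive scores (0, 0, 1); it is an illustration, not the method used in the recessive meta‐analysis cited above.

```python
import math
from typing import Sequence

# Genotype scores for 0, 1 and 2 copies of the risk allele under each model.
MODEL_SCORES = {
    "additive": (0.0, 1.0, 2.0),
    "dominant": (0.0, 1.0, 1.0),
    "recessive": (0.0, 0.0, 1.0),
}


def trend_test(cases: Sequence[int], controls: Sequence[int], model: str) -> tuple[float, float]:
    """Cochran-Armitage trend test; returns (chi-square statistic with 1 df, p-value).

    cases/controls: genotype counts ordered as 0, 1, 2 copies of the risk allele.
    """
    if len(cases) != 3 or len(controls) != 3:
        raise ValueError("expected counts for exactly three genotypes")
    t = MODEL_SCORES[model]
    r_total, s_total = sum(cases), sum(controls)
    n = r_total + s_total
    col = [r + s for r, s in zip(cases, controls)]  # genotype totals
    stat = sum(ti * (r * s_total - s * r_total) for ti, r, s in zip(t, cases, controls))
    var = sum(ti * ti * c * (n - c) for ti, c in zip(t, col))
    var -= 2 * sum(t[i] * t[j] * col[i] * col[j] for i in range(3) for j in range(i + 1, 3))
    var *= r_total * s_total / n
    if var <= 0:
        raise ValueError("variance is zero; the scores do not separate the observed genotypes")
    chi2 = stat * stat / var
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of a chi-square with 1 df
    return chi2, p_value


# Hypothetical counts: heterozygotes have the same case:control ratio as non-carriers.
case_counts = (300, 500, 200)
control_counts = (330, 550, 120)
for model in ("additive", "recessive"):
    chi2, p = trend_test(case_counts, control_counts, model)
    print(f"{model:>9}: chi2 = {chi2:.2f}, p = {p:.2g}")
```

With these counts the recessive scores give a larger statistic than the additive scores, which is the situation in which an additive‐only scan can understate a recessive signal.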

AUTHOR CONTRIBUTIONS

H.P. searched references, wrote the first draft of the paper and revised the text. J.L., S.L., G.H. and X.L. critically revised the text and provided substantial scientific contributions. Z.X. and Z.Z. proposed the project and revised the manuscript. All the authors approved the final version of the manuscript.

CONFLICT OF INTEREST

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

PEER REVIEW

The peer review history for this article is available at https://publons.com/publon/10.1111/dom.14777.

REFERENCES

1. The genetics of complex autoimmune diseases: non-MHC susceptibility genes.

Authors: A Wandstrat; E Wakeland
Journal: Nat Immunol Date: 2001-09

2. Effect of including environmental data in investigations of gene-disease associations in the presence of qualitative interactions.

Authors: Elizabeth Williamson; Anne-Louise Ponsonby; John Carlin; Terry Dwyer
Journal: Genet Epidemiol Date: 2010-09

3. Molecular mechanisms of epistasis within and between genes.

Authors: Ben Lehner
Journal: Trends Genet Date: 2011-06-22

4. Estimating missing heritability for disease from genome-wide association studies.

Authors: Sang Hong Lee; Naomi R Wray; Michael E Goddard; Peter M Visscher
Journal: Am J Hum Genet Date: 2011-03-03

5. MHC-environment interactions leading to type 1 diabetes: feasibility of an analysis of HLA DR-DQ alleles in relation to manifestation periods and dates of birth.

Authors: K Badenhoop; H Kahles; C Seidl; O Kordonouri; E R Lopez; M Walter; S Rosinger; A Ziegler; B O Böhm
Journal: Diabetes Obes Metab Date: 2009-02

6. Genome-wide association study of CNVs in 16,000 cases of eight common diseases and 3,000 shared controls.

Authors: Nick Craddock; Matthew E Hurles; Niall Cardin; Richard D Pearson; Vincent Plagnol; Samuel Robson; Damjan Vukcevic; Chris Barnes; Donald F Conrad; Eleni Giannoulatou; Chris Holmes; Jonathan L Marchini; Kathy Stirrups; Martin D Tobin; Louise V Wain; Chris Yau; Jan Aerts; Tariq Ahmad; T Daniel Andrews; Hazel Arbury; Anthony Attwood; Adam Auton; Stephen G Ball; Anthony J Balmforth; Jeffrey C Barrett; Inês Barroso; Anne Barton; Amanda J Bennett; Sanjeev Bhaskar; Katarzyna Blaszczyk; John Bowes; Oliver J Brand; Peter S Braund; Francesca Bredin; Gerome Breen; Morris J Brown; Ian N Bruce; Jaswinder Bull; Oliver S Burren; John Burton; Jake Byrnes; Sian Caesar; Chris M Clee; Alison J Coffey; John M C Connell; Jason D Cooper; Anna F Dominiczak; Kate Downes; Hazel E Drummond; Darshna Dudakia; Andrew Dunham; Bernadette Ebbs; Diana Eccles; Sarah Edkins; Cathryn Edwards; Anna Elliot; Paul Emery; David M Evans; Gareth Evans; Steve Eyre; Anne Farmer; I Nicol Ferrier; Lars Feuk; Tomas Fitzgerald; Edward Flynn; Alistair Forbes; Liz Forty; Jayne A Franklyn; Rachel M Freathy; Polly Gibbs; Paul Gilbert; Omer Gokumen; Katherine Gordon-Smith; Emma Gray; Elaine Green; Chris J Groves; Detelina Grozeva; Rhian Gwilliam; Anita Hall; Naomi Hammond; Matt Hardy; Pile Harrison; Neelam Hassanali; Husam Hebaishi; Sarah Hines; Anne Hinks; Graham A Hitman; Lynne Hocking; Eleanor Howard; Philip Howard; Joanna M M Howson; Debbie Hughes; Sarah Hunt; John D Isaacs; Mahim Jain; Derek P Jewell; Toby Johnson; Jennifer D Jolley; Ian R Jones; Lisa A Jones; George Kirov; Cordelia F Langford; Hana Lango-Allen; G Mark Lathrop; James Lee; Kate L Lee; Charlie Lees; Kevin Lewis; Cecilia M Lindgren; Meeta Maisuria-Armer; Julian Maller; John Mansfield; Paul Martin; Dunecan C O Massey; Wendy L McArdle; Peter McGuffin; Kirsten E McLay; Alex Mentzer; Michael L Mimmack; Ann E Morgan; Andrew P Morris; Craig Mowat; Simon Myers; William Newman; Elaine R Nimmo; Michael C O'Donovan; Abiodun Onipinla; Ifejinelo Onyiah; 
Nigel R Ovington; Michael J Owen; Kimmo Palin; Kirstie Parnell; David Pernet; John R B Perry; Anne Phillips; Dalila Pinto; Natalie J Prescott; Inga Prokopenko; Michael A Quail; Suzanne Rafelt; Nigel W Rayner; Richard Redon; David M Reid; Susan M Ring; Neil Robertson; Ellie Russell; David St Clair; Jennifer G Sambrook; Jeremy D Sanderson; Helen Schuilenburg; Carol E Scott; Richard Scott; Sheila Seal; Sue Shaw-Hawkins; Beverley M Shields; Matthew J Simmonds; Debbie J Smyth; Elilan Somaskantharajah; Katarina Spanova; Sophia Steer; Jonathan Stephens; Helen E Stevens; Millicent A Stone; Zhan Su; Deborah P M Symmons; John R Thompson; Wendy Thomson; Mary E Travers; Clare Turnbull; Armand Valsesia; Mark Walker; Neil M Walker; Chris Wallace; Margaret Warren-Perry; Nicholas A Watkins; John Webster; Michael N Weedon; Anthony G Wilson; Matthew Woodburn; B Paul Wordsworth; Allan H Young; Eleftheria Zeggini; Nigel P Carter; Timothy M Frayling; Charles Lee; Gil McVean; Patricia B Munroe; Aarno Palotie; Stephen J Sawcer; Stephen W Scherer; David P Strachan; Chris Tyler-Smith; Matthew A Brown; Paul R Burton; Mark J Caulfield; Alastair Compston; Martin Farrall; Stephen C L Gough; Alistair S Hall; Andrew T Hattersley; Adrian V S Hill; Christopher G Mathew; Marcus Pembrey; Jack Satsangi; Michael R Stratton; Jane Worthington; Panos Deloukas; Audrey Duncanson; Dominic P Kwiatkowski; Mark I McCarthy; Willem Ouwehand; Miles Parkes; Nazneen Rahman; John A Todd; Nilesh J Samani; Peter Donnelly
Journal: Nature Date: 2010-04-01

7. Common vs. rare allele hypotheses for complex diseases.

Authors: Nicholas J Schork; Sarah S Murray; Kelly A Frazer; Eric J Topol
Journal: Curr Opin Genet Dev Date: 2009-05-28

8. Genotype effects and epistasis in type 1 diabetes and HLA-DQ trans dimer associations with disease.

Authors: B P C Koeleman; B A Lie; D E Undlien; F Dudbridge; E Thorsby; R R P de Vries; F Cucca; B O Roep; M J Giphart; J A Todd
Journal: Genes Immun Date: 2004-08

9. Additive and interaction effects at three amino acid positions in HLA-DQ and HLA-DR molecules drive type 1 diabetes risk.

Authors: Xinli Hu; Aaron J Deutsch; Tobias L Lenz; Suna Onengut-Gumuscu; Buhm Han; Wei-Min Chen; Joanna M M Howson; John A Todd; Paul I W de Bakker; Stephen S Rich; Soumya Raychaudhuri
Journal: Nat Genet Date: 2015-07-13