Literature DB >> 29986096

A coding and non-coding transcriptomic perspective on the genomics of human metabolic disease.

James A Timmons^1,2, Philip J Atherton³, Ola Larsson⁴, Sanjana Sood¹, Ilya O Blokhin⁵, Robert J Brogan², Claude-Henry Volmar⁵, Andrea R Josse⁶, Cris Slentz⁷, Claes Wahlestedt⁴, Stuart M Phillips⁶, Bethan E Phillips³, Iain J Gallagher^2,8, William E Kraus⁷.

Abstract

Genome-wide association studies (GWAS), relying on hundreds of thousands of individuals, have revealed >200 genomic loci linked to metabolic disease (MD). Loss of insulin sensitivity (IS) is a key component of MD and we hypothesized that discovery of a robust IS transcriptome would help reveal the underlying genomic structure of MD. Using 1,012 human skeletal muscle samples, detailed physiology and a tissue-optimized approach for the quantification of coding (>18,000) and non-coding (>15,000) RNA (ncRNA), we identified 332 fasting IS-related genes (CORE-IS). Over 200 had a proven role in the biochemistry of insulin and/or metabolism or were located at GWAS MD loci. Over 50% of the CORE-IS genes responded to clinical treatment; 16 quantitatively tracking changes in IS across four independent studies (P = 0.0000053: negatively: AGL, G0S2, KPNA2, PGM2, RND3 and TSPAN9 and positively: ALDH6A1, DHTKD1, ECHDC3, MCCC1, OARD1, PCYT2, PRRX1, SGCG, SLC43A1 and SMIM8). A network of ncRNA positively related to IS and interacted with RNA coding for viral response proteins (P < 1 × 10-48), while reduced amino acid catabolic gene expression occurred without a change in expression of oxidative-phosphorylation genes. We illustrate that combining in-depth physiological phenotyping with robust RNA profiling methods, identifies molecular networks which are highly consistent with the genetics and biochemistry of human metabolic disease.

Entities: Chemical

Mesh：

Substances：

Year: 2018 PMID： 29986096 PMCID： PMC6125682 DOI： 10.1093/nar/gky570

Source DB: PubMed Journal: Nucleic Acids Res ISSN： 0305-1048 Impact factor: 16.971

INTRODUCTION

Nucleic acid profiling technologies combined with systems biology and in-depth clinical phenotyping provide unparalleled opportunities to study mechanisms of human disease (1–6). Type 2 diabetes (T2DM), whose increased prevalence is proposed to be driven by obesity (7), is a complex metabolic disease involving gene-environment interactions reflective of excess nutrient intake, insufficient physical activity and, potentially, ageing. A feature of T2DM is a variable loss of peripheral insulin sensitivity (IS) (8,9); in humans predominantly reflecting reduced efficacy of insulin signaling (‘insulin resistance’, IR) in skeletal muscle and liver (10,11). GWAS analyses have revealed >200 genomic loci linked to phenotypes underpinning the pathophysiology of T2DM (12). In turn, these loci provide a link to thousands of potential ‘disease’ genes as, by design, GWAS projects do not identify the individual gene responsible for the genetic association (13–15). One recent T2DM GWAS analysis identified 128 markers across 113 genomic loci using 26,676 T2DM cases and 132,532 controls, with an additional ∼70,000 samples used for post-hoc validation (16). Follow-up informatics analysis suggested these loci harbor genes that predominantly impact on insulin secretion and adipocyte or hepatocyte biology. However, estimation of the causal risk genes is currently imprecise (13,14), constraining the contextualization of GWAS results, such that additional lines of evidence to help identify the most likely ‘disease gene’ at otherwise validated loci, is warranted (15,17). Recent efforts to apply expression quantitative trait loci (eQTL) analysis to the genomics of metabolic disease has yielded limited progress (18), and this probably reflects the reported assumptions limiting the sensitivity of eQTL analysis (19), as well as suboptimal tissue transcriptomics. Direct quantification of the human coding and non-coding transcriptome in tissues physiologically responsible for regulating IS provides an alternative, and more quantitative route to associate individual genes with the genomics underpinning metabolic disease (20). Furthermore, characterizing the transcriptional response to treatments which improve IS, would support a causal role for individual genes, particularly if consistent with genetic association analysis (6). Historically, clinical RNA profiling studies have been small, and have relied on DNA microarrays biased towards the 3’ end of protein-coding transcripts (21–23), resulting in a lack of a consistent global molecular narrative for IS in skeletal muscle. This lack of clarity reflects technical caveats, such as concurrent drug-treatment (for T2DM patients), inadequate consideration of clinical phenotype (23,24), as well as incomplete capture of the transcriptome (25) – and limited statistical power (26). More modern transcriptomic techniques, such as short-read RNA sequencing (RNA-seq) and full transcript ‘tiling-type’ microarrays (27) attempt to provide a complete and accurate view of the transcriptome. While potentially more-suited for studying cells in culture (28), RNA-seq analysis of muscle tissue provides a skewed measure of the protein-coding transcriptome and reproducibility in clinical studies remains surprisingly low, e.g. R2 < 0.7 (29,30) – too low for building reproducible diagnostics. The reason for this, in part, is that each tissue has a unique and skewed transcriptome, such that the resulting RNA-seq DNA library has an over- and -variable representation from – in the case of muscle – contractile and mitochondrial genes (29–32). This characteristic of the DNA library can also invalidate subsequent pathway analysis (33). In addition, RNA-seq profiles of individual tissues (25,30,34,35) do not include the majority of long non-coding RNAs (lncRNAs) and thus to date at least 50% of the tissue transcriptome remains unexplored in most clinical cohorts. In the present study, we combined the benefits of signal-quantification from newer generation full transcriptome microarrays (dedicated non-competitive sequence-specific probes), with a novel tissue-specific transcriptome assembly approach, to over-come limitations of conventional (36) array analysis, while providing far more extensive analysis of lncRNAs than RNA-seq (32,37). In combination with detailed phenotyping, and multiple clinical intervention studies, we identified causality-linked and dynamic components of the metabolic disease transcriptome, providing an independent global genomic perspective, in support of many GWAS-MD loci, in a very resource efficient manner.

MATERIALS AND METHODS

All new arrays has been deposited at GEO (GSE104235). The remainder of our existing array data and data used from other labs are also available at GEO (GSE59880, GSE47881, GSE47969, GSE48278, GSE72462, GSE73142, GSE83619 and GSE13070). The transcriptome was defined by the latest generation gene-chip pipeline and then utilized to select the valid transcriptome from earlier generation technologies (Figure 1).

Figure 1.

Analysis process. Human muscle biopsy RNA profiles came from clinical projects named STRRIDE I, II (S1–2) and III (S-PD), Meta-Predict (MP), TIFIN, DRET, Gallagher et al. (GALL), Weigert et al. (WG), Sears et al. (SEARS), Heritage (HERTG) and IDEAL (2,22,37,38,46,50–54,,56). The present project involved the production and analysis of 470 new samples profiled on the HTA 2.0 Affymetrix platform along with 144 new samples profiled on the Illumina 12 platform; and 466 of our previously published (22,37,53,54) U133+2 muscle RNA profiles and literature cohorts (where appropriate clinical data was available). The HTA 2.0 platform was processed with a 169 796 probe-set CDF (ENST), representing ∼18 000 protein-coding genes and ∼16 000 non-coding genes. The older array platform offered an opportunity to replicate protein-coding gene expression with 73 654 probe-sets (ENST) being common to the HTA 2.0 platform. All samples and clinical data underwent a rigorous quality control check to ensure that the microarray profile was technically robust (minimal variation in NUSE score), and the insulin and glucose values were within the operating range of the HOMA2-IR model (10). Less than 10% of any project samples were lost to lack of one of these two components. The HOMA2-IR model requires either ELISA (more specific insulin determination) or RIA (less specific) in p.mol.l and glucose in mmol.l and incorporates non-linear features of the glucose-insulin relationship. The HOMA2-IR is a bona fide physiological model for peripheral insulin resistance correlating with other more precise but less physiological models (10). We utilized the log10 transformed insulin sensitivity values (S%) as the input values for the linear covariate adjusted modeling of peripheral insulin sensitivity. A meta-analysis of the individual covariate adjusted P-values was aggregated using the Stouffer method and an FDR calculated (60). This process identified ∼1000 genes related to S%, of which >300 were consistently (directionality) regulated; yielding a ‘core fasting insulin sensitivity gene list’ (CORE-IS). Having identified this reproducible set of S% genes, the causal nature of these genes during dynamic regulation of S% was investigated using clinical intervention studies (22,37,53,54) and systems biology methodologies (data-driven co-expression analysis (77,136); protein–protein interaction databases (137), transcription factor binding-site analysis (63) and ‘guilt-by-association’ methodologies (138)).

Customizing RNA expression quantification to individual tissue types using a custom CDF

We hypothesized that processing of raw data from the Affymetrix HTA 2.0 array (37) would benefit greatly from optimization, to more efficiently detect gene expression, as currently no account is made of study specific probe performance. The HTA 2.0 gene-chip contains 6.9 million short ‘probes’ which are computationally combined into groups (probe-sets) representing individual RNA transcripts (or part of a transcript) using a ‘map’ named the chip definition file (CDF). The CDF combines probe signals irrespective of whether each probe represents an active ‘signal’ in the tissue or cell type being profiled or not (27,36,38,39), i.e. there is no probe-level filtering. This in turn suggests that the present signal ‘summarization’ process is inefficient as it will include poor performing probes and unnecessarily incorporate back-ground noise. More accurate determination of transcript expression, coupled with studies incorporating larger sample sizes should allow for the establishment of the first robust IS RNA signature in humans. The first step in our approach was to check the sequence specificity of the 6.9 million probes on the HTA 2.0 array (39). The physical design of each probe is fixed and incorporates the features of the reference genome used at the time of design and thus each probe must be discarded if it is no longer specific. We mapped probe sequences to the latest reference genome (GRCh38_82p3) using bowtie (40) and probes which did not map to a single genomic location were discarded. Secondly, using the aroma.affymetrix package (41) we assessed the signal characteristics for each probe and those probes with a low and variable signal (e.g. <10 units and co-efficient of variation >25% (for muscle)) were removed. This process was applied to each tissue type investigated. The ∼50,000 probes that have an extreme GC content (i.e. <20%, >80%) were also removed, because the GC adjustment model is not linear at these extreme values. The remaining probes were combined into ‘probe-sets’, with three probes or more, and production of this ‘experiment specific’ CDF resulted in the removal of ∼3 million probes, i.e. >40% of the original chip. Notably, this filtering still allows for >200,000 transcripts (across the tissue types examined in this project). In the present analysis, each probe-set was based on ENSEMBL ENST definitions, and R Bioconductor packages (42) were used to update, assemble and summarize the expression data. We also included an exon specific probe-set, based on a recent observation that the RYR1 gene has one additional variant, which lacks a short 15 bp exon (ENSE00002436759), that correlated with a structural feature of muscle (‘fiber' type) (43). The HTA 2.0 chip processed with this new muscle CDF yielded 169,769 ENST probe-sets (This CDF is provided at GEO). Notably, the discarded probes (e.g. signal <10 units in this example) would have been otherwise incorporated into probe-sets that had an average signal of ∼70 units, i.e. ∼7 times greater than the poor performing probes (Supplemental Figure S1A). Use of a ‘tissue-specific’ CDF map for quantification of the transcriptome should reduce the influence of back-ground noise during normalization (44). Indeed, use of this muscle CDF improved detection of differentially expressed ‘exercise’ genes ∼3–5-fold above a fixed FDR value, and demonstrated a high technical reproducibility for quantifying IS related genes (Pearson coefficient of determination (R2) of 0.96, Supplemental Figure S1B). The same tissue-specific approach was used to produce CDFs for adipose tissue and islet cell datasets (37,45) and to process previously published HTA 2.0 muscle data. Quality control was performed using normalized unscaled standard error (NUSE) and principal components analysis (PCA). For the non HTA chip muscle data, quality control was performed as above and <10% of previously published samples were removed (or lacked valid insulin data, see below). Data generated on the U133+2 platform (predominately protein-coding RNA expression) was updated with a CDF that mapped probes to 73,654 probe-sets (ENST) in common with the muscle-specific HTA 2.0 profile. Our newly generated Illumina Human HT-12 V4.0 array data were processed using quantile normalization, as previously described (46). Gene names were used to map results to those ENST probe-sets on the HTA 2.0/U133+2 platforms (with the highest signal being chosen when a gene was represented more than once) yielding 47,216 transcripts (equating to 31 326 genes). The new Illumina array data was produced from the ‘IDEAL’ cohort (46) and was used, along with three literature data-sets, for independent validation of the RNA responses to clinical treatment (Figure 1). As expected the concordance of the Illumina platform results, was less than observed between probe-set based specific tissue confirmed ENSTs.

Clinical samples, insulin analysis and RNA production

The present project relied on 1,012 human muscle biopsy samples. We produced RNA from 470 human skeletal muscle biopsy samples from our recent clinical studies for profiling on the HTA 2.0 gene-chip (Metapredict (MP), STRRIDE PD (S-PF) and TIFIN, Figure 1) (47–52). Additionally, we re-processed and updated 466 of our previously analyzed muscle gene-chip profiles (STRRIDE I and II (S1–2), Gallagher (GALL), IDEAL, HERITAGE (HERTG) and Derby Resistance Training Study (DRET), Figure 1) (22,37,53–55). A further 176 gene-chip profiles were identified from the literature (Weigert et al. (WG) and in vivo drug treatment (SEARS) studies in Figure 1) (38,45,56). Further, a subset of 568 muscle gene-chip profiles were paired, from 284 individuals before and after distinct clinical interventions aimed at improving IS (MP, S1–2, S-PD, DRET, IDEAL, HERTG, SEARS and WG, Figure 1) (38,46–48,50,52,53,56–58). Sixty percent of these samples were not included in the baseline fasting CORE-IS analysis (see below), and thus represent entirely independent samples from those used to discover the CORE-IS genes. The clinical methodologies have been extensively presented elsewhere (22,37,38,45–48,50–54,56–58). The demographics for the subjects contributing to the present analyses are presented in Table 1 (i.e. baseline data for IS modeling) and Supplemental Table S1 (clinical intervention cohorts for clinical response modeling), and Supplemental Table S3 (for the post-hoc gender comparison). Studies complied, as per the original publications, with the 2008 Declaration of Helsinki and were approved by the relevant ethics committees stated in each published clinical article, while all participants gave their written, informed consent to participate.

Table 1.

	S-PD+MP	LVL-OLD	GALLAG	DRET	S1–2
Sample size (n)	191	91	93	39	150
Platform	HTA 2.0	HTA 2.0	U133+2	U133+2	U133+2
Gender (M/F)	89/103	50/41	64/29	22/17	75/84
AGE	43 (18–75)	71 (65–95)	56 (23–77)	49 (20–75)	50 (24–68)
BMI	30.5 (24.6–45.5)	26.6 (19.6–38.5)	29.6 (16.7–50.2)	25.9 (19.4–32.5)	30.1 (25.0–37.6)
VO₂ peak	26.8 (13.2–46.9)	-	26.0 (10.6–55.8)	-	26.8 (13.2–43.9)
Fasting Glucose	4.8 (3.99–7.4)	5.5 (4.4–6.7)	5.9 (4.3–23.3)	5.6 (3.4–7.5)	5.3 (3.8–7.3)
Fasting insulin (ELISA)	57.5 (8.9–257.0)	-	63.1 (10.0–251.1)	23.4 (10.5–67.6)	-
Fasting Insulin (RIA)		110.8 (34.8–324.4)			54.6 (7.2–241.9)
HOMA2-IR	1.23 (0.31–9.33)	2.042 (0.67–5.96)	1.32 (0.31–21.88)	0.52 (0.23–1.63)	1.02 (0.13–4.47)
IS	81.7 (17.5–324.3)	48.9 (16.8–151.2)	75.9 (4.6–323.6)	193.1 (61.2–442.7)	98.15 (22.2–751.3)
R ² IS versus age	<0.01	<0.01	0.01	0.03	<0.01
R ² IS versus BMI	0.11	0.22	0.22	0.05	0.15
R ² IS versus VO₂ peak	0.02	-	0.12	-	<0.01

Demographics of the clinical cohorts used for establishing the fasting peripheral insulin sensitivity transcriptome (2,22,37,46,50,51,53,54). The HTA 2.0 gene-chip platform data represent entirely new data while the U133+2 gene-chip data reflects our prior data, updated and re-processed. Values are median (range) or adjusted R2 using lm in R. VO2 peak is a measure of peak oxygen consumption or ‘maximal’ aerobic capacity, and the units are ml min−1kg−1, fasting glucose is m mol l−1, fasting insulin units are pmol l−1, while BMI and HOMA2-IR are unit-less and IS is percentile insulin sensitivity calculated from the HOMA2-IR model (10). VO2 peak is a trait that is highly inherited but also influenced by behavior (intensity and volume of physical activity) To produce the new HTA 2.0 array profiles, RNA was isolated using TRizol® (Life Technologies) and dissolved in 20 μl RNAse-free water, processed to single-stranded sense fragmented DNA using the GeneChip® WT PLUS Reagent Kit, which relies on a reverse transcription priming strategy that primes both poly-A and non-poly-A RNA. HTA 2.0 gene-chips were processed according to the manufacturer's protocol. Fragmented (5 μg) end-labeled sense strand target cDNA was hybridized to each array and scanned using a Gene Chip Scanner 30007G (Affymetrix Core, MPI A/S, Denmark). From the RNA processed, <5% of samples were excluded as outliers by visually evaluating NUSE plots prior to down-stream analysis. The technical reproducibility for detecting regulated genes was assessed by running RNA samples from the same subject/time-point several months apart and examining the correlation coefficient, without adjustment for any potential batch effects (Supplemental Figure S1B). For the Illumina Human HT-12 V4.0 arrays (Nestle Research Center, Lausanne, Switzerland) RNA was isolated from the skeletal muscle tissue by using TRizol® (Life Technologies), purified using the Qiagen RNeasy Micro kit (Qiagen, Venlo, NL, USA) and RNA quality was checked using an Agilent 2100 bioanalyzer (Agilent Technologies, Amsterdam, NL). There are numerous assays for insulin, each with their own performance characteristics (59). For one-hundred and ninety-one subjects (MP+S-PD studies) plasma insulin values (high-sensitivity, enzyme‐linked immunosorbent assay (ELISA) analysis, K6219 Dako, Sweden AB) were generated in a single core-lab. The Gallagher et al. study (22) also utilized a high-sensitivity ELISA (Dako Sweden AB), while the DRET study used a high-sensitivity ELISA from DRG Instruments (GmbH, Germany). The TIFIN study used the less sensitive radioimmunoassay (RIA, LINCO/Millipore), as did STRRIDE I/II (Access Immunoassay System, Beckman). IDEAL used a RIA (Architect System, Abbott Laboratories). Glucose values were determined with an oxidation reaction (YSI-2300 Stat Plus, Yellow Springs Instrument, Yellow Springs, OH). No individual insulin values were available from the published studies of Sears or Weigert et al. (38,56). All samples/subjects from our laboratories underwent updated quality control to ensure that the array profile was technically robust and that the insulin and glucose values were within the operating range for the HOMA2-IR model (10). For the cell-culture RNA studies, ∼20 000 HepG2 cells (hepatocellular carcinoma cell line) cultured in OPTIMEM (Gibco) were transfected (96-well plates) with 20 nM of ASO (Integrated DNA Technologies; Supplemental data S1) using Lipofectamine 2000 (Thermo Fisher Scientific) for either 24, 48 or 72 h. RNA isolation and DNase treatment were performed by RNeasy Mini Kit (Qiagen, USA), in accordance to manufacturer's instructions. RNA quality and concentrations were determined by Nanodrop 2000 (Thermo Fisher Scientific, USA). Twenty nanograms of RNA were reversely transcribed using the High Capacity cDNA Reverse Transcription Kit (Thermo Fisher Scientific, USA), according to manufacturer's recommendations. qRT-PCR (primers, Supplemental data S1) was performed on a QuantStudio 6 Flex machine (Life Technologies) using probe-based assays (Integrated DNA Technologies, USA). Data was scaled to GUSB and PPIB and were expressed relative to cells treated with a commercial control ASO.

Primary statistical analysis of gene expression

A table of normalized, log2 transformed RNA expression values was produced for each of the studies described above. First, we assessed the linear relationship between gene-expression and IS (log10 of the HOMA2 model S% values, (10)) together with age or BMI (Table 1), using analysis of variance (ANOVA) in R (i.e. using the following linear models in R: expression ∼ IS*age and expression ∼ IS*BMI). A correlation coefficient (R) and P value were calculated for each probe-set in the two bivariate analyses per study. Second, to produce a ‘significant’ sub-set of IS genes, we carried out a meta-analysis of each bivariate analysis across six clinical cohorts (n = 564, Figure 1), including the 73 654 ENST probe-sets common to all chip-types (99% of which were protein-coding), generating a false discovery rate (FDR) for each bivariate model separately, using the Stouffer method, in the R-package MetaDE (60), from the ANOVA P-values. Third, any probe-set with an FDR <10% was filtered by univariate regression coefficient such that only those with a directionally consistent R value across the four largest studies was retained. This resulted in the set of ‘CORE-IS’ genes with a much lower average FDR. Post hoc analysis and interpretation of these findings considered gene-expression versus IS, when a covariate, e.g. BMI, was set as the primary variable in the ANOVA model (i.e. using the following linear models in R: expression ∼ BMI*IS). Further interpretation of the CORE-IS genes was based on gender dependent expression using a group of males and females (n = 89 per group) where HOMA2_IR, age and peak VO2 were balanced (Supplemental Table S3). The limma packge in R (eBayes and lmFIT functions) was used to determine DE with P-values adjusted using the fdr method (61). For the ncRNA analysis there were a total of 282 HTA 2.0 profiles available. The analysis was performed as described above (i.e. using the two bivariate models, followed by filtering) for the combined S-PD and MP HTA 2.0. based studies (Figure 1, SMP, n = 191). Similar to the protein coding genes above, only ncRNAs with an FDR less than 10% (Benjamini-Hochberg FDR using the mult-test R package) for IS in both models and with a consistent (direction) R value in two studies; S-PD (n = 67) and MP (n = 124) were considered (93% were consistent). A third cohort consisting of older-subjects (65–95 years) was profiled on the HTA 2.0 array (TIFIN, n = 91, Figure 1) was used to further explore identified ncRNAs by comparing coefficients from univariate models. 52% of the IS relationships being consistent between the younger cohorts (S-PD and MP) and this older cohort. It was expected that there would be some differences due to both the impact of population stratification and potential interactions between IS and muscle age (62). The identified 86 ncRNAs completed the 332 CORE-IS gene list that was taken forward for validation and modeling. Individual CORE-IS genes that causally regulate IS, should respond to treatments that prevent T2DM. We utilized both differential expression (DE) analysis using limma (61) and Spearman rank correlation on data from six distinct life-style intervention studies (three independent of the groups above, IDEAL, HERTG and WG) and separately, one diabetes drug treatment study (SEARS). For DE, P-values were subjected to meta-analysis and a Stouffer FDR was calculated as above (60) to establish which genes were consistently regulated across each (very distinct) life-style intervention study. For each of four studies with valid individual insulin measures (MP, S1–2, IDEAL and DRET, Figure 1), log2 fold changes were calculated for the 332 CORE-IS genes and their correlation (Spearman) with changes in IS, per subject was calculated. DE was also calculated in a seventh drug-treatment study (56), where paired t-tests were used to examine if the CORE-IS genes were responsive to 12 weeks of treatment with a PPARγ agonist (The authors (56) would not provide individual clinical values). Transcription factor binding site analysis was carried out using SSA in oPOSSUM (63), with the 24,752 genes in the database as background (JASPAR CORE profiles (64)). Conservation cut-off was set at 0.6 and a matrix score threshold of 85%, searching 10K up- and down-stream of each CORE-IS gene transcription start site. Results were interpreted in light of the outcome found by analyzing the entire CORE-IS genes and the treatment modulated DE IS list, such that enrichment over these larger lists was considered interesting and results plotted (using Gviz, biomaRt and GenomicRanges packages). Heat-maps were generated using Morpheus (https://software.broadinstitute.org/morpheus/).

Validation using published literature and GWAS analysis

To compare the CORE-IS genes identified in the present study with GWAS-MD loci identified during the past decade, we utilized three main resources from the GWAS research community; the Type 2 Diabetes knowledge portal (www.type2diabetesgenetics.org/home/portalHome) which includes many phenotypes beyond T2DM, and the NHGRI-EBI Catalog (www.ebi.ac.uk/gwas/home), together with gene lists presented in the most recent GWAS-MD publications (7,16,65–76), i.e. those that recently reported genetic associations for risk of development of T2DM, insulin action (IS [insulin or HOMA_IR], fasting glucose, fasting lipids [HDL, LDL and triglycerides] and body composition [BMI or adiposity]). The NHGRI-EBI Catalog and published gene lists were included at the 5 × 10−8 GWAS significant threshold. The Type 2 Diabetes knowledge portal provides three demarcations for significance; ‘GWAS-level’ 5 × 10−8, ‘suggestive association’, 5 × 10−4 and ‘red’ indicating that the loci was non-significant for any GWAS phenotype. A systematic literature review, whereby the CORE-IS gene name was searched with and without the terms; ‘insulin’, ‘diabetes’ or ‘obesity’ (Pubmed, June 2017) followed by an extensive assessment of whether a direct biochemical interaction had been reported, was recorded (see below and Supplementary files).

Gene-network structure and pathway analysis

As we produced continuous quantitative data, for gene expression, from significant GWAS loci, we could utilize probabilistic network analysis incorporating inferred causal relationships between molecules and disease (4,5) to explore the biology of IS. The R-package MEGENA (77) was used to identify network structures within the largest and most detailed sub-group (SMP, n = 191, HTA 2.0 data and single-core high-sensitivity insulin assay). Our aim was to characterize the interacting components of the CORE-IS gene-list (78), determine if they formed significant sub-networks using strict thresholds (FDR < 1% for Spearman correlation; P < 0.01 for module significance and P < 0.01 for network connectivity; and 1000 permutations for calculating FDR and connectivity P-values). Stability of individual sub-clusters of a network was assessed based on the impact of a data-split on compactness on a parent cluster, versus the impact of randomly permuting or inserting nodes. Network plots were produced using Fruchterman–Reingold force-directed plotting (77) within MEGENA. Each presented significant network was populated with visual identifiers (color-coded circles) for GWAS-MD loci. Interpretation of each significant network or module was carried out in several ways. Firstly, using NetworkAnalyst (http://www.networkanalyst.ca/) and the IMEx Interactome curated protein–protein interaction database (79). Each gene list was analyzed using default settings and minimum order network was chosen for gene ontology analysis (GO, Biological Process, GOBP). Enriched GOBP, with FDR < 1 × 10−5, were considered of interest. For each analysis, the result was confirmed after removing Ubiquitin because we noted it appeared as the highest ranked gene (‘betweeness’ score) in every protein interactome, regardless of the input gene ID list (probably reflecting technology bias, (33)). To further describe each network module, summary statistics were calculated for the proportion of ncRNAs, number of positively associating IS genes, the sub-phenotypes underlying the link with the GWAS-MD loci, and the number of genes responsive to interventions. Finally, for significant modules (e.g. treatment responsive genes) TF analysis (as above) and guilt-by-association (GBA) analysis, was performed (80). For the latter, the ‘Genefriends’ human transcriptome database (81) was used to help locate additional links between CORE-IS genes, pathways and metabolic disease GWAS loci.

RESULTS

A total of 326 pre-intervention new muscle samples were profiled on the HTA 2.0 gene-chip (MP, S-PD and TIFIN, Figure 1). We updated the processing of 424 of our previously published muscle gene-chip profiles (Affymetrix U133+2 gene-chip [S1–2, GALL, DRET and HERTG, Figure 1 (2,22,37,46,50,51,53,54)] and several literature-based cohorts (38,53,56).

Identification of a robust set of genes which covary with fasting peripheral insulin sensitivity

To produce a high-stringency set of IS related protein coding genes, we performed meta-analysis of 564 samples originating from these six independent cohorts (73,654 ENST, on HTA 2.0 arrays (two studies) and U133+2 arrays (four studies), Figure 1). Each study had a comparable distribution of IS (Supplemental Figure S2) and clinical characteristics (Table 1). The P-values derived from the relationship between RNA abundance and IS in each of six studies were subjected to meta-analysis using the Stouffer method (60) identifying 1005 protein coding genes (mean FDR = 1.7%). To be considered further, the R value for RNA expression versus fasting IS had to have a consistent directionality across the four largest cohorts. This led to the identification of 246 protein-coding genes, forming part of the CORE-IS gene list (Supplemental data S2). The majority of the detectable ncRNAs in muscle were classed as ‘antisense’ or ‘lncRNA’ molecules (Figure 2A). To examine which lncRNAs were related to IS, we utilized three sub-cohorts of muscle data produced on the HTA 2.0 array (Table 1, n = 282) and found 86 ncRNA associated with IS (Mean FDR = 5%). Half of these were classed as ‘antisense’ or ‘lncRNA’ molecules and were largely consistently expressed in S-PD and MP cohorts, (Figures 1 and 2B); but less so in the cohort of older subjects (TIFIN, n = 91). To compare the expression pattern of IS-related muscle ncRNAs to other T2DM relevant human tissue-types, we produced tissue-specific HTA 2.0 CDFs for adipose tissue (37) and human pancreatic cells (45). The majority of the muscle IS-related lncRNAs were expressed at comparable levels in human adipose and islet-cells (Supplemental data S3, and Figure 6 below), so they may plausibly impact on insulin biology and metabolism across multiple organs.

Figure 2.

Figure 6.

A noncoding RNA network is positively linked to peripheral insulin sensitivity and translational control. Probabilistic networks are utilized to establish a causal relationship between molecules and disease (5). We utilized the response to diverse forms of treatment aimed at reversing insulin resistance, along with measured fasting relationships to estimate if planar filtered networks (77), based on data from 191 individuals, were enriched in genes dynamically regulated or were lncRNAs, harbored causal genetic variant related to metabolic disease, or dominated by proven T2DM disease pathways. MEGENA (77) was used to identify discrete planar filtered networks using the CORE-IS gene list as input (FDR<1% and spearman gene correlation; P< 0.01 for module significance and P< 0.01 for network connectivity). Protein–protein interaction analysis (79) facilitated the interpretation of the identified networks while providing additional biological plausibility to each module (78). Node and label size are proportional to the node degree value within each distinct module. This, the fourth large discrete planar filtered network was dominated by non-coding RNAs (49 genes, 70% ncRNA, and 84% of the network members positively co-varying with in vivo IS). Thirteen of the ncRNA had cis antisense expressed protein-coding genes and these 13 genes, along with the remaining 30% protein-coding CORE-IS genes in the network, formed a protein–protein interaction network strongly associated with viral responsive protein-coding genes (P< 1 × 10−48) and translational initiation (P< 1 × 10−37). The network included several shorter ncRNAs, including the highly conserved 7SK ncRNA pseudogenes which are 271 to 326 base-pairs long. The function of these is unknown despite the 7SK ncRNA gene itself being described half a century ago. 7SK RNA binds to HEXIM1 to inhibit the positive transcription elongation factor b (P-TEFb)—transcript elongation (HEXIM1 is negatively associated with IS, Table 2 (140) and also regulated by a lncRNA (141).

The human muscle non-coding RNA transcriptome: relationship with insulin sensitivity. (A) The non-competitive nature of RNA quantification and tissue configured CDF for the HTA 2.0 microarray facilitated the first detailed and quantitative view of long non-coding RNA molecule expression in human skeletal muscle in vivo (>20 000 ENST representing 16 223 genes (ENSG)). The majority of non-coding RNAs (ncRNAs) identified in human muscle were classed as either ‘anti-sense’ molecules or long non-coding RNA (lncRNA). (B) We utilized three clinical cohorts (n = 282), with varying ranges of chronological age, to examine which ncRNAs were related to insulin sensitivity (IS). Eighty-six long ncRNA associated with IS, the majority (43) being either antisense or lncRNA (Figure 2B, See Supplemental data S3). The vast majority of these were also expressed in adipose and pancreatic cells (in vitro beta cells, Supplemental data S3) indicating they could influence insulin biology across multiple organs. The relationship between each ncRNA and IS was largely consistent in the three sub-groups, with greater variability noted among the oldest subjects, which may reflect population stratification (62). LncRNAs like NEAT1 and MALAT1 were very highly expressed and regulate gene expression via interactions with chromatin (139), while the majority of the IS related ncRNAs were expressed at a level similar to coding mRNA. (C) Four examples of lncRNA molecules related to fasting IS and how they co-vary with their cis expressed protein-coding transcript in vivo (n = 191, younger samples). The protein-coding transcripts were not themselves significantly correlated with IS. (D) Human HepG2 cells (hepatocellular carcinoma cell line) were transfected with 20 nM of phosphorothioate antisense oligonucleotides (PS ASO, one control ASO and four different ASO targeting PRKCQ-AS1) in triplicates, and expression of PRKCQ-AS1 (A) and PRKCQ (B) was measured by quantitative real-time PCR. ASO-induced knockdown was statistically significant (P < 0.01). Values were normalized to GUSB/PPIB and expressed relative to levels in cells treated with control ASO. Values are mean ± SEM. Approximately half of the IS lncRNAs had cis-expressed transcripts, most of which were protein-coding (Supplemental data S3). Approximately 30% of these lncRNAs demonstrated a sizeable positive or negative co-expression pattern, across 191 samples, with their cis protein coding partner; these included LINC-PINT, HOXB-AS3, TMEM191A and PRKCQ-AS1 (Figure 2C), a natural antisense for the diabetes drug-target PRKCQ (codes for PKC-θ) (82–85). We initially sought to study three antisense lncRNAs in vitro: PRKCQ-AS1, NPSR1-AS1 and RP11–582J16.4 (AC037459.2). All were robustly expressed in both normal hepatocytes and HepG2 cells (insulin target cells) but the partner of NPSR1-AS1, NPSR1, was not detected in vitro and thus only PRKCQ-AS1, and AC037459.2 were studied. For each lncRNA, four phosphorothioate antisense oligonucleotides (PS-ASO) were designed. The PS-ASOs did not reduce the expression of AC037459.2. All four PS-ASO targeting PRKCQ-AS1, however, provided ∼50–60% knockdown in the HEPG2 human liver cell line, resulting in increased expression from the cis-expressed protein-coding gene, at the time-points examined (24–72 hours; Figure 2D). Given the role PRKCQ plays in insulin signaling, and the extensive complexity of the transcripts expressed from the PRKCQ loci, detailed studies are merited to explore the interaction with each splice-variant and subsequence impact on cell metabolism and signaling.

CORE-IS list is composed of genes with biochemical and genetic links with type 2 diabetes

An extensive literature review, searching for a substantial link between the CORE-IS genes and insulin, diabetes or obesity provided overwhelming support for the validity of our ENST-based analysis (Table 2; for the citations see Supplemental Table S2 within the supplemental materials document). We subsequently established that this subset of literature supported IS genes demonstrated the strongest univariate linear relationship with IS, as a sub-group of all CORE-IS genes (P < 0.03, Bonferroni corrected t test). To further explore the CORE-IS list we considered the hundreds of genomic loci that have been linked to the risk of developing T2DM or associated with variation in insulin, BMI or glucose homeostasis (7,12,72–76,86,16,65–71). For the majority of these loci it remains unclear which gene (or genes, given the potential for linkage disequilibrium (LD)) represent the causal link (14). Overlaps between the CORE-IS gene list and these GWAS-MD loci would further substantiate our analysis but more importantly help identify which gene underlies the reported genetic association. Indeed, at least 45 of the 332 CORE-IS genes originate from GWAS-MD loci at a genome wide significant level (P < 5 × 10−8), while a further 180 have suggestive associations according to www.type2diabetesgenetics.org/home/portalHome (see Materials and Methods, Table 2 and Supplemental data S4). The majority of genes linked to either risk of T2DM or risk of Obesity, with a smaller number to more discrete phenotypes (fasting insulin, fasting glucose or circulating lipids—reflective of smaller GWAS samples and/or phenotypes with greater random and technical error) (Figure 3A).

Table 2.

CORE-IS genes with established links with insulin biology or metabolic disease. A large-scale version of this table is available as supplementary material online

Gene ID	[RNA] relationship with IS	Published Biochemical relationship	T2DM Portal Evidence System (Grey-shade=5x10-8, unshaded=5x10-4)	GWAS Significant (GWAS Catalog/Articles)	Gene ID	[RNA] relationship with IS	Published Biochemical relationship	T2DM Portal Evidence System (Grey-shade=5x10-8, unshaded=5x10-4)	GWAS Significant (GWAS Catalog/Articles)
AASS	positive	> better S%	WC		MAGI3	negative		IS, TG
ABCG1	positive	> better S%	IS, Adiposity		MAPKAP1	negative	< better S%	WHR	Obesity
ABHD2	negative	< better S%	T2DM, LDL		MAPRE3	negative		FG, TG	T2DM
ACADL	positive	> better S%			MAST3	negative		BMI, WC, WHR
ACOT11	positive	> better S%	IS, T2DM, BMI, LDL		MAT2B	positive		T2DM, LDL, Adiposity
ACSS1	positive	> better S%	BMI		MCC	negative		T2DM, IS, HDL, TG, BMI, WC, LDL, HOMA-B	T2DM, Obesity, IR, Lipids
ADPRHL1	negative				MCCC1	positive	> better S%	BMI
AGL	negative	< better S%	TG		ME2	negative	< better S%	TG
AKR1B15	negative		WHR		MRC2	negative		T2DM, BMI, IS
ALDH1L1	positive	> better S%	BMI, LDL, WHR, Adiposity		MROH7	positive		WC
ALDH1L2	negative		BMI, WC		MROH7-TTC4	positive		T2DM
ALDH6A1	positive	> better S%			MSRB3	negative		WHR, WC	T2DM
ANKRD50	negative		T2DM		MSTN	negative	< better S%	BMI
ANKS1B	negative		IS, T2DM, BMI, FG, LDL	Obesity	MTDH	negative		T2DM, Adiposity
ASCC1	negative		IS		MTF2	positive		HDL (LDL, WHR)	Lipids
ATAD2	positive		T2DM, WHR		MTPN	negative			Obesity
ATP1A1	negative	< better S%	HDL		MTRF1	negative		T2DM
ATP1B3	negative		HDL		MUSK	negative		T2DM, IS, WHR, Adiposity
ATP2A2	positive		BMI		MYBPC2	negative		BMI
B4GALT6	negative	< better S%	IS		MYL1	negative		IS
BCL2L11	positive		BMI, T2DM, WHR, Adiposity		MYL12A	positive		IS
BDH1	positive	> better S%	IS, T2DM, BMI		MYLK4	negative		IS, Adiposity
BRD8	positive	> better S%	T2DM, BMI		N4BP2L1	positive		T2DM, WC, HDL, LDL, BMI
C15orf41	positive		IS, BMI, WC		NCAM1	negative		T2DM, IS, FG, BMI, HOMA-B	Obesity
C1orf158	negative			Lipids	NDUFAF5	negative	< better S%	T2DM, LDL, Adiposity
CAB39L	negative		BMI, WHR, WC		NETO2	negative		IS
CACNB1	negative		T2DM		NIN	negative		BMI, TG, Adiposity
CALM1	negative	< better S%	IS		NSF	negative	< better S%	BMI, WC, WHR, pro-insulin
CARNS1	negative	< better S%	BMI, TG		NUCB2	negative	< better S%	WC (BMI, T2DM, WHR)	T2DM, Obesity
CBX7	positive		WC		NUP210	positive		BMI, WC, WHR, HC
CCDC25	negative		WC		OARD1	positive		HC
CCDC69	negative		IS, T2DM, TG, LDL		OSBPL6	negative		T2DM, WC, WR, Adiposity
CCNB1IP1	positive		IS, T2DM		PARK2	negative	< better S%	BMI (T2DM, TG, IS, LDL, WC, Adiposity)	Obesity
CD151	negative		HOMA-B		PCYT2	positive	> better S%
CD38	negative	< better S%	T2DM		PDHX	negative	< better S%	WC
CDK14	negative	< better S%	IS, BMI, T2DM, WHR		PDLIM7	negative	< better S%
CDK2AP1	negative	< better S%	HDL, T2DM, LDL	T2DM	PGK1	negative	< better S%
CEBPD	positive	> better S%	T2DM		PGM2	negative	< better S%	IS
CEBPG	negative		IS, WC, HDL, WHR	T2DM	PHKB	negative	< better S%	BMI, WHR
CERS6	negative	< better S%	FG, (BMI, T2DM)	FG, (BMI, T2DM)	PKM	negative	< better S%	BMI, TG
CLIP4	positive		BMI		PLXDC1	negative		T2DM, LDL
COL15A1	negative		TG, WC		PON2	negative	< better S%
COL4A3BP	positive		BMI, LDL	Obesity	PPM1L	negative	< better S%	BMI, T2DM
COL5A2	negative		IS, HDL		PPP1R3B	negative	< better S%	T2DM, BMI, TG, WC, LDL, WHR	T2DM, IR, Lipids
COL6A1	negative		T2DM, WHR, FG, LDL	Obesity	PPP2R5C	negative	< better S%	IS, T2DM, BMI, WC, WHR
CORO1C	negative	< better S%	LDL		PRDX6	positive	> better S%
CRBN	positive	> better S%	T2DM		PRKAG3	negative	< better S%		Obesity
CTH	positive	> better S%			PRRX1	positive		LDL, WC, WHR	Obesity
CTSF	positive		IS		PRSS42	positive		HDL
DAAM1	negative		T2DM, BMI, Adiposity, WHR	T2DM	RAB29	negative		n/a	Obesity
DAPK1	negative		IS, HOMA-B, TG, WHR, Adiposity		RAB30	negative		T2DM, IS
DAZAP1	positive		T2DM (WC)	T2DM (WC)	RAB40C	positive		FG, WC, HC
DDR1	positive	> better S%	T2DM, LDL, TG, Adiposity	T2DM	RAPH1	negative	< better S%	T2DM, LDL
DHRS7	negative	< better S%	BMI, LDL		RCBTB1	negative		IS, BMI
DHTKD1	positive	> better S%	TG, WHR	T2DM	RDH10	negative	< better S%
DPYSL3	positive		T2DM		RND3	negative	< better S%	T2DM, LDL, BMI, WC	T2DM
E2F3	negative		T2DM, BMI, LDL, WC	T2DM	RNF10	negative	< better S%	BMI, WC
ECT2	negative		BMI		ROBO1	negative	< better S%	IS, BMI, WHR, WC, Adiposity, LDL, HDL
EDEM3	negative		T2DM		RXRG	negative	< better S%
ELP2	positive		TG, HDL, FG, WHR		SATB1	positive		WHR, WC
EML4	negative		T2DM (BMI, LDL, IS, FG, WC)	T2DM (BMI, LDL, IS, FG, WC)	SEC61A1	negative	< better S%
EPDR1	negative		IS		SEPT11	negative	< better S%	FG, BMI, Adiposity
ERBB3	negative	< better S%	WC		SESN1	positive		T2DM, LDL
EZH1	positive		WHR		SESN3	negative	< better S%	LDL
FAM118B	negative		LDL		SETDB2	negative	< better S%	IS
FAM160A1	positive		IS, Adiposity		SFRP4	negative	< better S%	IS, WHR, WC
FARP2	negative		TDM, HDL		SGCG	positive		T2DM, IS, LDL, TG, WHR	T2DM
FAT1	negative		IS, LDL, BMI	Obesity	SGMS2	negative	< better S%	T2DM
FERMT2	negative		IS, FG, BMI, Adiposity		SH3PXD2A	positive		T2DM, IS, Adiposity, WC
FOXP2	positive		T2DM, BMI, IS, Adiposity, WHR	Obesity	SH3RF2	negative		T2DM, BMI, WHR
FRMD3	negative	< better S%	T2DM, HDL, BMI, LDL		SHISA2	negative		HC
G0S2	negative	< better S%	FG		SHISA4	negative		BMI (T2DM, IS)	BMI (T2DM, IS)
GADL1	negative		IS, BMI, LDL, WHR	T2DM	SLC16A10	negative	< better S%	T2DM, BMI
GCLC	positive		T2DM, IS, BMI, WHR, TG		SLC27A1	positive	> better S%
GLUL	positive	> better S%			SLC43A1	positive	> better S%	LDL, WC, WHR
GPAT3	positive	> better S%	n/a		SLC4A4	negative		T2DM, IS, BMI, WC
GPD1	negative	< better S%	WC		SMIM8	positive		LDL
GRB14	negative		T2DM, IS, LDL, HDL, TG, WHR	T2DM, IS, LDL, HDL, TG, WHR	SMTNL2	negative		BMI
GRIA3	negative		TDM, Adiposity		SOS2	negative		IS, Adiposity
GSR	negative		T2DM, IS, HOMA-B		SPARC	negative	< better S%	WHR
HBP1	positive	> better S%	TG, WC		SSX2IP	negative		FG	Obesity
HEXIM1	negative	< better S%	Adiposity		STK38	negative		T2DM
HNRNPD	positive		WC		STK38L	negative		BMI, LDL, WHR
HOMER1	negative	< better S%	IS		TACC2	positive		T2DM, IS, BMI, WHR, WC, Adiposity
HOXB-AS3	positive			Obesity	TANC2	negative		T2DM, BMI, LDL, WC, WHR
IGF1R	positive	> better S%	T2DM, BMI, WC, FG, LDL, WHR	Lipids	TPRXL	positive		T2DM, Adiposity
ING5	positive		WHR, WC		TSPAN15	negative		WHR	T2DM
INSR	positive	> better S%	TG (T2DM, BMI, HOMA-B, HDL)	T2DM, Obesity, IR, Lipids	TSPAN3	negative		T2DM	T2DM
ITIH5	negative	< better S%	T2DM, WHR, FG		TSPAN9	negative		BMI, HDL, WHR, Adiposity
KCNN3	negative	< better S%	T2DM, IS, WC		TTC8	negative		T2DM
KIAA1109	positive		T2DM, IS, BMI, WHR, TG		TUBGCP4	positive		IS, TG, BMI, T2DM
KIAA1841	positive		T2DM		TULP4	negative		T2DM, IS, WC
KLF12	positive		T2DM, IS, BMI, FG, LDL, WHR, TG	T2DM, Obesity	UBE2G1	positive		HDL
KLF15	positive	> better S%			UBE4A	positive		T2DM, HDL
KLHL2	negative		T2DM, WHR		UNKL	positive		IS, WC
KLHL31	negative		T2DM, WHR, Adiposity, HOMA-B	Obesity	USP25	positive		FG, WC, HC
KPNA2	negative		T2DM, TG		VCL	negative		T2DM, BMI, Adiposity
KPNA5	negative		IS, BMI, HOMA-B		VPS8	positive		FG
LAMB2	negative		IS		WWOX	negative		T2DM, IS, FG, TG, BMI, WHR, WC, Adiposity	T2DM, Obesity
LDHA	negative	< better S%	FG, WHR		XPO4	positive		T2DM, HDL, HC
LDHB	positive	> better S%	T2DM, WC		ZEB1	negative		WC
LGALS1	negative	< better S%			ZFAND5	positive		WHR
LGR5	positive	> better S%	T2DM, WC	T2DM	ZMAT3	negative		T2DM
LIN52	positive		T2DM, BMI		ZMYM2	negative		WHR
LPL	positive	> better S%	TG, HDL (T2DM, LDL)	T2DM, Obesity, IR, Lipids	ZNF274	positive		Adiposity	Lipids
LRRFIP2	negative		FG		ZNF282	positive		WC
LRRN3	negative		T2DM, Adiposity

Two-hundred and 13 genes from 332 have prior links with T2DM phenotypes (54 GWAS-Significant for T2DM phenotypes, 82 with functional/biochemical studies linking to insulin resistance, biology or metabolic disease and 180 GWAS ‘suggestive’ associations (5x10-4). Adiposity = linked to adipose mass phenotype. BMI = Body mass index. FG = Fasting glucose. HDL = HDL cholesterol. IS = measure of insulin sensitivity. LDL = LDL cholesterol. Pro-insulin = pro-insulin levels. TG = Triglycerides. WC = Waist circumference. WHR = Waist-to-hip ratio. T2DM = Risk for T2DM. () = at the lower significance. Data resources are: http://www.type2diabetesgenetics.org/home/portalHome, https://www.ebi.ac.uk/gwas/home and Pubmed (June-July 2017).

Figure 3.

Relationship with GWAS-MD loci and the impact of covariate modeling on insulin sensitivity relationship with gene expression. (A) The majority of CORE-IS genes located at genome wide significant metabolic disease loci (GWAS, 5 × 10−8), were related to either ‘risk of developing T2DM’ (24) or ‘Obesity/BMI’ (22), partly reflecting the greater number of such loci, over those known to influence fasting insulin or glucose. See Table 2 for details of ∼180 genes; for what the Diabetes consortium defined as marginal GWAS loci (5 × 10−4). (B) When the relationship between IS and RNA is studied using ANOVA and a covariate, e.g., age or BMI, the model yields P-values for both variables, where the outcome depends on the order in the linear model and the inter-relationship between IS and Age/BMI. The log10 false discovery rate (FDR) values (using the Stouffer method and a calculated FDR (60)) are plotted for three gene-sets (CORE-IS ENST, mitochondrial located genes and the Mootha et al. (26) defined OXPHOS gene-set) from the two different models, IS*age and age*IS. As can be seen, the vast majority of CORE-IS genes remain significant when adjusted for age (i.e. age*IS) as compared to IS*age, but with a small systematic loss in statistical significance. There are only two significant OXPHOS genes, one down-regulated with better IS, and one of which is related to an interaction with age. (C) The log10 false discovery rate (FDR) values are plotted for three gene-sets (the CORE-IS ENST, mitochondrial located genes and the OXPHOS gene-set) from the two different models, IS*BMI and BMI*IS. As can be seen, there is a systematic loss in statistical significance when BMI is adjusted for (BMI*IS) in the model as compared to IS*BMI. Approximately 38% of the CORE-IS protein coding genes FDR-values are no longer significant when BMI is considered first, in the ANOVA Type I model, while a further 33 are identified, with a proportionate gain in new GWAS-MD loci (mainly for BMI or Obesity) when this alternate model is used. CORE-IS genes with established links with insulin biology or metabolic disease. A large-scale version of this table is available as supplementary material online Two-hundred and 13 genes from 332 have prior links with T2DM phenotypes (54 GWAS-Significant for T2DM phenotypes, 82 with functional/biochemical studies linking to insulin resistance, biology or metabolic disease and 180 GWAS ‘suggestive’ associations (5x10-4). Adiposity = linked to adipose mass phenotype. BMI = Body mass index. FG = Fasting glucose. HDL = HDL cholesterol. IS = measure of insulin sensitivity. LDL = LDL cholesterol. Pro-insulin = pro-insulin levels. TG = Triglycerides. WC = Waist circumference. WHR = Waist-to-hip ratio. T2DM = Risk for T2DM. () = at the lower significance. Data resources are: http://www.type2diabetesgenetics.org/home/portalHome, https://www.ebi.ac.uk/gwas/home and Pubmed (June-July 2017). Substantial differences in age, aerobic fitness etc. between T2DM cases and controls, especially in combination (23,24,87), will yield gene expression differences inappropriately assigned to T2DM or IR, as well as potentially obscuring genuine differences. For some, but not all, of our cohorts, age or BMI modestly covaried with IS (Table 1). When examining the results of the ANOVA analysis after first considering the co-variate (‘adjusting’), there was no overall impact of Age on IS gene selection except for some slight reductions the FDR achieved for some mitochondrial genes (Figure 3B). Notably, there was no IS dependent expression pattern for oxidative phosphorylation (OXPHOS) genes regardless of which ANOVA model was used (Figure 3B). In contrast, when BMI was utilized to ‘adjust’ the ANOVA model for IS, a number of CORE-IS genes were removed (Figure 3C) while an additional 33 genes passed the FDR threshold (Supplemental data S4), of which seven occurred at GWAS significant loci for obesity or blood lipids, but none for the other phenotypes. The genes lost from the CORE-IS, when the ANOVA model assigned covariance between IS and BMI to the covariate, included nine genes from Obesity risk loci, eight genes from T2DM risk loci and two others. This demonstrates that neither approach to linear modeling is fully compliant with the complexity of the biological relationship between BMI and insulin as the second ANOVA model enriched the IS correlated gene list with genetic loci associated with obesity rather than insulin. Finally, the lack of any relationship between OXPHOS gene expression and IS (26) supports recent mechanistic studies dissociating OXHPOS status and insulin signaling (88). We did, however, observe the expected univariate correlation of OXPHOS genes with chronological age and aerobic capacity.

Response of the CORE-IS genes to therapeutic clinical interventions

Supervised exercise-training (with or without calorie restriction) can improve IS in humans, and is considered a primary prevention strategy for reducing risk of T2DM (89). Thus, if the CORE-IS genes are responsible for regulating peripheral insulin sensitivity, treatment should cause differential expression (DE) of the CORE-IS RNA in muscle. Six independent life-style intervention studies (Supplemental Table S1) were used for meta-analysis and an FDR was calculated (60) (Figure 1)). We found that at least 135 (>40%) of the protein-coding IS-CORE genes were differentially regulated in the clinical intervention studies (meta-analysis FDR < 5%) and in a consistent direction (Figure 4A, Supplemental data S5 and Supplemental Figure S3). Further, in a seventh study where 12 weeks of PPARγ agonist drug-treatment improved IS (56), 70 protein coding IS genes were DE (P < 0.05 using a t-test, Supplemental data S5, i.e. ∼29% of the detected CORE-IS genes). In the two clinical intervention studies where HTA 2.0 array-data were available, 20 of the CORE-IS ncRNAs were DE by treatment (Supplemental data S5). Thus, there is substantial evidence that the CORE-IS genes are regulated by life-style interventions aimed at treating and/or preventing T2DM.

Figure 4.

Treatment-responsive insulin-sensitivity genes. Supervised exercise-training (with or without calorie restriction) can improve peripheral IS (48,50) and is considered the primary tool to reduce the risk of developing type 2 diabetes. If the Core-IS genes regulate IS they should respond to clinical interventions and do so in a manner consistent with the observed improvement in IS. (A) Differential gene-expression for 332 Core-IS genes was calculated, across six intervention studies (three were independent of the baseline IS analyses), P-values aggregated using the Stouffer method; and a false discovery rate (FDR) was calculated (60). Approximately 40% of genes were differentially regulated (FDR < 5%) by treatment, and critically in a direction consistent with their observed relationship with fasting IS, and despite the highly divergent nature of the exercise/life-style modification interventions. (B) The relationship between the magnitude and direction of CORE-IS gene expression and the magnitude and direction of IS responses to clinical intervention were calculated using Spearman rank correlation. Individual HOMA2-derived IS were not available for the HERTG (insulin assay) or the Weigert et al. (38) (data availability) studies. This lead to the identification of a subset of 16 genes, from ∼250, that were consistently correlated with change in IS, in all the four independent studies. The 16 genes segregated into two subsets—one (DHTKD1, SLC43A1, PCYT2, MCCC1, SGCG, ECHDC3, ALDH6A1, SMIM8 and OARD1) positively associated with improvements in IS, and one (PGM2, RND3, G0S2, AGL, TSPAN9 and KPNA2) negatively associated. Many CORE-IS genes, especially belonging to network 1 and 2 (Figure 5 and Table 3) were also, on average, DE following 3 months of a PPARγ agonist (Supplemental data S5, (56)) - unfortunately no individual IS values were accessible from the authors to study the quantitative correlative relationship with IS.

Figure 5.

Treatment-responsive IS networks regulate human peripheral insulin sensitivity and link to metabolic disease risk genes. Probabilistic networks have been utilized to establish a causal relationship between molecules and disease (5). We utilized the dynamic response to diverse forms of treatment aimed at reversing insulin resistance, along with measured fasting relationships to estimate if planar filtered networks (77) derived from 191 of the younger individuals (<70yr), were enriched in genes dynamically regulated or were lncRNAs, harboring causal genetic variant related to metabolic disease, or dominated by proven T2DM disease pathways. MEGENA (77) was used to identify discrete planar filtered networks using the CORE-IS gene list as input (FDR <1% and spearman rank correlation; P<0.01 for module significance and P< 0.01 for network connectivity). Follow-up protein–protein interaction analysis (79) facilitated the interpretation of the identified networks providing additional biological plausibility to each network (78). Node and label size are proportional to the node degree value within each distinct module. (A) represents the largest discrete planar filtered network (87 genes, 13% ncRNA and 64% negatively co-varying with in vivo fasting IS) that protein–protein interaction analysis identified as being involved with hormone signaling, including insulin, FDR<1%. This network contained 44% (7; ALDH6A, DHTKD1, KPNA2, MCCC1, PCYT2, PGM2 and RND3) of the 16 genes regulated in vivo proportional to changes in IS, across four independent treatment studies. Further, 33% of the life-style treatment-response genes (ALDH6A DHTKD1, PCYT2 and RND3) were responsive to three months of PPARγ agonist, given to insulin resistant subjects ((Supplemental data S5, (56)). There were numerous significant GWAS loci for T2DM (gene ID enclosed in blue-ring) or Obesity (gene ID enclosed in brown circle). (B) The second largest and discrete planar filtered network (81 genes, 22% ncRNA and 75% negatively co-varying with in vivo fasting IS) contained genes from GWAS loci for T2DM, Obesity and Lipids (gene ID enclosed in gold circle). Twenty-five percent of this network was responsive to anti-diabetes drug treatment (PPARγ agonist), given to insulin resistant subjects for three months (56). Protein–protein interaction analysis indicated that this module was enriched by genes (33 in the network, FDR < 1 × 10−5) involved in the positive regulation of RNA metabolism and transcription.

Table 3.

Summary of CORE-IS RNA networks

Network	Size	Relation to IS	Biotype	P–P interaction profile	GBA transcriptome profile*
1 Figure 5A	87	64% negative	13% ncRNA	Insulin receptor signaling pathway (FDR<1 × 10⁻⁵ 6.7 FE); interaction with host (FDR<1 × 10⁻⁴, 4.0 FE)	Extracellular matrix organization (FDR<1 × 10⁻³², 5.3 FE)
2 Figure 5B	81	75% negative	22% ncRNA	Positive regulation of nucleobase-containing compound metabolic process (FDR < 1 × 10⁻⁸, 3.0 FE); positive regulation of metabolic process (FDR <1 × 10⁻⁷, 2.3 FE)	Muscle filament sliding (FDR < 1 × 10⁻³⁵, 17.1 FE)
3 Supplemental Figure S5	83	66% negative	20% ncRNA	Interaction with host (FDR < 1 × 10⁻⁷, 6.2 FE); enzyme linked receptor protein signaling pathway (FDR < 1 × 10⁻⁷, 3.6 FE)	n/s
4 Figure 6	49	16% negative	70% ncRNA	Viral reproductive process (FDR < 1 × 10⁻⁴⁸, 5.5 FE); translational initiation (FDR < 1 × 10⁻³⁷, 8.7 FE)	n/s
5 -	32	13% negative	19% ncRNA	Protein catabolic process (FDR < 1 × 10⁻¹², 3.1 FE)	n/s

MEGENA planar filtered networks were produced using 191 HTA 2.0 gene-chip samples (See methods). protein–protein interaction analysis was carried out using www.networkanalyst.ca and the IMEx Interactome (79). The guilty-by-association analysis (80) used GeneFriends (81). *when using a database for GBA analysis, it is expected that co-correlation will occur between gene expression from the input tissue type (regardless of physiological status), and with genes in that tissue type within the data-base. This is a type of sampling bias that skews Gene Ontology analysis when inputting large lists from distinct tissues (in this case skeletal muscle).

While DE analysis further supports a direct role for the CORE-IS genes regulating IS in humans, the most important regulators of IS are likely to be regulated in proportion with the change in clinical status. Individual pre- and post-intervention IS values were available for four of the intervention studies. Spearman rank correlation coefficients were calculated for delta CORE-IS gene-expression versus delta IS, identifying 16 genes consistently regulated in proportion with the clinical improvement in all four independent studies (Figure 4B, cumulative binomial probability, P = 0.0000053) and, critically, in a directionally consistent manner with the fast IS relationship. These 16 genes segregated into two clusters, one positively (DHTKD1, SLC43A1, PCYT2, MCCC1, SGCG, ECHDC3, ALDH6A1, SMIM8 and OARD1) and the other negatively associated (PGM2, RND3, G0S2, AGL, TSPAN9 and KPNA2) with improvements in IS. PPARγ agonist drug-treatment collectively tended to influence these 16 genes more than the average across the entire CORE-IS gene list (median P = 0.07 versus P = 0.18). Conserved transcription factor (TF) binding-site analysis (63) identified RREB1 and INSM1 binding-sites (Z-score 13.9 and 11.6) in the 16 genes activated in proportion to the improvement in IS (Supplemental Figure S4). These binding sites were not enriched in the (larger) set of ‘all exercise-responsive’ CORE-IS genes nor in the entire 332 Core-IS gene list (Supplemental data S6), suggesting a more specific relationship with these 16 IS genes ('Delta-IS'). In the two studies examining ncRNAs, some changed in proportion to clinical improvements (e.g. RORA-AS2, SNORA70, AL136979.1, PRKCQ-AS1, SNORD83A, ATP13A4-AS1 and NPSR1-AS1). In each case the pattern of change was positively correlated with changes in fasting IS (Supplemental data S6). The treatment regulated CORE-IS genes can be further investigated by ‘guilt-by-association’ (GBA) analysis (80), relying on completely independent data (81). We tested the hypothesis that the nine CORE-IS genes positively correlated with a change in IS in intervention studies and the transcription-factors distinctly associated with this subset of Delta-IS genes, could regulate an additional ‘layer’ of genes responsible for regulating IS. Multi-tissue (81) analysis identified 52 genes (P < 1 × 10−6) co-expressed with Delta-IS genes (Supplemental data S7). This 52-gene subset was 77-fold enriched for branched-chain amino acid (BCAA) catabolic genes (P < 1 × 10−5) located in the mitochondrial matrix, and included nine additional GWAS-MD loci beyond the 45 already in the CORE-IS list (ALDH2, C3, CD36, HNF4A, HMGCS2, KHK, IDH2, RREB1 and SERPINA6 (65)). The cluster of genes covarying with the 7 negatively correlating Delta-IS genes contained a further two GWAS-MD loci (SLC2A4 and PPARG). Thus, both the GBA analysis and the CORE-IS analysis identified genes linked to GWAS-MD and in particular, catabolism of BCAAs (valine, leucine and isoleucine) and aromatic amino acids (phenylalanine and tyrosine). Indeed, studies from 1960 onwards have identified associations between circulating levels of BCAA and aromatic AA with IR (47,90). More recently, pre-clinical studies identified that catabolism of valine may act as a signal to promote muscle lipid uptake (91). Here, we show that altered muscle gene expression in the AA catabolic pathways occurs in the absence of any systematic loss of mitochondrial OXPHOS gene expression (Figure 3B).

Networks analysis reveals the inter-relationship between the GWAS-MD enriched IS genes

We next examined if the CORE-IS genes formed robust networks, and if protein–protein (P–P) interaction analysis, relying on further independent data (79), would demonstrate that each ‘gene’ network contained proteins known to physically interact (78). Typically, genes from validated GWAS loci are used to weight nodes within gene-expression networks (5), however, in this case we used the identified RNA-RNA network relationships to gain insight into how GWAS-MD loci genes may interact, within the quantitative and continuous framework of RNA abundance. Moreover, this strategy allowed us to associate ncRNAs (mostly of unknown function) with specific proteins and hence biochemical pathways. Using MEGENA (77) we identified seven discrete planar filtered networks (Table 3) (n = 191, FDR <1% for Spearman correlation; P < 0.01 for module significance and P < 0.01 for network connectivity). Summary of CORE-IS RNA networks MEGENA planar filtered networks were produced using 191 HTA 2.0 gene-chip samples (See methods). protein–protein interaction analysis was carried out using www.networkanalyst.ca and the IMEx Interactome (79). The guilty-by-association analysis (80) used GeneFriends (81). *when using a database for GBA analysis, it is expected that co-correlation will occur between gene expression from the input tissue type (regardless of physiological status), and with genes in that tissue type within the data-base. This is a type of sampling bias that skews Gene Ontology analysis when inputting large lists from distinct tissues (in this case skeletal muscle). The largest discrete planar filtered network contained 87 CORE-IS genes (13% ncRNA) with an enrichment in genes negatively co-varying with in vivo IS (64%, cf 30% of all CORE-IS genes demonstrated a negative co-variation with IS, Table 3 and Figure 5a). Protein–protein interaction analysis characterized the function of the protein-coding members of a minimum network (79) as ‘hormone signaling including Insulin’ (Table 3); and included the insulin-receptor, itself. This network contained 7 (ALDH6A, DHTKD1, KPNA2, MCCC1, PCYT2, PGM2 and RND3) out of the 16 CORE-IS genes regulated proportionately to changes in IS across four clinical studies. Remarkably, ∼50% of these protein-coding genes were also DE following anti-diabetes drug-treatment (56); including ALDH6A DHTKD1, PCYT2 and RND3 (this compares with <5% of the transcriptome responding to drug -treatment (56)). A module centered around ALDH6A (encoding methyl-malonate semi-aldehyde dehydrogenase), a protein which catabolizes valine. Closely co-expressed with ALDH6A, was the insulin-receptor, and MCCC1 (methyl-crotonoyl-CoA carboxylase 1, which is involved in leucine catabolism (92,93)), and AASS (alpha-aminoadipate delta-semialdehyde synthase, which catabolizes lysine). DHTKD1 (dehydrogenase E1 and transketolase domain containing 1) also catabolizes amino-acids, as does ALDH1L1. In each case, greater (BCAA catabolising) gene expression was associated with better IS. Seven members of the network overlapped with genes from GWAS significant Obesity loci (Figure 5A, brown circles) and five were from GWAS significant T2DM-risk loci (blue circles). Many of the remaining members of the network are involved in metabolism; with clear roles in T2DM (glycolysis or glycogen break-down, LDHB or PGM2). Thus, the first discrete network is strongly modulated by all clinical interventions, and biochemically linked to amino-acid catabolism. Treatment-responsive IS networks regulate human peripheral insulin sensitivity and link to metabolic disease risk genes. Probabilistic networks have been utilized to establish a causal relationship between molecules and disease (5). We utilized the dynamic response to diverse forms of treatment aimed at reversing insulin resistance, along with measured fasting relationships to estimate if planar filtered networks (77) derived from 191 of the younger individuals (<70yr), were enriched in genes dynamically regulated or were lncRNAs, harboring causal genetic variant related to metabolic disease, or dominated by proven T2DM disease pathways. MEGENA (77) was used to identify discrete planar filtered networks using the CORE-IS gene list as input (FDR <1% and spearman rank correlation; P<0.01 for module significance and P< 0.01 for network connectivity). Follow-up protein–protein interaction analysis (79) facilitated the interpretation of the identified networks providing additional biological plausibility to each network (78). Node and label size are proportional to the node degree value within each distinct module. (A) represents the largest discrete planar filtered network (87 genes, 13% ncRNA and 64% negatively co-varying with in vivo fasting IS) that protein–protein interaction analysis identified as being involved with hormone signaling, including insulin, FDR<1%. This network contained 44% (7; ALDH6A, DHTKD1, KPNA2, MCCC1, PCYT2, PGM2 and RND3) of the 16 genes regulated in vivo proportional to changes in IS, across four independent treatment studies. Further, 33% of the life-style treatment-response genes (ALDH6A DHTKD1, PCYT2 and RND3) were responsive to three months of PPARγ agonist, given to insulin resistant subjects ((Supplemental data S5, (56)). There were numerous significant GWAS loci for T2DM (gene ID enclosed in blue-ring) or Obesity (gene ID enclosed in brown circle). (B) The second largest and discrete planar filtered network (81 genes, 22% ncRNA and 75% negatively co-varying with in vivo fasting IS) contained genes from GWAS loci for T2DM, Obesity and Lipids (gene ID enclosed in gold circle). Twenty-five percent of this network was responsive to anti-diabetes drug treatment (PPARγ agonist), given to insulin resistant subjects for three months (56). Protein–protein interaction analysis indicated that this module was enriched by genes (33 in the network, FDR < 1 × 10−5) involved in the positive regulation of RNA metabolism and transcription. The second discrete network (81 genes) was composed of 22% ncRNA (Table 3 and Figure 5B) and was predominantly negatively associated with IS (75%). A third of the protein-coding components were responsive to PPARγ agonist drug-treatment (56); P–P interaction analysis indicated the network included genes involved with ‘positive regulation of metabolic process’ (FDR < 1 × 10−7, 2.3-fold enriched (FE), n = 44 genes) and ‘positive regulation of nucleobase-containing compound metabolic processes’ (Table 3). Five members of the network overlapped with genes from GWAS significant Obesity loci (Figure 5B, brown circles) and 6 were from GWAS significant T2DM-risk loci (blue circles) with one at a GWAS loci associated with lipids (gold circle). CCDC69 is a network 2 hub gene (coiled-coil domain containing 69, a cytokinesis protein (94)) and it regulates the cell-cycle, as does the GWAS-linked CDK2AP1. Given that the majority of skeletal muscle nuclei are post-mitotic, the gene–environment interactions and/or genetic influence we note in muscle may also be reflective of events in other tissue-types, e.g. ß-cells and/or adipocytes. Indeed, the lncRNA LINC02210 expression in muscle is positively related to IS but more highly expressed in ß-cells as compared with muscle or adipose tissue (Supplemental data S3). In sum, this second network appears to reflect reduced activity of processes that may ultimately reflect muscle use: muscle remodeling and substrate cycling as 47 genes were differentially expressed by exercise. The third network (Supplemental Figure S5) contained eight genes from loci identified in T2DM and insulin sensitivity GWAS, with a further three linked with obesity and fasting glucose GWAS loci. While 66% of genes had a negative relationship to IS, there was no enrichment in ncRNA (20% ncRNA). P–P interaction analysis contained 33 protein-coding genes involved with the ‘positive regulation of cellular metabolic processes, signaling’ (FDR < 1 × 10−5, 2.3 FE) and ‘interaction with host’ (Table 3). Interestingly, the majority of this network was non-responsive to (‘resistant’) PPARγ agonist treatment (56). However, it did contain 4 genes out of the 16 genes regulated in vivo following therapeutic lifestyle interventions (Figure 4B) in proportion to changes in IS (GOS2, AGL, LAT3 and ECHDC3). Network 3 included a number of insulin- and lipid-related signaling molecules (B4GALT6, CDK14, CERS6, GRB14, FATP1, GOS2, IGF1R, NLK, SLC27A1 and SESN1) (95–102). GRB14 has been consistently linked to ‘metabolic disease’ and is negatively correlated (R = –0.36) with IS in the present and other published analyses (103–105). The network also included regulators of carbohydrate homeostasis, (amylo-alpha-1,6-glucosidase, 4-alpha-glucanotransferase (AGL), a glycogen debranching enzyme) which were negatively associated with IS. Down-regulation was associated with improved IS following clinical intervention. Increased SLC27A1 (i.e. fatty acid transport protein 1, FATP1), a plasma membrane long-chain fatty acid transporter activated by mTORC1 signaling (98), and less G0S2 (G0/G1 switch 2)—a negative regulator of lipase activity (and a treatment response gene)—was related to better IS, expanding the evidence base in humans, for the ‘lipid paradox’ and muscle health (106). Thus, network 3 contains genes influenced by ‘substrate turn-over’ (exercise) rather than metabolic signaling (PPARγ activation) with clear links to substrate storage and utilization. A fourth network was dominated by non-coding RNAs (70%, 34 out of the 86 ncRNAs included among CORE-IS gene list, Table 3 and Figure 6), and required information gained from GBA and sense–antisense protein-coding relationship analysis to provide some biochemical context. Twelve of the lncRNAs had a total of 16 antisense cis-expressed protein-coding partners demonstrating positive (6) or negative (10) co-expression (Supplemental data S3). The combination of these 16 cis-expressed protein-coding genes and the 15 protein-coding CORE-IS genes from the module were analyzed for P–P interactions (79). This yielded a distinct interactive proteome: four gene-sets (mainly represented by the same 55 genes) were defined as representing ‘SRP-dependent co-translational protein targeting to membrane’, or ‘viral transcription’; >60 times enriched over the genome (Bonferroni < 1 × 10−83). A second group of 38, non-overlapping from the first, were defined as belonging to ‘viral process’ (19-fold enriched over the genome, Bonferroni < 1 × 10−34). Gene ontology analysis using DAVID, confirmed these extremely enriched categories (Supplemental data S8). This ncRNA-dominated network included some classic metabolic genes, (COL4A3BP — also known as CERT or GPBP), a GWAS loci gene for obesity, and a protein shuttling ceramides and diacylglycerol. Plausibly, greater CERT expression combats the negative impact of lipid molecules on insulin signaling (106,107) and thus may help to distinguish muscle lipid profile in T2DM patients from the ‘similarly’ highly lipid-laden but high IS state of endurance trained muscle (108), revealing further molecular details of the ‘lipid paradox’ (106). A noncoding RNA network is positively linked to peripheral insulin sensitivity and translational control. Probabilistic networks are utilized to establish a causal relationship between molecules and disease (5). We utilized the response to diverse forms of treatment aimed at reversing insulin resistance, along with measured fasting relationships to estimate if planar filtered networks (77), based on data from 191 individuals, were enriched in genes dynamically regulated or were lncRNAs, harbored causal genetic variant related to metabolic disease, or dominated by proven T2DM disease pathways. MEGENA (77) was used to identify discrete planar filtered networks using the CORE-IS gene list as input (FDR<1% and spearman gene correlation; P< 0.01 for module significance and P< 0.01 for network connectivity). Protein–protein interaction analysis (79) facilitated the interpretation of the identified networks while providing additional biological plausibility to each module (78). Node and label size are proportional to the node degree value within each distinct module. This, the fourth large discrete planar filtered network was dominated by non-coding RNAs (49 genes, 70% ncRNA, and 84% of the network members positively co-varying with in vivo IS). Thirteen of the ncRNA had cis antisense expressed protein-coding genes and these 13 genes, along with the remaining 30% protein-coding CORE-IS genes in the network, formed a protein–protein interaction network strongly associated with viral responsive protein-coding genes (P< 1 × 10−48) and translational initiation (P< 1 × 10−37). The network included several shorter ncRNAs, including the highly conserved 7SK ncRNA pseudogenes which are 271 to 326 base-pairs long. The function of these is unknown despite the 7SK ncRNA gene itself being described half a century ago. 7SK RNA binds to HEXIM1 to inhibit the positive transcription elongation factor b (P-TEFb)—transcript elongation (HEXIM1 is negatively associated with IS, Table 2 (140) and also regulated by a lncRNA (141).

DISCUSSION

Tissue responses to the hormone insulin (i.e. IS) underpin glycemic control in humans. Skeletal muscle is the largest target organ for insulin in humans, and in the post-prandial state insulin promotes cellular glucose transport for non-oxidative storage as glycogen. It is a striking observation that various modes of exercise or drug treatment (50,109–112) alter IS to a highly variable extent (50,111) such that it is essential we generate greater understanding of the molecular regulation of IS. In the present study, we have established the first robust gene expression signature for IS and utilized three strategies to validate our findings and explore the functional implications of our observations: systematic literature review, quantification of responses to clinical interventions; and application of systems biology methods. We independently identified many known IS regulators, including GRB14, a receptor tyrosine kinase adaptor protein (113) which inhibits insulin signaling (105,114), located at a GWAS significant metabolic disease loci, while providing the first detailed map of the relationship between lncRNAs and IS. We may have expected a greater number of GWAS loci related specifically to insulin, however as noted above, linear modeling does not fully capture the complex interactions between genes, insulin, obesity and risk of development of T2DM. Further, GWAS have revealed rather limited numbers of candidate loci for fasting insulin, perhaps reflecting a lack of a standard approach to measuring fasting insulin (59) across GWAS cohorts. Overall, we clearly demonstrate that quantitative modeling of RNA can reveal biological features consistent with both DNA analysis and biochemical analysis, demonstrating a close and expected (115) relationship between these three molecular universes (Table 2).

Biochemical and physiological influences on the molecular associates of IS

Studying the influence of associated risk-factors for T2DM, such as self-reported physical activity or diet is fraught with challenges (116) making the study of the molecular nature of IS in humans very challenging. Preliminary transcriptomics studies in muscle from T2DM patients claimed to find a specific PGC1α-mediated down-regulation (26) of OXPHOS genes, concluding that this was a therapeutic-target for preventing T2DM (117). However, this observation did not generalize to larger clinical cohorts (22); while any potential impact of loss of OXPHOS on insulin signaling has been shown to be capricious even in controlled cell models (88). Instead, single-study observations of reduced muscle ‘OXPHOS’ gene expression arise because of a number of experimental covariates, including using ‘control’ muscle post ‘hyper-insulinemic clamp’—this acutely up-regulates OXPHOS pathway transcription in healthy subjects (i.e. the ‘control’ samples); or when subjects are not matched for aerobic capacity, physical activity, age or any combination of the above (24,87,118,119). In fact, even in skeletal muscle, the IS signature involves a selective alteration in AA, and lipid and ketone metabolizing genes (Table 2) in the complete absence of any loss of OXPHOS gene expression (n = 564, Figure 3). This indicates qualitative changes in substrate metabolism, rather than a crude loss of global oxidative capacity potentially underpins the correlative relationship between circulating BCAA and diabetes (47,120), and this would be teleologically consistent with the high absolute oxidative capacity found in muscle (purposed for contraction, and not representing a limit to substrate oxidation rates in resting muscle). There is evidence that males and females with T2DM suffer different clinical sequela, particularly with respect to cardiovascular disease (121). The CORE-IS was not enriched in genes on the X or Y chromosomes; nor did it contain genes known to be subject to gender-specific imprinting. We examined the impact of sexual dimorphism on muscle gene expression, as this has not been reliably assessed before, and found 650 genes with sexual dimorphic gene expression (1%FDR, n = 89 per sex, Supplemental data S9) and this included 15% of the CORE-IS genes. Interestingly, the expression of a type II muscle fiber-type splice-variant (from the RYR1 gene) was lower expressed in women (Supplemental data S9); however, this prototype muscle fiber-type biomarker (43) did not correlate with IS. Thus, with our global transcriptomic methods, a modest relationship between sex, IS and muscle gene expression is detectable; potentially reflecting small differences in fiber-type composition, along with environmental factors, and this may contribute to differences in T2DM risk and disease progression between the genders (122). Far larger intervention studies, including many hundreds of both sexes, matched for physiological status, are required to robustly analyze if any variation in the responses to treatment reflects sexual dimorphism influenced genes, and thus this was not attempted in the present study.

Non-coding transcriptome generates novel insights into IS

Using RNA-seq, the Illumina Human BodyMap and Genotype-Tissue Expression Project (GTEx, (30)) has reported a median count of less than 5000 lncRNA genes across a large number of human tissue and cell types (123), often with only a few hundred per tissue-type. The present analysis, which relies on a probe-level signal filter (study specificity), a secondary transcript-level ‘noise’ filter and confirmation that the low-end of the dynamic expression range equates to known low-expressed RNA and proteins (Supplemental data S10 and raw data), demonstrates the advantages of our new approach to customising array processing to tissue/study specific patterns of RNA detection. Further, based again on RNA-seq observations, there is a belief that lncRNAs are expressed in a highly cell-type specific manner. For example, the lncRNA antisense for PDX1 (PDX1-AS or ’PLUTO’) - promotes expression of PDX1 in in ß-cells via regulation of chromatin - is down-regulated in T2DM (45). While reported as ‘ß-cell specific’, we observed PLUTO to be expressed in skeletal muscle at a value equal to islet cell-line HTA 2.0 data (EndoC-ßH1 cells), while PDX1 is 8-fold lower expressed in muscle than in islets (Supplemental data S10). Thus, while there are clearly further methodological factors to consider before we have a definitive view on the human transcriptome, the present analysis indicates that current representations of the ncRNA transcriptome by the Illumina Human BodyMap and GTEx are far from comprehensive, with reported tissue specific expression patterns being inaccurate. Notably, the IS-related ncRNAs in the present study are equally expressed in muscle, islet cells and adipose tissue (Figure 6 and Supplemental data S3) implying that they have the potential to impact on multiple aspects of insulin biology. Akerman et al recently reported that modulation of lncRNA in islet cells impacts genes expression both in cis and trans, and impacts insulin production and secretion in vitro (45). Further, six ncRNAs were dysregulated in islet samples from people with impaired glucose tolerance and T2DM. Of these ncRNAs, we could only locate AC009487.1 in the current reference genome, and this gene was not detected in skeletal muscle. We did find in vivo regulation of ncRNAs in relation to peripheral IS before and after treatment aimed at improving IS (Figure 2; Supplemental data S3, S5, and S8). Strikingly, eleven ncRNAs were regulated by treatment and some directly in proportion to IS improvements; these included PRKCQ-AS1, which is anti-sense to the gene coding for PKC-θ, a known regulator of insulin signaling and also a drug-target for T2DM (83–85). We were one of the first groups to report in vivo interactions between lncRNA and a cis-expressed protein coding gene in humans (124). In the present study, we found that expression of PRKCQ-AS1 and the mRNA for PKC-θ (PRKCQ) co-varied in vivo (Figure 2)—although the slope of the relationship indicates a modest interaction. In vitro loss of PRKCQ-AS1 was associated with greater expression of PRKCQ (Figure 2D, (P = 0.05, n = 12 transfections). This suggests that PRKCQ-AS1 may act to maintain a relatively constant level of PRKCQ, despite fluctuations in transcription from that locus. Like other sense-antisense partners, e.g. PDX1/PDX1-AS, co-expression can also be reflective of regional chromatin status and hence transcription from that genomic locus (45). A ncRNA antisense transcript at the Sorbin and SH3 domain containing three (SORBS3) loci, AC037459.2, was also related to fasting IS. SORBS3—which codes for the protein vinexin—is down-regulated in the muscle of obese individuals, with a 5–25% increase in cytosine methylation at the SORBS3 locus (125). Nonetheless, despite changes in expression of ENTPD1-AS1, NPSR1-AS1 and TTN-AS1 being positively associated with improvements in IS, and negatively associated for ATP13A4-AS1 and RORA-AS2, transcripts at the SORBS3 locus were stable to clinical intervention in our analysis. Further, detailed studies are merited, especially with respect to the complex PRKCQ-AS1/PRKCQ locus, given the importance of PKC-θ for insulin biology (84).

Quantitative measures of RNA and interpretation of metabolic disease GWAS loci

Notably, when we compared the CORE-IS list with a recently published study that appraised extreme IR in a small group of obese individuals there was 100% concordance for directionality for the genes in common to both studies (Supplemental data S11, (126)). This would support the interpretation that the present CORE-IS signature represents the natural trajectory of obesity-related metabolic disease and, due to the continuous/quantitative nature of RNA data, we believe that the CORE-IS signature is potentially suitable for use as a drug-screen aimed at finding treatments for improving IS. So far, the majority of genome-wide association markers related to human disease do not occur within the sequence coding the processed transcript, but rather occur at regulatory regions potentially impacting on transcription, as well as RNA stability and translation (127). There has been limited progress moving from GWAS loci candidate regions to the identification of the potential gene responsible for the clinical associations, partly reflective of extensive linkage disequilibrium (14). Thus, identification of 45 GWAS-MD genome-wide significant loci (plus an additional nine GWAS significant from the network-analysis and >150 statistically weaker associations) indicates that network modeling of RNA (77) provides an opportunity to examine the continuous quantitative relationship between a genomic disease risk loci and the phenotype in question, something that is not possible with DNA sequence alone. It is important to state that there were no occasions when a CORE-IS gene only occurred at non-GWAS-MD gene using the Type 2 Diabetes knowledge portal (it contains GWAS results beyond metabolic disease). Further, in ∼5% of occasions GWAS-MD significant CORE-IS genes were associated with a non-metabolic disease at the ‘GWAS’ or a ‘suggestive association’ level, (but in most cases this non-metabolic phenotype was ‘Height’, which is related to body size and BMI calculation). Gibson et al has demonstrated that global eQTL analysis can be an effective strategy for identifying the functional gene at GWAS loci for individual diseases (128). Indeed, large scale eQTL analysis of adipose tissue samples provides support for 14 of the GWAS-MD loci mentioned above, of which 7 increased the risk for T2DM. Genotype-transcript associations have also been observed in human pancreas islets, using cis-exon-eQTL mapping (129); where ∼2000 samples were required to identify an eQTL capturing 1% heritability (19). Van de Bunt et al. reported 2341 (or 6% of RNA-seq detectable genes) with an exon-eQTL within 250 kb of the transcription start site (an arbitrary distance) and 44 of these eQTL genes were related to peripheral IS in our analyses (Supplemental data S12). Further exploration of the CORE-IS gene-list using network analysis (77) in hepatic and adipose tissue would no doubt reveal additional commonalities across tissues; however, this will require new RNA datasets using RNA quantification methods consistent with the present study, which should overcome limitations in short-read RNA-seq (30,37), thus potentially circumventing some of the challenges faced when implementing eQTL methodologies (19).

Network structures and dynamics responses of the CORE-IS genes

Using scale-free planar filtered network analysis (Table 3), we mapped the network structure of the CORE-IS genes (77). This approach—well suited to RNA profiles from a relatively cell-type homogenous tissue like skeletal muscle—reveals the regulatory interactions underlying insulin sensitivity as well as smaller functional sub-clusters of genes (modules). For example, we found GRB14 closely co-expressed with six additional genes, including an aromatic AA transporter—SLC16A10 (TAT1, Supplemental Figure S5). Using transcription factor analysis (63), we noted that four of these genes (GRB14, HOMER1, KPNA5 and SLC16A10) had HNF1A type binding sites (Z-score = 9.3); HNF1A occurs at a GWAS risk loci for T2DM (71) and loss-of-function of HNF1A is related to familial early-onset diabetes (130). HNF1A type transcription factor binding sites were also enriched in CORE-IS genes differentially expressed following exercise-training (Z-score = 15.3) and interestingly polymorphisms in HNF1A are also linked to muscle histological phenotype (131), while this transcription factor also plays an established role in the pancreas. This illustrates the potentially pervasive influence of disease risk genes across multiple organs and the layers of genomic variants which influence the regulation of the expression of the CORE-IS genes. Unique to the present analysis, we were able to identify a molecular signature that is regulated in proportion to the success of clinical interventions aimed at combating T2DM. A subset of 16 of the CORE-IS genes quantitatively related to improvement in insulin action in four diverse clinical studies, providing a major step toward identifying a molecular basis for the variable improvements in IS observed following standardized exercise/life-style interventions (48,50). One quarter were related to amino-acid metabolism (ALDH6A1, DHTKD1, LAT3 and MCCC1) and were positively linked to IS; four genes were ‘substrate’ metabolism genes (AGL, G0S2, KPNA2 and PGM2) involved in either blocking substrate mobilization or up-regulating glycolytic capacity, all negatively linked to IS and four were located at GWAS significant loci. PRRX1 is involved in pancreatic developmental and modulating adipogenesis via TGFβ activity (132), a key pathway involved in the positive cardiovascular adaptation to endurance exercise (20,133), was positively associated with IS responses. This 16-gene signature represents a potentially useful insulin-target tissue biomarker for examining the responses to novel treatments aimed at improving IS. In many of the largest GWAS studies, variation in fasting insulin was related to BMI with a shared variance of ∼33% (86) and statistical adjustment for BMI facilitated GWAS analysis. Recent studies have linked (implied causality) risk phenotypes for IR (e.g. elevated BMI, fasting glucose, fasting triglycerides (TG) or reduced HDL (69)) with GWAS significant loci. This has led to the emerging concept of failure in ‘adipose tissue capacity’ to adapt to over-feeding, promoting ectopic muscle and liver IR (69). Lotta et al combined information from 53 loci associated with higher fasting insulin, as well as lipids; a 53-SNP genetic score which associated with poor insulin sensitivity in >7000 independent subjects and a 12% increased relative risk for T2DM (69) and a lower rate of peripheral adipose tissue ‘accumulation’. This genetic estimate of IR status, was related to higher concentrations of all 3 BCAA’s (134) and included 5 CORE-IS genes from the present study (GRB14, MCC, LPL, PPP1R3B and INSR). A gain of function mutation in the lipid ‘mobilizing’ gene, LPL, is associated with better IS and protection from T2DM (69). Consistent with this, muscle LPL mRNA expression was positively associated with IS (n = 564, R = 0.35, FDR < 1%). It is understood that under normal physiological conditions in humans, a high proportion of body mass is attributable to skeletal muscle (∼40%), such that the same risk genes may relate to IS in both adipose and muscle tissue compartments, incorporating aspects of tissue plasticity that are common to adipose and muscle (20,135). Development and application of an appropriately validated multi-RNA risk-score will capture more clinical variance, and hence offers greater potential than the aforementioned 53-SNP genetic score. In conclusion, based on the production and data modeling of a total of 1012 human muscle RNA profiles, as well as a new tissue-specific approach to substantially enhance the analysis of high-density DNA arrays, we produced a robust and highly validated global molecular model of human peripheral insulin sensitivity. The model revealed a novel interpretative structure for potentially exploring the dynamic interactions of metabolic-disease GWAS loci and, for the first time, illustrates the network interactions between the coding and non-coding IS transcriptome. Among other implications, this work lays the foundation for the discovery of novel biomarkers for metabolic disease and a tool for in vitro drug-screening.

DATA AVAILABILITY

All new array data has been deposited at GEO (GSE104235). Click here for additional data file.

138 in total

1. The effects of aerobic, resistance, and combination training on insulin sensitivity and secretion in overweight adults from STRRIDE AT/RT: a randomized trial.

Authors: Hiba AbouAssi; Cris A Slentz; Catherine R Mikus; Charles J Tanner; Lori A Bateman; Leslie H Willis; A Tamlyn Shields; Lucy W Piner; Lorrie E Penry; Erik A Kraus; Kim M Huffman; Connie W Bales; Joseph A Houmard; William E Kraus
Journal: J Appl Physiol (1985) Date: 2015-06-15

2. Obesity-induced CerS6-dependent C16:0 ceramide production promotes weight gain and glucose intolerance.

Authors: Sarah M Turpin; Hayley T Nicholls; Diana M Willmes; Arnaud Mourier; Susanne Brodesser; Claudia M Wunderlich; Jan Mauer; Elaine Xu; Philipp Hammerschmidt; Hella S Brönneke; Aleksandra Trifunovic; Giuseppe LoSasso; F Thomas Wunderlich; Jan-Wilhelm Kornfeld; Matthias Blüher; Martin Krönke; Jens C Brüning
Journal: Cell Metab Date: 2014-10-07 Impact factor: 27.287

3. Role of a novel coiled-coil domain-containing protein CCDC69 in regulating central spindle assembly.

Authors: Debjani Pal; Di Wu; Akiko Haruta; Fumio Matsumura; Qize Wei
Journal: Cell Cycle Date: 2010-10-22 Impact factor: 4.534

4. Effect of the volume and intensity of exercise training on insulin sensitivity.

Authors: Joseph A Houmard; Charles J Tanner; Cris A Slentz; Brian D Duscha; Jennifer S McCartney; William E Kraus
Journal: J Appl Physiol (1985) Date: 2003-09-12

5. Impact of combined resistance and aerobic exercise training on branched-chain amino acid turnover, glycine metabolism and insulin sensitivity in overweight humans.

Authors: Erin L Glynn; Lucy W Piner; Kim M Huffman; Cris A Slentz; Lorraine Elliot-Penry; Hiba AbouAssi; Phillip J White; James R Bain; Michael J Muehlbauer; Olga R Ilkayeva; Robert D Stevens; Kathryn N Porter Starr; Connie W Bales; Elena Volpi; M Julia Brosnan; Jeff K Trimmer; Timothy P Rolph; Christopher B Newgard; William E Kraus
Journal: Diabetologia Date: 2015-08-09 Impact factor: 10.122

Review 6. Transcriptional control of mitochondrial energy metabolism through the PGC1 coactivators.

Authors: Bruce M Spiegelman
Journal: Novartis Found Symp Date: 2007

7. Role of diacylglycerol activation of PKCθ in lipid-induced muscle insulin resistance in humans.

Authors: Julia Szendroedi; Toru Yoshimura; Esther Phielix; Chrysi Koliaki; Mellissa Marcucci; Dongyan Zhang; Tomas Jelenik; Janette Müller; Christian Herder; Peter Nowotny; Gerald I Shulman; Michael Roden
Journal: Proc Natl Acad Sci U S A Date: 2014-06-16 Impact factor: 11.205

8. Protein kinase C (PKC)-alpha activation inhibits PKC-zeta and mediates the action of PED/PEA-15 on glucose transport in the L6 skeletal muscle cells.

Authors: G Condorelli; G Vigliotta; A Trencia; M A Maitan; M Caruso; C Miele; F Oriente; S Santopietro; P Formisano; F Beguinot
Journal: Diabetes Date: 2001-06 Impact factor: 9.461

9. Transcriptomics in type 2 diabetes: Bridging the gap between genotype and phenotype.

Authors: Christopher P Jenkinson; Harald H H Göring; Rector Arya; John Blangero; Ravindranath Duggirala; Ralph A DeFronzo
Journal: Genom Data Date: 2015-12-17

10. Endothelial function increases after a 16-week diet and exercise intervention in overweight and obese young women.

Authors: Lisa M Cotie; Andrea R Josse; Stuart M Phillips; Maureen J MacDonald
Journal: Biomed Res Int Date: 2014-03-20 Impact factor: 3.411

20 in total

1. Administration of Phosphonate Inhibitors of Dehydrogenases of 2-Oxoglutarate and 2-Oxoadipate to Rats Elicits Target-Specific Metabolic and Physiological Responses.

Authors: Victoria I Bunik; Artem V Artiukhov; Alexey V Kazantsev; Vasily A Aleshin; Alexandra I Boyko; Alexander L Ksenofontov; Nikolay V Lukashev; Anastasia V Graf
Journal: Front Chem Date: 2022-06-20 Impact factor: 5.545

2. Delayed Impact of 2-Oxoadipate Dehydrogenase Inhibition on the Rat Brain Metabolism Is Linked to Protein Glutarylation.

Authors: Alexandra I Boyko; Irina S Karlina; Lev G Zavileyskiy; Vasily A Aleshin; Artem V Artiukhov; Thilo Kaehne; Alexander L Ksenofontov; Sergey I Ryabov; Anastasia V Graf; Angela Tramonti; Victoria I Bunik
Journal: Front Med (Lausanne) Date: 2022-06-01

3. Altered mitochondrial microenvironment at the spotlight of musculoskeletal aging and Alzheimer's disease.

Authors: Panagiotis Giannos; Konstantinos Prokopidis; Stuart M Raleigh; Eirini Kelaiditi; Mathew Hill
Journal: Sci Rep Date: 2022-07-04 Impact factor: 4.996

4. Significance of bilayer-forming phospholipids for skeletal muscle insulin sensitivity and mitochondrial function.

Authors: Sophie Grapentine; Marica Bakovic
Journal: J Biomed Res Date: 2019-05-31

5. Human and mouse muscle transcriptomic analyses identify insulin receptor mRNA downregulation in hyperinsulinemia-associated insulin resistance.

Authors: Haoning Howard Cen; Bahira Hussein; José Diego Botezelli; Su Wang; Jiashuo Aaron Zhang; Nilou Noursadeghi; Niels Jessen; Brian Rodrigues; James A Timmons; James D Johnson
Journal: FASEB J Date: 2022-01 Impact factor: 5.834

6. Hibernation induces widespread transcriptional remodeling in metabolic tissues of the grizzly bear.

Authors: Heiko T Jansen; Shawn Trojahn; Michael W Saxton; Joanna L Kelley; Corey R Quackenbush; Brandon D Evans Hutzenbiler; O Lynne Nelson; Omar E Cornejo; Charles T Robbins
Journal: Commun Biol Date: 2019-09-13

7. Longevity-related molecular pathways are subject to midlife "switch" in humans.

Authors: James A Timmons; Claude-Henry Volmar; Hannah Crossland; Bethan E Phillips; Sanjana Sood; Karolina J Janczura; Timo Törmäkangas; Urho M Kujala; William E Kraus; Philip J Atherton; Claes Wahlestedt
Journal: Aging Cell Date: 2019-06-06 Impact factor: 9.304

8. Reference quantitative transcriptome dataset for adult Caenorhabditis elegans.

Authors: Allison Piovesan; Francesca Antonaros; Pierluigi Strippoli; Lorenza Vitale; Maria Chiara Pelleri; Maria Caracausi
Journal: Data Brief Date: 2019-06-13

Review 9. (ADP-ribosyl)hydrolases: structure, function, and biology.

Authors: Johannes Gregor Matthias Rack; Luca Palazzo; Ivan Ahel
Journal: Genes Dev Date: 2020-02-06 Impact factor: 11.361

10. Molecular Transducers of Human Skeletal Muscle Remodeling under Different Loading States.

Authors: Tanner Stokes; James A Timmons; Hannah Crossland; Thomas R Tripp; Kevin Murphy; Chris McGlory; Cameron J Mitchell; Sara Y Oikawa; Robert W Morton; Bethan E Phillips; Steven K Baker; Phillip J Atherton; Claes Wahlestedt; Stuart M Phillips
Journal: Cell Rep Date: 2020-08-04 Impact factor: 9.423