Machine learning shows association between genetic variability in PPARG and cerebral connectivity in preterm infants.

Michelle L Krishnan1, Zi Wang2, Paul Aljabar1, Gareth Ball1, Ghazala Mirza3, Alka Saxena3, Serena J Counsell1, Joseph V Hajnal1,2, Giovanni Montana2, A David Edwards4.   

Abstract

Preterm infants show abnormal structural and functional brain development, and have a high risk of long-term neurocognitive problems. The molecular and cellular mechanisms involved are poorly understood, but novel methods now make it possible to address them by examining the relationship between common genetic variability and brain endophenotype. We addressed the hypothesis that variability in the Peroxisome Proliferator Activated Receptor (PPAR) pathway would be related to brain development. We employed machine learning in an unsupervised, unbiased, combined analysis of whole-brain diffusion tractography together with genomewide, single-nucleotide polymorphism (SNP)-based genotypes from a cohort of 272 preterm infants, using Sparse Reduced Rank Regression (sRRR) and correcting for ethnicity and age at birth and imaging. Empirical selection frequencies for SNPs associated with cerebral connectivity ranged from 0.663 to zero, with multiple highly selected SNPs mapping to genes for PPARG (six SNPs), ITGA6 (four SNPs), and FXR1 (two SNPs). SNPs in PPARG were significantly overrepresented (ranked 7–11 and 67 of 556,000 SNPs; P < 2.2 × 10−7), and were mostly in introns or regulatory regions with predicted effects including protein coding and nonsense-mediated decay. Edge-centric graph-theoretic analysis showed that highly selected white-matter tracts were consistent across the group and important for information transfer (P < 2.2 × 10−17); they most often connected to the insula (P < 6 × 10−17). These results suggest that the inhibited brain development seen in humans exposed to the stress of a premature extrauterine environment is modulated by genetic factors, and that PPARG signaling has a previously unrecognized role in cerebral development.
Copyright © 2017 the Author(s). Published by PNAS.

Keywords:  PPARG; brain development; machine learning; magnetic resonance imaging; preterm

Year:  2017        PMID: 29229843      PMCID: PMC5748164          DOI: 10.1073/pnas.1704907114

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


Preterm birth accounts for 11% of all births (1), and is the leading global cause of death and disability under 5 y of age (2). Over 30% of survivors experience neurocognitive problems from early life (3) lasting into adulthood (4), including anxiety, inattention, and social and communication problems (5), and socioemotional problems (6). Psychiatric disorders are present in around 25% of preterm adolescents, constituting a 3–4-fold increased risk compared with term-born peers (review in ref. 7), including a risk ratio of 7.4 for bipolar affective disorder and 2.5 for nonaffective psychosis (8), and a threefold increase in the prevalence of autism-spectrum disorders (ASD) (9). Imaging studies have shown that adverse functional outcomes are associated with changes in brain structure, connectivity, and function (10, 11), but while this phenotype has been extensively investigated in recent years, few studies have addressed the cellular or molecular mechanisms involved. Recent advances in machine learning and imaging genomics now make it possible to investigate potential mechanisms by studying the genetic variability associated with the cerebral endophenotypes. Previously, our preliminary candidate gene and genomewide pathway-based studies have suggested an association between white-matter development and a number of metabolic pathways, with the strongest link to the Peroxisome Proliferator Activated Receptor (PPAR) pathway (12, 13), raising the hypothesis that the PPAR pathway modulates brain development in preterm infants. To test this, and to explore genetic influences on preterm brain development further, we collected a large cohort of linked diffusion MRI (d-MRI) and genomic data, and undertook an unsupervised, unbiased machine-learning analysis of whole-brain diffusion tractography together with genomewide, single-nucleotide polymorphism (SNP)-based genotypes.

Results

Participants.

A cohort of 272 infants born at less than 33 wk gestational age (GA) (mean 29 wk + 4 d) had suitable imaging at term-equivalent age [mean age at scan (SA) 42 wk + 4 d] and allied genomic DNA available.

Population Stratification.

Relatedness between individuals in the cohort was assessed by calculating pairwise identity by state (IBS) values and using this distance matrix to perform principal component analysis. This revealed a degree of stratification along the first two components, corresponding to parental self-reported ethnicity. The first principal component of the IBS matrix was used as a covariate for adjustment of the phenotype.

Sparse Reduced Rank Regression Selects a Consistent Cerebral Endophenotype.

White-matter tracts, defined as the edges in the tractography connectivity matrix, were ranked according to their selection probabilities in the Sparse Reduced Rank Regression (sRRR) model. Selection frequencies ranged from 0.817 to zero; Fig. 1 shows the 100 most frequently selected edges. Two separate approaches were employed to eliminate individual relationships between imaging and genetic data and achieve a set of null results: (i) permutation of individuals within the imaging dataset; and (ii) replacing the original phenotype matrix with a matrix of randomly generated values with standard normal distribution and the same dimensions. Fig. 1 shows the empirical distribution together with these two null sets. The random replacement, but not the permutation of individuals, produced a null distribution with uniformly low selection frequency, which demonstrates a level of similarity between individuals and a consistent endophenotype.
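
The two null constructions described above can be sketched in a few lines. This is a minimal illustration on a toy phenotype matrix, not the study's code: the function names, the toy dimensions, and the random seed are assumptions made for the example.

```python
import random

def permute_individuals(phenotype, rng):
    """Null (i): shuffle rows so imaging data no longer line up with genotypes."""
    rows = [row[:] for row in phenotype]
    rng.shuffle(rows)
    return rows

def random_normal_phenotype(n, q, rng):
    """Null (ii): an n x q matrix of independent standard normal draws."""
    return [[rng.gauss(0.0, 1.0) for _ in range(q)] for _ in range(n)]

rng = random.Random(0)  # arbitrary seed, for reproducibility only

# Toy phenotype: 6 individuals x 4 edges (the study used 272 x 4,005).
phenotype = [[float(i * 10 + j) for j in range(4)] for i in range(6)]

permuted = permute_individuals(phenotype, rng)
random_null = random_normal_phenotype(len(phenotype), len(phenotype[0]), rng)
```

Permutation keeps every individual's connectivity profile intact and only breaks the link to that individual's genotype, whereas the random matrix removes the shared structure between individuals as well. That difference is what the comparison of the two null distributions relies on.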
Fig. 1.

Selection frequencies of imaging variables with sRRR. Solid green line: Empirical selection frequencies of imaging variables (edges from the probabilistic tractography connectivity matrix) as ranked by the sRRR model with 1,000 subsamples of size 2/3 total number of subjects and convergence criterion = 1 × 10−6. Empty gray circles: A permuted distribution was computed with the same parameters by permuting the order of individual imaging datasets between each subsample of the data. Solid gray circles: A null distribution was computed using a randomly generated matrix of standard normally distributed values with the same dimensions as the empirical data, using 20,000 subsamples of 2/3 of samples. Inset (larger in ): Surface anatomical location of the 10 edges most highly ranked by the sRRR method, using cortical atlas coordinates from the UNC AAL neonatal atlas (71), rendered with the BrainNet viewer (14). Surface views, cortical regions shown as gray circles and brain surface semitransparent. First row from left to right: lateral view of left hemisphere, view from above, lateral view of right hemisphere. Second row from left to right: medial view of left hemisphere, inferior aspect, medial view of right hemisphere. Third row: anterior aspect and posterior aspect. Top 10 ranked tracts (directionality not implied), subscripts “_L” and “_R” indicate laterality: (RegionA * RegionB): Insula_R*Temporal_Inferior_R; Insula_R*Occipital_Middle_R; Insula_L*Occipital_Middle_L; Frontal_Superior_R*Temporal_Superior_R; Frontal_Middle_R*Insula_R; Occipital_Superior_R*Temporal_Inferior_R; Hippocampus_R*Occipital_Middle_R; Rolandic_Operculum_R*Insula_R; Frontal_Superior_R*Insula_R; Frontal_Middle_L*Insula_L.

Consistently Selected White-Matter Tracts Have a Significant Role in Information Flow.

To understand the cerebral endophenotype further, the 10 tractography connections selected most often in the sRRR model were visualized according to University of North Carolina Automated Anatomical Labeling atlas coordinates using BrainNet Viewer software (14), and the group median connectivity matrix was examined from an edge-centric perspective, applying graph theoretical measures to detect link communities (15). These tracts were distributed within middle-frontal, middle-temporal, and parahippocampal/entorhinal link communities (15), and integrated fractional anisotropy ranged from 0.06 to 0.2 (Fig. 1, Inset). Their impact on information flow in the network was then studied using an “edge lesioning” strategy (15), in which the effect on global communicability of removing them was compared with the effect of removing other sets of 10 edges at random. Global communicability captures information flow in the network and accounts for both shortest paths and all other paths connecting two nodes (16). Removal of the top 10 tracts decreased global communicability from an unlesioned baseline of 1 to 0.858, significantly more than in 1,000 random lesions (P < 2.2 × 10−17, Wilcoxon rank sum test), supporting the importance of highly selected white-matter tracts in the cerebral network architecture. The cortical regions linked by the 100 most highly selected tracts were extracted and counted (Fig. 2), with the insular cortex occurring significantly more frequently than predicted (P < 6 × 10−17, Fisher’s Exact Test).
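
The edge-lesioning comparison can be illustrated on a toy network. The sketch below assumes one common definition of communicability, the matrix exponential of the weighted adjacency matrix averaged over node pairs, without the strength normalization often applied to weighted networks. The exact measure used in the study may differ. The toy weights, the choice of "selected" edges, and all function names are hypothetical.

```python
import random

def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(a, terms=20):
    """Matrix exponential by truncated Taylor series, sum over k of a^k / k!."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]  # a^0 / 0! = identity
    for k in range(1, terms):
        term = [[v / k for v in row] for row in mat_mul(term, a)]  # a^k / k!
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def global_communicability(w):
    """Mean of exp(W)[i][j] over all distinct node pairs (assumed definition)."""
    n = len(w)
    g = mat_exp(w)
    return sum(g[i][j] for i in range(n) for j in range(i + 1, n)) / (n * (n - 1) / 2)

def lesion(w, edges):
    """Copy of w with the given undirected edges set to zero."""
    out = [row[:] for row in w]
    for i, j in edges:
        out[i][j] = out[j][i] = 0.0
    return out

rng = random.Random(1)  # arbitrary seed
n_nodes = 8             # toy size; the study's atlas has 90 nodes
all_edges = [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)]
w = [[0.0] * n_nodes for _ in range(n_nodes)]
for i, j in all_edges:
    w[i][j] = w[j][i] = rng.uniform(0.06, 0.2)  # toy edge weights

baseline = global_communicability(w)
# Hypothetical "selected" tracts: here simply the three heaviest edges.
selected = sorted(all_edges, key=lambda e: w[e[0]][e[1]], reverse=True)[:3]
selected_ratio = global_communicability(lesion(w, selected)) / baseline
random_ratios = [
    global_communicability(lesion(w, rng.sample(all_edges, 3))) / baseline
    for _ in range(100)
]
share = sum(r > selected_ratio for r in random_ratios) / len(random_ratios)
print(f"selected lesion: {selected_ratio:.3f}; "
      f"random lesions with a smaller drop: {share:.2f}")
```

Dividing by the unlesioned value puts the baseline at 1, which matches how the result is reported. A rank-based test such as the Wilcoxon rank sum test would then compare the selected lesion against the distribution of random lesions. That step is omitted here.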
Fig. 2.

Cortical regions in top 100 tracts by ranked sRRR. Frequency of participation of cortical regions among the top 100 ranked tracts, normalized by total frequency in the network. Insula occurrence is significantly higher than expected (P < 6 × 10−17, Fisher’s Exact Test). Regions (x axis, left to right): Insula; Temporal Middle; Occipital Middle; Rolandic Operculum; Temporal Superior; Temporal Pole Middle; Precentral; Temporal Inferior; Postcentral; Supramarginal; Frontal Middle; Superior Motor Area; Temporal Pole Superior; Occipital Inferior; Occipital Superior; Frontal Inferior Trigone; Frontal Superior; Lingual; Parietal Inferior; Hippocampus; Precuneus; Amygdala; Angular; Cuneus; Cingulum Posterior; Frontal Superior Medial; ParaHippocampal; Fusiform; Paracentral Lobule; Frontal Inferior Operculum; Cingulum Anterior; Frontal Superior Orbital; Olfactory; Parietal Superior; Frontal Middle Orbital.
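
As a small worked example of the counting step, the snippet below tallies how often each cortical region appears among the 10 top-ranked tracts listed in the Fig. 1 legend, ignoring laterality. It is a sketch only: the function name is an assumption, and the normalization by total frequency in the network used for Fig. 2 is not reproduced.

```python
from collections import Counter

# Top 10 ranked tracts as listed in the Fig. 1 legend (RegionA*RegionB).
TOP10_TRACTS = [
    "Insula_R*Temporal_Inferior_R",
    "Insula_R*Occipital_Middle_R",
    "Insula_L*Occipital_Middle_L",
    "Frontal_Superior_R*Temporal_Superior_R",
    "Frontal_Middle_R*Insula_R",
    "Occipital_Superior_R*Temporal_Inferior_R",
    "Hippocampus_R*Occipital_Middle_R",
    "Rolandic_Operculum_R*Insula_R",
    "Frontal_Superior_R*Insula_R",
    "Frontal_Middle_L*Insula_L",
]

def region_counts(tracts):
    """Count tract endpoints per region, dropping the _L/_R laterality suffix."""
    counts = Counter()
    for tract in tracts:
        for endpoint in tract.split("*"):
            region = endpoint.rsplit("_", 1)[0]
            counts[region] += 1
    return counts

counts = region_counts(TOP10_TRACTS)
for region, n in counts.most_common():
    print(region, n)
```

Even in this subset of 10 tracts, the insula is an endpoint of 7, in line with its overrepresentation among the top 100.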

Genetic Variation Is Associated with the Preterm Cerebral Endophenotype.

Genetic associations with image features were identified using the sRRR method as previously reported (17, 18). Genomewide SNPs were ranked according to their selection probabilities in the sRRR model, and the empirical and null distributions were inspected. This revealed empirical selection frequencies between 0.663 and zero, with a steeper rate of decrease after the top 100 ranked SNPs, and uniformly low null distribution (Fig. 3). The top 100 SNPs were thus examined in more detail as a stringent subset of genetic variables most highly and stably associated with the tractography features. These top 100 SNPs mapped to 47 genes, mostly in linkage equilibrium with each other apart from three separate hotspots of linkage disequilibrium centered on the genes PPARG (six SNPs), Integrin Subunit Alpha 6 (ITGA6) (four SNPs), and Fragile X Mental Retardation, Autosomal Homolog 1 (FXR1) (two SNPs).
Fig. 3.

Selection frequencies of top 1,000 SNPs ranked by sRRR. Selection frequencies of top 1,000 SNPs ranked by the sRRR method over 1,000 subsamples of 2/3 of the individuals and convergence criterion = 1 × 10−6. Empirical results (solid line) show that there is a plateau of the highest selection frequencies (maximum 0.663), which is stable for a subset of 100 SNPs. The first 10 of 100 equally highly ranked SNPs mapped to six genes (in alphabetical order): AGAP1, POGZ, PPARG, TSEN2, UBE2E1 (full list of mapped genes in and a full list of SNPs in ). The null distribution obtained through permutation of the individuals with 20,000 subsamples is very low and uniform (dotted line).

SNPs in the PPARG Gene Are Most Highly Associated with Variability in Imaging Features.

SNPs in PPARG were significantly overrepresented among the top 100 SNPs (P < 2.2 × 10−7, Fisher’s Exact Test), ranked by the sRRR model according to strength of association in positions 7–11 (rs17036282, rs6801982, rs4135334, rs4135336, rs4135342) and position 67 (rs6442313) of 556,000, with uniform selection frequencies of 0.663. The PPARG SNPs were mostly in intronic or regulatory regions (promoter flanking regions and open chromatin regions), with predicted effects on processes including protein coding, retained introns, and nonsense-mediated decay.
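
An overrepresentation test of this kind can be framed as a hypergeometric upper tail, which corresponds to a one-sided Fisher's exact test. The sketch below uses counts stated in the text (556,227 SNPs after filtering, a top list of 100, six PPARG SNPs in that list). The number of PPARG SNPs on the array is not given, so `N_PPARG_ON_ARRAY = 40` is a purely hypothetical value and the resulting probability is illustrative, not a reproduction of the reported P value.

```python
from math import comb

def hypergeom_upper_tail(k, n_draws, n_success, n_total):
    """P(X >= k) when n_draws items are drawn without replacement from
    n_total items, of which n_success belong to the category of interest."""
    denom = comb(n_total, n_draws)
    upper = min(n_draws, n_success)
    numer = sum(
        comb(n_success, i) * comb(n_total - n_success, n_draws - i)
        for i in range(k, upper + 1)
    )
    return numer / denom

N_TOTAL = 556_227        # SNPs retained after filtering (from the Methods)
N_TOP = 100              # size of the top-ranked list
K_PPARG_IN_TOP = 6       # PPARG SNPs among the top 100
N_PPARG_ON_ARRAY = 40    # HYPOTHETICAL: not stated in the text

p_value = hypergeom_upper_tail(K_PPARG_IN_TOP, N_TOP, N_PPARG_ON_ARRAY, N_TOTAL)
print(f"P(X >= {K_PPARG_IN_TOP}) = {p_value:.3g}")
```

Exact integer arithmetic via `math.comb` avoids the underflow that a naive floating-point product would hit for tails this small.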

Highly Associated Genes Are Involved in Biological Processes Including Lipid Metabolism.

To explore the 47 top-ranked genes, the Gene Ontology (GO) framework was surveyed using the Cytoscape tool ClueGO (19) to create a functionally organized GO term network. This revealed several significantly overrepresented themes of interest (hypergeometric test, adjusted P < 0.05 with Benjamini–Hochberg correction) including lipid metabolism (PPARG), neuron projection regeneration (ADM, PRRX1), response to nerve growth factor stimulation (BPTF, EP300), acetylcholine biosynthesis (CHAT), and presynaptic membrane assembly (PTPRD) (full annotation in ). Given the significant overrepresentation of PPARG SNPs among the most highly ranked SNPs and our previous finding of association between lipid metabolism genes and white-matter integrity in preterms (12, 13), the top 100 SNPs were tested for significant overlap with a list of SNPs mapping to genes classified with the GO term “lipid metabolism” (GO: 0006629), using the R package SuperExactTest (20). Four genes (PPARG, ADM, CHAT, and PNPLA6) involved in lipid metabolism according to the GO classification were present among the top 100 SNPs ranked by sRRR, more than would be expected by chance (P < 0.005) (Fig. 4). As a null frame of reference, this result was compared with the overlap between the bottom-ranked 100 SNPs and the GO lipid list, which was not significant.
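
The Benjamini–Hochberg adjustment mentioned above can be written compactly. This is a generic sketch of the procedure, not ClueGO's implementation. The function name and the four raw P values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical raw P values for four GO terms.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
```

Each raw P value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values monotone in the ranking.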
Fig. 4.

Representation of PPARG SNPs in the lipid metabolism gene ontology category. Group 1: Top 100 ranked SNPs associated with imaging features by sRRR. Group 2: Bottom 100 SNPs ranked by sRRR. Group 3: All SNPs in the lipid metabolism GO category present on genotyping array. Circular plot illustrating all possible intersections between these three groups and the corresponding statistics. The three tracks in the middle represent the three SNP lists, with individual colored blocks showing “presence” (dark) or “absence” (light) of the SNP groups in each intersection test. The height of the bars in the outer layer is proportional to the log of intersection sizes, indicated by the numbers on the top of the bars. The color intensity of the bars represents the P-value significance of the intersections (background = 19,000 protein-coding human genes). The number of SNPs contributing to each intersection is listed above the segment. There is a significant representation of the top 100 SNPs ranked by sRRR (group 1) among the SNPs in the GO lipid metabolism category (group 3) (P < 0.05). The SNPs present in both groups 1 and 3 map to four genes as shown (PPARG, ADM, CHAT, PNPLA6).

Highly Associated Genes Are Associated with Neuropsychiatric Diseases.

Given the uniform selection frequency of the top 100 ranked SNPs, we examined evidence in the literature for all their mapped genes. A machine-learning-based text-mining strategy was employed, using the Agilent Literature Search tool in Cytoscape (21) to query text-based search engines and extract associations among the genes, visualizing them as a network with the sentences for each association forming the network edges. Mentions of at least 2 of the 47 genes of interest were found in 405 PubMed-indexed abstracts, which were queried for the occurrence of disease terms using the tool pubmed.mineR (22). The most frequently occurring disease terms related to cancer, reflecting a known ascertainment bias in the literature (23). Once cancer-related terms were removed, the most frequently occurring disease terms were “autism spectrum disorder,” “intellectual disability,” and “schizophrenia,” neuropsychiatric conditions more common in the preterm population.

Discussion

The molecular and cellular events leading to abnormal brain development in preterm infants are poorly understood, but hypoxia, ischemia, and inflammation are all believed to play a role (11) and the host response to these external insults is modulated by the combined effects of multiple genes (24, 25). The present results are consistent with the hypothesis that changes in white-matter structure that predict adverse outcome are modulated by genetic variability in the PPAR signaling pathway. The genetic imaging approach relies on heritability and an appropriate endophenotype. Common DNA sequence variation is estimated to account for up to 50% of additive genetic variation in complex traits, including neuroanatomical features (26) and neurocognitive disorders including ASD (27) and schizophrenia (28). Imaging cerebral endophenotypes generally have high heritability and relevance (29, 30): in the neonatal period 60% of the variability between individuals in d-MRI features can be attributed to genetic factors (31, 32), and d-MRI measures of white-matter structure predict neurodevelopmental outcome (33–35). Analysis of the cerebral connections selected by the algorithm showed that they were stable within the group, and virtual lesioning showed that they are important for information transfer in the network. They thus represent structures that are highly relevant to long-term neurological function. Machine learning using penalized regression provided an unsupervised, unbiased method to address the hypothesis. sRRR is specifically designed to deal with cohorts where the number of individuals is smaller than the number of features, and outperforms mass-univariate linear models when considering genetic effect sizes comparable to those expected here (17). 
The approach involves fitting a predictive model for the phenotype using all SNPs, while also ranking all SNPs based on their predictive value, and is of benefit in imaging genomics studies where there are many more features than individuals and the number of possible hypotheses is vast (17, 36). The strategy is not impacted by multiple testing concerns since sRRR is based on selecting the variables that contribute most to the relationship between predictors and responses within a multivariate model, rather than performing repeated univariate tests. In addition to testing the primary hypothesis, the study produced further observations. First, other genes were found to be linked to the endophenotype: Integrin Subunit Alpha 6 (ITGA6) (four SNPs), which is involved in insulin-like growth factor 1 signaling, and Fragile X Mental Retardation, Autosomal Homolog 1 (FXR1) (two SNPs), while genes involved in lipid metabolism, various neural processes, and neuropsychiatric disease appear to be overrepresented among highly selected SNPs. Further work is needed to understand the significance of these observations. Second, insular cortex was more frequently involved in the top 100 ranked tracts than expected by chance. This is consistent with previous observations: the region is highly connected, receiving direct input from the somatosensory cortex and projecting outputs to both cortical and subcortical regions (37), and is part of the rich club in adults (38) and preterm infants (39). In individuals born preterm the volume, surface area, and folding of the insular cortex are reduced in infancy and early adulthood (40–42), accompanied by alterations in functional activation patterns (43, 44), visual function (45), and cognition (46, 47). 
Insular abnormalities have been implicated in ASD (48, 49) and attention-deficit hyperactivity disorder (50, 51), both of which are more prevalent in the preterm population, and the insula has recently been shown in preterms to be the source of spontaneous neuronal bursts (delta brushes) that are instructive in neuronal circuit development (52). Tractography methods aim to provide insight into in vivo macrostructural brain connectivity via the diffusion features of brain tissue (53–55). Deterministic tractography algorithms propagate streamlines from a seed region along the main estimated fiber orientation, voxel-by-voxel, with one fiber orientation measurement taken in each voxel. This strategy has been successfully employed in the study of a wide range of neurological and psychiatric diseases, but is typically challenged by areas of high uncertainty: in approaching the gray matter, where the anisotropy is typically lower; at areas of crossing fibers, where there are fiber populations traveling in different directions; in tracts such as cortico-striatal projections, where functionally related anatomical subdivisions of the striatum project to different cortical areas (information funneling); and in the developing brain, where there is generally less myelination and lower anisotropy in the white matter (56), resulting in penalization of long-range connections. In our analysis we therefore used advanced probabilistic approaches that provide an integrated probabilistic analysis of whole tracts by estimating the orientation dispersion function at all points in the tract simultaneously (57, 58). 
This requires significant computing power but provides a much more robust approach, particularly for interhemispheric fibers and in the context of the higher water content and lower myelination of the developing brain (manifesting as much lower fractional anisotropy values in white matter compared with adults), since it allows tracking to overcome areas of high uncertainty (59), and we have previously applied this successfully to the preterm brain (34). Further work is required to characterize the exact relationship between PPARG and preterm brain development, notably to determine whether the effect is brain specific or systemic. PPARs are ligand-dependent nuclear hormone receptor transcription factors that are highly involved in cell growth, differentiation, inflammation, lipid and glucose metabolism, and homeostasis (review in ref. 60). The PPARG gene is expressed in many tissues including the brain (61), within human white matter, and across brain cell types. PPARG gene expression is up-regulated in neurons in response to excitotoxicity and ischemia (62, 63), and modulates the microglial response to injury (64, 65). PPARG agonists improve neuronal and glial survival in a variety of animal models involving ischemia and inflammation (62, 66–69) and it has been suggested that they provide clinical improvement in children with autism (70). The availability of safe drugs modulating PPARG means that this finding has immediate clinical implications for research into neuroprotective strategies for preterm infants.

Methods

Diffusion MR Imaging.

All MRI studies were supervised by an experienced pediatrician or pediatric nurse. Pulse oximetry, temperature, and heart rate were monitored throughout the period of image acquisition; ear protection in the form of silicone-based putty was placed in the external ear (President Putty, Coltene; Whaledent) and Minimuffs (Natus Medical Inc.) were used for each infant. MRI was performed on a Philips 3-Tesla system (Philips Medical Systems) using an eight-channel phased-array head coil, with acquisition of T2-weighted and 32 direction d-MRI images. Acquisition parameters are in . The T2-weighted MRI anatomical scans were reviewed to exclude subjects with extensive brain abnormalities, major focal destructive parenchymal lesions, multiple punctate white-matter lesions, or white-matter cysts. All MR images were assessed for the presence of image artifacts (inferior-temporal signal dropout, aliasing, field inhomogeneity, etc.) and severe motion. All exclusion criteria were designed so as not to bias the study but preserve the full spectrum of clinical heterogeneity typical of a preterm born population.

Diffusion Tractography.

Tractography was performed on diffusion MR data using a modified version of probabilistic tractography that estimates the diffusive transfer between voxels (57). Regions of interest for seeding tractography of cortico–cortical connections were obtained by segmentation of the brain based on a 90-node anatomical neonatal atlas (71), and the resulting segmentations were registered to the diffusion space using a custom neonatal pipeline (72). A weighted adjacency matrix of brain regions was produced for each infant, from which self-connections along the diagonal were removed and upon which symmetry was enforced, with removal of the redundant lower triangle. Tractography data were linearly adjusted for main covariates (GA, SA, ancestry) and used to reconstruct weighted connectivity matrices for each individual. Each individual matrix was converted into a single vector of numerical values corresponding to edge weights, and appended to form the rows of a single group matrix of n individuals by q edges, where n = 272 and q = 4,005. This vectorized group connectivity matrix was then adjusted for the main covariates of GA, SA, and ancestry, and used as the phenotype in the model. Additionally, a group connectivity matrix was obtained from the median of all individual subject connectivity matrices and used for selected downstream analyses. Another matrix of the same dimensions n × q made up of randomly generated, normally distributed values with mean zero and SD 1 was used as the null phenotype.
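
The edge count q = 4,005 follows from the 90-node atlas: a symmetric matrix without self-connections has 90 × 89 / 2 unique entries. A minimal sketch of the symmetrize-and-vectorize step is given below. How symmetry was enforced is not stated, so averaging the two directions is an assumption, and the function names and toy values are hypothetical.

```python
N_NODES = 90  # nodes in the neonatal atlas

def symmetrize(matrix):
    """Average the (i, j) and (j, i) entries and zero the diagonal.
    Averaging is an ASSUMPTION; the text only says symmetry was enforced."""
    n = len(matrix)
    return [
        [0.0 if i == j else (matrix[i][j] + matrix[j][i]) / 2 for j in range(n)]
        for i in range(n)
    ]

def vectorize_upper_triangle(matrix):
    """Flatten the strict upper triangle row by row into one edge vector."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

# Toy connectivity matrix with arbitrary, asymmetric values.
toy = [[float(i * N_NODES + j) for j in range(N_NODES)] for i in range(N_NODES)]
sym = symmetrize(toy)
edges = vectorize_upper_triangle(sym)
print(len(edges))
```

Stacking one such vector per infant gives the n × q group matrix used as the phenotype.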

Genomewide Genotyping.

Genetic predictors consisted of the genomewide genotype matrix recoded in terms of minor allele counts, including SNPs with minor allele frequency (MAF) ≥5% and 100% genotyping rate. Saliva samples were collected using Oragene DNA OG-250 kits (DNAGenotek Inc.), and genotyped on Illumina HumanOmniExpress-24 v1.1 chip (Illumina). Filtering was carried out using PLINK (73). SNPs with MAF ≥5%, 100% genotyping rate, and Hardy–Weinberg equilibrium exact test P ≥ 1 × 10−6 were retained, resulting in 556,227 SNPs for further analysis. This genotype matrix was converted into minor allele counts.
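
The recoding and two of the three filters can be illustrated on toy genotype calls. The study used PLINK for this step, so the sketch below is only a conceptual stand-in: the data layout, function name, and SNP identifiers are hypothetical, and the Hardy–Weinberg exact test is omitted.

```python
from collections import Counter

MAF_THRESHOLD = 0.05

def recode_snp(calls):
    """Return per-individual minor allele counts, or None if the SNP fails
    the filters (any missing call, not biallelic, or MAF below threshold)."""
    if any(call is None for call in calls):
        return None  # fails the 100% genotyping rate requirement
    allele_counts = Counter("".join(calls))
    if len(allele_counts) != 2:
        return None  # monomorphic (MAF = 0) or not biallelic
    minor = min(allele_counts, key=allele_counts.get)
    maf = allele_counts[minor] / (2 * len(calls))
    if maf < MAF_THRESHOLD:
        return None
    return [call.count(minor) for call in calls]

# Toy calls for six individuals; None marks a missing genotype.
snps = {
    "snp_a": ["AA", "AG", "GG", "AA", "AG", "AA"],
    "snp_b": ["CC", "CC", None, "CT", "CC", "CC"],
    "snp_c": ["TT", "TT", "TT", "TT", "TT", "TT"],
}
recoded = {}
for name, calls in snps.items():
    counts = recode_snp(calls)
    if counts is not None:
        recoded[name] = counts
```

Only `snp_a` survives: `snp_b` has a missing call and `snp_c` is monomorphic. Each retained SNP becomes one column of 0/1/2 minor allele counts in the genotype matrix.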

Assessment of Population Stratification.

Whole genome SNP data were used for IBS, based on pairwise Euclidean distance as implemented in PLINK 1.9 (73), to assess relatedness between individuals. Dimension reduction in the IBS distance matrix was carried out by principal component analysis (74), and the first principal component was used as a covariate in downstream analyses to adjust for population stratification. Information on self-reported ethnicity (as defined in ISB Standard DSCN 11/2008) was collected by asking mothers (and fathers when present) to define themselves according to a list of options. The terms were drawn from Ethnic Category National Codes as in Department of Health Guidance at the time. Parental self-reported ethnicity was summarized into broader categories for the purposes of data visualization by aggregating all White subcategories into a single group “White,” all Black subcategories into “Black,” and all Asian subcategories into “Asian.” In cases where either one parent self-reported as Mixed or if there was a discrepancy between maternal and paternal ethnicities, the term “Mixed” was applied. Where parents were both from an Association of Southeast Asian Nations member state (two cases), the individual was classified by the authors as “SE Asian.” These aggregated ethnic categories were used to label the datapoints of PCA plots of the first two principal components of the IBS variance-standardized relationship matrix. This illustrates the correspondence between the first two components of genetic ancestry and ethnicity, and provides an overview of the cohort population mixture as well as providing a means for phenotype adjustment.
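
To make the distance-then-PCA step concrete, the sketch below computes pairwise Euclidean distances between toy minor-allele-count vectors and extracts a first component by classical multidimensional scaling with power iteration. Treating "PCA of the distance matrix" as classical scaling is an assumption, as are the toy genotypes and function names. The study itself used PLINK 1.9.

```python
import math

def pairwise_distance(genotypes):
    """Euclidean distance between every pair of individuals."""
    n = len(genotypes)
    return [[math.dist(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]

def first_component(dist, iterations=200):
    """Leading coordinate from classical scaling of a distance matrix."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand = sum(row_mean) / n
    # Double centering turns squared distances into a Gram matrix.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand)
          for j in range(n)] for i in range(n)]
    v = [1.0 + i for i in range(n)]  # arbitrary non-constant start vector
    for _ in range(iterations):     # power iteration for the top eigenvector
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    return [x * math.sqrt(eigenvalue) for x in v]

# Toy minor allele counts: two groups of three individuals, five SNPs.
genotypes = [
    [0, 0, 1, 2, 2], [0, 1, 1, 2, 2], [0, 0, 1, 2, 1],
    [2, 2, 1, 0, 0], [2, 1, 1, 0, 0], [2, 2, 0, 0, 0],
]
scores = first_component(pairwise_distance(genotypes))
```

In this toy example the first component separates the two groups by sign. A score of this kind is what would enter the downstream models as the ancestry covariate.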

sRRR.

Genetic associations with image features were identified using the sRRR model, which has been previously presented in detail (17, 18). sRRR is a method for multivariate modeling of high-dimensional imaging responses (measurements taken over regions of interest or individual voxels) and genetic covariates (e.g., SNPs) that enforces sparsity in the regression coefficients. Under the assumption that only a subset of genetic markers is in statistically meaningful association with a subset of image features (i.e., the association pattern is sparse), the model must be able to select those variables. This is achieved by driving some coefficients in the model to zero by penalizing the ℓ1 norm of the coefficients for genetic markers and image features. Such sparsity constraints ensure that the model performs simultaneous genotype and phenotype selection (17). The motivation behind this approach is to improve the power to detect causal genetic variants associated with high-dimensional imaging responses (18). In the current work, genomewide SNPs were tested for association with white-matter tracts reconstructed by probabilistic tractography. We define an n × q matrix of phenotypes Y (where the q elements are the 4,005 vectorized edges from the tractography connectivity matrix), and an n × p matrix of minor allele counts for p SNPs (where p = 556,227). SNPs were ranked by their selection frequencies, obtained by fitting the sRRR model to 1,000 subsamples, each containing 2/3 of the individuals, with a convergence criterion of 1 × 10⁻⁶. Model parameters in .
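The idea of sparse selection combined with subsampling can be sketched as follows. This is a simplified rank-one illustration, not the authors' sRRR code: it alternates soft-thresholded updates of a SNP coefficient vector and an image-feature coefficient vector on the cross-product of the two matrices, and it implicitly treats the SNP columns as uncorrelated. The function names, the penalty values, and the use of the tolerance as a bound on the change between iterations are all assumptions.

```python
import numpy as np

def soft_threshold(v, lam):
    """Shrink entries toward zero; entries with |v| <= lam become exactly zero."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)

def unit(v):
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def sparse_rank_one(x, y, lam_snp, lam_edge, tol=1e-6, max_iter=500):
    """Sparse rank-one fit: a selects SNPs (length p), b selects image features (length q)."""
    m = x.T @ y                                              # p x q cross-product
    b = unit(np.linalg.svd(m, full_matrices=False)[2][0])    # start from leading singular vector
    a = np.zeros(x.shape[1])
    for _ in range(max_iter):
        a_new = unit(soft_threshold(m @ b, lam_snp))
        b_new = unit(soft_threshold(m.T @ a_new, lam_edge))
        done = np.linalg.norm(a_new - a) < tol and np.linalg.norm(b_new - b) < tol
        a, b = a_new, b_new
        if done:
            break
    return a, b

def selection_frequencies(x, y, lam_snp, lam_edge, n_subsamples=1000, fraction=2 / 3, seed=0):
    """Fraction of subsamples in which each SNP receives a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    size = int(round(fraction * n))
    counts = np.zeros(x.shape[1])
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=size, replace=False)        # subsample without replacement
        xs = x[idx] - x[idx].mean(axis=0)
        ys = y[idx] - y[idx].mean(axis=0)
        a, _ = sparse_rank_one(xs, ys, lam_snp, lam_edge)
        counts += (a != 0)
    return counts / n_subsamples
```

A ranking then follows from `np.argsort(-freq)` on the returned frequencies. At the scale described in the text (556,227 SNPs by 4,005 edges) the explicit cross-product and its SVD would be impractical, so this form is only suitable for small toy problems.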

Computational Literature Search.

When a query is entered (e.g., a list of genes), it is submitted to the user-selected search engine, and the retrieved results (documents) are fetched from their respective sources. Each document is then parsed into sentences and analyzed for protein–protein associations. Agilent Literature Search (21) uses a set of “context” files (lexicons) for defining protein names (and aliases) and association terms (verbs) of interest.
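The sentence-level step can be pictured with a small sketch. It is not the Agilent Literature Search implementation: the function name, the lexicon format (a dictionary of lowercase aliases mapped to canonical names plus a set of association verbs), and the placeholder gene names are assumptions for illustration only.

```python
import re
from itertools import combinations

def find_associations(text, protein_aliases, association_terms):
    """Return (protein, verb, protein) triples for sentences that mention
    at least two lexicon proteins and at least one association term."""
    hits = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = re.findall(r"[A-Za-z0-9\-]+", sentence.lower())
        # Map aliases to canonical names and de-duplicate within the sentence.
        proteins = sorted({protein_aliases[t] for t in tokens if t in protein_aliases})
        verbs = [t for t in tokens if t in association_terms]
        if len(proteins) >= 2 and verbs:
            for first, second in combinations(proteins, 2):
                hits.append((first, verbs[0], second))
    return hits
```

A sentence that names only one protein, or names two without an association term, contributes nothing; the alias dictionary is what lets different spellings of the same protein collapse to one node.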

Graph Theory Assessment of Top 10 Imaging Variables (Edges) Ranked by sRRR.

The 10 tractography edges ranked most highly by sRRR were assessed from an “edge-centric” perspective as previously described for the adult brain (15). In the current approach, the importance of selected edges for information flow in the brain is investigated by removing the edges of interest and assessing the impact of their loss on the “communicability” of the network, compared with removing the same number of other randomly selected edges over many iterations. The communicability measure was introduced (16) as a broad generalization of the concept of shortest path between two nodes in a network, incorporating the concept that information flow in a system can also follow routes other than the shortest path (75). Details in .
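The edge-removal comparison can be sketched as below, under stated assumptions. Communicability is taken here as the matrix exponential of the adjacency matrix, summed over all node pairs; the normalization and summary statistic used in the study may differ, and weighted brain networks are often degree-normalized first, which this sketch omits. The function names and the permutation-style P value are illustrative.

```python
import numpy as np

def total_communicability(adj):
    """Sum over node pairs of exp(A)[i, j] for a symmetric adjacency matrix A."""
    a = np.asarray(adj, dtype=float)
    vals, vecs = np.linalg.eigh(a)                 # symmetric eigendecomposition
    g = (vecs * np.exp(vals)) @ vecs.T             # matrix exponential of A
    return g[np.triu_indices_from(g, k=1)].sum()

def remove_edges(adj, edges):
    out = np.array(adj, dtype=float)
    for i, j in edges:
        out[i, j] = out[j, i] = 0.0
    return out

def edge_removal_test(adj, edges, n_perm=1000, seed=0):
    """Loss of communicability when `edges` are removed, compared with removing
    the same number of randomly chosen existing edges."""
    adj = np.asarray(adj, dtype=float)
    edges = list(edges)
    rng = np.random.default_rng(seed)
    base = total_communicability(adj)
    observed = base - total_communicability(remove_edges(adj, edges))
    iu, ju = np.triu_indices_from(adj, k=1)
    present = np.flatnonzero(adj[iu, ju] > 0)      # indices of existing edges
    null = np.empty(n_perm)
    for k in range(n_perm):
        pick = rng.choice(present, size=len(edges), replace=False)
        random_edges = list(zip(iu[pick], ju[pick]))
        null[k] = base - total_communicability(remove_edges(adj, random_edges))
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, p_value
```

Because the exponential counts walks of every length, with longer walks weighted less, removing an edge lowers communicability even between node pairs whose shortest path does not use that edge.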

Data Availability.

sRRR model code and data are available from the authors upon request, subject to approval of future uses by the National Research Ethics Service. Publicly available data: UKBEC (www.braineac.org/); RNA-seq data (76) (web.stanford.edu/group/barres_lab/brain_rnaseq.html); GIANT database (giant.princeton.edu/); Brainspan (www.brainspan.org/). The study was approved by the National Research Ethics Service, and written informed consent was given by all participating families.