Literature DB >> 31010885

Ancestry patterns inferred from massive RNA-seq data.

Ruth Barral-Arca1,2, Jacobo Pardo-Seco1,2, Xabi Bello1,2, Federico Martinón-Torres2, Antonio Salas1,2.   

Abstract

There is a growing body of evidence suggesting that patterns of gene expression vary within and between human populations. However, the impact of this variation in human diseases has been poorly explored, in part owing to the lack of a standardized protocol to estimate biogeographical ancestry from gene expression studies. Here we examine several studies that provide new solid evidence indicating that the ancestral background of individuals impacts gene expression patterns. Next, we test a procedure to infer genetic ancestry from RNA-seq data in 25 data sets where information on ethnicity was reported. Genome data of reference continental populations retrieved from The 1000 Genomes Project were used for comparisons. Remarkably, only eight out of 25 data sets passed FastQC default filters. We demonstrate that, for these eight population sets, the ancestral background of donors could be inferred very efficiently, even in data sets including samples with complex patterns of admixture (e.g., American-admixed populations). For most of the gene expression data sets of suboptimal quality, ancestral inference yielded odd patterns. The present study thus brings a cautionary note for gene expression studies highlighting the importance to control for the potential confounding effect of ancestral genetic background.
© 2019 Barral-Arca et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

Entities:  

Keywords:  RNA-seq; SNPs; biogeographical ancestry; gene expression; genomics; transcriptomics

Mesh:

Substances:

Year:  2019        PMID: 31010885      PMCID: PMC6573782          DOI: 10.1261/rna.070052.118

Source DB:  PubMed          Journal:  RNA        ISSN: 1355-8382            Impact factor:   4.942


INTRODUCTION

There is growing body of evidence indicating that ethnicity can impact gene expression. Human populations show millions of DNA polymorphisms and variable patterns of allele frequencies, and a number of these markers fall at regulatory DNA positions (Spielman et al. 2007), which could ultimately lead to differential gene expression patterns. The study by Spielman et al. (2007) based on the analysis of >4000 genes indicated that genetic variation among populations contributes to differences in gene expression phenotypes; most of the variation observed was due to allele frequency differences at cis-linked regulators. The subsequent study by Price et al. (2008) on “African-American” cell lines, however, indicated that both cis and trans markers have highly significant effects, although they estimated that ∼12% of all heritable variation in human gene expression was due to cis variants. The large-scale study by Stranger et al. (2007) confirmed that gene expression levels are hereditable and indicated an abundance of cis-regulatory variation in the human genome, whereas Storey et al. (2007) estimated that ∼17% of genes are differentially expressed among populations. Most recently, Serrano-Gómez et al. (2017) found five genes modulated by genetic ancestry in breast tumors from Colombian women. A major hindrance for gene expression studies is that the most of them do not consider ethnicity or ancestral background to be a variable of interest in their analysis. There are only very few attempts at monitoring ancestry in transcriptomic studies; see, for example, Serrano-Gómez et al. (2017) on breast cancer patients and Barral-Arca et al. (2018) in an infectious disease context. In contrast, procedures to infer biogeographical ancestry (BGA) from DNA data are very popular in human population genetics at continental or regional geographical scale (Galanter et al. 2012; Reich et al. 2012; Lazaridis et al. 2014; Pardo-Seco et al. 2014a. 2016), and in forensic genetics (e.g., investigation of crime scene evidence; Sánchez et al. 2006; Phillips et al. 2009; Pardo-Seco et al. 2014b). Usually, ancestry can be efficiently inferred using dense single nucleotide polymorphism (SNP) data derived from genotyping studies (Reich et al. 2012; Pardo-Seco et al. 2014a, 2016) or massive sequencing studies (Lazaridis et al. 2014; Salas et al. 2017), but also by genotyping panels of ancestry informative markers (AIMs) (Sánchez et al. 2006; Phillips et al. 2009) or a reasonable number of random SNPs uniformly distributed along the genome (Pardo-Seco et al. 2014b). At the same time, although DNA polymorphisms are generally genotyped/sequenced, there are now bioinformatic procedures that allow inferring DNA variation indirectly from RNA data obtained from large-scale sequencing projects (RNA-seq). This strategy has two main advantages: It is inexpensive, and it can be applied when no DNA data is available (e.g., using RNA data sets retrieved from public repositories). However, these procedures are still uncommon in the field of gene expression studies. This might be due to a variety of factors, including: (i) the very little effort made to date at evaluating the efficiency of these procedures, and/or (ii) the trend to believe that ethnicity does not impact gene expression patterns, despite the existing evidence against this belief. To the best of our knowledge, our recent study, Barral-Arca et al. (2018) constitutes the only one where inferred DNA variation was used to estimate genome ancestry in a gene expression context. This study explored a 2-transcript host cell signature to distinguish viral from bacterial infections (Barral-Arca et al. 2018) in a RNA-seq Mexican data set (see also Herberg et al. 2016). The aim of the present study is to formally test the preliminary procedures used in Barral-Arca et al. (2018) by way of exploring a broader set of RNA-seq data sets representing worldwide populations. In addition, we show new evidence clearly indicating the impact of ethnicity on modulating gene expression.

RESULTS

Characteristics of the data sets and quality control

The quality of the gene expression data sets can have an important impact on the inference of SNP variation. Of the 25 data sets initially explored (Fig 1A; see Materials and Methods), only eight (32%) passed the default FastQC quality filters (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) (Supplemental Fig. S1; see also Conesa et al. 2016), while as many as 17 (68%) did not (Supplemental Fig. S2).
FIGURE 1.

Gene expression data sets explored in the present study for the inference of ancestry. (A) The map shows the geographic location of the 25 RNA-seq data sets that were initially recruited from GEO; the correspondence between these ID codes and the GEO accession numbers, and the characteristics of the data sets are provided in Supplemental Table S1. (B) Only eight out of these 25 data sets passed all the quality filters, and these were used for the subsequent studies. The histogram shows the number of shared SNPs (the final set after applying all the filters) between data sets (see Supplemental Table S1 for more information). (C) Distribution of SNPs in chromosomes for the eight data sets used to infer the ancestry of donors (their GEO ID code is indicated; Supplemental Table S1).

Gene expression data sets explored in the present study for the inference of ancestry. (A) The map shows the geographic location of the 25 RNA-seq data sets that were initially recruited from GEO; the correspondence between these ID codes and the GEO accession numbers, and the characteristics of the data sets are provided in Supplemental Table S1. (B) Only eight out of these 25 data sets passed all the quality filters, and these were used for the subsequent studies. The histogram shows the number of shared SNPs (the final set after applying all the filters) between data sets (see Supplemental Table S1 for more information). (C) Distribution of SNPs in chromosomes for the eight data sets used to infer the ancestry of donors (their GEO ID code is indicated; Supplemental Table S1). The eight data sets that met all the quality requirements (Supplemental Fig. S1) represent populations from different continental regions and provided a high number of SNPs (>4000 after applying additional genome filters indicated above; Table 1; Supplemental Table S1). The small number of SNPs shared by all of these data sets (Fig. 1B; Supplemental Table S3) makes it inviable to process all the data sets for a common set of SNPs. Therefore, all the analyses were carried out by comparing each of the eight data sets individually against the reference continental populations. Figure 1C shows the distribution of the SNPs in chromosomes in the eight data sets.
TABLE 1.

RNA-seq data sets used in the present study and characteristics of the inferred SNPs

RNA-seq data sets used in the present study and characteristics of the inferred SNPs We further explored the data sets that did not satisfy the quality filters with the aim of evaluating the impact of low quality RNA-seq data on ancestry inference. Although the quality of the data was suboptimal for variant calling in 17 data sets, four of them still yielded patterns of ancestry according to expectations (see section below).

Ethnicity correlates with gene expression patterns

In contrast to genomic studies where matched ethnic samples are generally required, e.g., case-control studies (Martinón-Torres et al. 2016; Aung et al. 2017), only a few gene expression studies reported the ethnicity of the donors. For the sake of illustrating the role of the ancestral background in gene expression, we selected two studies on tuberculosis (TB) where ethnicity for individual samples was declared in the GEO database, and where more than one ethnic group was considered in the same study (hence using the same methodology). The study by Berry et al. (2010) investigated the immune response of patients infected by Mycobacterium tuberculosis (GEO: PRJNA422129). The MDS plot shows expression patterns of the active TB cohort (Fig. 2A); its Dimension 1 (accounting for ∼20% of the variation) is responsible for the major separation between South African and UK expression patterns, while Dimension 2 (which explains a notable 13% of the variation) separates a few UK cases from the rest. Their latent cohort of TB patients (Fig. 2B) was also ascribed to two well differentiated clusters in its Dimension 1 (accounting for ∼18% of the variation), which again clearly separates their gene expression patterns according to their different geographic origins.
FIGURE 2.

Analysis of gene expression patterns in two case studies where information on ethnicity was available, indicating that ethnicity status per se impacts gene expression patterns. For each study, we show a PCA plot (right) built with the most highly expressed genes between ethnic groups and using the Deseq function “plotPCA,” while the heatmaps were built using a minimum subset of the most highly expressed genes (including all is not possible because of space limitations) that allowed to visualize different patterns between population sets (left): (A) Active TB from Berry et al. (2010), (B) latent TB from Berry et al. (2010), (C) control female group from Singhania et al. (2018), (D) control male group from Singhania et al. (2018), and (E) male TB cases from Singhania et al. (2018).

Analysis of gene expression patterns in two case studies where information on ethnicity was available, indicating that ethnicity status per se impacts gene expression patterns. For each study, we show a PCA plot (right) built with the most highly expressed genes between ethnic groups and using the Deseq function “plotPCA,” while the heatmaps were built using a minimum subset of the most highly expressed genes (including all is not possible because of space limitations) that allowed to visualize different patterns between population sets (left): (A) Active TB from Berry et al. (2010), (B) latent TB from Berry et al. (2010), (C) control female group from Singhania et al. (2018), (D) control male group from Singhania et al. (2018), and (E) male TB cases from Singhania et al. (2018). In their follow-up study on TB infected patients (GEO: PRJNA422130), Singhania et al. (2018), used a similar sampling scheme, with some overlapping of samples with their previous study (Berry et al. 2010). There is a marked substructure of expression patterns in the control female group in the Singhania et al. (2018) study (Fig. 2C), which perfectly allocates the female controls in the vertex of an equilateral triangle according to ethnicity. In control males (Fig. 2D), the expression patterns of South Asians are much more dispersed than in Kenians and Indians. Their male cases included donors from Afghanistan, South Asia, Sudan and Tanzania. Dimension 1 (∼26% of the variance) differentiates these clusters by main continental ancestries; one pole is dominated by the African samples (Sudan and Tanzania), while Asian ones (Afghanistan and South Asia) plot on the opposite (Fig. 2E). Dimension 2 (∼14% of the variation) further distinguishes the two African samples and indicates a higher dispersion for the South Asian donors. Finally, the effect of ethnicity is not only visible in RNA-seq data. For the sake of illustrating the same issue in expression arrays, we examined the study by Obermoser et al. (2013) in regard to influenza and pneumococcal vaccines. The effect of ethnicity is clear when examining (i) their control females, with a clear differentiation between “African-American” and “Caucasian” in Dimension 2 (∼8%; Supplemental Fig. S3A), (ii) their control males, with a similar pattern (Dimension 2; ∼11%) but this time separating Asians from “Caucasians” (Supplemental Fig. S3B), and (iii) their Pneumovax vaccinated males (Supplemental Fig. S3C) and females (Supplemental Fig. S3D), where Dimension 2 completely separates their “Caucasian” from their Asian donors again. It is most remarkable that, although ethnicity information is clearly specified in the GEO repository, this information is not used at all, or not even mentioned in the main text of the associated publications.

Functional characteristics of the SNPs inferred from RNA-seq data

Eight population sets passed through all the quality filters specified above. We examined the functional characteristics of these data sets (Table 2). Since these SNPs were inferred from RNA-seq data, one could expect the main proportion of them to be located at exonic regions. Surprisingly, when considering all the SNPs from all the eight data sets together, 25.3% were intergenic, and only 3.1% fell at intronic regions. Furthermore, a total of 74% of them do not have exonic function. These percentages vary considerably depending on the data set considered (Table 2). To illustrate this phenomenon, we allocate the reads for the eight different studies corresponding to the housekeeping gene CHMO2A (charged multivesicular body protein 2A). Different studies have different coverages outside the exon regions (Supplemental Fig. S4). The same patterns are observed for other genes (data not shown).
TABLE 2.

Functional characteristics of the variants annotated from the different RNA-seq data sets

Functional characteristics of the variants annotated from the different RNA-seq data sets

Inference of ancestry from RNA-seq data

Figure 3 shows MDS plots and ancestry analysis for the eight populations that passed through all the quality filters specified above. Three European ancestry population sets, namely Spain (Salas et al. 2016), Sweden (GEO: PRJNA354367; Shchetynsky et al. 2017), and UK (GEO: PRJNA294293; Wood et al. 2016), are represented in Figure 3A–C. The three data sets fall within the cluster defined by the European populations in The 1000 Genomes Project (1000G) reference populations. In the case of Sweden, there is one individual that shares a visible proportion of ancestry membership with the East Asian population.
FIGURE 3.

MDS plots and ancestry analysis for each of the eight data sets that overcome all the quality filters; their GEO ID numbers are indicated on top of each MDS analysis together with the number of SNPs involved in each analysis. In the admixture barplots (right) the label of the test population is bolded and their ancestral memberships barplots slightly separated from the barplots of the reference continental populations (from 1000G). (A) Spain; (B) Sweden (GEO acc. no: PRJNA354367); (C) UK (GEO acc. no: PRJNA294293); (D) China_1 (GEO acc. no: PRJNA296108); (E) China_2 (GEO acc. no: PRJNA412314); (F) Korea_1 (GEO acc. no: PRJNA218851); (G) Colombia_2 (GEO acc. no: PRJNA279199); and (H) Mexico (GEO acc. no: PRJNA285798).

MDS plots and ancestry analysis for each of the eight data sets that overcome all the quality filters; their GEO ID numbers are indicated on top of each MDS analysis together with the number of SNPs involved in each analysis. In the admixture barplots (right) the label of the test population is bolded and their ancestral memberships barplots slightly separated from the barplots of the reference continental populations (from 1000G). (A) Spain; (B) Sweden (GEO acc. no: PRJNA354367); (C) UK (GEO acc. no: PRJNA294293); (D) China_1 (GEO acc. no: PRJNA296108); (E) China_2 (GEO acc. no: PRJNA412314); (F) Korea_1 (GEO acc. no: PRJNA218851); (G) Colombia_2 (GEO acc. no: PRJNA279199); and (H) Mexico (GEO acc. no: PRJNA285798). Three additional data sets represent gene expression data of patients from Asia, two of them from China (Fig. 3D–F), and one from Korea (Fig. 3G). In all three cases, the profiles fall within the East Asian clade defined by the reference 1000G populations. The plot built on data set China_1 consists of only 7807 SNPs (GEO: PRJNA296108; Yu et al. 2017); this explains a subtle but visible dispersion of the plots compared to, e.g., China_2 (a plot built on 39,198 SNPs; GEO: PRJNA412314 [Hong et al. 2018]). Korean profiles (GEO: PRJNA218851; Kim et al. 2014) fall also entirely within the Asian cluster. Two additional data sets represent admixed populations: Colombia (GEO: PRJNA279199; Rojas-Peña et al. 2015) and Mexico (GEO: PRJNA285798; DeBerg et al. 2018). These two sample sets provide the opportunity to calibrate the method for the detection of complex patterns of admixture from gene expression data. From the MDS plot it is clear that the Colombians samples include patient profiles of different ancestries (Fig. 3G). For instance, three individuals have a clear African ancestry, which is not unusual given the impact of the transatlantic slave trade in America (Salas et al. 2004, 2005), and in this country in particular (Salas et al. 2008). The other individuals display a variable admixture contribution from a European and a Native American component (East Asia is used here as a surrogate ancestral population of Native Americans). Again, this pattern of admixture is common in South American populations (Reich et al. 2012). The Mexican data set has a different population admixture pattern (Fig. 3H). Thus, Mexican samples have a main Native American component (much higher than the Mexican sample from 1000G [MXL]), with a minor dispersion (of some of the samples) toward the European pole; this pattern is also expected according to different studies indicating an important Native American background in Mexican genomes with variable admixture with Europeans and minor sub-Saharan African influence (Sandoval et al. 2009). Figure 4 summarizes the main ancestries observed in these eight data sets. The three European data sets show a main European ancestry, while the four Asian data sets show a predominant Asian ancestry. Conversely, the two American samples show both differential proportions of ancestry in agreement with their different patterns of admixture.
FIGURE 4.

Summary of ancestral memberships for the eight data sets explored in the present study.

Summary of ancestral memberships for the eight data sets explored in the present study. Four data sets did not pass the quality filters, but the patterns of ancestry could also be inferred (Supplemental Fig. S5). The Russian sample set (GEO: PRJNA350714; Supplemental Fig. S5A; Nikitina et al. 2017) only yielded 558 SNPs but still all the samples fall within the European cluster; the fact that this MDS plot is built on a few hundred SNPs explains the global scattering of the data points on the plot compared to other plots (e.g., in Fig. 3). Supplemental Figure S5B indicates patterns of ancestry for “Hispanic” and “African-American” data sets (GEO: PRJNA341854; Rastogi et al. 2018). Whatever the pseudo-ethnic category “Hispanic” means from the biological point of view (Salas et al. 2007), their patterns of genetic variation fit well with expectations: Most individuals have a three-way continental admixture pattern, with significant membership within the European cluster. On the other hand, some of the “African-American” samples fall within the African reference set, but others have a more variable and complex pattern of ancestry; it is also remarkable that one “African-American” profile falls neatly in the Asian pole of the plot. The South African sample (GEO: PRJNA422129; Supplemental Fig. S5C; Berry et al. 2010) falls entirely within the African cluster. The sample set from China (GEO: PRJNA134241; Huang et al. 2011) also fit completely within the East Asian cluster. Finally, inferring ancestry from data sets that did not pass all the quality filters can lead to artifacts that are difficult to interpret from a genetic point of view. Supplemental Figure S6 displays all the MDS plots and ancestry barplots for 13 suboptimum data sets. Compared to expected patterns of variation (Fig. 3), these plots lack structure (e.g., Supplemental Fig. S6A,B) or the samples form outlier clusters clearly separated from the three main continental groups (e.g., Supplemental Fig. S6E–G) denoting artificial patterns of variation. Some data sets might represent real variability, as is the case of the samples from Malaysia (Supplemental Fig. S6K) or Afghanistan (Supplemental Fig. S6L), but the fact that the original RNA-seq data sets are in suboptimal conditions calls for caution with these patterns. In general, it may be argued that these genetic/ancestry analyses themselves could also serve as an additional quality filter per se.

DISCUSSION

We provide new evidence suggesting that ancestral background can impact gene expression patterns. Notably, this evidence stands in contrast with the fact that only a very small number of gene expression studies control for ancestral background information in their experiments; when this information is available, it is not used at all. Thus, there are studies where cases entirely represent a given continental ancestry while their control group was recruited from another continental ancestral background. For instance, in the very recent study by Tian et al. (2017), the authors investigated the immune response of patients infected by dengue virus. Their patients were recruited in Sri Lanka, whereas their controls were sampled in San Diego (USA). The MDS plot in Supplemental Figure S7 indicates that these two groups differentiate clearly in Dimension 2 (23%). The extent to which the differences between cases and controls were due to ethnicity instead of the presence of the pathogen remains to be investigated. Here we demonstrate that it is possible to infer DNA variation from RNA-seq data, and that this variation can be used to estimate ancestry background of patients. Although the DNA information from samples represents mainly coding region variation (which shows more evolutionary constraints and therefore less variation than noncoding region variants), the procedure performs well with only a few thousand SNPs available or even a few hundred, a finding that is consistent with those advanced by Salas (2019). Moreover, the procedure performs also correctly when analyzing populations with complex patterns of admixture, as is the case of many American populations. A remarkable finding of the present study is that an important proportion of the investigated RNA-seq data sets shows deficiencies in terms of data quality. Exploring the consequences of this finding within the framework of each individual study is beyond the scope of the present study, but the issue deserves further attention in future investigations. We observed that low quality RNA-seq data produce unusual patterns of ancestry; this therefore led to the conclusion that unusual patterns of genetic variation inferred from RNA-seq data can be used as an additional quality control of the data. Ancestral background is usually inferred in cases and controls by directly genotyping a set of SNPs (e.g., AIMs). However, here we demonstrate that ancestry can be deduced from RNA-seq data. This has several advantages for future gene expression studies. On the one hand, it avoids the need of obtaining ad hoc samples for DNA genomic analyses from donors (there could be many situations where only a RNA sample can be obtained). On the other hand, once the RNA-seq data have been generated, inferring DNA variation from expression patterns is inexpensive; it only requires the use of a few computational tools already available in the public domain.

MATERIALS AND METHODS

RNA-seq data sets and evaluations of data quality

Figure 5 summarizes the process followed in the present study to infer ancestry proportions from RNA-seq data. First, gene expression data were retrieved from the GEO (Gene Accession Omnibus; https://www.ncbi.nlm.nih.gov/geo/) database (Edgar et al. 2002). The vast majority of RNA-seq studies do not record the population background of their samples. We manually searched for RNA-seq data sets that have information on ethnicity or sampling location available and rejected studies that used inaccurate ethnic terminology (e.g., “latin” people) (Salas et al. 2007). Finally, the databases were chosen with the idea to represent all the continents and different tissues. We initially collected data from 25 RNA-seq data sets (corresponding to 20 independent studies) representing populations from different continental regions (Table 1; Fig. 1A; Supplemental Tables S1, S2). Two out of 20 studies (Berry et al. 2010; Singhania et al. 2018) examined expression patterns of groups of individuals from different ethnicities at the same time; therefore, their sampling design allows us to evaluate the impact of ethnicity in gene expression patterns.
FIGURE 5.

Bioinformatic procedure to infer ancestry using RNA-seq data.

Bioinformatic procedure to infer ancestry using RNA-seq data. For variant calling analyses, it is particularly relevant to initially evaluate the quality of the data. We examined the quality of the RNA-seq data by way of exploring per base sequence quality plots and using FastQC (Brown et al. 2017) and MultiQC (Ewels et al. 2016) software. Quality score plots show the distribution of base quality values for each position in the input sequence. A medium level of duplication might be unavoidable because some sequences occur more frequently than others. Therefore, to detect transcripts with extremely low expression level, it is necessary to over-sequence the library, which could generate large amounts of the most common sequences such as housekeeping genes. FastQC and MultiQC were used to produce duplication level plots. These plots show for each sample set the fraction of reads observed at different duplication levels. This procedure allows the identification of fragments of adapters remaining on the reads, contamination, or any kind of enrichment bias (due to, e.g., PCR over amplification). The software raises a warning (orange line) if duplicated sequences constitute more than 20% of the total library and it will report an error (red line) if they represent more than 50% of the total (Supplemental Figs. S1 and S2). RNA-seq reads prior to variant calling were processed using Opossum (Oikkonen and Lise 2017). Although the Opossum software deals with duplicate sequences, we have empirically observed that when the duplication level is too high (>50% of the total), it yields weak results in terms of the SNPs inferred. This could be caused by: (i) Removal of duplicates from the library may lead to an extremely poor library, hence the SNPs inferred are very few, biased and not evenly distributed across the genome, and/or (ii) overamplification of the library increases the chances of PCR errors and artifacts.

Annotating DNA variation from RNA-seq data sets and statistical treatment of SNP data

Variants were inferred from the processed RNA-seq data using the variant caller software Platypus (Rimmer et al. 2014). The number of annotated markers inferred from the different data sets varies from a few hundred (PRJNA318782; Supplemental Table S1) to more than 4000K variants (PRJNA163279). SNPs without rs code, low genotyping rate (below 10%), and in LD (r2 > 0.75) were removed from the analysis. Eliminating LD allows us to mitigate the effect of ascertainment bias which could be particularly important in genome data inferred from gene expression patterns. SNP data from continental reference populations were retrieved from 1000G (http://www.internationalgenome.org). Reference genome data from 1000G were processed as per previous studies (Pardo-Seco et al. 2014a, 2016). Subsequently, each SNP population data set was individually intersected with SNP data from the 1000G reference populations. Next, we computed identity-by-state (IBS) values between each population data set and the reference populations using the software PLINK (Purcell et al. 2007). Multidimensional scaling (MDS) analysis using this matrix of IBS values was used to represent the clustering gene expression patterns and geographical affinities between the test data and the reference continental populations. MDS plots were built using the function cmdscale (library stats) from R (http://www.r-project.org). We additionally investigated admixture patterns for each data set using populations in 1000G as reference data sets representing main continental regions. Admixture proportions were obtained by using a maximum likelihood estimation of individual ancestries from multilocus SNP data and using the ADMIXTURE software (Alexander et al. 2009), and fixing the number of predefined clusters (K) to the number of continental regions considered for each analysis.

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.
  47 in total

1.  Gene Expression Omnibus: NCBI gene expression and hybridization array data repository.

Authors:  Ron Edgar; Michael Domrachev; Alex E Lash
Journal:  Nucleic Acids Res       Date:  2002-01-01       Impact factor: 16.971

2.  PLINK: a tool set for whole-genome association and population-based linkage analyses.

Authors:  Shaun Purcell; Benjamin Neale; Kathe Todd-Brown; Lori Thomas; Manuel A R Ferreira; David Bender; Julian Maller; Pamela Sklar; Paul I W de Bakker; Mark J Daly; Pak C Sham
Journal:  Am J Hum Genet       Date:  2007-07-25       Impact factor: 11.025

3.  Genetic association study of exfoliation syndrome identifies a protective rare variant at LOXL1 and five new susceptibility loci.

Authors:  Luis Fernández-Vega Cueto; Tin Aung; Mineo Ozaki; Mei Chin Lee; Ursula Schlötzer-Schrehardt; Gudmar Thorleifsson; Takanori Mizoguchi; Robert P Igo; Aravind Haripriya; Susan E Williams; Yury S Astakhov; Andrew C Orr; Kathryn P Burdon; Satoko Nakano; Kazuhiko Mori; Khaled Abu-Amero; Michael Hauser; Zheng Li; Gopalakrishnan Prakadeeswari; Jessica N Cooke Bailey; Alina Popa Cherecheanu; Jae H Kang; Sarah Nelson; Ken Hayashi; Shin-Ichi Manabe; Shigeyasu Kazama; Tomasz Zarnowski; Kenji Inoue; Murat Irkec; Miguel Coca-Prados; Kazuhisa Sugiyama; Irma Järvelä; Patricio Schlottmann; S Fabian Lerner; Hasnaa Lamari; Yildirim Nilgün; Mukharram Bikbov; Ki Ho Park; Soon Cheol Cha; Kenji Yamashiro; Juan C Zenteno; Jost B Jonas; Rajesh S Kumar; Shamira A Perera; Anita S Y Chan; Nino Kobakhidze; Ronnie George; Lingam Vijaya; Tan Do; Deepak P Edward; Lourdes de Juan Marcos; Mohammad Pakravan; Sasan Moghimi; Ryuichi Ideta; Daniella Bach-Holm; Per Kappelgaard; Barbara Wirostko; Samuel Thomas; Daniel Gaston; Karen Bedard; Wenda L Greer; Zhenglin Yang; Xueyi Chen; Lulin Huang; Jinghong Sang; Hongyan Jia; Liyun Jia; Chunyan Qiao; Hui Zhang; Xuyang Liu; Bowen Zhao; Ya-Xing Wang; Liang Xu; Stéphanie Leruez; Pascal Reynier; George Chichua; Sergo Tabagari; Steffen Uebe; Matthias Zenkel; Daniel Berner; Georg Mossböck; Nicole Weisschuh; Ursula Hoja; Ulrich-Christoph Welge-Luessen; Christian Mardin; Panayiota Founti; Anthi Chatzikyriakidou; Theofanis Pappas; Eleftherios Anastasopoulos; Alexandros Lambropoulos; Arkasubhra Ghosh; Rohit Shetty; Natalia Porporato; Vijayan Saravanan; Rengaraj Venkatesh; Chandrashekaran Shivkumar; Narendran Kalpana; Sripriya Sarangapani; Mozhgan R Kanavi; Afsaneh Naderi Beni; Shahin Yazdani; Alireza Lashay; Homa Naderifar; Nassim Khatibi; Antonio Fea; Carlo Lavia; Laura Dallorto; Teresa Rolle; Paolo Frezzotti; Daniela Paoli; Erika Salvi; Paolo Manunta; Yosai Mori; Kazunori Miyata; Tomomi Higashide; Etsuo Chihara; Satoshi Ishiko; Akitoshi Yoshida; Masahide Yanagi; Yoshiaki Kiuchi; Tsutomu Ohashi; Toshiya Sakurai; Takako Sugimoto; Hideki Chuman; Makoto Aihara; Masaru Inatani; Masahiro Miyake; Norimoto Gotoh; Fumihiko Matsuda; Nagahisa Yoshimura; Yoko Ikeda; Morio Ueno; Chie Sotozono; Jin Wook Jeoung; Min Sagong; Kyu Hyung Park; Jeeyun Ahn; Marisa Cruz-Aguilar; Sidi M Ezzouhairi; Abderrahman Rafei; Yaan Fun Chong; Xiao Yu Ng; Shuang Ru Goh; Yueming Chen; Victor H K Yong; Muhammad Imran Khan; Olusola O Olawoye; Adeyinka O Ashaye; Idakwo Ugbede; Adeola Onakoya; Nkiru Kizor-Akaraiwe; Chaiwat Teekhasaenee; Yanin Suwan; Wasu Supakontanasan; Suhanya Okeke; Nkechi J Uche; Ifeoma Asimadu; Humaira Ayub; Farah Akhtar; Ewa Kosior-Jarecka; Urszula Lukasik; Ignacio Lischinsky; Vania Castro; Rodolfo Perez Grossmann; Gordana Sunaric Megevand; Sylvain Roy; Edward Dervan; Eoin Silke; Aparna Rao; Priti Sahay; Pablo Fornero; Osvaldo Cuello; Delia Sivori; Tamara Zompa; Richard A Mills; Emmanuelle Souzeau; Paul Mitchell; Jie Jin Wang; Alex W Hewitt; Michael Coote; Jonathan G Crowston; Sergei Y Astakhov; Eugeny L Akopov; Anton Emelyanov; Vera Vysochinskaya; Gyulli Kazakbaeva; Rinat Fayzrakhmanov; Saleh A Al-Obeidan; Ohoud Owaidhah; Leyla Ali Aljasim; Balram Chowbay; Jia Nee Foo; Raphael Q Soh; Kar Seng Sim; Zhicheng Xie; Augustine W O Cheong; Shi Qi Mok; Hui Meng Soo; Xiao Yin Chen; Su Qin Peh; Khai Koon Heng; Rahat Husain; Su-Ling Ho; Axel M Hillmer; Ching-Yu Cheng; Francisco A Escudero-Domínguez; Rogelio González-Sarmiento; Frederico Martinon-Torres; Antonio Salas; Kessara Pathanapitoon; Linda Hansapinyo; Boonsong Wanichwecharugruang; Naris Kitnarong; Anavaj Sakuntabhai; Hip X Nguyn; Giang T T Nguyn; Trình V Nguyn; Werner Zenz; Alexander Binder; Daniela S Klobassa; Martin L Hibberd; Sonia Davila; Stefan Herms; Markus M Nöthen; Susanne Moebus; Robyn M Rautenbach; Ari Ziskind; Trevor R Carmichael; Michele Ramsay; Lydia Álvarez; Montserrat García; Héctor González-Iglesias; Pedro P Rodríguez-Calvo; Çilingir Oguz; Nevbahar Tamcelik; Eray Atalay; Bilge Batu; Dilek Aktas; Burcu Kasım; M Roy Wilson; Anne L Coleman; Yutao Liu; Pratap Challa; Leon Herndon; Rachel W Kuchtey; John Kuchtey; Karen Curtin; Craig J Chaya; Alan Crandall; Linda M Zangwill; Tien Yin Wong; Masakazu Nakano; Shigeru Kinoshita; Anneke I den Hollander; Eija Vesti; John H Fingert; Richard K Lee; Arthur J Sit; Bradford J Shingleton; Ningli Wang; Daniele Cusi; Raheel Qamar; Peter Kraft; Margaret A Pericak-Vance; Soumya Raychaudhuri; Steffen Heegaard; Tero Kivelä; André Reis; Friedrich E Kruse; Robert N Weinreb; Louis R Pasquale; Jonathan L Haines; Unnur Thorsteinsdottir; Fridbert Jonasson; R Rand Allingham; Dan Milea; Robert Ritch; Toshiaki Kubota; Kei Tashiro; Eranga N Vithana; Shazia Micheal; Fotis Topouzis; Jamie E Craig; Michael Dubina; Periasamy Sundaresan; Kari Stefansson; Janey L Wiggs; Francesca Pasutto; Chiea Chuen Khor
Journal:  Nat Genet       Date:  2017-05-29       Impact factor: 38.330

4.  Population genomics of human gene expression.

Authors:  Barbara E Stranger; Alexandra C Nica; Matthew S Forrest; Antigone Dimas; Christine P Bird; Claude Beazley; Catherine E Ingle; Mark Dunning; Paul Flicek; Daphne Koller; Stephen Montgomery; Simon Tavaré; Panos Deloukas; Emmanouil T Dermitzakis
Journal:  Nat Genet       Date:  2007-09-16       Impact factor: 38.330

5.  Development of a panel of genome-wide ancestry informative markers to study admixture throughout the Americas.

Authors:  Joshua Mark Galanter; Juan Carlos Fernandez-Lopez; Christopher R Gignoux; Jill Barnholtz-Sloan; Ceres Fernandez-Rozadilla; Marc Via; Alfredo Hidalgo-Miranda; Alejandra V Contreras; Laura Uribe Figueroa; Paola Raska; Gerardo Jimenez-Sanchez; Irma Silva Zolezzi; Maria Torres; Clara Ruiz Ponte; Yarimar Ruiz; Antonio Salas; Elizabeth Nguyen; Celeste Eng; Lisbeth Borjas; William Zabala; Guillermo Barreto; Fernando Rondón González; Adriana Ibarra; Patricia Taboada; Liliana Porras; Fabián Moreno; Abigail Bigham; Gerardo Gutierrez; Tom Brutsaert; Fabiola León-Velarde; Lorna G Moore; Enrique Vargas; Miguel Cruz; Jorge Escobedo; José Rodriguez-Santana; William Rodriguez-Cintrón; Rocio Chapela; Jean G Ford; Carlos Bustamante; Daniela Seminara; Mark Shriver; Elad Ziv; Esteban Gonzalez Burchard; Robert Haile; Esteban Parra; Angel Carracedo
Journal:  PLoS Genet       Date:  2012-03-08       Impact factor: 5.917

6.  Discovery of new candidate genes for rheumatoid arthritis through integration of genetic association data with expression pathway analysis.

Authors:  Klementy Shchetynsky; Lina-Marcella Diaz-Gallo; Lasse Folkersen; Aase Haj Hensvold; Anca Irinel Catrina; Louise Berg; Lars Klareskog; Leonid Padyukov
Journal:  Arthritis Res Ther       Date:  2017-02-02       Impact factor: 5.156

7.  Making the most of RNA-seq: Pre-processing sequencing data with Opossum for reliable SNP variant detection.

Authors:  Laura Oikkonen; Stefano Lise
Journal:  Wellcome Open Res       Date:  2017-01-17

8.  FQC Dashboard: integrates FastQC results into a web-based, interactive, and extensible FASTQ quality control tool.

Authors:  Joseph Brown; Meg Pirrung; Lee Ann McCue
Journal:  Bioinformatics       Date:  2017-06-09       Impact factor: 6.937

9.  Shared and organism-specific host responses to childhood diarrheal diseases revealed by whole blood transcript profiling.

Authors:  Hannah A DeBerg; Mussaret B Zaidi; Matthew C Altman; Prasong Khaenam; Vivian H Gersuk; Freddy D Campos; Iza Perez-Martinez; Mario Meza-Segura; Damien Chaussabel; Jacques Banchereau; Teresa Estrada-Garcia; Peter S Linsley
Journal:  PLoS One       Date:  2018-01-29       Impact factor: 3.240

10.  Whole Exome Sequencing reveals new candidate genes in host genomic susceptibility to Respiratory Syncytial Virus Disease.

Authors:  Antonio Salas; Jacobo Pardo-Seco; Miriam Cebey-López; Alberto Gómez-Carballa; Pablo Obando-Pacheco; Irene Rivero-Calle; María-José Currás-Tuala; Jorge Amigo; José Gómez-Rial; Federico Martinón-Torres
Journal:  Sci Rep       Date:  2017-11-21       Impact factor: 4.379

View more
  7 in total

1.  African Americans and European Americans exhibit distinct gene expression patterns across tissues and tumors associated with immunologic functions and environmental exposures.

Authors:  Urminder Singh; Kyle M Hernandez; Bruce J Aronow; Eve Syrkin Wurtele
Journal:  Sci Rep       Date:  2021-05-10       Impact factor: 4.379

2.  RNA-Seq Data-Mining Allows the Discovery of Two Long Non-Coding RNA Biomarkers of Viral Infection in Humans.

Authors:  Ruth Barral-Arca; Alberto Gómez-Carballa; Miriam Cebey-López; María José Currás-Tuala; Sara Pischedda; Sandra Viz-Lasheras; Xabier Bello; Federico Martinón-Torres; Antonio Salas
Journal:  Int J Mol Sci       Date:  2020-04-15       Impact factor: 5.923

3.  Host Transcriptomic Response Following Administration of Rotavirus Vaccine in Infants' Mimics Wild Type Infection.

Authors:  Alberto Gómez-Carballa; Ruth Barral-Arca; Miriam Cebey-López; Maria José Currás-Tuala; Sara Pischedda; José Gómez-Rial; Dominic Habgood-Coote; Jethro A Herberg; Myrsini Kaforou; Federico Martinón-Torres; Antonio Salas
Journal:  Front Immunol       Date:  2021-01-21       Impact factor: 7.561

4.  A multi-tissue study of immune gene expression profiling highlights the key role of the nasal epithelium in COVID-19 severity.

Authors:  Alberto Gómez-Carballa; Irene Rivero-Calle; Jacobo Pardo-Seco; José Gómez-Rial; Carmen Rivero-Velasco; Nuria Rodríguez-Núñez; Gema Barbeito-Castiñeiras; Hugo Pérez-Freixo; Miriam Cebey-López; Ruth Barral-Arca; Carmen Rodriguez-Tenreiro; Ana Dacosta-Urbieta; Xabier Bello; Sara Pischedda; María José Currás-Tuala; Sandra Viz-Lasheras; Federico Martinón-Torres; Antonio Salas
Journal:  Environ Res       Date:  2022-02-22       Impact factor: 8.431

5.  Clinical implementation of RNA sequencing for Mendelian disease diagnostics.

Authors:  Vicente A Yépez; Mirjana Gusic; Robert Kopajtich; Christian Mertes; Nicholas H Smith; Charlotte L Alston; Rui Ban; Skadi Beblo; Riccardo Berutti; Holger Blessing; Elżbieta Ciara; Felix Distelmaier; Peter Freisinger; Johannes Häberle; Susan J Hayflick; Maja Hempel; Yulia S Itkis; Yoshihito Kishita; Thomas Klopstock; Tatiana D Krylova; Costanza Lamperti; Dominic Lenz; Christine Makowski; Signe Mosegaard; Michaela F Müller; Gerard Muñoz-Pujol; Agnieszka Nadel; Akira Ohtake; Yasushi Okazaki; Elena Procopio; Thomas Schwarzmayr; Joél Smet; Christian Staufner; Sarah L Stenton; Tim M Strom; Caterina Terrile; Frederic Tort; Rudy Van Coster; Arnaud Vanlander; Matias Wagner; Manting Xu; Fang Fang; Daniele Ghezzi; Johannes A Mayr; Dorota Piekutowska-Abramczuk; Antonia Ribes; Agnès Rötig; Robert W Taylor; Saskia B Wortmann; Kei Murayama; Thomas Meitinger; Julien Gagneur; Holger Prokisch
Journal:  Genome Med       Date:  2022-04-05       Impact factor: 11.117

6.  A Meta-Analysis of Multiple Whole Blood Gene Expression Data Unveils a Diagnostic Host-Response Transcript Signature for Respiratory Syncytial Virus.

Authors:  Ruth Barral-Arca; Alberto Gómez-Carballa; Miriam Cebey-López; Xabier Bello; Federico Martinón-Torres; Antonio Salas
Journal:  Int J Mol Sci       Date:  2020-03-06       Impact factor: 5.923

7.  Human Alveolar and Splenic Macrophage Populations Display a Distinct Transcriptomic Response to Infection With Mycobacterium tuberculosis.

Authors:  Lelia Lavalett; Hector Ortega; Luis F Barrera
Journal:  Front Immunol       Date:  2020-04-21       Impact factor: 7.561

  7 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.