Re-Annotator: Annotation Pipeline for Microarray Probe Sequences.

Janine Arloth¹, Daniel M Bader², Simone Röh¹, Andre Altmann³.

Abstract

Microarray technologies are established approaches for high throughput gene expression, methylation and genotyping analysis. An accurate mapping of the array probes is essential to generate reliable biological findings. However, manufacturers of the microarray platforms typically provide incomplete and outdated annotation tables, which often rely on older genome and transcriptome versions that differ substantially from up-to-date sequence databases. Here, we present the Re-Annotator, a re-annotation pipeline for microarray probe sequences. It is primarily designed for gene expression microarrays but can also be adapted to other types of microarrays. The Re-Annotator uses a custom-built mRNA reference database to identify the positions of gene expression array probe sequences. We applied Re-Annotator to the Illumina Human-HT12 v4 microarray platform and found that about one quarter (25%) of the probes differed from the manufacturer's annotation. In further computational experiments on experimental gene expression data, we compared Re-Annotator to another probe re-annotation tool, ReMOAT, and found that Re-Annotator provided an improved re-annotation of microarray probes. A thorough re-annotation of probe information is crucial to any microarray analysis. The Re-Annotator pipeline is freely available at http://sourceforge.net/projects/reannotator along with re-annotated files for Illumina microarrays HumanHT-12 v3/v4 and MouseRef-8 v2.

Year: 2015 PMID: 26426330 PMCID: PMC4591122 DOI: 10.1371/journal.pone.0139516

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

Introduction

Analysis of gene expression profiles under various conditions is one of the cornerstones of modern molecular biology research. One major challenge in working with gene expression microarrays is the quality of the annotation of the array probes used by the platform. Differences in probe annotations complicate the replication of studies as well as meta-analyses across platforms. Moreover, the annotations provided by the manufacturers quickly become outdated with every update of the genome assemblies and the accompanying annotation tables. For example, the number of annotated transcripts in the RefSeq Gene database (RefSeq release 59 [1]) differs from the hg18 (NCBI build 36.1) to the hg19 (GRCh37 build 37) assembly by more than 1,200 transcripts (43,236 to 44,596). Furthermore, the initial mapping provided by the manufacturers contains several severe problems: some probes map to non-transcribed genomic regions, bind secondary targets or have other properties that may confound a proper analysis, such as common SNPs in the probe sequence. Clearly, these probes should be removed from the analysis, as an accurate probe annotation is fundamental for all downstream analyses and ensures accurate biological interpretation of the results. Outdated probe annotation is an increasing problem in publicly available gene expression catalogues such as the Allen Brain Atlas [2], as researchers tend to use the provided expression data as is, that is, without further validity checks and quality control. In order to identify probes with potential annotation problems, a sound re-annotation of all probes is required. Recently, approaches were developed that allow the re-annotation of gene expression microarray data by re-aligning the probe sequences to the entire human genome [3,4].
However, when using the whole genome as the mapping reference, there is an increased likelihood of short reads being mapped to multiple locations and to intergenic regions, thereby decreasing the number of uniquely mappable probes [5]. Still, 24% of the human genome cannot be uniquely mapped using 50 bp long sequences with two mismatches [6,7], which corresponds to the sequence length of Illumina array probes. Therefore, including untranscribed regions, which theoretically should not even be part of the cDNA library of interest, reduces the mappability and introduces an additional source of unnecessary errors. Our approach, Re-Annotator, considers these mappability issues: in a first round the pipeline maps microarray probe sequences directly to a custom-built mRNA reference instead of the entire human genome sequence. Thus, our pipeline enables us to correctly annotate various formerly non-mappable probes [5]. In the following we describe the Re-Annotator pipeline and use it to re-annotate the probe sequences of two Illumina gene expression microarrays: one for human (HumanHT-12 v4) and one for mouse (MouseRef-8 v2). In further computational experiments we compare the re-annotation by Re-Annotator with a genome-only re-annotation by ReMOAT [3] on experimental gene expression data.

Materials and Methods

The Re-Annotator Pipeline

The pipeline comprises two essential parts: (1) the creation of a customized reference sequence database and (2) the step-wise re-annotation of the probe sequences.

In Silico mRNA reference database

First, for each transcript in the RefSeq database the genomic locations of the exons are extracted. Next, the genomic sequences of the exons of each isoform are concatenated to form one in silico mRNA. These sequences (one per listed isoform in the database) form the reference database for the mapping step (Fig 1).
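The construction of one in silico mRNA can be illustrated with a minimal Python sketch. The data layout (a dictionary of chromosome strings and 0-based, half-open exon intervals), the function names and the reverse-complementing of minus-strand transcripts are assumptions made for illustration; they are not taken from the Re-Annotator scripts. The second function shows how an offset in the concatenated sequence could be converted back to a genomic coordinate.

```python
# Illustrative sketch only: names, data layout and strand handling are assumptions.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def build_in_silico_mrna(genome, chrom, strand, exons):
    """Concatenate exon sequences in genomic order.

    exons: list of (start, end) pairs, 0-based and half-open.
    Minus-strand transcripts are reverse-complemented (an assumption).
    """
    ordered = sorted(exons)
    mrna = "".join(genome[chrom][start:end] for start, end in ordered)
    return reverse_complement(mrna) if strand == "-" else mrna


def mrna_to_genome(strand, exons, pos):
    """Map a 0-based offset in the in silico mRNA to a 0-based genomic position."""
    ordered = sorted(exons)
    if strand == "-":
        ordered = ordered[::-1]  # transcript order runs from the last exon backwards
    for start, end in ordered:
        length = end - start
        if pos < length:
            return start + pos if strand == "+" else end - 1 - pos
        pos -= length
    raise ValueError("position lies outside the transcript")


# Toy example: one chromosome, one transcript with three exons.
genome = {"chrT": "AAACCCGGGTTTACGTACGT"}
exons = [(0, 3), (6, 9), (12, 16)]
plus = build_in_silico_mrna(genome, "chrT", "+", exons)
minus = build_in_silico_mrna(genome, "chrT", "-", exons)
print(plus, minus)
```

Keeping the exon table next to the concatenated sequence is what makes the later conversion from mRNA positions to genome positions a simple walk over exon lengths.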

Fig 1

Annotation pipeline.

Schematic overview of the computational pipeline flow: (top) creating the in silico mRNA reference database and (bottom) step-wise alignment process. Purple sequences in the left column correspond to alignments to the in silico reference and brown sequences in the right column correspond to alignments to the genome.

Step-wise alignment process

The annotation process starts with the alignment of the probe sequences provided by the manufacturer to the mRNA reference database using the Burrows-Wheeler Aligner (BWA) [8]. The quality of a probe sequence does not decrease by position (as it would with real sequencing data); thus, the aligner is executed without seeding (seed length > length of probe sequence). A step-wise alignment process is performed, at first allowing no mismatches between probe sequence and reference sequence. Next, all unaligned probe sequences are resubmitted for alignment with the number of allowed mismatches increased by one. Array probe sequences that do not align to the mRNA reference database with a maximum of four mismatches are mapped to the whole genome using the same step-wise approach. Alignments are stored in SAM format [9], where all alternative best hits are included.

Successful alignments to the mRNA reference database are processed according to the following steps:

(1) Alignments starting or ending with a mismatch, insertion or deletion are trimmed.
(2) Alignment positions in the in silico mRNA are mapped to the genome position.
(3) Alignments are discarded if the alignment strand does not match the original coding strand information of the RefSeq database.

Likewise, successful alignments to the genome sequence are processed according to the following steps:

(1) Steps (1) and (3) of the mRNA reference alignment procedure above are applied.
(2) Gene annotation is added using the genome annotation tool ANNOVAR [10].
(3) Alignments that are annotated to be in intergenic or up- and downstream regions (defined as 1 kb distance to the transcription start or end site) are excluded.

After these steps, for each mapped probe we know its genomic positions and which genes in the RefSeq database are located at the probe’s genomic location. Optionally, the location of SNPs in the mapped probe sequence can be added to the annotation.
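The control flow of the step-wise alignment can be sketched as follows. This is not the pipeline's code: `toy_align` is a hypothetical stand-in for a real aligner such as BWA (it counts substitutions only and ignores strand and indels), and the toy probes are so short that the example uses a maximum of one mismatch rather than the four described above.

```python
# Illustrative sketch only: toy_align is a hypothetical stand-in for BWA.
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def toy_align(probe, reference, max_mismatches):
    """Return all (sequence name, offset) hits with at most max_mismatches."""
    hits = []
    for name, seq in reference.items():
        for i in range(len(seq) - len(probe) + 1):
            if hamming(probe, seq[i:i + len(probe)]) <= max_mismatches:
                hits.append((name, i))
    return hits


def stepwise_align(probes, references, max_mismatches=4, align=toy_align):
    """Try each reference in turn, raising the allowed mismatches one at a time.

    references: ordered list of (label, reference dict), e.g. mRNA then genome.
    Returns (results, unaligned); results maps probe id to
    (reference label, mismatches allowed when it first aligned, hits).
    """
    results = {}
    pending = dict(probes)
    for label, reference in references:
        for allowed in range(max_mismatches + 1):
            still_unaligned = {}
            for probe_id, seq in pending.items():
                hits = align(seq, reference, allowed)
                if hits:
                    results[probe_id] = (label, allowed, hits)
                else:
                    still_unaligned[probe_id] = seq
            pending = still_unaligned  # only unaligned probes are resubmitted
    return results, pending


mrna_ref = {"NM_toy": "AAAGGGACGTTTCA"}
genome_ref = {"chrT": "CCCCCTTTTTGGGGGAAAAA"}
probes = {"p1": "GGGACG", "p2": "GGGTCG", "p3": "TTTTTG", "p4": "ACACAC"}
results, unaligned = stepwise_align(
    probes, [("mRNA", mrna_ref), ("genome", genome_ref)], max_mismatches=1
)
print(results)
print(unaligned)
```

Because a probe leaves the pending set as soon as it aligns, every reported hit is a best hit at the lowest mismatch level that produced any alignment, and the genome is searched only for probes that the mRNA reference could not place.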

Annotation post-processing

In order to ensure that a probe is specific for one genomic region, we eliminate probes with multiple hits that cannot be assigned to the same gene; that is, we recommend that the coordinates of different hits be no more than 25 bp apart (the default for Illumina arrays). This threshold is largely arbitrary (half the length of Illumina probes) and may remove a few relevant probes, but it ensures that the remaining probes are aligned uniquely to a distinct region. Further, we require that probes with multiple annotations be aligned in the same direction (alternate haplotype regions of the original assembly were ignored). The final re-annotation is provided with additional information on the probe gene symbol and updated position (also for splice-annotations).
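A minimal sketch of this post-processing rule is given below. The hit representation as (chromosome, start, strand) tuples, the function name and the choice to measure distance between alignment start coordinates are assumptions for illustration; the text above only fixes the 25 bp default and the same-direction requirement.

```python
# Illustrative sketch only: hit layout and distance definition are assumptions.
def is_reliable(hits, max_distance=25):
    """Decide whether all best hits of one probe fall into one distinct region.

    hits: list of (chromosome, start, strand) tuples for a single probe.
    """
    if not hits:
        return False
    chromosomes = {chrom for chrom, _, _ in hits}
    strands = {strand for _, _, strand in hits}
    starts = [start for _, start, _ in hits]
    same_region = len(chromosomes) == 1 and max(starts) - min(starts) <= max_distance
    same_direction = len(strands) == 1
    return same_region and same_direction


# Two isoform hits 10 bp apart on the same strand: kept.
print(is_reliable([("chr9", 1000, "+"), ("chr9", 1010, "+")]))
# Hits on different chromosomes: removed.
print(is_reliable([("chr9", 1000, "+"), ("chr5", 1000, "+")]))
```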

System requirements

Re-Annotator comprises shell and Perl scripts. In order to successfully execute the Re-Annotator pipeline, the following software should be pre-installed on the system: the Burrows-Wheeler Aligner (BWA) [8], SAMtools [9], ANNOVAR [10] and Perl. In addition, the following data should be available: the genome assembly of interest (e.g., hg19), a corresponding gene annotation table (e.g., RefSeq), the microarray probe sequences to be annotated and, optionally, a SNP database (e.g., dbSNP or 1000 Genomes).

Re-Annotation of Illumina Gene Expression Microarrays

We applied the Re-Annotator pipeline to the probe sequences of Illumina HumanHT-12 v4 (http://support.illumina.com/array/array_kits/humanht-12_v4_expression_beadchip_kit/downloads.html) and MouseRef-8 v2 (http://support.illumina.com/array/array_kits/mouseref-8_v2_expression_beadchip_kit/downloads.html). As reference genomes we used hg19 and mm9, respectively, obtained from the UCSC Genome Browser [11]. The mRNA reference databases were built on the RefSeq database versions provided by the UCSC Genome Browser (downloaded January 2012). Furthermore, we used the SNPs for the central European population (CEU) from the 1000 Genomes database [12] for annotating SNPs in mapped probe sequences.

Competitive Comparison of Re-Annotator on Experimental Gene Expression Data

We re-annotated the probe sequences of the HumanHT-12 v4 chip with ReMOAT [3], another re-annotation tool, in order to compare Re-Annotator to a genome-only re-annotation approach. ReMOAT is based on a BLAST search against the corresponding genome and all transcripts defined in the UCSC annotation tables. For this head-to-head comparison we consider “good” probes to be those rated as “good” or “excellent” by ReMOAT and those with a uniquely identified gene by Re-Annotator; “bad” probes are the remaining probes. Further, we used experimental data to quantify the differences between the annotations and to highlight the effects of different annotations on experimental results.

Experimental data and pre-processing

We reanalyzed gene expression profiles in whole blood cells from 36 individuals at baseline (i.e., without any stimulation) hybridized to the Illumina HumanHT-12 v4 microarray chips (Gene Expression Omnibus Accession: GSE64930). First, each probe was independently filtered using a detection p-value of 0.01 in at least 18 subjects (50%), leaving 13,610 expressed probes for further analysis. Secondly, intensity values were transformed and normalized through variance stabilization and normalization (VSN) [13].
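The detection filter can be written down as a short sketch. The function name, the input layout (one detection p-value per subject for a given probe) and the use of a strict inequality are assumptions; the text only states the 0.01 cutoff and the requirement of at least 18 of 36 subjects.

```python
# Illustrative sketch only: strict "<" and the input layout are assumptions.
def is_expressed(detection_pvalues, alpha=0.01, min_fraction=0.5):
    """Keep a probe if it is detected in at least min_fraction of the subjects."""
    detected = sum(p < alpha for p in detection_pvalues)
    return detected >= min_fraction * len(detection_pvalues)


# A probe detected in exactly 18 of 36 subjects passes; 17 of 36 does not.
passes = is_expressed([0.001] * 18 + [0.2] * 18)
fails = is_expressed([0.001] * 17 + [0.2] * 19)
print(passes, fails)
```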

Comparison of Re-Annotator and ReMOAT

We compared Re-Annotator and ReMOAT using two different measures. The first measure examines the variance of probes in different groups and the second measure is concerned with whether probes passed the detection threshold or not. Using the real-world dataset we grouped the probes in five categories. Categories 1 and 2 served as reference and comprise probes that were annotated as “good” and as “bad” by both tools, respectively. We refer to category 1 and 2 as “Both” and “None”, respectively. Category 3 comprised probes that were annotated as “good” by Re-Annotator and as “bad” by ReMOAT, while category 4 comprised probes that were annotated as “bad” by Re-Annotator and as “good” by ReMOAT. Categories 3 and 4 are referred to as “InRA” and “InRM”, respectively. Category 5 comprised probes that are marked as repeats by ReMOAT using RepeatMasker (www.repeatmasker.org) (“InRepeat”). We compared the mean relative variability (coefficient of variation = CV) of probes in categories 3–5 to the mean CV in both reference categories 1 and 2 using a two-sided Wilcoxon test. For the second measure we hypothesized that probes annotated as “bad” are less likely to pass the detection criteria, i.e., they are more likely considered not expressed in the experiment. For each annotation tool we computed the 2x2 contingency table between annotation (“good” and “bad”) and expression status (“expressed” and “not expressed”) and derived the odds ratio (OR) along with the 95% confidence interval (CI). Next, we compared the ORs for Re-Annotator and ReMOAT.
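The grouping and the variability measure can be sketched as follows. The function names are hypothetical, and two details are assumptions rather than statements from the text: the repeat flag is given precedence over the other four categories, and the CV uses the sample standard deviation. The Wilcoxon test itself is omitted.

```python
# Illustrative sketch only: category precedence and sample SD are assumptions.
from statistics import mean, stdev


def coefficient_of_variation(intensities):
    """Relative variability of one probe across subjects: SD divided by mean."""
    return stdev(intensities) / mean(intensities)


def categorize(good_in_reannotator, good_in_remoat, in_repeat=False):
    """Assign a probe to one of the five comparison categories."""
    if in_repeat:
        return "InRepeat"
    if good_in_reannotator and good_in_remoat:
        return "Both"
    if not good_in_reannotator and not good_in_remoat:
        return "None"
    return "InRA" if good_in_reannotator else "InRM"


print(categorize(True, False))                    # good only in Re-Annotator
print(coefficient_of_variation([8.0, 10.0, 12.0]))
```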

Results

Re-Annotator improves annotation of human probes compared to manufacturer

We analyzed the Illumina HumanHT-12 v4 probe sequences using the Re-Annotator pipeline. Of all 47,230 probe sequences, 95% (Fig 2A; Table 1) were aligned either to our custom-built mRNA sequence database (n = 34,277; in the first alignment step) or, if no hit was found there, to the reference genome (n = 10,661; in the second alignment step). A large fraction of the latter probe sequences were aligned to genomic locations without any known transcribed gene (n = 7,493). After the post-processing filter, 77.7% of all aligned probe sequences (see Table 1) mapped to a distinct region in the genome (defined as a maximum of 25 bp distance between multiple hits for the same sequence) and were included in the final annotation file for the HumanHT-12 v4 BeadChip array. This set of 34,936 probes is referred to as “reliable” array probes in the following. The majority (93.8%) of those reliable probes (Fig 2B; Table 1) were aligned without mismatches. The number of hits per probe to a region ranged from 1 to 32; 67.7% of the probes had only one unique hit and 96% (n = 33,539) had fewer than five hits (Fig 2C). The vast majority of reliable probes (92.1%; Fig 2D; Table 1) resided in regions without known SNPs in the Caucasian population (based on the 1000 Genomes Project). It is conceivable that SNPs within the probe sequence may be the source of “differential” expression via altered hybridization efficiency. However, Schurmann et al. [4] reported no consistent effects of SNPs located in array probe sequences on hybridization efficiency. Thus, one has to test individually whether these SNPs are associated with altered expression signal intensity.

Fig 2

Results of the re-annotation of Illumina HumanHT-12 v4 probe sequences.

(A) Barplot of the alignment basis. The left bar represents the array probe sequences that could be aligned to the in silico mRNA reference database. The middle bar represents sequences that were aligned to the whole genome reference, subdivided into genic and intergenic alignments. The right bar represents unaligned sequences. Histograms showing (B) the number of mismatches between probe sequence and reference, (C) the number of equally top-scoring best hits per probe sequence and (D) the number of SNPs (in the 1000 Genomes data) within an aligned probe sequence. (E) Histogram of the annotation of probes that have no annotation according to the manufacturer and have now been rescued and reliably annotated. (F) Histogram showing the changes in annotation from the manufacturer to our re-annotation (Manufacturer | Re-Annotator).

Table 1

Number of probe sequences at different steps of the re-annotation for two Illumina chips.

	HumanHT-12 v4	MouseRef-8 v2
Total	47,132	25,697
Aligned	44,938	25,542
Post-processing (“good”)	34,936	24,799
No mismatch	32,754	NA
Single alignment hit	23,661	23,187

Detailed numbers for probes that could be aligned (Aligned), that were considered “good” probes after the post-processing, that were aligned without mismatches to the reference genome, or that had unique alignments.

Roughly 23.5% (n = 11,086) of all Illumina probes were not annotated with probe coordinates and gene symbols by the manufacturer. Re-Annotator rescued and reliably annotated about 36.6% (n = 4,062) of these previously un-annotated probes. Seventy-two percent (n = 2,939) of these probes were annotated to exonic regions and the rest to intronic regions (see Fig 2E). A total of 36,144 probes had a complete Illumina annotation. Re-Annotator provided an annotation differing from the manufacturer’s for 21.5% (n = 7,789) of these probes (see Fig 2F). Thus, in summary, Re-Annotator provided an updated annotation for one quarter (25%) of all probes on the Illumina HumanHT-12 v4 chip.

Re-Annotator refines annotation of mouse probes compared to manufacturer

Additionally, we analyzed the Illumina MouseRef-8 v2 probe sequences (n = 25,697) using our Re-Annotator pipeline. Almost all probes were aligned (99.4%; Fig 3A; Table 1) to either the mRNA reference database (n = 24,994) or the genome sequence (n = 548) and passed the post-processing filter (Table 1). Of all post-processed array probe sequences, 97.9% were aligned without mismatches (Fig 3B; Table 1) and 93.5% mapped to a single region, with only 14 probes having five or more hits per probe (Fig 3C; Table 1). The successful re-annotation of the mouse microarray can be explained by the restriction of the array content to RefSeq genes, which have good transcriptomic annotation quality.

Fig 3

Results of the re-annotation of Illumina MouseRef-8 v2 probe sequences.

(A) Barplot of the alignment basis. The left bar represents the array probe sequences that could be aligned to the in silico mRNA reference database. The middle bar represents sequences that were aligned to the whole genome reference, subdivided into genic and intergenic alignments. The right bar represents unaligned sequences. Histograms showing (B) the number of mismatches between probe sequence and reference and (C) the number of equally top-scoring best hits per probe sequence.

Comparing Re-Annotator to ReMOAT, another re-annotation tool, we found that 86.3% (n = 29,759) of probe sequences annotated as “good” by ReMOAT (quality equal to “Perfect” and “Good”; n = 34,476 sequences) were also classified as “good” probes by Re-Annotator (Fig 4A). However, 5% (n = 1,532) of these 29,759 array probes received different annotations from the two tools. A total of 4,717 probes that were annotated as “good” by ReMOAT were excluded by Re-Annotator due to alignments in intergenic regions (63.7%), multiple different hits in the mRNA reference (26%) or no alignment (10.3%) (Fig 4B).

Fig 4

Comparison between Re-Annotator and ReMOAT on HumanHT-12 v4 probe sequences.

(A) Venn diagram representing the overlap of transcripts annotated with ReMOAT and Re-Annotator. A total of 5% of the probe sequences were annotated with different genes by Re-Annotator and ReMOAT. (B) Bar graph detailing the exclusion reason for probes included by ReMOAT but excluded by Re-Annotator.

Re-Annotator provides probes with less variable intensity than ReMOAT

We computed the coefficient of variation (CV) for each probe across all 36 subjects and compared the mean CV of the probes in the categories InRA, InRM and InRepeat to categories Both and None. Table 2 lists the group sizes (number of probes within a group) of the five categories for all expressed probes along with the estimated CV (mean and standard deviation). InRA has a significantly reduced CV compared to “None” (P = 0.0026, two-sided Wilcoxon test) but not compared to “Both” (P = 0.69, two-sided Wilcoxon test). The reverse holds true for InRM (P = 9.58 x 10^-6 for “Both”; P = 0.85 for “None”, two-sided Wilcoxon test). This suggests that probes retained only by Re-Annotator behaved more like probes retained by both tools than did probes retained only by ReMOAT. Further, our experiment showed that probes in repeat regions displayed an unchanged CV compared to “Both” (P = 0.17, two-sided Wilcoxon test) but a significantly reduced CV compared to “None” (P = 9.82 x 10^-5, two-sided Wilcoxon test).

Table 2

Group size and relative variability for the five probe categories.

Rows three and four compare the variance for probes in categories InRA, InRM and InRepeat to the two reference categories Both and None, respectively, using a two-sided Wilcoxon test.

	Both	None	InRA	InRM	InRepeat
Number of probes	9,236	377	1,208	433	1,176
Mean CV (SD)	0.1709 (0.002)	0.1714 (0.003)	0.1711 (0.002)	0.1716 (0.005)	0.1709 (0.002)
P-value (Wilcoxon test statistic) wrt. None	NA	NA	0.0026 (W = 204,340)	0.85 (W = 82,264)	9.82 x10^-5 (W = 192,160)
P-value (Wilcoxon test statistic) wrt. Both ^a	NA	NA	0.69 (W = 736,370)	9.58x10^-6 (W = 110,040)	0.17 (W = 668,970)

^a For the test we drew a subsample of the “Both” group to match the sample size of the corresponding comparison group.

Re-Annotator provides probes that are more likely to be expressed

Table 3 lists the two 2x2 contingency tables for expression status and annotation by the two tools. Based on these values, probes annotated as “bad” were less likely to be expressed. However, the effect was significantly stronger for Re-Annotator (OR: 3.9; 95% CI: 3.65–4.16) than for ReMOAT (OR: 2.11; 95% CI: 2.01–2.23) (P<0.05; due to lack of overlap in the CIs).

Table 3

Contingency tables between expression status and annotation for Re-Annotator and ReMOAT.

	Re-Annotator		ReMOAT
	Good	Bad	Good	Bad
Expressed	12,321	1,227	11,431	2,117
Not Expressed	22,615	8,775	22,559	8,831
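As a worked check of these figures, the odds ratios can be recomputed from the counts in Table 3. The sketch below uses the sample odds ratio with a Wald 95% confidence interval on the log scale; this choice of estimator is an assumption, since the text does not state which method was used, but it yields values close to those reported (about 3.90 with 3.66 to 4.15 for Re-Annotator and about 2.11 with 2.01 to 2.23 for ReMOAT).

```python
# Illustrative sketch only: the Wald interval is an assumed estimator.
import math


def odds_ratio_ci(good_expr, bad_expr, good_not_expr, bad_not_expr, z=1.96):
    """Odds of being expressed for "good" versus "bad" probes, with a Wald CI."""
    odds_ratio = (good_expr * bad_not_expr) / (bad_expr * good_not_expr)
    se_log = math.sqrt(
        1 / good_expr + 1 / bad_expr + 1 / good_not_expr + 1 / bad_not_expr
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts taken from Table 3.
reannotator = odds_ratio_ci(12321, 1227, 22615, 8775)
remoat = odds_ratio_ci(11431, 2117, 22559, 8831)
print("Re-Annotator: OR %.2f (%.2f to %.2f)" % reannotator)
print("ReMOAT:       OR %.2f (%.2f to %.2f)" % remoat)
```

The lower bound for Re-Annotator lies above the upper bound for ReMOAT, which is the non-overlap argument used in the text.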

Discussion

A precise annotation of microarray probe sequences is essential for accurate biological findings and replicability. In this work, we present a pipeline to re-annotate probe sequences of gene expression microarrays using a custom-built mRNA reference and applied it to three Illumina BeadChip arrays (HumanHT-12 v3, v4 and MouseRef-8 v2). The re-annotation revealed that indeed one quarter of the array probes were incompletely or incorrectly annotated by the manufacturer. Such mis-annotation may be due to changes in genome assembly or changes in exon/intron boundaries since the original design of the chip. Over 21% of re-annotated probes were assigned to genes different from those given by the manufacturer. For example, three of the five Illumina HumanHT-12 v4 array probe sequences illustrated in Fig 5 were all re-annotated with perfect matches within the first or second exon of the human gene ISCA1 on chromosome 9 using the Re-Annotator. Originally these probes were annotated on chromosome 5 within an intergenic region (Fig 5). The reason for this discrepancy is that the probe sequences were designed using an older assembly version (hg18). In this release, the region on chromosome 5 was annotated with the gene ISCA1L. The three probes, however, also have a perfect match on chromosome 9 in the ISCA1 gene. In the new release (hg19), the ISCA1L gene was removed, i.e., the region on chromosome 5 is without annotation, and therefore Re-Annotator selected the region on chromosome 9 in the ISCA1 gene. Hence, it is important to keep the annotation tables of the probes up to date. ReMOAT, based on a genomic alignment, placed these probes in accordance with the Re-Annotator annotation (Fig 5). We recommend checking all given probe sequence annotations (second matches as well as other given genomic matches) also when using the ReMOAT annotation, as the given genomic location might be incorrect. Such an example is illustrated in Fig 6; the probe sequence was allocated to an intergenic region. We annotated this probe sequence to be on chromosome 17 within an exon of ABCA9, which was in accordance with the second match of the ReMOAT annotation.

Fig 5

UCSC Genome Browser graphic for the human ISCA1 gene.

The gene is located on chromosome 9; the targeting Illumina probes are ILMN_1715931, ILMN_1672024, ILMN_2219556, ILMN_1675421 and ILMN_2341711. Custom tracks represent the probe sequences annotated by (A) the Re-Annotator, (B) manufacturer and (C) ReMOAT.

Fig 6

UCSC Genome Browser graphic for the human ABCA9 gene.

The gene is located on chromosome 17; the targeting Illumina probe is ILMN_1651396. Custom tracks represent the probe sequences annotated by (A) the Re-Annotator and (B) manufacturer and ReMOAT.

Furthermore, the Re-Annotator conducts no filtering based on RepeatMasker, as recommended by Barbosa-Morais et al. [3]. We found many regions marked by this algorithm to be perfectly mappable, and filtering may eliminate data on important genes. An example is a probe located within the FKBP5 gene (ILMN_1778444). When applying repeat masking, this probe is marked as unreliable since it is located within a short interspersed nuclear element (SINE). Still, there are no issues of mappability or uniqueness; thus, the probe should not be excluded from further analysis. Approximately 74% of all human probes present on the latest Illumina gene expression array (HumanHT-12 v4) were uniquely allocated to one gene locus. Such a re-annotation is important for removing uninformative probes, such as probes that cannot be placed into a distinct region, before starting differential gene expression analysis. This increases the specificity of an analysis and will decrease the false discovery rate. With our pipeline we closed these gaps and compensated for wrong annotations. A thorough re-annotation of probe sequences is not a standard part of gene expression microarray analysis. To highlight its profound effect, we applied our pipeline to the Illumina HumanHT-12 v4 BeadChip and compared the result to the Illumina annotation as well as to the ReMOAT annotation. We discovered that the HumanHT-12 v4 re-annotation differs substantially from the annotations provided by Illumina and ReMOAT (by 25% and 16%, respectively). Our pipeline improves the probe annotation and proves to be an essential step in producing high-quality microarray results.
The Re-Annotator pipeline is freely available at http://sourceforge.net/projects/reannotator along with re-annotated files for Illumina microarrays HumanHT-12 v3/v4 and MouseRef-8 v2.

13 in total

1. The human genome browser at UCSC.

Authors: W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal: Genome Res Date: 2002-06 Impact factor: 9.043

2. The Sequence Alignment/Map format and SAMtools.

Authors: Heng Li; Bob Handsaker; Alec Wysoker; Tim Fennell; Jue Ruan; Nils Homer; Gabor Marth; Goncalo Abecasis; Richard Durbin
Journal: Bioinformatics Date: 2009-06-08 Impact factor: 6.937

3. Analyzing illumina gene expression microarray data from different tissues: methodological aspects of data analysis in the metaxpress consortium.

Authors: Claudia Schurmann; Katharina Heim; Arne Schillert; Stefan Blankenberg; Maren Carstensen; Marcus Dörr; Karlhans Endlich; Stephan B Felix; Christian Gieger; Harald Grallert; Christian Herder; Wolfgang Hoffmann; Georg Homuth; Thomas Illig; Jochen Kruppa; Thomas Meitinger; Christian Müller; Matthias Nauck; Annette Peters; Rainer Rettig; Michael Roden; Konstantin Strauch; Uwe Völker; Henry Völzke; Simone Wahl; Henri Wallaschofski; Philipp S Wild; Tanja Zeller; Alexander Teumer; Holger Prokisch; Andreas Ziegler
Journal: PLoS One Date: 2012-12-07 Impact factor: 3.240

4. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score.

Authors: Hayan Lee; Michael C Schatz
Journal: Bioinformatics Date: 2012-06-04 Impact factor: 6.937

5. Fast computation and applications of genome mappability.

Authors: Thomas Derrien; Jordi Estellé; Santiago Marco Sola; David G Knowles; Emanuele Raineri; Roderic Guigó; Paolo Ribeca
Journal: PLoS One Date: 2012-01-19 Impact factor: 3.240

6. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

7. A re-annotation pipeline for Illumina BeadArrays: improving the interpretation of gene expression data.

Authors: Nuno L Barbosa-Morais; Mark J Dunning; Shamith A Samarajiwa; Jeremy F J Darot; Matthew E Ritchie; Andy G Lynch; Simon Tavaré
Journal: Nucleic Acids Res Date: 2009-11-18 Impact factor: 16.971

8. An anatomically comprehensive atlas of the adult human brain transcriptome.

Authors: Michael J Hawrylycz; Ed S Lein; Angela L Guillozet-Bongaarts; Elaine H Shen; Lydia Ng; Jeremy A Miller; Louie N van de Lagemaat; Kimberly A Smith; Amanda Ebbert; Zackery L Riley; Chris Abajian; Christian F Beckmann; Amy Bernard; Darren Bertagnolli; Andrew F Boe; Preston M Cartagena; M Mallar Chakravarty; Mike Chapin; Jimmy Chong; Rachel A Dalley; Barry David Daly; Chinh Dang; Suvro Datta; Nick Dee; Tim A Dolbeare; Vance Faber; David Feng; David R Fowler; Jeff Goldy; Benjamin W Gregor; Zeb Haradon; David R Haynor; John G Hohmann; Steve Horvath; Robert E Howard; Andreas Jeromin; Jayson M Jochim; Marty Kinnunen; Christopher Lau; Evan T Lazarz; Changkyu Lee; Tracy A Lemon; Ling Li; Yang Li; John A Morris; Caroline C Overly; Patrick D Parker; Sheana E Parry; Melissa Reding; Joshua J Royall; Jay Schulkin; Pedro Adolfo Sequeira; Clifford R Slaughterbeck; Simon C Smith; Andy J Sodt; Susan M Sunkin; Beryl E Swanson; Marquis P Vawter; Derric Williams; Paul Wohnoutka; H Ronald Zielke; Daniel H Geschwind; Patrick R Hof; Stephen M Smith; Christof Koch; Seth G N Grant; Allan R Jones
Journal: Nature Date: 2012-09-20 Impact factor: 49.962

9. An integrated map of genetic variation from 1,092 human genomes.

Authors: Goncalo R Abecasis; Adam Auton; Lisa D Brooks; Mark A DePristo; Richard M Durbin; Robert E Handsaker; Hyun Min Kang; Gabor T Marth; Gil A McVean
Journal: Nature Date: 2012-11-01 Impact factor: 49.962

10. Model-based variance-stabilizing transformation for Illumina microarray data.

Authors: Simon M Lin; Pan Du; Wolfgang Huber; Warren A Kibbe
Journal: Nucleic Acids Res Date: 2008-01-04 Impact factor: 16.971
