Ram Vinay Pandey, Susanne U Franssen, Andreas Futschik, Christian Schlötterer.
Abstract
Estimating differences in gene expression among alleles is of high interest for many areas in biology and medicine. Here, we present a user-friendly software tool, Allim, to estimate allele-specific gene expression. Because mapping bias is a major problem for reliable estimates of allele-specific gene expression using RNA-seq, Allim combines two different strategies to account for it. To reduce the mapping bias, Allim first generates a polymorphism-aware reference genome that accounts for the sequence variation between the alleles. Then, a sequence-specific simulation tool estimates the residual mapping bias. Statistical tests for allelic imbalance are provided that can be used with the bias-corrected RNA-seq data.
Year: 2013 PMID: 23615333 PMCID: PMC3739924 DOI: 10.1111/1755-0998.12110
Source DB: PubMed Journal: Mol Ecol Resour ISSN: 1755-098X Impact factor: 7.090
Fig. 1 Flowchart of the Allim pipeline. The five modules of the Allim pipeline are (1) identification of fixed SNPs and creation of two parental genomes, (2) computer simulation of RNA-seq reads from both parental genotypes, (3) estimation of the remaining mapping bias with simulated data, (4) estimation of allele-specific expression in F1 and (5) statistical test of significant allelic imbalance. All five modules can be run with a single command, and all input parameters are specified in a single configuration file. Either of two configuration files can be used: ‘AllimOptions_2Pexpr’ is used when parental genomes have to be generated from parental expression or parental genomic short-read data; ‘AllimOptions_2Pgenomes’ is used when two parental genomes are already available.
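The second half of Module 1 (creating two parental genomes from fixed SNPs) can be illustrated with a minimal sketch. It is not Allim's code: the function name `build_parental_sequences`, the toy reference sequence and the SNP dictionary are all hypothetical, and it assumes the fixed SNPs are already known as 0-based positions with one allele per parent.

```python
def build_parental_sequences(reference, fixed_snps):
    """Return two parent-specific copies of a reference sequence.

    fixed_snps maps a 0-based position to (allele_parent1, allele_parent2).
    Both the data layout and the function name are illustrative assumptions.
    """
    parent1 = list(reference)
    parent2 = list(reference)
    for position, (allele1, allele2) in fixed_snps.items():
        parent1[position] = allele1  # parent 1 carries its fixed allele
        parent2[position] = allele2  # parent 2 carries the alternative allele
    return "".join(parent1), "".join(parent2)


# Toy example: a 10 bp reference with two fixed differences between the parents.
reference = "ACGTACGTAC"
fixed_snps = {2: ("G", "T"), 7: ("C", "T")}
parent1, parent2 = build_parental_sequences(reference, fixed_snps)
print(parent1)
print(parent2)
```

Mapping F1 reads against both sequences, rather than against a single reference, is what lets reads from either allele align without a mismatch penalty at the fixed SNP positions.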
Improvement of mapping success via genome modification (SNP inclusion). The performance of Allim was validated with experimental as well as simulated RNA-seq reads. The experimental data consisted of paired-end RNA-seq reads from males and females of two different isofemale lines (ps88 and ps94) of Drosophila pseudoobscura (Table 2). (An ‘isofemale line’ is established by a single female, typically caught and inseminated in the wild. Due to inbreeding over multiple generations, genetic heterozygosity in the line is reduced.) For the experiment, male and female flies from both lines were pooled and sequenced. Via Module 1, fixed SNPs between the two parental lines were identified and used to create two parent-specific genomes. The simulation of reads was based on these two parental genomes (see Methods), which are later used as a reference to map F1 offspring RNA-seq reads.
| RNA-Seq data | Total single reads, ps88 | Total single reads, ps94 | Mapped before SNP adjustment, ps88 (%) | Mapped before SNP adjustment, ps94 (%) | Mapped after SNP adjustment, ps88 (%) | Mapped after SNP adjustment, ps94 (%) | Improvement, ps88 (% of total reads) | Improvement, ps94 (% of total reads) |
|---|---|---|---|---|---|---|---|---|
| Female data | 79 981 000 | 79 998 000 | 91.18 | 92.15 | 91.49 | 92.22 | 0.31 | 0.07 |
| Male data | 76 877 000 | 79 207 000 | 85.97 | 86.73 | 86.19 | 87.05 | 0.22 | 0.32 |
| Simulated data | 122 682 000 | 122 682 000 | 90.94 | 90.96 | 91.05 | 91.03 | 0.11 | 0.07 |
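The improvement columns are the difference, in percentage points, between the mapped percentage after and before SNP adjustment. The short sketch below recomputes them from the table values; the dictionary layout and the function name `improvement` are illustrative choices, not part of Allim.

```python
# Mapped single reads (%) as (before, after) SNP adjustment, copied from the table.
mapped_percent = {
    "Female data": {"ps88": (91.18, 91.49), "ps94": (92.15, 92.22)},
    "Male data": {"ps88": (85.97, 86.19), "ps94": (86.73, 87.05)},
    "Simulated data": {"ps88": (90.94, 91.05), "ps94": (90.96, 91.03)},
}


def improvement(before, after):
    """Improvement in percentage points, rounded to the table's precision."""
    return round(after - before, 2)


for data_set, lines in mapped_percent.items():
    for line, (before, after) in lines.items():
        print(f"{data_set}, {line}: {improvement(before, after):.2f}")
```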
Fig. 2 Distribution of gene counts with percent mapping bias. In Drosophila pseudoobscura, approximately 96% of all genes (5820) show a residual mapping bias before SNP adjustment (blue bar), whereas only 11% of all genes (686) show a residual mapping bias after SNP adjustment (red bar). Biased genes show mapping biases of various strengths. In both cases, the majority of the biased genes (96% before and ∼69%, i.e. 472 genes, after the SNP adjustment) show only a weak residual mapping bias of ≤5%. The reduction in genes with mapping bias before and after SNP adjustment is significant (Fisher's exact test; P-value = 1e-06).
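One way a residual per-gene mapping bias could be folded into a test for allelic imbalance is to replace the usual 50:50 null proportion with the proportion observed in the simulated data. The sketch below uses an exact two-sided binomial test for this purpose. This is a swapped-in, simplified technique and not necessarily the test implemented in Allim; all function names and read counts are hypothetical, and it assumes equal numbers of reads were simulated from both parents.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability of k successes in n trials (0 < p < 1), via logs."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)


def two_sided_binomial_p(k, n, p_null):
    """Exact two-sided p-value: total probability of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p_null)
    all_pmfs = (binom_pmf(i, n, p_null) for i in range(n + 1))
    total = sum(pmf for pmf in all_pmfs if pmf <= observed * (1 + 1e-7))
    return min(1.0, total)


def null_proportion_from_simulation(sim_reads_parent1, sim_reads_parent2):
    """Share of simulated reads assigned to parent 1 (0.5 means no residual bias)."""
    return sim_reads_parent1 / (sim_reads_parent1 + sim_reads_parent2)


# Hypothetical gene: simulated reads show a slight bias towards parent 1.
p_null = null_proportion_from_simulation(5200, 4800)

# Hypothetical F1 counts for the same gene: 130 of 200 reads carry the parent 1 allele.
p_naive = two_sided_binomial_p(130, 200, 0.5)
p_corrected = two_sided_binomial_p(130, 200, p_null)
print(p_naive, p_corrected)
```

In this toy case the bias-adjusted null makes the test more conservative, because part of the apparent excess of parent 1 reads is attributed to mapping rather than to expression.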
Number of paired-end RNA-seq reads of Drosophila pseudoobscura used for Allim validation. The data were generated on an Illumina GA IIx sequencer. The Drosophila pseudoobscura isofemale lines ps94 (stock number 14011-0121.94) and ps88 (stock number 14011-0121.88) were obtained from the UC San Diego Drosophila Stock Center. Flies were reared on standard cornmeal-molasses-yeast-agar medium and maintained at 19 °C under constant dark conditions. For each line, virgin females and virgin males were collected from 15 to 20 replicate vials, pooled and allowed to age for 3–7 days before shock-freezing in liquid nitrogen (Palmieri et al. 2012).
| Samples | Read pairs (in millions) | Insert size (bp) | Read length (bp) |
|---|---|---|---|
| ps88 males | 79.21 | 78 | 100 |
| ps88 females | 80.00 | 78 | 100 |
| ps94 males | 79.21 | 128 | 100 |
| ps94 females | 80.00 | 68 | 100 |
RNA-Seq data sets used to test the accuracy of Allim in identifying the parental origin of a read. The experimental data consisted of paired-end RNA-seq reads from males and females of two different isofemale lines of Drosophila pseudoobscura (Table 2). Experimental reads from line ps88 were more often correctly identified by the pipeline. The slight discrepancy between the two strains reflects the fact that ps94 is derived from the strain that was used to generate the D. pseudoobscura reference genome.
| Data set | No. of correctly identified reads, ps88 (%) | No. of correctly identified reads, ps94 (%) |
|---|---|---|
| Pooled reads from females of both lines | 99.97 | 98.96 |
| Pooled reads from males of both lines | 99.96 | 98.60 |
| Simulated reads for both parental genomes | 99.99 | 99.99 |
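The percentages above express how often a read of known origin was attributed to the correct parent. Allim makes this attribution by mapping reads against the two parental genomes. The sketch below swaps in a much simpler stand-in, a vote over fixed SNP alleles covered by the read, purely to illustrate how such an accuracy figure is computed. The function names, the read representation as (sequence, 0-based start) and all data are hypothetical.

```python
def assign_parent(read_seq, read_start, fixed_snps):
    """Assign a read to a parent by counting matching alleles at fixed SNPs.

    fixed_snps maps a 0-based position to (allele_parent1, allele_parent2).
    Reads covering no informative SNP, or with tied votes, are 'ambiguous'.
    """
    votes = [0, 0]
    for offset, base in enumerate(read_seq):
        alleles = fixed_snps.get(read_start + offset)
        if alleles is None:
            continue  # position is not a fixed SNP
        if base == alleles[0]:
            votes[0] += 1
        elif base == alleles[1]:
            votes[1] += 1
    if votes[0] > votes[1]:
        return "parent1"
    if votes[1] > votes[0]:
        return "parent2"
    return "ambiguous"


def percent_correct(assignments, truth):
    """Percentage of reads whose assigned parent matches the known origin."""
    correct = sum(a == t for a, t in zip(assignments, truth))
    return 100.0 * correct / len(truth)


# Toy data: two fixed SNPs and four reads of known origin.
fixed_snps = {2: ("G", "T"), 7: ("C", "T")}
reads = [("ACGTA", 0), ("ACTTA", 0), ("CGCAC", 5), ("ACGTAC", 4)]
truth = ["parent1", "parent2", "parent1", "parent2"]

assignments = [assign_parent(seq, start, fixed_snps) for seq, start in reads]
accuracy = percent_correct(assignments, truth)
print(assignments, accuracy)
```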
Comparison of various features of Allim to other available tools
| Features | AlleleSeq | MMSEQ | Allim |
|---|---|---|---|
| Inference of parental variants | No | Yes | Yes |
| Construction of polymorphism-aware diploid genome | Yes | Yes | Yes |
| Estimation and integration of residual mapping bias | No | No | Yes |
| Statistical test for allelic imbalance | Yes | No | Yes |
| Use replicate information for statistical testing | No | Not applicable | Yes |
| ASE exon wise/per isoform | No | Yes | Yes |
| Mapper used | Bowtie | Bowtie | GSNAP |
| Single command to run whole pipeline | No | No | Yes |
AlleleSeq: Rozowsky et al. (2011). MMSEQ: Turro et al. (2011).