Richard J White, Eirinn Mackay, Stephen W Wilson, Elisabeth M Busch-Nentwich.
Abstract
In model organisms, RNA-sequencing (RNA-seq) is frequently used to assess the effect of genetic mutations on cellular and developmental processes. Typically, animals heterozygous for a mutation are crossed to produce offspring with different genotypes. Resultant embryos are grouped by genotype to compare homozygous mutant embryos to heterozygous and wild-type siblings. Genes that are differentially expressed between the groups are assumed to reveal insights into the pathways affected by the mutation. Here we show that in zebrafish, differentially expressed genes are often over-represented on the same chromosome as the mutation due to different levels of expression of alleles from different genetic backgrounds. Using an incross of haplotype-resolved wild-type fish, we found evidence of widespread allele-specific expression, which appears as differential expression when comparing embryos homozygous for a region of the genome to their siblings. When analysing mutant transcriptomes, this means that the differential expression of genes on the same chromosome as a mutation of interest may not be caused by that mutation. Typically, the genomic location of a differentially expressed gene is not considered when interpreting its importance with respect to the phenotype. This could lead to pathways being erroneously implicated or overlooked due to the noise of spurious differentially expressed genes on the same chromosome as the mutation. These observations have implications for the interpretation of RNA-seq experiments involving outbred animals and non-inbred model organisms.
Keywords: RNA-seq; allele-specific expression; chromosomes; developmental biology; differential expression; gene expression; mutants; zebrafish
Year: 2022 PMID: 35175196 PMCID: PMC8884726 DOI: 10.7554/eLife.72825
Source DB: PubMed Journal: Elife ISSN: 2050-084X Impact factor: 8.140
Figure 1. Linkage disequilibrium (LD) mapping plot of up- and downregulated genes in u426 mutants shows a cluster of such genes local to the mutation site on chromosome 7.
The plots for each of the 25 chromosomes show the allele balance (proportion of reads containing the alternative allele) of each single nucleotide polymorphism (SNP) locus along with its physical position. The blue and orange lines are LOESS-smoothed averages of the data. The green line is the absolute difference of the mutant and sibling samples and is used to identify the region of highest LD. Vertical lines indicate the position of differentially expressed genes.
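The mapping idea in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names, the toy allele-balance values, and the use of a centred moving average in place of LOESS smoothing are all assumptions made for the example.

```python
from statistics import mean


def allele_balance(alt_reads, total_reads):
    """Proportion of reads carrying the alternative allele at one SNP."""
    if total_reads == 0:
        raise ValueError("no coverage at this SNP")
    return alt_reads / total_reads


def moving_average(values, window=5):
    """Centred moving average (a simple stand-in for LOESS, assumed here)."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(mean(values[lo:hi]))
    return smoothed


def highest_ld_position(positions, mutant_balance, sibling_balance, window=5):
    """Return (position, difference) where smoothed |mutant - sibling| peaks."""
    mut = moving_average(mutant_balance, window)
    sib = moving_average(sibling_balance, window)
    diff = [abs(m - s) for m, s in zip(mut, sib)]
    best = max(range(len(diff)), key=diff.__getitem__)
    return positions[best], diff[best]


# Hypothetical SNP positions (bp) and allele balances on one chromosome.
positions = [5_000_000, 10_000_000, 15_000_000, 20_000_000,
             25_000_000, 30_000_000, 35_000_000]
mutant = [0.50, 0.60, 0.80, 1.00, 0.85, 0.60, 0.50]
sibling = [0.50, 0.45, 0.40, 0.33, 0.40, 0.45, 0.50]

peak_pos, peak_diff = highest_ld_position(positions, mutant, sibling, window=3)
print(f"highest LD near {peak_pos / 1e6:.0f} Mbp (difference {peak_diff:.2f})")
```

In the toy data the mutant pool approaches an allele balance of 1.0 and the sibling pool approaches one third around the middle SNP, so the smoothed difference peaks there.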
Summary of logistic regression results for RNA-sequencing (RNA-seq) analysed mutant lines.
Causative mutation shows the gene and location of the mutation site in lines where this has been confirmed empirically, otherwise the location is estimated from linkage disequilibrium (LD) data. Significance column indicates adjusted p-value (***: < 0.001, **: < 0.01; *: < 0.05). Odds ratio compares DE likelihood at maximum LD versus site of median LD. The nearby genes column shows the number of DE genes lying within a 20 Mbp window centred on the mutation site, and the percentage of these genes out of the total DE genes. In-table citations: 1(Barlow et al., 2020), 2(Miesfeld et al., 2015), 3(Armant et al., 2016). nl14 line kindly provided by Alex Nechiporuk.
| Allele | Causative mutation | DE genes/total | Coefficient ± SEM | Sig. | Odds ratio | Nearby genes (%) |
|---|---|---|---|---|---|---|
| | | 12/31,664 | 9.09 ± 1.56 | *** | 118.5 | 3 (25%) |
| | | 157/31,199 | 6.84 ± 0.46 | *** | 55.8 | 23 (15%) |
| | | 71/31,199 | 8.72 ± 0.72 | *** | 44.0 | 13 (18%) |
| | Unpublished (chr23, 22 Mbp) | 33/31,199 | 6.31 ± 2.13 | ** | 7.8 | 1 (3%) |
| | Not known (chr1, ~25 Mbp) | 87/31,664 | 4.83 ± 1.05 | *** | 5.4 | 4 (5%) |
| | Not known (chr7, ~22 Mbp) | 209/31,664 | 2.67 ± 0.48 | *** | 5.3 | 15 (7%) |
| | | 140/31,199 | 2.58 ± 1.57 | – | 2.3 | 4 (3%) |
| | | 348/24,558 | 3.77 ± 1.67 | * | 2.0 | 14 (4%) |
| | Not known (chr13, ~25 Mbp) | 294/31,663 | 0.35 ± 1.04 | – | 1.1 | 4 (1%) |
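Two of the table's columns can be illustrated with a short Python sketch. The "nearby genes" count follows the legend directly (DE genes inside a 20 Mbp window centred on the mutation site, as a share of all DE genes). The odds-ratio helper assumes a standard logistic regression with LD as the single predictor, in which the odds ratio between two predictor values is the exponential of the coefficient times their difference; the source does not spell out the model, so treat that part as an assumption. Function names and gene positions are hypothetical.

```python
import math


def nearby_de_genes(de_genes, mut_chrom, mut_pos_mbp, window_mbp=20.0):
    """Count DE genes within a window centred on the mutation site.

    de_genes: list of (chromosome, position_in_Mbp) tuples for all DE genes.
    Returns (count, percentage of all DE genes).
    """
    if not de_genes:
        return 0, 0.0
    half = window_mbp / 2
    count = sum(
        1
        for chrom, pos in de_genes
        if chrom == mut_chrom and abs(pos - mut_pos_mbp) <= half
    )
    return count, 100.0 * count / len(de_genes)


def odds_ratio(coefficient, ld_high, ld_low):
    """Odds ratio between two LD values under a one-predictor logistic model
    (assumed form: log-odds = intercept + coefficient * LD)."""
    return math.exp(coefficient * (ld_high - ld_low))


# Hypothetical example: 12 DE genes, mutation on chromosome 7 at 22 Mbp.
de_genes = [
    ("7", 15.0), ("7", 22.5), ("7", 31.9), ("7", 60.0),
    ("1", 22.0), ("2", 5.1), ("3", 40.2), ("5", 18.7),
    ("9", 33.3), ("12", 8.8), ("16", 50.0), ("21", 2.4),
]
count, pct = nearby_de_genes(de_genes, "7", 22.0)
print(f"{count} nearby DE genes ({pct:.0f}% of all DE genes)")
```

With three of twelve DE genes inside the window, the sketch reports 25%, the same arithmetic as the first row of the table (3 of 12 DE genes).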
Figure 2. Enrichment of differentially expressed (DE) genes on the mutant chromosome.
(A) Ideogram showing the locations of the DE genes in a mitfa incross. Circles represent DE genes and are coloured red if the gene is upregulated in the mutant embryos and blue if it is downregulated. (B) Distribution of the total number of DE genes in experiments according to whether there is an enrichment on the mutant chromosome (orange) or not (blue), plotted on a log10 scale. (C) Plot of normalised counts according to genotype in an intercross of two different sox10 alleles. Yellow = wild type (+/+), orange = sox10 t3 heterozygotes (t3/+), blue = sox10 baz1 heterozygotes (+/baz1), purple = sox10 t3, baz1 compound heterozygotes (t3/baz1). The schematic below the plot shows the chromosomes contributing to each genotype. Embryos that share the wild-type allele inherited from the baz1/+ parent (yellow chromosome) show higher expression levels.
Position of DE genes in w2 (mitfa) mutant at Prim-5 (24 hr post-fertilisation [hpf]).
Number of DE genes for each experiment and whether the mutant chromosome shows an enrichment of DE genes.
Normalised counts for si:ch73-308m11.1 (ENSDARG00000039752) in sox10 t3/baz1 incross at Prim-5 (24 hr post-fertilisation [hpf]).
Figure 3. Allele-specific expression is common in wild-type embryos.
(A) Experimental design. Two wild-type SAT fish were incrossed and 96 embryos were collected for RNA-sequencing (RNA-seq) at 5 days post-fertilisation (dpf). Depending on the haplotypes of the parents, different combinations of genotype are possible in specific regions in the offspring. (B) The haplotypes of the collected embryos were determined in 1 Mbp bins using the RNA-seq reads and the embryos were grouped according to the haplotypes in specific regions. Chromosome 5 is shown with chromosomal position along the x-axis and samples on the y-axis. 1 Mbp bins are coloured according to the haplotype in that region. Blue = homozygous Tübingen (Tu/Tu), green = heterozygous AB/Tübingen (AB/Tu), orange = homozygous AB (AB/AB), dark grey = not consistent with parental haplotypes (NC), light grey = no haplotype call (NA), due to, for example, low coverage. Examples of regions used to group the embryos are boxed. Red ovals indicate regions containing recombination breakpoints in the samples labelled in (C). (C–D) Examples of differentially expressed genes from two different groupings. (C) Counts for the myhc4 gene, grouped according to the haplotypes in the region 5:31–37 Mbp (region 1 in B). The Tübingen allele is expressed at very low levels, with much higher expression in the heterozygotes. There are two examples of embryos with recombinations within the region. Compare to red ovals in the haplotype plot in (B). (D) Example of a differentially expressed gene (slc4a4a) in a region where all three genotypes are present (5:44–53 Mbp, region 2 in B). As in (C), the Tübingen allele has lower expression, with the heterozygotes showing intermediate levels. (E) Distribution of absolute log2(fold change) values found between wild-type alleles. Differences when comparing homozygous embryos (blue) are generally larger than when comparing heterozygotes to homozygotes (yellow).
Normalised counts for myhc4 (ENSDARG00000035438) in wild-type SAT cross.
Normalised counts for slc4a4a (ENSDARG00000013730) in wild-type SAT cross.
Log2 fold change data for differentially expressed genes in wild-type SAT cross.
Figure 3—figure supplement 1. Allele-specific expression is linked to the region used to define the sample groupings.
(A) Representation of the parental haplotypes of the SAT cross across all 25 chromosomes (blue = Tu/Tu, green = AB/Tu, orange = AB/AB). The black box shows the region (Chr5:44–53 Mb) that was used to define the groups of embryos compared using DESeq2. The red triangles show the positions of the genes that are differentially expressed when using this sample grouping, most of which are in or close to the region. (B) Expanded plot of chromosome 5 from 40 to 60 Mb. The differentially expressed genes are labelled.
Figure 4. Effect of removing differentially expressed (DE) genes linked to the mutation under investigation.
(A) Distribution of the overlap between the Gene Ontology (GO) terms enriched before and after DE genes linked to the mutation are removed. GO term enrichment was done on both the DE gene list and the list with the genes on the same chromosome as the mutation removed (excluding the mutated gene itself). The lists of enriched GO terms were then compared and the Jaccard similarity coefficient (number of GO terms enriched in both sets/total number of enriched GO terms) calculated. Each point represents one experiment. Experiments are split according to whether the chromosome with the mutated gene has an enrichment of DE genes or not. Points are coloured by the number of DE genes identified in the experiment (log10 scale). (B) Plot showing the changes in GO term enrichment for a single experiment (sox10 incross at 36 hr post-fertilisation). Each point is an enriched GO term ranked by p-value (highest ranked terms at the top) and the lines connect the same terms if they are enriched using both gene lists (all genes or genes linked to the mutation removed). Unconnected points are terms that are only enriched for either the ‘all genes’ list (open circles) or the ‘linked genes removed’ list (open squares). (C) Network diagram representation of the same GO enrichments as in (B). Each node represents a GO term, and the nodes are connected by an edge if the genes annotated to the term overlap sufficiently (Cohen’s kappa > 0.4). GO term nodes are coloured by whether they are enriched in both lists (orange) or just one (blue = all genes only, green = linked genes removed only). The shape of the nodes represents the GO term domain of the term (circle = biological process, square = cellular component, hexagon = molecular function).
Jaccard index for each experiment represented in the boxplot.
Enriched GO terms and their position in the list of GO terms by p-value.
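The Jaccard similarity coefficient described in panel (A) is simple to compute. The sketch below is illustrative only: the function name and the placeholder term labels are assumptions, and returning 1.0 when both lists are empty is a convention chosen for this example rather than something stated in the source.

```python
def jaccard(terms_all_genes, terms_linked_removed):
    """Jaccard similarity: terms enriched in both sets / terms enriched in either."""
    a, b = set(terms_all_genes), set(terms_linked_removed)
    union = a | b
    if not union:
        # Convention for this sketch: two empty lists count as identical.
        return 1.0
    return len(a & b) / len(union)


# Placeholder labels standing in for enriched GO terms (hypothetical).
enriched_all = {"term_A", "term_B", "term_C"}
enriched_linked_removed = {"term_B", "term_C", "term_D"}

similarity = jaccard(enriched_all, enriched_linked_removed)
print(similarity)
```

Here two terms are shared out of four enriched in total, giving a coefficient of 0.5. A value of 1.0 would mean that removing the linked genes left the set of enriched terms unchanged.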
Figure 5. Distinguishing mutation-dependent gene expression changes from allele-specific expression (ASE).
(A) Plot of normalised counts consistent with ASE. This shows either reduced expression from the allele on one of the wild-type chromosomes (white chromosome in the diagram under the plot) or increased expression from the allele on the t3 chromosome (red chromosome). Yellow = wild types (+/+), orange = t3 heterozygotes (t3/+), blue = baz1 heterozygotes (+/baz1), purple = compound heterozygotes (t3/baz1). (B) Normalised counts consistent with a response to the sox10 mutations. The compound heterozygotes have reduced expression and the other two groups of heterozygotes are intermediate between the compound heterozygotes and the wild types. (C) Boxplots of the expression of all the differentially expressed (DE) genes on chromosome 3. These are split into two groups, those that are consistent with being downstream of sox10 (sox10-DE) and those that appear to be driven by allele-specific expression unrelated to sox10 (ASE-DE).
Normalised counts for polr3h (ENSDARG00000102590) in sox10 incross at Prim-5 (24 hr post-fertilisation [hpf]).
Normalised counts for vasnb (ENSDARG00000102565) in sox10 incross at Prim-5 (24 hr post-fertilisation [hpf]).
Normalised counts for the genes represented by the boxplots in Figure 5C (sox10 incross at Prim-5/24 hr post-fertilisation [hpf]).
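The reasoning behind panels (A) and (B) can be made concrete with a small Python sketch. It compares the mean normalised counts of the four genotype groups against three expected patterns: expression tracking the number of mutant sox10 alleles, or tracking which chromosome was inherited from one or the other parent. This is a hypothetical heuristic written for illustration, not the procedure the authors used to define the sox10-DE and ASE-DE groups; the pattern names, the correlation-based scoring, and the example values are all assumptions.

```python
import math

# Expected patterns over the genotype groups, in the order
# (+/+, t3/+, +/baz1, t3/baz1).
PATTERNS = {
    # Number of mutant sox10 alleles carried by each group.
    "mutant allele dose": [0, 1, 1, 2],
    # Which chromosome came from the t3/+ parent (0 = wild type, 1 = t3).
    "t3-parent chromosome": [0, 1, 0, 1],
    # Which chromosome came from the baz1/+ parent (0 = wild type, 1 = baz1).
    "baz1-parent chromosome": [0, 0, 1, 1],
}


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den if den else 0.0


def best_pattern(group_means):
    """Return the pattern with the largest absolute correlation, plus all scores."""
    scores = {name: abs(pearson(group_means, p)) for name, p in PATTERNS.items()}
    return max(scores, key=scores.get), scores


# Hypothetical group means of normalised counts.
response_like = [20, 15, 15, 10]   # heterozygotes intermediate, as in (B)
ase_like = [10, 20, 10, 20]        # tracks one parental chromosome, as in (A)

print(best_pattern(response_like)[0])
print(best_pattern(ase_like)[0])
```

A gene whose expression follows one parent's chromosome rather than the mutant allele count is a candidate for ASE; in practice, replicate variability and a proper statistical test would be needed before drawing that conclusion.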
Figure 5—figure supplement 1. Allele-specific expression not linked to the homozygous region.
The expression of this gene according to genotype is not consistent with a response to a recessive mutation. Yellow = wild types (+/+), orange = t3 heterozygotes (t3/+), blue = baz1 heterozygotes (+/baz1), purple = compound heterozygotes (t3/baz1).
| Reagent type (species) or resource | Designation | Source or reference | Identifiers | Additional information |
|---|---|---|---|---|
| Gene (zebrafish) | | Ensembl | ENSDARG00000003732 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000077467 | |
| Gene (zebrafish) | si:ch73-308m11.1 | Ensembl | ENSDARG00000039752 | |
| Gene (zebrafish) | myhc4 | Ensembl | ENSDARG00000035438 | |
| Gene (zebrafish) | slc4a4a | Ensembl | ENSDARG00000013730 | |
| Gene (zebrafish) | polr3h | Ensembl | ENSDARG00000102590 | |
| Gene (zebrafish) | vasnb | Ensembl | ENSDARG00000102565 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000016526 | |
| Gene (zebrafish) | ahsa1a | Ensembl | ENSDARG00000028664 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000110416 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000099371 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000099172 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000002564 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000074752 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000089616 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000044769 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000084991 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000104178 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000103923 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000074231 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000104193 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000060207 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000036625 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000112755 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000088820 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000113960 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000109888 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000111638 | |
| Gene (zebrafish) | | Ensembl | ENSDARG00000093476 | |
| Strain, strain background (zebrafish) | AB | ZIRC | ZDB-GENO-960809-7 | |
| Strain, strain background (zebrafish) | Tübingen | ZIRC | ZDB-GENO-990623-3 | |
| Strain, strain background (zebrafish) | SAT | ZIRC | ZDB-GENO-100413-1 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-190501-298 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-100723-4 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-112 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-17955 | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-040907-2 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-130411-634 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT- | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-491 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT- | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-193 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-11078 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-070315-1 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT- | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-980413-693 | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-432 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-130411-3189 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-130411-4030 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT- | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-051012-8 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT- | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-60 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-190501-603 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-080401-1 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-10984 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120727-213 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-20015 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-980203-1317 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-980203-1438 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-351 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120727-150 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-130411-4850 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-11315 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-117 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18049 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-130411-5422 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-990423-22 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-070730-10 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-980203-1049 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-10423 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-10 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-160721-33 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120727-92 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-12106 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-19656 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18374 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-91 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-135 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-100506-17 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18201 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-12995 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-11675 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18326 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-10694 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18015 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-342 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-200 | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-070131-1 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-980203-1827 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-10436 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-11003 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-001107-2 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-131217-17748 | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120727-140 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-070914-1 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-12704 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-160721-33 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-120411-459 | |
| Genetic reagent (zebrafish) | | PMID: | Allele not cryopreserved | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18689 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-18690 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-060602-2 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-20235 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-200207-2 | |
| Genetic reagent (zebrafish) | | PMID: | ZDB-ALT-161003-11320 | |
| Software, algorithm | HISAT2 | PMID: | RRID: | |
| Software, algorithm | featureCounts | PMID: | | |
| Software, algorithm | DESeq2 | PMID: | | |
| Software, algorithm | BCFTools | PMID: | RRID: | |
| Software, algorithm | statsmodels | | | |
| Software, algorithm | DeTCT | PMID: | | |
| Software, algorithm | BWA | | | |
| Software, algorithm | biobambam | | | |
| Software, algorithm | mpileup | PMID: | | |
| Software, algorithm | QCALL | PMID: | | |
| Software, algorithm | GATK | PMID: | | |
| Software, algorithm | Tophat | PMID: | | |
| Software, algorithm | QoRTs | PMID: | | |