
HyLiTE: accurate and flexible analysis of gene expression in hybrid and allopolyploid species.

Wandrille Duchemin, Pierre-Yves Dupont, Matthew A Campbell, Austen R D Ganley, Murray P Cox.

Abstract

BACKGROUND: Forming a new species through the merger of two or more divergent parent species is increasingly seen as a key phenomenon in the evolution of many biological systems. However, little is known about how expression of parental gene copies (homeologs) responds following genome merger. High throughput RNA sequencing now makes this analysis technically feasible, but tools to determine homeolog expression are still in their infancy.
RESULTS: Here we present HyLiTE - a single-step analysis to obtain tables of homeolog expression in a hybrid or allopolyploid and its parent species directly from raw mRNA sequence files. By implementing on-the-fly detection of diagnostic parental polymorphisms, HyLiTE can perform SNP calling and read classification simultaneously, thus allowing HyLiTE to be run as parallelized code. HyLiTE accommodates any number of parent species, multiple data sources (including genomic DNA reads to improve SNP detection), and implements a statistical framework optimized for genes with low to moderate expression.
CONCLUSIONS: HyLiTE is a flexible and easy-to-use program designed for bench biologists to explore patterns of gene expression following genome merger. HyLiTE offers practical advantages over manual methods and existing programs, has been designed to accommodate a wide range of genome merger systems, can identify SNPs that arose following genome merger, and offers accurate performance on non-model organisms.

Year:  2015        PMID: 25592117      PMCID: PMC4300824          DOI: 10.1186/s12859-014-0433-8

Source DB:  PubMed          Journal:  BMC Bioinformatics        ISSN: 1471-2105            Impact factor:   3.169


Background

While evolution is usually a gradual process, the creation of a new species through the merger of different parent species occurs nearly instantaneously [1]. Although genome merger is increasingly recognized as an important process in the evolution of many biological systems [2-5], how different gene copies (homeologs) are expressed following merger remains a major outstanding question [6,7]. Most studies have been restricted to observing just a few genes, thus limiting the ability to study interactions between competing gene regulation systems [8]. High throughput mRNA sequencing now permits whole-genome screening of hybrid and allopolyploid gene expression [9,10]. However, identifying the parental origin of mRNA reads remains challenging, especially for researchers without advanced bioinformatics skills [11]. To fill this gap, we have developed HyLiTE (Hybrid Lineage Transcriptome Explorer) to produce tables of homeolog expression data from raw mRNA read files in a single step. HyLiTE automatically i) maps reads to a reference genome, ii) masks gene regions with low read coverage, iii) identifies polymorphisms that are diagnostic of parental lineages, iv) classifies reads to parental types, and v) produces detailed summary reports of gene expression in both the hybrid or allopolyploid and its parent species. The final product, tables of homeolog read counts, can be used immediately for downstream analyses (such as determining differential expression between biological conditions, and between the new species and its parents).

Implementation

The primary design directives behind HyLiTE were i) ease of use, ii) runtime efficiency, and iii) use with non-model organisms (which encompasses most hybrid and allopolyploid species). Other key features include:
Accommodating any number of parent species (for instance, three-parent allopolyploids such as modern hexaploid wheat) [12].
The ability to study systems with either haploid or diploid parents, thus allowing hybrids or allopolyploids with different homeolog and allelic copies.
Using gene references from any species closely related to the study system (hybrid and allopolyploid species often lack good genome resources).
Accommodating any number of biological replicates (and boosting SNP identification by combining information across replicates).
Identifying new polymorphisms that have arisen within the hybrid or allopolyploid (especially important in species derived from older merger events).
Improving SNP calling by using (optional) genomic DNA information in addition to high throughput mRNA sequences.
Providing statistical validation of SNP calls and automatically masking ‘polymorphisms’ with low statistical support.
An experimental feature that identifies putative chimeric genes (i.e., genes in which the homeologs have recombined within the hybrid or allopolyploid) [13], but see Additional file 1 for details on current limits of accuracy.

The standard HyLiTE analysis, which will be adequate for most users, comprises a single, short command line. However, advanced users have complete flexibility to override individual steps. For instance, by default, Bowtie2 is used for read mapping, but HyLiTE can be run with any mapping software that returns the standard SAM mapping format. Because HyLiTE analyzes each gene independently, the software has low RAM requirements and runtime scales linearly with the number of genes under study.
This independence between genes also allows HyLiTE to be parallelized via optional executables (see project website for details; http://hylite.sourceforge.net). HyLiTE regularly autosaves the run state, and analyses can therefore be stopped and re-started from the last checkpoint. Extensive documentation about the algorithms implemented in HyLiTE, software validation and benchmarking against alternative pipelines is provided in Additional file 1.
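
As an illustration of why per-gene independence is convenient, the following minimal Python sketch groups mapped reads from SAM-formatted lines by the reference gene they align to. It is not HyLiTE code: the function name `reads_by_gene` and the example records are hypothetical, and only the standard SAM conventions are assumed (header lines begin with `@`, fields are tab-separated, and FLAG bit 0x4 marks an unmapped read).

```python
from collections import defaultdict

UNMAPPED = 0x4  # SAM FLAG bit meaning "read is unmapped"


def reads_by_gene(sam_lines):
    """Group mapped read names by the reference (gene) they align to.

    Hypothetical helper for illustration only; not part of HyLiTE.
    """
    genes = defaultdict(list)
    for line in sam_lines:
        # Skip blank lines and SAM header lines (which start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, rname = fields[0], int(fields[1]), fields[2]
        # Skip reads flagged as unmapped or lacking a reference name.
        if flag & UNMAPPED or rname == "*":
            continue
        genes[rname].append(qname)
    return dict(genes)


# Made-up SAM records: two header lines, three mapped reads, one unmapped read.
example_sam = [
    "@SQ\tSN:geneA\tLN:1200",
    "@SQ\tSN:geneB\tLN:900",
    "r1\t0\tgeneA\t10\t42\t50M\t*\t0\t0\t*\t*",
    "r2\t16\tgeneB\t55\t42\t50M\t*\t0\t0\t*\t*",
    "r3\t0\tgeneA\t300\t42\t50M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]

per_gene = reads_by_gene(example_sam)
for gene, reads in sorted(per_gene.items()):
    print(gene, len(reads))
```

Once reads are grouped in this way, each gene can in principle be handed to a separate worker, since nothing computed for one gene depends on another.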

Results and discussion

Output

The main output of HyLiTE comprises a list of read counts for each homeolog in each biological replicate. Using presence and absence of diagnostic parental SNPs, reads are classified as i) derived from a given parent, ii) consistent with two or more parents (i.e., lacking diagnostic SNPs), or iii) unknown (i.e., masked due to low read coverage). The last two classes are equally uninformative for determining homeolog expression, but can distinguish whether improvements may be possible with additional sequence data (the ‘unknown’ category) or whether part of the gene is simply uninformative for ancestry (no diagnostic parental SNPs identified). Finally, each read is marked with an additional flag if one or more new SNPs are detected within the hybrid or allopolyploid.
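
The three read classes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the HyLiTE implementation: the function `classify_read`, its argument layout, and the example positions are hypothetical, and a read whose bases match no parent at a diagnostic site is simply reported as "unknown" here (new SNPs are not modelled).

```python
def classify_read(read_bases, diagnostic, masked, parents):
    """Classify one read as a parent name, 'uninformative', or 'unknown'.

    read_bases: {position: base} for the positions the read covers.
    diagnostic: {position: {parent: allele}} for diagnostic parental SNPs.
    masked:     set of positions masked for low coverage.
    parents:    iterable of parent names.
    All names are hypothetical; this is a sketch, not HyLiTE code.
    """
    # Ignore positions that were masked for low read coverage.
    usable = {pos: base for pos, base in read_bases.items() if pos not in masked}
    if not usable:
        return "unknown"

    # Start with every parent and discard those contradicted by the read.
    candidates = set(parents)
    for pos, base in usable.items():
        alleles = diagnostic.get(pos)
        if alleles is None:
            continue  # not a diagnostic site
        candidates &= {p for p in parents if alleles[p] == base}

    if len(candidates) == 1:
        return next(iter(candidates))
    if len(candidates) > 1:
        return "uninformative"  # consistent with two or more parents
    return "unknown"  # simplification: conflicting evidence is not resolved


parents = ("P1", "P2")
diagnostic = {120: {"P1": "A", "P2": "G"}}
masked = {300, 301}

print(classify_read({118: "C", 120: "A"}, diagnostic, masked, parents))
print(classify_read({200: "T"}, diagnostic, masked, parents))
print(classify_read({300: "C", 301: "G"}, diagnostic, masked, parents))
```

With three or more parents, the same logic returns "uninformative" whenever a read is compatible with at least two of them.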

Software comparison

A major point of difference between HyLiTE and alternative approaches (e.g., PolyCat [14]) is its robust statistical assessment of SNP calls and automatic masking of ‘polymorphisms’ with low statistical support. Due to the substantial error rate of high throughput sequencing technologies, sequencing errors can easily be confused with genuine polymorphisms in genes with low expression (and hence, low read coverage). The probability that an apparent polymorphism at any given nucleotide position is a SNP rather than an error is given by a binomial distribution conditioned on the coverage level. Nucleotides whose coverage is too low to provide the required statistical support are masked, but because coverage varies widely across even a single gene, typically only small, uninformative regions of any given gene are masked. This ‘dynamic masking’ substantially improves the accuracy with which reads are assigned to homeologs for genes with low to moderate expression. Detection of expression levels can be improved further by including genomic DNA reads, because these improve the accuracy of SNP calling (see Additional file 1 for details).
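
One way to picture a coverage-conditioned binomial test is sketched below. It is an assumption-laden illustration rather than the test HyLiTE implements (see Additional file 1 for that): the per-base error rate `e = 0.01`, the significance cutoff `alpha = 1e-3`, and the function names are all hypothetical. The idea is that, at a site covered by n reads, the chance of seeing k or more variant bases through error alone is the upper tail of a Binomial(n, e) distribution; a call is accepted only when that tail is small.

```python
from math import comb


def error_tail(k, n, e):
    """P(X >= k) for X ~ Binomial(n, e).

    The chance that k or more of n reads show the variant base
    purely through sequencing error at per-base error rate e.
    """
    return sum(comb(n, i) * e**i * (1 - e) ** (n - i) for i in range(k, n + 1))


def supported(k, n, e=0.01, alpha=1e-3):
    """True if k variant reads out of n are unlikely to be errors.

    e and alpha are illustrative assumptions, not HyLiTE defaults.
    """
    return n > 0 and error_tail(k, n, e) < alpha


# One variant read among three could easily be an error; two are harder to explain.
print(supported(1, 3), supported(2, 3))
# At higher coverage, two variant reads are no longer enough under these assumptions.
print(supported(2, 20), supported(10, 20))
```

Under such a scheme, positions where even the best-supported variant fails the test would be masked rather than used to classify reads.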

Worked examples

Fungi. Species in the fungal genera Epichloë and Neotyphodium, already well known for their symbiotic relationships with grasses in temperate pastoral systems, are increasingly becoming the dominant model system for studying genome merger in fungi [9,15,16]. The best-studied example is Lp1, an economically important allodiploid asexual species formed from the union of a haploid sexual species, E. typhina, and a haploid asexual species, N. lolii (∼5% divergence). As HyLiTE had not yet been developed, the Cox et al. study of this system [9] instead applied a two-reference approach: gene references were generated separately for E. typhina and N. lolii using ancestry informative SNPs, and homeolog expression was then ascertained via high stringency mapping. Although estimates of gene expression from the two methods are highly correlated (r = 0.83, P ≪ 0.0001), HyLiTE assigns an average of five times as many reads to homeologs as the two-reference approach, an improvement almost entirely due to reduced gene masking (Figure 1A). Overall, 86% of reads are assigned to homeologs, with the remainder classified as parental uninformative or unknown. PolyCat [14] assigned fewer reads to homeologs (Figure 1B), particularly for genes with low to moderate expression (see Additional file 1 for details).
Figure 1

Comparison between HyLiTE and A) the results of the Cox study [9] and B) PolyCat [14] for fungal data. The black lines indicate the total number of reads that map to each gene, ranked by expression level. Green points indicate the number of reads assigned to homeologs by HyLiTE. Red points in A) indicate the number of reads assigned to homeologs in the Cox et al. study, while blue points in B) indicate the number of reads assigned to homeologs by PolyCat. The substantial improvement in read assignment by HyLiTE was obtained using its default settings.

Plants. To show application to a plant system, we also analyzed gene expression in a natural cotton allotetraploid, Gossypium hirsutum, together with diploid representatives of the A (G. arboreum) and D (G. raimondii) genomes (∼3% divergence) [10]. Assignment accuracy was tested by classifying known reads from the two diploid species. HyLiTE assigned reads to homeologs with a very low error rate (1.6%; see Additional file 1 for details). It also identified 46,206 new SNPs specific to G. hirsutum.

Animals. Finally, we analyzed gene expression in a synthetic allotetraploid fish derived from diploid goldfish (Carassius auratus) and diploid common carp (Cyprinus carpio) (∼6% divergence) (NCBI BioProject accession number: PRJNA82763). The very small number of reads available per gene (an average of only 15) caused HyLiTE to reject most SNP calls and therefore classify the majority of reads as parentally uninformative. However, the reads for which sufficient information was available to assign parental ancestry showed a very low error rate (0.22%).
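
The error rates quoted for the cotton and fish data can be read as the fraction of confidently assigned reads that were placed on the wrong parent when the true origin is known. The sketch below shows that calculation on made-up data; the function `assignment_error_rate`, the read names, and the labels are hypothetical, and the exact definition used for the published figures is given in Additional file 1, not here.

```python
def assignment_error_rate(truth, calls, parents):
    """Fraction of parent-assigned reads that were given the wrong parent.

    truth:   {read_name: true parent}, known because the reads come
             from a sequenced parent species.
    calls:   {read_name: label}, where the label is a parent name or
             another class such as 'uninformative' or 'unknown'.
    parents: collection of parent names.
    Reads not assigned to any parent are excluded from the denominator.
    Returns None when no read was assigned to a parent.
    """
    assigned = [(truth[read], label) for read, label in calls.items() if label in parents]
    if not assigned:
        return None
    wrong = sum(1 for true_parent, label in assigned if true_parent != label)
    return wrong / len(assigned)


# Made-up example: five reads of known origin from parents "A" and "D".
truth = {"r1": "A", "r2": "A", "r3": "D", "r4": "D", "r5": "D"}
calls = {"r1": "A", "r2": "D", "r3": "D", "r4": "uninformative", "r5": "D"}

rate = assignment_error_rate(truth, calls, {"A", "D"})
print(rate)
```

Excluding uninformative reads from the denominator matches the way the fish result is phrased, where most reads could not be assigned but those that were showed few errors.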

Conclusions

The formation of a new species from the merger of two or more different parent species is important in the evolutionary history of many eukaryotic lineages. Hybrid and allopolyploid species carry multiple copies of each gene (homeologs), and while homeolog expression levels can be determined from high throughput RNA sequence data, assigning reads is extremely challenging. Here, we have developed HyLiTE to automate the process of moving from raw mRNA sequence files to tables of homeolog expression in a hybrid or allopolyploid and its parent species. This single-step analysis is specifically designed for ease-of-use, particularly for non-computational scientists. HyLiTE therefore allows gene expression patterns to be explored on a whole-genome scale even for species with very complex patterns of genome merger.

Availability and requirements

Project name: HyLiTE
Project home page: http://hylite.sourceforge.net
Operating systems: Linux, OS X, Windows
Programming language: Python
Other requirements: None
License: GNU GPL v. 3.0
Any restrictions to use by non-academics: None

References

Review 1.  Genome evolution in polyploids.

Authors:  J F Wendel
Journal:  Plant Mol Biol       Date:  2000-01       Impact factor: 4.076

Review 2.  Novel patterns of gene expression in polyploid plants.

Authors:  Keith L Adams; Jonathan F Wendel
Journal:  Trends Genet       Date:  2005-10       Impact factor: 11.639

Review 3.  Consequences of genome duplication.

Authors:  Marie Sémon; Kenneth H Wolfe
Journal:  Curr Opin Genet Dev       Date:  2007-11-19       Impact factor: 5.578

Review 4.  Evolutionary genetics of genome merger and doubling in plants.

Authors:  Jeff J Doyle; Lex E Flagel; Andrew H Paterson; Ryan A Rapp; Douglas E Soltis; Pamela S Soltis; Jonathan F Wendel
Journal:  Annu Rev Genet       Date:  2008       Impact factor: 16.830

Review 5.  The role of hybridization in plant speciation.

Authors:  Pamela S Soltis; Douglas E Soltis
Journal:  Annu Rev Plant Biol       Date:  2009       Impact factor: 26.379

Review 6.  Homoeologous recombination in allopolyploids: the polyploid ratchet.

Authors:  Robert T Gaeta; J Chris Pires
Journal:  New Phytol       Date:  2009-12-01       Impact factor: 10.151

7.  Preferential subfunctionalization of slow-evolving genes after allopolyploidization in Xenopus laevis.

Authors:  Marie Sémon; Kenneth H Wolfe
Journal:  Proc Natl Acad Sci U S A       Date:  2008-06-09       Impact factor: 11.205

8.  Origin of a fungal symbiont of perennial ryegrass by interspecific hybridization of a mutualist with the ryegrass choke pathogen, Epichloë typhina.

Authors:  C L Schardl; A Leuchtmann; H F Tsai; M A Collett; D M Watt; D B Scott
Journal:  Genetics       Date:  1994-04       Impact factor: 4.562

9.  Two rounds of whole genome duplication in the ancestral vertebrate.

Authors:  Paramvir Dehal; Jeffrey L Boore
Journal:  PLoS Biol       Date:  2005-09-06       Impact factor: 8.029

10.  An interspecific fungal hybrid reveals cross-kingdom rules for allopolyploid gene expression patterns.

Authors:  Murray P Cox; Ting Dong; Genggeng Shen; Yogesh Dalvi; D Barry Scott; Austen R D Ganley
Journal:  PLoS Genet       Date:  2014-03-06       Impact factor: 5.917
