Literature DB >> 27021699

Homoeologs: What Are They and How Do We Infer Them?

Natasha M Glover¹, Henning Redestig², Christophe Dessimoz³.

Abstract

The evolutionary history of nearly all flowering plants includes a polyploidization event. Homologous genes resulting from allopolyploidy are commonly referred to as 'homoeologs', although this term has not always been used precisely or consistently in the literature. With several allopolyploid genome sequencing projects under way, there is a pressing need for computational methods for homoeology inference. Here we review the definition of homoeology in historical and modern contexts and propose a precise and testable definition highlighting the connection between homoeologs and orthologs. In the second part, we survey experimental and computational methods of homoeolog inference, considering the strengths and limitations of each approach. Establishing a precise and evolutionarily meaningful definition of homoeology is essential for understanding the evolutionary consequences of polyploidization.

Entities: Chemical Disease Species

Keywords: homoeology; homology; polyploidy; positional homoeology

Mesh：

Year: 2016 PMID： 27021699 PMCID： PMC4920642 DOI： 10.1016/j.tplants.2016.02.005

Source DB: PubMed Journal: Trends Plant Sci ISSN： 1360-1385 Impact factor: 18.313

Polyploidization and Homoeology

Many plants – and virtually all angiosperms – have undergone at least one round of polyploidization in their evolutionary history 1, 2, 3. In particular, numerous important crop species, such as Arachis hypogaea (peanut), Avena sativa (oat), Brassica juncea (mustard greens), Brassica napus (rapeseed), Coffea arabica (coffee), Gossypium hirsutum (cotton), Mangifera indica (mango), Nicotiana tabacum (tobacco), Prunus cerasus (cherry), Triticum turgidum (durum wheat), and Triticum aestivum (bread wheat), exhibit allopolyploidy (see Glossary), a type of whole-genome duplication via hybridization followed by genome doubling [4]. This hybridization usually occurs between two related species, thus merging the genomic content from two divergent species into one (Box 1). Allopolyploidization has been studied since at least the early 1900s. Some of the first investigations were about chromosome numbers and pairing patterns of hybrid species 5, 6. The term homoeologous was coined to distinguish chromosomes that pair readily during meiosis from those that pair only occasionally during meiosis [7]. However, the definition of homoeology has varied and at times been used inconsistently. Homoeology has been broadly used to denote the relationship between ‘corresponding’ genes or chromosomes derived from different species in an allopolyploid. Accurately identifying homoeologs is key to studying the genetic consequences of polyploidization; knowing the evolutionary correspondence between genes across subgenomes allows us to more accurately estimate gene gain or loss after polyploidization (reviewed in 8, 9) and to study the major structural rearrangements or conservation between homoeologous chromosomes. Additionally, we can study the functional divergence of homoeologs on polyploidization, particularly in terms of expression (reviewed in 2, 8, 10, 11, 12, 13), epigenetic patterns (reviewed in 8, 12), alternative splicing [14], and diploidization (reviewed in [8]). From a crop improvement viewpoint, identifying homoeologs that may have been functionally conserved is important for elucidating or engineering the genetic basis for traits of interest 15, 16. This high interest in the genetic and evolutionary consequences of polyploidization has driven the development of several methods for homoeolog inference. However, because of their highly redundant nature polyploid genomes have been notoriously challenging to sequence and assemble [17]. Recent breakthroughs in sequencing and assembly methods suggest that we are finally overcoming this hurdle 18, 19, 20 and as increasing numbers of polyploid genomes are sequenced there will be a growing interest in homoeology inference. Thus, it is necessary to establish a common framework. Here we examine the current and common definitions of homoeology and point out imprecise usage in the literature, from historical definitions to modern understandings. We advocate a precise and evolutionarily meaningful definition of homoeology and connect homoeology and orthology inference. We then review homoeolog inference methods and discuss advantages and disadvantages of each approach.

What Are Homoeologs?

Historical Definitions and Modern (Mis)Understandings

It is first important to make the distinction between homology and homoeology. The prefix ‘homo-’ comes from the Latin (and ancient Greek) word for ‘same’, whereas the prefix ‘homoeo’ means ‘similar to’ [21]. Homoeology has alternatively been spelled as ‘homeology’ (Box 2). Both terms have a history of varied and, at times, inconsistent usage in different fields, but in biology it is now generally accepted that homology indicates ‘common ancestry’; by contrast, ‘homoeology’ is more ambiguous. The term homoeologous was first used in a cytogenetics study of allopolyploid wheat, where Huskins (1931) defined it as ‘phylogenetically similar but not strictly homologous chromosomes’ in a hybrid. Huskins goes on to explain further: To distinguish between chromosomes which come within the commonly accepted meaning of the term homologous and those which are, as evidenced by their pairing behavior, similar only in part, the latter might be referred to as homœologous chromosomes, signifying similarity but not identity…This term would include chromosomes of different ‘genomes’ which pair occasionally in allopolyploids, often causing the appearance of mutant or aberrant forms, and also, as a corollary, chromosomes which pair irregularly in many interspecific hybrids. [7] Two decades later, in the 1949 Dictionary of Genetics, R.L. Knight defines homoeologous chromosomes as ‘chromosomes that are homologous in parts of their length’ [22]. Thus, in its historical context, a pair of homoeologous chromosomes is thought of as being similar but exhibiting only infrequent pairing during meiosis. In a survey of 93 studies of autopolyploids and 78 studies of allopolyploids, multivalent pairing (pairing between more than two chromosomes) on average occurred more in autopolyploids than in allopolyploids (∼29% vs 8%) [23]. Although chromosome pairing patterns give a good indication of homology type, this should not be used as a criterion (Box 1). Over the years, the definition of homoeology has evolved and diverged to have different usages depending on the scientific field of study or topic. The term homoeologous can mean different things and may not be as simple as ‘genes duplicated by polyploidy’ [24]. Table 1 highlights the differences between the different definitions of homoeology depending on the context in which it is used. The variation among definitions depends on the level of biological analysis: at the chromosome, gene, or sequence level.

Table 1

Varied Usages of the Term ‘Homoeology’ in Different Areas of Research

Context	Definition	Refs
Recombination	Homoeologous: ‘sequences that are similar but imperfectly matched’	[96]
Cytogenetics	Homoeologous chromosomes: ‘those which once were homologous, i.e. essentially identical, but have become so different that they rarely pair [during meiosis]’	[97]
Evolutionary biology	Homoeologous: ‘duplicated genes or chromosomes that are derived from different parental species and are related by ancestry’	[98]
Computational biology	Homoeologs: ‘orthologs between subgenomes’	[35]
This review	Homoeologs: pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization

Even in modern evolutionary biology contexts, the term homoeolog has been used inconsistently. For instance, some have used it not just in the context of allopolyploids but to relate duplicates created by autopolyploidy as well (for example, 25, 26). This is, however, at odds with the original description of homoeologs as belonging to an allopolyploid genome [7]. There are biological differences between genes that arise due to speciation versus duplication [27] and thus also, conceivably, between allo- versus autopolyploids. Autopolyploids by definition are created by genome doubling, with an exact copy of the genome formed. By contrast, allopolyploids are formed by the merger of closely related species that have already started to diverge. Although still poorly understood, these fundamental differences could have significant effects on the genome of the polyploid. Hybridization can induce a ‘genome shock’ prompting epigenetic or expression changes that might not be present with strictly genome doubling per se 8, 28, 29, 30. The functional consequences of genes duplicated by allo- vs autopolyploidy still needs to be investigated, which is why a clear distinction of terminology between the two is important. Furthermore, this usage of homoeolog overlaps with another term – ohnologs – used to denote genes resulting from whole-genome duplication [31]. The term homoeolog has even been used to refer to similar chromosomal regions in different species 32, 33, 34. Although closely related species do have similar chromosomes and gene content, this latter usage is unorthodox: the term homoeolog has been overwhelmingly used to denote relationships within polyploids, and therefore within a single species rather than between closely related species. A cross-species definition of homoeology is also redundant with that of orthology.

A Unifying, Evolutionarily Precise Definition of Homoeology

Consequently, there is a need for a unifying, evolutionarily precise definition of homoeology, formulated in terms of the key events that gave rise to the genes in question. The ideal definition should be as consistent as possible with the widespread usage of the term and should complement the other ‘-log’ terms, which have served the community well. We define homoeologs as pairs of genes or chromosomes in the same species that originated by speciation and were brought back together in the same genome by allopolyploidization. Figure 1 depicts how this definition complements the other ‘log’ terms. In particular, the analogy between homoeologs and orthologs implies that homoeologs can be thought of as orthologs between subgenomes of an allopolyploid [35].

Figure 1

Subtypes of Homologous Genes (Genes of Common Ancestry). As the table shows, the definition of ‘homoeologs’ we recommend – genes that originated by speciation and that were subsequently brought back in a single genome through allopolyploidization – complements well other homology subtypes commonly used in evolutionary biology. In particular, the table highlights the parallels between homoeologs and orthologs and between homoeologs and ohnologs.

Note that the term ‘paleolog’ is sometimes used to denote ancient polyploidization events. The term is convenient for plants such as soybean where the polyploidization event occurred more than a few million years ago and where it is unknown whether these were auto- or allopolyploidization events [36].

Implications of the Definition for Positional Conservation and Relationship Cardinality

Because of the analogy between homoeology and orthology, homoeologs are under the same common misconceptions that afflict orthologs: the notion that homoeologs necessarily in a one-to-one relationship or that they have remained strictly in their ancestral positions since speciation. Since homoeology is characterized by an initial speciation event, once the progenitor species of the future allopolyploid begin to diverge, the corresponding genes in each new species that descended from a common ancestral gene start diverging in sequence (Figure 2). The sequence divergence will depend on the time since the progenitor divergence and other factors (the same factors that contribute to ortholog divergence such as selection pressure, duplication events, and others). In addition to genic sequence divergence, other scale evolutionary events may occur, including single-gene duplications, deletions, and rearrangements.

Figure 2

Evolutionary History of an Allopolyploid. An ancestral genome undergoes a speciation event, resulting in two diploid species. The genes, which descended from a common gene in the ancestor, are orthologs. Evolution occurs after speciation, including structural rearrangements, gene duplications, and gene movement. On polyploidization, genes that were once orthologs are now homoeologs. Homoeologous relationships can be one-to-one, one-to-many, or many-to-many depending on the number of duplications since speciation of the progenitors.

As a consequence, orthologous relationships are not necessarily one-to-one between species and may exist in one-to-many or many-to-many relationships, especially among highly duplicated plant genomes [37]. The same is true for homoeologous relationships. Depending on the duplication (and loss) rate since the divergence of the progenitor species, there may be more than one homoeologous copy of a given gene per subgenome (Figure 2). In many plant species, a high degree of collinearity, or conservation of gene order [38], has been observed between homoeologous chromosomes in polyploids. Genes tend to stay in their ancestral position since divergence, leading to the concept of positional orthology [39] and, analogously in allopolyploids, of positional homoeology. However, there may be rearrangement of homoeologs via single-gene duplication/translocation either before or after polyploidization, going against the widespread notion that homoeologous genes are always positional (i.e., have remained in their ancestral location), as stated for example in [25]. Although we can expect that most homoeologs remain positionally conserved and in a one-to-one relationship after polyploidization, these are only a subset of the homoeologs. The frequency of homoeolog duplication may be significantly underestimated in some species due to use of the best bidirectional hit (BBH) – an approach inherently limited to inferring one-to-one relationships 40, 41.

How to Infer Homoeologs

In general, homoeology inference reduces to identifying similar (and therefore likely homologous) genes within a polyploid genome and inferring whether pairs of homologs started diverging from one another through speciation, in which case they are homoeologs (and usually located on different subgenomes), or through duplication, in which case they are paralogs (and usually located on the same subgenome). The methods for doing so have changed over time with advances in technology, from low-throughput laboratory techniques to high-throughput computational ones. In this section we survey these techniques and highlight their relative strengths and limitations.

Wet Lab Techniques Based on Probe Hybridization or PCR Amplification

Although whole-genome sequencing has become commonplace thanks to next-generation sequencing (NGS) techniques, many species do not yet have a fully sequenced reference genome. Techniques used to isolate homoeologous genes from polyploid species before NGS were based on hybridization, using a probe or primer as a template to retrieve the homoeologs of interest. However, due to the high sequence similarity of homoeologs as well as paralogs, one would obtain a mixture of DNA molecules representing homoeologous and paralogous copies, which then needed to be separated. One method of separating homoeologous copies from each other in a pool of highly similar DNA molecules is by using the mixture of homoeologs obtained from PCR to transform into bacteria, resulting in only a single copy of either homoeolog in each bacterial colony. Colonies can then be isolated, sequenced, and assigned to subgenomes by using knowledge from diploid progenitor species, specifically differential (sub)genome restriction patterns [42]. Note that the true progenitors may no longer exist in nature and that the term ‘progenitor’ may refer to their extant, unhybridized descendant or close relative [43]. Another way of separating homoeologous sequences makes use of restriction-digested DNA followed by size fractionation on a gel [44]. Minor differences among homoeologous copies can be expected to result in sequence differences at restriction sites and thus digestion cuts homoeologous copies into different sizes. This is followed by isolating the DNA from the separated bands and then amplifying these homoeologous copies by cloning. Alternatively, isolated homoeologs can be obtained using PCR primers to produce a mixture of homoeologous copies, and after size fractionation the same primers can be used to amplify individual homoeolog copies [44]. The above techniques are all performed on a gene-by-gene basis with molecular methods and therefore are small scale and relatively time-consuming and laborious. A more recent and larger-scale technique to separate homoeologs, based on hybridization of genomic DNA to an array, is able to target hundreds or thousands of genes at a time, each individually spotted on the array. Salmon et al. [45] used this technique to capture homoeologous pairs in G. hirsutum. After hybridization, the probes on the chip, enriched for homoeologous pairs, were then sequenced with NGS. Homoeologs could be distinguished by sequence polymorphisms between them. These experimental techniques have several limitations. First, they are appropriate for studies focusing on a small number of genes but scale poorly to entire genomes. Additionally, they all require prior sequence information for the gene of interest. If cDNA is used as the starting point, one can combine homoeolog inference with differential expression studies. However, this works only for genes that are expressed in the particular condition from which the cDNA library was made. Homoeologs are assigned to a subgenome by comparing the individual homoeolog sequences from the polyploid to their orthologous counterparts in the diploid progenitors. Therefore, these experiments need to be performed on the progenitor species as well, which may not always be readily available. Finally, it can be difficult to distinguish homoeologous from paralogous sequences, as the degree of sequence divergence between the two can be slight and thus not result in a clear difference in hybridization pattern. Thus, these techniques do not perform well on large gene families.

Comparative Mapping and Positional Homoeology

Before the era of whole-genome sequencing, molecular markers were used to detect synteny and collinearity between chromosomes. However, molecular mapping is more complicated in a polyploid than a diploid, as there needs to be sufficient allele polymorphism to distinguish among the different homoeologs. Several techniques exist to circumvent this problem by comparative mapping in diploid relatives or by using aneuploid lines [46]. Many studies have been published using mapping to identify homoeologous relationships between chromosomes or genes in several allopolyploids, including Gossypium (cotton) [47], B. napus (rapeseed) [48], A. hypogaea (peanut) [49], and T. aestivum (wheat) 50, 51, 52, 53, 54. Wheat researchers played a major role in popularizing the term homoeology in the 1990s, with many molecular mapping papers showing the collinearity between wheat homoeologous chromosomes. Although conservation of position in the genome can be used as another layer of evidence above sequence similarity to infer homoeology, there are several inherent problems with homoeology inference based solely on this approach. Mapping homoeologs is possible only if the molecular markers are able to distinguish sequence polymorphisms between homoeologs. Additionally, conservation of relative genomic location in itself is not a requirement for homoeology, which depends only on the type of event that gave rise to the sequences. Due to potential duplications, chromosomal rearrangements, or other events leading to gene movement 55, 56, 57, relying on positional conservation to infer homoeology may lead to a substantial fraction of missed homoeologous relationships and introduce a bias. Like orthologs or paralogs, positional and non-positional homoeologs could differ in their biological characteristics. For example, orthologous genes maintained in the same position have slower evolution rates, are less likely to undergo positive selection, and are more likely to have a conserved function 58, 59, 60, 61, 62. Additionally, positional orthologs have been shown to maintain a higher expression level and breadth compared with non-positional orthologs [63]. Paralogs that have inserted into distant regions of the genome tend to have a more divergent DNA methylation pattern and expression than tandem duplicates 64, 65.

Similarity-Based Computational Techniques

High-throughput sequencing allows fast and affordable production of genome-wide sequence information, making it possible to identify similar regions and infer homoeology computationally at a genome-wide scale. However, despite rapid improvements in sequencing technology it remains a challenge to obtain a high-quality, fully assembled reference genome sequence for many plant species [66]. This is mainly because of their large, complex genomes, which are highly repetitive due to duplication and transposon activity [17]. With entire chromosomes in multiple copies, this difficulty is compounded in polyploid genomes. Because of these issues, most polyploid plant genome sequences remain in a draft, highly fragmented state, usually comprising small contigs harboring only a few genes [17]. The identification of homoeologs thus first requires assembling short sequences (e.g., expressed sequence tags or, increasingly, NGS reads) at low stringency followed by homoeolog discrimination based on sequence polymorphisms between the reads. For example, Udall et al. assembled ESTs from allotetraploid cotton and the two diploid progenitors. Most assembled contigs contained four copies: two orthologs from the progenitors and one from each of the homoeologs. They then assigned the homoeolog ESTs to their appropriate subgenome based on sequence comparison with the progenitors [67]. In another example [68], homoeologs were distinguished in hexaploid wheat by first assembling, at a relatively low stringency, transcriptome NGS reads into clusters of sequences containing homoeologs and close paralogs. The second step was to reassemble each cluster separately using a more stringent assembler to separate homoeologs. After discriminating between homoeologous genes, it is generally necessary to map the reads back to the progenitor species to infer to which subgenome they belong. For example, Akama et al. [69] sequenced and de novo assembled both Arabidopsis halleri and Arabidopsis lyrata (progenitors of the allotetraploid Arabidopsis kamchatica). They identified homoeologs by aligning the allotetraploid reads to both the A. halleri and A. lyrata genomes and considered high-scoring alignments as homoeologs. A similar technique was performed in hexaploid wheat taking advantage of the recently sequenced diploid progenitors Triticum urartu and Aegilops tauschii [70]. Another method of separating contigs into individual homoeolog copies employs the strategy of ‘post-assembly phasing’ using remapped reads, which detects polymorphisms in reads and determines whether they were inherited together [71]. Provided that the progenitors’ genomes are known and well separated, techniques based on short reads and sequence polymorphisms to infer homoeologs can be effective. Because they tend to be based on RNA-seq reads, one can simultaneously quantify their expression. However, there will be false negatives if one or both of the homoeologs is unexpressed. Also, it can be costly to first sequence the progenitor species. Another disadvantage is that, again, these methods do not establish one-to-many or many-to-many relationships. Additionally, as with experimental hybridization methods, it can be difficult to distinguish homoeologs from paralogs.

Evolution-Based Computational Techniques

We indicated above that homoeologs should be defined as pairs of genes within an allopolyploid that originated by speciation and were reunited by hybridization. Thus, fundamentally, the relationship between homoeologs is based on evolutionary relationships rather than sequence similarity. Furthermore, the parallel between homoeologs and orthologs suggests the possibility of repurposing orthology inference methods – a relatively mature area of research with many well-established computational methods [72]. These methods, which all work at the genome-wide scale, are divided into phylogenetic-tree-based (which infer speciation and duplication nodes on gene trees) and graph-based (which infer the evolutionarily closest genes between species without explicitly reconstructing trees). Methods based on phylogenetic trees use the process of gene/species tree reconciliation, which determines whether each internal node of a given gene tree is a speciation or duplication node using the phylogeny of the species tree. With this information one can determine whether any two genes are related through orthology or paralogy; pairs of genes that coalesce at a speciation node are orthologs, whereas pairs of genes that coalesce at a duplication node are paralogs [72]. To our knowledge, the only phylogenetic tree-based homoeology inference approach taken so far is that of Ensembl Genomes, which has repurposed their Compara phylogenetic tree-based pipeline [73] to distinguish orthologs, paralogs, and homoeologs in wheat [74]. This is achieved by treating each subgenome as a different species, running their usual orthology pipeline, and finally relabeling orthologs inferred among subgenomes as homoeologs. This information is found in the ‘location-based display’ on their website under ‘Polyploid view’ (http://plants.ensembl.org/Triticum_aestivum/Info/Index). In general, graph-based orthology methods comprise inferring and clustering pairs of orthologs based on sequence similarity [72]. Graph-based orthology methods have also been adapted to infer homoeologs. One of the simplest and most widely used methods of ortholog detection is by finding BBHs between pairs of genomes [75]. This method uses BLAST [76] or another sequence alignment algorithm to find the set of reciprocally highest-scoring pairs of genes between two genomes. Such an approach was used to infer homoeologs between the subgenomes of hexaploid wheat, identifying triplets of best bidirectional protein hits between subgenomes [41]. However, the BBH method has inherent drawbacks. By identifying only the ‘best’ pair, it cannot identify one-to-many or many-to-many homoeology. This is particularly problematic for highly duplicated plant genomes [77]. As a result, BBH between subgenomes will at best infer a subset of the homoeologous relationships, thereby yielding false-negatives. Additionally, differential gene loss among the subgenomes can cause erroneous inference of paralogs as homoeologs [78]. Finally, using alignment scores is suboptimal in the presence of many fragmentary genes and sequencing errors [79]. Another graph-based homoeolog inference approach to analyze the wheat genome was performed in the Orthologous Matrix (OMA) database – a method and resource for inferring different types of homologous relationships between fully sequenced genomes [35]. This technique identifies mutually closest homologs based on evolutionary distance while considering the possibility of differential gene loss or many-to-many relationships [80]. Again, the application of the orthology inference pipeline was achieved by treating each subgenome as a different species, running the standard pipeline, and, finally, calling orthologs between subgenomes homoeologs. Compared with the BBH approach, the OMA algorithm has the advantages of considering many-to-many homoeology, identifying differential gene losses, and relying on evolutionary distances rather than alignment score. The main issue limiting the use of repurposed orthology methods such as Ensembl Compara and OMA is the requirement for a priori delineation of the subgenomes. If there have been rearrangements across subgenomes since hybridization of the progenitors occurred, this will cause errors in homoeolog inference because subgenomes can no longer be straightforwardly treated as individual species. Another problem with both similarity- and evolution-based techniques is that they are highly dependent on the quality of sequence assembly and annotation used to infer homoeologs (Box 3).

Concluding Remarks

Polyploid species are widespread throughout the plant kingdom. There is much interest in polyploidy and accurately identifying homoeologs allows us to better study the genetic and evolutionary consequences on genomes of polyploids. Many exciting findings have been published recently that provide insights into the structural and functional divergence of homoeologs and the chromosomes they reside on 3, 9, 81, 82. As a result, polyploidy has emerged as potentially a major mechanism of adaptation to environmental stresses 9, 83, 84, 85. The term homoeologous was first used in 1931 to describe chromosomes related by allopolyploidy. Since then, the definition has changed over the years and now suffers from inconsistent interpretation, usage, and spelling. In recent decades there has been increasing interest in polyploidy and the word homoeology has experienced an increase in usage. There has been a surge in sequenced plant genomes and polyploid genomes are not far behind, despite their increased complexity and challenges due to their repetitive nature 18, 86. Thus, just as it was important to establish clear definitions of orthology and paralogy 87, 88, now is the time to establish common and consistent definitions for homologs that exist in a polyploid. Based on our survey of the usage of the term and related concepts in evolutionary biology, we advocate defining homoeologs as pairs of genes that started diverging through speciation but are now found in the same species due to hybridization. This evolution-based definition has several implications that call for a fundamental shift in the way we as biologists, plant breeders, and bioinformaticians think of homoeology. First, homoeolog inference may suffer from false negatives if inferred solely on the basis of positional conservation. This is because genes can move and, by definition, different types of homologous relationships are based on how the genes originated and not where they are located in the genome. Syntenic conservation is helpful to infer homoeologs but should be used only as a soft criterion to provide additional evidence that a pair of genes are homoeologs. We recommend using the term positional homoeolog when referring to the subset of homoeologs with a conserved syntenic position. Furthermore, looking at homoeology from an evolutionary perspective has an impact on the relationship cardinality. Homoeology is not necessarily a one-to-one relationship, especially in highly duplicated plant genomes. This conceptual change is important because one-to-one positional homoeologs are likely to have significantly different biological characteristics than one-to-many, non-positional homoeologs – as has been previously observed with orthologs. The establishment of a clear and meaningful definition of homoeology is timely. With rapid progress in sequence technology, we are at the cusp of an explosion of sequenced polyploid genomes. However, although assembling allopolyploid genomes might no longer be ‘formidable’ [18], unraveling the evolutionary history of the genes they contain remains resolutely so (see Outstanding Questions). Overcoming this challenge will require a major coordinated effort among plant, evolutionary, and computational biology scientists. A common definition and framework constitutes a first essential step toward that goal. Can dependable computational methods be devised to infer from genome sequence alone whether a polyploid species originated by allopolyploidization or by autopolyploidization? Certain computational pipelines need delineation of subgenomes before homoeology inference. This will, however, not work if there has been considerable chromosomal rearrangement between subgenomes after polyploidization. Can one simultaneously detect rearrangement, separate subgenomes, and infer homoeologs? In general, what are the functional differences between homoeologs (resulting from allopolyploids) and ohnologs (resulting from autopolyploids)? There is a growing body of research looking at the functional implications of polyploidization, but so far a clear answer remains elusive.

85 in total

Review 1. The advantages and disadvantages of being polyploid.

Authors: Luca Comai
Journal: Nat Rev Genet Date: 2005-11 Impact factor: 53.242

2. Not born equal: increased rate asymmetry in relocated and retrotransposed rodent gene duplicates.

Authors: Brian P Cusack; Kenneth H Wolfe
Journal: Mol Biol Evol Date: 2006-12-18 Impact factor: 16.240

3. Gramene: a resource for comparative grass genomics.

Authors: Doreen Ware; Pankaj Jaiswal; Junjian Ni; Xiaokang Pan; Kuan Chang; Kenneth Clark; Leonid Teytelman; Steve Schmidt; Wei Zhao; Samuel Cartinhour; Susan McCouch; Lincoln Stein
Journal: Nucleic Acids Res Date: 2002-01-01 Impact factor: 16.971

4. Genomewide nonadditive gene regulation in Arabidopsis allotetraploids.

Authors: Jianlin Wang; Lu Tian; Hyeon-Se Lee; Ning E Wei; Hongmei Jiang; Brian Watson; Andreas Madlung; Thomas C Osborn; R W Doerge; Luca Comai; Z Jeffrey Chen
Journal: Genetics Date: 2005-09-19 Impact factor: 4.562

5. Frequent gene movement and pseudogene evolution is common to the large and complex genomes of wheat, barley, and their relatives.

Authors: Thomas Wicker; Klaus F X Mayer; Heidrun Gundlach; Mihaela Martis; Burkhard Steuernagel; Uwe Scholz; Hana Simková; Marie Kubaláková; Frédéric Choulet; Stefan Taudien; Matthias Platzer; Catherine Feuillet; Tzion Fahima; Hikmet Budak; Jaroslav Dolezel; Beat Keller; Nils Stein
Journal: Plant Cell Date: 2011-05-27 Impact factor: 11.277

6. Polyploidy and angiosperm diversification.

Authors: Douglas E Soltis; Victor A Albert; Jim Leebens-Mack; Charles D Bell; Andrew H Paterson; Chunfang Zheng; David Sankoff; Claude W Depamphilis; P Kerr Wall; Pamela S Soltis
Journal: Am J Bot Date: 2009-01 Impact factor: 3.844

7. Expression divergence of FRUITFULL homeologs enhanced pod shatter resistance in Brassica napus.

Authors: P F Peng; Y C Li; D S Mei; J Colasanti; L Fu; J Liu; Y F Chen; Q Hu
Journal: Genet Mol Res Date: 2015-02-02

8. Extensive error in the number of genes inferred from draft genome assemblies.

Authors: James F Denton; Jose Lugo-Martinez; Abraham E Tucker; Daniel R Schrider; Wesley C Warren; Matthew W Hahn
Journal: PLoS Comput Biol Date: 2014-12-04 Impact factor: 4.475

9. The impact of gene duplication, insertion, deletion, lateral gene transfer and sequencing error on orthology inference: a simulation study.

Authors: Daniel A Dalquen; Adrian M Altenhoff; Gaston H Gonnet; Christophe Dessimoz
Journal: PLoS One Date: 2013-02-25 Impact factor: 3.240

10. Genome-wide quantification of homeolog expression ratio revealed nonstochastic gene regulation in synthetic allopolyploid Arabidopsis.

Authors: Satoru Akama; Rie Shimizu-Inatsugi; Kentaro K Shimizu; Jun Sese
Journal: Nucleic Acids Res Date: 2014-01-13 Impact factor: 16.971

50 in total

1. New Insights Into the Relationships Within Subtribe Scorzonerinae (Cichorieae, Asteraceae) Using Hybrid Capture Phylogenomics (Hyb-Seq).

Authors: Elham Hatami; Katy E Jones; Norbert Kilian
Journal: Front Plant Sci Date: 2022-07-01 Impact factor: 6.627

2. A Comprehensive Study of the WRKY Transcription Factor Family in Strawberry.

Authors: José Garrido-Gala; José-Javier Higuera; Antonio Rodríguez-Franco; Juan Muñoz-Blanco; Francisco Amil-Ruiz; José L Caballero
Journal: Plants (Basel) Date: 2022-06-15

3. Meta-QTLs, ortho-meta-QTLs and candidate genes for grain yield and associated traits in wheat (Triticum aestivum L.).

Authors: Dinesh Kumar Saini; Puja Srivastava; Neeraj Pal; P K Gupta
Journal: Theor Appl Genet Date: 2022-01-05 Impact factor: 5.574

4. Maximum Parsimony Inference of Phylogenetic Networks in the Presence of Polyploid Complexes.

Authors: Zhi Yan; Zhen Cao; Yushu Liu; Huw A Ogilvie; Luay Nakhleh
Journal: Syst Biol Date: 2022-04-19 Impact factor: 9.160

5. Impact of whole genome triplication on the evolutionary history and the functional dynamics of regulatory genes involved in Brassica self-incompatibility signalling pathway.

Authors: Thanina Azibi; Houria Hadj-Arab; Maryse Lodé; Julie Ferreira de Carvalho; Gwenn Trotoux; Sylvie Nègre; Marie-Madeleine Gilet; Julien Boutte; Jérémy Lucas; Xavier Vekemans; Anne-Marie Chèvre; Mathieu Rousseau-Gueutin
Journal: Plant Reprod Date: 2020-02-20 Impact factor: 3.767

6. The Transcriptional Landscape of Polyploid Wheats and Their Diploid Ancestors during Embryogenesis and Grain Development.

Authors: Daoquan Xiang; Teagen D Quilichini; Ziying Liu; Peng Gao; Youlian Pan; Qiang Li; Kirby T Nilsen; Prakash Venglat; Eddi Esteban; Asher Pasha; Yejun Wang; Rui Wen; Zhongjuan Zhang; Zhaodong Hao; Edwin Wang; Yangdou Wei; Richard Cuthbert; Leon V Kochian; Andrew Sharpe; Nicholas Provart; Dolf Weijers; C Stewart Gillmor; Curtis Pozniak; Raju Datla
Journal: Plant Cell Date: 2019-10-18 Impact factor: 11.277

7. Analysis of Paralogs in Target Enrichment Data Pinpoints Multiple Ancient Polyploidy Events in Alchemilla s.l. (Rosaceae).

Authors: Diego F Morales-Briones; Berit Gehrke; Chien-Hsun Huang; Aaron Liston; Hong Ma; Hannah E Marx; David C Tank; Ya Yang
Journal: Syst Biol Date: 2021-12-16 Impact factor: 15.683

8. The endosperm-specific transcription factor TaNAC019 regulates glutenin and starch accumulation and its elite allele improves wheat grain quality.

Authors: Yujiao Gao; Kexin An; Weiwei Guo; Yongming Chen; Ruijie Zhang; Xue Zhang; Siyuan Chang; Vincenzo Rossi; Fangming Jin; Xinyou Cao; Mingming Xin; Huiru Peng; Zhaorong Hu; Weilong Guo; Jinkun Du; Zhongfu Ni; Qixin Sun; Yingyin Yao
Journal: Plant Cell Date: 2021-05-05 Impact factor: 11.277

9. Parallel ddRAD and Genome Skimming Analyses Reveal a Radiative and Reticulate Evolutionary History of the Temperate Bamboos.

Authors: Cen Guo; Peng-Fei Ma; Guo-Qian Yang; Xia-Ying Ye; Ying Guo; Jing-Xia Liu; Yun-Long Liu; Deren A R Eaton; Zhen-Hua Guo; De-Zhu Li
Journal: Syst Biol Date: 2021-06-16 Impact factor: 15.683

10. Homoeolog Inference Methods Requiring Bidirectional Best Hits or Synteny Miss Many Pairs.

Authors: Natasha Glover; Shaoline Sheppard; Christophe Dessimoz
Journal: Genome Biol Evol Date: 2021-06-08 Impact factor: 3.416