Literature DB >> 27289095

Independent Domestication of Two Old World Cotton Species.

Simon Renny-Byfield¹, Justin T Page², Joshua A Udall², William S Sanders³, Daniel G Peterson⁴, Mark A Arick⁵, Corrinne E Grover⁶, Jonathan F Wendel⁷.

Abstract

Domesticated cotton species provide raw material for the majority of the world's textile industry. Two independent domestication events have been identified in allopolyploid cotton, one in Upland cotton (Gossypium hirsutum L.) and the other to Egyptian cotton (Gossypium barbadense L.). However, two diploid cotton species, Gossypium arboreum L. and Gossypium herbaceum L., have been cultivated for several millennia, but their status as independent domesticates has long been in question. Using genome resequencing data, we estimated the global abundance of various repetitive DNAs. We demonstrate that, despite negligible divergence in genome size, the two domesticated diploid cotton species contain different, but compensatory, repeat content and have thus experienced cryptic alterations in repeat abundance despite equivalence in genome size. Evidence of independent origin is bolstered by estimates of divergence times based on molecular evolutionary analysis of f7,000 orthologous genes, for which synonymous substitution rates suggest that G. arboreum and G. herbaceum last shared a common ancestor approximately 0.4-2.5 Ma. These data are incompatible with a shared domestication history during the emergence of agriculture and lead to the conclusion that G. arboreum and G. herbaceum were each domesticated independently.

Entities: Chemical Disease Species

Keywords: Gossypium; crop plants; genome size; molecular evolution; repetitive DNA

Mesh：

Year: 2016 PMID： 27289095 PMCID： PMC4943200 DOI： 10.1093/gbe/evw129

Source DB: PubMed Journal: Genome Biol Evol ISSN： 1759-6653 Impact factor: 3.416

Introduction

Understanding the origins of crop plants and their relationships to wild relatives have long been central concerns of plant biologists. This historical interest, stimulated by the importance of this understanding to crop plant improvement, traces to before the landmark volumes by de Candolle (1883) and Darwin (1868) and remains an area of active research today (Abbo et al. 2012; Hancock 2012; Meyer and Purugganan 2013; Olsen and Wendel 2013). A major challenge in the study of crop plants has been determining the wild ancestors of domesticated species. This difficulty reflects multiple processes, including the often dramatic morphological transformation between progenitor and derivative, possible introgression between crops and wild relatives, cultivation far outside the native range, and rarity or extinction of wild ancestors. Accordingly, the wild ancestors and germplasm relationships of some of our major crop species have remained obscure. A case in point concerns the two Old World cultivated cotton species, Gossypium arboreum and Gossypium herbaceum (fig. 1). Both species have an ancient history of cultivation, extending back perhaps 5,000 years or more (Chowdhury and Buth 1971). This antiquity of original domestication followed by human-mediated dispersal over vast geographic ranges, extending from Africa through the Levant and Indian subcontinent into the Far East, has generated extensive variability within each species (Hutchinson and Ghose 1937; Silow 1944; Hutchinson et al. 1947; Fryxell 1978). The two species are similar morphologically (Stanton et al. 1994) and with respect to chemical and protein traits (Parks et al. 1975; Wendel et al. 1989). When grown in sympatry, fertile hybrids may arise, although F2 and later generations display “breakdown,” that is, aberrant recombinant phenotypes including some sterility and lethality (Silow 1944; Stephens 1950; Phillips 1961). This indication of genetic differentiation is supported by cytogenetic data, which demonstrate that G. arboreum and G. herbaceum differ by a reciprocal translocation (Gerstel 1953; Brubaker et al. 1999).

Fig. 1.

Morphological differences between fiber from wild and domesticated A-genome diploid cotton species. Shown are single seeds with single celled trichomes (fiber) from two cotton species, Gossypium herbaceum and Gossypium arboreum. Gossypium arboreum exists only as a cultigen. Exemplifying the general problem of inference regarding the origins of many crop plants, almost nothing is known about the location and timing of original domestication of either cultivated diploid cotton species. Wild progenitor populations have not been identified with certainty, but a wild and morphologically distinct form of G. herbaceum (G. herbaceum subsp. africanum (Watt) Mauer) occurs in southern Africa (Botswana, Lesotho, and possibly elsewhere) in regions far removed from known historical or present cultivation (Saunders 1961; Fryxell 1978; Vollesen 1987). Its small fruit with seeds bearing sparse, coarse epidermal seed trichomes (“lint” or “cotton”) suggests that G. herbaceum subsp. africanum is a reasonable model of the ancestor of cultivated G. herbaceum (Hutchinson 1954; Fryxell 1978). For G. arboreum, no wild forms have been identified; instead, this species occurs only as a cultigen, with an enormous indigenous range extending from China and Korea westward into northern Africa: its center of diversity lies in India. Because wild forms of G. arboreum are unknown, and because the location of wild G. herbaceum subsp. africanum is geographically disjunct from known historical regions of cultivation of either species, the origin of the two species and their relationships to each other are unclear. Two opposing views have been forwarded, one that stresses the overall similarity of the two species and a second that emphasizes their differences. Hutchinson, a proponent of the first view, proposed G. herbaceum subsp. africanum as a model of the ancestor of both species (Hutchinson and Ghose 1937; Hutchinson 1954; Fryxell 1978). According to Hutchinson’s hypothesis, G. arboreum arose from G. herbaceum early in the history of diploid cotton cultivation, suggesting one cultivated cotton species arose from another. An alternative hypothesis is that G. arboreum and G. herbaceum diverged prior to domestication. Fryxell (1978) and Wendel et al. (1989) among others argue that genetic differences between the two species are too great to have arisen during the relatively brief period in which domesticated cottons have existed. Thus, according to this view, cultivated Old World cottons originated from at least two independent domestication events from two different wild progenitor species. The purpose of this study was to use genomic data from ongoing resequencing efforts to gain insight into the relative validity of the common progenitor hypothesis versus the different progenitor hypothesis for the two domesticated diploid cotton species. We report results based on whole-genome resequencing (of 13 cotton accessions) and two sources of information derived from these data, that is, types and abundances of repetitive DNA sequences, using a genome skimming approach (Novák et al. 2013) and divergence estimates derived from synonymous substitutions at more than 7,000 confidently aligned orthologous genes between G. arboreum and G. herbaceum. These data collectively provide compelling evidence for independent domestication of the two Old World diploid cotton species, shed new light on processes of genome size evolution, and have relevance to our understanding of the origin of the genomes of the two modern allotetraploid cotton species that presently dominate world cotton commerce (i.e., Gossypium hirsutum and Gossypium barbadense).

Materials and Methods

Plant Samples, DNA Extraction, and Sequencing

We utilized data generated as part of the cotton resequencing project (see supplementary file S1, Supplementary Material online), which includes both wild (A1-73; subsp. africanum) and domesticated accessions of G. herbaceum, as well as multiple accessions of the exclusively domesticated species G. arboreum. Gossypium arboreum and G. herbaceum collectively comprise the A-genome diploid genome clade, the donor of the A genome to allopolyploid (AD-genome, which includes the commercially important G. hirsutum and G. barbadense) cottons at the time of their formation in the mid-Pleistocene (Wendel and Grover 2015). The diploid genomes studied include the D-genome species Gossypium raimondii, which is the best living model of the D-genome ancestor of allopolyploid cotton (Wendel and Grover 2015), and which has a genome size (885 Mb) that is about half as large as that found in both of the A-genome species (G. arboreum and G. herbaceum; 1,700 Mb) studied. In total, 13 accessions of diploid cotton were analyzed (table 1).

Table 1

Sample and Clustering Details for the 13 Accessions Used in this Analysis

Species	Reads Per Sample (Number of Samples)	Coverage (%)	Mean Number of Clustered Reads	Genome Size^a (Mb/1C)
Gossypium herbaceum (A1)	175,474 (3)	1	123,766	1,667
Gossypium arboreum (A2)	180,063 (5)	1	132,223	1,698
Gossypium raimondii (D5)	92,632 (5)	1	46,617	880

aHendrix and Stewart (2015).

Sample and Clustering Details for the 13 Accessions Used in this Analysis aHendrix and Stewart (2015).

Preparation of Sequence Data

Illumina sequencing reads were filtered for quality using default parameters in the program Trimmomatic version 0.33 (Bolger et al. 2014) retaining high-quality reads that were trimmed to 95 bp (from 125 bp) resulting in over 3 million reads per sample. For each sample we took a random 1% genome equivalent (as is typical for analysis with RepeatExplorer), according to genome size estimates at the Kew C-Value Database (Bennett and Leitch 2010; Novák et al. 2013) (accessions used are detailed in table 1 and supplementary file S1, Supplementary Material online). A sample identifier was prefixed to each sequence name, after which we combined genomic samples from all 13 accessions into a single data set for analysis (table 1).

Graph-Based Clustering

The combined sequence reads were analyzed using the RepeatExplorer pipeline (Novak et al. 2010, 2013), which identifies repetitive DNA families in low-pass, next-generation sequence data and has been used successfully in other species (Macas et al. 2007; Swaminathan et al. 2007; Wicker et al. 2009; Hribova et al. 2010; Novak et al. 2010; Renny-Byfield et al. 2011, 2012, 2013; Piednoel et al. 2012; Novák et al. 2013). Briefly, reads are linked based on sequence similarity and a graph-based clustering algorithm groups reads into clusters where reads within a given cluster are more densely connected to each other than they are to other reads in the data set. All resulting clusters were annotated, where possible, using the RepeatMasker (Smit et al. 2010) default library, a custom repeat library consisting of repetitive DNA sequences identified in the recently published G. raimondii genome assembly (Paterson et al. 2012) and previous genomic sequencing data (Grover et al. 2004, 2007, 2008; Hawkins et al. 2006). We subsequently calculated the number of reads from each sample contributing to each cluster by counting the number of reads with each sample identifier. This allowed us to assess the number of reads, nucleotides (as each read is 95 bp long), and fraction of the genome for all clusters in each sample. Because many of the clusters have annotations, we summed the number of reads, nucleotides, and fraction of the genome associated with each annotation for each sample. Following this, we pooled samples by species and calculated the mean and standard error for each unique annotation.

Statistical Analysis of Cluster Abundance

Analyzing a standard 1% of the genome per sample allows us to estimate the absolute abundance of each cluster in all samples such that the number of reads and/or bp are directly comparable among species. We used a Generalized Linear Model (GLM) to examine differential abundance of the largest 1,000 clusters in the three species in a manner similar to how RNA-seq count data are used to assess differential gene expression between samples. Using the GLM, we performed contrasts, with the R (version 3.1.2) package contrast (version 0.19), to assess whether mean abundance of each cluster is statistically different among species. All P values were subsequently corrected using the method of Benjamini and Hochberg (1995).

Hierarchical Clustering of Samples Based on Repetitive DNA Content

Using the largest 1,000 clusters, we assessed the similarity of repeat content on a per sample basis using hierarchical clustering, based on Euclidean distance. We used multidimensional scaling to place each sample in two-dimensional space, using the cmdscale function implemented in R (R Development Core Team 2010), and highlighted each cluster using the ordihull and ordispider functions of the R package vegan (version 2.2-1).

Estimation of Synonymous Substitution Rate

We generated gene sequences of two accessions each of the two A-genome diploid species (A1-155 and A1-97 for G. herbaceum; A2-1011 and A2-34 for G. arboreum). These pseudomolecules were produced using a single-nucleotide polymorphism data set generated using extensive EST and genomic resequencing data (Page et al. 2013). Short read alignments to the G. raimondii reference genome, generated in Page et al. (2013), allowed us to identify single nucleotide polymorphism between species. Since the consensus sequences were generated from the same annotation coordinates, the consensus sequences were not aligned in the traditional sense. Consensus sequences were formatted for input into BioPerl using ClustalW (Larkin et al. 2007) and nonsynonymous and synonymous substitution rates were estimated with a Jukes–Cantor substitution model. Two estimates of divergence (using upper and lower bounds of rate estimates) time between G. herbaceum and G. arboreum were obtained from the substitution rate at neutral loci using the formula T = K/2r, where K equals divergence amount and r corresponds to the rate of divergence for a small sampling of nuclear genes from woody plants (1.5 × 10 − 8 and 2.6 × 10 − 9 substitutions/site/year), as discussed in Senchina et al. (2003).

Results

Clustering of Next-Generation Sequences from Three Species of Gossypium

To characterize and quantify the repetitive content of diploid Gossypium genomes, we sampled multiple 1% genome equivalents from each species (table 1). The complete data set was subjected to clustering using the RepeatExplorer pipeline (Novak et al. 2010, 2013), producing approximately 60,000 clusters ranging from a minimum of only two reads to over 27,000 (a summary of the RepeatExplorer run is provided in supplementary file S2, Supplementary Material online). Sequence similarity searches to a custom repeat library resulted in 65% of clusters being annotated. Not surprisingly, however, the distribution of annotations was heavily skewed in favor of the larger clusters (e.g., 92% of the 1,000 largest, and 100% of the 100 largest clusters were annotated). We grouped clusters based on shared annotation and calculated the total number of Mb/1C that could be attributed to each annotation type (fig. 2; supplementary file S3, Supplementary Material online). As the majority of the data relevant to genome size evolution resides in the largest clusters, our bioinformatic and statistical approaches used the portion of the data most pertinent to question of genome size and independent origins of G. herbaceum and G. arboreum.

Fig. 2.

Bar plot showing the abundance of the most common repeat types in the genomes of three Gossypium species. Species are color coded and indicated using genome designations (A1, Gossypium herbaceum; A2, Gossypium arboreum; and D5, Gossypium raimondii). Standard error bars are shown. Annotation abbreviations are as follows: RLG, Ty3/Gypsy retroelements; RLC, Ty1/Copia retroelements; RLX, unknown LTR retroelements; RXX, retroelement unknown superfamily; TXX, unknown DNA transposon; AT, AT-rich simple repeat. In some cases, our pipeline was unable to distinguish Ty3/Gypsy- and Ty1/Copia-derived clusters, and there is reasonable fraction of long terminal repeat (LTR) retroelements of unknown type (RLX; fig. 2; supplementary file S3, Supplementary Material online). These are derived from degraded copies of the Ty3/Gypsy and Ty1/Copia and are likely in roughly the same proportions as those LTR retroelements we could distinguish. Other than the non-specific LTR retroelement annotation, Ty3/Gypsy retroelements (RLG) are the next largest category, comprising between 133 and 522 Mb/1C of the genome, depending on the species. Not surprisingly, G. raimondii (D5), with the smallest genome (880 Mb/1C), had the fewest Ty3/Gypsy retroelements (fig. 2; supplementary file S3, Supplementary Material online), accounting for approximately 13% of the genome. The two Old World diploid cottons, G. herbaceum (A1) and G. arboreum (A2), each had a larger complement of Ty3/Gypsy retroelements, in congruence with their larger genome sizes relative to G. raimondii (fig. 2; supplementary file S3, Supplementary Material online). As expected, Ty1/Copia retroelements are significantly less abundant when compared to Ty3/Gyspy, comprising only approximately 36 Mb of the genomes in all three species (fig. 2; supplementary file S3, Supplementary Material online). In interspecific comparisons of the 1,000 largest clusters, we found a variable number with significant deviation in abundance between the three species analyzed (tables 2 and 3). Between 149 and 342 of the 1,000 largest clusters exhibited evidence of divergence in abundance, depending on the species and clusters being compared.

Table 2

Two-Way Analysis of Variance of Cluster Abundance among the Largest 1,000 Clusters in Four Gossypium Species

	df	Sum of Squares	Mean Sq	F Value	P
Cluster (C)	999	6.18 × 10⁸	618,949	279.56	<0.00005
Species (S)	2	2.03 × 10⁷	10,193,026	4603.81	<0.00005
Cluster:species (CxS)	1998	2.69 × 10⁸	134,759	60.87	<0.00005
Residuals	10,000	2.21 × 10⁷	2,214

Table 3

Statistically Significant Differences in Cluster (Using a GLM, See Materials and Methods) Abundance between Species of Gossypium

Comparison	Clusters with Differential Abundance^a
A1: A2	149
A1: D5	297
A2: D5	342

aOf the largest 1,000 clusters.

Two-Way Analysis of Variance of Cluster Abundance among the Largest 1,000 Clusters in Four Gossypium Species Statistically Significant Differences in Cluster (Using a GLM, See Materials and Methods) Abundance between Species of Gossypium aOf the largest 1,000 clusters.

Variation in Repeat Content among A-Genome Diploids

We used a GLM to estimate the effect of species on cluster abundance. Subsequently, we used contrasts to individually compare the abundance of each cluster between species. This revealed 149 of the top 1,000 clusters had statistically different abundance in the two sister species G. herbaceum and G. arboreum (A1 and A2, respectively; table 3). Examining the clusters with significant difference revealed that the overall variation is attributable to some clusters being highly represented in G. arboreum and others being over-represented in G. herbaceum (fig. 3). Importantly, the largest clusters are typically more highly abundant in G. arboreum, whereas the smaller set of clusters are generally more highly in G. herbaceum.

Fig. 3.

Scatter plot of cluster abundance in Gossypium herbaceum (A1) and Gossypium arboreum (A2). Clusters that exhibit statistically significant difference in abundance between the two species are color coded as indicated. Clusters that do not exhibit and statistical difference are indicated in gray. Using the largest 1,000 clusters, we assessed the similarity of repeat content on a per sample basis using hierarchical clustering (fig. 4). This analysis reveals, as expected, that samples cluster by species, with G. herbaceum and G. arboreum being more closely related to each other than either is to G. raimondii. Of particular relevance is that G. herbaceum and G. arboreum are distinct in the first clustering dimension (fig. 4).

Fig. 4.

Cotton samples grouped by repeat content. Using the largest 1,000 clusters we assessed the similarity of repeat content on a per sample basis using hierarchical clustering, based on Euclidean distance. We identified natural groups (gated and numbered) using the ordihull and ordispider functions of the R package vegan. Importantly, Gossypium herbaceum (A1) and Gossypium arboreum (A2) are distinct in the first dimension.

Estimation of Divergence Time Based on Synonymous Substitution Rates for >7,000 Genes

We estimated genome-wide synonymous substitution (Ks) rates for approximately >7,000 orthologous genes of G. herbaceum and G. arboreum for which alignments and inferences of orthology were deemed unambiguous (fig. 5). Depending on the accessions compared, the mean Ks varied from 0.0127 to 0.0137, (average = 0.0132). Assuming a range of reasonable mutation rates, between 1.5 × 10 − 8 and 2.6 × 10 − 9 substitutions per site per year (see Senchina et al. 2003), estimates of divergence time for G. herbaceum and G. arboreum ranged from 400,000 to 2.5 Myr.

Fig. 5.

Distribution of synonymous substitutions (Ks) between orthologs from Gossypium arboreum and Gossypium herbaceum. Alignment of over >7,000 genes in each comparison allowed the mean and median substitution rate between species to be estimated.

Discussion

The Repetitive Landscape of the Cotton Genome

Here we used low-coverage next-generation sequencing to analyze the global repeat composition within and among three cotton (Gossypium) species, and subsequently applied the annotated repetitive profiles as evidence, in conjunction with estimates of divergence time, to assess the likelihood that the two Old World cultivated cottons, G. arboreum and G. herbaceum, had independent origins from different wild progenitors. In total, we annotated between 348 and 1,146 Mb/1C, depending on the species (fig. 2; supplementary file S3, Supplementary Material online), with the three cotton genomes being between 40% and 68% repetitive, in line with estimates from other plants (SanMiguel et al. 1998; Kumar and Bennetzen 1999; Wicker et al. 2001). As expected from previous genomic analyses in cotton and other plant species (Hawkins et al. 2006; Hribova et al. 2010; Renny-Byfield et al. 2011, 2012, 2013; Paterson et al. 2012), Ty3/Gypsy LTR-retroelements (RLG) account for the majority of cotton genomes. Our range in RLG estimates is, however, notably higher than estimates employing methodology that is similar to that which we used here (Macas et al. 2007; Hribova et al. 2010; Novak et al. 2010, 2013; Renny-Byfield et al. 2011, 2013); this may be due, in part, to the inclusion of a cotton-specific repeat database in our analysis. Not surprisingly, G. raimondii (880 Mb/1C; the smallest genome analyzed) had the smallest absolute number of repeats while G. arboreum had the greatest absolute number of repeats. The results reported here are broadly consistent with earlier work but in detail contrast with the initial estimates reported by Hawkins et al. (2006). Hawkins et al. reported that Ty1/Copia-like sequences were more abundant than Ty3/Gypsy elements in G raimondii. Their analysis was based on cloned, whole-genome shotgun sequences, which then were matched to the NCBI database, which at that time was relatively poor in terms of repeat content, as the authors noted. The use of a custom database of cotton repeats and RepeatMasker libraries, as in this study, allows for more accurate annotation. In this respect we note the proportions of each TE superfamily follow the same pattern as reported for the G. raimondii reference genome (Paterson et al. 2012), with the values reported here being consistently lower. For example, the reference annotation identifies 53.2% of the genome to be retroelement derived, whereas our estimate is 36.91%. Similarly, Ty3/Gypsy and Ty1/Copia retroelements account for 18.8% and 5.9% of the genome according to the reference annotation, whereas we report 12% and 4%, respectively. For DNA transposons, the reference annotation reports 1.5% of the genome, whereas our analysis suggests 0.9%. Additionally, sequencing of the G. arboreum (A2) genome provides estimates of repeat abundance similar to those reported here. For example, the G. arboreum genome assembly consists of 5.5% Ty1/Copia elements, whereas we report 2.2%. Similarly, our analysis and that reported by Li et al. (2014) indicate Ty3/Gypsy retroelements are far more common than Ty1/Copia, although perhaps not surprisingly whole-genome assembly identifies a larger proportion of Ty3/Gypsy when compared with the clustering detailed here (∼56% vs. ∼30%). All things considered, RepeatExplorer performed well and, at least in terms of the cotton genome, seems to produce repetitive DNA content estimations, whose proportions are in broad agreement with high quality, fully assembled genomes. Interestingly, Hawkins et al. (2006) reported that in G. herbaceum (A1) Ty3/Gypsy-like retroelements predominated, in contrast to observations in G. arboreum, where Ty1/Copia were reported as more abundant. The results presented in Hawkins et al. (2006) are therefore in agreement with data presented here for G. herbaceum. The recent publication of the G. arboreum genome sequence revealed a notable proliferation of Gorge-Ty3/Gypsy. Furthermore, Ty3-Gyspy-like sequences were more common in the genome of G. arboreum when compared to their Ty1/Copia counterparts (Li et al. 2014), in line with our analysis. Annotation of the G. arboreum genome sequence, however, indicated that 68% of the genome is composed of repetitive DNA, a value very close to our estimate (∼67.5%; fig. 2; supplementary file S3, Supplementary Material online). A sizeable fraction of each genome was attributable to LTR retroelements of unknown origin. One likely explanation of these is mutational degeneration, rendering difficult particular assignments to source retroelement families (fig. 2; supplementary file S3, Supplementary Material online). We also identified a number of other repeat classes in our data set but almost all of these were in low abundance in all three species (fig. 2; supplementary file S3, Supplementary Material online). A significant proportion of the data for each of the four genomes is composed of relatively low-copy repeat families, with relatively few clusters containing more than 5,000 reads (data not shown).

Variation in Repetitive DNA Content among Cotton Species

We demonstrate here the first statistical assessment of genome-wide differences in repeat content between closely related Gossypium species. Using a GLM we investigated variation in cluster abundance among the three cotton species analyzed, with a two-way analysis of variance revealing that all factors and interactions are significant (table 2). Additionally, our results provide data on repeat abundance using statistical methods, a practice that is relatively uncommon (Macas and Neumann 2007; Renny-Byfield et al. 2011, 2012, 2013; Piednoel et al. 2012). For most species comparisons there were a relatively large number of clusters exhibiting evidence of differential abundance (table 3). There were approximately equal number of clusters with statistically significant differences in comparisons between G. herbaceum (A1) and G. raimondii (D5), and G. arboreum (A2) and G. raimondii (D5), an expected result given that G. raimondii is equally divergent from both G. herbaceum and G. arboreum.

Repeat Content, Genic Divergence, and the Question of Parallel Domestication of Two Different A-Genome Cottons

This high level of divergence between the two closely related species G. herbaceum (A1) and G. arboreum (A2) was somewhat unexpected, given their overall similarity in genome size and in other traits (Wendel et al. 1989). Our GLM analysis indicates that 149 of the top 1,000 clusters showed differential abundance following contrast analysis (table 3 and fig. 3), despite the minimal difference in genome size between the species (∼10 to 80 Mb). These observations serve to highlight the ever-changing repeat landscapes of plant genomes; stasis in genome size need not reflect genomic quiescence, even between two closely related genomes. The example presented here clearly demonstrates this point; that is, despite their overall and remarkable similarity, at the repeat content level, the genomes of G. herbaceum and G. arboreum are easily distinguished. Such divergence would be extraordinary, perhaps implausibly so, if these two species had an ancestor-descendant relationship following a single domestication event some 5,000 years ago. Moreover, if G. arboreum had been derived from domesticated G. herbaceum, as suggested in some of the older literature, then one might expect the former to be nested within the latter in a hierarchical clustering analysis; instead, however, there is a separation of the two species into distinct groups in the first dimension after multidimensional scaling (fig. 4), as is the case with allozymes (Wendel et al. 1989). A key conclusion reached here is that, despite negligible divergence in genome size, the two A-genome cotton species contain variable proportions of repeat families. This observation suggests that they are distinct species with separate evolutionary histories, as opposed to conjoined domesticates, one derived from the other. These data are congruent with a molecular divergence data set derived from >7,000 orthologous genes from G. arboreum and G. herbaceum, which indicate that these two species last shared a common ancestor approximately 400,000 to 2.5 Ma, prior to the origin of agriculture and possibly the origin of modern humans. Collectively, we view these various sources of genomic data as providing compelling support for the hypothesis that the two species were independently domesticated from different wild progenitors, rather than having been derived, one from the other (G. arboreum from G. herbaceum), following a single domestication event (see Introduction). We note that this evidence in support of parallel domestication is consistent with the observation of F2 breakdown following interspecific hybridization (Stephens 1950), the chromosomal translocation that distinguishes the two species (Gerstel 1953; Brubaker et al. 1999), and allozyme data (Wendel et al. 1989) and microsatellite markers (Hinze et al. 2015), which, remarkably, were used a quarter of a century ago to derive a divergence time estimate of 1,400,000 ± 450,000 years. A corollary implication is that the differences that distinguish G. arboreum from G. herbaceum did not arise during agricultural times, but instead were present in their respective ancestors.

29 in total

1. Analysis of a contiguous 211 kb sequence in diploid wheat (Triticum monococcum L.) reveals multiple mechanisms of genome evolution.

Authors: T Wicker; N Stein; L Albar; C Feuillet; E Schlagenhauf; B Keller
Journal: Plant J Date: 2001-05 Impact factor: 6.417

Review 2. Phylogenetic insights into the pace and pattern of plant genome size evolution.

Authors: C E Grover; J S Hawkins; J F Wendel
Journal: Genome Dyn Date: 2008

3. The paleontology of intergene retrotransposons of maize.

Authors: P SanMiguel; B S Gaut; A Tikhonov; Y Nakajima; J L Bennetzen
Journal: Nat Genet Date: 1998-09 Impact factor: 38.330

4. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads.

Authors: Petr Novák; Pavel Neumann; Jiří Pech; Jaroslav Steinhaisl; Jiří Macas
Journal: Bioinformatics Date: 2013-02-01 Impact factor: 6.937

Review 5. A bountiful harvest: genomic insights into crop domestication phenotypes.

Authors: Kenneth M Olsen; Jonathan F Wendel
Journal: Annu Rev Plant Biol Date: 2013-02-28 Impact factor: 26.379

6. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data.

Authors: Petr Novák; Pavel Neumann; Jirí Macas
Journal: BMC Bioinformatics Date: 2010-07-15 Impact factor: 3.169

7. Genome sequence of the cultivated cotton Gossypium arboreum.

Authors: Fuguang Li; Guangyi Fan; Kunbo Wang; Fengming Sun; Youlu Yuan; Guoli Song; Qin Li; Zhiying Ma; Cairui Lu; Changsong Zou; Wenbin Chen; Xinming Liang; Haihong Shang; Weiqing Liu; Chengcheng Shi; Guanghui Xiao; Caiyun Gou; Wuwei Ye; Xun Xu; Xueyan Zhang; Hengling Wei; Zhifang Li; Guiyin Zhang; Junyi Wang; Kun Liu; Russell J Kohel; Richard G Percy; John Z Yu; Yu-Xian Zhu; Jun Wang; Shuxun Yu
Journal: Nat Genet Date: 2014-05-18 Impact factor: 38.330

8. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula.

Authors: Jirí Macas; Pavel Neumann; Alice Navrátilová
Journal: BMC Genomics Date: 2007-11-21 Impact factor: 3.969

9. Global repeat discovery and estimation of genomic copy number in a large, complex genome using a high-throughput 454 sequence survey.

Authors: Kankshita Swaminathan; Kranthi Varala; Matthew E Hudson
Journal: BMC Genomics Date: 2007-05-24 Impact factor: 3.969

10. Insights into the evolution of cotton diploids and polyploids from whole-genome re-sequencing.

Authors: Justin T Page; Mark D Huynh; Zach S Liechty; Kara Grupp; David Stelly; Amanda M Hulse; Hamid Ashrafi; Allen Van Deynze; Jonathan F Wendel; Joshua A Udall
Journal: G3 (Bethesda) Date: 2013-10-03 Impact factor: 3.154

16 in total

1. Genome-Wide Mining of MYB Transcription Factors in the Anthocyanin Biosynthesis Pathway of Gossypium Hirsutum.

Authors: Yingjie Zhu; Ying Bao
Journal: Biochem Genet Date: 2021-01-27 Impact factor: 1.890

2. Nucleotide Evolution, Domestication Selection, and Genetic Relationships of Chloroplast Genomes in the Economically Important Crop Genus Gossypium.

Authors: Tong Zhou; Ning Wang; Yuan Wang; Xian-Liang Zhang; Bao-Guo Li; Wei Li; Jun-Ji Su; Cai-Xiang Wang; Ai Zhang; Xiong-Feng Ma; Zhong-Hu Li
Journal: Front Plant Sci Date: 2022-04-15 Impact factor: 6.627

3. Genome-Wide Identification and Characterization of CPR5 Genes in Gossypium Reveals Their Potential Role in Trichome Development.

Authors: Heng Wang; Muhammad Jawad Umer; Fang Liu; Xiaoyan Cai; Jie Zheng; Yanchao Xu; Yuqing Hou; Zhongli Zhou
Journal: Front Genet Date: 2022-06-08 Impact factor: 4.772

4. The Gossypium stocksii genome as a novel resource for cotton improvement.

Authors: Corrinne E Grover; Daojun Yuan; Mark A Arick; Emma R Miller; Guanjing Hu; Daniel G Peterson; Jonathan F Wendel; Joshua A Udall
Journal: G3 (Bethesda) Date: 2021-04-19 Impact factor: 3.154

5. Assessment of Gene Flow Between Gossypium hirsutum and G. herbaceum: Evidence of Unreduced Gametes in the Diploid Progenitor.

Authors: E Montes; O Coriton; F Eber; V Huteau; J M Lacape; C Reinhardt; D Marais; J L Hofs; A M Chèvre; C Pannetier
Journal: G3 (Bethesda) Date: 2017-07-05 Impact factor: 3.154

6. GBS Mapping and Analysis of Genes Conserved between Gossypium tomentosum and Gossypium hirsutum Cotton Cultivars that Respond to Drought Stress at the Seedling Stage of the BC₂F₂ Generation.

Authors: Richard Odongo Magwanga; Pu Lu; Joy Nyangasi Kirungu; Latyr Diouf; Qi Dong; Yangguang Hu; Xiaoyan Cai; Yanchao Xu; Yuqing Hou; Zhongli Zhou; Xingxing Wang; Kunbo Wang; Fang Liu
Journal: Int J Mol Sci Date: 2018-05-30 Impact factor: 5.923

7. Core cis-element variation confers subgenome-biased expression of a transcription factor that functions in cotton fiber elongation.

Authors: Bo Zhao; Jun-Feng Cao; Guan-Jing Hu; Zhi-Wen Chen; Lu-Yao Wang; Xiao-Xia Shangguan; Ling-Jian Wang; Ying-Bo Mao; Tian-Zhen Zhang; Jonathan F Wendel; Xiao-Ya Chen
Journal: New Phytol Date: 2018-02-21 Impact factor: 10.151

8. Insights into the Evolution of the New World Diploid Cottons (Gossypium, Subgenus Houzingenia) Based on Genome Sequencing.

Authors: Corrinne E Grover; Mark A Arick; Adam Thrash; Justin L Conover; William S Sanders; Daniel G Peterson; James E Frelichowski; Jodi A Scheffler; Brian E Scheffler; Jonathan F Wendel
Journal: Genome Biol Evol Date: 2019-01-01 Impact factor: 3.416

9. Comparative Genomics of an Unusual Biogeographic Disjunction in the Cotton Tribe (Gossypieae) Yields Insights into Genome Downsizing.

Authors: Corrinne E Grover; Mark A Arick; Justin L Conover; Adam Thrash; Guanjing Hu; William S Sanders; Chuan-Yu Hsu; Rubab Zahra Naqvi; Muhammad Farooq; Xiaochong Li; Lei Gong; Joann Mudge; Thiruvarangan Ramaraj; Joshua A Udall; Daniel G Peterson; Jonathan F Wendel
Journal: Genome Biol Evol Date: 2017-12-01 Impact factor: 3.416

10. Genome-Wide Analysis Elucidates the Role of CONSTANS-like Genes in Stress Responses of Cotton.

Authors: Wenqiang Qin; Ya Yu; Yuying Jin; Xindong Wang; Ji Liu; Jianping Xi; Zhi Li; Huiqin Li; Ge Zhao; Wei Hu; Chuanjia Chen; Fuguang Li; Zhaoen Yang
Journal: Int J Mol Sci Date: 2018-09-07 Impact factor: 5.923