Literature DB >> 35196080

Convergent consequences of parthenogenesis on stick insect genomes.

Kamil S Jaron1,2,3, Darren J Parker1,2, Yoann Anselmetti4, Patrick Tran Van1,2, Jens Bast1, Zoé Dumas1, Emeric Figuet4, Clémentine M François4, Keith Hayward1,2, Victor Rossier1,2, Paul Simion4, Marc Robinson-Rechavi1,2, Nicolas Galtier4, Tanja Schwander1.   

Abstract

The shift from sexual reproduction to parthenogenesis has occurred repeatedly in animals, but how the loss of sex affects genome evolution remains poorly understood. We generated reference genomes for five independently evolved parthenogenetic species in the stick insect genus Timema and their closest sexual relatives. Using these references and population genomic data, we show that parthenogenesis results in an extreme reduction of heterozygosity and often leads to genetically uniform populations. We also find evidence for less effective positive selection in parthenogenetic species, suggesting that sex is ubiquitous in natural populations because it facilitates fast rates of adaptation. Parthenogenetic species did not show increased transposable element (TE) accumulation, likely because there is little TE activity in the genus. By using replicated sexual-parthenogenetic comparisons, our study reveals how the absence of sex affects genome evolution in natural populations, providing empirical support for the negative consequences of parthenogenesis as predicted by theory.

Entities:  

Mesh:

Substances:

Year:  2022        PMID: 35196080      PMCID: PMC8865771          DOI: 10.1126/sciadv.abg3842

Source DB:  PubMed          Journal:  Sci Adv        ISSN: 2375-2548            Impact factor:   14.136


INTRODUCTION

Sex: What is it good for? The reason why most eukaryotes take a complicated detour to reproduction, when more straightforward options, such as parthenogenesis, are available, remains a central and largely unanswered question in evolutionary biology (, ). Animal species in which parthenogenetic reproduction is the sole form of replication typically occur at the tips of phylogenies, and only a few of them have succeeded as well as their sexually reproducing relatives (). In other words, most parthenogenetic lineages may eventually be destined for extinction. These incipient evolutionary failures, however, are invaluable as by understanding their fate something may be learned about the adaptive value of sex. Parthenogenesis is thought to be favored in the short term because it generates a transmission advantage (, ), as well as the advantage of assured reproduction when mates are scarce (, ). The short-term benefits of parthenogenesis, however, are believed to come along with long-term costs. For example, the physical linkage between loci it entails can generate interferences that decrease the efficacy of natural selection [e.g., (–), reviewed in ()]. This is expected to translate into reduced rates of adaptation and increased accumulation of mildly deleterious mutations, which may potentially drive the extinction of parthenogenetic lineages. In addition to these predicted effects on adaptation and mutation accumulation, parthenogenesis is expected to drive major aspects of genome evolution. A classical prediction is that heterozygosity (i.e., intraindividual polymorphism) increases over time in the absence of recombination, as the two haploid genomes diverge independently of each other, generating the so-called “Meselson effect” (, ). Parthenogenesis can also affect the dynamics of transposable elements (TEs), resulting in either increased or decreased genomic TE loads (, ). Last, some forms of parthenogenesis might facilitate the generation and maintenance of structural variants (SVs), which in sexuals are counter-selected due to the constraints of properly pairing homologous chromosomes during meiosis (). We tested these predictions by comparing the genomes of five independently derived parthenogenetic stick insect species in the genus Timema with their close sexual relatives (Fig. 1). These replicate comparisons allowed us to solve the key problem in understanding the consequences of parthenogenesis for genome evolution: separating the consequences of parthenogenesis from lineage-specific effects (). Timema are wingless, plant-feeding insects endemic to western North America. Parthenogenetic species in this genus are diploid and of nonhybrid origin () and ecologically similar to their sexual relatives. Previous research, based on a small number of microsatellite markers, has suggested that oogenesis in parthenogenetic Timema is functionally mitotic, as no loss of heterozygosity between females and their offspring was detected ().
Fig. 1.

Multiple, independent transitions from sexual to parthenogenetic reproduction are known in the genus Timema , each representing a biological replicate of parthenogenesis, and with a close sexual relative at hand for comparison.

() (A) Phylogenetic relationships of Timema species [adapted from (, )]. (B) Species sequenced in this study. Photos taken by Bart Zijlstra (www.bartzijlstra.com).

Multiple, independent transitions from sexual to parthenogenetic reproduction are known in the genus Timema , each representing a biological replicate of parthenogenesis, and with a close sexual relative at hand for comparison.

() (A) Phylogenetic relationships of Timema species [adapted from (, )]. (B) Species sequenced in this study. Photos taken by Bart Zijlstra (www.bartzijlstra.com).

RESULTS AND DISCUSSION

De novo genome assemblies reveal extremely low heterozygosity in parthenogenetic stick insects

We generated 10 de novo genomes of Timema stick insects, from 5 parthenogenetic and 5 sexual species (Fig. 1 and tables S1 and S2). Genomes were subjected to quality control, screened for contamination, and annotated (see Methods and Supplementary Text). The final reference genomes were largely haploid, spanned 75 to 95% of the estimated genome size [1.38 gigabase pairs (Gbp); ()], and were sufficiently complete for downstream analyses, as shown by the count of single copy orthologs conserved across insects [96% of BUSCO genes () detected on average; table S3]. A phylogeny based on a conservative set of 3975 1:1 orthologous genes (data S1) corroborated published phylogenies and molecular divergence estimates in the Timema genus (fig. S1). Last, we identified 55 candidate horizontal gene transfers (HGTs) from nonmetazoans. Using long reads, we were able to corroborate six of the eight HGT candidates present in one species (T. douglasi; see Supplementary Text), indicating that most of our HGT candidates are likely true HGTs rather than false positives. The two remaining HGT candidates in T. douglasi were most likely misassembled chimeric contigs. All 55 candidate HGTs occurred early on in Timema evolutionary history, well before the evolution of parthenogenesis in the genus (see Supplementary Text). We estimated genome-wide nucleotide heterozygosity in each reference genome directly from sequencing reads, using a reference-free technique [genome profiling analysis ()]. These analyses revealed extreme heterozygosity differences between the sexual and parthenogenetic species. The five sexual Timema featured nucleotide heterozygosities within the range previously observed in other sexual species [Fig. 2; ()]. The heterozygosities in the parthenogenetic species were substantially lower, and so low that reference-free analyses could not distinguish heterozygosity from sequencing error (see Supplementary Text). We therefore compared heterozygosity between sexuals and parthenogens by calling single-nucleotide polymorphisms (SNPs) in five resequenced individuals per species. This analysis corroborated the finding that parthenogens have extremely low (<10−5) heterozygosity, being at least 140 times lower than that found in their sexual sister species [permutation analysis of variance (ANOVA), reproductive mode effect P = 0.0049; Fig. 2]. Screening for SVs (indels, tandem duplications, and inversions) in sexual and parthenogenetic individuals revealed the same pattern: extensive and variable heterozygosity in sexual species and homozygosity in the parthenogens (see Fig. 2 and Supplementary Text). Some heterozygosity in Timema parthenogens could be present in genomic regions not represented in our assemblies, such as centromeric and telomeric regions. These regions, however, represent a relatively small fraction of the total genome, meaning that, for most of the genome at least, Timema parthenogens are either largely or completely homozygous for all types of variants (see Supplementary Text).
Fig. 2.

Extremely low heterozygosity in parthenogenetic Timema species for different types of variants.

(A) Nucleotide heterozygosity represented by bars indicates genome-wide estimates for the reference individuals (based on raw reads, see Methods); heterozygosity based on SNP calls in resequenced individuals is indicated by points and represents a conservative estimation of heterozygosity in the assembled genome portions (with error bars indicating the range of estimates across individuals). (B) Heterozygous SVs (reported as number of heterozygous SVs/number of callable sites) in resequenced individuals (with error bars indicating the range of estimates across individuals). Note that although heterozygous SNPs and SVs were called using stringent parameters, it is likely that a large portion are false positives in parthenogenetic Timema (see Supplementary Text).

Extremely low heterozygosity in parthenogenetic Timema species for different types of variants.

(A) Nucleotide heterozygosity represented by bars indicates genome-wide estimates for the reference individuals (based on raw reads, see Methods); heterozygosity based on SNP calls in resequenced individuals is indicated by points and represents a conservative estimation of heterozygosity in the assembled genome portions (with error bars indicating the range of estimates across individuals). (B) Heterozygous SVs (reported as number of heterozygous SVs/number of callable sites) in resequenced individuals (with error bars indicating the range of estimates across individuals). Note that although heterozygous SNPs and SVs were called using stringent parameters, it is likely that a large portion are false positives in parthenogenetic Timema (see Supplementary Text). We are able to demonstrate that extremely low heterozygosity is a convergent consequence of parthenogenesis in Timema. However, while extremely low heterozygosity is the rule for Timema parthenogens, this is not the case for all parthenogenetic species. For example, we have previously shown that some parthenogenetic animal species, particularly those of hybrid origin, are characterized by relatively high heterozygosity levels (). This indicates that the consequences of parthenogenesis for heterozygosity are likely to be lineage-specific. The unexpected finding of extremely low heterozygosity in Timema parthenogens raises the question of when and how heterozygosity was lost. For example, the bulk of heterozygosity could have been lost during the transition from sexual reproduction to obligate parthenogenesis (). This would be the case if functionally mitotic parthenogenesis was derived from automictic parthenogenesis, meaning that recombination and meiosis take place, and diploidy is restored secondarily, for example, via fusion of two of the four meiotic products. Similar to inbreeding, automictic parthenogenesis thus results in the rapid loss of heterozygosity over time (). Automixis can then be co-opted into functionally mitotic parthenogenesis via recombination suppression, and by solely fusing meiotic products separated during meiosis I and not meiosis II (). Alternatively, heterozygosity loss in Timema parthenogens could be a continuous and ongoing process. To distinguish these options, we investigated the origin of the genetic variation present among different homozygous genotypes in each parthenogenetic species. We found that only 6 to 19% of the SNPs called in a parthenogen are at positions that are also polymorphic in the sexual relative (table S4). This means that most of the variation in parthenogens likely results from mutations that appeared after the split from the sexual lineage. Although it is possible that some of the parthenogen-specific polymorphisms may represent ancestral polymorphisms that were purged in the sexual species, it is unlikely that this would be the case for most SNPs in parthenogens. This implies that heterozygosity generated through new mutations is most likely lost continuously in parthenogens and was not solely lost at the inception of parthenogenesis. The most likely explanation for these findings is that parthenogenetic Timema are not functionally mitotic but automictic. Formally distinguishing between automixis and functionally mitotic parthenogenesis with gene conversion will require cell biological data, which is currently not available for Timema. It is important to note that the key theoretical predictions regarding the consequences of sex do not change between automixis or functionally mitotic parthenogenesis. Thus, although automixis can allow for the purging of heterozygous deleterious mutations (), the classical predictions for the long-term costs of asexuality extend to automictic parthenogens because, as for obligate selfers, linkage among genes is still much stronger than in classical sexual species (). This is especially the case in largely homozygous parthenogens, where recombination and segregation, even if mechanistically present, have no effect on genotype diversities. Functional mitosis in Timema parthenogens was previously inferred from the inheritance of heterozygous microsatellite genotypes between females and their offspring (), a technique widely used in nonmodel organisms with no cytological data available [e.g., (, )]. The most likely reconciliation of these results with our finding of extreme homozygosity is that heterozygosity is maintained in only a small portion of the genome, for example, the centromeres or telomeres, or between paralogs. Consistent with this idea, we were unable to locate several of the microsatellite-containing regions in even the best Timema genome assemblies (see Supplementary Text), suggesting that these regions are not present in our assemblies due to the inherent difficulty of assembling repetitive genome regions from short read data ().

Extensive variation in genotype diversity between parthenogenetic populations

Parthenogenesis and sexual reproduction are expected to drive notably different distributions of polymorphisms in genomes and populations. Within genomes, different regions experience different types of selection with sometimes opposite effects on the levels of polymorphisms within populations, such as purifying versus balancing selection (). The increased linkage among genes in parthenogenetic as compared to sexual species is expected to homogenize diversity levels across different genome regions. At the species level, parthenogens are expected to have reduced genetic diversities relative to sexual species because of likely bottlenecks occurring during the origin of parthenogenesis. At the population level, recurrent sweeps of specific genotypes in parthenogenetic populations can lead to extremely low genetic diversity and even to the fixation of a single genotype, while sweeps in sexual populations typically reduce diversity only in specific genome regions. To address these aspects in the genomes of sexual and parthenogenetic Timema species, we mapped population-level variation for the SNPs and SVs inferred above to our species-specific reference genomes. We then anchored our reference genome scaffolds to the 12 autosomal linkage groups of a previously published assembly of the sexual species T. cristinae [v1.3 from (); see Supplementary Text]. This revealed that different types of polymorphisms (SNPs and SVs) tended to co-occur across the genomes in all species, independently of reproductive mode (Fig. 3).
Fig. 3.

Population polymorphism levels in parthenogenetic (blue) and sexual (red) Timema species.

(A) Phylogenies based on 1:1 orthologous genes reflect the different levels of genotype diversities in parthenogenetic Timema species. (B) Distribution of SVs (dark blue and red) and SNPs (light blue and orange) along the genome. Scaffolds from the 10 de novo genomes are anchored on autosomal linkage groups from the sexual species T. cristinae (see Supplementary Text).

Population polymorphism levels in parthenogenetic (blue) and sexual (red) Timema species.

(A) Phylogenies based on 1:1 orthologous genes reflect the different levels of genotype diversities in parthenogenetic Timema species. (B) Distribution of SVs (dark blue and red) and SNPs (light blue and orange) along the genome. Scaffolds from the 10 de novo genomes are anchored on autosomal linkage groups from the sexual species T. cristinae (see Supplementary Text). The focal population for three of the five parthenogenetic species (T. genevievae, T. tahoe, and T. shepardi) consisted largely of a single genotype with only minor variation among individuals. By contrast, genotype diversity was considerable in T. monikensis and T. douglasi (Fig. 3A). In the former species, there was further a conspicuous diversity peak on LG8, supporting the idea that parthenogenesis is automictic in Timema. Under complete linkage (functionally mitotic parthenogenesis), putative effects of selection on this LG would be expected to propagate to the whole genome. Independently of local diversity peaks, overall diversity levels in T. monikensis and T. douglasi were comparable to the diversities in populations of some of the sexual Timema species (Fig. 3A). Different mechanisms could contribute to such unexpected diversities in parthenogenetic Timema, including the presence of lineages that derived independently from their sexual ancestor, or rare sex. While a single transition to parthenogenesis is believed to have occurred in T. monikensis, the nominal species T. douglasi is polyphyletic and known to consist of independently derived clonal lineages. These lineages have broadly different geographic distributions but can overlap locally (). Identifying the causes of genotypic variation in these species, including the possibility of rare sex, requires further investigation and is a challenge for future studies. Independently of the mechanisms underlying polymorphism in the parthenogenetic species T. monikensis, the polymorphism peak on LG8 is notable (Fig. 3B). This peak occurs in a region previously shown to determine color morph [green, green-striped, or brownish (“melanistic”)] in the sexual sister species of T. monikensis, T. cristinae (). Our focal T. monikensis population features four discrete color morphs (green, dark brown, yellow, and beige), suggesting that additional color morphs may be regulated by the region identified in T. cristinae. We also found a peak in polymorphism on LG8, spanning over approximately two-thirds of LG8, in the sexual species T. californicum, which features a different panel of color morphs than T. cristinae (). This diversity peak in T. californicum was generated by the presence of two divergent haplotypes (approximately 24 Mbp long), with gray individuals homozygous for one haplotype and green individuals heterozygous or homozygous for the alternative haplotype (see Supplementary Text). Note that the gray color morph is not known in the monomorphic green parthenogenetic sister of T. californicum (T. shepardi), and we therefore do not expect the same pattern of polymorphism on LG8 in this species.

Faster rate of adaptive evolution in sexual than parthenogenetic species

We have shown previously that parthenogenetic Timema species accumulate deleterious mutations faster than sexual species (, ), a pattern also reported in other parthenogenetic taxa [reviewed in (, )]. This is expected given that increased linkage among loci in parthenogens reduces the ability of selection to act individually on each locus, which generates different forms of selective interference (, , ). In addition to facilitating the accumulation of deleterious mutations, selective interference among loci in parthenogens should also constrain the efficiency of positive selection. While there is accumulating evidence for this process in experimental evolution studies [e.g., (–)], its impact on natural populations remains unclear (, ). To compare the efficiency of positive selection in sexual and parthenogenetic Timema, we used a branch-site model on the gene trees [see () and Methods]. We compared the terminal branches leading to sexual or parthenogenetic species in one-to-one orthologous genes identified in at least three species pairs (data S1), using a threshold of q < 0.05 to classify which terminal branches show evidence of positive selection. We found a greater number of positively selected genes in sexual than parthenogenetic species [binomial generalized linear mixed model (GLMM) P = 0.005; Fig. 4]. In addition, we also examined if there was more evidence for positive selection in sexual species in a threshold-free way by comparing the likelihood ratio test statistic between parthenogenetic and sexual species. This confirmed that the evidence for positive selection was stronger for sexual species (permutation glm P = 0.011).
Fig. 4.

Number of genes showing evidence for positive selection in each species (total number of genes = 7155).

In addition to reproductive mode, species pair also had a significant influence on the number of positively selected branches (binomial GLMM P = 0.015). There was no significant interaction between species pair and reproductive mode (P = 0.197). Note that the difference between reproductive modes is robust to a more stringent cutoff (q < 0.01 instead of 0.05; fig. S2A) and if genes with polymorphic positively selected sites were excluded (fig. S2B).

Number of genes showing evidence for positive selection in each species (total number of genes = 7155).

In addition to reproductive mode, species pair also had a significant influence on the number of positively selected branches (binomial GLMM P = 0.015). There was no significant interaction between species pair and reproductive mode (P = 0.197). Note that the difference between reproductive modes is robust to a more stringent cutoff (q < 0.01 instead of 0.05; fig. S2A) and if genes with polymorphic positively selected sites were excluded (fig. S2B). The positively selected genes that we identified are most likely associated with species-specific adaptations. Few of them were shared between species, with overlap between species not greater than expected by chance (false discovery rate < 0.4; fig. S3), and there was little enrichment of functional processes in positively selected genes [0 to 19 Gene Ontology (GO) terms per species; table S5]. Most of the significant GO terms were associated with positively selected genes in parthenogenetic Timema (table S5), likely because a much smaller proportion of positively selected genes in sexual species had annotations (fig. S4). We speculate that positively selected genes in sexuals could often be involved in sexual selection and species recognition. Genes associated with processes such as pheromone production and reception often evolve very fast, which makes them difficult to annotate through homology-based inference (). For the parthenogenetic species, although some terms could be associated with their mode of reproduction (e.g., GO:0033206 meiotic cytokinesis in T. douglasi), most are not clearly linked to a parthenogenetic life cycle.

TE content is similar between species with sexual and parthenogenetic reproduction

Upon the loss of sexual reproduction, TE dynamics are expected to change (, ). How these changes affect genome-wide TE loads is, however, unclear as sex can facilitate both the spread and the elimination of TEs (). In parthenogens, TE load might initially increase as a result of weaker purifying selection, a pattern well illustrated by the accumulation of TEs in nonrecombining parts of sex chromosomes and other supergenes (, ). However, TE loads in parthenogens are expected to decrease over time via at least two nonmutually exclusive mechanisms. First, TEs are expected to evolve lower activity over time as their evolutionary interests are aligned with their hosts (, ). Second, TE copies that were purged via excision can recolonize a sexual but not a parthenogenetic genomic background (). Last, it is important to note that the predicted effects of reproductive mode on genomic TE loads require sufficient levels of TE activity (transposition or excision). If TE activity is very low, TE contents will remain relatively stable in closely related species independently of their reproductive mode. We generated a Timema genus-level TE library by merging de novo TE libraries generated separately for each of the 10 Timema species. We then quantified TE content in each Timema genome by mapping reads to this merged library (see Methods). The overall TE content was very similar in all 10 species (20 to 23.6%), with significant differences in abundance of TE superfamilies between species groups but no significant effect of reproductive mode (P = 0.43; Fig. 5 and fig. S5).
Fig. 5.

Total TE abundance in the 10 Timema species.

TE abundance in each genome is expressed as the fraction of reads that map to a genus-level TE library (replicates within species correspond to the reference genome and three to five resequenced individuals). TE families are named following the Wicker classification (). The first character corresponds to the TE class [class I are retrotransposons (R); class II are DNA transposons (D)], the second character corresponds to the order (e.g., LTR), and the third character corresponds to the superfamily (e.g., Gypsy); for example, RLG is a Gypsy retroelement. The character X indicates unknown classification at the superfamily level (because of fragmentation or lack of detectable homology).

Total TE abundance in the 10 Timema species.

TE abundance in each genome is expressed as the fraction of reads that map to a genus-level TE library (replicates within species correspond to the reference genome and three to five resequenced individuals). TE families are named following the Wicker classification (). The first character corresponds to the TE class [class I are retrotransposons (R); class II are DNA transposons (D)], the second character corresponds to the order (e.g., LTR), and the third character corresponds to the superfamily (e.g., Gypsy); for example, RLG is a Gypsy retroelement. The character X indicates unknown classification at the superfamily level (because of fragmentation or lack of detectable homology). The oldest node in our Timema phylogeny has an age estimate of 30 million years (), but the overall TE contents of the two clades separating at this node only differ by 1.3%. This suggests that there has been little TE activity during the evolution of the genus or that transposition and excision rates are in balance. TE sequence divergence landscapes further revealed that there were few recently duplicated TEs in the Timema genomes, again suggesting little recent TE activity (see Supplementary Text and fig. S13). Although additional studies are required to formally distinguish between low TE activity and balanced transposition and excision rates, our results suggest that TE content likely evolves too slowly in Timema for any putative reproductive mode effects to become apparent. Low activity of TEs might facilitate the persistence of incipient parthenogenetic strains () and thus help to explain the high frequency of established parthenogenetic species in Timema. In conclusion, we present genomes of five independently derived parthenogenetic lineages of Timema stick insects, together with their five sexual sister species. This design with replicated species pairs allows us to disentangle consequences of parthenogenesis from species-specific effects. All parthenogenetic Timema species are largely or completely homozygous for both SNPs and SVs, and frequently feature lower levels of population polymorphism than their close sexual relatives. Low population polymorphism can exacerbate the effects of linkage for reducing the efficacy of selection, resulting in reduced rates of positive selection in parthenogenetic Timema, in addition to the accumulation of deleterious mutations previously documented (). Despite these negative genomic consequences, parthenogenesis is an unusually successful strategy in Timema. It evolved and persisted repeatedly in the genus, and parthenogenetic species often occur across large geographic areas. Because Timema are wingless and their populations are subjected to frequent extinction-recolonization dynamics in their fire-prone Californian shrubland habitats, the genomic costs of parthenogenesis are likely offset by one of the most classical benefits of parthenogenesis: the ability to reproduce without a mate.

METHODS

Sample collection and sequencing

For each of the 10 species, the DNA for Illumina shotgun sequencing was derived from virgin adult females collected in 2015 from natural populations in California (table S1). Extractions were done using the Qiagen MagAttract HMW DNA Kit, following the manufacturer’s indications. Five polymerase chain reaction (PCR)–free libraries were generated for each reference genome (three 2× 125-bp paired-end libraries with average insert sizes of 350, 550, and 700 bp, respectively, and two mate-pair libraries with 3000- and 5000-bp insert sizes, respectively); one library (550-bp insert size) was generated for each resequenced individual. Libraries were prepared using the Illumina TruSeq DNA PCR-Free or Nextera Mate Pair Library Prep Kits, following the manufacturer’s instructions, and sequenced on the Illumina HiSeq 2500 system, using v4 chemistry and 2× 125-bp reads at FASTERIS SA, Plan-les-Ouates, Switzerland and the Lausanne Genomic Technologies Facility, Switzerland.

Genome assembly and annotation

The total coverage for the reference genomes (all libraries combined) ranged between 37× and 45× (table S2). Trimmed paired-end reads were assembled into contigs using ABySS () and further scaffolded using paired-end and mate pairs using BESST (). Scaffolds identified as contaminants were filtered using BlobTools (). The assembly details can be found in the Supplementary Materials (see Supplementary Text). Publicly available RNA sequencing libraries for Timema (, , ) were used as expression evidence for annotation. Trimmed reads were assembled using Trinity v2.5.1 () to produce reference-guided transcriptomes. The transcriptomes and protein evidence were combined with ab initio gene finders to predict protein coding genes using MAKER v2.31.8 (). The annotation details can be found in the Supplementary Materials (see Supplementary Text).

Orthologs

Timema orthologous groups (OGs) were inferred with the OrthoDB standalone pipeline (v. 2.4.4) using default parameters (). In short, genes are clustered with a graph-based approach based on all best reciprocal hits between each pair of genomes. The high level of fragmentation typical for Illumina-based genomes constrains the ability to identify 1:1 orthologs across all 10 Timema species. To maximize the number of single-copy OGs covering all 10 Timema species, transcriptomes were included during orthology inference. Thus, transcripts were used to complete OGs in the absence of a gene from the corresponding species. Using this approach, 7157 single-copy OGs covering at least three sexual-parthenogenetic sister species pairs were obtained (data S1).

Horizontal gene transfers

To detect HGT from nonmetazoan species, we first used the pipeline of foreign sequence detection developed by Francois et al. (). We used the set of coding DNA sequences (CDS) identified in publicly available transcriptomes () and the genome assemblies before the decontamination procedure with BlobTools (). The rationale is that some genuine HGT could have been wrongly considered as contaminant sequences during this decontamination step and thus been removed from the assembly. Scaffolds filtered during decontamination are available from our github repository (https://github.com/AsexGenomeEvol/Timema_asex_genomes/tree/main/4_Horizontal_Gene_Transfers/contamination_sequences) and will be archived upon acceptance. Briefly, a DIAMOND BlastP (v0.8.33) () allows us to detect candidate nonmetazoan genes in the set of CDS of each species. Taxonomic assignment is based on the 10 best blast hits to account for potential contaminations and other sources of taxonomic misassignment using a curated reference database designed to cover all domains of life [see Francois et al. () for details]. Candidate nonmetazoan sequences are then subjected to a synteny-based screen with Gmap (v2016-11-07) () to discriminate between contaminant sequences and potential HGT-derived sequences. A sequence is considered as an HGT candidate if it is physically linked to (i.e., mapped to the same scaffold as) at least one “confident-arthropod” CDS (previously identified in the DIAMOND blast). We then clustered all HGT candidates identified in each of the 10 Timema species into HGT families using Silix (v1.2.10) (), requiring a minimum of 85% identity (default parameters otherwise). These HGT families were then “completed” as much as possible by adding homologs from the genome assemblies not identified as HGT candidates (this could occur if the corresponding sequences are fragmented or on short scaffolds for example). To this end, the longest sequence of each HGT family was mapped (using Gmap) on the genomic scaffolds of all species, requiring a minimum of 85% identity. For each completed HGT family, a protein alignment of the candidate HGT sequence(s) and its (their) 50 best DIAMOND blastP hits in the reference database (first step of the pipeline) was generated with MAFFT (v7) (). The alignments were cleaned using HMMcleaner (stringency parameter = 12) (), and sites with more than 50% missing data were removed. Phylogenetic trees were inferred using RAxML (v8.2) () with the model “PROTGAMMALGX” of amino acid substitution and 100 bootstrap replicates. Phylogenetic trees were inspected by eye to confirm whether an evolutionary history was consistent with the hypothesis of HGT.

Heterozygosity

Genome-wide nucleotide heterozygosity was estimated using genome profiling analysis of raw reads from the reference genomes using GenomeScope (v2) (). The second SNP-based heterozygosity estimate was generated using resequenced individuals. We resequenced five individuals per species, but three individuals of T. shepardi, two individuals of T. poppensis, and one T. tahoe individual did not pass quality control and were discarded from all downstream analyses. SNP calling was based on the Genome Analysis Toolkit best practices pipeline (). We used a conservative set of SNPs with quality scores ≥300 and supported by 15× coverage in at least one of the individuals. SNP heterozygosity was then estimated as the number of heterozygous SNPs divided by the number of callable sites in each genome. Because of stringent filtering criteria, our SNP-based heterozygosity is an underestimation of genome-wide heterozygosity.

Structural variants

We used Manta (v1.5.0) (), a diploid-aware pipeline for SV calling, in the same set of resequenced individuals used for SNP heterozygosity estimates. We found a high frequency of heterozygous SVs with approximately twice the expected coverage (fig. S6), which likely represent false positives. To reduce the number of false positives, we filtered very short SVs (30 bases or less) and kept only variant calls that had either split read or paired-end read support within the expected coverage range, where the coverage range was defined individually for each sample by manual inspection of coverage distributions. The filtered SV calls were subsequently merged into population SV calls using SURVIVOR (v1.0.2) (). The merging criteria were SV calls of the same type on the same strand with breakpoint distances shorter than 100 bp.

Genome alignment

We anchored our genome assemblies to the reference of T. cristinae (BioProject Accession PRJNA417530) () using MUMmer (version 4.0.0beta2) () with parameter --mum. The alignments were processed by other tools within the package: show-coords with parameters -THrcl to generate tab-delimited alignment files and dnadiff to generate one-to-one alignments. We used only uniquely anchored scaffolds for which we were able to map at least 10,000 nucleotides to the T. cristinae reference genome.

Transposable elements

For each species, specific repeat libraries were constructed and annotated to the TE superfamily level (), wherever possible. For collecting repetitive sequences, we used a raw read-based approach DNAPipeTE v1.2 () with parameters -genome_coverage 0.5 -sample_number 4 and respective species genome size, as well as an assembly-based approach (RepeatModeler v1.0.8 available at www.repeatmasker.org/RepeatModeler/), such that repeats not present in the assembly can still be represented in the repeat library. The two raw libraries were merged and clustered by 95% identity (the TE family threshold) using usearch v10.0.240 () with the centroid option. To annotate TEs larger than 500 bp in the repeat library, we used an approach that combines homology and structural evidence [PASTEClassifier ()]. Because PASTEClassifier did not annotate to TE superfamily levels, we additionally compared by BlastN (v. 2.7.1+) () the repeat libraries to the well curated T. cristinae TE library from Soria-Carrasco et al. (). Blast hits were filtered according to TE classification standards: identity percentage >80%, alignment length >80 bp, and the best hit per contig was kept. The two classification outputs were compared, and in case of conflict, the classification level of PASTEClassifier was preferred. All nonannotated repeats were labeled “unknown.” Repeat library header naming was done according to RepeatMasker standard, but keeping the Wicker naming for elements (i.e., Wicker#Repeatmasker, e.g., DTA#DNA/hAT). TE libraries were sorted by header, and TE annotations to similar families were numbered consecutively. Species-specific TE libraries were merged into a genus-level Timema TE library to account for any TE families that might have not been detected in the single species assemblies. To estimate the TE content of reference genomes and resequenced individuals, we first repeat-masked the assemblies with the genus-level TE library using RepeatMasker v4.1.0 with parameters set as -gccalc -gff -u -a -xsmall -no_is -div 30 -engine rmblast (). Second, we mapped the 350-bp insert paired-end reads back to the reference genome assemblies using BWA-MEM v0.7.17 () with standard parameters. We then counted the fraction of reads mapping to TEs out of total mappable reads by counting the number of reads that mapped to each genomic location annotated as TE using htseq-counts (v0.6.1.1p1) () with parameters set to -r name -s no -t similarity -i Target --nonunique none, using the mapped read alignments and the gff output of RepeatMasker (filtered for TE length of >80 bp). TE content was compared among species using a permutation ANOVA with 5000 bootstrap replicates. To generate the TE activity landscapes, utilities scripts calcDivergenceFromAlign.pl and createRepeatLandscape_mod.pl were used on the outputs of Repeatmasker v4.1.0. The createRepeatLandscape_mod.pl was modified to match the TE families found in Timema.

Positive selection analysis

Only one-to-one orthologs in at least three pairs of species (sister-species sex-asex) were used. The species phylogeny was imposed on every gene as the “gene tree.” We used a customized version of the Selectome pipeline (). All alignment building and filtering was performed on predicted amino acid sequences, and the final amino acid MSAs (multiple sequence alignments) were used to infer the nucleotide MSAs used for positive selection inference. MSAs were obtained by MAFFT (v. 7.310) () with the allowshift option, which avoids overaligning nonhomologous regions (e.g., gene prediction errors, or alternative transcripts). All the next steps “mask” rather than remove sites, by replacing the amino acid with an “X” and the corresponding codon with “NNN.” MCoffee (v11.00.8cbe486) () was run with the following aligners: mafft_msa, muscle_msa, clustalo_msa (), and t_coffee_msa (). MCoffee provides a consistency score per amino acid, indicating how robust the alignment is at that position for that sequence. Residues with a consistency score less than five were masked. TrimAl (v. 1.4.1) () was used to mask columns with less than four residues (neither gap nor X). Following this step, 2 of the 7157 ortholog alignments consisted only of gaps and were excluded from further analyses. The branch-site model with rate variation at the DNA level () was run using the Godon software (https://bitbucket.org/Davydov/godon/, version 2020-02-17, option BSG --ncat 4). Each branch was tested iteratively, in one run per gene tree. For each branch, we obtain a ΔlnL that measures the evidence for positive selection, a corresponding P value and associated q value (estimated from the distribution of P values over all branches of all genes), and an estimate of the proportion of sites under positive selection, if any. All positive selection results, and detailed methods, are available at https://selectome.org/timema. To determine whether the number of positively selected genes differed between sexual and parthenogenetic species, we used a binomial GLMM approach [lme4 ()] with q value threshold of 0.05 or 0.01. Significance of model terms was determined with a Wald statistic. In addition, we also examined if there was more evidence for positive selection in sexual species in a threshold-free way by comparing ΔlnL values between parthenogenetic and sexual species. To do this, we used a permutation glm approach where reproductive mode (sexual or parthenogenetic) was randomly switched within a species pair. To determine whether the overlap of positively selected genes was greater than expected by chance, we used the SuperExactTest package (v. 0.99.4) () in R. The resulting P values were multiple test–corrected using Benjamini and Hochberg’s algorithm implemented in R. To assess the impact of polymorphism on our results, we repeated our analysis after excluding positively selected genes with nonsynonymous polymorphic variants at positively selected sites (those with a posterior probability of >0.95 detected by Bayes empirical Bayes analysis in Godon). Functional enrichment analyses were performed using TopGO (v. 2.28.0) () using the Drosophila melanogaster functional annotation (see Supplementary Text). To determine whether a GO term was enriched, we used a Fisher’s exact test with the “weight01” algorithm to account for the GO topology. GO terms were considered to be significantly enriched when P < 0.05.
  103 in total

1.  Search and clustering orders of magnitude faster than BLAST.

Authors:  Robert C Edgar
Journal:  Bioinformatics       Date:  2010-08-12       Impact factor: 6.937

2.  Positive feedback in the transition from sexual reproduction to parthenogenesis.

Authors:  Tanja Schwander; Séverine Vuilleumier; Janie Dubman; Bernard J Crespi
Journal:  Proc Biol Sci       Date:  2010-01-13       Impact factor: 5.349

3.  Molecular evidence for ancient asexuality in timema stick insects.

Authors:  Tanja Schwander; Lee Henry; Bernard J Crespi
Journal:  Curr Biol       Date:  2011-06-16       Impact factor: 10.834

Review 4.  Supergenes and complex phenotypes.

Authors:  Tanja Schwander; Romain Libbrecht; Laurent Keller
Journal:  Curr Biol       Date:  2014-03-31       Impact factor: 10.834

5.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

6.  AUGUSTUS: ab initio prediction of alternative transcripts.

Authors:  Mario Stanke; Oliver Keller; Irfan Gunduz; Alec Hayes; Stephan Waack; Burkhard Morgenstern
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

7.  HTSeq--a Python framework to work with high-throughput sequencing data.

Authors:  Simon Anders; Paul Theodor Pyl; Wolfgang Huber
Journal:  Bioinformatics       Date:  2014-09-25       Impact factor: 6.937

8.  Consequences of Asexuality in Natural Populations: Insights from Stick Insects.

Authors:  Jens Bast; Darren J Parker; Zoé Dumas; Kirsten M Jalvingh; Patrick Tran Van; Kamil S Jaron; Emeric Figuet; Alexander Brandt; Nicolas Galtier; Tanja Schwander
Journal:  Mol Biol Evol       Date:  2018-07-01       Impact factor: 16.240

9.  Gene finding in novel genomes.

Authors:  Ian Korf
Journal:  BMC Bioinformatics       Date:  2004-05-14       Impact factor: 3.169

10.  GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes.

Authors:  T Rhyker Ranallo-Benavidez; Kamil S Jaron; Michael C Schatz
Journal:  Nat Commun       Date:  2020-03-18       Impact factor: 17.694

View more
  1 in total

1.  Dynamics of sex-biased gene expression during development in the stick insect Timema californicum.

Authors:  Tanja Schwander; Darren James Parker; Jelisaveta Djordjevic; Zoé Dumas; Marc Robinson-Rechavi
Journal:  Heredity (Edinb)       Date:  2022-05-17       Impact factor: 3.832

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.