Victoria G Twort, Alice B Dennis, Duckchul Park, Kathryn F Lomas, Richard D Newcomb, Thomas R Buckley.
Abstract
Animal reproductive proteins, especially those in the seminal fluid, have been shown to have higher levels of divergence than non-reproductive proteins and often evolve adaptively. Seminal fluid proteins have been implicated in the formation of reproductive barriers between diverging lineages, and hence represent interesting candidates underlying speciation. RNA-seq was used to generate the first male reproductive transcriptomes for the New Zealand tree weta species Hemideina thoracica and H. crassidens. We identified 865 putative reproductive-associated proteins across both species, encompassing a diverse range of functional classes. Candidate gene sequencing of nine genes across three Hemideina and two Deinacrida species suggests that H. thoracica has the highest levels of intraspecific genetic diversity. Non-monophyly was observed in the majority of sequenced genes, indicating either that gene flow may be occurring between the species or that reciprocal monophyly at these loci has yet to be attained. Evidence for positive selection was found for one lectin-related reproductive protein, with an overall omega of 7.65 and one site in particular being under strong positive selection. This candidate gene represents the first step in the identification of proteins underlying the evolutionary basis of weta reproduction and speciation.
Year: 2017 PMID: 29131842 PMCID: PMC5683631 DOI: 10.1371/journal.pone.0188147
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Distribution of biological function annotation of two Hemideina transcriptomes.
Green bars: H. crassidens and blue bars: H. thoracica.
The 20 most encountered InterPro accessions present in two Hemideina transcriptomes.
| InterPro Entry | InterPro Description | Number of Contigs | InterPro Entry | InterPro Description | Number of Contigs |
|---|---|---|---|---|---|
| IPR012336 | Thioredoxin-like fold | 23 | IPR016187 | C-type lectin fold | 17 |
| IPR011991 | Winged helix-turn-helix DNA-binding | 18 | IPR016186 | C-type lectin-like/link domain | 16 |
| IPR029071 | Ubiquitin-related domain | 17 | IPR001304 | C-type lectin-like | 14 |
| IPR027417 | P-loop containing nucleoside triphosphate | 17 | IPR012336 | Thioredoxin-like fold | 13 |
| IPR000504 | RNA recognition motif domain | 15 | IPR001254 | Serine proteases, trypsin domain | 13 |
| IPR000626 | Ubiquitin domain | 15 | IPR009003 | Peptidase S1, PA clan | 13 |
| IPR016187 | C-type lectin fold | 12 | IPR011991 | Winged helix-turn-helix DNA-binding | 10 |
| IPR010987 | Glutathione S-transferase, C-terminal-like | 12 | IPR011992 | EF-hand-like domain | 9 |
| IPR014756 | Immunoglobulin E-set | 11 | IPR023796 | Serpin domain | 8 |
| IPR015943 | WD40/YVTN repeat-like-containing domain | 11 | IPR027417 | P-loop containing nucleoside triphosphate | 8 |
| IPR016186 | C-type lectin-like/link domain | 11 | IPR029277 | Single domain Von Willebrand factor type C domain | 8 |
| IPR011992 | EF-hand domain pair | 10 | IPR013783 | Immunoglobulin-like fold | 7 |
| IPR004045 | Glutathione S-transferase, N-terminal | 10 | IPR008037 | Pacifastin domain | 7 |
| IPR000477 | Reverse transcriptase domain | 10 | IPR000477 | Reverse transcriptase domain | 7 |
| IPR032675 | Leucine-rich repeat domain, L domain-like | 9 | IPR002048 | EF-hand domain | 7 |
| IPR008991 | Translation protein SH3-like domain | 9 | IPR029071 | Ubiquitin-related domain | 7 |
| IPR005203 | Hemocyanin, C-terminal | 9 | IPR016040 | NAD(P)-binding domain | 7 |
| IPR001304 | C-type lectin-like | 9 | IPR000626 | Ubiquitin domain | 6 |
| IPR016040 | NAD(P)-binding domain | 8 | IPR007110 | Immunoglobulin-like domain | 6 |
| IPR013766 | Thioredoxin domain | 7 | IPR013766 | Thioredoxin domain | 6 |
Candidate genes for downstream evolutionary analysis.
| Category | Gene | Annotation | Contig ID | Contig ID | AA % identity | Nuc % identity |
|---|---|---|---|---|---|---|
| General metabolic controls | | Cytochrome oxidase subunit I | contig01355 | contig00784 | 80.2 | 81.3 |
| | | Elongation factor 1 delta | refmapcontig00958 | contig00992 | 99.2 | 97.5 |
| Reproductive proteins | | Serine protease snake-like | contig02289 | contig01653 | 90 | 94.1 |
| | | Testis specific ser/thr kinase | contig01847 | — | — | — |
| | | Sperm flagellar protein | contig01892 | — | — | — |
| | | Accessory gland protein | refmapcontig01086 | contig00834 | 93.5 | 98.1 |
| | | Accessory gland protein | refmapcontig00651 | contig01386 | 90.4 | 97.3 |
| | | Accessory gland protein | refmapcontig00312 | contig01836 | 88.5 | 90.8 |
| | | N/A | refmapcontig00670 | contig1351 | 93.8 | 93.7 |
AA, amino acid
Nuc, nucleotide
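The record does not say how the percent identities were computed. As a minimal sketch, assuming two sequences that are already aligned to the same length and that columns containing a gap in either sequence are skipped (both assumptions, as is the function name), pairwise identity could be calculated like this:

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Hypothetical helper: the gap-handling rule is an assumption, not the
    authors' stated method.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

The same function works for amino acid or nucleotide alignments, since it only compares characters column by column.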
Fig 2. Maximum likelihood phylogeny constructed using mitochondrial cytochrome oxidase subunit I (COI) DNA sequences from individuals representing three Hemideina and one Deinacrida species.
Bootstrap support values greater than 0.5 are indicated. Scale bar represents the number of substitutions per site.
Fig 3. Haplotype network of A) . Circles represent different haplotypes, with each circle's area proportional to the frequency of that haplotype. Lines between haplotypes represent mutational steps between sequences. Empty circles represent inferred unsampled haplotypes. Colours correspond to species: Red, H. crassidens; Blue, H. thoracica; Yellow, H. trewicki.
Fig 4. Haplotype network of A) . Circles represent different haplotypes, with each circle's area proportional to the frequency of that haplotype. Lines between haplotypes represent mutational steps between sequences. Empty circles represent inferred unsampled haplotypes. Colours correspond to species: Red, H. crassidens; Blue, H. thoracica; Yellow, H. trewicki.
Summary statistics of intra-specific sequence variation within three Hemideina species.
| Category | Gene | Species | N | n | s | H | Hd | Eta | S | π | k | πs | πa | πa/πs | S | NS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Reproductive proteins | | | 18 | 36 | 423 | 13 | 0.798 | 19 | 19 | 0.005 | 2.175 | 0.008 | 0.004 | 0.5 | 8 | 11 |
| | | | 10 | 20 | 423 | 5 | 0.768 | 8 | 8 | 0.008 | 3.711 | 0.118 | 0.007 | 0.0593 | 3 | 5 |
| | | | 5 | 10 | 423 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 19 | 38 | 438 | 9 | 0.680 | 8 | 7 | 0.005 | 2.241 | 0.015 | 0.002 | 0.133 | 5 | 3 |
| | | | 11 | 22 | 438 | 3 | 0.255 | 3 | 3 | 0.001 | 0.355 | 0.002 | 0.000 | 0.000 | 2 | 1 |
| | | | 5 | 10 | 438 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 19 | 38 | 333 | 3 | 0.619 | 2 | 2 | 0.002 | 0.747 | 0.010 | 0.000 | 0.000 | 2 | 0 |
| | | | 11 | 22 | 333 | 5 | 0.797 | 5 | 5 | 0.005 | 1.662 | 0.014 | 0.002 | 0.142 | 3 | 2 |
| | | | 5 | 10 | 333 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 18 | 36 | 258 | 5 | 0.470 | 4 | 4 | 0.002 | 0.527 | 0.000 | 0.003 | — | 0 | 4 |
| | | | 11 | 22 | 258 | 5 | 0.727 | 12 | 11 | 0.020 | 4.762 | 0.031 | 0.017 | 0.548 | 4 | 8 |
| | | | 5 | 10 | 234 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 19 | 38 | 237 | 3 | 0.437 | 3 | 3 | 0.005 | 1.24 | 0.017 | 0.002 | 0.118 | 2 | 1 |
| | | | 11 | 22 | 237 | 3 | 0.177 | 3 | 3 | 0.001 | 0.355 | 0.005 | 0.000 | 0.000 | 2 | 1 |
| | | | 5 | 10 | 237 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 18 | 36 | 459 | 6 | 0.763 | 5 | 5 | 0.002 | 1.111 | 0.011 | 0.000 | 0.000 | 5 | 0 |
| | | | 10 | 20 | 459 | 4 | 0.363 | 3 | 1 | 0.001 | 0.363 | 0.004 | 0.000 | 0.000 | 3 | 0 |
| | | | 5 | 10 | 459 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 17 | 34 | 372 | 9 | 0.813 | 8 | 7 | 0.004 | 1.303 | 0.005 | 0.003 | 0.600 | 3 | 5 |
| | | | 11 | 22 | 372 | 3 | 0.279 | 2 | 2 | 0.001 | 0.458 | 0.000 | 0.002 | — | 0 | 2 |
| | | | 5 | 10 | 372 | 3 | 0.378 | 7 | 7 | 0.004 | 1.556 | 0.008 | 0.003 | 0.375 | 3 | 4 |
| General metabolic controls | | | 19 | 38 | 363 | 2 | 0.341 | 1 | 1 | 0.001 | 0.341 | 0.000 | 0.001 | — | 0 | 1 |
| | | | 11 | 22 | 363 | 5 | 0.644 | 8 | 8 | 0.004 | 1.608 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 5 | 10 | 363 | 1 | 0.000 | 0 | 0 | 0.000 | 0.000 | 0.000 | 0.000 | — | 0 | 0 |
| | | | 18 | 18 | 672 | 17 | 0.993 | 134 | 119 | 0.050 | 34.22 | — | — | — | — | — |
| | | | 10 | 10 | 672 | 9 | 0.978 | 60 | 59 | 0.024 | 16.33 | — | — | — | — | — |
| | | | 5 | 5 | 672 | 2 | 0.400 | 1 | 1 | 0.001 | 0.400 | — | — | — | — | — |
N, number of individuals; n, number of alleles; s, number of sites; H, number of haplotypes; Hd, haplotype diversity; Eta, number of mutations; S (first S column), number of segregating sites; π, nucleotide diversity; k, average number of nucleotide differences; πs, nucleotide diversity at synonymous sites; πa, nucleotide diversity at nonsynonymous sites; S (second S column), total number of synonymous substitutions; NS, total number of nonsynonymous substitutions
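To make several of these definitions concrete, here is a minimal sketch of how H, Hd, S (segregating sites), k and π could be computed from a set of aligned allele sequences. It assumes equal-length sequences with no gaps or ambiguity codes and uses the common unbiased estimator for haplotype diversity; the record does not name the software the authors used, and the function name is illustrative.

```python
from collections import Counter
from itertools import combinations


def summary_stats(seqs: list[str]) -> dict[str, float]:
    """Basic intraspecific diversity statistics for aligned sequences.

    Assumes no gaps or ambiguity codes (a simplification made for this sketch).
    """
    n = len(seqs)
    if n < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")

    # S: alignment columns with more than one distinct state
    segregating = sum(1 for column in zip(*seqs) if len(set(column)) > 1)

    # k: mean number of differences over all sequence pairs
    pair_diffs = [
        sum(a != b for a, b in zip(x, y)) for x, y in combinations(seqs, 2)
    ]
    k = sum(pair_diffs) / len(pair_diffs)

    # pi: k expressed per site
    pi = k / length

    # H and Hd: distinct haplotypes and their diversity, n/(n-1) * (1 - sum p_i^2)
    counts = Counter(seqs)
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

    return {"n": n, "s": length, "H": len(counts), "Hd": hd,
            "S": segregating, "k": k, "pi": pi}
```

Statistics that depend on codon position (πs, πa, and the synonymous and nonsynonymous counts) need a reading frame and a genetic code, so they are left out of this sketch.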
Tajima’s D test results for three Hemideina gene datasets.
| Gene | Species | D-statistic |
|---|---|---|
| | | 2.175 |
| | | -1.759 |
| | | — |
| | | 0.511 |
| | | -1.471 |
| | | — |
| | | 0.629 |
| | | — |
| | | — |
| | | -1.111 |
| | | 1.564 |
| | | — |
| | | 0.637 |
| | | -1.471 |
| | | — |
| | | -0.240 |
| | | -1.529 |
| | | — |
| | | -0.913 |
| | | -0.440 |
| | | -1.573 |
*, p-value < 0.05.
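For readers unfamiliar with the statistic, the sketch below computes Tajima's D from the number of sequences, the number of segregating sites, and the average number of pairwise differences, following the standard formula of Tajima (1989). The function name is illustrative, and the code does not attempt the significance test marked by the asterisk; the record does not state which program produced the values in the table.

```python
import math


def tajimas_d(n: int, seg_sites: int, k: float) -> float:
    """Tajima's D from sample size n, segregating sites S, and mean pairwise differences k."""
    if n < 4:
        raise ValueError("D is only meaningful for n >= 4 sequences")
    if seg_sites <= 0:
        raise ValueError("D is undefined when there are no segregating sites")

    # Harmonic-type sums over 1..n-1
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / (i * i) for i in range(1, n))

    # Constants for the variance of (k - S/a1)
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / (a1 * a1)
    e1 = c1 / a1
    e2 = c2 / (a1 * a1 + a2)

    # Difference between the two estimators of theta, scaled by its standard deviation
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (k - seg_sites / a1) / math.sqrt(variance)
```

Datasets with no segregating sites raise an error here, which corresponds to the rows where no D-statistic is reported.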
Likelihood ratio tests of positive selection using PAML site-specific models.
| Gene (#sequences) | dN/dS | 2Δl (M0:M3) | 2Δl (M1a:M2a) | 2Δl (M8:M8a) | Positive Selection (%) |
|---|---|---|---|---|---|
| | 1.11 | 10.62* | 7.65 | 7.65* | 6.8 (1.4) |
| | 0.51 | 10.57* | 0.00 | 1.20 | 11–47 (0–47) |
| | 0.72 | 1.22 | 0.17 | 0.17 | 4.4–16 (0) |
dN/dS is the average omega (ω) across all sites and branches calculated under M0. 2Δl, twice the difference between the log likelihoods of the two nested site-specific models implemented in PAML, is given for each model comparison (M0:M3; M1a:M2a; M8:M8a). Models are judged to have a significantly better fit (* = P-value < 0.05) based on the χ² distribution with degrees of freedom equal to the difference in the number of parameters between models (M0:M3 = 4; M1a:M2a = 4; M8:M8a = 50:50 mixture of a point mass at 0 and χ² with 1 degree of freedom).
Parameters indicating positive selection are in bold. Percent positive selection indicates the proportion of sites across the gene predicted to have experienced positive selection, while the percentage in brackets represents the proportion of sites identified with a >95% probability.
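The significance calls in this table follow from comparing each 2Δl value with a χ² reference distribution. The sketch below shows one way to do that with only the standard library, using closed-form upper-tail probabilities for 1 degree of freedom and for even degrees of freedom. The function names are illustrative, and the degrees of freedom are those stated in the footnote rather than values derived here.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable.

    Only df = 1 and even df are supported, which covers the tests in this table.
    """
    if x < 0:
        raise ValueError("test statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df > 0 and df % 2 == 0:
        half = x / 2.0
        term = 1.0
        total = 1.0
        # exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    raise ValueError("this sketch supports only df = 1 or even df")


def lrt_pvalue(two_delta_l: float, df: int) -> float:
    """P-value for a likelihood ratio test against a plain chi-square distribution."""
    return chi2_sf(two_delta_l, df)


def lrt_pvalue_mixture(two_delta_l: float) -> float:
    """P-value under a 50:50 mixture of a point mass at 0 and chi-square with 1 df."""
    if two_delta_l <= 0:
        return 1.0
    return 0.5 * chi2_sf(two_delta_l, 1)
```

With the values in the first row, `lrt_pvalue(10.62, 4)` and `lrt_pvalue_mixture(7.65)` both fall below 0.05, while `lrt_pvalue(7.65, 4)` does not, which agrees with where the asterisks appear.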
Fig 5. Positive selection within Acp3.
Red line represents the mean posterior omega, and the blue line represents the probability of each codon being under positive selection. The values were calculated using a Bayes Empirical Bayes analysis under the M8 site-specific model in PAML. Codon position based on full-length alignment. The annotations identified using InterProScan and the position of the 24 bp indel are shown in relation to codon position.