Victor Borges Rezende, Carlos Congrains, André Luís A Lima, Emeline Boni Campanini, Aline Minali Nakamura, Janaína Lima de Oliveira, Samira Chahad-Ehlers, Iderval Sobrinho Junior, Reinaldo Alves de Brito.
Abstract
Several fruit fly species of the Anastrepha fraterculus group are of great economic importance because of the damage they cause to a variety of fleshy fruits. Some species in this group have diverged recently, with evidence of introgression, and show similar morphological attributes that make them difficult to identify, which reinforces the relevance of finding new molecular markers that can differentiate species. We investigated genes expressed in head tissues from two closely related species, A. obliqua and A. fraterculus, aiming to identify fixed single nucleotide polymorphisms (SNPs) and highly differentiated transcripts which, considering that these species still experience some level of gene flow, could indicate candidate genes involved in their differentiation process. We generated multiple libraries from head tissues of these two species, at different reproductive stages, for both sexes. Our analyses indicate that the de novo transcriptome assemblies are fairly complete. We also produced a hybrid assembly to map each species' reads, and identified 67,470 SNPs in A. fraterculus, 39,252 in A. obliqua, and 6386 that were common to both species. We identified 164 highly differentiated unigenes that had a mean interspecific index (mean D) of at least 0.94. We selected unigenes that had Ka/Ks higher than 0.5, or that had at least three highly differentiated SNPs, as potential candidate genes for species differentiation. These candidates include proteases, regulators of redox homeostasis, and an odorant-binding protein (Obp99c), among other genes. The head transcriptomes described here enabled the identification of thousands of genes hitherto unavailable for these species, and generated a set of candidate genes that are potentially important for genetically identifying the species and for understanding speciation in the presence of gene flow between A. obliqua and A. fraterculus.
Keywords: RNA-Seq; de novo assembly; fixed SNPs; fraterculus group; next generation sequencing; positive selection
Year: 2016 PMID: 27558666 PMCID: PMC5068948 DOI: 10.1534/g3.116.030486
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.154
Figure 1. Framework for identifying candidate genes and estimating Ka/Ks in transcriptome libraries of A. fraterculus and A. obliqua. Separate pooled libraries were built from virgin (V) and postmating (PM) males and females, and from postovipositing (PO) females, with replicates. The derived reads were combined into a single assembly per species, and these assemblies were investigated for genes potentially involved in species divergence.
Summary of the sequencing effort, read cleaning and de novo assembly statistics of head transcriptomes of adult individuals from both sexes at different life stages of A. fraterculus and A. obliqua
| Statistic | Hybrid assembly | A. fraterculus | A. obliqua |
|---|---|---|---|
| Illumina Sequencing | | | |
| Total of paired-end reads | 155,940,826 | 81,781,686 | 74,159,140 |
| Filtered reads | 140,493,653 | 73,637,239 | 66,856,414 |
| Assemblies Length Distribution | | | |
| Total number of contigs | 154,787 | 112,862 | 98,549 |
| Number of Trinity components | 76,293 | 61,153 | 55,126 |
| N50 (bp) | 2012 | 2504 | 2637 |
| Contigs longer than 1000 bp | 45,602 | 38,105 | 34,982 |
| Contigs longer than 2000 bp | 22,297 | 21,054 | 19,964 |
| Contigs longer than 10,000 bp | 213 | 329 | 269 |
| Assemblies Length Statistics | | | |
| Average contig length (bp) | 1027 | 1196 | 1254 |
| Median contig length (bp) | 494 | 535 | 563 |
| Longest contig (bp) | 25,704 | 27,513 | 22,394 |
Hybrid assembly generated using reads of head tissues of A. fraterculus and A. obliqua.
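The N50 values reported in the table summarize the contig length distribution of each assembly. As a minimal sketch of how this statistic is conventionally computed (not necessarily the tool or script used for these assemblies), N50 is the length of the contig at which the contigs, sorted from longest to shortest, first account for at least half of the total assembled bases. The function name `n50` and the toy lengths are illustrative assumptions.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (in bp).

    Conventional definition: sort contigs from longest to shortest and
    return the length of the contig at which the running sum first
    reaches at least half of the total assembly length.
    """
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")


# Toy data only; these are not contig lengths from the assemblies above.
print(n50([2, 2, 2, 3, 3, 4, 8, 8]))  # prints 8
```

Because N50 weights long contigs more heavily than short ones, it can differ substantially from the average and median contig lengths listed in the same table.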
Figure 2. Gene ontology (GO) analysis of the head transcriptomes of A. fraterculus and A. obliqua. GO terms mapped at the most general level with a percentage > 0.5% are shown.
Figure 3. KOG function classification of the head transcriptomes of A. fraterculus and A. obliqua.
Figure 4. Frequency distribution of mean D. Distribution of the average allele frequency difference between A. fraterculus and A. obliqua per unigene (mean D) for 6386 SNPs across 2612 unigenes. The x-axis is in intervals of 0.05. Unigenes with the highest differentiation levels (0.94 < mean D ≤ 0.95) are shown in red.
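The per-unigene average described in this caption can be sketched as follows. This is a minimal illustration, assuming that the interspecific index of a SNP is the absolute difference between the allele frequencies observed in the two species and that a unigene's value is the plain mean over its SNPs. The function names, the input layout, and the toy frequencies are hypothetical and are not taken from the study.

```python
from collections import defaultdict


def mean_d_per_unigene(snps):
    """Average the per-SNP allele frequency difference within each unigene.

    `snps` is an iterable of (unigene_id, freq_fraterculus, freq_obliqua)
    tuples, where both frequencies refer to the same allele.
    """
    per_unigene = defaultdict(list)
    for unigene_id, freq_fra, freq_obl in snps:
        per_unigene[unigene_id].append(abs(freq_fra - freq_obl))
    return {uid: sum(ds) / len(ds) for uid, ds in per_unigene.items()}


def highly_differentiated(mean_d, threshold=0.94):
    """Keep unigenes whose mean index is at least `threshold`."""
    return {uid: d for uid, d in mean_d.items() if d >= threshold}


# Toy allele frequencies (hypothetical, for illustration only).
toy_snps = [
    ("geneA", 0.975, 0.025),
    ("geneA", 0.970, 0.020),
    ("geneB", 0.600, 0.400),
]
mean_d = mean_d_per_unigene(toy_snps)
selected = highly_differentiated(mean_d)
print(sorted(selected))  # only geneA passes the 0.94 threshold
```

The 0.94 default mirrors the cutoff quoted in the abstract; any other threshold can be passed explicitly.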
Functional enrichment of highly differentiated unigenes (mean D ≥ 0.94)
| GO ID | GO term | Main Category | Number of Genes | Fold Enrichment | P-value |
|---|---|---|---|---|---|
| GO:0005737 | Cytoplasm | CC | 64 | 1.65 | 3.23E−03 |
| GO:0044444 | Cytoplasmic part | CC | 51 | 1.76 | 9.74E−03 |
| GO:0005829 | Cytosol | CC | 18 | 2.86 | 3.81E−02 |
| GO:0005622 | Intracellular | CC | 90 | 1.35 | 3.89E−02 |
| GO:0044237 | Cellular metabolic process | BP | 70 | 1.61 | 7.82E−03 |
| GO:0006412 | Translation | BP | 16 | 3.87 | 1.13E−02 |
| GO:0006518 | Peptide metabolic process | BP | 17 | 3.58 | 1.57E−02 |
| GO:0043043 | Peptide biosynthetic process | BP | 16 | 3.77 | 1.58E−02 |
| GO:1901566 | Organonitrogen compound biosynthetic process | BP | 21 | 2.96 | 2.21E−02 |
| GO:0008152 | Metabolic process | BP | 81 | 1.47 | 2.34E−02 |
| GO:0043604 | Amide biosynthetic process | BP | 16 | 3.64 | 2.41E−02 |
| GO:0043603 | Cellular amide metabolic process | BP | 17 | 3.33 | 3.98E−02 |
CC, cellular component; BP, biological process.
P-values were corrected using a Bonferroni approach.
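For readers unfamiliar with the correction named in this footnote, the following is a minimal sketch of the standard Bonferroni adjustment: each raw p-value is multiplied by the number of tests and capped at 1. The number of tests used in the enrichment analysis is not stated here, so this sketch simply assumes it equals the number of p-values supplied; the function name and the toy values are illustrative.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values.

    Assumes the number of tests equals len(p_values); each raw p-value
    is multiplied by that count and capped at 1.0.
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Toy p-values (not from the enrichment table above).
adjusted = bonferroni([0.01, 0.04, 0.5])
print(adjusted)
```

A term is then considered significant when its adjusted p-value falls below the chosen significance level, for example 0.05.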
Highly divergent unigenes with at least three SNPs
| Unigene | Mean D | D | Amino Acid Involved | | Substitution Type |
|---|---|---|---|---|---|
| Thioredoxin reductase-1 | 0.950 | 0.950 | Arg (R) | Lys (K) | NS |
| | | 0.950 | Asn (N) | Asn (N) | S |
| | | 0.950 | Ser (S) | Leu (L) | NS |
| Maternal expression at 31B | 0.950 | 0.950 | Val (V) | Val (V) | S |
| | | 0.950 | Ile (I) | Ile (I) | S |
| | | 0.950 | Ala (A) | Ala (A) | S |
| | | 0.950 | Ala (A) | Ala (A) | S |
| Superoxide dismutase | 0.950 | 0.950 | — | — | NC |
| | | 0.950 | Thr (T) | Thr (T) | S |
| | | 0.950 | Thr (T) | Thr (T) | S |
| CG3842 | 0.950 | 0.950 | Ser (S) | Ser (S) | S |
| | | 0.950 | Gly (G) | Gly (G) | S |
| | | 0.950 | Ala (A) | Ala (A) | S |
| | | 0.950 | Arg (R) | Arg (R) | S |
| | | 0.950 | Leu (L) | Ile (I) | NS |
| CG2233 | 0.942 | 0.919 | Gly (G) | Glu (E) | NS |
| | | 0.950 | Asn (N) | Asp (D) | NS |
| | | 0.950 | Ser (S) | Ser (S) | S |
| | | 0.950 | Leu (L) | Leu (L) | S |
| CG32425 | 0.950 | 0.950 | Ile (I) | Ile (I) | S |
| | | 0.950 | Asn (N) | Asn (N) | S |
| | | 0.950 | — | — | NC |
| Flavin-containing monooxygenase 2 | 0.950 | 0.950 | Ser (S) | Thr (T) | NS |
| | | 0.950 | Glu (E) | Glu (E) | S |
| | | 0.950 | Gly (G) | Gly (G) | S |
| | | 0.950 | Val (V) | Val (V) | S |
| | | 0.950 | Asn (N) | Asn (N) | S |
| Relish | 0.949 | 0.949 | Asp (D) | Glu (E) | NS |
| | | 0.950 | Leu (L) | Leu (L) | S |
| | | 0.949 | Arg (R) | Arg (R) | S |
Mean D is the average of the interspecific index (D) between A. fraterculus and A. obliqua, calculated over all SNPs of each unigene.
NC, substitution in noncoding region; NS, nonsynonymous substitution; S, synonymous substitution.
Highly divergent unigenes evolving under positive selection in the Anastrepha branch
| Unigene | D | Amino Acid (A. fra) | Amino Acid (A. obl) | Substitution Type | Ka/Ks (A. fra vs. A. obl) | Ka/Ks (A. fra vs. C. cap) | Ka/Ks (A. obl vs. C. cap) |
|---|---|---|---|---|---|---|---|
| Ribosomal protein L24-like | 0.950 | — | — | NC | 50.0000 (0.002/4.5E−05) | 0.0188 (0.024/1.257) | 0.0168 (0.021/1.272) |
| Serine protease 6 | 0.950 | Gln | Gln | S | 1.2455 (0.023/0.019) | 0.2743 (0.601/2.189) | 0.2903 (0.607/2.092) |
| CG16817 | 0.949 | Ser | Arg | N | 0.7325 (0.007/0.009) | 0.0830 (0.152/1.835) | 0.0762 (0.151/1.988) |
| CG2219 | 0.950 | Ser | Thr | N | 0.7302 (0.009/0.013) | 0.1050 (0.235/2.239) | 0.1073 (0.289/2.697) |
| CG13367 | 0.949 | Ser | Pro | N | 0.7168 (0.002/0.003) | 0.0826 (0.168/2.034) | 0.0838 (0.167/1.997) |
| Microtubule-associated protein 205 | 0.949 | Glu | Glu | S | 0.6091 (0.018/0.030) | 0.2762 (0.321/1.162) | 0.2852 (0.325/1.139) |
| Odorant-binding protein 99c | 0.950 | Glu | Ile | N | 0.6038 (0.030/0.049) | 0.1401 (0.124/0.887) | 0.1281 (0.122/0.953) |
| CG9500 | 0.950 | Glu | Ala | N | 0.6025 (0.036/0.059) | 0.1160 (0.353/3.042) | 0.0869 (0.337/3.877) |
| | | Glu | Ala | N | | | |
| Ribonuclear protein at 97D | 0.950 | — | — | NC | 0.5627 (0.017/0.031) | 0.0243 (0.044/1.822) | 0.0225 (0.040/1.767) |
| Transferrin 3 | 0.943 | Glu | Ser | N | 0.5614 (0.007/0.012) | 0.0522 (0.088/1.684) | 0.0528 (0.088/1.674) |
| Mitochondrial ribosomal protein S2 | 0.950 | Thr | Thr | S | 0.5462 (0.010/0.017) | 0.0311 (0.083/2.676) | 0.0344 (0.083/2.425) |
D is the interspecific index between A. fraterculus and A. obliqua calculated for each transcript. Amino acid substitutions are those identified in the SNP analysis. A. fra, A. fraterculus; A. obl, A. obliqua; C. cap, C. capitata.
NC, Substitution in noncoding region; N, nonsynonymous substitution; S, synonymous substitution.
Each Ka/Ks ratio is followed, in parentheses, by the corresponding Ka and Ks values separated by a slash.
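The candidate selection rule stated in the abstract (Ka/Ks higher than 0.5, or at least three highly differentiated SNPs) can be sketched as a simple filter. This is an illustration under stated assumptions, not the authors' pipeline: the function names, the handling of Ks equal to zero, and the toy records are all hypothetical.

```python
def ka_ks_ratio(ka, ks):
    """Return Ka/Ks, or None when Ks is zero and the ratio is undefined.

    Returning None for Ks == 0 is an assumption of this sketch.
    """
    if ks == 0:
        return None
    return ka / ks


def is_candidate(ka, ks, n_diff_snps, ratio_cutoff=0.5, min_snps=3):
    """Apply the two alternative criteria for a candidate unigene."""
    ratio = ka_ks_ratio(ka, ks)
    passes_ratio = ratio is not None and ratio > ratio_cutoff
    return passes_ratio or n_diff_snps >= min_snps


# Toy records: (name, Ka, Ks, number of highly differentiated SNPs).
toy_unigenes = [
    ("toy1", 0.023, 0.019, 1),  # high Ka/Ks, few SNPs
    ("toy2", 0.010, 1.000, 4),  # low Ka/Ks, many SNPs
    ("toy3", 0.010, 1.000, 1),  # meets neither criterion
]
candidates = [name for name, ka, ks, n in toy_unigenes if is_candidate(ka, ks, n)]
print(candidates)  # toy1 and toy2 are kept
```

Ratios recomputed from the rounded Ka and Ks values printed in the table will not always reproduce the tabulated ratios exactly; for example, 0.023/0.019 is about 1.21, whereas the table lists 1.2455 for that pair, presumably because the ratio was computed from unrounded estimates.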