Geoffrey M Attardo, Adly M M Abd-Alla, Alvaro Acosta-Serrano, James E Allen, Rosemary Bateta, Joshua B Benoit, Kostas Bourtzis, Jelle Caers, Guy Caljon, Mikkel B Christensen, David W Farrow, Markus Friedrich, Aurélie Hua-Van, Emily C Jennings, Denis M Larkin, Daniel Lawson, Michael J Lehane, Vasileios P Lenis, Ernesto Lowy-Gallego, Rosaline W Macharia, Anna R Malacrida, Heather G Marco, Daniel Masiga, Gareth L Maslen, Irina Matetovici, Richard P Meisel, Irene Meki, Veronika Michalkova, Wolfgang J Miller, Patrick Minx, Paul O Mireji, Lino Ometto, Andrew G Parker, Rita Rio, Clair Rose, Andrew J Rosendale, Omar Rota-Stabelli, Grazia Savini, Liliane Schoofs, Francesca Scolari, Martin T Swain, Peter Takáč, Chad Tomlinson, George Tsiamis, Jan Van Den Abbeele, Aurelien Vigneron, Jingwen Wang, Wesley C Warren, Robert M Waterhouse, Matthew T Weirauch, Brian L Weiss, Richard K Wilson, Xin Zhao, Serap Aksoy.
Abstract
BACKGROUND: Tsetse flies (Glossina sp.) are the vectors of human and animal trypanosomiasis throughout sub-Saharan Africa. Tsetse flies are distinguished from other Diptera by unique adaptations, including lactation and the birthing of live young (obligate viviparity), a vertebrate blood-specific diet by both sexes, and obligate bacterial symbiosis. This work describes the comparative analysis of six Glossina genomes representing three sub-genera: Morsitans (G. morsitans morsitans, G. pallidipes, G. austeni), Palpalis (G. palpalis, G. fuscipes), and Fusca (G. brevipalpis), which represent different habitats, host preferences, and vectorial capacity.
Keywords: Disease; Hematophagy; Lactation; Neglected; Symbiosis; Trypanosomiasis; Tsetse
Year: 2019 PMID: 31477173 PMCID: PMC6721284 DOI: 10.1186/s13059-019-1768-2
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Fig. 1 Geographic distribution, ecology, and vectorial capacity of sequenced Glossina species. Visual representation of the geographic distribution of the sequenced Glossina species across the African continent. Ecological preferences and vectorial capacities are described for each associated group
Glossina species contig and scaffold assembly statistics
| Scaffold length | | | | | | |
|---|---|---|---|---|---|---|
| Total genomic coverage | 100× | 46× | 50× | 52× | 58× | 45× |
| Genome size (Mb) | 366 | 357 | 370 | 374 | 380 | 315 |
| > 1 Mb | 13 | 102 | 78 | 70 | 63 | 81 |
| 250 kb–1 Mb | 138 | 248 | 316 | 393 | 395 | 202 |
| 100–250 kb | 605 | 184 | 248 | 330 | 326 | 136 |
| 10–100 kb | 3663 | 290 | 379 | 496 | 709 | 257 |
| 5–10 kb | 737 | 106 | 94 | 165 | 507 | 85 |
| 2–5 kb | 1933 | 255 | 206 | 252 | 978 | 156 |
| < 2 kb | 6718 | 541 | 884 | 689 | 948 | 734 |
| Total no. of contigs | 24,071 | 7275 | 18,748 | 13,688 | 31,320 | 16,993 |
| N50 contig length (kb) | 49 | 167 | 46 | 64 | 24 | 62 |
| Total no. of Scaffolds | 13,807 | 1726 | 2205 | 2395 | 3926 | 1651 |
| GC content (%) | 33 | 35 | 34 | 34 | 34 | 27 |
| N50 scaffold length (kb) | 120 | 1038 | 812 | 561 | 575 | 1209 |
| L50 (rank of N50 scaffold) | 569 | 94 | 115 | 178 | 186 | 62 |
| Repeat content (%) | 34.95 | 35.49 | 38.64 | 37.09 | 35.49 | 37.67 |
N50 is defined as the minimum contig length needed to cover 50% of the genome. L50 is defined as the smallest number of contigs whose summed length makes up half of the genome size.
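The N50 and L50 definitions above can be illustrated with a minimal sketch. It assumes the lengths are sorted from longest to shortest and that the genome size defaults to the summed length of the sequences; the function name `n50_l50` and the toy lengths are hypothetical and are not taken from the assemblies in the table.

```python
def n50_l50(lengths, genome_size=None):
    """Return (N50, L50) for a list of contig or scaffold lengths.

    Assumption: if genome_size is not given, the summed length of all
    sequences is used as the genome size.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    ordered = sorted(lengths, reverse=True)  # longest sequence first
    total = genome_size if genome_size is not None else sum(ordered)
    half = total / 2
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= half:
            # length is the N50; rank is the L50 (number of sequences used)
            return length, rank
    raise ValueError("lengths do not cover half of genome_size")


# Toy example with made-up lengths (arbitrary units).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50, l50 = n50_l50(toy_lengths)
print(n50, l50)
```

With these toy values the total is 300, so the two longest sequences (80 + 70) already reach the halfway point of 150.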
Quantification of Glossina gene predictions and genomic completeness by Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis
| Species | Complete BUSCOs (%) | Complete and single-copy BUSCOs (%) | Complete and duplicated BUSCOs (%) | Fragmented BUSCOs (%) | Missing BUSCOs (%) | Total BUSCO groups searched (%) |
|---|---|---|---|---|---|---|
| BUSCO gene analysis results (percentage) (diptera_odb9 geneset) | | | | | | |
| | 93.53 | 88.00 | 5.54 | 3.22 | 3.25 | 100.00 |
| | 95.53 | 90.78 | 4.75 | 2.72 | 1.75 | 100.00 |
| | 97.11 | 93.00 | 4.11 | 2.18 | 0.71 | 100.00 |
| | 96.50 | 91.14 | 5.36 | 2.32 | 1.18 | 100.00 |
| | 95.00 | 87.53 | 7.47 | 3.32 | 1.68 | 100.00 |
| | 95.14 | 89.03 | 6.11 | 2.97 | 1.89 | 100.00 |
| BUSCO genomic analysis results (percentage) (diptera_odb9 geneset) | | | | | | |
| | 92.03 | 91.25 | 0.79 | 3.32 | 4.64 | 100.00 |
| | 98.43 | 97.36 | 1.07 | 1.07 | 0.50 | 100.00 |
| | 98.07 | 97.18 | 0.89 | 1.25 | 0.68 | 100.00 |
| | 98.32 | 97.21 | 1.11 | 1.18 | 0.50 | 100.00 |
| | 97.07 | 92.85 | 4.22 | 1.86 | 1.07 | 100.00 |
| | 97.96 | 97.11 | 0.86 | 1.25 | 0.79 | 100.00 |
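The BUSCO categories are nested: complete BUSCOs are the sum of the single-copy and duplicated classes, and complete, fragmented, and missing BUSCOs together account for all groups searched. The sketch below checks that the percentages in the table above respect those two identities. The function name and the 0.02 tolerance (to absorb two-decimal rounding) are assumptions made for illustration.

```python
def busco_row_is_consistent(complete, single, duplicated, fragmented, missing, tol=0.02):
    """Check the two BUSCO identities within a rounding tolerance (assumed 0.02)."""
    parts_ok = abs(complete - (single + duplicated)) <= tol
    total_ok = abs(complete + fragmented + missing - 100.0) <= tol
    return parts_ok and total_ok


# Rows copied from the table: (complete, single-copy, duplicated, fragmented, missing).
gene_rows = [
    (93.53, 88.00, 5.54, 3.22, 3.25),
    (95.53, 90.78, 4.75, 2.72, 1.75),
    (97.11, 93.00, 4.11, 2.18, 0.71),
    (96.50, 91.14, 5.36, 2.32, 1.18),
    (95.00, 87.53, 7.47, 3.32, 1.68),
    (95.14, 89.03, 6.11, 2.97, 1.89),
]
genomic_rows = [
    (92.03, 91.25, 0.79, 3.32, 4.64),
    (98.43, 97.36, 1.07, 1.07, 0.50),
    (98.07, 97.18, 0.89, 1.25, 0.68),
    (98.32, 97.21, 1.11, 1.18, 0.50),
    (97.07, 92.85, 4.22, 1.86, 1.07),
    (97.96, 97.11, 0.86, 1.25, 0.79),
]

all_consistent = all(busco_row_is_consistent(*row) for row in gene_rows + genomic_rows)
print(all_consistent)
```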
Fig. 2 Comparative analysis of repetitive elements within the Glossina genomes. a Graphical representation of the constitution and sequence coverage by the various classes of identified dispersed repetitive elements. b Coverage of TE families that are shared between species. More than 75% of the total coverage (first eight magnified bars) corresponds to TEs that are either specific to one species, shared by all species, or shared by the five closest species. c Relative constitution of DNA terminal inverted repeat (TIR) families across the Glossina genomes. d Relative constitution of long interspersed nuclear elements (LINEs) across the Glossina genomes. For c and d, the size of the pie charts reflects the proportion of the subclass among the dispersed repetitive sequences
Fig. 3 Glossina whole-genome alignment, phylogenetic analysis of orthologous protein-coding nuclear genes, and phylogenetic analysis of mitochondrial sequences. a Analysis of whole-genome and protein-coding sequence alignment. The left graph reflects the percentage of total genomic sequence aligning to the G. m. morsitans reference. The right side of the graph represents the alignment of all predicted coding sequences from the genomes with coloration representing matches, mismatches, insertions, and uncovered exons. b Phylogenetic tree from conserved protein-coding sequences. Black dots at nodes indicate full support from maximum likelihood (RAxML), Bayesian (PhyloBayes), and coalescent-aware (ASTRAL) analyses. RAxML and PhyloBayes analyses are based on an amino acid dataset of 117,782 positions from 286 genes from 12 species. The ASTRAL analyses are based on a 1125-nucleotide dataset of 478,617 positions from the 6 Glossina (full trees are in Additional file 2: Figure S2A-C). The values at nodes represent the bootstrap supports and posterior probabilities from the maximum likelihood and Bayesian analyses, respectively (Bootstrap/posterior probability). c Molecular phylogeny derived from whole mitochondrial genome sequences. The analysis was performed using the maximum likelihood method with MEGA 6.0
Fig. 4 Visualization of syntenic block analysis data and predicted Muller element sizes. Level of syntenic conservation between tsetse scaffolds and Drosophila chromosomal structures (Muller elements). The color-coded concentric circles consisting of bars represent the percent of syntenic conservation of orthologous protein-coding gene sequences between the Glossina genomic scaffolds and Drosophila Muller elements. Each bar represents 250 kb of aligned sequence, and bar heights represent the percent of syntenic conservation. The graphs on the periphery of the circle illustrate the combined predicted length and number of genes associated with the Muller elements for each tsetse species. The thin darkly colored bars represent the number of 1:1 orthologs between each Glossina species and D. melanogaster. The thicker lightly colored bands represent the predicted length of each Muller element for each species. This was calculated as the sum of the lengths of all scaffolds mapped to those Muller elements
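The predicted Muller element length described in this caption is a per-element sum of scaffold lengths. A minimal sketch of that aggregation follows; the function name, scaffold identifiers, lengths, and element assignments are all hypothetical and serve only to show the calculation, not the pipeline used in the study.

```python
from collections import defaultdict


def muller_element_lengths(scaffold_lengths, scaffold_to_element):
    """Sum the lengths of all scaffolds mapped to each Muller element.

    Scaffolds without an element assignment are ignored.
    """
    totals = defaultdict(int)
    for scaffold, element in scaffold_to_element.items():
        totals[element] += scaffold_lengths[scaffold]
    return dict(totals)


# Hypothetical scaffold lengths in base pairs.
scaffold_lengths = {"s1": 1_200_000, "s2": 800_000, "s3": 500_000, "s4": 300_000}
# Hypothetical assignments; s4 is unmapped and therefore not counted.
scaffold_to_element = {"s1": "A", "s2": "A", "s3": "B"}

predicted = muller_element_lengths(scaffold_lengths, scaffold_to_element)
print(predicted)
```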
Fig. 5 Homology map of the Wolbachia-derived cytoplasmic and horizontal transfer-derived nuclear sequences. Circular map of the G. austeni Wolbachia horizontal transfer-derived genomic sequences (wGau, blue), the D. melanogaster Wolbachia cytoplasmic genome sequence (wMel, green), the G. m. morsitans Wolbachia cytoplasmic genome sequence (wGmm, red), and the Wolbachia-derived chromosomal insertions A and B from G. m. morsitans (wGmm insertion A and insertion B, yellow and light yellow, respectively). The outermost circle represents the scale in kbp. Contigs for the wGau sequences, wGmm, and the chromosomal insertions A and B in G. m. morsitans are represented as boxes. Regions of homology between the G. austeni insertions and the other sequences are represented by orange ribbons. Black ribbons represent syntenic regions between the wGau insertions and the cytoplasmic genomes of wGmm and wMel
Fig. 6 Constituent analysis of Glossina-associated gene orthology groups. Visualization of the relative constitution of orthology groups containing Glossina gene sequences. Combined bar heights represent the combined orthogroups associated with each Glossina species. The bars are color-coded to reflect the level of phylogenetic representation of clusters of orthogroups at the order, sub-order, genus, sub-genus, and species. Saturated bars represent orthology groups specific and universal to a phylogenetic level. Desaturated bars represent orthogroups specific to a phylogenetic level but lack universal representation across all included species. Gene ontology analysis of specific and universal groups can be found in Additional file 1: Table S7
Fig. 7 Sub-genus-specific gene family expansions/contractions. Principal component analysis-based clustering of gene orthology groups showing significant differences in the number of representative sequences between the six Glossina species. Orthology groups included have sub-genus-specific expansions/contractions as determined by the CAFE test (p value < 0.05). Groups highlighted in the manuscript are enclosed within boxes in the figure. An alternative version of the figure labeled with the orthology group IDs is provided in Additional file 2: Figure S11. These data are also available in table form in Additional file 1: Table S8, Additional file 7, and Additional file 8
Fig. 8 Heat map of counts of Glossina homologs to Drosophila immune genes. A plot of immune gene families showing variance greater than 1 in the number of genes per species. Numbers within the cells represent the counts of sequences per species within immune gene orthology groups. Orthology groups included in the analysis contain Drosophila genes with the “Immune System Process” GO tag (GO:0002376). The gene families are clustered by similarity in variance as determined by Pearson correlation. The bar graphs on the right side of the figure represent the average ratio of synonymous to non-synonymous changes across orthologous sequences within each immune gene family
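The selection rule in this caption (families whose per-species gene counts have a variance greater than 1) can be sketched as a simple filter. The caption does not say whether sample or population variance was used; the sketch assumes sample variance via `statistics.variance`. The family names and counts are hypothetical.

```python
from statistics import variance


def variable_families(counts_by_family, threshold=1.0):
    """Keep families whose per-species counts have sample variance > threshold.

    Assumption: sample variance (n - 1 denominator); the source does not
    specify which estimator was used.
    """
    return {
        family: counts
        for family, counts in counts_by_family.items()
        if variance(counts) > threshold
    }


# Hypothetical gene counts for six species per family.
toy_counts = {
    "famA": [2, 2, 2, 2, 2, 2],  # no variation
    "famB": [1, 2, 1, 2, 1, 2],  # small variation
    "famC": [1, 4, 2, 6, 3, 1],  # large variation
}

selected = variable_families(toy_counts)
print(sorted(selected))
```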
Fig. 9 Conservation of synteny, sequence homology, and stage/sex-specific expression of tsetse milk proteins between species. Overview of the conservation of tsetse milk protein genes and their expression patterns in males and non-lactating and lactating females. a Syntenic analysis of gene structure/conservation in the mgp2-10 genetic locus across Glossina species. b Phylogenetic analysis of orthologs from the mgp2-10 gene family. c Combined sex- and stage-specific RNA-seq analysis of relative gene expression of the 12 milk protein gene orthologs in males and non-lactating and lactating females of 5 Glossina species. d Visualization of fold change in individual milk protein gene orthologs across 5 species between lactating and non-lactating female flies. Gene sequence substitution rates are listed for each set of orthologous sequences. e Comparative enrichment analysis of differentially expressed genes between non-lactating and lactating female flies
Fig. 10 Comparative analysis of Glossina male accessory gland (MAG) protein family memberships. Graphical representation of the evolutionary rate and gene number variability in male accessory protein genes across Glossina species. a Average ratio of synonymous to non-synonymous changes in male reproductive-associated genes relative to the entire genome. Error bars represent the standard error of the mean, and the asterisks represent a p value < 0.001. b The number of putative gene sequences across the Glossina genus orthologous and paralogous to characterized MAG genes from G. m. morsitans. The genes are categorized by their functional classes as derived by orthology to characterized proteins from Drosophila and other insects. The functional classes include Novel: tsetse-specific genes; OBPs: odorant-binding proteins; peptidase: proteins with peptidase/-like functions; Unk.: proteins with orthologs in other insects that lack functional characterization
Fig. 11 5′Nuc/apyrase salivary gene family organization and sequence features across Glossina species. a Chromosomal organization of the 5′Nuc/apyrase family orthologs on genome scaffolds from the six Glossina species. The brown gene annotations represent 5′Nuc gene orthologs; the purple gene annotations represent sgp3 gene orthologs; and the blue gene annotations represent an apyrase-like encoding gene. The broken rectangular bars on the G. brevipalpis scaffold indicate that the sequence could not be determined due to poor sequence/assembly quality. b Schematic representation of sgp3 gene structure in tsetse species. The K(.) denotes a repetition of a lysine (K) and another amino acid (glutamic acid, glycine, alanine, serine, asparagine, or arginine). The green oval represents a repetitive motif found in the Morsitans sub-genus; the red oval represents a repetitive motif found in the Palpalis group. The dashed line indicates that a partial motif is present. For each of the two motifs, the consensus sequence is shown on the right as a sequence logo. The poor sequence/assembly quality of the G. brevipalpis scaffold prevented inclusion of this ortholog in the analysis
Fig. 12 Phylogenetic and sequence divergence analysis of Glossina vision-associated proteins. Phylogenetic and sequence conservation analysis of the vision-associated Rhodopsin G-protein coupled receptor genes in Glossina and orthologous sequences in other insects. a Phylogenetic analysis of Rhodopsin protein sequences. b Pairwise analysis of sequence divergence between M. domestica and Glossina species and within the Glossina genus