Nicolaas Francois Visser Burger, Anna-Maria Botha.
Abstract
Although the hemipterans (Aphididae) comprise roughly 50,000 extant insect species, only four have publicly available sequenced genomes, namely Acyrthosiphon pisum (pea aphid), Rhodnius prolixus (kissing bug), Myzus persicae (green peach aphid) and Diuraphis noxia (Russian wheat aphid). As a significant proportion of agricultural pests are phloem-feeding aphids, it is crucial for sustained global food security that the genomic and molecular functioning of this family be better understood. Recently, the genome of the US D. noxia biotype US2 was sequenced, but its assembly incorporated only ~32% of the produced reads and contained a surprisingly low gene count when compared to that of the model and first sequenced aphid, A. pisum. To this end, we present here the genomes of two South African Diuraphis noxia (Kurdjumov, Hemiptera: Aphididae) biotypes (SA1 and SAM), the only two D. noxia biotypes with documented linked genealogy. To better understand overall targets and patterns of heterozygosity, we also sequenced a pooled sample of nine geographically separated D. noxia populations (MixIX). We assembled a 399 Mb reference genome (PRJNA297165, representing 64% of the projected genome size of 623 Mb) using ±28 Gb of 101 bp paired-end HiSeq2000 reads from the D. noxia biotype SAM, whilst ±13 Gb of 101 bp paired-end HiSeq2000 reads from the D. noxia biotype SA1 were generated to facilitate genomic comparisons between the two biotypes. Sequencing the MixIX sample yielded ±26 Gb of 50 bp paired-end SOLiD reads, which facilitated SNP detection when compared to the D. noxia biotype SAM assembly. Ab initio gene calling produced a total of 31,885 protein coding genes from the assembled contigs spanning ~399 Mb (GCA_001465515.1).
Keywords: Arthropod genomics; Biotype comparison; Diuraphis noxia; Genome assembly; SNP calling; South Africa
Year: 2017 PMID: 29299110 PMCID: PMC5745598 DOI: 10.1186/s40793-017-0307-6
Source DB: PubMed Journal: Stand Genomic Sci ISSN: 1944-3277
Classification and general features of biotype SAM [22]
| MIGS ID | Property | Term | Evidence code^a |
|---|---|---|---|
| | Classification | Domain: | TAS [] |
| | | Phylum: | TAS [] |
| | | Class: | TAS [] |
| | | Order: | TAS [] |
| | | Family: | TAS [] |
| | | Genus: | TAS [] |
| | | Species: | TAS [] |
| | | (Type) strain: | TAS [] |
| | Gram stain | N/A | |
| | Cell shape | N/A | |
| | Motility | N/A | |
| | Sporulation | N/A | |
| | Temperature range | N/A | |
| | Optimum temperature | N/A | |
| | pH range; Optimum | N/A | |
| | Carbon source | N/A | |
| MIGS-6 | Habitat | N/A | |
| MIGS-6.3 | Salinity | N/A | |
| MIGS-22 | Oxygen requirement | N/A | |
| MIGS-15 | Biotic relationship | N/A | |
| MIGS-14 | Pathogenicity | N/A | |
| MIGS-4 | Geographic location | South Africa | TAS [] |
| MIGS-5 | Sample collection | June 2012 | NAS [] |
| MIGS-4.1 | Latitude | N/A | |
| MIGS-4.2 | Longitude | N/A | |
| MIGS-4.4 | Altitude | N/A | |
^a Evidence codes: IDA, Inferred from Direct Assay; TAS, Traceable Author Statement (i.e., a direct report exists in the literature); NAS, Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from the Gene Ontology project [31].
Fig. 1 Photomicrograph of South African biotypes SA1 and SAM
Fig. 2 PAUP-generated phylogenetic tree based on whole mitochondrial genomes. A maximum parsimony tree generated with PAUP [56] from whole mitochondrial genomes aligned with MAFFT [57], illustrating the close association of D. noxia with […]
Project information
| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Level 2: High-Quality Draft |
| MIGS-28 | Libraries used | Illumina paired-end library |
| MIGS-29 | Sequencing platforms | Illumina HiSeq |
| MIGS-31.2 | Fold coverage | ×45 (SAM); ×22 (SA1); ×27 (MixIX) |
| MIGS-30 | Assemblers | SOAPdenovo |
| MIGS-32 | Gene calling method | Augustus |
| | Locus Tag | N/A |
| | GenBank ID | GCA_001465515.1 |
| | GenBank Date of Release | 14/12/2015 |
| | GOLD ID | Gp0149495 |
| | BIOPROJECT | PRJNA297165 |
| MIGS-13 | Source Material Identifier | N/A |
| | Project relevance | Academic and Agricultural |
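Fold coverage is conventionally estimated as the number of sequenced bases divided by the genome size. The sketch below applies that definition to the approximate read yields quoted in the abstract and to the projected 623 Mb genome size. The function name is illustrative, and the exact calculation behind the tabulated values (for example, whether filtered or mapped reads were counted) is not given here, so the estimates need not match the table for every sample.

```python
def fold_coverage(total_bases: float, genome_size_bp: float) -> float:
    """Estimate sequencing depth as total sequenced bases / genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return total_bases / genome_size_bp


# Projected genome size quoted in the abstract (623 Mb).
PROJECTED_GENOME_BP = 623e6

# Approximate raw read yields quoted in the abstract (the "±" figures).
yields_bp = {"SAM": 28e9, "SA1": 13e9, "MixIX": 26e9}

for sample, bases in yields_bp.items():
    depth = fold_coverage(bases, PROJECTED_GENOME_BP)
    print(f"{sample}: ~{depth:.0f}x")
```

For biotype SAM this estimate comes to roughly ×45, in line with the tabulated value.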
Fig. 3 Quantitative assessment of genome assembly through BUSCO. (a) BUSCO analysis utilizing D. noxia biotype SAM contigs; (b) BUSCO analysis utilizing D. noxia biotype RWA2 scaffolds (GCA_001186385.1) [15]; and (c) BUSCO analysis utilizing scaffolds (GCA_000142985.2)
Genome statistics
| Attribute | Value | % of Total |
|---|---|---|
| Genome size (bp) | 399,704,836 | 64.06 |
| DNA coding (bp) | 66,633,929 | 16.67 |
| DNA G + C (bp) | 123,520,793 | 29.5 |
| DNA scaffolds | 190,686 | 64.06 |
| Total genes | 31,885 | 100 |
| Protein coding genes | 31,885 | 100 |
| RNA genes | – | – |
| Pseudo genes | – | – |
| Genes in internal clusters | – | – |
| Genes with function prediction | 12,791 | 40.12 |
| Genes assigned to COGs | 13,523 | 42.41 |
| Genes with Pfam domains | 13,877 | 43.52 |
| Genes with signal peptides | 1399 | 4.39 |
| Genes with transmembrane helices | 2957 | 9.27 |
| CRISPR repeats | 3 | – |
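Most of the percentage column can be reproduced directly from the counts. The following is a minimal sketch, assuming that gene-level percentages are taken relative to the 31,885 protein coding genes and that the DNA coding percentage is taken relative to the assembled genome size; the variable and function names are illustrative.

```python
def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


ASSEMBLY_BP = 399_704_836   # "Genome size (bp)" row
CODING_BP = 66_633_929      # "DNA coding (bp)" row
TOTAL_GENES = 31_885        # "Protein coding genes" row

gene_counts = {
    "function prediction": 12_791,
    "assigned to COGs": 13_523,
    "Pfam domains": 13_877,
    "signal peptides": 1_399,
    "transmembrane helices": 2_957,
}

print("DNA coding:", percent(CODING_BP, ASSEMBLY_BP))
for label, count in gene_counts.items():
    print(f"Genes with {label}:", percent(count, TOTAL_GENES))
```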
Number of genes associated with general KOG functional categories
| Code | Value | % of total | Description |
|---|---|---|---|
| J | 1272 | 3.99 | Translation, ribosomal structure and biogenesis |
| A | 1258 | 3.95 | RNA processing and modification |
| K | 2193 | 6.88 | Transcription |
| L | 1467 | 4.60 | Replication, recombination and repair |
| B | 729 | 2.29 | Chromatin structure and dynamics |
| D | 1503 | 4.71 | Cell cycle control, cell division, chromosome partitioning |
| V | 270 | 0.85 | Defense mechanisms |
| T | 3531 | 11.07 | Signal transduction mechanisms |
| M | 294 | 0.92 | Cell wall/membrane biogenesis |
| N | 55 | 0.17 | Cell motility |
| U | 1772 | 5.56 | Intracellular trafficking and secretion |
| O | 2101 | 6.59 | Posttranslational modification, protein turnover, chaperones |
| C | 498 | 1.56 | Energy production and conversion |
| G | 957 | 3.00 | Carbohydrate transport and metabolism |
| E | 872 | 2.73 | Amino acid transport and metabolism |
| F | 350 | 1.10 | Nucleotide transport and metabolism |
| H | 177 | 0.56 | Coenzyme transport and metabolism |
| I | 1232 | 3.86 | Lipid transport and metabolism |
| P | 734 | 2.30 | Inorganic ion transport and metabolism |
| Q | 377 | 1.18 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 3740 | 11.73 | General function prediction only |
| S | 1528 | 4.79 | Function unknown |
| – | 18,362 | 57.59 | Not in KOGs |
The total is based on the total number of protein coding genes in the genome
Fig. 4 Amino acid content comparison of D. noxia proteins and wheat phloem. Bar plots indicate the relative abundance of amino acids in Triticum aestivum subsp. aestivum L. (red) and as components of protein coding genes within D. noxia (brown), along with two-point moving average lines
Protein amino acid constituency and codon abundancy for proteins
| Amino acid | Frequency | % of total | Most frequently occurring codon | Codon % of total |
|---|---|---|---|---|
| Ala | 371,416 | 5.5 | GCT | 35.3 |
| Cys | 136,944 | 2.0 | TGT | 67.4 |
| Asp | 385,673 | 5.7 | GAT | 65.1 |
| Glu | 420,519 | 6.2 | GAA | 77.8 |
| Phe | 258,795 | 3.8 | TTT | 67.9 |
| Gly | 342,001 | 5.1 | GGT | 36.4 |
| His | 166,317 | 2.5 | CAT | 63.0 |
| Ile | 430,599 | 6.4 | ATT | 45.5 |
| Lys | 479,017 | 7.1 | AAA | 75.3 |
| Leu | 601,144 | 8.9 | TTA | 33.8 |
| Met | 161,194 | 2.4 | ATG | 100.0 |
| Asn | 403,005 | 6.0 | AAT | 66.7 |
| Pro | 318,625 | 4.7 | CCA | 41.9 |
| Gln | 278,160 | 4.1 | CAA | 69.9 |
| Arg | 330,282 | 4.9 | AGA | 32.5 |
| Ser | 555,299 | 8.2 | TCA | 26.0 |
| Thr | 400,065 | 5.9 | ACA | 35.9 |
| Val | 416,649 | 6.2 | GTT | 33.7 |
| Trp | 73,151 | 1.1 | TGG | 100.0 |
| Tyr | 220,486 | 3.3 | TAT | 62.3 |
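A table of this kind can be derived by counting codons across coding sequences and, for each amino acid, reporting the most frequent codon as a share of all codons encoding that amino acid. The sketch below illustrates the idea under stated assumptions: it uses the standard genetic code, single-letter amino acid codes, and a toy input sequence. The function name is hypothetical, and the tool used to produce the table above is not stated here.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def preferred_codons(cds_sequences):
    """Map each amino acid to (most frequent codon, % of that amino acid's codons)."""
    usage = defaultdict(Counter)
    for seq in cds_sequences:
        seq = seq.upper()
        # Walk the sequence in-frame, ignoring any trailing partial codon.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            codon = seq[i:i + 3]
            aa = CODON_TABLE.get(codon)
            if aa is None or aa == "*":
                continue  # skip ambiguous codons and stop codons
            usage[aa][codon] += 1
    result = {}
    for aa, counts in usage.items():
        codon, n = counts.most_common(1)[0]
        result[aa] = (codon, round(100 * n / sum(counts.values()), 1))
    return result


# Toy coding sequence: Met, Lys, Lys, Lys, Trp, stop.
toy_cds = "".join(["ATG", "AAA", "AAA", "AAG", "TGG", "TAA"])
print(preferred_codons([toy_cds]))
```

Met and Trp are each encoded by a single codon, which is why ATG and TGG appear at 100.0% in the table.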
Fig. 5 Relative abundance of KOG functional annotations within predicted genes
SNPs identified between sample MixIX and biotype SAM
| SNP effect^a | Value | % of total | Number of genes | % of genes with KOG classification |
|---|---|---|---|---|
| Synonymous | 18,289 | 19.85 | 5677 | 62.37 |
| Substitution | 63,035 | 68.42 | 9674 | 83.54 |
| Truncation | 6844 | 7.43 | 2672 | 74.93 |
| Frame shift | 2375 | 2.58 | 1008 | 45.54 |
| Insertion | 579 | 0.63 | 163 | 35.58 |
| Deletion | 504 | 0.55 | 109 | 22.02 |
| Extension | 499 | 0.54 | 300 | 37.00 |
^a Synonymous SNPs cause no amino acid change; substitution SNPs cause a single amino acid substitution; truncation SNPs introduce a stop codon; frame shift SNPs disrupt the reading frame through deletions and/or insertions of 1 or 2 bases; insertion SNPs introduce an additional codon; deletion SNPs remove a codon; and extension SNPs disrupt existing stop codons.
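The effect categories in the footnote can be expressed as a small set of rules. The sketch below is a simplified illustration, not the variant-calling software used for the table (which is not named here): it assumes the standard genetic code, classifies a codon-level change from its reference and alternate codons, and classifies an indel from the length difference between alleles. Both function names are hypothetical.

```python
from itertools import product

# Standard genetic code, built in TCAG order (an assumption of this sketch).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a change that leaves the reading frame intact."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"    # no amino acid change
    if alt_aa == "*":
        return "truncation"    # a stop codon is introduced
    if ref_aa == "*":
        return "extension"     # an existing stop codon is disrupted
    return "substitution"      # one amino acid replaces another


def classify_indel(ref_allele: str, alt_allele: str) -> str:
    """Classify an insertion or deletion by its effect on the reading frame."""
    delta = len(alt_allele) - len(ref_allele)
    if delta == 0:
        raise ValueError("alleles have equal length; not an indel")
    if delta % 3 != 0:
        return "frame shift"   # length change is not a multiple of three
    return "insertion" if delta > 0 else "deletion"


print(classify_codon_change("TGG", "TGA"))  # Trp codon changed to a stop codon
print(classify_indel("A", "AT"))            # one inserted base
```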
Fig. 6 Relative abundance of SNPs within genes assigned to KOG functional categories