S L Pearce1, D F Clarke1,2, P D East1, S Elfekih1, K H J Gordon3, L S Jermiin1, A McGaughran1,4, J G Oakeshott5, A Papanicolaou1,6, O P Perera7, R V Rane1,2, S Richards8, W T Tay1, T K Walsh1, A Anderson1, C J Anderson1,9, S Asgari10, P G Board11, A Bretschneider12, P M Campbell1, T Chertemps13,14, J T Christeller15, C W Coppin1, S J Downes16, G Duan4, C A Farnsworth1, R T Good2, L B Han17, Y C Han1,18, K Hatje19, I Horne1, Y P Huang20, D S T Hughes21, E Jacquin-Joly14, W James1, S Jhangiani21, M Kollmar19, S S Kuwar12, S Li1, N-Y Liu1,22, M T Maibeche13,14, J R Miller23, N Montagne13, T Perry2, J Qu21, S V Song2, G G Sutton23, H Vogel12, B P Walenz23, W Xu1,24, H-J Zhang1,25, Z Zou17, P Batterham2, O R Edwards26, R Feyereisen27, R A Gibbs21, D G Heckel12, A McGrath1, C Robin2, S E Scherer21, K C Worley21, Y D Wu18.
Abstract
BACKGROUND: Helicoverpa armigera and Helicoverpa zea are major caterpillar pests of Old and New World agriculture, respectively. Both, particularly H. armigera, are extremely polyphagous, and H. armigera has developed resistance to many insecticides. Here we use comparative genomics, transcriptomics and resequencing to elucidate the genetic basis for their properties as pests.
Year: 2017 PMID: 28756777 PMCID: PMC5535293 DOI: 10.1186/s12915-017-0402-6
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Genome assembly and annotation statistics
| Species | H. armigera | H. zea | B. mori^a | M. sexta^b |
|---|---|---|---|---|
| Genome assembly | csiro4bp | csirohz5p5 | | |
| Assembly size (Mb) | 337.07 | 341.15 | 431.7 | 419.42 |
| Number of scaffolds | 997 | 2975 | 43,622 | 20,870 |
| Max. scaffold length (Mb) | 6.15 | 1.85 | 16.12 | 3.25 |
| N50 scaffold size (kb) | 1000.4 | 201.5 | 3717.00 | 664.01 |
| N90 scaffold size (kb) | 175.3 | 52.3 | 43.1 | 46.4 |
| Mean scaffold length (kb) | 338.1 | 114.7 | 9.9 | 20.1 |
| Median scaffold length (kb) | 117.3 | 68.0 | 0.655 | 0.997 |
| Number of contigs | 24,228 | 34,676 | 88,842 | 38,380 |
| N50 contig length (kb) | 18.3 | 12.6 | 15.5 | 40.4 |
| Mean contig length (kb) | 12.4 | 8.6 | 4.86 | 10.4 |
| Median contig length (kb) | 7.4 | 5.4 | NA | NA |
| Gene annotation | (NCBI)^c | | | |
| Protein-coding | 17,086 | 15,200^d | 15,007 | 27,404 |
| InterPro domain | 12,212 | 11,061 | 14,113 | NA |
| GO | 11,324 | 10,221 | 9462 | NA |
| Pfam | 10,700 | 9,795 | 11,753 | NA |
| KEGG | 4217 | 4004 | 6242 | 8611 |
| Genomic features | | | | |
| Repeat (%) | 14.6 | 16.0 | 43.6 | 24.9 |
| GC (%) | 36.1 | 36.2 | 38.8 | 35.3 |
| Coding (%) | 6.7 | 5.9 | 4.1 | 10.4 |
| Intron (%) | 39.3 | 17.7 | 16.3 | NA |
| Gene length (b) | 9098 | 5306 | 6029 | NA |
| Avg. protein length (aa) | 442.8 | 444.7 | 458.5 | 531.1 |
| microRNAs | 251 | 232 | 487 | 98 |
| Quality control: BUSCO % present (complete) | | | | |
| Genome | 94.3 (83) | 93.2 (80) | 91.6 (73) | 93.7 (81) |
| Proteins (OGS) | 94.6 (86) | 90.7 (82) | 93.6 (87) | 92.9 (84) |
N50 and N90 are computed on each assembly size as given in the table. The statistics for published B. mori and M. sexta genome assemblies are included for comparison, with references as follows:
^a B. mori v2 [39]; ^b M. sexta [40]; ^c National Center for Biotechnology Information (NCBI) Gnomon models; ^d indicates plus 1192 partial gene models
GO Gene Ontology, KEGG Kyoto Encyclopedia of Genes and Genomes, BUSCO Benchmarking Universal Single-Copy Orthologues, OGS official gene set
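The N50 and N90 rows above follow the usual definition: sort the scaffolds (or contigs) from longest to shortest and report the length at which the running total first reaches 50% or 90% of the assembly size. A minimal sketch of that calculation is given below; the function name `nx` and the toy lengths are illustrative assumptions, not part of the published pipeline.

```python
def nx(lengths, x):
    """Return the Nx statistic for a list of sequence lengths.

    Nx is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least x% of the total.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= total * x:
            return length


# Toy scaffold lengths (hypothetical values, in kb); total = 300.
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
n50 = nx(toy_lengths, 50)
n90 = nx(toy_lengths, 90)
```

In this toy set the two longest scaffolds already cover half of the total, so N50 is the length of the second one, whereas N90 is reached only after five scaffolds.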
Fig. 1 GO term analyses of gene gain/loss events in H. armigera vs B. mori. The left panel shows GO terms enriched in the H. armigera gene set vs B. mori, and the right panel shows those enriched in the B. mori gene set vs H. armigera
Detoxification, digestive and chemosensory receptor gene families
| Gene family | Clan/clade/group | H. armigera | H. zea | Ka/Ks^b | B. mori^c | M. sexta^c |
|---|---|---|---|---|---|---|
| P450s | M | 10 | 10 | 0.061 | 11 | 16 |
| | 2 | 8 | 8 | 0.029 | 7 | 8 |
| | 3 | 46 | 42 | 0.076 | 31 | 45 |
| | 4 | 50 | 48 | 0.083 | 30 | 34 |
| | Total | 114 | 108 | | 79 | 103 |
| CCEs | Dietary/detox^a | 71 (8) | 67 (9) | 0.117 | 52 (8) | 67 (9) |
| | Hormone/semiochemical processing | 13 (5) | 13 (5) | 0.071 | 13 (5) | 16 (6) |
| | Neuro-developmental | 13 (10) | 13 (10) | 0.022 | 13 (10) | 13 (10) |
| | Total | 97 (23) | 93 (24) | | 78 (23) | 96 (25) |
| GSTs | Delta/epsilon | 25 | 24 | 0.124 | 14 | 16 |
| | Sigma | 11 | 10 | 0.106 | 2 | 8 |
| | Theta | 1 | 1 | 0.063 | 1 | 1 |
| | Zeta | 2 | 2 | 0 | 2 | 2 |
| | Omega | 3 | 3 | 0.047 | 4 | 4 |
| | Total | 42 | 40 | | 23 | 31 |
| UGTs | UGT33 | 22 | 19 | 0.102 | 13 | 16 |
| | UGT40 | 8 | 7 | 0.116 | 12 | 9 |
| | Other | 16 | 16 | 0.114 | 19 | 19 |
| | Total | 46 | 42 | | 44 | 44 |
| ABCs | A | 7 | 7 | 0.036 | 7 | 7 |
| | B | 11 | 11 | 0.033 | 9 | 9 |
| | C | 11 | 11 | 0.019 | 11 | 11 |
| | G | 17 | 17 | 0.009 | 16 | 16 |
| | Other | 8 | 8 | 0.007 | 8 | 11 |
| | Total | 54 | 54 | | 51 | 54 |
| Serine proteases: major digestive clades | Trypsins^a | 51 (15) | 46 (15) | 0.159 | 17 (6) | ^d |
| | Chymotrypsins^a | 49 (4) | 44 (4) | 0.067 | 28 (3) | ^d |
| Lipases | Acid^a | 28 (1) | 28 (1) | 0.117 | 32 (1) | ^d |
| | Neutral^a | 61 (10) | 60 (9) | 0.061 | 25 (2) | ^d |
| Chemosensory receptor proteins | GRs | 213 | 166 | 0.292 | 69 | 45 |
| | ORs | 84 | 82 | 0.090 | 72 | 73 |
| | OBPs | 40 | 40 | 0.074 | 40 | 45 |
| | CSPs | 29 | 29 | 0.056 | 22 | 21 |
See Additional file 6: Table S5 and Additional file 4: Sections 1–8 for details of genes, functions and names in each family
^a Catalytically inactive sequences (although not necessarily without function) in parentheses
^b Averaged Ka/Ks for orthologous members of the subfamily
^c Figures based on the official gene sets, with further analysis as described in Additional file 4: Section 13
^d These figures are not available in the official gene sets at the level of detail required
GR gustatory receptor, OR olfactory receptor, OBP odorant-binding protein, CSP chemosensory protein
Fig. 2 Phylogenetic, physical and transcriptional relationships within the major detoxification gene clusters. Selected clades of P450s, GSTs and CCEs, containing genes associated with detoxification functions, are shown. Clades discussed more extensively in the text are highlighted in red. Further details about the gene names and their associated OGS numbers are given in Additional file 4: Sections 1–3. Bars below the gene names indicate genes within a distinctive genomic cluster on a specific scaffold with the number shown; see Additional file 4: Sections 1–3 for further details. The clade 1 CCEs are specifically indicated. The phylogenetic order shown does not reflect the physical order of genes within a cluster. Expression is given as fragments per kilobase of transcript per million mapped reads (FPKM) for the tissue/developmental stage transcriptomes and log2(fold change) (logFC) for the host-response transcriptomes
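The two expression units named in this caption can be illustrated with a short calculation. The sketch below uses the standard FPKM definition (fragments scaled by transcript length in kilobases and by library size in millions) and a log2 ratio for the fold change. The function names, the example counts and the pseudocount of 1 are assumptions made for illustration; the source does not state how zero counts were handled.

```python
import math


def fpkm(fragments, transcript_length_bp, total_mapped_fragments):
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bp -> kb) * 1e6 (fragments -> millions)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def log2_fold_change(treatment, control, pseudocount=1.0):
    """log2 ratio of treatment to control expression.

    The pseudocount is an assumption added here to avoid division by
    zero; it is not taken from the source.
    """
    return math.log2((treatment + pseudocount) / (control + pseudocount))


# Hypothetical example: 500 fragments on a 2 kb transcript
# in a library of 10 million mapped fragments.
example_fpkm = fpkm(500, 2000, 10_000_000)
example_logfc = log2_fold_change(7.0, 1.0)
```

A positive logFC indicates higher expression on the test diet than on the control, and a value of 2 corresponds to a fourfold increase.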
Fig. 3 Phylogenetic, physical and transcriptional relationships within the major digestion gene clusters. Selected clades of serine proteases and lipases containing genes associated with digestive functions are shown. For the serine proteases, chymotrypsins (on the left) and trypsins (right) are shown as a single tree; the neutral and acid lipases are shown separately. Clades discussed more extensively in the text are highlighted in red. Further details about the gene names and their associated OGS numbers are given in Additional file 4: Sections 6, 7. Bars below the gene names indicate genes within a distinctive genomic cluster on a specific scaffold with the number shown; see Additional file 4: Sections 6, 7 for further details. The clade 1 chymotrypsins and trypsins are specifically indicated; for the latter, no single scaffold is shown because the cluster spans scaffolds 306, 5027, 842 and 194. The phylogenetic order shown does not reflect the physical order of genes within a cluster. Expression is given as FPKM for the tissue/developmental stage transcriptomes and logFC for the host-response transcriptomes
Detoxification gene clades showing enhanced sequence divergence in H. armigera and gene loss in H. zea
| Family | Clan/group | Gene number in H. armigera | Gene pairs tested | Significant rate difference | Missing in H. zea |
|---|---|---|---|---|---|
| P450 | Detox, clan 3 | 43 | 9 | 3 | 3 |
| | Detox, clan 4 | 47 | 11 | 5 | 2 |
| | Other | 6 | 4 | 2 | 0 |
| CCE | Detox | 55 | 19 | 7 | 4 |
| | Other | 16 | 9 | 0 | 0 |
| GST | Detox | 36 | 8 | 4 | 2 |
| | Other | 3 | 1 | 0 | 0 |
Tajima’s relative rate tests were performed on the numbers listed of H. armigera paralogue gene pairs in the major detoxification groups; for each group examined, the number of pairs showing a significant rate difference is given. Also listed are the numbers of genes in the relevant clades missing in the H. zea assembly. The P450, CCE and GST families are partitioned in these analyses into lineages for which there is empirical evidence for detoxification functions and those for which there is little or no such evidence. More details of the specific genes involved and comparable data for the proteases, lipases and GRs are given in Additional file 4: Table S7
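For readers unfamiliar with the test, the standard form of Tajima's relative rate statistic can be sketched as follows. Given two aligned sequences and an outgroup, it counts the sites at which only the first sequence differs (m1) and the sites at which only the second differs (m2), then compares (m1 - m2)^2 / (m1 + m2) with a chi-square distribution on one degree of freedom. This is an illustrative sketch of the textbook statistic, not the authors' implementation; the function name, the gap handling and the toy sequences are assumptions.

```python
import math


def tajima_relative_rate(seq1, seq2, outgroup):
    """Tajima's relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p), where m1 counts sites at which only seq1
    differs from the other two sequences and m2 counts sites at which
    only seq2 differs. Sites containing a gap ('-') are skipped, which
    is an assumption of this sketch.
    """
    if not (len(seq1) == len(seq2) == len(outgroup)):
        raise ValueError("sequences must be aligned to the same length")
    m1 = m2 = 0
    for a, b, c in zip(seq1, seq2, outgroup):
        if "-" in (a, b, c):
            continue
        if a != b and b == c:
            m1 += 1  # change unique to the seq1 lineage
        elif a != b and a == c:
            m2 += 1  # change unique to the seq2 lineage
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    # Survival function of chi-square with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    return m1, m2, chi2, p


# Toy alignment (hypothetical): nine changes unique to seq1, one to seq2.
toy_outgroup = "A" * 12
toy_seq1 = "C" * 9 + "A" * 3
toy_seq2 = "A" * 11 + "C"
m1, m2, chi2, p = tajima_relative_rate(toy_seq1, toy_seq2, toy_outgroup)
```

Under the null hypothesis of equal rates, m1 and m2 are expected to be equal, so a large imbalance between them indicates that one paralogue has diverged faster than the other.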
Fig. 4 Effects of rearing diet on development time and weight gain. The mean weights and development times with their standard errors are plotted for larvae from each diet
Fig. 5 Numbers of genes differentially expressed on each of the different diets. The seven diets are listed at the bottom of the figure, with the total numbers of DE genes on each diet shown by the horizontal histogram at the lower left. The main histogram shows the number of DE genes summed for each diet individually and for various diet combinations. The diets for which each number is calculated are denoted by black dots, representing either a single diet plant or a combination of multiple different diets. See also Additional file 3: Figure S3 for a principal component analysis showing the relationships among the transcriptional responses to the different diets
Fig. 6 Expression profiles for selected co-expression modules from the tissue/developmental stage transcriptomic experiment that are enriched for diet-responsive genes. The five modules for which expression profiles are shown are those most enriched for genes called as DE in the host-response experiment (see text). Expression (FPKM) profiles for each module are shown on the left, with the tissue types (see text) identified by colour as in the legend. The composition of each module is described in the central panels, showing the total number (N) of genes per module, the number that are DE, the number in all diet co-expression modules (DM) and the number in the major gene family (GF) classes defined by the key below. Major functions enriched in each module are noted on the right of the figure
Fig. 7 Expression profiles for selected co-expression modules from the host-response transcriptomic experiment. The eight modules for which expression profiles are shown are those most enriched for DE genes. Four of these modules (see text) are also significantly enriched in genes from the detoxification- and digestion-related families. Expression (log2FC) profiles for each module are shown on the left. The composition of each module is described in the central panels, showing the total number (N) of genes per module, the number that are DE, the number in the five tissue/developmental stage modules T1–T5 (TM) and the number in the major gene family (GF) classes defined by the key below. Major functions enriched in each module are noted on the right of the figure. See Additional file 4: Section 11 for more detailed analyses of the host-response network including aspects illustrated by the co-expression modules D20 and D3
Fig. 8 Population structure. Results of MDS analyses, using (a) H. armigera and (b) H. zea as the reference strain. The proportion of variance explained by each dimension is given as a percentage on the axis label. To include the reference strains on these plots, genotypes for each reference strain were recoded as 0/0