Qian Zhao1,2,3,4, Dongna Ma1,2,3,4, Liette Vasseur1,2,5, Minsheng You6,7,8,9.
Abstract
BACKGROUND: Structural variation among genomes is now viewed as being as important as single nucleotide polymorphisms in influencing the phenotype and evolution of a species. Segmental duplication (SD) is defined as segments of DNA with homologous sequence.
Keywords: Evolution; Lepidoptera; Segmental duplications
Year: 2017 PMID: 28683762 PMCID: PMC5499213 DOI: 10.1186/s12862-017-1007-y
Source DB: PubMed Journal: BMC Evol Biol ISSN: 1471-2148 Impact factor: 3.260
Characterization of the SDs of the five Lepidoptera species
| Species | P. xylostella | M. sexta | H. melpomene | | |
|---|---|---|---|---|---|
| Total number of SDs | 21,369 | 11,141 | 23,942 | 10,799 | 3667 |
| Number of SDs with 90% identity | 18,064 | 8892 | 21,572 | 10,070 | 3221 |
| Number of SDs with 80-90% identity | 3204 | 2171 | 2239 | 668 | 416 |
| Number of SDs with 75-80% identity | 99 | 78 | 127 | 60 | 5 |
| Total (Mb) | 43 | 19.1 | 40.5 | 23.5 | 5.6 |
| % of genome | 11 | 5.2 | 15.2 | 9.9 | 1.2 |
| Number of genes | 2235 | 1040 | 1453 | 1564 | 332 |
| % of genes | 12.4 | 6.8 | 11 | 10.3 | 2.3 |
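
The identity breakdown in the table can be turned into shares of the total SD count. The snippet below is a minimal sketch for the P. xylostella column only; the dictionary keys and variable names are illustrative, and the counts are copied from the table as given.

```python
# Counts for the P. xylostella column, copied from the table above.
total_sds = 21369
sd_by_identity = {
    "90% identity": 18064,
    "80-90% identity": 3204,
    "75-80% identity": 99,
}

# Share of all SDs that falls into each identity class, in percent.
shares = {
    label: round(100 * count / total_sds, 1)
    for label, count in sd_by_identity.items()
}

# The three classes do not necessarily add up to the reported total.
unclassified = total_sds - sum(sd_by_identity.values())

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"SDs not covered by the three classes: {unclassified}")
```
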
Fig. 1 SD length distributions among the five Lepidoptera species. a Boxplots showing the mean and range of the number of SDs in each SD length category for all five Lepidoptera species combined. Most SDs fell in the 1–1.5 kb length category. b Number of SDs in each length category for each species. The size of each circle represents the proportion of SDs
Fig. 2 Duplication map comparison of the five Lepidoptera species. SD regions in each Lepidoptera species and their paralogous regions in the other four genomes are shown. Different colors represent different insects. Only the best alignments are shown
Fig. 3 “Shared” and “Unique” SDs among the five species. a “Shared” and “Unique” SDs in each genome. Red represents “Unique” SDs and black represents “Shared” SDs. b Classification of SDs in the five species based on present SDs (marked as “+”) and absent SDs (marked as “-”). The SDs were classified into five groups to analyze the evolutionary history of SDs in Lepidoptera genomes. Group A contains the potential ancestral SD events. Group B contains the SDs that were lost only in Noctuoidea and Bombycoidea, while Groups C, D and E contain regions where duplication patterns were inconsistent with the generally accepted phylogeny. Such a scenario could arise from a de novo origin of SDs or from deletion events, which might have played a role in lineage-specific evolution. “n” is the number of SDs in each group (A to E). c Identity comparison between “Shared” and “Unique” SDs in the genomes of the five Lepidopteran species. The identity of “Shared” SDs is shown in black and the identity of “Unique” SDs in red
GO enrichment for some proteins within the SD regions among the five Lepidoptera species
| GO term | P-value | Number of proteins |
|---|---|---|
| | | |
| Nucleic acid binding [GO:0003676] | 5.78E-06 | 149 |
| Oxidoreductase activity [GO:0016491] | 1.64E-05 | 16 |
| Oxidation-reduction process [GO:0055114] | 0.0005938 | 38 |
| Serine-type endopeptidase activity [GO:0004252] | 0.001488 | 54 |
| Protein tyrosine phosphatase activity [GO:0004725] | 0.003778 | 12 |
| Protein dephosphorylation [GO:0006470] | 0.005278 | 14 |
| Structural constituent of cuticle [GO:0042302] | 0.006409 | 3 |
| Zinc ion binding [GO:0008270] | 0.008232 | 151 |
| | | |
| Prothoracicotrophic hormone activity [GO:0018445] | 2.99E-06 | 10 |
| Growth factor activity [GO:0008083] | 0.0026 | 5 |
| Phosphorylase kinase complex [GO:0005964] | 0.003736 | 3 |
| SWI/SNF complex [GO:0016514] | 0.003736 | 3 |
| Phosphorylase kinase activity [GO:0004689] | 0.003736 | 3 |
| Phosphoprotein phosphatase activity [GO:0004721] | 0.00556 | 6 |
| Neuropeptide signaling pathway [GO:0007218] | 0.007097 | 13 |
| Defense response [GO:0006952] | 0.00955 | 3 |
| | | |
| ATP-dependent peptidase activity [GO:0004176] | 0.002622 | 2 |
| Misfolded or incompletely synthesized protein catabolic process [GO:0006515] | 0.002622 | 2 |
| DNA integration [GO:0015074] | 0.008793 | 2 |
| Inositol-1,4,5-trisphosphate 3-kinase activity [GO:0008440] | 0.008793 | 2 |
| | | |
| Dephosphorylation [GO:0016311] | 0.002541 | 12 |
| RNA-directed DNA polymerase activity [GO:0003964] | 0.002617 | 16 |
| Glucuronosyltransferase activity [GO:0015020] | 0.003223 | 10 |
| Endonuclease activity [GO:0004519] | 0.004409 | 14 |
| Carbohydrate transport [GO:0008643] | 0.005052 | 12 |
| Pyrophosphatase activity [GO:0016462] | 0.008074 | 4 |
| Riboflavin metabolic process [GO:0006771] | 0.009158 | 8 |
| Rho guanyl-nucleotide exchange factor activity [GO:0005089] | 0.00975 | 10 |
| | | |
| Heme binding [GO:0020037] | 1.30E-06 | 13 |
| Monooxygenase activity [GO:0004497] | 2.09E-06 | 13 |
| Hormone activity [GO:0005179] | 0.0001028 | 6 |
| Electron transport [GO:0006118] | 0.0003995 | 18 |
| Calcium ion binding [GO:0005509] | 0.00205 | 10 |
| Response to oxidative stress [GO:0006979] | 0.003103 | 3 |
| Odorant binding [GO:0005549] | 0.003145 | 6 |
| Oxidoreductase activity [GO:0016491] | 0.00358 | 16 |
TEs properties of the Lepidoptera genomes, duplications and 2.5 Kb flanking regions
| Repeat | Duplication | % | 2.5 Kb FR | % | Genome | % | Enrichment in SDs | Enrichment in FR |
|---|---|---|---|---|---|---|---|---|
| | | | | | | | | |
| SINE | 3040 | 0.007 | 22,679 | 0.021 | 258,493 | 0.066 | 0.107 | 0.327 |
| LINE | 249,583 | 0.576 | 129,227 | 0.122 | 628,430 | 0.159 | 3.613 | 0.767 |
| | | | | | | | | |
| SINE | 361 | 0.002 | 2714 | 0.005 | 48,452 | 0.011 | 0.164 | 0.463 |
| LTR | 131 | 0.001 | 1537 | 0.003 | 12,658 | 0.003 | 0.227 | 1.004 |
| LINE | 18,312 | 0.096 | 19,895 | 0.039 | 156,948 | 0.037 | 2.561 | 1.048 |
| | | | | | | | | |
| SINE | 2783 | 0.012 | 4447 | 0.009 | 21,783 | 0.009 | 1.352 | 0.972 |
| LTR | 555 | 0.002 | 1975 | 0.004 | 7333 | 0.003 | 0.800 | 1.282 |
| LINE | 6026 | 0.026 | 4729 | 0.009 | 57,808 | 0.023 | 1.103 | 0.389 |
| | | | | | | | | |
| DNA | 483 | 0.001 | 20,348 | 0.017 | 51,129 | 0.019 | 0.064 | 0.938 |
DNA: DNA transposons; SINE: short interspersed nuclear elements; LTR: long terminal repeat; LINE: long interspersed nuclear elements
The TE contents of three regions of the genomes were compared: SD regions, the 2.5 Kb flanking regions (FR) of the SDs, and the genome average. Enrichment was defined as the repeat content of duplicated sequences divided by the repeat content of unique sequences. Significance was assessed by simulating the repeats in a random sample (n = 1,00) of DBM SDs (P-values < 0.05 are in bold)
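
The enrichment ratio described in the table note can be sketched in a few lines. This is an illustrative sketch only: it assumes the "Genome %" column can stand in for the reference repeat content (the note itself refers to unique sequences), so recomputed ratios may differ slightly from the reported ones because the table values are rounded. The source does not describe the simulation in detail, so `empirical_p_value` is a hypothetical helper showing a generic resampling-style estimate, not the authors' procedure.

```python
def enrichment(region_pct, reference_pct):
    """Ratio of repeat content in a region to repeat content in a reference."""
    if reference_pct <= 0:
        raise ValueError("reference repeat content must be positive")
    return region_pct / reference_pct


def empirical_p_value(observed, simulated):
    """One-sided empirical P-value with the usual +1 correction.

    Counts how many simulated values are at least as large as the observed
    one. This helper is an assumption, not taken from the source.
    """
    hits = sum(1 for value in simulated if value >= observed)
    return (hits + 1) / (len(simulated) + 1)


# First LINE row of the table: duplication % = 0.576,
# flanking-region % = 0.122, genome % = 0.159.
line_sd = enrichment(0.576, 0.159)
line_fr = enrichment(0.122, 0.159)

print(f"LINE enrichment in SDs: {line_sd:.3f}")
print(f"LINE enrichment in FR:  {line_fr:.3f}")
```

A ratio above 1 indicates that the repeat class is over-represented in the region relative to the reference, and a ratio below 1 indicates under-representation.
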
Fig. 4 Expression levels are associated with segmental duplications in the five species. a Expression level analysis in P. xylostella. Different developmental stages (from egg to adult) are listed. Expression levels of genes located in SDs are shown in red and those of genes outside SDs in black. b Expression level analysis in M. sexta. The tissues and developmental stages used in the analysis are listed. Expression levels of genes located in SDs are shown in green and those of genes outside SDs in black. c Expression level analysis in the silk gland of B. mori. Different strains were analyzed, including the domesticated strain Chunhua (D_CH), the domesticated strain Chunyu (D_CY), wild silkworm Ankang from Baihe county of Shaanxi Province (W_AKBH) and wild silkworm Ankang from Shiquan county of Shaanxi Province (W_AKSQ). d Methylation level comparison between genes within and outside SD regions, based on silkworm data