Jun-Ming Mao¹, Yong Wang², Liu Yang², Qin Yao¹, Ke-Ping Chen¹.
Abstract
Introns are highly variable in number and size. Sequence simulation is an effective method for elucidating patterns of intron evolution. We previously reported that introns are more likely to evolve through mutation-and-deletion (MD) than through mutation-and-insertion (MI). In the present study, we extended these evolution models by allowing insertions in the MD model and deletions in the MI model at various frequencies. With proper parameter settings, all deletion-biased models generated sequences whose attributes matched those of 16 invertebrate introns from the microphthalmia transcription factor (MITF) gene, whereas no insertion-biased model generated such sequences under any parameter setting tested. We conclude that the examined invertebrate introns may have evolved from a longer ancestral sequence in a deletion-biased pattern. The constructed models are useful for studying the evolution of introns from other genes and/or other taxonomic groups. (C++ scripts of all deletion- and insertion-biased models are available upon request.)
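The abstract contrasts deletion-biased and insertion-biased models. As a minimal sketch of what one simulation step in such models might look like, the C++ below implements two hypothetical event types: a point mutation, and an indel event whose direction is chosen with an assumed probability `pDeletion`. The function names, the uniform run-length distribution, and `pDeletion` are illustrative assumptions; this is not the authors' published script.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>

// Hypothetical sketch of two event types in an MD/MI-style simulation.
// NOT the authors' script; pDeletion, minLen and maxLen are assumed parameters.
const char kBases[] = "ACGT";

// Point mutation: replace one randomly chosen base with a different random base.
void pointMutation(std::string& seq, std::mt19937& rng) {
    assert(!seq.empty());
    std::uniform_int_distribution<std::size_t> pos(0, seq.size() - 1);
    std::uniform_int_distribution<int> base(0, 3);
    const std::size_t p = pos(rng);
    char replacement;
    do {
        replacement = kBases[base(rng)];
    } while (replacement == seq[p]);  // redraw until the base actually changes
    seq[p] = replacement;
}

// Indel event: with probability pDeletion delete a run of bases, otherwise insert
// a random run; the run length is drawn uniformly from [minLen, maxLen].
void indelEvent(std::string& seq, double pDeletion, int minLen, int maxLen,
                std::mt19937& rng) {
    assert(minLen > 0 && minLen <= maxLen);
    std::uniform_int_distribution<int> lenDist(minLen, maxLen);
    std::bernoulli_distribution isDeletion(pDeletion);
    const std::size_t len = static_cast<std::size_t>(lenDist(rng));
    if (isDeletion(rng)) {
        if (seq.size() <= len) return;  // skip deletions that would empty the sequence
        std::uniform_int_distribution<std::size_t> start(0, seq.size() - len);
        seq.erase(start(rng), len);
    } else {
        std::uniform_int_distribution<int> base(0, 3);
        std::string run;
        for (std::size_t i = 0; i < len; ++i) run += kBases[base(rng)];
        std::uniform_int_distribution<std::size_t> start(0, seq.size());
        seq.insert(start(rng), run);
    }
}
```

Under these assumptions, calling `indelEvent` with `pDeletion = 0.9` on a long random ancestor, interleaved with `pointMutation`, gives a run in which about 90% of indel events are deletions, loosely analogous to a label such as MD90/10; the published scripts may order and parameterize events differently.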
Keywords: Mutation; deletion; evolution model; insertion; sequence simulation
Year: 2021 PMID: 33551639 PMCID: PMC7841239 DOI: 10.1177/1176934320988558
Source DB: PubMed Journal: Evol Bioinform Online ISSN: 1176-9343 Impact factor: 1.625
Sixteen species selected to represent different phyla/classes of invertebrates.
| Phylum | Class | Species | Intron (bp) |
|---|---|---|---|
| | | | 1201 |
| | | | 567 |
| | | | 828 |
| | | | 1342 |
| | | | 971 |
| | | | 380 |
| | | | 1188 |
| | | | Not available |
| | | | 1878 |
| | | | 1061 |
| | | | 490 |
| | | | 672 |
| | | | 3104 |
| | | | 1168 |
| | | | 1335 |
| | | | 2008 |
| | | | 259 |
Figure 1. Partial structure of the microphthalmia transcription factor (MITF) gene in invertebrates.
Shown here is a phase 1 intron flanked by conserved exon nucleotides encoding the basic region of the bHLH motif of the invertebrate MITF gene. The intron is located after the A of the codon ATT, which codes for isoleucine (I). Numbers between lines indicate intron length (base pairs). Invertebrate species are given by common name or abbreviated scientific name, with the taxon name in brackets. Intron locations were obtained by viewing the gene structures linked to sequence numbers beginning with “XP” or “NP” in GenBank (www.ncbi.nlm.nih.gov). For other sequence numbers, intron locations were determined by manually comparing genomic sequences with those of known gene structures. Please refer to Table 1 for the full scientific names of the invertebrate species.
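Intron phase follows the standard convention: it is the number of coding nucleotides upstream of the intron, modulo 3. A phase 1 intron therefore splits a codon after its first base, as with the A|TT split of the isoleucine codon described above. The short C++ sketch below encodes that rule; the function names and the example count of 301 upstream coding bases are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Intron phase: number of coding nucleotides upstream of the intron, modulo 3.
// Phase 0 = between codons, phase 1 = after the 1st codon base, phase 2 = after the 2nd.
int intronPhase(long codingBasesBefore) {
    assert(codingBasesBefore >= 0);
    return static_cast<int>(codingBasesBefore % 3);
}

// Split a codon at the position implied by the intron phase,
// e.g. ("ATT", 1) -> {"A", "TT"}.
std::pair<std::string, std::string> splitCodon(const std::string& codon, int phase) {
    assert(codon.size() == 3 && phase >= 0 && phase <= 2);
    return {codon.substr(0, phase), codon.substr(phase)};
}
```

For instance, a hypothetical intron preceded by 301 coding bases (100 complete codons plus one base) is phase 1.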
Figure 2. Phylogenetic tree of 16 invertebrate introns.
The original maximum likelihood (ML) tree constructed from the 16 invertebrate introns is shown. Invertebrate species are given by common name or abbreviated scientific name, with the taxon name in brackets. Branch lengths are indicated by the values above or below each branch. AS1 to AS15 indicate the locations of ancestral sequences No. 1 to 15. Numbers at nodes are bootstrap values obtained from 1000 replicates. Please refer to Table 1 for the full scientific name and intron length of each invertebrate species.
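Bootstrap values such as those at the nodes are conventionally obtained with Felsenstein's nonparametric bootstrap: alignment columns are resampled with replacement, a tree is inferred from each pseudo-alignment, and a node's support is the percentage of replicate trees containing that clade. The C++ sketch below shows only the resampling and percentage steps under that assumption; the ML tree inference for each replicate is out of scope, and the function names are hypothetical rather than those of any software the authors used.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <string>
#include <vector>

// One nonparametric bootstrap replicate: sample alignment columns with replacement
// to build a pseudo-alignment of the same size. All rows must have equal length.
std::vector<std::string> bootstrapReplicate(const std::vector<std::string>& aln,
                                            std::mt19937& rng) {
    assert(!aln.empty() && !aln[0].empty());
    const std::size_t ncol = aln[0].size();
    for (const auto& row : aln) assert(row.size() == ncol);
    std::uniform_int_distribution<std::size_t> pickColumn(0, ncol - 1);
    std::vector<std::string> replicate(aln.size(), std::string(ncol, '-'));
    for (std::size_t j = 0; j < ncol; ++j) {
        const std::size_t c = pickColumn(rng);  // same source column for every sequence
        for (std::size_t i = 0; i < aln.size(); ++i) replicate[i][j] = aln[i][c];
    }
    return replicate;
}

// Support for a clade = percentage of replicate trees that contain it.
double bootstrapSupport(int treesWithClade, int replicates) {
    assert(replicates > 0);
    return 100.0 * treesWithClade / replicates;
}
```

With 1000 replicates, a clade recovered in 950 replicate trees would receive a bootstrap value of 95.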
Factor and level design for testing evolution models using an L16 (4^5) orthogonal array.
| Evolution model | Level | Factors | | | | |
|---|---|---|---|---|---|---|
| Mutation-and-deletion | 1 | 5000 | 3000 | 200 | 31-50 | 11-20 |
| | 2 | 6000 | 3250 | 400 | 71-90 | 21-30 |
| | 3 | 7000 | 3500 | 600 | 111-130 | 31-40 |
| | 4 | 8000 | 3750 | 800 | 151-170 | 41-50 |
| Mutation-and-insertion | 1 | 20 | 150 | 200 | 31-50 | 11-20 |
| | 2 | 40 | 200 | 400 | 71-90 | 21-30 |
| | 3 | 60 | 250 | 600 | 111-130 | 31-40 |
| | 4 | 80 | 300 | 800 | 151-170 | 41-50 |
Abbreviations: LAS1, length of ancestral sequence 1; LAS15, length of ancestral sequence 15; LI/D, length (in bases) of each insertion or deletion; MI/D, number of bases mutated each time; M1, number of bases mutated per unit branch length.
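An L16 (4^5) orthogonal array assigns 5 four-level factors to 16 runs so that every pair of factors shows each of the 16 level combinations exactly once. The C++ sketch below builds one such array with the standard construction over the finite field GF(4); level codes 0 to 3 stand for Levels 1 to 4 in the table above. This is a textbook construction, assumed for illustration; the row order and column assignment the authors actually used may differ.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Multiplication table of GF(4) = {0, 1, a, a+1} encoded as 0..3, with a*a = a+1.
// Addition in GF(4) is bitwise XOR of this 2-bit encoding.
constexpr int kMul[4][4] = {{0, 0, 0, 0}, {0, 1, 2, 3}, {0, 2, 3, 1}, {0, 3, 1, 2}};

// Build an L16(4^5) array: row (x, y) has columns x, y, x+y, x+a*y, x+(a+1)*y.
std::vector<std::array<int, 5>> buildL16() {
    std::vector<std::array<int, 5>> rows;
    for (int x = 0; x < 4; ++x)
        for (int y = 0; y < 4; ++y)
            rows.push_back({x, y, x ^ y, x ^ kMul[2][y], x ^ kMul[3][y]});
    assert(rows.size() == 16);
    return rows;
}

// True if every pair of columns shows each of the 16 level pairs exactly once.
bool isOrthogonal(const std::vector<std::array<int, 5>>& rows) {
    for (int a = 0; a < 5; ++a)
        for (int b = a + 1; b < 5; ++b) {
            int count[4][4] = {};
            for (const auto& r : rows) ++count[r[a]][r[b]];
            for (const auto& levelCounts : count)
                for (int c : levelCounts)
                    if (c != 1) return false;
        }
    return true;
}
```

Each row of `buildL16()` is one test run: reading its five level codes against the table gives the factor values to simulate for that run.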
Attributes of sequences generated from MD and MI models using optimized parameters.
| Model | Model parameters | | | | | Attributes of generated sequences | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| SIIs | / | / | / | / | / | 3378 ± 73 | 1.92 ± 0.49 | 1.42 ± 0.11 | 0.138 ± 0.013 | 11.9 ± 2.4 |
| MD100 | 5000 | 3250 | 1200 | 11-30 | 41-50 | 3394 ± 67 | 1.81 ± 0.29 | 1.36 ± 0.17 | 0.129 ± 0.013 | 11.6 ± 1.7 |
| MD90/10 | 8000 | 3250 | 1200 | 11-30 | 21-30 | 3395 ± 85 | 2.01 ± 0.54 | 1.50 ± 0.11 | 0.137 ± 0.017 | 11.4 ± 2.0 |
| MD80/20 | 8000 | 3750 | 1200 | 31-50 | 21-30 | 3427 ± 69 | 1.77 ± 0.24 | 1.47 ± 0.14 | 0.144 ± 0.020 | 12.0 ± 2.5 |
| MD70/30 | 8000 | 3250 | 600 | 31-50 | 41-50 | 3407 ± 155 | 1.79 ± 0.39 | 1.52 ± 0.20 | 0.141 ± 0.017 | 10.7 ± 2.0 |
| MD60/40 | 8000 | 3750 | 200 | 111-130 | 21-30 | 3439 ± 86 | 1.69 ± 0.28 | 1.33 ± 0.19 | 0.140 ± 0.012 | 11.1 ± 1.3 |
| MD55/45 | 5000 | 3000 | 800 | 151-170 | 11-20 | 3420 ± 98 | 1.91 ± 0.47 | 1.50 ± 0.29 | 0.138 ± 0.015 | 11.2 ± 2.5 |
| MI100 | 80 | 250 | 400 | 111-130 | 11-20 | 3442 ± 98 | 2.06 ± 0.52 | 1.69 ± 0.16 | 0.128 ± 0.006 | 12.5 ± 2.5 |
| MI90/10 | 60 | 200 | 200 | 71-90 | 41-50 | 3457 ± 119 | 1.74 ± 0.35 | 1.74 ± 0.23 | 0.133 ± 0.013 | 9.7 ± 1.6 |
| MI80/20 | 80 | 300 | 200 | 71-90 | 21-30 | 3431 ± 104 | 2.01 ± 0.39 | 1.70 ± 0.18 | 0.132 ± 0.015 | 10.8 ± 2.3 |
| MI70/30 | 20 | 150 | 200 | 71-90 | 51-60 | 3421 ± 112 | 2.21 ± 0.58 | 1.77 ± 0.21 | 0.135 ± 0.018 | 11.5 ± 2.0 |
| MI60/40 | 60 | 300 | 1200 | 31-50 | 21-30 | 3405 ± 126 | 1.82 ± 0.23 | 1.77 ± 0.25 | 0.121 ± 0.015 | 11.7 ± 2.3 |
| MI55/45 | 40 | 300 | 600 | 31-50 | 41-50 | 3415 ± 125 | 1.92 ± 0.37 | 1.78 ± 0.16 | 0.130 ± 0.015 | 11.6 ± 1.8 |
Abbreviations: D̅, overall mean distance; LAS1, length of ancestral sequence 1; LAS15, length of ancestral sequence 15; LI/D, length (in bases) of each insertion or deletion; LMSA, length of multiple sequence alignment; MI/D, number of bases mutated each time; M1, number of bases mutated per unit branch length; RT92+G+I, transition/transversion ratio under the Tamura 3-parameter model with gamma distribution and invariant sites; SED̅, standard error of the overall mean distance; SIIs, sixteen invertebrate introns; TSML, topology score of the constructed ML tree.
This table lists the results of test No. 24 for each model. Please refer to Supplemental Tables S7 to S12 and S19 to S24 for the results of tests No. 17 to 23 for all evolution models. Attributes of SIIs were obtained by allowing each sequence to mutate by only one base.
Data are presented as mean ± standard deviation (n = 10).
*, ** and *** indicate a significant difference from SIIs (independent t-test) at P < .1, P < .05 and P < .01, respectively.
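Because the table reports mean ± standard deviation with n = 10, the independent t-test against SIIs can be recomputed from these summary values. The C++ sketch below assumes a pooled-variance (Student) two-sample test with n = 10 in both groups, so df = 18, and uses the two-tailed critical values t = 1.734, 2.101 and 2.878 for P = .1, .05 and .01. With equal group sizes the Welch statistic is numerically identical and only its degrees of freedom would differ. The function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <string>

// Pooled-variance (Student) two-sample t statistic from means, SDs and sample sizes.
double studentT(double m1, double sd1, int n1, double m2, double sd2, int n2) {
    assert(n1 > 1 && n2 > 1);
    const double pooledVar =
        ((n1 - 1) * sd1 * sd1 + (n2 - 1) * sd2 * sd2) / (n1 + n2 - 2);
    return (m1 - m2) / std::sqrt(pooledVar * (1.0 / n1 + 1.0 / n2));
}

// Significance marks for df = 18 (n1 = n2 = 10), two-tailed critical values:
// P < .1 -> 1.734, P < .05 -> 2.101, P < .01 -> 2.878.
std::string stars18(double t) {
    const double a = std::fabs(t);
    if (a > 2.878) return "***";
    if (a > 2.101) return "**";
    if (a > 1.734) return "*";
    return "";
}
```

For example, the third attribute column for MI100 (1.69 ± 0.16) against SIIs (1.42 ± 0.11) gives t ≈ 4.4 under these assumptions, while the first attribute column for MD100 (3394 ± 67 vs 3378 ± 73) gives t ≈ 0.5.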