Evaristus Chibunna Mbanefo, Yu Chuanxin, Mihoko Kikuchi, Mohammed Nasir Shuaibu, Daniel Boamah, Masashi Kirinoki, Naoko Hayashi, Yuichi Chigusa, Yoshio Osada, Shinjiro Hamano, Kenji Hirayama.
Abstract
BACKGROUND: Evolution of novel protein-coding genes is the bedrock of adaptive evolution. Recently, we identified six protein-coding genes with a similar signal sequence from Schistosoma japonicum egg-stage mRNA using signal sequence trap (SST). To find the mechanism underlying the origination of these genes, which share similar core promoter regions and signal sequences, we adopted an integrated approach combining BLAST queries of whole-genome, transcriptome and proteome databases with other bioinformatics tools and molecular analyses.
Year: 2012 PMID: 22716200 PMCID: PMC3434034 DOI: 10.1186/1471-2164-13-260
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
*SST-isolated egg cDNAs with a similar signal peptide
| cDNA (length) | Protein (length) | Signal peptide |
|---|---|---|
| AY570737 (1027 bp) | AAS68242 (271 aa) | MRIINLVIISTALLLINLLQTKSQ |
| AY570744 (983 bp) | AAS68249 (260 aa) | MRIIILGIISTVLLLINLLQTKSQ |
| AY570753 (1038 bp) | AAS68258 (174 aa) | MRIINLVNISTVLLLINLLQTKSQ |
| AY570748 (854 bp) | AAS68253 (203 aa) | MFKMRIINLVNISTVLLLINLLQTKSR |
| AY570756 (848 bp) | AAS68261 (124 aa) | MFKMRIINLVNISTVLLLINLLQTKSQ |
| AY570742 (1037 bp) | AAS68247 (274 aa) | MFKVRIINLVNISTVLLLINLLQTKSQ |
*SST: Signal Sequence Trap.
Figure 1 Multiple alignment of protein sequences of the SST-identified cDNAs showing the similar signal peptide. (A) The protein products of the original SST-isolated S. japonicum egg cDNAs were aligned using ClustalW. The alignment is limited to the candidates identified by SST and excludes database sequences. The similar N-terminal signal peptide is automatically colored red, indicating high similarity at the consensus sequence. Several other residues also appear to be conserved; these will be explored during functional characterization. (B) Phylogenetic tree of the novel protein family identified using SST. The evolutionary history was inferred using the Minimum Evolution method. Evolutionary distances were computed using the p-distance method and are in units of the number of amino acid differences per site. The analysis involved the 6 amino acid sequences originally isolated using SST. Phylogenetic and evolutionary analyses were conducted in MEGA5 [76].
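The p-distance in panel (B) is the proportion of compared sites at which two aligned sequences differ (differences divided by sites compared). As a minimal sketch, the Python below computes it for the signal peptides listed in the SST table. It rests on stated assumptions: the figure used full protein sequences in MEGA5, whereas this sketch uses only the signal peptides; the "---" padding is a hand alignment made for illustration; gaps are handled by pairwise deletion; and `ALIGNED`, `p_distance` and `distance_matrix` are hypothetical names.

```python
from itertools import combinations

# Signal peptides from the SST table. The three 24-residue peptides are padded
# with "---" (a hypothetical hand alignment) so that their MRII... motif lines
# up with the 27-residue MFK... forms.
ALIGNED = {
    "AAS68242": "---MRIINLVIISTALLLINLLQTKSQ",
    "AAS68249": "---MRIIILGIISTVLLLINLLQTKSQ",
    "AAS68258": "---MRIINLVNISTVLLLINLLQTKSQ",
    "AAS68253": "MFKMRIINLVNISTVLLLINLLQTKSR",
    "AAS68261": "MFKMRIINLVNISTVLLLINLLQTKSQ",
    "AAS68247": "MFKVRIINLVNISTVLLLINLLQTKSQ",
}


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues among sites where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # Pairwise deletion: drop any site with a gap in either sequence.
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no comparable sites")
    diffs = sum(1 for x, y in sites if x != y)
    return diffs / len(sites)


def distance_matrix(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """p-distance for every unordered pair of named sequences."""
    return {(i, j): p_distance(seqs[i], seqs[j]) for i, j in combinations(seqs, 2)}


if __name__ == "__main__":
    for (i, j), d in distance_matrix(ALIGNED).items():
        print(f"{i} vs {j}: p = {d:.3f}")
```

A matrix like this is the input that distance-based tree methods such as Minimum Evolution work from; the tree search itself is not sketched here.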
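*UniGene entries for mRNAs and ESTs bearing the similar signal sequence (n = 195)
| UniGene cluster | UniGene ID | Transcripts (GenBank accessions) | Annotation |
|---|---|---|---|
| Sja.1526 | 1476162 | AY814448, BU780442est | Egg protein SjCP3611 |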
| Sja.1611 | 1476247 | FN317637, BU772954est | Hypothetical protein |
| Sja.1628 | 1476264 | AY570742, FN320556, FN320555, FN320553, FN320552, FN320551, FN320550, FN320549 | Egg protein SjCP1531 |
| Sja.1676 | 1476312 | AY570748SST, AY223245, AY222916, AY813542, EF127834, EF140742, FN323799, FN323800, FN323801, FN323803, FN323793, FN323792, FN323791, FN323790, FN323788, FN323785, FN323782, FN323781, FN323779, FN323778, FN323777, FN323776, FN323773, FN323772, FN323771, FN323770, FN323769, FN323768, FN323767, FN323766, FN323765, FN323764, FN323763, FN323762, BU772060est, BU766145est, CX862012est | Egg protein SjCP3842 |
| Sja.2063 | 1476798 | FN321064, FN321061 | Egg protein SjCP1084 |
| Sja.2065 | 1476800 | AY570753SST, AY570744SST, AY814685, FN327232, FN327137, FN318042, FN321065, FN321060, FN321059, FN321058, FN321057, FN321056, FN321055, FN329815nc, BU768978est, BU780021est | Egg protein SjCP3611, Egg protein SjCP501, Hypothetical proteins |
| Sja.2070 | 1476805 | AY599749SST | Egg protein SjCP1731 |
| Sja.5326 | 2034920 | FN326953, FN330298nc | Hypothetical protein |
| Sja.9771 | 2493712 | AY570756SST, FN327121, FN327254, FN327253, FN327241, FN327233, FN327229, FN327224, FN327222, FN327216, FN327196, FN327185, FN327163, FN327158, FN327154, FN327129, FN327125, FN327115, FN327089, FN327083, FN327073, FN327057, FN327050, FN327049, FN327045, FN327042, FN327035, FN327022, FN327018, FN327014, FN327000, FN326998, FN326978, FN326973, FN326961, FN326960, FN326959, FN326930, FN326905, FN326883, FN326882, FN326881, FN326859, FN326857, FN326852, FN326851, FN326841, FN326831, FN326829, FN326822, FN326808, FN326801, FN326790, FN326770, FN326740, FN330540nc | Egg protein SjCP400, Somula protein |
| Sja.11083 | 2671933 | AY915467, FN327219, FN327063, FN326828, FN326826, FN323794, FN323797, FN323798, FN323802, FN323789, FN323787, FN323786, FN323784, FN323783, FN323780, FN323774, FN323761, FN323760, FN323759, FN323758, FN323757, FN320521, FN320520, FN320519, FN320518, FN320517, FN320516, FN320515, FN320513 | Egg protein SjCP3842, Hypothetical protein |
| Sja.11325 | 2672175 | AY813755, FN320057, FN320056, FN320514, FN329566nc, BU768160est, BU774105est, BU770186est, BU779051est | Egg protein SjCP3842, Hypothetical protein |
| Sja.11840 | 2895838 | FN327242, FN327131, FN327087, FN326854, BU776301est | Hypothetical protein |
| Sja.11891 | 2895889 | AY813975, FN329814nc, BU769048est | Egg protein SjCP1084 |
| Sja.13298 | 3987026 | FN320059 | Hypothetical protein |
| Sja.13324 | 3987052 | AY570737SST, FN328299nc | Egg protein SjCP1084 |
| Sja.13882 | 3987610 | FN330716nc | None |
| Sja.13956 | 3987684 | FN330422nc | None |
| Sja.14071 | 3987799 | FN329677nc | None |
| Sja.14095 | 3987823 | FN329269nc | None |
| Sja.14561 | 3988289 | FN327139, FN323795, FN323796, FN323775 | Egg protein SjCP3842 |
| Sja.14562 | 3988290 | FN327130, FN326955, FN326901 | Egg protein SjCP1084 |
| Sja.14565 | 3988293 | FN327099 | Egg protein SjCP1084 |
| Sja.14614 | 3988342 | FN320058 | Hypothetical protein |
| Sja.14627 | 3988355 | FN319007 | Hypothetical protein |
| Sja.14941 | 3988669 | FN320554 | Hypothetical protein |
| Sja.15036 | 5233761 | FN326786, FN318043, CX861530est | Hypothetical protein |
| Sja.15108 | 5233833 | AY810465, FN321062 | Hypothetical protein |
* UniGene is a database of sets of transcript sequences that appear to come from the same transcription locus.
The original set of cDNAs that we identified earlier by signal sequence trap carry the tag SST.
Transcripts tagged est and nc are expressed sequence tags (ESTs) and non-coding mRNAs, respectively.
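Because the tags are appended directly to the accession numbers in the table, a small parser can separate them, for example to tally transcript types within a UniGene cluster. The sketch below is hypothetical: `parse_cell` and `count_tags` are illustrative names, the regular expression assumes accessions of one or two capital letters followed by digits, and untagged accessions are assumed to be ordinary mRNAs.

```python
import re
from collections import Counter

# Accession = 1-2 capital letters + digits, optionally followed by a tag:
# "SST" (original signal-sequence-trap cDNA), "est" (EST), "nc" (non-coding mRNA).
TAG_PATTERN = re.compile(r"^([A-Z]{1,2}\d+)(SST|est|nc)?$")


def parse_cell(cell: str) -> list[tuple[str, str]]:
    """Split a comma-separated table cell into (accession, tag) pairs.

    Untagged accessions are labelled "mRNA" (an assumption, see lead-in).
    """
    pairs = []
    for token in cell.split(","):
        token = token.strip()
        if not token:
            continue
        match = TAG_PATTERN.match(token)
        if match is None:
            raise ValueError(f"unrecognized accession: {token!r}")
        accession, tag = match.groups()
        pairs.append((accession, tag or "mRNA"))
    return pairs


def count_tags(cell: str) -> Counter:
    """Count transcript types (SST, est, nc, mRNA) in one table cell."""
    return Counter(tag for _, tag in parse_cell(cell))
```

For the Sja.13324 row, `parse_cell("AY570737SST, FN328299nc")` yields one SST cDNA and one non-coding transcript.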
*S. japonicum genome contigs containing the similar signal sequence (n = 34)
| Strand | Contig length (kb) | Signal sequence range (nt) |
|---|---|---|
| - | 69.7 | 4848 – 4779 |
| - | 0.6 | 276 – 205 |
| - | 1.6 | 999 – 928 |
| - | 3.0 | 2413 – 2342 |
| - | 14.5 | 3856 – 3785 |
| - | 6.8 | 3284 – 3217 |
| + | 29.2 | 19023 – 19094 |
| - | 10.1 | 6502 – 6431 |
| - | 43.7 | 42511 – 42440 |
| - | 12.4 | 9335 – 9264 |
| + | 12.9 | 493 – 564 |
| - | 12.4 | 10498 – 10427 |
| - | 12.1 | 7860 – 7789 |
| + | 4.3 | 433 – 504 |
| - | 11.9 | 5768 – 5697 |
| + | 19.0 | 12367 – 12438 |
| - | 22.3 | 383 – 322 |
| - | 9.6 | 4663 – 4602 |
| + | 4.9 | 2484 – 2544 |
| + | 2.8 | 337 – 408 |
| - | 7.4 | 4411 – 4342 |
| + | 4.9 | 388 – 459 |
| - | 3.2 | 925 – 854 |
| - | 6.8 | 271 – 200 |
| + | 1.9 | 1646 – 1717 |
| + | 2.6 | 446 – 517 |
| - | 2.4 | 2389 – 2319 |
| - | 6.7 | 5231 – 5160 |
| + | 3.1 | 1295 – 1366 |
| - | 1.1 | 656 – 585 |
| + | 2.3 | 1094 – 1165 |
| - | 2.1 | 1918 – 1847 |
| - | 1.3 | 1131 – 1060 |
| + | 5.6 | 4249 – 4320 |
(-) denotes contigs with the signal sequence on the negative (antisense) strand of the genome; (+) denotes contigs with the signal sequence on the positive (sense) strand.
*Contigs represent dispersed duplicated gene loci. The range of the signal sequence motif is given for each contig.
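For (-) contigs the range is written from high to low coordinates, so recovering the motif in coding orientation means taking the reverse complement of that span. A minimal sketch, assuming 1-based inclusive coordinates (the table does not state its convention) and a contig supplied as a plain DNA string; `extract_motif` and `reverse_complement` are hypothetical helpers.

```python
# Complement map for A/C/G/T/N in either case (other IUPAC codes are not handled).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_motif(contig: str, start: int, end: int, strand: str) -> str:
    """Return the motif in sense orientation from 1-based, inclusive coordinates.

    Follows the table's convention: '+' ranges run low-to-high, '-' ranges are
    written high-to-low (e.g. 4848 - 4779) and are reverse-complemented.
    """
    lo, hi = min(start, end), max(start, end)
    if lo < 1 or hi > len(contig):
        raise ValueError("range outside contig")
    segment = contig[lo - 1:hi]
    if strand == "+":
        return segment
    if strand == "-":
        return reverse_complement(segment)
    raise ValueError("strand must be '+' or '-'")
```

Under the same inclusive-coordinate assumption, most ranges in the table span 72 nt (e.g. 276 – 205 gives 276 - 205 + 1 = 72), i.e. 24 codons, consistent with the 24-residue MRII... signal peptides in the SST table.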
Scaffolds containing the similar signal sequence (n = 18)
| Contigs on scaffold (GenBank accessions) |
|---|
| CABF01002611, CABF01002612, CABF01002622, CABF01002623, CABF01002628, CABF01002630 |
| CABF01020047, CABF01020050 |
| CABF01020060 |
| CABF01022876, CABF01022884 |
| CABF01023364 |
| CABF01025296 |
| CABF01027854, CABF01027861, CABF01027866 |
| CABF01067176 |
| CABF01070230 |
| CABF01072590, CABF01072591 |
| CABF01073691 |
| CABF01075030 |
| CABF01076032 |
| CABF01078976 |
| CABF01080674 |
| CABF01080893 |
| CABF01080757 |
| CABF01092393 |
Figure 2 Southern blotting confirms duplicated loci exclusively in S. japonicum. (A) Southern hybridization with digoxigenin-labeled probes shows multiple bands, indicating duplicated loci derived from copies of the duplication source locus. Lanes 2-9 correspond to EcoRI + EcoRV double-digested genomic DNA of different Schistosoma species and strains: S. haematobium, S. mansoni, S. mekongi and S. japonicum (Japanese, Chinese, and Philippine Leyte, Mindanao and Mindoro isolates). 'M' is a digoxigenin-labeled DNA molecular weight marker. Note the differential banding pattern among strains and between isolates of the same strain. (B) The experiment in (A) was repeated with a different pair of restriction enzymes (EcoRI + HindIII). Inter- and intra-strain variation in the banding pattern is again apparent.
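Southern blot bands reflect the sizes of restriction fragments that carry the probe sequence, so for an assembled locus the expected fragment sizes of the two double digests can be predicted in silico. A minimal sketch; it assumes complete digestion of a linear sequence and ignores probe position, methylation sensitivity and partial digestion, and `ENZYMES`, `cut_positions` and `fragment_lengths` are hypothetical names. The recognition sites are the standard ones (EcoRI G^AATTC, EcoRV GAT^ATC, HindIII A^AGCTT); all three are palindromic, so scanning the top strand finds every site.

```python
import re

# Recognition site and top-strand cut offset within the site.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "EcoRV": ("GATATC", 3),    # GAT^ATC (blunt)
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def cut_positions(seq: str, enzymes: list[str]) -> list[int]:
    """0-based positions of top-strand cuts for the chosen enzymes."""
    seq = seq.upper()
    cuts = set()
    for name in enzymes:
        site, offset = ENZYMES[name]
        # Zero-width lookahead also finds overlapping occurrences.
        for match in re.finditer(f"(?={site})", seq):
            cuts.add(match.start() + offset)
    return sorted(cuts)


def fragment_lengths(seq: str, enzymes: list[str]) -> list[int]:
    """Fragment lengths (bp) of a linear molecule after complete digestion."""
    bounds = [0, *cut_positions(seq, enzymes), len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

Running `fragment_lengths` on the same locus with `["EcoRI", "EcoRV"]` and with `["EcoRI", "HindIII"]` gives the two expected band sets to compare against panels (A) and (B).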
Figure 3 Multiple alignment of the 34 genome contigs representing duplicated loci, used to assign a putative source locus. The graphs show the sequence similarity (A) and absolute complexity (B) of the DNA sequences of the 34 contigs in the S. japonicum genome that contain the duplicated loci. This alignment was used to assign the most prominent contig [GenBank:CABF01020060], the longest in the dataset (43.7 kb) and one that substantially covers the length of the other contigs, as the putative duplication 'source locus'. The curve indicates the probable length of the duplicated locus, bounded by RTE-SJ at the 5′ end (trimmed from this figure) and Perere at the 3′ end. The 'similarity' curve measures the level of similarity among the aligned sequences; its y-axis reads 1 where the sequences are 100% similar at a position, and the output shows the maximum score on the y-axis. Absolute complexity measures the conservation or variability of nucleotides in the aligned sequences and indicates how likely it is that the observed similarity did not arise by chance. The maximum positive score on the 'absolute complexity' curve is expected to exceed the magnitude of the negative values, which argues against the observed similarity having occurred by chance. The x-axes of both curves give nucleotide positions; numbering starts at position 30,000 because the output was trimmed at the 5′ end for ease of presentation.
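The exact scoring behind the 'similarity' curve is defined by the alignment software and is not given here. As a rough, hypothetical stand-in, the sketch below scores each alignment column as the fraction of sequences sharing the column's most common base (gaps count as mismatches), so a column reads 1 only when every sequence is identical there, and then smooths the scores with a sliding-window mean. It does not reproduce the 'absolute complexity' measure, and `column_similarity` and `similarity_profile` are illustrative names.

```python
from collections import Counter


def column_similarity(column: str) -> float:
    """Fraction of sequences whose base equals the column's most common base.

    Gaps ("-") never count as the common base, so they lower the score;
    a column reads 1.0 only when every sequence has the same base.
    """
    bases = [c for c in column.upper() if c != "-"]
    if not bases:
        return 0.0
    top_count = Counter(bases).most_common(1)[0][1]
    return top_count / len(column)


def similarity_profile(alignment: list[str], window: int = 1) -> list[float]:
    """Per-column similarity smoothed by a sliding-window mean (use an odd window)."""
    if not alignment:
        raise ValueError("empty alignment")
    if len({len(s) for s in alignment}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    if window < 1:
        raise ValueError("window must be >= 1")
    scores = [column_similarity("".join(col)) for col in zip(*alignment)]
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        # Windows are truncated at the alignment ends.
        chunk = scores[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed
```

A wide window, as is typical for plots spanning tens of kilobases, trades positional resolution for a readable curve.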
Figure 4 Further evidence that the duplicated genes are duplicons of a single duplication source locus. Apart from the prominent flanking copies of retrotransposons around the putative gene duplication source locus [GenBank:CABF01020060], two other short copies of the retrotransposon RTE_SJ are found within introns in the coding region. We aligned the source locus with 10 of the duplicons and observed that both the signal sequence and these two partial copies of RTE_SJ align at approximately the same positions, further indicating that the duplicated genes could have originated from a single source locus.
Figure 5 RT-PCR showing expression of some of the duplicons in the developmental stages of the parasite. RT-PCR using cDNA libraries of the parasite egg (E), cercaria (C), schistosomula (S) and mixed-sex adult (A) as templates provides evidence of the transcription and expression of some of the duplicons. The primer pairs were designed to amplify the entire coding sequences of the mRNAs. No differential expression pattern across stages was observed, although expression levels were not quantified. (A) Expression of the SjCP1084 protein-coding mRNA [GenBank:AY570737], transcribed from the putative source locus [GenBank:CABF01020060]. See Table 2 for a list of other similar transcripts. Notably, a non-coding transcript variant [GenBank:FN328299] can also be transcribed from the same locus. See Figure 6 and Additional file 8 for more details. (B) Expression of the SjCP3842 protein-coding mRNA [GenBank:AY570748], predicted to be transcribed from [GenBank:CABF01002612]. See Table 2 for a list of other similar transcripts in the database. (C) Expression of SjCP1531 [GenBank:AY570742], predicted to be transcribed from [GenBank:CABF01023364]. See Table 2 for a list of other similar transcripts. Notably, a non-coding transcript variant [GenBank:FN329677] can also be transcribed from the same locus (second band). See Figure 6 and Additional file 9 for details. (D) The S. japonicum actin gene was used as an internal control to check sample quality.
Figure 6 Splice models of some duplicons with evidence of alternative splicing. (A) The SjCP1084 protein-coding mRNA [GenBank:AY570737] and a non-coding transcript [GenBank:FN328299] are products of alternative splicing. Based on gene prediction from the contigs with GeneQuest and GeneMark, and on alignment of cDNAs to genome sequences with the Splign program, we observed that two mRNA transcript variants are produced from [GenBank:CABF01020060]. An extra splice site has evolved in the first exon of the non-coding transcript [GenBank:FN328299]; when this splice site is recognized, an ORF is created, yielding the protein-coding mRNA variant [GenBank:AY570737]. The images were generated with Vector NTI from the actual DNA sequences. See also the supplementary figure in Additional file 8. (B) The SjCP1531 protein-coding mRNA [GenBank:AY570742] and a non-coding transcript [GenBank:FN329677] are products of alternative splicing. Two mRNA transcript variants can be produced from a contig representing one of the duplicated loci [GenBank:CABF01023364]. Two additional splice sites are not used in the non-coding transcript [GenBank:FN329677]; when these splice sites are recognized, exons 5 and 6 of a translatable ORF are created, producing the protein-coding mRNA variant [GenBank:AY570742]. Consistent with this, the RT-PCR in Figure 5 (C) shows two bands on the agarose gel matching the size and sequence of these two variants. See also Additional file 9.
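The logic of these splice models, where use or non-use of a splice site decides whether a translatable ORF exists, can be illustrated with a toy example: join the exons, then look for an ATG with an in-frame stop codon. A minimal sketch; the sequence and exon coordinates are invented, coordinates are assumed 1-based inclusive on the sense strand, and `splice` and `first_orf` are hypothetical helpers, not the GeneQuest, GeneMark or Vector NTI procedures.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exons given as 1-based, inclusive (start, end) on the sense strand."""
    return "".join(genomic[start - 1:end] for start, end in sorted(exons))


def first_orf(mrna: str) -> Optional[str]:
    """Return the ORF starting at the first ATG that has an in-frame stop codon.

    The returned string includes the stop codon; None means no complete ORF.
    """
    seq = mrna.upper()
    start = seq.find("ATG")
    while start != -1:
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        start = seq.find("ATG", start + 1)
    return None


if __name__ == "__main__":
    # Invented toy locus: exon 1 (1-8), a GT...AG intron (9-16), exon 2 (17-24).
    genomic = "CCATGAAAGTTAGCAGTGGTAACC"
    print(first_orf(splice(genomic, [(1, 8), (17, 24)])))  # ATGAAATGGTAA (intron removed)
    print(first_orf(genomic))  # None (intron retained)
```

In this toy case, removing the intron brings a stop codon into frame and creates a complete ORF, while retaining it leaves no complete ORF, loosely mirroring the protein-coding versus non-coding variants described above.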