Chimeric mitochondrial peptides from contiguous regular and swinger RNA.

Hervé Seligmann

Abstract

Previous mass spectrometry analyses described human mitochondrial peptides entirely translated from swinger RNAs, RNAs where polymerization systematically exchanged nucleotides. Exchanges follow one among 23 bijective transformation rules: nine symmetric exchanges (X ↔ Y, e.g. A ↔ C) and fourteen asymmetric exchanges (X → Y → Z → X, e.g. A → C → G → A), multiplying DNA's protein-coding potential by 24. Abrupt switches from regular to swinger polymerization produce chimeric RNAs. Here, human mitochondrial proteomic analyses assuming abrupt switches between regular and swinger transcription detect chimeric peptides encoded by part-regular, part-swinger RNA. Contiguous regular- and swinger-encoded residues within single peptides are stronger evidence for translation of swinger RNA than previously detected, entirely swinger-encoded peptides: regular parts are positive controls matched with contiguous swinger parts, increasing confidence in results. Chimeric peptides are 200 times rarer than swinger peptides (3/100,000 versus 6/1000). Among 186 peptides with > 8 residues for each of the regular and swinger parts, regular parts of eleven chimeric peptides correspond to six among the thirteen recognized mitochondrial protein-coding genes. Chimeric peptides matching partly regular proteins are rarer and less expressed than chimeric peptides matching non-coding sequences, suggesting targeted degradation of misfolded proteins. Present results strengthen hypotheses that the short mitogenome encodes far more proteins than hitherto assumed. Entirely swinger-encoded proteins could exist.

Keywords:  Bijective transformation; Nucleotide substitution; Proteome; RNA–DNA differences; Systematic deletions; delRNA

Year:  2016        PMID: 27453772      PMCID: PMC4942731          DOI: 10.1016/j.csbj.2016.06.005

Source DB:  PubMed          Journal:  Comput Struct Biotechnol J        ISSN: 2001-0370            Impact factor:   7.271


Introduction

Mitochondrial genomes apparently compensate for their reduced size by cumulating multiple functions for single sequences [12], [14]. For example, tDNA, DNA templating for tRNAs, probably functions also occasionally as light strand replication origin [79], [80], [83], [106], [107], [108], [110]. The complementary strand of tDNA has similar secondary structure formation capacities and might template for additional functional tRNAs with anticodons usually corresponding to the inverse complement of the tRNA's regular anticodon [7], [81], [82], [84]. Mitochondrial tRNA sidearm loops might also function as anticodons, potentially increasing further mitochondrial anticodon repertoires [90], [95]. Translation of stop codons also increases protein coding repertoires [81], reassigning stop codons to amino acids [26], [84], [85], [86], [87], [91], [7], [11], [98]. These various mechanisms expand protein-coding potentials of DNA/RNA sequences. Multifunctional sequences, as suggested for tRNA synthetase genes [53], [69], [70] are presumably relicts of ancient, short protogenomes, plausibly consisting of ancestors of ribosomal RNAs [72], [73], [111], where sequence multifunctionality was probably essential. Presumably, alternative codings are relicts of mechanisms that increase sequence multifunctionality.

Swinger polymerization

A further, little-known phenomenon increases DNA's protein-coding repertoire: nucleotide polymerization that systematically exchanges nucleotides. This alters gene and mRNA coding properties. Assuming this phenomenon enables detection of homology relationships for otherwise ‘orphan’ DNA and RNA sequences. The homology of these orphan sequences had not been determined because they are so strongly transformed relative to their ‘parent’ homologue that homology is undetectable without assuming a systematic exchange between nucleotides, but becomes obvious once the systematic exchange(s) are taken into account. These transformations consist of systematic exchanges between nucleotides during DNA or RNA polymerization, producing so-called swinger sequences. The first described swinger RNAs were from vertebrate mitogenomes, and correspond to a 3′-to-5′ inversion, without complementing, of the homologous template sequence [88], [92], also called ‘reversing’ transformation [27], [28]. When considering a specific sequence, this transformation follows the swinger rule A ↔ T + C ↔ G (bijective transformation rule π9 according to the annotation system in [58]) of the negative strand of the specific sequence, one among nine systematic symmetric exchanges of type X ↔ Y, e.g. A ↔ C [93]. Fourteen asymmetric exchanges exist, of type X → Y → Z → X, e.g. A → C → G → A [94]. About one hundred mitochondrial transcripts corresponding to one of these 23 swinger types have been detected within the human EST database of GenBank, with about twice as many from the nine symmetric exchanges as from the fourteen asymmetric exchanges. Swinger RNAs matching eleven exchange types were detected within GenBank's EST database, six symmetric, and four asymmetric transformations. 
Most of these swinger RNAs (obtained by classical Sanger sequencing) are longer than 100 nucleotides and have > 90% similarity with the mitogenome if the swinger transformation is assumed over their complete length [90], [91]. All 23 swinger types exist in the human mitochondrial transcriptome among short reads produced by RNA seq (Illumina) ([103], data from [30]). Abundances of different swinger types as estimated from GenBank's ESTs (sequenced by classical methods) and from next generation massive sequencing (RNA seq) are overall congruent (see Figure 2 in [103]). This congruence between swinger RNA abundances is remarkable for two reasons: first, because comparable results are obtained by two independent methods (Sanger versus next generation sequencing); and second, because biological samples differ (not the same cells/mitochondrial lines were analyzed). This suggests that mitochondrial swinger transcription is general to mitochondria, not tissue- or line-specific. Hence sequences potentially template for 23 swinger-transformed versions, considerably increasing the potential coding density of any sequence. Swinger DNA was also detected for nuclear and mitochondrial genes [96], [97], especially ribosomal RNAs [99], but for now only according to swinger rule A ↔ T + C ↔ G. Swinger sequences detected in GenBank originate from numerous independent research projects and laboratories; only this author describes them as swinger-transformed.
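The counts of nine symmetric and fourteen asymmetric exchanges given above can be checked by enumerating all permutations of the four nucleotides. The following is a minimal sketch, not the method used in the cited publications: the function name `classify_rules` is hypothetical, and it assumes that a rule counts as symmetric when applying it twice restores every nucleotide, and as asymmetric otherwise.

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"


def classify_rules():
    """Split the non-identity permutations of A, C, G, T into two groups.

    Assumption for this sketch: a rule is 'symmetric' if applying it twice
    gives back the original nucleotide (X <-> Y exchanges), and 'asymmetric'
    otherwise (cyclic exchanges such as A -> C -> G -> A).
    """
    symmetric, asymmetric = [], []
    for image in permutations(NUCLEOTIDES):
        rule = dict(zip(NUCLEOTIDES, image))
        if all(rule[n] == n for n in NUCLEOTIDES):
            continue  # identity mapping: regular polymerization, not a swinger rule
        if all(rule[rule[n]] == n for n in NUCLEOTIDES):
            symmetric.append(rule)
        else:
            asymmetric.append(rule)
    return symmetric, asymmetric


symmetric, asymmetric = classify_rules()
print(len(symmetric), len(asymmetric), len(symmetric) + len(asymmetric))
```

Together with the identity (regular polymerization), the 23 rules account for all 24 permutations of four nucleotides, which is where the factor 24 in coding potential comes from.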

Swinger versus chimeric RNAs and peptides

Some detected sequences are not entirely swinger-transformed: sequences contiguous to the swinger sequence match the untransformed, contiguous DNA template, and hence are regular RNA [100]. These RNAs, transcribed partly by regular and partly by swinger transcription, are termed chimeric RNAs [100]. The transition from one part to the other is frequently abrupt, suggesting sudden switches in the polymerization mode of the same polymerase. Analyses here search for peptides matching translation of such chimeric RNAs, where contiguous parts of the peptide are translated from regular and swinger parts of the sequence. These peptides are also considered chimeric, and differ from previously described swinger peptides [103] because the latter are translated only from swinger-transformed RNA, while chimeric peptides would be translated from RNA that is in part regular and in part swinger-transformed. The principle according to which chimeric sequences are produced is shown for a specific 120-nucleotide-long sequence of the human mitogenome (Fig. 1). The middle forty nucleotides are swinger-transformed according to swinger rule A ↔ C + G ↔ T (second sequence in Fig. 1). Swinger RNAs consist solely of swinger-transformed regions such as the underlined transformed regions. Chimeric RNAs have at least one of the contiguous, untransformed (5′ and/or 3′) parts. Swinger peptides are translated solely from swinger-transformed sequences (such as the underlined sequence in Fig. 1). Chimeric peptides are translated from a sequence that stretches over a regular and a swinger-transformed RNA region. Only a minority of detected RNA sequences bearing swinger transformations are chimeric; most follow in their entirety a given nucleotide exchange rule [90], [91], [100]. Detection of chimeric peptides would be evidence for translation of swinger-transformed RNA that is independent of previous descriptions of swinger peptides (entirely encoded by swinger-transformed RNA [103]). 
Far fewer chimeric human mitochondrial RNAs [100] than entirely swinger-transformed human mitochondrial RNAs [90], [91] have been detected. Hence I expect to detect fewer chimeric peptides than in previous analyses searching for entirely swinger-encoded peptides.
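The construction of a chimeric sequence described above (regular flanks, swinger-transformed middle) can be sketched as follows. This is only an illustration under stated assumptions: the helper `chimeric_window` and the toy sequence are hypothetical, the DNA alphabet (T rather than U) is used for simplicity, and the rule shown is the A ↔ C + G ↔ T example from the text.

```python
# Translation table for the example rule A <-> C + G <-> T.
RULE_AC_GT = str.maketrans("ACGT", "CATG")


def chimeric_window(seq, start, end, table=RULE_AC_GT):
    """Return seq with only seq[start:end] swinger-transformed.

    Hypothetical helper: the flanking 5' and 3' parts are left untransformed,
    mimicking an abrupt switch to and from swinger polymerization.
    """
    return seq[:start] + seq[start:end].translate(table) + seq[end:]


# Toy 120-nucleotide sequence for illustration only, NOT the human mitogenome.
toy = "ACGT" * 30
window = chimeric_window(toy, 40, 80)  # transform the middle forty nucleotides
print(window[:40], window[40:80], window[80:], sep="\n")
```

An entirely swinger-transformed sequence corresponds to `start = 0` and `end = len(seq)` in this sketch.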
Fig. 1

Example of running windows reduced to 120 nucleotides for illustration purposes, for the human mitochondrial genome, with swinger transformation of the middle third (40 nucleotides). First row: regular genomic sequence; rows starting with $ are the first 5 running windows, with the middle third swinger-transformed according to swinger rule A ↔ C + G ↔ T (as an example). Running windows used for actual analyses are 270 nucleotides long, and are transformed according to each of the 23 swinger transformations, along the same principles as shown above for 120 nucleotides. The three peptides translated from the first running window sequence are also indicated; stop codons are translated here as ‘*’. Analyses searching for mass spectrometry data matching these predicted peptides consider the possibility that any amino acid is integrated at stops (19 possibilities, merging leucine and isoleucine, which are indistinguishable by the sequencing technique used here because their molecular weights are identical). This means that for a single running window sequence after a single swinger transformation, 3 × 19 = 57 hypothetical peptides are considered. This number is doubled to 114 considering the inverse complement sequence, and multiplied by 23 considering all 23 potential swinger transformations. Hence 2622 peptides are translated from the 23 swinger transformations of each running window sequence indicated by $. This running window structure enables detection of chimeric peptides where the regular part is translated from the 5′ side, as well as from the 3′ side, of the swinger part.
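The peptide count in the caption above follows from a simple product, reproduced here as a short sketch. The function and constant names are hypothetical; the values are those stated in the caption.

```python
FRAMES_PER_STRAND = 3      # three reading frames per strand
STOP_SUBSTITUTIONS = 19    # 20 amino acids, with leucine and isoleucine merged
STRANDS = 2                # sequence and its inverse complement
SWINGER_RULES = 23         # all swinger transformations


def hypothetical_peptides_per_window():
    """Return the counts per transformed window, per both strands, and overall."""
    per_transformation = FRAMES_PER_STRAND * STOP_SUBSTITUTIONS
    both_strands = per_transformation * STRANDS
    all_rules = both_strands * SWINGER_RULES
    return per_transformation, both_strands, all_rules


print(hypothetical_peptides_per_window())  # (57, 114, 2622)
```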

Swinger polymerization by regular polymerases?

Swinger polymerizations could result from unusual polymerization modes by regular polymerases because the principle of swinger polymerization does not differ from that of point nucleotide misinsertions. The difference is in the systematic change in templating rules, from f(A,C,G,T) = (A,C,G,T/U) (regular DNA replication/transcription), to a different rule, e.g. A ↔ C, which can also be annotated as f(A,C,G,T/U) = (C,A,G,T/U), stressing its systematic, rather than punctual nature. It seems plausible that point nucleotide misinsertions are due to switches to unstable, unusual conformations of polymerases, lasting the time of a misinsertion. Hypothetically, these unstable, misinsertion-inducing conformations are occasionally stabilized, so that the nucleotide exchange corresponding to that misinsertion occurs systematically along the sequence stretch polymerized while the polymerase is in that unusual conformation, producing a swinger DNA/RNA. Swinger RNAs are for now the only evidence indicating the existence of such unusual, stabilized polymerase states. This hypothesis on polymerase conformations yields two testable predictions. The first prediction is that biochemical parameters experimentally estimated for point misinsertions by polymerases predict properties of swinger sequences. In this respect, the affinity (Km) and Vmax of each of the twelve misinsertions, and of the four regular insertions, as determined by Lee and Johnson [46] (therein Table 1), were used to predict abundances of swinger RNAs. Indeed, these experimental kinetic parameters predict several properties of swinger RNAs [58], [93], [94], strengthening the hypothesis that regular polymerases are responsible for swinger polymerizations, by switching to unusual, stabilized ‘swinger’ conformations, similar (or even identical) to conformations causing point misinsertions, but lasting longer.
Table 1

Human mitochondrial peptides detected assuming abrupt switches between regular and swinger parts of RNA, for peptides where the regular and swinger parts each have > 8 amino acids (mass spectra from [33]). Columns are: 1. Peptide number; 2. swinger type; 3. amino acid inserted at stop(s) (‘no’ indicates lack of stops); 4. strand and frame; 5. peptide sequence; 6. PSMs; 7. Xcorr; 8. trypsin miscleavage; 9. PEP; 10–13. Positive strand positions of 5′ and 3′ extreme amino acids of regular and swinger parts of detected peptide; 14. Peptide extremity matching regular transcription. Underlined: peptide swinger part; * and $ mark swinger peptide parts covering previously described swinger reads and previously described swinger peptides [103], respectively. Peptides 8 and 9 differ in posttranslational amino acid modifications (not indicated). Highlighted peptide parts match translations according to both the vertebrate mitochondrial and the nuclear (standard) genetic codes. Peptide parts not highlighted match only translation according to the vertebrate mitochondrial genetic code, and are incompatible with translation according to the nuclear genetic code. For example, peptide 3 could be translated in the cytosol on the basis of RNA transcribed from mitochondrial inserts in the nuclear chromosome (numts); peptide 5 could not, nor could peptides 1 and 2, because at least one part of the peptide is not compatible with translation according to the nuclear genetic code. Further analyses (see text) show that fewer detected peptides are compatible with the nuclear genetic code than expected by chance, and that more peptides than expected by chance are compatible only with translation according to the mitochondrial genetic code.

1 2 3 4 5 6 7 8 9 10 11 12 13 14
1acar0YGVSEGLAAPVGAYNVGAFAALYMAANFSFLNQDAVQVADK32.7700.37432463156315331235′2912
2acvr1SCLLAFLMGSLMLTLIVGLSK203.0400.16934653432349234683′129
3acrr0AWGGGFDVDWWGSDDIVAMR433.0600.745161881616116158161315′1010
4agkr2AIGKVAFSTSVMLEVMFLVNK1853.0610.08222892262231922923′1110
5agfr1SYTFPPGSSSVACWLGCSPSPTLTLIFGLSK493.3801.00034563432352234593′922
6agsr0GGSPSDSTTSSQQLLSSILWSK1493.9800.755101251010110164101283′1012
7agnof0LLGAVPLASASLTIGSLALAGMPFLTGFYSKDHIIETANMS12.3710.931137041376713644137013′2912
8atnof1FIAYHSPGKVNFVPATAVTR3992.7710.1968829248558793′119
9atnof1FIAYHSPGKVNFVPATAVTR6042.7710.6338829248558793′119
10atpf1MPNSFNWDVGGNSSKLPVECLVEQGPEAR12.3110.60514671500140714643′1712
11atyr2TMSYALTLLLLQTCRGFSR254.3410.83656015574563156043′910
12atnor2DMGDASVMGLSVNEASYDGK192.1700.12169336906696369363′1010
13cgnof0TLGQGVAHDLRTNPVDFVGDK62.6910.09413441365136814045′912
14cgkr2LLASLPQPTVVPSTMPTISVRSGVLAGCLIGWWKPK22.0010.592111031105811163111063′1620
15cgtr0TDNTNHHLTGSAIMTMTAPVK5404.8900.653116251160111661116283′1110
16cgnor1TSQTDLLTDPPITYEFLWAFSVNK$82.0300.592121501212612195121533′914
17ctmf1MWEDLMVEAMNSLSSATVGR8804.4900.83719952019202220525′911
18cter0GMGPMAYLASLALKENMVNNAEGFK6422.4710.18242334209420641615′916
19cttf0NPSLSISVPSTRHVSMPITISSIPPQTTEMCLMK32.5410.86642904350425142873′2113
20ctar2GVNWAKMNIAGYESSYNEQR52.7610.352105121048510542105153′1010
21ctmr0AMMGDCAVCGTEMMSMCIK132.0300.42139141389013887138605′109
22ctvr0NVVWSVAVAAMMKGGVGVGMGGHMEMK313.3110.393153211527915276152405′1413
23ctyf1DVSGPSSPSSSLMTLTLFSPDLLGDPDNYTLANPLNTPPY22.2700.724157141580115684157113′3010
24gtqr2NNLFSLYCYLFQLWMMDPEHMNSMALK12.5500.464133291328713365133323′1611
25ac gtsf0MVGSFMGSGDKPTEPGDSFGEPRSEAGPGPGSTLQSAR382.4310.45919952049205221095′1919
26ac gthr2WSSSLAAPSAFVLVGMSSRHSLLVCGTHVYFFGHNWNK12.7110.21844284389450044313′1424
27ac gtsf2SGWVEWSRHSVLLLLSLPVLAAGITMLLTDR83.8511.00065916648655865883′2110
28ac gtyf0EATASSAGNDASYDGQSGKDSQATPYTKPTPK92.7710.42572217254716172183′1220
29ac gtxr1GDKLFYDXGLLWGAQAGMVR4354.0610.90175037479747674465′911
30ac gtxr1RPLSPXGASLWSSVLXTYLR1295.3100.51674767452750974793′911
31ac gtnf0VMVTDLLQKSWSPHSYNNYITNR62.3910.97181578193812781543′1310
32ac gtnf2SNALNNAGKNAEGHYSSSPNNK12.5211.00085598583852085563′913
33ac gtcf1TPGVVPEPAPAANVHSSCPPCPWLPCFPPSLPPSLTLTK152.7500.296125641262412510125613′2217
34ac gtnof0SLKQNWDFSFNSSTMVVAGIFLLIR12.1111.000132991333813266132963′1411
35ac gtxf1EMHLCSXEDSRAHNTWGXLK132.0210.989167281675216698167253′1010
36ag ctkr2SLAPSGWSLLNLTNPLFSSMNLPTILLHKR164.3110.20817281704178817313′1119
37ag ctdr1LGDDWLEDMGNSNQNQLK32.0800.89934263399339633755′99
38ag ctnof2WALFLSGTDSSSVSLAPLAATGSWGGLNQTQLR72.9600.14350435076498050403′1221
39ag ctyf0NPPYTWSDYMSIFCFVVCLGGLR152.0800.87574857515751875545′1013
40ag cter1YVGVEDESAVTNTSTNLTLPTIGQPSNGKK22.1410.47278097779777677195′1020
41ag ctqr2GDACWGPVPSQLGGQGQAGVVKGLQGLHQQGGPQNGGR12.2410.80494569387938493425′2315
42ag ctnof1AHVEAPIAGSMVLAVTSPGSNNR373.7401.000116041164311646116705′149
43ag ctvr1RSPLPGDQVDYVVVHGGMSVQFLWAFSVNK343.0811.000121801212612213121833′1911
44ag ctff1FNPFFGFVGPITKPTLNFNK9143.5100.423148741490414847148713′119
45ag ctvr2HVHPEPSDEVAAYGANSIRCVGVGVVVMLVR13.2310.721150151497915069150183′1318
46at cgnof2SSLRPYTKCVVFLASEEVK42.4111.00031353129312630965′910
47at cger2QAEVFLSLQSSSQNHCFMQHISSGESASYVVPEK2153.2300.22438583822392138613′1321
48at cgar0FEDNKWDSFIDFYQTYFLGLAGNAGDCNGYGDMSYK12.8211.00040744002410740773′2412
49at cgef2GISWPKLDEEGGGPFEAGEAPAGLK593.8210.16958475874579958443′916
50at cgkf1SIAGVDVAMAVSGTKTLYLLHSNTHHNR13.3110.88662016231614761983′1018
51at cgrf2WLPWLGCSCGWCRLITSTPTYFPHYSR12.4910.94180708097802280673′1116
52at cgcr2CYLVGAFHCNLHNQENCK273.9200.69082478223822081965′99
53at cger1AGEGLLEVWKASEPNSAVAK83.1710.17190729042903990125′1010
54at cgef1EIFLSLLPQVGSGMGGESSR113.5700.24996489669967297055′911
55at cghf2GFLCIKLSCVGGCPHLLASSLYYFLTK$82.0810.935110371107010992110343′1317
56at cgxr0DGGNXGSQGXGAMSSHVPMMKMNLNVVLK$382.2510.766123211229112288122375′1217
57at cgrf0RPRLTSLPSLLNDINTILWSGGSAGSVNMGSVGEFVGR12.9410.124156871573515738157985′1820
58acgnr2NTTTLSRTLNVGAVMNNVMVDVAGFNGSLVK602.6811.00033153255334533183′2110
59acgef0SEHTPQLPTETTSSALSDRR1593.5210.14150855109511251425′1010
60acggr0WGGSTTNGGEPTGGSTLVGGEYKLQGDR1732.5210.35164836450653164863′1216
61acgqr2QLVEQPKDTVEWQDMVEVGYNVVR35.2810.40168916852692468943′1311
62acgtf2TSKPHPTTTPPPSSSTPLQGLQCGGVRGVQAHQGAGHVQGR733.0310.876131671321513218132875′1724
63acgtf2TSKPHPTTTPPPSSSTQISPITCGGVRGVQAHQGAGHVQGR292.9910.866131671323013233132905′2219
64acgnof2GKQEPGLEQLCASSAALGEIPLPNNNPPLPK172.3610.236139891402213932139863′1219
65acgef0GLPQHQVHHKPHKPHYETHTQQK22.0501.000148621490114835148593′149
66acgef0GLPQHQVHHQPHKPHYETHTQQK22.0501.000148651490114835148623′1310
67acgaf1GGFSGPSQILIATILCSFMGK42.2400.319162661630516242162633′129
68acter2GENAGELEGWTATLSCSSPEGPTTLGPLR62.5900.15632283186327332313′1415
69acttr1KPAGASPAFFPGGGTSTLKPVDTGATLLTMEGETVSPGSVVK$22.0401.00055985511550854725′3012
70actkf2NLPILLLTNVEPQPFPPTPK274.2800.948110971111811064110943′911
71actkf2NLPILLLTNVDPQPFPPTPK1974.9701.000110971111811064110943′911
72actnf2WPLQCRQMSTTSSNFCHFNPNNNPPLPK32.9310.970139981402213941139953′919
73agcrf2DQPNPLRPCPAPLTDAMAIR452.1900.27239453969397240025′1010
74agckr1NNSSPIPVPVSSMVMPAAKTGK3653.8610.38263696336639663723′139
75agcvr2FQLQSMPVLGSVGGVGGAMGVMR1254.6400.95881578115818481603′149
76agcnor2TSYLLLQQPLLWWLCWFKLCVFWK52.0110.163107491071610785107523′1212
77agcqr1MQQAAFALQWMLLWGQDWQQQWMR82.1600.264124321239012459124353′159
78agcwr2MSSTCGNDISMSRISGLFSAWGWK42.0410.739130981306813137131013′1113
79agcqf0LQQMMQQADVLVAGIFLLIR133.7700.353133141333813281133113′911
80agcvf0IVAFSTSSQLGLMVLEVPVGVK2604.1600.296134581348813491135155′139
81agcdr1FLVDLGLGGVGAGFGSDEGLDDGLCGVWCDFMTVSLLMNK22.8000.029139711388713884138545′3010
82agtaf0TDAQSGGASAYKAHENILLR12.2511.00023432370231323403′1010
83agtnf2LIYSTSITLLPMTGGNGEGMR1944.0000.58654425475547855025′129
84agtdf2LIYSTSITLLPMTGGDGEGMR613.9600.46054425475547855025′1110
85agtrr2NGVSSSGGVEEGGVEVAVCLLLCVEWWLVRVCLVLLVR12.4110.18064236369636663125′2018
86agtgr2AASCGPPSCLPEVGINGGGNGMISTAAGGPGIVGAMNEANG*12.2900.27584938409852684963′3011
87agtar2AAEGNSYAEEFYGEADAGGGYAVEAATAWGGPPLAEAVPR12.4300.518101431006810065100265′2515
88agtdr0NYSSAMGACQGGSDESNDDGSGVCVWFARGPVVAAPGAVDR62.5510.953135661349113488134465′2615
89agtyf2RIRPVEVVGAIPHYFLLQYPHHR*12.8110.173137131374913680137103′1211
90atchf1HPLWNLHVEGGFSSNTCAVRPAVMGNVESYMHKHK122.5510.12414131452145515155′1520
91atcqf1GTLTVQQQHNMPPTFLGNAR613.1900.65926162643264626735′1010
92atcvr2GLVIVVVKALGSVGGVGGAMGVMR4294.4110.29981578115818781603′1410
93atcvr2GLVIVVVKAVGSVGGVGGAMGVMR10574.4810.14081608115818781633′1410
94atcxr1MXLMLMIALVAXEXIQMVVLFSVMAGMLGVVGWCR$12.1600.818132631323313335132663′1124
95atckr0FNYAFLGWGDWLLLWNYHMGMKVK173.8111.000140491402214019139805′1014
96atcqr2LWLCSKGGQWLQLGFVMNQFLMNDPK352.6010.630154081538115378153335′1115
97atgnf0TLGQGVAHEVANNGLHLRPLFSSCTR12.1300.07613441389139214195′179
98atgdr1EPMMHQVSMGKKPVSGGDPPDEDDVDGIK142.5710.12950615007508850643′1910
99atgnof0TMASSSPPSVPPAPAGSASASVTVASPPLALVAPVR$132.4201.00053765409541254815′927
100atgnof2GASFLFIWNSLYLLFGAWAGVLGTALSLLIRAELGQPGNL12.2411.00060546141602460513′3010
101atgmr0IMRMGAFGIGNMSGENTSTK652.8611.00062916264626162345′1010
102atgpf0APPTALVGTDSPHSTEAMWNDLLQCSEPPDSSFFSPPVA102.6400.56469877074696069843′309
103atgkr1KPYTLPMESMNPFCSHSKAK512.1711.000102841026310314102873′1010
104atgnr1GGAYQGNQSNNLLGGGLVVGWGLDNRLEGLFVVGLMSWSVG33.0210.924129361284612969129393′3011
105atgxr1EWAEVSSCGEEGXGGADAASEEPTKTTGER112.5110.991148261478414781147395′1019
106cgtqr1GLYWWDQQYGSGQGGWSLASPLDLGAQWTQGVGFR12.1501.0002281891861265′1421
107cgtxf0HPKPKPWEMXLXIPLXIXLLVQXGTALWTLGK12.7200.63221032166207321003′2210
108cgtmr0LLQSDHTAFGSAPMSPQPK22.1300.21726042577263126073′109
109cgtaf2AGASEGMTSYMMCLHTHYNLQHSPSNLANMSDK62.0600.03442784347425142753′2310
110cgttr2RPPSGWPSCTSTLVSGATTTTGRGAGVTVETTECGSSTDNM62.5110.58449894902502849923′3011
111cgtnor0LFLGMCLKQENPVMMSGLK3934.0910.162120661203312093120693′109
112cgttf0VFLLTMTFNQNITLWIWQHIASTGHPGMNATMSCK22.1400.703123301238412387124355′1917
113cgtff2MNGTEGHVVHVPDGVASMIYPTLQLMFPTTNSPSK82.5700.942131071316113059131043′1916
114ctgvr0VFVSLSLVSPFWLNPASTVAVNVNGYNEEVEVGHGYVVK12.9801.00031263078319231293′1722
115ctgef1QSHMKSPEPVGDEEEDEER264.1410.03237833807381038375′910
116ctgkr2DQVRPLVLCMVMLYFTIHLLHAYK82.5500.48086018571856885325′1212
117ctgkf0MNNKVFLVFVQTTIPLYLK7913.6110.130140011402513971139983′910
118ctgnof2SPSSMYPNNKLEEGLYELK704.5610.048159151594515948159725′109
119acgttf1GGAGGTPAATGTRTPSAGSDSVSTDFTQPTTSTTTPTNLK*52.2410.82525202466258625233′1822
120acgtar1FVKAALFLLAGTYPSLGAR794.0510.34229372913291028835′109
121acgtdr1MVEDMTGWADGLISTGDVDPTFSGVPKLSGGSAK62.8810.28743054236423342065′2311
122acgtfr2EEMLDGSFCGTFVFGGPVLALFSVMR12.2000.16344524416441343715′1412
123acgtnor2VPRQALVPFEVNEASYDGK1413.1310.59269306906696069333′910
124acgtnor2VPRQALVPFDVNEASYDGK3883.1610.49669306906696069333′910
125acgtnof1SNFLPTTLSRPIRNAPTLLGLGVMHAAHPAGQQMLEQAK292.9711.00072877350735374045′2118
126acgtqr1HGQAMLALPVLDPSVVVLGGCQGVGGK12.7400.031108571083310911108603′918
127acgtnor2VVGGVGWVPLAWLSLDMLQR1673.6400.555144481442414481144513′911
128acgtar2WMSGALILLGGAFCVLGSFRGGVGFVLLPGDVMADAGVER*42.3711.000147721471814715146525′1822
129actgdr1SSYKDPFAEDADLDNDIALLSLGDLVPLVK1493.8110.46927332682276927363′1713
130actgqr2GMGQGVHSQQAMVQAKVGAVMQQVMVDVAGGQNVGQPQGR22.3610.40933633276327332465′3010
131actgsf0NSVCSDGSARAVSPLAPGLSHK252.7510.56032733303330633155′1012
132actgsf0NSVCSDGSARSSSPLAPGLSHK233.2111.00032733309331233395′1210
133actgsr0MIMSAWSWKVMSSSMMETSMVEHLLDMIEIRPR62.4110.94875277482747974315′1716
134actgnof2SLSPFMITPSSVGVGVMVAVER593.2800.88477677794779778305′1111
135actgtr0TPKEQEIGEATVGTGIFNLTAK22.4211.00087788748881187813′1111
136actgnof2NQMIQALLITILLGLYFTLLSIVTAGTVFGLR12.2900.69098349894989799305′2012
137actgfr1SNAFGESKFTPTETMADTMAAFFFEYCGK12.4210.403114001136711451114033′1217
138actgrr2WGSSNHEHGGAGCLMGMVQGR5714.3500.165114811146011517114843′912
139actgkr2WGSSNHEHGGAGGLMGMVQGKGK674.6710.973114781145411517114813′1013
140actgdr0AMLLDMGAWVSKVETWVDAR94.4211.000121861215312150121265′119
141actgdr0AMLLDMGAWVSQVETWVDAR94.4201.000121861215312150121235′1010
142actgnof2VSAASSGFYRPLPNNNPPLPK813.6701.000139921402213962139893′1110
143agctkr2SKTLLLMWTLILIQSTSGK1103.7610.37727572730272727035′109
144agctsf0ASLSVALDPSGSTNTSLTAKHPNQLASIYFSR22.2410.98757755832573057723′2012
145agctnof0KAPNPCLAICALDACMLEK12223.2410.44859315955595859855′910
146agctsf0GGPAGSVFWATIQANPMASMTFSKSYSK42.6710.42976087653757276053′1612
147agcthr0TLLNKTSPTFTYLSGEAHLTCHGEK*42.1910.342135991356913641136023′1114
148agtchf0SLLPSLSTQHHRYSGWVPGGSGLDIHAPK12.8110.42024452472247525295′1217
149agtcrf2AATDDERPTTTGLNSSTTTLLLSR52.6600.30352145247516352113′1113
150agtcef1LTEPLTNGESSWEKASGSMVLAAVLLK$1552.1310.932116281165811580116253′1116
151agtcmf0IVAFSTSSQLGLMASEALAGMK1683.5300.790134581349413497135215′139
152agtcwf0IVAFSTSSQLGLMASEALAGWK4344.7200.715134581349413497135215′139
153agtcdf0IVAFSTSSQLGLMASEALAGDK4484.3300.817134581349413497135215′1210
154agtcrr1SLLPFVLFTFTPWSTLGLSMFLWVLGRGGLLFGR*13.4011.000137581372513821137613′1321
155agtcnor2VAPVMSLIWFVLLPVSGPILEWCGRLMK163.0610.437149341490714991149373′919
156atcgkr2ATKTVGGVFGQTNQSPDPK*12.6810.2513212942912675′109
157atcgyf1SVGGGSAGYCVCGAAWGLGEPTKPQYHPPQFMYLTSSK82.3700.5175556094985523′1919
158atcger0VISSEFIMQSQSPKHELEK423.0810.10316291605160215755′109
159atcgaf2LYSQAFNSSSAQHTHGVGGCGVHWVRFEFK12.2510.66933243369337234115′1525
160atcgnof2LMPPLCKIHHESVALLVR202.551139363912390938855′99
161atcggr1EKNQAVPEGPSMFISGPTQVK32.4710.57343834353441343863′1110
162atcggr1EKNQAVPEGLAMFISGPTQVK152.8110.5743804353441343833′1110
163atcggr1EKNQAVPEGLSMFISGPTQVK202.8010.5743834353441343863′1110
164atcgnof0AHTPKMLVMGPGLLPSGQGLGR272.1310.47245034527453045665′1012
165atcgxr1SEASASGSAKAAHDHLDDHPMMXLLFFVNSSMMAHLGK12.7910.3451155064517551183′1820
166atcgqr2QDCCDQDGSDEDPSNQNPQPAPKQER12.4210.30963156282627962405′1214
167atcgvr2GPVTVQAKVVGSVGGVGGAMGVMR2034.8310.07481608115818781633′159
168atcgvr2GPVTVQAKVLGSVGGVGGAMGVMR1814.5910.08281578115818781603′1410
169atcgaf2AASHPVPVPMTLLMLGLLTNTLTMYQWWR242.6600.05394779537945094743′1910
170atcgar0AMTLHAHAGAMFSEPAVLWVAISAMSAGAEPTAVANAK22.4400.693111961111511226111993′2810
171atcgqr2GDAGEMLLVNAGLLGAQFLLASK283.4700.068137191369513692136535′1013
172atcger2GDAGEMLLVIAGLLGAEFLLASK293.5400.147137191369513692136535′1013
173atcgtr0SAEHSLGAGYHSGLMWGGVFKGLATVTLSGSPTTSGENT12.9310.885155461545615573155493′309
174atgccf2NIPFLLFGVNSCCVIPSCNMPSACWINCKCLCK12.3610.23918721911181518693′1319
175atgcef0SVESMLLGEENNFAEEAKAK6413.5210.14819021929187218993′119
176atgcwf0WDISQGKTFAVILNLVLYPHPPK233.6210.85132403270320432373′1112
177atgckf1HYLYDMSPLNGIENHGKK1543.1910.41942724293429643235′99
178atgchf1LMHHHYKSSAHHVHSPMIVHHNNYQYK*352.4010.59165016522644464983′918
179atgcer1IINITAVEENPSGRSSLHK323.6210.04071557131712871015′109
180atgcer1IINITAVEEIPSGRSSLHK424.3810.15671557131712871015′109
181atgcnof2LLKECLSLASVPATPPYHTFEEPVYMK122.4810.68175247560748275213′1314
182atgcff1AQWLFAFALFLKMPFPFEVMFHMSMK52.1011.00090069051905490815′1610
183atgcnr1NYLYYKSYCVSYSTTNNLSFNITK143.5410.304112981126811265112295′1212
184atgckr0SKNKPDTNASSNPVMMSGLK2323.4910.793120541203312087120573′911
185atgccf1VIFCQMVEFCVMVQVHSDNCADIIEAPLHKMTSK12.1310.623134281345213353134253′924
186atgcyf1VVYNGLQAMPEAYSQDFSLLTTFPPHPPSK122.4700.484139411400113914139383′219
The second testable prediction of the hypothesis is that the same polymerase produces regular and swinger-transformed sequences. Hence occasionally, polymerases switch in the midst of replication/transcription, so that part of the sequence follows regular templating rules, and the other, contiguous part is swinger-transformed according to one among the 23 swinger rules. The fact that there are far more RNAs that are entirely swinger-transformed than chimeric RNAs [100] suggests that such switches during polymerization are rare, and usually occur before or at the onset of polymerization. The 16S rRNA gene in the complete mitogenome of Kamimuria wangi is a swinger-transformed A ↔ T + C ↔ G DNA sequence, embedded within an otherwise regular insect mitogenome [99]. The reasons why, until now, only DNA matching this A ↔ T + C ↔ G swinger transformation has been detected remain unknown. This A ↔ T + C ↔ G exchange rule is also common among chimeric (part regular, part swinger) RNAs [100]. Peptides encoded by chimeric transcripts are detected for the first time here. An example of three peptides and the corresponding DNA and chimeric RNA is shown in Fig. 1. Underlined parts are translated after swinger transformation of the transcribed RNA sequence. When a detected peptide corresponds only to all or part of the underlined amino acid sequence, this peptide is considered a swinger peptide, like the peptides described in an earlier publication [103]. When the detected peptide encompasses part of the underlined, and part of the contiguous amino acid sequence(s) that are not underlined in Fig. 1, the peptide is considered chimeric, because it is translated from untransformed RNA (the part that is not underlined in Fig. 1), and from RNA that is swinger-transformed (underlined part in Fig. 1).

Previously detected swinger RNA

An anonymous reviewer suggested adding, for reader convenience, explanations of how swinger sequences described in previous publications had been detected in GenBank. These methods are not described in the Materials and methods section, as this would be inadequate and confusing: neither results nor analyses beyond those described in earlier publications on RNA were done in the context of the presently described proteomic analyses. The aim here is (chimeric) peptide detection. The following descriptions are only for the convenience of potential readers. In a first step, the 23 swinger versions of the human mitogenome were produced in silico. This means that for swinger transformation A ↔ C (as an example), all As in the human mitogenome are replaced by ‘X’ using the ‘replace’ function in the software Word. Then all Cs are replaced by A. The last step is to replace all Xs by C, producing a hypothetical, A ↔ C swinger-transformed human mitogenome. Similar procedures produce all 23 possible swinger versions of the mitogenome. Each of these is then analyzed by BLASTn [2]. Two types of analyses have been done. The first analyses (publications by Seligmann on swinger sequences prior to 2016, from 2012 on) search for alignments between the swinger-transformed mitogenome and various sequence databases in GenBank, using standard default megablast parameters. This resulted in detecting long, highly similar sequences, as described in [88], [92], [93], [94], for example. Such searches do not yield alignments with nuclear chromosome sequences, but detect about 100 ESTs (expressed sequence tags). The length of the alignments (> 100 nucleotides) and the similarity with the hypothetical swinger-transformed mitogenome versions (> 90%), as these were previously presented (Table 1 in [88], [92], and Table 2 in [93], [94]), are not compatible with randomly obtained results (as tested by simulations based on randomly shuffled swinger mitogenomes in [96] (therein Section 2.2.3.)). 
These long EST swinger sequences were then confirmed by sequences detected within sequence read archives (SRA) of the human transcriptome published by Garzon et al. [30], GenBank SRA entries SRX768406–SRX768440. For these analyses pertaining to short RNA reads (50 nucleotides, RNA seq, Illumina), BLASTn searched for ‘somewhat similar sequences’, using default search parameters, and detected swinger reads as these are described in [103] (therein supplementary data). These results on short swinger reads converge with those obtained by the first, EST-focused search ([103], therein Figure 2). Using the same search tool and criteria as used for the RNA seq reads, the 23 swinger-transformed mitogenome versions also align with nuclear chromosome sequences (they did not align with human nuclear chromosomes when using megablast as for the EST search).
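The three-step placeholder replacement described above for A ↔ C (A to X, C to A, X to C) can be reproduced programmatically. The sketch below is an illustration only, not the procedure actually used (which relied on a word processor): the function names are hypothetical, and the second function, based on a single-pass character mapping, is an alternative offered here because it also handles asymmetric rules without a placeholder.

```python
def swap_with_placeholder(genome, x="A", y="C", placeholder="X"):
    """Exchange nucleotides x and y using the three-step replacement."""
    if placeholder in genome:
        raise ValueError("placeholder already occurs in the sequence")
    step1 = genome.replace(x, placeholder)   # all As become X
    step2 = step1.replace(y, x)              # all Cs become A
    return step2.replace(placeholder, y)     # all Xs become C


def apply_rule(genome, rule):
    """Apply any exchange rule given as a dict, e.g. {'A': 'C', 'C': 'G', 'G': 'A'}.

    Each nucleotide is mapped once, so cyclic (asymmetric) rules work too.
    Nucleotides absent from the dict are left unchanged.
    """
    return genome.translate(str.maketrans(rule))


print(swap_with_placeholder("AACGT"))                      # CCAGT
print(apply_rule("AACGT", {"A": "C", "C": "A"}))           # CCAGT
print(apply_rule("ACGT", {"A": "C", "C": "G", "G": "A"}))  # CGAT
```

Each transformed sequence would then be written to a file and submitted to an alignment search; that step is not sketched here.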

Nuclear origins of swinger sequences (numts)

Alignments detected with relaxed criteria (BLASTn, see previous section) between swinger mitogenome sequences and both short reads and nuclear chromosome sequences suggest the possibility that swinger reads (but not swinger ESTs) could originate from the cytosol. To some extent, this is not relevant to the main issue at stake, the very existence of swinger polymerizations, but could be relevant because the very large human nuclear genome could, due to its size, produce these alignments by chance. In addition, mitogenome copies (called numts, [51]) are inserted within nuclear chromosomes [36]. Because nuclear copies of regular mitogenomes exist, their occurrence for swinger-transformed mitogenomes is plausible, and would in itself constitute a possible independent confirmation of the existence of swinger sequences, as previously discussed [103]. In addition, the possibility of swinger transcription of regular numts in the nucleus cannot be ruled out. Previous analyses [103] showed that the majority of detected swinger reads have mitochondrial origins. On average, alignments between swinger-transformed mitogenomes and RNA seq reads have higher identity percentages than alignments between the same swinger mitogenome sequences and nuclear chromosome sequences. This is the case for a statistically significant majority of comparisons of identity percentages obtained between RNA reads and the swinger-transformed mitogenome, versus those between the same swinger-transformed mitogenome region and nuclear chromosome sequences. This suggests that most potential swinger numts diverge from their ancestral mitogenomic sequence, and that most RNA reads aligning with swinger-transformed mitogenomes have mitochondrial origins, because the swinger-transformed mitogenomes resemble on average more RNA reads than putative swinger numt(s) [103]. The ‘Discussion’ below develops these points in relation to potential nuclear origins of chimeric peptides.

Chimeric RNAs due to fusion between different RNAs

The term ‘chimeric’ transcripts has been used in the literature for a different type of RNA than the contiguous regular- and swinger-transcribed RNAs [101]. These other chimeric RNAs consist of two or more different transcripts, each produced by regular polymerization on the template of disjunct DNA regions. These RNAs are then fused by natural [60], [132] or artificial reverse-transcription-associated phenomena [131]. They differ from the regular-swinger RNAs in that, for the latter, the transcription process is chimeric (part regular, part swinger transcription), whereas the templating DNA regions are contiguous, not disjunct. It is possible that some unknown sequencing artifacts produce some of the detected swinger reads, but the non-random mapping of detected swinger peptides on detected swinger RNA reads, as previously described [103], shows that most swinger reads exist while translation occurs in the cell, and hence are not artifacts.

Swinger polymerization creates new genomic sequences

Another type of analysis detected swinger repeats within the regular mitogenome. Swinger repeats are usually short repeats that can only be detected when taking into account swinger transformations. These short sequences are inserted within the regular mitogenome, suggesting that natural retrotransposition of swinger RNAs produces novel DNA sequences [101]. They are more frequent and longer than expected by chance, and their length is proportional to the probability that the specific swinger transformation conserves circular code signals that presumably maintain the ribosomal translation frame in the gene. The natural circular code is a punctuation code within the genetic code, consisting of 20 codons that, as a group, have properties that enable protein coding frame retrieval [4], [22], [55], [56], [57]. This indicates that insertion of swinger sequences in the human mitogenome depends on their capacity to integrate protein coding genes without disrupting punctuation that presumably enables ribosomal detection of the coding frame.
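One simple way to picture how a swinger transformation can conserve circular code signals is to count how many of the 20 circular code codons are mapped back into the code. The sketch below does only that. The count is an illustrative proxy chosen here, not the statistic used in [101], and the codon list is the set as it is usually given in the circular code literature.

```python
from itertools import permutations

BASES = "ACGT"
# The 20 codons of the natural circular code, as usually listed in the
# circular code literature (the set is self-complementary)
CIRCULAR_CODE = frozenset(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)

def reverse_complement(codon):
    """Reverse complement of a codon."""
    return codon.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def retained_codons(rule):
    """Number of circular code codons that a rule maps back into the code.

    `rule` is a 4-letter string giving the images of A, C, G, T in that order.
    """
    table = str.maketrans(BASES, rule)
    return sum(c.translate(table) in CIRCULAR_CODE for c in CIRCULAR_CODE)

# The 23 swinger rules: every permutation of ACGT except the identity
RULES = ["".join(p) for p in permutations(BASES) if "".join(p) != BASES]
RETAINED = {rule: retained_codons(rule) for rule in RULES}

for rule, n in sorted(RETAINED.items(), key=lambda kv: -kv[1]):
    print(f"ACGT -> {rule}: {n} of 20 codons stay in the circular code")
```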

Chimeric peptides

Recent analyses show convergent frequencies between swinger RNAs sequenced by classical and next generation (RNAseq) sequencing methods [103]. Hence swinger RNA occurrence is relatively well confirmed by data from independent methods and research teams. The present analyses complement, at the peptide level, previous results on chimeric transcripts. The existence of transcripts that are part regular (untransformed), part swinger RNAs, with an abrupt switch between these parts, predicts the existence of ‘chimeric’ peptides matching translation of such chimeric transcripts. Hence MS/MS mass spectra of peptide data (from [33]) previously used to detect swinger peptides [103] are reanalyzed here, using the same methods as Seligmann [103], searching for peptides matching in part the translation of the untransformed human mitogenome, and in part the translation of the swinger-transformed, contiguous mitogenome sequence. Chimeric peptides are peptides where swinger-encoded parts of a peptide are contiguous with parts translated from regular RNA. These would be stronger evidence for translation of swinger RNA than previous detections of entirely swinger-encoded peptides, because the regular-encoded parts function as matched positive controls, directly associated with swinger-encoded parts. In addition, chimeric peptides could suggest that swinger peptides are integrated within otherwise regular proteins, a further small step towards understanding functions associated with swinger phenomena.

Materials and methods

The revised Cambridge reference sequence for the human mitochondrial genome (NC_012920, [3]) was cut into running windows of 270 nucleotides. Analyses do not account for known mitochondrial polymorphisms, as this would expand analyses beyond available computing power. The six frames of each of these nucleotide sequences of 270 bases were translated into the corresponding hypothetical peptides according to the vertebrate mitochondrial genetic code, after the 90 nucleotide-long mid-third of that sequence was swinger transformed according to each of the 23 swinger transformations. Hence each running window (around 16,300 in total) is represented by the six peptides translated from each of its 23 partly swinger-transformed versions (6 frames × 23 swinger versions = 138 hypothetical chimeric peptides for each of the 16,300 running windows). The window length of 90 codons/amino acids is designed to match the length of the longest (non-chimeric) peptides (up to 40 amino acids, [103]) previously detected in this dataset [33]. All translated hypothetical peptides are used by Thermo Proteome Discoverer to predict a theoretical mass spectrometry distribution, which is matched with observed MS/MS mass spectrometry data from Gueugneau et al. [33]. Stops are translated as ‘X’, which Thermo Proteome Discoverer considers by default as leucine/isoleucine (these have equal masses, and are indistinguishable by mass spectrometry). Peptides including stops are duplicated 18 times, replacing ‘X’ by one of the 18 remaining amino acid species, excluding leucine and isoleucine. Hence predicted peptides include the possibility that any amino acid could be inserted at stops. Analyses assume that all stops in a single predicted peptide are translated by the same amino acid.
Hence the 138 peptides for a single window of 270 nucleotides, if it includes at least one stop (the majority of cases), are represented 19 times, inserting X and each of the remaining amino acids at stops (19 × 138 = 2622 chimeric peptides). In total, approximately 42.7 million hypothetical chimeric peptides were tested. Consensus searches were handled with the Sequest (Thermo Fisher Scientific, Illkirch) algorithm with the following mass tolerances: Parent = 1 Da and Fragment = 0.5 Da (monoisotopic masses). Fixed carbamidomethyl (C) and variable oxidation (M) modifications were activated, as well as the lysine → pyrrolysine modification, and only one missed trypsin cleavage was allowed. False discovery rate was estimated against a reverse decoy database using the Percolator algorithm. No protein grouping was allowed since the database only contained non-redundant entries. Peptides with false discovery rate q < 0.05 and score Xcorr > 1.99 were considered identified. The score Xcorr is a likelihood of match between expected and observed MS/MS data that is unaffected by peptide length. Further explanations on peptide detection and characterization by the software are given in the Discussion. Observed mass spectra were compared separately to predicted peptides 19 times, each time inserting a different amino acid at stops. Here analyses test the existence of a specific group of peptides, namely chimeric peptides. The false discovery rate q is adapted to such populations of detected items [41]. Results also indicate the posterior error probability PEP, an estimate of detection error specific to each individual peptide, which might be useful in the future, when analyses focus on specific peptides rather than on a population of peptides. Results are not analyzed according to this criterion, which is more adapted to studies focusing on specific individual peptides.
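The construction of the hypothetical chimeric peptides described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline actually used: function names are hypothetical, the three reverse frames are taken from the reverse complement of the already transformed window (the text does not specify this detail), and the codon table is the vertebrate mitochondrial code written out in the usual TCAG codon order.

```python
# Vertebrate mitochondrial genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CCWW" "LLLLPPPPHHQQRRRR"
         "IIMMTTTTNNKKSS**" "VVVVAAAADDEEGGGG")
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}

def translate(seq):
    """Translate from the first nucleotide; stops are written 'X' as in the text."""
    codons = (seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    return "".join(CODE[c].replace("*", "X") for c in codons)

def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def chimeric_window(window, rule):
    """Swinger-transform only the middle third of a window (90 of 270 nt)."""
    third = len(window) // 3
    mid = window[third:2 * third].translate(str.maketrans(rule))
    return window[:third] + mid + window[2 * third:]

def six_frames(seq):
    """Three forward and three reverse-complement translations."""
    rc = reverse_complement(seq)
    return [translate(s[f:]) for s in (seq, rc) for f in range(3)]

def stop_variants(peptide, others="ACDEFGHKMNPQRSTVWY"):
    """X version (read as Leu/Ile) plus 18 copies with all X set to one residue."""
    if "X" not in peptide:
        return [peptide]
    return [peptide] + [peptide.replace("X", aa) for aa in others]

A_C = {"A": "C", "C": "A"}             # example rule A <-> C
toy = "ATGTAAGCC"                       # toy 9-nt window standing in for 270 nt
print(translate(toy))                   # regular translation, stop as X
print(translate(chimeric_window(toy, A_C)))

# Bookkeeping from the text
per_window = 6 * 23                     # frames x swinger rules
with_stops = 19 * per_window            # X plus 18 inserted amino acids
upper_bound = 16_300 * with_stops       # if every window contained a stop
print(per_window, with_stops, upper_bound)
```

The upper bound (about 42.7 million) assumes that every window contains at least one stop, which the text describes as the majority of cases.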

Results

According to the filtering criteria, analyses detect 1301 chimeric peptides among approximately 42.7 million hypothetical chimeric peptides produced by combinations of stop codon-amino acid insertions, swinger transformations and frames for the running window of 270 nucleotides (illustration in Fig. 1). Hence chimeric peptides are detected for approximately 3 among 100,000 hypothetical chimeric peptides. This is 200 times less than the rate of detection for ‘regular’ non-chimeric swinger peptides, using the same criteria and the same data, approximately 6 per 1000 predicted swinger peptides [103]. Previously detected chimeric human mitochondrial RNAs are about 3% of all RNAs detected with at least some swinger part. Part of the discrepancy between chimeric RNA and chimeric peptide detections probably results from the fact that proteomic analyses only considered abrupt switches between regular and swinger parts of peptides. BLAST analyses detecting RNAs are not limited by this consideration, and can detect RNAs where the switch is not abrupt: in a transition sequence between regular and swinger-transformed sequences, nucleotides seem random. Hence for practical reasons, detection of chimeric RNAs encompasses more possibilities than chimeric peptide detection. This explains why rates of chimeric peptide detection (relative to rates of swinger peptide detection) are lower than rates of chimeric RNA detection (relative to swinger RNA detection rates). Here we focus specifically on chimeric peptides for which both the regular and the swinger parts have more than 8 amino acids. This is because, considering 19 different amino acid species (merging leucine and isoleucine), the e value for 42.7 million potential chimeric peptides is about 0.0001 for amino acid sequences of 9 residues (42,000,000 × 19⁻⁹). Hence the match of each regular and swinger part of the detected peptide with the predicted chimeric peptide is unlikely to be due to chance, as estimated by this approximate e value.
This restricts the sample of 1301 detected chimeric peptides to 186 chimeric peptides of at least 18 residues, from various swinger transformations and stop-amino acid insertions (Table 1). Among these 186 chimeric peptides, the regular-encoded part of the peptide corresponds to the 5′ part of the peptide for 41% of the 186 chimeric peptides. This means that a statistically significant majority of chimeric peptides (two tailed sign test, P = 0.0061) correspond to the 5′ translation of swinger RNA and 3′ translation of regular RNA. Note that this statistically significant bias could not occur if detected chimeric peptides were due to random detection artifacts, strengthening the suspicion that results reflect a biological reality. Hence 41% of chimeric peptides reflect translation of regular transcripts that switch at a given point to swinger transcription. Frequencies and mean lengths of chimeric peptides, for each swinger type (Table 2) show that the regular (non-swinger) part of the chimeric peptides is on average slightly longer than the swinger part, though this difference is not statistically significant.
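The detection rates and the approximate e value quoted above follow from simple arithmetic, reproduced here as a check. Variable names are illustrative; the e value treats the 19 amino acid species as equally likely, which is the simplification made in the text.

```python
n_candidates = 42_000_000      # hypothetical chimeric peptides (rounded, as in the text)
alphabet = 19                  # amino acid species, Leu and Ile merged
part_length = 9                # minimal length of each regular or swinger part

# Expected number of chance matches of a given 9-residue part
e_value = n_candidates / alphabet ** part_length
print(f"e value: {e_value:.1e}")                   # about 1.3e-04

chimeric_rate = 1301 / 42.7e6                      # about 3 per 100,000
swinger_rate = 6 / 1000                            # from [103]
print(f"chimeric rate: {chimeric_rate:.1e}")
print(f"fold difference: {swinger_rate / chimeric_rate:.0f}")   # close to 200
```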
Table 2

Frequencies and lengths of chimeric swinger peptides detected in Table 1: Columns indicate: swinger type; peptide number; mean PSMs; mean amino acid number in peptide non-swinger part; mean amino acid number in peptide swinger part.

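| Swinger type | N | PSMs | Reg | Swinger |
|---|---|---|---|---|
| A ↔ C | 3 | 22.0 | 17.0 | 10.3 |
| A ↔ G | 4 | 96.0 | 14.8 | 14.0 |
| A ↔ T | 5 | 209.6 | 11.6 | 10.0 |
| C ↔ G | 4 | 139.0 | 11.3 | 14.0 |
| C ↔ T | 7 | 225.1 | 14.7 | 11.7 |
| G ↔ T | 1 | 1.0 | 16.0 | 11.0 |
| A ↔ C + G ↔ T | 11 | 59.6 | 13.8 | 14.2 |
| A ↔ G + C ↔ T | 10 | 103.0 | 13.2 | 14.4 |
| A ↔ T + C ↔ G | 12 | 31.2 | 12.3 | 14.8 |
| A → C → G → A | 10 | 52.2 | 14.6 | 13.7 |
| A → C → T → A | 5 | 47.0 | 14.2 | 13.6 |
| A → G → C → A | 9 | 91.9 | 14.1 | 10.2 |
| A → G → T → A | 8 | 33.3 | 18.3 | 12.4 |
| A → T → C → A | 7 | 230.3 | 12.1 | 14.7 |
| A → T → G → A | 9 | 18.8 | 18.3 | 12.8 |
| C → G → T → C | 8 | 52.4 | 18.4 | 12.9 |
| C → T → G → C | 5 | 179.2 | 11.4 | 12.6 |
| A → C → G → T → A | 10 | 82.1 | 14.0 | 14.3 |
| A → C → T → G → A | 14 | 71.8 | 13.6 | 11.9 |
| A → G → C → T → A | 5 | 268.4 | 13.2 | 11.4 |
| A → G → T → C → A | 7 | 173.1 | 12.1 | 13.6 |
| A → T → C → G → A | 18 | 33.7 | 14.6 | 12.3 |
| A → T → G → C → A | 13 | 92.6 | 11.8 | 12.7 |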

Swinger peptides and chimeric peptides

Note that chimeric peptides, due to their part that matches translation of regular transcripts, differ in mass spectrometry properties from peptides entirely translated from swinger RNA, even if these have the exact same swinger sequence. Hence detection of chimeric peptides with swinger parts overlapping previously detected ‘regular’ swinger peptides (as the swinger peptides described by [103]) would be strong, independent methodological confirmation that positive results are not artifacts. Indeed, the swinger parts of eight chimeric peptides in Table 1 overlap with one of the 263 previously described swinger peptides [103]. These previously described swinger peptides cover on average 1.1% of the swinger-transformed mitogenome, expecting approximately 2 overlaps with chimeric peptides in Table 1 if no association exists between the two independent analyses. This means that chimeric peptides map on previously described swinger peptides 4 times more frequently than expected. This association between two independent searches confirms that results are not false positive matches between the mass spectrometry data and some predicted hypothetical chimeric peptides among a very large number of predicted hypothetical chimeric peptides. In addition, note that even if detected swinger and chimeric peptides correspond to the same swinger region, the corresponding MS/MS mass spectra differ, because for chimeric peptides mass spectra include also the adjacent residues translated from regular, untransformed RNA, while for swinger peptides, mass spectra do not include the latter residues. This non-random correspondence between swinger peptides and swinger parts of chimeric peptides suggests that translation of swinger RNAs is not random, and probably specific to some mitogenome regions.

Swinger RNA and chimeric peptides

Previously detected swinger peptides preferentially map on human mitogenome regions covered by independently detected swinger RNAs [103]. Their numbers increase with numbers of detected swinger transcripts. These positive associations between swinger RNA and swinger peptides can also be expected for the chimeric peptides described in Tables 1 and 2. Such associations would confirm that the detected chimeric peptides actually exist, because they would match two independent lines of material evidence, peptides and RNA fragments. The mean number of PSMs (peptide spectrum matches) for chimeric peptides increases as a function of the number of human mitogenome regions covered by swinger RNA (also called contigs), for the swinger type corresponding to the swinger part of the chimeric peptides (Fig. 2). Swinger transcriptomic data are from Seligmann [103]. Chimeric peptides presumably reflect translation of chimeric RNAs, along part regular, and part swinger transcription rules. Hence amounts of chimeric peptides should reflect numbers of possible transitions between regular and swinger RNAs, estimated by the number of swinger contigs previously described by Seligmann [103]. Indeed, a positive association between PSMs of chimeric peptides and swinger RNA contigs exists (r = 0.64, one-tailed P = 0.0006), strengthening confidence in the validity of results, and corresponding with previous results for swinger peptides [103]. Note that similar correlation analyses for numbers (not PSMs) of detected chimeric peptides do not yield statistically significant associations with contig numbers.
Fig. 2

Mean number of PSMs detected for chimeric peptides, as a function of the number of disjunct human mitogenome regions covered by swinger RNA (RNA data from [102]). The positive association indicates the expected causal link between swinger RNA and chimeric peptides.

The swinger part of 8 chimeric peptides (marked by * in Table 1) maps on human mitogenome regions also covered by the adequate type of swinger RNA (six swinger types, two matches for A → G → T → A and A → C → G → T → A, and one match for each A → G → C → T → A, A → G → T → C → A, A → T → C → G → A, and A → T → G → C → A swinger transformations). Considering the overall mitogenome coverage by swinger RNAs (on average 2.6% of the genome), lack of association between swinger RNAs and the swinger part of chimeric peptides would expect 4.76 matches across all 23 swinger transformations, with 0.21 peptides for the average swinger transformation. This predicted number for specific swinger transformations was always < 0.5 peptides. Detecting at least one match for six among 23 swinger types, when less than 0.5 are expected for all 23 swinger transformations has P = 0.022 according to a two-tailed Fisher exact test. This indicates that chimeric peptides associate with detected swinger RNA, though this association is weaker than the previously described association between swinger RNA and swinger peptides [103].
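The Fisher exact test quoted above can be checked with a few lines of standard-library code. The 2 × 2 layout used here (6 of 23 swinger types with at least one observed match, versus 0 of 23 expected to have one) is an assumption about how the test was set up; it is the layout that reproduces the reported P value.

```python
from math import comb

def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed one.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Observed: 6 swinger types with a match, 17 without.
# Expected: 0 types with a match (< 0.5 expected for each), 23 without.
p = fisher_exact_two_tailed(6, 17, 0, 23)
print(f"two-tailed P = {p:.3f}")

# Mean expected matches per swinger type, from the 4.76 total in the text
print(f"{4.76 / 23:.2f}")
```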

Chimeric peptides: strong validation of swinger sequences

In terms of confirmation of swinger polymerization, chimeric peptides are only secondary evidence, because peptides are translated from RNA, as compared to previous descriptions of swinger RNAs and chimeric swinger RNAs [100], which directly result from swinger polymerization. This point is also valid for swinger peptides. However, detection of (numerous) peptides matching translation of contiguous parts of the mitogenome, where one part reflects regular transcription, and the other swinger transcription, is a strong methodological confirmation for swinger phenomena and associated translation into peptides, which is not implied by the detection of ‘pure’ swinger peptides. This is because the non-swinger part of the peptide is a positive control paired to its contiguous swinger part. Hence, in addition to describing a further aspect of the biological phenomenon of swinger polymerizations, chimeric peptides are also a further validation of the phenomenon's existence.

Chimeric peptides integrated in regular proteins?

An important question associated with swinger sequences is their function: among others, do they code for functional proteins, and are swinger peptides integrated into regular, perhaps functional proteins? A reanalysis of Table 1 yields a first insight into these important questions. The regular (non-swinger) part of eleven peptides matches the sequence of six among thirteen known, regular, mitogenome-encoded proteins. Their swinger parts correspond to the translation of the contiguous swinger transformation of these genes, along nine (four symmetric, and five asymmetric) systematic nucleotide exchange rules (Table 3). Note that up to three chimeric peptides are detected for two large mitochondrial proteins (cytochrome c oxidase I and NADH:ubiquinone oxidoreductase subunit 5). It is plausible that such peptides are integrated within complete proteins. These sequence alterations could modulate (or not) the regular function of the protein, and do not necessarily impair function.
Table 3

Chimeric peptides from Table 1 with regular part matching proteins translated from known mitogenome-encoded genes. Swinger parts are underlined; gene identity is followed by the position of the ‘normal’ part of the peptide matching the regular translation of the gene in the regular protein. The swinger transformation and the amino acid inserted at stop(s) are also indicated. Peptide parts matching translation according to both nuclear and mitochondrial genetic codes are highlighted: peptide 100 could be translated in the cytosol on the basis of RNA transcribed from mitochondrial inserts in the nuclear chromosome (numts), all remaining peptides could not, as at least one part of the peptide is incompatible with translation according to the nuclear genetic code. Analyses (see text) show that there are fewer detected peptides compatible with the nuclear genetic code than expected, and more than expected peptides compatible only with the mitochondrial genetic code.

| Table 1 # | Peptide | Gene | Position | Swinger rule | Stop |
|---|---|---|---|---|---|
| 100 | GASFLFIWNSLYLLFGAWAGVLGTALSLLIRAELGQPGNL | COX1 | 18–47 | A > T > G | r |
| 27 | SGWVEWSRHSVLLLLSLPVLAAGITMLLTDR | COX1 | 205–213 | A ↔ C + G ↔ T | s |
| 181 | LLKECLSLASVPATPPYHTFEEPVYMK | COX1 | 500–512 | A > T > G > C | x |
| 136 | NQMIQALLITILLGLYFTLLSIVTAGTVFGLR | COX3 | 157–168 | A > C > T > G | e |
| 169 | AASHPVPVPMTLLMLGLLTNTLTMYQWWR | COX3 | 41–59 | A > T > C > G | a |
| 23 | DVSGPSSPSSSLMTLTLFSPDLLGDPDNYTLANPLNTPPY | Cyt B | 238–267 | C ↔ T | y |
| 19 | NPSLSISVPSTRHVSMPITISSIPPQTTEMCLMK | ND1 | 305–318 | C ↔ T | t |
| 38 | WALFLSGTDSSSVSLAPLAATGSWGGLNQTQLR | ND2 | 165–176 | A ↔ G + C ↔ T | n |
| 34 | SLKQNWDFSFNSSTMVVAGIFLLIR | ND5 | 249–262 | A ↔ C + G ↔ T | f |
| 80 | IVAFSTSSQLGLMVLEVPVGVK | ND5 | 301–313 | A > G > T > C | d |
| 7 | LLGAVPLASASLTIGSLALAGMPFLTGFYSKDHIIETANMS | ND5 | 374–402 | A ↔ G | |

These 11 chimeric peptides integrated in regular proteins represent 5.9% of all 186 detected chimeric peptides. Considering that regular mitochondrion-encoded proteins have a total length of 3789 amino acids, the regular proteins represent 11.43% of the total number of amino acids that could be translated from the positive and negative strands of the human mitogenome. This means that chimeric peptides embedded within regular coding sequences are half as frequent as expected (5.9 versus 11.43%). This pattern is further strengthened when examining the number of PSMs (number of identified peptide spectra matching a hypothetical peptide) for these 11 regular-protein-integrated chimeric peptides, as compared to the mean number of PSMs for all chimeric peptides detected for that swinger transformation: their PSM number is in all but one case (peptide 80 in Table 1) lower than the mean PSMs of other chimeric peptides for that swinger transformation. Hence chimeric peptides within regular proteins are rarer, and less expressed (as far as PSM numbers can be trusted to reflect peptide abundances), than chimeric peptides translated from non-coding sequences, and non-coding frames of regular protein coding genes.
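The expected and observed percentages compared above can be recomputed as follows. The denominator used here (all six frames over the 16,569 nucleotides of NC_012920) is an assumption about how the 11.43% was obtained; it reproduces the figure given in the text.

```python
MITOGENOME_LENGTH = 16_569          # nucleotides in NC_012920
REGULAR_PROTEIN_RESIDUES = 3_789    # total length of the 13 regular proteins

# Codons available across three frames on each of the two strands
codons_all_frames = 6 * MITOGENOME_LENGTH // 3

expected = REGULAR_PROTEIN_RESIDUES / codons_all_frames   # share of regular coding
observed = 11 / 186                                       # chimeric peptides in regular proteins

print(f"expected: {100 * expected:.2f}%")
print(f"observed: {100 * observed:.1f}%")
print(f"observed / expected: {observed / expected:.2f}")  # roughly one half
```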

The natural circular code and swinger RNA, peptides and chimeric peptides

An anonymous reviewer suggested examining whether properties of chimeric peptides can be predicted from frameshift error-correcting properties of the natural circular code. Indeed, abundances of detected swinger RNAs in GenBank's EST database are proportional to reading frame retrieval (RFR) after swinger transformation of the natural circular code [58]. In this context, RFR, which estimates the capacity of the natural circular code to retrieve the protein coding frame, is calculated for the 20 codons that form the natural circular code, after each of the 23 swinger transformations: some codons belonging to the natural circular code are transformed into another codon included in the natural circular code, meaning that this property is invariant in relation to that codon and swinger transformation. RFR estimates this across all 20 codons of the natural circular code, for each swinger transformation. The length of swinger repeats in the human mitogenome is proportional to the RFR of the swinger transformation [101], which suggests that RFR affects insertion rates of swinger repeats in protein coding regions, and hence could also affect chimeric peptide production. The association between RFR and swinger RNA abundances for EST sequences occurs also for mitogenome coverage by swinger RNA reads sequenced by RNAseq in the transcriptome by Garzon et al. [30] (Pearson correlation coefficient r = 0.528, one-tailed P = 0.005). For swinger peptides as described by Seligmann [103], the mean number of PSMs also increases with RFR (r = 0.364, one-tailed P = 0.044). This positive association between mean PSM numbers and RFR is also detected for chimeric peptides from Table 1 (r = 0.367, one-tailed P = 0.043). These two results are independent, also because mean PSMs of swinger and chimeric peptides are only weakly correlated (r = 0.24, P > 0.05).
Hence detections of chimeric and swinger peptides are proportional to extents by which swinger transformations conserve natural circular code ‘frame’ punctuations. Note that RFR, as mitogenome contig numbers in a previous section, associate with mean PSMs, rather than numbers of detected peptides, suggesting that in the context of these specific data, PSMs are better quantitative estimates than other variables.
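The one-tailed P values reported for these Pearson correlations can be recovered from r alone, assuming that each correlation is computed across n = 23 data points (one per swinger transformation). That sample size is an assumption made here; it gives values close to those in the text. The sketch uses the usual conversion of r to Student's t with n − 2 degrees of freedom and integrates the t density numerically, so that only the standard library is needed.

```python
from math import gamma, pi, sqrt

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)

def one_tailed_p(r, n, steps=2000):
    """Upper-tail P for Pearson's r over n points (t test with n - 2 df)."""
    df = n - 2
    t = r * sqrt(df / (1 - r * r))
    # Simpson's rule for the area between 0 and t (steps must be even);
    # the upper tail is 0.5 minus that area
    h = t / steps
    area = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - area * h / 3

N_SWINGER_TYPES = 23   # assumed number of data points per correlation
for r in (0.528, 0.364, 0.367):
    print(f"r = {r}: one-tailed P = {one_tailed_p(r, N_SWINGER_TYPES):.3f}")
```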

Discussion

Statistical validity of peptide detections by mass spectrometry

An anonymous reviewer of a previous version indicated that detection of peptides with masses approximately matching the numerous possibilities produced by translation of all potential chimeric RNAs could be due to chance, due mainly to the large number of hypothetical chimeric peptides. Indeed, considering all 19 possible amino acids inserted at stops introduces a ‘fudge’ factor that enables adapting many hypothetical peptides to an actual fragment with a similar mass. Note that 28 among 186 (15%) detected peptides lack stops, invalidating this argument for several detected chimeric peptides. Independently of this, there are three reasons why this important point does not invalidate the remaining results on chimeric peptides presented here. This is first because mean chimeric peptide PSMs converge with corresponding swinger RNA contig numbers, an independent type of data unrelated to the problems of proteomic analyses, already discussed above. The other two points relate to the nature of the MS/MS mass spectrometry analyses themselves. The factor ‘detection by chance’ is integrated into the detection software used by Thermo Proteome Discoverer. The software compares the match between the mass spectrum of the actual fragment and the predicted mass spectrum of the hypothetical peptide, and its match with a dataset of decoy (false, negative controls) predicted peptides. The q value estimates the false detection rate (FDR, see explanations by [41]) of a peptide based on comparing matches by the actual predicted peptides and the decoy peptide database. This q is a probability of detection corrected for the false detection rates within the population of positive results (classical P values consider the whole population of statistical decisions, not only the subpopulation of positives). Hence the reported detections account for matches due to chance, considering the various parameters of the samples analyzed/compared, among them in particular sample sizes.
The third point relates to the nature of the statistic whose distribution is used to evaluate the above mentioned q (FDR). It is Xcorr, the cross correlation of the goodness of fit between the experimental peptide fragments and theoretical mass spectra. This integrates fits with each b and y ions, which correspond to asymmetry in the physical fragmentation of peptide bonds within the detected peptide, resulting in shorter peptide subparts: b ions occur when the residue's N-terminal is charged, y ions when the C-terminal is charged. Hence the match between the observed and the predicted peptide is not based solely on the similarity between their total masses, but also on fit between distributions of masses of sub-fragments of the (expected and observed) peptides, and this separately for b and y ions. The Xcorr statistic accounts, in addition to peptide size, for the number of matching masses of such sub-fragments. This allows inferring more precisely the residue sequences in the peptide, and means that peptide detection is not based only on a single measure, its total mass, but also on the mass of several subfragments. In this context, the peptide ACD can function as a simplified example. Its mass corresponds to six possible peptides, ACD, ADC, CAD, CDA, DAC and DCA. Hence if ACD results from translation of swinger RNA, one cannot assert that the observed mass is due to this peptide rather than any of the other five possibilities. However, Xcorr also considers the masses of subfragments of this peptide. Detection of a subfragment matching the mass of AC excludes four among the six possible peptides. A fragment matching the mass of CD matches only two peptides. If both subfragments AC and CD are detected, the characterization of the peptide ACD can be considered as assessed.
In addition, this process is done separately for b and y ions, because mass spectrometry analyses are in principle sufficiently precise to distinguish between these ions (remember that the precision of 0.5 Da of the analyzed data means a precision of half the mass of a hydrogen atom, which is also far less than the difference between amino acids with similar molecular masses). Hence Xcorr integrates information from both b and y ions, evaluating whether that information is congruent with the observed data. This procedure, coupled to q values based on comparisons of the Xcorr distribution obtained for negative controls (decoy peptides), renders detections relatively robust, despite fuzzy factors. In fact, large numbers of predicted peptides are necessary to estimate properly the distribution of random Xcorrs. The last point stresses that q (as P) values account for numbers of predicted peptides.
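The simplified ACD example can be made concrete with a short sketch. It reads the AC fragment as an N-terminal (b-type) fragment and the CD fragment as a C-terminal (y-type) fragment, which is the reading under which the counts given in the text hold. Residue masses are standard monoisotopic values; the constant offsets that distinguish real b and y ions (proton, water) are ignored for simplicity, and the function names are hypothetical.

```python
from itertools import permutations

# Monoisotopic residue masses in Da (standard values)
RESIDUE_MASS = {"A": 71.03711, "C": 103.00919, "D": 115.02694}
TOLERANCE = 0.5   # fragment mass tolerance used in the analyses, in Da

def mass(residues):
    """Sum of residue masses for a string of one-letter codes."""
    return sum(RESIDUE_MASS[r] for r in residues)

def candidates(total_mass):
    """All orderings of A, C, D whose total mass matches within tolerance."""
    return sorted("".join(p) for p in permutations(RESIDUE_MASS)
                  if abs(mass(p) - total_mass) <= TOLERANCE)

def has_prefix_mass(peptide, m):
    """True if some N-terminal (b-type) fragment has mass m."""
    return any(abs(mass(peptide[:i]) - m) <= TOLERANCE
               for i in range(1, len(peptide)))

def has_suffix_mass(peptide, m):
    """True if some C-terminal (y-type) fragment has mass m."""
    return any(abs(mass(peptide[i:]) - m) <= TOLERANCE
               for i in range(1, len(peptide)))

all_six = candidates(mass("ACD"))
after_b = [p for p in all_six if has_prefix_mass(p, mass("AC"))]
after_y = [p for p in after_b if has_suffix_mass(p, mass("CD"))]

print(all_six)    # six orderings share the total mass
print(after_b)    # the AC fragment leaves two candidates
print(after_y)    # adding the CD fragment leaves one
```

Total mass alone cannot separate the six orderings; each additional fragment mass removes candidates until only one sequence remains.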

Confirmation of chimeric peptides by Waters technology

An anonymous reviewer suggested confirming the existence of chimeric peptides by additional, independent mitochondrial proteomic data. In this context, I focused on another analysis of trypsinized human mitochondrial peptides [1], extracted by a more up-to-date MS/MS technology (Waters, Milford, MA, http://www.waters.com). This technique yields more accurate mass estimates than the method used by [33] (0.5 Da for the latter versus 5 ppm for the Waters method, hence about 10 × more accurate estimates). Analyses of the twelve samples from Alberio et al. [1] by the software PLGS yield relatively few hits matching chimeric peptides when considering only peptides whose regular- and swinger-encoded parts are each at least nine amino acids long. One peptide matches significantly according to PLGS a chimeric peptide whose swinger part (underlined) matches swinger transformation A → T → G → A, LVSASVEMNQQQVPGSAGR (the regular part corresponds to residues 4228–4237 translated from the third frame of the negative strand of the human mitogenome). The other peptide detected in these data has a swinger part that matches transformation C → G → T → C, SAAAARAGSACCLTSTAVTDRNLNTTF; the regular-encoded part corresponds to COX1, residues 211–219 in that regular mitochondrion-encoded protein. Hence a different technology detects within independent mitoproteomic data peptides matching translations of chimeric RNAs, with one part regular, the other swinger-transformed RNA. Hence, at least qualitatively, these independent data and technology confirm the existence of chimeric peptides and their integration in regular mitochondrial proteins. A more detailed description of ‘regular’ swinger peptides (meaning peptides entirely coded by swinger transformations of the mitogenome, unlike chimeric peptides that are in part regular-, in part swinger-encoded) detected in the data from Alberio et al. [1] will be presented elsewhere. These results from data by Alberio et al. [1] are too scarce to indicate whether chimeric peptides are produced according to a non-random profile. However, the non-random convergence between chimeric and entirely swinger peptides (detected in the same dataset from [33]) noted in a previous section in Results is in itself an indication that swinger-encoded peptides or parts of peptides are non-randomly produced.

Nuclear mitogenome copies

Previous transcriptomic analyses that detected non-canonical RNAs transformed according to systematic rules, such as deletions of mono- and dinucleotides after each transcribed trinucleotide (producing delRNAs, [102]), and swinger transformations [103], included controls that test whether the transformed mitogenome versions match nuclear chromosome sequences: mitogenome analyses are frequently contaminated by such chromosomal pseudogenes [9], [10], [48], [49], [50], [51], [62], [66], [67], [123], [124], [125], [133], [134]. These previous analyses blasted the swinger-transformed mitogenome versions against the (regular) human nuclear chromosomes. For transformed mitogenome regions aligning with both transcriptomic reads and chromosomes, similarities between the transformed mitogenome and the RNA contigs were compared with the corresponding similarities between the same transformed mitogenome region and the chromosomes. For both del- and swinger RNAs (non-canonical RNAs), similarities with RNA contigs were greater than those with chromosome sequences in significant majorities of cases [102], [103], as already discussed above for swinger RNA reads. These results indicate two major points. First, overall, RNA contigs result from non-canonical transcriptions of the mitogenome, the point that was being tested. Second, the observation that chromosome sequences match transformed versions of the mitogenome suggests that chromosomes include inserts of mitogenomic origin that were transformed according to systematic rules. The observation that these inserts are on average less similar to the transformed mitogenome than RNA contigs suggests that these transformed mitochondrial sequences inserted in nuclear chromosomes diverged by mutation from the original sequence, as expected for inserts lacking function in the cell's nucleus [16], [24], [29], [34], [35], [36], [38], [40], [52], [54], [63], [65], [71], [77], [113], [116], [117], [118], [119], [126], [128], [129].

Peptides translated according to nuclear or vertebrate mitochondrial genetic codes

Analogous analyses at the peptide level can test whether chimeric peptides in Table 1 were translated according to the human mitochondrial or the nuclear genetic codes. For that purpose, the regular and swinger transformed versions of the human mitogenome were translated according to the standard genetic code, which differs from the vertebrate mitochondrial genetic code by the reassignment of codon ATA from Met to Ile, of TGA from Trp to stop, and of AGR from stop to Arg [23]. These four codons are 6.25% of all 64 codons. Each swinger- and regular-encoded part of detected chimeric peptides has at least 9 amino acids. Hence the probability of detecting chimeric peptides that would have identical sequence according to both genetic codes is (1 − 0.0625)^k, where k is the total length of the peptide. This principle is applied to the chimeric peptides in Table 1 so as to calculate the predicted number of peptides, for each size category, that is expected to match translation according to both nuclear and mitochondrial genetic codes. Lengths of chimeric peptides in Table 1 range from 18 to 42 residues, a total of 24 length categories. The observed number of chimeric peptides compatible with translation according to both genetic codes (in total 30 among the 186 chimeric peptides) is lower than expected in 16 among 24 size categories. Obtaining this result has P = 0.038 according to a one-tailed sign test. This means that, considering the length of chimeric peptides, there are statistically significantly fewer than expected peptides with sequences compatible with translation according to the nuclear genetic code. The same principle can be applied to chimeric peptides in Table 1 whose sequences are only compatible with translation according to the mitochondrial genetic code, separately for the regular- and the swinger-encoded parts.
Here, the observed number (54) should be larger than the predicted number if the sample is biased towards mitochondrion-encoded/translated peptides. Considering that 6.25% of codons differ in codon-amino acid assignments between the two genetic codes, the total expected number of chimeric peptides, considering their size, containing at least one of the 4 codons with coding assignment differing between nuclear and mitochondrial genetic codes is 35.97. This number is far lower than the observed 54 according to a chi-square test (P = 0.0027). Hence chimeric peptides with sequences compatible only with translation according to the mitochondrial genetic code are significantly more frequent than expected. This bias confirms the mitochondrial origin of chimeric peptides in Table 1. The number of peptide length categories where more observed peptides than expected are only compatible with mitochondrial translation is again 16 among 24 length categories, which has P = 0.038 according to a one-tailed sign test. These analyses show that detected peptides are more likely translated according to the mitochondrial genetic code than according to the nuclear genetic code. Note that translation, within the mitochondrion, according to the nuclear code is possible: it potentially depends for some codons upon the presence of cytosolic tRNAs, which could be occasionally imported in mitochondria [21], [32], [39], [44], [75], [76], [78], [112], [114]. However, this rationale is not symmetric: cytosolic translation according to the mitochondrial genetic code is much less probable than the opposite, so that nuclear origins are not compatible with the results obtained. In fact, whether peptides have cytosolic or mitochondrial origins does not affect the main point addressed here, which is that these peptides were translated in part from swinger RNA.
The same point applies to the potential nuclear (numt) origin of swinger-transformed mitochondrial DNA: independently of the location of the process, detection of chimeric peptides implies that swinger transformations occurred, whether during transcription of the regular mitogenome or nuclear inserts, or during numt insertion, possibly by natural swinger retrotranscription. This does not exclude the possibility that some detected chimeric peptides originate from the cytosol, but stresses the fact that most are mitochondrial, and that this issue is not directly relevant to the fact that swinger RNAs, chimeric RNAs, and corresponding peptides, exist, independently of the question of which cellular compartments produce them.
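The two calculations in this section can be sketched as follows. This is a minimal illustration, not the pipeline used for Table 1: the function names are hypothetical, the probability assumes that all 64 codons are equally frequent, and the chi-square value assumes that the reported test used the single term (O − E)²/E with one degree of freedom. Under that assumption the observed 54 versus the expected 35.97 gives a P value near the reported 0.0027. The per-length expected counts of the paper are not reproduced here.

```python
import math

# Codons whose assignment differs between the standard and the vertebrate
# mitochondrial genetic codes: ATA, TGA, AGA, AGG.
DIFFERING_CODONS = 4
CODON_FRACTION = DIFFERING_CODONS / 64  # 0.0625


def p_identical_in_both_codes(k):
    """Probability that none of k codons is one of the four reassigned codons.

    Assumes every codon is equally likely, which is an idealization.
    """
    return (1 - CODON_FRACTION) ** k


def chi_square_one_df(observed, expected):
    """Return the (O - E)^2 / E statistic and its upper-tail P for 1 d.f."""
    statistic = (observed - expected) ** 2 / expected
    # For one degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Shortest and longest chimeric peptides in Table 1 (18 and 42 residues).
for length in (18, 42):
    print(length, round(p_identical_in_both_codes(length), 4))

# Reported observed and expected numbers of mitochondrial-code-only peptides.
statistic, p_value = chi_square_one_df(54, 35.97)
print(round(statistic, 2), round(p_value, 4))
```

The probability decreases with peptide length, which is why expectations have to be computed separately for each length category.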

Few chimeric peptides in regular proteins translated from mitochondrion-encoded genes

Chimeric peptides in Table 3 have regular parts that match sequences of regular mitochondrial proteins encoded by mitochondrial genes. These 11 peptides are about 6% of all 186 detected chimeric peptides. Peptides translated from regular mitochondrial genes represent about 11% of the total length potentially translated from the complete mitogenome, considering all six frames. Hence these 11 chimeric peptides potentially integrated in regular mitochondrial proteins are about half as frequent as one could expect. Their PSMs are lower than for other chimeric peptides. They are hence rarer and less expressed than one could expect. Possibly, chimeric peptides integrated in regular proteins perturb proper protein folding. Incorrect folding induces various degradation mechanisms associated with mitophagy [5], [42], [122], which could explain why only few chimeric peptides are detected within regular proteins. These findings are not incompatible with the possibility that at least some swinger transcripts and peptides are functional.

Secondary structure formation by swinger transformed RNA and swinger RNA detection

Self-hybridization properties of DNA/RNA, which determine secondary structure formation, group the bijective transformations (the 23 swinger transformations plus the identity) into three classes of eight transformations each. Transformations share self-hybridization properties within each class [27], [28]. This means that seven bijective transformations (including A ↔ T + C ↔ G) conserve self-hybridization properties of the original, untransformed sequence. Secondary structure formation by swinger RNA associates with swinger RNA detection [104], but these groupings/properties do not correlate with differences in chimeric or swinger peptide abundances/PSMs (not shown). The issue of regulation of alternative mitochondrial transcriptions, respectively post-transcriptional splicing, in relation to secondary structure formation by transformed RNA [61] remains unclear: a positive association exists between RNA occurrence and secondary structure formation for regular and swinger RNAs, but for transcripts resulting from systematic deletions (delRNAs), a negative association exists between secondary structure formation after deletions and delRNAs [105].
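The counts mentioned above can be checked by enumerating all bijections of the four nucleotides. The sketch below is only illustrative and its function names are hypothetical. It assumes that a transformation "conserves self-hybridization" when it maps every Watson-Crick pair (A-T, C-G) onto a Watson-Crick pair. This is one plausible reading of the criterion and is not necessarily the exact definition used in [27], [28].

```python
from itertools import permutations

NUCLEOTIDES = "ACGT"
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def all_swinger_rules():
    """Return the 23 non-identity bijections of {A, C, G, T} as dicts."""
    rules = []
    for image in permutations(NUCLEOTIDES):
        rule = dict(zip(NUCLEOTIDES, image))
        if any(src != dst for src, dst in rule.items()):
            rules.append(rule)
    return rules


def is_symmetric(rule):
    """True when applying the rule twice restores every nucleotide (X <-> Y)."""
    return all(rule[rule[n]] == n for n in NUCLEOTIDES)


def keeps_pairing(rule):
    """True when complementary nucleotides stay complementary (assumed criterion)."""
    return all(COMPLEMENT[rule[n]] == rule[COMPLEMENT[n]] for n in NUCLEOTIDES)


def transform(seq, rule):
    """Apply a swinger rule to every nucleotide of a DNA sequence."""
    return "".join(rule[n] for n in seq)


rules = all_swinger_rules()
symmetric = [r for r in rules if is_symmetric(r)]
asymmetric = [r for r in rules if not is_symmetric(r)]
pairing_conserving = [r for r in rules if keeps_pairing(r)]

# 23 rules in total: 9 symmetric, 14 asymmetric, 7 that keep base pairing.
print(len(rules), len(symmetric), len(asymmetric), len(pairing_conserving))

# Asymmetric exchange A -> T -> G -> A (C unchanged) applied to a short sequence.
a_t_g = {"A": "T", "T": "G", "G": "A", "C": "C"}
print(transform("ATGC", a_t_g))
```

Under this criterion, the seven pairing-conserving rules together with the identity form one class of eight, in line with the three classes of eight transformations.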

Swinger transformations, RNA–DNA differences (RDDs) and heteroplasmy

Specific non-random point differences occur between DNA and RNA sequences, either due to nucleotide substitutions [47] or inserts/deletions [15], including for human mitochondrial transcripts [8], [37], [59]. These RDDs appear shortly after transcripts exit polymerases [130], suggesting RDDs are due to post-transcriptional editing. The systematic repetition of transformations over long sequence stretches that characterizes swinger RNA seems less likely to be produced by post-transcriptional editing than by some unusual stabilized polymerase state. However, at this point, no possibility can be excluded, and potential connections of del- and swinger RNAs with RDDs should be kept in mind. For the same reason, and by definition, punctual mitochondrial heteroplasmies [45], [74], [115], [121] could not account for swinger parts of chimeric peptides, because these have to be translated from sequences differing from standard mitogenome sequences by far more than punctual nucleotide substitutions. Mitochondrial length heteroplasmies are common (49% of individuals, [64]), and in principle could, by chance, correspond to swinger-like inserts in the mitogenome. Considering the seven regions containing length heteroplasmies described by Ramos et al. [64] (therein table 3), only three among 186 chimeric peptides in Table 1 (peptide numbers 3, 156 and 157) potentially overlap (and this only in part) with these length heteroplasmies. Hence length heteroplasmies map non-randomly on chimeric peptides (3 among seven). Hence some presumed chimeric peptides might be translated from regions presenting length heteroplasmies, but this explanation is compatible with, at most, a small minority of chimeric peptides. Hence heteroplasmy alone could not explain chimeric peptides.

Translation increasing codon size or transcription systematically deleting nucleotides

In the broader context of the discussion of results, it is important to note that further little-known mechanisms increase the coding potential of sequences. A different, sometimes tRNA-based mechanism produces an alternative decoding of sequences, that of systematic frameshifting, which expands the codon from three to four (or five) nucleotides [6], called here tetracodons or pentacodons. This could result from systematic ribosomal slippages, a phenomenon that would correspond to programmed frameshifts (e.g. [25], [43]), but occurring systematically, and serially; and/or from translational activities of tRNAs with expanded anticodons [68], [120], [127]. These cases relate to previously described isolated frameshift mutations, interpreted as isolated tetra- or pentacodons. The hypothesis of an early genetic code based on quadruplets was suggested by Baranov et al. [6] to solve the problem that the weak triplet codon-anticodon interactions could not occur from a thermodynamic point of view in the absence of ribosomes, especially if these occurred at high temperatures [17], [18], [19], [20]. Molecules as complex as ribosomes probably were absent at proto-life stages. Codon-anticodon interactions between four (or more) base pairs are more stable than those between three base pairs. Symmetry considerations also enable the deduction that the primeval genetic code was based on a subset of 64 quadruplets, called the tesserae, specifically for the vertebrate mitochondrial genetic code [31]. The expanded codon hypothesis is that modern genes include overlapping coding regions that consist of series of tetra- or pentacodons. This hypothesis is compatible with bioinformatic analyses where all eight frames of mitochondrial genes were translated assuming tetracodons. Blast analyses detected alignments between parts of these hypothetical tetracoded peptides and regular proteins in GenBank.
Several other analyses based on codon usages in these tetracoding sequences confirm their special coding status, including higher GC contents than in non-tetracoding neighboring mitochondrial sequences. This corresponds to the prediction that tetracoding is an adaptation to translation at high temperature [89]. This point was further confirmed by a positive correlation between predicted tetracoding in lizard mitogenomes and mean body temperature in these lizard species [109]. Accordingly, overlap coding by tetracodons increases with temperature. At this point, and besides the proven existence of decoding mechanisms for isolated tetracodons, the strongest further evidence for the existence of protein coding regions based on tetracodons is the coevolution between predicted tetracoding regions and the predicted antisense mitochondrial tRNAs with expanded anticodons, which is observed in mammal and Drosophila mitochondria [88], [90], [95]. In addition, mitochondrial peptides matching translation of regular and swinger RNAs according to tetra- and pentacodons have been detected [103], as well as translation of delRNAs (or dRNAs), RNAs transcribed while systematically deleting every fourth, or every fourth and fifth nucleotide. Peptide translation of such transcripts uses regular tRNAs but produces peptides identical to those resulting from decoding by tRNAs with expanded anticodons of regular transcripts [102]. These delRNAs are produced by systematic deletions after every third nucleotide and are, at the level of deletions, the counterpart of systematic nucleotide substitutions/exchanges. This predicts that chimeric peptides consisting in part of regular-translated, and in part tetra- or pentacoded peptides, might exist. The strongest evidence for swinger-encoding is the association between detected swinger RNAs and detected swinger peptides.
Analyses detecting mass spectra matching predictions according to translations of tetra- and pentacodons suffer the caveat that evidence is based solely on mass spectra, with the above discussed difficulties in asserting the robustness of results based only on proteomics. However, further analyses detected peptides matching translations, according to expanded codons, of swinger-transformed sequences, and showed their association with detected swinger RNA [103]. Hence from a methodological point of view, translation according to expanded codons of swinger RNAs is stronger evidence for tetra- and pentacoding than such translation of regular RNA because it is confirmed by the independent detections of two ‘unusual’ types of molecules, swinger RNA and corresponding peptides matching expanded codons.
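The stated equivalence between translating delRNAs with regular tRNAs and decoding the regular transcript by expanded codons can be illustrated with a short sketch. The function names and the example sequences are hypothetical. The sketch assumes that the extra nucleotide(s) of each tetra- or pentacodon sit at its last position(s) and are silent, so that only the first three nucleotides of each expanded codon determine the amino acid.

```python
def delete_after_trinucleotides(seq, deleted=1):
    """Keep each trinucleotide and drop the `deleted` nucleotides that follow it."""
    step = 3 + deleted
    return "".join(seq[i:i + 3] for i in range(0, len(seq), step))


def expanded_codons(seq, size=4):
    """Split a sequence into consecutive complete codons of `size` nucleotides."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def triplet_cores(seq, size=4):
    """Concatenate the first three nucleotides of every expanded codon."""
    return "".join(codon[:3] for codon in expanded_codons(seq, size))


# Hypothetical sequences, chosen only so that their lengths are multiples of 4 and 5.
tetra_example = "ATGCAAGTTTAC"
penta_example = "ATGCAAAGTT"

# delRNA with every fourth nucleotide deleted versus tetracodon reading.
print(delete_after_trinucleotides(tetra_example, deleted=1))
print(triplet_cores(tetra_example, size=4))

# delRNA with every fourth and fifth nucleotide deleted versus pentacodon reading.
print(delete_after_trinucleotides(penta_example, deleted=2))
print(triplet_cores(penta_example, size=5))
```

Under the stated assumption both routes yield the same series of triplets, hence the same peptide, which is why mass spectra alone cannot distinguish the two mechanisms.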

Robustness of experimental design

An anonymous reviewer indicated that analyses comparing transcriptome and proteome make sense only if data originate from individuals with the same phenotypes, and if possible the same tissues and even the same individual(s), whereas the present analyses compare tumor transcriptome [30] with normal proteome [33]. This setup is indeed suboptimal. However, RNA and peptide data converge (also in previous analyses, [102], [103]) even though they originate from different tissues/individuals/phenotypes. This indicates that the phenomenon is general, and robust. This should not be surprising, because analyses consider only RNA and peptides corresponding to the mitogenome. Most tissue-specific differences in mitochondrial RNA and protein profiles relate to molecules imported from the cytosol [13]. Methods used to detect the various types of unusual peptides take into account the large numbers of possibilities in matching observed and hypothetical mass spectra, so that positive detections are robust, and could not be due to chance. Beyond methodological issues, occurrence of peptides coded by combinations of presumably unusual coding systems (translation of stops, together with translation according to expanded codons, and this for swinger RNAs), suggests that these basically ignored mechanisms expand the coding potential of genes more frequently than presumed, at least for the short mitogenomes. Detections of chimeric peptides, consisting of peptide parts corresponding to regular translation, adjacent to peptide parts matching translation of contiguous swinger RNA, strengthen confidence in the validity of results as positive controls, and expand our understanding of the phenomenon: swinger peptides are occasionally integrated in regular mitochondrion-encoded proteins, but their occurrence is downregulated.

Conclusions

Analyses of MS/MS mass spectrometry data detect peptides matching the translation of chimeric transcripts, RNA following in part regular, and in part swinger-transformed transcription, assuming abrupt switches between regular and swinger transformed parts of the RNA. The 186 detected chimeric peptides (peptides consisting of a part encoded by regular RNA and a contiguous part encoded by swinger RNA) represent 3/100,000 among potential chimeric peptides, about 200 times fewer than detected swinger peptides (peptides entirely encoded by swinger RNA; 6/1000) in the same data. Eleven among these 186 chimeric peptides have a regular-encoded part that corresponds to proteins translated from classical mitochondrion-encoded genes. Chimeric peptides map on previously detected swinger RNA. This association is weaker than a previously described association between ‘regular’ swinger peptides and swinger RNAs [103]. The vertebrate mitochondrial genetic code differs from the nuclear genetic code for four codons. Numbers of detected chimeric peptides that could be translated from human mitogenome sequences according to the nuclear genetic code are significantly fewer than expected considering the differences between the two genetic codes. This means that the majority of detected chimeric peptides are not cytosolic contaminations and were translated in the mitochondrion. Previous detections of swinger peptides (predicted products of translation of swinger RNA) suggested that swinger transformed RNA is translation-competent [103]. Chimeric peptides where the regular part corresponds to known mitochondrion-encoded proteins might be incorporated into the respiratory chain complexes. Chimeric and swinger peptides might affect known mitochondrial functions despite low abundances if they have regulatory functions. Results are compatible with the possibility that some proteins are encoded by swinger transformations, with yet unknown functions.