Yaqin Deng (1, 2), Adekunle Toyin Bamigbade (1), Mirza Ahmed Hammad (1), Shimeng Xu (1), Pingsheng Liu (1, 2)
Abstract
Identification of the coding elements in the genome is fundamental to interpreting the development of living systems and species diversity. Small peptides (fewer than 100 amino acids) play important roles in regulating metabolism, but their identification has been limited by their small size and low abundance. Serum, one of the most important body fluids, is rich in small peptides. In this study, we established a database of small ORF-encoded peptides (SEPs) from the mouse GENCODE release; it contains about half a million putative SEPs. We also extracted serum proteins from wild-type (WT) and ob/ob mice and collected the low-molecular-weight proteins for mass spectrometric analysis, discovering more than 50 novel SEPs. Several of these SEPs were further verified biochemically with newly raised antibodies. These novel SEPs expand our knowledge of the complexity of serum and provide new clues for the annotation and functional analysis of genes, especially the noncoding elements of the genome.
Keywords: Database; Mass spectrometric analysis; Serum; Small ORF-encoded peptides (SEPs); ob/ob mice
Year: 2018 PMID: 29577068 PMCID: PMC5860097 DOI: 10.1007/s41048-018-0048-0
Source DB: PubMed Journal: Biophys Rep ISSN: 2364-3439
Fig. 1 Construction of the Mouse Merged Database (MMD). Around 110,000 mouse transcripts were obtained from GENCODE (vM4). All transcripts except known coding transcripts were translated into SEPs (8 to 100 amino acids) using the ORF Finder program and an in-house program. The resulting SEP database was then merged with the mouse UniProt database and a contaminant database to form the Mouse Merged Database, which was used for mass spectrometry data mining in this study.
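The translation step in Fig. 1 can be illustrated with a short Python sketch. The paper used ORF Finder plus an in-house program whose exact rules are not described here, so the following is only an approximation under stated assumptions: it uses the standard genetic code, accepts only ATG as a start codon, scans the three forward frames of each (already stranded) transcript, and keeps ORFs that end in a stop codon and encode 8 to 100 residues (stop excluded). The function name `find_seps` is hypothetical.

```python
# Hypothetical sketch of SEP calling from one transcript (not the authors' code).
# Assumptions: standard genetic code (NCBI table 1), ATG-only starts,
# forward strand only, every in-frame ATG starts its own candidate ORF.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def find_seps(transcript: str, min_len: int = 8, max_len: int = 100) -> list[str]:
    """Return peptides of min_len..max_len residues encoded by ATG...stop ORFs."""
    seq = transcript.upper().replace("U", "T")
    seps = []
    for frame in range(3):
        codons = [seq[p:p + 3] for p in range(frame, len(seq) - 2, 3)]
        for start, codon in enumerate(codons):
            if codon != "ATG":
                continue
            peptide = []
            for c in codons[start:]:
                aa = CODON_TABLE.get(c, "X")  # ambiguous bases (e.g. N) become X
                if aa == "*":
                    if min_len <= len(peptide) <= max_len:
                        seps.append("".join(peptide))
                    break
                peptide.append(aa)
            # ORFs that reach the 3' end without a stop codon are discarded.
    return seps


# Usage: ATG + 9 x GCT (Ala) + TAA encodes a 10-residue peptide.
print(find_seps("ATG" + "GCT" * 9 + "TAA"))  # ['MAAAAAAAAA']
```

Treating every in-frame ATG as a separate start yields nested candidates; a real pipeline might keep only the most upstream ATG per stop codon, which is another assumption a reader would need to check against the authors' program.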
Verification of the MMD with previously published SEPs
| Accession | Sequence | Published SEP | Reference |
|---|---|---|---|
| ENSMUSG00000064337.1 | MKWEEMGYIFL | MOTS-c | Lee |
| ENSMUSG00000019933.3 | MSGKSWVLISTTSPQSLEDEILGRLLKILFVLFVDLMSIMYVVITS | MLN | Anderson |
| ENSMUSG00000028475.8 | METAVIGMVAVLFVITMAITCILCYFSYDSHTQDPERSSRRSFTVATFHQEASLFTGPALQSRPLPRPQNFWTVV | SPAR | Matsumoto |
| ENSMUSG00000086316.3 | MGDQPCASGRSTLPPGNTREPKPPKKRCVLAPRWDYPEGTPSGGSSTLPSAPPPASAGLKSHPPPPEK | NoBody | D’Lima |
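A database check like the one in this table can be scripted. The sketch below assumes the SEP database is held in memory as a mapping from Ensembl gene accession to translated SEP sequences; that layout and the helper name `verify_known_seps` are hypothetical. Because each SEP is a full ORF translation, it tests for exact sequence matches, and it also confirms that each published SEP falls within the 8 to 100 residue window used to build the database.

```python
# Hypothetical check that published SEPs are recovered by a SEP database.
# Assumed layout: {gene accession: [translated SEP sequences]}.
PUBLISHED_SEPS = {
    "MOTS-c": "MKWEEMGYIFL",
    "MLN": "MSGKSWVLISTTSPQSLEDEILGRLLKILFVLFVDLMSIMYVVITS",
    "SPAR": "METAVIGMVAVLFVITMAITCILCYFSYDSHTQDPERSSRRSFTVATFHQEASLFTGPALQSRPLPRPQNFWTVV",
    "NoBody": "MGDQPCASGRSTLPPGNTREPKPPKKRCVLAPRWDYPEGTPSGGSSTLPSAPPPASAGLKSHPPPPEK",
}


def verify_known_seps(database: dict[str, list[str]],
                      known: dict[str, str],
                      min_len: int = 8,
                      max_len: int = 100) -> dict[str, tuple[bool, bool]]:
    """Map each known SEP name to (exact match found, length within window)."""
    all_seqs = {seq for seqs in database.values() for seq in seqs}
    return {
        name: (seq in all_seqs, min_len <= len(seq) <= max_len)
        for name, seq in known.items()
    }


# Usage with a toy one-entry database (illustrative only).
toy_db = {"ENSMUSG00000064337.1": ["MKWEEMGYIFL"]}
for name, (found, length_ok) in verify_known_seps(toy_db, PUBLISHED_SEPS).items():
    print(name, found, length_ok)
```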
Fig. 2 Phenotypic verification of ob/ob mice. Twelve-week-old male mice were used in this study. (A) Body weight of WT and ob/ob mice. (B) Intraperitoneal glucose tolerance test (IPGTT) in WT and ob/ob mice. All mice were fasted for 18 h before the IPGTT, and glucose was injected at 2 g/kg. (C) Area under the curve (AUC) of the IPGTT. WT mice, n = 5; ob/ob mice, n = 6. Data are presented as mean ± SEM and were analyzed with Student's t test. *p < 0.05; ***p < 0.001.
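The AUC in panel C and the group comparison can be sketched in a few lines of Python. The caption does not list the sampling times, so the time points below (0, 15, 30, 60, 90 and 120 min) are an assumption, as are the helper names. The AUC uses the trapezoidal rule on raw glucose values (no baseline subtraction), and the t statistic uses the pooled-variance form of Student's two-sample t test. A two-sided p value can then be read from a t distribution, for example with `scipy.stats.ttest_ind`, which applies the same equal-variance test by default.

```python
import math
from statistics import mean, stdev

# Assumed IPGTT sampling times (min); not stated in the figure caption.
TIMES_MIN = [0, 15, 30, 60, 90, 120]


def auc_trapezoid(times, values):
    """Area under a glucose curve by the trapezoidal rule (no baseline subtraction)."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def sem(values):
    """Standard error of the mean: sample SD / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def student_t(a, b):
    """Pooled-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# A flat curve at a constant 100 (arbitrary glucose units) over 120 min.
print(auc_trapezoid(TIMES_MIN, [100] * 6))  # 12000.0
```

With n = 5 WT and n = 6 ob/ob mice, `student_t` returns 9 degrees of freedom.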
Fig. 3 Workflow for serum SEP detection. (A) Workflow for the enrichment and identification of low-abundance mouse serum proteins. Following this workflow, mouse serum proteins from two rounds of experiments were separated by SDS-PAGE and stained with colloidal blue. (B) First round: seven WT mice. (C) Second round: four WT mice and four ob/ob mice. Proteins below 14 kDa (between the two red lines in each lane) were excised for mass spectrometric analysis.
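Whether a database SEP would be expected in the excised region below 14 kDa can be roughly estimated from its theoretical average mass. The sketch below sums standard average residue masses plus one water; the helper names and the idea of pre-screening SEPs this way are illustrative assumptions rather than part of the described workflow, and small peptides often migrate anomalously on SDS-PAGE, so theoretical mass is only a rough guide.

```python
# Average residue masses (Da) of the 20 standard amino acids.
AVG_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_AVG = 18.01528  # Da, added once for the free N- and C-termini


def average_mw_kda(sequence: str) -> float:
    """Theoretical average molecular weight of an unmodified peptide, in kDa."""
    return (sum(AVG_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER_AVG) / 1000


def below_cutoff(sequence: str, cutoff_kda: float = 14.0) -> bool:
    """True if the theoretical mass falls below the gel-excision cutoff."""
    return average_mw_kda(sequence) < cutoff_kda


print(round(average_mw_kda("MKWEEMGYIFL"), 2))  # MOTS-c sequence: 1.45
```

Even a 100-residue SEP made entirely of tryptophan, the heaviest residue, would be about 18.6 kDa, so the 14 kDa cut does not strictly contain every possible 100-residue SEP; with a more typical mean residue mass near 110 Da, a 100-residue SEP is about 11 kDa.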
MS identification of SEPs in the excised gel bands
| Band no. | SEP no. | Accession | Gene | Biotype | Detected peptide | XCorr | M.W. (kDa) |
|---|---|---|---|---|---|---|---|
| 1 | 1 | ENSMUSG00000032985.11 | 5730522E02Rik | processed_transcript | LPQLAAAEPNRPR | 2.22 | 10.6 |
|  | 2 | ENSMUSG00000085596.1 | Gm11476 | processed_transcript | QLPLYQIEILVcNITAmTHPSNFSlESNQcRLSPRSQQLEcPK | 2.11 | 8.8 |
|  | 3 | ENSMUSG00000028289.8 | Epha7 | processed_transcript | MHLQSRLSAKR | 1.65 | 8.0 |
|  | 4 | ENSMUSG00000029049.10 | Morn1 | processed_transcript | NGTGcVSPLELR | 1.09 | 6.4 |
|  | 5 | ENSMUSG00000021177.11 | Tdp1 | retained_intron | MVcLSFTTK | 0.75 | 1.4 |
|  | 6 | ENSMUSG00000042105.14 | Inpp5f | retained_intron | MLVLPVnLPR | 1.73 | 1.3 |
|  | 7 | ENSMUSG00000029647.11 | Pan3 | nonsense_mediated_decay | MLLIcNQKQMHLPSSWPTSSDR | 1.47 | 2.7 |
|  | 8 | ENSMUSG00000081985.1 | 1700047M11Rik | lincRNA | SNRSLTLR | 1.05 | 1.4 |
|  | 9 | ENSMUSG00000081985.1 | Gng2-ps1 | processed_pseudogene | KLIELLKmEAnlDR | 1.05 | 6.0 |
|  | 10 | ENSMUSG00000006423.11 | C330007P06Rik | retained_intron | HPPVIFVTYTHMQANTHAHKIR | 1.36 | 6.3 |
|  | 11 | ENSMUSG00000059974.6 | Ntm | processed_transcript | TTQAKMHnSISWAIFTGLAALcLFQGK | 1.49 | 3.2 |
| 2 | 3 | ENSMUSG00000028289.8 | Epha7 | processed_transcript | MHLQSRLSAKR | 1.65 | 8.0 |
|  | 12 | ENSMUSG00000031634.8 | Ufsp2 | processed_transcript | mISSKPIER | 1.66 | 2.8 |
|  | 1 | ENSMUSG00000032985.11 | 5730522E02Rik | processed_transcript | LPQLAAAEPNRPR | 1.60 | 10.6 |
| 3 | 13 | ENSMUSG00000054693.10 | Adam10 | nonsense_mediated_decay | KEALVmGLSLMEDLKVSSR | 0.90 | 9.6 |
|  | 14 | ENSMUSG00000035953.9 | Tmem55b | retained_intron | HFPRSLRDIQPccLER | 1.28 | 3.4 |
|  | 15 | ENSMUSG00000103242.1 | RP23-132G24.3 | processed_pseudogene | MAEAIYIIEVKEWGK | 1.33 | 2.5 |
| 4 | 16 | ENSMUSG00000085553.1 | Gm14808 | antisense | QnNHGGWLWVPKEScALGR | 1.65 | 7.4 |
|  | 12 | ENSMUSG00000031634.8 | Ufsp2 | processed_transcript | mISSKPIER | 1.65 | 2.8 |
|  | 17 | ENSMUSG00000048215.10 | A630023P12Rik | lincRNA | SQNFSWImLLcPSQM | 0.47 | 4.9 |
|  | 18 | ENSMUSG00000086528.1 | Gm15731 | antisense | TIQKAPPHYmSIELR | 1.60 | 4.1 |
|  | 19 | ENSMUSG00000102503.1 | RP23-388I22.1 | TEC | mYYLVKmScYmKcLR | 0.32 | 2.7 |
| 5 | 20 | ENSMUSG00000085865.1 | Gm15966 | lincRNA | SASSWNQPLPGPSGFGLEEVSREGGWR | 3.36 | 5.8 |
|  | 21 | ENSMUSG00000081123.1 | Gm11469 | processed_pseudogene | EGVNIAEAIER | 1.60 | 10.1 |
|  | 4 | ENSMUSG00000029049.10 | Morn1 | processed_transcript | NGTGcVSPLELR | 1.45 | 6.4 |
|  | 22 | ENSMUSG00000053199.9 | Arhgap20 | retained_intron | QSTVKcWRPFQmSHmQTFmK | 1.15 | 7.7 |
|  | 23 | ENSMUSG00000029464.6 | Gpn3 | retained_intron | PGGAERnSR | 0.78 | 2.4 |
|  | 24 | ENSMUSG00000098033.1 | Gm9381 | processed_pseudogene | IVSnAScTTncIVLLAKVIFGmTTLALER | 1.69 | 9.3 |
|  | 25 | ENSMUSG00000102240.1 | RP23-242B14.1 | TEC | mNLKILTYVcFASQRQTIYLENR | 2.21 | 5.5 |
|  | 26 | ENSMUSG00000020063.12 | Sirt1 | nonsense_mediated_decay | mVFHTFLFVTLnSLK | 1.06 | 3.1 |
|  | 11 | ENSMUSG00000059974.6 | Ntm | processed_transcript | TIQAKMHnSISWAIFTGLAALcLFQGK | 2.30 | 3.2 |
| 6 | 27 | ENSMUSG00000024073.10 | Birc6 | retained_intron | QLFLVEnKNLNIIIPmFYcFFPIR | 1.22 | 11.3 |
|  | 28 | ENSMUSG00000057406.12 | Whsc1 | nonsense_mediated_decay | SLPSQKcSPKYSENEAR | 0.83 | 3.9 |
|  | 29 | ENSMUSG00000031559.10 | 4930555F03Rik | processed_transcript | mLHcVHSSLIYnSQTLER | 1.58 | 4.9 |
| 7 | 30 | ENSMUSG00000072929.6 | Gm15109 | unprocessed_pseudogene | DTMVQEEEMDQGMHHHQDLSQK | 0.24 | 3.9 |
|  | 31 | ENSMUSG00000090699.1 | Gm9071 | unprocessed_pseudogene | mKEKEVMSFLHNLEMEYIEAR | 1.51 | 6.2 |
|  | 32 | ENSMUSG00000025495.10 | Ptdss2 | nonsense_mediated_decay | NPSGYSLQHQERYcGQYFGFLMFWSHT | 1.01 | 6.9 |
| 8 | 33 | ENSMUSG00000026414.9 | Tnnt2 | retained_intron | DAILEALR | 1.67 | 4.8 |
|  | 34 | ENSMUSG00000025089.11 | Gfral | nonsense_mediated_decay | FPHTFYHRVLIcSTAWDPNK | 1.12 | 7.0 |
| 9 | 35 | ENSMUSG00000034285.11 | Nipsnap1 | nonsense_mediated_decay | IEVLGSLFR | 1.97 | 6.6 |
|  | 33 | ENSMUSG00000026414.9 | Tnnt2 | retained_intron | DAILEALR | 1.61 | 4.8 |
|  | 36 | ENSMUSG00000084274.2 | Gm12504 | processed_transcript | MNYFcFHImWcYVLSFmAR | 0.33 | 3.7 |
|  | 15 | ENSMUSG00000103242.1 | RP23-132G24.3 | processed_pseudogene | MAEAIYIIEVKEWGK | 1.06 | 2.5 |
|  | 37 | ENSMUSG00000102415.1 | RP23-284P20.1 | TEC | LTKTYQHVYcMLK | 0.91 | 3.2 |
| 10 | 33 | ENSMUSG00000026414.9 | Tnnt2 | retained_intron | DAILEALR | 1.72 | 4.8 |
|  | 38 | ENSMUSG00000031626.12 | Pros1 | retained_intron | EnmDSnHKKTVFSILLEMR | 0.23 | 4.8 |
|  | 39 | ENSMUSG00000031626.12 | Gm29365 | unprocessed_pseudogene | SVTDmDTIEKSNLnRQFLFcPWDVTK | 0.64 | 8.5 |
| 11 | 1 | ENSMUSG00000032985.11 | 5730522E02Rik | processed_transcript | LPQLAAAEPNRPR | 2.16 | 10.6 |
|  | 40 | ENSMUSG00000092054.3 | Kif4-ps | transcribed_processed_pseudogene | mLTELEK | 1.09 | 5.7 |
|  | 41 | ENSMUSG00000090109.1 | Ear-ps10 | unprocessed_pseudogene | TTVAMKSYTVAcNPR | 1.50 | 7.9 |
|  | 42 | ENSMUSG00000099956.1 | Gm29365 | unprocessed_pseudogene | DPAFYAYQLLDDYKEGnLHMIPDTPPAEERSGDDSDVLIGn | 0.61 | 6.0 |
| 12 | 43 | ENSMUSG00000031626.12 | Sorbs2 | processed_transcript | YQIFnFnR | 1.70 | 2.4 |
|  | 44 | ENSMUSG00000022686.10 | B3gnt5 | processed_transcript | FVLETFPPGLLGGQRTSGTFK | 1.10 | 4.5 |
|  | 45 | ENSMUSG00000083128.1 | Gm12723 | processed_pseudogene | TEAIEALVK | 1.18 | 5.7 |
|  | 12 | ENSMUSG00000031634.8 | Ufsp2 | processed_transcript | mISSKPIER | 1.82 | 2.8 |
|  | 46 | ENSMUSG00000020361.9 | Hspa4 | processed_transcript | TQYVDHAGLELKGSHQPLPPK | 1.03 | 5.2 |
|  | 47 | ENSMUSG00000091078.1 | Gm17218 | antisense | MASVSPEIKR | 1.38 | 1.6 |
| 13 | 12 | ENSMUSG00000031634.8 | Ufsp2 | processed_transcript | mISSKPIER | 1.88 | 2.8 |
|  | 48 | ENSMUSG00000103862.1 | RP23-198F7.2 | TEC | mRNWLVSPmnSK | 1.24 | 4.1 |
| 14 | 6 | ENSMUSG00000042105.14 | Inpp5f | retained_intron | MLVLPVnLPR | 2.34 | 1.3 |
|  | 1 | ENSMUSG00000032985.11 | 5730522E02Rik | processed_transcript | LPQLAAAEPNRPR | 1.85 | 10.6 |
|  | 49 | ENSMUSG00000042688.12 | Mapk6 | retained_intron | FLFTnR | 1.65 | 1.7 |
|  | 12 | ENSMUSG00000031634.8 | Ufsp2 | processed_transcript | mISSKPIER | 1.58 | 2.8 |
|  | 50 | ENSMUSG00000076594.1 | Igkv6-13 | IG_LV_gene | ASQn | 1.30 | 10.1 |
|  | 51 | ENSMUSG00000005360.10 | Slc1a3 | retained_intron | VWEAPRYnK | 1.40 | 6.2 |
|  | 52 | ENSMUSG00000103591.1 | RP24-369B15.2 | TEC | mMLKTIcRIINVFLILLnEDDAK | 1.26 | 5.4 |
|  | 53 | ENSMUSG00000006010.10 | BC003331 | retained_intron | ELSWIIWmKNGPQNMPAR | 1.48 | 3.0 |
| 15 | 54 | ENSMUSG00000097002.1 | Gm2670 | lincRNA | KVnLFQAK | 1.98 | 3.5 |
|  | 33 | ENSMUSG00000026414.9 | Tnnt2 | retained_intron | DAILEALR | 1.93 | 4.8 |
The Mouse Merged Database was used for the mass spectrometry database search.
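Several SEPs appear in more than one band (for example SEP1, SEP12 and SEP33). A short Python sketch can collapse the table into one record per SEP, listing the bands in which it was detected and its best XCorr. The rows below are a hand-transcribed subset of the table, and the function name `summarize_by_sep` is hypothetical.

```python
from collections import defaultdict

# (band no., SEP no., XCorr) for a subset of rows transcribed from the table.
ROWS = [
    (1, 1, 2.22), (2, 1, 1.60), (11, 1, 2.16), (14, 1, 1.85),
    (1, 3, 1.65), (2, 3, 1.65),
    (2, 12, 1.66), (4, 12, 1.65), (12, 12, 1.82), (13, 12, 1.88), (14, 12, 1.58),
    (8, 33, 1.67), (9, 33, 1.61), (10, 33, 1.72), (15, 33, 1.93),
    (15, 54, 1.98),
]


def summarize_by_sep(rows):
    """Return {SEP no.: (sorted band numbers, best XCorr)}."""
    bands = defaultdict(set)
    best = {}
    for band, sep, xcorr in rows:
        bands[sep].add(band)
        best[sep] = max(best.get(sep, float("-inf")), xcorr)
    return {sep: (sorted(bands[sep]), best[sep]) for sep in sorted(bands)}


for sep, (band_list, top) in summarize_by_sep(ROWS).items():
    print(f"SEP{sep}: bands {band_list}, best XCorr {top}")
```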
Fig. 4 MS/MS spectra of four example peptides. The matched fragment ions of each precursor ion are listed to the right of its spectrum; b-ions are labeled in red and y-ions in blue. The sequences below the spectra are the corresponding full-length SEPs from the Mouse Merged Database, with the detected peptides highlighted in red. (A) SEP3. (B) SEP12. (C) SEP33. (D) SEP54.
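The b- and y-ion ladders annotated in these spectra follow from simple mass arithmetic. For a peptide with residue masses m_1 to m_n, the singly charged b_i ion is the sum of the first i residue masses plus a proton, and y_i is the sum of the last i residue masses plus water and a proton. The sketch below applies this to the SEP33 peptide DAILEALR using standard monoisotopic masses. It ignores modifications (lowercase residues in the table would simply be treated as unmodified) and higher charge states, so it illustrates the ion series rather than the search engine's scoring.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276
WATER = 18.010565


def by_ions(peptide: str) -> tuple[list[float], list[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values, no modifications."""
    masses = [MONO_RESIDUE_MASS[aa] for aa in peptide.upper()]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


b_ions, y_ions = by_ions("DAILEALR")
for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
    print(f"b{i} {b:9.4f}   y{i} {y:9.4f}")
```

The y1 ion of DAILEALR is the C-terminal arginine at about m/z 175.119, the usual anchor of a tryptic y-series.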
Fig. 5 SEP3 and SEP12 are conserved in mammals. Conservation of SEP3 (A) and SEP12 (B) across six species, analyzed by Clustal multiple sequence alignment.
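Conservation in a multiple alignment is often summarized as pairwise percent identity. The sketch below computes it from two rows of an existing alignment (for example, Clustal output), counting only columns where neither sequence has a gap. Other denominators, such as full alignment length or shorter-sequence length, are also in use, and the toy sequences are hypothetical, not the SEP3 or SEP12 orthologs.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identical residues over ungapped columns of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)


# Hypothetical aligned rows, for illustration only.
print(percent_identity("MHLQS", "MHLKS"))  # 80.0 (4 of 5 columns identical)
print(percent_identity("MH-QS", "MHLQS"))  # 100.0 (gap column ignored)
```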
Fig. 6 Western blot (WB) verification of SEPs in mouse serum. Polyclonal antibodies against four SEPs were raised in rabbits; two of them detected specific bands in the low-molecular-weight region of mouse serum samples. (A) The anti-SEP3 antibody recognized a target protein of around 8 kDa (red arrow). (B) The anti-SEP54 antibody recognized a target protein of around 10 kDa (red arrow).
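Apparent sizes such as "around 8 kDa" are read against a molecular weight marker. One common way to make that estimate explicit is to fit log10(MW) against the relative mobility (Rf) of the marker bands and interpolate the band of interest. The caption does not state how the sizes were estimated; the marker positions below are hypothetical and chosen to lie exactly on a line, and small peptides can deviate noticeably from a marker curve.

```python
import math


def fit_standard_curve(rf: list[float], mw_kda: list[float]) -> tuple[float, float]:
    """Least-squares fit of log10(MW) = slope * Rf + intercept."""
    logs = [math.log10(m) for m in mw_kda]
    n = len(rf)
    mean_x, mean_y = sum(rf) / n, sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in rf)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(rf, logs))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mw_kda(rf: float, slope: float, intercept: float) -> float:
    """Interpolate a band's apparent MW (kDa) from its relative mobility."""
    return 10 ** (slope * rf + intercept)


# Hypothetical marker lying exactly on log10(MW) = 2 - Rf.
marker_rf = [0.1, 0.3, 0.6, 0.9]
marker_kda = [10 ** (2 - r) for r in marker_rf]
slope, intercept = fit_standard_curve(marker_rf, marker_kda)
print(round(estimate_mw_kda(0.5, slope, intercept), 2))  # 31.62
```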