
An Integrated Approach for Microprotein Identification and Sequence Analysis.

Omar Brito-Estrada, Keira R Hassel, Catherine A Makarewich.

Abstract

Next-generation sequencing (NGS) has propelled the field of genomics forward and produced whole genome sequences for numerous animal species and model organisms. However, despite this wealth of sequence information, comprehensive gene annotation efforts have proven challenging, especially for small proteins. Notably, conventional protein annotation methods were intentionally designed to exclude putative proteins encoded by short open reading frames (sORFs) shorter than 300 nucleotides, in order to filter out the vastly larger number of spurious noncoding sORFs found throughout the genome. As a result, hundreds of functional small proteins called microproteins (<100 amino acids in length) have been incorrectly classified as noncoding RNAs or overlooked entirely. Here we provide a detailed protocol to leverage free, publicly available bioinformatic tools to query genomic regions for microprotein-coding potential based on evolutionary conservation. Specifically, we provide step-by-step instructions on how to examine sequence conservation and coding potential using Phylogenetic Codon Substitution Frequencies (PhyloCSF) on the user-friendly University of California Santa Cruz (UCSC) Genome Browser. Additionally, we detail steps to efficiently generate multiple species alignments of identified microprotein sequences to visualize amino acid sequence conservation, and we recommend resources to analyze microprotein characteristics, including predicted domain structures. These powerful tools can be used to help identify putative microprotein-coding sequences in noncanonical genomic regions or to rule out the presence of a conserved coding sequence with translational potential in a noncoding transcript of interest.
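
To make the sORF length cutoff concrete, the following is a minimal Python sketch and not part of the published protocol, which relies on PhyloCSF tracks in the UCSC Genome Browser rather than custom code. It lists ATG-to-stop open reading frames shorter than 300 nucleotides on the forward strand of a DNA string. The function name `find_sorfs`, the choice to count the stop codon toward the length, and the forward-strand-only scan are assumptions made for illustration. The sketch also says nothing about conservation, which is the criterion the protocol actually uses to separate coding from spurious sORFs.

```python
# Illustrative only: enumerate short ORFs (sORFs) on the forward strand.
# The 300-nt cutoff comes from the abstract; everything else is an assumption.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code
MAX_SORF_NT = 300


def find_sorfs(seq, max_nt=MAX_SORF_NT):
    """Return (start, end, frame) for ATG..stop ORFs shorter than max_nt.

    Coordinates are 0-based and end-exclusive, and the span includes the
    stop codon. Only the three forward-strand frames are scanned.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None  # position of the first ATG of the ORF being read
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start < max_nt:
                    hits.append((start, end, frame))
                start = None  # look for the next ORF in this frame
    return hits
```

Calling `find_sorfs` on a transcript sequence returns candidate regions that a conventional length filter would have discarded; each candidate would still need to be checked for evolutionary conservation as the abstract describes.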


Year: 2022; PMID: 35913170; PMCID: PMC9521633; DOI: 10.3791/63841

Source DB: PubMed; Journal: J Vis Exp; ISSN: 1940-087X; Impact factor: 1.424

