Andrew J Olson, Doreen Ware.
Abstract
SUMMARY: Genome sequencing projects annotate protein-coding gene models with multiple transcripts, aiming to represent all of the available transcript evidence. However, downstream analyses often operate on only one representative transcript per gene locus, sometimes known as the canonical transcript. To choose canonical transcripts, TRaCE (Transcript Ranking and Canonical Election) holds an 'election' in which a set of RNA-seq samples rank transcripts by annotation edit distance. These sample-specific votes are tallied along with other criteria such as protein length and InterPro domain coverage. The winner is selected as the canonical transcript, but the election proceeds through multiple rounds of voting to order all the transcripts by relevance. Based on the set of expression data provided, TRaCE can identify the most common isoforms from a broad expression atlas or prioritize alternative transcripts expressed in specific contexts.
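The abstract does not define annotation edit distance (AED). The following is a minimal sketch, assuming the commonly used definition AED = 1 - (SN + SP)/2, where SN is the fraction of evidence bases covered by the annotation and SP is the fraction of annotated bases supported by evidence. The function names, the half-open exon interval representation and the tie-rounding precision are hypothetical choices for illustration and are not taken from TRaCE itself.

```python
def total_length(intervals):
    """Total bases covered by a list of non-overlapping half-open (start, end) exons."""
    return sum(end - start for start, end in intervals)


def overlap_length(a, b):
    """Bases shared by two exon lists (each list assumed internally non-overlapping)."""
    return sum(
        max(0, min(a_end, b_end) - max(a_start, b_start))
        for a_start, a_end in a
        for b_start, b_end in b
    )


def annotation_edit_distance(annotation, evidence):
    """AED in [0, 1]: 0 means perfect agreement, 1 means no shared bases."""
    shared = overlap_length(annotation, evidence)
    if shared == 0:
        return 1.0
    sensitivity = shared / total_length(evidence)    # evidence explained by the model
    specificity = shared / total_length(annotation)  # model supported by the evidence
    return 1.0 - (sensitivity + specificity) / 2.0


def rank_by_aed(aed_by_transcript, digits=6):
    """Turn one sample's AED scores into ranked tiers, best (lowest AED) first.

    Transcripts with equal AED (after rounding; an assumption) share a tier.
    """
    tiers = {}
    for transcript, aed in aed_by_transcript.items():
        tiers.setdefault(round(aed, digits), []).append(transcript)
    return [sorted(tiers[score]) for score in sorted(tiers)]
```

Each RNA-seq sample would produce one such ranked list, which then acts as that sample's ballot in the election.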
Year: 2021 PMID: 34297055 PMCID: PMC8696091 DOI: 10.1093/bioinformatics/btab542
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. (A) The complex set of transcript models for the Zea mays B73 gene sbe4 (starch branching enzyme4). Red blocks show the predicted coding regions, and orange blocks are untranslated regions. The longest translation contains a retained intron and was selected as the canonical transcript for Compara gene tree analysis. (B) The left side shows a portion of the gene tree focused on this maize gene and displaying homologs from Sorghum bicolor, Setaria italica, Brachypodium distachyon and Oryza sativa Japonica. The right side shows regions of protein sequences participating in the multiple sequence alignment, color coded by InterPro domain. The first row shows a unique region relative to other species that derives from the retained intron.
Fig. 2. Flowchart of preparation of TRaCE inputs and a schematic of the rank-choice voting (RCV) approach to select transcripts for an example gene with three transcripts (blue, red, gray). Exon thickness corresponds to non-coding, coding and functional regions with Pfam domains. Voters are represented by rectangles, and rank transcripts by length criteria (9, 6 or 3 votes) or AED (1 vote per sample). Eight of the samples rank the red and blue transcripts equally (blue-red gradient), so both get tallied in round 1. RCV selects the blue transcript first with 24 rank 1 votes. After removing the blue votes from consideration, the red and gray transcripts tie with 10 rank 1 votes, but the red transcript is elected with 14 rank 2 votes.
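The multi-round election described in the Fig. 2 caption can be sketched as follows. This is an illustrative simplification, not TRaCE's implementation: the ballots are invented and do not reproduce the tallies in the figure, the names `tally` and `elect` are hypothetical, and two behaviors are assumptions (lower choices are promoted once a winner is removed, and any tie that survives all ranks is broken alphabetically). Each ballot is a weight plus ranked tiers, and transcripts sharing a tier are all tallied at that rank.

```python
def tally(ballots, remaining):
    """Return {transcript: [rank-1 votes, rank-2 votes, ...]} for remaining transcripts."""
    counts = {t: [0] * len(remaining) for t in remaining}
    for weight, tiers in ballots:
        # Drop already-elected transcripts, then close up empty tiers (assumed promotion).
        live = [[t for t in tier if t in remaining] for tier in tiers]
        live = [tier for tier in live if tier]
        for rank, tier in enumerate(live):
            for t in tier:  # tied transcripts each receive the full weight
                counts[t][rank] += weight
    return counts


def elect(ballots, transcripts):
    """Order transcripts by repeated election: most rank-1 votes, ties broken by rank 2, ..."""
    remaining = set(transcripts)
    order = []
    while remaining:
        counts = tally(ballots, remaining)
        # Lists compare lexicographically, so rank-2 votes decide rank-1 ties.
        winner = max(sorted(remaining), key=lambda t: counts[t])
        order.append(winner)
        remaining.remove(winner)
    return order


# Hypothetical ballots: three weighted criterion voters and six one-vote samples.
ballots = [
    (9, [["blue"], ["gray"], ["red"]]),
    (6, [["blue"], ["red"], ["gray"]]),
    (3, [["gray"], ["blue"]]),                 # truncated ballot: red is unranked
] + [(1, [["blue", "red"]])] * 4 + [(1, [["red"], ["gray"], ["blue"]])] * 2

print(elect(ballots, ["blue", "red", "gray"]))  # ['blue', 'red', 'gray']
```

In this toy election blue wins round 1 with 19 rank 1 votes. In round 2 red and gray tie at 12 rank 1 votes each, and red is elected on rank 2 votes (9 versus 8), mirroring the kind of tie-break the caption describes.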