Marcela K Tello-Ruiz, Cristina F Marco, Fei-Man Hsu, Rajdeep S Khangura, Pengfei Qiao, Sirjan Sapkota, Michelle C Stitzer, Rachael Wasikowski, Hao Wu, Junpeng Zhan, Kapeel Chougule, Lindsay C Barone, Cornel Ghiban, Demitri Muna, Andrew C Olson, Liya Wang, Doreen Ware, David A Micklos.
Abstract
The sophistication of gene prediction algorithms and the abundance of RNA-based evidence for the maize genome may suggest that manual curation of gene models is no longer necessary. However, quality metrics generated by the MAKER-P gene annotation pipeline identified 17,225 of 130,330 (13%) protein-coding transcripts in the B73 Reference Genome V4 gene set with models of low concordance to available biological evidence. Working with eight graduate students, we used the Apollo annotation editor to curate 86 transcript models flagged by quality metrics and a complementary method using the Gramene gene tree visualizer. All of the triaged models had significant errors, including missing or extra exons, non-canonical splice sites, and incorrect UTRs. A correct transcript model existed for about 60% of genes (or transcripts) flagged by quality metrics; we attribute this to the convention of elevating the transcript with the longest coding sequence (CDS) to the canonical, or first, position. The remaining 40% of flagged genes resulted in novel annotations and represent a manual curation space of about 10% of the maize genome (~4,000 protein-coding genes). MAKER-P metrics have a specificity of 100% and a sensitivity of 85%; the gene tree visualizer has a specificity of 100%. Together with the Apollo graphical editor, our double triage provides an infrastructure to support the community curation of eukaryotic genomes by scientists, students, and potentially even citizen scientists.
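The longest-CDS convention mentioned in the abstract can be illustrated with a minimal Python sketch. It shows how a gene can be flagged even though a well-supported transcript already exists in a non-canonical position. The `Transcript` class, the transcript identifiers, and the `evidence_score` field are hypothetical placeholders introduced for illustration; they are not the MAKER-P quality metrics or the authors' code.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str     # hypothetical identifier
    cds_length: int        # coding sequence length in nucleotides
    evidence_score: float  # hypothetical 0..1 agreement with evidence (higher is better)


def canonical_by_longest_cds(transcripts):
    """Pick the transcript with the longest CDS, as the convention does."""
    return max(transcripts, key=lambda t: t.cds_length)


def best_supported(transcripts):
    """Pick the transcript that agrees best with the (hypothetical) evidence score."""
    return max(transcripts, key=lambda t: t.evidence_score)


# Hypothetical gene with two transcript models.
transcripts = [
    Transcript("T001", cds_length=1500, evidence_score=0.40),
    Transcript("T002", cds_length=1350, evidence_score=0.95),
]

canonical = canonical_by_longest_cds(transcripts)
supported = best_supported(transcripts)

# A gene is worth a curator's look when the canonical model is not the
# best-supported one: a correct model exists but sits in a later position.
needs_review = canonical.transcript_id != supported.transcript_id
print(canonical.transcript_id, supported.transcript_id, needs_review)
```

In this toy case the longer but poorly supported model takes the canonical position, which is the situation the abstract describes for about 60% of the flagged genes.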
Year: 2019 PMID: 31658277 PMCID: PMC6816542 DOI: 10.1371/journal.pone.0224086
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Curation of exon 3 of PIN9 (Zm00001d043179).
Exons of incorrect length were the most common error detected by both triage methods. The Apollo editing window shows a “User-created annotation” at top followed by the longer, incorrect B73 RefGen_V4 model (“MAKER_updated”). The shortened exon was supported by aligned evidence: protein sequences from sorghum and rice, assembled long Iso-Seq reads combined from six tissues, and RNA-seq from roots, among other tissues.
Annotation errors and characteristics of maize genes identified by quality metrics and gene trees.
Errors found in 11 genes flagged by quality metrics in five maize gene families (25 transcripts). Errors found in 40 genes flagged by quality metrics, 34 genes flagged by gene trees and 12 genes flagged by both methods in 419 maize classical genes (2,127 transcripts).
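| Error or characteristic | Single triage: quality metrics (11 genes), % | Double triage: quality metrics (40 genes), % | Double triage: gene trees (34 genes), % | Double triage: both methods (12 genes), % |
|---|---|---|---|---|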
| Missing exon(s) | 55 | 53 | 29 | 33 |
| Extra exon(s) | 18 | 28 | 24 | 33 |
| Different exon length(s) | 82 | 60 | 52 | 100 |
| Non-canonical splice site(s) | 27 | 5 | 6 | 50 |
| Extend UTR(s) | 18 | 5 | 50 | 42 |
| Single transcript gene | 45 | 23 | 56 | 8 |
| Existing model (multiple transcripts) | 55 | 67 | 24 | 58 |
| Novel curation | 45 | 33 | 76 | 42 |
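The percentages in the table can be converted back to approximate gene counts using the group sizes given in the table caption. The short Python sketch below does this for the "Missing exon(s)" row. It assumes that each column's denominator is the gene count stated in the caption (11, 40, 34, and 12); because the published percentages are rounded, the recovered counts are approximations, and the dictionary keys are labels chosen here for illustration.

```python
# Group sizes taken from the table caption (assumed to be the column denominators).
group_sizes = {
    "single_quality_metrics": 11,
    "double_quality_metrics": 40,
    "double_gene_trees": 34,
    "double_both": 12,
}

# "Missing exon(s)" row of the table, in percent.
missing_exon_pct = {
    "single_quality_metrics": 55,
    "double_quality_metrics": 53,
    "double_gene_trees": 29,
    "double_both": 33,
}


def approx_count(percent, n):
    """Approximate number of genes corresponding to a rounded percentage of n genes."""
    return round(percent * n / 100)


counts = {
    group: approx_count(missing_exon_pct[group], n)
    for group, n in group_sizes.items()
}
print(counts)
```

Under these assumptions the row corresponds to roughly 6, 21, 10, and 4 genes, respectively.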
Fig 2. Triage of BRICK1 (Zm00001d018535) with the Gramene gene tree visualizer.
Comparison to closest plant orthologs and maize paralogs revealed that the B73 RefGen_V4 model was missing the entire 5’ end. BRICK genes function in a common pathway to promote polarized cell division and cell morphogenesis in the maize leaf epidermis. The human ortholog of BRICK1 (BRK1) is required for cell proliferation and cell transformation by oncogenes. Notably, patients with Von Hippel-Lindau syndrome normally develop tumors, but those lacking a functional Brk1 gene are protected from tumorigenesis.
Fig 3. Flowchart of the double triage of classical maize genes and comparative number of genes flagged for curation by quality metrics and gene trees.
Parallel methods, MAKER-P quality metrics (blue) and the gene tree visualizer (yellow), produced different but overlapping sets of genes for manual curation. Genes with five or more transcripts were excluded.
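The double triage described in this caption amounts to partitioning two overlapping sets of flagged genes after dropping genes with five or more transcripts. The Python sketch below is one way to express that logic; the function name, the gene identifiers, and the point at which the transcript-count filter is applied are assumptions made for illustration, not details taken from the flowchart.

```python
MAX_TRANSCRIPTS = 5  # genes with five or more transcripts are excluded


def double_triage(transcript_counts, flagged_by_metrics, flagged_by_trees):
    """Split flagged genes into metrics-only, trees-only, and both.

    transcript_counts: dict mapping gene ID -> number of transcripts
    flagged_by_metrics / flagged_by_trees: sets of gene IDs from each method
    """
    eligible = {g for g, n in transcript_counts.items() if n < MAX_TRANSCRIPTS}
    metrics = flagged_by_metrics & eligible
    trees = flagged_by_trees & eligible
    return {
        "metrics_only": metrics - trees,
        "trees_only": trees - metrics,
        "both": metrics & trees,
    }


# Hypothetical gene IDs and transcript counts, for illustration only.
transcript_counts = {"geneA": 1, "geneB": 3, "geneC": 6, "geneD": 2, "geneE": 5}
flagged_by_metrics = {"geneA", "geneB", "geneC"}
flagged_by_trees = {"geneB", "geneD", "geneE"}

result = double_triage(transcript_counts, flagged_by_metrics, flagged_by_trees)
print(result)
```

Here `geneC` and `geneE` are dropped by the transcript-count filter, leaving one gene in each of the three groups.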
Fig 4. Curation of two mini-exons in INVCW4 (Zm00001d001941).
The Apollo editing window shows a “User-created annotation” at top, followed by the incorrect B73 RefGen_V4 model (“MAKER_updated”) and “v3 model mapped to v4.” A conserved 9-nucleotide exon (red circle) and a novel 19-nucleotide exon (blue circle) were supported by protein sequences from sorghum and rice, assembled EST transcripts from ultra-deep sequencing, long Iso-Seq reads combined from six tissues, and RNA-seq from root and other tissues.