Sandeep Chakraborty, Pedro J Martínez-García, Abhaya M Dandekar.
Abstract
Background: The transcriptome, a treasure trove of gene space information, remains severely under-used by current genome annotation methods.
Keywords: MAKER-P; RNA-seq; Trinity; berberine bridge enzyme; genome annotation; transcriptome; walnut genome sequence
Year: 2016 PMID: 28105312 PMCID: PMC5200947 DOI: 10.12688/f1000research.10040.1
Source DB: PubMed Journal: F1000Res ISSN: 2046-1402
Figure 1. YeATSAM flow.
First, transcripts from extraneous organisms are pruned. Next, the three longest open reading frames (ORFs) from each transcript are searched with BLAST against a database of plant peptides. Depending on the number of significant matches, each transcript is assigned to one of four classes: (a) none: either a previously unknown gene or a non-coding RNA; (b) one: a unique ORF; (c) multiple ORFs matching the same gene: the ORFs are merged if the E-value of the combined ORF is significantly lower; (d) multiple ORFs matching different genes: the transcript is duplicated, and each copy is associated with a different ORF. Subsequently, ORFs are merged based on overlapping amino acid sequences, and ORFs that are exact substrings of other ORFs are removed.
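The four-way classification in this caption can be sketched in Python as shown below. This is an illustration and not YeATSAM's own code. Several details are hypothetical because the caption does not give them: the input format (a mapping from each ORF to its best hit and E-value), the function name `classify_transcript`, and the significance cutoff `EVALUE_CUTOFF`.

```python
from typing import Dict, Optional, Tuple

# Hypothetical significance cutoff; the caption does not state YeATSAM's threshold.
EVALUE_CUTOFF = 1e-10

Hit = Optional[Tuple[str, float]]  # (matched gene id, E-value), or None if no hit


def classify_transcript(best_hits: Dict[str, Hit]) -> str:
    """Assign a transcript to one of the four classes (a)-(d) of Figure 1."""
    # Keep only ORFs whose best hit passes the (assumed) E-value cutoff.
    significant = {
        orf: hit[0]
        for orf, hit in best_hits.items()
        if hit is not None and hit[1] <= EVALUE_CUTOFF
    }
    if not significant:
        return "none"             # (a) unknown gene or non-coding RNA
    if len(significant) == 1:
        return "one"              # (b) unique ORF
    if len(set(significant.values())) == 1:
        return "same_gene"        # (c) candidate for merging the ORFs
    return "different_genes"      # (d) duplicate the transcript, one copy per ORF


# Hypothetical hits for the three longest ORFs of one transcript.
example = {"ORF1": ("geneX", 1e-40), "ORF2": ("geneX", 1e-25), "ORF3": None}
print(classify_transcript(example))  # same_gene
```

In this sketch, an ORF with a hit counts only when its E-value passes the cutoff. A transcript with one significant ORF and one weak ORF therefore falls into class (b).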
Figure 2. Open reading frames (ORFs) that can be merged.
(a) ORFs from the same transcript: C20727_G1_I1 has two ORFs (ORF 15 and ORF 36) that match a DNA repair metallo-β-lactamase family protein (accession number: XP007043420.1) with high significance. We merged the two ORFs (inserting ‘ZZZ’ between them) because the E-value of the combined ORF is significantly lower. (b) ORFs from different transcripts: We merged ORFs from two different transcripts (C53209_G8_I1 and C53209_G6_I1) because both transcripts map to the same scaffold (SUPER472), they can be overlapped on the shared sequence string ‘PNRSSLP’, and the merged ORF has a significantly lower E-value.
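Figure 2 shows two merge operations, and Figure 1 adds an exact-substring pruning step. All three can be sketched with plain string operations. Several parts of this sketch are assumptions: the function names, the `min_fold` factor that decides what counts as a "significantly lower" E-value, and the default minimum overlap of 7 residues (chosen only because ‘PNRSSLP’ is 7 residues long). The E-values themselves would come from re-running BLAST on the merged sequence, which is not shown here.

```python
from typing import List, Optional

LINKER = "ZZZ"  # linker inserted between ORFs of the same transcript (Figure 2a)


def join_with_linker(orf_a: str, orf_b: str) -> str:
    """Concatenate two ORFs from one transcript, separated by the ZZZ linker."""
    return orf_a + LINKER + orf_b


def should_merge(combined_evalue: float, best_single_evalue: float,
                 min_fold: float = 1e5) -> bool:
    """Hypothetical rule: accept a merge only if the combined E-value is at least
    min_fold times lower than the best E-value of the unmerged ORFs."""
    return combined_evalue * min_fold <= best_single_evalue


def overlap_merge(left: str, right: str, min_overlap: int = 7) -> Optional[str]:
    """Join two sequences on the longest suffix of `left` that equals a prefix
    of `right`; return None if no overlap of at least min_overlap exists."""
    for k in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:k]):
            return left + right[k:]
    return None


def drop_contained(seqs: List[str]) -> List[str]:
    """Remove duplicates and any sequence that is an exact substring of another."""
    kept: List[str] = []
    for s in sorted(set(seqs), key=len, reverse=True):  # longest first
        if not any(s in longer for longer in kept):
            kept.append(s)
    return kept


# Hypothetical sequences that share the 'PNRSSLP' overlap from Figure 2b.
print(overlap_merge("MKTAYIAKQRPNRSSLP", "PNRSSLPGEVK"))  # MKTAYIAKQRPNRSSLPGEVK
print(join_with_linker("MKV", "LLE"))                     # MKVZZZLLE
```

Only ‘PNRSSLP’ comes from the caption. The flanking residues in the example are invented for illustration.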
Key photosystem-related proteins in the chloroplast not annotated by MAKER-P and YeATSAM.
These transcripts have multiple open reading frames (ORFs) that map to different proteins with high significance. For example, C59245_G1_I1 has another ORF (ORF 43) that maps to photosystem II reaction center protein B (PSBB); MAKER-P annotates PSBB, but not PSBH. All of these transcripts originate from the chloroplast, although not all genes that MAKER-P failed to annotate are from the chloroplast. Genes predicted by MAKER-P but not identified by YeATSAM are also listed, with their homology to the corresponding genes in the TAIR database.
| Transcript | ORF | Length | TAIR ID | Description | E-value |
|---|---|---|---|---|---|
| C52274_G4_I1_B | 189 | 231 | ATCG00720.1 | PETB photosynthetic electron transfer B | 4.00e-155 |
| C52274_G4_I1_C | 231 | 177 | ATCG00730.1 | PETD photosynthetic electron transfer D | 1.00e-108 |
| C53854_G1_I1_A | 45 | 98 | ATCG00070.1 | PSBK photosystem II reaction center protein K precursor | 1.00e-27 |
| C53854_G1_I1_B | 62 | 62 | ATCG00080.1 | PSBI photosystem II reaction center protein I | 3.00e-20 |
| C54343_G2_I1_A | 8 | 91 | ATCG00580.1 | PSBE photosystem II reaction center protein E | 4.00e-54 |
| C59245_G1_I1_B | 70 | 95 | ATCG00710.1 | PSBH photosystem II reaction center protein H | 4.00e-43 |
| WALNUT_00014004-RA | - | 1117 | AT5G16850.1 | TERT Telomerase reverse transcriptase | 0.0 |
| WALNUT_00018632-RA | - | 295 | ATMG00560.1 | RPL2 Nucleic acid-binding, OB-fold-like protein | 9e-152 |
| WALNUT_00019747-RA | - | 326 | AT1G24040.1 | Acyl-CoA N-acyltransferases (NAT) superfamily protein | 5e-121 |
| WALNUT_00031866-RA | - | 311 | AT5G07810.1 | SNF2 domain-containing protein/helicase domain- | 9e-115 |
| WALNUT_00020600-RA | - | 155 | ATCG01240.1 | RPS7.2 ribosomal protein S7 chrC:140704-141171 | 1e-108 |
| WALNUT_00016414-RA | - | 231 | AT5G41850.1 | alpha/beta-Hydrolases superfamily protein | 6e-96 |
| WALNUT_00027509-RA | - | 289 | AT2G43190.3 | ribonuclease P family protein chr2:17956220-17957833 | 2e-94 |
| WALNUT_00022174-RA | - | 389 | AT2G07707.1 | Plant mitochondrial ATPase, F0 complex, subunit | 5e-86 |
| WALNUT_00018616-RA | - | 124 | ATCG00890.1 | NDHB.1 NADH-Ubiquinone/plastoquinone (complex I) | 1e-79 |
| WALNUT_00007302-RA | - | 924 | AT5G14990.1 | BEST Arabidopsis thaliana protein match is: myosin | 2e-79 |
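Transcripts whose ORFs match different genes are duplicated once per ORF, as the suffixed identifiers above show (for example, C53854_G1_I1_A and C53854_G1_I1_B). That convention can be sketched as follows. The function `split_by_orf` is hypothetical. Lettering the copies in ascending ORF-number order is also an assumption, although it is consistent with the two C53854_G1_I1 rows.

```python
from string import ascii_uppercase
from typing import Dict, List, Tuple


def split_by_orf(transcript_id: str,
                 orf_hits: Dict[int, str]) -> List[Tuple[str, int, str]]:
    """Duplicate a transcript once per significant ORF.

    orf_hits maps an ORF number to the identifier of its best match.
    Copies are lettered A, B, C, ... in ascending ORF-number order
    (an assumption); zip() silently stops after 26 copies.
    """
    return [
        (f"{transcript_id}_{letter}", orf, orf_hits[orf])
        for letter, orf in zip(ascii_uppercase, sorted(orf_hits))
    ]


for row in split_by_orf("C53854_G1_I1", {62: "ATCG00080.1", 45: "ATCG00070.1"}):
    print(row)
# ('C53854_G1_I1_A', 45, 'ATCG00070.1')
# ('C53854_G1_I1_B', 62, 'ATCG00080.1')
```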
FAD-binding berberine bridge enzymes (BBEs) are not detected by MAKER-P.
These oxidases are involved in benzophenanthridine alkaloid biosynthesis in plants. Arabidopsis has 27 loci for this family (plus one splice variant; Table 3). Here, four full-length BBE genes (named JrBBE1-4) were identified using the transcriptome. Some of the proteins are truncated (e.g., C54286_G1_I1), which might be an artifact of the Trinity assembler; thus, this is not a complete enumeration of the JrBBE genes.
| Id | Transcript | Length | Scaffold | ORF | TAIR Id |
|---|---|---|---|---|---|
| | C54052_G1_I1 | 564 | JCF7180001213852 | 34 | AT1G26420.1 |
| | C53871_G1_I1 | 564 | JCF7180001217410 | 28 | AT1G30700.1 |
| | C55152_G1_I1 | 552 | JCF7180001222284:2429142-2890931 | 37 | AT4G20820.1 |
| | C7952_G1_I1 | 559 | JCF7180001218369 | 110 | AT2G34790.1 |
| | C54286_G2_I1 | 307 | JCF7180001217076 | 35 | AT1G11770.1 |
| | C54286_G1_I1 | 128 | JCF7180001217076 | 7 | AT4G20830.1 |
| | C12765_G1_I1 | 114 | JCF7180001218369 | 8 | AT4G20840.1 |
| | C51815_G1_I4 | 168 | JCF7180001218369 | 29 | AT4G20860.1 |
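One simple way to separate full-length family members from truncated ones is to compare each length with that of the longest member. The sketch below uses the lengths listed in the table above. The function `full_length_members` and the 0.9 cutoff are hypothetical; the authors do not state this criterion. With this cutoff, the sketch returns the four transcripts whose lengths range from 552 to 564.

```python
from typing import Dict, List

# Lengths of the BBE-family transcripts as listed in the table above.
BBE_LENGTHS = {
    "C54052_G1_I1": 564,
    "C53871_G1_I1": 564,
    "C55152_G1_I1": 552,
    "C7952_G1_I1": 559,
    "C54286_G2_I1": 307,
    "C54286_G1_I1": 128,
    "C12765_G1_I1": 114,
    "C51815_G1_I4": 168,
}


def full_length_members(lengths: Dict[str, int],
                        min_fraction: float = 0.9) -> List[str]:
    """Return ids whose length is at least min_fraction of the longest member.

    The 0.9 default is a hypothetical cutoff, not one taken from the paper.
    """
    longest = max(lengths.values())
    return [tid for tid, n in lengths.items() if n >= min_fraction * longest]


print(full_length_members(BBE_LENGTHS))
# ['C54052_G1_I1', 'C53871_G1_I1', 'C55152_G1_I1', 'C7952_G1_I1']
```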
Expression counts (normalized) of transcripts from the FAD-binding berberine bridge enzyme (BBE) family.
The genes show tissue-specific expression; for example, JrBBE3 is highly expressed in the roots and the transition zone. The tissue abbreviations are from Chakraborty.
| id | Transcript | CE | CI | CK | EM | FL | HC | HL | HP | HU | IF | LE | LM | LY | PK | PL | PT | RT | SE | TZ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C54052_G1_I1 | 44 | 4 | 136 | 197 | |||||||||||||||
| C53871_G1_I1 | 2 | 3 | 2 | 1 | 1 | 15 | 79 | 1 | |||||||||||
| C55152_G1_I1 | 43 | 34 | 25 | 62 | 1 | 2 | 35 | 1040 | 346 | ||||||||||
| C7952_G1_I1 | 32 | 85 | 8 | 55 | 11 | 711 | 15 | 8 | 241 | 137 | 37 | 123 | 315 | 420 | 160 | 217 | 5 | 18 | |
| C54286_G2_I1 | 33 | 20 | 30 | |||||||||||||||||
| C54286_G1_I1 | 19 | 7 | 24 | |||||||||||||||||
| C12765_G1_I1 | 26 | 77 | 2 | 39 | 4 | 42 | 8 | 23 | 23 | 19 | 5 | 22 | 9 | 2 | 8 | 6 | ||||
| C51815_G1_I4 |
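Tissue specificity of this kind is often quantified with the tau (τ) index. Tau is 0 when a gene is expressed equally in all tissues and 1 when it is expressed in a single tissue. This is a swapped-in illustration, not an analysis reported by the authors. The function name is illustrative, and the counts in the usage lines are hypothetical rather than taken from the table.

```python
from typing import Sequence


def tau(expression: Sequence[float]) -> float:
    """Tau tissue-specificity index of one gene across n tissues."""
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two tissues")
    peak = max(expression)
    if peak <= 0:
        raise ValueError("gene is not expressed in any tissue")
    # Sum of (1 - x_i / max) over tissues, normalized by (n - 1).
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical normalized counts for four tissues.
print(round(tau([10, 0, 0, 0]), 3))  # 1.0
print(round(tau([5, 5, 5, 5]), 3))   # 0.0
```

Tau is frequently computed on log-transformed expression values. It is applied to raw counts here only to keep the sketch short.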
Figure 3. Multiple sequence alignment of BBE from walnut and other organisms.
(a) The JrBBE sequences were aligned to berberine bridge enzyme (BBE) sequences from Eschscholzia californica (EcBBE; California poppy), Arabidopsis thaliana (AtBBE15) and Nicotiana tabacum (Nectarin V). Secondary structure information from the E. californica structure (PDB ID: 3D2D) was used to annotate the sequences. The signal peptides differ among these proteins, suggesting that they have different localizations in walnut. (b) Phylogenetic tree generated from the multiple sequence alignment.
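Distance-based tree building starts from pairwise distances between aligned sequences. The sketch below computes the simple p-distance: the fraction of differing residues, counted only at columns where neither sequence has a gap. The caption does not name the tree-building method used for panel (b), so this choice, the function names and the toy sequences are all assumptions. A clustering method such as neighbor joining or UPGMA would then be applied to the resulting matrix.

```python
from typing import Dict, Tuple


def p_distance(a: str, b: str) -> float:
    """Fraction of mismatched residues between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = mismatches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore columns with a gap in either sequence
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return mismatches / compared


def distance_matrix(alignment: Dict[str, str]) -> Dict[Tuple[str, str], float]:
    """All pairwise p-distances, keyed by (name1, name2)."""
    names = list(alignment)
    return {(a, b): p_distance(alignment[a], alignment[b])
            for a in names for b in names}


# Toy alignment; real input would be the aligned BBE protein sequences.
toy = {"seq1": "MKV-LLE", "seq2": "MKVALIE", "seq3": "MRV-LLD"}
d = distance_matrix(toy)
print(round(d[("seq1", "seq2")], 3))  # 0.167
```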
Selected genes in chickpea that are not annotated in the NCBI database.
Most of the NCBI gene models were predicted using Gnomon. YeATSAM used the publicly available chickpea transcriptome to identify these genes. The corresponding genes from the TAIR database are shown. Several transcripts (e.g., TC20962) encode multiple genes, while others (e.g., TC01181) have only one significant ORF. TRid, transcript ID; TAIRid, Arabidopsis thaliana (TAIR) ID.
| TRid | TAIRid | Description | E-value |
|---|---|---|---|
| TC20962 A | ATMG00070.1 | NAD9 NADH dehydrogenase subunit 9 chrM:23663-24235 | 3e-116 |
| TC20962 B | AT2G07687.1 | Cytochrome c oxidase, subunit III chr2:3311854-3312651 | 3e-107 |
| TC20962 C | AT2G07674.1 | Unknown conserved protein chr2:3269151-3269906 | 6e-41 |
| TC01181 | ATMG01360.1 | COX1 cytochrome oxidase chrM:349830-351413 | 0.0 |
| TC11063 | AT3G30841.1 | Cofactor-independent phosphoglycerate mutase chr3:12591595-12593401 | 0.0 |
| TC06038 | ATMG00090.1 | Structural constituent of ribosome;protein binding chrM:25482-28733 | 3e-124 |
| TC13206 | AT3G13440.1 | S-adenosyl-L-methionine-dependent methyltransferases superfamily | 1e-118 |
| TC07586 | AT2G07725.1 | Ribosomal L5P family protein chr2:3448402-3448959 | 2e-113 |
| TC19047 | ATMG00570.1 | Sec-independent periplasmic protein translocase | 8e-107 |
| TC00902 B | ATMG00640.1 | Hydrogen ion transporting ATP synthases, rotational | 3e-104 |
| TC15163 | AT4G28360.1 | Ribosomal protein L22p/L17e family protein chr4:14029294-14030926 | 1e-100 |
| TC13677 | AT5G05210.1 | Surfeit locus protein 6 chr5:1548198-1549534 | 9e-91 |
| TC13780 A | AT2G07707.1 | Plant mitochondrial ATPase, F0 complex, subunit 8 protein | 2e-90 |
| TC18786 | AT1G73440.1 | Calmodulin-related chr1:27611418-27612182 | 5e-45 |