Chenwei Lin, Lukas A Mueller, James McCarthy, Dominique Crouzillat, Vincent Pétiard, Steven D Tanksley.
Abstract
An EST database has been generated for coffee based on sequences from approximately 47,000 cDNA clones derived from five different stages/tissues, with a special focus on developing seeds. When computationally assembled, these sequences correspond to 13,175 unigenes, which were analyzed with respect to functional annotation, expression profile and evolution. Compared with Arabidopsis, the coffee unigenes encode a higher proportion of proteins related to protein modification/turnover and metabolism, an observation that may explain the high diversity of metabolites found in coffee and related species. Several gene families were found to be either expanded or unique to coffee when compared with Arabidopsis. A high proportion of these families encode proteins assigned to functions related to disease resistance. Such families may have expanded and evolved rapidly under the intense pathogen pressure experienced by a tropical, perennial species like coffee. Finally, the coffee gene repertoire was compared with that of Arabidopsis and Solanaceous species (e.g. tomato). Unlike Arabidopsis, tomato has a nearly perfect gene-for-gene match with coffee. These results are consistent with the facts that coffee and tomato have a similar genome size, chromosome karyotype (tomato, n=12; coffee, n=11) and chromosome architecture. Moreover, both belong to the Asterid I clade of dicot plant families. Thus, the biology of coffee (family Rubiaceae) and tomato (family Solanaceae) may be united into one common network of shared discoveries, resources and information.
Year: 2005 PMID: 16273343 PMCID: PMC1544375 DOI: 10.1007/s00122-005-0112-2
Source DB: PubMed Journal: Theor Appl Genet ISSN: 0040-5752 Impact factor: 5.699
Fig. 1 Dendrogram depicting phylogenetic relationships of coffee to other higher plant taxa (based on Chase et al. 1993)
Characteristics of the 5 cDNA libraries used to develop the coffee EST database
| Library name | Tissue | Varieties | Average insert size, kb | Good quality ESTs |
|---|---|---|---|---|
| Leaf | Leaves, young | BP409 | 1.5±0.6 | 8,942 |
| Pericarp | Pericarp, all developmental stages | BP358, BP409, BP42, BP961, Q121 | 1.4±0.5 | 8,956 |
| Early stage cherry | Whole cherries, 18 and 22 weeks after pollination | BP358, BP409, BP42, Q121 | 1.4±0.3 | 9,843 |
| Middle stage seed | Endosperm and perisperm of seeds, 30 weeks after pollination | BP409, BP961, Q121 | 1.4±0.3 | 10,077 |
| Late stage seed | Endosperm and perisperm of seeds, 42 and 46 weeks after pollination | BP358, BP409, BP42, BP961, Q121 | 1.4±0.3 | 9,096 |
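As a quick consistency check on the figures above, the per-library counts can be summed and compared with the "approximately 47,000" clones and 13,175 unigenes quoted in the abstract. The snippet below is only an illustrative sketch; the variable names are made up for this example and the numbers are copied from the table.

```python
# Good-quality EST counts per library, copied from the table above.
good_ests = {
    "Leaf": 8942,
    "Pericarp": 8956,
    "Early stage cherry": 9843,
    "Middle stage seed": 10077,
    "Late stage seed": 9096,
}
UNIGENES = 13175  # unigene count reported in the abstract

total_ests = sum(good_ests.values())
ests_per_unigene = total_ests / UNIGENES

print(f"Total good-quality ESTs: {total_ests:,}")            # 46,914
print(f"Mean ESTs per unigene:   {ests_per_unigene:.2f}")    # 3.56
```

The total of 46,914 agrees with the rounded figure in the abstract, and the mean of roughly 3.6 ESTs per unigene gives a rough sense of the redundancy in the collection.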
Comparison of the coffee and tomato EST databases derived from use of ESTScan calibrated with the same tomato training set (see Materials and methods for details)
| | Tomato | Coffee |
|---|---|---|
| Total unigenes | 30,576 | 13,175 |
| Average unigene length, bp | 774 | 678 |
| Unigenes with coding regions | 96% | 95% |
| Average length (bp) of predicted peptides | 569 | 556 |
| Average ESTScan score | 409 | 346 |
Fig. 2 Histogram depicting the distribution of EST content for all coffee unigenes. Numbers above bars equal the number of unigenes represented in each
Fig. 3 Plot depicting the sequence identity of the most similar match for each coffee unigene as compared with all other coffee unigenes. As a control, a similar analysis is shown for Arabidopsis genes (see Results for details)
Twenty most abundant InterPro domains identified in coffee unigene set and comparative statistics for tomato and Arabidopsis genes
| InterPro accession | Description | Coffee: % of unigenes | Tomato: % of unigenes (ranking) | Arabidopsis: % of unigenes (ranking) |
|---|---|---|---|---|
| IPR000719 | Protein kinase | 1.6 | 1.20 (1) | 3.0 (1) |
| IPR000694 | Proline-rich region | 1.3 | 0.91 (4) | 0.003 (1763) |
| IPR002290 | Serine/threonine protein kinase | 0.85 | 1.10 (2) | 0 |
| IPR001245 | Tyrosine protein kinase | 0.69 | 1.0 (3) | 0.15 (311) |
| IPR008271 | Serine/threonine protein kinase, active site | 0.61 | 0.68 (5) | 2.6 (2) |
| IPR000504 | RNA-binding region RNP-1 (RNA recognition motif) | 0.55 | 0.60 (6) | 0.59 (6) |
| IPR001680 | G-protein beta WD-40 repeat | 0.49 | 0.51 (8) | 0.51 (8) |
| IPR001611 | Leucine-rich repeat | 0.48 | 0.59 (7) | 0.59 (7) |
| IPR002048 | Calcium-binding EF-hand | 0.36 | 0.34 (13) | 0.34 (13) |
| IPR000379 | Esterase/lipase/thioesterase | 0.33 | 0.43 (10) | 0.43 (10) |
| IPR001806 | Ras GTPase superfamily | 0.32 | 0.26 (22) | 0.43 (70) |
| IPR003579 | Ras small GTPase, Rab type | 0.29 | 0.23 (27) | 0 |
| IPR005123 | 2OG-Fe(II) oxygenase superfamily | 0.27 | 0.26 (21) | 0.47 (52) |
| IPR000626 | Ubiquitin | 0.27 | 0.22 (32) | 0.40 (89) |
| IPR002401 | E-class P450, group I | 0.27 | 0.46 (8) | 0.77 (24) |
| IPR002347 | Glucose/ribitol dehydrogenase | 0.26 | 0.23 (28) | 0.33 (110) |
| IPR001005 | Myb DNA-binding domain | 0.26 | 0.34 (15) | 1.34 (8) |
| IPR005225 | Small GTP-binding protein domain | 0.26 | 0.24 (25) | 0.68 (27) |
| IPR000608 | Ubiquitin-conjugating enzymes | 0.26 | 0.21 (34) | 0.19 (221) |
| IPR007090 | Leucine-rich repeat, plant specific | 0.25 | 0.40 (12) | 1.07 (11) |
Fig. 4 Comparison of the gene ontology-based gene annotation categories for the coffee EST-derived unigene set, tomato EST-derived unigene set and the Arabidopsis proteome. Figure contains only categories in which more than 1% of the coffee unigenes were assigned. Categories for which coffee differs most significantly from Arabidopsis are shown in underlined bold. (1) Cellular processes other than signal transduction and cell growth and/or maintenance. (2) Nucleobase/nucleoside/nucleotide and nucleic acid metabolism other than DNA metabolism and transcription. (3) Protein metabolism other than protein biosynthesis and protein modification. (4) Metabolism other than amino acid and derivative metabolism, biosynthesis, carbohydrate metabolism, catabolism, electron transport, lipid metabolism, nucleobase/nucleoside/nucleotide and nucleic acid metabolism and protein metabolism. (5) Cell growth and/or maintenance other than cell cycle and cell organization and biogenesis. (6) Physiological processes other than photosynthesis, response to stress, response to endogenous stimulus, response to external stimulus and metabolism
Fig. 5 Characteristics of each coffee cDNA library in comparison to the entire coffee EST-derived unigene set. The total unigene and highly expressed unigene categories sum to greater than 100% since the same unigene may contain ESTs from more than one library
Number of coffee unigenes showing significantly (P<0.05) different expression in pairwise comparisons of cDNA libraries
| Library | Pericarp | Early stage cherry | Middle stage seed | Late stage seed |
|---|---|---|---|---|
| Leaf | 384 | 752 | 548 | 562 |
| Pericarp | | 610 | 458 | 527 |
| Early stage cherry | | | 602 | 728 |
| Middle stage seed | | | | 585 |
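The statistical test behind these pairwise comparisons is not stated in this excerpt. Purely as an illustration of how a per-unigene comparison between two libraries could be framed, the sketch below applies a two-sided Fisher exact test to a 2x2 table of EST counts. The choice of test, the function name `fisher_exact_two_sided` and the example counts (30 versus 5 ESTs) are assumptions made for this example and are not taken from the paper; only the two library sizes come from the library table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical unigene: 30 ESTs in the leaf library and 5 in the
# middle stage seed library (illustrative counts, not from the paper).
LEAF_TOTAL, MIDDLE_SEED_TOTAL = 8942, 10077  # library sizes from the library table
p_value = fisher_exact_two_sided(30, LEAF_TOTAL - 30, 5, MIDDLE_SEED_TOTAL - 5)
print(f"p = {p_value:.3g}, significant at 0.05: {p_value < 0.05}")
```

Counting the unigenes for which such a test gives P<0.05 in each pair of libraries would produce a matrix of the same shape as the table above, although the actual numbers depend on the authors' own method.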
The 20 most highly expressed coffee unigenes: functional annotation and most similar Arabidopsis and Solanaceae homologs
| Coffee unigene#: annotation | Arabidopsis best match (e value/score) | Solanaceae best match, unigene_species (e value/score) | Total ESTs | Leaf | Pericarp | Early stage cherry | Middle stage seed | Late stage seed |
|---|---|---|---|---|---|---|---|---|
| 125230: putative 2s seed storage protein | ND | 243065_tomato (e-103/238) | 1,219 | | | | | |
| 120912: 11s seed storage protein | At5g44120 (1e-88/324) | 228376_tomato (0/802) | 687 | | | | 244 | |
| 121707: unknown function | At1g29050 (1e-139/489) | 246695_potato (e-163/283) | 324 | | | | 149 | |
| 120118: unknown function | At5g59320 (2e-21/99.8) | 221585_tomato (e-134/475) | 292 | | | | 58 | |
| 124988: unknown function | ND | ND | 204 | 58 | | | | |
| 120685: chitinase | At5g24090 (2e-43/172) | 214596_tomato (1e-35/84.5) | 202 | | | | | |
| 124158: photoassimilate-responsive protein | At3g54040 (2e-36/149) | 196924_pepper (2e-39/138) | 182 | | | | | 28 |
| 119890: unknown function | ND | 204426_pepper (5e-07/52.8) | 183 | | | | | 0 |
| 123265: ADP-ribosylation factor | At2g47170 (1e-99/359) | 238338_tomato (0/693) | 182 | 58 | | | | |
| 124083: secretory peroxidase | At4g21960 (e-153/537) | 196145_pepper (0/681) | 161 | | | | 49 | |
| 124911: metallothionein | At5g02380 (0.32/32.3) | 207464_petunia (2e-06/51.0) | 163 | | | | | |
| 119817: chitinase | At3g12500 (e-103/373) | 248120_potato (e-148/521) | 148 | | | | | |
| 124815: unknown function | At3g29240 (1e-87/320) | 227940_tomato (e-146/517) | 145 | | | | | |
| 122206: SAM synthase | At2g36880 (0/711) | 270415_petunia (0/887) | 142 | | | | | |
| 119460: WRKY4 transcription factor | At1g80840 (3e-75/279) | 237166_tomato (e-137/487) | 123 | | | | | |
| 123045: unknown function | At3g16000 (0.69/31.2) | 218824_tomato (90.36/33.1) | 123 | | | | | |
| 120481: AdoMet synthase | At4g01850 (0/723) | 243236_potato (0/886) | 108 | | | | | |
| 121265: Mob1/phocein | At5g45550 (e-119/425) | 196814_pepper (e-146/513) | 113 | | | | | |
| 124791: plasmodesmal receptor | At5g15140 (1e-99/360) | 203764_pepper (8e-86/314) | 105 | | | | | 76 |
| 122071: rubisco small subunit | At1g67090 (9e-70/260) | 207453_petunia (3e-89/297) | 99 | | | | | |
BLAST match values are given in parentheses
Bold numbers indicate the library for which the highest number of ESTs was observed for each gene. Italic numbers indicate libraries for which the number of ESTs is significantly lower (P<0.05) than the highest
Gene families expanded in coffee relative to Arabidopsis
| Family # | # Arabidopsis family members | # Coffee family members | Longest coffee member | Annotation |
|---|---|---|---|---|
| 266 | 1 | 21 | 122330 | Retrotransposon gag protein, class I |
| 180 | 5 | 14 | 124952 | Polygalacturonase isoenzyme 1 beta subunit with BURP domain |
| 632 | 1 | 12 | 123451 | Acidic endochitinase |
| 386 | 2 | 10 | 124158 | Photoassimilate-responsive protein |
| 382 | 4 | 8 | 119672 | Hypersensitive-induced protein, band 7 protein |
| 394 | 2 | 7 | 122791 | E-class P450 |
| 483 | 2 | 6 | 120054 | Bet v I allergen |
| 623 | 3 | 6 | 119581 | Root hair defective protein |
| 1,182 | 1 | 5 | 126674 | Unknown function |
| 695 | 2 | 5 | 126974 | Tyrosine decarboxylase |
| 783 | 2 | 5 | 122423 | Unknown function |
| 1,117 | 2 | 5 | 119449 | Trypsin inhibitor Kunitz |
Gene families unique to coffee in comparison to Arabidopsis
| Gene family # | # Family member | Longest member | Solanaceae hit | Annotation |
|---|---|---|---|---|
| 243 | 27 | 122956 | 258190 potato | Retrotransposon gag protein, class II |
| 687 | 11 | 120121 | 221585 tomato | Thaumatin, pathogenesis related |
| 965 | 10 | 119718 | 249401 potato | Zn-finger, CCHC type |
| 974 | 10 | 120244 | 2610402 potato | Disease resistance protein (TIR-NBS-LRR class) |
| 852 | 9 | 119638 | 225732 tomato | Retrotransposon gag protein, class III |
| 360 | 8 | 121998 | 23671 tomato | Disease resistance protein |
| 1,019 | 7 | 124574 | 222350 tomato | Leucine-rich repeat, disease resistance protein |
| 1,607 | 7 | 122216 | none | Unknown function |
| 1,610 | 7 | 130519 | none | Unknown function |
| 1,676 | 7 | 126264 | 243065 tomato | Unknown function |
| 708 | 6 | 123769 | 236157 tomato | ABA/WDS induced protein |
| 1,852 | 5 | 120284 | 213688 tomato | Proline-rich region, extensin-like protein |
| 2,362 | 5 | 122218 | 237314 tomato | Unknown function |
| 2,459 | 5 | 124466 | 267984 potato | Leucine-rich repeat, plant specific, receptor-related protein kinase |
Fig. 6 Histogram showing match scores for each coffee unigene as compared with its best match in the Arabidopsis proteome
Coffee genes not found in Arabidopsis, but with conserved counterparts in tomato or other Solanaceous species
| Coffee unigene | Solanaceae EST-derived unigene match | Score | GenBank (non-redundant and dbest) best match | Score | Annotation |
|---|---|---|---|---|---|
| 124978 | 240871 tomato | 454 | | | Unknown function |
| 121324 | 235756 tomato | 429 | gb\|CB686389.1 [ | 44 | Unknown function |
| 131820 | 213100 tomato | 426 | gi\|50252229.1 [ | 73 | Unknown function |
| 121542 | 240321 tomato | 416 | ref\|NP_922676.1 [ | 75 | Unknown function |
| 131934 | 219759 tomato | 377 | emb\|CAE05735.1 [ | 297 | TFIIH basal transcription factor p52 subunit |
| 121140 | 236347 tomato | 320 | | | Unknown function |
| 131445 | 225435 tomato | 320 | | | Unknown function |
| 125230 | 243065 tomato | 238 | gi\|13183175 [ | 45 | 2S albumin |
| 131030 | 246364 potato | 213 | ref\|NP_524404.1 [ | 110 | Phosphatidylinositol transfer protein |
| 120120 | 237254 tomato | 202 | gb\|CF349465.1 [Rose] | 52 | Unknown function |
| 126635 | 237314 tomato | 185 | | | Unknown function |
| 126575 | 237314 tomato | 182 | | | Unknown function |
| 130675 | 209387 petunia | 177 | gb\|CK093976.1 [ | 438 | Unknown function |
| 128020 | 237314 tomato | 167 | | | Unknown function |
| 123615 | 249253 potato | 163 | gb\|AAO73272.1 [ | 140 | Unknown function |
| 126432 | 240551 tomato | 163 | gi\|34878866 [ | 56 | Phosphatidylinositol glycan class N |
| 124384 | 197378 pepper | 156 | gb\|CA815435.1 [ | 1,009 | Unknown function |
| 122126 | 239632 tomato | 153 | | | Unknown function |
| 131601 | 232010 tomato | 145 | gb\|CK229938.1 [ | 74 | 40S ribosomal protein S21 |
| 119644 | 237150 tomato | 143 | ref\|NP_921250.1 [ | 70 | Helicase |
The GenBank best matches exclude those from Solanaceae, Coffea and Hedyotis (both members of the Rubiaceae family). Solanaceae EST-derived
Fig. 7 Ratio of highest Arabidopsis match score to highest Solanaceae match score for individual coffee unigenes. The analysis is restricted to coffee unigenes with a Solanaceae match score >100
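The quantity plotted in Fig. 7 can be illustrated with a few rows from the table of the 20 most highly expressed coffee unigenes. The sketch below is not the authors' code: the function name `score_ratios` is hypothetical, and treating an "ND" Arabidopsis match as a score of 0 is an assumption made only so that such unigenes still receive a ratio.

```python
# (coffee unigene, best Arabidopsis score or None, best Solanaceae score),
# copied from the table of the 20 most highly expressed coffee unigenes.
hits = [
    ("120912", 324.0, 802.0),
    ("124083", 537.0, 681.0),
    ("122206", 711.0, 887.0),
    ("125230", None, 238.0),  # "ND": no Arabidopsis match reported
    ("119890", None, 52.8),   # Solanaceae score below the cutoff
]
MIN_SOLANACEAE_SCORE = 100.0  # cutoff stated in the Fig. 7 caption


def score_ratios(rows, cutoff=MIN_SOLANACEAE_SCORE):
    """Return {unigene: Arabidopsis score / Solanaceae score} above the cutoff."""
    ratios = {}
    for unigene, arabidopsis, solanaceae in rows:
        if solanaceae <= cutoff:
            continue  # Fig. 7 keeps only Solanaceae scores > 100
        # Assumption: a missing Arabidopsis match counts as a score of 0.
        ratios[unigene] = (arabidopsis or 0.0) / solanaceae
    return ratios


ratios = score_ratios(hits)
for unigene, ratio in ratios.items():
    print(f"{unigene}: {ratio:.3f}")
```

A ratio below 1 means the coffee unigene matches its best Solanaceae counterpart more strongly than its best Arabidopsis counterpart, which is the pattern the abstract describes for coffee and tomato.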