Hue T M Tran, Thiruvarangan Ramaraj, Agnelo Furtado, Leonard Slade Lee, Robert J Henry.
Abstract
Arabica coffee (Coffea arabica) has a small gene pool, which limits genetic improvement. Selection for caffeine content within this gene pool would be assisted by identification of the genes controlling this important trait. Sequencing of DNA bulks from 18 genotypes with extreme high- or low-caffeine content from a population of 232 genotypes was used to identify linked polymorphisms. To obtain a reference genome, a whole genome assembly of arabica coffee (variety K7) was achieved by sequencing using short-read (Illumina) and long-read (PacBio) technology. Assembly was performed using a range of assembly tools, resulting in 76 409 scaffolds with a scaffold N50 of 54 544 bp and a total scaffold length of 1448 Mb. Validation of the genome assembly using different tools showed high completeness of the genome. More than 99% of transcriptome sequences mapped to the C. arabica draft genome, and 89% of BUSCOs were present. The assembled genome annotated using AUGUSTUS yielded 99 829 gene models. Using the draft arabica genome as reference in mapping and variant calling allowed the detection of 1444 nonsynonymous single nucleotide polymorphisms (SNPs) associated with caffeine content. Based on Kyoto Encyclopaedia of Genes and Genomes pathway-based analysis, 65 caffeine-associated SNPs were discovered, among which 11 SNPs were associated with genes encoding enzymes involved in the conversion of substrates that participate in the caffeine biosynthesis pathways. This analysis demonstrated the complex genetic control of this key trait in coffee.
Keywords: Arabica coffee; association; caffeine; genome annotation; genome assembly; single nucleotide polymorphism discovery
Year: 2018 PMID: 29509991 PMCID: PMC6131422 DOI: 10.1111/pbi.12912
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 9.803
Characteristics of the K7 arabica draft genome assembly
| Parameter | Value |
|---|---|
| Estimated genome size (Mb) | 1300 |
| Chromosome number (2n = 4x) | 44 |
| Total size of assembled contigs (Mb) | 1167 |
| Number of contigs | 265 687 |
| Largest contig (bp) | 186 701 |
| N50 length (contigs) (bp) | 12 184 |
| Number of scaffolds | 76 409 |
| Total size of assembled scaffolds (Mb) | 1448 |
| N50 length (scaffolds) (bp) | 54 544 |
| Longest scaffold (bp) | 769 411 |
| Number of gaps | 189 278 |
| Mean gaps length (bp) | 1485 |
| Total size of gaps (Mb) | 281 |
| GC content (%) | 37 |
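The figures in this table can be cross-checked against one another. The sketch below is illustrative Python, not the authors' pipeline: it recomputes the total gap length from the gap count and mean gap length, checks that contigs plus gaps account for the scaffold total, and shows how an N50 value is derived from a list of sequence lengths. The function name `n50` and the toy lengths are assumptions made for illustration.

```python
def n50(lengths):
    """Return the length L such that sequences of length >= L
    together cover at least half of the total assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


# Values copied from the table above.
contig_total_mb = 1167
scaffold_total_mb = 1448
n_gaps = 189_278
mean_gap_bp = 1485

# Total gap length implied by the gap count and mean gap length.
gap_total_mb = n_gaps * mean_gap_bp / 1e6
implied_scaffold_mb = contig_total_mb + gap_total_mb

print(f"gap total: {gap_total_mb:.1f} Mb")
print(f"contigs + gaps: {implied_scaffold_mb:.1f} Mb "
      f"(reported {scaffold_total_mb} Mb)")
print(f"gap fraction of scaffolds: {gap_total_mb / scaffold_total_mb:.1%}")

# Toy illustration of the N50 definition (hypothetical lengths).
toy_lengths = [100, 80, 60, 40, 20]
print("toy N50:", n50(toy_lengths))
```

The recomputed gap total rounds to the reported 281 Mb, and contigs plus gaps round to the reported 1448 Mb, so roughly a fifth of the scaffold length is gap sequence rather than assembled bases.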
Validation of draft genome using BWA, GMAP and BUSCO
| Results of read remapping using BWA | |
|---|---|
| Read alignment metrics | |
| Total number of reads mapping back | 98.4% |
| Reads properly paired | 93.0% |
Sequencing statistics of the two extreme bulks for low/high caffeine and statistics of SNP discovery and analysis
| Parameters | Low‐caffeine bulk | High‐caffeine bulk |
|---|---|---|
| No. of individuals | 18 | 18 |
| Average content (% dmb) | 1.03 | 1.48 |
| Total reads after trimming (#) | 230 140 744 | 324 14 616 |
| Average coverage (×) | 28 | 40 |
| GC content (%) | 36 | 36 |
| Average Phred score | 37 | 37 |
| Average length after trim (bp) | 147 | 147 |
| No. of variants called in each bulk against the reference | 513 899 | 792 617 |
| No. of variants between the two bulks | 18 469 | |
| No. of nonsynonymous variants between the two bulks | 1444 | |
| No. of CDS | 1086 | |
| No. of genes with KEGG pathways | 189 | |
| No. of unique KEGG pathways | 70 | |
| No. of unique enzymes | 80 | |
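The comparison behind the "variants between the two bulks" row can be illustrated with a simple allele-frequency contrast. The variant-calling software and thresholds actually used are not given in this excerpt, so everything below (the `BulkCall` record, the `min_depth` and `min_delta` cut-offs, and the toy read counts) is a hypothetical sketch of the general bulk-comparison idea, not the authors' method.

```python
from dataclasses import dataclass


@dataclass
class BulkCall:
    """Read counts at one site in one bulk (hypothetical structure)."""
    ref_reads: int
    alt_reads: int

    @property
    def depth(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def alt_freq(self) -> float:
        return self.alt_reads / self.depth if self.depth else 0.0


def differs_between_bulks(low, high, min_depth=10, min_delta=0.5):
    """Flag a site whose alternate-allele frequency differs strongly
    between the low- and high-caffeine bulks. Thresholds are assumed."""
    if low.depth < min_depth or high.depth < min_depth:
        return False
    return abs(low.alt_freq - high.alt_freq) >= min_delta


# Toy sites: (low-caffeine bulk, high-caffeine bulk). All counts are made up.
sites = {
    "scaffold_1:1200": (BulkCall(26, 2), BulkCall(8, 32)),    # strong contrast
    "scaffold_2:560": (BulkCall(14, 14), BulkCall(21, 19)),   # similar in both
    "scaffold_3:90": (BulkCall(3, 0), BulkCall(0, 4)),        # too shallow
}

candidates = [
    site for site, (low, high) in sites.items()
    if differs_between_bulks(low, high)
]
print(candidates)
```

Only the first toy site passes both the depth and frequency-difference filters. In a real analysis the surviving sites would then be annotated to separate nonsynonymous changes from the rest, as the later rows of the table summarise.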
Substrates, pathways and enzymes involved in caffeine biosynthesis pathway associated with the TAVs identified
| Substrates | Pathway | EC | Metabolism | No. of seq |
|---|---|---|---|---|
| SAM | Cysteine and methionine metabolism | EC:2.1.1.37‐(cytosine‐5‐)‐methyltransferase | A | 1 |
| XMP | Purine metabolism | EC:6.3.5.2‐synthase (glutamine‐hydrolysing) | C | 2 |
| AMP | Purine metabolism | EC:4.3.2.2‐lyase | A | 1 |
| AICAR–SAICAR | Purine metabolism | EC:4.3.2.2‐lyase | A | 1 |
| FGAM | Purine metabolism | EC:6.3.5.3‐synthase | A | 2 |
| 10‐Formyl‐THF | One carbon pool by folate | EC:3.5.1.10‐deformylase | A | 1 |
| Glutamate | Arginine biosynthesis | EC:2.3.1.1‐N‐acetyltransferase | C | 1 |
| | Arginine and proline metabolism | EC:2.7.2.11‐5‐kinase | C | 1 |
| | Carbapenem biosynthesis | EC:2.7.2.11‐5‐kinase | C | 1 |
| | Glutathione metabolism | EC:2.5.1.18‐transferase | A | 1 |
| ATP–ADP | Purine metabolism | EC:3.6.1.15‐phosphatase | A | 29 |
| | | EC:3.6.1.3‐adenylpyrophosphatase | A | 24 |
| Total | 7 pathways | 10 enzymes | | 65 |
Type of metabolism: C: Catabolism (breakdown) of the substrate; A: Anabolism (synthesis) of the substrate.
Sequences where TAVs are located.
One sequence with two SNPs.
The 24 sequences were duplicated among the 29; numbers in the same colour come from the same sequences (CDS).
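The totals in the last row of the table (7 pathways, 10 enzymes, 65 sequences) can be reproduced by tallying the rows. The short Python sketch below does this; the row list is transcribed from the table, and assigning the EC:3.6.1.3 row to purine metabolism is an assumption based on the table layout.

```python
# Rows as (pathway, EC number, number of sequences), transcribed from the table.
rows = [
    ("Cysteine and methionine metabolism", "2.1.1.37", 1),
    ("Purine metabolism", "6.3.5.2", 2),
    ("Purine metabolism", "4.3.2.2", 1),
    ("Purine metabolism", "4.3.2.2", 1),
    ("Purine metabolism", "6.3.5.3", 2),
    ("One carbon pool by folate", "3.5.1.10", 1),
    ("Arginine biosynthesis", "2.3.1.1", 1),
    ("Arginine and proline metabolism", "2.7.2.11", 1),
    ("Carbapenem biosynthesis", "2.7.2.11", 1),
    ("Glutathione metabolism", "2.5.1.18", 1),
    ("Purine metabolism", "3.6.1.15", 29),
    ("Purine metabolism", "3.6.1.3", 24),  # pathway assumed from table layout
]

n_pathways = len({pathway for pathway, _, _ in rows})
n_enzymes = len({ec for _, ec, _ in rows})
n_sequences = sum(count for _, _, count in rows)

print(n_pathways, n_enzymes, n_sequences)
```

The tally matches the reported totals. Because the footnotes indicate that some sequences are counted in more than one row, 65 is best read as a row sum rather than a count of distinct sequences.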
Figure 1. Substrates involved in caffeine biosynthesis pathways and related pathways catalysed by enzymes encoded by genes carrying the TAVs identified in this study. (a) The SAM (S‐adenosyl‐L‐methionine) cycle (the activated methyl cycle) in plants (adapted from Ashihara and Suzuki, 2004); (b) the biosynthetic pathways of caffeine from xanthosine (adapted from Ashihara et al., 2011b); (c) the ‘provider pathways’ for xanthosine synthesis in purine alkaloid forming plants (adapted from Ashihara and Suzuki, 2004); (d) de novo biosynthetic pathway of IMP in plants (adapted from Ashihara and Suzuki, 2004). Circles: location of substrates/precursors formed in the KEGG pathways catalysed by enzymes encoded by genes carrying the TAVs identified in this study. Circles of the same colour indicate alternative locations of the same substrate.
Figure 2. Snapshots of the KEGG pathways (obtained from Blast2GO analysis) at the locations where SNPs were associated with enzymes involved in the metabolism of substrates that enter the caffeine biosynthesis pathway. Substrates are circled in red; enzymes are highlighted in blue squares.