Robert King, Martin Urban, Michael C U Hammond-Kosack, Keywan Hassani-Pak, Kim E Hammond-Kosack.
Abstract
BACKGROUND: Accurate genome assembly and gene model annotation are critical for comparative species analyses and gene functional analyses. Here we present the completed genome sequence and annotation of the reference strain PH-1 of Fusarium graminearum, the causal agent of head scab disease of small-grain cereals, which threatens global food security. Completion was achieved by combining (a) the BROAD Sanger-sequenced draft with (b) the gene predictions from the Munich Information Center for Protein Sequences (MIPS) v3.2, (c) de novo whole-genome shotgun re-sequencing, and (d) re-annotation of the gene models using RNA-seq evidence and the Fgenesh, SNAP, GeneMark and Augustus prediction algorithms, followed by (e) manual curation.
Year: 2015 PMID: 26198851 PMCID: PMC4511438 DOI: 10.1186/s12864-015-1756-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Chromosome and supercontig sequence coverage of the F. graminearum PH-1 isolate against the MIPS reference. Repeat sequences that are under-represented in the MIPS reference can be identified by dividing the corrected average observed coverage by the original sequencing depth of 85×.
| MIPS chromosome/supercontig number | Length (bp) | Raw average coverageᵇ | Bases with coverage (%)ᶜ | Corrected average coverageᵈ | Calculated multiples of supercontig sequence in genomeᵉ |
|---|---|---|---|---|---|
| 1 | 11,694,295 | 79.5 | 100.0 | 79.5 | 1 |
| 2 | 8,911,601 | 90.7 | 99.6 | 90.3 | 1 |
| 3 | 7,711,129 | 90.4 | 95.5 | 86.3 | 1 |
| 4 | 8,029,942 | 70.6 | 91.8 | 64.8 | 1 |
| 4: 7,953,943+ bpᵃ | 75,973 | 1,000.2 | 77.8 | 778.16 | 9 |
| 3.13 | 12,585 | 1,083 | 96 | 1,039.7 | 12 |
| 3.18 | 8,774 | 930.3 | 99 | 921 | 11 |
| 3.2 | 7,037 | 1,000.6 | 91.3 | 913.5 | 11 |
| 3.29 | 2,326 | 731.9 | 95 | 695.3 | 8 |
| 3.28 | 2,628 | 687.8 | 98.1 | 674.7 | 8 |
| 3.3 | 2,172 | 573.5 | 99.2 | 568.9 | 7 |
| 3.22 | 6,062 | 573.5 | 75.5 | 433 | 5 |
| 3.24 | 6,000 | 557.4 | 73.5 | 409.7 | 5 |
| 3.23 | 6,331 | 541.4 | 74.2 | 401.7 | 5 |
| 3.21 | 7,489 | 551.3 | 78.2 | 431.1 | 5 |
| 3.25 | 6,524 | 368.1 | 63 | 231.9 | 3 |
| 3.15 | 10,471 | 96.7 | 100 | 96.7 | 1 |
| 3.31 | 2,080 | 85.8 | 100 | 85.8 | 1 |
| 3.26 | 2,996 | 95.7 | 100 | 95.7 | 1 |
| 3.12 | 15,604 | 438.9 | 99.4 | 436.3 | 5 |
ᵃ Chromosome 4 from 7,953,943 bp onwards; this is the start of the repeating, RNA-annotated sequence
ᵇ The read depth observed across the length of the sequence
ᶜ Percentage of bases that are A, C, G or T rather than N
ᵈ Average coverage corrected for N bases
ᵉ The number of copies of the sequence that should be represented in the reference sequence
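The derived columns can be reproduced from the raw values. The sketch below assumes the corrected coverage is the raw average coverage multiplied by the fraction of non-N bases (this reproduces the tabulated values to within rounding, but the formula is inferred rather than stated) and that the copy number is the corrected coverage divided by the 85× sequencing depth, rounded to the nearest integer. The function names are hypothetical.

```python
# Hypothetical helper names; the correction formula is inferred from the
# tabulated values, not taken from the authors' pipeline.
SEQUENCING_DEPTH = 85  # original whole-genome sequencing depth (85x)


def corrected_coverage(raw_coverage: float, pct_non_n: float) -> float:
    """Scale raw average coverage by the fraction of non-N bases."""
    return raw_coverage * pct_non_n / 100.0


def estimated_copies(corrected: float, depth: float = SEQUENCING_DEPTH) -> int:
    """Estimate how many copies of a sequence the genome contains."""
    return round(corrected / depth)


# (row label, raw average coverage, % of bases that are not N) from the table
rows = [
    ("4: 7,953,943+ bp", 1000.2, 77.8),
    ("3.13", 1083, 96),
    ("3.25", 368.1, 63),
    ("3.15", 96.7, 100),
]

for name, raw, pct in rows:
    corr = corrected_coverage(raw, pct)
    print(f"{name}: corrected {corr:.1f}, copies {estimated_copies(corr)}")
```

For the 3.13 row this gives 1,083 × 0.96 ≈ 1,039.7, and 1,039.7 / 85 ≈ 12.2, which rounds to the 12 copies listed.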
Fig. 1 Visual representation of AT-rich and repeat-rich regions in the RRes and MIPS references. All values are approximated to the nearest 1 kb, and images have been scaled to equivalency. The green line represents the AT percentage and the blue line the GC percentage; together they total 100 %. Labels use a number for the chromosome (e.g. 1 = chromosome 1), "a" for the end of a reverse-complemented section, and "b" for regions in the MIPS version that were extended in the RRes version. The label 3.15 marks where supercontig 3.15 was placed, and "c" marks the location of FGRRES_20327. The amino terminus (5′) set (upper panel) reveals the AT-rich extensions of chromosomes 2–4 and the sequence that was reverse complemented in chromosome 1. The centromere set (middle panel) shows the AT-rich region extended from 3 kbp (MIPS) to 57 kbp (RRes) within the 8.97 Mbp region of chromosome 1, from 3 kbp to 65 kbp within the 3.27 Mbp region of chromosome 2, from 4 kbp to 56 kbp within the 5.35 Mbp region of chromosome 3, and from 2 kbp to 61 kbp within the 5.0 Mbp region of chromosome 4. The carboxyl terminus set (lower panel) shows extensions at the carboxyl end of 14 kbp for chromosome 1, 17 kbp for chromosome 2 and 19 kbp for chromosome 3, and a 1.3 Mbp extension of repetitive sequence for chromosome 4 (complete sequence length not shown).
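The AT and GC traces in Fig. 1 are sequence-composition percentages plotted along each chromosome. A minimal sketch of such a profile follows, assuming non-overlapping 1 kb windows and ignoring N bases; the window scheme actually used for the figure is not stated, and `at_gc_windows` is a hypothetical name.

```python
def at_gc_windows(seq: str, window: int = 1000) -> list[tuple[int, float, float]]:
    """Return (window start, AT %, GC %) for consecutive non-overlapping windows.

    N and other ambiguous bases are ignored, so AT % + GC % = 100 for any
    window that contains at least one A, C, G or T.
    """
    results = []
    seq = seq.upper()
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        at = chunk.count("A") + chunk.count("T")
        gc = chunk.count("G") + chunk.count("C")
        called = at + gc
        if called == 0:
            continue  # window is entirely N: no composition to report
        results.append((start, 100.0 * at / called, 100.0 * gc / called))
    return results


profile = at_gc_windows("AT" * 600 + "GC" * 400)
print(profile)  # [(0, 100.0, 0.0), (1000, 20.0, 80.0)]
```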
Comparative statistics on the length and nucleotide composition of the four F. graminearum chromosomes. Comparison of the MIPS (old) and RRes (new) versions of the reference genome
| Chromosome | MIPS total bases (bp) | MIPS ‘N’ bases (bp) | MIPS total minus ‘N’ bases (bp) | RRes total bases (bp) | RRes ‘N’ bases (bp) | RRes total minus ‘N’ bases (bp) | Length difference, RRes vs. MIPS (bp)ᵃ |
|---|---|---|---|---|---|---|---|
| 1 | 11,694,295 | 65,691 | 11,628,604 | 11,760,950 | 12 | 11,760,938 | +132,334 |
| 2 | 8,911,601 | 49,430 | 8,862,171 | 8,997,558 | 0 | 8,997,558 | +135,387 |
| 3 | 7,711,129 | 32,005 | 7,679,124 | 7,792,947 | 0 | 7,792,947 | +113,823 |
| 4 | 8,029,942 (85,155ᵇ) | 65,717 | 7,964,225 | 9,397,177 (1,384,836ᵇ) | 0 | 9,407,501 | +1,432,952 |
| Totals | 36,346,967 | 212,843 | 36,134,124 | 37,948,632 | 12 | 37,958,944 | +1,824,820 |
ᵃ Calculated from the "total minus ‘N’ bases" columns
ᵇ Approximate total length of the repeating sequence at the carboxyl end of chromosome 4
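Footnote a means each length difference is computed on non-N lengths: (RRes total − RRes N) − (MIPS total − MIPS N). A short check of that arithmetic for chromosomes 1–3, using only values from the table; `non_n_length` is a hypothetical helper name.

```python
# (chromosome, MIPS total bp, MIPS N bp, RRes total bp, RRes N bp) from the table
chromosomes = [
    (1, 11_694_295, 65_691, 11_760_950, 12),
    (2, 8_911_601, 49_430, 8_997_558, 0),
    (3, 7_711_129, 32_005, 7_792_947, 0),
]


def non_n_length(total: int, n_bases: int) -> int:
    """Sequence length once unresolved N bases are excluded."""
    return total - n_bases


for chrom, mips_total, mips_n, rres_total, rres_n in chromosomes:
    mips_called = non_n_length(mips_total, mips_n)
    rres_called = non_n_length(rres_total, rres_n)
    # The difference is taken between the non-N lengths (footnote a).
    print(chrom, mips_called, rres_called, f"{rres_called - mips_called:+,}")
```

The first line printed is `1 11628604 11760938 +132,334`, matching the chromosome 1 row.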
Basic statistics of the different reference and annotation versions of F. graminearum. A comparison of F. graminearum genome version statistics between BROAD, MIPS, and RRes
| Statistic | BROAD | MIPS | RRes |
|---|---|---|---|
| Genome size (bp)ᵃ | 36,565,771 | 36,553,761 | 38,060,440 |
| Scaffoldsᵇ | 31 | 19 | 5 |
| GC content (%)ᶜ | 48.3 | 48.3 | 48.2 (48.0ᵈ) |
| Spanned gaps | 402 | 402 | 1 |
| Predicted Genes | 13,322 | 13,826 | 14,164 |
| Repetitive (%) | 0.24 | 0.24 | 0.24 |
| Transposable elements (%) | 0.029 | 0.029 | 0.060 |
| ENA project accession | PRJNA13839 | N/A | PRJEB5475 |
ᵃ Including all scaffolds, N bases and the mitochondrial genome
ᵇ Including all scaffolds and N bases, excluding the mitochondrial genome
ᶜ Excluding Ns and the mitochondrial genome
ᵈ Excluding Ns, the mitochondrial genome and the large repetitive sequence at the carboxyl end of chromosome 4
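Footnote c defines GC content over non-N bases, and gaps in a scaffolded assembly are conventionally represented as runs of N. The sketch below computes both for a single sequence, assuming each maximal run of N counts as one spanned gap; the gap-counting rule used for this table is not stated, and `assembly_stats` is a hypothetical name.

```python
import re


def assembly_stats(seq: str) -> dict[str, float]:
    """Count N-runs (gaps) and GC content over non-N bases for one sequence."""
    seq = seq.upper()
    gaps = re.findall(r"N+", seq)  # assumption: each maximal run of N is one gap
    gc = seq.count("G") + seq.count("C")
    called = sum(seq.count(base) for base in "ACGT")  # non-N bases only
    return {
        "length": len(seq),
        "n_bases": sum(len(g) for g in gaps),
        "spanned_gaps": len(gaps),
        "gc_percent": 100.0 * gc / called if called else 0.0,
    }


print(assembly_stats("GGCCNNNNATATNNGC"))
# {'length': 16, 'n_bases': 6, 'spanned_gaps': 2, 'gc_percent': 60.0}
```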
GO term (level 2) summary of the 412 new gene annotations. GO term summaries for biological process and molecular function of the 412 new genes annotated in the RRes v4.0 gene set
Biological process
| GO-id | GO-term | No. |
|---|---|---|
| GO:0044699 | single-organism process | 18 |
| GO:0050896 | response to stimulus | 2 |
| GO:0008152 | metabolic process | 32 |
| GO:0009987 | cellular process | 22 |
| GO:0032502 | developmental process | 2 |
| GO:0071840 | cellular component organization or biogenesis | 2 |
| GO:0065007 | biological regulation | 10 |
| GO:0051179 | localization | 4 |
| GO:0032501 | multicellular organismal process | 2 |
| GO:0023052 | signalling | 1 |
| Total | | 99 |
Molecular function
| GO-id | GO-term | No. |
|---|---|---|
| GO:0005215 | transporter activity | 3 |
| GO:0001071 | nucleic acid binding transcription factor activity | 8 |
| GO:0004872 | receptor activity | 1 |
| GO:0003824 | catalytic activity | 25 |
| GO:0005488 | binding | 24 |
| GO:0030234 | enzyme regulator activity | 1 |
| Total | | 64 |
Level 2 GO terms of the modified gene annotations (_M) in RRes compared with their MIPS counterparts. GO summaries were extracted using Blast2GO
Biological process
| GO-id | GO-term | RRes No. | MIPS No. | Diff No. |
|---|---|---|---|---|
| GO:0032501 | multicellular organismal process | 8 | 3 | 5 |
| GO:0048511 | rhythmic process | 2 | 1 | 1 |
| GO:0022610 | biological adhesion | 7 | 7 | 0 |
| GO:0051704 | multi-organism process | 6 | 3 | 3 |
| GO:0051179 | localization | 165 | 150 | 15 |
| GO:0008152 | metabolic process | 617 | 539 | 78 |
| GO:0023052 | signalling | 30 | 26 | 4 |
| GO:0000003 | reproduction | 4 | 4 | 0 |
| GO:0009987 | cellular process | 482 | 428 | 54 |
| GO:0050896 | response to stimulus | 72 | 56 | 16 |
| GO:0065007 | biological regulation | 152 | 112 | 40 |
| GO:0002376 | immune system process | 1 | 0 | 1 |
| GO:0040007 | growth | 1 | 1 | 0 |
| GO:0044699 | single-organism process | 408 | 357 | 51 |
| GO:0032502 | developmental process | 10 | 5 | 5 |
| GO:0022414 | reproductive process | 3 | 1 | 2 |
| GO:0040011 | locomotion | 1 | 0 | 1 |
| GO:0071840 | cellular component organization or biogenesis | 52 | 41 | 11 |
| Total | | 2021 | 1734 | 287 |
Molecular function
| GO-id | GO-term | RRes No. | MIPS No. | Diff No. |
|---|---|---|---|---|
| GO:0004872 | receptor activity | 6 | 3 | 3 |
| GO:0060089 | molecular transducer activity | 7 | 5 | 2 |
| GO:0001071 | nucleic acid binding transcription factor activity | 79 | 55 | 24 |
| GO:0005198 | structural molecule activity | 21 | 20 | 1 |
| GO:0005085 | guanyl-nucleotide exchange factor activity | 2 | 2 | 0 |
| GO:0009055 | electron carrier activity | 5 | 5 | 0 |
| GO:0016209 | antioxidant activity | 5 | 5 | 0 |
| GO:0030234 | enzyme regulator activity | 4 | 4 | 0 |
| GO:0005488 | binding | 514 | 491 | 23 |
| GO:0003824 | catalytic activity | 488 | 449 | 39 |
| GO:0005215 | transporter activity | 75 | 72 | 3 |
| Total | | 1206 | 1111 | 95 |
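The Diff column is the RRes count minus the MIPS count for each term, and the Total row sums each column. A quick check using the molecular function counts copied from the table; `summarise` is a hypothetical helper name.

```python
# Level 2 molecular-function counts (RRes, MIPS) copied from the table
molecular_function = {
    "GO:0004872": (6, 3),
    "GO:0060089": (7, 5),
    "GO:0001071": (79, 55),
    "GO:0005198": (21, 20),
    "GO:0005085": (2, 2),
    "GO:0009055": (5, 5),
    "GO:0016209": (5, 5),
    "GO:0030234": (4, 4),
    "GO:0005488": (514, 491),
    "GO:0003824": (488, 449),
    "GO:0005215": (75, 72),
}


def summarise(counts: dict[str, tuple[int, int]]) -> tuple[dict[str, int], tuple[int, int, int]]:
    """Return per-term RRes minus MIPS differences and the column totals."""
    diffs = {go_id: rres - mips for go_id, (rres, mips) in counts.items()}
    rres_total = sum(r for r, _ in counts.values())
    mips_total = sum(m for _, m in counts.values())
    return diffs, (rres_total, mips_total, rres_total - mips_total)


diffs, totals = summarise(molecular_function)
print(totals)  # (1206, 1111, 95)
```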
InterProScan 5 domain annotation comparison between RRes v4.0 and MIPS v3.2. The number of proteins carrying protein domains whose InterPro IDs are linked to virulence genes. Those in bold are increased in the RRes v4.0 set
| InterPro ID | Domain | RRes no. proteins | MIPS no. proteins |
|---|---|---|---|
| IPR017853 | Glycoside hydrolase superfamily | 122 | 122 |
| IPR029058 | Alpha/Beta hydrolase fold | 286 | N/A |
| IPR000675 | Cutinase | | 12 |
| IPR001031 | Thioesterase | 5 | 5 |
| IPR000383 | Xaa-Pro dipeptidyl-peptidase-like domain | 4 | 4 |
| IPR000073 | Alpha/beta hydrolase fold-1 | | 21 |
| IPR001375 | Peptidase S9, prolyl oligopeptidase, catalytic domain | 9 | 9 |
| IPR002018 | Carboxylesterase, type B | 26 | 26 |
| IPR002921 | Fungal lipase-like domain | 8 | 8 |
| IPR002925 | Dienelactone hydrolase | 8 | 8 |
| IPR003140 | Phospholipase/carboxylesterase/thioesterase | 5 | 5 |
| IPR013094 | Alpha/beta hydrolase fold-3 | 30 | 31 |
| IPR029059 | Alpha/beta hydrolase fold-5 | 12 | N/A |
| IPR003439 | ABC transporter-like | 62 | 62 |
| IPR011050 | Pectin lyase fold/virulence factor | 33 | 33 |
| IPR000070 | Pectinesterase, catalytic | 3 | 3 |
| IPR004835 | Chitin synthase | | 9 |
| IPR001138 | Zn(2)-C6 fungal-type DNA-binding domain | | 316 |
| IPR001128 | Cytochrome P450 | 114 | 114 |
| IPR011701 | Major facilitator superfamily | | 248 |
| IPR015433 | Phosphatidylinositol kinase | | 2 |
| IPR015500 | Peptidase S8, subtilisin-related | 28 | 29 |
| IPR001283/IPR014044 | Cysteine-rich secretory protein, allergen V5/Tpx-1-related/CAP domain | 5 | 5ᵃ |
| IPR011329 | Killer toxin | 4 | 4 |
ᵃ One of these proteins was only partial
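Counts like these are the number of distinct proteins with at least one match to each InterPro entry. A minimal sketch, assuming the standard InterProScan 5 TSV layout in which column 1 is the protein accession and column 12 the InterPro accession (filled only when InterPro lookup is enabled). The function names are hypothetical, and treating N/A as zero in `increased` is an assumption.

```python
from collections import defaultdict
from typing import Iterable


def proteins_per_interpro(rows: Iterable[list[str]]) -> dict[str, int]:
    """Count distinct proteins per InterPro accession.

    Assumes InterProScan 5 TSV rows: row[0] is the protein accession and
    row[11] the InterPro accession ("-" or empty when there is none).
    """
    proteins: dict[str, set[str]] = defaultdict(set)
    for row in rows:
        if len(row) < 12 or row[11] in ("", "-"):
            continue  # match has no InterPro accession
        proteins[row[11]].add(row[0])  # a protein with several matches counts once
    return {ipr: len(ids) for ipr, ids in proteins.items()}


def increased(new: dict[str, int], old: dict[str, int]) -> list[str]:
    """InterPro IDs with more proteins in the new annotation than the old one."""
    return sorted(ipr for ipr, n in new.items() if n > old.get(ipr, 0))
```

With real result files, `rows` could come from `csv.reader(handle, delimiter="\t")` over each annotation's InterProScan output; counting a set of protein IDs per accession avoids double-counting proteins that carry the same domain more than once.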
New enzymes from the new RRes gene set. New enzyme codes assigned to the 412 new genes in the RRes v4.0 gene set
| Gene ID | Enzyme code | Enzyme Name or description |
|---|---|---|
| 20327 | EC:3.2.1.55 | Non-reducing end alpha-L-arabinofuranosidase |
| 20406 | EC:2.7.11 | Transferring phosphorus-containing groups |
| 20377 | EC:3.1.1.74 | Cutinase |
| 20330 | EC:2.3.1 | Acyltransferases |
| 20264 | EC:2.7.1.67 | 1-phosphatidylinositol 4-kinase |
| 20189 | EC:2.3.3.14 | Homocitrate synthase |
| 20172 | EC:1.14.12 | Oxidoreductases |
| 20056 | EC:1.14.11 | Oxidoreductases |
| 20292 | EC:3.4.23 | Aspartic endopeptidases |
| 20373 | EC:1.14.11 | Oxidoreductases |
| 20101 | EC:1.1.1.158 | UDP-N-acetylmuramate dehydrogenase |
New enzyme codes from the modified RRes gene set. Enzyme codes identified in the modified RRes gene subset (_M) that were not found in the MIPS annotation
| Gene ID | Enzyme code | Enzyme Name |
|---|---|---|
| 06098_M | EC:6.1.1.20 | Phenylalanine--tRNA ligase |
| 08061_M | EC:4.1.1.49 | Phosphoenolpyruvate carboxykinase (ATP) |
| 01195_M | EC:3.6.3.8 | Calcium-transporting ATPase |
| 07202_M, 11727_M | EC:3.6.1.15 | Nucleoside-triphosphate phosphatase |
| 15917_M | EC:3.2.1.8 | Endo-1,4-beta-xylanase |
| 16359_B_M | EC:3.1.4.11 | Phosphoinositide phospholipase C |
| 10980_M | EC:2.7.7.49 | RNA-directed DNA polymerase |
| 13343_M | EC:2.7.1.30 | Glycerol kinase |
| 09688_M | EC:2.5.1.18 | Glutathione transferase |
| 16945_M | EC:2.3.1.225 | Protein S-acyltransferase |
| 13161_M | EC:2.2.1.2 | Transaldolase |
| 08613_M | EC:2.1.1.71 | Phosphatidyl-N-methylethanolamine N-methyltransferase |
| 15680_M | EC:1.3.1.9 | Enoyl-[acyl-carrier-protein] reductase (NADH) |
| 06127_M | EC:1.2.1.2 | Formate dehydrogenase |
| 15680_M | EC:1.13.12.16 | Nitronate monooxygenase |
| 04922_M | EC:1.1.1.9 | D-xylulose reductase |
| 03873_M | EC:1.1.1.158 | UDP-N-acetylmuramate dehydrogenase |
| 03648_M | EC:1.1.1.184 | Carbonyl reductase (NADPH) |
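Selecting codes "not found in the MIPS annotation" is a set difference. The sketch below groups each novel EC code with the genes carrying it, as in the EC:3.6.1.15 row. The function name, the gene `00001_M`, and the MIPS code set are hypothetical placeholders, not data from the study.

```python
def ec_codes_absent_from(reference: set[str],
                         annotation: dict[str, set[str]]) -> dict[str, list[str]]:
    """Return EC codes present in `annotation` but absent from `reference`.

    Each novel code maps to the sorted gene IDs that carry it, so a code
    shared by several genes is reported once.
    """
    novel: dict[str, list[str]] = {}
    for gene, codes in annotation.items():
        for code in codes - reference:  # set difference per gene
            novel.setdefault(code, []).append(gene)
    return {code: sorted(genes) for code, genes in sorted(novel.items())}


# Toy data: 07202_M and 11727_M come from the table; "00001_M" and the
# MIPS code set are hypothetical.
modified = {
    "07202_M": {"EC:3.6.1.15"},
    "11727_M": {"EC:3.6.1.15"},
    "00001_M": {"EC:1.1.1.1"},
}
mips_codes = {"EC:1.1.1.1"}
print(ec_codes_absent_from(mips_codes, modified))
# {'EC:3.6.1.15': ['07202_M', '11727_M']}
```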
The number of coding and non-coding gene models predicted in the RRes and MIPS versions. Summary of coding and non-coding annotations of both the RRes and MIPS annotations across chromosomes 1-4 and supercontigs
| Chromosome | RRes gene no. | MIPS gene no. | RNA RRes (total) | tRNA RRes | rRNA RRes | ncRNA RRes | tmRNA RRes |
|---|---|---|---|---|---|---|---|
| 1 | 4393 | 4303 | 118 | 74 | 9 | 34 | 1 |
| 2 | 3645 | 3549 | 156 | 97 | 32 | 27 | 0 |
| 3 | 3090 | 2999 | 121 | 87 | 8 | 25 | 1 |
| 4ᵃ | 3032 | 2969 | 99 | 61 | 13 | 25 | 0 |
| 4ᵇ (1,384,836 bp) | 0 | 0 | 538 | 0 | 538 | 0 | 0 |
| Supercontig_3.12 | 4 | 3 | 10 | 0 | 0 | 10 | 0 |
| Supercontig_3.15 | n/a | 3 | n/a | 0 | 0 | 0 | 0 |
| Total | 14,164 | 13,826 | 504 (538a) | 319 | 62 (538) | 121 | 2 |
ᵃ Without the carboxyl-end repeating units with RNA annotation
ᵇ Carboxyl-end repeating units with RNA annotation
Fig. 2 Gene and RNA distribution across the four chromosomes (Chr 1–4). Each vertical bar represents a single gene or RNA annotation, aligned next to a heat map of genetic recombination frequency (second row of each chromosome; red = high, blue = low). Rows 3 to 7 denote the following: all gene annotations, recombination frequency, AT-rich regions (AT > 90 %) longer than 2 kb (red = >5.0 kb, orange = 2.5–5.0 kb, yellow = 2.0–2.5 kb), predicted secretome (n = 615), tRNA annotations, all RNA annotations, and new gene annotations only. The location of each centromere is highlighted in row 3 with a red arrow.
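The AT-rich track in Fig. 2 applies two thresholds (AT > 90 %, length > 2 kb) and three colour bins. The sketch below shows one way to derive such a track. It assumes non-overlapping 100 bp windows that are merged while AT exceeds 90 %, and it assigns exactly 2.5 kb to orange. The window size, the merging rule and the function names are assumptions; the caption gives only the thresholds and bins.

```python
def colour_bin(length: int) -> str:
    """Fig. 2 colour bins: red > 5.0 kb, orange 2.5-5.0 kb, yellow 2.0-2.5 kb."""
    if length > 5000:
        return "red"
    if length >= 2500:  # assumption: exactly 2.5 kb is binned as orange
        return "orange"
    return "yellow"


def at_rich_regions(seq: str, window: int = 100, min_at: float = 90.0,
                    min_len: int = 2000) -> list[tuple[int, int, str]]:
    """Return (start, end, colour) for AT-rich regions longer than min_len bp.

    Consecutive windows whose AT % (over A/C/G/T bases) exceeds min_at are
    merged into one region; an all-N window ends the current region.
    """
    seq = seq.upper()
    regions = []
    run_start = None
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        at = chunk.count("A") + chunk.count("T")
        called = at + chunk.count("G") + chunk.count("C")
        rich = called > 0 and 100.0 * at / called > min_at
        if rich and run_start is None:
            run_start = start  # open a new AT-rich region
        elif not rich and run_start is not None:
            regions.append((run_start, start))  # close the current region
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(seq)))  # region runs to the sequence end
    return [(s, e, colour_bin(e - s)) for s, e in regions if e - s > min_len]


seq = "GC" * 500 + "AT" * 1500 + "GC" * 500
print(at_rich_regions(seq))  # [(1000, 4000, 'orange')]
```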