Jean-Félix Dallery, Nicolas Lapalu, Antonios Zampounis, Sandrine Pigné, Isabelle Luyten, Joëlle Amselem, Alexander H J Wittenberg, Shiguo Zhou, Marisa V de Queiroz, Guillaume P Robin, Annie Auger, Matthieu Hainaut, Bernard Henrissat, Ki-Tae Kim, Yong-Hwan Lee, Olivier Lespinet, David C Schwartz, Michael R Thon, Richard J O'Connell.
Abstract
BACKGROUND: The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture.
Keywords: Colletotrichum higginsianum; Fungal genome; SMRT sequencing; accessory chromosomes; optical map; secondary metabolism genes; segmental duplication; subtelomeres; transposable elements
Year: 2017 PMID: 28851275 PMCID: PMC5576322 DOI: 10.1186/s12864-017-4083-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Comparison of Colletotrichum higginsianum genome assemblies and annotations
| Input data and assembly statistics | NCBI accession CACQ02000000 | NCBI accession LTAN01000000 |
|---|---|---|
| Type of input data | | |
| PacBio P5-C3 read coverage | - | 133x |
| Sanger Fosmid (For/Rev) read coverage | 0.2x | - |
| Illumina GAII read coverage | 76x | - |
| 454 GS-FLX Titanium read coverage | 25x | - |
| Chromosome numberᵃ | 12 | 12 |
| Genome physical sizeᵇ | 53.35 Mb | 53.35 Mb |
| Assembly length | 49.08 Mb | 50.72 Mb |
| Total sequence alignable to optical map | 77.14 kb | 50.38 Mb |
| Number of contigs | 10,269 | 28 |
| Largest contig | 49.23 kb | 6.04 Mb |
| N50 contig length | 6.15 kb | 5.20 Mb |
| G+C content | 55.10% | 51.86% |
| Coverage by transposable elementsᶜ | 1.2% | 7.0% |
| Coverage by simple sequence repeatsᵈ | - | 12.7% |
| Number of predicted gene modelsᵉ | 16,172 | 14,651 |
| Genes with RNA-Seq evidenceᶠ | 14,502 | 12,878 |
| Annotation completeness (BUSCO)ᵍ | | |
| Complete genes | 2,946 (79%) | 3,616 (97%) |
| Fragmented genes | 569 (15%) | 76 (2%) |
| Missing genes | 210 (6%) | 33 (0.9%) |
ᵃ Independently determined by optical mapping [13] and cytological karyotyping [14]
ᵇ Estimated by optical mapping
ᶜ TEs were detected using RepeatMasker for assembly CACQ02000000 and REPET for assembly LTAN01000000
ᵈ SSRs were detected using REPET for assembly LTAN01000000 (not analyzed for assembly CACQ02000000)
ᵉ Different gene annotation pipelines were used for each assembly
ᶠ Five or more mapped Illumina reads
ᵍ Gene annotation completeness was estimated using a set of 3,725 Sordariomycete Benchmarking Universal Single-Copy Orthologs (BUSCOs)
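The BUSCO percentages in the table follow directly from the reported counts and the 3,725-gene reference set. The short sketch below recomputes them as a consistency check; the counts are copied from the table, while the variable and function names are illustrative and not part of the original study.

```python
# Recompute BUSCO percentages from the counts reported in the table.
# Names below (busco_counts, busco_percentages) are illustrative only.

BUSCO_TOTAL = 3725  # size of the Sordariomycete BUSCO set (table footnote)

busco_counts = {
    "CACQ02000000": {"complete": 2946, "fragmented": 569, "missing": 210},
    "LTAN01000000": {"complete": 3616, "fragmented": 76, "missing": 33},
}


def busco_percentages(counts, total=BUSCO_TOTAL):
    """Return each BUSCO category as a percentage of the reference set."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the BUSCO set size")
    return {category: 100.0 * n / total for category, n in counts.items()}


for accession, counts in busco_counts.items():
    rounded = {k: round(v, 1) for k, v in busco_percentages(counts).items()}
    print(accession, rounded)
```

For both assemblies the three categories sum to 3,725, so the percentages in the table are shares of the full reference set rather than of the genes found.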
Fig. 1 Validation of the C. higginsianum genome assembly by alignment of unitig sequences (orange) against chromosome optical maps (blue). MluI restriction sites are represented in optical maps and unitigs by vertical bars. Chromosomes 7 and 9 show discrepancies between unitigs and optical maps. These optical maps are colour-coded to highlight the break-points.
Differences between the core chromosomes (1-10) of Colletotrichum higginsianum and mini-chromosomes 11 and 12
| Characteristic | Chromosomes 1-10 (mean) | Chromosome 11 | Chromosome 12 |
|---|---|---|---|
| Total length (bp) | 4,914,036 | 646,208 | 597,935 |
| Number of protein-coding genes | 1,438 | 138 | 133 |
| Proportion of genes by length (%) | 46.0 | 25.5*** | 25.4*** |
| Proportion of expressed genes (%)ᵃ | 54.1 | 31.9** | 9.8*** |
| Number of transposable element (TE) copies | 128 | 146 | 63 |
| Proportion of TEs by length (%) | 5.9 | 38.4*** | 28.0*** |
| G+C (%) | 54.5 | 49.3*** | 47.2*** |
| Proportion of genes with unknown function (%) | 25.7 | 55.8*** | 73.7*** |
| Proportion of secreted protein genes (%) | 11.2 | 10.1 | 7.5 |
| Proportion of effector genes (%)ᵇ | 1.9 | 5.8** | 4.5* |
Asterisks indicate values for the mini-chromosomes that differ significantly from the mean for chromosomes 1-10 (Fisher's exact test, *** P < 0.001; ** P < 0.01; * P < 0.05)
ᵃ Genes were considered to be expressed if they showed ≥1% of the expression level of actin (corresponding to ≥10 TPM), based on RNA-Seq data from one in vitro and three in planta samples [13]
ᵇ Candidate secreted effector protein genes included CSEPs predicted from the genome (secreted proteins without homologs outside the genus Colletotrichum) and some ChECs (C. higginsianum effector candidates) previously predicted from the transcriptome [32] that are absent from the new annotation or have BLAST hits to effectors from outside the genus
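The significance marks in the table come from Fisher's exact test on gene counts. As a rough illustration of how such a comparison works, the sketch below implements a two-sided Fisher's exact test with the standard library only. The contingency table used here is an assumption: the effector-gene counts are back-calculated from the rounded percentages in the table (8 of 138 genes on chromosome 11, about 273 of 14,380 genes pooled over chromosomes 1-10), so the resulting P value is illustrative and is not expected to reproduce the authors' exact figure.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability of x "successes" in row 1 given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(p for p in probs if p <= p_obs * (1 + 1e-9)))


# ASSUMED counts, back-calculated from rounded percentages in the table.
chr11_effectors, chr11_genes = 8, 138       # 5.8% of 138 genes
core_effectors, core_genes = 273, 14380     # 1.9% of 10 x 1,438 genes

p_value = fisher_exact_two_sided(
    chr11_effectors, chr11_genes - chr11_effectors,
    core_effectors, core_genes - core_effectors,
)
print(f"illustrative P value, chromosome 11 vs core: {p_value:.4g}")
```

An exact test suits this table because the mini-chromosomes carry few genes, so some cell counts are small.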
Summary of predicted C. higginsianum secondary metabolism key genes and clusters
| Gene categoryᵃ | 2012ᵇ | Newᶜ |
|---|---|---|
| SM clusters | 47 | 69ᵈ (8) |
| PKS | 58 | 40ᵉ |
| NRPS | 12 | 15 |
| PKS-NRPS | 6 | 6 |
| TS | 17 | 17ᶠ |
| DMATS | 10 | 11 |
| NRPS-like | nd | 12 |
ᵃ DMATS, dimethylallyl tryptophan synthase; NRPS, non-ribosomal peptide synthetase; PKS, polyketide synthase; SM, secondary metabolism; TS, terpene synthase
ᵇ As published by O'Connell et al. [13]
ᶜ This study. Number in brackets corresponds to SM clusters with NRPS-like genes as the only key gene
ᵈ Includes one cluster that is duplicated with 98% homology
ᵉ Two PKS genes are disrupted by TEs and one has a wrongly predicted gene model
ᶠ Includes one TS that is duplicated with 100% homology
Fig. 2 Schematic representation of selected C. higginsianum secondary metabolism (SM) gene clusters. a Resolution of a formerly split SM cluster by the PacBio assembly. The new cluster 16 encompasses four contigs from the old assembly [13], two of which contain former clusters 18 and TRC3. Arrowheads: transposable elements. b Comparison of cluster 19 and the depudecin cluster of Alternaria brassicicola. Protein identity is high (> 70%) and gene order and orientation are conserved except for the gene DEP6/CH63R_06317. c Comparison of cluster 46 and the fusicoccin cluster from Diaporthe amygdali [74]. In D. amygdali, the genes are dispersed at two distinct loci, in contrast to C. higginsianum. Protein identity is moderate to high and the genes were extensively rearranged. Shading indicates syntenic blocks and gene pairs. Yellow: acetyl-transferase.
Fig. 3 Schematic representation of the distribution of secondary metabolism gene clusters and transposable elements across the 12 C. higginsianum chromosomes. The 5' end of unitig_7 containing the ribosomal repeats is fragmented among 13 unitigs that are too small to align with the optical map. Putative locations of the centromeres are indicated where possible.
Fig. 4 Waves of expression of secondary metabolism (SM) genes of C. higginsianum during infection of Arabidopsis thaliana. a Heatmap showing the expression profiles of SM key genes. Under-represented transcripts (dark green to bright green) and over-represented transcripts (dark red to bright red) are depicted as log2 relative expression index. The log2 expression levels are presented in the adjoining heatmap, colour-coded from white (not expressed) to dark blue (strongly expressed). Red arrowhead: ChPKS38. b Schematic representation of the stage-specific expression of SM gene clusters. The expression of all genes within each cluster was evaluated using the Transcripts Per Million (TPM) normalisation method. A cluster was considered expressed if more than 50% of genes had a TPM greater than 1% of the actin gene TPM, and |log2FC| ≥ 2, q-value ≤ 0.01. c Time-course of the expression of the pChPKS38::RFP reporter gene in planta and in vitro (cellophane) using confocal microscopy. All images are overlays of bright field and RFP channels captured with the same settings. RFP channels are projections of 15-25 optical sections of 0.2 μm each. Co: conidium, arrowhead: appressorium, BH: biotrophic hypha, NH: necrotrophic hypha. Bars = 10 μm.
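The cluster-level expression rule stated in the caption for panel b can be sketched in a few lines. The caption does not say whether the fold-change and q-value thresholds apply to each gene or to the cluster as a whole; the sketch below assumes they apply per gene, and all class names, function names and example values are hypothetical. The actin reference of 1,000 TPM is inferred from the statement elsewhere in this record that 1% of actin expression corresponds to 10 TPM.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    """Expression summary for one gene in one comparison (hypothetical type)."""
    tpm: float      # Transcripts Per Million
    log2fc: float   # log2 fold change between two stages
    qvalue: float   # multiple-testing-adjusted P value


def gene_passes(gene, actin_tpm):
    """ASSUMED per-gene reading of the caption's thresholds."""
    return (
        gene.tpm > 0.01 * actin_tpm
        and abs(gene.log2fc) >= 2
        and gene.qvalue <= 0.01
    )


def cluster_expressed(genes, actin_tpm):
    """A cluster counts as expressed if more than 50% of its genes pass."""
    if not genes:
        return False
    n_pass = sum(gene_passes(g, actin_tpm) for g in genes)
    return n_pass / len(genes) > 0.5


# Made-up example values, for illustration only.
ACTIN_TPM = 1000.0
toy_cluster = [
    GeneCall(tpm=55.0, log2fc=3.1, qvalue=0.001),
    GeneCall(tpm=12.0, log2fc=-2.4, qvalue=0.004),
    GeneCall(tpm=4.0, log2fc=5.0, qvalue=0.0001),  # below the TPM threshold
]
print(cluster_expressed(toy_cluster, ACTIN_TPM))
```

Under this reading, a cluster in which exactly half of the genes pass is not called expressed, because the caption requires more than 50%.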
Major families and characteristics of transposable elements in the C. higginsianum genome
| Type of elementᵃ | No. consensusᵇ | No. copies | No. complete copies | Complete/incomplete copies | Genome coverage (%)ᶜ | TE space coverage (%)ᵈ |
|---|---|---|---|---|---|---|
| Class I (total) | | | | | 4.7 | 67 |
| LTR | 11 | 636 | 86 | 0.135 | 3.55 | 50.71 |
| LARD | 2 | 47 | 10 | 0.213 | 0.67 | 9.57 |
| LINE | 3 | 50 | 13 | 0.260 | 0.23 | 3.29 |
| Class I (unclassified) | 4 | 123 | 2 | 0.016 | 0.24 | 3.43 |
| Class II (total) | | | | | 2.3 | 33 |
| TIR | 16 | 474 | 289 | 0.610 | 1.64 | 23.43 |
| MITE | 1 | 30 | 17 | 0.567 | 0.04 | 0.57 |
| Helitron | 3 | 111 | 19 | 0.171 | 0.62 | 8.86 |
| | 1 | 11 | 4 | 0.364 | 0.01 | 0.14 |
ᵃ LTR: long terminal repeat, LARD: large retrotransposon derivative element, LINE: long interspersed element, TIR: terminal inverted repeat, MITE: miniature inverted-repeat transposable element
ᵇ Number of TE consensus sequences in the genome
ᶜ Percentage of genome covered by the element
ᵈ Percentage of repetitive fraction covered by the element
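The derived columns of the table can be recomputed from the copy counts and genome coverage values, which is a useful check when the table layout is hard to read. The sketch below copies the numbers from the table; the variable names and the placeholder label for the row whose element type is not given are illustrative. Numerically, the values in the "Complete/incomplete copies" column match the number of complete copies divided by the total number of copies.

```python
# (element type, no. copies, no. complete copies, genome coverage %)
# Values copied from the table; the last row has no type label in the source.
te_rows = [
    ("LTR", 636, 86, 3.55),
    ("LARD", 47, 10, 0.67),
    ("LINE", 50, 13, 0.23),
    ("Class I (unclassified)", 123, 2, 0.24),
    ("TIR", 474, 289, 1.64),
    ("MITE", 30, 17, 0.04),
    ("Helitron", 111, 19, 0.62),
    ("(unlabelled row)", 11, 4, 0.01),
]

# Total genome coverage by TEs: the denominator for "TE space coverage".
total_coverage = sum(row[3] for row in te_rows)

derived = {}
for name, copies, complete, coverage in te_rows:
    complete_fraction = complete / copies            # matches the ratio column
    te_space_share = 100.0 * coverage / total_coverage
    derived[name] = (round(complete_fraction, 3), round(te_space_share, 2))
    print(f"{name:24s} {complete_fraction:.3f} {te_space_share:6.2f}")

print(f"total TE genome coverage: {total_coverage:.2f}%")
```

The per-family coverage values sum to 7.00%, in line with the 7.0% transposable-element coverage reported for assembly LTAN01000000, and the Class I and Class II subtotals (4.7% and 2.3%) follow from the same rows.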
Fig. 5 Schematic representation of the predicted domain structure of three families of conserved repeat elements present in the subtelomeric regions of all C. higginsianum chromosomes. DTX-chim_G199 was likely derived from DHX-G198 by the insertion of a DNA transposon, whereas DHX-chim-G203 was derived from DHX-G198 by the insertion of a non-LTR retrotransposon.
Fig. 6 Violin plot depicting the frequency distribution of the distance (bp) between genes and the nearest transposable element (TE). The inner box plots represent the median and interquartile range of the distance for each of three gene classes. Genes located within secondary metabolism clusters (SM genes) and genes encoding candidate secreted effector proteins were significantly closer (p < 0.001) to TEs than a random sample of genes taken from the genome as a whole.
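The quantity plotted in this figure, the distance from each gene to its nearest TE, can be computed from interval coordinates. The sketch below is one straightforward way to do it and is not necessarily how the authors did it: it assumes closed intervals on a single chromosome, treats overlapping features as distance 0, and uses a simple scan over all TEs rather than an indexed search. All coordinates and function names are hypothetical.

```python
def interval_gap(a_start, a_end, b_start, b_end):
    """Gap in bp between two intervals; 0 if they overlap or abut."""
    if a_end < b_start:
        return b_start - a_end
    if b_end < a_start:
        return a_start - b_end
    return 0


def nearest_te_distance(gene, tes):
    """Distance from one gene (start, end) to the closest TE on the same chromosome."""
    if not tes:
        raise ValueError("no TEs on this chromosome")
    g_start, g_end = gene
    return min(interval_gap(g_start, g_end, t_start, t_end) for t_start, t_end in tes)


# Hypothetical coordinates on one chromosome, for illustration only.
toy_tes = [(10_000, 12_500), (48_000, 53_000)]
toy_genes = {"geneA": (20_000, 22_000), "geneB": (52_000, 54_000)}

for gene_id, coords in toy_genes.items():
    print(gene_id, nearest_te_distance(coords, toy_tes))
```

For genome-scale data, sorting the TEs and using a binary search would avoid the full scan per gene, at the cost of a little more bookkeeping.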
Fig. 7 Circos plot showing segmental duplications (SDs). Genes are represented in green and transposable elements in red. Gene IDs in each duplicated block (grey sectors) are given without the prefix "CH63R_". An entire secondary metabolism gene cluster (shaded blue) is duplicated in SD2. cP450: Cytochrome P450; Eff: Effector protein; Sec: Secreted; TF: Transcription Factor; TS: Terpene Synthase.