Nicolas Denancé, Martial Briand, Romain Gaborieau, Sylvain Gaillard, Marie-Agnès Jacques.
Abstract
BACKGROUND: The phytopathogenic bacterium Xylella fastidiosa was thought to be restricted to the Americas, where it infects and kills numerous hosts. Since 2013, it has been detected increasingly often in Europe and Asia. This genetically diverse species is divided into six subspecies, but the genetic traits governing this classification are poorly understood.
Keywords: 16S rRNA gene; Horizontal gene transfer; K-mer; Phylogeny; SkIf; Taxonomy
Year: 2019 PMID: 30909861 PMCID: PMC6434890 DOI: 10.1186/s12864-019-5565-9
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
List of the 47 Xylella genome sequences used in this study
| Genotype | Strain | ST a | Host plant | Country (year) b | Accession number | Reference |
|---|---|---|---|---|---|---|
| X. fastidiosa subsp. fastidiosa | ATCC 35879 | 2 | | FL, USA (1987) | NZ_JQAP00000000 | Unpublished |
| X. fastidiosa subsp. fastidiosa | DSM 10026 | 2 | | FL, USA (1987) | NZ_FQWN01000006 | Unpublished |
| X. fastidiosa subsp. fastidiosa | CFBP 7969 | 2 | | NC, USA (1985) | PHFQ00000000 | This study |
| X. fastidiosa subsp. fastidiosa | CFBP 7970 | 2 | | FL, USA (1987) | PHFR00000000 | This study |
| X. fastidiosa subsp. fastidiosa | CFBP 8071 | 1 | | CA, USA (1987) | PHFP00000000 | This study |
| X. fastidiosa subsp. fastidiosa | CFBP 8073 | 75 | | Mexico (2012) | LKES00000000 | [ |
| X. fastidiosa subsp. fastidiosa | CFBP 8082 | 2 | | FL, USA (1983) | PHFT00000000 | This study |
| X. fastidiosa subsp. fastidiosa | CFBP 8351 | 1 | | CA, USA (1993) | PHFU00000000 | This study |
| X. fastidiosa subsp. fastidiosa | EB92-1 | 1 | | FL, USA (1992) | AFDJ00000000 | [ |
| X. fastidiosa subsp. fastidiosa | GB514 | 1 | | TX, USA (2007) | NC_017562 | [ |
| X. fastidiosa subsp. fastidiosa | M23 | 1 | | CA, USA (2003) | NC_010577 | [ |
| X. fastidiosa subsp. fastidiosa | Stag’s Leap | 1 | | CA, USA (1994) | LSMJ010000 | [ |
| X. fastidiosa subsp. fastidiosa | Temecula1 | 1 | | CA, USA (1998) | NC_004556 | [ |
| X. fastidiosa subsp. multiplex | ATCC 35871 | 41 | | CA, USA (1983) | NZ_AUAJ00000000 | Unpublished |
| X. fastidiosa subsp. multiplex | BB01 | 42 | | GA, USA (2016) | NZ_MPAZ01000000 | [ |
| X. fastidiosa subsp. multiplex | CFBP 8078 | 51 | | FL, USA (1983) | PHFS00000000 | This study |
| X. fastidiosa subsp. multiplex | CFBP 8416 | 7 | | COR, FR (2015) | LUYC00000000 | [ |
| X. fastidiosa subsp. multiplex | CFBP 8417 | 6 | | COR, FR (2015) | LUYB00000000 | [ |
| X. fastidiosa subsp. multiplex | CFBP 8418 | 6 | | COR, FR (2015) | LUYA00000000 | [ |
| X. fastidiosa subsp. multiplex | Dixon | 6 | | CA, USA (1994) | AAAL00000000 | [ |
| X. fastidiosa subsp. multiplex | Griffin-1 | 7 | | GA, USA (2006) | AVGA00000000 | [ |
| X. fastidiosa subsp. multiplex | M12 | 7 | | CA, USA (2003) | NC_010513 | [ |
| X. fastidiosa subsp. multiplex | Sy-VA | 8 | | VA, USA (2002) | JMHP00000000 | [ |
| X. fastidiosa subsp. sandyi | Ann-1 | 5 | | CA, USA (1995) | CP006696 | [ |
| X. fastidiosa subsp. sandyi | CFBP 8356 | 72 | | Costa Rica (2015) | PHFV00000000 | This study |
| X. fastidiosa subsp. sandyi | CO33 | 72 | | Costa Rica (2014) | LJZW00000000 | [ |
| X. fastidiosa subsp. morus | Mul0034 | 30 | | USA (2003) | CP006740 | [ |
| X. fastidiosa subsp. morus | Mul-MD | 29 | | MD, USA (2011) | AXDP00000000 | [ |
| X. fastidiosa subsp. pauca | 32 | 16 | | Brazil (1997) | AWYH00000000 | [ |
| X. fastidiosa subsp. pauca | 3124 | 16 | | Brazil (2009) | CP009829 | Unpublished |
| X. fastidiosa subsp. pauca | 11399 | 12 | | Brazil (1996) | NZ_JNBT01000030 | [ |
| X. fastidiosa subsp. pauca | 6c | 14 | | Brazil (1997) | AXBS00000000 | [ |
| X. fastidiosa subsp. pauca | 9a5c | 13 | | Brazil (1992) | NC_002488 | [ |
| X. fastidiosa subsp. pauca | CFBP 8072 | 74 | | Ecuador (2012) | LKDK00000000 | [ |
| X. fastidiosa subsp. pauca | CoDiRO | 53 | | Italy (2013) | JUJW00000000 | [ |
| X. fastidiosa subsp. pauca | COF0324 | 14 | | Costa Rica (2006) | LRVG01000000 | Unpublished |
| X. fastidiosa subsp. pauca | COF0407 | 53 | | Costa Rica (2009) | LRVJ00000000 | Unpublished |
| X. fastidiosa subsp. pauca | CVC0251 | 12 | | Brazil (1999) | LRVE01000000 | Unpublished |
| X. fastidiosa subsp. pauca | CVC0256 | 12 | | Brazil (1999) | LRVF01000000 | Unpublished |
| X. fastidiosa subsp. pauca | Fb7 | 69 | | Argentina (1998) | CP010051 | Unpublished |
| X. fastidiosa subsp. pauca | Hib4 | 70 | | Brazil (2000) | CP009885 | Unpublished |
| X. fastidiosa subsp. pauca | J1a12 | 12 | | Brazil (2001) | CP009823 | Unpublished |
| X. fastidiosa subsp. pauca | OLS0478 | 53 | | Costa Rica (2010) | LRVI00000000 | Unpublished |
| X. fastidiosa subsp. pauca | OLS0479 | 53 | | Costa Rica (2010) | LRVH00000000 | Unpublished |
| X. fastidiosa subsp. pauca | Pr8x | 14 | | Brazil (2009) | CP009826 | Unpublished |
| X. fastidiosa subsp. pauca | U24D | 13 | | Brazil (2000) | CP009790 | Unpublished |
| X. taiwanensis | PLS 229 | – | | Taiwan (–) | JDSQ00000000 | [ |
a Sequence type determined following the MLST scheme dedicated to X. fastidiosa [52]
b Exact year of isolation or earliest year of literature citing the strain
c DNA was recovered from infected periwinkle. This genome is that of the CoDiRO strain, the agent responsible for the Olive Quick Decline Syndrome in Italy [46]
List of Xylella fastidiosa strains sequenced for this study and genome properties
| Strain | Accession | No. of reads a | Coverage b | Assembly size (bp) | No. of contigs c | N50 | Mean size (bp) | Largest (bp) | GC % |
|---|---|---|---|---|---|---|---|---|---|
| CFBP 7969 | PHFQ00000000 | 7,952,452 | 957x | 2,436,752 | 89 | 116,341 | 27,379 | 445,308 | 51.48 |
| CFBP 7970 | PHFR00000000 | 8,041,300 | 968x | 2,493,794 | 93 | 104,928 | 26,815 | 258,911 | 51.45 |
| CFBP 8071 | PHFP00000000 | 7,606,748 | 916x | 2,489,737 | 101 | 104,990 | 24,651 | 297,538 | 51.48 |
| CFBP 8082 | PHFT00000000 | 7,610,344 | 916x | 2,532,132 | 118 | 104,927 | 21,459 | 301,313 | 51.51 |
| CFBP 8351 | PHFU00000000 | 8,741,758 | 1053x | 2,479,202 | 93 | 104,608 | 26,658 | 266,361 | 51.45 |
| CFBP 8078 | PHFS00000000 | 8,807,962 | 1060x | 2,602,010 | 191 | 87,559 | 13,623 | 204,167 | 51.67 |
| CFBP 8356 | PHFV00000000 | 8,088,406 | 974x | 2,541,621 | 197 | 93,086 | 12,902 | 190,454 | 51.58 |
a Paired-end reads (301 bp)
b Coverage calculated for a mean genome size of 2.5 Mb
c Larger than 500 bp
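The coverage column follows from the two footnotes: number of reads times read length, divided by the assumed genome size. Below is a minimal Python sketch of that arithmetic, assuming every counted read contributes the full 301 bp; the function name `coverage` is introduced here for illustration only.

```python
READ_LENGTH_BP = 301        # footnote a: paired-end reads of 301 bp
MEAN_GENOME_BP = 2_500_000  # footnote b: mean genome size of 2.5 Mb


def coverage(n_reads, read_length=READ_LENGTH_BP, genome_size=MEAN_GENOME_BP):
    """Estimated sequencing depth: total sequenced bases / genome size."""
    return n_reads * read_length / genome_size


# Read counts copied from the table above.
reads = {
    "CFBP 7969": 7_952_452,
    "CFBP 8351": 8_741_758,
    "CFBP 8078": 8_807_962,
}

for strain, n_reads in reads.items():
    print(f"{strain}: {round(coverage(n_reads))}x")
```

With these inputs the rounded values are 957x, 1053x and 1060x, matching the tabulated coverage for the three strains.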
Properties of genome sequences of strains CFBP 7970, DSM 10026 and ATCC 35879
| | CFBP 7970 a | DSM 10026 b | ATCC 35879 c |
|---|---|---|---|
| Sequencing technology | Illumina MiSeq | Shotgun | Illumina MiSeq |
| Assembly method | Velvet | Not available | CLC Genomics Workbench |
| Genome size | 2,493,794 bp | 2,426,538 bp | 2,522,328 bp |
| Number of contigs | 93 | 72 | 16 |
| Minimal size of contigs | 500 bp | 1 kb | 1.2 kb |
| Coverage | 968x | 416x | 1380x |
a Data from the present study
b More details at: https://www.ncbi.nlm.nih.gov/genome/173?genome_assembly_id=295121
c More details at: https://www.ncbi.nlm.nih.gov/genome/173?genome_assembly_id=212014
Repertoire of 16S rRNA gene sequences in 47 genomes of Xylella spp.
| Taxon | Codes of strains with one copy of the 16S rRNA gene | Codes of strains with two copies of the 16S rRNA gene |
|---|---|---|
| X. fastidiosa subsp. fastidiosa | EB92-1, CFBP 8073, CFBP 8351 | ATCC 35879, CFBP 7969, CFBP 7970, CFBP 8071, CFBP 8082, DSM 10026, GB514, M23, Stag’s Leap, Temecula1 |
| X. fastidiosa subsp. multiplex | ATCC 35871, BB01, CFBP 8078, CFBP 8417, CFBP 8418, Dixon, Griffin-1, Sy-VA | CFBP 8416, M12 |
| X. fastidiosa subsp. morus | Mul-MD | Mul0034 |
| X. fastidiosa subsp. sandyi | CFBP 8356 | Ann-1, CO33 |
| X. fastidiosa subsp. pauca | CFBP 8072, COF0324, COF0407, OLS0479, Xf6c, Xf32 | 11399, 3124, 9a5c, CoDiRO, CVC0251, CVC0256, Fb7, Hib4, J1a12, OLS0478, Pr8x, U24D |
| X. taiwanensis | PLS 229 | – |
Specific signatures in 16S rRNA nucleotide sequences to discriminate X. fastidiosa subspecies
| | SNPs at the designed positions a | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | 75 a | 76 | 151 | 455 | 474 | 1127 | 1264 | 1340 |
| | C | A | C | G | – | G | A | C |
| | C | A | C | G | – | T | A | C |
| | C | A | T | A | – | G | G | C |
| | C | A | T | A | T | G | A | C |
| | T | A | C | A | – | G | G | C |
| | C | G | T | A | – | G | A | T |
a Refers to SNP positions within the alignment of the copies of 16S rRNA (Additional file 4)
b Refers to strain Ann-1 only
c Refers to strains CFBP 8356 and CO33
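A signature set can only discriminate groups if no two rows are identical. The Python sketch below checks this for the six rows of the table, numbered in table order because the row labels are not reproduced here. The gap symbol is written as `-`, and the names `SIGNATURES` and `hamming` are illustrative.

```python
from itertools import combinations

POSITIONS = (75, 76, 151, 455, 474, 1127, 1264, 1340)

# One string per table row, one character per position ("-" = gap).
SIGNATURES = [
    "CACG-GAC",
    "CACG-TAC",
    "CATA-GGC",
    "CATATGAC",
    "TACA-GGC",
    "CGTA-GAT",
]


def hamming(a, b):
    """Number of positions at which two equal-length signatures differ."""
    return sum(x != y for x, y in zip(a, b))


# Pairwise distances keyed by 1-based row numbers.
pairs = {
    (i + 1, j + 1): hamming(a, b)
    for (i, a), (j, b) in combinations(enumerate(SIGNATURES), 2)
}

all_distinct = all(d > 0 for d in pairs.values())
min_distance = min(pairs.values())
print(all_distinct, min_distance)
```

All six rows are distinct. The closest pair is rows 1 and 2, which differ only at position 1127.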
Fig. 1 Distribution of k-mers along the X. fastidiosa genome sequences. Frequency of core (black) and specific (colored) k-mers mapped onto the reference genome (given in brackets) of each subspecies. a subsp. fastidiosa with or without strain CFBP 8073. b subsp. sandyi. c subsp. morus. d subsp. multiplex. e subsp. pauca
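The distinction between core and specific k-mers can be illustrated with plain set operations. This is a simplified sketch, not the tool used in the study. It assumes a core k-mer occurs in every genome and a specific k-mer occurs in every genome of a group and in none outside it. Reverse complements are ignored, and the sequences, names and `k` value are toy inputs.

```python
def kmers(seq, k):
    """Set of all k-length substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def core_and_specific(genomes, group, k):
    """Return (core, specific) k-mer sets.

    genomes: dict mapping genome name -> sequence
    group:   set of genome names forming the group of interest
    """
    sets = {name: kmers(seq, k) for name, seq in genomes.items()}
    core = set.intersection(*sets.values())
    in_group = set.intersection(*(sets[name] for name in group))
    outside = set().union(*(sets[name] for name in sets if name not in group))
    return core, in_group - outside


# Toy example: g1 and g2 form the group, g3 is the outgroup.
toy = {"g1": "ACGTACGGA", "g2": "ACGTACGGT", "g3": "ACGTTCGGA"}
core, specific = core_and_specific(toy, {"g1", "g2"}, k=4)
print(sorted(core), sorted(specific))
```

In the toy data only `ACGT` is shared by all three sequences, while four 4-mers are shared by g1 and g2 and absent from g3.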
Main features related to the specific mers identified in X. fastidiosa subspecies

| A. | FAS¹ | FAS2¹ | SAN¹ | SAN2¹ | MOR¹ | MUL¹ | PAU¹ |
|---|---|---|---|---|---|---|---|
| number of mers | 2905 | 1978 | 9808 | 5765 | 3094 | 4906 | 11,365 |
| number of unique mers | 2836 | 1957 | 9431 | 5636 | 2995 | 4813 | 11,162 |
| total mer size (bp) | 133,179 | 85,038 | 518,683 | 292,740 | 142,614 | 258,228 | 627,685 |
| number of mers in CDS | 2172 | 1411 | 7783 | 4161 | 2341 | 3603 | 9088 |
| number of unique CDS | 1142 | 811 | 2406 | 1646 | 1115 | 1119 | 1711 |
| total mer size in CDS (bp) | 100,015 | 60,092 | 414,736 | 215,901 | 108,336 | 189,973 | 504,054 |
| number of mers in intergenic regions | 733 | 567 | 2025 | 1604 | 753 | 1303 | 2277 |
| total mer size in intergenic regions (bp) | 33,164 | 24,946 | 103,947 | 76,839 | 34,278 | 68,255 | 123,631 |

| B. Combination | MOR-FAS¹ | MOR-FAS2¹ | MOR-SAN¹ | MOR-SAN2¹ | MOR-SAN-SAN2¹ | MOR-SAN-SAN2-FAS2¹ | MOR-MUL¹ | MOR-PAU¹ |
|---|---|---|---|---|---|---|---|---|
| number of mers | 495 | 1236 | 352 | 371 | 237 | 3389 | 2131 | 71 |
| number of unique mers | 491 | 1222 | 347 | 358 | 235 | 3369 | 2072 | 71 |
| total mer size (bp) | 20,058 | 53,052 | 13,123 | 14,066 | 8019 | 136,470 | 98,450 | 2377 |
| number of mers in CDS | 331 | 825 | 258 | 243 | 116 | 2367 | 1638 | 58 |
| number of unique CDS | 216 | 430 | 192 | 178 | 103 | 806 | 572 | 41 |
| total mer size in CDS (bp) | 13,428 | 36,147 | 9816 | 9275 | 4249 | 96,946 | 76,167 | 2018 |
| number of mers in intergenic regions | 164 | 411 | 94 | 128 | 121 | 1022 | 493 | 13 |
| total mer size in intergenic regions (bp) | 6630 | 16,905 | 3307 | 4791 | 3770 | 39,524 | 22,283 | 359 |

| C. Within subspecies pauca | I.1¹ | I.2¹ | I.3¹ | CFBP 8072¹ | Hib4¹ |
|---|---|---|---|---|---|
| number of mers | 1266 | 2360 | 3885 | 4775 | 4694 |
| number of unique mers | 1238 | 2323 | 3644 | 4663 | 4596 |
| total mer size (bp) | 59,733 | 121,367 | 207,098 | 283,540 | 341,563 |
| number of mers in CDS | 1003 | 1776 | 3194 | 3150 | 3338 |
| number of unique CDS | 486 | 716 | 1147 | 1205 | 1324 |
| total mer size in CDS (bp) | 47,655 | 92,528 | 173,936 | 194,240 | 269,888 |
| number of mers in intergenic regions | 263 | 584 | 690 | 1625 | 1356 |
| total mer size in intergenic regions (bp) | 12,078 | 28,839 | 33,162 | 89,300 | 71,675 |
¹ Composition of the groups:
MOR (subsp. morus): Mul-MD and Mul0034 (Reference: 2,666,577 bp). FAS (subsp. fastidiosa): ATCC 35879, DSM 10026, CFBP 7969, CFBP 7970, CFBP 8071, CFBP 8082, CFBP 8351, EB92-1, GB514, M23, Stag’s Leap and Temecula1 (Reference: 2,521,148 bp). FAS2 (subsp. fastidiosa): All the members of the group FAS (with Temecula1 as reference), plus CFBP 8073. MUL (subsp. multiplex): ATCC 35871, BB01, CFBP 8078, CFBP 8416, CFBP 8417, CFBP 8418, Dixon, Griffin-1, Sy-VA and M12 (Reference: 2,475,130 bp). SAN (subsp. sandyi): Ann-1 (Reference: 2,780,908 bp). SAN2 (subsp. sandyi-like): CFBP 8356 and CO33 (Reference: 2,416,985 bp). PAU (subsp. pauca): 32, 3124, 11399, 6c, CFBP 8072, CoDiRO, COF0324, COF0407, CVC0251, CVC0256, Fb7, Hib4, J1a12, OLS0478, OLS0479, Pr8x, U24D and 9a5c (Reference: 2,731,750 bp). I.1 (subsp. pauca): U24D, Fb7, CVC0251, CVC0256, J1a12, 11399, 3124, 32 and 9a5c (Reference: 2,731,750 bp). I.2 (subsp. pauca): 6c, COF0324 and Pr8x (Reference: 2,666,242 bp). I.3 (subsp. pauca): COF0407, OLS0478, OLS0479 and CoDiRO (Reference: 2,542,932 bp). CFBP 8072 (subsp. pauca; 2,496,662 bp). Hib4 (subsp. pauca; 2,877,548 bp)
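The rows of part A can be cross-checked against each other: the mer size in CDS plus the mer size in intergenic regions should equal the total mer size. The Python sketch below runs that check on the seven subspecies-level groups and derives the share of specific-mer sequence that falls in CDS. The variable names are illustrative and the numbers are copied from the table.

```python
GROUPS = ["FAS", "FAS2", "SAN", "SAN2", "MOR", "MUL", "PAU"]

# Rows of part A, in base pairs.
total_bp = [133_179, 85_038, 518_683, 292_740, 142_614, 258_228, 627_685]
cds_bp = [100_015, 60_092, 414_736, 215_901, 108_336, 189_973, 504_054]
intergenic_bp = [33_164, 24_946, 103_947, 76_839, 34_278, 68_255, 123_631]

# Consistency check: CDS + intergenic should reproduce the total.
consistent = [c + i == t for t, c, i in zip(total_bp, cds_bp, intergenic_bp)]

# Share of the specific-mer sequence located in CDS, per group.
cds_fraction = {g: c / t for g, t, c in zip(GROUPS, total_bp, cds_bp)}

for group in GROUPS:
    print(f"{group}: {cds_fraction[group]:.1%} of specific-mer bp in CDS")
```

For all seven groups the two components add up to the total, and between roughly 70% and 81% of the specific-mer sequence lies within CDS.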
Main Gene Ontologies (GO) identified as enriched in almost all the subspecies for the CDS harboring specific mers
| GO term¹ | Description | FAS²,³ | FAS2²,³ | MUL²,³ | PAU²,³ | SAN²,³ | SAN2²,³ |
|---|---|---|---|---|---|---|---|
| GO:0003824 | catalytic activity | 473/368 | 343/498 | 358/308 | 672/143 | 779/86 | 630/229 |
| GO:0000166 | nucleotide binding | 138/96 | 106/128 | 143/90 | 240/32 | – | 228/54 |
| GO:0017076 | purine nucleotide binding | 109/73 | 87/95 | 118/70 | 197/20 | – | 181/44 |
| GO:0032553 | ribonucleotide binding | 114/74 | 89/99 | 121/72 | 203/23 | – | 186/48 |
| GO:0032555 | purine ribonucleotide binding | 108/73 | 87/94 | 117/70 | 197/19 | – | 181/44 |
| GO:1901265 | nucleoside phosphate binding | 138/96 | 106/128 | 143/90 | 240/32 | – | 228/54 |
| GO:0036094 | small molecule binding | 153/103 | 114/142 | 155/100 | 266/33 | – | 251/62 |
| GO:0043168 | anion binding | 140/87 | 102/125 | 140/88 | 242/27 | – | 226/59 |
| GO:0097367 | carbohydrate derivative binding | 117/81 | 94/104 | 126/73 | 209/23 | – | 190/50 |
| GO:0005488 | binding | 323/277 | 247/353 | 296/245 | 547/140 | – | 502/187 |
| GO:0008152 | metabolic process | 465/411 | 355/521 | 397/331 | 723/168 | – | 644/264 |
¹ Complete datasets are provided in Additional file 7
² Top line: number of GO-associated CDSs in the list of CDSs harboring specific mers (query) / number of GO-associated CDSs among the CDSs of the reference genome that do not harbor specific mers. Middle line: number of non-annotated (no GO) CDSs in the query list / number of non-annotated (no GO) CDSs among the CDSs of the reference genome that do not harbor specific mers. The sum of the four values in each column corresponds to the total number of CDSs in the reference genome. The sum of the numerators corresponds to the number of CDSs in the query list. The sum of the denominators corresponds to the number of CDSs of the reference genome that are not in the query list. Bottom line: FDR/P-value
³ Composition of the groups: FAS (subsp. fastidiosa, 12): ATCC 35879, DSM 10026, CFBP 7969, CFBP 7970, CFBP 8071, CFBP 8082, CFBP 8351, EB92-1, GB514, M23, Stag’s Leap, Temecula1. FAS2 (subsp. fastidiosa, 13): All the members of the group FAS, plus CFBP 8073. MUL (subsp. multiplex, 10): ATCC 35871, BB01, CFBP 8078, CFBP 8416, CFBP 8417, CFBP 8418, Dixon, Griffin-1, M12, Sy-VA. SAN (subsp. sandyi, 1): Ann-1. SAN2 (subsp. sandyi-like, 2): CO33 and CFBP 8356. PAU (subsp. pauca, 18): 32, 3124, 11399, 6c, 9a5c, CFBP 8072, CoDiRO, COF0324, COF0407, CVC0251, CVC0256, Fb7, Hib4, J1a12, OLS0478, OLS0479, Pr8x, U24D
Fig. 2 Overlap between gene ontologies differentially represented (over or under) in genes harboring specific k-mers. a Relationships between six groups: FAS and FAS2, subsp. fastidiosa without or with CFBP 8073, respectively; MUL, subsp. multiplex; SAN and SAN2, subsp. sandyi and sandyi-like, respectively; PAU, subsp. pauca. Note: subsp. morus is not shown on the Venn diagram because only one GO term, specific to it, was identified. b Relationships between three groups (pauca, multiplex and a third group combining subsp. fastidiosa, sandyi and morus)
Selected differentially represented Gene Ontologies of CDS with specific mers in X. fastidiosa subspecies or subclades
| GO term¹ | Description | FDR/P-value | Annot. test/ref² | Non-annot. test/ref³ | Enrichment |
|---|---|---|---|---|---|
| Specific to subsp. pauca | | | | | |
| GO:0000270 | peptidoglycan metabolic process | 4.40e-36/5.38e-39 | 723/168 | 988/771 | over |
| GO:0000902 | cell morphogenesis | 4.21e-4/2.27e-5 | 25/0 | 1686/939 | over |
| GO:0005886 | plasma membrane | 0.0017/1.15e-4 | 96/23 | 1615/916 | over |
| GO:0009252 | peptidoglycan biosynthetic process | 0.0029/2.08e-4 | 20/0 | 1691/939 | over |
| GO:0009273 | peptidoglycan-based cell wall biogenesis | 0.0029/2.08e-4 | 20/0 | 1691/939 | over |
| GO:0009279 | cell outer membrane | 0.0346/0.0035 | 19/1 | 1692/938 | over |
| GO:0009653 | anatomical structure morphogenesis | 4.21e-4/2.272e-5 | 25/0 | 1686/939 | over |
| GO:0016021 | integral component of membrane | 0.0060/4.49e-4 | 292/112 | 1419/827 | over |
| GO:0019867 | outer membrane | 0.0458/0.0047 | 25/3 | 1686/936 | over |
| GO:0030312 | external encapsulating structure | 0.0088/7.40e-4 | 22/1 | 1689/938 | over |
| GO:0031224 | intrinsic component of membrane | 0.0049/3.68e-4 | 293/112 | 1418/827 | over |
| GO:0042546 | cell wall biogenesis | 0.0029/2.08e-4 | 20/0 | 1691/939 | over |
| GO:0044036 | cell wall macromolecule metabolic process | 0.0012/7.19e-5 | 23/0 | 1688/939 | over |
| GO:0044038 | cell wall macromolecule biosynthetic process | 0.0029/2.08e-4 | 20/0 | 1691/939 | over |
| GO:0044425 | membrane part | 0.0014/8.82e-5 | 302/112 | 1409/827 | over |
| GO:0044462 | external encapsulating structure part | 0.0346/0.0035 | 19/1 | 1692/938 | over |
| GO:0045229 | external encapsulating structure organization | 0.0023/1.55e-4 | 26/1 | 1685/938 | over |
| GO:0048856 | anatomical structure development | 4.21e-4/2.27e-5 | 25/0 | 1686/939 | over |
| GO:0071554 | cell wall organization or biogenesis | 0.0094/7.93e-4 | 23/1 | 1688/938 | over |
| GO:0071555 | cell wall organization | 0.0346/0.0035 | 19/1 | 1692/938 | over |
| GO:0006164 | purine nucleotide biosynthetic process | 0.0079/6.34e-4 | 27/2 | 1684/937 | over |
| GO:0009127 | purine nucleoside monophosphate biosynthetic process | 0.0088/7.40e-4 | 22/1 | 1689/938 | over |
| GO:0009144 | purine nucleoside triphosphate metabolic process | 0.0035/2.62e-4 | 25/1 | 1686/938 | over |
| GO:0009152 | purine ribonucleotide biosynthetic process | 0.0122/0.0010 | 26/2 | 1685/937 | over |
| GO:0009168 | purine ribonucleoside monophosphate biosynthetic process | 0.0088/7.4042e-4 | 22/1 | 1689/938 | over |
| GO:0009205 | purine ribonucleoside triphosphate metabolic process | 0.0035/2.6218e-4 | 25/1 | 1686/938 | over |
| GO:0072522 | purine-containing compound biosynthetic process | 0.0346/0.0034 | 18/1 | 1693/938 | over |
| GO:0072528 | pyrimidine-containing compound biosynthetic process | 0.0034/2.457e-4 | 29/2 | 1682/937 | over |
| GO:0009117 | nucleotide metabolic process | 0.0012/7.00e-5 | 64/11 | 1647/928 | over |
| GO:0009123 | nucleoside monophosphate metabolic process | 1.59e-5/5.71e-7 | 49/3 | 1662/936 | over |
| GO:0009124 | nucleoside monophosphate biosynthetic process | 0.0014/8.89e-5 | 32/2 | 1679/937 | over |
| GO:0009141 | nucleoside triphosphate metabolic process | 0.0034/2.45e-4 | 29/2 | 1682/937 | over |
| GO:0009156 | ribonucleoside monophosphate biosynthetic process | 6.00e-4/3.25e-5 | 30/1 | 1681/938 | over |
| GO:0009165 | nucleotide biosynthetic process | 0.0106/8.96e-4 | 43/7 | 1668/932 | over |
| GO:0009199 | ribonucleoside triphosphate metabolic process | 0.0023/1.55e-4 | 26/1 | 1685/938 | over |
| GO:0009260 | ribonucleotide biosynthetic process | 6.15e-4/3.36e-5 | 35/2 | 1676/937 | over |
| GO:0016032 | viral process | 0.0213/0.0019 | 0/6 | 1711/933 | under |
| GO:0019058 | viral life cycle | 0.0213/0.0019 | 0/6 | 1711/933 | under |
| GO:0019068 | virion assembly | 0.0213/0.0019 | 0/6 | 1711/933 | under |
| GO:0044403 | symbiont process | 0.0213/0.0019 | 0/6 | 1711/933 | under |
| Specific to subsp. fastidiosa | | | | | |
| GO:0006304 | DNA modification | 0.0030/5.64e-6 | 16/0 | 1126/1280 | over |
| GO:0006305 | DNA alkylation | 0.0469/5.31e-4 | 10/0 | 1132/1280 | over |
| GO:0006306 | DNA methylation | 0.0469/5.31e-4 | 10/0 | 1132/1280 | over |
| GO:0044728 | DNA methylation or demethylation | 0.0469/5.31e-4 | 10/0 | 1132/1280 | over |
| GO:0009110 | vitamin biosynthetic process | 0.0469/5.56e-4 | 18/3 | 1124/1277 | over |
| GO:0042364 | water-soluble vitamin biosynthetic process | 0.0469/5.56e-4 | 18/3 | 1124/1277 | over |
| Specific to subsp. multiplex | | | | | |
| GO:0006259 | DNA metabolic process | 0.0021/2.52e-5 | 53/21 | 1066/1222 | over |
| GO:0071103 | DNA conformation change | 0.0324/5.80e-4 | 13/1 | 1106/1242 | over |
| GO:0140097 | catalytic activity, acting on DNA | 0.0018/2.11e-5 | 33/8 | 1086/1235 | over |
| GO:0006996 | organelle organization | 0.0256/4.47e-4 | 16/2 | 1103/1241 | over |
| Specific to subsp. morus | | | | | |
| GO:0006260 | DNA replication | 0.0485/1.99e-5 | 31/10 | 1084/1491 | over |
| Specific to the combination of subsp. morus and multiplex | | | | | |
| GO:1901607 | alpha-amino acid biosynthetic process | 0.0390/2.30e-4 | 26/35 | 546/2009 | over |
| Specific to the combination of subsp. morus, sandyi, sandyi-like and fastidiosa | | | | | |
| GO:0022411 | cellular component disassembly | 0.0168/8.44e-4 | 6/0 | 800/1810 | over |
| GO:0032984 | macromolecular complex disassembly | 0.0168/8.44e-4 | 6/0 | 800/1810 | over |
| GO:0043241 | protein complex disassembly | 0.0168/8.44e-4 | 6/0 | 800/1810 | over |
| GO:0023052 | signaling | 0.0496/0.0031 | 18/14 | 788/1796 | over |
| GO:0007165 | signal transduction | 0.0496/0.0031 | 18/14 | 788/1796 | over |
| GO:0006090 | pyruvate metabolic process | 0.0086/3.44e-4 | 13/5 | 793/1805 | over |
| GO:0006096 | glycolytic process | 0.0149/5.74e-4 | 17/10 | 789/1800 | over |
| GO:0006733 | oxidoreduction coenzyme metabolic process | 0.0343/0.0018 | 8/2 | 798/1808 | over |
| GO:0044264 | cellular polysaccharide metabolic process | 0.0149/7.01e-4 | 9/2 | 797/1808 | over |
| GO:0006757 | ATP generation from ADP | 0.0078/3.08e-4 | 11/3 | 795/1807 | over |
| GO:0016052 | carbohydrate catabolic process | 0.0359/0.0020 | 10/4 | 796/1806 | over |
| GO:0005976 | polysaccharide metabolic process | 0.0359/0.0020 | 9/3 | 797/1807 | over |
| GO:0006165 | nucleoside diphosphate phosphorylation | 0.0168/8.36e-4 | 11/4 | 795/1806 | over |
| GO:0009132 | nucleoside diphosphate metabolic process | 0.0066/2.55e-4 | 10/2 | 796/1808 | over |
| GO:0009135 | purine nucleoside diphosphate metabolic process | 0.0066/2.55e-4 | 10/2 | 796/1808 | over |
| GO:0009179 | purine ribonucleoside diphosphate metabolic process | 0.0066/2.55e-4 | 10/2 | 796/1808 | over |
| GO:0009185 | ribonucleoside diphosphate metabolic process | 0.0383/0.0022 | 13/7 | 793/1803 | over |
| GO:0019362 | pyridine nucleotide metabolic process | 0.0383/0.0022 | 13/7 | 793/1803 | over |
| GO:0046496 | nicotinamide nucleotide metabolic process | 0.0066/2.58e-4 | 7/0 | 799/1810 | over |
| GO:0003755 | peptidyl-prolyl cis-trans isomerase activity | 0.0066/2.58e-4 | 7/0 | 799/1810 | over |
| GO:0000413 | protein peptidyl-prolyl isomerization | 0.0066/2.58e-4 | 7/0 | 799/1810 | over |
| GO:0016859 | cis-trans isomerase activity | 0.0383/0.0022 | 13/7 | 793/1803 | over |
| GO:0018208 | peptidyl-proline modification | 0.0168/8.36e-4 | 11/4 | 795/1806 | over |
| GO:0042221 | response to chemical | 0.0454/0.00275 | 5/0 | 801/1810 | over |
| GO:0000049 | tRNA binding | 0.0454/0.00275 | 5/0 | 801/1810 | over |
| GO:0006935 | chemotaxis | 0.0454/0.00275 | 5/0 | 801/1810 | over |
| GO:0040011 | locomotion | 0.0168/8.44e-4 | 6/0 | 800/1810 | over |
| GO:0042330 | taxis | 0.0168/8.44e-4 | 6/0 | 800/1810 | over |
| Specific to the subclade I.3 from subsp. pauca | | | | | |
| GO:0006310 | DNA recombination | 0.0093/5.91e-4 | 0/12 | 1147/1281 | under |
| GO:0006812 | cation transport | 0.0368/0.0029 | 2/15 | 1145/1278 | under |
| GO:0015672 | monovalent inorganic cation transport | 0.0282/0.0022 | 1/13 | 1146/1280 | under |
| GO:0034220 | ion transmembrane transport | 0.0089/5.56e-4 | 2/19 | 1145/1274 | under |
| GO:0098655 | cation transmembrane transport | 0.0167/0.0012 | 1/14 | 1146/1279 | under |
| GO:0098660 | inorganic ion transmembrane transport | 0.0103/6.72e-4 | 1/15 | 1146/1278 | under |
| GO:0098662 | inorganic cation transmembrane transport | 0.0488/0.0040 | 1/12 | 1146/1281 | under |
| GO:0008324 | cation transmembrane transporter activity | 0.0282/0.0022 | 1/13 | 1146/1280 | under |
| GO:0044422 | organelle part | 0.0167/0.0012 | 1/14 | 1146/1279 | under |
| GO:0044446 | intracellular organelle part | 0.0167/0.0012 | 1/14 | 1146/1279 | under |
| Specific to the CFBP 8072 genome from subsp. pauca | | | | | |
| GO:0009142 | nucleoside triphosphate biosynthetic process | 0.0303/0.0020 | 0/8 | 1205/1024 | under |
| GO:0072330 | monocarboxylic acid biosynthetic process | 0.0158/9.28e-4 | 0/9 | 1205/1023 | under |
| Specific to the Hib4 genome from subsp. pauca | | | | | |
| GO:0006950 | response to stress | 1.39e-4/1.05e-5 | 1/17 | 1323/1047 | under |
| GO:0006979 | response to oxidative stress | 0.0279/0.0034 | 0/7 | 1324/1057 | under |
| GO:0033554 | cellular response to stress | 0.0153/0.0017 | 1/11 | 1323/1053 | under |
| GO:0008565 | protein transporter activity | 0.0279/0.0034 | 1/10 | 1323/1054 | under |
| GO:0009055 | electron transfer activity | 0.0279/0.0034 | 0/7 | 1324/1057 | under |
| GO:0015197 | peptide transporter activity | 0.0153/0.0017 | 1/11 | 1323/1053 | under |
| GO:0016667 | oxidoreductase activity, acting on a sulfur group of donors | 0.0135/0.0015 | 0/8 | 1324/1056 | under |
| GO:0051540 | metal cluster binding | 0.0055/5.56e-4 | 2/14 | 1322/1050 | under |
| GO:0051536 | iron-sulfur cluster binding | 0.0055/5.56e-4 | 2/14 | 1322/1050 | under |
| GO:0051539 | 4 iron, 4 sulfur cluster binding | 0.0279/0.0034 | 1/10 | 1323/1054 | under |
| GO:0022607 | cellular component assembly | 0.0023/2.15e-4 | 1/13 | 1323/1051 | under |
| GO:0043933 | macromolecular complex subunit organization | 0.0066/6.79e-4 | 0/9 | 1324/1055 | under |
¹ Complete datasets are provided in Additional file 7
² Annot. test/ref.: number of GO-associated CDS in the list of CDS harboring specific mers (query) / number of GO-associated CDS in the reference genome
³ Non-annot. test/ref.: number of non-annotated (no GOs) CDS in the list of CDS harboring specific mers (query) / number of non-annotated (no GOs) CDS in the reference genome
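For each GO term, the two test/ref columns form a 2×2 contingency table. This excerpt does not name the statistical test, so the Python sketch below assumes a two-sided Fisher's exact test, written with the standard library only; `fisher_two_sided` is a name introduced here for illustration. It uses the row for GO:0000902 (25/0 and 1686/939), whose query counts sum to 1,711 CDS.

```python
from math import comb


def fisher_two_sided(annot_test, annot_ref, nonannot_test, nonannot_ref):
    """Two-sided Fisher's exact test on a 2x2 table (assumed test, see text).

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    n_test = annot_test + nonannot_test   # CDS in the query list
    n_ref = annot_ref + nonannot_ref      # CDS in the reference set
    n_annot = annot_test + annot_ref      # CDS carrying the GO term
    denom = comb(n_test + n_ref, n_annot)

    def prob(k):
        # Probability that k of the annotated CDS fall in the query list.
        return comb(n_test, k) * comb(n_ref, n_annot - k) / denom

    k_min = max(0, n_annot - n_ref)
    k_max = min(n_annot, n_test)
    p_obs = prob(annot_test)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(p for p in map(prob, range(k_min, k_max + 1))
               if p <= p_obs * tolerance)


# GO:0000902, cell morphogenesis: annot. 25/0, non-annot. 1686/939.
p_value = fisher_two_sided(25, 0, 1686, 939)
print(f"query CDS: {25 + 1686}, P = {p_value:.2e}")
```

Under this assumption the P-value comes out close to the tabulated 2.27e-5. That is consistent with a two-sided exact test but does not confirm which test was used. The FDR correction is not reproduced here, since it depends on the full list of tested GO terms.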
Fig. 3 Distribution of k-mers specific to the X. fastidiosa subspecies morus and others. a core k-merome of X. fastidiosa species. b specific to subsp. morus. c, d, e specific to subsp. morus + sandyi and/or sandyi-like. f specific to subsp. morus + multiplex. g, h specific to subsp. morus + fastidiosa (with/without strain CFBP 8073). i specific to subsp. morus + fastidiosa (with CFBP 8073) + subsp. sandyi + subsp. sandyi-like. j specific to subsp. morus + pauca. Frequencies of k-mers are mapped onto the reference genome for subsp. morus (Mul0034)
Fig. 4 Distribution of k-mers specific to the X. fastidiosa subspecies pauca and its subclades. a core k-merome of X. fastidiosa species. b specific to subsp. pauca. c specific to subclades I.2 and I.3 and to strain Hib4 from subsp. pauca. Frequencies of k-mers are mapped onto the reference genome for subsp. morus (Mul0034)
Fig. 5 Phylogenetic representation of X. fastidiosa using k-mers, ANIb and MLSA schemes. All the representations were constructed using the 46 X. fastidiosa genome sequences, with the addition of the X. taiwanensis genome sequence. a Whole genome-based dendrogram built with distance matrices obtained after running the simka (shared k-mers of 22 nucleotides) or ANIb (1020 nt) algorithms. Some specificities and similarities in enriched gene ontologies, and the identification of plasmid and chromosomal sequences specific to X. fastidiosa, are highlighted at nodes or subclades. b Maximum-Likelihood (ML) tree constructed with 1000 bootstrap replicates using the concatenated sequences (4161 bp) of seven housekeeping genes from a MultiLocus Sequence Analysis (MLSA) scheme. Key features related to X. fastidiosa subspecies, obtained by combining the specific k-mers identified with the gene ontology enrichment tests, are indicated at nodes
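A dendrogram of this kind starts from a pairwise distance matrix derived from shared k-mers. The Python sketch below shows one simple way to build such a matrix, assuming a presence/absence (Jaccard) measure on forward-strand k-mers. The exact distance computed by simka is not specified in this caption, so this is an illustration rather than a reimplementation, and all names and toy sequences are hypothetical.

```python
def kmer_set(seq, k=22):
    """Set of all k-length substrings of seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def shared_fraction(seq_a, seq_b, k=22):
    """Jaccard index: shared k-mers divided by all distinct k-mers."""
    a, b = kmer_set(seq_a, k), kmer_set(seq_b, k)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def distance_matrix(genomes, k=22):
    """Return (names, matrix) where matrix[i][j] = 1 - shared fraction."""
    names = sorted(genomes)
    matrix = [
        [1.0 - shared_fraction(genomes[x], genomes[y], k) for y in names]
        for x in names
    ]
    return names, matrix


# Toy sequences with a small k so that the example stays readable.
toy = {"s1": "ACGTAC", "s2": "ACGTTT", "s3": "ACGTAC"}
names, dist = distance_matrix(toy, k=3)
print(names)
for row in dist:
    print([round(value, 3) for value in row])
```

The resulting matrix is symmetric with zeros on the diagonal, so it can be passed to any standard hierarchical clustering routine.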
Fig. 6 Inter- and intrasubspecies comparisons of ANIb and shared k-mer values. a Boxplot of the ANIb values calculated from our genome dataset. b Boxplot of the shared k-mer values. c Dot plot of the ANIb and shared k-mer mean values. The linear regression and its corresponding r² are indicated. For intrasubspecies comparisons, the number of plotted values corresponds to [(number of genomes)² − number of genomes]. For intersubspecies comparisons, it corresponds to [2 × number of genomes in subspecies A × number of genomes in subspecies B]. Number of genomes: fastidiosa (13), morus (2), sandyi (3), multiplex (10), pauca (18)
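The two counting rules in the caption can be applied directly to the genome numbers it lists. In this short Python sketch the function names are illustrative, and the counts treat each ordered pair of distinct genomes as one plotted value, as the formulas imply.

```python
from itertools import combinations

GENOMES = {"fastidiosa": 13, "morus": 2, "sandyi": 3, "multiplex": 10, "pauca": 18}


def n_intra(n):
    """Intrasubspecies comparisons: n^2 - n ordered pairs."""
    return n * n - n


def n_inter(n_a, n_b):
    """Intersubspecies comparisons: 2 * n_a * n_b ordered pairs."""
    return 2 * n_a * n_b


intra = {name: n_intra(n) for name, n in GENOMES.items()}
inter = {
    (a, b): n_inter(GENOMES[a], GENOMES[b])
    for a, b in combinations(GENOMES, 2)
}
total = sum(intra.values()) + sum(inter.values())

print(intra)
print(inter[("fastidiosa", "pauca")], total)
```

As a sanity check, the intra- and intersubspecies counts together equal 46 × 45 = 2070, the number of ordered pairs among the 46 X. fastidiosa genomes.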