José S L Patané, Joaquim Martins, Ana Beatriz Castelão, Christiane Nishibe, Luciana Montera, Fabiana Bigi, Martin J Zumárraga, Angel A Cataldi, Antônio Fonseca Junior, Eliana Roxo, Ana Luiza A R Osório, Kláudia S Jorge Ufms, Tyler C Thacker, Nalvo F Almeida, Flabio R Araújo, João C Setubal.
Year: 2017 PMID: 28201585 PMCID: PMC5381553 DOI: 10.1093/gbe/evx022
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Core-genome (left) and pan-genome (right) curves including only the genomes of the 38-data set with fewer than 100 contigs (33 genomes). The core-genome curve is close to a plateau, indicating that the complete core set (below 3,000 genes) is captured in the 38-data set analysis. The pan-genome curve indicates that the gene pool is large even among relatively closely related M. bovis genomes. Equations give the exponential regression fitted to each curve.
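Curves like these track how many gene families are shared by every genome sampled so far (core) and how many occur in at least one of them (pan) as genomes are added one at a time. The following is a minimal Python sketch of that computation, assuming each genome has already been reduced to a set of gene-family identifiers; the function name, input format, permutation count and toy data are illustrative assumptions, not the authors' pipeline, and the sketch does not fit the exponential regressions shown in the figure.

```python
import random

def core_pan_curves(genomes, permutations=100, seed=0):
    """Mean core- and pan-genome sizes as genomes are added one at a time.

    genomes: list of sets of gene-family IDs, one set per genome (hypothetical format).
    Returns (core, pan): lists whose k-th entry is the mean size after k+1 genomes.
    """
    rng = random.Random(seed)
    n = len(genomes)
    core_sum = [0] * n
    pan_sum = [0] * n
    for _ in range(permutations):
        order = rng.sample(genomes, n)  # random order in which genomes are added
        core, pan = set(order[0]), set()
        for k, genes in enumerate(order):
            core &= genes  # core: families present in every genome so far
            pan |= genes   # pan: families present in at least one genome so far
            core_sum[k] += len(core)
            pan_sum[k] += len(pan)
    return ([c / permutations for c in core_sum],
            [p / permutations for p in pan_sum])

# Hypothetical toy input: three genomes that all share families g1 and g2.
toy = [{"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3", "g5"}, {"g1", "g2", "g6"}]
core, pan = core_pan_curves(toy)
print("core:", core)
print("pan:", pan)
```

Averaging over random addition orders removes the dependence on which genome happens to come first; a plateau in `core` and continued growth in `pan` correspond to the patterns described in the caption.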
ML tree of the 38-data set matrix obtained using the site-congruence method. Left: branch support values (UFBoot); right: the same tree with branch lengths estimated by ML. Retention index (RI) of this tree for the character “sampling locality” = 0.81.
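The retention index measures how well a single character, here sampling locality, fits the tree: RI = (g - s)/(g - m), where s is the number of state changes the character requires on the tree, m is the minimum possible on any tree (number of states minus one), and g is the maximum (the changes needed on a star tree). The sketch below computes it with Fitch parsimony for one unordered character on a binary tree. The tree, leaf names and locality labels are hypothetical, and the caption does not state which software produced RI = 0.81.

```python
def fitch(tree, states):
    """Fitch parsimony for one unordered character.

    tree: binary tree as nested 2-tuples, with leaf names as strings.
    states: dict mapping leaf name -> character state.
    Returns (candidate state set at this node, state changes in the subtree).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_steps = fitch(left, states)
    right_set, right_steps = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1  # one state change

def retention_index(tree, states):
    """RI = (g - s) / (g - m); returns None for an uninformative character."""
    _, s = fitch(tree, states)
    values = list(states.values())
    distinct = set(values)
    m = len(distinct) - 1                                     # minimum steps on any tree
    g = len(values) - max(values.count(v) for v in distinct)  # steps on a star tree
    return None if g == m else (g - s) / (g - m)

# Hypothetical tree and locality labels (not data from the paper).
tree = ((("a", "b"), ("c", "d")), ("e", "f"))
locality = {"a": "loc1", "b": "loc1", "c": "loc2", "d": "loc2", "e": "loc1", "f": "loc2"}
print(retention_index(tree, locality))  # 0.5
```

In the toy example the locality character needs 2 changes on the tree, against a minimum of 1 and a star-tree maximum of 3, giving (3 - 2)/(3 - 1) = 0.5; an RI of 0.81 therefore indicates that localities are considerably more clustered on the tree than this example.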
Variability of genomic characteristics across the 38 genomes, computed in DAMBE from their core-coding genes. Each index was normalized to a mean of 1.0 for easier visualization. Numbers on the x axis refer to individual genomes; the 11 US genomes start at position 28. Nc, effective number of codons; RSCU, relative synonymous codon usage; CAI2, corrected version of the codon adaptation index; GC-content, average genomic GC content; Sum(ACGT), size of the core-coding data set in base pairs.
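Two of the plotted quantities are straightforward to compute, and normalizing an index to an average of 1.0 amounts to dividing each genome's value by the mean across genomes. The sketch below shows GC content, Sum(ACGT) and that normalization for hypothetical sequences; the function names and sequences are illustrative assumptions, and Nc, RSCU and CAI2 (computed in DAMBE in the original analysis) are not reimplemented here.

```python
from statistics import mean

def acgt_length(seq):
    """Number of unambiguous A/C/G/T bases, analogous to the Sum(ACGT) index."""
    seq = seq.upper()
    return sum(seq.count(base) for base in "ACGT")

def gc_content(seq):
    """Fraction of G+C among A/C/G/T bases (ambiguous symbols such as N are ignored)."""
    seq = seq.upper()
    acgt = acgt_length(seq)
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

def normalize_to_mean_one(values):
    """Divide each value by the mean so the series averages 1.0."""
    avg = mean(values)
    return [v / avg for v in values]

# Hypothetical concatenated core-coding sequences, one per genome.
core_seqs = {
    "genome_1": "ATGGCGCGTGCCTAA",
    "genome_2": "ATGGCCCGCTGGTGA",
    "genome_3": "ATGGCNCGTGCATAG",
}
gc = normalize_to_mean_one([gc_content(s) for s in core_seqs.values()])
size = normalize_to_mean_one([acgt_length(s) for s in core_seqs.values()])
for name, g, n in zip(core_seqs, gc, size):
    print(f"{name}: GC={g:.3f} Sum(ACGT)={n:.3f}")
```

Putting every index on the same mean-one scale is what lets indices with very different units (a fraction, a base-pair count, a codon-usage statistic) share one y axis.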
COG Categories Enriched for Genes under Positive Selection
| COG Abbr. | COG Category | All | Sel | % All | % Sel | (%Sel)/(%All) |
|---|---|---|---|---|---|---|
| A | RNA processing and modification | 1 | 0 | 0.0004 | 0.0000 | 0.00 |
| B | Chromatin structure and dynamics | 0 | 0 | 0.0000 | 0.0000 | — |
| C | Energy production and conversion | 147 | 6 | 0.0644 | 0.0541 | 0.84 |
| D | Cell cycle control, cell division, chromosome partitioning | 32 | 6 | 0.0140 | 0.0541 | 3.85** |
| E | Amino acid transport and metabolism | 165 | 6 | 0.0723 | 0.0541 | 0.75 |
| F | Nucleotide transport and metabolism | 69 | 4 | 0.0302 | 0.0360 | 1.19 |
| G | Carbohydrate transport and metabolism | 101 | 4 | 0.0443 | 0.0360 | 0.81 |
| H | Coenzyme transport and metabolism | 123 | 6 | 0.0539 | 0.0541 | 1.00 |
| I | Lipid transport and metabolism | 188 | 14 | 0.0824 | 0.1261 | 1.53** |
| J | Translation, ribosomal structure and biogenesis | 115 | 5 | 0.0504 | 0.0450 | 0.89 |
| K | Transcription | 167 | 9 | 0.0732 | 0.0811 | 1.11 |
| L | Replication, recombination and repair | 95 | 10 | 0.0416 | 0.0901 | 2.16** |
| M | Cell wall/membrane/envelope biogenesis | 89 | 4 | 0.0390 | 0.0360 | 0.92 |
| N | Cell motility | 22 | 3 | 0.0096 | 0.0270 | 2.80** |
| O | Posttranslational modification, protein turnover, chaperones | 100 | 7 | 0.0438 | 0.0631 | 1.44* |
| P | Inorganic ion transport and metabolism | 98 | 5 | 0.0429 | 0.0450 | 1.05 |
| Q | Secondary metabolites biosynthesis, transport and catabolism | 129 | 5 | 0.0565 | 0.0450 | 0.80 |
| R | General function prediction only | 334 | 8 | 0.1464 | 0.0721 | 0.49 |
| S | Function unknown | 183 | 5 | 0.0802 | 0.0450 | 0.56 |
| T | Signal transduction mechanisms | 76 | 2 | 0.0333 | 0.0180 | 0.54 |
| U | Intracellular trafficking, secretion, and vesicular transport | 21 | 1 | 0.0092 | 0.0090 | 0.98 |
| V | Defense mechanisms | 27 | 1 | 0.0118 | 0.0090 | 0.76 |
| W | Extracellular structures | 0 | 0 | 0.0000 | 0.0000 | — |
| Y | Nuclear structure | 0 | 0 | 0.0000 | 0.0000 | — |
| Z | Cytoskeleton | 0 | 0 | 0.0000 | 0.0000 | — |
| | Total | 2282 | 111 | 1.00 | 1.00 | |
Note.—All, all genes; Sel, genes under positive selection; %, relative frequency (proportion of genes in each category). The “Total” row gives the number of genes for which a COG was found and the sum of relative frequencies (= 1.0). The last column is the ratio (%Sel)/(%All). COG categories significantly enriched for positive selection were identified in R by bootstrapping the values of this last column and testing each category against the resulting distribution (1,000 pseudoreplicates; α = 0.05). The one-tailed critical value was 1.52, i.e., categories with (%Sel)/(%All) > 1.52 are significant.
** Significant at P = 0.05; * borderline nonsignificant.
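The enrichment ratios and a bootstrap critical value can be recomputed from the All and Sel counts. The Python sketch below derives (%Sel)/(%All) for each category and a one-tailed critical value by resampling the ratio column. The note does not say which statistic was bootstrapped, so resampling the mean ratio is an assumption of this sketch, as are the function names and the random seed; the critical value it prints is not expected to match the published 1.52 exactly. Categories with no genes (B, W, Y, Z) are left out because their ratio is undefined.

```python
import random
import statistics

# (All, Sel) gene counts per COG category, copied from the table above.
# B, W, Y and Z have no genes, so their ratio is undefined and they are left out.
COUNTS = {
    "A": (1, 0), "C": (147, 6), "D": (32, 6), "E": (165, 6), "F": (69, 4),
    "G": (101, 4), "H": (123, 6), "I": (188, 14), "J": (115, 5), "K": (167, 9),
    "L": (95, 10), "M": (89, 4), "N": (22, 3), "O": (100, 7), "P": (98, 5),
    "Q": (129, 5), "R": (334, 8), "S": (183, 5), "T": (76, 2), "U": (21, 1),
    "V": (27, 1),
}

def enrichment_ratios(counts):
    """(%Sel)/(%All) per category, using relative frequencies as in the table."""
    total_all = sum(a for a, _ in counts.values())
    total_sel = sum(s for _, s in counts.values())
    return {cog: (sel / total_sel) / (n_all / total_all)
            for cog, (n_all, sel) in counts.items()}

def bootstrap_critical_value(values, reps=1000, alpha=0.05, seed=1):
    """One-tailed critical value: the (1 - alpha) quantile of bootstrapped means.

    ASSUMPTION: resampling the mean ratio is this sketch's choice; the source only
    says that values of the ratio column were bootstrapped in R.
    """
    rng = random.Random(seed)
    values = list(values)
    means = [statistics.mean(rng.choices(values, k=len(values))) for _ in range(reps)]
    cuts = statistics.quantiles(means, n=100)  # 99 percentile cut points
    return cuts[round((1 - alpha) * 100) - 1]

ratios = enrichment_ratios(COUNTS)
crit = bootstrap_critical_value(ratios.values())
enriched = sorted(cog for cog, r in ratios.items() if r > crit)
print(f"critical value = {crit:.2f}; enriched categories: {enriched}")
```

The ratios themselves reproduce the table (for example, D gives (6/111)/(32/2282) ≈ 3.85); only the threshold depends on how the bootstrap is set up.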
Polymorphic Genomic Blocks ≥2,000 bp Identified across the Five Mycobacterium bovis NCBI RefSeq Genomes
| Block | AF2122/97 (UK) Start | AF2122/97 (UK) End | BAA-935 (UK) Start | BAA-935 (UK) End | Strain 1595 (Korea) Start | Strain 1595 (Korea) End | SP38 (Brazil) Start | SP38 (Brazil) End | Strain 30 (USA) Start | Strain 30 (USA) End |
|---|---|---|---|---|---|---|---|---|---|---|
| LSP-1 | 926157 | 928398 | — | — | 926052 | 928293 | 4176355 | 4178596 | 926097 | 928278 |
| LSP-2 | 1036390 | 1054849 | 1029810 | 1048269 | 1036444 | 1054903 | 4286983 | 4305442 | 1036526 | 1054984 |
| LSP-3 | — | — | — | — | 1413645 | 1417785 | 316471 | 320611 | 1413866 | 1418005 |
| RD3 | 1764645 | 1774379 | — (b) | — | 1769178 | 1778913 | 672665 | 682400 | 1769203 | 1778938 |
| RDcap_Asia2 | 1988553 | 1997632 | — | — | 1993162 | 2002241 | 896315 | 905394 | 1993090 | 2002169 |
| RD2 | 2199641 | 2210799 | — (a) | — | 2204606 | 2215764 | 1106529 | 1117687 | 2204166 | 2215322 |
| RD5oryx | 2605004 | 2607181 | — | — | 2610050 | 2612227 | 1511708 | 1513885 | 2609452 | 2611629 |
| LSP-4 | 2762357 | 2772053 | 2716954 | 2725679 | 2767457 | 2777153 | 1669019 | 1678696 | — (c) | — |
| Rep1 | — | — | 3597147 | 3614668 | — | — | — | — | — | — |
| Rep2 | — | — | 3614670 | 3633236 | — | — | — | — | — | — |
| Rep1 | — | — | 3633243 | 3650816 | — | — | — | — | — | — |
| Rep2 | — | — | 3650819 | 3669385 | — | — | — | — | — | — |
| RDbovis(c)_fadD18 | 3886050 | 3890304 | 3909675 | 3913279 | 3891927 | 3896172 | 2791908 | 2795181 | — (c) | — |
| RD1 | 4286586 | 4296110 | — | — | 4292815 | 4302330 | 3191235 | 3200759 | 4277324 | 4286846 |
Note.—UK genome BAA-935 had the most losses (seven: LSP-1, LSP-3, RD3, RDcap_Asia2, RD2, RD5oryx and RD1) and is the only genome bearing the two duplications (Rep1 and Rep2, each present in two copies). US strain 30 had two losses (LSP-4 and RDbovis(c)_fadD18) and the LSP-2 inversion, and genome AF2122/97 (UK) had one loss (LSP-3, shared with BAA-935). Gray shading: segment inversion.
(a) Only 417 bp.
(b) Only 364 bp.
(c) Only 500 bp.
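Because every block is given as start and end coordinates, both the ≥2,000 bp threshold and the loss counts in the note can be checked directly. The following is a minimal Python sketch, assuming 1-based inclusive coordinates and treating both absent blocks and the short remnants given in the footnotes as losses; the variable and function names are illustrative. The duplicated Rep1/Rep2 segments are omitted, since their absence from the other genomes is not a loss.

```python
GENOMES = ["AF2122/97", "BAA-935", "1595", "SP38", "Strain 30"]

# (start, end) per genome, in GENOMES order, copied from the table above.
# None = block absent or reduced to a short footnoted remnant.
BLOCKS = {
    "LSP-1": [(926157, 928398), None, (926052, 928293), (4176355, 4178596), (926097, 928278)],
    "LSP-2": [(1036390, 1054849), (1029810, 1048269), (1036444, 1054903), (4286983, 4305442), (1036526, 1054984)],
    "LSP-3": [None, None, (1413645, 1417785), (316471, 320611), (1413866, 1418005)],
    "RD3": [(1764645, 1774379), None, (1769178, 1778913), (672665, 682400), (1769203, 1778938)],
    "RDcap_Asia2": [(1988553, 1997632), None, (1993162, 2002241), (896315, 905394), (1993090, 2002169)],
    "RD2": [(2199641, 2210799), None, (2204606, 2215764), (1106529, 1117687), (2204166, 2215322)],
    "RD5oryx": [(2605004, 2607181), None, (2610050, 2612227), (1511708, 1513885), (2609452, 2611629)],
    "LSP-4": [(2762357, 2772053), (2716954, 2725679), (2767457, 2777153), (1669019, 1678696), None],
    "RDbovis(c)_fadD18": [(3886050, 3890304), (3909675, 3913279), (3891927, 3896172), (2791908, 2795181), None],
    "RD1": [(4286586, 4296110), None, (4292815, 4302330), (3191235, 3200759), (4277324, 4286846)],
}

def block_length(start, end):
    """Block length in bp, assuming 1-based inclusive coordinates."""
    return end - start + 1

lengths = {
    name: [block_length(*c) if c else None for c in coords]
    for name, coords in BLOCKS.items()
}
losses = {
    genome: [name for name, coords in BLOCKS.items() if coords[i] is None]
    for i, genome in enumerate(GENOMES)
}
shortest = min(n for row in lengths.values() for n in row if n is not None)

print(f"shortest present block: {shortest} bp")
for genome, lost in losses.items():
    print(f"{genome}: {len(lost)} loss(es) {lost}")
```

The tallies agree with the note: seven losses in BAA-935, two in strain 30, one in AF2122/97 and none in the Korean and Brazilian genomes, and every block that is present exceeds 2,000 bp.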
ML tree of the 76-data set composed of US strains plus an outgroup, based on their core-coding set of genes. Left: chronogram with ML support values. Right: the same tree with branch lengths in substitutions/site. Red: clades of geographically associated genomes (paraphyletic in the case of MN, owing to the inclusion of a sample from TX).