Abigail Manson McGuire, Brian Weiner, Sang Tae Park, Ilan Wapinski, Sahadevan Raman, Gregory Dolganov, Matthew Peterson, Robert Riley, Jeremy Zucker, Thomas Abeel, Jared White, Peter Sisk, Christian Stolte, Mike Koehrsen, Robert T Yamamoto, Milena Iacobelli-Martinez, Matthew J Kidd, Andreia M Maer, Gary K Schoolnik, Aviv Regev, James Galagan.
Abstract
BACKGROUND: The sequence of the pathogen Mycobacterium tuberculosis (Mtb) strain H37Rv has been available for over a decade, but the biology of the pathogen remains poorly understood. Genome sequences from other Mtb strains and closely related bacteria present an opportunity to apply the power of comparative genomics to understand the evolution of Mtb pathogenesis. We conducted a comparative analysis using 31 genomes from the Tuberculosis Database (TBDB.org), including 8 strains of Mtb and M. bovis, 11 additional Mycobacteria, 4 Corynebacteria, 2 Streptomyces, Rhodococcus jostii RHA1, Nocardia farcinica, Acidothermus cellulolyticus, Rhodobacter sphaeroides, Propionibacterium acnes, and Bifidobacterium longum.
Year: 2012 PMID: 22452820 PMCID: PMC3388012 DOI: 10.1186/1471-2164-13-120
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Phylogenetic tree based on uniform SYNERGY orthogroups, scaled by phylogenetic distance. The labels A-C indicate the branches selected for further analysis in our dN/dS analysis (A: the branch leading to the Mtb cluster; B: the branch leading to the pathogenic Mycobacteria; C: the branch leading to the non-pathogenic, soil-dwelling Mycobacteria). See Methods for details on the phylogenetic tree construction.
Summary of Organisms
| Organism | Pathogenic | Host required | Description | Reference |
|---|---|---|---|---|
| Y | Y | Causes TB; Laboratory strain | [ | |
| Y | Y | Causes TB; Avirulent sister strain to | [ | |
| Y | Y | Causes TB; isolated from TB patient in S. Africa | [ | |
| Y | Y | Causes bovine TB; attenuated vaccine strain | [ | |
| Y | Y | Causes bovine TB | [ | |
| Y | Y | Causes TB; MDR strain | [ | |
| Y | Y | Causes TB; isolated in NY City | [ | |
| Y | Y | Causes TB; highly contagious & virulent strain | [ | |
| Y | Causes Buruli ulcer | [ | ||
| Y | From fish; Skin lesions in human | [ | ||
| Y | Y | Causes leprosy | [ | |
| Y | Opportunistic pathogen; can cause TB-type pulmonary infection | [ | ||
| Y | Y | Causes paratuberculosis; obligate pathogen of cattle | [ | |
| Soil bacteria; degrades PAH | [ | |||
| Soil bacteria; degrades PAH | [ | |||
| Y | Widely used model for | [ | ||
| Soil bacteria; degrades PAH | [ | |||
| Soil bacteria; Degrades PAH + wide variety of organic compounds | [ | |||
| Y | Skin & soft tissue infections | [ | ||
| Soil bacteria important for biofuels research and bioremediation; degrades PCB + wide variety of organic compounds | [ | |||
| Y | Causes nocardiosis | [ | ||
| Produces amino acids (Glu) | [ | |||
| Produces amino acids (Glu) | [ | |||
| Y | Causes diphtheria | [ | ||
| Y | Causes nosocomial infections | [ | ||
| Soil bacteria; antibiotic-producing | [ | |||
| Soil bacteria; antibiotic-producing | [ | |||
| Hot springs of Yellowstone | [ | |||
| Gram -, motile; photosyn.; fixes N2 | [ | |||
| Y | Causes acne | [ | ||
| Digestive tract commensal; yogurt | [ | |||
Figure 2. Summary of SYNERGY results: number of gains, losses, and duplications at each node. For each node, the node number is marked in black, the total number of genes present is indicated in red, and the numbers of gains, losses, and duplications are indicated in parentheses in blue. See http://www.broadinstitute.org/ftp/pub/seq/msc/pub/SYNERGY/index.html.
Figure 3. Evolution of gene categories. This figure shows several examples of the evolution of metabolic pathways, PFAM domains, and GO term descriptors. Graphics similar to these can be found for each category in the supplementary information website at http://www.broadinstitute.org/ftp/pub/seq/msc/pub/SYNERGY/index.html. a) Fatty acid degradation metabolic pathway. b) PFAM group PF00823 (PPE genes). c) PFAM group PF00440 (tetR family transcription factors). d) GO term 0009405 (pathogenesis).
50 PFAM categories most expanded in the Mtb clade relative to the non-pathogenic, soil-dwelling Mycobacteria
| PFAM name | PFAM ID | p-value^a | inter-to-intra centroid difference |
|---|---|---|---|
| b PIN domain | PF01850 | 4.20E-09 | 1.10E+01 |
| GHMP kinases C terminal | PF08544 | 1.00E-08 | 8.80E+00 |
| DHHA1 domain | PF02272 | 2.10E-08 | 6.80E+00 |
| KGG Stress-induced bacterial acidophilic repeat motif | PF10685 | 6.70E-08 | 6.20E+00 |
| b Protein of unknown function (DUF1396) | PF07161 | 4.10E-07 | 9.20E+00 |
| b | PF07704 | 9.30E-07 | 1.00E+01 |
| Tetratricopeptide repeat | PF07720 | 9.90E-07 | 8.60E+00 |
| PA domain | PF02225 | 1.00E-06 | 6.40E+00 |
| c Patatin-like phospholipase | PF01734 | 1.20E-06 | 9.20E+00 |
| e Protein of unknown function (DUF1490) | PF07371 | 1.20E-06 | 1.00E+01 |
| FAD binding domain | PF01565 | 1.30E-06 | 6.40E+00 |
| Fumarate reductase/succinate dehydrog. flavoprotein C-term domain | PF02910 | 1.50E-06 | 5.90E+00 |
| FIST C domain | PF10442 | 2.00E-06 | 1.20E+01 |
| Corticotropin ACTH domain | PF00976 | 2.30E-06 | 6.00E+00 |
| c Beta-ketoacyl synthase, N-terminal domain | PF00109 | 2.40E-06 | 5.00E+00 |
| IlvB leader peptide | PF08049 | 4.20E-06 | 3.50E+00 |
| c Beta-ketoacyl synthase, C-terminal domain | PF02801 | 4.50E-06 | 4.80E+00 |
| c Acyl transferase domain | PF00698 | 4.60E-06 | 5.20E+00 |
| b PPE family | PF00823 | 5.60E-06 | 1.00E+01 |
| b Proteins of 100 residues with WXG | PF06013 | 7.10E-06 | 7.90E+00 |
| b Phd YefM | PF02604 | 7.70E-06 | 9.00E+00 |
| Ponericin | PF07442 | 7.70E-06 | 8.70E+00 |
| b Plasmid stabilization system protein | PF05016 | 9.90E-06 | 9.30E+00 |
| Threonine leader peptide | PF08254 | 1.10E-05 | 8.90E+00 |
| Toxin 33 Waglerin family | PF08121 | 2.20E-05 | 3.20E+00 |
| Phosphatidylethanolamine-binding protein | PF01161 | 2.20E-05 | 7.30E+00 |
| b PemK-like protein | PF02452 | 3.10E-05 | 9.00E+00 |
| Erythronolide synthase docking | PF08990 | 5.60E-05 | 7.30E+00 |
| d Radical SAM superfamily | PF04055 | 5.80E-05 | 3.90E+00 |
| d ThiS family | PF02597 | 6.60E-05 | 5.60E+00 |
| b Pentapeptide repeats (8 copies) | PF01469 | 1.20E-04 | 1.00E+01 |
| Rubredoxin | PF00301 | 1.30E-04 | 3.80E+00 |
| d Pterin 4 alpha carbinolamine dehydratase | PF01329 | 1.50E-04 | 8.80E+00 |
| Leucine rich repeat N-terminal domain | PF01462 | 1.60E-04 | 1.00E+01 |
| e Domain of unknown function (DUF1610) | PF07754 | 2.10E-04 | 2.40E+00 |
| SEC-C motif | PF02810 | 2.30E-04 | 3.00E+00 |
| d MoaC family | PF01967 | 2.60E-04 | 7.10E+00 |
| Berberine and berberine like | PF08031 | 2.70E-04 | 9.30E+00 |
| Cytochrome B6-F complex subunit VI (PetL) | PF05115 | 2.80E-04 | 9.40E+00 |
| Region found in RelA/SpoT proteins | PF04607 | 3.00E-04 | 5.00E+00 |
| Quinolinate phosphoribosyl transferase, C-terminal domain | PF01729 | 3.30E-04 | 4.20E+00 |
| Fumarate reductase subunit C | PF02300 | 3.70E-04 | 1.00E+01 |
| LHC Antenna complex alpha/beta subunit | PF00556 | 5.00E-04 | 3.20E+00 |
| RNPHF zinc finger | PF08080 | 5.10E-04 | 6.50E+00 |
| Protein of unknown function (DUF1416) | PF07210 | 5.20E-04 | 9.60E+00 |
| PsbJ | PF01788 | 5.40E-04 | 3.70E+00 |
| Bacterial transferase hexapeptide (three repeats) | PF00132 | 5.50E-04 | 3.00E+00 |
| Chalcone and stilbene synthases, N-terminal domain | PF00195 | 6.60E-04 | 8.40E+00 |
| d MoaE protein | PF02391 | 8.00E-04 | 7.80E+00 |
| b Protein of unknown function (DUF1066) | PF06359 | 8.30E-04 | 1.10E+01 |
a Bonferroni-corrected p-value calculated from t-test
b pathogenicity or survival within the host
c Lipid metabolism
d Pterin cofactor biosynthesis
e unknown function
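Footnote a refers to Bonferroni-corrected p-values. As a minimal sketch of that correction only (the t-test itself and the centroid-difference statistic are not reproduced here), each raw p-value is multiplied by the number of tests and capped at 1. The function name and the example values are hypothetical and are not taken from the paper.

```python
def bonferroni(p_values):
    """Bonferroni correction: multiply each raw p-value by the number
    of tests performed and cap the result at 1.0."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values for three categories (illustrative only).
raw = {"category_A": 1.0e-6, "category_B": 0.02, "category_C": 0.5}
adjusted = dict(zip(raw, bonferroni(list(raw.values()))))

for name, p_adj in adjusted.items():
    print(f"{name}: {p_adj:.2e}")
```

With many categories tested at once, this correction is conservative: a category stays significant only if its raw p-value is small enough to survive multiplication by the total number of categories.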
The 50 GO terms most expanded in the Mtb clade relative to the non-pathogenic, soil-dwelling Mycobacteria
| GO descriptor | GO term ID | p-value^a | inter-to-intra centroid difference |
|---|---|---|---|
| 4-hydroxy-3-methylbut-2-en-1-yl diphosphate red. activity | GO:0051745 | 4.80E-08 | 7.20E+00 |
| dTMP biosynthetic process | GO:0006231 | 7.30E-07 | 4.40E+00 |
| response to cAMP | GO:0051591 | 1.40E-06 | 1.20E+01 |
| succinate dehydrogenase (ubiquinone) activity | GO:0008177 | 1.70E-06 | 5.10E+00 |
| iron ion transport | GO:0006826 | 2.70E-06 | 4.70E+00 |
| magnesium ion binding | GO:0000287 | 2.80E-06 | 1.70E+00 |
| c fatty-acyl-CoA synthase activity | GO:0004321 | 5.10E-06 | 6.50E+00 |
| c acyltransferase activity | GO:0008415 | 7.60E-06 | 3.60E+00 |
| transferase activity, transferring alkyl or aryl (other than methyl) groups | GO:0016765 | 1.30E-05 | 2.60E+00 |
| c tricarboxylic acid cycle | GO:0006099 | 1.30E-05 | 3.40E+00 |
| d Mo-molybdopterin cofactor biosynthetic process | GO:0006777 | 1.40E-05 | 6.30E+00 |
| integral to membrane | GO:0016021 | 2.00E-05 | 2.00E+00 |
| acid phosphatase activity | GO:0003993 | 2.50E-05 | 3.20E+00 |
| phosphatase activity | GO:0016791 | 3.20E-05 | 4.10E+00 |
| erythronolide synthase activity | GO:0047879 | 4.20E-05 | 8.20E+00 |
| d 4-alpha-hydroxytetrahydrobiopterin dehydratase activity | GO:0008124 | 6.80E-05 | 6.20E+00 |
| c lipid metabolic process | GO:0006629 | 6.80E-05 | 4.70E+00 |
| bacteriochlorophyll biosynthetic process | GO:0030494 | 7.00E-05 | 1.30E+00 |
| plasma membrane | GO:0005886 | 7.10E-05 | 2.70E+00 |
| d tetrahydrobiopterin biosynthetic process | GO:0006729 | 1.10E-04 | 8.80E+00 |
| c lipid biosynthetic process | GO:0008610 | 1.20E-04 | 3.50E+00 |
| phosphatidylcholine metabolic process | GO:0046470 | 1.60E-04 | 9.50E+00 |
| c geranyltranstransferase activity | GO:0004337 | 1.60E-04 | 6.80E+00 |
| cytoplasm | GO:0005737 | 1.90E-04 | 1.40E+00 |
| protein transport | GO:0015031 | 1.90E-04 | 1.70E+00 |
| guanosine tetraphosphate metabolic process | GO:0015969 | 2.20E-04 | 5.00E+00 |
| glyoxylate cycle | GO:0006097 | 2.20E-04 | 4.30E+00 |
| phosphoglycolate phosphatase activity | GO:0008967 | 2.80E-04 | 4.30E+00 |
| terpenoid biosynthetic process | GO:0016114 | 3.90E-04 | 2.80E+00 |
| sulfur metabolic process | GO:0006790 | 4.10E-04 | 5.30E+00 |
| 4 iron, 4 sulfur cluster binding | GO:0051539 | 5.00E-04 | 2.90E+00 |
| succinate dehydrogenase activity | GO:0000104 | 5.70E-04 | 4.60E+00 |
| b mycocerosate synthase activity | GO:0050111 | 5.80E-04 | 4.10E+00 |
| c phospholipid biosynthetic process | GO:0008654 | 6.10E-04 | 2.30E+00 |
| nucleoside metabolic process | GO:0009116 | 6.30E-04 | 3.60E+00 |
| c phosphopantetheine binding | GO:0031177 | 8.20E-04 | 3.00E+00 |
| adenylate cyclase activity | GO:0004016 | 8.30E-04 | 5.50E+00 |
| D-arabinono-1,4-lactone oxidase activity | GO:0003885 | 9.70E-04 | 8.40E+00 |
| anaerobic respiration | GO:0009061 | 9.90E-04 | 1.10E+01 |
| nodulation | GO:0009877 | 1.10E-03 | 7.10E+00 |
| c prenyltransferase activity | GO:0004659 | 1.10E-03 | 4.20E+00 |
| c lysophospholipase activity | GO:0004622 | 1.30E-03 | 8.50E+00 |
| c acetyl-CoA carboxylase activity | GO:0003989 | 1.30E-03 | 2.40E+00 |
| histidinol-phosphatase activity | GO:0004401 | 2.10E-03 | 6.50E+00 |
| pyridine nucleotide biosynthetic process | GO:0019363 | 2.30E-03 | 5.00E+00 |
| NAD biosynthetic process | GO:0009435 | 3.30E-03 | 1.30E+00 |
| lactate fermentation to propionate and acetate | GO:0019652 | 3.40E-03 | 3.40E+00 |
| alkylglycerone-phosphate synthase activity | GO:0008609 | 3.40E-03 | 7.10E+00 |
| b cyclopropane-fatty-acyl-phospholipid synthase activity | GO:0008825 | 4.00E-03 | 5.90E+00 |
| methylcrotonoyl-CoA carboxylase activity | GO:0004485 | 4.40E-03 | 3.00E+00 |
a Bonferroni-corrected p-value calculated from t-test
b pathogenicity or survival within the host
c Lipid metabolism
d Cofactor biosynthesis
e unknown function
The 50 PFAM categories most expanded in the Mycobacteria relative to the non-Mycobacteria
| PFAM descriptor | PFAM ID | p-value^a | inter-to-intra centroid difference |
|---|---|---|---|
| e Protein of unknown function (DUF2599) | PF10783 | 1.50E-10 | 8.80E+00 |
| c Cutinase | PF01083 | 1.60E-10 | 1.50E+01 |
| e Uncharacterized protein conserved in bacteria (DUF2236) | PF09995 | 2.20E-10 | 1.60E+01 |
| c Lpp-LpqN Probable lipoprotein LpqN | PF10738 | 4.70E-10 | 1.30E+01 |
| e Domain of unknown function (DUF385) | PF04075 | 5.70E-10 | 1.30E+01 |
| b Domain of unknown function DUF140 | PF02405 | 1.60E-09 | 1.40E+01 |
| Retinal pigment epithelial membrane protein | PF03055 | 8.50E-09 | 1.40E+01 |
| e Domain of unknown function (DUF427) | PF04248 | 1.40E-08 | 1.60E+01 |
| ABC transporter transmembrane region 2 | PF06472 | 1.80E-08 | 1.10E+01 |
| b Peroxidase | PF00141 | 4.10E-08 | 1.20E+01 |
| b mce related protein | PF02470 | 1.30E-07 | 1.30E+01 |
| O-methyltransferase N-terminus | PF02409 | 1.70E-07 | 1.70E+01 |
| Activator of Hsp90 ATPase homolog 1-like protein | PF08327 | 2.40E-07 | 1.10E+01 |
| Coronavirus nonstructural protein NS1 | PF06145 | 3.00E-07 | 1.20E+01 |
| e Predicted integral membrane protein (DUF2189) | PF09955 | 3.10E-07 | 1.40E+01 |
| e Uncharacterized protein family (UPF0089) | PF03007 | 3.80E-07 | 1.50E+01 |
| b Acetyltransf 2 N-acetyltransferase | PF00797 | 3.90E-07 | 1.70E+01 |
| e Domain of unknown function (DUF1957) | PF09210 | 5.50E-07 | 7.90E+00 |
| KRAB box | PF01352 | 7.00E-07 | 1.70E+01 |
| Prokaryotic acetaldehyde dehydrogenase, dimerisation | PF09290 | 8.10E-07 | 9.00E+00 |
| DmpG-like communication domain | PF07836 | 9.50E-07 | 1.10E+01 |
| Nuclear transport factor 2 (NTF2) domain | PF02136 | 1.00E-06 | 1.30E+01 |
| Wyosine base formation | PF08608 | 1.10E-06 | 1.40E+01 |
| AIG2-like family | PF06094 | 1.30E-06 | 1.80E+01 |
| e Protein of unknown function (DUF867) | PF05908 | 1.30E-06 | 1.50E+01 |
| Phage-related minor tail protein | PF10145 | 2.10E-06 | 1.20E+01 |
| c Fatty acid desaturase | PF03405 | 2.30E-06 | 1.00E+01 |
| PaaX-like protein | PF07848 | 2.80E-06 | 1.50E+01 |
| Adenylate and Guanylate cyclase catalytic domain | PF00211 | 3.70E-06 | 1.10E+01 |
| Fibronectin-attachment protein (FAP) | PF07174 | 3.80E-06 | 1.30E+01 |
| Leucine Rich Repeat | PF07723 | 5.50E-06 | 1.10E+01 |
| 2-nitropropane dioxygenase | PF03060 | 5.70E-06 | 1.40E+01 |
| c Fatty acid desaturase | PF00487 | 7.90E-06 | 1.30E+01 |
| e Protein of unknown function (DUF732) | PF05305 | 9.10E-06 | 1.50E+01 |
| c Enoyl-CoA hydratase/isomerase family | PF00378 | 9.70E-06 | 1.10E+01 |
| arg-2/CPA1 leader peptide | PF08252 | 1.00E-05 | 1.40E+01 |
| c alpha/beta hydrolase fold | PF07859 | 1.10E-05 | 1.50E+01 |
| Cytochrome P450 | PF00067 | 1.40E-05 | 1.20E+01 |
| c Cyclopropane-fatty-acyl-phospholipid synthase | PF02353 | 1.60E-05 | 1.20E+01 |
| Isoprenylcysteine carboxyl methyltransferase (ICMT) family | PF04140 | 1.90E-05 | 1.50E+01 |
| Hydratase/decarboxylase | PF01689 | 2.50E-05 | 6.60E+00 |
| PsbJ | PF01788 | 3.00E-05 | 9.40E+00 |
| Linocin M18 bacteriocin protein | PF04454 | 3.40E-05 | 1.60E+01 |
| Extensin-like protein repeat | PF02095 | 4.00E-05 | 1.50E+01 |
| 5HT transporter Serotonin (5-HT) neurotransmitter transporter, N-terminus | PF03491 | 4.00E-05 | 9.50E+00 |
| e Protein of unknown function (DUF571) | PF04600 | 4.20E-05 | 7.70E+00 |
| Tryptophyllin-3 skin active peptide | PF08248 | 4.90E-05 | 1.50E+01 |
| AMP-binding enzyme | PF00501 | 8.10E-05 | 9.20E+00 |
| e Bacterial protein of unknown function (DUF853) | PF05872 | 9.90E-05 | 1.30E+01 |
| c Acyl-ACP thioesterase | PF01643 | 1.10E-04 | 1.10E+01 |
a Bonferroni-corrected p-value calculated from t-test
b pathogenicity or survival within the host
c Lipid metabolism
d Cofactor biosynthesis
e unknown function
The 50 GO terms most expanded in the Mycobacteria relative to the non-Mycobacteria
| GO term descriptor | GO term ID | p-value^a | inter-to-intra centroid difference |
|---|---|---|---|
| c sterol biosynthetic process | GO:0016126 | 1.00E-10 | 1.80E+01 |
| b regulation of apoptosis | GO:0042981 | 1.10E-10 | 1.90E+01 |
| c 3alpha,7alpha,12alpha-trihydroxy-5beta-cholest-24-enoyl-CoA hydratase activity | GO:0033989 | 1.40E-10 | 1.70E+01 |
| c linalool 8-monooxygenase activity | GO:0050056 | 1.50E-10 | 1.70E+01 |
| c sterol 14-demethylase activity | GO:0008398 | 5.10E-09 | 1.60E+01 |
| c cutinase activity | GO:0050525 | 5.80E-09 | 1.60E+01 |
| oxidoreductase activity, acting on NADH or NADPH, nitrogenous group as acceptor | GO:0016657 | 1.80E-08 | 1.40E+01 |
| c diacylglycerol O-acyltransferase activity | GO:0004144 | 2.10E-08 | 1.70E+01 |
| ligase activity, forming carbon-carbon bonds | GO:0016885 | 6.10E-08 | 1.40E+01 |
| indolylacetylinositol arabinosyltransferase activity | GO:0050409 | 7.90E-08 | 1.20E+01 |
| c 4-hydroxy-2-oxovalerate aldolase activity | GO:0008701 | 1.00E-07 | 1.10E+01 |
| arylamine N-acetyltransferase activity | GO:0004060 | 2.80E-07 | 1.70E+01 |
| c lipid transport | GO:0006869 | 5.00E-07 | 1.70E+01 |
| c lipid biosynthetic process | GO:0008610 | 5.00E-07 | 8.50E+00 |
| biphenyl-2,3-diol 1,2-dioxygenase activity | GO:0018583 | 1.40E-06 | 9.50E+00 |
| cis-stilbene-oxide hydrolase activity | GO:0033961 | 1.50E-06 | 1.20E+01 |
| c acyl-[acyl-carrier-protein] desaturase activity | GO:0045300 | 1.70E-06 | 1.00E+01 |
| 5-carboxymethyl-2-hydroxymuconic-semialdehyde dehydrog activity | GO:0018480 | 2.00E-06 | 1.20E+01 |
| c fatty-acyl-CoA synthase activity | GO:0004321 | 2.10E-06 | 1.20E+01 |
| c steroid biosynthetic process | GO:0006694 | 2.30E-06 | 1.20E+01 |
| c propanoyl-CoA C-acyltransferase activity | GO:0033814 | 2.50E-06 | 1.20E+01 |
| extracellular matrix binding | GO:0050840 | 2.70E-06 | 1.30E+01 |
| lipid glycosylation | GO:0030259 | 6.70E-06 | 5.60E+00 |
| d coenzyme F420-dependent N5,N10-methenyltetrahydromethanopterin reductase activity | GO:0018537 | 6.90E-06 | 1.20E+01 |
| C-terminal protein amino acid methylation | GO:0006481 | 1.10E-05 | 1.10E+01 |
| metabolic process | GO:0008152 | 1.30E-05 | 4.60E+00 |
| 4-oxalocrotonate decarboxylase activity | GO:0047437 | 1.70E-05 | 1.10E+01 |
| oxidoreductase activity | GO:0016491 | 1.80E-05 | 5.20E+00 |
| oxidation reduction | GO:0055114 | 1.90E-05 | 4.80E+00 |
| defense response to bacterium | GO:0042742 | 2.40E-05 | 1.60E+01 |
| b cyclopropane-fatty-acyl-phospholipid synthase activity | GO:0008825 | 2.90E-05 | 9.80E+00 |
| c 3-hydroxy-2-methylbutyryl-CoA dehyd. activity | GO:0047015 | 3.40E-05 | 1.10E+01 |
| nutrient reservoir activity | GO:0045735 | 3.40E-05 | 9.30E+00 |
| structural constituent of cell wall | GO:0005199 | 3.60E-05 | 7.90E+00 |
| 2-nitropropane dioxygenase activity | GO:0018580 | 4.10E-05 | 1.40E+01 |
| adenylate cyclase activity | GO:0004016 | 6.00E-05 | 9.50E+00 |
| b beta-lactam antibiotic catabolic process | GO:0030655 | 6.70E-05 | 1.40E+01 |
| DNA primase activity | GO:0003896 | 8.40E-05 | 1.10E+01 |
| cyclic nucleotide biosynthetic process | GO:0009190 | 8.50E-05 | 9.80E+00 |
| iron ion transport | GO:0006826 | 8.70E-05 | 1.20E+01 |
| di-, tri-valent inorganic cation transmembrane transporter activity | GO:0015082 | 9.00E-05 | 6.80E+00 |
| phosphorus-oxygen lyase activity | GO:0016849 | 1.60E-04 | 9.50E+00 |
| limonene-1,2-epoxide hydrolase activity | GO:0018744 | 1.60E-04 | 7.80E+00 |
| c fatty acid metabolic process | GO:0006631 | 1.70E-04 | 8.40E+00 |
| sirohydrochlorin cobaltochelatase activity | GO:0016852 | 1.80E-04 | 1.50E+01 |
| intracellular signaling cascade | GO:0007242 | 2.00E-04 | 8.80E+00 |
| c enoyl-CoA hydratase activity | GO:0004300 | 2.30E-04 | 1.30E+01 |
| di-, tri-valent inorganic cation transport | GO:0015674 | 2.40E-04 | 6.20E+00 |
| c acyl-CoA dehydrogenase activity | GO:0003995 | 2.60E-04 | 9.10E+00 |
| catechol O-methyltransferase activity | GO:0016206 | 3.20E-04 | 1.50E+01 |
a Bonferroni-corrected p-value calculated from t-test
b pathogenicity or survival within the host
c Lipid metabolism
d Cofactor biosynthesis
e unknown function
Figure 4. Evolution of genes upregulated when grown on saturated or unsaturated fatty acids. Genes upregulated by at least 1.5 standard deviations are indicated here. a) Genes expressed under palmitic acid but not oleic or linoleic acid (genes expressed in saturated fatty acid conditions). b) Genes expressed under linoleic or oleic acid but not palmitic acid (genes expressed under unsaturated fatty acid conditions). c) The ratio of the phylogenetic profiles for genes expressed under palmitic and linoleic acid, normalized by genome size.
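The selection described in the Figure 4 caption (genes upregulated by at least 1.5 standard deviations, then kept only if specific to one condition) can be sketched as below. This is an illustration under stated assumptions, not the authors' pipeline: the caption does not say how the standard deviation was computed, so the sketch assumes a z-score over all genes within one condition. The gene names and expression values are hypothetical.

```python
from statistics import mean, stdev


def upregulated(log_ratios, threshold=1.5):
    """Return genes whose value lies at least `threshold` sample standard
    deviations above the mean of all genes in this condition.
    ASSUMPTION: a per-condition z-score; the source does not specify this."""
    values = list(log_ratios.values())
    mu, sigma = mean(values), stdev(values)
    return sorted(g for g, v in log_ratios.items() if (v - mu) / sigma >= threshold)


def specific_to(condition_hits, other_hits_list):
    """Genes called in one condition but in none of the others."""
    others = set().union(*other_hits_list)
    return sorted(set(condition_hits) - others)


# Hypothetical expression values (illustrative only).
palmitic = {"geneA": 3.0, "geneB": 0.1, "geneC": -0.2, "geneD": 0.0,
            "geneE": 0.1, "geneF": -0.1, "geneG": 0.2, "geneH": 0.0}
oleic = {"geneA": 0.0, "geneB": 2.5, "geneC": -0.2, "geneD": 0.0,
         "geneE": 0.1, "geneF": -0.1, "geneG": 0.2, "geneH": 0.0}

palmitic_hits = upregulated(palmitic)
oleic_hits = upregulated(oleic)
saturated_only = specific_to(palmitic_hits, [oleic_hits])
print(saturated_only)
```

Separating the thresholding step from the set difference keeps the "expressed under X but not Y" logic of panels a) and b) explicit and easy to adapt to a third condition.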
Figure 5. New predicted RNAs. a) An example of a new predicted RNA (RNA2 in Table 6), shown as a screenshot from the GenomeView Browser [64]. The light blue bars show the coding regions (Rv1230c and Rv1231); the tan bar shows the conserved region predicted by Gumby [65]; and the green bar shows the region predicted to fold by Evofold [66]. The yellow and green plots in the center show the RNA-seq data. Green signifies reads from the negative strand, and yellow shows the total reads (positive and negative strands). The multiple alignment is shown on the bottom (darker grey signifies a higher degree of conservation; red signifies no alignment at that position). This predicted RNA region is conserved through M. avium. The rulers at the top show the gene structure. Small red squares show where stop codons are present in all six reading frames, indicating that this intergenic region is unlikely to be a protein-coding region missed in the annotation. b) Northern blots validating four of the new predicted small RNAs (RNA1, RNA2, RNA3, and RNA9 in Table 6).
Top 12 predicted RNAs, ranked by their RPKM score
| Conserved region in Mtb^1 | Region in M. smegmatis^3 | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| RNA1^4 | 1612987 | --+ | 58 | 1013 | 69526 | Y | 756567 | 756626 | 100 | 626 | ||
| RNA2^4 | 1374224 | --- | 46 | 567 | 30071 | Y | - | - | - | |||
| RNA3^4 | 1393055 | --- | 85 | 111 | 9251 | Y | 5147110 | 5147242 | 173 | 489 | ||
| RNA4 | 483829 | -++ | 49 | 55 | 2139 | Y | - | - | - | |||
| RNA5 | 1200514 | -++ | 93 | 81 | 2055 | Y | 5363400 | 5363474 | 10 | 50 | ||
| RNA6 | 987053 | -++ | 92 | 174 | 6767 | Y | - | - | - | |||
| RNA7 | 1810184 | +++ | 44 | 517 | 9280 | Y | 3296996 | 3297034 | 291 | 2804 | ||
| RNA8 | 3587635 | -++ | 56 | 83 | 6917 | N | 2009575 | 2009665 | 6 | 25 | ||
| RNA9^4 | 4224925 | --+ | 41 | 502 | 24405 | N | 6420618 | 6420674 | 36 | 237 | ||
| RNA10 | 659351 | +++ | 39 | 58 | 1829 | Y | - | - | - | |||
| RNA11 | 1794708 | -++ | 48 | 375 | 18231 | Y | 3277552 | 3277599 | 70 | 548 | ||
| RNA12 | 2447526 | --- | 74 | 332 | 9684 | Y | 4335086 | 4335137 | 111 | 802 | ||
1 Conserved intergenic regions determined by Gumby.
2 Indicates whether this region is predicted to fold by Evofold.
3 Region in M. smegmatis that aligns with the conserved region in Mtb, and its corresponding RPKM value.
4 Tested experimentally.
5 Orientation relative to neighboring genes. The first and last characters give the strands of the flanking genes; the middle character gives the strand for the predicted RNA.
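The table ranks predicted RNAs by RPKM and encodes orientation as a three-character string. As a hedged illustration, the sketch below uses the standard definition of RPKM (reads per kilobase of region per million mapped reads) and decodes the orientation string as described in footnote 5. The function names and the example numbers are hypothetical, and the source does not state the exact read-counting procedure used.

```python
def rpkm(read_count, region_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads (standard definition):
    read_count / (region_length_bp / 1e3) / (total_mapped_reads / 1e6)."""
    if region_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    return read_count * 1e9 / (region_length_bp * total_mapped_reads)


def parse_orientation(code):
    """Decode a three-character strand string such as '--+'.
    First and last characters: strands of the flanking genes;
    middle character: strand of the predicted RNA (per footnote 5)."""
    if len(code) != 3 or set(code) - {"+", "-"}:
        raise ValueError(f"expected three characters from '+'/'-', got {code!r}")
    first_gene, predicted_rna, last_gene = code
    return {
        "first_flanking_gene": first_gene,
        "predicted_rna": predicted_rna,
        "last_flanking_gene": last_gene,
    }


# Hypothetical example: 500 reads on a 100 bp region, 10 million mapped reads.
example_rpkm = rpkm(500, 100, 10_000_000)
example_orientation = parse_orientation("--+")
print(example_rpkm, example_orientation)
```

Normalizing by both region length and sequencing depth is what makes values comparable between short intergenic regions and between the Mtb and M. smegmatis data sets.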
Motifs passing our set of stringent filters, ranked by their degree of conservation
| k^a | Cluster # | Motif | MAP score^b | Specificity^c | Known^d | Palindromicity^e | Conservation^f | Motif logo^g |
|---|---|---|---|---|---|---|---|---|
| 100 | 71 | 1 | 65.6 | 1.8E-29 | KstR | 0.93 | 46 | |
| 250 | 179 | 1 | 88.6 | 4.8E-26 | KstR | 0.71 | 43 | |
| 50 | 35 | 1 | 87.4 | 1.8E-19 | KstR | 0.79 | 40 | |
| 200 | 182 | 1 | 94.2 | 5.7E-28 | KstR | 0.80 | 36 | |
| 100 | 73 | 2 | 31.3 | 1.5E-32 | IdeR | 0.85 | 30 | |
| 100 | 49 | 15 | 16.3 | 2.8E-18 | KstR | 0.71 | 29 | |
| 250 | 112 | 1 | 25.5 | 2.1E-25 | DosR | 0.76 | 21 | |
| 100 | 80 | 1 | 15.0 | 2.2E-14 | 0.92 | 21 | ||
| 50 | 29 | 1 | 15.0 | 2.2E-14 | 0.92 | 21 | ||
| 50 | 23 | 31 | 21.7 | 6.7E-24 | 0.71 | 18 | ||
| 200 | 47 | 64 | 7.2 | 1.2E-15 | 0.75 | 16 | ||
| 250 | 87 | 1 | 22.1 | 1.5E-13 | ZurB | 0.94 | 16 | |
| 200 | 6 | 1 | 16.3 | 2.0E-12 | IdeR | 0.70 | 16 | |
| 200 | 184 | 19 | 12.1 | 4.7E-13 | 0.75 | 16 | ||
| 250 | 123 | 18 | 14.5 | 3.7E-11 | 0.82 | 16 | ||
| 250 | 224 | 1 | 26.9 | 6.2E-24 | DosR | 0.73 | 15 | |
| 200 | 46 | 3 | 19.8 | 1.6E-12 | 0.78 | 15 | ||
| 100 | 5 | 25 | 12.4 | 3.8E-15 | 0.71 | 14 | ||
| 200 | 120 | 62 | 15.3 | 2.0E-14 | 0.74 | 14 | ||
| 200 | 71 | 9 | 16.6 | 2.3E-16 | 0.75 | 14 | ||
| 200 | 195 | 1 | 25.4 | 7.2E-26 | DosR | 0.76 | 14 | |
| 50 | 48 | 1 | 68.8 | 6.4E-35 | DosR | 0.86 | 13 | |
| 100 | 74 | 1 | 66.9 | 3.6E-34 | DosR | 0.88 | 13 | |
| 50 | 23 | 12 | 26.8 | 5.0E-22 | 0.73 | 13 | ||
| 250 | 89 | 1 | 46.636 | 4.0E-20 | 0.75 | 12 | ||
| 100 | 81 | 84 | 6.4 | 2.3E-16 | 0.71 | 12 | ||
| 250 | 92 | 6 | 5.7 | 3.7E-17 | DosR | 0.73 | 12 | |
| 100 | 52 | 91 | 9.7 | 1.5E-18 | 0.76 | 11 | ||
| 200 | 91 | 1 | 47.1 | 4.7E-20 | 0.70 | 10 | ||
| 50 | 36 | 43 | 34.8 | 4.6E-30 | 0.72 | 10 | ||
| 100 | 42 | 4 | 15.3 | 2.8E-17 | 0.70 | 10 | ||
| 200 | 80 | 9 | 8.5 | 4.0E-13 | 0.72 | 10 | ||
| 50 | 43 | 7 | 15.7 | 8.7E-12 | 0.72 | 10 | ||
| 50 | 36 | 21 | 48.5 | 5.3E-12 | 0.71 | 10 | ||
| 200 | 26 | 17 | 7.5 | 2.1E-12 | 0.70 | 9 | ||
| 50 | 6 | 61 | 6.3 | 1.2E-11 | 0.73 | 9 | ||
| 50 | 11 | 43 | 13.7 | 1.4E-11 | 0.72 | 14 | ||
a k indicates the value of k in the k-means clustering process (50, 100, 200, or 250)
b MAP score indicates the AlignACE MAP score [76]
c Specificity score [77]
d CompareACE score ≥ 0.7 to the alignment for this known motif
e CompareACE score to its reverse complement
f Number of ScanACE hits in the genome that are conserved in ≥ 8 genomes
g Sequence logo [78]
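Footnote e defines palindromicity as the similarity of a motif to its own reverse complement. The sketch below illustrates that idea with a stand-in score: a Pearson correlation between a base-count matrix built from aligned sites and the reverse complement of that matrix. This is an assumption for illustration; the source does not describe how CompareACE computes its score, and the site sequences are hypothetical.

```python
from math import sqrt

BASES = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def count_matrix(sites):
    """Per-position base counts for equal-length aligned sites."""
    matrix = [{b: 0 for b in BASES} for _ in range(len(sites[0]))]
    for site in sites:
        for pos, base in enumerate(site):
            matrix[pos][base] += 1
    return matrix


def reverse_complement_matrix(matrix):
    """Reverse the column order and swap each base with its complement."""
    return [{b: col[COMPLEMENT[b]] for b in BASES} for col in reversed(matrix)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation undefined for a constant matrix")
    return cov / sqrt(vx * vy)


def palindromicity(sites):
    """Stand-in score (ASSUMPTION, not CompareACE itself): correlation of
    the count matrix with its own reverse complement."""
    matrix = count_matrix(sites)
    rc = reverse_complement_matrix(matrix)
    flatten = lambda m: [col[b] for col in m for b in BASES]
    return pearson(flatten(matrix), flatten(rc))


# Hypothetical sites: GAATTC equals its own reverse complement.
palindrome_score = palindromicity(["GAATTC"] * 3)
non_palindrome_score = palindromicity(["AAAAAA"] * 3)
print(palindrome_score, non_palindrome_score)
```

A motif that reads the same on both strands scores close to 1 under this stand-in, which matches the intuition behind the high palindromicity values listed for motifs such as KstR in the table.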
Figure 6. New predicted motif with binding sites upstream of fatty acid-related genes. a) Motif logo. b) Conserved binding site locations for this new motif are marked with red x's. Red lines indicate orthologous relationships between genes in Mtb H37Rv, M. avium 104, and M. smegmatis.