
Phylogenomic identification of regulatory sequences in bacteria: an analysis of statistical power and an application to Borrelia burgdorferi sensu lato.

Che L Martin, Che I Martin1, Tika Y Sukarna1, Saymon Akther2, Girish Ramrattan2, Pedro Pagan2, Lia Di2, Emmanuel F Mongodin3, Claire M Fraser3, Steven E Schutzer4, Benjamin J Luft5, Sherwood R Casjens6, Wei-Gang Qiu7.   

Abstract

Phylogenomic footprinting is an approach for ab initio identification of genome-wide regulatory elements in bacterial species based on sequence conservation. The statistical power of the phylogenomic approach depends on the degree of sequence conservation, the length of regulatory elements, and the level of phylogenetic divergence among genomes. Building on an earlier model, we propose a binomial model that uses synonymous tree lengths as neutral expectations for determining the statistical significance of conserved intergenic spacer (IGS) sequences. Simulations show that the binomial model is robust to variations in the value of evolutionary parameters, including base frequencies and the transition-to-transversion ratio. We used the model to search for regulatory sequences in the Lyme disease species group (Borrelia burgdorferi sensu lato) using 23 genomes. The model indicates that the currently available set of Borrelia genomes would not yield regulatory sequences shorter than five bases, suggesting that genome sequences of additional B. burgdorferi sensu lato species are needed. Nevertheless, we show that previously known regulatory elements are indeed strongly conserved in sequence or structure across these Borrelia species. Further, we predict with sufficient confidence two new RpoS binding sites, 39 promoters, 19 transcription terminators, 28 noncoding RNAs, and four sets of coregulated genes. These putative cis- and trans-regulatory elements suggest novel, Borrelia-specific mechanisms regulating the transition between the tick and host environments, a key adaptation and virulence mechanism of B. burgdorferi. Alignments of IGS sequences are available on BorreliaBase.org, an online database of orthologous open reading frame (ORF) and IGS sequences in Borrelia.

IMPORTANCE: While bacterial genomes contain mostly protein-coding genes, they also house DNA sequences regulating the expression of these genes. Gene regulatory sequences tend to be conserved during evolution. By sequencing and comparing related genomes, one can therefore identify regulatory sequences in bacteria based on sequence conservation. Here, we describe a statistical framework by which one may determine how many genomes need to be sequenced and at what level of evolutionary relatedness in order to achieve a high level of statistical significance. We applied the framework to Borrelia burgdorferi, the Lyme disease agent, and identified a large number of candidate regulatory sequences, many of which are known to be involved in regulating the phase transition between the tick vector and mammalian hosts.
Copyright © 2015 Martin et al.


Year:  2015        PMID: 25873371      PMCID: PMC4453575          DOI: 10.1128/mBio.00011-15

Source DB:  PubMed          Journal:  MBio            Impact factor:   7.867


INTRODUCTION

A major rationale for sequencing a large number of closely related genomes is to identify candidate gene-regulatory elements and networks based on the observation that functional elements tend to be conserved in DNA sequences between as well as within genomes (1–3). Such evolutionary approaches, which may be called phylogenomic footprinting (4), are relatively cost-effective and have been successfully used in revealing candidate regulatory elements in humans (5, 6), Drosophila species (7, 8), and yeasts (9, 10). The evolutionary approach is especially valuable for non-model bacterial species for which methods of experimental and genetic manipulation are limited or nonexistent (11). Borrelia burgdorferi sensu lato, a non-model bacterial species group of Gram-negative spirochetes, consists of at least 18 named and putative species (12, 13). Several species of this complex are causative agents of Lyme disease, a tick-borne infectious disease that is increasing in prevalence throughout North America, Europe, and East Asia (13, 14). Three species, Borrelia garinii, Borrelia afzelii, and B. burgdorferi sensu stricto, cause the majority of Lyme disease worldwide. In North America, Lyme disease is predominantly caused by B. burgdorferi sensu stricto. At least 20 evolutionary lineages of B. burgdorferi sensu stricto exist in Europe and North America, some of which are more likely than others to cause disseminated Lyme disease in humans (15–17). As an obligate parasite, B. burgdorferi must survive in two physiologically distinct environments, the tick and its vertebrate host, for its maintenance in nature, and hence elaborate mechanisms for regulating levels of gene expression during such phase transitions have evolved (18–20). Over 100 genes (~10%) in the B. burgdorferi genome are differentially expressed during the transition between the tick and mammalian phases (21–23). 
RpoS (σs), an alternative sigma factor, appears to be a main transcriptional control mechanism regulating the tick-mammal transitions via the Rrp2-RpoN-RpoS gene regulatory pathway (19, 22, 23). For example, the Rrp2-RpoN-RpoS pathway is activated during tick feeding, leading to the upregulation of mammalian phase lipoprotein genes (e.g., ospC [encoding outer surface protein C] and dbpAB [encoding decorin-binding proteins A and B] operon) and the simultaneous downregulation of tick phase genes (e.g., ospA [encoding outer surface protein A]) (18, 19, 24–26). Five genes (ospC, dbpA, oppA5, bba66, and bba07) have been identified to contain a consensus RpoS-dependent promoter sequence (27). Additional gene-regulatory pathways important for B. burgdorferi sensu lato pathogenesis are beginning to be understood, such as post-transcriptional control with small RNAs, genes targeting the host complement systems, and genes responsible for its persistent infection in hosts (18, 19, 28). In spite of these new findings, the majority of downstream targets of key gene regulatory mechanisms, including the Rrp2-RpoN-RpoS pathway, remain to be identified (29). Much of the knowledge about Borrelia gene regulation, e.g., the discovery of the RpoN-RpoS pathway, benefited from prior studies of homologous proteins in model organisms, such as Escherichia coli (19). Recently, we sequenced the genomes of 13 strains of B. burgdorferi sensu stricto and nine strains of other B. burgdorferi sensu lato species, bringing the total number of completed or draft B. burgdorferi sensu lato genomes to at least 24 (30, 31). These genomes make it possible to use phylogenomic footprinting for ab initio discovery of Borrelia-specific gene-regulatory elements and networks that may not exist in other bacterial groups. Previously, five putative noncoding RNAs (ncRNAs) have been identified based on a comparison of three genome sequences (32). 
Five additional candidate ncRNAs on lp54 and cp26, the two constitutive plasmids, have been identified using these B. burgdorferi sensu lato genomes (33). Here, we describe the results of a more comprehensive and systematic search for highly conserved putative regulatory genomic elements in the core B. burgdorferi sensu lato genome. In addition, we propose a statistical framework for guiding the search for candidate functional elements using phylogenomic footprinting in Borrelia or other bacterial groups.

RESULTS AND DISCUSSION

Genome sequences. (i) Genomes and orthologous ORFs.

We and other groups have sequenced and released the genome sequences of 23 B. burgdorferi sensu lato strains isolated from North America and Europe encompassing eight B. burgdorferi sensu lato species (see Table S1 in the supplemental material). The present study is based on the genomic sequences of the three universally present replicons, including the cp26 and lp54 plasmids and the main chromosome. We have previously identified, by using automated homology searches and manual synteny analysis, 837 orthologous open reading frame (ORF) families, including 750 on the main chromosome, 26 on the cp26 plasmid, and 62 on the lp54 plasmid (30, 34).

(ii) Orthologous IGS families.

After identifying consensus start codon positions for orthologous ORF families (see Materials and Methods), discarding short (<150-base) predicted ORFs, and filtering out short (<30-base) intergenic spacer (IGS) sequences and IGS sequences not present in seven or more sequenced B. burgdorferi sensu lato species, the final data set for all subsequent analysis consists of 17 orthologous IGS families on the cp26 plasmid, 26 orthologous IGS families on the lp54 plasmid, and 203 orthologous IGS families on the main chromosome (Table 1).
TABLE 1 

Orthologous ORFs and IGSs

| Characteristic | Main chromosome | lp54 | cp26 |
|---|---|---|---|
| No. of orthologous ORF families | 750 | 62 | 26 |
| Synonymous tree length (TS)^a | 1.4965 | 1.8959 | 1.7931 |
| Nonsynonymous tree length (TN)^a | 0.1092 | 0.3684 | 0.1925 |
| Ratio (TN/TS) | 0.07297 | 0.1944 | 0.1074 |
| No. of orthologous IGS families^b | 203 | 27 | 17 |
| No. of convergent IGSs (no. conserved^c; %) | 31 (6^d; 19.4) | 3 (0; 0) | 3 (0; 0) |
| No. of tandem IGSs (no. conserved^c; %) | 109 (41^e; 37.6) | 19 (8^g; 42.1) | 8 (0; 0) |
| No. of divergent IGSs (no. conserved^c; %) | 63 (40^f; 63.5) | 5 (0; 0) | (3^h; 50) |

^a Median values obtained by PAML (35) among 23, 41, and 327 orthologous ORF families on cp26, lp54, and the main chromosome, respectively. The total number of sequences in individual ORF families is 22 for those on the main chromosome and 23 for those on lp54 and cp26.

^b Includes only IGSs with an alignment length of 30 bases or more.

^c With nucleotide substitution rates obtained by Rates4site (63) significantly lower (P < 0.001 by t test) than those of flanking third-codon sites.

^d bb0004-bb0005, bb0364-bb0365, bb0459-bb0460, bb0536-bb0537, bb0688-bb0689, bb0758-bb0759.

^e bb0034-bb0035, bb0057-bb0058, bb0089-bb0090, bb0146-bb0147, bb0163-bb0164, bb0172-bb0173, bb0208-bb0209, bb0219-bb0220, bb0247-bb0248, bb0250-bb0251, bb0255-bb0256, bb0278-bb0279, bb0328-bb0329, bb0339-bb0340, bb0347-bb0348, bb0380-bb0381, bb0381-bb0382, bb0389-bb0390, bb0390-bb0391, bb0430-bb0431, bb0434-bb0435, bb0539-bb0540, bb0542-bb0543, bb0567-bb0568, bb0584-bb0585, bb0603-bb0604, bb0608-bb0610, bb0642-bb0643, bb0647-bb0648, bb0671-bb0672, bb0679-bb0680, bb0693-bb0694, bb0715-bb0716, bb0726-bb0727, bb0744-bb0745, bb0755-bb0756, bb0770-bb0771, bb0773-bb0774, bb0776-bb0777, bb0808-bb0809, bb0830-bb0831.

^f bb0007-bb0008, bb0023-bb0024, bb0045-bb0046, bb0100-bb0101, bb0133-bb0134, bb0135-bb0136, bb0154-bb0155, bb0190-bb0192, bb0201-bb0202, bb0214-bb0215, bb0226-bb0227, bb0236-bb0237, bb0253-bb0254, bb0313-bb0314, bb0336-bb0337, bb0346-bb0347, bb0365-bb0366, bb0373-bb0374, bb0400-bb0401, bb0436-bb0437, bb0454-bb0455, bb0457-bb0458, bb0460-bb0461, bb0507-bb0508, bb0560-bb0561, bb0571-bb0572, bb0596-bb0597, bb0598-bb0599, bb0620-bb0621, bb0623-bb0624, bb0629-bb0630, bb0655-bb0656, bb0706-bb0707, bb0723-bb0724, bb0734-bb0735, bb0748-bb0749, bb0760-bb0761, bb0812-bb0814, bb0828-bb0829, bb0835-bb0836.

^g bba14-bba15, bba16-bba18, bba21-bba23, bba24-bba25, bba39-bba40, bba51-bba52, bba64-bba65, bba65-bba66.

^h bbb08-bbb09, bbb25-bbb26, bbb27-bbb28.


Power analysis. (i) Synonymous tree lengths.

Synonymous tree lengths (TS) of ORFs are the key parameter for determining the statistical significance of sequence variability of flanking IGSs (see Materials and Methods, equations 1 and 2). TS and nonsynonymous tree lengths (TN) were obtained for 23, 41, and 327 IGS-flanking ORF families on cp26, lp54, and the main chromosome, respectively. Median tree length values are listed in Table 1, since the tree lengths are not normally distributed and many outliers exist. The outliers include high TS values at 0256 (rpsU, encoding ribosomal protein S21), at b10, b12, and b13 (three plasmid-partitioning genes on cp26), and at b19 (ospC) and high TN values at a24 (dbpA) and b19 (ospC). An earlier study using between-species comparisons found a similar group of outliers (33). That chromosomal ORFs have smaller TS values than plasmid-borne ORFs has more to do with higher effective recombination rates caused by diversifying natural selection on the plasmids than with the unsequenced chromosome of strain 297 (34). The TN/TS ratios indicate that ORFs on the main chromosome and cp26 are about twice as conserved as ORFs on lp54, consistent with a previous study based on pairwise comparisons between B31 and another strain (30).

(ii) Plasmid-borne elements are more easily resolved.

Using the median TS value of 1.5 (Table 1) as the expected number of neutral substitutions per site for IGSs on the main chromosome during the evolutionary diversification among the 22 Borrelia genomes, levels of statistical significance of an IGS segment (with a length [L] of 5, 10, 15, or 20 bases) showing n = 0 to 10 substitutions are plotted according to equation 1 in Materials and Methods (Fig. 1A). These results show that one may not expect comparative analysis of these genomes to reveal functional chromosomal IGS elements shorter than five bases (Fig. 1, the “L=5” line). Regulatory sequences with a length of 10 bases having 0 to 3 variable sites would be marginally significant (Fig. 1, the “L=10” line). With a higher median TS value of 1.85 for sequences on the plasmids, shorter and more variable regulatory sequences could be significantly detected using these Borrelia genomes (Fig. 1B).
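The significance levels described above can be illustrated with a short sketch. Equation 1 itself is given in Materials and Methods and is not reproduced in this section, so the code below rests on an assumption: that a neutral site becomes variable with probability 1 − exp(−T), where T is the neutral tree length, and that the number of variable sites in an L-mer is binomial. The function names are illustrative only; the cumulative sum plays the role of the R function pbinom mentioned in the legend to Fig. 1.

```python
from math import comb, exp


def substitution_prob(tree_length: float) -> float:
    """Probability that a neutral site shows at least one substitution,
    assuming substitutions accumulate as a Poisson process along the tree
    (an assumption of this sketch, not a quotation of equation 1)."""
    return 1.0 - exp(-tree_length)


def conservation_pvalue(length: int, max_subs: int, tree_length: float) -> float:
    """P(X <= max_subs) for X ~ Binomial(length, p): the chance that a
    neutrally evolving L-mer shows max_subs or fewer variable sites."""
    p = substitution_prob(tree_length)
    return sum(
        comb(length, k) * p**k * (1.0 - p) ** (length - k)
        for k in range(max_subs + 1)
    )


if __name__ == "__main__":
    # Tabulate P values for the segment lengths discussed in the text,
    # using the chromosomal median tree length of 1.5 from Table 1.
    for seg_len in (5, 10, 15, 20):
        row = [conservation_pvalue(seg_len, n, 1.5) for n in range(0, 4)]
        print(seg_len, ["%.2e" % v for v in row])
```

Under this assumption, a perfectly conserved 5-mer at T = 1.5 falls just below P = 0.001 while a 4-mer does not, which is in line with the five-base limit stated above.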
FIG 1 

Statistical power of phylogenomic footprinting. (A and B) Each data point represents the probability (y axis, in −log10) of an L-mer IGS segment having n substitutions (x axis) after evolving with an expected neutral distance of T0. These probabilities were calculated according to equation 1 in Materials and Methods and obtained using the R function pbinom (58). (A) Probabilities for IGSs on the main chromosome, with the neutral distance T0 approximated by TS = 1.5 substitutions/site (Table 1); (B) probabilities for IGSs on the plasmids, with the neutral distance T0 approximated by TS = 1.85 substitutions/site (Table 1). These two plots show that the statistical power of identifying regulatory elements using phylogenomic footprinting increases with the length of the element (L), the degree of its sequence conservation (n), and the total neutral divergence among the genomes (T). (C) Phylogenetic tree of neutrally evolved IGS sequences (each 10,199 bp long) simulated by Evolver (35) with parameters taken from a typical plasmid-borne gene (a39, with TS = 1.85, %GC = 21.3%, and a transition-to-transversion ratio of 3.66). (D) Sensitivity of statistical power (y axis, calculated by equation 2 in Materials and Methods) to phylogenetic diversity (x axis, measured by TS). Vertical gray lines indicate subtree distances from B31 up to a labeled strain.

TABLE 2 

Predicted regulatory elements

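| IGS^a | Orientation | Predicted elements (ncRNA^b, promoter^c, terminator^d) |
|---|---|---|
| a01-a03 | Tandem | + |
| a03-a04 | Convergent | + |
| a05-a07 | Tandem | ++ |
| a07-a08 | Divergent | + |
| a14-a15 | Tandem | + |
| a16-a18 | Tandem | +++ |
| a21-a23 | Tandem | + |
| a25-a30 | Divergent | + |
| a34-a36 | Divergent | ++ |
| a37-a38 | Tandem | ++ |
| a61-a62 | Divergent | + |
| a62-a64 | Convergent | + |
| a64-a65 | Tandem | + |
| a73-a74 | Divergent | + |
| b03-b04 | Tandem | ++ |
| b04-b05 | Divergent | +++ |
| B12-B13 | Tandem | + |
| B13-B14 | Convergent | + |
| b16-b17 | Convergent | ++ |
| B18-b19 | Divergent | +++ |
| b19-b22 | Convergent | ++(2) |
| b28-b29 | Tandem | ++ |
| b29-b01 | Tandem | ++ |
| 0089-0090 | Tandem | + |
| 0100-0101 | Divergent | + |
| 0103-0104 | Tandem | + |
| 0135-0136 | Divergent | + |
| 0146-0147 | Tandem | + |
| 0166-0167 | Divergent | + |
| 0190-0192 | Divergent | + |
| 0195-0196 | Divergent | + |
| 0214-0215 | Divergent | + |
| 0236-0237 | Divergent | + |
| 0239-0240 | Divergent | + |
| 0243-0244 | Convergent | + |
| 0247-0248 | Tandem | + |
| 0253-0254 | Divergent | + |
| 0327-0328 | Divergent | + |
| 0346-0347 | Divergent | ++ |
| 0364-0365 | Convergent | + |
| 0384-0385 | Tandem | + |
| 0385-0386 | Tandem | + |
| 0408-0409 | Divergent | + |
| 0421-0422 | Tandem | ++ |
| 0434-0435 | Tandem | + |
| 0436-0437 | Divergent | ++ |
| 0437-0438 | Tandem | + |
| 0460-0461 | Divergent | ++ |
| 0472-0473 | Tandem | + |
| 0536-0537 | Convergent | + |
| 0543-0544 | Tandem | + |
| 0551-0552 | Divergent | + |
| 0571-0572 | Divergent | + |
| 0574-0575 | Tandem | + |
| 0577-0578^e | Tandem | + |
| 0596-0597 | Divergent | + |
| 0602-0603 | Convergent | + |
| 0603-0604 | Tandem | +++ |
| 0608-0610 | Tandem | + |
| 0620-0621 | Divergent | + |
| 0676-0677 | Divergent | + |
| 0723-0724 | Divergent | + |
| 0744-0745 | Tandem | + |
| 0772-0773 | Tandem | + |
| 0775-0776 | Divergent | + |
| 0828-0829 | Divergent | + |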

^a Including IGSs on chromosome, lp54, and cp26 that are ≥30 nucleotides and present in at least seven of the eight B. burgdorferi sensu lato species.

^b Presence (+; n = 28) of a conserved RNA structure predicted by RNAz (61). Sequences are available in Table S3 in the supplemental material.

^c Presence (+; n = 39) of a conserved promoter predicted by PromPredict (51). Sequences are available in Table S4 in the supplemental material.

^d Presence (+; n = 19) of a conserved transcription terminator predicted by TransTermHP (52). Sequences are available in Table S5 in the supplemental material.

^e 0577-0578 contains DsrA, a small ncRNA that regulates rpoS expression. It is not identified here due to an overlap with the 3′ end of 0577 (49).

(iii) A need for divergent genomes.

We simulated the evolution of plasmid-borne IGS sequences from the 23 genomes under neutral conditions using EVOLVER (35). A tree (Fig. 1C) was inferred, and synonymous subtree lengths (Fig. 1D, gray vertical lines) were obtained at various levels of phylogenetic divergence. The analysis shows that even long (L = 20 bp) functional IGS elements would not be resolvable if using only the genomes of B. burgdorferi sensu stricto and its closest relative, SV1 (Fig. 1D, “SV1” line). The plot further suggests that elements shorter than 5 bases would not be resolved at a false discovery rate smaller than a P value of 0.001 even with additional genomes. Nevertheless, sequencing more genomes from phylogenetically distinct B. burgdorferi sensu lato lineages would be the most cost-effective for the identification of candidate functional elements using phylogenomics (Fig. 1C and D). In North and South America, divergent B. burgdorferi sensu lato species not represented by the present genome data set include Borrelia carolinensis, Borrelia kurtenbachii, Borrelia californiensis, an unnamed species (“geno-species 2”), Borrelia americana, Borrelia andersonii, and Borrelia chilensis (13, 36). In Eurasia, B. burgdorferi sensu lato species highly divergent from those in the present study include Borrelia sinica, Borrelia yangtze, Borrelia tanukii, Borrelia japonica, Borrelia lusitaniae, and Borrelia turdi (13, 36). Comparison among distantly related genomes, however, introduces its own problems, such as an inability to identify species-specific regulatory elements since such elements evolved recently and are not conserved across all genomes (37).
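The dependence of resolving power on phylogenetic divergence can be sketched by inverting the question: how much neutral divergence is needed before a perfectly conserved element of a given length becomes significant? The sketch below is not equation 2 from Materials and Methods. It assumes, for illustration only, that a neutral site stays unchanged with probability exp(−T), so that a perfectly conserved L-mer has probability exp(−T·L); the function name and the default threshold are hypothetical choices.

```python
from math import log


def min_tree_length(length: int, alpha: float = 0.001) -> float:
    """Smallest neutral tree length T (substitutions/site) at which a
    perfectly conserved L-mer reaches P <= alpha, assuming
    P = exp(-T * length). Solving exp(-T * L) = alpha gives T = -ln(alpha) / L."""
    if length <= 0:
        raise ValueError("length must be a positive number of bases")
    return -log(alpha) / length


if __name__ == "__main__":
    for seg_len in (5, 10, 15, 20):
        print(seg_len, round(min_tree_length(seg_len), 3))
```

Under this simplified assumption, a perfectly conserved 5-mer requires a tree length of roughly 1.38 substitutions per site, just under the chromosomal median of 1.5, while longer elements become detectable at proportionally smaller divergence.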

Borrelia IGSs are enriched in conserved elements. (i) Conserved IGSs.

Conserved IGSs were identified as those with significantly low (by t tests at P <0.001) nucleotide substitution rates relative to the rates at flanking third-base sites. At each IGS locus, substitution rates of IGS and ORF sites were coestimated with a concatenated alignment using Rates4site (38). For example, the b08-b09 IGS contains a higher proportion of slowly evolving sites than its flanking third-base sites (Fig. 2B). Among the three directional types of IGSs, divergent IGSs tend to contain a large number of conserved sequences (63.5% for chromosomal IGSs) while convergent IGSs have relatively few conserved sequences (19.4% for chromosomal IGSs) (Table 1). This observation is consistent with the expectation that divergent and tandem IGSs are more likely than convergent IGSs to house cis-regulatory sequences. Among the three replicons, IGSs and ORFs on the main chromosome (Fig. 2A, right-most panels) contain a higher proportion of low-rate sites and are therefore more conserved than IGSs and ORFs on the plasmids (Fig. 2A, left-most and middle panels). Relatively low evolutionary rates on the main chromosome are expected, since it has lower effective recombination rates than the plasmids (34).
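The comparison of IGS rates with flanking third-base rates can be sketched as a two-sample test on per-site rates. The study used t tests on rates estimated by Rates4site; the sketch below swaps in a one-sided permutation test on the difference in means so that it needs only the Python standard library. The function name, the number of permutations, and the toy rate values in the usage note are all illustrative assumptions.

```python
import random
from statistics import mean


def permutation_pvalue(igs_rates, flank_rates, n_perm=2000, seed=0):
    """One-sided permutation P value for the hypothesis that IGS sites
    evolve more slowly than flanking third-codon sites.

    The statistic is mean(IGS) - mean(flank); labels are reshuffled
    n_perm times and we count how often the shuffled statistic is at
    least as low as the observed one."""
    rng = random.Random(seed)
    observed = mean(igs_rates) - mean(flank_rates)
    pooled = list(igs_rates) + list(flank_rates)
    n_igs = len(igs_rates)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_igs]) - mean(pooled[n_igs:])
        if diff <= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

As a usage example, passing ten low made-up rates (around 0.1) for the IGS and ten higher made-up rates (around 1.0) for the flanking sites yields a small P value, whereas swapping the two arguments yields a P value near 1.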
FIG 2 

Frequency distributions of base substitution rates. (A) Normalized base substitution rates (x axis), obtained by using concatenated IGS-ORF alignments and calculated by Rates4site (38), are distributed similarly among the three types of IGSs (top three rows) and the third-base sites (bottom row). Chromosomal sequences (right column) are more conserved than plasmid-borne sequences (left and middle columns). (B) Substitution rates of a conserved divergent IGS (middle panel) consist of a significantly higher (P = 5.6e−06, by a Wilcoxon rank sum test) density of low-rate sites than its flanking third-base sites (left and right panels).


(ii) PCIBs.

We identified a total of 935 and 276 perfectly conserved intergenic blocks (PCIBs) with a minimal length of six nucleotides on the main chromosome and the plasmids, respectively. These PCIBs occur within 125 nucleotides upstream or downstream of an ORF. The total lengths of these ORF-flanking PCIBs on the main chromosome and the two plasmids are, respectively, 26,417 and 10,199 bases, or 43.3% and 29.8% of the selected IGS sequences on the B31 genome. The comparable numbers from randomly permuted IGS alignments are 27.4% and 10.6%, respectively, indicating that B. burgdorferi IGSs are about 1.5 times and 3.0 times as enriched in conserved sequence blocks as expected by chance on the main chromosome and the plasmids, respectively. While the observed PCIBs outnumber those in shuffled alignments in nearly every length category, those on the main chromosome (P = 8.1e−12 by a one-tailed t test; Fig. 3A) are not as significant as those on the plasmids (P = 7.3e−15; Fig. 3B). Such deficiency in enrichment or lack of significance of conserved sequences on the main chromosome, however, does not necessarily imply that chromosomal IGSs harbor a proportionally smaller number of regulatory sequences. Rather, these deficiencies reflect a relative lack of statistical power for distinguishing functional IGS elements from neutrally evolving sequences on the main chromosome, which has a lower level of overall phylogenetic divergence than the plasmids (Fig. 1, Table 1). Sequencing the genomes of additional B. burgdorferi species is therefore expected to increase the resolving power of phylogenomics toward revealing shorter and more reliable functional IGS elements on the main chromosome as well as on the plasmids.
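The PCIB definition used here (no nucleotide variation, no alignment gaps, minimum length of six nucleotides) can be expressed as a scan over alignment columns. The following is a minimal sketch, not the authors' pipeline: it assumes the alignment is supplied as a list of equal-length strings with "-" marking gaps, and the function name is hypothetical.

```python
def find_pcibs(alignment, min_len=6):
    """Return half-open (start, end) column ranges in which every sequence
    carries the same nucleotide and no sequence has a gap, keeping only
    runs of at least min_len columns."""
    if not alignment:
        return []
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    blocks = []
    start = None  # first column of the conserved run currently being extended
    for col in range(n_cols):
        column = {seq[col] for seq in alignment}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col
        elif not conserved and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    # Close a run that extends to the last column.
    if start is not None and n_cols - start >= min_len:
        blocks.append((start, n_cols))
    return blocks
```

Running the same function on column-shuffled copies of an alignment would give permutation-based counts of the kind the observed PCIBs are compared against.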
FIG 3 

Observed and predicted counts of perfectly conserved intergenic blocks (PCIBs) on the chromosome (A) and plasmids (B) (note the scale difference of the y axis). A PCIB has no nucleotide variations or alignment gaps. The minimum length of a PCIB is six nucleotides. “Observed,” length distribution of 935 PCIBs on the main chromosome and 276 PCIBs on the plasmids; “permuted,” counts of L-mer PCIBs from 10 rounds of permutations of original IGS alignments; “simulated,” PCIB counts from simulated sequences using Evolver (35); “expected,” PCIB counts obtained by equation 2 in Materials and Methods. Solid triangles represent L-mers having significantly higher counts than permuted counts (P < 0.001 by one-tailed t tests).


(iii) The binomial model is robust to substitution models.

PCIB counts obtained from simulated IGS sequences using genes (0457 and a39) having median TS values are not significantly different from the counts obtained by equation 2 in Materials and Methods (P = 0.1017 and P = 0.0924, respectively, by paired t tests). The close match between the counts from realistically simulated sequences and counts from the simplest sequence evolution model indicates that the analytical model is robust to variations in the value of evolution parameters, such as unequal base frequencies, bias in transitions to transversions, and rate heterogeneities among sites. This result is consistent with the original binomial model, which similarly was shown through simulations to be robust to models of base substitutions (2). Counts from the analytical model and the simulations, however, deviate greatly from permutation-based counts (Fig. 3A and B). This large discrepancy is likely due to the fact that individual IGS sequences vary greatly in TS, while the simulation and analytical results are based on a single TS value, and the analytical model is sensitive to the TS value (Fig. 1D).

RpoS-dependent promoter regions are conserved. (i) cis-regulatory sequences of ospC.

On the cp26 plasmid, ospC is directly regulated by RpoS through its binding to a cis-acting promoter sequence (39–41). The −35/−10 promoter sequence of ospC is indeed highly conserved and contains PCIBs among the B. burgdorferi sensu lato species. Notably, the functionally critical C and T at −15 and −14, respectively, are constant among the genomes (Fig. 4B). Our comparative analysis thus supports the functional importance of the RpoS recognition sequence. Further upstream of the RpoS recognition site, two sets of inverted repeats (IRs) function as operators for post-invasion repression of ospC (42–45). These IRs were not necessary for ospC induction in trans-complementation experiments but may be required for cis induction of ospC (19, 39, 41, 46). Additionally, RNA structural analysis using RNAz showed that IRs in ospC promoters of all B. burgdorferi sensu lato species form stable secondary structures, although their sequences are not conserved between the species (Fig. 4B). This finding corroborates the suggestion in an earlier study which highlighted the functional significance of these IRs’ secondary structures (46).
FIG 4 

Conserved RpoS-dependent promoter sequences. RpoS recognition sites, ribosome-binding sites, and inverted repeats are conserved among B. burgdorferi sensu lato species in the promoters of six genes known to be upregulated during transmission from the tick to mammalian hosts. A new putative RpoS consensus sequence is derived (inset). The inverted repeats upstream of ospC are conserved in secondary structure but not in primary sequence.


(ii) Newly identified putative RpoS-dependent promoters.

Ten genes in B31 have been identified through a combination of genetic manipulations and quantitative PCR as being absolutely dependent on RpoS for their expression, including ospC on cp26 and bba07, bba25 (dbpB)-bba24 (dbpA), bba34 (oppA5), and bba66 on lp54 (27). The putative RpoS-dependent promoter regions, consisting of −35 and −10 promoter sequences upstream of ospC, a07, a25, and a34, are indeed highly conserved across the B. burgdorferi sensu lato species (Fig. 4A to D). In addition, the two inverted repeats upstream of a25 (dbpB) are perfectly conserved across these species (Fig. 4C). Using a customized RpoS motif-searching script (see Materials and Methods), we identified two additional putative RpoS-dependent promoters upstream of a36 and a73 (Fig. 4D and E), both of which encode lipoproteins that are highly upregulated in the presence of RpoS (27). All these putative RpoS recognition sites are highly conserved among the orthologous IGS sequences but vary considerably among the coregulated genes, suggesting differential binding affinity to RpoS. The WebLogo analysis showed significant nucleotide conservation at the −10 and −35 sites, while the intervening region between these two sites varies in sequence as well as in length (Fig. 4F).

Putative noncoding RNAs and coregulated genes.

Gene regulation is often associated with cis-acting sequences and trans-acting proteins that cooperatively affect the function of RNA polymerase. A number of studies have identified functional cis- and trans-acting elements that are critical to the regulation of virulence genes in B. burgdorferi and other pathogens (19, 41–43, 46). Although PCIBs include RpoS recognition and other known cis-regulatory sequences, regulatory sequences are not necessarily perfectly conserved among the species and even less so among the coregulated genes. For example, the RpoS recognition sequences varied considerably among coregulated genes (Fig. 4). To further identify putative regulatory sequences and coregulated genes, we performed a stand-alone NCBI-BLAST (47) search for statistically significant matches among the IGS sequences on the B. burgdorferi B31 core genome. Close to 1,610 matches were identified using an E value cutoff of 0.01. Under the assumption that regulatory sequences are highly conserved among orthologs, we retained only BLAST hits that had an average between-species sequence identity of 90% or more for both the query and subject sequences and that occurred within 125 nucleotides of flanking genes. We also removed BLAST matches with query or subject sequences located in regions with more than 10% gapped alignment sites. A total of 393 unique BLAST matches remained after these conservation-based filtering procedures, comprising 40 self-matching palindromic sequences and 353 other sequences. These 393 BLAST hits are likely to be regulatory sequences, because they are not only highly conserved between species but also either self-matching palindromes or similarly oriented with respect to their downstream ORFs.
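The two conservation filters described above (an average between-species identity of at least 90% and no more than 10% gapped alignment sites) can be illustrated with a minimal sketch. The function names are hypothetical, and the way identity is computed here (pairwise, over columns where neither sequence has a gap) is an assumption, since the text does not specify the exact calculation.

```python
from itertools import combinations

def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical bases over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)

def mean_identity(alignment: list[str]) -> float:
    """Average pairwise identity over all pairs of aligned sequences (needs >= 2 rows)."""
    scores = [pairwise_identity(a, b) for a, b in combinations(alignment, 2)]
    return sum(scores) / len(scores)

def gapped_fraction(alignment: list[str]) -> float:
    """Fraction of alignment columns that contain at least one gap character."""
    columns = list(zip(*alignment))
    return sum("-" in column for column in columns) / len(columns)

def is_conserved(alignment: list[str],
                 min_identity: float = 0.90,
                 max_gapped: float = 0.10) -> bool:
    """Apply both thresholds quoted in the text to one aligned segment."""
    return (mean_identity(alignment) >= min_identity
            and gapped_fraction(alignment) <= max_gapped)

# Toy alignment: one representative sequence per species (made-up data).
example = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA"]
print(is_conserved(example))
```

The sketch expects one representative sequence per species, so that the average reflects between-species rather than within-species identity.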

(i) Coregulated genes.

Table S2 in the supplemental material lists four examples of putative coregulated gene sets that are supported by multiple lines of evidence, including (i) cross-species sequence conservation of the shared IGS elements, (ii) approximately equal distances from the downstream genes, and (iii) similar biological functions of downstream genes. The 5′ ACATTTAAAATA 3′ motif shared between a07 and a73 may contribute to their RpoS-mediated upregulation (27). A shared 15-base 5′ ATCTTATAATATAAT 3′ motif upstream of a15 (ospA) and b05 (chbA) hints at possible coregulation of these genes. The 21-base-long, G-rich motif shared between a62 (lp6.6) and a74 (oms28) may contribute to their RpoS-mediated repression in mammals, in addition to the conserved T-rich motif (27). It is equally interesting to note that the same 13-base sequence, 5′ ACTTTACTTTTTT 3′, resides at similar locations upstream of a18 and b10, the first of three plasmid-partitioning genes on lp54 and cp26, respectively.

(ii) Noncoding RNAs.

Of the 40 identified palindromic sequences, 28 were predicted by RNAz (RNA class probability of 0.9 or greater) to form stable ncRNA secondary structures (see Table S3 in the supplemental material). Functional annotations of these putative ncRNAs inferred using the Infernal software (48) included clustered regularly interspaced short palindromic repeat (CRISPR)-RNA direct repeat elements, a novel finding in B. burgdorferi. The functional significance of the longest putative ncRNAs is further highlighted by the compensatory changes within the stem regions and the high variability within the loop regions (Fig. 5). Note that the 0577-0578 IGS overlaps with DsrA, a small ncRNA that is responsible for temperature-dependent production of RpoS (49). DsrA is not identified by our methods because it overlaps the 3′-end sequence of 0577 by 29 nucleotides. Nevertheless, the alignment of 0577-0578 IGS sequences (not shown) reveals a high degree of sequence conservation across the eight species in the region that binds to the rpoS transcript, except that a TTAAA tandem repeat (previously known as the TAAAT repeat [50]) varies from 3 to 7 copies among the B. burgdorferi sensu stricto and Borrelia finlandensis strains. This genetic variation appears to be evolutionarily recent and specific to the aforementioned lineages, since there is no length variation in the same region among other species.
FIG 5 

Predicted secondary structures of highly conserved putative ncRNAs. Structures of these eight longest inverted repeats (IRs) were predicted using RNAz (61) and plotted with B31 sequences using Varna (62). Arrows point to variations in the indicated strains. The Rfam accessions and annotations based on searches using Infernal (48) are as follows: IR0146-0147-RF00082, small RNA G (SraG); IR0243-0244-RF02152, long noncoding RNA (MINT_2); IR0434-0435-RF00074, pre-miRNA (mir-29); IR0346-0347-RF01350, CRISPR direct repeat element (CRISPR-DR41); IR0385-0386-RF01379, CRISPR direct repeat element (CRISPR-DR66); IR0602-0603-RF02066, bacterial small RNAs (STnc320); IRa37-a38, RF02058-bacterial small RNAs (STnc400); and IRb03-b04, RF00741-pre-miRNA (mir-378). Structures of another six long conserved IRs in Borrelia (IRb04-b05, IRb12-b13, IRb29-b01, IRa16-a18, IRa21-a23, and IRa34-a36) have been published earlier (32, 33).

(iii) Promoters and terminators.

The IGS sequences were tested for the presence of promoters and transcription terminators (see Materials and Methods). Overall, 57 such putative elements within 125 nucleotides of the flanking ORFs were identified, including 39 predicted promoters (including the well-studied ospC promoter) and 19 putative transcription terminators (see Tables S4 and S5 in the supplemental material). Note that these predicted IGS terminators do not include an intragenic terminator that is a part of the bmpB (bb_0382) coding sequence (50). The alignment of bmpB sequences (not shown) displays an absence of nucleotide substitutions in the terminator region across all eight species, except at two opposite positions of the stem region. These two sites show an A-T pairing in six species, two compensatory changes resulting in a G-C pairing in B. burgdorferi sensu stricto, and one substitution resulting in a G-T mismatch in Borrelia bissettii DN127. All variations at these two sites are synonymous. Strong sequence conservation and compensatory substitutions support the functional importance of this and other terminators as a mechanism for regulating differential expression of cotranscribed genes in Borrelia (50).

Concluding remarks.

The present study is the first systematic search of gene regulatory elements in B. burgdorferi using a large number of genomes. Previous efforts were either based on a limited number of genomes (32) or used plasmid sequences only (33). The phylogenomic search identified a large number of candidate cis-regulatory (Table 2, Fig. 4) and trans-regulatory (Fig. 5) elements that are highly conserved among B. burgdorferi sensu lato species. We caution, however, that regulatory elements may not be conserved in primary sequences. For example, the RpoS-binding sites span across a variable region (Fig. 4F). The inverted repeats upstream of ospC are conserved in the secondary structure but not in the primary sequence (Fig. 4B). Computational approaches not based on sequence conservation, such as PromPredict (51) and TransTermHP (52), are therefore valuable complementary tools for predicting regulatory elements. To aid future computational and experimental characterization of the genome-wide regulatory network in B. burgdorferi sensu lato, we released all IGS alignments on BorreliaBase.org, a publicly accessible online database of orthologous ORFs and IGSs in Borrelia (53). The website will be periodically updated to include newly released Borrelia genomes.

Predicted regulatory elements

Including IGSs on chromosome, lp54, and cp26 that are ≥30 nucleotides and present in at least seven of the eight B. burgdorferi sensu lato species.
Presence (+; n = 28) of a conserved RNA structure predicted by RNAz (61). Sequences are available in Table S3 in the supplemental material.
Presence (+; n = 39) of a conserved promoter predicted by PromPredict (51). Sequences are available in Table S4 in the supplemental material.
Presence (+; n = 19) of a conserved transcription terminator predicted by TransTermHP (52). Sequences are available in Table S5 in the supplemental material.
0577-0578 contains DsrA, a small ncRNA that regulates rpoS expression. It is not identified here due to an overlap with the 3′ end of 0577 (49).

The statistical approach we developed here based on an earlier model (2) suggests that genome sequences from additional B. burgdorferi sensu lato species are needed to identify IGS elements shorter than five bases and to further reduce false discovery rates, especially for those on the main chromosome (Fig. 1 and 3). The GERP++ tool, which similarly estimates statistical significance using empirically calculated neutral substitution rates, identifies putatively functional elements conserved among vertebrates in a more automated fashion (5). In the future, one may consider adapting the GERP++ approach to identify functional IGS elements in bacterial genomes by using synonymous tree lengths as neutral expectations. For now, we expect the proposed statistical framework to be helpful for estimating the false discovery rates of conserved IGS sequences as well as for determining the number of genomes (and at what phylogenetic levels) necessary for achieving a certain level of statistical significance in Borrelia and other bacterial species.

MATERIALS AND METHODS

Identification of orthologous IGSs. (i) Consensus start positions.

We used these orthologous ORFs as anchors for identifying orthologous IGS sequences based on the assumption that IGS sequences are orthologous if they are flanked by orthologous ORFs (11). A major problem in IGS identification is the inconsistent start codon positions among the orthologous ORFs, each of which had been predicted independently by the program Glimmer3 (54). In fact, one important rationale for sequencing multiple genomes of a single species or species group is to improve the prediction of genes and their start codon positions (55). To minimize the erroneous mixing of true IGSs and sequences that may in fact be a part of ORFs, we identified a consensus start codon position for each orthologous ORF family based on the majority of predicted start codon positions among (but not within) B. burgdorferi sensu lato species. After the identification of a consensus start codon position for each orthologous ORF family, we used a customized Perl script based on BioPerl (56) to extract orthologous IGSs. IGS sequences were aligned directly with MUSCLE (57), while flanking ORF sequences were aligned according to the MUSCLE alignment of translated protein sequences. IGS loci were categorized into three types based on their orientation relative to the transcription directions of their two flanking ORFs: a “divergent” IGS is located at the 5′ ends of both flanking ORFs, a “tandem” IGS at the 5′ end of one of the two flanking ORFs and the 3′ end of the other flanking ORF, and a “convergent” IGS at the 3′ ends of both flanking ORFs.
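The majority-vote start position and the three IGS categories can be illustrated with a short sketch. This is not the authors' Perl/BioPerl script. The function names, the strand encoding ('+' or '-', with the left ORF at lower coordinates), and the use of one alignment coordinate per species are all assumptions made for the example.

```python
from collections import Counter

def consensus_start(starts_by_species: dict[str, int]) -> int:
    """Most common predicted start position, with one vote per species.

    Positions are assumed to be coordinates in a shared alignment.
    Ties resolve to the position seen first in the input.
    """
    counts = Counter(starts_by_species.values())
    return counts.most_common(1)[0][0]

def classify_igs(left_strand: str, right_strand: str) -> str:
    """Classify an IGS from the strands of its left and right flanking ORFs."""
    if left_strand not in "+-" or right_strand not in "+-" \
            or len(left_strand) != 1 or len(right_strand) != 1:
        raise ValueError("strand must be '+' or '-'")
    if left_strand == "-" and right_strand == "+":
        return "divergent"   # IGS sits at the 5' ends of both ORFs
    if left_strand == "+" and right_strand == "-":
        return "convergent"  # IGS sits at the 3' ends of both ORFs
    return "tandem"          # 5' end of one ORF, 3' end of the other

# Made-up start predictions for one ORF family in three species.
print(consensus_start({"sp1": 10, "sp2": 10, "sp3": 13}))
print(classify_igs("-", "+"))
```

A left ORF on the reverse strand is transcribed away from the IGS, so its 5′ end abuts the IGS, which is why the ('-', '+') pair is the divergent case.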

(ii) Filtering by length.

In identifying reliable IGSs, we used only long (≥150-base) orthologous ORFs and those that are present in at least seven of the eight genome-sequenced B. burgdorferi sensu lato species. ORFs with limited phylogenetic presence are likely to be erroneously predicted, and their flanking IGSs were excluded from analysis. We further excluded IGS loci with an alignment length of 30 or fewer bases. Short IGSs are likely to be between genes that are cotranscribed (e.g., part of an operon) and thus lacking regulatory elements.
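The locus-level filters in this paragraph reduce to three thresholds, sketched below. The record types and field names are hypothetical; only the cutoffs (ORFs of at least 150 bases, presence in at least seven species, IGS alignments longer than 30 bases) come from the text.

```python
from dataclasses import dataclass

MIN_ORF_LENGTH = 150       # bases; shorter ORFs are not used as anchors
MIN_SPECIES = 7            # out of the eight sequenced species
MAX_EXCLUDED_IGS_LEN = 30  # alignments of 30 or fewer bases are dropped

@dataclass
class FlankingOrf:
    length: int            # ORF length in bases
    species_present: int   # number of species carrying an ortholog

@dataclass
class IgsLocus:
    name: str
    alignment_length: int  # columns in the IGS alignment
    left: FlankingOrf
    right: FlankingOrf

def orf_is_reliable(orf: FlankingOrf) -> bool:
    """An anchor ORF must be long enough and phylogenetically widespread."""
    return orf.length >= MIN_ORF_LENGTH and orf.species_present >= MIN_SPECIES

def keep_locus(locus: IgsLocus) -> bool:
    """Keep an IGS only if both anchors are reliable and the IGS is not short."""
    return (orf_is_reliable(locus.left)
            and orf_is_reliable(locus.right)
            and locus.alignment_length > MAX_EXCLUDED_IGS_LEN)

# Made-up loci for illustration.
loci = [
    IgsLocus("igs_a", 120, FlankingOrf(900, 8), FlankingOrf(450, 7)),
    IgsLocus("igs_b", 25, FlankingOrf(900, 8), FlankingOrf(450, 8)),
]
print([locus.name for locus in loci if keep_locus(locus)])
```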

Identification of conserved IGS elements. (i) Substitution rates.

We identified evolutionarily conserved IGSs by coestimating per-site nucleotide substitution rates for an IGS with its two flanking ORFs. At each IGS locus, alignments of the IGS sequences and two flanking ORF sequences were concatenated using a customized Perl script. Nucleotide substitution rates were subsequently estimated using Rate4Site with the HKY model and 16 discrete categories (38). Customized Perl scripts were then used to extract per-site substitution rates at the IGS as well as at the first, second, and third codon positions of the flanking ORFs. Conserved IGS sequences were identified as those having significantly lower substitution rates than the flanking third codon positions by t tests or Wilcoxon rank sum tests (nonparametric equivalent of t test) in an R statistical environment (58).

(ii) Power analysis.

The statistical power of detecting conserved noncoding sequences using phylogenetic footprinting increases with the length of conserved elements, the number of genomes, and the evolutionary distance. In a hypothetical, simplified case of a sequence with a length of L nucleotides evolving under the Jukes-Cantor model and using a group of N equally related genomes, each of which deviates from an ancestral sequence by a distance of D substitutions, the probability of false positives (FP; i.e., selectively neutral elements misidentified as conserved sequences) is given by a cumulative binomial function over k, where k is the number of base substitutions and C is the threshold number of base changes below which a sequence is considered conserved (2). For genomes related not by a star phylogeny, one may consider a Poisson model in which the probability that a nucleotide remains identical after evolving with an expected number of substitutions given by the neutral tree length T0 is $e^{-T_0}$. Using the synonymous tree length (T) of the flanking ORFs to approximate the neutral tree length (T0), the statistical significance of deviation from the neutral expectation of an IGS displaying n substitutions is given by

$$P(k \le n) = \sum_{k=0}^{n} \binom{L}{k} \left(1 - e^{-T}\right)^{k} \left(e^{-T}\right)^{L-k} \qquad (1)$$

The number n can be estimated either by the number of variable sites or by the total tree length of the IGS itself. In the special case of n = 0 (i.e., an absence of base substitution) in such an L-mer sequence,

$$FP = e^{-TL} \qquad (2)$$

This FP discovery rate decreases with increasing L, thus defining the minimum length of functional L-mer conserved elements that can be identified given a set of genomes with a total phylogenetic diversity measured by T. We used the CODEML program of the PAML (version 4.8) package (35) to obtain the synonymous tree length (T) and the nonsynonymous tree length for each flanking ORF family. The R pbinom function was used to obtain the cumulative binomial probabilities (58).
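The binomial model can be made concrete with a small numerical sketch. It assumes, following the Poisson argument above, that each site remains unchanged with probability exp(−T), so that the number of substituted sites in an L-mer is binomial with per-site probability 1 − exp(−T). The function names and the 0.05 significance level are illustrative choices, not values taken from the study.

```python
import math

def conservation_p_value(length: int, n_subs: int, tree_length: float) -> float:
    """P(X <= n_subs) for X ~ Binomial(length, 1 - exp(-tree_length)).

    Small values mean the segment shows fewer substitutions than expected
    under neutral evolution along a tree of the given total length.
    """
    p_change = 1.0 - math.exp(-tree_length)
    return sum(
        math.comb(length, k) * p_change**k * (1.0 - p_change) ** (length - k)
        for k in range(n_subs + 1)
    )

def min_detectable_length(tree_length: float, alpha: float = 0.05) -> int:
    """Smallest L for which a perfectly conserved L-mer has exp(-T*L) < alpha."""
    if tree_length <= 0:
        raise ValueError("tree_length must be positive")
    # exp(-T*L) < alpha  <=>  L > -ln(alpha) / T
    return math.floor(-math.log(alpha) / tree_length) + 1

# With no substitutions the cumulative sum collapses to exp(-T*L).
print(conservation_p_value(length=10, n_subs=0, tree_length=0.5))
print(min_detectable_length(tree_length=1.0))  # → 3
```

In R, the same cumulative probability corresponds to `pbinom(n, L, 1 - exp(-T))`. The second function shows why a larger total tree length lowers the shortest perfectly conserved element that can reach significance.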

(iii) Validation by simulations.

We tested the validity of the T-based binomial model with simulated sequences generated by the Evolver program of the PAML (version 4.8) package (35). Option 5 of Evolver simulates the evolution of noncoding nucleotide sequences given user-specified phylogeny, length of sequences, total tree length, and a nucleotide substitution model. For the phylogeny and total tree length, we used those estimated by the CODEML program of PAML for a typical ORF (with a T close to the median) on the main chromosome or plasmid. For the nucleotide substitution model, we used an HKY model with base frequencies, transition-to-transversion ratio (κ), and rate heterogeneity parameters (α and γ) estimated by CODEML for the same ORF. Counts of perfectly conserved L-mer sequences were compared with the analytically predicted counts (equation 2) as a test of the validity as well as the robustness of the analytical model, which is based on the Jukes-Cantor model, the simplest of nucleotide substitution models.

(iv) Permutation test.

Statistical significance of conserved IGS sequences was also empirically estimated by permuting IGS alignments (11). We used customized Perl scripts to permute each IGS alignment 10 times and extracted all ungapped segments 6 bases or longer that were perfectly conserved among all sequenced genomes. The numbers of occurrences of L-mer perfectly conserved IGS blocks (PCIBs) were then compared with the observed numbers with one-tailed t tests. L-mers that are significantly more numerous than permuted counts have relatively low false-positive discovery rates.
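A minimal sketch of the counting step follows. It assumes that "permuting an IGS alignment" means shuffling whole alignment columns, which preserves per-site variation while destroying positional clustering; the text does not spell out the permutation scheme, so this is an assumption, as are the function names. The t test comparing observed and permuted counts is omitted.

```python
import random

def conserved_block_lengths(alignment: list[str], min_len: int = 6) -> list[int]:
    """Lengths of maximal runs of gap-free columns identical in every sequence."""
    lengths = []
    run = 0
    for column in zip(*alignment):
        if "-" not in column and len(set(column)) == 1:
            run += 1
        else:
            if run >= min_len:
                lengths.append(run)
            run = 0
    if run >= min_len:  # close a run that reaches the end of the alignment
        lengths.append(run)
    return lengths

def permute_columns(alignment: list[str], rng: random.Random) -> list[str]:
    """Return a copy of the alignment with its columns in random order."""
    columns = list(zip(*alignment))
    rng.shuffle(columns)
    return ["".join(row) for row in zip(*columns)]

# Toy two-sequence alignment with one 7-base perfectly conserved block.
observed = ["AAAAAAATCC", "AAAAAAAGCC"]
rng = random.Random(1)  # fixed seed so the sketch is repeatable
permuted_counts = [
    len(conserved_block_lengths(permute_columns(observed, rng)))
    for _ in range(10)  # ten permutations, as in the text
]
print(len(conserved_block_lengths(observed)), permuted_counts)
```

Block counts from the permuted alignments give the null expectation against which the observed count for each L is compared.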

Prediction of regulatory IGS sequences. (i) Ribosome-binding sites, promoters, and intrinsic terminators.

To identify putative functional elements contributing to the evolutionary conservation of IGSs, we tested for the presence of ribosome-binding sites (RBS), promoters, intrinsic transcription terminators, noncoding RNAs (ncRNAs), and RpoS recognition sites. Only elements discovered within 125 nucleotides from each flanking ORF and present in all studied strains were reported. This allowed us to filter out recently pseudogenized sequences, which tend to be conserved and closer to the center of a long IGS. The RBS profile specific for B. burgdorferi sensu lato was identified using the RBSFinder algorithm with the 16S rRNA sequence of B. burgdorferi B31 as the reference and the 5′ upstream sequences of 26 ORF sequences on the cp26 plasmid of B. burgdorferi B31 as sample sequences (59). PromPredict (version 1.0), which detects differences in free energy between promoter and nonpromoter regions in bacterial genomes, was used to predict promoter sequences in individual IGS sequences (51). Promoters were reported only if they were identified in all orthologous IGS sequences. We considered known promoter regions (e.g., ospC and dbpA) as our positive controls and convergent IGS segments within consecutive ORFs as negative controls. TransTermHP (version 1.0), which identifies the pattern of a hairpin loop followed by a thymine-rich segment, was employed to predict Rho-independent transcription terminators (52). Terminators were reported only if they were identified in all orthologous IGS sequences.

(ii) RpoS recognition sites.

To identify potential RpoS recognition sites, we first used previously published RpoS binding sequences in B. burgdorferi B31 (27) as the query sequences to search among orthologous IGS sequences using NCBI-BLASTN (with the megablast and 30 no-dust options) (47). We obtained a new consensus sequence by aligning predicted RpoS sequences (one representative strain per species for each IGS locus) with MUSCLE (57). The consensus sequence of these predicted RpoS recognition sites was obtained and visualized using WebLogo (version 2.8.2) (60).

(iii) Noncoding RNAs and coregulated genes.

Based on the B31 IGS sequences, we used NCBI-BLASTN to identify similar segments with the following parameters: -task “blastn-short,” -dust 0 (no sequence filtering), -evalue 1e−5 (an expect value of 10^−5), -word_size 5 (word size 5) (47). The BLASTN protocol identified IGS segments that are either similar to each other or self-similar palindromes. We retained only IGS segments that are highly conserved, showing a 90% or higher average sequence identity between the eight B. burgdorferi sensu lato species and 10% or less gapped alignment sites. Conserved palindromes were retained as putative ncRNAs (32, 33). RNAz (version 2.1) was used to identify the conserved secondary structure of the putative ncRNA (61). The consensus secondary structure of the predicted ncRNA elements was plotted using Varna (62). Infernal (version 1.1), which implements covariance models to search DNA sequence databases for similar RNA structures and sequences, was used to infer functions of putative ncRNAs (48). Genes sharing a conserved IGS segment were considered potential members of a coregulated network (1, 3).

Genome sources. This table lists 22 Borrelia burgdorferi sensu lato genomes used in the present study, including their taxonomy, geographic origins, biological origins, GenBank accession numbers, and original references. Table S1, DOCX file, 0.2 MB

Putative pairs of coregulated genes. This table lists sequences of four pairs of IGSs, each pair containing a conserved motif shared between two downstream genes. The four candidate pairs of coregulated genes are a15 (ospA)–b05 (chbA), a07 (chpAI)–a73 (p35), a18-b10, a62 (lp6.6)–a74 (oms28). Table S2, DOCX file, 0.1 MB

Predicted ncRNA sequences. This table lists sequences of 28 highly conserved palindromic sequences predicted to be ncRNAs. Table S3, DOCX file, 0.1 MB

Predicted promoter sequences. This table lists sequences of 39 predicted promoters. Table S4, DOCX file, 0.2 MB

Predicted terminators. This table lists sequences of 18 predicted terminators. Table S5, DOCX file, 0.1 MB
  62 in total

1.  A probabilistic method for identifying start codons in bacterial genomes.

Authors:  B E Suzek; M D Ermolaeva; M Schreiber; S L Salzberg
Journal:  Bioinformatics       Date:  2001-12       Impact factor: 6.937

2.  Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues.

Authors:  Tal Pupko; Rachel E Bell; Itay Mayrose; Fabian Glaser; Nir Ben-Tal
Journal:  Bioinformatics       Date:  2002       Impact factor: 6.937

3.  The Bioperl toolkit: Perl modules for the life sciences.

Authors:  Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal:  Genome Res       Date:  2002-10       Impact factor: 9.043

4.  MUSCLE: multiple sequence alignment with high accuracy and high throughput.

Authors:  Robert C Edgar
Journal:  Nucleic Acids Res       Date:  2004-03-19       Impact factor: 16.971

Review 5.  Evolutionary genomics of Borrelia burgdorferi sensu lato: findings, hypotheses, and the rise of hybrids.

Authors:  Wei-Gang Qiu; Che L Martin
Journal:  Infect Genet Evol       Date:  2014-04-03       Impact factor: 3.342

6.  Stage-specific global alterations in the transcriptomes of Lyme disease spirochetes during tick feeding and following mammalian host adaptation.

Authors:  Radha Iyer; Melissa J Caimano; Amit Luthra; David Axline; Arianna Corona; Dumitru A Iacobas; Justin D Radolf; Ira Schwartz
Journal:  Mol Microbiol       Date:  2014-12-30       Impact factor: 3.501

7.  Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

Authors:  Paul Cliften; Priya Sudarsanam; Ashwin Desikan; Lucinda Fulton; Bob Fulton; John Majors; Robert Waterston; Barak A Cohen; Mark Johnston
Journal:  Science       Date:  2003-05-29       Impact factor: 47.728

8.  Sequencing and comparison of yeast species to identify genes and regulatory elements.

Authors:  Manolis Kellis; Nick Patterson; Matthew Endrizzi; Bruce Birren; Eric S Lander
Journal:  Nature       Date:  2003-05-15       Impact factor: 49.962

9.  Global analysis of Borrelia burgdorferi genes regulated by mammalian host-specific signals.

Authors:  Chad S Brooks; P Scott Hefty; Sarah E Jolliff; Darrin R Akins
Journal:  Infect Immun       Date:  2003-06       Impact factor: 3.441

10.  BorreliaBase: a phylogeny-centered browser of Borrelia genomes.

Authors:  Lia Di; Pedro E Pagan; Daniel Packer; Che L Martin; Saymon Akther; Girish Ramrattan; Emmanuel F Mongodin; Claire M Fraser; Steven E Schutzer; Benjamin J Luft; Sherwood R Casjens; Wei-Gang Qiu
Journal:  BMC Bioinformatics       Date:  2014-07-03       Impact factor: 3.169
