Transcriptomes of Frankia sp. strain CcI3 in growth transitions.

Derek M Bickhart, David R Benson.

Abstract

BACKGROUND: Frankia sp. strains are actinobacteria that form N2-fixing root nodules on angiosperms. Several reference genome sequences are available enabling transcriptome studies in Frankia sp. Genomes from Frankia sp. strains differ markedly in size, a consequence proposed to be associated with a high number of indigenous transposases, more than 200 of which are found in Frankia sp. strain CcI3 used in this study. Because Frankia exhibits a high degree of cell heterogeneity as a consequence of its mycelial growth pattern, its transcriptome is likely to be quite sensitive to culture age. This study focuses on the behavior of the Frankia sp. strain CcI3 transcriptome as a function of nitrogen source and culture age.
RESULTS: To study global transcription in Frankia sp. CcI3 grown under different conditions, complete transcriptomes were determined using high throughput RNA deep sequencing. Samples varied by time (five days vs. three days) and by culture conditions (NH4+ added vs. N2 fixing). Assembly of millions of reads revealed more diversity of gene expression between five-day-old and three-day-old cultures than between three-day-old cultures differing in nitrogen sources. Heat map analysis organized genes into groups that were expressed or repressed under the various conditions compared to median expression values. Twenty-one SNPs common to all three transcriptome samples were detected, indicating culture heterogeneity in this slow-growing organism. Significantly higher expression of transposase ORFs was found in the five-day and N2-fixing cultures, suggesting that N starvation and culture aging provide conditions for on-going genome modification. Transposases have previously been proposed to participate in creating the large number of gene duplications or deletions in host strains. Subsequent RT-qPCR experiments confirmed the elevated transposase expression levels indicated by the mRNA-seq data.
CONCLUSIONS: The overall pattern of gene expression in aging cultures of CcI3 suggests significant cell heterogeneity even during normal growth on ammonia. The detection of abundant transcription of nif (nitrogen fixation) genes likely reflects the presence of anaerobic, N-depleted microsites in the growing mycelium of the culture, and the presence of significantly elevated transposase transcription during starvation indicates the continuing evolution of the Frankia sp. strain CcI3 genome, even in culture, especially under stressed conditions. These studies also sound a cautionary note when comparing the transcriptomes of Frankia grown in root nodules, where cell heterogeneity would be expected to be quite high.

Year:  2011        PMID: 21867524      PMCID: PMC3188489          DOI: 10.1186/1471-2180-11-192

Source DB:  PubMed          Journal:  BMC Microbiol        ISSN: 1471-2180            Impact factor:   3.605


Background

Studies on actinorhizal symbioses have benefitted greatly from several genome sequences of the actinobacterial symbiont Frankia sp. strains. Such strains induce root nodules and fix N2 in a broad array of plants [1]. The smallest frankial genome finished to date is that of Frankia sp. HFPCcI3 (CcI3), which infects plants of the family Casuarinaceae; it is about 5.4 Mbp in size and encodes 4499 CDS [2]. A striking feature of the CcI3 genome is the presence of over 200 transposase genes or gene remnants that may play, or have played, a role in genome plasticity [3]. In addition, relative to other Frankia sp. genomes that have been sequenced, CcI3 contains few gene duplicates [2]. Comparative genome studies suggest that evolution has favored gene deletion rather than duplication in this strain, perhaps as an outcome of its symbiotic focus on a single, geographically limited group of plants in the Casuarinaceae [2].

Transcriptome sequencing of bacterial genomes has revealed surprising complexity (for a review see [4]). Such studies have shown differential cistron transcription within operons [5], small regulatory RNA transcripts [6-9] and numerous riboswitch controlled transcripts [10,11]. Significant transcriptional heterogeneity has also been found in single cultures and has been ascribed to subpopulations within an otherwise synchronized bacterial population [12]. High throughput RNA-seq methods provide a tool for transcript quantification with a much higher dynamic range than that provided by microarray studies by relying on direct comparison of transcript abundance for assessing differential expression [13]. Frankia transcriptome studies have the potential to reveal common genes and pathways active in, or essential to, symbiosis and free-living growth. A first step to resolving symbiotic-specific expression is to gain insight into transcriptional behavior and variability in axenic culture.

This work helps address the issue of cultural heterogeneity that will likely be exacerbated by physiological heterogeneity in symbiosis. A previous transcriptome study used whole-genome microarrays in Alnus and Myrica root nodules with cultured Frankia alni strain ACN14a as a reference [14]. In that study, relatively few surprises were encountered and the overall transcription profile was similar in both nodule types. We focus here on an approach using transcriptome deep sequencing of cultured Frankia strain CcI3 grown under different conditions, and the analysis of subsequent data to provide insight into the global expression that may impinge on physiology and genome stability in Frankia strains.

Results and Discussion

Culture characteristics and experimental design

As a consequence of its filamentous growth habit, Frankia sp. strain CcI3 grows from hyphal tips with an initial doubling time of about 18 hrs that subsequently slows to more linear growth [15]. As tips extend, cells left behind are physiologically in stationary phase and eventually senesce. Thus, even young cultures (defined here as three days old) have a degree of physiological heterogeneity that increases as cultures age [16]. This heterogeneity must be taken into account in interpreting global transcriptome analyses. Several factors in our sampling and library creation may influence a transcriptome analysis. Single Frankia cultures were used in preparing RNA libraries for each sample prior to sequencing. In addition, each sample was run on the Illumina GA IIx sequencer without technical replicates. Technical and biological replicates would have eliminated two potential sources of variability in the results of this experiment, but the literature is divided on their impact: several studies have suggested that both types of variability are unlikely to influence end results [13,17], whereas other studies have found significant variation among replicate samples [18,19]. Such effects may only influence low RPKM value genes [20] but, as with many such studies, our results must be viewed in the light of many potential variables.

RNA sample quality and features

RNA preparations used for making dscDNA libraries for Illumina sequencing had 260/280 ratios greater than 2.0 and concentrations ranging from 400 to 950 ng per μl. PCR amplification using primers for the glnA gene failed to yield an amplicon from RNA preparations, indicating very low, if any, DNA contamination. In addition, an RT-PCR assay revealed no detectable DNA within total RNA samples prepared in a separate experiment, confirming that the RNA extraction technique is suitable for sensitive RNA-based experiments that use strain CcI3. Transcriptome sequencing done using 5dNH4 CcI3 cells yielded about six million reads, three million of which could be mapped to the Frankia sp. CcI3 genome (Table 1). Almost 51% of the mapped reads were from rRNA or tRNA (Table 1). An updated base-calling algorithm (RTA v. 1.6) yielded substantially more reads for samples from 3dNH4 and 3dN2 cultures. About 26 million reads were obtained for the latter samples, with about 16 million mapped reads in each (Table 1). Non-coding RNAs represented a greater proportion of mapped reads in these two samples, comprising nearly 80% of the total.
Table 1

Dataset statistics

Feature | 5dNH4 (#ORFs/#Readsǂ) | 3dNH4 (#ORFs/#Readsǂ) | 3dN2 (#ORFs/#Readsǂ)
rRNA/tRNA | 65/1,401,120 | 65/12,799,049 | 64/13,524,803
mRNA | 4,491/1,322,139 | 4,544/2,813,063 | 4,544/2,945,205
hypothetical | 1,355/307,027 | 1,363/547,196 | 1,363/634,786
pseudogenes | 49/8,882 | 49/31,566 | 49/44,989
transposases | 135/24,528 | 137/62,484 | 137/87,928
phage proteins | 26/12,564 | 26/17,292 | 26/25,218
CRISPRs | 9/6,553 | 9/8,926 | 9/12,702

ǂ Includes reads that mapped ambiguously. Ambiguous reads were only counted once.

Even after ribosomal RNA depletion, non-coding sequences formed the majority of reads in all samples, with the greatest reduction seen in the 5dNH4 sample (Table 1). This lower relative amount of rRNA could be related to the reduction of rRNA in older cultures, as observed in stationary and death phase cultures of E. coli [21]. On the other hand, given the concentration dependence of the rRNA depletion method used in preparing the mRNA-seq libraries, a decrease in the proportion of rRNA in the five-day time point could have resulted from more efficient depletion. Incomplete depletion of rRNA populations is similar to what is observed in other studies and is related to the sheer abundance of such sequences [22]. The number of coding RNA reads was similar among all three samples although the read length for the 3dNH4 and 3dN2 samples was 76 versus 34 for 5dNH4. All of the pseudogenes present in the CcI3 genome had transcripts in at least two of the three samples (Table 1). Pseudogene transcription is presently not believed to be a rare event [23], though many pseudogenes identified in a bacterial genome may simply be misannotated ORFs.
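The rRNA/tRNA proportions discussed above can be recomputed from the read counts in Table 1. The following is a minimal sketch, assuming that the mapped reads of each sample are the sum of the rRNA/tRNA and mRNA rows (the remaining rows are treated here as subsets of the mRNA row); the variable and function names are illustrative and do not come from the original analysis.

```python
# Read counts copied from Table 1 (rRNA/tRNA and mRNA rows only).
READS = {
    "5dNH4": {"rrna_trna": 1_401_120, "mrna": 1_322_139},
    "3dNH4": {"rrna_trna": 12_799_049, "mrna": 2_813_063},
    "3dN2": {"rrna_trna": 13_524_803, "mrna": 2_945_205},
}


def noncoding_fraction(counts):
    """Return the fraction of mapped reads assigned to rRNA/tRNA.

    Assumption: mapped reads = rRNA/tRNA reads + mRNA reads.
    """
    total = counts["rrna_trna"] + counts["mrna"]
    return counts["rrna_trna"] / total


fractions = {sample: noncoding_fraction(c) for sample, c in READS.items()}

for sample, frac in fractions.items():
    print(f"{sample}: {frac:.1%} of mapped reads are rRNA/tRNA")
```

Under this assumption the five-day sample comes out at roughly half non-coding reads and the two three-day samples at roughly four fifths, in line with the proportions quoted in the text.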

Functional Pathways

The 100 genes with the highest RPKM value in each condition, omitting ribosomal RNAs, are listed in Table 2. The number of hypothetical genes in this group ranges from 29 in the 3dNH4 cells to 39 in the 3dN2 cells to 43 in the 5dNH4 cells. Older cultures had more transcripts associated with tRNAs, transposases, CRISPR elements, integrases and hypothetical proteins than did younger cultures. Indeed, had they been included in the list, 18 of the 46 tRNA genes in CcI3 would have been in the top 100 most abundant transcript populations in 5dNH4 cells whereas no tRNAs were found in the top 100 transcripts in 3dN2 or 3dNH4 cell populations. The picture painted by the abundance of such transcripts is one of cells starved for essential metabolites such as amino acids, as expected in aging cells. In addition, enzymes involved in repairing oxidative damage (e.g. protein-methionine-S-oxide reductase) were also more abundant in the older culture. Conversely, enzymes involved in catabolism (e.g. alcohol dehydrogenase) were more frequently represented in the two younger cultures.
Table 2

The top 100 highly expressed coding ORFs predicted by RPKM values

3dNH4 (1) | Locus tag | RPKM (2) | 3dN2 | Locus tag | RPKM | 5dNH4 | Locus tag | RPKM
heat shock protein Hsp20 | Francci3_1179 | 10755 | heat shock protein Hsp20 | Francci3_1179 | 3553 | hypothetical protein | Francci3_1017 | 4967
aldehyde dehydrogenase | Francci3_2944 | 7165 | aldehyde dehydrogenase | Francci3_2944 | 3152 | heat shock protein Hsp20 | Francci3_1179 | 2077
chaperonin GroEL | Francci3_4398 | 5923 | hypothetical protein | Francci3_1545 | 2327 | hypothetical protein | Francci3_3999 | 1926
cold-shock DNA-binding | Francci3_0260 | 5495 | transposase IS66 | Francci3_1864 | 2261 | transposase IS66 | Francci3_1864 | 1801
OsmC-like protein | Francci3_4465 | 5490 | hypothetical protein | Francci3_2178 | 1993 | polysaccharide deacetylase | Francci3_0165 | 1616
co-chaperonin GroES | Francci3_0632 | 5362 | response regulator receiver | Francci3_0120 | 1823 | hypothetical protein | Francci3_2101 | 1596
Hemerythrin HHE cation | Francci3_1066 | 4392 | Hemerythrin HHE cation | Francci3_1066 | 1807 | phage integrase | Francci3_4274 | 1451
hypothetical protein | Francci3_1545 | 4225 | hypothetical protein | Francci3_1936 | 1789 | Radical SAM | Francci3_1753 | 1392
NAD/NADP transhydrogenase | Francci3_2947 | 3226 | OsmC-like protein | Francci3_4465 | 1777 | hypothetical protein | Francci3_2241 | 1333
UspA | Francci3_2760 | 3221 | hypothetical protein | Francci3_3999 | 1614 | hypothetical protein | Francci3_2890 | 1265
hypothetical protein | Francci3_3494 | 3190 | cold-shock DNA-binding | Francci3_0260 | 1592 | phosphoribosyl-ATP phosphatase | Francci3_4317 | 1245
hypothetical protein | Francci3_2178 | 3071 | sigma 54 modulation | Francci3_0764 | 1574 | hypothetical protein | Francci3_0159 | 1184
sigma 54 modulation protein | Francci3_0764 | 3004 | cold-shock DNA-binding | Francci3_4469 | 1458 | ribonuclease HII | Francci3_3588 | 1161
cold-shock DNA-binding | Francci3_4469 | 2949 | putative DNA-binding | Francci3_1949 | 1392 | GDP-mannose 4,6-dehydratase | Francci3_1307 | 1134
Alcohol dehydrogenase | Francci3_2945 | 2916 | LuxR family regulator | Francci3_0765 | 1361 | hypothetical protein | Francci3_4023 | 1122
putative Lsr2-like protein | Francci3_3498 | 2659 | chaperonin GroEL | Francci3_4398 | 1199 | major facilitator superfamily | Francci3_2289 | 1122
hypothetical protein | Francci3_1936 | 2577 | hypothetical protein | Francci3_4123 | 1176 | RNA-directed DNA polymerase | Francci3_2318 | 1088
hypothetical protein | Francci3_2270 | 2529 | hypothetical protein | Francci3_3494 | 1175 | methionine-S-oxide reductase | Francci3_2268 | 1071
thioredoxin-related | Francci3_0447 | 2355 | hypothetical protein | Francci3_2269 | 1174 | HypA | Francci3_1937 | 1047
SsgA | Francci3_3418 | 2154 | transcriptional regulator | Francci3_4255 | 1167 | acyltransferase 3 | Francci3_2337 | 987
luciferase-like | Francci3_2761 | 2117 | co-chaperonin GroES | Francci3_0632 | 1150 | hypothetical protein | Francci3_3302 | 982
molecular chaperone DnaK | Francci3_4352 | 2036 | hypothetical protein | Francci3_2442 | 1117 | Serine acetyltransferase-like | Francci3_3842 | 970
globin | Francci3_2581 | 1935 | SsgA | Francci3_3418 | 1043 | hypothetical protein | Francci3_0227 | 970
LuxR family regulator | Francci3_0765 | 1934 | SecE subunit | Francci3_0567 | 1037 | hypothetical protein | Francci3_1719 | 965
thioredoxin reductase | Francci3_4536 | 1913 | putative Lsr2-like protein | Francci3_3498 | 1022 | hypothetical protein | Francci3_0238 | 957
Rhodanese-like | Francci3_0449 | 1881 | PEP phosphomutase | Francci3_1533 | 1005 | hypothetical protein | Francci3_2200 | 947
carbonic anhydrase | Francci3_0708 | 1859 | hypothetical protein | Francci3_2270 | 973 | hypothetical protein | Francci3_1831 | 945
superfamily MFS_1 | Francci3_2752 | 1811 | chaperone hypC/hupF | Francci3_1946 | 954 | serine/threonine kinase | Francci3_4051 | 938
hypothetical protein | Francci3_3250 | 1807 | transposase, IS4 | Francci3_3990 | 953 | signal transduction kinase | Francci3_0085 | 938
exodeoxyribonuclease III | Francci3_1180 | 1754 | thioredoxin-related | Francci3_0447 | 951 | hypothetical protein | Francci3_4019 | 922
PEP phosphomutase | Francci3_1533 | 1742 | ATP synthase F0 | Francci3_3713 | 928 | hypothetical protein | Francci3_0396 | 914
STAS (anti-σ factor antagonist) | Francci3_0441 | 1728 | mannose 4,6-dehydratase | Francci3_1053 | 921 | CRISPR-associated protein | Francci3_0021 | 899
hypothetical protein | Francci3_1935 | 1687 | phage integrase | Francci3_4338 | 919 | hypothetical protein | Francci3_0038 | 899
sigma 38 | Francci3_3505 | 1673 | protein of unknown function | Francci3_3347 | 892 | Recombinase | Francci3_3989 | 898
hypothetical protein | Francci3_0227 | 1665 | transposase, IS4 | Francci3_0391 | 878 | aldo/keto reductase | Francci3_3416 | 890
hypothetical protein | Francci3_1615 | 1634 | major facilitator MFS_1 | Francci3_2752 | 865 | transposase, IS4 | Francci3_1873 | 875
hypothetical protein | Francci3_2943 | 1629 | NAD/NADP transhydrogenase | Francci3_2947 | 863 | Excisionase/Xis, DNA-binding | Francci3_0405 | 875
hypothetical protein | Francci3_0054 | 1629 | hypothetical protein | Francci3_4084 | 855 | transposase, IS4 | Francci3_0151 | 874
transposase IS66 | Francci3_1864 | 1625 | hypothetical protein | Francci3_2380 | 839 | CRISPR-associated protein | Francci3_0020 | 869
transcriptional regulator, CarD | Francci3_4255 | 1596 | hypothetical protein | Francci3_4114 | 821 | CRISPR-associated protein | Francci3_3345 | 863
alanine dehydrogenase/PNT-like | Francci3_2946 | 1532 | Alcohol dehydrogenase | Francci3_2945 | 796 | glycosyl transferase | Francci3_3318 | 859
serine phosphatase | Francci3_3249 | 1453 | hypothetical protein | Francci3_3791 | 782 | metallophosphoesterase | Francci3_1990 | 839
chaperonin GroEL | Francci3_0633 | 1439 | acyl-CoA dehydrogenase | Francci3_1000 | 781 | hypothetical protein | Francci3_3339 | 837
hypothetical protein | Francci3_0949 | 1437 | transcriptional regulator | Francci3_3081 | 780 | transcriptional regulator | Francci3_3081 | 834
transcription factor WhiB | Francci3_3759 | 1430 | hypothetical protein | Francci3_0037 | 779 | hypothetical protein | Francci3_3317 | 826
fatty acid desaturase, type 2 | Francci3_0307 | 1430 | Amino acid adenylation | Francci3_2461 | 777 | hypothetical protein | Francci3_4072 | 824
STAS | Francci3_4302 | 1405 | hypothetical protein | Francci3_1615 | 775 | transcriptional regulator | Francci3_0908 | 816
Heavy metal transport protein | Francci3_0489 | 1368 | hypothetical protein | Francci3_2179 | 775 | hypothetical protein | Francci3_4129 | 809
sigma-24 | Francci3_3768 | 1353 | hypothetical protein | Francci3_1534 | 773 | transposase, IS4 | Francci3_4227 | 803
transcriptional regulator, TetR | Francci3_2758 | 1349 | hypothetical protein | Francci3_2329 | 767 | Antibiotic biosynthesis | Francci3_0875 | 800
hypothetical protein | Francci3_3417 | 1343 | carbonic anhydrase | Francci3_0708 | 764 | hypothetical protein | Francci3_3336 | 796
SecE subunit | Francci3_0567 | 1339 | transcription factor WhiB | Francci3_3759 | 751 | hypothetical protein | Francci3_2440 | 781
Excisionase/Xis, DNA-binding | Francci3_0099 | 1327 | UspA | Francci3_2760 | 747 | hypothetical protein | Francci3_4509 | 778
hypothetical protein | Francci3_3791 | 1315 | exodeoxyribonuclease III | Francci3_1180 | 747 | putative copper resistance | Francci3_2497 | 771
ATP synthase F0, A subunit | Francci3_3713 | 1263 | hypothetical protein | Francci3_1832 | 737 | transcriptional regulator | Francci3_0210 | 765
30S ribosomal protein S1 | Francci3_1057 | 1256 | protein of unknown function | Francci3_2628 | 714 | hypothetical protein | Francci3_1090 | 764
heat shock protein Hsp20 | Francci3_2174 | 1241 | hypothetical protein | Francci3_4509 | 714 | hypothetical protein | Francci3_4156 | 760
NAD(P) transhydrogenase, beta | Francci3_2948 | 1231 | hypothetical protein | Francci3_1650 | 709 | RNA-binding S4 | Francci3_3479 | 747
putative transcriptional regulator | Francci3_1674 | 1218 | STAS | Francci3_0441 | 701 | hypothetical protein | Francci3_1545 | 746
protein of unknown function | Francci3_0450 | 1215 | molecular chaperone DnaK | Francci3_4352 | 694 | hypothetical protein | Francci3_3238 | 746
Alcohol dehydrogenase | Francci3_1544 | 1206 | hypothetical protein | Francci3_0159 | 693 | hypothetical protein | Francci3_3301 | 737
putative DNA-binding protein | Francci3_1949 | 1203 | acyl transferase region | Francci3_0991 | 691 | hypothetical protein | Francci3_1985 | 724
glutaredoxin 2 | Francci3_0483 | 1202 | regulatory protein GntR | Francci3_3218 | 690 | Rhodanese-like | Francci3_2753 | 721
translation elongation factor Tu | Francci3_0580 | 1179 | CRISPR-associated protein | Francci3_3346 | 680 | Thiolase | Francci3_2502 | 718
thioredoxin | Francci3_4537 | 1165 | hypothetical protein | Francci3_1874 | 678 | response regulator receiver | Francci3_0120 | 715
cytochrome P450 | Francci3_4464 | 1164 | hypothetical protein | Francci3_1935 | 672 | hypothetical protein | Francci3_0498 | 705
hypothetical protein | Francci3_2582 | 1156 | IS630 family transposase | Francci3_1872 | 670 | DNA polymerase III subunit alpha | Francci3_4168 | 703
hypothetical protein | Francci3_1534 | 1106 | globin | Francci3_2581 | 663 | hypothetical protein | Francci3_0037 | 693
protein of unknown function | Francci3_1406 | 1054 | hypothetical protein | Francci3_4127 | 657 | hypothetical protein | Francci3_3241 | 684
Vesicle-fusing ATPase | Francci3_2630 | 1041 | thioredoxin | Francci3_4537 | 653 | 30S ribosomal protein S6 | Francci3_4522 | 683
HesB/YadR/YfhF | Francci3_3121 | 1032 | hypothetical protein | Francci3_0066 | 644 | putative hydrolase | Francci3_2567 | 682
hypothetical protein | Francci3_0532 | 1022 | Alcohol dehydrogenase | Francci3_1544 | 644 | transposase IS116/IS110 | Francci3_2124 | 681
acyl transferase region | Francci3_0991 | 1015 | hypothetical protein | Francci3_2440 | 642 | hypothetical protein | Francci3_1807 | 675
Superoxide dismutase | Francci3_2817 | 1013 | Tetratricopeptide TPR_4 | Francci3_1951 | 639 | hypothetical protein | Francci3_1805 | 675
hypothetical protein | Francci3_2185 | 1007 | hypothetical protein | Francci3_0227 | 635 | hypothetical protein | Francci3_2364 | 675
hypothetical protein | Francci3_4343 | 1006 | hypothetical protein | Francci3_2315 | 634 | hypothetical protein | Francci3_2380 | 671
serine/threonine kinase | Francci3_4051 | 989 | hypothetical protein | Francci3_4019 | 633 | response regulator receiver | Francci3_4048 | 670
acyl-CoA dehydrogenase | Francci3_1000 | 989 | hypothetical protein | Francci3_0949 | 633 | putative O-methyltransferase | Francci3_0204 | 670
conserved hypothetical protein | Francci3_0096 | 986 | serine phosphatase | Francci3_3249 | 632 | channel protein | Francci3_3898 | 669
hypothetical protein | Francci3_3886 | 983 | Amino acid adenylation | Francci3_2459 | 632 | hypothetical protein | Francci3_2032 | 667
Rhodanese-like | Francci3_2753 | 982 | transposase IS116/IS110 | Francci3_2124 | 630 | hypothetical protein | Francci3_1459 | 664
hypothetical protein | Francci3_4042 | 973 | hypothetical protein | Francci3_3417 | 628 | flavoprotein | Francci3_1816 | 662
hypothetical protein | Francci3_3999 | 971 | Antibiotic biosynthesis | Francci3_0875 | 626 | hypothetical protein | Francci3_0160 | 660
protein of unknown function | Francci3_2628 | 958 | protein of unknown function | Francci3_1406 | 621 | AMP-dependent synthetase | Francci3_1806 | 659
LuxR family regulator | Francci3_3253 | 958 | hypothetical protein | Francci3_3247 | 621 | serine/threonine protein kinase | Francci3_3395 | 659
50S ribosomal protein L24 | Francci3_0593 | 944 | hypothetical protein | Francci3_2943 | 620 | hypothetical protein | Francci3_4161 | 655
ribosomal protein S2 | Francci3_3581 | 936 | transcription factor WhiB | Francci3_3790 | 618 | hypC/hupF | Francci3_1946 | 655
hypothetical protein | Francci3_2736 | 934 | hypothetical protein | Francci3_3997 | 618 | hypothetical protein | Francci3_0494 | 655
hypothetical protein | Francci3_2269 | 932 | transcriptional regulator | Francci3_4158 | 614 | transcriptional regulator | Francci3_0985 | 654
hypothetical protein | Francci3_2809 | 929 | hypothetical protein | Francci3_2184 | 610 | Excisionase/Xis, DNA-binding | Francci3_1856 | 653
acyl-CoA dehydrogenase-like | Francci3_0053 | 915 | hypothetical protein | Francci3_0054 | 608 | phosphohydrolase | Francci3_1134 | 648
Antibiotic biosynthesis | Francci3_0875 | 911 | CRISPR-associated protein | Francci3_0023 | 608 | SsgA | Francci3_3418 | 646
2-oxoacid oxidoreductase | Francci3_3248 | 906 | Recombinase | Francci3_2373 | 607 | major facilitator MFS_1 | Francci3_2752 | 643
translation initiation factor IF-1 | Francci3_0605 | 904 | CRISPR-associated protein | Francci3_3345 | 606 | Inorganic diphosphatase | Francci3_4310 | 636
electron transfer flavoprotein | Francci3_3659 | 889 | hypothetical protein | Francci3_2219 | 606 | hypothetical protein | Francci3_1032 | 636
hypothetical protein | Francci3_4326 | 884 | hypothetical protein | Francci3_3299 | 605 | DNA-directed RNA polymerase | Francci3_3194 | 635
50S ribosomal protein L33 | Francci3_0563 | 880 | LuxR family regulator | Francci3_3253 | 604 | chaperonin GroEL | Francci3_4398 | 635
hypothetical protein | Francci3_3625 | 856 | hypothetical protein | Francci3_2101 | 604 | UspA | Francci3_2760 | 633
Cytochrome-c oxidase | Francci3_2009 | 855 | transcriptional regulator | Francci3_1674 | 600 | Aldehyde dehydrogenase | Francci3_2944 | 632
GrpE protein | Francci3_4353 | 846 | transcriptional regulator | Francci3_0908 | 596 | hypothetical protein | Francci3_1014 | 631

1 Gene annotations and locus tag numbers are colored based on their presence in all three samples (bold), in the 3dN2 and 5dNH4 samples (italic), in the 3dN2 and 3dNH4 samples (underscore), in the 3dNH4 and 5dNH4 samples (italic/underscore), and in one of the three samples (normal font).

2 RPKM (Reads per Kilobase Million) = (# reads per ORF)/(size of ORF in kilobases × millions of reads in the dataset).
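The RPKM definition in note 2 can be written out as a short function. This is a minimal sketch, not the authors' pipeline: the function name is illustrative, the example numbers are hypothetical, and "reads in the dataset" is left as a parameter because the note does not say whether all mapped reads or only mRNA reads were counted.

```python
def rpkm(reads_on_orf, orf_length_bp, total_reads):
    """Reads per kilobase of ORF per million reads in the dataset.

    RPKM = reads_on_orf / (ORF size in kb * dataset size in millions of reads)
    """
    if orf_length_bp <= 0 or total_reads <= 0:
        raise ValueError("ORF length and total read count must be positive")
    kilobases = orf_length_bp / 1_000
    millions_of_reads = total_reads / 1_000_000
    return reads_on_orf / (kilobases * millions_of_reads)


# Hypothetical example: 1,500 reads on a 1,200 bp ORF
# in a dataset of 2,500,000 reads.
example = rpkm(reads_on_orf=1_500, orf_length_bp=1_200, total_reads=2_500_000)
print(f"RPKM = {example:.1f}")
```

Because the value is normalized by both ORF length and dataset size, it allows comparison across ORFs of different lengths and across samples sequenced to different depths, such as the 5dNH4 and three-day samples here.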

Comparison of the top 100 gene lists with each other (color coded in Table 2) and construction of heat maps of all genes revealed that overall gene expression varied more with culture age (three versus five days) than with culture condition (+/- NH4+), with the 3dNH4 and 3dN2 samples clustering together before joining the 5dNH4 sample (Figure 1). Gene dendrograms (left side of the figure) gave five clusters of genes (Groups I through V) that had within-group expression profiles consistent among the three culture conditions tested. The genes in each cluster are listed in Additional File 1: Gene_list.xls.
Figure 1

Heat map representation of pair-wise gene expression in each sample. The dendrogram at the top of the figure indicates relatedness of the three samples based on overall gene expression values. The dendrogram on the left side of the figure orders genes into groups based on the divergence of expression values among the three samples. The colors display gene expression variance: red indicates a higher gene expression, green indicates lower expression and black indicates the median value. This figure was generated using a log scale of RPKM values.

Group I genes are clearly down-regulated in 3dNH4 cells; these include 30 transporter related genes, five diguanylate cyclases and an array of putative N-controlled proteins such as assimilatory nitrate reductase, adenosine deaminase, allantoinase and nitrogen fixation (nif) genes in addition to 252 hypothetical proteins. Group II genes are upregulated in 3dN2 cultures and include most of the nif genes, genes involved in sulfur metabolism and iron-sulfur protein synthesis, cell division proteins and hydrogenase synthesis. The 3dN2 culture was prepared with a modified iron stock containing a higher concentration of iron sulphate and sodium molybdate [24]. We cannot rule out that an increase in iron-sulfur protein synthesis may be related to the increase in iron sulphate in the medium, although it is more likely to be related to an increased demand for iron and molybdenum. Eight phage integrases were also present in Group II, which was the highest number of integrases present in any of the five groups. Group III contains genes that have relatively more transcripts in 5dNH4 cells; these include a larger proportion of hypothetical protein ORFs (523 ORFs) than were present in the other four groups (average of ~200 ORFs per group). All of the annotated excisionase/Xis ORFs were present in the Group III list, suggesting that phage-related excisionases are being transcribed more in the 5dNH4 sample than in the other conditions.

Group IV genes, including several sigma factors, were more abundantly transcribed in the 3dNH4 sample; this group also had the fewest transposase ORFs (2 ORFs). Group V contains ORFs more highly expressed in younger cultures. ORFs in this grouping include 17 ribosomal protein ORFs and a majority of the glycolytic enzymes.

As expected, nif ORFs were more highly expressed in the 3dN2 sample, with numerous vesicles present, than in the 3dNH4 sample and were in Group II on the heat map. The 5dNH4 culture also had nif expression above that detected in the 3dNH4 culture. Three nif ORFs showed no significant difference in expression between the 5dNH4 and 3dNH4 samples as judged by Kal's z-test p values [25] (Table 3). On the other hand, the genes for the core nitrogenase components nitrogenase reductase (nifH), and nitrogenase alpha and beta chains (nifKD) were upregulated in the 3dN2 sample, and were cotranscribed to similar extents within individual cultures, suggesting that they exist in an operon independent from the rest of the nif cluster. An intergenic space consisting of 208 nucleotides between these three ORFs and the rest of the cluster supports this analysis. The presence of nif transcripts in all cell types, even where ammonia should still be in excess, is in concert with the heterogeneous nature of the frankial growth habit, where mycelia develop microsites that are potentially nutrient deficient or microaerobic due to adjoining cell populations. The 5dNH4 cells are most likely depleted for combined nitrogen and, indeed, a few vesicles can be observed in older cultures. This observation highlights a fundamental problem with the mRNA deep sequencing of a Frankia culture, where different cell physiologies can skew average gene expression in a culture. Isolated vesicles [26] are unlikely to give a sufficient quantity of mRNA for second generation sequencing technologies; as an alternative, long-read, single molecule sequencing techniques run in parallel could specifically sequence the transcriptome of distinct cell morphologies in a pure culture, as was recently done with Vibrio cholerae [27].
Table 3

Fold changes of nif cluster ORF expression levels1

Feature ID | Annotation | 5dNH4 vs 3dNH4 | 3dN2 vs 3dNH4 | 3dN2 vs 5dNH4
Francci3_4473 | thiamine pyrophosphate enzyme-like TPP-binding | 1.28 | 1.89 | 1.48
Francci3_4474 | pyruvate flavodoxin/ferredoxin oxidoreductase-like | 1.60 | 1.93 | 1.20
Francci3_4475 | aminotransferase, class V | 2.90 | 1.52 | 0.90
Francci3_4476 | UBA/THIF-type NAD/FAD binding fold | 1.20* | 2.08 | 1.73
Francci3_4477 | HesB/YadR/YfhF | 2.09 | 2.00 | 0.04
Francci3_4478 | nitrogenase cofactor biosynthesis protein NifB | 1.35 | 2.17 | 1.61
Francci3_4479 | NifZ | 0.54 | 1.45 | 2.23
Francci3_4480 | nitrogen fixation protein NifW | 2.49 | 2.14 | 0.16*
Francci3_4481 | protein of unknown function DUF683 | 2.81 | 1.75 | 0.61
Francci3_4482 | protein of unknown function DUF269 | 0.23* | 1.44 | 1.77
Francci3_4483 | Dinitrogenase iron-molybdenum cofactor biosynthesis | 1.82 | 2.03 | 1.12*
Francci3_4484 | nitrogenase molybdenum-iron cofactor biosynthesis protein NifN | 2.55 | 1.78 | 0.43
Francci3_4485 | nitrogenase MoFe cofactor biosynthesis protein NifE | 1.47 | 1.92 | 1.31
Francci3_4486 | nitrogenase molybdenum-iron protein beta chain | 1.16* | 2.40 | 2.08
Francci3_4487 | nitrogenase molybdenum-iron protein alpha chain | 1.62 | 2.94 | 1.82
Francci3_4488 | nitrogenase iron protein | 1.34 | 3.71 | 2.77

1 Fold changes calculated as quotients of RPKM values.

* Insignificant p value as determined by Kal's z-test.
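The note that fold changes are quotients of RPKM values can be illustrated with a few lines of code. This is only a sketch: the RPKM values are those listed for Francci3_1179 (Hsp20) in Table 2, chosen because that ORF appears in all three top-100 lists, and the function and dictionary names are illustrative rather than taken from the original analysis.

```python
# RPKM values for Francci3_1179 (heat shock protein Hsp20), copied from Table 2.
RPKM_1179 = {"3dNH4": 10755, "3dN2": 3553, "5dNH4": 2077}


def fold_change(rpkm_a, rpkm_b):
    """Fold change of sample A versus sample B as a simple quotient."""
    if rpkm_b == 0:
        raise ZeroDivisionError("reference RPKM is zero; fold change undefined")
    return rpkm_a / rpkm_b


# The same three comparisons that head the columns of Table 3.
fc = {
    "5dNH4 vs 3dNH4": fold_change(RPKM_1179["5dNH4"], RPKM_1179["3dNH4"]),
    "3dN2 vs 3dNH4": fold_change(RPKM_1179["3dN2"], RPKM_1179["3dNH4"]),
    "3dN2 vs 5dNH4": fold_change(RPKM_1179["3dN2"], RPKM_1179["5dNH4"]),
}

for comparison, value in fc.items():
    print(f"{comparison}: {value:.2f}")
```

With plain quotients the three comparisons are not independent: the "3dN2 vs 5dNH4" value equals the "3dN2 vs 3dNH4" value divided by the "5dNH4 vs 3dNH4" value.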

Insertion Sequences

Recent studies on Frankia proteomes have indicated the presence of several transposases in CcI3 grown in culture and in symbiosis [28], raising the question of how IS elements behave in cultured CcI3 cells. Given the number of transposase ORFs in the CcI3 genome (148 complete plus 53 fragments identified by PSI-BLAST analysis [2]), mRNA deep sequencing provides an efficient method of quantifying their behavior in cultures grown under different conditions. RPKM values for the transposase ORFs were plotted against the locations of IS elements in strain CcI3 (Figure 2; [3]). Additional files 2, 3, 4, 5, 6 and 7 list the calculated expression data for the transposase ORFs. Transposase transcripts were generally more abundant than the transcriptome's median RPKM value (dashed line; calculated separately for each sample) throughout the genome. The visual representation of transcript abundance in Figure 2 indicates that transposase ORFs were overall more highly expressed in older cultures and, to a lesser extent, in N2-fixing cells than in younger, nutrient-sufficient cultures. Seventy-three transposase ORFs in the 5dNH4 sample were more highly expressed with respect to the 3dNH4 sample (Figure 2; Additional file 8: SNP_call_list.xls). Only 29 transposase ORFs were shown statistically to have higher expression in 3dNH4 than in 5dNH4. A similar trend was noticed in the 3dN2 vs 3dNH4 comparison, with 91 transposase ORFs having statistically significant higher expression values in the 3dN2 sample. Many transposase ORFs had similar expression in the 3dN2 vs 3dNH4 and the 5dNH4 vs 3dNH4 comparisons. This is reflected in the z-test p values, as the 3dN2 vs 3dNH4 comparison had 50 changes with p values greater than 0.05 and the 5dNH4 versus 3dNH4 comparison had 48 changes with p values greater than 0.05. The majority of the insignificant p values in the comparisons are due to similarity of RPKM values.
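The kind of z-test on proportions referred to above can be sketched as follows. This is a generic pooled two-proportion z-test of the sort usually attributed to Kal et al.; whether it matches the exact variant and settings used for the published p values is an assumption. For illustration it is applied to the aggregate transposase and mRNA read counts of Table 1 rather than to per-ORF counts, which is also an assumption made only for this example.

```python
import math


def two_proportion_z_test(count_a, total_a, count_b, total_b):
    """Pooled two-proportion z-test; returns (z, two-sided p value)."""
    p_a = count_a / total_a
    p_b = count_b / total_b
    pooled = (count_a + count_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / std_err
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Aggregate counts from Table 1 (transposase reads, mRNA reads).
# Using whole-category totals instead of per-ORF counts is illustrative only.
z, p = two_proportion_z_test(
    count_a=87_928, total_a=2_945_205,   # 3dN2
    count_b=62_484, total_b=2_813_063,   # 3dNH4
)
print(f"z = {z:.1f}, two-sided p = {p:.3g}")
```

Because the test works on proportions of reads rather than raw counts, it can be applied to samples with different total read numbers; with counts this large, even small differences in proportion yield very small p values.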
Figure 2

Plot of transposase transcript RPKM values against previously determined transposase gene clusters. Scale on the bottom represents the genome coordinates in Mb. The red line indicates the density of transposase ORFs in a 250 kb moving window in the CcI3 genome. Blue bars indicate RPKM values of each transposase ORF in the indicated growth conditions. The dotted line indicates the median RPKM value for all ORFs within the sample. Grey boxes indicate previously determined active deletion windows [3]. An IS66 transposase transcript having an RPKM value greater than 1600 in all three samples is indicated with a broken line.

One IS66 transposase (Locus tag: Francci3_1864) near the 2 Mb region of the genome had an RPKM greater than 1600 in all samples. The majority of these reads were ambiguous. This transposase has five paralogs with greater than 99% nucleotide similarity, thereby accounting for the ambiguous reads, so the elevated RPKM, while still high, is distributed among several paralogs. Other transposase ORFs with RPKM values higher than the median were more likely to be present in CcI3 deletion windows (gray boxes [3]) as determined by a Chi Square test against the likelihood that high RPKM transposase ORFs would exist in a similar sized region of the genome at random (p value = 1.32 × 10^-7). This observation suggests that any transposase found in these windows is more likely to be transcribed at higher levels than transposases outside of these regions. The largest change in expression was found in an IS3/IS911 ORF between the 5dNH4 and 3dNH4 samples. This ORF (locus tag: Francci3_1726, near 1.12 Mb) was expressed eleven fold higher in the 5dNH4 sample than in the 3dNH4 sample. Five other IS66 ORFs were also highly expressed in 5dNH4, ranging from 4 fold to 5 fold higher expression than in the 3dNH4 sample. Eight IS4 transposases had no detected reads under the alignment conditions in any of the growth conditions. 
These eight IS4 transposases are members of a previously described group of 14 paralogs that have nearly 99% similarity in nucleic acid sequence [3]. Parameters of the sequence alignment used allowed for ten sites of ambiguity, therefore discarding reads from eight of these 14 duplicates as too ambiguous to map on the reference genome. Graphic depictions of assembled reads derived from raw CLC workbench files show that the majority of reads for the six detected IS4 transposases mapped around two regions. Both of these regions contained one nucleotide difference from the other eight identical transposases. De novo alignment of the unmapped reads from each sample resulted in a full map of the highly duplicated IS4 transposase ORFs (data not shown). More globally, the 5dNH4 and 3dN2 samples had higher RPKM values per transposase ORF than in the 3dNH4 sample. The sum of the RPKM values among the transposase data set placed the 5dNH4 sample (34350 sum RPKM) and the 3dN2 (36150 sum RPKM) each nearly 30% higher than in 3dNH4 (26916 sum RPKM). The numbers of transposase genes classified as upregulated in the heat maps in Figure 1 include 44 in 3dN2 cells, 40 in 5dNH4 cells and only two in 3dNH4 cells. Twenty-eight were down regulated in the 3dNH4 cells as shown by the heat map analysis (Additional File 8: SNP_call_list.xls). These results suggest a relative quiescence of transposase ORFs during healthy growth, and a burst of transcription when cells are stressed. Mutagenesis of genes involved in general metabolic pathways in Escherichia coli has been shown to promote earlier transposition of an IS5 family insertion sequence [29]. Media supplements to the mutated cells were shown to delay transposition events, thereby showing general starvation responses were likely involved in increased IS element activity [29]. 
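The Chi Square comparison of deletion windows against the rest of the genome can be sketched as a test on a 2 × 2 contingency table. The text does not state how the table was constructed, so the layout and the counts below are hypothetical; the sketch only shows the form of such a test (Pearson statistic, one degree of freedom, no continuity correction).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test on a 2 x 2 table (1 degree of freedom).

    Assumed table layout (counts in the example are hypothetical):
                          high RPKM   not high
        inside windows        a           b
        outside windows       c           d
    """
    n = a + b + c + d
    observed = [a, b, c, d]
    row_totals = [a + b, a + b, c + d, c + d]
    col_totals = [a + c, b + d, a + c, b + d]
    stat = 0.0
    for obs, row, col in zip(observed, row_totals, col_totals):
        expected = row * col / n
        stat += (obs - expected) ** 2 / expected
    # Survival function of the chi-square distribution with 1 df
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts of transposase ORFs, not taken from the study
stat, p = chi_square_2x2(30, 10, 40, 60)
```

With these illustrative counts the expected cell values are 20, 20, 50 and 50, giving a statistic of 14 and a p value well below 0.05.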
The expression of nif cluster genes in the 5dNH4 sample suggests that the ammonium content of the medium was depleted, or nutrient deprived microsites had developed among the mycelia. One of the highly expressed non-ribosomal ORFs is the pyrophosphohydrolase gene hisE (Francci3_4317), suggesting that the amino acid histidine is in short supply. Additionally, a serine O-acetyltransferase was highly expressed in 5dNH4 cells, indicating activity in the cysteine synthesis pathway. Higher expression of both ppx/gppA ORFs (Locus tags: Francci3_0472 and Francci3_3920) in the 5dNH4 sample suggests that the stringent response [30] is active in response to amino acid deprivation. Two ORFs annotated as (p)ppGpp synthetases (Locus tags: Francci3_1376 and Francci3_1377) were actually more highly expressed in 3dN2 and 3dNH4 cells than in 5dNH4 cells. Transcription of IS elements does not directly correlate to translation [31]. Many IS elements prevent their own transposition by requiring a -1 frame shift mutation in the transcript in order to express a functional transposase protein [32]. Since the specific methods of translational control used by Frankia IS elements are unknown, transcriptome data alone cannot be used as a proportional metric for transposition activity. On the other hand, recent proteomic studies on the CcI3 genome have confirmed that translation of many IS elements does occur in vivo and in symbiosis [16,33].

RT-qPCR confirmation of transposase transcription

Duplicated copies of highly similar transposase ORFs presented a problem in the analysis of transcript sequence data. To compare transcription frequencies of duplicated ORFs in different culture conditions, we used RT-qPCR with primers designed to amplify conserved regions in each of eight duplicated transposase ORF families. The duplicates had greater than 98% nucleotide similarity with each other. The glutamine synthetase I (glnA) gene was used to normalize expression data as previously described [34]. We included a five-day old nitrogen fixing (5dN2) condition in our assay to better estimate transposase ORF expression in two older culture conditions (5dN2 and 5dNH4). The results of the RT-qPCR assay confirmed the transcriptome sequence data (Figure 3). Comparing the five-day samples with three-day samples revealed an increase in transposase ORF transcription in older cultures in nearly all cases (Figure 3a). The only exception was the Tn3 family of transposases, where transcription was higher (fold change values less than one) at three days in both conditions. This may be due to transposition immunity described for other members of the Tn3 family [35]. Cross comparisons of NH4 and N2 samples revealed that nitrogen fixing cultures had more transposase transcripts from these duplicated families than did the ammonium cultures at both time points (Figures 3b and 3c). The most dramatic change in transcript quantity was found for the IS4 transposase transcripts in the 5dN2 sample, which were 7.4 fold higher than levels in the 3dNH4 sample. As the representative transposase ORFs chosen for the RT-qPCR analysis were families of duplicates, a direct comparison of RT-qPCR fold change to transcriptome RPKM values was difficult to make. Still, the results of this experiment confirm the general trend of transposase ORF transcription in Frankia sp. CcI3: older and nitrogen-deprived cultures had higher transcription of transposase ORFs.
Figure 3

Results of the RT-qPCR assay of highly duplicated transposase ORFs. All values indicate relative fold increase of transcription between samples standardized against glnA transcript levels. Panel A - fold changes of transcripts between five day and three day time points of cultures grown on N2 (black bars) or NH4 (gray bars). Panel B: fold changes of 5dN2 vs 3dNH4. Panel C: fold changes of 3dN2 vs 5dNH4 transposase ORFs respectively. The table (inset) indicates the copy number of duplicated transposase ORFs within each IS group as well as the locus tag of one of the representative members of that group. Error bars indicate standard error of triplicate reactions over each histogram.


Prophage and CRISPRs

ORFs with phage-related annotations were all more highly transcribed in the five-day sample with respect to both three-day samples (Table 4). Several ORFs annotated as phage integrases were expressed more than two-fold in the 5dNH4 sample when compared to the 3dNH4 sample. Comparisons of fold change among all three samples yielded many statistically insignificant differences as determined by a Kal's z-test suggesting that these ORFs are likely transcribed at similar rates regardless of culture conditions. A phage SPO1 DNA polymerase-related protein (Francci3_0075) was constitutively expressed in all three samples, and four phage resistance ORFs were up-regulated in the 5dNH4 sample. The latter include members of the pspA and pgl (Phi C31) families of phage resistance genes. Similar RPKM values between the two pgl ORFs in all three samples suggest that these ORFs are transcribed as an operon in CcI3.
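The significance calls in this section rest on Kal's z-test, which compares the proportion of reads assigned to an ORF in one sample with the proportion in another. A minimal sketch of a two-proportion z-test of that kind is given here; it is the textbook form with a pooled variance and is assumed, not confirmed, to match what the CLC software computes. The read counts are hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1, x2: reads assigned to one ORF in samples 1 and 2.
    n1, n2: total reads counted in samples 1 and 2.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided p value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical counts: 150 vs 100 reads for one ORF, one million reads per sample
z, p = two_proportion_z_test(150, 1_000_000, 100, 1_000_000)
```

Because the test works on proportions of the total, a modest fold change can be significant for a highly expressed ORF and insignificant for a weakly expressed one, which is consistent with the asterisks in Table 4 falling mostly on fold changes close to one.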
Table 4

Fold changes of phage related ORFs^1

| Feature ID | Annotation | 5dNH4 vs 3dNH4 | 3dN2 vs 3dNH4 | 3dN2 vs 5dNH4 |
|---|---|---|---|---|
| Francci3_0075 | phage SPO1 DNA polymerase-related protein | -1.02* | 1.19* | 1.21* |
| Francci3_0114 | phage integrase | -1.10* | 1.54 | 1.70 |
| Francci3_0407 | phage integrase | 1.48 | 1.23 | -1.20 |
| Francci3_0878 | phage integrase | 1.05* | 1.55 | 1.48 |
| Francci3_1095 | phage integrase | 1.46 | 1.62 | 1.11 |
| Francci3_1144 | phage integrase | 2.72 | 1.63 | -1.67 |
| Francci3_1203 | phage integrase | 1.39 | 1.66 | 1.20 |
| Francci3_1870 | phage integrase-like SAM-like | 3.05 | 1.53 | -2.00 |
| Francci3_2053 | phage integrase-like SAM-like | -1.32 | 1.83 | 2.43 |
| Francci3_2147 | phage integrase | 1.92 | 1.52 | -1.26 |
| Francci3_2228 | phage shock protein A, PspA | 2.47 | 1.43 | -1.73 |
| Francci3_2304 | phage integrase | 1.60 | -1.24* | -1.99 |
| Francci3_2344 | phage integrase | 1.59 | 1.20* | -1.32 |
| Francci3_2443 | putative phage-related terminase large subunit | 1.34 | 1.84 | 1.37 |
| Francci3_2954 | bacteriophage (phiC31) resistance gene PglY | 1.57 | 1.38 | -1.14* |
| Francci3_2955 | bacteriophage (phiC31) resistance gene PglZ | 1.47 | 1.22* | -1.21* |
| Francci3_3052 | phage integrase | 1.07* | 1.43 | 1.34 |
| Francci3_3350 | phage integrase | 1.42 | 1.74 | 1.22 |
| Francci3_3388 | phage integrase | 1.55 | 1.84 | 1.19 |
| Francci3_3390 | phage integrase | 1.89 | -1.09* | 1.73 |
| Francci3_3532 | phage integrase | 2.02 | 1.48 | -1.36 |
| Francci3_3535 | phage shock protein A, PspA | -1.98 | -1.86 | 1.06* |
| Francci3_3583 | phage integrase | -1.34 | 1.39 | 1.86 |
| Francci3_3734 | phage integrase-like SAM-like | 1.34 | 1.62 | 1.21 |
| Francci3_4274 | phage integrase | 4.52 | 1.60 | -2.83 |
| Francci3_4338 | phage integrase | -1.36 | 1.69 | 2.30 |

^1 Fold changes calculated as quotients of RPKM values.

* Insignificant p value as determined by Kal's z-test.

Negative values indicate a fold reduction of expression in the reference (later) condition.
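The signed fold-change convention used in the table can be written out as a small function. The sketch assumes that the first-named sample of each comparison is the numerator of the RPKM quotient, which is how the text reads the positive values; the RPKM inputs in the example are hypothetical.

```python
def signed_fold_change(rpkm_a, rpkm_b):
    """Fold change of sample A over sample B, with reductions reported as negative.

    A quotient of 0.5 is reported as -2.0 (a two-fold reduction in A).
    """
    if rpkm_a <= 0 or rpkm_b <= 0:
        raise ValueError("RPKM values must be positive to form a quotient")
    ratio = rpkm_a / rpkm_b
    return ratio if ratio >= 1.0 else -1.0 / ratio


# Hypothetical RPKM values for one ORF in two samples
increase = signed_fold_change(90.4, 20.0)   # higher in sample A
decrease = signed_fold_change(20.0, 40.0)   # lower in sample A
```

Under this convention no value can fall strictly between -1 and 1, so entries such as -1.02 and 1.05 both describe near-identical expression.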

CcI3 has four putative CRISPR arrays, two of which are located near clusters of CAS ORFs (data obtained from CRISPRFinder [36]). Three of the CRISPR arrays had high numbers of repeat copies (38, 15 and 20 spacers per array ordered with respect to the OriC), making alignment of ambiguous sequence reads difficult. Even the shorter 36 bp read lengths of the 5dNH4 sample could not be reliably mapped across the arrays using the CLC Genomics Workbench alignment programs. As a result, few reads mapped to the array region of the CRISPR islands and numerous deletions were predicted (Additional Files 2 through 7). The CAS ORF transcripts, by contrast, were detected in all three samples. Again, transcription was modestly higher in the 5dNH4 sample than in the 3dNH4 sample (Table 5). In this instance, the 3dN2 sample had nearly two fold higher expression of all CAS ORFs when compared with the 3dNH4 sample. Comparison of the 5dNH4 and 3dN2 samples revealed insignificant fold changes as determined by Kal's z-test.
Table 5

Fold changes of CRISPR associated ORFs^1

| Feature ID | Annotation | 5dNH4 vs 3dNH4 | 3dN2 vs 3dNH4 | 3dN2 vs 5dNH4 |
|---|---|---|---|---|
| Francci3_0017 | CRISPR-associated helicase Cas3, core | 1.31 | 1.39 | 1.06* |
| Francci3_0020 | CRISPR-associated protein, CT1975 | 2.99 | 1.63 | -1.84 |
| Francci3_0021 | CRISPR-associated protein, CT1976 | 2.79 | 1.42 | -1.96 |
| Francci3_0023 | CRISPR-associated protein Cas1 | 1.31 | 1.57 | 1.20 |
| Francci3_0024 | CRISPR-associated protein, Cas2 | 1.16 | 1.31 | 1.13* |
| Francci3_3341 | CRISPR-associated helicase Cas3, core | 1.29 | 1.35 | 1.05* |
| Francci3_3344 | CRISPR-associated protein TM1801 | 1.04* | 1.45 | 1.39 |
| Francci3_3345 | CRISPR-associated protein Cas4 | 1.97 | 1.36 | -1.44 |
| Francci3_3346 | CRISPR-associated protein Cas1 | 1.14 | 1.29 | 1.13 |

^1 Fold changes calculated as quotients of RPKM values.

* Insignificant p value as determined by Kal's z-test.

Negative values indicate a fold reduction of expression in the reference (later) condition.


SNP detection

Given the base pair resolution of RNA sequencing, it is possible to identify single nucleotide polymorphisms (SNPs). Recent analysis of the bovine milk transcriptome revealed high fidelity of SNP calls derived from an RNA-seq experiment, though the authors caution that stringent criteria are necessary to reduce false positive calls [37]. Using similar filtering criteria, we identified 215 SNPs in the 5dNH4 sample, 365 SNPs in the 3dN2 sample and 350 SNPs in the 3dNH4 sample. Comparison of the SNP populations revealed that the 5dNH4 sample had substantially different SNP calls than the 3dN2 and 3dNH4 samples. Only 21 of the putative SNPs were found in all three samples (Table 6). Twelve of these common SNPs resulted in non-synonymous amino acid changes.
Table 6

Detected SNPs present in all three samples

| Locus tag | Annotation | Position | Reference^1 | Variants^2 | Amino Acid Change |
|---|---|---|---|---|---|
| Francci3_0398 | putative DNA-binding protein | 452 | G | G/A | Arg -> Gln |
| Francci3_1612 | NLP/P60 | 356 | G | G/A | Arg -> Gln |
| | | 375 | A | A/C | Gln -> His |
| Francci3_1959 | Transposase, IS110 | 1109 | G | G/A | Gly -> Asp |
| Francci3_2025 | Transposase, IS4 | 81 | G | A/G | - |
| | | 91 | C | C/T | Arg -> Cys |
| | | 119 | T | T/C | Val -> Ala |
| Francci3_2063 | hypothetical | 310 | A | A/C | Met -> Leu |
| | | 313 | C | C/T | Pro -> Ser |
| | | 333 | C | C/T | - |
| | | 353 | A | A/G | Glu -> Gly |
| Francci3_3047 | Radical SAM | 93 | G | G/C | - |
| Francci3_3251 | putative signal transduction histidine kinase | 293 | T | C/T | Val -> Ala |
| Francci3_3418 | SsgA | 165 | C | T/C | - |
| Francci3_4082 | dnaE | 3579 | T | C/T | - |
| | | 3601 | G | G/A | Glu -> Lys |
| Francci3_4107 | Integrase | 135 | C | C/T | - |
| Francci3_4124 | Recombinase | 162 | T | T/A | - |
| | | 168 | C | T/C | - |
| Francci3_4157 | Hypothetical | 36 | C | C/T | - |
| | | 49 | A | A/G | Ser -> Gly |

^1 The nucleotide present in the reference genome sequence of Frankia sp. CcI3.

^2 The predicted allelic variants for the reference position nucleotide. The most common polymorphic nucleotide is listed first in the proportion.

There are several possibilities that may explain the variance of SNP content between the 5dNH4 sample and the two three-day samples. The age of the culture is a possible, yet unlikely, contributor to a significantly different SNP pattern. Frankia strains are maintained by bulk transfer of cells, since derivation from single colonies is problematic due to the hyphal habit of growth. Thus, over time, SNPs likely arise spontaneously. Another possibility is that errors are incorporated into the mRNA-seq libraries, resulting in false positive SNPs. The Superscript III© reverse transcriptase used in the first strand cDNA synthesis was derived from Moloney murine leukemia virus [38] and has an error rate of approximately 3.0 × 10^-5 errors per base [39]. Therefore, only SNPs detected in all three samples with high coverage and multiple variant copies were likely true positive SNPs.
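The four filtering criteria listed in the Methods (average read quality of at least 15, polymorphic base quality above 20, at least 10× coverage, variant present in 25% of reads) and the restriction to SNPs shared by all three samples can be sketched as follows. The `Observation` record, the function names, and the choice to count coverage after discarding low-quality reads are assumptions made for illustration; the SNP calling itself was done with the CLC software, whose internals are not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    """One read covering a candidate SNP position (hypothetical record)."""
    base: str                 # base called by this read at the position
    base_quality: int         # quality score of that base
    read_mean_quality: float  # average quality score of the whole read


def call_snp(reference_base, observations, min_read_quality=15,
             min_base_quality=20, min_coverage=10, min_fraction=0.25):
    """Return the set of variant bases that pass all four filters."""
    # (1) discard reads with an average quality score below 15
    kept = [o for o in observations if o.read_mean_quality >= min_read_quality]
    # (3) require at least 10x coverage (assumption: counted after step 1)
    if len(kept) < min_coverage:
        return set()
    counts = {}
    for o in kept:
        # (2) the polymorphic base itself must score above 20
        if o.base != reference_base and o.base_quality > min_base_quality:
            counts[o.base] = counts.get(o.base, 0) + 1
    # (4) the variant must be present in at least 25% of the reads
    return {b for b, c in counts.items() if c / len(kept) >= min_fraction}


def common_snps(*sample_calls):
    """Keep only the (locus, position, base) calls found in every sample."""
    return set.intersection(*(set(calls) for calls in sample_calls))


# Hypothetical pile-up: 8 reference reads and 4 variant reads, all high quality
pileup = [Observation("G", 30, 30.0)] * 8 + [Observation("A", 30, 30.0)] * 4
variants = call_snp("G", pileup)
```

Intersecting the per-sample call sets is a conservative choice: an error introduced during reverse transcription of one library is unlikely to recur at the same position in two independent libraries.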

Conclusions

We deep-sequenced dscDNA libraries derived from three culture conditions of Frankia sp. CcI3. Overall gene expression varied more as a function of culture age than as a function of nitrogen deprivation, likely because the cell population has fewer actively growing cells at the fifth day of culture and those remaining are adapting to nutrient deprivation. In two nutrient-limited environments, transposase ORFs were relatively more highly expressed than in younger ammonium grown cells. An RT-qPCR assay designed to quantify highly duplicated transposase ORFs supported the data from the mRNA-seq experiment. These results, in tandem with the discovery of putative SNPs, suggest that the IS element laden CcI3 genome is in constant flux within the relatively mundane conditions of a culture flask.

Methods

Culture media and conditions

Frozen stocks of Frankia sp. strain CcI3 were suspended in duplicate in 200 ml of Frankia Defined Minimal medium (FDM) containing 45 mM sodium pyruvate and 9.3 mM ammonium chloride in 500 ml flasks [40]. Cells were grown at 30°C for three or five days on FDM with or without (N2 fixing cells) ammonium. Nitrogen fixing cultures were prepared using a modified iron stock as previously described [24]. Given the difficulty in quantifying viable Frankia cells in culture, a total of three ml of gravity-settled cells were harvested per culture flask for RNA extraction.

RNA extraction

Frankia cells were processed using a ZR Fungal/Bacterial RNA MiniPrep™ kit from Zymo Research© (http://www.zymoresearch.com) using the manufacturer's recommendations. To completely remove genomic DNA (gDNA) contamination from the RNA extraction, we performed the in-column DNAse I optional step using Amplification grade DNAse I (Invitrogen™, http://www.invitrogen.com). DNAseI incubation times were extended to 30 minutes at 37°C in order to completely remove gDNA from the sample. A final elution volume of 15 μl of RNAse free water was used instead of the recommended 6 μl elution volume. Only RNA samples with a 260/280 nm wavelength ratio above 2.00 were used for library construction and RT-qPCR assays. In order to enrich mRNA content for generating a cDNA library, we used the MICROBExpress™ Bacterial mRNA Enrichment Kit (Ambion Inc., http://www.ambion.com). The manufacturer's website specifies that the oligonucleotide sequence used by the kit should anneal to the 16S and 23S rRNA sequences of many eubacterial species including Frankia sp. Approximately 10 μg of Frankia total RNA in each condition was processed using the kit per the manufacturer's instructions. This procedure yielded 2 - 3.75 μg of RNA after depletion for each sample. Subsequent gel analysis and sequencing data revealed substantial 16S and 23S rRNA within the sample, suggesting only partial depletion of rRNA transcripts. Samples were nonetheless prepared using the depletion kit in order to minimize variability due to differential handling in the experiment.

Complementary DNA library generation

One microgram of processed Frankia RNA was used in an Illumina mRNA-seq kit. The poly-dT pulldown of polyadenylated transcripts was omitted, and the protocol was followed beginning with the mRNA fragmentation step. A SuperscriptIII© reverse transcriptase was used instead of the recommended SuperscriptII© reverse transcriptase (Invitrogen™). This substitution was made in light of the higher G+C% of Frankia sp. transcripts (71% mol G+C) and the ability of the SuperscriptIII© transcriptase to function at temperatures greater than 45°C. Because of this substitution, the first strand cDNA synthesis stage of the protocol could be conducted at 50°C instead of 42°C. Since a second-strand cDNA synthesis was performed, the cDNA library was agnostic with respect to the strandedness of the initial mRNA. The final library volumes were 30 μl at concentrations of 40 - 80 ng/μl as determined by Nanodrop spectrophotometer.

Library clustering and Illumina platform sequencing

Prior to cluster generation, cDNA libraries were analyzed using an Agilent© 2100 Bioanalyzer (http://www.chem.agilent.com) to determine final fragment size and sample concentration. The peak fragment size was determined to be approximately 200 +/- 25 bp in length for each sample. Twenty nmoles of each cDNA library were prepared using a cluster generation kit provided by Illumina Inc. The single-read cluster generation protocol was followed. Final cluster concentrations were estimated at 100,000 clusters per tile for the five day sample and 250,000 clusters per tile for the two three day samples on each respective lane of the sequencing flow-cell. An Illumina® Genome Analyzer IIx™ was used in tandem with reagents from the SBS Sequencing kit v. 3 in order to sequence the cDNA clusters. A single end, 35 bp internal primer sequencing run was performed as per instructions provided by Illumina®. Raw sequence data were internally processed into FASTQ format files, which were then assembled against the Frankia sp. CcI3 genome [Genbank: CP000249] using the CLC Genomics Workbench™ software package distributed by CLC Bio©. Frankia sp. CcI3 has several gene duplicates, which made the alignment of the short reads corresponding to those duplicates difficult. Reads could only be mapped to highly duplicated ORFs by setting alignment conditions to allow for 10 ambiguous map sites for each read. In the case of a best hit "tie," an ambiguous read was mapped to a duplicated location at random. Without this setting, more than 20 ORFs would not have been detected by the alignment program simply due to nucleotide sequence similarity. To standardize gene expression calculations among different samples, the CLC Genomics Workbench software calculates an expression value termed "reads per kilobase per million mapped reads" (RPKM). This calculation incorporates variable gene length in the gene expression ratio, and the total number of reads obtained from a sequencing run [41]. 
The equation used to determine RPKM values is as follows: \[ \mathrm{RPKM} = \frac{10^{9} \times C}{N \times L} \] where C is the number of reads mapped to an ORF, N is the total number of mapped reads in the sample, and L is the length of the ORF in base pairs. The RPKM value allows comparisons between datasets containing variable numbers of reads as well as expression of genes with varying lengths. Because of the disparate quantities of rRNA reads among the three samples, we removed all non-coding RNA (ncRNA) reads from the data set before calculating RPKM values. This ensures that the reads from the 5dNH4 sample, which had the lowest number of ncRNA reads, were not overrepresented. Comparisons of gene expression were tested using Kal's Z-test [25]. Heat maps were generated using the Cluster 3.0 command line program (http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/software.htm). Datasets were normalized and median subtracted prior to map generation. Maps were viewed using Java Treeview [42]. Potential SNPs were filtered using the following criteria: (1) reads containing putative SNPs were discarded if they had an average quality score of less than 15; (2) the polymorphic base within the read had to have a quality score above 20; (3) at least 10× coverage of the SNP position was required; (4) the SNP had to be present in 25% of the reads at that location. Raw sequence reads and calculated RPKM values for each CcI3 ORF were uploaded to the Gene Expression Omnibus database at NCBI (http://www.ncbi.nlm.nih.gov/projects/geo) with the accession number GSE30680.
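The RPKM normalization described above can be illustrated with a short sketch. It uses the standard definition of RPKM (reads mapped to an ORF, scaled per kilobase of ORF length and per million mapped reads) and subtracts ncRNA reads from the sample total first, as the text describes. The function name and all read counts are hypothetical, and the CLC software may handle details such as ambiguous reads differently.

```python
def rpkm(reads_in_orf, orf_length_bp, total_mapped_reads):
    """Reads per kilobase of ORF per million mapped reads."""
    if orf_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("ORF length and total read count must be positive")
    return reads_in_orf * 1e9 / (orf_length_bp * total_mapped_reads)


# Hypothetical sample: 5,000,000 mapped reads, of which 3,000,000 are ncRNA
total_reads = 5_000_000
ncrna_reads = 3_000_000
coding_reads = total_reads - ncrna_reads  # ncRNA reads removed before scaling

# Hypothetical ORF: 500 reads mapped to a 1,000 bp ORF
value = rpkm(500, 1_000, coding_reads)
```

Removing the ncRNA reads before scaling matters because the denominator would otherwise be dominated by rRNA, and samples with more residual rRNA would appear to have uniformly lower mRNA expression.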

RT-qPCR assays

The nucleotide sequences for the target transposase ORFs in Frankia strain CcI3 [genbank: CP000249] were retrieved from Genbank. Primers were designed using the Primer3 webtool (http://frodo.wi.mit.edu/primer3/) with settings to generate primers with a melting temperature of ~60°C. Due to the limitations of extension time in quantitative polymerase chain reactions (qPCR), primers were designed to amplify less than 200 bp of sequence when possible. Stocks of Frankia sp. CcI3 cells were grown in four culture conditions that included two time points and two medium types. Three of the conditions mirrored those used in the mRNA-seq experiment (3dN2, 3dNH4 and 5dNH4). A fourth condition, consisting of cells grown in nitrogen fixing medium for five days (5dN2), was also used. Cells were harvested and RNA was purified in the same manner as used in the mRNA-seq experiment. Approximately one micro-gram of RNA from each sample was used in subsequent reverse transcriptase reactions. Complementary DNA was synthesized using the SuperscriptIII© reverse transcriptase with gene specific primers (~100 nM final concentrations per reaction mix). Synthesis of the first strand was carried out at 55°C for 50 minutes with a five minute denature step at 80°C. RT reactions were diluted ten-fold with sterile water after denaturation. All qPCR experiments were performed using the Bio-Rad™ SsoFast© Evagreen qPCR 2X master mix. Reaction volumes were reduced to 12.5 μl. A Bio-Rad™ iQ5 real-time thermocycler was used to quantify reactions. Antibody denaturing of the SsoFast polymerase was performed at 95°C for 1.5 minutes immediately prior to any cycling step. This was followed by one 98°C denaturation for 2 minutes. Temperature cycling consisted of the following: 35 cycles of 98°C for 10 seconds then 55°C for 15 seconds and finally 65°C for 15 seconds. 
Melt curves (to determine if there were multiple PCR amplicons) were constructed by heating final amplified reactions from 65°C to 95°C for 10 seconds in single degree stepwise fashion. Primer efficiencies were calculated from readings derived from a standard curve of known DNA concentrations. Relative expression levels of target genes were calculated using the Pfaffl standardization as previously described [34]. The glutamine synthetase I gene (glnA) was used as a reference gene to standardize relative expression in the four samples.
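The two calculations named above, primer efficiency from a standard curve and relative expression by the Pfaffl method, can be sketched as follows. The formulas are the commonly used forms (efficiency E = 10^(-1/slope); ratio = E_target^ΔCt_target / E_ref^ΔCt_ref, with ΔCt taken as control minus sample); the function names and the Ct values are hypothetical, with glnA standing in as the reference gene.

```python
def efficiency_from_slope(slope):
    """Amplification efficiency from the slope of Ct versus log10(template amount).

    A slope near -3.32 corresponds to an efficiency of 2 (perfect doubling).
    """
    return 10.0 ** (-1.0 / slope)


def pfaffl_ratio(e_target, ct_target_control, ct_target_sample,
                 e_ref, ct_ref_control, ct_ref_sample):
    """Relative expression of a target gene normalized to a reference gene."""
    target = e_target ** (ct_target_control - ct_target_sample)
    reference = e_ref ** (ct_ref_control - ct_ref_sample)
    return target / reference


# Hypothetical Ct values: transposase target and glnA reference,
# with one condition as the control and another as the sample
ratio = pfaffl_ratio(2.0, 25.0, 22.0, 2.0, 18.0, 17.0)
```

In this example the target gains three cycles (an eight-fold change) while the reference gains one (a two-fold change), so the normalized ratio is four-fold.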

Authors' contributions

DMB created the RNA-seq libraries. DMB and DRB planned the experiments, analyzed the data and wrote the manuscript. Both authors have read and approved the final manuscript.

Additional file 1

Gene lists for heatmap clusters. List of ORFs segregated as clusters from the heat map figure (Figure 1).

Additional file 2

3dN2 sample dataset statistics. Tabular output of CLC Genomics Workbench software for the 3dN2 sample.

Additional file 3

3dNH4 sample dataset statistics. Tabular output of CLC Genomics Workbench software for the 3dNH4 sample.

Additional file 4

5dNH4 sample dataset statistics. Tabular output of CLC Genomics Workbench software for the 5dNH4 sample.

Additional file 5

Pairwise comparison of three day samples. Comparison of RPKM values from the 3dNH4 and 3dN2 samples for annotated Frankia sp. strain CcI3 ORFs.

Additional file 6

Pairwise comparison of 3dN2 with 5dNH4. Comparison of RPKM values from the 5dNH4 and 3dN2 samples for annotated Frankia sp. strain CcI3 ORFs.

Additional file 7

Pairwise comparison of the two NH4 grown cells. Comparison of RPKM values from the 3dNH4 and 5dNH4 samples for annotated Frankia sp. strain CcI3 ORFs.

Additional file 8

SNP calling and filtering datasets. Excel worksheets containing raw SNP calling data from all three RNA-seq experiments.
  36 in total

1.  Riboswitches control fundamental biochemical pathways in Bacillus subtilis and other bacteria.

Authors:  Maumita Mandal; Benjamin Boese; Jeffrey E Barrick; Wade C Winkler; Ronald R Breaker
Journal:  Cell       Date:  2003-05-30       Impact factor: 41.582

Review 2.  The role of small RNAs in quorum sensing.

Authors:  Michal Bejerano-Sagie; Karina Bivar Xavier
Journal:  Curr Opin Microbiol       Date:  2007-03-26       Impact factor: 7.934

3.  A cell-free system of Tn3 transposition and transposition immunity.

Authors:  T Maekawa; K Yanagihara; E Ohtsubo
Journal:  Genes Cells       Date:  1996-11       Impact factor: 1.891

4.  Dynamics of gene expression revealed by comparison of serial analysis of gene expression transcript profiles from yeast grown on two different carbon sources.

Authors:  A J Kal; A J van Zonneveld; V Benes; M van den Berg; M G Koerkamp; K Albermann; N Strack; J M Ruijter; A Richter; B Dujon; W Ansorge; H F Tabak
Journal:  Mol Biol Cell       Date:  1999-06       Impact factor: 4.138

Review 5.  Translational frameshifting in the control of transposition in bacteria.

Authors:  M Chandler; O Fayet
Journal:  Mol Microbiol       Date:  1993-02       Impact factor: 3.501

6.  Translational control in production of transposase and in transposition of insertion sequence IS3.

Authors:  Y Sekine; N Eisaki; E Ohtsubo
Journal:  J Mol Biol       Date:  1994-02-04       Impact factor: 5.469

7.  Growth phase-coupled changes of the ribosome profile in natural isolates and laboratory strains of Escherichia coli.

Authors:  A Wada; R Mikkola; C G Kurland; A Ishihama
Journal:  J Bacteriol       Date:  2000-05       Impact factor: 3.490

8.  Isolation and nitrogenase activity of vesicles from Frankia sp. strain EAN1pec.

Authors:  L S Tisa; J C Ensign
Journal:  J Bacteriol       Date:  1987-11       Impact factor: 3.490

9.  Cloning and overexpression of Moloney murine leukemia virus reverse transcriptase in Escherichia coli.

Authors:  M L Kotewicz; J M D'Alessio; K M Driftmier; K P Blodgett; G F Gerard
Journal:  Gene       Date:  1985       Impact factor: 3.688

10.  Isolation and nitrogen-fixing activity of Frankia sp. strain CpI1 vesicles.

Authors:  N A Noridge; D R Benson
Journal:  J Bacteriol       Date:  1986-04       Impact factor: 3.490
