Shengwei Hou, Ulrike Pfreundt, Dan Miller, Ilana Berman-Frank, Wolfgang R Hess.
Abstract
Metatranscriptomic differential RNA-Seq (mdRNA-Seq) identifies the suite of active transcriptional start sites at single-nucleotide resolution through enrichment of primary transcript 5' ends. Here we analyzed the microbial community at 45 m depth at Station A in the northern Gulf of Aqaba, Red Sea, during 500 m deep mixing in February 2012 using mdRNA-Seq and a parallel classical RNA-Seq approach. We identified promoters active in situ for five different pico-planktonic genera (the SAR11 clade of Alphaproteobacteria, Synechococcus of Cyanobacteria, Euryarchaeota, Thaumarchaeota, and Micromonas as an example for picoeukaryotic algae), showing the applicability of this approach to highly diverse microbial communities. 16S rDNA quantification revealed that 24% of the analyzed community were group II marine Euryarchaeota in which we identified a highly abundant non-coding RNA, Tan1, and detected very high expression of genes encoding intrinsically disordered proteins, as well as enzymes for the synthesis of specific B vitamins, extracellular peptidases, carbohydrate-active enzymes, and transport systems. These results highlight previously unknown functions of Euryarchaeota with community-wide relevance. The complementation of metatranscriptomic studies with mdRNA-Seq provides substantial additional information regarding transcriptional start sites, promoter activities, and the identification of non-coding RNAs.
Year: 2016 PMID: 27759035 PMCID: PMC5069720 DOI: 10.1038/srep35470
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Comparative overview of the transcript library preparation for mdRNA-Seq and RNA-Seq from the microbial community.
Seawater was pre-filtered in the shade through a 20 μm mesh to keep out larger material and then filtered onto polyethersulfone filters of 0.45 μm pore size. After isolation of total RNA, RNase treatment and cDNA synthesis, the samples were paired-end sequenced on an Illumina HiSeq 2000 platform with a read length of 100 nt. An example of the theoretical read distribution over a protein-coding gene and its 5′ UTR is given at the bottom. The arrows mark the TSSs. Only steps important for understanding are sketched; cDNA amplification steps were omitted. 5′ adapter sequences are indicated as “ACUG”.
Figure 2. Relative abundance of microbial taxa determined by 16S amplicon sequencing.
Abundances are reported as percent of the total count of merged MiSeq reads. 16S sequences from chloroplasts of eukaryotic phytoplankton were not classified further (SILVA pipeline default). (a) Taxa abundances at phylum level. Only taxa with > 0.01% abundance were considered. (b) Abundances at family or the next available level. Only taxa with > 0.5% abundance were included.
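The abundance calculation described in this caption can be illustrated with a minimal sketch. It assumes that each taxon's share is its read count divided by the total count of merged reads, and that the cutoff is applied afterwards. The function name and the read counts are hypothetical and are not taken from the study.

```python
from typing import Dict


def percent_abundance(counts: Dict[str, int], min_percent: float) -> Dict[str, float]:
    """Convert read counts to percent of all reads and keep taxa above a cutoff."""
    total = sum(counts.values())
    if total == 0:
        return {}
    # The denominator is the full read count, including taxa that are later filtered out.
    shares = {taxon: 100.0 * n / total for taxon, n in counts.items()}
    return {taxon: p for taxon, p in shares.items() if p > min_percent}


# Hypothetical merged-read counts (illustrative only).
example_counts = {"taxon_A": 7000, "taxon_B": 2960, "taxon_C": 40}

# Cutoff of 0.5%, as used for panel (b); taxon_C (0.4%) is dropped.
panel_b = percent_abundance(example_counts, 0.5)
print(sorted(panel_b))
```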
BLASTN and BLASTX results (e-value cutoff 0.00001) for all non-ribosomal FW and RV reads, separately for both libraries.
| Program | Library | Reads with BLASTN/X hits (total) | Reads with BLASTN/X hits (%) | Bacteria (total) | Bacteria (%) | Eukaryota (total) | Eukaryota (%) | Archaea (total) | Archaea (%) | Viruses (total) | Viruses (%) | Unclassified (total) | Unclassified (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLASTN | mdRNA-seq FW | 3,339,240 | 52.95 | 2,367,277 | 37.54 | 458,251 | 7.27 | 234,353 | 3.72 | 8,876 | 0.14 | 214,038 | 3.39 |
| BLASTN | mdRNA-seq RV | 3,346,934 | 53.12 | 2,066,011 | 32.79 | 574,692 | 9.12 | 142,641 | 2.26 | 8,288 | 0.13 | 156,845 | 2.49 |
| BLASTN | RNA-seq FW | 2,456,294 | 38.95 | 651,373 | 10.33 | 706,682 | 11.21 | 1,006,430 | 15.96 | 8,112 | 0.13 | 30,928 | 0.49 |
| BLASTN | RNA-seq RV | 2,059,792 | 32.69 | 539,054 | 8.56 | 750,628 | 11.91 | 681,418 | 10.81 | 8,324 | 0.13 | 32,310 | 0.51 |
| BLASTX | mdRNA-seq FW | 1,543,275 | 24.47 | 1,180,103 | 18.71 | 100,235 | 1.59 | 90,212 | 1.43 | 4,418 | 0.07 | 7,063 | 0.11 |
| BLASTX | mdRNA-seq RV | 2,482,609 | 39.40 | 1,197,798 | 19.01 | 457,885 | 7.27 | 91,261 | 1.45 | 7,172 | 0.11 | 10,373 | 0.16 |
| BLASTX | RNA-seq FW | 914,685 | 14.50 | 313,715 | 4.97 | 333,315 | 5.29 | 175,119 | 2.78 | 5,426 | 0.09 | 5,841 | 0.09 |
| BLASTX | RNA-seq RV | 799,880 | 12.69 | 304,414 | 4.83 | 359,850 | 5.71 | 35,419 | 0.56 | 7,270 | 0.12 | 4,949 | 0.08 |
For each library, the number of reads assigned to Bacteria, Eukaryota, Archaea or Viruses is given together with its percentage of the total non-ribosomal reads.
Figure 3. Taxonomic assignments of metatranscriptomic reads.
FW (red rectangles) and RV (blue rectangles) reads of the different libraries were compared against NCBI nucleotide (nt) and protein (nr) databases using BLASTN (a) or BLASTX (b). The graphs are separated into eukaryotic (green), archaeal (orange) and bacterial (blue) taxa, and other nodes (purple). The top-20 ranked taxonomic bins are shown as the percentage of total reads with a BLAST hit (sum of FW and RV reads). “Cellular organisms” is a node specification in NCBI taxonomy for all cellular life including Archaea, Bacteria and Eukaryota. The suffix “LCA” denotes that the reads were assigned to these nodes by the lowest common ancestor algorithm, which moves reads with ambiguous assignments higher up in the phylogeny.
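The LCA step mentioned in the caption can be sketched as follows. This is a minimal illustration of the general idea, not the implementation used in the study: a read whose hits disagree is assigned to the deepest node shared by all of them. The function name `lca_assign` and the example lineages are hypothetical.

```python
from typing import List, Sequence


def lca_assign(lineages: Sequence[Sequence[str]]) -> List[str]:
    """Return the longest root-to-leaf prefix shared by all hit lineages."""
    if not lineages:
        return []
    shared: List[str] = []
    for ranks in zip(*lineages):
        # Stop at the first level where the hits disagree.
        if len(set(ranks)) != 1:
            break
        shared.append(ranks[0])
    return shared


# Hypothetical lineages of two BLAST hits for one read (illustrative only).
hits = [
    ["cellular organisms", "Bacteria", "Proteobacteria", "Alphaproteobacteria"],
    ["cellular organisms", "Bacteria", "Cyanobacteria", "Synechococcus"],
]

# The read is binned at the last shared node, here "Bacteria".
print(lca_assign(hits)[-1])
```

A read with hits in both Archaea and Bacteria would, under the same logic, move up to the “cellular organisms” node.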
Figure 4. The Synechococcus CC9605 phycobiliprotein gene cluster with 32 identified TSSs (indicated by green arrows) inferred from mdRNA-Seq coverage.
The TSS upstream of mpeB is amongst the most active promoters overall (position 59 in Table S1). There are 38 annotated genes in this region [41,42], of which 13 are of unknown function (unk1 to unk13). Roman numerals I to V indicate five major transcriptional units that encompass almost all protein-coding genes with a known function and that are associated with very strong TSSs. One additional TSS, mapping to position 425184r, recruits a similar number of mdRNA-Seq reads and drives the transcription of unk11, a gene of unknown function [42]. The coverage from mdRNA-Seq reads in FW orientation is shown in light red and that of the associated reads in RV orientation in yellow. Additional details, including the mapped RNA-Seq coverage, can be found in Supplemental Dataset S1.
Protein-coding genes in Candidatus Pelagibacter sp. HTCC7211 recruiting highest read numbers in mdRNA-Seq or RNA-Seq.
| Gene name | Locus_tag | Start | End | Strand | TSS | RNA-Seq | Product |
|---|---|---|---|---|---|---|---|
| potD | PB7211_697 | 1265830 | 1266982 | − | 646 | spermidine/putrescine-binding periplasmic protein | |
| ybhL | PB7211_203 | 285807 | 286571 | + | 234 | integral membrane protein | |
| tauA | PB7211_601 | 704432 | 705401 | − | 440 | taurine transport system periplasmic protein | |
| trxB_1 | PB7211_981 | 473324 | 474270 | − | 209 | thioredoxin-disulfide reductase | |
| opuAC | PB7211_687 | 1212766 | 1213737 | − | 1176 | substrate-binding region of ABC-type glycine betaine transport system; also called TmoX, was among the 10 most highly expressed genes in metaproteomic analyses of samples collected from the Sargasso Sea | |
| amt | PB7211_1287 | 1226344 | 1227766 | + | 2751 | ammonium transporter | |
| hupA | PB7211_929 | 1129157 | 1129454 | − | 145 | non-specific DNA-binding protein HBsu | |
| PB7211_8 | PB7211_8 | 434503 | 434915 | − | 449 | ATP synthase subunit C, putative | |
| PB7211_1146 | PB7211_1146 | 381310 | 382633 | − | 655 | xanthine/uracil/vitamin C permease family protein | |
| PB7211_669 | PB7211_669 | 529616 | 530457 | + | 265 | bacteriorhodopsin | |
| groS | PB7211_1393 | 400887 | 401197 | − | 340 | chaperonin GroS | |
| PB7211_641 | PB7211_641 | 257224 | 257510 | − | 549 | conserved hypothetical protein | |
| PB7211_298 | PB7211_298 | 298651 | 299839 | + | 656 | trap dicarboxylate transporter - dctp subunit | |
| hflK | PB7211_216 | 1372461 | 1373566 | − | 631 | HflK protein | |
| PB7211_130 | PB7211_130 | 1302141 | 1303365 | + | 395 | Receptor family ligand binding domain protein | |
| PB7211_401 | PB7211_401 | 362111 | 362674 | + | 307 | acid tolerance regulatory protein actr | |
| acpP | PB7211_217 | 613491 | 613746 | + | 124 | acyl carrier protein | |
| ahcY | PB7211_1045 | 368453 | 369739 | + | 632 | adenosylhomocysteinase | |
| PB7211_767 | PB7211_767 | 1161417 | 1163849 | − | 257 | dimethylglycine dehydrogenase | |
| rpoZ | PB7211_373 | 739908 | 740423 | + | 222 | dna-directed rna polymerase omega subunit | |
| gidA | PB7211_100 | 225271 | 227142 | − | 74 | tRNA uridine 5-carboxymethylaminomethyl modification enzyme GidA | |
| yhdW | PB7211_1204 | 1014159 | 1015259 | − | 1791 | ABC transporter | |
| dnaK | PB7211_144 | 208983 | 210940 | + | 799 | chaperone protein DnaK | |
| glcB | PB7211_627 | 61961 | 64285 | + | 86.5 | malate synthase | |
| atpD | PB7211_587 | 338515 | 340096 | + | 166 | ATP synthase F1, beta subunit | |
| PB7211_885 | PB7211_885 | 1226899 | 1228008 | + | 153 | conserved hypothetical protein | |
| PB7211_1419 | PB7211_1419 | 904648 | 906787 | + | 48 | V-type H(+)-translocating pyrophosphatase | |
| ftsH | PB7211_115 | 1396942 | 1398849 | − | 25 | ATP-dependent Zn protease | |
| yjcG | PB7211_576 | 255410 | 257224 | − | 0 | Na+/solute symporter, Ssf family | |
| groL | PB7211_1188 | 399202 | 400863 | − | 0 | chaperonin GroL | |
| Tuf | PB7211_560 | 832167 | 833357 | + | 0 | translation elongation factor Tu | |
| ompA | PB7211_618 | 1400978 | 1401475 | − | 0 | OmpA family protein | |
| aprA | PB7211_563 | 1114126 | 1115985 | − | 0 | adenylylsulfate reductase, alpha subunit | |
| aldA | PB7211_1336 | 657904 | 659652 | + | 0 | acetaldehyde dehydrogenase II | |
| fusA | PB7211_234 | 845525 | 847603 | + | 0 | translation elongation factor G | |
| PB7211_21 | PB7211_21 | 1223550 | 1224404 | + | 0 | gxgxg motif-containing protein |
The respective number matching this criterion is in boldface and grey-boxed. Genes are ranked according to the reads associated with the respective TSS and then according to the sum of RNA-Seq FW and RV reads (RNA-Seq). The nt positions and gene IDs refer to Supplemental Dataset S2.
Figure 5. Conserved secondary structure of the Candidatus Thalassoarchaea ncRNA 1 (Tan1).
Figure 6. Transcriptome share of genes related to peptidases and carbohydrate-active enzymes.
The heat maps show the relative distribution of reads assigned to MEROPS [75] peptidase families (a) and to CAZy [76] carbohydrate-active enzyme classes/modules (b). Read counts were normalized by the total number of reads assigned to each reference for each library. Only families or classes/modules with a minimum relative read count of 2% when added up across all references are shown. Rows were hierarchically clustered using the complete-linkage method according to pairwise Euclidean distances. The percentages denote the fractions of reads assigned to MEROPS [75] peptidases or CAZy [76] enzymes, compared to all reads assigned to the corresponding reference. Families representing extracellular peptidases are highlighted by red rectangles. The percentages of reads assigned to extracellular peptidases compared to reads recruited by MEROPS peptidases are given on top.
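The processing behind these heat maps (normalization, the 2% filter and complete-linkage clustering of rows) can be sketched in plain Python. The sketch rests on two assumptions that the caption does not spell out: relative counts are expressed as percent of the reads assigned to each reference, and the 2% cutoff applies to the row sum. All function names, family labels and counts are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Row = List[float]


def normalize(counts: Dict[str, List[int]], totals: List[int]) -> Dict[str, Row]:
    """Express each count as percent of the reads assigned to its reference."""
    return {fam: [100.0 * c / t for c, t in zip(row, totals)]
            for fam, row in counts.items()}


def keep_abundant(rel: Dict[str, Row], min_sum: float = 2.0) -> Dict[str, Row]:
    """Keep rows whose relative counts sum to at least min_sum percent (assumed rule)."""
    return {fam: row for fam, row in rel.items() if sum(row) >= min_sum}


def euclidean(a: Row, b: Row) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(rows: Dict[str, Row]) -> List[Tuple[tuple, tuple, float]]:
    """Agglomerative clustering; cluster distance is the largest pairwise distance."""
    clusters = [(name,) for name in rows]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = max(euclidean(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


# Hypothetical read counts per family for two references (illustrative only).
counts = {"fam_A": [30, 50], "fam_B": [28, 52], "fam_C": [15, 20], "fam_D": [1, 2]}
totals = [1000, 2000]  # hypothetical reads assigned to each reference

rows = keep_abundant(normalize(counts, totals))  # fam_D falls below the cutoff
for left, right, dist in complete_linkage(rows):
    print(left, right, round(dist, 3))
```

The order of the merges gives the row order of a dendrogram; a plotting library would normally handle this step, and the explicit loop is only meant to show what complete linkage computes.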