
Data on taxonomic annotation and diversity of 18S rRNA gene amplicon libraries derived from high throughput sequencing.

Takafumi Kataoka, Ryuji Kondo

Abstract

This Data in Brief article provides supporting information for the research article entitled "Protistan community composition in anoxic sediments from three salinity-disparate Japanese lakes" by Kataoka and Kondo (2019) [1]. It summarises 18S rRNA gene sequences obtained from anoxic sediments of three lakes in two seasons using high throughput sequencing (MiSeq, Illumina). Supergroup-level taxonomy was compared between a SINA search against the SILVA database and a BLASTn search against the PR2 database. Alpha diversity was calculated for each sample, and beta diversity was calculated among the six amplicon libraries. The partial sequence length between the primers 574*f and 1132R (Hugerth et al. [2]) was compared between the forward reads and the combined reads.

Keywords:  18S rRNA gene; High throughput sequencing (HTS); MiSeq; Protists; V4–V5 hypervariable region

Year:  2019        PMID: 31440544      PMCID: PMC6699457          DOI: 10.1016/j.dib.2019.104213

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409



Data

Raw reads from MiSeq were quality controlled and grouped into OTUs at a 98% sequence similarity level, then OTUs consisting of only one sequence (singletons) were removed (Table 1). Methods for annotating the taxonomic path of the representative 18S rRNA gene sequence of each OTU were compared in order to identify a suitable method for supergroup-level taxonomy (Table 2). Alpha diversity was compared by calculating rarefaction curves for each sample (Fig. 1), and beta diversity was assessed by similarity profile analysis of all samples (Fig. 2). The partial sequence length between the forward and reverse primers was compared between independently generated query sequences (Fig. 3).
Table 1

Summary of sequence reads and OTU numbers before and after singletons were removed.

                         Hiruga1  Hiruga2  Suigetsu1  Suigetsu2   Biwa1   Biwa2
Including all reads
  Sequence reads          119529   157402      63764      48948  390826  276815
  OTUs                       984     1086        426        391    4141    3612
After singleton removal
  Sequence reads          119221   157176      63619      48815  389041  275292
  OTUs                       676      860        281        258    2356    2089
Number of singletons         308      226        145        133    1785    1523
% singletons                31.3     20.8       34.0       34.0    43.1    42.2
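
The singleton counts and percentages in Table 1 follow directly from the OTU numbers before and after removal. The minimal Python sketch below re-derives them from the tabulated values. The variable and function names are illustrative and are not part of the original analysis, which was run with Claident and R.

```python
# Values transcribed from Table 1.
samples = ["Hiruga1", "Hiruga2", "Suigetsu1", "Suigetsu2", "Biwa1", "Biwa2"]
reads_all = [119529, 157402, 63764, 48948, 390826, 276815]
reads_kept = [119221, 157176, 63619, 48815, 389041, 275292]
otus_all = [984, 1086, 426, 391, 4141, 3612]
otus_kept = [676, 860, 281, 258, 2356, 2089]


def singleton_summary(otus_before, otus_after):
    """Return (singleton count, % of OTUs that were singletons) per sample."""
    summary = []
    for before, after in zip(otus_before, otus_after):
        singletons = before - after
        summary.append((singletons, round(100 * singletons / before, 1)))
    return summary


summary = singleton_summary(otus_all, otus_kept)
for name, (count, percent) in zip(samples, summary):
    print(f"{name}: {count} singletons ({percent}% of OTUs)")

# Each singleton OTU holds exactly one read, so the number of reads lost
# should equal the number of OTUs lost in every sample.
reads_lost = [a - k for a, k in zip(reads_all, reads_kept)]
```

Because a singleton is an OTU made of a single read, the drop in read count and the drop in OTU count should be identical per sample, which gives a quick consistency check on the table.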
Table 2

Number of OTUs showing mismatch between a SINA search (the SILVA database ver. 132) and a BLASTn search (the PR2 database ver. 4.10.0) identification at supergroup taxonomy.

Number of OTUs
SINA × SILVA identification: Alveolata, Amoebozoa, Archaeplastida, Opisthokonta, Rhizaria, Stramenopiles, Picozoa, Centrohelida, Cryptophyceae, Haptophyta, Incertae Sedis, NAMAKO-1
BLASTn × PR2 identification:
Alveolata 621210238
Amoebozoa 22201
Archaeplastida 42255451220
Opisthokonta 1387611341812
Rhizaria 105113
Stramenopiles 5745534
Hacrobia 1134122117320
Apusozoa 2929
Unknown 321
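
A cross-tabulation of this kind can be built by pairing, for every OTU, the supergroup assigned by each method and counting the pairs. The sketch below is a minimal illustration in Python. The OTU identifiers and assignments are hypothetical, the function names are invented for this example, and the comparison is a literal string match, so groups that are merely named differently in the two databases are counted as mismatches unless a name mapping is supplied separately.

```python
from collections import Counter


def cross_tabulate(blast_calls, sina_calls):
    """Count OTUs for each (BLASTn x PR2, SINA x SILVA) supergroup pair."""
    table = Counter()
    for otu, blast_group in blast_calls.items():
        # OTUs without a SINA assignment are recorded as "NA" (an assumption).
        sina_group = sina_calls.get(otu, "NA")
        table[(blast_group, sina_group)] += 1
    return table


def mismatches(table):
    """Keep only the pairs where the two supergroup labels differ."""
    return {pair: n for pair, n in table.items() if pair[0] != pair[1]}


# Hypothetical assignments for five OTUs, for illustration only.
blast_calls = {
    "otu1": "Alveolata",
    "otu2": "Stramenopiles",
    "otu3": "Hacrobia",
    "otu4": "Opisthokonta",
    "otu5": "Apusozoa",
}
sina_calls = {
    "otu1": "Alveolata",
    "otu2": "Rhizaria",
    "otu3": "Cryptophyceae",
    "otu4": "Opisthokonta",
}

table = cross_tabulate(blast_calls, sina_calls)
for (blast_group, sina_group), n in sorted(mismatches(table).items()):
    print(f"PR2 {blast_group} vs SILVA {sina_group}: {n} OTU(s)")
```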
Fig. 1

Rarefaction curves of 98% similarity-based-OTUs in each sample (A) including all reads and (B) with singleton reads removed.

Fig. 2

Similarity profile analysis to detect significant clusters (p < 0.05). Dissimilarity was calculated by relative abundance data of sequence reads using the Bray-Curtis index, and significantly distant samples were clustered using Ward's method.

Fig. 3

Partial sequence length between the primer sets, 574*f and 1132R [2], of sequences in the PR2 database to which OTU representatives received the best hit using a BLAST search. The labels Combined and Forward indicate the combined sequences yielded from both primers and single sequences yielded from the forward primer, respectively. The number on the top of each plot shows the number of sequences analysed. The bar in the box indicates the median value. The top and bottom of the boxes indicate the upper and lower quartiles, respectively.


Experimental design, materials, and methods

Lacustrine sediments were collected from the southern basin of Lake Biwa and the central basins of Lake Suigetsu and Lake Hiruga using an Ekman–Birge-type bottom sampler (RIGO, Saitama, Japan) [1]. Surface sediment was subsampled from the 0–5 cm depth using a syringe with the needle end cut off. Total nucleic acids were extracted from 0.5 g sediment samples using a FastDNA Spin Kit for Soil (MP Biomedicals, LLC, Solon, OH) according to the manufacturer's instructions. An amplicon library for high throughput sequencing of protistan 18S rRNA genes was constructed using a primer set targeting the V4–V5 hypervariable region, 574*f (5′-CGGTAAYTCCAGCTCYV-3′) and 1132R (5′-CCGTCAATTHCTTYAART-3′) [2]. PCR amplification was performed in a 25 μL reaction mixture containing 1 × KAPA HiFi HotStart ReadyMix (KAPA Biosystems), 0.3 μM of each primer and 3 μL of ten-fold diluted gDNA, corresponding to 0.4–1.3 ng of gDNA. Cycling conditions were as follows: 94 °C for 3 min to activate the hot-start DNA polymerase; 30 cycles of 94 °C for 30 s, annealing at 51 °C for 30 s and elongation at 72 °C for 45 s; then a final elongation at 72 °C for 7 min. Amplicons of the expected length of 560 bp, as confirmed by agarose gel electrophoresis, were purified and labelled with an index primer set attached to both the 5′ and 3′ ends (NEBNext Multiplex Oligos, New England BioLabs), then sequenced using the MiSeq Reagent Kit v3 for 2 × 300 bp (Illumina, CA, USA). All generated sequence reads were de-multiplexed according to the index primers and processed using the software package Claident ver. 0.2.2017.07.26 [3], as previously described with a minor modification [4]. To generate the paired-end sequences, forward and reverse reads were combined by VSEARCH, requiring overlapping ends of >50 bp between the two reads.
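
The primer sequences contain degenerate bases (Y, V, H, R) from the IUPAC nucleotide code. As an illustration of how the partial sequence length between the two primers could be measured on a reference sequence, the Python sketch below converts each primer into a regular expression and reports the number of bases between the forward primer site and the reverse-complemented reverse primer site. The function names, the exact-match requirement and the synthetic test sequence are assumptions made for this example; it is not the procedure used in the article.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

FORWARD_574F = "CGGTAAYTCCAGCTCYV"    # 574*f, 5'->3'
REVERSE_1132R = "CCGTCAATTHCTTYAART"  # 1132R, 5'->3'


def reverse_complement(primer):
    """Reverse-complement a primer, keeping degenerate codes degenerate."""
    return primer.translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a degenerate primer into a regular-expression pattern."""
    return "".join(IUPAC[base] for base in primer)


def insert_length(sequence, forward, reverse):
    """Bases between the primer sites (primers excluded), or None if no match.

    The sequence is expected to be upper-case DNA on the forward strand.
    """
    fwd = re.search(primer_regex(forward), sequence)
    if fwd is None:
        return None
    rev_pattern = re.compile(primer_regex(reverse_complement(reverse)))
    rev = rev_pattern.search(sequence, fwd.end())
    if rev is None:
        return None
    return rev.start() - fwd.end()


# Synthetic sequence: flank + forward site + 20-base insert + reverse site + flank.
synthetic = "TTT" + "CGGTAACTCCAGCTCTA" + "ACGT" * 5 + "ACTTAAAGAAATTGACGG" + "AAA"
print(insert_length(synthetic, FORWARD_574F, REVERSE_1132R))
```

A real reference sequence may carry mismatches at the primer sites, in which case an exact pattern match like this one returns None and a mismatch-tolerant search would be needed.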
Combined reads of >400 bp in length with a quality value of >30 were used to establish operational taxonomic units (OTUs) at a 98% cut-off level. OTUs that were detected as a single read across all samples (singletons) were omitted because of the large number of singletons, which accounted for 21%–43% of OTUs (Table 1). The representative sequence of each OTU was filtered to split the sequences into ribosomal RNA (rRNA) and non-rRNA genes using riboPicker [5], and both rRNA and non-rRNA sequences were identified using the SINA programme [6] with reference to the SILVA database (SSURef_NR99_132 [7]). The taxonomic path for both rRNA and non-rRNA sequences was also obtained from the top hit of a BLASTn search [8] against the PR2 database (ver. 4.10.0 [9]). A p-value cut-off of 1 × 10⁻⁵⁰ was used to remove non-rRNA genes [10]. In order to focus on potentially heterotrophic protists, fungal and autotrophic sequences were removed according to the PR2 taxonomy path. Rarefaction curves were calculated using the vegan package, ver. 2.4 [11]. Similarity profile analysis was conducted using the clustsig package, ver. 1.1. Dissimilarity was calculated from relative abundance data of sequence reads using the Bray-Curtis index, and significantly distant samples were clustered using Ward's method. All statistical analyses were conducted using R software ver. 3.3.2 (http://cran.r-project.org).
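
The two diversity calculations can be illustrated with a short Python sketch. It is a standard-library approximation written for this purpose and is not the vegan or clustsig code used in the article. It computes the expected OTU richness in a random subsample of reads (the hypergeometric expectation that underlies a rarefaction curve) and the Bray-Curtis dissimilarity between two libraries after conversion to relative abundances. The OTU counts and library names are hypothetical, and the similarity profile test and Ward clustering are not reproduced here.

```python
from itertools import combinations
from math import comb


def rarefied_richness(counts, n):
    """Expected number of OTUs in n reads drawn without replacement."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the read total")
    denominator = comb(total, n)
    # For each OTU, 1 - P(the OTU is absent from the subsample).
    return sum(1 - comb(total - c, n) / denominator for c in counts if c > 0)


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity computed on relative abundances."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    rel_a = [c / total_a for c in counts_a]
    rel_b = [c / total_b for c in counts_b]
    # Relative abundances sum to 1 in each library, so the denominator is 2.
    return sum(abs(a - b) for a, b in zip(rel_a, rel_b)) / 2


# Hypothetical read counts per OTU in three libraries (same OTU order).
libraries = {
    "lakeA_1": [50, 30, 20, 0],
    "lakeA_2": [45, 35, 15, 5],
    "lakeB_1": [0, 10, 30, 60],
}

# Points of a rarefaction curve for one library.
curve = [rarefied_richness(libraries["lakeA_2"], n) for n in (10, 50, 100)]
print([round(value, 2) for value in curve])

# Pairwise dissimilarities, the input a clustering method would work from.
for name_a, name_b in combinations(libraries, 2):
    d = bray_curtis(libraries[name_a], libraries[name_b])
    print(f"{name_a} vs {name_b}: {d:.3f}")
```

When the subsample size equals the total number of reads, the expected richness equals the observed number of OTUs, which is a convenient sanity check on the rarefaction function.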

Specifications table

Subject area: Biology
More specific subject area: Microbial Ecology
Type of data: Tables, figures, FASTQ
How data was acquired: High throughput sequencing of 18S rRNA gene amplicons using Illumina MiSeq
Data format: Raw and analysed
Experimental factors: Genomic DNA was extracted from anoxic lake sediments.
Experimental features: Amplicons were generated using the primer set 574*f and 1132R.
Data source location: Lakes Hiruga and Suigetsu in the Mikata Lake Group in Fukui Prefecture and Lake Biwa in Shiga Prefecture, Japan.
Data accessibility: Analysed data are presented in the article. Raw DNA sequences are available in the DNA Data Bank of Japan (DDBJ) under the accession number DRA007713 (https://ddbj.nig.ac.jp/DRASearch/submission?acc=DRA007713).
Related research article: T. Kataoka, R. Kondo. Protistan community composition in anoxic sediments from three salinity-disparate Japanese lakes. Estuarine, Coastal and Shelf Science, 224, 34–42 (2019). https://doi.org/10.1016/j.ecss.2019.04.046

Value of the data

Comparing methods for annotating the taxonomic path of 18S rRNA gene sequences is valuable because the sequences in public databases are still insufficient for identifying diverse eukaryotic microbes.

Information on the partial sequence length between the forward and reverse primers is valuable for understanding protistan composition in natural environments inhabited by unknown microbes.

Data on alpha and beta diversities of protistan genotypes in lacustrine sediments are rare.

  9 in total

1.  Seasonal and geographical distribution of near-surface small photosynthetic eukaryotes in the western North Pacific determined by pyrosequencing of 18S rDNA.

Authors:  Takafumi Kataoka; Haruyo Yamaguchi; Mayumi Sato; Tsuyoshi Watanabe; Yukiko Taniuchi; Akira Kuwata; Masanobu Kawachi
Journal:  FEMS Microbiol Ecol       Date:  2016-11-02       Impact factor: 4.194

2.  Identification and removal of ribosomal RNA sequences from metatranscriptomes.

Authors:  Robert Schmieder; Yan Wei Lim; Robert Edwards
Journal:  Bioinformatics       Date:  2011-12-06       Impact factor: 6.937

3.  SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

Authors:  Elmar Pruesse; Jörg Peplies; Frank Oliver Glöckner
Journal:  Bioinformatics       Date:  2012-05-03       Impact factor: 6.937

4.  Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi, and land plants.

Authors:  Akifumi S Tanabe; Hirokazu Toju
Journal:  PLoS One       Date:  2013-10-18       Impact factor: 3.240

5.  Systematic design of 18S rRNA gene primers for determining eukaryotic diversity in microbial consortia.

Authors:  Luisa W Hugerth; Emilie E L Muller; Yue O O Hu; Laura A M Lebrun; Hugo Roume; Daniel Lundin; Paul Wilmes; Anders F Andersson
Journal:  PLoS One       Date:  2014-04-22       Impact factor: 3.240

6.  The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.

Authors:  Christian Quast; Elmar Pruesse; Pelin Yilmaz; Jan Gerken; Timmy Schweer; Pablo Yarza; Jörg Peplies; Frank Oliver Glöckner
Journal:  Nucleic Acids Res       Date:  2012-11-28       Impact factor: 16.971

7.  The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy.

Authors:  Laure Guillou; Dipankar Bachar; Stéphane Audic; David Bass; Cédric Berney; Lucie Bittner; Christophe Boutte; Gaétan Burgaud; Colomban de Vargas; Johan Decelle; Javier Del Campo; John R Dolan; Micah Dunthorn; Bente Edvardsen; Maria Holzmann; Wiebe H C F Kooistra; Enrique Lara; Noan Le Bescot; Ramiro Logares; Frédéric Mahé; Ramon Massana; Marina Montresor; Raphael Morard; Fabrice Not; Jan Pawlowski; Ian Probert; Anne-Laure Sauvadet; Raffaele Siano; Thorsten Stoeck; Daniel Vaulot; Pascal Zimmermann; Richard Christen
Journal:  Nucleic Acids Res       Date:  2012-11-27       Impact factor: 16.971

Review 8.  Comparison of the complete protein sets of worm and yeast: orthology and divergence.

Authors:  S A Chervitz; L Aravind; G Sherlock; C A Ball; E V Koonin; S S Dwight; M A Harris; K Dolinski; S Mohr; T Smith; S Weng; J M Cherry; D Botstein
Journal:  Science       Date:  1998-12-11       Impact factor: 47.728

9.  NCBI BLAST: a better web interface.

Authors:  Mark Johnson; Irena Zaretskaya; Yan Raytselis; Yuri Merezhuk; Scott McGinnis; Thomas L Madden
Journal:  Nucleic Acids Res       Date:  2008-04-24       Impact factor: 16.971
