
Use of ITS2 region as the universal DNA barcode for plants and animals.

Hui Yao, Jingyuan Song, Chang Liu, Kun Luo, Jianping Han, Ying Li, Xiaohui Pang, Hongxi Xu, Yingjie Zhu, Peigen Xiao, Shilin Chen.

Abstract

BACKGROUND: The internal transcribed spacer 2 (ITS2) region of nuclear ribosomal DNA is regarded as one of the candidate DNA barcodes because it possesses a number of valuable characteristics, such as the availability of conserved regions for designing universal primers, the ease of its amplification, and sufficient variability to distinguish even closely related species. However, a general analysis of its ability to discriminate species in a comprehensive sample set is lacking.
METHODOLOGY/PRINCIPAL FINDINGS: In the current study, 50,790 plant and 12,221 animal ITS2 sequences downloaded from GenBank were evaluated according to sequence length, GC content, intra- and inter-specific divergence, and efficiency of identification. The results show that the inter-specific divergence of congeneric species in plants and animals was greater than the corresponding intra-specific variation. The success rates for using the ITS2 region to identify dicotyledons, monocotyledons, gymnosperms, ferns, mosses, and animals were 76.1%, 74.2%, 67.1%, 88.1%, 77.4%, and 91.7% at the species level, respectively. The ITS2 region showed varying ability to identify closely related species within different families and genera. The secondary structure of the ITS2 region could provide useful information for species identification and could be considered as a molecular morphological characteristic.
CONCLUSIONS/SIGNIFICANCE: Because ITS2 is one of the most popular phylogenetic markers for eukaryotes, we propose that the ITS2 locus be used as a universal DNA barcode for identifying plant species and as a complementary locus to CO1 for identifying animal species. We have also developed a web application to facilitate ITS2-based cross-kingdom species identification (http://its2-plantidit.dnsalias.org).

Year:  2010        PMID: 20957043      PMCID: PMC2948509          DOI: 10.1371/journal.pone.0013102

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

As one of the most important markers in molecular systematics and evolution [1]–[6], ITS2 shows significant sequence variability at the species level or lower. The availability of its structural information permits analysis at higher taxonomic levels [1], [3], [7]–[9], which provides additional information for improving accuracy and robustness in the reconstruction of phylogenetic trees [10]. Furthermore, ITS2 is potentially useful as a standard DNA barcode to identify medicinal plants [11]–[15] and as a barcode to identify animals [16]–[19]. ITS2 is regarded as one of the candidate DNA barcodes because of its valuable characteristics, including the availability of conserved regions for designing universal primers, the ease of its amplification, and enough variability to distinguish even closely related species.

Since Hebert first proposed the use of the cytochrome c oxidase subunit 1 (CO1) as a barcode to identify animals, DNA barcoding has attracted worldwide attention [20], [21]. Many loci have been proposed as plant barcodes, including ITS [22], [23], rbcL [24], [25], psbA-trnH [24], [26], [27], and matK [26]–[28]. Most recently, the Plant Working Group of the Consortium for the Barcode of Life recommended a two-locus combination of rbcL + matK as a plant barcode [29]. However, some researchers have suggested that DNA barcodes based on uniparentally inherited markers can never reflect the complexity that exists in nature [22]. In addition, nuclear genes can provide more information than barcoding based on organellar DNA, which is inherited from only one parent [30].

Although ITS2 shows great potential as a barcode to identify plants and animals, an extensive evaluation based on a comprehensive sample set is lacking. To validate the potential of using the ITS2 region to identify closely related species of plants and animals, we analyzed 50,790 plant and 12,221 animal ITS2 sequences (Table S1) available in a public database. The results support the conclusion that the ITS2 region can be used as an effective barcode for the identification of plant species and as a complementary locus to CO1 for identifying animals.

Results

For plants, the lengths of ITS2 sequences from dicotyledons and mosses were distributed between 100 and 700 bp, and the lengths of ITS2 sequences from monocotyledons, gymnosperms, and ferns were distributed between 100 and 480 bp. The average lengths of ITS2 sequences for dicotyledons, monocotyledons, gymnosperms, ferns, and mosses were 221, 236, 240, 224, and 260 bp, respectively. For animals, the ITS2 sequence lengths ranged from 100 to 1,209 bp (mainly dispersed between 195 and 510 bp), with an average of 306 bp. The GC contents of the ITS2 sequences of the dicotyledons, monocotyledons, gymnosperms, ferns, mosses, and animals were calculated, and the averages were 59.4%, 61.3%, 62.9%, 55.5%, 64.7%, and 48.3%, respectively. The average and distributions of ITS2 sequence lengths, as well as the GC contents of the six taxa, are shown in Figure 1 and Figure 2, respectively.
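The per-sequence quantities summarized above (length and GC content) and the box-plot statistics (mean, median, interquartile range) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the choice to ignore ambiguous bases when computing GC content, and the toy sequences are all assumptions.

```python
from statistics import mean, median, quantiles


def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    counted = [b for b in seq.upper() if b in "ACGT"]
    if not counted:
        raise ValueError("sequence has no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in counted) / len(counted)


def summarize(values):
    """Mean, median and interquartile range (75th minus 25th percentile)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {"mean": mean(values), "median": median(values), "iqr": q3 - q1}


# Hypothetical toy sequences, not data from the study.
toy = ["GCGCATAT", "GGCC", "ATATATGC", "GCGCGCAT", "ACGT"]
length_summary = summarize([len(s) for s in toy])
gc_summary = summarize([gc_content(s) for s in toy])
```

Applied to one taxon's sequences at a time, `summarize` yields the figures that a box plot of length or GC content would display.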
Figure 1

Box plots of the ITS2 sequence length of plants and animals.

In a box plot, the box shows the interquartile range (IQR) of the data. The IQR is defined as the difference between the 75th percentile and the 25th percentile. The solid and dotted lines through the box represent the median and the average length, respectively.

Figure 2

Box plots of GC contents of ITS2 of plants and animals.

In a box plot, the box shows the IQR of the data. The IQR is defined as the difference between the 75th percentile and the 25th percentile. The solid and dotted lines through the box represent the median and the average GC content, respectively.

Inter-specific divergence was assessed by three parameters: average inter-specific distance, average theta prime, and smallest inter-specific distance [11], [31], [32]. In contrast, intra-specific variation was evaluated by three additional parameters: average intra-specific difference, theta (θ), and average coalescent depth [27], [32]. The inter-specific genetic distances between congeneric species of plants and animals were greater than the intra-specific variations of the ITS2 regions of the different taxa (Table 1).
Table 1

Analysis of intra- and inter-specific divergences of congeneric species in plants and animals.

Taxa | Animals | Dicotyledons | Monocotyledons | Gymnosperms | Mosses | Ferns
All inter-specific distance | 0.3761±0.5982 | 0.1042±0.1393 | 0.1829±0.1940 | 0.0537±0.0892 | 0.1007±0.0913 | 0.4758±0.3547
Theta prime | 0.2820±0.4257 | 0.0999±0.1118 | 0.1127±0.1310 | 0.0573±0.0744 | 0.1874±0.1792 | 0.4995±0.2906
Minimum inter-specific distance | 0.1361±0.2254 | 0.0370±0.0667 | 0.0386±0.0809 | 0.0195±0.0576 | 0.0838±0.1466 | 0.2399±0.3173
All intra-specific distance | 0.0522±0.1150 | 0.0214±0.0809 | 0.0309±0.0712 | 0.0170±0.0413 | 0.0114±0.0456 | 0.0082±0.0160
Theta | 0.0274±0.0809 | 0.0231±0.0781 | 0.0244±0.0764 | 0.0255±0.0511 | 0.0289±0.0792 | 0.0262±0.0254
Coalescent depth | 0.0596±0.1962 | 0.0363±0.1739 | 0.0360±0.1213 | 0.0368±0.0653 | 0.0452±0.1087 | 0.0336±0.0256
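The six parameters in Table 1 can be illustrated with a short sketch that derives them from a pairwise distance matrix. The definitions used here are assumptions based on common usage in the barcoding literature (theta prime and minimum inter-specific distance computed per genus, theta and coalescent depth computed per species, then averaged); the study itself defers to [11], [31], [32] for the exact procedure, and the function name and toy matrix are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def divergence_summary(labels, dist):
    """labels[i] = (genus, species); dist[i][j] = pairwise distance (e.g. K2P)."""
    intra = defaultdict(list)  # (genus, species) -> within-species distances
    inter = defaultdict(list)  # genus -> between-species distances within the genus
    for i, j in combinations(range(len(labels)), 2):
        (g1, s1), (g2, s2) = labels[i], labels[j]
        if g1 != g2:
            continue  # only congeneric comparisons are considered
        if s1 == s2:
            intra[(g1, s1)].append(dist[i][j])
        else:
            inter[g1].append(dist[i][j])
    return {
        "all_inter": mean(d for ds in inter.values() for d in ds),
        "theta_prime": mean(mean(ds) for ds in inter.values()),
        "min_inter": mean(min(ds) for ds in inter.values()),
        "all_intra": mean(d for ds in intra.values() for d in ds),
        "theta": mean(mean(ds) for ds in intra.values()),
        "coalescent_depth": mean(max(ds) for ds in intra.values()),
    }


# Hypothetical toy data: one genus, two species, two sequences each.
labels = [("A", "a1"), ("A", "a1"), ("A", "a2"), ("A", "a2")]
dist = [
    [0.00, 0.01, 0.10, 0.12],
    [0.01, 0.00, 0.08, 0.10],
    [0.10, 0.08, 0.00, 0.03],
    [0.12, 0.10, 0.03, 0.00],
]
summary = divergence_summary(labels, dist)
```

In this toy case the minimum inter-specific distance exceeds the coalescent depth, which is the pattern Table 1 reports for every taxon.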
The BLAST1 method, which is based on sequence similarity, was used to evaluate the identification capacity of the ITS2 region [33]. At the genus level, the use of the ITS2 region had a >97% success rate for the identification of plants and animals (Table 2). At the species level, ITS2 sequences correctly identified 91.7% of 12,221 animal samples, whereas the success rates of using ITS2 sequences for the identification of 34,676 dicotyledons, 11,598 monocotyledons, 946 gymnosperms, 42 ferns, and 3,528 mosses were 76.1%, 74.2%, 67.1%, 88.1%, and 77.4% at the species level, respectively (Table 2).
Table 2

Identification efficiency of ITS2 regions in plants and animals using BLAST1 method.

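Taxa | Taxa level | Correct identification (%) | Ambiguous identification (%)
Animals | Species | 91.7 | 8.3
Animals | Genus | 99.7 | 0.3
Dicotyledons | Species | 76.1 | 23.9
Dicotyledons | Genus | 99.1 | 0.9
Monocotyledons | Species | 74.2 | 25.8
Monocotyledons | Genus | 97.9 | 2.1
Gymnosperms | Species | 67.1 | 32.9
Gymnosperms | Genus | 99.5 | 0.5
Mosses | Species | 77.4 | 22.6
Mosses | Genus | 98.6 | 1.4
Ferns | Species | 88.1 | 11.9
Ferns | Genus | 100.0 | 0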
In addition, we studied the possibility of using ITS2 sequences to identify closely related species in different families. First, we studied 34 dicotyledon families, each having more than 10 genera. For 13 families, the rates of successful identification were more than 80%; success rates for identification fell below 70% in only seven families (Fig. 3). Of the 14 monocotyledon families that each had more than 5 genera, identification success rates were lower than 70% in only two families (Fig. 3). The success rates for using the ITS2 region to identify species in families with more than 10 genera of mosses and gymnosperms and all families of ferns are also shown in Fig. 3. The success rates for using the ITS2 region to identify species in families with less than 10 genera of dicotyledons, mosses, gymnosperms, and with less than 5 genera of monocotyledons are listed in Table S2. Compared to the success rates when identifying species in plants, the success rates for identifying species in the nine phyla of animals studied were much higher (more than 90%), except for Cnidaria (77.1%) (Fig. 3).
Figure 3

Identification efficiency when using ITS2 regions to distinguish between closely related species in different families of plants and animals using the BLAST1 method.

The ITS2 sequences of all animal phyla, dicotyledon, gymnosperm, and moss families with more than 10 genera, monocotyledon families with more than 5 genera, and all fern families are shown in this figure.

Second, we focused on the ability of ITS2 to discriminate amongst the lower taxa. Of the 35 dicotyledon genera that each had more than 80 species, identification success rates were more than 80% for 12 genera. The success rates for identification of species within the Draba and Rhododendron genera were the two lowest at 27.2% and 21.9%, respectively (Table 3). The success rates for the identification of species within the dicotyledon genera with fewer than 80 species can be found in Table S3. Of the 42 monocotyledon genera with more than 30 species, identification success rates were greater than 80% in 13 genera. The success rates for identification of species within the Kniphofia, Ophrys, and Diuris genera were the three lowest at 16.2%, 22.7%, and 31.1%, respectively (Table 4). The success rates for the identification of species within genera with fewer than 30 species of monocotyledons and of species from different genera of gymnosperms, ferns, and mosses can be found in Table S3. All 28 animal genera with more than 20 species had species identification success rates greater than 80%, except for the genera Calligrapha and Dolichopus, which had the lowest success rates at 73.3% and 73.8%, respectively (Table 5). The success rates for the identification of species in animal genera with fewer than 20 species are presented in Table S3.
Table 3

Success rates of ITS2 for species identification in genera with more than 80 species in dicotyledons.

Family name | Genus name | No. of species | No. of samples | Success rate at the species level (%)
Fabaceae | Astragalus | 322 | 381 | 65.9
Fabaceae | Indigofera | 234 | 266 | 95.5
Fabaceae | Trifolium | 223 | 334 | 70.1
Melastomataceae | Miconia | 206 | 223 | 66.4
Brassicaceae | Draba | 199 | 452 | 27.2
Asteraceae | Centaurea | 185 | 284 | 58.5
Plantaginaceae | Veronica | 178 | 264 | 90.2
Oxalidaceae | Oxalis | 176 | 201 | 80.6
Moraceae | Ficus | 174 | 215 | 85.6
Solanaceae | Solanum | 162 | 248 | 83.9
Asteraceae | Senecio | 161 | 219 | 77.6
Fabaceae | Aspalathus | 138 | 165 | 55.8
Fabaceae | Acacia | 127 | 151 | 72.8
Rosaceae | Rubus | 124 | 199 | 72.9
Begoniaceae | Begonia | 124 | 236 | 97.9
Polygalaceae | Polygala | 123 | 128 | 89.8
Asteraceae | Artemisia | 118 | 159 | 63.5
Rosaceae | Cliffortia | 118 | 151 | 67.5
Acanthaceae | Ruellia | 117 | 151 | 79.5
Euphorbiaceae | Euphorbia | 117 | 168 | 86.9
Balsaminaceae | Impatiens | 117 | 137 | 97.8
Apiaceae | Eryngium | 113 | 136 | 62.5
Myrtaceae | Eucalyptus | 106 | 135 | 61.5
Euphorbiaceae | Croton | 104 | 142 | 59.9
Calceolariaceae | Calceolaria | 99 | 103 | 74.8
Convolvulaceae | Cuscuta | 98 | 261 | 74.7
Caryophyllaceae | Dianthus | 97 | 141 | 40.4
Lamiaceae | Salvia | 96 | 213 | 81.2
Berberidaceae | Berberis | 94 | 164 | 55.5
Ericaceae | Rhododendron | 86 | 233 | 21.9
Euphorbiaceae | Macaranga | 84 | 127 | 66.9
Sapindaceae | Acer | 83 | 745 | 81.5
Rosaceae | Prunus | 82 | 222 | 78.8
Urticaceae | Pilea | 81 | 88 | 97.7
Rubiaceae | Coffea | 81 | 111 | 72.1
Table 4

Success rates of ITS2 for species identification in genera with more than 30 species in monocotyledons.

Family name | Genus name | No. of species | No. of samples | Success rate at the species level (%)
Alliaceae | Allium | 273 | 717 | 72.7
Amaryllidaceae | Cyrtanthus | 43 | 57 | 86.0
Amaryllidaceae | Crinum | 34 | 34 | 52.9
Arecaceae | Pinanga | 49 | 161 | 95.7
Asphodelaceae | Kniphofia | 52 | 99 | 16.2
Costaceae | Costus | 50 | 94 | 52.1
Cyperaceae | Carex | 318 | 506 | 80.6
Cyperaceae | Eleocharis | 52 | 122 | 90.2
Hyacinthaceae | Lachenalia | 31 | 50 | 70.0
Juncaceae | Luzula | 45 | 56 | 51.8
Juncaceae | Juncus | 42 | 51 | 68.6
Liliaceae | Gagea | 79 | 228 | 56.1
Liliaceae | Lilium | 78 | 124 | 79.0
Liliaceae | Fritillaria | 49 | 58 | 82.8
Musaceae | Musa | 37 | 63 | 82.5
Orchidaceae | Maxillaria | 227 | 482 | 62.9
Orchidaceae | Oncidium | 139 | 215 | 65.1
Orchidaceae | Dendrobium | 121 | 160 | 91.9
Orchidaceae | Disa | 120 | 143 | 79.7
Orchidaceae | Ophrys | 100 | 260 | 22.7
Orchidaceae | Paphiopedilum | 85 | 192 | 76.6
Orchidaceae | Phalaenopsis | 56 | 232 | 65.9
Orchidaceae | Masdevallia | 48 | 49 | 79.6
Orchidaceae | Gomesa | 46 | 55 | 49.1
Orchidaceae | Satyrium | 42 | 59 | 98.3
Orchidaceae | Dendrochilum | 42 | 52 | 71.2
Orchidaceae | Cyrtochilum | 41 | 75 | 69.3
Orchidaceae | Telipogon | 38 | 46 | 76.1
Orchidaceae | Dichaea | 36 | 66 | 81.8
Orchidaceae | Diuris | 33 | 61 | 31.1
Orchidaceae | Scaphyglottis | 33 | 40 | 100.0
Orchidaceae | Cymbidium | 30 | 58 | 74.1
Poaceae | Poa | 115 | 178 | 46.1
Poaceae | Bromus | 66 | 80 | 76.3
Poaceae | Elymus | 54 | 155 | 74.2
Poaceae | Festuca | 51 | 69 | 72.5
Poaceae | Nassella | 31 | 36 | 80.6
Poaceae | Hordeum | 31 | 481 | 81.7
Potamogetonaceae | Potamogeton | 33 | 211 | 72.5
Zingiberaceae | Globba | 60 | 103 | 57.3
Zingiberaceae | Alpinia | 46 | 85 | 68.2
Zingiberaceae | Amomum | 37 | 52 | 94.2
Table 5

Success rates of ITS2 for species identification in genera with more than 20 species in animals.

Family name | Genus name | No. of species | No. of samples | Success rate at the species level (%)
Aphelenchoididae | Bursaphelenchus | 32 | 86 | 81.4
Camaenidae | Satsuma | 27 | 122 | 100.0
Ceratopogonidae | Culicoides | 39 | 134 | 100.0
Chrysomelidae | Timarcha | 42 | 183 | 97.3
Chrysomelidae | Calligrapha | 23 | 45 | 73.3
Clausiliidae | Albinaria | 25 | 31 | 96.8
Clausiliidae | Isabellaria | 20 | 23 | 95.7
Conidae | Conus | 23 | 23 | 100.0
Culicidae | Culex | 23 | 241 | 98.8
Culicidae | Aedes | 21 | 154 | 93.5
Dolichopodidae | Dolichopus | 38 | 65 | 73.8
Drosophilidae | Drosophila | 40 | 43 | 81.4
Enidae | Mastus | 24 | 44 | 95.5
Gyrodactylidae | Gyrodactylus | 49 | 135 | 99.3
Heteroderidae | Heterodera | 41 | 211 | 93.8
Longidoridae | Xiphinema | 25 | 52 | 100.0
Lycaenidae | Agrodiaetus | 75 | 111 | 90.1
Nesticidae | Nesticus | 26 | 51 | 100.0
Nitidulidae | Meligethes | 79 | 82 | 87.8
Planorbidae | Biomphalaria | 22 | 91 | 95.6
Poritidae | Porites | 20 | 206 | 89.3
Pratylenchidae | Pratylenchus | 22 | 154 | 97.4
Psychodidae | Phlebotomus | 24 | 129 | 100.0
Reduviidae | Triatoma | 28 | 127 | 94.5
Sarcophagidae | Sarcophaga | 24 | 33 | 100.0
Simuliidae | Simulium | 22 | 177 | 80.8
Steinernematidae | Steinernema | 46 | 140 | 96.4
Trichogrammatidae | Trichogramma | 59 | 278 | 99.3
To identify the species, we focused not only on the divergence of primary sequences of ITS2, but also on variations in the secondary structures of ITS2. The secondary structures and alignments of primary sequences of ITS2 were reconstructed for four different species from the same genus, four species from different genera of the same family, and four species from different families of dicotyledons, monocotyledons, and animals. These are shown in Figures 4, S1, S2, S3, S4, and S5. All of the secondary structures in these species have four similar helices: Helix I, II, III, and IV (Figs. 4, S2 and S4) [2], [34], [35]. Helix III is longer than the others. At the different taxonomic levels of dicotyledons, monocotyledons, and animals, the secondary structures show different levels of similarity, which result from the differences in the primary sequences of these species. Thus, the species of dicotyledons, monocotyledons, and animals could be identified by their secondary structure, and the secondary structure of the ITS2 region could be considered as a molecular morphological characteristic.
Figure 4

The secondary structure of ITS2 in different species of dicotyledons.

Although ITS2 sequences are advantageous for identification purposes, one of the concerns for accepting the ITS2 region as a barcode is the potential contamination of fungal sequences [11]. We checked the studied ITS2 sequences of plants and animals using a hidden Markov model (HMM) for fungal ITS2 annotation, in addition to conducting BLAST searches against the fungal nrITS database [36]. For the plants, 139 and 136 ITS2 sequences may have been fungal sequences, as determined by BLAST and HMM, respectively. Fewer than 10 ITS2 sequences of gymnosperms, ferns, and mosses may have been fungal sequences, as determined by BLAST and HMM. There were 37 and 32 dicotyledon ITS2 sequences, as well as 30 and 27 animal ITS2 sequences, that may have been fungal sequences, as determined by BLAST and HMM, respectively. There were 86 monocotyledon ITS2 sequences that may have been fungal sequences (Table S4).

Finally, we developed a web application at http://its2-plantidit.dnsalias.org to allow researchers to further test the usefulness of ITS2 for species identification across the plant and animal kingdoms. Four different modules have been implemented at the time of this writing. The first module, “View,” provides a gene-card-like summary of the ITS2 reference sequence for a particular species. The user performs a query with a taxonomy ID used in NCBI's taxonomy browser. The module then displays all sequences associated with the taxonomy ID, as well as the reference barcode sequences for the ITS2 region of this species. The second module, “Retrieve,” allows the user to retrieve various segments of the ITS2 region, which can be divided into the 5.8S gene segment, the ITS2 core region, and the 28S gene segment. The sequences for these different regions can then be used to build various models, such as HMMs. The third module, “Annotate,” allows users to annotate the 5.8S gene segment, the ITS2 core region, and the 28S gene segment for their own sequences. The users need to provide the alignment of multiple sequences for the 5.8S gene and the 28S gene segments. The module then builds HMMs with these fragments, and uses the HMMs to query the input sequences to define the boundaries of the various fragments. The users can choose to export various segments individually or by batch.

The last module, “Identify,” performs a BLAST search on a query sequence against our internal ITS2 reference barcode sequence database. Species identification is based on the assumption that the ITS2 sequence for this species is included in the reference database. In that case, if the top hit corresponds to a single species, the sample is assigned to that species. In contrast, if the top hits include more than one species, the ITS2 sequence cannot be used to identify the sample, and additional DNA barcodes are needed to resolve the identity of the sample. If the reference database does not contain the ITS2 sequence of the species under investigation, identification is more complicated, as discussed elsewhere [33]. In summary, a comprehensive reference database is critical for species identification, which is the reason this database was constructed.
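The decision rule described for the “Identify” module can be sketched as below. This is an illustration under stated assumptions rather than the application's actual code: the function name `identify`, the use of the highest bit score to define the "top hit", and the treatment of tied scores are all hypothetical choices.

```python
def identify(hits, species_of):
    """Classify one query from its BLAST hits.

    hits: list of (subject_id, bit_score) pairs for the query.
    species_of: mapping from subject_id to species name (assumed available
    from the reference database).
    """
    if not hits:
        return ("no_hit", None)
    best = max(score for _, score in hits)
    # All species represented among the hits that share the best score.
    top_species = {species_of[sid] for sid, score in hits if score == best}
    if len(top_species) == 1:
        return ("identified", top_species.pop())
    # More than one species ties for the top hit: another barcode is needed.
    return ("ambiguous", sorted(top_species))


# Hypothetical reference records and hits.
species_of = {"r1": "Acer rubrum", "r2": "Acer rubrum", "r3": "Acer palmatum"}
unique = identify([("r1", 380.0), ("r2", 380.0), ("r3", 300.0)], species_of)
tied = identify([("r1", 380.0), ("r3", 380.0)], species_of)
```

The rule only makes sense when the query species is present in the reference database, which is the assumption stated in the text.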

Discussion

An ideal barcode should possess sufficient variation among the sequences to discriminate species; however, it also needs to be sufficiently conserved so that there is less variability within species than between species [37], [38]. Chen et al. (2010) compared seven candidate DNA barcodes (psbA-trnH, matK, rbcL, rpoC1, ycf5, ITS2, and ITS) from medicinal plant species and proposed that ITS2 can potentially be used as a standard DNA barcode to identify medicinal plants. The ITS2 region has also been used as a barcode to identify spider mites [41], Sycophila [16], and Fasciola [18]. In the present study, we extended this analysis across all plants and animals, and assessed the species discrimination capacity of ITS2 sequences for 50,790 plant and 12,221 animal sequences (Table S1). The success rates for identification of plants and animals were more than 97% and 74% at the genus and species level (Table 2), respectively, except for gymnosperms, which had a 67.1% success rate at the species level. In addition, the ITS2 region had a high success rate for discriminating between closely related species in plants and animals (Fig. 3, Tables 3, 4, 5, S2, and S3). The sequence length of ITS2 is short (Fig. 1), which satisfies the requirements for PCR amplification and sequencing. Finally, the secondary structures of ITS2 are conserved and can provide useful biological information for alignment [2], [4], [35]; thus, they can be considered as molecular morphological characteristics for species identification.

The ITS2 sequence lengths of plants and animals were mainly distributed in the 195–510 bp range. The identification of plant and animal voucher specimens and other collections using DNA barcoding techniques is one of the main tasks in natural history museums and research institutes. The length of the ITS2 region is sufficiently short to allow amplification of even degraded DNA. In addition, the intra-specific variations in plants and animals are lower than the inter-specific divergences. However, the overlap of genetic variation without barcoding gaps increases significantly as the number of closely related species increases [32].

Hebert et al. found that more than 98% of 13,320 congeneric species pairs, including representatives from 11 phyla, have sufficient sequence divergence to ensure easy identification [20]. However, the sequence divergence of CO1 for some animal species, such as cnidarians [20] and the West Palaearctic Pandasyopthalmus taxa [39], is relatively low, and even invariant. In addition, mtDNA is maternally inherited; other sources of data should be considered, such as nuclear DNA, morphology, or ecology [40]. The success rate of using ITS2 for identification of animals is 91.7% at the species level based on testing of a comprehensive sample set, and the identification efficiency of ITS2 for sequences in cnidarians is more than 77%. ITS2 sequences have a relatively high divergence rate; thus, ITS2 can be used as a complementary locus to CO1 for identification of animal species.

Recently, the ITS2 region has been found to vary in primary sequences and secondary structures in a way that correlates highly with taxonomic classification. Several researchers have already demonstrated the potential for using ITS2 for taxonomic classification and phylogenetic reconstruction at both the genus and species levels for eukaryotes, including animals, plants, and fungi [2], [4], [8], [9], [42], [43]. The ITS2 region of nuclear DNA provides a powerful tool because of sufficient variation in primary sequences and secondary structures. Analysis of the secondary structures formed by the RNA transcript as it folds back upon itself at transcription has been less commonly conducted; however, it has proven extremely useful in aiding proper sequence alignment [1], [44]. Schultz and Wolf described the utilization of ITS2's primary sequence and secondary structure information, together with an ITS2-specific scoring matrix and an ITS2-specific substitution model, based on tools such as 4SALE, the CBCAnalyzer, and ProfDistS [9].

Among the 50,790 ITS2 sequences of plants and 12,221 ITS2 sequences of animals, 139 and 30 sequences, respectively, could be fungal sequences. Thus, the frequency is less than 0.3% in both plants and animals. This result is similar to that of Chen et al. [11]. The frequency of suspected fungal sequences in monocotyledon ITS2 sequences is twice as high as in dicotyledons, which may be due to the presence of endophytic fungi in most monocotyledon species. Although the rate of fungal contamination is very low, data taken from public databases should be checked carefully [11].

There are multiple copies of ITS (containing ITS1 and ITS2) in plants and animals. Although different copies of ITS exist, which may result in misleading phylogenetic inferences [45], there remain several advantages for its widespread use, such as the levels of variation and the multicopy structure, which facilitates PCR amplification, even from herbarium specimens [46].

In conclusion, we believe that the ITS2 locus can be used as a barcode for authenticating plant species, as well as a complementary locus to CO1 for identifying animal species. The sequences of the universal primers and the amplification conditions for obtaining the ITS2 sequences of plants and animals can be found in Table S5, as well as in the ITS2 web application. There were limited ITS2 sequences of ferns and vertebrates in GenBank; therefore, the success rates for ITS2 to identify them need further investigation.

Materials and Methods

Reference Database Construction

All ITS2 sequences of dicotyledons, monocotyledons, gymnosperms, mosses, ferns and animals were downloaded from GenBank on June 28, 2010 by searching with the keywords “internal transcribed spacer 2,” which retrieved 160,295 sequences. These sequences were used to construct an analysis dataset. The raw data were annotated and trimmed using ITS2 annotation tools based on HMM [42]. Two conserved regions of the 5.8S and 28S genes for plants and animals, respectively, were used to delimit the ITS2 region. A maximum E-value of 1.0 was used. The trimmed sequences were edited manually. Sequences shorter than 100 bp, sequences containing more than two ambiguous bases (“Ns”), and sequences from unnamed species (such as those with spp. and aff. in the species name) were excluded. The selected ITS2 sequences were then filtered with an HMM-based annotation [35] and with BLAST searches against the fungal nrITS database (http://www.emerencia.org/fungalitspipeline.html) [36]. The ITS2 sequences belonging to a genus that contains only one species were excluded from the analysis. Finally, a reference database was constructed. Detailed sequence information can be found in Table S6. The workflow is shown in Figure 5.
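The exclusion rules listed above (minimum length, ambiguous bases, unnamed species, single-species genera) can be sketched as simple filters. This is an illustrative sketch only: the function names are hypothetical, the regular expression for unnamed species is an assumption covering "sp.", "spp." and "aff.", and the genus is assumed to be the first word of the species name. The HMM-based and fungal screening steps are not reproduced here.

```python
import re
from collections import defaultdict

# Assumed pattern for unnamed species; the text names "spp." and "aff." as examples.
UNNAMED = re.compile(r"\b(?:sp|spp|aff)\.", re.IGNORECASE)


def keep_record(species_name: str, seq: str) -> bool:
    """Apply the per-sequence exclusion rules."""
    seq = seq.upper()
    if len(seq) < 100:            # shorter than 100 bp
        return False
    if seq.count("N") > 2:        # more than two ambiguous bases
        return False
    if UNNAMED.search(species_name):  # unnamed species
        return False
    return True


def drop_single_species_genera(records):
    """records: list of (species_name, seq); keep genera with at least two species."""
    species_per_genus = defaultdict(set)
    for name, _ in records:
        species_per_genus[name.split()[0]].add(name)
    return [(n, s) for n, s in records if len(species_per_genus[n.split()[0]]) > 1]


# Hypothetical records for illustration.
good = "ACGT" * 30  # 120 bp
records = [("Acer rubrum", good), ("Acer palmatum", good), ("Ginkgo biloba", good)]
kept = drop_single_species_genera([r for r in records if keep_record(*r)])
```

Applying the per-sequence filter before the genus-level filter matters, because removing sequences can leave a genus with a single remaining species.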
Figure 5

The workflow diagram for the construction of ITS2 sequences libraries.

GC Content, Sequence Length, and Intra- and Inter-specific Divergence

The GC content and sequence length were calculated for all of the ITS2 sequences of dicotyledons, monocotyledons, gymnosperms, ferns, mosses, and animals. The intra- and inter-specific divergences were calculated based on different taxa. Sequences were aligned using Clustal W, and Kimura 2-parameter (K2P) distances were calculated using PAUP4b10 (Florida State University, USA). The intra-specific variations and inter-specific divergences of congeneric species in the dicotyledons, monocotyledons, gymnosperms, ferns, mosses, and animals were calculated using a K2P distance matrix, as described previously [11], [31], [32].
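For readers unfamiliar with the Kimura 2-parameter model, the distance between two aligned sequences is d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions. The sketch below implements that formula directly; it is not PAUP's implementation, and the decision to skip gaps and ambiguous bases pairwise is an assumption of this example.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    sites = transitions = transversions = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # assumption: skip gaps and ambiguous bases pairwise
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    # Raises ValueError (math domain error) for saturated pairs where a log argument is <= 0.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical 10-bp alignments: one transition, then one transversion.
d_transition = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
d_transversion = k2p_distance("AAAAAAAAAA", "CAAAAAAAAA")
```

A matrix of such pairwise values is the input from which intra- and inter-specific summaries are then computed.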

Species Identification

All ITS2 sequences of plants and animals were used as query sequences. Query sequences were divided into the following groups: dicotyledon, monocotyledon, gymnosperm, fern, moss, and animal. BLAST1, which was implemented using the BLAST program (Version 2.2.17), was used to search the reference database with each query sequence [33].
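As an illustration of how best hits might be collected from such a search, the sketch below parses BLAST tabular output (the standard 12-column layout, with the bit score in the last column) and keeps, for each query, the subjects that share the highest bit score. The function name is hypothetical, and skipping a query's hit to its own record is an assumption of this sketch, not a documented detail of the BLAST1 procedure.

```python
def top_hits(tabular_text: str) -> dict:
    """Map each query ID to the subject IDs sharing its highest bit score."""
    best = {}  # query -> (best bit score, [subject IDs])
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query == subject:
            continue  # assumption: ignore a query's hit to its own record
        score, subjects = best.get(query, (float("-inf"), []))
        if bitscore > score:
            best[query] = (bitscore, [subject])
        elif bitscore == score and subject not in subjects:
            subjects.append(subject)
    return {q: subjects for q, (_, subjects) in best.items()}


# Hypothetical tabular output; IDs and scores are invented for illustration.
sample = "\n".join([
    "q1\ts1\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-100\t380",
    "q1\ts2\t99.0\t200\t2\t0\t1\t200\t1\t200\t1e-100\t380",
    "q1\ts3\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-70\t300",
    "q2\tq2\t100.0\t200\t0\t0\t1\t200\t1\t200\t1e-110\t400",
    "q2\ts4\t97.0\t200\t6\t0\t1\t200\t1\t200\t1e-90\t350",
])
hits = top_hits(sample)
```

Mapping the returned subject IDs to species names then shows whether a query resolves to one species or to several.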

Secondary Structure of the ITS2 Region

To identify the effect of primary sequence divergence on secondary structure, ITS2 sequences with different levels of sequence divergence (∼1%, ∼5%, ∼10%) were subjected to secondary structure prediction. The sequences were drawn from a genus that provided three other species, within a family that provided three other genera. Paphiopedilum (Orchidaceae) of monocotyledons, Acaena (Rosaceae) of dicotyledons, and Heterodera (Heteroderidae) of animals were used to construct secondary structures using tools from the ITS2 database [35].

Web Application for ITS2-based Species Determination

We developed a web application (http://its2-plantidit.dnsalias.org) to facilitate the utilization of the ITS2 sequence for various DNA barcoding studies. DNA sequences related to ITS2 regions were retrieved from GenBank, and were preprocessed to remove the flanking 5.8S and 28S rRNA gene sequences, as described in the section Reference Database Construction. Sequences that belong to the same species, indicated by having the same taxonomy ID, were assembled using the program Phrap. The consensus sequence of the corresponding sequence clusters was considered as the average or reference sequence of the ITS2 region for the species, which can be retrieved from the application. The web application was built using the Catalyst web application framework (http://www.catalystframework.org/) for the Perl language, running in a Fedora 12 environment. This web application consists of four analytic modules at the time of writing: View, Retrieve, Annotate, and Identify.

Supporting Information

Table S1. No. of genera, species, and samples used in this study. (0.03 MB DOC)

Table S2. Success rates of using ITS2 sequences to identify dicotyledon, moss, and gymnosperm species in families having fewer than 10 genera and monocotyledon species in families having fewer than 5 genera. (0.05 MB XLS)

Table S3. Success rates of using ITS2 sequences to identify dicotyledon species in genera having fewer than 80 species, monocotyledon species in genera having fewer than 30 species, gymnosperm, moss, and fern species in different genera, and animal species in genera having fewer than 20 species. (0.39 MB XLS)

Table S4. Sequences that may be of fungal origin. (0.03 MB XLS)

Table S5. The sequences of the universal primers and the amplification conditions for obtaining the ITS2 sequences of plants and animals. (0.03 MB DOC)

Table S6. Samples used to determine the potential for using ITS2 sequences to identify species, and their accession numbers in GenBank. (5.91 MB XLS)

Figure S1. Alignment of primary sequences of dicotyledons. (A) Alignment of the primary sequences of four species from the genus Acaena of Rosaceae; (B) Alignment of the primary sequences of four species from four genera of Rosaceae; and (C) Alignment of the primary sequences of four species from four families of dicotyledons. (0.03 MB PDF)

Figure S2. Secondary structure of ITS2 in different species of monocotyledons. (4.00 MB TIF)

Figure S3. Alignment of the primary sequences of monocotyledons. (A) Alignment of the primary sequences of four species from the genus Paphiopedilum of Orchidaceae; (B) Alignment of the primary sequences of four species from four genera of Orchidaceae; and (C) Alignment of the primary sequences of four species from four families of monocotyledons. (0.03 MB PDF)

Figure S4. Secondary structure of ITS2 in different species of animals. (3.86 MB TIF)

Figure S5. Alignment of the primary sequences of animals. (A) Alignment of the primary sequences of four species from the genus Heterodera of Heteroderidae; (B) Alignment of the primary sequences of four species from four genera of Heteroderidae; and (C) Alignment of the primary sequences of four species from four families of animals aided by secondary structure using 4SALE [47]. (0.04 MB PDF)
  41 in total

1.  Biological identifications through DNA barcodes.

Authors:  Paul D N Hebert; Alina Cywinska; Shelley L Ball; Jeremy R deWaard
Journal:  Proc Biol Sci       Date:  2003-02-07       Impact factor: 5.349

Review 2.  Ribosomal ITS sequences and plant phylogenetic inference.

Authors:  I Alvarez; J F Wendel
Journal:  Mol Phylogenet Evol       Date:  2003-12       Impact factor: 4.286

3.  Distinguishing species.

Authors:  Tobias Müller; Nicole Philippi; Thomas Dandekar; Jörg Schultz; Matthias Wolf
Journal:  RNA       Date:  2007-07-24       Impact factor: 4.942

4.  DNA barcodes: genes, genomics, and bioinformatics.

Authors:  W John Kress; David L Erickson
Journal:  Proc Natl Acad Sci U S A       Date:  2008-02-19       Impact factor: 11.205

5.  The use of mean instead of smallest interspecific distances exaggerates the size of the "barcoding gap" and leads to misidentification.

Authors:  Rudolf Meier; Guanyang Zhang; Farhan Ali
Journal:  Syst Biol       Date:  2008-10       Impact factor: 15.683

6.  Testing the reliability of genetic methods of species identification via simulation.

Authors:  Howard A Ross; Sumathi Murugan; Wai Lok Sibon Li
Journal:  Syst Biol       Date:  2008-04       Impact factor: 15.683

7.  Including RNA secondary structures improves accuracy and robustness in reconstruction of phylogenetic trees.

Authors:  Alexander Keller; Frank Förster; Tobias Müller; Thomas Dandekar; Jörg Schultz; Matthias Wolf
Journal:  Biol Direct       Date:  2010-01-15       Impact factor: 4.540

8.  Taxonomic reliability of DNA sequences in public sequence databases: a fungal perspective.

Authors:  R Henrik Nilsson; Martin Ryberg; Erik Kristiansson; Kessy Abarenkov; Karl-Henrik Larsson; Urmas Kõljalg
Journal:  PLoS One       Date:  2006-12-20       Impact factor: 3.240

9.  The internal transcribed spacer 2 database--a web server for (not only) low level phylogenetic analyses.

Authors:  Jörg Schultz; Tobias Müller; Marco Achtziger; Philipp N Seibel; Thomas Dandekar; Matthias Wolf
Journal:  Nucleic Acids Res       Date:  2006-07-01       Impact factor: 16.971

10.  Validation of the ITS2 region as a novel DNA barcode for identifying medicinal plant species.

Authors:  Shilin Chen; Hui Yao; Jianping Han; Chang Liu; Jingyuan Song; Linchun Shi; Yingjie Zhu; Xinye Ma; Ting Gao; Xiaohui Pang; Kun Luo; Ying Li; Xiwen Li; Xiaocheng Jia; Yulin Lin; Christine Leon
Journal:  PLoS One       Date:  2010-01-07       Impact factor: 3.240

