Christopher Noune, Caroline Hauxwell.
Abstract
Next-generation sequencing and bioinformatic approaches are increasingly used to quantify microorganisms within populations by analysis of 'meta-barcode' data. This approach relies on comparing amplicon sequences of 'barcode' regions from a population with public-domain databases of reference sequences. However, for many organisms, relevant 'barcode' regions may not have been identified and large databases of reference sequences may not be available. A workflow and software pipeline, 'MetaGaAP', was developed to identify and quantify genotypes through four steps: (1) shotgun sequencing and identification of polymorphisms in a metapopulation to identify custom 'barcode' regions of fewer than 30 polymorphisms within the span of a single 'read'; (2) amplification and sequencing of the 'barcode'; (3) generation of a custom database of polymorphisms; and (4) quantitation of the relative abundance of genotypes. The pipeline and workflow were validated in a 'wild type' Alphabaculovirus isolate, Helicoverpa armigera single nucleopolyhedrovirus (HaSNPV-AC53), and a tissue-culture derived strain (HaSNPV-AC53-T2). The approach was validated by comparison of polymorphisms in amplicons and shotgun data, and by comparison of predicted dominant and co-dominant genotypes with Sanger sequences. The computational power required to generate and search the database effectively limits the number of polymorphisms that can be included in a barcode to 30 or fewer. The approach can be used in quantitative analysis of the ecology and pathology of non-model organisms.
Keywords: HaSNPV-AC53; MetaGaAP; baculoviruses; bioinformatics; community analysis; meta-barcoding; metapopulation
Year: 2017 PMID: 28218638 PMCID: PMC5372007 DOI: 10.3390/biology6010014
Source DB: PubMed Journal: Biology (Basel) ISSN: 2079-7737
Primers used for amplification of selected regions within the ORFs BRO-A and DNA polymerase.
| Target Gene | Primer | Fragment Size |
|---|---|---|
| BRO-A | * 5′-CATTTGCAAGGATATTGGAGT-3′ | 365 bp |
| DNA Polymerase | * 5′-GTATGACTTATCACGACAATTGC-3′ | 325 bp |
* An adapter, a BarcodeX barcode adapter and a random hexamer are attached to the 5′ end of the forward primer; # a trP1 adapter is attached to the 5′ end of the reverse primer.
Figure 1. A visual representation of how the Biostars 175929 tool produces sequences containing all polymorphism combinations.
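The combinatorial expansion described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the Biostars 175929 tool itself: the function name `enumerate_combinations`, the `(position, alt_base)` input format and the integer index are assumptions made for illustration, and only two-allele, single-base substitutions are handled. It shows why the database grows as 2^n with the number of polymorphisms n.

```python
from itertools import product


def enumerate_combinations(reference, variants):
    """Yield (index, sequence) for every combination of biallelic substitutions.

    reference: reference sequence as a string.
    variants:  list of (zero_based_position, alt_base) tuples
               (hypothetical input format, not the tool's own).
    """
    for choices in product((0, 1), repeat=len(variants)):
        seq = list(reference)
        index = 0
        for bit, (pos, alt) in zip(choices, variants):
            # Build an integer index from the per-site choices (0 = ref, 1 = alt).
            index = (index << 1) | bit
            if bit:
                seq[pos] = alt
        yield index, "".join(seq)


# Toy data: three substitutions give 2**3 = 8 candidate sequences.
reference = "ACGTACGT"
variants = [(1, "T"), (4, "G"), (6, "A")]
combinations = list(enumerate_combinations(reference, variants))
for index, seq in combinations:
    print(index, seq)

# Database size at the 30-polymorphism limit mentioned in the abstract.
print(2 ** 30)  # 1073741824
```

At 30 two-allele polymorphisms the enumeration already exceeds one billion sequences, which is consistent with the practical limit stated in the abstract.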
Figure 2. The MetaGaAP workflow to identify genotypes and the relative abundance of the community composition within a single isolate.
Figure 3. Fragments A and B of the HaSNPV-AC53 BRO-A target region containing the identified polymorphisms, showing identical alignment of polymorphisms in the amplicon, shotgun and Sanger sequences of the dominant BRO-A variant identified by MetaGaAP within the wild type baculovirus isolate HaSNPV-AC53.
Relative abundance of the AC53 BRO-A genotypes that were above the 20× coverage threshold. G_33554431 was identified as the dominant strain in the population.
| Genotype | Reads | Relative Abundance % |
|---|---|---|
| G_33554431 # | 258084 | 97.03 |
| G_33554303 | 1643 | 0.62 |
| G_33552383 | 787 | 0.30 |
| G_16777215 | 666 | 0.25 |
| G_33554423 | 533 | 0.20 |
| G_25165823 | 437 | 0.16 |
| G_33554430 | 437 | 0.16 |
| G_33292287 | 400 | 0.15 |
| G_31457279 | 393 | 0.15 |
| G_33554429 | 261 | 0.10 |
| G_33554399 | 228 | 0.09 |
| G_33554427 | 213 | 0.08 |
| G_33553919 * | 138 | 0.05 |
| G_33554175 | 129 | 0.05 |
| G_33546239 | 123 | 0.05 |
| G_33554367 | 105 | 0.04 |
| G_29360127 | 103 | 0.04 |
| G_33030143 | 103 | 0.04 |
| G_33550335 | 92 | 0.03 |
| G_33552255 | 68 | 0.03 |
| G_33521663 | 62 | 0.02 |
| G_33554415 | 56 | 0.02 |
| G_33554428 | 55 | 0.02 |
| G_20971519 | 52 | 0.02 |
| G_33553407 | 48 | 0.02 |
| G_23068671 | 35 | 0.01 |
| G_33554239 | 28 | 0.01 |
| G_33538047 | 21 | 0.01 |
# Equivalent to the AC53-T2 BRO-A G_1; * Equivalent to the AC53-T2 BRO-A G_0.
Relative abundance of the two BRO-A genotypes identified within AC53-T2.
| Genotype | Reads | Relative Abundance % |
|---|---|---|
| AC53-T2 BRO-A G_1 | 104,065 | 54.27 |
| AC53-T2 BRO-A G_0 | 87,689 | 45.73 |
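The relative abundance column can be reproduced from the read counts with a short calculation. The sketch below assumes that relative abundance is simply each genotype's reads divided by the total reads assigned across the listed genotypes; the function name `relative_abundance` is hypothetical and this is not the MetaGaAP code.

```python
def relative_abundance(read_counts):
    """Return each genotype's share of the total reads, as a percentage.

    read_counts: dict mapping genotype label -> number of assigned reads.
    Assumes the denominator is the sum of the counts passed in.
    """
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no reads assigned to any genotype")
    return {g: round(100 * n / total, 2) for g, n in read_counts.items()}


# Read counts taken from the AC53-T2 BRO-A table.
counts = {"AC53-T2 BRO-A G_1": 104065, "AC53-T2 BRO-A G_0": 87689}
abundance = relative_abundance(counts)
print(abundance)
# {'AC53-T2 BRO-A G_1': 54.27, 'AC53-T2 BRO-A G_0': 45.73}
```

Applied to the wild type AC53 table, the same calculation over only the listed rows gives about 97.28% for G_33554431 rather than 97.03%. This suggests that the denominator there also includes reads from genotypes below the coverage threshold, although that is an inference and is not stated in the record.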
Figure 4. Comparison of the AC53-T2 reference sequence with the Sanger sequence and the two genotypes identified within HaSNPV-AC53-T2. The Sanger chromatogram at position 293 shows the two competing genotypes that were identified with MetaGaAP and validates the relative abundance result.