Literature DB >> 30131909

Generating DNA sequence data with limited resources for molecular biology: Lessons from a barcoding project in Indonesia.

Gillian H Dean¹, Rani Asmarayani^2,3, Marlina Ardiyani², Yessi Santika², Teguh Triono^2,4, Sarah Mathews^5,6, Campbell O Webb^5,7.

Abstract

The advent of the DNA sequencing age has led to a revolution in biology. The rapid and cost-effective generation of high-quality sequence data has transformed many fields, including those focused on discovering species and surveying biodiversity, monitoring movement of biological materials, forensic biology, and disease diagnostics. There is a need to build capacity to generate useful sequence data in countries with limited historical access to laboratory resources, so that researchers can benefit from the advantages offered by these data. Commonly used molecular techniques such as DNA extraction, PCR, and DNA sequencing are within the reach of small laboratories in many countries, with the main obstacles to successful implementation being lack of funding and limited practical experience. Here we describe a successful approach that we developed to obtain DNA sequence data during a small DNA barcoding project in Indonesia.

Entities: Chemical Disease Gene Species

Keywords: DNA barcoding; DNA sequence data; limited resources; molecular biology

Year: 2018 PMID： 30131909 PMCID： PMC6055555 DOI： 10.1002/aps3.1167

Source DB: PubMed Journal: Appl Plant Sci ISSN： 2168-0450 Impact factor: 1.936

DNA barcoding allows the identification of specimens via DNA amplification and sequencing and provides a useful complement to morphology‐based identification methods in that it is rapid, needs only a small amount of tissue from any stage of the life cycle, and can be performed without extensive knowledge of the organisms (Hebert et al., 2003). The increasing ease and decreasing costs of obtaining DNA sequence data has accelerated advances in systematics, taxonomy, community ecology, and conservation (reviewed in Kress et al., 2015); food and wildlife forensics (reviewed in Staats et al., 2016); monitoring of agricultural pests and invasive species (Ashfaq and Hebert, 2016); and a myriad of human health applications including identification of parasites and disease vectors (Ondrejicka et al., 2014). In particular, DNA barcoding has become an increasingly important means to aid efforts to catalog biodiversity, and large consortia affiliated under the International Barcode of Life Project (iBOL; http://ibol.org/) are working toward this goal. Although these organizations have engaged with local partners in biodiversity‐rich regions, smaller local barcoding projects still play an important role in contributing to global barcoding initiatives by facilitating the collection of specimens from less accessible locations and filling in gaps for the larger initiatives (Borisenko et al., 2009). Even if in‐country molecular biology is impossible, local scientists can send tissue samples to the Canadian Centre for DNA Barcoding (CCDB; http://www.ccdb.ca), which is part of the iBOL initiative, for DNA extraction and PCR amplification. As described above, the generation of DNA barcodes has the potential to develop many useful resources for the various stakeholders in these countries, such as identifying species listed in the Convention on International Trade of Endangered Species (CITES; Lahaye et al., 2008) and determining authenticity of traditional Chinese medicines (Han et al., 2016). There is often a disconnect between the locations where the organisms occur and where the sequence data are generated. Most of the world's biodiversity is found in countries that have less well‐developed scientific research infrastructure, whereas DNA sequence data typically have been generated in countries with relatively low levels of terrestrial biodiversity but well‐established infrastructure and a highly trained workforce. There are, however, strong reasons for generating data in the originating countries. The Convention on Biological Diversity (CBD; https://www.cbd.int/) and the Nagoya Protocol on Access and Benefit Sharing lay out a framework for access to genetic resources and benefit sharing (Davis and Borisenko, 2017). Governments of biodiversity‐rich countries have imposed restrictions to limit access to their genetic resources to varying degrees, meaning that in some countries most or all of the molecular biology work must be done in the source country. Beyond the letter of the law, there are also strong ethical and social reasons (in terms of international friendship and collaboration) for foreign scientists to share their expertise and support local efforts. By performing lab work in the country of collection, and fully sharing data, results, and authorship with local scientists, foreign scientists act as true collaborators. The resulting trust is both an investment by the foreign scientists in their own future research opportunities, and a gesture of goodwill that promotes successful science for all involved (Vernooy et al., 2010). In our experience, major barriers to in‐country work are lack of practical experience using the techniques required to generate high‐quality DNA sequence data, insufficient funding coupled with higher costs for reagents, and a lack of infrastructure. The wider aim of our project was to lay the groundwork for an open access digital flora of Gunung Palung National Park, West Kalimantan, Indonesia, that will include DNA barcode data, complete collection information, taxonomic determinations, and high‐quality photographs. We carried out a pilot study (October 2008 to October 2010) in the Molecular Systematics Laboratory at the Herbarium Bogoriense, Research Center for Biology, Indonesian Institute of Sciences (LIPI), Cibinong, West Java, to assess protocols for collecting and processing samples from a wide range of taxa within a single project. Here we detail our workflow for this pilot study and aim to provide practical suggestions for other researchers wishing to establish a molecular biology lab or to increase the volume of samples being handled by an existing lab. We focus on specific barriers that we identified during our pilot study at three key stages of the project: sample collection, laboratory work, and data management and processing. Although the project we discuss here is focused on DNA barcoding of plant species, many of our recommendations apply equally to generating sequence data from other biological material. We present our results with reference to resources that describe essential infrastructure and skills needed by a small laboratory to generate high‐quality sequence data and offer suggestions for troubleshooting. Although we cannot cover every possible scenario and solution, we describe the development of a successful approach for our particular circumstances. We encourage researchers who face similar challenges to begin generating sequence data for their own research projects and applications as well as for international barcoding initiatives.

COLLECTION OF PLANT SAMPLES

Our approach to plant collecting for this project (“A digital flora of Gunung Palung National Park”; http://www.xmalesia.info) was typical of generalist surveys of fertile plants in tropical rainforest. Unlike a subsequent project in the same locality that created permanent forest plots and then sampled and DNA‐barcoded both sterile and fertile trees (the “Xmalesia project,” also at http://www.xmalesia.info; U.S. National Science Foundation [NSF] grant no. 1020868), the project reported here sampled only fertile plants discovered opportunistically during surveys in the forest. The majority of plants included were woody trees, shrubs, and lianas, but some non‐woody herbs were also collected. Ferns were not sampled. In total, 406 specimens were collected. Of these, 371 species‐level identifications were made (with 337 unique species), 26 genus‐level identifications, and six family‐level identifications; three specimens were undetermined (Appendix S1).

Voucher specimens

Every tissue sample taken for DNA analysis should be associated with a voucher specimen (all or part of the plant that becomes a permanent record of the specimen; see Culley, 2013). Funk et al. (2017) recently produced an excellent manual covering all aspects of voucher collection. Vouchers should be collected at least in duplicate, so that one set may be lodged in the country of origin and another set (or sets) can be sent to additional herbaria and/or the relevant taxonomic expert. Vouchers should be physical specimens except in instances where it is impossible to obtain a physical specimen; in such cases, photographic vouchers are an alternative (LaFrankie and Chua, 2015). Numerous photographs should be taken of the individual from which the voucher is made in order to record as many diagnostic features as possible (Baskauf and Kirchoff, 2008), and the vouchers themselves should also be carefully photographed. Some DNA barcoding services, such as the CCDB, offer reduced pricing if good photographic vouchers are provided. Photographic vouchers also have an additional role to play through their potential to accelerate inventories of biodiversity and support fundamental taxonomy (Webb et al., 2010). For our project, fertile collections (of up to five duplicates) were pressed in newspaper in the field and preserved in 70% alcohol obtained from a local pharmacy. Within two weeks, they were dried in a custom‐made aluminum oven heated by a kerosene cooking stove as electricity was not available, similar to the method described in Funk et al. (2017). Dried specimens were then shipped to Herbarium Bogoriense for determination, accessioning, and distribution. For each set of duplicates, a single silica gel–dried leaf sample was also made as described below. The collection and backup of data associated with the sampled specimens (metadata) is as important as collection of the sample itself. Detailed metadata should be recorded immediately in the field in durable field notebooks, which should themselves be photographed regularly to create a backup of the raw data. Data elements should include all the standard plant collection elements: specimen code/number, collector, date, vegetation plot code (if in plot), latitude and longitude (in decimal degrees), elevation, location, microhabitat, vegetation type, plant density (one only, a few, many), reproductive state, sex, size (height and/or diameter), plant habit (tree, liana, etc.), notable morphological features, local name, local uses, taxonomic determination in field, identity of determiner, confidence in determination (low, medium, high), and type of collection (spirit collection, carpological collection, etc.). Options now exist to enter data directly into digital devices. Although this reduces transcription errors, it can increase the chance of total loss of data. Extreme care should be taken to back up these digital records daily in the field, and/or back up to cloud storage if possible.

Tissues for DNA extraction

Tissue for DNA extraction must be collected and processed separately from the voucher specimens described above, as described in detail by Gemeinholzer et al. (2010). Optimal tissues for DNA extraction are healthy, fully expanded leaves that are not senescing. If these leaves are large they should be torn or cut into smaller pieces to increase the drying rate (see below). If cutting, be sure to clean scissors with alcohol between specimens to prevent cross‐contamination. Although DNA extraction from cambium samples has been reported to work well (Colpaert et al., 2005), we recommend using leaf tissue because it is simple to harvest, does not require special equipment such as a cork borer, and is much less invasive. In cases where it is known that the specimens will be found as very tall trees, then preparations to take cork bores should be made, bearing in mind that obtaining permission to core trees in parks and other protected areas may be impossible. We also found that a sling shot was an excellent tool for obtaining fresh leaf material from tall trees, although great care must be taken to confirm that the fallen leaf is actually from the target tree. The critical element when collecting tissue for DNA extraction is that it be dried rapidly because slow drying hastens DNA degradation. Rapid drying is typically achieved by placing the tissue immediately into a desiccant. A simple, effective, and economical desiccant is silica gel (Gemeinholzer et al., 2010; Neubig et al., 2014; Funk et al., 2017). The method we used for drying in silica gel is detailed in Appendix 1. Tissue can also be preserved in hexadecyltrimethylammonium bromide (CTAB) solution (reviewed in Gemeinholzer et al., 2010), or by using salt, CTAB‐salt gel, or RNAlater (Thermo Fisher Scientific, Waltham, Massachusetts, USA; reviewed in Neubig et al., 2014) for transport back to the lab for DNA extraction. An alternative approach is to use products such as Whatman FTA PlantSaver Cards (Whatman, Maidstone, United Kingdom; reviewed by Gemeinholzer et al., 2010; Neubig et al., 2014), where leaf squashes are made onto special paper. Care needs to be taken not to cross‐contaminate samples on the cards during collection. The paper can be used directly as a solid‐state PCR template after only a few simple washes, or the DNA can be eluted from the cards. These can be used very successfully (Siegel et al., 2017), and a similar method using Whatman paper instead of FTA cards has recently been developed and tested by Zou et al. (2017). However, as yields can be low if the DNA is eluted, and there are fewer options for troubleshooting failed PCR reactions that may stem from characteristics of the DNA sample when using the solid‐state method, paper‐based methods may be most useful for projects analyzing specimens where they have been shown to give good PCR results.

LABORATORY PROCEDURES

DNA extraction

Key steps leading to successful DNA extractions are grinding the tissue sufficiently and identifying the best extraction protocol(s) for the purpose at hand. The availability of lab equipment and infrastructure is also a consideration: suggestions for a minimum set of lab equipment and basic molecular biology protocols are given in Appendix 1. Efficient grinding of the plant tissues is the first step toward high yields of DNA. For efficient and simultaneous homogenization of multiple tissue samples, we used a modified version of a grinder based on a reciprocating saw (Alexander et al., 2007; Appendix 1) as an inexpensive alternative to commercially available bead beaters. Pestles and mortars with the addition of molecular‐biology‐grade silver sand to aid grinding by hand can be used as an alternative. If a minimal sample size is needed and the tissues are soft, they can be ground in microfuge tubes with micropestles (e.g., Geneaid catalog no. MP050; Geneaid Biotech Ltd., New Taipei City, Taiwan). The DNA extraction protocol needs to be considered carefully. Plants, especially tropical plants, synthesize a wide range of compounds, such as polysaccharides and polyphenols (Coley and Barone, 1996), that can be co‐purified with DNA and may reduce yield and/or inhibit subsequent PCR reactions. In some cases, the extraction protocol will need to be tailored to meet the specific challenges of the tissue, and it may be difficult to find a single method that works well for all samples. Most widely used DNA extraction methods can be placed into one of two groups: those that use DNA‐binding columns to purify DNA, and those that use chemical methods to partition DNA from cellular contents in solution. DNA‐binding columns are reliable and produce consistent results, require less technical expertise to use effectively, and generate little or no hazardous waste. The major disadvantage is that they can be expensive, although cheaper versions are becoming available, and consideration is needed of the savings in time and labor achieved with kits. If a partition‐based method is chosen, we recommend searching the literature for successes using that particular method to extract DNA from closely related taxa, or from taxa with similar extraction challenges (e.g., excess polysaccharides). There are numerous simple DNA extraction methods that have been used successfully on a variety of samples including cashew and corn (Sika et al., 2015), potato (Hosaka, 2004), Rosaceae (Antanaviciute et al., 2015), and rice (Sajib et al., 2017) that could be tested and may be successful. Otherwise, a CTAB method modified by adding agents to remove specific secondary metabolites is a good starting point; see Allen et al. (2006) and Neubig et al. (2014). Many of these methods require toxic chemicals such as phenol and chloroform, which must be handled in a fume hood and be disposed of safely in accordance with local regulations using established protocols. Safety Data Sheets (SDS) that accompany all purchased chemicals and are available online (e.g., at http://www.sigmaaldrich.com) are a good source of safety information. To find the best extraction protocol for our needs, we assessed two relatively inexpensive and reasonably simple CTAB‐based methods, modified to be carried out in microfuge tubes. Both of these protocols have been used successfully by the LIPI Molecular Systematics Laboratory for taxon‐specific projects. Initially, we extracted DNA from the tissues of 75 specimens using the extraction method of Tel‐Zur et al. (1999), modified by Wendel (Appendix 1). After PCR, 63 specimens did not yield enough PCR product for sequencing both rbcL and matK (discussed in detail below). Therefore, we extracted DNA from these and a further 331 specimens, using the extraction method of Porebski et al. (1997), which generated a smaller volume of hazardous chemical waste but included one extra overnight step compared to the Wendel extraction method (Appendix 1). In total, we extracted DNA from the tissues of 406 specimens. For an additional comparison, we used the column‐based DNeasy Plant Mini Kit (QIAGEN, Venlo, The Netherlands; Appendix 1) to extract DNAs from the tissues of a subset of 48 specimens that were previously subject to CTAB extractions. The molecular biology workflow we used is shown in Figure 1, DNA extraction methods are detailed in Appendix 1, and DNA extraction data are shown in Appendix S1.

Figure 1

Molecular biology workflow used for processing specimens (DNA extraction and PCR amplification) during this study. DNA extraction methods used were after Tel‐Zur et al. (1999) modified by Wendel (https://www.eeob.iastate.edu/faculty/wendel/dna-extraction), Porebski et al. (1997), and the QIAGEN DNeasy Plant Mini Kit (QIAGEN, Venlo, The Netherlands). Two general approaches are widely used to determine the quantity and quality of DNA extracts. Gel electrophoresis of DNA samples and a ladder for quantification allow estimation of DNA concentrations and determination of whether the sample is degraded or contains mostly fragments of high molecular weight. Spectrophotometry allows quantity to be estimated as well as the identification of some common contaminants such as proteins and phenol. We used gel electrophoresis because we did not have access to an appropriate spectrophotometer. We attempted PCR for all samples regardless of the evidence of DNA degradation or low yield that we obtained from the gel, although lower PCR success is expected from attempts to amplify loci from DNAs that are highly degraded.

PCR primers and amplification

Published, taxon‐specific primers for the group of interest are a good starting point for clade‐focused studies. If such primers are not available or, as is the case in our study, a wide range of taxa are being studied, universal primers designed to work across phylogenetically diverse taxa are a good option (e.g., those recommended by the CBOL Plant Working Group 2009). The criteria for CBOL‐recommended primers are based on universality (successful amplification across multiple taxa), sequence quality and coverage (amplification of regions that return high‐quality sequence data), and discrimination (enable the most species to be distinguished). Relevant taxon‐specific primer sequences can still be useful for troubleshooting if the project is broad in scope but poor PCR results are associated with particular taxa. If these approaches are not successful, primers can be designed based on publicly available sequence data. Ideally, sequence alignments should be generated from multiple taxa related to the target taxa so that suitable, conserved regions can be identified as primer sites. Primers can then be designed to amplify the region of interest using software such as PrimerDesign (Brodin et al., 2013) or Primaclade (Gadberry et al., 2005). Lorenz (2012) offers general guidelines for PCR primer design. PCR can be challenging and, in order to achieve reproducible amplification, it is critical to use DNAs of high quality whenever possible, and to always use well‐designed primers and properly prepared and stored reagents. Storing DNA is challenging (Anchordoquy and Molina, 2007) and is discussed in detail, along with details on using frost‐free freezers for storing DNA and other reagents, in Appendix 1. Water quality is often a problem, and if reliable Milli‐Q (MilliporeSigma, Burlington, Massachusetts, USA) or equivalent water is not available, it is recommended to purchase molecular‐biology‐grade water from a reliable reagent company. We selected PCR primers for the plant DNA barcodes rbcL and matK based on recommendations from the CBOL Plant Working Group (2009). Appendix 1 details primer sequences and PCR conditions. We performed two 12.5‐μL PCR reactions for every DNA sample extracted using a CTAB‐based protocol (Fig. 1). Two small‐volume reactions were used instead of one large‐volume reaction to give two independent attempts at amplification while conserving expensive PCR reagents. PCR products were examined using gel electrophoresis as described above. If no PCR product was generated after two attempts, no further PCRs were performed. However, if some product was present, additional PCRs were performed until there was enough DNA for sequencing. There are trade‐offs associated with performing additional PCRs to obtain enough product vs. attempting to optimize the PCR protocol for template and primer combinations that produce marginal yields. Optimization may not be practical when a project, as in this case, samples individuals from across a region or a community. When sampling closely related taxa, however, optimization could ultimately save time and resources. Suggestions for optimization and troubleshooting can be found in Appendix 1. To obtain rbcL barcodes, we performed up to four PCR reactions on 75 DNA samples extracted using the Wendel protocol and up to six PCR reactions on 386 DNA samples extracted using the Porebski protocol (55 samples represent extractions from specimens previously extracted with the Wendel protocol; Fig. 1). In total, we attempted to generate rbcL barcodes from 406 specimens (Appendix S1). We used gel electrophoresis to determine PCR yield; we assigned yields to qualitative categories in order to determine which DNA samples should be targets of additional PCR reactions to accumulate sufficient DNA for sequencing. The categories we used were “no product” when there was no visible product band; “some product” when a faint band of the expected size was visible; and “adequate product” when a bright band of the expected size was visible. These categories were based on empirical results from sequencing faint vs. bright bands. We used the same categories as described above to categorize pooled DNA from multiple PCR reactions in order to send samples for sequencing (Fig. 1). Regardless of the DNA extraction method used, we most commonly needed to carry out three or four 12.5‐μL PCR reactions to obtain enough PCR product for sequencing. Of the 75 samples extracted with the Wendel protocol, 19 were sequenced, and of the 386 samples extracted with the Porebski protocol, 76 were sequenced. A summary of these data is shown in Figure 2. A further two specimens were sequenced by pooling the PCR products from both Wendel and Porebski extractions, giving a total of 97 barcodes generated from 406 specimens (24%).

Figure 2

Success rates for rbcL and matK barcodes using DNA extracted using either the Porebski or Wendel CTAB methods. Yields from pooled PCR products for each extraction method were divided into three categories (no product, some product, or adequate product) and expressed as a percentage of the total number of PCR reactions performed for each combination of DNA extraction method and PCR target. DNA extraction methods used were after Tel‐Zur et al. (1999) modified by Wendel (https://www.eeob.iastate.edu/faculty/wendel/dna-extraction), Porebski et al. (1997), and the QIAGEN DNeasy Plant Mini Kit (QIAGEN, Venlo, The Netherlands). To obtain matK barcodes, we performed up to six PCR reactions on 73 samples extracted using the Wendel protocol and up to seven PCR reactions on 386 samples extracted using the Porebski protocol (56 samples represent extractions from specimens previously extracted with the Wendel protocol; Fig. 1). In total, we attempted to generate matK barcodes from 405 specimens (Appendix S1). As described above, PCR products for each extraction method were divided into three categories (no product, some product, and adequate product) based on yield estimated by gel electrophoresis. The PCR results are summarized in Figure 2. We most commonly needed to perform two (Wendel) or four (Porebski) 12.5‐μL PCR reactions per sample to obtain enough product for sequencing. Of the 73 samples extracted with the Wendel protocol, 18 were sequenced, and of the 386 samples extracted with the Porebski protocol, 116 were sequenced. A further 10 specimens were sequenced from pooled products from both Wendel and Porebski extractions, giving a total of 144 barcodes from 405 specimens (35%). Overall, the Wendel and Porebski DNA extraction methods performed similarly (Fig. 2). The DNAs extracted using the QIAGEN DNeasy Plant Mini Kit protocol (Appendix 1) were each subject to a single PCR reaction (Fig. 1). A single PCR reaction from the corresponding CTAB‐extracted DNA was carried out at the same time. As before, PCR products were divided into three categories (no product, some product, adequate product) based on yield estimated by gel electrophoresis. The success of these single PCR reactions for matK and rbcL are shown in Figure 3, and complete details are given in Appendix S1. In terms of DNAs that could be used to generate PCR product, the QIAGEN‐extracted DNA performed similarly to the CTAB‐extracted DNA. Using the DNAs extracted using the QIAGEN kit, we generated an additional 18 rbcL sequences to give a total of 115/406 specimens (28%) and 10 matK sequences to give a total of 154/405 specimens (38%). GenBank accessions are given in Appendix S1.

Figure 3

Success rates for rbcL (A) and matK (B) barcodes using DNA extracted using either CTAB‐based or QIAGEN column‐based methods. PCR products generated from a single PCR reaction using either QIAGEN‐extracted or CTAB‐extracted DNA were divided into three categories (no product, some product, or adequate product) and expressed as a percentage of the total number of PCR reactions performed for each DNA extraction method. DNA extraction methods used were after Tel‐Zur et al. (1999) modified by Wendel (https://www.eeob.iastate.edu/faculty/wendel/dna-extraction), Porebski et al. (1997), and the QIAGEN DNeasy Plant Mini Kit (QIAGEN, Venlo, The Netherlands). Although PCR failure rates appear high, the specimens that had at least some PCR product (Fig. 2, Appendix S1) could likely be sequenced after PCR optimization to increase yield. A total of 51 specimens had some PCR product for rbcL (combined from all three DNA extraction methods). PCR optimization and successful sequencing of these would increase the overall success rate to 41%. Similarly, there were 75 specimens for matK, which if successfully sequenced, would increase the overall success rate to 57%. As discussed above, plant taxonomic groups differ by the presence of compounds that hinder DNA extraction and amplification, and universal primers may not work for all families. Therefore, we expected our overall success to vary among plant families. We found significant association of taxonomic family with overall success of generating DNA barcodes for both matK and rbcL (respectively, χ 2 = 57.6, df = 11, P = 2.57 × 10−8; χ 2 = 25.5, df = 11, P = 0.00768; Table 1, Appendix S1). It is important to note the almost total failure of samples from Clusiaceae and Phyllanthaceae for both markers, and the differences in success between matK and rbcL for Annonaceae and Myristicaceae.

Table 1

Summary of overall success in generating DNA barcodes for matK and rbcL by taxonomic family (abundant families only)

Familya	matK		rbcL
Familya	Barcode generated	Barcode not generated	Barcode generated	Barcode not generated
Annonaceae	18	9	1	26
Apocynaceae	7	7	3	11
Clusiaceae	0	11	1	10
Dipterocarpaceae	12	8	8	12
Lauraceae	6	5	1	10
Meliaceae	7	8	6	9
Moraceae	7	14	7	14
Myristicaceae	12	2	0	14
Phyllanthaceae	1	30	7	24
Primulaceae	4	9	5	8
Rubiaceae	7	28	9	26
Other	73	120	67	127
TOTAL	154	251	115	291

Families differed significantly in success rate (matK: χ 2 = 57.6, df = 11, P = 2.57 × 10‐8; rbcL: χ 2 = 25.5, df = 11, P = 0.00768). See Appendix S1 for full lists of success by family.

Summary of overall success in generating DNA barcodes for matK and rbcL by taxonomic family (abundant families only) Families differed significantly in success rate (matK: χ 2 = 57.6, df = 11, P = 2.57 × 10‐8; rbcL: χ 2 = 25.5, df = 11, P = 0.00768). See Appendix S1 for full lists of success by family. Overall, our data suggest that multiple extraction methods can be used successfully, indicating that other factors, such as kit costs, access to appropriate chemicals and infrastructure, and previous successful experience with similar samples, should be considered when choosing a method.

Reducing contamination

Contamination can be a major problem in any molecular biology laboratory. Previously amplified PCR products are of particular concern because they may amplify much more readily than the original target locus, which may be located in a long fragment of genomic DNA. The lab should be laid out in a way that minimizes the risk of contamination. Ideally, there should be separate rooms with separate equipment and micropipettes for DNA extraction vs. PCR and all post‐PCR processes. If this is not possible, separate areas of the lab with separate micropipettes should be used for DNA extraction and PCR. Filter tips effectively reduce the amount of cross‐contamination by aerosols during pipetting and should be used if at all possible. The additional cost of filter tips is offset by reducing the generation of unusable data. Pipettes should be cleaned regularly, and fresh gloves should be worn at all times and changed frequently. It is very easy for fluids, or aerosols from fluids, to adhere to skin or gloves, and to be transferred to the next processing step. Care should be taken when handling specimens so as not to spread leaf fragments around the work area, or to cross‐contaminate samples. Forceps for sample manipulations can be sterilized by flaming or cleaned in alcohol. Negative controls (complete reaction mixes without DNA template) should be included in every set of PCR reactions to allow contamination to be detected quickly before costly sequencing is performed. The keeping of detailed records in log books on all PCR experiments is indispensable to the task of finding the source of contamination. If access to automated processing of samples is available, this presents further possibilities for reduction in contamination as well as for increasing reproducibility.

DNA sequencing

The cost of DNA sequencing continues to decrease, and more sequencing services and platforms are becoming available. High‐throughput sequencing of barcodes (e.g., Liu et al., 2017) and metabarcoding (Deiner et al., 2017) are good options for barcoding projects that target a very high number of samples and/or ecological networks. Even whole genome shotgun sequencing at low coverage to “skim” the organellar and high‐copy nuclear loci from the sequencing reads is becoming cost‐effective (Twyford and Ness, 2017). For projects that target a small number of barcodes from specimens numbering in the hundreds to a few thousand, Sanger sequencing remains a reasonable option. The main decision is whether to outsource the sequencing of PCR‐generated barcodes, or to complete it within the institution. We recommend outsourcing to a high‐quality, affordable sequencing service as it is often cheaper than importing reagents, performing repeat reactions and troubleshooting, and maintaining instruments. Sequencing services are also in a much better position than are individual laboratories to keep up with the rapid pace of technological change in DNA sequencing approaches. Various companies offer single‐pass sequencing from as little as US$3 per sample. There are usually even greater discounts for submitting larger numbers of samples in plate format, and free shipping is available for submitting larger, but still modest, numbers of samples. Additional services such as PCR product purification are also offered by many companies, which may be more cost‐effective than importing reagents. Unlike specimens or genomic DNA, PCR products for DNA sequencing can usually be sent out of the country of origin because the samples are only a small fragment of the genome, which cannot be used for other purposes, and the sequencing reaction uses up the entire sample. We used the Sanger sequencing service at Macrogen Korea, where the requirements for sample submission were 25 μL of product at 100 ng/μL, plus 2 μL of the sequencing primer at 10 pmol/μL. Macrogen also offers a reasonably priced primer synthesis option. Shipping is free for more than 20 reactions, and one free repeat reaction is provided for failed samples, making this a very cost‐effective way to generate sequence data for a small‐scale laboratory. High‐quality sequencing data can usually be obtained when appropriate quantity and quality standards are met, although certain sequence characteristics (e.g., high GC content and presence of simple sequence repeats) can interfere. Guidelines for quantity and quality typically are available from sequencing services, and an excellent resource for troubleshooting DNA sequence traces has been made available by the Nucleic Acid PCR Research Core Facility (NAPCore Facility, Philadelphia, Pennsylvania, USA; https://napcore.research.chop.edu/problems.php).

DATA MANAGEMENT AND PROCESSING

Sample data management

It is important to keep detailed log books for notes about all aspects of the laboratory work. Although the sequences generated will be submitted to publicly accessible data repositories such as GenBank (Benson et al., 2013) with electronically recorded metadata, good log books trace the history of the samples as they are processed and include all the details needed to repeat the experiments and perform troubleshooting effectively. Spreadsheets or a database should be used to track all samples received by the laboratory, along with their metadata. Spreadsheets should be used to record which samples have been processed and their stage in the processing workflow (e.g., DNA extraction, PCR amplification, clean up, sequencing). The spreadsheet should be available to all users for addition of data as they are generated. It is essential that every user is diligent about adding their data in a timely fashion to prevent duplication, particularly when working with large numbers of samples. As with any file that is edited by several users, great care must be taken to (1) track the “master copy” and (2) make frequent backups. These issues are of less concern if shared online applications are used (e.g., Google Docs, Office 365, iCloud). However, risks associated with using a spreadsheet as a database remain, and all users should be careful to avoid these hazards (formatting and validation errors, sorting only a subset of columns and thus destroying the records’ integrity, etc.). Optimally, the spreadsheet should use data validation for all columns. Guidelines for using spreadsheets for data storage have been detailed by Broman and Woo (2017).

DNA sequence processing and storage

After generating high‐quality sequence traces, contigs need to be trimmed, assembled, and processed. Both raw data and edited files should be stored, and everything should be regularly backed up. Although access to expensive software for processing sequence data such as DNAStar (DNASTAR Inc., Madison, Wisconsin, USA) or Sequencher (Gene Codes Corporation, Ann Arbor, Michigan, USA) can be a major obstacle for laboratories with limited funding, there exist many free (and open source) alternatives. For viewing sequence traces in .abi format, one good free option is FinchTV (Seattle, Washington, USA), which is available for Mac, Windows, and Linux (https://digitalworldbiology.com/FinchTV). Consed, Phred, and Phrap are a free suite of programs that run on both Mac and Linux and can be used to automate base calling and quality control from sequence traces, assemble sequences, and edit sequence assemblies (Gordon et al., 1998; Ewing and Green, 1998; Ewing et al., 1998; http://www.phrap.org/phredphrapconsed.html). Additionally, there are several relatively inexpensive programs, such as ChromasPro (Technelysium, South Brisbane, Australia; Windows and Mac, used in this study) and Geneious (Biomatters Ltd., Auckland, New Zealand; Mac, Windows, and Linux), that can be used for contig assembly from traces and that allow manual editing of base calls. Geneious also includes a range of other tools for bioinformatics such as making alignments, building trees, restriction enzyme mapping, and next‐generation sequencing analysis. An excellent free program for labeling (color coding) sequence text files and restriction enzyme mapping is ApE (http://www.biologylabs.utah.edu/jorgensen/wayned/ape/; Mac and Windows). Once the sequence data had been generated and processed, we used the Barcode of Life Database (BoLD; http://www.boldsystems.org/) to integrate and manage metadata and sequences, and we strongly recommend using this platform. BoLD facilitates sequence submission to GenBank when all of the requisite metadata for a sequence have been assembled.

CONCLUSIONS

DNA barcoding remains a useful tool for studying biodiversity in the age of genomics (Hebert et al., 2016), for example, to provide short sequence tags for community and landscape samples (Miller et al., 2016) and to increase the efficiency of taxonomic practices (e.g., Williams et al., 2014; Wood et al., 2015). The approach is accessible to small laboratories, regardless of the scientific question at the center of the research. Currently, the main obstacles to successful generation of sequence data in resource‐limited settings are limited access to funding and training. The suggestions presented here are designed to be pragmatic and feasible in these situations and are based on our particular set of circumstances in Indonesia. The main areas for consideration are sample collection, laboratory work, and the management and analysis of sequence data. Just as the need for locally or institutionally based sequencers has decreased or disappeared, it is likely that, in the future, more general needs for laboratory infrastructure will continue to decrease. Advances are being made in all relevant areas, including DNA amplification (e.g., isothermal PCR [Boyle et al., 2013; Tröger et al., 2015] and its incorporation into handheld devices [Tsaloglou et al., 2018]) and field‐based DNA sequencing (Parker et al., 2017), making it even easier for smaller efforts to have large in‐country impacts on biodiversity science. Appendix S1 Click here for additional data file.

35 in total

1. Primaclade--a flexible tool to find conserved PCR primers across multiple species.

Authors: Michael D Gadberry; Simon T Malcomber; Andrew N Doust; Elizabeth A Kellogg
Journal: Bioinformatics Date: 2004-11-11 Impact factor: 6.937

2. The front-end logistics of DNA barcoding: challenges and prospects.

Authors: Alex V Borisenko; Jayme E Sones; Paul D N Hebert
Journal: Mol Ecol Resour Date: 2009-05 Impact factor: 7.090

3. Consed: a graphical tool for sequence finishing.

Authors: D Gordon; C Abajian; P Green
Journal: Genome Res Date: 1998-03 Impact factor: 9.043

4. A foundation monograph of Convolvulus L. (Convolvulaceae).

Authors: John R I Wood; Bethany R M Williams; Thomas C Mitchell; Mark A Carine; David J Harris; Robert W Scotland
Journal: PhytoKeys Date: 2015-06-18 Impact factor: 1.635

Review 5. Environmental DNA metabarcoding: Transforming how we survey animal and plant communities.

Authors: Kristy Deiner; Holly M Bik; Elvira Mächler; Mathew Seymour; Anaïs Lacoursière-Roussel; Florian Altermatt; Simon Creer; Iliana Bista; David M Lodge; Natasha de Vere; Michael E Pfrender; Louis Bernatchez
Journal: Mol Ecol Date: 2017-10-26 Impact factor: 6.185

6. A modified protocol for rapid DNA isolation from plant tissues using cetyltrimethylammonium bromide.

Authors: G C Allen; M A Flores-Vergara; S Krasynanski; S Kumar; W F Thompson
Journal: Nat Protoc Date: 2006 Impact factor: 13.491

7. Rapid detection of HIV-1 proviral DNA for early infant diagnosis using recombinase polymerase amplification.

Authors: David S Boyle; Dara A Lehman; Lorraine Lillis; Dylan Peterson; Mitra Singhal; Niall Armes; Mathew Parker; Olaf Piepenburg; Julie Overbaugh
Journal: MBio Date: 2013-04-02 Impact factor: 7.867

Review 8. Advancing taxonomy and bioinventories with DNA barcodes.

Authors: Scott E Miller; Axel Hausmann; Winnie Hallwachs; Daniel H Janzen
Journal: Philos Trans R Soc Lond B Biol Sci Date: 2016-09-05 Impact factor: 6.237

9. Evaluation and comparison of FTA card and CTAB DNA extraction methods for non-agricultural taxa.

Authors: Chloe S Siegel; Florence O Stevenson; Elizabeth A Zimmer
Journal: Appl Plant Sci Date: 2017-02-15 Impact factor: 1.936

Review 10. Strategies for complete plastid genome sequencing.

Authors: Alex D Twyford; Rob W Ness
Journal: Mol Ecol Resour Date: 2016-11-28 Impact factor: 7.090

2 in total

1. EMBL2checklists: A Python package to facilitate the user-friendly submission of plant and fungal DNA barcoding sequences to ENA.

Authors: Michael Gruenstaeudl; Yannick Hartmaring
Journal: PLoS One Date: 2019-01-10 Impact factor: 3.240

Review 2. Strategies for reducing per-sample costs in target capture sequencing for phylogenomics and population genomics in plants.

Authors: Haley Hale; Elliot M Gardner; Juan Viruel; Lisa Pokorny; Matthew G Johnson
Journal: Appl Plant Sci Date: 2020-04-14 Impact factor: 1.936

2 in total