
Development and Verification of an Economical Method of Custom Target Library Construction.

Xinyao Miao¹, Bowen Li², Yuesheng Shen³, Huiyun Yu⁴, Guoqiang Zhu⁵, Chen Liang⁶, Xiao Fu⁷, Chu Wang⁸, Shengbin Li¹, Bao Zhang¹.

Abstract

Although technological advances have greatly reduced the cost of DNA sequencing, sample preparation time and reagent costs remain the limiting factors for many studies. Building on low-cost targeted amplification, we developed an economical method for custom target library construction that combines DNA nanoball (DNB) technology with two-step polymerase chain reaction (PCR). We refer to this method as two-step PCR and compared it with traditional multiplex PCR methods in three respects: data quality, efficiency, and specificity to humans. The results showed that two-step PCR can complete 128 sequencing libraries in a total PCR time of only 2 h 24 min 59 s, with a data utilization rate of 0.44 and a cost of approximately $1.70 per sample for targeted sequencing. Replacing traditional multiplex PCR methods with this strategy makes sample preparation before sequencing more cost-effective and further reduces the cost of next-generation sequencing (NGS). This method may also be free from interference by other species and from limitations of sample type and DNA content. These findings suggest that the approach could be applied broadly in forensic research.


Year: 2020 PMID: 32548494 PMCID: PMC7288555 DOI: 10.1021/acsomega.0c01014

Source DB: PubMed Journal: ACS Omega ISSN: 2470-1343

Introduction

Since the introduction of the Sanger method for genome sequencing, demand for fast, accurate, and inexpensive genomic information has continued to increase.[1,2] In response, next-generation sequencing (NGS) technology emerged; it has transformed almost all areas of biology, agriculture, and medicine and is widely used to analyze genetic variation.[3] Although the cost of sequencing per base has decreased 5-fold in the last 10 years, the NGS pipeline could still be simplified to further decrease the overall cost.[4,36] Prior to sequencing, DNA extraction and library preparation are required, and both can be laborious and time-consuming.[5] Compared with whole-genome and whole-exome sequencing, selectively capturing genomic regions from DNA samples prior to sequencing (targeted NGS), based on polymerase chain reaction (PCR) or hybridization capture, generates smaller and easier-to-manage data sets focused on the target loci, thereby reducing the difficulty of data analysis and saving time, cost, and effort.[6−9] However, the hybridization-capture-based targeted NGS approach has drawbacks such as limited design flexibility, high cost, and protocol complexity, which limit its application to sequencing analyses requiring low cost and high efficiency.[6] NGS usually needs to be combined with simplified DNA extraction and library preparation methods to improve efficiency and reduce costs.[10] The traditional NGS library preparation process consists of three primary steps: fragmentation, adapter ligation, and amplification. The species specificity of PCR mainly depends on primer specificity. 
In addition, sequence similarity between related species is very high, especially within the same genus, and gene penetration may occur among different species.[1,2] The gene fragments selected to distinguish species should be highly conserved within species and highly variable between species.[3] When the number of related species to be identified increases and gene penetration occurs among them, it is difficult to find suitable genes and to design specific primers that amplify only one species and distinguish species accurately. Thus, the multiplex PCR method has been widely used in this process because its efficiency and species specificity are higher than those of traditional PCR methods.[11,12] In forensic science, because each sample is valuable and testing time is constrained, direct PCR has been developed, which enables PCR to be performed directly on samples such as whole blood, blood cards, and saliva.[13,14] Targeted sequencing based on existing PCR and PCR-derived techniques (conventional PCR, multiplex PCR, direct PCR, etc.) has increasingly been applied. Manufacturers have developed a number of commercial solutions for forensic science based on highly multiplexed PCR library construction, such as Illumina MiSeq FGx, the first system developed for the preparation and sequencing of targeted libraries for forensic genomics. MiSeq FGx still has potential shortcomings: (1) although it accepts a 1.2 mm FTA blood card punch as input, preprocessing is required; (2) library preparation takes more than 9 h; and (3) in the Chinese market, each sample costs USD 94.26.[15,16] Herein, a simple but robust approach for direct library construction is described, which may increase the efficiency of the NGS pipeline. 
Thus, the cumbersome library construction process can be completed directly and cost-effectively while maintaining accuracy and uniformity. The stability of the two-step PCR approach across different species and sample types is equally important. Together, these properties make it possible to broaden the use of PCR in NGS library preparation.

Results and Discussion

Results

As a direct presequencing library construction method, two-step PCR was verified in three respects (data quality, efficiency, and species specificity) and compared with the traditional presequencing library construction method (multiplex PCR library preparation). DNA extraction and sequencing were also included in the comparison.

Quality of the Sequencing Data

Loci Detection Rates

We calculated the detection rates of correct loci for the sequencing data generated by our method and by the traditional library construction approach. Two identical groups of human samples (six samples each) were used in the two approaches. The average loci detection rates of the two approaches were almost the same, approaching 100% (Table 1), with no significant difference between them (P > 0.05).

Table 1

Quality of the Sequencing Data from the Two Different Approaches

approach	correct loci detection rate (%)	DOC	CV (filtered)	uniformity frequency
traditional library construction	99.84	14 557.31	1.217	0.726
our approach	99.68	9414	0.684	0.851

Depth Uniformity

The uniformity of the sequencing data generated by our approach and by traditional multiplex PCR library construction was assessed in four respects (Table 1): (1) the depths of coverage (DOCs) of the traditional and two-step methods were 14 557.31 and 9414, respectively, both high enough to guarantee sequencing quality; (2) the coefficient of variation (CV) values differed significantly between the two approaches (P = 0.0099); (3) the uniformity frequencies also differed significantly (P = 0.0277); and (4) the loci depth distributions obtained by the two methods differed, with fitted slopes of −0.0024 for two-step PCR and −0.0035 for traditional multiplex PCR (Figure 1).

Figure 1

Loci depth distribution of the two-step and traditional multiplex PCR library construction methods. The abscissa indicates the number of amplicons, and the ordinate indicates the log value of the depth. The depth distribution of the data generated by the two-step PCR approach is represented by a black line, and the gray line represents the data generated by the multiplex PCR approach. The corresponding dashed lines represent the linear fits for the two approaches. The slopes for the two-step PCR method and for library construction after ordinary multiplex PCR are −0.0024 and −0.0035, respectively.

Cost-Effectiveness

For efficiency, the temporal cost, the monetary cost, and data utilization were the primary considerations for the two-step PCR method. Two-step PCR not only shortened the time spent on library construction and DNA extraction but also reduced the monetary cost and improved data utilization.

Temporal Cost

Temporal Cost of PCR

With the same targeted amplification step and following the respective protocols, we compared the total PCR time required by the two library preparation methods, excluding objective influencing factors such as laboratory, equipment, operator, and DNA samples. The PCR times for the two-step PCR and the traditional multiplex PCR methods (based on standard BGISEQ-500 library preparation) were 2 h 24 min 59 s and 4 h 7 min 22 s, respectively (Table 2). Two-step PCR constructs a target library directly by targeted amplification followed by undifferentiated amplification. Traditional library construction requires no undifferentiated amplification but instead needs three library preparation steps: end repair, adapter ligation, and pre-PCR. The time spent on reagent preparation, purification, and quantification was not included because of individual variation. However, because traditional library construction involves more steps than the two-step PCR method, its total time is expected to be correspondingly higher.

Table 2

PCR Time Required by the Two Approaches for Library Construction

steps/approaches	two-step PCR	traditional multiplex PCR
targeted amplification	1 h 54 min 20 s	1 h 54 min 20 s
undifferentiated amplification	30 min 39 s	0
library preparation	0	2 h 13 min 2 s
total time	2 h 24 min 59 s	4 h 7 min 22 s
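The totals in Table 2 and the relative time saving can be re-derived from the per-step times. The following Python sketch is illustrative only; the helper name `to_seconds` and the dictionary layout are assumptions introduced here and are not part of the published protocol.

```python
import re

def to_seconds(text: str) -> int:
    """Convert a duration such as '1 h 54 min 20 s' to seconds; '0' gives 0."""
    units = {"h": 3600, "min": 60, "s": 1}
    return sum(int(n) * units[u] for n, u in re.findall(r"(\d+)\s*(h|min|s)\b", text))

# Per-step PCR times copied from Table 2
# (targeted amplification, undifferentiated amplification, library preparation).
steps = {
    "two-step PCR": ["1 h 54 min 20 s", "30 min 39 s", "0"],
    "traditional multiplex PCR": ["1 h 54 min 20 s", "0", "2 h 13 min 2 s"],
}
totals = {name: sum(to_seconds(t) for t in times) for name, times in steps.items()}

two_step = totals["two-step PCR"]
traditional = totals["traditional multiplex PCR"]
saving_pct = 100 * (traditional - two_step) / traditional

print(totals)
print(f"PCR time saved: {saving_pct:.2f}%")
```

The per-step times sum to 8699 s and 14 842 s, matching the totals in Table 2, and the relative saving agrees with the 41.39% quoted in the Discussion.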

Temporal Cost of DNA Extraction

DNA extraction, which precedes library preparation, is also time-consuming. Eliminating the DNA extraction step is therefore a potential way to improve the efficiency of NGS. We used six dry blood cards (diameter = 3.0 mm) as DNA sources without any preprocessing. NH9719 (Nuhighbio, Suzhou, China) was selected and tested as the PCR buffer system because it can effectively reduce the impact of various PCR inhibitors, such as proteins and urban dust. The sequencing alignment data are shown in detail to reflect the depth of each locus (Figure 2). The average detection rate of correct loci and the average DOC were 94.25% and 18 034, respectively. Detailed genotyping results can be found in Supporting Information 1. There were no significant differences in depth distribution between samples (P > 0.05). These results demonstrate that the two-step PCR method remained stable even without DNA extraction.

Figure 2

Loci depth distribution after direct two-step PCR with six dry blood cards. Figure a–f shows the loci depth distribution of two-step PCR with the six blood card samples used in this study. The abscissa indicates the log value of the depth, and the ordinate indicates the number of amplicons.

Figure 4

Effect of adding/not adding Pfx DNA polymerase on the uniformity of the sequencing data. The gray and black columns represent the data obtained with and without Pfx DNA polymerase in the second-step PCR system, respectively. The average CV values are 1.1 without Pfx and 0.67 with Pfx (P ≤ 0.05). The uniformity frequencies are 0.82 without Pfx and 0.88 with Pfx (P ≤ 0.05).

Monetary Cost

The monetary cost of a target library construction method is also an important part of the efficiency evaluation. We calculated the cost of reagents and primers for library preparation by the two methods. The costs of a single reaction for the two-step and traditional library construction methods were $1.70 and $6.15, respectively (Table 3). Therefore, it may be more economical to construct a target library using two-step PCR than using traditional multiplex PCR.

Table 3

Money Spent on Library Construction for the Two Approaches

cost (USD/single reaction)	two-step PCR	traditional multiplex PCR
primers	$0.90	$0.45
reagents	$0.80	$5.70
total	$1.70	$6.15
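The column totals in Table 3 and the relative cost saving follow directly from the primer and reagent costs. The sketch below is illustrative only; it uses `Decimal` to avoid floating-point rounding on currency values, and the 128-reaction batch is an assumption based on the number of barcodes and libraries mentioned in the text, not a figure reported in the table.

```python
from decimal import Decimal

# Per-reaction costs in USD copied from Table 3.
costs = {
    "two-step PCR": {"primers": Decimal("0.90"), "reagents": Decimal("0.80")},
    "traditional multiplex PCR": {"primers": Decimal("0.45"), "reagents": Decimal("5.70")},
}
totals = {name: sum(parts.values()) for name, parts in costs.items()}

two_step_cost = totals["two-step PCR"]
traditional_cost = totals["traditional multiplex PCR"]
saving_pct = 100 * (traditional_cost - two_step_cost) / traditional_cost

# Assumption: one reaction per library for a batch of 128 barcoded libraries.
batch_cost = 128 * two_step_cost

print(totals)
print(f"cost saved per reaction: {round(saving_pct, 2)}%")
print(f"library construction cost for 128 reactions: ${batch_cost}")
```

The relative saving agrees with the 72.36% quoted in the Discussion. As stated in the Experimental Section, these costs exclude pretesting, consumables, and instrument depreciation.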

Data Utilization Rate

The utilization rate of sequencing data is an indispensable indicator of efficiency. We calculated the data utilization rates for two groups of data generated from the same samples by the traditional multiplex PCR and two-step PCR library construction approaches. The average data utilization rates were 0.36 and 0.44, respectively (Figure 3), a significant difference (P = 0.0433).

Figure 3

Data utilization rates of the two different methods. Black and gray columns represent the data utilization of the traditional PCR method and the two-step PCR method, respectively. The data utilization rates of the two methods are 36 and 44%, respectively, which are significantly different (P ≤ 0.05).

Previous studies have shown that uniform sequencing data might reduce data storage and increase data utilization.[17] The addition of 0.2 U of Platinum Pfx DNA polymerase (Pfx) to the second-step PCR may be an important reason for the depth uniformity obtained. Targeted library preparation was performed using the two-step PCR method on a group of identical samples (six human blood DNA samples), with and without Pfx in the enzyme system of the second-step PCR (Figure 4). Sequencing data from samples supplemented with Pfx had lower CV values (0.67, P = 0.02) and higher uniformity frequencies (0.88, P = 0.02) than data from samples without Pfx. This result may explain the higher data utilization achieved by the two-step PCR method.

Species Specificity

In forensic research, DNA from other species can affect sequencing data, especially when a library is constructed using multiplex PCR. The two-step PCR method designed in this study aimed to address the problem of DNA contamination from other species. All two-step PCR primers were designed against the Homo sapiens hg38 genome to be specific for human DNA. Fecal DNA from nine gibbons (nonhuman primates) was used as test samples. With the same amount of input DNA, the traditional multiplex PCR library yielded 48.47 ng of product, whereas the yield of the two-step PCR library was below the detection range of Qubit 3.0 (Table 4). The PCR amplification efficiency and DOC of the two-step PCR data were 86.04 and 91.74% lower, respectively, than those of the traditional multiplex PCR data. The specific genotyping results generated by the traditional multiplex PCR and two-step PCR approaches are given in Supporting Information 2 and 3. The two approaches differed significantly in PCR amplification efficiency and in DOC (P < 0.0001). These results suggest that using the two-step PCR method for NGS library construction might increase species specificity.

Table 4

Comparison of the Species Specificity of the Two Library Construction Methods Using Nonhuman DNA (Gibbon)

approach	PCR amplification efficiency (%)	average DOC	average amount of DNA product after PCR
multiplex PCR	79.07	1790.47	48.47 ng
two-step PCR	11.04	147.83	below detection range
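The relative decreases quoted in the text can be reproduced from the values in Table 4. This is a minimal sketch; the function name `percent_decrease` is introduced here for illustration.

```python
def percent_decrease(reference: float, value: float) -> float:
    """Relative decrease of value with respect to reference, in percent."""
    return 100.0 * (reference - value) / reference

# Values copied from Table 4 (gibbon DNA as input).
multiplex = {"efficiency": 79.07, "doc": 1790.47}
two_step = {"efficiency": 11.04, "doc": 147.83}

drops = {key: round(percent_decrease(multiplex[key], two_step[key]), 2) for key in multiplex}
print(drops)
```

The result matches the 86.04% and 91.74% decreases in amplification efficiency and DOC reported above.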

Discussion

Data-quality control is a key step in ensuring successful and meaningful research.[18] However, there is currently no quality control standard for NGS-multiplex PCR data.[19] On the Ion Torrent PGM platform, generally only the proportion of accurate profiles (i.e., the correct loci detection rate) is evaluated.[20,21] Therefore, we introduced additional indicators (loci detection rate, DOC, CV, and uniformity frequency) to evaluate the quality of the multiplex PCR data in terms of both accuracy and uniformity.[22] The detection rates of correct loci in the sequencing data produced by the two libraries were close to 100%, which indicated that the accuracy of the two-step PCR method was similar to that of traditional multiplex PCR library construction. CV, uniformity frequency, and the slope of the fitted loci depth distribution are all indicators of uniformity. Two-step PCR data had lower CV values and higher uniformity frequencies (Table 1), which indicated reduced dispersion (P ≤ 0.01 and P ≤ 0.05, respectively). In addition, the loci depth distribution of the two-step PCR data had a shallower slope than that of the traditional multiplex PCR data. Together, these indicators suggest that the data produced by two-step PCR may be more uniform. The cost-effectiveness of the experimental methodology, data storage, and report generation in NGS also needs to be considered.[23,24] The limiting factor for many experiments is the temporal and monetary cost of sample preparation.[25] Sample preparation before sequencing generally consists of two parts: DNA extraction and library construction. 
For DNA extraction, the operation time of different methods is 1.15–3.15 h, and the average cost per sample is $2.75 to $20.31 (approximately €2.5 to €18.45).[26] Moreover, research on direct amplification is usually limited to capillary electrophoresis (CE) technology and rarely involves NGS.[27,28] Although the FGx operating manual states that blood cards can be processed directly, preprocessing is still needed, including reagent preparation and multiple centrifugation steps.[16] PCR itself is not strongly limited by DNA purity, and alkaline environments have proven to be more conducive to PCR in forensic samples such as bloodstains.[37] We therefore used an alkaline buffer in the two-step PCR, which may enable direct amplification from blood cards alone. As expected, the results showed that the DNA extraction process in sample preparation can be omitted under special circumstances (blood cards as the only source of template DNA). Omitting DNA extraction can allow two-step PCR to be applied to complex forensic evidence and is considerably cost-effective in terms of time and money. For library construction, compared with the traditional multiplex PCR method based on the BGISEQ-500 platform, the two-step PCR method saved 41.39% of the temporal cost and 72.36% of the monetary cost. According to our market research, Illumina’s ForenSeq DNA Signature Kit has a high market price of $94.26 per reaction and requires more than 9 h of hands-on time.[16,29] In addition to our two-step PCR approach, there are many PCR-based barcoding procedures that use large fusion primers for library construction. However, because the second PCR in our approach is an undifferentiated amplification, barcode primers can be reused for barcoding diverse amplicons of interest; thus, costly investment in individual barcoded primer sets for each target gene is not required. 
As a result, the burden of library construction can be alleviated.[42−44] BGISEQ-500 is based on DNA nanoball (DNB) sequencing technology. DNBs are generated from a single-stranded DNA (ssDNA) circle by rolling circle amplification (RCA) to enhance the fluorescent signals in the sequencing process.[39] Unlike exponential PCR amplification, RCA does not accumulate amplification errors and introduces no PCR bias because the original ssDNA circle is the only template during the entire amplification process.[30,39,45−48] To some extent, BGISEQ-500 is therefore more suitable for sequencing targeted PCR libraries. The quality, capacity, and storage of NGS sequencing data also affect cost-effectiveness.[24] Compared with traditional methods, two-step PCR improved data utilization by 18.18%. In addition, most of the technology patents for multiplex PCR require Pfx DNA polymerase.[31,32] In our previous tests, Pfx (compared with NH9007) was not suitable for multiplex PCR; therefore, we added Pfx to the second-step PCR instead of the first-step PCR. The more uniform sequencing data suggest that the improved two-step PCR may be more cost-effective than the traditional multiplex PCR method. In summary, we report a novel library construction method for targeted sequencing based on multiplex PCR. The first change was to simplify the library preparation steps: only two PCR steps are required, and the DNA extraction step can be skipped. These changes make the method more cost-effective in terms of reagents and time. Sequencing costs are further reduced because the universal sequences fix the sequencing direction. The second change was in the enzyme system of the second-step PCR. Owing to its high fidelity, the addition of Pfx improved depth uniformity to a certain degree, which may be the main reason for the increase in data utilization. 
The introduction of phosphate groups and the maintenance of an alkaline environment in the reactions for DNB technology may give the two-step PCR method an advantage in accuracy compared with other sequencing platforms. Species specificity was studied to verify that other biological sources do not interfere with obtaining reliable results from samples recovered at crime scenes,[38] which is important support for a robust approach. In a developmental validation study of the MiSeq FGx forensic genomics system, species specificity tests were carried out for rhesus monkeys and baboons, and the loci detection rates were 11 and 25%, respectively.[33] However, rhesus monkeys and baboons are Old World monkeys (family Cercopithecidae), which are more distantly related to humans than the gibbon samples we used.[34] In our study, the PCR amplification efficiency of two-step PCR on gibbon samples (more similar to Homo sapiens) was 11.04%, which indicates the species specificity of this method. Thus, the two-step PCR method might maintain the stability of sequencing data in the presence of primate DNA contamination.

Conclusions

Motivated by the increasing demand for cost reduction in NGS technologies, we developed and evaluated an approach for economical library construction. The results showed that the two-step PCR method designed in this study can address problems of common multiplex PCR methods, such as low time efficiency, high monetary expense, and low data utilization. In addition, two-step PCR could largely exclude contamination from nonhuman samples. Admittedly, this method could be improved in several respects: (1) a two-step PCR system needs to be designed for other NGS platforms; (2) the time and monetary cost required for targeted library construction should be reduced further; and (3) two-step PCR systems should be designed for different species.

Experimental Section

Target Loci Selection and Primer Design

The sequences of 188 and 128 target regions containing single nucleotide polymorphism (SNP) and short tandem repeat (STR) loci were downloaded from the National Center for Biotechnology Information dbSNP database and STRbase (https://strbase.nist.gov), respectively. In addition to conventional requirements, the two-step PCR primers met the following requirements: (1) the length was between 25 and 30 bp; (2) the Tm value was between 58 and 64 °C; and (3) the GC content was <65%. We tested 33 samples: 24 human samples and 9 gibbon stool samples. DNA was extracted with the Genomic DNA Extraction Kit and the Stool DNA Extraction Kit (Tiangen, Beijing, China). Notably, although our two-step PCR approach was designed for the BGISEQ-500 sequencer (BGI, Shenzhen, China), its principle could be applied to other sequencing platforms. The protocol for the two-step PCR approach is displayed in Figure 5.
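Two of the three stated primer requirements (length and GC content) can be screened with a few lines of code. The sketch below is a hedged illustration and not the authors' design pipeline: the function names and the example sequences are hypothetical, the limits are assumed to apply to the locus-specific part of each primer, and the Tm requirement is deliberately left out because the text does not say which melting-temperature model was used.

```python
def gc_percent(seq: str) -> float:
    """GC content of a primer sequence, in percent."""
    seq = seq.upper()
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)

def meets_stated_limits(seq: str) -> bool:
    """Check the stated limits: 25-30 bases long and GC content below 65%."""
    # Assumption: the Tm window (58-64 °C) is checked separately with the
    # melting-temperature model of the primer design tool in use.
    return 25 <= len(seq) <= 30 and gc_percent(seq) < 65.0

# Hypothetical candidate sequences, for illustration only.
candidates = ["ATGC" * 6 + "A", "GC" * 13, "ATGC" * 5]
for primer in candidates:
    print(primer, len(primer), round(gc_percent(primer), 1), meets_stated_limits(primer))
```

Only the first candidate passes; the second has too high a GC content and the third is too short.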

Figure 5

Procedure for the two-step PCR approach. The approach comprises the addition of DNA or blood cards, first-step PCR, two rounds of product purification, second-step PCR, product purification, and quantitation. If the total amount of quantified PCR product was higher than 20 ng, the next step, pooling the libraries at a fixed ratio (the “fixed ratio hybrid library”), was performed, followed by sequencing and bioinformatics analyses; if not, the experiment was restarted from the first-step PCR.

First-Step PCR (Targeted Amplification)

In the first-step PCR of our approach, two different universal sequences (20 bp) (“ACATGGCTACGATCCGACTT” and “GACCGCTTGGCCTCCGACTT”) were added to the 5′ ends of the specific forward and reverse primers. The design of the universal sequences determined the direction of sequencing; single-direction sequencing reduces cost and improves data utilization. In this method, the sequencing direction was determined by universal sequence 1, so the forward primers were the starting point of sequencing. In Figure , for the first-step PCR, target loci were amplified in 25 μL reactions, which contained 2.5 μL of the primer mix, 2–4 U of the NH9007 DNA polymerase, 10 μL of the 1× PCR buffer (Nuhighbio, Suzhou, China), and input DNA (78 pg to 10 ng). To ensure the fidelity of PCR and the uniformity of the sequencing data, we designed new targeted amplification (first-step) conditions (Table 5). The reactions were performed on an ABI VERITI Gradient PCR Instrument (Applied Biosystems, MA).

Table 5

Details of the First-Step PCR Conditionsᵃ

temperature (°C)	time	number of cycles
95	15 min
95	20 s	9*
60	2 min 30 s
72	1 min
95	20 s	1
62.5	2 min 30 s
72	1 min
95	20 s	1
65	2 min 30 s
72	1 min
95	20 s	1
62.5	2 min 30 s
72	1 min
95	20 s	8*
60	2 min 30 s
72	1 min
72	10 min
4	∞

ᵃ The number of cycles may be increased by 1–2 at the steps marked *.
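The cycling program in Table 5 can be written down as data to count cycles and to estimate the programmed hold time. The sketch below is an illustration under stated assumptions: the block layout and function names are introduced here, each cycle is taken as 95 °C for 20 s, annealing for 2 min 30 s, and 72 °C for 1 min, and the optional extra cycles from the footnote are assumed to apply to each starred block.

```python
# Cycling blocks from Table 5: (annealing temperature in °C, cycles, starred).
BLOCKS = [
    (60.0, 9, True),
    (62.5, 1, False),
    (65.0, 1, False),
    (62.5, 1, False),
    (60.0, 8, True),
]
INITIAL_DENATURATION_S = 15 * 60   # 95 °C, 15 min
FINAL_EXTENSION_S = 10 * 60        # 72 °C, 10 min
CYCLE_S = 20 + (2 * 60 + 30) + 60  # denaturation + annealing + extension

def total_cycles(extra_per_starred_block: int = 0) -> int:
    """Number of cycles, optionally adding cycles to each starred block."""
    return sum(n + (extra_per_starred_block if starred else 0)
               for _, n, starred in BLOCKS)

def nominal_hold_seconds(extra_per_starred_block: int = 0) -> int:
    """Sum of programmed hold times; temperature ramping is not included."""
    cycles = total_cycles(extra_per_starred_block)
    return INITIAL_DENATURATION_S + cycles * CYCLE_S + FINAL_EXTENSION_S

print(total_cycles(), nominal_hold_seconds())
```

Under these assumptions the program has 20 cycles and 6100 s (1 h 41 min 40 s) of programmed holds, which is less than the 1 h 54 min 20 s displayed by the instrument in Table 2; the remainder is presumably temperature ramping, which the table does not list.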

Purification

The PCR products generated by the first-step PCR were purified twice with Agencourt AMPure XP beads (Beckman Coulter, CA) at 1.5 times the product volume.

Second-Step PCR (Undifferentiated Amplification)

Our second-step PCR was an undifferentiated amplification, and the ends of the amplified products carried the adapter sequences of BGISEQ-500; the sequences of adapter 1 and adapter 2 were “GAACGACATGGCTACGATCCGACTT” and “TGTGAGCCAAGGAGTTG****TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT”, respectively, where **** denotes the BGISEQ barcode sequence used to distinguish between samples. Barcode details of the BGISEQ sequencer can be found in Supporting Information 4; a total of 128 different barcode sequences (approximately 10 bp) were used. In DNB technology, the subsequent DNA cyclization step requires a single strand of DNA carrying a hydroxyl group and a phosphate group. Therefore, the first base at the 5′ end of adapter 1 was phosphorylated. The second-step PCR system contained the purified product of the first-step PCR, 3 μL of the Adapter Mix, 10 μL of the 1× PCR buffer (Nuhighbio, Suzhou, China), 1 U of the NH9007 DNA polymerase (Nuhighbio, Suzhou, China), and 0.2 U of the Platinum Pfx DNA polymerase (ThermoFisher Scientific, MA). The reaction conditions of the second-step PCR are shown in Table 6.
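The relationship between the adapters and the universal sequences of the first-step PCR can be checked directly from the sequences given in the text: each adapter ends with one of the two universal sequences, which is what allows the second-step primers to anneal to every first-step product regardless of locus. The sketch below is illustrative; the function name `barcoded_adapter_2` and the example barcode are hypothetical, and the real barcodes are listed only in Supporting Information 4.

```python
# Sequences copied from the text (5' to 3').
UNIVERSAL_1 = "ACATGGCTACGATCCGACTT"
UNIVERSAL_2 = "GACCGCTTGGCCTCCGACTT"
ADAPTER_1 = "GAACGACATGGCTACGATCCGACTT"
ADAPTER_2_TEMPLATE = "TGTGAGCCAAGGAGTTG****TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT"

def barcoded_adapter_2(barcode: str) -> str:
    """Insert a sample barcode at the **** placeholder of adapter 2."""
    barcode = barcode.upper()
    if not barcode or set(barcode) - set("ACGT"):
        raise ValueError("barcode must be a non-empty string of A, C, G, T")
    return ADAPTER_2_TEMPLATE.replace("****", barcode)

# The 3' end of each adapter is the matching universal sequence.
print(ADAPTER_1.endswith(UNIVERSAL_1))
print(ADAPTER_2_TEMPLATE.endswith(UNIVERSAL_2))

# Hypothetical 10-base barcode, for illustration only.
print(barcoded_adapter_2("ACGTACGTAC"))
```

Both checks succeed for the published sequences, so a single pair of adapter primers per sample is sufficient for all target loci.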

Table 6

Details of the Second-Step PCR Conditions

temperature (°C)	time	number of cycles
95	10 min
95	20 s	8
58	30 s
72	1 min
4	∞

To protect the 5′-terminal phosphate, the buffer of the second-step PCR, NH9719, was alkaline (pH > 8.0). In addition, we replaced nuclease-free water with Tris–EDTA (TE) buffer (ThermoFisher Scientific, MA) to maintain alkalinity.

Purification and Quantitation

PCR products generated from the second-step PCR were purified once with 1.5× Agencourt AMPure XP beads. Then, the concentration of the purified products was determined by Qubit (Applied Biosystems, MA). When the total amount of the final product was less than 20 ng, the experiment was considered to have failed (Figure 5).

Sequencing and Bioinformatics Analyses

After quantification, the products were mixed proportionally. The PCR products were circularized to form single-stranded circular DNA, which was then amplified by 2–3 orders of magnitude using RCA. The amplified products are called DNBs. Finally, the DNBs were fixed on an arrayed silicon chip by the DNB loading technology. For the sequencing strategy, we chose paired-end 50 (PE50) and single-end 200 (SE200). Raw sequencing data were processed with GATK-BWA. The subsequent data analysis was conducted using the statistical software R (version 3.2.1) and a series of proprietary bioinformatics procedures based on Perl (https://www.perl.org). A threshold of 50 was then applied to the depth of loci, and loci below the threshold were considered nondetected in the subsequent statistical analyses.

Quality of the Sequencing Data

Two different presequencing methods were applied to the same group of human DNA samples (Figure 6). We calculated the following parameters to evaluate sequencing quality. (1) Locus detection rate: the ratio of the number of typed loci in the sequencing data to the number of input loci. (2) Sequencing depth of loci: the DOC, also known as the read depth, was defined as the sum of all effective high-quality reads within each locus or sample.[40] We calculated the DOC of a single sample or a single group of samples; for the human dry blood cards, which were amplified directly, we calculated the depth of each single locus. (3) Coefficient of variation (CV): the CV is a unit-free, normalized measure of dispersion, and monitoring it is a common approach in statistical process control. The CV was expressed as the ratio of the standard deviation (σ) to the mean (μ), i.e., γ = σ/μ.[41] In this calculation, we removed the maximum and minimum values. (4) Uniformity frequency: we defined the uniformity frequency as the percentage of loci whose depth fell within 90% above and below the median. (5) Loci depth distribution: we used the number of amplicons as the abscissa and the log depth as the ordinate to reflect the depth distribution of the loci. We then fitted a line to this distribution and compared its slope between groups of data as a measure of uniformity.
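The quality indicators defined above can be expressed compactly in code. The following Python sketch is one possible reading of the definitions, not the authors' Perl and R implementation. The assumptions are: a locus counts as detected when its depth is at least 50 (genotype correctness is not modeled), σ is the population standard deviation, "90% above and below the median" means the interval from 0.1 to 1.9 times the median, and the slope is a least-squares fit of log10 depth against amplicon rank with amplicons sorted from deepest to shallowest. All function names and the example depths are hypothetical.

```python
import math
import statistics

DEPTH_THRESHOLD = 50  # loci below this depth count as nondetected (from the text)

def detection_rate(depths):
    """Percentage of input loci whose depth reaches the threshold."""
    return 100.0 * sum(d >= DEPTH_THRESHOLD for d in depths) / len(depths)

def trimmed_cv(depths):
    """CV = sigma / mu after dropping one minimum and one maximum value."""
    trimmed = sorted(depths)[1:-1]
    # Assumption: population standard deviation; the text does not say which.
    return statistics.pstdev(trimmed) / statistics.mean(trimmed)

def uniformity_frequency(depths, fraction=0.9):
    """Share of loci within median * (1 - fraction) .. median * (1 + fraction)."""
    median = statistics.median(depths)
    low, high = median * (1 - fraction), median * (1 + fraction)
    return sum(low <= d <= high for d in depths) / len(depths)

def log_depth_slope(depths):
    """Least-squares slope of log10(depth) against amplicon rank (depths > 0)."""
    # Assumption: amplicons are ranked from deepest to shallowest.
    ys = [math.log10(d) for d in sorted(depths, reverse=True)]
    xs = range(1, len(ys) + 1)
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical per-locus depths, for illustration only.
example = [12000, 9500, 8800, 7600, 40, 10400, 9100]
print(detection_rate(example), trimmed_cv(example),
      uniformity_frequency(example), log_depth_slope(example))
```

With this reading, a lower CV, a higher uniformity frequency, and a slope closer to zero all point to more even coverage across loci, which is how the indicators are interpreted in Table 1 and Figure 1.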

Figure 6

Comparison of the principles of multiplex PCR and two-step PCR. The left and right sides of the picture are the principles of a common multiplex PCR process combined with NGS and a two-step PCR process. For the multiplex PCR process combined with NGS, after the multiplex PCR step, the library is constructed by the addition of an “A” tail, the addition of an adapter, and pre-PCR steps. In our two-step PCR process, the introduction of universal sequences and primers occurs in the first-step PCR, and the ligation of adapter occurs in the second-step PCR.

Cost-Effectiveness of Two-Step PCR

Temporal Cost

We compared the amount of time required for PCR for the two approaches before sequencing. One approach was our two-step PCR, and the other was traditional multiplex PCR combined with BGISEQ-500 library construction. For fairness in the comparison of time consumption, all objective influencing factors were excluded, such as laboratory, equipment, operator, and DNA samples. The PCR time is based on the display time of the same model of the instrument (ABI VERITI Gradient PCR Instrument). Since DNA extraction takes a long time and requires manual manipulation, we explored the omission of DNA extraction steps using blood cards as DNA samples. We selected six human blood cards, including four males and two females, with a blood card diameter of 3.0 mm.

Monetary Cost

We calculated the cost of the two methods using reagents from a Chinese manufacturer. These costs exclude pretesting, consumables, and instrument depreciation.

Data Utilization

First, the two methods were used to test the same batch of human DNA samples (six samples each). The difference in data utilization between the two groups was assessed with a t-test. Based on the filtered data, which had a Phred-like consensus quality ≥20,[35] data utilization was calculated as the total number of reads at all target loci divided by the total number of reads in the filtered data. Additionally, the purified first-step PCR products of the same two groups of human blood DNA samples (three males and three females) were amplified in two different second-step PCR enzyme systems: 1 U of NH9007 DNA polymerase alone, and 1 U of NH9007 plus 0.3 U of Platinum Pfx DNA polymerase. The uniformity indicators of the two groups of sequencing data were then calculated and analyzed.
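As a rough illustration of the data utilization ratio and the group comparison, the following Python sketch computes the ratio per sample and a t statistic for two groups. The text does not state which t-test variant was used, so the unpaired Welch form here is an assumption, as are the function names. Only the statistic and the degrees of freedom are returned; obtaining a p-value would additionally require a t-distribution table or a statistics package.

```python
import math
import statistics


def data_utilization(target_reads, filtered_reads):
    """Reads at all target loci divided by reads in the filtered data."""
    if filtered_reads <= 0:
        raise ValueError("filtered_reads must be positive")
    return target_reads / filtered_reads


def welch_t(group_a, group_b):
    """Welch's unpaired t statistic and approximate degrees of freedom.

    Assumption: the two methods are compared as independent groups
    with possibly unequal variances.
    """
    n_a, n_b = len(group_a), len(group_b)
    # Squared standard error of each group mean (sample variance / n).
    se_a = statistics.variance(group_a) / n_a
    se_b = statistics.variance(group_b) / n_b
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df
```

In use, each sample's utilization would be computed from its own read counts, and the six values per method would be passed to `welch_t` as `group_a` and `group_b`.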

Species Specificity

To exclude the influence of contaminating DNA from other species on the sequencing results, we tested nine stool samples from gibbons (nonhuman primates). All DNA starting amounts were the same, and all library products were used for sequencing. We compared the data quality and amplification efficiency of the two groups of data after sequencing.
