Xinyao Miao1, Bowen Li2, Yuesheng Shen3, Huiyun Yu4, Guoqiang Zhu5, Chen Liang6, Xiao Fu7, Chu Wang8, Shengbin Li1, Bao Zhang1. 1. School of Forensic Sciences, Xi'an Jiaotong University, 710049 Xi'an, P. R. China. 2. School of Life Sciences, Sichuan University, 610207 Chengdu, P. R. China. 3. School of Life Sciences, Northwest University, 710069 Xi'an, P. R. China. 4. School of Life Sciences, Northwest A&F University, 712100 Yangling, P. R. China. 5. Key Laboratory of Bio-Resources and Eco-Environment of Ministry of Education, College of Life Sciences, Sichuan University, 610065 Chengdu, P. R. China. 6. School of Mechanical Engineering, Xi'an Jiaotong University, 710049 Xi'an, P. R. China. 7. The Beijing Genomics Institute (BGI)-Tianjin, 301700 Tianjin, P. R. China. 8. School of Life Sciences, Xiamen Medical College, 361023 Xiamen, P. R. China.
Abstract
Although technological advances have greatly reduced the cost of DNA sequencing, sample preparation time and reagent costs remain limiting factors for many studies. Based on low-cost targeted amplification, we developed an economical method for custom targeted library construction using DNA nanoball (DNB) technology and a two-step polymerase chain reaction (PCR). We refer to this method as two-step PCR and compared it with traditional multiplex PCR methods in three aspects: data quality, efficiency, and human specificity. With two-step PCR, 128 sequencing libraries were completed in a total PCR time of only 2 h 24 min 59 s, with a data utilization rate of 0.44 and a cost of approximately $1.70 per sample for targeted sequencing. Replacing traditional multiplex PCR methods with this strategy makes sample preparation before sequencing more cost-effective and further reduces the cost of next-generation sequencing (NGS). The method may also be free from interference by DNA from other species and from limitations of sample type and DNA content. These findings suggest possibilities for broad application of this approach in forensic research.
Since the introduction of the Sanger method for genome sequencing, the demand for fast, accurate, and inexpensive genomic information has continued to increase.[1,2] This demand drove the emergence of next-generation sequencing (NGS), which has revolutionized almost all areas of biology, agriculture, and medicine and has been widely used to analyze genetic variation.[3] Although the cost of sequencing per base has decreased 5-fold in the last 10 years,[4,36] the NGS pipeline could still be simplified to further decrease the overall cost. Prior to sequencing, DNA extraction and library preparation are required, and both can be laborious and time-consuming.[5] Compared with whole-genome and whole-exome sequencing, targeted NGS, which selectively captures genomic regions from DNA samples by polymerase chain reaction (PCR) or hybridization capture before sequencing, generates smaller, easier-to-manage data sets focused on the target loci, thereby reducing the difficulty of data analysis and saving time, cost, and effort.[6−9] However, hybridization-capture-based targeted NGS has drawbacks such as limited design flexibility, high cost, and protocol complexity, which limit its use in sequencing analyses that require low cost and high efficiency.[6] NGS therefore usually needs to be combined with simplified DNA extraction and library preparation methods to improve efficiency and reduce costs.[10]
The traditional NGS library preparation process consists of three primary steps: fragmentation, adapter ligation, and amplification.
The species specificity of PCR depends mainly on primer specificity. However, sequence similarity between closely related species is very high, especially within the same genus, and gene introgression may occur between species.[1,2] The gene fragments selected to distinguish species, chosen according to the taxonomic order of the species concerned, should be highly conserved within species and highly variable between species.[3] When the number of related species to be identified increases and introgression occurs among them, it becomes difficult to find suitable genes and to design specific primers that amplify only one species and accurately distinguish species. For this reason, multiplex PCR has been widely used, because its efficiency and species specificity are higher than those of traditional PCR methods.[11,12] In forensic science, because each sample is valuable and testing time is often critical, direct PCR has been developed, which allows PCR to be performed directly on samples such as whole blood, blood cards, and saliva.[13,14]
Building on existing PCR and PCR-derived techniques (standard PCR, multiplex PCR, direct PCR, etc.), targeted sequencing is increasingly being applied. Manufacturers have developed several commercial forensic solutions based on highly multiplexed PCR library construction, such as the Illumina MiSeq FGx, the first system developed for the preparation and sequencing of targeted libraries for forensic genomics. MiSeq FGx still has potential shortcomings: (1) although it accepts a 1.2 mm punch from an FTA blood card as input, the punch requires preprocessing; (2) library preparation takes more than 9 h; and (3) in the Chinese market, each sample costs USD 94.26.[15,16]
Herein, we describe a simple but robust approach for direct library construction that may increase the efficiency of the NGS pipeline. It allows the cumbersome library construction process to be completed directly and cost-effectively while maintaining accuracy and uniformity. The stability of the two-step PCR approach across different species and sample types is equally important. Together, these properties may broaden the use of PCR in the preparation of NGS libraries.
Results and Discussion
Results
As a direct presequencing library construction method, the two-step PCR approach was evaluated in three aspects (data quality, efficiency, and species specificity) and compared with a traditional presequencing library construction method (multiplex PCR library preparation). The DNA extraction and sequencing processes were also included in the comparison.
Quality of the Sequencing Data
Loci Detection Rates
We calculated the detection rates of correct loci for the sequencing data generated by our method and by the traditional library construction approach, processing the same group of six human samples with both approaches. The average loci detection rates of the two approaches were almost the same and close to 100% (Table 1), with no significant difference between them (P > 0.05).
Table 1
Quality of the Sequencing Data from the Two Different Approaches
| approach | correct loci detection rate (%) | DOC | CV (filtered) | uniformity frequency |
|---|---|---|---|---|
| traditional library construction | 99.84 | 14 557.31 | 1.217 | 0.726 |
| our approach | 99.68 | 9414 | 0.684 | 0.851 |
Depth Uniformity
The uniformity of the sequencing data generated by our approach and by traditional multiplex PCR library construction was compared in several respects (Table 1): (1) the depths of coverage (DOCs) were 14 557.31 for traditional multiplex PCR and 9414 for our approach, and both were high enough to ensure sequencing quality; (2) the coefficient of variation (CV) values differed significantly between the two approaches (P = 0.0099, P ≤ 0.01); (3) uniformity frequencies also differed significantly between the two groups (P = 0.0277, P ≤ 0.05); and (4) the loci depth distributions of the sequencing data obtained by the two methods differed somewhat, with slopes of the fitted lines of −0.0024 for two-step PCR and −0.0035 for traditional multiplex PCR (Figure 1).
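The uniformity metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: it assumes the CV uses the population standard deviation of per-locus depths, and that the Figure 1 fit regresses log10(depth) on amplicon rank after sorting depths from highest to lowest. The log base, the sorting order, and the function names are assumptions.

```python
import math
import statistics


def coefficient_of_variation(depths):
    """CV = sigma / mu over per-locus read depths (population SD assumed)."""
    return statistics.pstdev(depths) / statistics.mean(depths)


def log_depth_slope(depths):
    """Least-squares slope of log10(depth) against amplicon rank.

    Depths are sorted from highest to lowest (assumed ordering), so the
    slope is <= 0; a value closer to zero means a flatter, more uniform
    depth distribution. Requires at least two loci.
    """
    ys = [math.log10(d) for d in sorted(depths, reverse=True)]
    xs = list(range(1, len(ys) + 1))
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


if __name__ == "__main__":
    # Hypothetical per-amplicon depths, for illustration only.
    depths = [1200, 950, 800, 640, 500, 310, 120]
    print(f"CV = {coefficient_of_variation(depths):.3f}")
    print(f"slope = {log_depth_slope(depths):.4f}")
```

A lower CV and a slope closer to zero both point toward more even coverage across amplicons, which is how the two indicators are read in this section.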
Figure 1
Loci depth distribution of the two-step and traditional multiplex PCR library construction methods. The abscissa indicates the number of amplicons, and the ordinate indicates the log value of the depth. The depth distribution of the data generated by the two-step PCR approach is represented by a black line, and the gray line represents the data generated by the multiplex PCR approach. The corresponding dashed lines represent the linear fits for the two approaches. The slopes for the two-step PCR method and the traditional multiplex PCR method are −0.0024 and −0.0035, respectively.
Cost-Effectiveness
To evaluate efficiency, we considered three primary factors for the two-step PCR method: temporal cost, economic cost, and data utilization. Two-step PCR not only shortened the time spent on library construction and DNA extraction but also reduced the monetary cost and improved data utilization.
Temporal Cost
Temporal Cost of PCR
Using the same targeted amplification step and following the respective protocols, we compared the total PCR time required by the two library preparation methods while holding other influencing factors, such as laboratory, equipment, operator, and DNA samples, constant. The PCR times for the two-step PCR and the traditional multiplex PCR methods (based on standard BGISEQ-500 library preparation) were 2 h 24 min 59 s and 4 h 7 min 22 s, respectively (Table 2). Two-step PCR constructs a targeted library directly through targeted amplification followed by undifferentiated amplification. Traditional library construction does not require undifferentiated amplification but instead requires three library preparation steps: end repair, adapter ligation, and pre-PCR. The time spent on reagent preparation, purification, and quantification was not included because it varies between individuals. Even so, because traditional library construction requires more steps than the two-step PCR method, its total time is correspondingly longer.
Table 2
PCR Time Required by the Two Approaches for Library Construction
| step | two-step PCR | traditional multiplex PCR |
|---|---|---|
| targeted amplification | 1 h 54 min 20 s | 1 h 54 min 20 s |
| undifferentiated amplification | 30 min 39 s | 0 |
| library preparation | 0 | 2 h 13 min 2 s |
| total time | 2 h 24 min 59 s | 4 h 7 min 22 s |
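As an arithmetic check of Table 2 and of the 41.39% temporal saving quoted in the Discussion, the step durations can be summed in seconds. This is an illustrative sketch; the helper names are hypothetical.

```python
def to_seconds(h=0, m=0, s=0):
    """Convert an h/min/s duration to seconds."""
    return 3600 * h + 60 * m + s


def fmt(seconds):
    """Format seconds in the 'X h Y min Z s' style used in Table 2."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h} h {m} min {s} s"


# Step durations transcribed from Table 2.
two_step = {
    "targeted amplification": to_seconds(1, 54, 20),
    "undifferentiated amplification": to_seconds(0, 30, 39),
}
traditional = {
    "targeted amplification": to_seconds(1, 54, 20),
    "library preparation": to_seconds(2, 13, 2),
}

total_two_step = sum(two_step.values())
total_traditional = sum(traditional.values())
saving = 1 - total_two_step / total_traditional  # fraction of PCR time saved

print(fmt(total_two_step))     # 2 h 24 min 59 s
print(fmt(total_traditional))  # 4 h 7 min 22 s
print(f"{saving:.2%}")         # 41.39%
```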
Temporal Cost of DNA Extraction
DNA extraction, which precedes library preparation, is also a time-consuming step. Eliminating the DNA extraction step may therefore improve the efficiency of NGS. We used six dry blood cards (diameter = 3.0 mm) as DNA sources without any preprocessing. NH9719 (Nuhighbio, Suzhou, China) was selected and tested as the PCR buffer system because it can effectively reduce the impact of various PCR inhibitors, such as proteins and urban dust. To show the depth of each locus, the sequencing alignment data are presented in detail (Figure 2). The average correct loci detection rate and DOC were 94.25% and 18 034, respectively. Detailed genotyping results can be found in Supporting Information 1. In addition, there were no significant differences in depth distribution between samples (P > 0.05). These results demonstrate that the two-step PCR method can maintain stability even without DNA extraction.
Figure 2
Loci depth distribution after direct two-step PCR with six dry blood cards. Panels a–f show the loci depth distribution of two-step PCR for each of the six blood card samples used in this study. The abscissa indicates the log value of the depth, and the ordinate indicates the number of amplicons.
Figure 4
Effect of adding or omitting Pfx DNA polymerase on the uniformity of the sequencing data. The gray and black columns represent the data obtained with and without Pfx DNA polymerase in the second-step PCR system, respectively. The average CV values without and with Pfx are 1.1 and 0.67, respectively (P ≤ 0.05), and the corresponding uniformity frequencies are 0.82 and 0.88 (P ≤ 0.05).
Monetary Cost
The economic cost of a targeted library construction method also cannot be neglected in an efficiency evaluation. We calculated the cost of the reagents and primers used for library preparation by the two methods. The costs of a single reaction for the two-step and traditional library construction methods were $1.70 and $6.15, respectively (Table 3). Constructing a targeted library using two-step PCR may therefore be more economical than using traditional multiplex PCR.
Table 3
Money Spent on Library Construction for the Two Approaches
| cost (USD/single reaction) | two-step PCR | traditional multiplex PCR |
|---|---|---|
| primers | $0.90 | $0.45 |
| reagents | $0.80 | $5.70 |
| total | $1.70 | $6.15 |
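The per-reaction totals in Table 3 and the 72.36% monetary saving quoted in the Discussion follow from simple sums; the sketch below reproduces them. The dictionary layout is illustrative only.

```python
# Per-reaction costs (USD) transcribed from Table 3.
costs = {
    "two-step PCR": {"primers": 0.90, "reagents": 0.80},
    "traditional multiplex PCR": {"primers": 0.45, "reagents": 5.70},
}

# Round to cents to remove floating-point noise from the sums.
totals = {method: round(sum(parts.values()), 2) for method, parts in costs.items()}
saving = 1 - totals["two-step PCR"] / totals["traditional multiplex PCR"]

print(totals)           # {'two-step PCR': 1.7, 'traditional multiplex PCR': 6.15}
print(f"{saving:.2%}")  # 72.36%
```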
Data Utilization Rate
The utilization rate of sequencing data is an indispensable indicator of efficiency. We calculated the data utilization rates for two groups of data generated from the same samples by the two-step and traditional multiplex PCR library construction approaches. The average data utilization rates were 0.36 for traditional multiplex PCR and 0.44 for two-step PCR (Figure 3), a significant difference (P = 0.0433, P ≤ 0.05).
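A difference between two rates can be expressed against different baselines, so the sketch below makes the arithmetic explicit. The 18.18% quoted in the Discussion corresponds to the 0.08 difference divided by the two-step PCR rate; relative to the traditional rate, the same difference is an increase of about 22.2%. The variable names are illustrative.

```python
traditional_rate = 0.36  # average data utilization, traditional multiplex PCR
two_step_rate = 0.44     # average data utilization, two-step PCR

difference = two_step_rate - traditional_rate   # absolute difference (0.08)
vs_traditional = difference / traditional_rate  # relative increase over traditional
vs_two_step = difference / two_step_rate        # shortfall of traditional vs two-step

print(f"absolute difference: {difference * 100:.0f} percentage points")
print(f"increase relative to traditional: {vs_traditional:.2%}")
print(f"traditional shortfall relative to two-step: {vs_two_step:.2%}")
```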
Figure 3
Data utilization
rates of the two different methods. Black and
gray columns represent the data utilization of the traditional PCR
method and the two-step PCR method, respectively. The data utilization
rates of the two methods are 36 and 44%, respectively, which are significantly
different (P ≤ 0.05).
Previous studies have shown that uniform sequencing data might reduce data storage requirements and increase data utilization.[17] The addition of 0.2 U of Platinum Pfx DNA polymerase (Pfx) to the second-step PCR may be an important reason for the good depth uniformity. To test this, targeted libraries were prepared by two-step PCR from the same group of samples (six human DNA blood samples), with the second-step PCR enzyme system either containing or lacking Pfx (Figure 4). Sequencing data from reactions supplemented with Pfx had markedly lower CV values (0.67, P = 0.02, P ≤ 0.05) and higher uniformity frequencies (0.88, P = 0.02, P ≤ 0.05) than data from reactions without Pfx. This result may explain the higher data utilization achieved by the two-step PCR method.
Species Specificity
In forensic research, DNA from other species can affect sequencing data, especially when a library is constructed using multiplex PCR. The two-step PCR method designed in this study aimed to address the problem of DNA contamination from other species. All primers for two-step PCR were designed on the Homo sapiens hg38 genome to be specific for human DNA. Fecal DNA from nine gibbons (nonhuman primates) was used as test samples. With the same DNA input, the traditional multiplex PCR library yielded 48.47 ng of product, whereas the product of the two-step PCR library was below the detection range of the Qubit 3.0 (Table 4). The PCR amplification efficiency and DOC of the two-step PCR data were 86.04% and 91.74% lower, respectively, than those of the traditional multiplex PCR data. The genotyping results generated by the traditional multiplex PCR and two-step PCR approaches are given in Supporting Information 2 and 3. The differences between the two approaches in PCR amplification efficiency and DOC were significant (P < 0.0001). These results suggest that using two-step PCR for NGS library construction may increase species specificity.
Table 4
Comparison of the Species Specificity of the Two Library Construction Methods Using Nonhuman DNA (Gibbon)
| approach | PCR amplification efficiency (%) | average DOC | average amount of DNA product after PCR |
|---|---|---|---|
| multiplex PCR | 79.07 | 1790.47 | 48.47 ng |
| two-step PCR | 11.04 | 147.83 | below detection range |
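The percentage decreases quoted for the gibbon samples can be recomputed from Table 4, taking the multiplex PCR values as the reference. A minimal sketch; the helper name is hypothetical.

```python
def percent_decrease(reference, value):
    """Percentage by which `value` falls below `reference`."""
    return 100 * (reference - value) / reference


# Gibbon-DNA results from Table 4 (multiplex PCR as the reference).
efficiency_drop = percent_decrease(79.07, 11.04)
doc_drop = percent_decrease(1790.47, 147.83)

print(f"amplification efficiency: -{efficiency_drop:.2f}%")  # -86.04%
print(f"average DOC: -{doc_drop:.2f}%")                      # -91.74%
```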
Discussion
Data-quality control is a key step in ensuring successful and meaningful research.[18] However, there is currently no quality control standard for NGS multiplex PCR data.[19] For the Ion Torrent PGM platform, generally only the proportion of accurate profiles (i.e., the correct loci detection rate) is evaluated.[20,21] By introducing additional indicators (loci detection rate, DOC, CV, and uniformity frequency), we could evaluate the quality of multiplex PCR data from different perspectives (accuracy and uniformity).[22] The detection rates of correct loci in the sequencing data produced by the two libraries were close to 100%, indicating that the accuracy of the two-step PCR method was similar to that of traditional multiplex PCR library construction. CV, uniformity frequency, and the slope of the fitted loci depth distribution are all indicators of uniformity. Two-step PCR data had lower CV values (P ≤ 0.01) and higher uniformity frequencies (P ≤ 0.05), indicating reduced dispersion. In addition, the locus depth distribution of the two-step PCR data had a smaller absolute slope than that of the traditional multiplex PCR data. Together, these indicators suggest that the data produced by two-step PCR may be more uniform.
The cost-effectiveness of the experimental methodology, data storage, and report generation in NGS needs to be considered.[23,24] For many experiments, the limiting factor is the temporal and monetary cost of sample preparation.[25] Sample preparation before sequencing generally consists of two parts: DNA extraction and library construction. For DNA extraction, the operation time of different DNA extraction methods is 1.15–3.15 h, and the average cost per sample is $2.75 to $20.31 (approximately €2.5 to €18.45).[26] Moreover, research on direct amplification is usually limited to capillary electrophoresis (CE) and rarely involves NGS.[27,28] Although the FGx operating manual claims that it can process blood cards directly, preprocessing, including reagent preparation and multiple centrifugation steps, is still required.[16] PCR itself is not strongly limited by DNA purity, and alkaline environments have been shown to be more conducive to PCR with forensic samples such as bloodstains.[37] On this basis, we used an alkaline buffer in the two-step PCR, which may enable our method to achieve direct amplification from blood cards alone. As expected, the results showed that the DNA extraction step can be omitted from sample preparation under specific circumstances (when blood cards are the only source of template DNA). The ability to omit DNA extraction may allow two-step PCR to be applied to complex forensic evidence studies and offers considerable savings in time and money.
For library construction, compared with the traditional multiplex PCR method based on the BGISEQ-500 platform, the two-step PCR method saved 41.39% of the temporal cost and 72.36% of the monetary cost. According to our market research, Illumina’s ForenSeq DNA Signature Kit has a high market price of $94.26 per reaction and requires more than 9 h of hands-on time.[16,29] Besides our two-step PCR approach, there are many PCR-based barcoding procedures that use large fusion primers for library construction. However, because the second PCR in our approach is undifferentiated amplification, the barcode primers can be reused to barcode diverse amplicons of interest, so costly investment in individually barcoded primer sets for each target gene is not required. As a result, the burden of library construction can be alleviated.[42−44]
BGISEQ-500 is based on DNA nanoball (DNB) sequencing technology. DNBs are generated from a single-stranded DNA (ssDNA) circle by rolling circle amplification (RCA) to enhance the fluorescent signals in the sequencing process.[39] Unlike PCR index amplification, RCA does not accumulate amplification errors or introduce PCR bias, because the original ssDNA circle is the only template throughout the entire amplification process.[30,39,45−48] To some extent, BGISEQ-500 is therefore well suited to sequencing targeted PCR libraries.
The quality, capacity, and storage of NGS sequencing data also affect cost-effectiveness.[24] Compared with the traditional method, two-step PCR increased the data utilization rate from 0.36 to 0.44; that is, the rate of the traditional method was 18.18% lower than that of two-step PCR. In addition, most technology patents for multiplex PCR require Pfx DNA polymerase.[31,32] In our previous tests, however, Pfx was less suitable than NH9007 for multiplex PCR; we therefore added Pfx to the second-step PCR instead of the first-step PCR. The more uniform sequencing data suggest that the improved two-step PCR may be more cost-effective than the traditional multiplex PCR method.
In general, we report a novel library construction method for targeted sequencing based on multiplex PCR. The first change was to simplify library preparation: only two PCR steps are required, and the DNA extraction step can be skipped. These changes make the method more cost-effective in terms of reagents and time. Sequencing costs are further reduced because the introduced universal sequences determine the sequencing direction. A second change was made to the enzyme system of the second-step PCR: owing to its high fidelity, the addition of Pfx improved depth uniformity to some degree, which may be the main reason for the increased data utilization and cost-effectiveness. The introduction of phosphate groups and the maintenance of an alkaline environment in the reactions for DNB technology may give the two-step PCR method an advantage in accuracy over other sequencing platforms.
Species specificity was studied to verify that other biological sources do not interfere with obtaining reliable results from samples recovered from crime scenes,[38] which is an important requirement for a robust method. In a developmental validation study of the MiSeq FGx forensic genomics system, species specificity tests were carried out with rhesus monkeys and baboons, and the loci detection rates were 11 and 25%, respectively.[33] However, rhesus monkeys and baboons are Old World monkeys (family Cercopithecidae), which are more distantly related to humans than the gibbons used here.[34] In our study, the PCR amplification efficiency of two-step PCR with gibbon samples (more closely related to Homo sapiens) was only 11.04%, indicating the species specificity of this method. Thus, the two-step PCR method might maintain the stability of sequencing data in the presence of primate DNA contamination.
Conclusions
In response to the increasing demand for cost reduction in NGS technologies, we developed and evaluated an approach for economical library construction. The results showed that the two-step PCR method designed in this study can address problems of common multiplex PCR methods, such as low time efficiency, high monetary cost, and low data utilization. In addition, two-step PCR could largely eliminate interference from nonhuman DNA sources. Admittedly, this method could be improved in several aspects: (1) two-step PCR systems need to be designed for other NGS platforms; (2) we will continue to work to reduce the time and economic cost required for targeted library construction; and (3) we will also design two-step PCR systems for different species.
Experimental Section
Target Loci Selection and Primer Design
The sequences of 188 and 128 target regions containing single nucleotide polymorphism (SNP) and short tandem repeat (STR) loci were downloaded from the National Center for Biotechnology Information dbSNP database and STRbase (https://strbase.nist.gov), respectively. In addition to conventional primer design requirements, the two-step PCR primers met the following requirements: (1) a length of 25 to 30 nt; (2) a Tm value of 58 to 64 °C; and (3) a GC content of <65%.
We tested 33 samples: 24 human samples and 9 gibbon stool samples. DNA was extracted with the Genomic DNA Extraction Kit and the Stool DNA Extraction Kit (Tiangen, Beijing, China). Notably, although our two-step PCR approach was designed for the BGISEQ-500 sequencer (BGI, Shenzhen, China), its principle could be applied to other sequencing platforms. The protocol for the two-step PCR approach is shown in Figure 5.
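The three primer constraints can be checked programmatically. This is a minimal sketch under stated assumptions: the length rule is applied to the target-specific portion (without the 20-nt universal tail), and Tm is estimated with the basic GC-content approximation Tm = 64.9 + 41 × (nGC − 16.4)/N as a stand-in, since the Tm model used by the authors is not stated and dedicated primer-design tools usually rely on nearest-neighbor thermodynamics. The function names and the example primer are hypothetical.

```python
def gc_count(seq):
    """Number of G and C bases in a primer."""
    return sum(base in "GC" for base in seq)


def basic_tm(seq):
    """Basic GC-content Tm approximation (assumption, not the authors' model)."""
    return 64.9 + 41 * (gc_count(seq) - 16.4) / len(seq)


def meets_two_step_rules(seq):
    """Check the length, Tm, and GC-content rules for two-step PCR primers."""
    seq = seq.upper()
    length_ok = 25 <= len(seq) <= 30
    tm_ok = 58 <= basic_tm(seq) <= 64
    gc_ok = gc_count(seq) / len(seq) < 0.65
    return length_ok and tm_ok and gc_ok


if __name__ == "__main__":
    primer = "ACGTACGTACGTACGTACGTACGTG"  # hypothetical 25-nt primer
    print(len(primer), round(basic_tm(primer), 1), meets_two_step_rules(primer))
```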
Figure 5
Procedure for the two-step PCR approach. The procedure comprises addition of DNA or blood cards, first-step PCR, two rounds of product purification, second-step PCR, product purification, and quantitation. If the total quantified amount of PCR product was higher than 20 ng, the workflow proceeded to the next step, fixed-ratio library pooling, and then to sequencing and bioinformatics analyses; if not, the experiment was restarted from the first-step PCR.
First-Step PCR (Targeted Amplification)
In the first-step PCR of our approach, two different universal sequences (20 bp), “ACATGGCTACGATCCGACTT” and “GACCGCTTGGCCTCCGACTT”, were added to the 5′ ends of the specific forward and reverse primers, respectively. The design of the universal sequences determines the direction of sequencing; single-direction sequencing reduces costs and improves data utilization. In this method, the sequencing direction is determined by universal sequence 1, so sequencing starts from the forward primers.
For the first-step PCR (Figure 5), target loci were amplified in 25 μL reactions containing 2.5 μL of the primer mix, 2–4 U of NH9007 DNA polymerase, 10 μL of 1× PCR buffer (Nuhighbio, Suzhou, China), and input DNA (78 pg to 10 ng). To ensure PCR fidelity and uniformity of the sequencing data, we designed new targeted amplification (first-step) conditions (Table 5). The reactions were performed on an ABI VERITI Gradient PCR Instrument (Applied Biosystems, MA).
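Table 5
Details of the First-Step PCR Conditions(a)
| temperature (°C) | time | number of cycles |
|---|---|---|
| 95 | 15 min | |
| 95 | 20 s | 9* |
| 60 | 2 min 30 s | |
| 72 | 1 min | |
| 95 | 20 s | 1 |
| 62.5 | 2 min 30 s | |
| 72 | 1 min | |
| 95 | 20 s | 1 |
| 65 | 2 min 30 s | |
| 72 | 1 min | |
| 95 | 20 s | 1 |
| 62.5 | 2 min 30 s | |
| 72 | 1 min | |
| 95 | 20 s | 8* |
| 60 | 2 min 30 s | |
| 72 | 1 min | |
| 72 | 10 min | |
| 4 | ∞ | |
(a) Each cycle number applies to the three-step block (denaturation, annealing, extension) that begins on that row. The number of cycles at the blocks marked * may be increased by 1–2 cycles.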
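To make the cycling structure of Table 5 explicit, the program can be written as (cycles, steps) blocks and its programmed hold time summed. This is an illustrative sketch: it counts hold times only and ignores instrument ramping between temperatures, so it is not expected to reproduce the measured targeted-amplification time in Table 2, and it omits the optional extra cycles at the blocks marked *. The data layout and function names are hypothetical.

```python
# Each block: (number of cycles, [(temperature_C, hold_seconds), ...]).
# Values transcribed from Table 5; the final 4 °C (infinite) hold is omitted.
FIRST_STEP_PROGRAM = [
    (1, [(95.0, 15 * 60)]),                      # initial denaturation
    (9, [(95.0, 20), (60.0, 150), (72.0, 60)]),
    (1, [(95.0, 20), (62.5, 150), (72.0, 60)]),
    (1, [(95.0, 20), (65.0, 150), (72.0, 60)]),
    (1, [(95.0, 20), (62.5, 150), (72.0, 60)]),
    (8, [(95.0, 20), (60.0, 150), (72.0, 60)]),
    (1, [(72.0, 10 * 60)]),                      # final extension
]


def hold_seconds(program):
    """Total programmed hold time in seconds, excluding ramping."""
    return sum(cycles * sum(t for _, t in steps) for cycles, steps in program)


def amplification_cycles(program):
    """Number of denaturation-annealing-extension cycles."""
    return sum(cycles for cycles, steps in program if len(steps) == 3)


if __name__ == "__main__":
    total = hold_seconds(FIRST_STEP_PROGRAM)
    minutes, seconds = divmod(total, 60)
    print(amplification_cycles(FIRST_STEP_PROGRAM), "cycles,", f"{minutes} min {seconds} s of holds")
```

Writing the program this way shows the annealing temperature stepping from 60 to 62.5, 65, 62.5, and back to 60 °C across the 20 amplification cycles.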
Purification
The PCR products generated
by the first-step PCR were purified twice with Agencourt AMPure XP
beads (Beckman Coulter, CA) at 1.5 times the product volume.
Second-Step PCR (Undifferentiated Amplification)
Our second-step PCR is an undifferentiated amplification that places the BGISEQ-500 adapter sequences at the ends of the amplified products. The sequences of adapter 1 and adapter 2 were “GAACGACATGGCTACGATCCGACTT” and “TGTGAGCCAAGGAGTTG****TTGTCTTCCTAAGACCGCTTGGCCTCCGACTT”, respectively, where **** is the BGISEQ barcode sequence used to distinguish between samples. Barcode details for the BGISEQ sequencer are given in Supporting Information 4; a total of 128 different barcode sequences (approximately 10 bp each) were used. DNB technology requires single-stranded DNA carrying terminal phosphate and hydroxyl groups for the subsequent DNA circularization step; therefore, the first base at the 5′ end of adapter 1 was phosphorylated.
The second-step PCR system contained the purified first-step PCR product, 3 μL of the Adapter Mix, 10 μL of 1× PCR buffer (Nuhighbio, Suzhou, China), 1 U of NH9007 DNA polymerase (Nuhighbio, Suzhou, China), and 0.2 U of Platinum Pfx DNA polymerase (ThermoFisher Scientific, MA). The reaction conditions of the second-step PCR are shown in Table 6.
Table 6
Details of the Second-Step PCR Conditions
| temperature (°C) | time | number of cycles |
|---|---|---|
| 95 | 10 min | |
| 95 | 20 s | 8 |
| 58 | 30 s | |
| 72 | 1 min | |
| 4 | ∞ | |
To protect the 5′-terminal phosphate, the second-step PCR used an alkaline buffer (NH9719, pH > 8.0). In addition, we replaced nuclease-free water with Tris–EDTA (TE) buffer (ThermoFisher Scientific, MA) to maintain alkalinity.
Purification and Quantitation
PCR products from the second-step PCR were purified once with 1.5× Agencourt AMPure XP beads, and the concentration of the purified products was then measured with a Qubit fluorometer (Applied Biosystems, MA). An experiment was considered to have failed if the total amount of the final product was less than 20 ng (Figure ).
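The 20 ng failure criterion reduces to a product of concentration and volume. In the sketch below, the elution volume is an assumed input (the text does not state it) and the function names are hypothetical. A library with exactly 20 ng is counted as passing, because only amounts below 20 ng were treated as failures.

```python
MIN_TOTAL_NG = 20.0  # experiments with less than 20 ng of final product were considered failed


def total_yield_ng(conc_ng_per_ul: float, volume_ul: float) -> float:
    """Total amount = Qubit concentration x elution volume (the volume is an assumption)."""
    return conc_ng_per_ul * volume_ul


def library_passes(conc_ng_per_ul: float, volume_ul: float,
                   min_total_ng: float = MIN_TOTAL_NG) -> bool:
    """True unless the total amount falls below the minimum."""
    return total_yield_ng(conc_ng_per_ul, volume_ul) >= min_total_ng


# Hypothetical reading: 0.6 ng/uL in 30 uL is about 18 ng, below the 20 ng cutoff.
print(library_passes(0.6, 30.0))
```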
Sequencing and Bioinformatics Analyses
After quantification, the products were mixed proportionally. The pooled PCR products were circularized into single-stranded circular DNA, which was then amplified by 2–3 orders of magnitude using rolling circle amplification (RCA); the amplified products are called DNBs. Finally, the DNBs were loaded onto an arrayed silicon chip using DNB loading technology.
For the sequencing strategy, we chose paired-end 50 (PE50) and single-end 200 (SE200). Raw sequencing data were processed with GATK-BWA. Subsequent data analysis was conducted using the statistical software R (version 3.2.1) and a series of in-house bioinformatics procedures written in Perl (https://www.perl.org). A depth threshold of 50 was then applied to each locus, and loci below the threshold were treated as nondetected in the subsequent statistical analyses.
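The depth cutoff amounts to a simple filter. The sketch assumes that per-locus depths have already been computed from the aligned reads and stored in a dictionary keyed by locus name; the function name and the example loci are hypothetical.

```python
DEPTH_THRESHOLD = 50  # loci with depth below this value are treated as nondetected


def split_by_depth(depth_by_locus: dict,
                   threshold: int = DEPTH_THRESHOLD) -> tuple:
    """Return (detected, nondetected) locus names, preserving input order."""
    detected = [locus for locus, depth in depth_by_locus.items() if depth >= threshold]
    nondetected = [locus for locus, depth in depth_by_locus.items() if depth < threshold]
    return detected, nondetected


# Hypothetical depths for three loci.
detected, nondetected = split_by_depth({"locus_A": 49, "locus_B": 50, "locus_C": 812})
print(detected, nondetected)
```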
Quality of the Sequencing Data
The two presequencing methods were applied to the same group of human DNA samples (Figure ). The following parameters were calculated to evaluate sequencing quality.
(1) Locus detection rate: the ratio of the number of loci typed in the sequencing data to the number of input loci.
(2) Sequencing depth of loci: the depth of coverage (DoC), also known as read depth, was defined as the sum of all effective high-quality reads within each locus or sample.[40] We calculated the DoC of a single sample or a single group of samples. For human dry blood cards, on which direct amplification was performed, we calculated the depth of each single locus from the data.
(3) Coefficient of variation (CV): the CV is a unit-free, normalized measure of dispersion, and monitoring the CV is a common approach in statistical process control. The CV was expressed as the ratio of the standard deviation (σ) to the mean (μ), i.e., γ = σ/μ.[41] The maximum and minimum values were removed before this calculation.
(4) Uniformity frequency: we defined the uniformity frequency as the percentage of loci whose depth fell within 90% above or below the median depth.
(5) Loci depth distribution: we plotted the amplicon number on the abscissa and the log depth on the ordinate to show the depth distribution of the loci. We then used curve fitting to calculate the slope of this distribution and compared uniformity between groups of data.
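The quality parameters above can be sketched as follows. Several details are not specified in the text and are assumptions here: σ is the sample standard deviation; only the single largest and single smallest values are dropped before computing the CV; the uniformity band is [0.1 × median, 1.9 × median]; and the depth distribution is fitted with a straight line of log10 depth against amplicon rank (deepest first), using only loci with nonzero depth. All function names are hypothetical.

```python
import math
import statistics


def locus_detection_rate(n_typed_loci: int, n_input_loci: int) -> float:
    """(1) Number of loci typed in the sequencing data / number of input loci."""
    return n_typed_loci / n_input_loci


def trimmed_cv(depths: list) -> float:
    """(3) CV = sigma / mu after removing one maximum and one minimum value."""
    if len(depths) < 4:
        raise ValueError("need at least 4 depths so that 2 remain after trimming")
    trimmed = sorted(depths)[1:-1]
    return statistics.stdev(trimmed) / statistics.mean(trimmed)


def uniformity_frequency(depths: list, band: float = 0.9) -> float:
    """(4) Fraction of loci with depth within band x median above or below the median."""
    median = statistics.median(depths)
    low, high = median * (1 - band), median * (1 + band)
    return sum(low <= d <= high for d in depths) / len(depths)


def log_depth_slope(depths: list) -> float:
    """(5) Least-squares slope of log10(depth) against amplicon rank (deepest first)."""
    if len(depths) < 2:
        raise ValueError("need at least 2 loci to fit a slope")
    ys = [math.log10(d) for d in sorted(depths, reverse=True)]  # depths must be > 0
    xs = range(1, len(ys) + 1)
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical per-locus depths for one sample (toy values, not study data).
depths = [1200, 950, 880, 760, 640, 410, 300, 95]
print(locus_detection_rate(8, 10), trimmed_cv(depths),
      uniformity_frequency(depths), log_depth_slope(depths))
```

A slope closer to zero means the log depth falls off more slowly from the deepest to the shallowest amplicon, i.e., a more uniform panel.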
Figure 6
Comparison of the principles of multiplex PCR and two-step PCR. The left and right sides of the figure show the principles of a conventional multiplex PCR workflow combined with NGS and of the two-step PCR workflow, respectively. In the multiplex PCR workflow, the library is constructed after the multiplex PCR step by A-tailing, adapter ligation, and pre-PCR. In our two-step PCR workflow, the universal sequences are introduced together with the primers in the first-step PCR, and the adapters are added in the second-step PCR.
Cost-Effectiveness of Two-Step PCR
Temporal Cost
We compared the PCR time required before sequencing for two approaches: our two-step PCR and traditional multiplex PCR combined with BGISEQ-500 library construction. To ensure a fair comparison of time consumption, potential confounding factors such as laboratory, equipment, operator, and DNA samples were kept the same for both approaches. PCR times were taken from the run time displayed by the same instrument model (ABI VERITI Gradient PCR Instrument).
Because DNA extraction is time-consuming and requires manual manipulation, we explored omitting the DNA extraction step by using blood cards directly as DNA samples. We selected six human blood cards (four males and two females), using blood card samples 3.0 mm in diameter.
Monetary Cost
We calculated the cost of the two methods using reagents from Chinese manufacturers. These costs do not include pretesting, consumables, or instrument depreciation.
Data Utilization
First, the two methods were used to test the same batch of human DNA samples (six samples each), and the difference in data utilization between the two groups was assessed with a t-test. Using the filtered data (Phred-like consensus quality ≥20),[35] data utilization was calculated as follows: data utilization = total number of reads at all target loci / total number of reads in the filtered data.
Additionally, the purified first-step PCR products of the same two groups of human blood DNA samples (three males and three females) were amplified in the second-step PCR with two different enzyme systems: 1 U of NH9007 DNA polymerase alone, and 1 U of NH9007 plus 0.3 U of Platinum Pfx DNA polymerase. The uniformity indicators of the two resulting groups of sequencing data were then calculated and compared.
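Data utilization and the between-group comparison can be sketched as below. The text does not say which t-test variant was used; this sketch assumes a two-sample Student's t-test with pooled variance and returns only the t statistic and degrees of freedom (a p-value could be obtained, for example, from `scipy.stats.ttest_ind`). The Phred helper restates the standard relation Q = −10·log10(P), so Q ≥ 20 corresponds to an estimated error probability of at most 1%. Function names and example values are hypothetical.

```python
import math
import statistics


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(P); Q = 20 corresponds to P = 0.01."""
    return 10 ** (-q / 10)


def data_utilization(reads_at_target_loci: int, filtered_reads: int) -> float:
    """Reads at all target loci / total reads remaining after the Q >= 20 filter."""
    if filtered_reads <= 0:
        raise ValueError("filtered_reads must be positive")
    return reads_at_target_loci / filtered_reads


def students_t(a: list, b: list) -> tuple:
    """Two-sample Student's t statistic with pooled variance; returns (t, df)."""
    n_a, n_b = len(a), len(b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * statistics.variance(a)
                  + (n_b - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / se, df


# Toy values only, not data from this study.
group_a = [data_utilization(t, 1000) for t in (500, 600, 700)]
group_b = [data_utilization(t, 1000) for t in (200, 300, 400)]
print(students_t(group_a, group_b))
```

With six samples per method, as in the comparison above, the pooled test has 10 degrees of freedom; if the two groups had clearly unequal variances, Welch's variant would be the safer choice.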
Species Specificity
To evaluate whether DNA from other species interferes with the sequencing results, we tested nine stool samples from gibbons (primates). All samples had the same DNA starting amount, and all library products were used for sequencing. After sequencing, we compared the data quality and amplification efficiency of the two groups of data.