Sarah Siu Tze Mak1, Shyam Gopalakrishnan1, Christian Carøe1,2, Chunyu Geng3, Shanlin Liu1,4, Mikkel-Holger S Sinding1,5,6, Lukas F K Kuderna7,8, Wenwei Zhang3, Shujin Fu3, Filipe G Vieira1, Mietje Germonpré9, Hervé Bocherens10,11, Sergey Fedorov12, Bent Petersen2, Thomas Sicheritz-Pontén2, Tomas Marques-Bonet7,8,13, Guojie Zhang4,14, Hui Jiang3, M Thomas P Gilbert1,15,16.
Abstract
Ancient DNA research has been revolutionized by the development of next-generation sequencing platforms. Although a number of such platforms have been applied to ancient DNA samples, the Illumina series is the dominant choice today, mainly because of its high production capacity and short read production. Recently, a potentially attractive alternative platform for palaeogenomic data generation has been developed, the BGISEQ-500, whose sequence output is comparable to that of the Illumina series. In this study, we modified the standard BGISEQ-500 library preparation specifically for use on degraded DNA, then directly compared the sequencing performance and data quality of the BGISEQ-500 to those of the Illumina HiSeq2500 platform on DNA extracted from 8 historic and ancient dog and wolf samples. The data generated were largely comparable between sequencing platforms, with no statistically significant difference observed for parameters including the level (P = 0.371) and average sequence length (P = 0.718) of endogenous nuclear DNA, sequence GC content (P = 0.311), double-stranded DNA damage rate (P = 0.309), and sequence clonality (P = 0.093). Small but significant differences were found in the single-strand DNA damage rate (δS; slightly lower for the BGISEQ-500, P = 0.011) and the background rate of difference from the reference genome (θ; slightly higher for the BGISEQ-500, P = 0.012). These may result from differences in the amplification cycles used in the polymerase chain reaction amplification of the libraries. A significant difference was also observed in the percentage of mitochondrial DNA recovered (P = 0.018), although we believe this is likely a stochastic effect relating to the extremely low levels of mitochondrial DNA sequenced from 3 of the samples, which had overall very low levels of endogenous DNA.
Although we acknowledge that our analyses were limited to animal material, our observations suggest that the BGISEQ-500 has the potential to be a valid and valuable alternative platform for palaeogenomic data generation, worthy of future exploration by those interested in the sequencing and analysis of degraded DNA.
Keywords: BGISEQ-500; Illumina HiSeq 2500; ancient DNA; comparative performance
Year: 2017 PMID: 28854615 PMCID: PMC5570000 DOI: 10.1093/gigascience/gix049
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Samples from which aDNA was extracted
| Sample | Original ID | Material | Species | Locality | Age | Extraction |
|---|---|---|---|---|---|---|
| 214 | CN 214 | Hide | Wolf | Uummannaq, Greenland | Before 1869 AD | A |
| 1921 | CN 1921 | Hide | Wolf | Rosenvinge Bugt, Greenland | 1925 AD | A |
| P84 | MGUH VP 3332 | Humerus | Wolf | Vølvedal, Greenland | ca. 7620 cal YBP | B |
| P83 | NKA 1950 × 2906 | Canine tooth | Dog | GUS, Greenland | ca. 600–1000 YBP | B |
| P79 | ZMK 350/1982 | Tibia | Dog | Qajâ, Greenland | ca. 3600–2700 YBP | B |
| FRC | FRC | Cartilage | Large canid | Tumat, Siberia | ca. 14 122 cal YBP | C |
| L | L | Liver | Large canid | Tumat, Siberia | ca. 14 122 cal YBP | C |
| M1 | M1 | Muscle | Large canid | Tumat, Siberia | ca. 14 122 cal YBP | C |
Summary of data generated
| Sample | Platform | Total reads | Normalized % reads retained after adapter removal | Normalized clonality | Normalized endogenous DNA (%) | Normalized length of uniquely mapped reads (bp) | θ | δD | δS | GC content (%) | mtDNA (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1921 | Illumina | 3.08E+07 | 94.69 | 0.11 | 58.73 | 40.77 | 0.008 | 0.008 | 0.154 | 51.58 | 4.51E-03 |
| 1921 | BGISEQ-500 | 5.32E+07 | 83.97 | 0.15 | 59.37 | 42.14 | 0.009 | 0.008 | 0.132 | 50.42 | 2.57E-03 |
| 214 | Illumina | 1.35E+07 | 99.13 | 0.07 | 74.25 | 49.37 | 0.008 | 0.011 | 0.084 | 48.60 | 4.15E-03 |
| 214 | BGISEQ-500 | 1.98E+08 | 99.55 | 0.07 | 75.51 | 53.08 | 0.009 | 0.012 | 0.061 | 47.75 | 3.11E-04 |
| FRC | Illumina | 1.64E+07 | 99.54 | 0.03 | 11.58 | 73.05 | 0.008 | 0.012 | 0.399 | 44.01 | 4.55E-03 |
| FRC | BGISEQ-500 | 3.39E+08 | 99.79 | 0.02 | 10.22 | 75.63 | 0.012 | 0.012 | 0.325 | 43.64 | 1.98E-04 |
| L | Illumina | 2.91E+07 | 99.63 | 0.09 | 1.03 | 64.65 | 0.013 | 0.010 | 0.415 | 43.24 | 6.04E-03 |
| L | BGISEQ-500 | 2.44E+08 | 99.77 | 0.08 | 0.85 | 66.72 | 0.013 | 0.009 | 0.262 | 45.99 | 7.09E-04 |
| M1 | Illumina | 5.10E+07 | 99.38 | 0.06 | 64.09 | 72.95 | 0.007 | 0.010 | 0.395 | 44.27 | 8.02E-03 |
| M1 | BGISEQ-500 | 1.79E+08 | 99.74 | 0.06 | 54.80 | 76.76 | 0.012 | 0.010 | 0.258 | 43.23 | 2.31E-03 |
| P79 | Illumina | 4.18E+07 | 98.48 | 0.38 | 0.07 | 52.45 | 0.030 | 0.012 | 0.880 | 43.36 | 4.65E-06 |
| P79 | BGISEQ-500 | 8.55E+07 | 98.08 | 0.10 | 0.06 | 45.77 | 0.039 | 0.011 | 0.550 | 44.21 | 6.40E-07 |
| P83 | Illumina | 2.77E+07 | 84.67 | 0.58 | 0.64 | 65.78 | 0.014 | 0.040 | 0.842 | 42.32 | 4.85E-04 |
| P83 | BGISEQ-500 | 2.32E+07 | 86.84 | 0.32 | 0.47 | 66.55 | 0.017 | 0.040 | 0.773 | 44.30 | 3.87E-04 |
| P84 | Illumina | 5.94E+07 | 98.70 | 0.31 | 0.12 | 54.79 | 0.015 | 0.030 | 0.355 | 44.42 | 2.71E-06 |
| P84 | BGISEQ-500 | 1.57E+08 | 92.45 | 0.08 | 0.10 | 51.13 | 0.022 | 0.020 | 0.154 | 47.99 | 5.15E-07 |
Results of statistical analyses of the data
| Test | Paired t statistic | P |
|---|---|---|
| % reads retained | −1.131308 | 0.295 |
| Clonality levels | −1.942886 | 0.093 |
| % endogenous DNA | −0.956158 | 0.371 |
| Endogenous DNA average read length | 0.0375544 | 0.718 |
| θ | 3.366145 | 0.012ᵃ |
| δD | −1.09765 | 0.309 |
| δS | −3.425669 | 0.011ᵃ |
| % GC | 1.091076 | 0.311 |
| % mtDNA | −3.073585 | 0.018ᵃ |
ᵃSignificant at P < 0.05.
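Recomputing the θ row from the rounded per-sample values in the summary table gives t ≈ 3.366 for 8 pairs, which suggests these are two-sided paired t-tests on BGISEQ-500 minus Illumina differences with 7 degrees of freedom. The sketch below is a minimal, dependency-free reimplementation under that assumption (in practice a library routine such as `scipy.stats.ttest_rel` would normally be used). The helper names are hypothetical, and the P value is obtained by integrating the Student's t density numerically with Simpson's rule rather than from a closed-form CDF.

```python
import math
from statistics import mean, stdev

# θ per sample, copied from the "Summary of data generated" table
# (order: 1921, 214, FRC, L, M1, P79, P83, P84).
theta_illumina = [0.008, 0.008, 0.008, 0.013, 0.007, 0.030, 0.014, 0.015]
theta_bgiseq = [0.009, 0.009, 0.012, 0.013, 0.012, 0.039, 0.017, 0.022]


def t_pdf(x: float, df: int) -> float:
    """Probability density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided P value, 1 - 2 * P(0 <= T <= |t|), with the central probability
    integrated by the composite Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0


def paired_t_test(x: list[float], y: list[float]) -> tuple[float, float]:
    """Paired t-test on the per-sample differences y - x; returns (t, two-sided P)."""
    diffs = [b - a for a, b in zip(x, y)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, two_sided_p(t, n - 1)


t_stat, p_value = paired_t_test(theta_illumina, theta_bgiseq)
print(f"theta: t = {t_stat:.4f}, P = {p_value:.3f}")
```

Rows recomputed from the rounded table values (δS, for example) may differ slightly from the published statistics, which were presumably computed from unrounded estimates.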
Figure 1: Library complexity, estimated as the number of unique reads as a function of the total number of reads sequenced. These numbers are estimated and extrapolated using the program preseq [84]. The total number of reads sequenced for each library can be found in Table 2 and Supplemental Table S1. The solid lines are the estimates for the libraries sequenced on the Illumina HiSeq 2500 platform, while the dotted lines are the estimates for the libraries sequenced on the BGISEQ-500. Each of the 8 samples is represented by a different colour.
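preseq extrapolates complexity curves beyond the sequenced depth with a statistical model that is not reproduced here. As a hedged illustration of the quantity on the y-axis only, the sketch below computes the interpolated (rarefaction) part of such a curve: the expected number of unique reads when m of the N sequenced reads are drawn at random without replacement, given how many duplicate reads each unique molecule produced. The function names and the toy library are hypothetical.

```python
import math
from collections import Counter


def log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k), computed via log-gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def expected_unique(counts: list[int], m: int) -> float:
    """Expected number of distinct molecules observed when m of the N sequenced
    reads are drawn at random without replacement. counts[i] is the number of
    reads (duplicates included) that derive from molecule i."""
    total = sum(counts)
    if not 0 <= m <= total:
        raise ValueError("m must lie between 0 and the total number of reads")
    expected = 0.0
    for n_i in counts:
        if total - n_i < m:
            expected += 1.0  # too few other reads: molecule i must be drawn
        else:
            # P(no read of molecule i is drawn) = C(N - n_i, m) / C(N, m)
            p_missed = math.exp(log_comb(total - n_i, m) - log_comb(total, m))
            expected += 1.0 - p_missed
    return expected


# Hypothetical toy library: each read is labelled by its source molecule.
reads = ["a", "a", "a", "b", "b", "c", "d", "e", "e", "f"]
counts = list(Counter(reads).values())
for m in (0, 2, 4, 6, 8, 10):
    print(m, round(expected_unique(counts, m), 2))
```

A library with few duplicates keeps gaining unique reads almost linearly with depth, whereas a highly clonal one flattens early; this is the contrast the solid and dotted curves are meant to expose.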
Figure 2: Heatmap of k-mer counts across libraries. Libraries (columns) were hierarchically clustered based on Pearson correlation. The proportion of each of the 4096 6-mers (rows) is depicted by colour.
Figure 3: Top: median normalized fragment count (NFC) in 100 Kb windows with a 10 Kb offset for sample 214 along scaffold_0. The solid line shows Illumina data, and the dotted line shows BGISEQ-500 data. Bottom: GC percentage calculated over the same windows as in the upper panel.
Figure 4: Median NFC of Illumina vs BGISEQ-500 for all samples in 100 Kb windows with a 10 Kb offset along scaffold_0. The colour of each point corresponds to the window's GC content. For the high-quality samples (1921, 214, FRC, M1), a strong correlation of NFC between the 2 platforms can be observed. The fragment count appears to be correlated with GC content.
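The windowing in Figures 3 and 4 (100 Kb windows starting every 10 Kb, each summarized by a median NFC and a GC percentage) can be sketched as follows. How the fragment counts were normalized is not stated here, so dividing per-position depth by the scaffold-wide mean is an assumption made purely for illustration, and all names are hypothetical. The toy example scales the windows down to 10 bp.

```python
from statistics import mean, median


def sliding_windows(length: int, size: int, step: int):
    """Yield half-open (start, end) windows of `size` bp, one every `step` bp."""
    for start in range(0, length - size + 1, step):
        yield start, start + size


def gc_percent(seq: str) -> float:
    """GC percentage over unambiguous A/C/G/T bases only (N is ignored)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    return 100.0 * gc / acgt if acgt else float("nan")


def window_profile(seq: str, depth: list[int], size: int, step: int):
    """Return (start, median normalized depth, GC %) for each window.
    depth[i] is the number of fragments covering position i; normalizing by
    the scaffold-wide mean depth is an assumption made for this sketch."""
    scale = mean(depth)
    if scale == 0:
        raise ValueError("no fragments mapped to this scaffold")
    normalized = [d / scale for d in depth]
    return [
        (start, median(normalized[start:end]), gc_percent(seq[start:end]))
        for start, end in sliding_windows(len(seq), size, step)
    ]


# Toy scaffold, scaled down from 100 Kb windows / 10 Kb steps to 10 bp / 5 bp.
seq = "ACGT" * 5 + "GGCC" * 5
depth = [2] * 20 + [4] * 20
for row in window_profile(seq, depth, size=10, step=5):
    print(row)
```

Because consecutive windows overlap by 90% at the published settings, neighbouring points in Figure 3 are strongly autocorrelated, which smooths the curves.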
Overview of r² values for normalized fragment counts between Illumina and BGISEQ-500 for windows of 100 Kb
| Sample | r² |
|---|---|
| CN 1921 | 0.976 |
| CN 214 | 0.965 |
| FRC | 0.772 |
| L | 0.904 |
| M1 | 0.954 |
| P83 | 0.084 |
| P84 | 0.513 |
Coefficients of determination for copy number in the same genomic windows between platforms, for all extracts at varying copy-window (CW) sizes
| Sample | 1000 Kbp | 100 Kbp | 50 Kbp | 10 Kbp | 5 Kbp | 1 Kbp |
|---|---|---|---|---|---|---|
| 214 | 0.905 | 0.331 | 0.354 | 0.506 | 0.519ᵇ | 0.433ᶜ |
| 1921 | 0.963 | 0.384 | 0.392 | 0.428 | 0.432ᵇ | 0.393ᶜ |
| FRC | 0.582 | 0.847 | 0.870 | 0.873ᵇ | 0.870ᶜ | 0.783ᶜ |
| L | 0.941ᵇ | 0.957ᶜ | 0.964ᶜ | 0.958ᶜ | 0.955ᶜ | ND |
| M1 | 0.665 | 0.943 | 0.952 | 0.953 | 0.950 | 0.910ᵇ |
| P79 | 0.672ᵇ | ND | ND | ND | ND | ND |
| P83 | 0.203ᵇ | 0.003ᶜ | 0.004ᶜ | 0.003ᶜ | 0.002ᶜ | ND |
| P84 | 0.919ᵇ | 0.001ᶜ | 0.001ᶜ | ND | ND | ND |
ND = insufficient data for at least 1 platform.
ᵃDenotes a pass of quality control (visual inspection of read depth density in control regions and proper SW/CW and LW/CW ratios).
ᵇDenotes suboptimal quality, e.g., a read depth distribution in control regions that is not perfectly symmetrical and bell-shaped.
ᶜDenotes failed QC for at least 1 platform.