Brett Trost1, Susan Walker1, Syed A Haider1, Wilson W L Sung1, Sergio Pereira1, Charly L Phillips1, Edward J Higginbotham1,2, Lisa J Strug1,3, Charlotte Nguyen1,2, Akshaya Raajkumar1, Michael J Szego4,5, Christian R Marshall6,7, Stephen W Scherer8,2.
Abstract
BACKGROUND: Whole blood is currently the most common DNA source for whole-genome sequencing (WGS), but for studies requiring non-invasive collection, self-collection, greater sample stability or additional tissue references, saliva or buccal samples may be preferred. However, the relative quality of sequencing data and accuracy of genetic variant detection from blood-derived, saliva-derived and buccal-derived DNA need to be thoroughly investigated.
Keywords: blood; buccal; dna source; saliva; whole-genome sequencing
Year: 2019 PMID: 31515274 PMCID: PMC6929712 DOI: 10.1136/jmedgenet-2019-106281
Source DB: PubMed Journal: J Med Genet ISSN: 0022-2593 Impact factor: 6.318
Figure 1. Study design. From each of four individuals, three sources of DNA were collected (blood, saliva and buccal). Five DNA libraries were prepared per individual: blood, saliva without eukaryotic DNA enrichment, saliva with enrichment, buccal without enrichment and buccal with enrichment. Whole-genome sequencing and genetic variant detection were performed for the 20 DNA libraries, which were compared with one another to determine the impact of DNA source and eukaryotic DNA enrichment on sequencing data quality and variant detection. B_NE, non-enriched buccal; B_WE, enriched buccal; S_NE, non-enriched saliva; S_WE, enriched saliva.
Figure 2. Relationship between bacterial DNA concentration and sequencing metrics. Higher 16S:RPPH1 ratios indicate higher bacterial DNA concentrations. Metrics prefixed with an asterisk were corrected for the total number of reads in a given sample. For saliva and buccal samples, the same sample is shown for sequencing data generated either with or without prior enrichment for eukaryotic DNA. For example, when the saliva sample with 16S:RPPH1 ratio ~2900 (online supplementary figure 1) was sequenced without first performing eukaryotic DNA enrichment, approximately 77% of reads aligned (top-left scatterplot), versus 97% when eukaryotic DNA enrichment was performed. Higher values for the inter-quartile range of sequencing depth indicate lower read-depth uniformity.
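The inter-quartile range mentioned in the legend can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function name `depth_iqr` and the depth values are hypothetical, and the quartile convention used here (the "inclusive" method of `statistics.quantiles`) is an assumption, since the source does not say how quartiles were computed.

```python
from statistics import quantiles


def depth_iqr(depths):
    """Return the inter-quartile range (Q3 - Q1) of per-position read depths."""
    # quantiles(..., n=4) returns the three cut points Q1, median, Q3.
    q1, _median, q3 = quantiles(depths, n=4, method="inclusive")
    return q3 - q1


# Hypothetical per-position depths for two libraries (illustrative values only,
# not taken from the study).
uniform_depths = [30, 31, 29, 30, 32, 30, 28, 31]
uneven_depths = [5, 60, 12, 48, 30, 2, 75, 8]

# A larger IQR means depth varies more across positions, i.e. lower uniformity.
print(depth_iqr(uniform_depths))
print(depth_iqr(uneven_depths))
```

In this toy comparison the second library has the larger inter-quartile range and would therefore be read as having the less uniform coverage.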
SNV- and indel-detection concordance between blood samples and non-enriched saliva or buccal samples and between enriched saliva or buccal samples and the corresponding non-enriched samples, for filtered variants detected anywhere in the genome.
| Sample type 1 | Sample type 2 | Known, concordant (%) | Known, unique to sample type 1 (%) | Known, unique to sample type 2 (%) | Novel, concordant (%) | Novel, unique to sample type 1 (%) | Novel, unique to sample type 2 (%) |
| SNVs | |||||||
| HuRef blood 1 | HuRef blood 2 | 94.8 | 3.6 | 1.6 | 52.4 | 39.5 | 8.1 |
| Blood | Non-enriched saliva | 96.4 | 1.7 | 1.8 | 56.6 | 14.0 | 29.4 |
| Blood | Non-enriched buccal | 96.1 | 2.0 | 1.9 | 49.7 | 14.9 | 35.4 |
| Enriched saliva | Non-enriched saliva | 96.8 | 1.6 | 1.6 | 56.7 | 17.0 | 26.4 |
| Enriched buccal | Non-enriched buccal | 96.1 | 1.8 | 2.1 | 48.8 | 16.3 | 34.9 |
| Indels | |||||||
| HuRef blood 1 | HuRef blood 2 | 87.4 | 5.9 | 6.7 | 65.4 | 14.2 | 20.3 |
| Blood | Non-enriched saliva | 87.0 | 5.9 | 7.1 | 63.4 | 16.6 | 20.0 |
| Blood | Non-enriched buccal | 86.4 | 6.4 | 7.2 | 63.4 | 16.2 | 20.3 |
| Enriched saliva | Non-enriched saliva | 87.1 | 6.2 | 6.7 | 63.3 | 18.9 | 17.7 |
| Enriched buccal | Non-enriched buccal | 86.4 | 6.6 | 7.0 | 64.5 | 18.0 | 17.4 |
Concordances are shown for known variants (those present in gnomAD) and novel variants. Numbers represent the percentage of variants in that category; for instance, of all known SNVs detected in either the non-enriched or the enriched buccal samples from a given individual, 96.1% were detected in both non-enriched and enriched, 1.8% were detected only in enriched, and 2.1% were detected only in non-enriched. HuRef blood 1 and HuRef blood 2 refer to replicates sequenced from the same blood-derived DNA sample and represent a baseline level of concordance; all other values were aggregated across the four study participants. For individual-specific data, see online supplementary file 3.
SNV, single nucleotide variant.
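The percentages in this table follow from simple set arithmetic over the calls of two samples. The sketch below is a hedged illustration for a single pair of samples: the function name `concordance_percentages`, the tuple representation of a variant and the example calls are all hypothetical, and it does not reproduce how the authors aggregated values across the four participants.

```python
def concordance_percentages(calls_1, calls_2):
    """Split the union of two call sets into concordant and unique fractions.

    Each call set is a Python set of hashable variant keys, here assumed to be
    (chromosome, position, reference allele, alternate allele) tuples.
    Returns percentages of the union, so the three values sum to 100.
    """
    union = calls_1 | calls_2
    if not union:
        raise ValueError("at least one variant is required")
    total = len(union)
    return {
        "concordant": 100 * len(calls_1 & calls_2) / total,
        "unique_to_1": 100 * len(calls_1 - calls_2) / total,
        "unique_to_2": 100 * len(calls_2 - calls_1) / total,
    }


# Hypothetical calls for two libraries from one individual (illustrative only).
enriched = {("1", 1000, "A", "G"), ("1", 2000, "C", "T"),
            ("2", 3000, "G", "A"), ("X", 4000, "T", "C")}
non_enriched = {("1", 2000, "C", "T"), ("2", 3000, "G", "A"),
                ("X", 4000, "T", "C"), ("X", 5000, "A", "T")}

result = concordance_percentages(enriched, non_enriched)
print(result)
```

Running the same function separately on known and novel variants would give the two column groups of the table for one individual.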
Figure 3. Bacterial contamination and the detection of false single nucleotide variants (SNVs). (A) Relationship between bacterial DNA concentration and the number of novel coding SNVs detected in each sample. For further details, see figure 2. (B) Integrative Genomics Viewer read pile-up showing a false SNV in an exon of PTCHD1 detected in the non-enriched saliva sample from individual PGPC-0050, but not in the enriched saliva sample or blood sample from the same individual. The false SNV was detected because many short segments of bacterial reads containing a sequence difference relative to the human reference genome aligned to this region. A BLAST search suggested that the aligned bacterial reads were derived from the genome of Fusobacterium periodonticum (99% query cover, 97% identity), a bacterium known to be found in the human oral cavity [45].
Summary of the impact of DNA source and eukaryotic DNA enrichment on the accuracy of genetic variant detection from whole-genome sequencing data.
| Variant type | Sensitivity | False positive rate |
| Blood versus non-enriched saliva or buccal | ||
| SNVs | Little or no difference | Blood |
| Indels | Little or no difference | Little or no difference |
| CNVs (deletions) | Blood | Little or no difference |
| CNVs (duplications) | Blood | Blood |
| Enriched versus non-enriched saliva or buccal | ||
| SNVs | Little or no difference | Enriched |
| Indels | Little or no difference | Little or no difference |
| CNVs (deletions) | Non-enriched | Little or no difference |
| CNVs (duplications) | Non-enriched | Non-enriched |
For each comparison, the better sample type (ie, the one having higher sensitivity or a lower false positive rate) is indicated. Blood and enriched saliva and buccal samples tended to have lower false positive rates for SNVs than non-enriched saliva and buccal samples, but the differences were small in magnitude, except when considering rare SNVs (see text), and varied across individuals.
CNV, copy number variant; SNV, single nucleotide variant.
Concordance between blood samples and non-enriched saliva or buccal samples and between enriched saliva or buccal samples and the corresponding non-enriched samples, for common CNVs (those with >1% frequency in MSSNG parents [32]).
| Sample type 1 | Sample type 2 | Concordant | Unique to sample type 1 | Unique to sample type 2 | Concordant | Unique to sample type 1 | Unique to sample type 2 | Concordant | Unique to sample type 1 | Unique to sample type 2 |
| Deletions | ||||||||||
| HuRef blood 1 | HuRef blood 2 | 127 | 41 (3/3) | 35 (2/3) | 66 | 7 (3/3) | 2 (2/2) | 31 | 0 (0/0) | 1 (1/1) |
| Blood | Non-enriched saliva | 463 | 244 (32/32) | 117 (23/23) | 222 | 64 (29/29) | 14 (13/13) | 147 | 15 (12/12) | 4 (3/4) |
| Blood | Non-enriched buccal | 460 | 247 (35/35) | 107 (19/19) | 248 | 38 (23/23) | 13 (10/10) | 149 | 13 (11/11) | 9 (7/9) |
| Enriched saliva | Non-enriched saliva | 359 | 100 (18/18) | 220 (22/23) | 190 | 37 (21/21) | 47 (19/19) | 123 | 7 (7/7) | 28 (8/9) |
| Enriched buccal | Non-enriched buccal | 209 | 18 (14/14) | 360 (36/37) | 104 | 7 (6/7) | 154 (34/34) | 71 | 12 (4/9) | 88 (23/24) |
| Duplications | ||||||||||
| HuRef blood 1 | HuRef blood 2 | 28 | 2 (1/2) | 10 (1/3) | 17 | 3 (2/3) | 2 (1/2) | 32 | 2 (0/2) | 3 (0/3) |
| Blood | Non-enriched saliva | 107 | 34 (10/21) | 17 (6/15) | 49 | 13 (6/13) | 3 (1/3) | 150 | 9 (1/8) | 14 (0/12) |
| Blood | Non-enriched buccal | 105 | 36 (7/21) | 31 (5/22) | 48 | 14 (7/14) | 10 (2/10) | 146 | 13 (0/10) | 11 (1/10) |
| Enriched saliva | Non-enriched saliva | 85 | 12 (5/10) | 39 (7/19) | 33 | 6 (3/6) | 18 (7/12) | 123 | 27 (1/17) | 42 (4/22) |
| Enriched buccal | Non-enriched buccal | 49 | 3 (0/3) | 84 (12/36) | 22 | 1 (1/1) | 40 (12/27) | 110 | 33 (0/22) | 46 (5/19) |
The ‘concordant’ columns contain the number of CNVs detected in both sample type 1 and sample type 2. The ‘unique to sample type 1’ columns contain the total number of CNVs detected in sample type 1 but not sample type 2, followed by an expression of the form (x/y), where x is the number of CNVs verified as correct by visual inspection and y is the total number inspected (and analogously for the ‘unique to sample type 2’ columns). For example, 209 common deletions between 1 and 5 kb were detected in both the enriched buccal sample and the non-enriched buccal sample in the same individual, while 18 were detected only in the enriched buccal sample and 360 were detected only in the non-enriched buccal sample. Of the 37 deletions detected only in non-enriched buccal samples that were checked by visual inspection, 36 were classified as true. HuRef blood 1 and HuRef blood 2 refer to replicates sequenced from the same blood-derived DNA sample and represent a baseline level of concordance. All other counts were aggregated across the four study participants. For individual-specific data, see online supplementary file 3.
CNV, copy number variant.
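Cells in the ‘unique to’ columns, such as `360 (36/37)`, pack three numbers into one string: the total count, the number verified by visual inspection and the number inspected. The following Python sketch shows one way to unpack them; the names `parse_unique_cell` and `verified_fraction` are hypothetical helpers introduced for illustration, and returning `None` when nothing was inspected is a design choice of this sketch, not something specified in the source.

```python
import re

# Matches cells of the form "<total> (<verified>/<inspected>)", e.g. "360 (36/37)".
CELL_PATTERN = re.compile(r"^\s*(\d+)\s*\((\d+)/(\d+)\)\s*$")


def parse_unique_cell(cell):
    """Return (total, verified, inspected) parsed from one table cell."""
    match = CELL_PATTERN.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    total, verified, inspected = (int(group) for group in match.groups())
    return total, verified, inspected


def verified_fraction(cell):
    """Return the fraction of inspected CNVs that were verified.

    Returns None when no CNVs were inspected (a cell such as "0 (0/0)"),
    because the fraction is undefined in that case.
    """
    _total, verified, inspected = parse_unique_cell(cell)
    if inspected == 0:
        return None
    return verified / inspected


# Two cells copied from the deletions rows of the table above.
print(parse_unique_cell("360 (36/37)"))
print(verified_fraction("360 (36/37)"))
print(verified_fraction("0 (0/0)"))
```

Because only a subset of the unique calls was inspected in most cells, the verified fraction describes the inspected subset and not necessarily every call in the cell.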