Ólavur Mortensen¹, Leivur Nattestad Lydersen¹, Katrin Didriksen Apol¹, Guðrið Andorsdóttir¹, Bjarni Á Steig², Noomi Oddmarsdóttir Gregersen³.
Abstract
Long-term collection of dried blood spot (DBS) samples through newborn screening may have retrospective and prospective advantages, especially in combination with advanced analytical techniques. This work examines whether linked reads may overcome some of the limitations of short-read sequencing of DBS samples, for example by enabling molecular phasing. We performed whole-exome sequencing of DNA extracted from DBS and from corresponding whole-blood (WB) reference samples, belonging to a trio with unaffected parents and a proband affected by primary carnitine deficiency (PCD). For the DBS samples, we were able to phase >21% of the genes under 100 kb and >40% of the SNPs, and the longest phase block was >72 kb. The corresponding results for the WB reference samples were >85%, >75%, and >915 kb, respectively. For the PCD-causing variant (rs72552725:A > G) in the SLC22A5 gene, we observed full genotype concordance between DBS and WB for all three samples. Furthermore, we were able to phase all variants within the SLC22A5 gene in the proband's WB data, which shows that linked-read sequencing may replace trio information for haplotype detection. However, because DNA molecules were shorter in the DBS data, only small phase blocks were observed in the proband's DBS sample. Therefore, further optimisation of the DBS workflow is needed to explore the full potential of DBS samples as a test bed for molecular phasing.
Year: 2019 PMID: 30765883 PMCID: PMC6777531 DOI: 10.1038/s41431-019-0343-3
Source DB: PubMed Journal: Eur J Hum Genet ISSN: 1018-4813 Impact factor: 4.246
Results from linked-read whole-exome sequencing of a trio using dried blood spots and whole-blood samples
| | WB Father | WB Mother | WB Proband | WB Average | DBS Father | DBS Mother | DBS Proband | DBS Average |
|---|---|---|---|---|---|---|---|---|
| Median molecule length (bp) | 45,239 | 47,415 | 55,129 | 49,261 | 11,155 | 2,425 | 15,643 | 9,741 |
| % DNA in molecules >20 kb | 80 | 82 | 84 | 82 | 14 | 5 | 12 | 11 |
| % DNA in molecules >100 kb | 7 | 13 | 13 | 11 | 0.42 | 1.29 | 0.75 | 0.82 |
| DNA loaded (ng) | 1.14 | 1.58 | 1.17 | 1.30 | 0.20 | 0.29 | 0.30 | 0.26 |
| GEM Performance | | | | | | | | |
| GEMs detected | 1,472,690 | 1,627,223 | 1,450,403 | 1,516,772 | 1,236,375 | 1,235,548 | 1,124,027 | 1,198,650 |
| Mean DNA per GEM (bp) | 488,116 | 673,692 | 501,609 | 554,472 | 86,500 | 122,798 | 128,697 | 112,665 |
| | | | | | | | | |
| Number of reads | 102,043,815 | 146,325,136 | 87,244,491 | 111,871,147 | 134,915,555 | 111,553,510 | 110,707,511 | 119,058,859 |
| Fold coverage | 66 | 94 | 56 | 72 | 77 | 67 | 61 | 68 |
| Zero coverage | 0.19 | 0.37 | 0.62 | 0.39 | 0.85 | 1.23 | 1.24 | 1.11 |
| Median insert size (bp) | 197 | 204 | 183 | 195 | 166 | 172 | 184 | 174 |
| % bases on target | 60 | 58 | 59 | 59 | 60 | 60 | 55 | 58 |
| % fragments on target | 60 | 59 | 59 | 59 | 60 | 60 | 56 | 59 |
| % mapped reads | 98 | 98 | 98 | 98 | 94 | 96 | 96 | 95 |
| Avg. mapping quality | 59 | 56 | 56 | 57 | 52 | 53 | 54 | 53 |
| % duplication | 17 | 24 | 28 | 23 | 40 | 43 | 45 | 43 |
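The "Average" columns in the table above appear to be plain arithmetic means over the three trio members. The short Python sketch below reproduces several of them from the per-sample values. The `trio_average` helper and the dictionary layout are illustrative only. One exception is the DBS average for "% DNA in molecules >20 kb" (11): the mean of the displayed values is about 10.3, which may reflect averaging of unrounded per-sample values.

```python
from statistics import mean

# Per-sample values (father, mother, proband) copied from the table above.
WB = {
    "Median molecule length (bp)": [45_239, 47_415, 55_129],
    "Number of reads": [102_043_815, 146_325_136, 87_244_491],
    "Fold coverage": [66, 94, 56],
    "% duplication": [17, 24, 28],
}
DBS = {
    "Median molecule length (bp)": [11_155, 2_425, 15_643],
    "Number of reads": [134_915_555, 111_553_510, 110_707_511],
    "Fold coverage": [77, 67, 61],
    "% duplication": [40, 43, 45],
}


def trio_average(values, ndigits=None):
    """Arithmetic mean over the three trio members, rounded for display.

    With ndigits=None, round() returns an int, matching the integer columns.
    """
    return round(mean(values), ndigits)


for metric in WB:
    print(f"{metric}: WB {trio_average(WB[metric])}, DBS {trio_average(DBS[metric])}")
```

For fractional rows such as "DNA loaded (ng)", pass `ndigits=2`. For example, `trio_average([0.20, 0.29, 0.30], 2)` gives the 0.26 shown for DBS.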
Number of variants called from whole-blood (WB) and dried blood spot (DBS) samples from the trio
| | WB Father | WB Mother | WB Proband | WB Total | DBS Father | DBS Mother | DBS Proband | DBS Total | Overlap |
|---|---|---|---|---|---|---|---|---|---|
| Variants called | 592,661 | 747,825 | 591,139 | 1,538,193 | 898,132 | 726,946 | 862,282 | 2,135,974 | 430,865 |
| Filtered variants* | 161,705 | 171,137 | 162,323 | 219,586 | 163,433 | 160,447 | 162,179 | 245,736 | 209,931 |
| Variants by impact | | | | | | | | | |
| High | 842 | 928 | 896 | 1,219 | 1,361 | 1,415 | 1,386 | 2,804 | 1,146 |
| Moderate | 11,254 | 11,683 | 11,328 | 15,583 | 12,897 | 12,853 | 12,368 | 21,149 | 14,855 |
| Low | 14,988 | 15,616 | 15,186 | 20,413 | 15,212 | 15,175 | 14,883 | 22,731 | 19,627 |
| Modifier | 134,621 | 142,910 | 134,913 | 182,371 | 133,963 | 131,004 | 133,542 | 199,052 | 174,303 |
| | | | | | | | | | |
| SNPs phased (%) | 81 | 84 | 75 | | 58 | 40 | 49 | | |
| Genes phased (<100 kb) (%) | 92 | 94 | 85 | | 42 | 21 | 33 | | |
| Longest phase block (kb) | 938 | 2,071 | 1,058 | | 149 | 116 | 100 | | |
| N50 phase block (kb) | 124 | 176 | 113 | | 13 | 4 | 11 | | |
The total number of variants corresponds to variants seen in at least one individual in the trio. The overlap is the number of variants present in both the total WB and the total DBS data sets. The phasing summary is based on filtered variants only.
*Filters: read depth >20, variant call quality >20, and multi-allelic sites removed
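The counting rules in the notes above can be expressed as set operations. Calls are first filtered. "Total" is then the union of variants over the three trio members, and "Overlap" is the intersection of the WB and DBS totals. This is a minimal Python sketch under stated assumptions. The `Call` record, its field names, and the variant key (chromosome, position, REF, ALT) are hypothetical. "Variant call quality" is assumed to mean the VCF QUAL value and "read depth" the per-sample depth. A real pipeline would read these from VCF files with a VCF parser.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt): hypothetical key


@dataclass(frozen=True)
class Call:
    """Hypothetical minimal variant call (a real pipeline would parse a VCF)."""
    chrom: str
    pos: int
    ref: str
    alts: Tuple[str, ...]
    qual: float  # assumed to be the VCF QUAL ("variant call quality")
    depth: int   # assumed to be the per-sample read depth


def passes_filters(call: Call, min_depth: int = 20, min_qual: float = 20) -> bool:
    # Table footnote: read depth >20, variant call quality >20, multi-allelic sites removed.
    return len(call.alts) == 1 and call.depth > min_depth and call.qual > min_qual


def filtered_keys(calls: Iterable[Call]) -> Set[VariantKey]:
    # Identify each passing variant by (chrom, pos, ref, alt).
    return {(c.chrom, c.pos, c.ref, c.alts[0]) for c in calls if passes_filters(c)}


def trio_total(per_sample: List[List[Call]]) -> Set[VariantKey]:
    # "Total": variants seen in at least one trio member, i.e. the set union.
    total: Set[VariantKey] = set()
    for calls in per_sample:
        total |= filtered_keys(calls)
    return total


def overlap(wb: List[List[Call]], dbs: List[List[Call]]) -> Set[VariantKey]:
    # "Overlap": variants present in both the WB and the DBS totals (set intersection).
    return trio_total(wb) & trio_total(dbs)


# Toy calls, invented purely for illustration.
snv = Call("chr5", 1000, "A", ("G",), qual=60.0, depth=35)
shallow = Call("chr5", 2000, "C", ("T",), qual=60.0, depth=12)    # fails the depth filter
multi = Call("chr5", 3000, "G", ("A", "T"), qual=80.0, depth=40)  # multi-allelic, removed
dbs_only = Call("chr5", 4000, "T", ("C",), qual=45.0, depth=25)

wb_trio = [[snv, shallow], [multi], [snv]]  # father, mother, proband
dbs_trio = [[snv], [dbs_only], []]
print(len(trio_total(wb_trio)), len(trio_total(dbs_trio)), len(overlap(wb_trio, dbs_trio)))
```

Because a set is used for the union, a variant carried by several trio members is counted once in "Total". This is consistent with the totals in the table being smaller than the sum of the per-sample counts.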
Fig. 1 Genotype concordance rate between whole-blood (WB) and corresponding dried blood spot (DBS) samples, shown individually (a) and averaged over all three samples for SNPs, insertions, and deletions (b). Filtered variants are those with genotype quality >20 and read depth >20, with multi-allelic variants removed.
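The caption does not spell out how the concordance rate is computed. One plausible reading, sketched below in Python, is the fraction of sites called in both the WB and DBS sample whose genotypes agree, split by variant type. Several details are our assumptions rather than the authors' definition:

- Genotypes are VCF-style GT strings.
- Phased and unphased genotypes with the same alleles ("0|1", "1|0", "0/1") count as equal.
- A no-call ("./.") on one side counts as discordant.
- The helper names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Tuple

VariantKey = Tuple[str, int, str, str]  # (chrom, pos, ref, alt): hypothetical key


def normalise(gt: str) -> Tuple[str, ...]:
    # Treat "0|1", "1|0" and "0/1" as the same unordered genotype.
    return tuple(sorted(gt.replace("|", "/").split("/")))


def variant_type(ref: str, alt: str) -> str:
    # Classify by allele lengths: longer ALT = insertion, shorter ALT = deletion.
    if len(alt) > len(ref):
        return "insertion"
    if len(alt) < len(ref):
        return "deletion"
    return "SNP" if len(ref) == 1 else "other"


def concordance_by_type(wb: Dict[VariantKey, str],
                        dbs: Dict[VariantKey, str]) -> Dict[str, float]:
    """Fraction of sites present in both samples whose genotypes agree, per variant type."""
    agree: Dict[str, int] = defaultdict(int)
    shared: Dict[str, int] = defaultdict(int)
    for key in wb.keys() & dbs.keys():
        vtype = variant_type(key[2], key[3])
        shared[vtype] += 1
        agree[vtype] += int(normalise(wb[key]) == normalise(dbs[key]))
    return {vtype: agree[vtype] / shared[vtype] for vtype in shared}


# Toy genotypes, invented purely for illustration.
wb_gt = {("chr5", 100, "A", "G"): "0/1",
         ("chr5", 200, "C", "CT"): "1/1",
         ("chr5", 300, "AT", "A"): "0/1"}
dbs_gt = {("chr5", 100, "A", "G"): "1|0",
          ("chr5", 200, "C", "CT"): "0/1",
          ("chr5", 300, "AT", "A"): "0/1"}
print(concordance_by_type(wb_gt, dbs_gt))
```

Under this sketch, panel (b) would correspond to averaging the per-sample rates of the three trio members for each variant type.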
Fig. 2 The plot shows coverage and variants in the SLC22A5 gene using data from the WB and DBS trio samples. The row with green bands corresponds to the exons of transcript uc003kww.4 (UCSC identifier) of the SLC22A5 gene; the intronic regions between exons are omitted for clarity. The two rows with blue bands show coverage in the WB and DBS data sets; only regions where all three samples have coverage >20x are shown. Variants within the SLC22A5 gene (variant call quality >20 and read depth >20x) identified in the trio (F = father, M = mother, P = proband) are shown in the inner rows (WB and DBS). Alleles are colour-coded as follows: reference allele = blue, alternate alleles = red or green (in random order), and no calls = black. Phase blocks are shown in the inner rows as grey boxes; phase blocks spanning multiple exons are connected with dotted lines. Heterozygous variants that are not phased are drawn closer together in the middle of the band. H1 = haplotype 1, and H2 = haplotype 2. The plot was generated using Circos [20].
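The phase blocks drawn as grey boxes in Fig. 2 are, genome-wide, what the "Longest phase block" and "N50 phase block" rows of the variant table summarise. The Python sketch below computes both statistics from a list of phase-block intervals. It assumes that block length is simply end minus start (0-based, half-open coordinates) and uses the usual N50 definition: the block length at which blocks sorted from longest to shortest first cover at least half of the total phased span. The block coordinates are invented, and the software used in the study may compute these values differently.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # (start, end) of a phase block, hypothetical 0-based half-open


def block_lengths(blocks: List[Block]) -> List[int]:
    # Length of each phase block, taken here as end - start.
    return [end - start for start, end in blocks]


def n50(lengths: List[int]) -> int:
    """Length L such that blocks of length >= L cover at least half the total span."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no phase blocks")


def summarise(blocks: List[Block]) -> Tuple[int, int]:
    # Returns (longest block, N50) in bp; the table reports these in kb.
    if not blocks:
        raise ValueError("no phase blocks")
    lengths = block_lengths(blocks)
    return max(lengths), n50(lengths)


# Toy phase blocks, invented purely for illustration.
toy_blocks = [(0, 50_000), (60_000, 100_000), (120_000, 150_000),
              (160_000, 180_000), (190_000, 200_000)]
print(summarise(toy_blocks))
```

N50 is less sensitive than the longest block to a single exceptional block. This may be why both are reported: the WB mother, for instance, has a 2,071 kb longest block but a 176 kb N50.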