Julie F Foley1, Brian Elgart2, B Alex Merrick1, Dhiral P Phadke2, Molly E Cook3, Jason A Malphurs3, Gregory G Solomon3, Ruchir R Shah2, Michael B Fessler3, Frederick W Miller3, Kevin E Gerrish3.
Abstract
Cell-free DNA circulates in plasma at low levels as a normal by-product of cellular apoptosis. Multiple clinical pathologies, as well as environmental stressors, can lead to increased circulating cell-free DNA (ccfDNA) levels. Plasma DNA studies frequently employ targeted amplicon deep sequencing platforms because ccfDNA is present in blood only at limited concentrations (ng/ml). Here, we report whole genome sequencing (WGS) and read distribution across chromosomes of ccfDNA extracted from two human plasma samples from normal, healthy subjects, representative of limited clinical samples at <1 ml. Amplification was sufficiently robust, with ~90% of the reference genome (GRCh38.p2) exhibiting 10X coverage. Chromosome read coverage was uniform and directly proportional to the number of reads for each chromosome across both samples. Almost 99% of the identified genomic sequence variants were known annotated dbSNP variants in the hg38 reference genome. A high prevalence of C>T and T>C mutations was present, along with strong concordance of variants shared with the germline genome databases gnomAD (81.1%) and the 1000 Genomes Project (93.6%). This study demonstrates isolation and amplification procedures for low-input ccfDNA samples that can detect sequence variants across the whole genome and that can translate to multiple clinical research disciplines.
Keywords: circulating cell-free DNA; genomic sequencing; plasma DNA; variants
Year: 2021 PMID: 34350716 PMCID: PMC8339531 DOI: 10.14814/phy2.14993
Source DB: PubMed Journal: Physiol Rep ISSN: 2051-817X
Plasma ccfDNA sample summary
| Sample | Pre-amplification concentration (ng/µL) | ccfDNA input | PCR cycles | Post-amplification clean-up concentration (ng/µL) | Library size | Fold enrichment |
|---|---|---|---|---|---|---|
| 1 | 0.1 | 1 ng | 14 | 13 | 340 bp | 650 |
| 2 | 0.1 | 1 ng | 14 | 11 | 340 bp | 550 |
Summary performance statistics for WGS from amplified ccfDNA
| Sample ID | Read length | Total reads (Million) | Aligned reads (Million) | Alignment rate | Duplicate reads |
|---|---|---|---|---|---|
| 1 | PE-150 | 857.9 | 456.7 | 53% | 45% |
| 2 | PE-150 | 955.4 | 699.3 | 73% | 25% |
PE-150: Paired-end, 150 base pairs
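The alignment rates in the table can be checked from the read counts. The following is a minimal sketch, assuming the "Aligned reads" column is reported in millions like the "Total reads" column; the function name `alignment_rate` is hypothetical and not part of the study's pipeline.

```python
def alignment_rate(total_reads_m: float, aligned_reads_m: float) -> float:
    """Return aligned reads as a percentage of total reads (both in millions)."""
    if total_reads_m <= 0:
        raise ValueError("total reads must be positive")
    return 100.0 * aligned_reads_m / total_reads_m


# Values copied from the table above: (total reads, aligned reads), in millions.
# Treating the aligned-read column as millions is an assumption.
samples = {"1": (857.9, 456.7), "2": (955.4, 699.3)}

rates = {sid: alignment_rate(total, aligned) for sid, (total, aligned) in samples.items()}
for sid, rate in rates.items():
    print(f"Sample {sid}: {rate:.0f}% aligned")
```

Under that assumption, the rounded results agree with the 53% and 73% alignment rates listed for samples 1 and 2.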
FIGURE 1. Coverage on the hg38 reference genome. The percentage of bases covered in the human GRCh38.p2 reference genome is shown at 1X, 10X, 20X, 30X, 50X, and 100X depth of coverage. Testing at 10X and 20X demonstrated sufficient coverage of the reference genome for accurate future downstream variant calls.
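The quantity plotted in Figure 1, the percentage of bases covered at or above a given depth, can be illustrated with a small sketch. The per-base depths below are toy values, not data from the study, and `breadth_of_coverage` is a hypothetical helper; in practice the depths would come from a tool that reports per-position depth for an aligned BAM file.

```python
def breadth_of_coverage(depths, thresholds=(1, 10, 20, 30, 50, 100)):
    """Return {threshold: percent of positions with depth >= threshold}."""
    n = len(depths)
    if n == 0:
        raise ValueError("no positions supplied")
    return {t: 100.0 * sum(d >= t for d in depths) / n for t in thresholds}


# Toy per-base depths for ten positions (illustrative only).
toy_depths = [0, 5, 12, 25, 31, 60, 120, 15, 9, 40]

breadth = breadth_of_coverage(toy_depths)
for threshold, pct in breadth.items():
    print(f"{threshold}X: {pct:.1f}% of bases")
```

The thresholds mirror the depths named in the caption; the same calculation restricted to exon intervals would give the exon-level view.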
FIGURE 2. Parallel sample coverage of RPM normalized signal for the housekeeping genes, ALB and B2M.
FIGURE 3. Distribution of coverage across each chromosome to assess evidence of sequence bias. The percentage of aligned reads was normalized by chromosome length. For each sample, there is uniform coverage of each chromosome across the genome. Variant calls were performed using the GATK Pipeline (Broad).
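One way to normalize per-chromosome read percentages by chromosome length is to divide each chromosome's share of reads by its share of the total length. The exact formula used for Figure 3 is not given, so the sketch below is an assumption; the read counts are toy values, the lengths are rounded to whole megabases, and `length_normalized_coverage` is a hypothetical name.

```python
def length_normalized_coverage(read_counts, chrom_lengths):
    """Return {chrom: read share / length share}; values near 1.0 suggest uniform coverage."""
    total_reads = sum(read_counts.values())
    total_length = sum(chrom_lengths[c] for c in read_counts)
    if total_reads == 0 or total_length == 0:
        raise ValueError("read counts and lengths must be non-zero")
    return {
        c: (n / total_reads) / (chrom_lengths[c] / total_length)
        for c, n in read_counts.items()
    }


# Chromosome lengths rounded to megabases (illustrative subset of the genome).
lengths_mb = {"chr1": 249, "chr2": 242, "chr21": 47}
# Toy read counts chosen to be exactly proportional to length.
toy_counts = {"chr1": 498, "chr2": 484, "chr21": 94}

ratios = length_normalized_coverage(toy_counts, lengths_mb)
for chrom, ratio in ratios.items():
    print(f"{chrom}: {ratio:.2f}")
```

A chromosome with a ratio well above or below 1.0 would point to over- or under-representation relative to its size.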
FIGURE 4. Coverage in protein coding exon bases from Agilent SureSelect® Human All Exon V7 probe design. The percentage of protein coding exon bases is shown at 1X, 10X, 20X, 30X, 50X, and 100X depth of coverage. Depth of coverage at 20X demonstrated sufficient coverage of the exonic bases for accurate future downstream variant calls.
FIGURE 5. Mutational spectrum of the final variant set after excluding the low impact variants. Approximately two-thirds of the variants were C>T and T>C mutations.
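A mutational spectrum like the one in Figure 5 is often tallied by collapsing each single-nucleotide substitution onto its pyrimidine reference base, so that G>A is counted with C>T and A>G with T>C. Whether the figure uses this convention is an assumption; the sketch below uses toy variants and hypothetical helper names.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def substitution_class(ref: str, alt: str) -> str:
    """Collapse a single-nucleotide substitution to its pyrimidine-reference class."""
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
    if ref in ("A", "G"):  # purine reference: report the complementary strand
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def spectrum(snvs):
    """Return {class: fraction of variants} for an iterable of (ref, alt) pairs."""
    counts = Counter(substitution_class(r, a) for r, a in snvs)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}


# Toy variants (illustrative only, not calls from the study).
toy_snvs = [("C", "T"), ("G", "A"), ("A", "G"), ("C", "A")]
fractions = spectrum(toy_snvs)
print(fractions)
```

With real calls, the (ref, alt) pairs would be read from the filtered VCF, keeping only biallelic single-nucleotide variants.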