Baohong Liu, Xiaoyan Tang, Feng Qiu, Chunmei Tao, Junhui Gao, Mengmeng Ma, Tingyan Zhong, JianPing Cai, Yixue Li, Guohui Ding.
Abstract
Background. With the development of massively parallel sequencing (MPS), noninvasive prenatal diagnosis using maternal cell-free DNA is fast becoming the preferred method for detecting fetal chromosomal abnormalities, owing to its high accuracy and low risk. Typically, MPS data are parsed to calculate a risk score, which is used to predict whether a fetal chromosome is normal or not. Although there are several highly sensitive and specific MPS data-parsing algorithms, there are currently no tools that implement these methods.

Results. We developed an R package, detection of autosomal abnormalities for fetus (DASAF), that implements the three most popular trisomy detection methods, namely the standard Z-score (STDZ) method, the GC correction Z-score (GCCZ) method, and the internal reference Z-score (IRZ) method, together with one subchromosome abnormality identification method (SCAZ).

Conclusions. With the cost of DNA sequencing declining and with advances in personalized medicine, the demand for noninvasive prenatal testing will undoubtedly increase, which will in turn increase the number of tools available for subsequent analysis. DASAF is a user-friendly tool, implemented in R, that supports identification of whole-chromosome as well as subchromosome abnormalities from maternal cell-free DNA sequencing data after genome mapping.
Year: 2016 PMID: 27437397 PMCID: PMC4942598 DOI: 10.1155/2016/2714341
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Description of reference datasets.
| Dataset name | Description |
|---|---|
| NCR_ref | Ratios of uniquely mapped reads for every chromosome, and the total number of reads uniquely mapped to the genome, for all 120 samples |
| GC_ref | GC content of every chromosome for all 120 samples |
| Tag_pos | Genomic bin positions for bin widths of 1 Mb and 100 kb |
| Bin_GC | GC content calculated for genomic bins for all samples |
| Nearest_bin_ref | For every bin, the names of the bins with the nearest GC content: 10 bins for the 1 Mb data and 40 bins for the 100 kb data |
| BRV_ref | Ratios of reads within a bin to the total number of reads in bins with the nearest GC percentages |
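
The relationship between the Bin_GC, Nearest_bin_ref, and BRV_ref datasets can be illustrated with a small sketch. This is not the DASAF code or its R API. It is a minimal Python illustration that assumes each bin is matched to the k other bins with the closest GC content and that the bin's read count is then divided by the total reads in those matched bins. The function names, bin names, and toy values are all hypothetical.

```python
def nearest_gc_bins(gc_by_bin, k):
    """For each bin, return the names of the k other bins with the closest GC content."""
    nearest = {}
    for name, gc in gc_by_bin.items():
        others = [b for b in gc_by_bin if b != name]
        # Sort by absolute GC difference; ties are broken by bin name for determinism.
        others.sort(key=lambda b: (abs(gc_by_bin[b] - gc), b))
        nearest[name] = others[:k]
    return nearest


def bin_read_ratios(reads_by_bin, nearest):
    """Ratio of each bin's read count to the total reads in its nearest-GC bins."""
    ratios = {}
    for name, neighbours in nearest.items():
        total = sum(reads_by_bin[b] for b in neighbours)
        ratios[name] = reads_by_bin[name] / total if total else float("nan")
    return ratios


# Toy data (hypothetical): GC fraction and read count per bin.
gc = {"bin1": 0.38, "bin2": 0.41, "bin3": 0.39, "bin4": 0.55, "bin5": 0.42}
reads = {"bin1": 100, "bin2": 120, "bin3": 110, "bin4": 90, "bin5": 130}

# k=2 keeps the toy example small; the table lists 10 (1 Mb) and 40 (100 kb) bins.
nearest = nearest_gc_bins(gc, k=2)
ratios = bin_read_ratios(reads, nearest)
```

In this toy example, `nearest["bin1"]` lists `bin3` and `bin2`, and `ratios["bin1"]` is 100 divided by the 230 reads in those two bins.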
Figure 1. Workflow of the R package DASAF. The DASAF workflow consists of two main parts: (a) mapped-read statistics and (b) autosomal aneuploidy prediction. The results from (a) are used to calculate the risk score using any of the four methods implemented in (b). STDZ: standard Z-score; GCCZ: GC correction Z-score; IRZ: internal reference Z-score; and SCAZ: subchromosome abnormalities Z-score.
Figure 2. Z-score distribution for maternal cell-free DNA samples at varying sequencing depths. The black dots represent samples from the 100 bp paired-end run at a depth of 12 M reads. The black squares, diamonds, and triangles represent samples from the original 100 bp paired-end run that have been randomly subsampled to 7 M, 5 M, and 3 M reads, respectively. The black line indicates the Z-score threshold of 3.
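
To make the Z-score threshold in Figure 2 concrete, the following sketch shows how a standard Z-score is commonly computed from a chromosome's read ratio and a set of reference ratios. It is written in Python for illustration and is not the DASAF implementation. The reference values are made-up toy numbers, and the use of the sample standard deviation is an assumption. Only the threshold of 3 comes from the figure caption.

```python
from statistics import mean, stdev

Z_THRESHOLD = 3.0  # threshold shown as the black line in Figure 2

# Hypothetical chromosome 21 read ratios from euploid reference samples (toy values).
reference_chr21 = [0.0130, 0.0131, 0.0129, 0.0130, 0.0132, 0.0128]


def chromosome_z_score(test_ratio, reference_ratios):
    """Standard Z-score: distance of the test ratio from the reference mean, in SD units."""
    mu = mean(reference_ratios)
    sigma = stdev(reference_ratios)  # sample standard deviation (assumed choice)
    return (test_ratio - mu) / sigma


def classify(test_ratio, reference_ratios=reference_chr21):
    """Return the Z-score and whether it exceeds the threshold."""
    z = chromosome_z_score(test_ratio, reference_ratios)
    return z, z > Z_THRESHOLD


z_high, flagged_high = classify(0.0136)  # elevated ratio in a toy test sample
z_norm, flagged_norm = classify(0.0131)  # ratio within the reference spread
```

With these toy numbers the first sample lies more than three standard deviations above the reference mean and is flagged, while the second is not.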
Execution time (in seconds) to detect chromosomal abnormalities using different methods.
| Method | 12 M reads | 20 M reads | 40 M reads |
|---|---|---|---|
| Standard | 1 | 1 | 2 |
| GC Correction | 312 | 629 | 1,156 |
| Internal reference | 1 | 1 | 1 |
| Subchromosome abnormality | 2,074 | 2,105 | 2,278 |
The computing platform was a Linux system with 16 threads (0.8 GHz each) and 64 GB of RAM. Execution times were averaged over five repeated runs.