Juraj Gazdarica (1,2,3), Rastislav Hekel (4,5,6), Jaroslav Budis (4,6,7), Marcel Kucharik (4), Frantisek Duris (8), Jan Radvanszky (4,9), Jan Turna (5,7,9), Tomas Szemes (4,5,7).
Abstract
The reliability of non-invasive prenatal testing (NIPT) is highly dependent on accurate estimation of fetal fraction. Several methods have been proposed to date, utilizing different attributes of the analyzed genomic material, for example the length and genomic location of sequenced DNA fragments. These two sources of information are largely independent, but so far there have been no published attempts to combine them into an improved predictor. We collected 2454 samples from women undergoing NIPT who were carrying a single euploid male fetus. Fetal fractions were calculated using several proposed predictors and the state-of-the-art SeqFF method. Predictions were compared with the reference Y-based method. We demonstrate that prediction based on the length of sequenced DNA fragments may achieve nearly the same precision as the state-of-the-art methods based on their genomic locations. We also show that combining several sample attributes leads to a predictor with superior prediction accuracy over any single approach. Finally, appropriate weighting of samples in the training process may achieve higher accuracy for samples with low fetal fraction and thus make subsequent testing for genomic aberrations more reliable. We propose several improvements in fetal fraction estimation with a special focus on the samples most prone to incorrect conclusions.
Keywords: DNA; NIPT; fetal cells; fetal fraction; maternal serum screening; statistical methods
Year: 2019 PMID: 31416246 PMCID: PMC6719007 DOI: 10.3390/ijms20163959
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Average fragment-length profiles of samples grouped into selected fetal fraction ranges, calculated across fragment lengths. Samples with higher fetal fraction (FF) also have shorter fragments, indicating that maternal fragments are longer than fetal ones.
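A fragment-length profile like those averaged in Figure 1 can be expressed as the relative frequency of each fragment length within one sample. The following is a minimal Python sketch, not the authors' pipeline: it assumes each sample is given as a list of fragment (insert) lengths in bp and that one profile entry is kept per integer length. The function names `fragment_length_profile` and `average_profile` are hypothetical, and the 50–220 bp window is borrowed from the Figure 2 caption.

```python
from collections import Counter

# Window borrowed from the Figure 2 caption; the study's exact binning is an assumption here.
MIN_LEN, MAX_LEN = 50, 220


def fragment_length_profile(lengths, min_len=MIN_LEN, max_len=MAX_LEN):
    """Relative frequency of each integer fragment length in [min_len, max_len].

    `lengths` holds the fragment lengths (bp) of one sample. Fragments outside
    the window are ignored, so the returned profile sums to 1.
    """
    counts = Counter(n for n in lengths if min_len <= n <= max_len)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no fragments inside the length window")
    return [counts.get(n, 0) / total for n in range(min_len, max_len + 1)]


def average_profile(profiles):
    """Position-wise mean of several per-sample profiles (one curve per FF range)."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]
```

Grouping samples by their Y-based FF range and calling `average_profile` on each group yields one curve per range; a shift of mass toward shorter lengths in the high-FF groups is the pattern Figure 1 describes.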
Figure 2. Boxplots of mean squared error (MSE) and Pearson correlation between Y-based FF and FF estimated by several methods, calculated for 100 testing sets; training was performed on the 100 complementary training sets. FRAC(PUB): ratio of fragment-length intervals proposed by Yu et al. (2014). FRAC: our best ratio of fragment-length intervals, selected by the highest Pearson correlation with Y-based FF. NN: neural network. NLRM: non-linear regression model based on FRAC parameters. LRM: linear model using all fragment-length parameters from 50–220 bp. SVM: support vector machine using all fragment-length parameters from 50–220 bp.
Figure 3. Boxplots of MSE and Pearson correlation between Y-based FF and FF estimated using the SVM method, calculated for 100 testing sets; training was performed on the 100 complementary training sets. The combined approach is denoted SeqFF + SVM. Every method improved when sample attributes (DNA library concentration, BMI, gestational age) were added, denoted “+ SA”.
Figure 4. Linear regression between the Y-based method and the combined method. Black circles denote individual testing samples, the dashed line represents the overall trend of the prediction, and the grey line is the 45° (identity) line.
The table shows the feature weights of four trained linear models. Each row represents a single feature and each column a specific model; each model uses a different combination of features. If a feature is not part of a model, the cell is marked with a dash. Since one hundred models were trained for each combination, only the model with the median correlation is displayed. SVM: prediction of the support vector machine estimator based on the fragment-length profile. SeqFF: prediction of the SeqFF model. BMI: body mass index of the mother. LC: DNA library concentration. GA: gestational age. SA: sample attributes (BMI + LC + GA).
| Feature | SeqFF + SA | SVM + SA | SVM + SeqFF | SeqFF + SVM + SA |
|---|---|---|---|---|
| SVM | – | 0.0418 | 0.0269 | 0.0243 |
| SeqFF | 1.1325 | – | 0.0237 | 0.0255 |
| BMI | −0.0006 | −0.0058 | – | −0.0019 |
| LC | −0.0020 | −0.0005 | – | −0.0015 |
| GA | ~0.0000 | 0.0037 | – | 0.0016 |
| Intercept | 0.0204 | 0.1222 | 0.1223 | 0.1227 |
Figure 5. Mean absolute error between Y-based FF and FF estimated using the SVM method (lower is better). During training, only samples with FF < 10% were oversampled (weighted) by factors of 2×, 3×, and 4×; samples with FF > 10% were included once. Scatterplots of the linear regression with weighted samples are presented in the supplement (Supplementary Figures S1–S3).
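The weighting in Figure 5 can be implemented by repeating low-FF samples in the training set before fitting. A minimal sketch, assuming weighting by repetition (as “sampled (weighted)” suggests) and assuming a sample with FF of exactly 10% counts as high-FF, a case the caption leaves open; `oversample_low_ff` and `mean_absolute_error` are hypothetical names, and the error metric is the mean absolute error shown in the figure.

```python
from statistics import fmean


def oversample_low_ff(samples, ff, weight, threshold=0.10):
    """Repeat each training sample with FF below `threshold` `weight` times
    (2, 3 or 4 in Figure 5); samples at or above the threshold appear once.

    Returns the expanded (samples, ff) lists in their original order.
    """
    out_x, out_y = [], []
    for x, f in zip(samples, ff):
        reps = weight if f < threshold else 1
        out_x.extend([x] * reps)
        out_y.extend([f] * reps)
    return out_x, out_y


def mean_absolute_error(y_true, y_pred):
    """Mean of |Y-based FF - estimated FF| over the testing samples."""
    return fmean(abs(t - p) for t, p in zip(y_true, y_pred))
```

Repeating a sample k times is equivalent to giving it weight k in any loss that is a sum over training samples, so estimators that accept per-sample weights can use those instead of physical copies.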