Martin Sauk1, Olga Žilina1, Ants Kurg1, Eva-Liina Ustav2,3, Maire Peters2,4, Priit Paluoja4, Anne Mari Roost4, Hindrek Teder4,5, Priit Palta6, Nathalie Brison7, Joris R Vermeesch7, Kaarel Krjutškov4,8,9, Andres Salumets10,11,12,13, Lauris Kaplinski14.
Abstract
Non-invasive prenatal testing (NIPT) is a recent and rapidly evolving method for detecting genetic lesions, such as aneuploidies, of a fetus. However, there is a need for faster and cheaper laboratory and analysis methods to make NIPT more widely accessible. We have developed a novel software package for detection of fetal aneuploidies from next-generation low-coverage whole genome sequencing data. Our tool, NIPTmer, is based on counting pre-defined per-chromosome sets of unique k-mers from raw sequencing data and applying a linear regression model to the counts. Additionally, the filtering process used for k-mer list creation allows one to take into account the genetic variance in a specific sample, thus reducing a source of uncertainty. The processing time of one sample is less than 10 CPU-minutes on a high-end workstation. NIPTmer was validated on a cohort of 583 NIPT samples and it correctly predicted 37 non-mosaic fetal aneuploidies. NIPTmer has the potential to significantly reduce the time and complexity of NIPT post-sequencing analysis compared to mapping-based methods. For non-commercial users the software package is freely available at http://bioinfo.ut.ee/NIPTMer/.
Year: 2018 PMID: 29618827 PMCID: PMC5884839 DOI: 10.1038/s41598-018-23589-8
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. The number of 25-mers in the final k-mer lists used in the NIPT analysis (black) and the maximum number of possible k-mers (white) for each chromosome. The final k-mer lists for autosomes ranged from 7,781,826 (chr 21) to 54,629,103 (chr 2) unique 25-mers. The lists for chromosomes X and Y were the smallest: 5,860,559 and 5,071,089 unique 25-mers, respectively.
Figure 2. NIPTmer results for 294 subjects of the Estonian cohort collected from Tartu University Hospital. Chromosome 21 (A) and chromosome 18 (B). Samples are ordered on the x-axis by the order of subject enrollment. The y-axis represents the standardized deviation (z-score) between the predicted and observed coverage of a given chromosome. Euploid samples are represented by gray points and aneuploid samples by red points. With the cut-off line at 3.5 SD, all five T21 samples, all four T18 samples, and one mosaic T18 case (z-score = 3.8) are recognized.
Figure 3. NIPTmer results for 289 subjects of the Belgian cohort. Chromosome 21 (A), chromosome 18 (B) and chromosome 13 (C). Samples are ordered on the x-axis by the order of subject enrollment. The y-axis represents the standardized deviation (z-score) between the predicted and observed coverage of a given chromosome. Euploid samples are represented by gray points and aneuploid samples by red points. All non-mosaic aneuploidies, including T21 (15/15), T18 (10/10), and T13 (3/3), had higher z-scores than the highest control score, although the difference between the lowest trisomy score and the highest normal score was small for trisomies 18 and 21. One mosaic T13 case was not detected by our pipeline.
Figure 4. Creation of per-chromosome k-mer lists consists of the following steps: creating lists of per-chromosome k-mers from the reference genome; removing non-unique k-mers; removing potentially polymorphic k-mers; removing k-mers from problematic regions of the genome, such as centromeres, telomeres, and pseudoautosomal regions; and intersecting with the list of k-mers that have normal copy number in the population.
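The filtering steps in Figure 4 can be illustrated with plain set operations. The following is only a toy sketch, not NIPTmer's actual implementation: the function names (kmers, build_kmer_lists), the input sets, and the 4-mer example sequences are all hypothetical, reverse-complement handling is ignored, and problematic regions are assumed to have been converted into a set of k-mers beforehand.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k from seq (forward strand only)."""
    return (seq[i:i + k] for i in range(len(seq) - k + 1))


def build_kmer_lists(chromosomes, k, polymorphic, problematic, normal_copy):
    """Toy version of the Figure 4 steps (all inputs are hypothetical).

    chromosomes: dict mapping chromosome name -> reference sequence
    polymorphic: set of k-mers overlapping known variants
    problematic: set of k-mers from centromeres, telomeres, etc.
    normal_copy: set of k-mers with normal copy number in the population
    """
    # Count every k-mer occurrence across the whole reference genome.
    genome_counts = Counter()
    for seq in chromosomes.values():
        genome_counts.update(kmers(seq, k))

    lists = {}
    for name, seq in chromosomes.items():
        # Keep only k-mers that occur exactly once genome-wide.
        candidates = {km for km in kmers(seq, k) if genome_counts[km] == 1}
        candidates -= polymorphic          # drop potentially polymorphic k-mers
        candidates -= problematic          # drop k-mers from problematic regions
        lists[name] = candidates & normal_copy  # keep normal-copy-number k-mers
    return lists


# Tiny made-up example with k = 4 (the paper uses 25-mers).
toy_genome = {"chrA": "ACGTACGA", "chrB": "TTGCACGT"}
result = build_kmer_lists(
    toy_genome,
    k=4,
    polymorphic={"GTAC"},
    problematic={"TTGC"},
    normal_copy={"CGTA", "TACG", "TGCA", "GCAC", "ACGA"},
)
```

In this toy genome, "ACGT" appears on both chromosomes and is therefore dropped as non-unique before the other filters are applied.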
Figure 5. Data flow during the aneuploidy calling step. The raw count for each chromosome is divided by the length of the k-mer list of the respective chromosome to obtain the coverage. The coverage values are separated into a sample group and a reference group. Only the reference group is used to generate the linear regression model and later to calculate the average and standard deviation of the differences between the observed and predicted coverages. The z-score is used to call the aneuploidy. The cut-off is customizable depending on the type I and type II error rates tolerated in the study. Mahalanobis distance is used as a quality control parameter.
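The data flow in Figure 5 can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the authors' code: the caption does not specify the form of the regression, so the sketch assumes the target chromosome's coverage is predicted from the coverages of the other chromosomes by ordinary least squares. The function names, the simulated counts, and the toy k-mer list lengths are hypothetical, and the Mahalanobis-distance quality control is omitted.

```python
import numpy as np


def coverage(raw_counts, list_lengths):
    """Coverage = raw k-mer count divided by the chromosome's k-mer list length."""
    return np.asarray(raw_counts, dtype=float) / np.asarray(list_lengths, dtype=float)


def fit_reference_model(ref_cov, target):
    """Fit the regression on the reference group only.

    ref_cov: array (n_samples, n_chromosomes) of coverages
    target:  column index of the chromosome being tested
    Returns coefficients plus mean and SD of the reference residuals.
    """
    y = ref_cov[:, target]
    X = np.delete(ref_cov, target, axis=1)
    X = np.column_stack([np.ones(len(X)), X])  # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta                       # observed minus predicted
    return beta, resid.mean(), resid.std(ddof=1)


def z_score(sample_cov, target, beta, resid_mean, resid_sd):
    """Standardized deviation of one sample's observed coverage from prediction."""
    x = np.concatenate([[1.0], np.delete(sample_cov, target)])
    resid = sample_cov[target] - x @ beta
    return (resid - resid_mean) / resid_sd


# Simulated data (entirely made up): 50 reference samples, 3 chromosomes.
rng = np.random.default_rng(0)
lengths = np.array([1_000_000, 2_000_000, 1_500_000])  # toy k-mer list lengths
depth = rng.uniform(0.08, 0.12, size=50)               # per-sample sequencing depth
noise = rng.normal(0.0, 0.002, size=(50, 3))
ref_counts = np.round(depth[:, None] * (1 + noise) * lengths)
ref_cov = coverage(ref_counts, lengths)

target = 1  # index of the chromosome under test
beta, mu, sd = fit_reference_model(ref_cov, target)

euploid = coverage(np.round(0.10 * lengths), lengths)
trisomy = euploid.copy()
trisomy[target] *= 1.05  # 5% excess, i.e. roughly a 10% fetal fraction

z_euploid = z_score(euploid, target, beta, mu, sd)
z_trisomy = z_score(trisomy, target, beta, mu, sd)
CUTOFF = 3.5  # cut-off used in Figure 2; customizable per study
```

A sample would be called aneuploid for the target chromosome when its z-score exceeds CUTOFF; raising or lowering that value trades false positives against false negatives.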