Zeyu Yang1, Jesse Slone1, Xinjian Wang1, Jack Zhan1, Yongbo Huang2, Bahram Namjou2, Kenneth M Kaufman2,3, Michael Pauciulo1, John B Harley2,3, Louis J Muglia1,4, Iouri Chepelev2, Taosheng Huang1.
Abstract
Preterm birth (PTB), defined as birth before 37 weeks of gestational age, is a major contributor to infant mortality and neonatal hospitalization. Mutations in the mitochondrial genome (mtDNA) have been linked to various rare mitochondrial disorders and, because maternal genetic factors are strongly linked to PTB, may also contribute to PTB. To date, however, no study has found a conclusive connection between a particular mtDNA variant and PTB. Taking advantage of the high mtDNA copy number per cell, we developed an automated pipeline for detecting mtDNA variants in low-coverage whole-genome sequencing (lcWGS) data. The pipeline was first validated against samples of known heteroplasmy and then applied to 929 samples from an ethnically diverse PTB cohort with an average gestational age of 27.18 weeks (range: 21-30 weeks). The pipeline identified haplogroups and a large number of mtDNA variants in this cohort, including 8 samples carrying known pathogenic variants and 47 samples carrying rare mtDNA variants. These results confirm that lcWGS can be used to reliably identify mtDNA variants. Such variants may contribute to preterm birth in a small proportion of live births.
Keywords: human genetics; low-coverage whole-genome sequencing; mitochondrial disease; mitochondrial genome; preterm birth
Year: 2021 PMID: 34467602 PMCID: PMC9290920 DOI: 10.1002/humu.24279
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.700
Figure 1. Summary of the preterm patient cohort and low‐coverage whole‐genome sequencing (lcWGS) analysis pipeline. (a) Histogram of gestational age in the 926 preterm samples, not including the three discarded samples and 31 technical controls. (b) Summary of racial demographics for the 926 preterm samples. (c) Flowchart for the Mutect2 pipeline and visual summary of the lcWGS data analysis. The solid lines represent the flow of information, while the dashed lines indicate the criteria and settings for the data filtration steps. Manual inspection was required for the identification of pathogenic variants (see Section 2).
Figure 2. Comparison of sequencing coverage between the mitochondrial and nuclear genomes. (a) Mitochondrial genome (mtDNA) coverage for the low‐coverage whole‐genome sequencing (lcWGS) samples. The mean mtDNA read depth for the 957 samples that passed the haplogroup identification step was 1389X, with a standard deviation of 797.6X. Coverage ranged from 135.1X to 6169X. (b) Nuclear genome coverage for the lcWGS samples. The mean nuclear genome read depth for the 957 samples that passed the haplogroup identification step was 1.938X, with a standard deviation of 1.200X. Coverage ranged from 0.1205X to 12.46X. (c) Comparison of the nuclear genome coverage to the mtDNA coverage across all samples. There was a moderate positive correlation between nuclear genome coverage and mtDNA coverage, with an R² value of 0.5959. Note that the scale of the y axis (mtDNA coverage) is 500 times that of the x axis (nuclear genome coverage) to aid in the visualization of the line of best fit.
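For a straight-line fit such as the one in panel (c), R² equals the square of the Pearson correlation between the two coverage measures. The following is a minimal Python sketch of that calculation; the function name `r_squared` and the input depths are hypothetical and for illustration only, and this is not the authors' code or data.

```python
def r_squared(x, y):
    """Return R^2 for a simple linear fit of y on x (the squared Pearson r)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined when either variable is constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical per-sample read depths (illustrative values, not study data).
nuclear = [1.0, 1.5, 2.0, 3.0]           # nuclear genome coverage, in X
mtdna = [600.0, 900.0, 1500.0, 1800.0]   # mtDNA coverage, in X

print(f"R^2 = {r_squared(nuclear, mtdna):.4f}")
```

A value near 1 would mean that nuclear coverage almost fully predicts mtDNA coverage; the reported 0.5959 indicates that a substantial share of the variation in mtDNA depth is not explained by nuclear depth alone.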
Figure 3. Distribution of the top 20 haplogroups in the preterm patient cohort by race. The heatmap shows the haplogroup count by race for the 20 most frequently observed haplogroups in the preterm patient cohort. The labels on the left axis represent the self‐identified ethnic or ancestry groupings, while the labels along the bottom axis represent the 20 most frequently observed mitochondrial genome (mtDNA) haplogroups in the cohort. The numbers in black within each square indicate the number of samples in which a particular mtDNA haplogroup was detected for a particular racial or ethnic category. Each square is also color‐coded by the number of samples with that combination of mtDNA haplogroup and self‐identified racial category, according to the key on the right side of the heatmap.
Patient samples from the 960‐sample data set that contain known pathogenic variants
| Sample name | Mutation | Heteroplasmy (%) | Phenotype associated with variant | Igenomix coverage | mtDNA coverage | Haplogroup | Current age (years) | Gestational age (weeks) | Gender | Patient race | Ethnicity |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Patient_116 | m.1494C>T | 85.1 | DEAF | 1.47553 | 559.1176293 | L1c | 19 | 24 | Male | Black or African‐American | Non‐Hispanic |
| Patient_152 | m.7471dup | 16.4 | PEM/AMDF/motor neuron disease‐like | 1.38653 | 738.1543847 | L2b | 16 | 25 | Male | Black or African‐American | Non‐Hispanic |
| Patient_203 | m.3243A>G | 7.1 | MELAS/LS/DMDF/MIDD/SNHL/CPEO/MM/FSGS/ASD/cardiac + multi‐organ dysfunction | 1.31811 | 809.5704629 | K2b | 16 | 24 | Female | White | Non‐Hispanic |
| Patient_242 | m.3243A>G | 53 | MELAS/LS/DMDF/MIDD/SNHL/CPEO/MM/FSGS/ASD/cardiac + multi‐organ dysfunction | 3.09628 | 1383.373529 | H | 16 | 30 | Female | White | Non‐Hispanic |
| Patient_727 | m.1555A>G | 68.6 | DEAF; autism spectrum intellectual disability; possibly antiatherosclerotic | 1.70454 | 1026.711509 | L3 | 2 | 24 | Female | Unknown | Non‐Hispanic |
| Patient_823 | m.1555A>G | 70.8 | DEAF; autism spectrum intellectual disability; possibly antiatherosclerotic | 2.26769 | 1117.326332 | T2b | 2 | 27 | Female | White | Non‐Hispanic |
| Patient_875 | m.11778G>A | 91.7 | LHON/progressive dystonia | 1.5207 | 1546.552296 | H27 | 3 | 28 | Male | White | Non‐Hispanic |
| Patient_877 | m.3243A>G | 85.8 | MELAS/LS/DMDF/MIDD/SNHL/CPEO/MM/FSGS/ASD/cardiac + multi‐organ dysfunction | 1.96847 | 1757.391514 | L1c | 11 | 29 | Female | Black or African‐American | Non‐Hispanic |
Note: All 957 samples that successfully completed the automated pipeline were manually examined for the presence of known pathogenic variants in their mtDNA. Eight samples were identified as containing known pathogenic variants, as shown in the table.
Abbreviations: AMDF, ataxia, myoclonus, and deafness; ASD, autism spectrum disorder; CPEO, chronic progressive external ophthalmoplegia; DEAF, deafness; DMDF, diabetes mellitus and deafness; FSGS, focal segmental glomerulosclerosis; LHON, Leber hereditary optic neuropathy; LS, Leigh syndrome; MELAS, mitochondrial encephalopathy, lactic acidosis, and stroke‐like episodes; MIDD, maternally inherited diabetes and deafness; MM, mitochondrial myopathy; mtDNA, mitochondrial genome; PEM, progressive encephalomyopathy; SNHL, sensorineural hearing loss.
Figure 4. Correlation of estimated heteroplasmy between traditional PCR‐NGS and lcWGS. The heteroplasmy level for the eight pathogenic variants detected in the 960‐sample data set was calculated using the novel lcWGS pipeline and compared with a traditional PCR‐NGS methodology. The two methods showed a high degree of correlation for these variants, with an R² value of 0.9228. lcWGS, low‐coverage whole‐genome sequencing; NGS, next‐generation sequencing; PCR, polymerase chain reaction.
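Heteroplasmy at a site is commonly estimated as the fraction of reads that support the variant allele, so the precision of the estimate depends on read depth. The sketch below assumes that read-fraction definition and uses hypothetical read counts; the function names are illustrative, and this is not the authors' pipeline. It computes a heteroplasmy percentage together with a Wilson score interval.

```python
import math


def heteroplasmy_percent(alt_reads, total_reads):
    """Heteroplasmy as the percentage of reads carrying the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must lie between 0 and total_reads")
    return 100.0 * alt_reads / total_reads


def wilson_interval(alt_reads, total_reads, z=1.96):
    """Approximate 95% Wilson score interval for the heteroplasmy, in percent."""
    n = total_reads
    p = alt_reads / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return 100.0 * max(0.0, centre - half), 100.0 * min(1.0, centre + half)


# Hypothetical read counts at one site: the same 5% variant fraction
# observed at a shallow depth and at a deep depth.
for alt, depth in [(1, 20), (50, 1000)]:
    low, high = wilson_interval(alt, depth)
    print(f"depth {depth}X: {heteroplasmy_percent(alt, depth):.1f}% "
          f"(interval {low:.1f}% to {high:.1f}%)")
```

The interval narrows as depth grows, which is one way to see why the deep mtDNA coverage obtained from lcWGS can support heteroplasmy estimates even when nuclear coverage is shallow.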