Shahid Y Khan1, Firoz Kabir1, Oussama M'Hamdi2, Xiaodong Jiao2, Muhammad Asif Naeem3, Shaheen N Khan3, Sheikh Riazuddin3, J Fielding Hejtmancik2, S Amer Riazuddin1.
Abstract
Here we report next-generation whole-genome sequencing of two individuals (H1 and H2) from a family of Pakistani descent. The genomic DNA was used to prepare paired-end libraries for whole-genome sequencing. Deep sequencing yielded 706.49 and 778.12 million mapped reads, corresponding to 70.64 and 77.81 Gb of sequence data and 23× and 25× average coverage for H1 and H2, respectively. Notably, a total of 448,544 and 470,683 novel variants, not present in the single nucleotide polymorphism database (dbSNP), were identified in H1 and H2, respectively. Comparative analysis identified 2,415,852 variants common to both genomes, including 240,181 variants absent from dbSNP. Principal component analysis linked the ancestry of both genomes with South Asian populations. In conclusion, we report whole-genome sequences of two individuals from a family of Pakistani descent.
Year: 2018 PMID: 30204152 PMCID: PMC6137601 DOI: 10.1038/sdata.2018.174
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Summary of whole genome sequencing data.
| Sample ID | Read Length (bp) | Total Reads (×10^6) | Mapped Reads (×10^6) | Mapped Reads (%) | Sequenced Bases (Gb) | Mean Depth (×) |
|---|---|---|---|---|---|---|
| H1 | 2×100 | 760.62 | 706.49 | 93% | 70.64 | 22.73 (~23) |
| H2 | 2×100 | 835.45 | 778.12 | 93% | 77.81 | 25.10 (~25) |
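The derived columns in the table follow from the read counts by simple arithmetic, which the sketch below reproduces. It assumes each mapped read contributes its full 100 bp and that depth is computed against a genome of roughly 3.1 Gb; that genome size and the function name `summarize` are assumptions made for illustration and are not stated in the source.

```python
# Assumed genome size in bases. About 3.1 Gb is an assumption for this
# sketch; the source does not state the value used for the depth column.
GENOME_SIZE_BP = 3.1e9


def summarize(total_reads_m, mapped_reads_m, read_length_bp=100):
    """Derive mapping rate, sequenced bases and mean depth.

    Read counts are given in millions, as in the table.
    """
    sequenced_bases = mapped_reads_m * 1e6 * read_length_bp
    return {
        "mapped_pct": 100.0 * mapped_reads_m / total_reads_m,
        "sequenced_gb": sequenced_bases / 1e9,
        "mean_depth": sequenced_bases / GENOME_SIZE_BP,
    }


# (total reads, mapped reads) in millions, taken from the table.
samples = {"H1": (760.62, 706.49), "H2": (835.45, 778.12)}

for name, (total, mapped) in samples.items():
    s = summarize(total, mapped)
    print(
        f"{name}: {s['mapped_pct']:.1f}% mapped, "
        f"{s['sequenced_gb']:.2f} Gb, {s['mean_depth']:.1f}x depth"
    )
```

Under these assumptions the results round to the reported 93% mapping rate and to roughly 23× and 25× depth; small differences in the second decimal place depend on the exact genome size and rounding used.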
Figure 1. Identification and annotation of variants in two individuals (H1 and H2) from a family of Pakistani descent.
(a) Flowchart representation of common variants present in H1 and H2 genomes. (b) Venn diagram illustrating a comparative analysis of the novel variants in H1 and H2 genomes. Note: Red and green represent novel variants present in H1 and H2, respectively, whereas their intersection (yellow) represents novel variants common to both genomes. (c) Venn diagram illustrating a comparative analysis of variants identified in H1, H2, and the Caucasian reference genome. Note: Red and green represent variants in H1 and H2 genomes, respectively, while the blue circle represents variants in the Caucasian reference genome. The intersections represent variants common to these genomes, i.e., yellow: common to H1 and H2; purple: common to H1 and the Caucasian reference; light blue: common to H2 and the Caucasian reference; and white: common to H1, H2, and the Caucasian reference.
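The comparisons shown in the Venn diagrams amount to set operations on variant calls. The following is a minimal sketch, assuming each variant is keyed by (chromosome, position, reference allele, alternate allele); the toy variants, the `known` set standing in for dbSNP, and the function name `compare` are all hypothetical and do not come from the study.

```python
# Hypothetical toy variants keyed by (chrom, pos, ref, alt).
# These are illustrative values only, not data from the study.
h1 = {("chr1", 1000, "A", "G"), ("chr1", 2000, "C", "T"), ("chr2", 500, "G", "A")}
h2 = {("chr1", 1000, "A", "G"), ("chr2", 500, "G", "A"), ("chr3", 42, "T", "C")}

# Stand-in for a database of known variants such as dbSNP (hypothetical).
known = {("chr1", 1000, "A", "G")}


def compare(a, b, known_variants):
    """Partition two variant sets the way a two-circle Venn diagram does."""
    novel_a = a - known_variants  # variants in a that are absent from the database
    novel_b = b - known_variants
    return {
        "common": a & b,                    # shared by both genomes
        "novel_a_only": novel_a - novel_b,  # novel and private to a
        "novel_b_only": novel_b - novel_a,  # novel and private to b
        "novel_common": novel_a & novel_b,  # novel and shared
    }


result = compare(h1, h2, known)
for label, variants in result.items():
    print(label, len(variants))
```

Real call sets would normally be normalized first (for example, left-aligning indels) so that the same variant receives the same key in both genomes.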
Figure 2. Principal component analysis (PCA) examining the ancestral roots of H1 and H2.
(a) PCA plot of H1 and (b) PCA plot of H2. PCA localized both H1 and H2 (arrows pointing to samples shown as red circles in the PCA plots) with South Asian (SAS) populations in principal components 1 and 3 (PC1 and PC3) and between SAS and European (EUR) populations in principal component 2 (PC2). Note: The x-axis represents PC1, while the y- and z-axes represent PC2 and PC3, respectively. African: AFR; Admixed American: AMR; and East Asian: EAS.