Haluk Dogan, Handan Can, Hasan H Otu.
Abstract
Although whole human genome sequencing can be done with readily available technical and financial resources, the need for detailed analyses of genomes of certain populations still exists. Here we present, for the first time, sequencing and analysis of a Turkish human genome. We sequenced the genome at 35x coverage using paired-end sequencing; over 95% of sequencing reads mapped to the reference genome, covering more than 99% of its bases. The assembly of unmapped reads rendered 11,654 contigs, 2,168 of which did not reveal any homology to known sequences, resulting in ∼1 Mbp of unmapped sequence. Single nucleotide polymorphism (SNP) discovery resulted in 3,537,794 SNP calls with 29,184 SNPs identified in coding regions, where 106 were nonsense and 259 were categorized as having a high-impact effect. The homozygous/heterozygous (1,415,123∶2,122,671 or 1∶1.5) and transition/transversion (2,383,204∶1,154,590 or 2.06∶1) ratios were within expected limits. Of the identified SNPs, 480,396 were potentially novel with 2,925 in coding regions, including 48 nonsense and 95 high-impact SNPs. Functional analysis of novel high-impact SNPs revealed various interaction networks, notably involving hereditary and neurological disorders or diseases. Assembly results indicated 713,640 indels (1∶1.09 insertion/deletion ratio), ranging from -52 bp to 34 bp in length and causing about 180 codon insertions/deletions and 246 frame shifts. Using paired-end- and read-depth-based methods, we discovered 9,109 structural variants and compared our variant findings with other populations. Our results suggest that whole genome sequencing is a valuable tool for understanding variations in the human genome across different populations. Detailed analyses of genomes of diverse origins greatly benefit research in genetics and medicine and should be conducted on a larger scale.
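The ratios quoted in the abstract can be re-derived from the reported counts. The following is a minimal sketch of that arithmetic, not the authors' pipeline; the helper names `ratio` and `indel_effect` are hypothetical, and the indel classifier assumes the indel lies entirely within a coding sequence, so that a length divisible by 3 adds or removes whole codons and any other length shifts the reading frame.

```python
# Counts copied from the abstract.
SNP_TOTAL = 3_537_794
HOMOZYGOUS, HETEROZYGOUS = 1_415_123, 2_122_671
TRANSITIONS, TRANSVERSIONS = 2_383_204, 1_154_590


def ratio(numerator: int, denominator: int, digits: int = 2) -> float:
    """Return numerator/denominator rounded to `digits` decimal places."""
    return round(numerator / denominator, digits)


def indel_effect(length_bp: int) -> str:
    """Classify a coding indel by its signed length (negative = deletion).

    Assumption: the whole indel falls inside a coding sequence.
    """
    if length_bp == 0:
        raise ValueError("an indel must have a non-zero length")
    if abs(length_bp) % 3 == 0:
        return "codon insertion/deletion"
    return "frame shift"


# Both partitions should account for every SNP call.
assert HOMOZYGOUS + HETEROZYGOUS == SNP_TOTAL
assert TRANSITIONS + TRANSVERSIONS == SNP_TOTAL

het_per_hom = ratio(HETEROZYGOUS, HOMOZYGOUS)   # heterozygous calls per homozygous call
ts_tv = ratio(TRANSITIONS, TRANSVERSIONS)       # transitions per transversion

print(f"homozygous:heterozygous = 1:{het_per_hom}")
print(f"transition:transversion = {ts_tv}:1")
print(indel_effect(-52), "|", indel_effect(34), "|", indel_effect(-3))
```

Under this rule, both extremes of the reported length range (-52 bp and 34 bp) would be frame shifts if they occurred in coding sequence, since neither length is a multiple of 3.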
Year: 2014 PMID: 24416366 PMCID: PMC3887021 DOI: 10.1371/journal.pone.0085233
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Position of the Turkish population and the sequenced individual based on Principal Components Analysis of genotyping data.
(a) PCA of the genotyping data from the populations in the HapMap project and GWAS targeting the Turkish population (TUR); (b) PCA of the genotyping data from the GWAS targeting the Turkish population and the individual used for sequencing in this project (IND).
Read Sequencing and Analysis Statistics.
| No. of Reads (Raw) | Total Base Pairs (Raw) | No. of Reads (Trimmed and Filtered) | Total Base Pairs (Trimmed and Filtered) |
| ∼1.24×10⁹ | ∼125×10⁹ | ∼1.18×10⁹ | ∼117×10⁹ |

| No. of Mapped High Quality Reads | Total Base Pairs Mapped | No. of Unmapped High Quality Reads | Total Base Pairs Unmapped |
| ∼1.13×10⁹ | ∼112×10⁹ | ∼50×10⁶ | ∼5×10⁹ |

| No. of Contigs | Total Length of the Assembly (bp) | Min.–Max.–Mean Contig Length (bp) | N50 (bp) |
| 11,654 | 9,987,256 | 100–43,190–856 | 1,378 |

| Contigs Without a Hit | Total Length of Unhit Contigs (bp) | Min.–Max.–Mean Unhit Contig Length (bp) | N50 of Unhit Contigs (bp) |
| 2,168 (19%) | 927,213 | 100–9,345–427 | 469 |

| Contigs With a Hit | Reference Genome | Alternate Assemblies | Other Human Sequences | Non-human Primates | Other |
| 9,486 (81%) | 983 (8.5%) | 7,814 (67.0%) | 376 (3.2%) | 218 (1.9%) | 95 (0.8%) |
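The assembly statistics in this table follow from simple arithmetic on contig lengths. The sketch below shows one common way to compute N50 and cross-checks the reported means and percentages; it is an illustration, not the authors' tooling. The function name `n50` and the toy contig lengths are hypothetical, since the per-contig lengths are not given here, and the means are assumed to be truncated rather than rounded because integer division reproduces the table's values.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("need at least one contig")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: total 1,500 bp, so half is 750 bp.
# 500 + 400 = 900 >= 750, so the N50 is 400.
toy_n50 = n50([100, 200, 300, 400, 500])

# Cross-checks against the figures reported in the table.
mean_contig = 9_987_256 // 11_654            # mean contig length (bp), truncated
mean_unhit = 927_213 // 2_168                # mean unhit contig length (bp), truncated
unhit_percent = round(100 * 2_168 / 11_654)  # contigs without a hit (%)
hit_percent = round(100 * 9_486 / 11_654)    # contigs with a hit (%)
mapped_fraction = 1.13e9 / 1.18e9            # approximate, from rounded read counts

print(toy_n50, mean_contig, mean_unhit, unhit_percent, hit_percent)
print(f"mapped fraction = {mapped_fraction:.3f}")
```

The mapped fraction computed from the rounded read counts comes out above 0.95, in line with the abstract's statement that over 95% of reads mapped to the reference genome.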
Figure 2. Distribution of the 3,537,794 identified SNPs based on their genomic location.
Annotation for the 23 genes that were affected by a novel nonsense SNP.
| Symbol | Description | Chr | Disorder/Disease | Function | Pathway |
| ABCA9 | ATP-binding cassette A9 | 17 | Pseudoxanthoma elasticum | Monocyte differentiation; Lipid homeostasis | ABC transporters |
| ADCK3 | aarF domain containing kinase 3 | 1 | Spinocerebellar ataxia | Protein serine/threonine kinase activity | |
| ANKRD35 | Ankyrin repeat domain-containing protein 35 | 1 | | Protein binding | |
| CAD | CAD trifunctional protein | 2 | Fibrosarcoma | Aspartate carbamoyltransferase activity | Pyrimidine metabolism; Transcription/Ligand-dependent activation of the ESR1/SP pathway |
| CDC27 | cell division cycle 27 | 17 | | Cell cycle checkpoint | Cell cycle_Regulation of G1/S transition |
| DPRX | Divergent-paired related homeobox | 19 | | Sequence-specific DNA binding TF activity | |
| FRG2C | FSHD region gene 2 family, member C | 3 | | | |
| GIMAP6 | GTPase, IMAP family member 6 | 7 | | GTP binding | |
| HSPBAP1 | 27 kDa heat shock protein-associated protein 1 | 3 | Intractable epilepsy; Renal carcinoma | Cellular stress response | |
| HTR2C | 5-hydroxytryptamine receptor 1C | X | Schizophrenia; Migraine; Prader-Willi syndrome; Attention deficit hyperactivity disease | Phosphatidylinositol phospholipase C activity | Calcium signaling pathway; Neuroactive ligand-receptor interaction |
| KBTBD3 | BTB and kelch domain-containing protein 3 | 11 | | Protein binding | |
| KRTAP2-2 | Keratin-associated protein 2.2 | 17 | | Keratin filament | |
| MLL3 | Myeloid/lymphoid leukemia 3 | 7 | Leukemia | Methyltransferase activity | Lysine degradation |
| MYT1 | Myelin transcription factor I | 20 | Dysembryoplastic neuroepithelial tumor; Periventricular leukomalacia | Oligodendrocyte lineage development | |
| PCNT | Pericentrin | 21 | Seckel syndrome; Microcephaly | M transition of mitotic cell cycle | Centrosome maturation |
| PPP2R2B | Protein phosphatase 2, regulatory subunit B | 5 | Spinocerebellar ataxia | Apoptotic process | mRNA surveillance pathway; Tight junction; Regulation of CFTR activity |
| PROSER1 | Proline and serine rich 1 | 13 | | | |
| TBCK | TBC1 domain containing kinase | 4 | | | |
| TCP10L2 | T-complex 10 like protein 2 | 6 | Spina bifida | Cytosol | |
| TECTA | Tectorin alpha | 11 | Nonsyndromic deafness; Scotoma; Sensorineural hearing loss | Cell-matrix adhesion | |
| TFAP2B | Transcription factor AP-2 beta | 6 | Patent ductus arteriosus; Skeletal muscle neoplasm | Cellular ammonia/urea/creatinine homeostasis | |
| XIAP | X-linked inhibitor of apoptosis protein | X | Leukemia; Lymphoma | Caspases, apoptosis regulation; inflammation | Ubiquitin mediated proteolysis; SMAC-mediated apoptosis |
| ZNF778 | Zinc finger protein 778 | 16 | KBG syndrome; Learning disability | Zinc ion binding | |
Figure 3. Ingenuity Network analysis of 45 genes affected by a high-impact novel SNP.
Genes shown in red are affected by a nonsense SNP, and genes shown in green are affected by an SNP targeting a splice site donor/acceptor region. Drug targets and hereditary and neurological disorders/diseases are indicated where applicable.