Maoxuan Lin1, Sarah Whitmire1, Jing Chen1, Alvin Farrel1, Xinghua Shi1, Jun-Tao Guo2.
Abstract
Insertions and deletions (indels) represent the second most common type of genetic variations in human genomes. Indels can be deleterious and contribute to disease susceptibility as recent genome sequencing projects revealed a large number of indels in various cancer types. In this study, we investigated the possible effects of small coding indels on protein structure and function, and the baseline characteristics of indels in 2504 individuals of 26 populations from the 1000 Genomes Project. We found that each population has a distinct pattern in genes with small indels. Frameshift (FS) indels are enriched in olfactory receptor activity while non-frameshift (NFS) indels are enriched in transcription-related proteins. Structural analysis of NFS indels revealed that they predominantly adopt coil or disordered conformations, especially in proteins with transcription-related NFS indels. These results suggest that the annotated coding indels from the 1000 Genomes Project, while contributing to genetic variations and phenotypic diversity, generally do not affect the core protein structures and have no deleterious effect on essential biological processes. In addition, we found that a number of reference genome annotations might need to be updated due to the high prevalence of annotated homozygous indels in the general population.
Year: 2017 PMID: 28839204 PMCID: PMC5570956 DOI: 10.1038/s41598-017-09287-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Indel size distribution.
Figure 2. Number of unique indels on each chromosome.
Figure 3. Relative positions of NFS (A) and FS (B) indels on proteins.
Figure 4. PCA analysis of indel patterns in 26 populations. (A) All indels; (B) Homozygous indels only.
Figure 5. Gene enrichment analysis of genes with NFS or FS indels. (A) Significantly enriched categories in terms of Biological Process; (B) Significantly enriched categories in terms of Molecular Function.
Figure 6. PCA analysis of transcription-related indel patterns in 26 populations. (A) All indels; (B) Homozygous indels only.
Figure 7. Secondary structure and residue disorder types for NFS indels. (A) Distribution of secondary structure types of all NFS indels; (B) Distribution of secondary structure types of transcription-related NFS indels; (C) Distribution of residue disorder of all NFS indels; (D) Distribution of residue disorder of transcription-related NFS indels. A residue in an indel is considered “disordered” or “ordered” if both IUPred and DisProt agree; otherwise it is annotated as “inconclusive”.
Figure 8. Comparisons of structural types between high and low allele frequency NFS indels. (A) Distribution of secondary structure types; (B) Distribution of residue disorder. A residue in an indel is considered “disordered” or “ordered” if both IUPred and DisProt predictions agree; otherwise it is annotated as “inconclusive”.
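The consensus rule stated in the captions of Figures 7 and 8 can be illustrated with a minimal Python sketch. It assumes that the IUPred and DisProt results have already been reduced to one boolean call per residue (True for disordered); how those calls are obtained, including any score threshold, is not given in this record. The function names `consensus_label` and `label_indel` are hypothetical and are not taken from the authors' pipeline.

```python
from typing import List


def consensus_label(iupred_disordered: bool, disprot_disordered: bool) -> str:
    """Label one residue from two per-residue disorder calls.

    Assumption: each argument is True when that source calls the
    residue disordered and False when it calls the residue ordered.
    """
    if iupred_disordered and disprot_disordered:
        return "disordered"
    if not iupred_disordered and not disprot_disordered:
        return "ordered"
    # The two sources disagree, so no consensus label is assigned.
    return "inconclusive"


def label_indel(iupred_calls: List[bool], disprot_calls: List[bool]) -> List[str]:
    """Apply the consensus rule to every residue of one indel."""
    if len(iupred_calls) != len(disprot_calls):
        raise ValueError("both sources must cover the same residues")
    return [consensus_label(a, b) for a, b in zip(iupred_calls, disprot_calls)]


if __name__ == "__main__":
    # Hypothetical four-residue indel, for illustration only.
    print(label_indel([True, True, False, False], [True, False, False, True]))
```

Under this rule a residue is labelled only when both sources agree, so the “inconclusive” category in panels C and D of Figure 7 and panel B of Figure 8 reflects disagreement between the two sources rather than a third structural state.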