T-box transcription factor 5 gene (TBX5) encodes the transcription factor TBX5, which plays a crucial role in the development of heart and upper limbs. Damaging single nucleotide variants in this gene alter the protein structure, disturb the functions of TBX5, and ultimately cause Holt-Oram Syndrome (HOS). By analyzing the available single nucleotide polymorphism information in the dbSNP database, this study was designed to identify the most deleterious TBX5 SNPs through in silico approaches and predict their structural and functional consequences. Fifty-eight missense substitutions were found damaging by sequence homology-based tools: SIFT and PROVEAN, and structure homology-based tool PolyPhen-2. Various disease association meta-predictors further scrutinized these SNPs. Additionally, conservation profile of the amino acid residues, their surface accessibility, stability, and structural integrity of the native protein upon mutations were assessed. From these analyses, finally 5 SNPs were detected as the most damaging ones: [rs1565941579 (P85S), rs1269970792 (W121R), rs772248871 (V153D), rs769113870 (E208D), and rs1318021626 (I222N)]. Analyses of stop-lost, nonsense, UTR, and splice site SNPs were also conducted. Through integrative bioinformatics analyses, this study has identified the SNPs that are deleterious to the TBX5 protein structure and have the potential to cause HOS. Further wet-lab experiments can validate these findings.
T-box transcription factor 5 gene (TBX5) encodes the transcription factor TBX5, which plays a crucial role in the development of heart and upper limbs. Damaging single nucleotide variants in this gene alter the protein structure, disturb the functions of TBX5, and ultimately cause Holt-Oram Syndrome (HOS). By analyzing the available single nucleotide polymorphism information in the dbSNP database, this study was designed to identify the most deleterious TBX5 SNPs through in silico approaches and predict their structural and functional consequences. Fifty-eight missense substitutions were found damaging by sequence homology-based tools: SIFT and PROVEAN, and structure homology-based tool PolyPhen-2. Various disease association meta-predictors further scrutinized these SNPs. Additionally, conservation profile of the amino acid residues, their surface accessibility, stability, and structural integrity of the native protein upon mutations were assessed. From these analyses, finally 5 SNPs were detected as the most damaging ones: [rs1565941579 (P85S), rs1269970792 (W121R), rs772248871 (V153D), rs769113870 (E208D), and rs1318021626 (I222N)]. Analyses of stop-lost, nonsense, UTR, and splice site SNPs were also conducted. Through integrative bioinformatics analyses, this study has identified the SNPs that are deleterious to the TBX5 protein structure and have the potential to cause HOS. Further wet-lab experiments can validate these findings.
Heart is the first organ to develop in the human body and numerous genes and their products orchestrate this process. Among these genes, TBX5 is an important one. It is a member of the T-box gene family that encodes transcription factors with a highly conserved DNA binding domain (T-box) of 180–200 amino acids [1]. Members of this gene family have been found from metazoans to humankind, and in humans, more than 20 members of this gene family have been reported till now [2,3]. Moreover, several human diseases are associated with mutations in the T-box genes, namely, DiGeorge Syndrome (TBX1), Ulnar-Mammary Syndrome (TBX3), Small Patella Syndrome (TBX4), Holt-Oram Syndrome (TBX5), Spondylocostal Dysostosis (TBX6), Cousin Syndrome (TBX15), Isolated ACTH Deficiency (TBX19), Congenital Heart Defects (TBX20), X-Linked Cleft Palate with or without Ankyloglossia (TBX22) [4].According to the recent information retrieved from NCBI and UniProt databases, the TBX5 gene is located in the long arm of chromosome 12 at position 24.21 (12q24.21), contains ten exons, and has three known transcript variants (1, 3, and 4). Transcript variants 1 and 4 encode a protein of 518 amino acid residues and transcript variant 3 encodes a 468 (51–518) amino acid containing protein with a shortened N-terminus. In the TBX5 transcription factor, the T-box spans from amino acid position 58 to 238. The binding site of TBX5 has been identified to be present in the upstream regions of various cardiac-expressed genes, such as cardiac α actin, atrial natriuretic factor, cardiac myosin heavy chain α, cardiac myosin heavy chain β, myosin light chain 1A, myosin light chain 1V and Nkx2-5. The consensus binding site of TBX5 corresponds to the first eight bases from one-half of the Brachyury binding 22 palindromic consensus sequence. It was shown in vitro that TBX5, when in total length, binds to the target site mainly as a dimer, and when truncated, tends to bind mainly as a monomer [5]. It was further showed in vitro that TBX5 can form a heterodimer complex with Nkx2-5, another essential transcription factor for cardiac development [6]. During the early stages of heart development, TBX5 acts mainly as a transcriptional activator of genes responsible for cardiomyocyte maturation and cardiac septa formation. It is also necessary for effective establishment of the cardiac conduction system [7]. TBX5 also plays a vital role in the very initiation of upper (fore) limbs [8], but the necessity of its presence during limb outgrowth phase remains controversial [8,9].Holt-Oram Syndrome (HOS) is an autosomal dominant disease due to mutation in the TBX5 gene. It is considered 100% penetrant with varying expressivity in the heart and upper extremity [10]. However, incomplete penetrance [11], lack of penetrance [12], somatic mosaicism [12], and probable germinal mosaicism [13] have also been reported for this disease. The majority of the patients suffer from this syndrome due to de novo mutations [14]. Cardiac presentations include atrial septal defect, ventricular septal defect, atrioventricular septal defect, pulmonary atresia/stenosis, double outlet right ventricle, aortic valve insufficiency/stenosis, tricuspid valve atresia, mitral valve abnormality, patent ductus arteriosus, tetralogy of pentalogy of Fallot, common arterial truncus, dextrocardia, right aortic arch, and other non-specified congenital heart diseases [15]. Upper limb anomalies may include a hypoplastic or absent thumb, triphalangeal thumb, accessory/bifid thumb, aplasia/hypoplasia of hand and fingers, syndactyly, radial with or without ulnar aplasia or hypoplasia, radio-ulnar synostosis, hypoplasia of humerus, aplasia of humerus, and clavicular abnormalities [15].Nonsense mutation [16], missense mutation [17], frameshift mutation [16,17], deletion [18], duplication [19], and splice site mutation [[20], [21], [22]] of the TBX5 gene have been reported to be responsible for HOS. It has been found that missense mutations tend to cause more serious cardiac anomalies than mutations that cause a truncated TBX5 protein (nonsense, frameshift, splice site point variants, intragenic deletions, or duplications) [12]. HOS is caused by heterozygous mutation. To the best of our knowledge, only one homozygous missense mutation in the TBX5 gene has been reported until present but only with cardiac involvement (atrioventricular septal defect) and no skeletal deformity [23].TBX5 missense mutations are also associated with familial [24] and sporadic dilated cardiomyopathy [25], atrial fibrillation, and lone atrial fibrillation [26]. Furthermore, TBX5 non-synonymous (missense and nonsense) mutations have also been reported in association with tetralogy of Fallot without any skeletal abnormality [27] and a TBX5 3′UTR variant makes its carriers more prone to congenital heart diseases [28]. Considering the above-stated findings, this study was conducted to find out the most deleterious TBX5 SNPs and analyze their structural and functional outcomes in silico.
Materials and methods
The overall methodology adopted in this study is graphically presented in Fig. 1.
Fig. 1
An overview of the steps followed to identify the deleterious TBX5 SNPs and analyze their structural and functional consequences.
An overview of the steps followed to identify the deleterious TBX5 SNPs and analyze their structural and functional consequences.
Retrieval of amino acid sequence and wild structure
Amino acid sequence of the human TBX5 protein was downloaded in FASTA format from the UniProt database (UniProt ID: Q99593) (https://www.uniprot.org/uniprot/Q99593). Experimental structures of TBX5 were collected from Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB) (https://www.rcsb.org/).
Retrieval of SNPs
SNP information of the human TBX5 gene was collected using NCBI Variation Viewer (https://www.ncbi.nlm.nih.gov/variation/view/). Nonsense, stop lost, splice acceptor, splice donor, and UTR SNPs of the TBX5 gene along with their rs (reference SNP) IDs were retrieved from Variation Viewer applying the following filters: dbSNP under ‘Source database’ menu, single nucleotide variant under ‘Variant type’ menu, and nonsense (stop gained)/stop lost/splice acceptor variant/splice donor variant/5 prime UTR variant/3 prime UTR variant under ‘Molecular consequence’ menu. In addition, information about the missense SNPs was downloaded directly from the NCBI dbSNP database (https://www.ncbi.nlm.nih.gov/snp/).
Identification and analysis of damaging missense SNPs
A total of 355 missense SNPs were retrieved from dbSNP.
Prediction of damaging missense SNPs by sequence-homology based tools
The missense SNPs were first subjected to analysis by sequence-homology-based predictor PROVEAN [29]. PROVEAN predicts the effects of single or multiple amino acid substitutions or indel mutations on the functions of a protein. In PROVEAN, BLAST hits with more than 75% global sequence identity are clustered first. The top 30 clusters constitute the supporting sequence set, and a delta alignment score is computed for every supporting sequence. These scores are averaged within and across clusters to obtain the final PROVEAN score. An amino acid substitution is predicted ‘Deleterious’ if the PROVEAN score is below or equal to the predefined threshold value of −2.5, and is considered ‘Neutral’ if the PROVEAN score is above this threshold value [29]. In this study, a list of TBX5 gene SNPs defined by their chromosomal positions, reference alleles, and variant alleles was used as input to the PROVEAN server.PROVEAN also provides scores from SIFT, another prediction tool. SIFT (Sorting Intolerant From Tolerant) presumes that essential amino acids are well conserved in a protein family, and a replacement in a conserved position is predicted as ‘Damaging’ [30]. SIFT scores lie between 0 and 1, and amino acid substitutions with a score ≤0.05 are predicted to alter protein function, and hence are considered ‘Damaging’ [31].85 SNPs of the TBX5 gene were predicted as 'Deleterious' by PROVEAN, and 236 SNPs were predicted as ‘Damaging’ by SIFT. 80 SNPs were predicted as harmful by both tools.
Prediction of damaging missense SNPs by structure-homology based tool
PolyPhen-2 (Polymorphism Phenotyping version 2) prediction is made on eight sequence-based and three structure-based parameters. Two pairs of datasets, viz. HumDiv and HumVar were recruited to train and test two PolyPhen-2 models. PolyPhen-2 computes the Naive Bayes posterior probability of a given mutation's being damaging and estimates the false-positive rate and true-positive rate. It classifies SNPs as ‘Probably Damaging,’ ‘Possibly Damaging,’ and ‘Benign’ based on the false-positive rate (FPR) thresholds [32]. The 80 SNPs that were predicted harmful by both PROVEAN and SIFT were next analyzed by both HumDiv and HumVar trained PolyPhen-2 using the ‘batch query’ option. A list of SNPs defined by chromosome number, chromosomal position, reference allele, and variant allele was used as the input. 58 SNPs were predicted as ‘Probably Damaging’ by either HumDiv or HumVar PolyPhen-2, and these SNPs were considered of high fidelity in disease association.
Prediction of pathogenic amino acid substitutions by MutPred2
MutPred2 was developed by integrating genetic and molecular information to predict pathogenicity caused by amino acid substitutions [33]. The cutoff value for predicted pathogenicity is 0.5, and the higher the score, the greater the probability of that amino acid substitution's disease association. TBX5 protein sequence in FASTA format and wild type and substituted amino acids of 58 high-confidence deleterious SNPs with their respective positions were submitted to MutPred2 server using the default P-value threshold (0.05).
Scrutinization of predicted pathogenicity by meta-predictors
Several meta-predictors have been developed for predicting disease-causing SNPs in recent years. Such meta-predictors employ various single pathogenicity prediction tools and provide a consensus score derived from different calculations. It is well established that ensemble methods render improved predictive performance than any of its constituent algorithms alone. We decided to crosscheck the predicted 58 pathogenic SNPs with 5 such meta-predictors, namely i) PredictSNP2, ii) PredictSNP1, iii) MetaLR, iv) MetaSVM, and v) REVEL.PredictSNP2 generates a consensus score from the predictions of five tools- CADD, DANN, FATHMM, FunSeq2, and GWAVA [34]. The input to PredictSNP2 was similar to that used for PolyPhen-2. After analysis by PredictSNP2, the SNPs were sent for further analysis by PredicteSNP1. PredictSNP1 has incorporated six individual predictors- MAPP, PhD-SNP, PolyPhen-1, PolyPhen-2, SIFT, and SNAP [35]. MetaSVM and MetaLR are two meta-classifiers developed for the dbNSFP (database for non-synonymous SNPs' functional predictions) project. They calculate ensemble scores based on the predictions of 10 component tools and the maximum frequency observed in the 1000 genomes project populations [36]. The difference between them lies in their adopted approaches- one uses Support Vector Machine (MetaSVM) algorithm and the other uses Logistic Regression (MetaLR) algorithm. They were both accessed through Ensembl Variant Effect Predictor (VEP) (http://www.ensembl.org/Tools/VEP). rsID (Reference SNP ID number)s of the SNPs were used as the inputs and the chosen reference genome was GRCh37. REVEL (Rare Exome Variant Ensemble Learner) is a meta-predictor that integrates 13 tools to predict the probability of disease causation by a missense SNP. It is particularly efficient in differentiating pathogenic from rare neutral variants with allele frequencies <0.5% [37]. REVEL scores range between 0 and 1, and the threshold score for pathogenic variants is 0.5. REVEL scores in this study were obtained alongside MetaSVM and MetaLR scores using the Ensembl VEP.
Identification of conserved amino acid residues
Structural and functional importance of an amino acid residue in a protein is often strongly related to its level of conservation. ConSurf ranks evolutionary conservation status of amino acid and nucleic acid positions in protein and DNA/RNA molecules, respectively [38]. The continuous conservation scores of residues are split into an integer scale of 1–9 and depicted with a color scheme. The most variable positions (grade 1) are colored turquoise, intermediate conserved positions (grade 5) are colored white, and the most conserved positions (grade 9) are colored maroon. TBX5 protein sequence in FASTA format served as the input to ConSurf.
Solvent accessible surface area (SASA) and secondary structure prediction
Knowing the solvent accessibility of amino acid residues is essential to identify the interaction interfaces and active sites in a fully folded protein. Amino acid substitutions in such sites may bring change in binding affinity [39], or disturb catalytic activity if the protein is an enzyme [40]. NetSurfP-2.0 predicted surface accessibility of TBX5 residues and its secondary structure. It accepts protein sequence in FASTA format as the input and recruits deep neural networks that have been trained on crystal protein structures [41].
Prediction of change in stability of the mutated proteins by iStable 2.0
Amino acid substitutions caused by missense SNPs can significantly alter the folding free energy. Missense SNPs decreasing the stability of a native protein can affect the functions of that protein and ultimately result in diseases [42]. Missense SNPs causing destabilized proteins are associated with myopathy, von Willebrand disease, retinitis pigmentosa, hemophagocytic lymphohistiocytosis, and prion diseases. However, stabilizing missense SNPs can also cause disease [42]. In our study, changes in TBX5 protein stability due to missense SNPs were predicted by iStable 2.0 [43]. iStable 2.0 results are derived from 11 protein stability prediction tools [43]. For amino acid substitutions up to F232V, chain A of the TBX5 dimer crystal structure containing 1–239 residues (PDB ID: 5BQD) was used as the native structure. Due to the unavailability of an experimental structure, rest of the substitutions were analyzed using the protein's FASTA sequence.
Prediction of the effects of amino acid substitutions on the structural integrity of TBX5 protein by Missense3D and normal mode analysis
To analyze the impacts of amino acid substitutions on TBX5 three-dimensional structure, Missense3D server (Missense3D | Structural Bioinformatics Group | Imperial College London) was utilized. Missense3D considers 16 structural features to differentiate disease-associated SNPs from the neutral ones [45]. For substitutions up to F232V, ‘Position on Protein Sequence’ input option was chosen, and the UniProt ID (Q99593) and PDB code & chain ID (5BQD; A) along with the substitutions were used as inputs. For the rest of the substitutions, the energy minimized model generated by Robetta was used [see 2.8]. The mutated protein structures due to SNPs were later subjected to normal mode analysis by DynaMut [44]. DynaMut calculates probable consequences of an amino acid replacement on the stability of a submitted protein from vibrational entropy changes. Normal mode analysis is a widely applied tool for protein structure study [44]. The input was similar to Missense3D.
Molecular dynamics (MD) simulation
MD simulation studies of the five mutant structures distilled from the consensus of Missense3D, DynaMut, and iStable 2.0 [see 3.2.7], and the native TBX5 protein structure (PDB ID:5BQD, chain A) were conducted by employing GROMOS96 43a1 force field [46]. To mimic the physiological environment, a triclinic simulation box (36 × 36 × 44 Å) and 0.9% (0.15 M) NaCl concentration was chosen. Depending on total positive and negative charges, the system was made neutral by adding sodium or chlorine ions. Chosen solvation model was SPC. The system was energy minimized before starting simulation by applying Steepest Descent Algorithm for 5000 steps. MD simulation was continued for 100 ns under 300K constant temperature and 1.0 bar constant pressure for each protein structure. The trajectories were saved at 100 ps intervals.
Prediction of post-translational modification site missense SNPs
Protein post-translational modification (PTM)s occur at the time of or after protein synthesis and such modifications are mediated by different enzymes or covalent bonds with other structures. SNPs in the PTM sites can result in diseases [47]. SNPs in the putative PTM sites of the TBX5 protein were detected by AWESOME [47]. AWESOME predicts the PTM site SNPs based on the evaluation of 20 PTM site predictors. Gene name was used as the input to AWESOME.
Analysis of nonsense and stop lost SNPs
31 nonsense (stop gained) and one stop lost SNPs were retrieved from NCBI Variation Viewer. Among the 31 nonsense SNPs, 29 are already annotated as ‘Pathogenic’ in the ClinVar database (https://www.ncbi.nlm.nih.gov/clinvar/). Rest 2 nonsense SNPs, rs1393693495 (W514Ter) and rs1555226308 (Q156Ter), and 1 stop lost SNP, rs1394777873 (Ter519Q) were analyzed by PredictSNP2 [35].
Prediction of the effects of UTR SNPs
Untranslated region (UTR)s play a pivotal role in the regulation of gene expression post-transcriptionally. They ascertain nucleo-cytoplasmic transport of mRNA, maintain translational efficiency, subcellular localization, and RNA stability [48]. The essential regulators in the 5′-UTR are Kozak consensus sequence (ACCAUGG) [Shine-Dalgarno consensus sequence (AGGAGG) in prokaryotes], CpG (5′—C—phosphate—G—3′) sites, uORFs (upstream open reading frames), IRESs (Internal Ribosome Entry Sites), and RBP (Ribosome-binding protein) binding sites. 3′-UTR length, RBP binding sites, and miRNA binding sites are the critical players in the 3′-UTR [49]. Single nucleotide variants in these regions have been found to be associated with many diseases like thalassemia intermedia, atrial septal defect [50], campomelic dysplasia [51], hereditary chronic pancreatitis [52], Marie Unna hereditary hypotrichosis [53], Charcot-Marie-Tooth disease [54], and cerebral amyloid angiopathy [55] to name a few. So, it is essential to identify potentially harmful SNPs in the 3′ & 5′-UTR. TBX5 UTR SNPs that were predicted to affect transcriptional motifs were retrieved from the UTR database (UTRdb (cnr.it)) [56]. ‘UTRef’ (NCBI RefSeq transcripts) as the searching database and ‘Gene Name’ as the accession type were chosen during retrieval.MicroRNA (miRNA)s also have critical roles in post-transcriptional regulation of gene expression through mRNA cleavage or translational repression [57]. miRNAs are small non-coding RNAs having about 22 nucleotides. miRNA seed region (positions 2–8 on miRNA) is the most crucial sequence in finding the complementary sequence on mRNA and subsequently binding with it. SNPs in the miRNA target sites in the 3′ UTR may create or delete an miRNA target or change the binding efficiency between miRNA and mRNA. Creation or deletion of an miRNA target site may also affect binding in the neighboring miRNA target sequences. Such SNPs have been reported in several diseases, including cancers, psychiatric illnesses, cardiomyopathy, asthma, and Parkinson's disease [58,59]. In our study, SNPs in the predicted miRNA target sites were identified using the MirSNP server [59]. Gene name was used as the input to MirSNP.
Prediction of the effects of splice site SNPs
RNA splicing is necessary for converting a precursor mRNA to a mature mRNA ready for translation. Various cis-regulatory elements and trans-acting elements maintain this splicing process. Cis-regulatory elements include 5′ and 3′ consensus splice site sequences in the exon-intron boundaries, branch point sequences, and polypyrimidine tract sequences. Trans-acting elements include spliceosomal small nuclear RNAs, proteins, and various other splicing repressors and activators, many of which are still unknown [60,61]. Mutations that hamper the controlled process of splicing lead to a good number of diseases. Around 9% of all the pathogenic mutations recorded in the Human Gene Mutation Database are splicing ones [60,61]. SNPs in the splice donor and acceptor sites may break an established splice site or create a new splice site. They may also activate cryptic sequences (sequences similar to splice sites) or affect splicing enhancers and silencers. They may even intercept the binding of spliceosomal components by making a conformational change in the mRNA secondary structure [60]. In this study, seven splice donor single nucleotide variants and seven splice acceptor single nucleotide variants were retrieved from dbSNP using NCBI Variation Viewer. Among the seven donor variants, two are annotated as ‘Likely-pathogenic’. Among the seven acceptor variants, one is annotated as ‘Pathogenic’ and three are annotated as ‘Likely-pathogenic’ in the ClinVar database. Excluding them, the rest 8 SNPs were analyzed employing Human Splicing Finder (HSF) [62]. For our analysis, ‘Analyze a sequence’ and ‘Paste your own sequence’ options were chosen and individual variants with a flanking sequence of 100 nucleotides in both the 5′ and 3′ directions were used as inputs.
Prediction of the effects of deep functional intronic SNPs
Intronic regions located >100 base pairs away from the exon-intron borders are known as deep intronic regions [63]. Although they have been overlooked for a long time, they are now considered important, and mutations in these regions have been found in association with more than 75 diseases. Deep intronic point mutations, deletions, and insertions are held accountable for these diseases, with the first one being the most frequent [63].The rs (reference SNP) IDs of all the TBX5 intron variants were submitted to SNP Function Prediction (FuncPred; SNPinfo Web Server (nih.gov)) using default settings [64]. 19 SNPs were predicted by FuncPred to affect TF binding sites and were next analyzed by RegulomeDB 2.0 server. RegulomeDB 2.0 is an annotated database of SNPs in the human intergenic regions that have either established or predicted regulatory roles [65].
3D structure prediction of full TBX5 protein
Currently, there are four experimental structures of TBX5 (2X6U, 2X6V, 4SOH, and 5BQD) deposited in the Protein Data Bank (PDB). Nevertheless, all of them cover less than half of the total protein. So, we decided to predict the whole structure of TBX5 protein by three reliable protein structure prediction tools, namely I-TASSER [66], Phyre2 [67], and Robetta [68]. Energy minimization of the generated TBX5 models was achieved by applying YASARA force field. YASARA force field is a combination of AMBER all-atom force field and multi-dimensional knowledge-based torsional potentials [69]. The energy minimized models were further refined by GalaxyRefine server (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=REFINE).
Evaluation of TBX5 3D models
The quality of the three predicted TBX5 models was evaluated by PROCHECK [70], Verify3D [71], and SWISS-MODEL Structure Assessment Tool [72]. PROCHECK examines specific stereochemical and geometrical properties of the query protein structure and compares them against a set of ideal values of these properties obtained from well-refined high-resolution protein structures. It assesses residue-by-residue geometry and overall structural geometry of the query structure [70]. A Ramachandran plot of φ-ψ torsion angles generated by PROCHECK is the most helpful quality indicator and a good quality model is expected to have over 90% of its residues in the most favored regions. Verify3D checks the agreement between a protein tertiary structure and its primary structure [71]. SWISS-MODEL Structure Assessment Tool provides assessments from MolProbity and QMEAN algorithms. These algorithms evaluate a model quality at both global (i.e. for the entire structure) and local (i.e. per residue) levels [72,73]. QMEAN Z-score is an indicator of the “degree of nativeness” of a model [74].
Results
SNP information retrieval
Searching the NCBI Variation Viewer using dbSNP as the ‘source database’ and single nucleotide variant (SNV) as the ‘variant type’ returned 14,413 rsIDs associated with the TBX5 gene as single nucleotide variants. Three hundred fifty-five rsIDs were found to be missense variants, 30 rsIDs to be nonsense (stop gained) variants, one rsID to be ‘stop lost’ variant, 210 rsIDs to be synonymous variants, 11,300 rsIDs to be intron variants, 821 rsIDs to be non-coding transcript variants, seven rsIDs to be splice acceptor and seven rsIDs to be splice donor variants, 689 rsIDs to be five prime UTR variants, and 345 rsIDs to be three prime UTR variants. Fig. 2 is a pie chart showing different types of SNPs. On further inspection, 404 missense variants with 355 rsIDs (3 nonsense variants are included in rs377649723, rs765204502, and rs903933027, and 11 synonymous variants are included in rs186780790, rs377532269, rs533581420, rs374600913, rs753688559, rs762204624, rs771883815, rs777853147, rs972633602, rs973631007, and rs769107196) and 28 nonsense variants with 28 rs IDs (27 stop gained and 1 stop lost-rs1394777873) were found [total nonsense variants= 28+3= 31]. The reason behind a single rsID having more than one variant is these variants were reported on the same chromosomal location. As per the NCBI rule, only one rsID is assigned to a specific chromosomal location [75].
Fig. 2
Pie-chart displaying the distribution of TBX5 SNPs.
Pie-chart displaying the distribution of TBX5 SNPs.
Prediction of missense SNPs by sequence-homology based tools
The retrieved 404 missense SNPs were analyzed first using PROVEAN. Out of 404 submitted substitutions, PROVEAN predicted 85 as 'Deleterious' and 301 as ‘Neutral’, and SIFT predicted 236 as ‘Damaging’ and 150 as ‘Tolerated’. 242 substitutions were predicted as 'Deleterious' or ‘Damaging’ by either PROVEAN or SIFT, respectively, and 80 substitutions were predicted as 'Deleterious' or ‘Damaging’ by both tools. Results of rs571328934, rs755664220, rs756902625, rs763178387, rs769107196, rs769895888, rs777370158, rs907398622, rs945043550, rs1263504354, rs1407437475, and rs1489969947 could not be predicted by PROVEAN.
Prediction of missense SNPs by structure-homology based tool
For high fidelity, the 80 SNPs predicted by both PROVEAN and SIFT as 'Deleterious'/‘Damaging’ were next analyzed by PolyPhen-2. Both HumDiv and HumVar dataset trained PolyPhen-2 models were used. The HumDiv trained PolyPhen-2 predicted 58 substitutions as ‘probably damaging’ (Supplementary Tables 1) and 16 substitutions as ‘possibly damaging’, and 6 substitutions as ‘benign’. The HumVar trained model predicted 56 substitutions as ‘probably damaging’ (Supplementary Tables 1) and 10 substitutions as ‘possibly damaging’, and 14 substitutions as ‘benign’. The 58 SNPs that were predicted as ‘probably damaging’ by HumDiv were also predicted as ‘probably damaging by HumVar except for rs190881877 (V205F) and rs944423586 (P337H), which were predicted as ‘possibly damaging’. rs1411518530 is associated with two nucleotide variants (G > C, T), but they encode the same amino acid (D110E) and were considered a single variant in the subsequent evaluations. Predictions by PROVEAN, SIFT, and PolyPhen-2 are available in Supplementary Table 1.The threshold score of pathogenicity, as predicted by MutPred2, is 0.5 and a substitution having a MutPred2 score ≥0.8 can be considered highly likely to be pathogenic. All but two (K226R and P337H) of the 58 high-confidence deleterious SNPs have a MutPred2 prediction score ≥0.5. Supplementary Table 2 provides MutPred2 outcomes.Pathogenicity of the aforementioned substitutions was further examined by five deleteriousness meta-predictors whose predictions are based on various ensemble methods. Forty-eight substitutions were predicted ‘Damaging’ by all the meta-predictors and the rest were predicted ‘Damaging’ by at least two meta-predictors. The results are shown in Table 1.
Table 1
Amino acid substitution and disease causation predictions by five meta-predictors.
AAS
PredictSNP 1
PredictSNP 2
MetaLR
MetaSVM
REVEL
Prediction
Expected Accuracy
Prediction
Expected Accuracy
Prediction
Score
Prediction
Score
Prediction
Score
L58P
D†
0.87
D
0.87
D
0.93
D
1.08
D
0.98
T72K
D
0.87
D
0.87
D
0.88
D
1.02
D
0.94
T72 M
D
0.87
D
0.87
D
0.83
D
0.86
D
0.95
G80R
D
0.87
D
0.87
D
0.94
D
1.10
D
0.98
F84L
D
0.72
D
0.87
D
0.91
D
1.06
D
0.97
P85L
D
0.87
D
0.87
D
0.96
D
1.09
D
0.96
P85S
D
0.87
D
0.87
D
0.96
D
1.10
D
0.97
V91A
D
0.65
D
0.87
D
0.78
D
0.62
D
0.73
G93S
D
0.87
D
0.87
D
0.96
D
1.10
D
0.97
G93D
D
0.87
D
0.87
D
0.96
D
1.10
D
0.94
G93V
D
0.61
D
0.87
D
0.96
D
1.10
D
0.94
P96S
N§
0.63
D
0.87
D
0.86
D
0.87
D
0.83
Y100C
D
0.87
D
0.87
D
0.96
D
1.10
D
0.98
Y100F
D
0.76
D
0.82
D
0.93
D
1.08
D
0.93
M104T
D
0.65
D
0.87
D
0.86
D
0.93
D
0.93
D105 N
D
0.87
D
0.87
D
0.89
D
1.01
D
0.87
V107E
D
0.87
D
0.82
D
0.76
D
0.67
D
0.92
P108T
D
0.72
D
0.87
D
0.86
D
0.92
D
0.88
D110E
D
0.87
N
0.65
D
0.83
D
0.76
D
0.72
D110 N
D
0.87
D
0.87
D
0.86
D
0.91
D
0.77
D111Y
D
0.76
D
0.87
D
0.82
D
0.79
D
0.87
R113I
D
0.87
D
0.82
D
0.89
D
1.00
D
0.95
R113K
D
0.61
D
0.87
D
0.83
D
0.84
D
0.92
Y114H
D
0.87
D
0.87
D
0.85
D
0.89
D
0.97
W121R
D
0.87
D
0.87
D
0.97
D
1.08
D
0.98
G125R
D
0.87
D
0.87
D
0.89
D
1.00
D
0.90
G133C
N
0.63
D
0.87
D
0.76
D
0.60
D
0.68
L135R
D
0.87
D
0.87
D
0.78
D
0.72
D
0.87
V137A
D
0.51
D
0.87
D
0.79
D
0.74
D
0.89
D140E
N
0.65
D
0.87
D
0.79
D
0.65
D
0.74
P142S
D
0.87
D
0.87
D
0.84
D
0.90
D
0.76
A143T
D
0.55
D
0.87
D
0.81
D
0.73
D
0.59
G145W
D
0.87
D
0.87
D
0.98
D
1.04
D
0.89
V153D
D
0.87
D
0.82
D
0.89
D
1.00
D
0.95
L160F
D
0.87
D
0.87
D
0.89
D
1.02
D
0.88
H170L
D
0.72
N
0.63
D
0.78
D
0.75
D
0.96
N174D
D
0.87
D
0.87
D
0.87
D
0.99
D
0.81
V186G
D
0.87
D
0.87
D
0.87
D
0.93
D
0.94
V186 M
D
0.61
D
0.87
D
0.86
D
0.86
D
0.85
D189Y
D
0.87
D
0.87
D
0.84
D
0.80
D
0.68
H204D
D
0.87
D
0.87
D
0.80
D
0.69
D
0.88
V205F
D
0.72
D
0.87
D
0.80
D
0.70
D
0.75
E208D
D
0.51
D
0.87
D
0.84
D
0.80
D
0.71
A213T
D
0.55
D
0.87
D
0.85
D
0.71
D
0.72
I222N
D
0.87
D
0.87
D
0.89
D
0.94
D
0.90
T223 M
D
0.87
D
0.87
D
0.91
D
1.05
D
0.96
K226R
D
0.76
D
0.87
D
0.92
D
1.01
D
0.88
K226E
D
0.87
D
0.87
D
0.93
D
1.10
D
0.96
F232V
D
0.87
D
0.82
D
0.92
D
1.08
D
0.95
R237P
D
0.87
D
0.87
D
0.91
D
1.06
D
0.95
R237Q
D
0.87
D
0.87
D
0.89
D
1.00
D
0.92
R237W
D
0.87
D
0.87
D
0.90
D
1.03
D
0.91
S261C
D
0.55
D
0.82
D
0.71
D
0.54
D
0.88
C323Y
D
0.72
D
0.87
D
0.85
D
0.64
T
0.47
P337H
D
0.61
D
0.87
T‡
0.20
T
−0.66
T
0.20
R375W
D
0.76
D
0.87
T
0.33
T
−0.32
T
0.38
W401C
D
0.72
D
0.82
T
0.33
T
−0.36
T
0.44
AAS = Amino Acid Substitution, D = Deleterious/Damaging/Disease, T = Tolerated, §N=Neutral.
Amino acid substitution and disease causation predictions by five meta-predictors.AAS = Amino Acid Substitution, D = Deleterious/Damaging/Disease, T = Tolerated, §N=Neutral.
Prediction of amino acid conservation status, solvent accessibility, and TBX5 secondary structure
ConSurf provided position-specific amino acid conservation score of the TBX5 residues. It also ranked the amino acid residues in a color-coded scale of integers ranging from 1 to 9 where 1 indicates a highly variable and 9 indicates the most conserved residues. It was found that most of the substitutions are located in highly conserved positions. NetSurfP-2.0 predicted solvent accessibility (exposed or buried) of the amino acids and provided each residue's relative and total accessible surface area. 37 substitutions were found to occur in buried residues and 21 substitutions were found to be in exposed residues. Table 2 summarizes ConSurf and NetSurfP-2.0 outcomes. Fig. 3, Fig. 4 are graphical presentations of these results.
Table 2
Predictions by ConSurf and NetSurfP-2.0.
ConSurf
NetSurfP-2.0
Position
AA
Normalized Conservation Score
COLOR
RESIDUE VARIETY
Assigned Class
RSAa
ASAb (Å)
58
L
−0.784
8
T,L,M,H
Buried
0.068
12.397
72
T
−1.098
9
T, S
Buried
0.094
13.02
80
G
−0.858
8
N,H,G
Buried
0.184
14.447
84
F
−0.965
9
C, F
Buried
0.184
37.024
85
P
−0.917
9
G, V, P
Buried
0.094
13.32
91
V
−0.972
9
L,X,V,C,M
Buried
0.062
9.596
93
G
−0.889
8
K,G,R,X
Exposed
0.526
41.429
96
P
−0.715
8
S,A,K,P,E
Exposed
0.369
52.33
100
Y
−1.095
9
Y
Buried
0.068
14.536
104
M
−0.334
6
M,V,I,A,T
Buried
0.009
1.773
105
D
−1.138
9
D
Buried
0.174
25.002
107
V
−1.025
9
I, V
Buried
0.163
25.004
108
P
−0.651
8
A,T,S,Q,P
Buried
0.218
31.002
110
D
−1.138
9
D
Exposed
0.347
50.019
111
D
−0.774
8
E, D
Exposed
0.653
94.129
113
R
−1.137
9
R
Exposed
0.277
63.479
114
Y
−1.095
9
Y
Buried
0.064
13.575
121
W
−0.647
8
W,R,C
Buried
0.249
59.799
125
G
−0.861
8
G,P,S
Exposed
0.293
23.038
133
G
−0.794
8
P,R,G,K
Exposed
0.507
39.935
135
L
−0.875
9
M,L,P
Buried
0.216
39.513
137
V
−0.981
9
P,I,V,L
Exposed
0.292
44.873
140
D
−0.944
9
A,S,D
Exposed
0.559
80.603
142
P
−1.116
9
P
Exposed
0.297
42.08
143
A
−0.875
8
P,V,G,S,A
Buried
0.223
24.613
145
G
−0.901
9
D,G,T
Buried
0.042
3.271
153
V
−0.761
8
I, V, L
Buried
0.041
6.232
160
L
−0.81
8
R,P,T,L
Buried
0.034
6.174
170
H
−0.889
8
H,F,S,Q
Buried
0.165
29.977
174
N
−0.884
8
A,G,S,N,R
Exposed
0.279
40.854
186
V
−0.825
8
A,L,S,M,V,P
Buried
0.046
7.095
189
D
−0.802
8
H,P,A,S,D,T
Exposed
0.44
63.434
204
H
−0.657
8
H,R,F,Q,Y,A
Buried
0.141
25.641
205
V
−0.275
6
W,A,S,L,V,M,I
Exposed
0.44
67.687
208
E
−0.846
8
L,W,P,V,E
Exposed
0.283
49.385
213
A
−0.689
8
G,T,S,A,P,M
Buried
0.028
3.079
222
I
−0.944
9
I,V,N,W
Buried
0.044
8.056
223
T
−1.012
9
T, Q, N
Buried
0.159
22.092
226
K
−1.021
9
R, K
Buried
0.128
26.428
232
F
−0.808
8
N,Y,F,L
Exposed
0.4
80.298
237
R
−1.099
9
R
Exposed
0.545
124.801
261
S
−0.451
7
N,D,S,T
Exposed
0.424
49.711
323
C
−0.522
7
C,R,Y,L,S,G
Buried
0.247
34.65
337
P
1.6
1
G,T,D,L,S,Q,E,P,A,H
Exposed
0.578
81.958
375
R
−0.72
8
G,S,Q,R
Exposed
0.412
94.327
401
W
−1.02
9
W
Exposed
0.309
74.289
RSA = Relative Surface Accessibility.
ASA = Absolute Surface Accessibility.
Fig. 3
Detection of conserved amino acids by ConSurf. Here, e = an exposed residue, b = a buried residue, f = a functional residue (highly conserved and exposed), and s = a structural residue (highly conserved and buried).
Fig. 4
Solvent accessibility predictions of TBX5 residues. Here, Relative Surface Accessibility: Red upward elevations are exposed residues and sky-blue low elevations are buried residues, thresholded at 25%; Secondary Structures: orange spiral = helix, indigo arrow = strand, and pink straight line = coil; disorder: swollen black line, thickness of line corresponds to the probability of disorder. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Predictions by ConSurf and NetSurfP-2.0.RSA = Relative Surface Accessibility.ASA = Absolute Surface Accessibility.Detection of conserved amino acids by ConSurf. Here, e = an exposed residue, b = a buried residue, f = a functional residue (highly conserved and exposed), and s = a structural residue (highly conserved and buried).Solvent accessibility predictions of TBX5 residues. Here, Relative Surface Accessibility: Red upward elevations are exposed residues and sky-blue low elevations are buried residues, thresholded at 25%; Secondary Structures: orange spiral = helix, indigo arrow = strand, and pink straight line = coil; disorder: swollen black line, thickness of line corresponds to the probability of disorder. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)The amino acid substitutions were next analyzed by protein stability prediction tool, iStable 2.0. iStable 2.0 predicted 47 substitutions as destabilizing (Table 3).
Table 3
Prediction of stability of the mutated proteins by iStable 2.0, DynaMut, and Missense3D.
Cavity altered (expansion of the cavity volume by 113.616 Å^3)
C323Y
−0.45005268
Increase
0.91
Stabilizing
Yes
Clash alert
P337H
−0.93664145
Decrease
0.614
Stabilizing
No
R375W
−0.0087457895
Increase
−0.659
Destabilizing
No
W401C
−0.64045143
Decrease
1.108
Stabilizing
Yes
Cavity altered (expansion of the cavity volume by 103.68 Å^3)
Prediction of stability of the mutated proteins by iStable 2.0, DynaMut, and Missense3D.The amino acid substitutions were subsequently analyzed by Missense3D to see how these amino acid substitutions would affect protein tertiary structure. 15 substituted structures were predicted to be ‘Damaging’ by Missense3D (Table 3). Details of Missense3D predictions can be found in Supplementary Figs. 3–17. DynaMut predicted 37 substitutions as destabilizing for the TBX5 protein. Among the 15 substitutions predicted to be detrimental by Misssense3D, 13 were also predicted as damaging by either iStable2.0 or DynaMut, and 5 were predicted to be destabilizing by both (Fig. 5). Table 3 provides predictions of these tools.
Fig. 5
Close-up views of the 5 substitutions predicted to be damaging by iStable 2.0, Missense3D, and DynaMut. These 5 substitutions are: A) P85S, B) W121R, C) V153D, D) E208D, and E) I222N. Native amino acids are shown in green and mutated amino acids are shown in red in this figure. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Close-up views of the 5 substitutions predicted to be damaging by iStable 2.0, Missense3D, and DynaMut. These 5 substitutions are: A) P85S, B) W121R, C) V153D, D) E208D, and E) I222N. Native amino acids are shown in green and mutated amino acids are shown in red in this figure. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Molecular dynamics simulation
Differences between the backbone atom RMSD profiles of the native (5BQD_A) and mutant structures are visible in Fig. 6 (A). RMSD profiles of most mutant structures (P85S, W121R, V153D, and E208D) showed more deviations than the native structure. RMSD of these mutants showed an upward pattern from approximately 50 ns, whereas RMSD of 5BQD_A remained uniform from 15 ns onward. From the RMSF profiles (Fig. 6 (B)), it can be found that residue-wise RMSF values of the mutant structures were greater than the native structure for most residues. Greater deviations in the RMSD and RMSF profiles support more flexible nature of these TBX5 mutants. However, RMSD and RMSF profiles of I222N were found more stable than 5BQD_A.
Fig. 6
(A) RMSD values of the backbone atoms of native and mutant TBX5 proteins, (B) Residue-wise RMSF values of native and mutant TBX5 proteins, (C) radius of gyration values, (D) SASA values, and (E) number of hydrogen bonds of native and mutant TBX5 proteins.
(A) RMSD values of the backbone atoms of native and mutant TBX5 proteins, (B) Residue-wise RMSF values of native and mutant TBX5 proteins, (C) radius of gyration values, (D) SASA values, and (E) number of hydrogen bonds of native and mutant TBX5 proteins.The radius of gyration (Rg) profile of 5BQD_A remained regular during the simulation period (Fig. 6 (C)). All the mutants showed increased Rg values than 5BQD_A up to 50 ns. Then, the Rg values of W121R and E208D decreased than 5BQD_A but became fluctuating, especially of W121R. V153D and I222N maintained a higher Rg profile than 5BQD_A throughout the simulation period. Rg profile reflects the compactness of a protein. From Fig. 6(C), it can be deduced that none of the mutants would be as stable as the native protein.SASA values of P85S, W121R, and I222N were greater than 5BQD_A for most of the time, and V153D and E208D had lower SASA than 5BQD_A. A lower than normal SASA may prevent effective contact between TBX5 and DNA, and thereby interfere with its activity as a transcription factor.E208D and I222N had more hydrogen bonds present than 5BQD_A during the simulation (Fig. 6 (E)). This may be the cause of stable RMSD and RMSF of I222N and decreased SASA and Rg values of E208D. Other mutants had hydrogen bonds more or less similar to 5BQD_A.The above-mentioned findings from molecular dynamics simulations suggest TBX5 mutant proteins bear the possibility to hamper the biological activities of the native protein, because these mutants become either more flexible or rigid compared to the native state.Among the 58 high-confidence deleterious SNPs, AWESOME predicted P108T, D111Y, and A143T substitutions could create new phosphorylation sites (AWESOME scores are −0.711, −1.458, and −0.05, respectively), and S261C substitution can cause loss of a phosphorylation site (AWESOME score 2.752). R237P, R237Q, R237W, and R375W can cause loss of methylation sites (AWESOME scores are 0.834, 0.834, 0.834, and 0.768, respectively). S261C substitution can cause loss of serine mediated O-linked N-acetylgalactosamine (AWESOME score 0.8291) and O-linked N-acetylglucosamine formation (AWESOME score 0.4322). P108T and A143T substitutions can cause threonine mediated O-linked N-acetylglucosamine formation (AWESOME scores are −0.3507 and −0.354, respectively).
Evidence of deleteriousness from experiments
Among our identified 58 missense SNPs, G80R [76], G125R [77], T223 M [78], R237P [79], R237Q, and R237W [76] have been confirmed experimentally to be responsible for HOS. G80R, R237P, R237Q, and R237W cause loss of function, whereas G125R leads to gain of function [79].
Identification and analyzing nonsense and stop lost SNPs
2 stop gained (nonsense) and 1 stop lost SNP were analyzed by PredictSNP2. It predicted the nonsense mutations as deleterious and the stop lost mutation as neutral. Q156Ter nonsense mutation is significant because it disrupts the highly conserved T-box (amino acid position 58–238). Table 4 shows the results from PredictSNP2 server.
Table 4
Predictions about nonsense and stop lost SNPs by PredictSNP2.
Variant
PredictSNP2
CADD
DANN
FATHMM
FunSeq2
GWAVA
P
S
P
S
P
S
P
S
P
S
P
S
W514Ter
Del
0.6777
Del
46
Del
0.9959
Del
0.9623
Del
5
?
0.39
Q156Ter
Del
1
Del
38
Del
0.9984
Del
0.9167
Del
5
Del
0.5
Ter519Q
Neu
−0.1146
Neu
14.06
Neu
0.758
Del
0.9717
Neu
0
Del
0.31
P= Prediction, S= Score, Del = Deleterious, and Neu = Neutral.
Predictions about nonsense and stop lost SNPs by PredictSNP2.P= Prediction, S= Score, Del = Deleterious, and Neu = Neutral.UTR SNPs that can affect transcriptional motifs were retrieved from the UTRdb server. Three SNPs in the 3′ UTR (rs28730760, rs28730761, and rs883079) were found to be present in the polyadenylation (poly-A) sites, and hence may be pathogenic. In addition, UTRdb output returned six transcriptional motif matches in the 5′ UTR and three transcriptional motif matches in the 3′ UTR. Table 5 provides results from UTRdb.
Table 5
Transcriptional motifs present in TBX5 untranslated regions.
Signal Name
UTR region
Match total
NCBI Reference Sequence
Genomic Position (NCBI36/hg18)
Uorf (Upstream Open Reading Frame)
5′
1
NM_181486.4
Chr12:113328041–113328106
2
NM_000192.3
Chr12:113330378–113330629
Chr12:113330136–113330216
2
NM_080717.3
Chr12:113330378–113330629
Chr12:113330136–113330216
IRES (Internal Ribosome Entry Site)
5′
1
NM_000192.3
Chr12:113326087–113330055
CPE (Cytoplasmic polyadenylation element)
3′
1
NM_181486.4
Chr12:113276118–113276332
K-Box
3′
1
NM_181486.5
Chr12:113276848–113276855
PAS (Polyadenylation Signal)
3′
1
NM_181486.6
Chr12:113276118–113276144
Transcriptional motifs present in TBX5 untranslated regions.3′ UTR SNPs that might create or break an miRNA target site, or enhance or decrease miRNA binding to mRNA were predicted by MirSNP. A total number of 86 SNPs in the 3′ untranslated region of the TBX5 gene were predicted to alter miRNA-mRNA binding. Supplementary Table 3 shows the results from MirSNP server.Human Splicing Finder (HSF) web tool was employed to predict the effects of SNPs located in the 5′ and 3′ splice sites. A total number of 11 splice site SNPs were subjected to assessment. All were found to have ‘probably no impact on splicing’ by HSF. Table 6 shows the predictions obtained from HSF.
Table 6
Predictions about splice site SNPs by HSF.
rsIDs
Alleles
Predicted Signal
Interpretation
Splice Acceptor Variant
rs1555226330
C > A
No significant splicing motif alteration detected
This mutation has probably no impact on splicing
rs1031873727
G > A
ESS Site broken
Alteration of an intronic ESS site, Probably no impact on splicing
rs1031873727
G > T
ESS Site broken
Alteration of an intronic ESS site, Probably no impact on splicing
rs891464399
G > A
New ESE Site
Creation of an intronic ESE site, Probably no impact on splicing
rs958951320
C > T
No significant splicing motif alteration detected
This mutation has probably no impact on splicing
rs181078973
A > C
No significant splicing motif alteration detected
This mutation has probably no impact on splicing
rs181078973
A > G
No significant splicing motif alteration detected
This mutation has probably no impact on splicing
rs374919751
C > G
New ESE Site
Creation of an intronic ESE site, Probably no impact on splicing
rs374919751
C > T
No significant splicing motif alteration detected
This mutation has probably no impact on splicing
rs1288475235
C > A
No significant splicing motif alteration detected
This mutation has probably no impact on splicing
rs1015550731
T > C
ESS Site broken
Alteration of an intronic ESS site, Probably no impact on splicing
Predictions about splice site SNPs by HSF.19 among all the TBX5 intron variants were predicted to affect transcription factor binding sites by FuncPred. These 19 SNPs were further analyzed by RegulomeDB 2.0 server, and only two among these SNPs (rs12827969 and rs61931002) were considered functionally important. Table 7 summarizes results from SNP FuncPred and RegulomeDB.
Table 7
Predictions regarding deep intronic SNPs.
SNP FuncPred
RegulomeDB
rsID
Chromosome
Position
Allele
Regulatory Potential
Conservation
Rank
Probability
rs11067100
12
113328237
G/T
0.271604
0.998
4
0.60906
rs11067101
12
113328586
A/G
0.186436
0
4
0.60906
rs11613605
12
113326220
C/G
0.099303
0
4
0.60906
rs11837917
12
113329990
G/T
0.096837
0
4
0.60906
rs12319868
12
113329396
G/T
0.150806
0
4
0.60906
rs12372585
12
113330591
A/G
0.225674
0.19
4
0.60906
rs12423887
12
113327989
C/T
0.410483
0.025
4
0.60906
rs1248046
12
113329889
T/C
0
0
4
0.60906
rs12827969
12
113326517
A/T
0.181987
1
2b
0.80975
rs1522368
12
113329052
A/G
0.148659
0.997
3a
0.6893
rs1522369
12
113329156
A/G
0.076716
0
4
0.60906
rs2113436
12
113326379
G/A
0.469676
0.006
4
0.60906
rs35203448
12
113328816
A/G
0.113463
0.5
4
0.60906
rs3782467
12
113327309
C/G
0.000319
0.001
4
0.60906
rs4553413
12
113328827
A/G
0.0709
0.567
4
0.60906
rs57820630
12
113330823
C/T
NA
NA
4
0.60906
rs61931002
12
113326665
A/G
NA
NA
2b
0.64343
rs736560
12
113327637
A/C
0.393482
0
4
0.60906
rs7957609
12
113331519
A/G
0.316415
0.022
4
0.60906
Predictions regarding deep intronic SNPs.RegulomeDB ranks: 1a-1f: likely to affect transcription factor binding and linked to expression of a gene target, 2a-2c: likely to affect binding, 3a-3b: less likely to affect binding, 4,5,6: minimal binding evidence.Due to the unavailability of a TBX5 full-length crystal structure, full-length 3D models of TBX5 were created employing three different protein structure prediction tools which were later energy minimized and refined. Supplementary Fig. 1 shows cartoon presentations of the energy minimized and refined models.3D Model evaluation is indispensable for checking the credibility of a generated model. PROCHECK, Verify3D, and SWISS-MODEL Structure Assessment Tool assessed the 3D models generated in the current study. Ramachandran plot is a reliable qualitative indicator of protein structure [70]. PROCHECK calculated Ramachandran plots for the created models. A model is considered reasonable by PROCHECK if >90% of the residues are present in the most favored regions of Ramachandran plot. Only one model (generated by Robetta and later energy minimized and refined) was found to fulfill this criterion. It has 94.5% of residues in the most favored regions. Supplementary Fig. 2 shows the Ramachandran plots calculated by PROCHECK. Verify3D is another tool that approves a model if its 80% or more amino acid residues score ≥0.2 in the 3D/1D profile. However, none of the models could fulfill this requirement. MolProbity score and QMEAN Z-score of our models were obtained from SWISS-MODEL Structure Assessment Tool. A low MolProbity score and a high QMEAN Z-score are preferred. A negative QMEAN Z-score indicates instability in protein structure [71]. All of our models have a negative QMEAN Z-score. This disagreement can be explained by the fact that presently available ab initio protein modeling tools predict poorly about the proteins having more than 120 residues [80], and for our protein models, 279 residues (5BQD-A chain extends from 1 to 239 residues only and the full-length protein has 518 residues) do not have any template. However, among the models, energy minimized and refined Robetta model has the lowest MolProbity score and the least negative QMEAN Z-score. Considering Ramachandran plot, MolProbity scores, and QMEAN Z-scores, it can be decided that Robetta outperformed two other protein structure prediction tools used in this study. Supplementary Table 4 summarizes the results obtained from PROCHECK, Verify3D, and SWISS-MODEL Structure Assessment Tool, and Supplementary Fig. 2 shows the Ramachandran plots obtained from PROCHECK.
Discussion
Recently bioinformatics approaches have become popular for identifying different human gene SNPs and predicting their structural and functional consequences. In the past few years, damaging SNPs of CCR6 gene [81], SMPX gene [82], human aldehyde oxidase gene [83], PTEN gene [84], folate pathway genes [85], human bone morphogenetic protein receptor type 1A gene [86], SKP2 gene [87], human adiponectin receptor 2 gene [88], MMP9 gene [89], etc. have been identified with the help of different computational biology tools. With the rapid development of genomics, the number of reported SNPs in different databases is continuously increasing for the last few years. However, determining the SNPs that result in diseases is challenging. Various pathogenicity prediction tools can narrow this number to a list of high-confidence deleterious SNPs [90].TBX5 gene is an essential regulator of cardiac and upper limb development, and mutations in this gene are responsible for Holt-Oram syndrome, especially missense mutations. It is estimated that 1 in 100,000 neonates are born with Holt-Oram syndrome, and male and female children are affected equally. Since HOS is an autosomal dominant inheritance, it is imperative that all individuals with HOS and their parents and siblings get genetic counseling [91]. In the past years, some mutations, including SNPs have been reported in association with the TBX5 gene. However, most of the SNPs of this gene have not been characterized yet. In this study, we have tried to fill this gap.In our current study, we have analyzed all currently available TBX5 SNPs. Since results from a single bioinformatics tool may be inconclusive, we have appointed 9 highly reliable prediction tools for identifying deleterious missense SNPs. The 58 deleterious missense SNPs that were initially identified by concordance among PROVEAN, SIFT, and PolyPhen-2 were also found to be damaging by 6 other tools (PredictSNP1, PredictSNP2, MetaLR, MetaSVM, and REVEL) with few exceptions. Among these 58 missense SNPs, 5 missense SNPs (G80R, G125R, T223 M, R237P, R237Q, and R237W) have been proven to be responsible for HOS [[76], [77], [78], [79]].The amino acid conservation analysis found that 55 among the 58 missense SNPs cause amino acid substitution in highly conserved positions (ConSurf score 7–9). This further corroborates the importance of identified SNPs. Furthermore, 8 of these 58 missense SNPs (P108T, D111Y, A143T, R237P, R237Q, R237W, S261C, and R375W) were found to have the ability to modify post-translational sites.We have also analyzed the effects of these 58 SNPs on the stability of native TBX5 protein structure. 15 SNPs were predicted to cause structural damage by Missense3D. G80R, T72K, and V153D substitutions introduce buried charges. V153D and I22 N mutations substitute buried hydrophobic residues with hydrophilic residues. G80R and G145W replace buried glycines; P85L, P85S, and P142S replace cis prolines, D105 N breaks a buried salt bridge, and E208D disrupts a side-chain/side-chain H-bond and a side-chain/main-chain H-bond. W121R, S261C, and W401C cause an expansion of the internal protein cavity, and thus make the protein structure unstable. G80R, G145W, and C323Y substitutions trigger clash alerts. These three substitutions lead to a MolProbity clash score ≥30, and the increase in clash score is >18 when compared to the wild-type residues. G80R and G125R mutations induce disallowed phi/psi alerts. 5 of these 15 substitutions (P85S, W121R, V153D, E208D, and I222N) were predicted to destabilize the native protein structure by the consensus of Missense3D, DynaMut, and iStable 2.0. This finding was further supported by MD simulation results. These 5 SNPs [rs1565941579 (P85S), rs1269970792 (W121R), rs772248871 (V153D), rs769113870 (E208D), and rs1318021626 (I222N)], which have not yet been reported, are most likely to cause HOS.We have also employed 6 other tools to identify deleterious non-coding SNPs in the TBX5 gene. 86 SNPs in the 3′ UTR of the TBX5 gene were identified that can affect miRNA-mRNA binding. No significant SNP was identified in the splice sites of TBX5, and 2 deep intronic SNPs were identified to affect transcription factor binding and gene expression.Due to the unavailability of a full-size experimental TBX5 protein structure, TBX5 3D models were also predicted employing I-TASSER, Phyre2, and Robetta. Later, the 3D structures were evaluated by PROCHECK, Verify3D, and SWISS-MODEL Structure Assessment Tool. A model is considered good if >90% of its residues are present in the most favored regions of the Ramachandran plot. In the case of our TBX5 models, only Robetta generated model passed this requirement. On the other hand, verify3D results and QMEAN Z-scores indicate poor quality of the generated models. This can be attributed to the inability of the available ab initio protein modeling tools to predict the correct structure of proteins having more than 120 residues [80]. These template-less regions are responsible for the deviations of our models from an ideal one.
Conclusion
This extensive in silico study attempts to find the most damaging TBX5 SNPs. Our obtained results can guide further wet-lab studies of TBX5 gene-related diseases and may potentiate finding cure in the future.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Availability of data and material
All data generated or analyzed during this study are included in this paper and the supplementary file.
Funding
This work did not receive any funding.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Authors: Christopher J Williams; Jeffrey J Headd; Nigel W Moriarty; Michael G Prisant; Lizbeth L Videau; Lindsay N Deis; Vishal Verma; Daniel A Keedy; Bradley J Hintze; Vincent B Chen; Swati Jain; Steven M Lewis; W Bryan Arendall; Jack Snoeyink; Paul D Adams; Simon C Lovell; Jane S Richardson; David C Richardson Journal: Protein Sci Date: 2017-11-27 Impact factor: 6.725
Authors: Harry C Jubb; Arun P Pandurangan; Meghan A Turner; Bernardo Ochoa-Montaño; Tom L Blundell; David B Ascher Journal: Prog Biophys Mol Biol Date: 2016-11-29 Impact factor: 3.667
Authors: Jaroslav Bendl; Miloš Musil; Jan Štourač; Jaroslav Zendulka; Jiří Damborský; Jan Brezovský Journal: PLoS Comput Biol Date: 2016-05-25 Impact factor: 4.475
Authors: Vikas Pejaver; Jorge Urresti; Jose Lugo-Martinez; Kymberleigh A Pagel; Guan Ning Lin; Hyun-Jun Nam; Matthew Mort; David N Cooper; Jonathan Sebat; Lilia M Iakoucheva; Sean D Mooney; Predrag Radivojac Journal: Nat Commun Date: 2020-11-20 Impact factor: 14.919