Investigation of mutations in the HBB gene using the 1,000 genomes database.

Tânia Carlice-Dos-Reis^1, Jaime Viana^1,2, Fabiano Cordeiro Moreira^1,3, Greice de Lemos Cardoso^1, João Guerreiro^1, Sidney Santos^1,3, Ândrea Ribeiro-Dos-Santos^1,3.

Abstract

Mutations in the HBB gene are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Sickle cell anemia is one of the most common monogenic diseases worldwide. Due to its prevalence, diverse strategies have been developed for a better understanding of its molecular mechanisms. In silico analysis has been increasingly used to investigate the genotype-phenotype relationship of many diseases, and the sequences of healthy individuals deposited in the 1,000 Genomes database appear to be an excellent tool for such analysis. The objective of this study is to analyze the variations in the HBB gene in the 1,000 Genomes database, to describe the mutation frequencies in the different population groups, and to investigate the pattern of pathogenicity. The computational tool SNPEFF was used to align the data from 2,504 samples of the 1,000 Genomes database with the HG19 genome reference. The pathogenicity of each amino acid change was investigated using the databases CLINVAR, dbSNP and HbVar and five different predictors. Twenty different mutations were found in 209 healthy individuals. The African group had the highest number of individuals with mutations, and the European group had the lowest number. Thus, it is concluded that approximately 8.3% of phenotypically healthy individuals from the 1,000 Genomes database have some mutation in the HBB gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002. In total, 193 subjects had a non-synonymous mutation, of whom 186 (7.4% of all samples) had a deleterious mutation. Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation.

Substances: beta-Globins

Year: 2017 PMID: 28379995 PMCID: PMC5381778 DOI: 10.1371/journal.pone.0174637

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

1. Introduction

Understanding the relationship between phenotype and genotype in the clinical setting is one of the main objectives of traditional research [1]. However, studies on a large number of mutations are problematic, primarily because of the experimental analyses required. In contrast, in silico analysis is faster and easier to execute, yields more results, and costs less, thus making it more efficient. This type of analysis is based on alterations in the sequences of nucleotides and/or amino acids and their comparison with the native sequence to correlate the effect of these alterations with the phenotype of the individual [1,2,3,4]. Mutations in the HBB gene, which is located on chromosome 11p15.5 [5], are responsible for several serious hemoglobinopathies, such as sickle cell anemia and β-thalassemia. Hemoglobinopathies are a set of hereditary diseases caused by the abnormal structure or insufficient production of hemoglobin. Sickle cell anemia and β-thalassemia can lead to serious anemia and other life-threatening conditions [6]. Sickle cell anemia is one of the most common monogenic diseases worldwide. It is estimated that 312,000 people are born with sickle cell anemia every year, and the majority of these individuals are native to Sub-Saharan Africa [7]. Thus, it is important for the public healthcare system to detect heterozygous carriers of hemoglobinopathies, as they can produce homozygous and double heterozygous individuals with serious clinical conditions [8]. The 1,000 Genomes Project is an international consortium organized with the objective of sequencing a large number of individual genomes representative of the world's population. The consortium has the objective of better characterizing the sequence variation of the human genome and enabling the investigation of the relationship between genotype and phenotype.
Thus, the 1,000 Genomes Project enables a more precise study of variants in genome-wide association studies (GWAS) and better localization of variants associated with diseases in different population groups [9]. The objective of this study is to track variations in the β-globin gene (HBB); to describe the frequencies of mutations in different population groups using the 1,000 Genomes databank, which provides a comprehensive resource of human genetic variation [9] relative to the HG19 reference genome [10]; and to investigate the pattern of resulting pathogenicity.

2. Methodology

To perform this study, data from 2,504 samples deposited in the 1,000 Genomes database were used; these open-access sequences were aligned with the HG19 reference genome using the SNPEFF tool [11]. This program annotates and records the effects of genetic variations, including the resulting amino acid alterations. The resulting data were visualized in the Integrative Genomics Viewer (IGV) [12], a high-performance visualization tool for the interactive exploration of genomic datasets. The mutations were tracked at the nucleotide and amino acid levels, and the population frequencies with which these mutations occur, the type of mutation, and the respective positions were recorded. To investigate the pathogenicity of these mutations, five different prediction tools (POLYPHEN [13], SIFT [14], PROVEAN [15], PANTHER [16], and MUTPRED [17]) and three databanks (CLINVAR [18], dbSNP [19] and HbVar [20]) were used, as shown in Fig 1.

Fig 1

Alignment of the 1,000 Genomes and HG19 sequences of HBB using the SNPEFF tool; predictors and databases used for the investigation of pathogenic mutations.
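The carrier counts used throughout the tables come from per-sample genotypes. As a minimal sketch of that tallying step, the function below counts heterozygous and homozygous-alternate samples in one biallelic VCF record. The function name `count_carriers` and the four-sample record are hypothetical and only illustrate the standard VCF column layout; this is not the authors' pipeline.

```python
def count_carriers(vcf_line):
    """Return (heterozygous, homozygous_alt) sample counts for one biallelic VCF record."""
    fields = vcf_line.rstrip("\n").split("\t")
    # Column 9 (index 8) is FORMAT; sample columns start at index 9.
    gt_index = fields[8].split(":").index("GT")
    het = hom_alt = 0
    for sample in fields[9:]:
        genotype = sample.split(":")[gt_index]
        # Phased (|) and unphased (/) genotypes are treated the same way.
        alleles = genotype.replace("|", "/").split("/")
        alt_copies = sum(1 for allele in alleles if allele == "1")
        if alt_copies == 1:
            het += 1
        elif alt_copies == 2:
            hom_alt += 1
    return het, hom_alt


# Hypothetical record with four samples (not real 1,000 Genomes data).
EXAMPLE_RECORD = "\t".join(
    ["11", "5248232", "rs334", "T", "A", "100", "PASS", ".", "GT",
     "0|0", "0|1", "1|0", "0|0"]
)

print(count_carriers(EXAMPLE_RECORD))
```

In the hypothetical record, two samples carry one copy of the alternate allele and none carry two.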

Each predictor uses distinct characteristics to determine the effect of the mutations in relation to the information obtained regarding the structure and function of the protein. It is important to highlight that the results of all predictors provide additional evidence of pathogenicity; thus, five predictors were analyzed to improve accuracy. The determination of the pathogenicity of each mutation is based on four pieces of evidence: (i) CLINVAR, (ii) dbSNP, (iii) HbVar, and (iv) predictors. Tables 1, 2 and 3 present the following results of the alignment of sequences from 2,504 samples: (1) the positions in the genome; (2) the identification of the single nucleotide polymorphism (SNP) of each mutation; (3) the types of mutations; (4) the mutations observed at the nucleotide level; (5) the respective consequences at the amino acid level; (6) the population frequency of each mutation; and (7) the pathogenicity investigated for each mutation.

Table 1

Position and SNP ID of the mutations observed at the nucleotide level, the respective consequences at the amino acid level, the types of mutations, and the number of individuals.

Position	SNP ID	Nucleotide change	AA alteration	Type of mutation	N° Individuals	Ref.
5246840	rs36020563	G/A	His144His	Synonymous	1	[21]
5246870	rs113082294	C/G	Val134Val	Synonymous	9	[22]
5246883	rs111645889	G/A	Ala130Val	Missense	1	[23]
5246890	rs33971634	G/A	Gln128	Stop gained	1	[24]
5246908	rs33946267	C/G	Glu122Gln	Missense	3	[25]
5246947	rs33958637	T/G	Asn109His	Missense	1	[26]
5246948	rs193922562	G/A	Gly108Gly	Synonymous	1	[27]
5247876	rs145669504	G/T	Leu82Leu	Synonymous	5	[28]
5247992–5247996	rs281864900	CAAAG/C	Phe42fs	Frameshift	5	[29]
5248004	rs11549407	G/A	Gln40	Stop gained	1	[30]
5248029	rs1135071	C/A	Arg31Ser	Splice region and missense	1	[31]
5248030	rs33943001	C/G	#	Splice acceptor and intron variant	1	[32]
5248159	rs33971440	C/T	#	Splice donor and intron variant	1	[33]
5248162	rs35578002	G/T	Gly30Gly	Splice region and synonymous variant	1	[34]
5248173	rs33950507	C/T	Glu27Lys	Missense	14	[35]
5248200	rs33986703	T/A	Lys18	Stop gained	6	[36]
5248205	rs63750783	C/T	Trp16	Stop gained	2	[37]
5248232	rs334	T/A	Glu7Val	Missense	137	[38]
5248233	rs33930165	C/T	Glu7Lys	Missense	17	[39]
5248236	rs33912272	G/A	Pro6Ser	Missense	1	[40]

# Intronic variant mutations

Table 2

SNP ID, nucleotide and Amino Acid changes, number of individuals and population frequency of each mutation.

SNP ID	Nucleotide change	Amino acid change	Total individuals	N°/Freq AFR	N°/Freq AMR	N°/Freq EAS	N°/Freq EUR	N°/Freq SAS	Total allele frequency
rs36020563	G/A	His144His	1	1 (0.0008)	0	0	0	0	0.00019
rs113082294	C/G	Val134Val	9	0	2 (0.0029)	0	7 (0.007)	0	0.00179
rs111645889	G/A	Ala130Val	1	1 (0.0008)	0	0	0	0	0.00019
rs33971634	G/A	Gln128	1	0	1 (0.0014)	0	0	0	0.00019
rs33946267	C/G	Glu122Gln	3	0	0	0	0	3 (0.0031)	0.00059
rs33958637	T/G	Asn109His	1	0	0	1 (0.001)	0	0	0.00019
rs193922562	G/A	Gly108Gly	1	1 (0.0008)	0	0	0	0	0.00019
rs145669504	G/T	Leu82Leu	5	0	0	5 (0.005)	0	0	0.00099
rs281864900	CAAAG/C	Phe42fs	5	0	0	5 (0.005)	0	0	0.00099
rs11549407	G/A	Gln40	1	0	1 (0.0014)	0	0	0	0.00019
rs1135071	C/A	Arg31Ser	1	0	0	0	1 (0.001)	0	0.00019
rs33943001	C/G	#	1	0	0	0	0	1 (0.001)	0.00019
rs33971440	C/T	#	1	0	1 (0.0014)	0	0	0	0.00019
rs35578002	G/T	Gly30Gly	1	1 (0.0008)	0	0	0	0	0.00019
rs33950507	C/T	Glu27Lys	14	0	0	8 (0.0079)	0	6 (0.0061)	0.00279
rs33986703	T/A	Lys18	6	0	0	6 (0.006)	0	0	0.00119
rs63750783	C/T	Trp16	2	0	0	0	0	2 (0.002)	0.00039
rs334	T/A	Glu7Val	137	132 (0.0998)	5 (0.0072)	0	0	0	0.02735
rs33930165	C/T	Glu7Lys	17	17 (0.0129)	0	0	0	0	0.00339
rs33912272	G/A	Pro6Ser	1	0	0	0	1 (0.001)	0	0.00019

AFR: African; AMR: American; EAS: Eastern Asian; EUR: European; SAS: Southern Asian.
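The frequencies in Table 2 can be checked from the carrier counts. The sketch below assumes that every carrier is heterozygous (as the Results state) and that the super-population sample sizes are those of 1,000 Genomes phase 3 (AFR 661, AMR 347, EAS 504, EUR 503, SAS 489). These sizes are not given in the article and are an assumption here; they sum to the 2,504 samples analyzed. All function names are illustrative.

```python
# ASSUMPTION: phase 3 super-population sizes; not stated in the article.
POP_SIZES = {"AFR": 661, "AMR": 347, "EAS": 504, "EUR": 503, "SAS": 489}

# Heterozygous carrier counts per variant and population, copied from Table 2.
CARRIERS = {
    "rs36020563": {"AFR": 1},
    "rs113082294": {"AMR": 2, "EUR": 7},
    "rs111645889": {"AFR": 1},
    "rs33971634": {"AMR": 1},
    "rs33946267": {"SAS": 3},
    "rs33958637": {"EAS": 1},
    "rs193922562": {"AFR": 1},
    "rs145669504": {"EAS": 5},
    "rs281864900": {"EAS": 5},
    "rs11549407": {"AMR": 1},
    "rs1135071": {"EUR": 1},
    "rs33943001": {"SAS": 1},
    "rs33971440": {"AMR": 1},
    "rs35578002": {"AFR": 1},
    "rs33950507": {"EAS": 8, "SAS": 6},
    "rs33986703": {"EAS": 6},
    "rs63750783": {"SAS": 2},
    "rs334": {"AFR": 132, "AMR": 5},
    "rs33930165": {"AFR": 17},
    "rs33912272": {"EUR": 1},
}


def allele_frequency(carriers, n_individuals):
    """Variant allele frequency when each carrier has exactly one copy."""
    return carriers / (2 * n_individuals)


def population_frequencies(rsid):
    """Allele frequency of one variant within each population that carries it."""
    return {pop: allele_frequency(n, POP_SIZES[pop])
            for pop, n in CARRIERS[rsid].items()}


def total_frequency(rsid):
    """Allele frequency of one variant across all 2,504 samples."""
    return allele_frequency(sum(CARRIERS[rsid].values()), sum(POP_SIZES.values()))


def carriers_by_population():
    """Total number of carriers of any HBB variant in each population."""
    totals = {pop: 0 for pop in POP_SIZES}
    for counts in CARRIERS.values():
        for pop, n in counts.items():
            totals[pop] += n
    return totals
```

Under these assumptions, `population_frequencies("rs334")` gives about 0.0998 for AFR (132 carriers) and 0.0072 for AMR (5 carriers). `carriers_by_population()` returns 153 African and 9 European carriers out of 209 in total.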

Table 3

SNP ID; nucleotide alteration; amino acid alteration; total number of individuals; list of the results from CLINVAR, dbSNP, HbVar, POLYPHEN, PROVEAN, SIFT, PANTHER, and MUTPRED; and final analysis of pathogenicity.

SNP ID	Nucleotide change	Amino acid change	Total individuals	CLINVAR	dbSNP (NCBI)	HbVar	POLYPHEN	PROVEAN	SIFT	PANTHER	MUTPRED	Conclusion pathogenicity
rs111645889	G/A	Ala130Val	1	Other	Other	Benign	Benign	Damaging	Damaging	Damaging	Damaging	Conflict
rs33971634	G/A	Gln128	1	Damaging	Other	Damaging	*	Damaging	*	*	*	Damaging
rs33946267	C/G	Glu122Gln	3	Damaging	Damaging	Benign	Benign	Benign	Damaging	Benign	Damaging	Conflict
rs33958637	T/G	Asn109His	1	Other	*	Benign	Probably damaging	Damaging	Damaging	Benign	Damaging	Conflict
rs281864900	CAAAG/C	Phe42fs	5	Damaging	Damaging	Damaging	*	Damaging	*	*	*	Damaging
rs11549407	G/A	Gln40	1	Damaging	Damaging	Damaging	*	*	*	*	*	Damaging
rs1135071	C/A	Arg31Ser	1	Damaging	Damaging	Benign	Probably damaging	Damaging	Damaging	Damaging	Damaging	Damaging
rs33943001	C/G	#	1	Damaging	Damaging	Damaging	*	*	*	*	*	Damaging
rs33971440	C/T	#	1	Damaging	Damaging	Damaging	*	*	*	*	*	Damaging
rs35578002	G/T	Gly30Gly	1	*	*	Damaging	Benign	Benign	Benign	Benign	Benign	Conflict
rs33950507	C/T	Glu27Lys	14	Damaging	Damaging	Damaging	Benign	Damaging	Damaging	Damaging	Damaging	Damaging
rs33986703	T/A	Lys18	6	Damaging	Damaging	Damaging	*	Damaging	*	*	*	Damaging
rs63750783	C/T	Trp16	2	Damaging	Damaging	Damaging	*	Damaging	*	*	*	Damaging
rs334	T/A	Glu7Val	137	Damaging	Damaging	Damaging	Benign	Damaging	Damaging	*	Damaging	Damaging
rs33930165	C/T	Glu7Lys	17	Damaging	Damaging	Damaging	Benign	Damaging	Damaging	*	Damaging	Damaging
rs33912272	G/A	Pro6Ser	1	Other	Other	Benign	Benign	Benign	Benign	*	Damaging	Conflict

* Could not be evaluated

# Intronic variant mutations
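The final column of Table 3 can be summarized by counting variants and carriers per conclusion. The sketch below is illustrative only: the rows are copied from the "Total individuals" and "Conclusion pathogenicity" columns, the Ala130Val variant is written as rs111645889 (as in Tables 1 and 2), and the function name `tally_conclusions` is hypothetical.

```python
from collections import defaultdict

# (SNP ID, carriers, conclusion) copied from Table 3.
# The Ala130Val variant is written as rs111645889, as in Tables 1 and 2.
TABLE3 = [
    ("rs111645889", 1, "Conflict"),
    ("rs33971634", 1, "Damaging"),
    ("rs33946267", 3, "Conflict"),
    ("rs33958637", 1, "Conflict"),
    ("rs281864900", 5, "Damaging"),
    ("rs11549407", 1, "Damaging"),
    ("rs1135071", 1, "Damaging"),
    ("rs33943001", 1, "Damaging"),
    ("rs33971440", 1, "Damaging"),
    ("rs35578002", 1, "Conflict"),
    ("rs33950507", 14, "Damaging"),
    ("rs33986703", 6, "Damaging"),
    ("rs63750783", 2, "Damaging"),
    ("rs334", 137, "Damaging"),
    ("rs33930165", 17, "Damaging"),
    ("rs33912272", 1, "Conflict"),
]


def tally_conclusions(rows):
    """Map each conclusion to (number of variants, number of carriers)."""
    variants = defaultdict(int)
    carriers = defaultdict(int)
    for _rsid, n_carriers, conclusion in rows:
        variants[conclusion] += 1
        carriers[conclusion] += n_carriers
    return {c: (variants[c], carriers[c]) for c in variants}


summary = tally_conclusions(TABLE3)
print(summary)
```

The tally gives 11 variants in 186 carriers classified as damaging and 5 variants in 7 carriers with conflicting evidence, for 193 carriers of non-synonymous variants overall.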


3. Results

A total of 20 different mutations were identified in 209 individuals (8.34%) in the coding region of the HBB gene. The variants observed were classified as follows: (i) four synonymous mutations; (ii) seven missense mutations; (iii) four stop-gain mutations; (iv) one frameshift mutation; (v) one splice region and missense variant; (vi) one splice region and synonymous variant; (vii) one splice acceptor and intron variant; and (viii) one splice donor and intron variant. Missense mutations were the most frequently encountered, affecting 174 (83.2%) individuals, as shown in Table 1. All observed mutations were heterozygous and already had SNP IDs. The mutations with the highest allelic frequencies were as follows: (i) rs334 had a total frequency of 0.0274 (African and American populations); (ii) rs33930165 had a frequency of 0.0034 (only in the African population); and (iii) rs33950507 had a frequency of 0.0028 (Eastern and Southern Asian populations), as shown in Table 2. Synonymous mutations were encountered in 16 (7.6%) samples and were excluded from the investigation of pathogenicity performed with the databases and predictors because they do not alter the amino acid sequence. Thus, the pathogenicity of the missense, stop-gain, frameshift, splice region (both acceptor and donor, including the splice region and synonymous variant) and intron mutations was tracked using the dbSNP, CLINVAR and HbVar databases, as well as five in silico predictors (POLYPHEN, SIFT, PROVEAN, PANTHER and MUTPRED). The results showed 11 pathogenic mutations of HBB (Table 3). In addition, five mutations, (1) rs111645889, (2) rs33946267, (3) rs33958637, (4) rs35578002 and (5) rs33912272, presented conflicting results between predictors and databases.
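The per-type counts above follow directly from Table 1. As a small illustrative sketch (the names `TABLE1` and `summarize_types` are hypothetical), the snippet below groups the 20 variants by mutation type and sums their carriers.

```python
from collections import defaultdict

# (mutation type, number of carriers) for each of the 20 variants, in Table 1 order.
TABLE1 = [
    ("Synonymous", 1), ("Synonymous", 9), ("Missense", 1), ("Stop gained", 1),
    ("Missense", 3), ("Missense", 1), ("Synonymous", 1), ("Synonymous", 5),
    ("Frameshift", 5), ("Stop gained", 1),
    ("Splice region and missense", 1),
    ("Splice acceptor and intron variant", 1),
    ("Splice donor and intron variant", 1),
    ("Splice region and synonymous variant", 1),
    ("Missense", 14), ("Stop gained", 6), ("Stop gained", 2),
    ("Missense", 137), ("Missense", 17), ("Missense", 1),
]


def summarize_types(rows):
    """Map each mutation type to (number of variants, number of carriers)."""
    summary = defaultdict(lambda: [0, 0])
    for mutation_type, carriers in rows:
        summary[mutation_type][0] += 1
        summary[mutation_type][1] += carriers
    return {name: tuple(counts) for name, counts in summary.items()}


type_summary = summarize_types(TABLE1)
total_carriers = sum(carriers for _, carriers in TABLE1)
print(type_summary["Missense"], total_carriers)
```

This reproduces the seven missense variants in 174 of the 209 carriers and the four synonymous variants in 16 carriers.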

4. Discussion

Mutations in the HBB gene are distributed unevenly among the different population groups. The African population was the most affected, accounting for 73.2% of the individuals with mutations in this gene, while the European population was the least affected, accounting for 4.3% of such individuals. The three mutations with the greatest frequency were (1) rs334 (AFR and AMR); (2) rs33930165 (AFR); and (3) rs33950507 (EAS and SAS). The rs334 mutation is responsible for hemoglobin S, known as HbS, which causes sickle cell anemia. The rs33930165 mutation is responsible for hemoglobin C, or HbC [41], which is more frequent in the African population [42,43]. In addition, the rs33950507 mutation is responsible for hemoglobin E, or HbE [41], which is involved in β-thalassemia described in Asian populations [44]. The available data show that variants rs33986703, rs63750783, and rs281864900 are responsible for β-thalassemia and are described in Asian populations [45,46,39]. Variants rs11549407 and rs33971634 are also β-thalassemia mutations but are common in European populations [47,24]; rs33971440 and rs35578002 are commonly found in populations of the Mediterranean region [48,49,34]. Although the HBB gene is well studied, some mutations in this gene are not well known and are poorly described in the literature. This is the case for the variants rs111645889, rs33958637, rs1135071, rs33943001 and rs33912272, for which no scientific papers were found discussing their epidemiology. CLINVAR [18] is one of the most widely used databases in clinical and pathological analyses related to mutations. However, not all mutations of the HBB gene are registered in this database (rs35578002, for example, is not), and conflicting results have been observed when comparing predictors with the CLINVAR, dbSNP and HbVar databases to estimate the pathogenicity of each mutation, or more specifically, the clinical significance of mutations rs111645889, rs33946267, rs33958637, rs35578002 and rs33912272.
It is important to emphasize that all samples deposited in the 1,000 Genomes Project, an international consortium aimed at producing a public catalog of human genetic variability, belong to individuals without clinical manifestations of any disease. The SNP rs35578002 is not available in CLINVAR and has no information on clinical significance in the dbSNP database. Predictors consider this variant benign, but the HbVar database classifies it as a damaging mutation. This variant is the β-thalassemia mutation Cd29 (C>T), which in homozygosis causes hemolytic anemia and ineffective erythropoiesis [34]. This mutation was described in Mediterranean populations. One possible explanation for the inconsistent information about the clinical significance of this variant is that it is a synonymous mutation in a splice region that is critical for RNA processing, causing thalassemia as described in HbVar. Also noteworthy is the mutation rs33946267. According to the literature, this mutation leads to the formation of Hb D-Punjab. This mutation is generally asymptomatic but may occasionally cause moderate hemolytic anemia, similar to the manifestations of sickle cell anemia, when associated with other hemoglobin variants, such as HbS or β-thalassemia mutations. Its initial distribution suggests that it is more prevalent in the central region of Asia, but due to migration, it can be found in several other regions [50]. According to the results, 8.3% of the phenotypically healthy individuals of the 1,000 Genomes database have a mutation in the HBB gene in heterozygosis. This means that roughly eighty out of every 1,000 individuals carry a mutant allele of the gene. The frequency of mutated genes was estimated at 0.042, so that the expected frequency of being homozygous or compound heterozygous for these variants in the next generation is approximately 0.002.
In total, 193 subjects had a non-synonymous mutation, meaning that approximately 7.7% had a change that affects the sequence of amino acids. Of these, 186 (7.4% of all samples) have a deleterious mutation based on available data on the clinical significance of these mutations (Table 3). Considering that the 1,000 Genomes database is representative of the world's population, it can be estimated that fourteen out of every 10,000 individuals in the world will have a hemoglobinopathy in the next generation. Regardless, new studies are needed to validate the clinical consequences of the mutations with undefined pathogenicity. Considering the absence of pathophysiological knowledge about the mutations with undefined pathogenicity, the use of in silico predictors (in an orderly and criteria-based manner) emerges as a possible tool to aid in decision-making with respect to diagnostic, preventative, and treatment measures.
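The article does not spell out how the next-generation estimates were obtained. One reading that reproduces them is a Hardy-Weinberg calculation: pool all mutant alleles into a single class with frequency q and take q² as the expected fraction of homozygotes or compound heterozygotes. The sketch below assumes random mating and that every carrier is heterozygous; the function name `expected_affected` is hypothetical.

```python
N_SAMPLES = 2504          # individuals analyzed
CARRIERS_ANY = 209        # carriers of any HBB variant
CARRIERS_DAMAGING = 186   # carriers of a variant concluded to be damaging (Table 3)


def expected_affected(carriers, n_individuals):
    """Return (q, q**2): pooled mutant allele frequency and, under
    Hardy-Weinberg proportions, the expected fraction of individuals
    carrying two mutant alleles in the next generation."""
    q = carriers / (2 * n_individuals)  # one mutant allele per heterozygous carrier
    return q, q * q


q_any, affected_any = expected_affected(CARRIERS_ANY, N_SAMPLES)
q_damaging, affected_damaging = expected_affected(CARRIERS_DAMAGING, N_SAMPLES)

print(f"any variant: q = {q_any:.3f}, q^2 = {affected_any:.4f}")
print(f"damaging:    q = {q_damaging:.3f}, per 10,000 = {affected_damaging * 10_000:.1f}")
```

With 209 carriers, q is about 0.042 and q² about 0.002. With the 186 carriers of damaging variants, q² corresponds to roughly 14 per 10,000 individuals, which matches the figures quoted in the text.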
