
Functional nsSNPs from carcinogenesis-related genes expressed in breast tissue: potential breast cancer risk alleles and their distribution across human populations.

Sevtap Savas, Steffen Schmidt, Hamdi Jarjanazi, Hilmi Ozcelik.

Abstract

Although highly penetrant alleles of BRCA1 and BRCA2 have been shown to predispose to breast cancer, the majority of breast cancer cases are thought to result from low-to-moderate penetrance alleles together with environmental carcinogens. Non-synonymous single nucleotide polymorphisms (nsSNPs) are hypothesised to contribute to disease susceptibility, and approximately 30 per cent of them are predicted to be biologically significant. In this study, we applied a bioinformatics-based strategy to identify breast cancer-related nsSNPs in 981 carcinogenesis-related genes expressed in breast tissue. We identified a total of 367 validated nsSNPs, 109 (29.7 per cent) of which are predicted to affect protein function (functional nsSNPs), suggesting that these nsSNPs are likely to influence the development and homeostasis of breast tissue and hence to contribute to breast cancer susceptibility. Sixty-seven of the functional nsSNPs were commonly occurring (minor allele frequency ≥ 5 per cent), making them excellent candidates for breast cancer susceptibility. In addition, the common functional nsSNPs were not uniformly distributed among human populations: 15 nsSNPs were reported in all populations analysed, whereas another 15 were specific to particular population(s). We propose that the nsSNPs analysed in this study constitute a unique resource of potential genetic factors for breast cancer susceptibility. Furthermore, the variation in functional nsSNP allele frequencies across major population backgrounds may point to variability in the molecular basis of breast cancer predisposition and treatment response among human populations.


Year: 2006 PMID: 16595073 PMCID: PMC3500178 DOI: 10.1186/1479-7364-2-5-287

Source DB: PubMed Journal: Hum Genomics ISSN: 1473-9542 Impact factor: 4.639

Introduction

Mutations of BRCA1[1] and BRCA2[2] confer a high risk of breast cancer on carriers. However, such highly penetrant mutations are responsible for only a small fraction (~5-10 per cent) of all breast cancer cases,[3,4] suggesting that mutations in other, as yet unidentified, breast cancer predisposition genes exist [5-7]. Mutations in a number of genes, such as p53,[8] ATM[6] and CHEK2,[9] have also been shown to contribute to breast cancer risk in a very small fraction of cases. So far, no other highly penetrant breast cancer susceptibility gene has been identified; however, genetic variations, including single nucleotide polymorphisms (SNPs), have been hypothesised to act as low-to-moderate penetrance alleles that contribute to breast cancer as well as to other complex diseases [7,10-12]. Variation in protein sequence and function arises mainly from the non-synonymous form of SNPs (nsSNPs). The fraction of nsSNPs in the genome is relatively low compared with other SNP types (~10 per cent of all coding SNPs),[13] but nsSNPs are more likely to alter the structure, function and interactions of proteins, and thus constitute a set of candidate genetic factors associated with disease predisposition [14,15]. Approximately 30 per cent of nsSNPs are predicted to have biological consequences [16-18]. Several nsSNPs in proteins acting in a variety of cellular pathways, such as apoptosis,[19] oxidative stress[20] and signal transduction,[21] have already been reported to be associated with an increased or decreased risk of breast cancer. Several studies have described cancer-relevant nsSNPs;[22-25] however, to our knowledge, these have not been studied in the context of gene expression in a particular tissue.
For a gene to be linked to a disease of a particular tissue, its protein product must in some way influence that tissue, either as an exogenous protein (such as a hormone) or as an endogenous protein (such as a protein expressed in that tissue) [26,27]. In this study, we applied a bioinformatics-based strategy to identify potentially functional nsSNPs in endogenous carcinogenesis-related proteins expressed in breast tissue.

Methods

Genes

The Ensembl transcript identifiers (http://www.ensembl.org/)[28] of the genes expressed in breast tissue were retrieved from the TissueInfo database (http://icb.med.cornell.edu/services/tissueinfo/query) [29]. Lists of carcinogenesis-related genes in 18 categories ('DNA adduct', 'DNA damage', 'DNA replication', 'angiogenesis', 'apoptosis', 'behavior', 'cell cycle', 'cell signaling', 'development', 'gene regulation', 'transcription', 'immunology', 'metabolism', 'metastasis', 'pharmacology', 'signal transduction', 'tumor suppressors/oncogenes' and 'miscellaneous') were retrieved from the website of the National Cancer Institute's Cancer Genome Anatomy Project Genetic Annotation Initiative (CGAP-GAI; http://lpgws.nci.nih.gov/html-cgap/cgl/) [30]. The genes retrieved from TissueInfo and CGAP-GAI were then cross-referenced to identify the carcinogenesis-related genes expressed in breast tissue.
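Once TissueInfo transcripts are mapped to gene symbols, the cross-referencing step amounts to a set intersection per CGAP-GAI category. The sketch below assumes a transcript-to-gene lookup table and per-category gene sets held in plain Python dictionaries; the function name and all toy identifiers are hypothetical, not the authors' code or real database contents.

```python
from typing import Dict, Iterable, Set


def breast_expressed_carcinogenesis_genes(
    breast_transcripts: Iterable[str],
    transcript_to_gene: Dict[str, str],
    cgap_categories: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Return, per CGAP-GAI category, the genes that are also expressed in breast.

    Transcripts without a gene mapping are silently dropped, which is one way
    identifier mismatches between resources can lose genes.
    """
    expressed = {
        transcript_to_gene[t] for t in breast_transcripts if t in transcript_to_gene
    }
    overlap = {cat: genes & expressed for cat, genes in cgap_categories.items()}
    # Keep only categories with at least one breast-expressed gene.
    return {cat: genes for cat, genes in overlap.items() if genes}


# Hypothetical toy inputs (placeholder transcript IDs, not real Ensembl IDs).
breast = ["tx1", "tx2", "tx3", "tx4"]
tx2gene = {"tx1": "XRCC1", "tx2": "MMP9", "tx3": "KRT18"}  # tx4 has no mapping
cgap = {
    "DNA damage": {"XRCC1", "LIG4"},
    "angiogenesis": {"MMP9", "VCAM1"},
    "apoptosis": {"BCL2"},
}

by_category = breast_expressed_carcinogenesis_genes(breast, tx2gene, cgap)
all_genes = set().union(*by_category.values())
print(by_category)
print(sorted(all_genes))  # ['MMP9', 'XRCC1']
```

A gene can sit in several categories, so the overall gene set is taken as the union of the per-category overlaps rather than by summing their sizes.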

nsSNPs

The nsSNPs in the carcinogenesis-related genes expressed in breast tissue were retrieved from dbSNP build 120 (http://www.ncbi.nlm.nih.gov/SNP/) [31]. Only nsSNPs detected in ≥ 2 chromosomes in a sample panel of ≥ 40 chromosomes were included in this study (validated nsSNPs). Seventeen nsSNPs had minor allele frequencies below 5 per cent in some sample sets and at or above 5 per cent in others; for simplicity, these nsSNPs are grouped with the nsSNPs with ≥ 5 per cent minor allele frequency throughout this paper.
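A minimal sketch of the inclusion rule and the minor allele frequency (MAF) grouping, assuming each dbSNP sample panel is summarised as a chromosome count plus the number of chromosomes carrying the minor allele. This record layout is hypothetical, not dbSNP's actual format, and reading "detected in ≥ 2 chromosomes" as "minor allele observed on ≥ 2 chromosomes" is also an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SamplePanel:
    """Allele counts from one dbSNP sample panel (hypothetical record layout)."""
    chromosomes: int   # number of chromosomes genotyped in the panel
    minor_count: int   # chromosomes carrying the minor allele (assumed reading)

    @property
    def maf(self) -> float:
        return self.minor_count / self.chromosomes


def is_validated(panels: List[SamplePanel]) -> bool:
    """Minor allele seen on >= 2 chromosomes in at least one panel of >= 40."""
    return any(p.chromosomes >= 40 and p.minor_count >= 2 for p in panels)


def maf_class(panels: List[SamplePanel], threshold: float = 0.05) -> str:
    """'common' if any panel reaches the threshold, else 'rare'.

    nsSNPs that fall below the threshold in some panels and above it in
    others therefore end up in the 'common' group, as described in the text.
    """
    return "common" if any(p.maf >= threshold for p in panels) else "rare"


# Hypothetical panels for one nsSNP: 2/46 (4.3%) and 9/120 (7.5%).
panels = [SamplePanel(46, 2), SamplePanel(120, 9)]
print(is_validated(panels), maf_class(panels))  # True common
```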

PolyPhen analysis

The PolyPhen predictions[18] were retrieved from a pre-computed dbSNP-PolyPhen resource. Only PolyPhen predictions based on either an alignment of at least five similar proteins (which gives a more reliable prediction) or structural parameters were used.
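The reliability filter and the functional/benign split can be expressed as follows, assuming each pre-computed PolyPhen result carries its prediction class, the number of aligned proteins and a flag for structure-based calls. These field names are hypothetical; the actual dbSNP-PolyPhen resource may lay its data out differently. Treating both "possibly damaging" and "probably damaging" as functional follows Table 2, which lists both classes.

```python
from dataclasses import dataclass
from typing import Optional

# PolyPhen prediction classes and how they are grouped in this study.
STATUS_BY_PREDICTION = {
    "benign": "benign",
    "possibly damaging": "functional",
    "probably damaging": "functional",
}


@dataclass
class PolyPhenCall:
    """One pre-computed PolyPhen result (hypothetical field names)."""
    snp_id: str
    prediction: str        # one of the keys of STATUS_BY_PREDICTION
    n_aligned: int         # number of proteins in the alignment
    used_structure: bool   # True if structural parameters informed the call


def is_reliable(call: PolyPhenCall, min_aligned: int = 5) -> bool:
    """Reliable if based on >= min_aligned aligned proteins or on structure."""
    return call.n_aligned >= min_aligned or call.used_structure


def functional_status(call: PolyPhenCall) -> Optional[str]:
    """'functional', 'benign', or None when the call is unreliable or unknown."""
    if not is_reliable(call):
        return None
    return STATUS_BY_PREDICTION.get(call.prediction)


# Made-up calls for illustration.
calls = [
    PolyPhenCall("snp_a", "probably damaging", 12, False),
    PolyPhenCall("snp_b", "benign", 8, False),
    PolyPhenCall("snp_c", "possibly damaging", 3, False),  # too few aligned proteins
    PolyPhenCall("snp_d", "possibly damaging", 2, True),   # structure-based call
]
print([functional_status(c) for c in calls])
```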

Results

The results of this study are summarised in Table 1 and include only the validated nsSNPs for which PolyPhen made a reliable prediction (see Methods). A total of 367 nsSNPs from 189 carcinogenesis-related genes expressed in breast tissue are presented. Of these, 109 nsSNPs (29.7 per cent) from 75 genes were predicted to potentially affect protein function (functional nsSNPs). In addition, 61.5 per cent (n = 67) of the potentially functional nsSNPs were commonly occurring in the population (≥ 5 per cent minor allele frequency; Table 2). In this paper, we mainly discuss the commonly occurring functional nsSNPs; the list of rarely occurring functional nsSNPs is available as a supplementary table (http://www.ozceliklab.com/Breast_rare_nsSNPs/).

Table 1

Summary of the results.

	n
Genes

Carcinogenesis-related genes	2,832

Expressed in breast tissue	981

With validated nsSNPs	189

With functional nsSNPs	75

nsSNPs

Validated nsSNPs	367

Benign by PolyPhen	258

Functional by PolyPhen	109

With ≥ 5% minor allele frequency	67

With < 5% minor allele frequency	42

Abbreviations: n = number; nsSNP = non-synonymous single nucleotide polymorphism. Only the genes and nsSNPs for which a reliable PolyPhen prediction (based on ≥ 5 proteins in the alignment) was available are shown in this table.
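The percentages quoted in the Results follow directly from the counts in Table 1; a quick consistency check:

```python
# Counts copied from Table 1.
validated = 367
benign = 258
functional = 109
common_functional = 67   # minor allele frequency >= 5%
rare_functional = 42     # minor allele frequency < 5%

# Sub-rows should add up to their parent rows.
assert benign + functional == validated
assert common_functional + rare_functional == functional

print(f"functional / validated = {functional / validated:.1%}")       # 29.7%
print(f"common / functional = {common_functional / functional:.1%}")  # 61.5%
```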

Table 2

Functional and common non-synonymous single nucleotide polymorphisms (nsSNPs) in carcinogenesis-related genes expressed in breast tissue.

Gene^a	Accession number	SNP ID^b	Amino acid change^c	Codon^d	Damaging allele	Damaging amino acid^e	PolyPhen prediction	Pathway^f
ACY1	NM_000666.1	rs2229152	R386C	cgt/tgt	t	C	Probably damaging	IM

ADD1	NM_014189.2	rs4961	G460W	ggg/tgg	t	W	Probably damaging	IM

ADD1	NM_014189.2	rs4962	N541I	aat/att	t	I	Probably damaging	IM

ADD1	NM_014189.2	rs4971	Y270N	tat/aat	a	N	Probably damaging	IM

ADM	NM_001124.1	rs5005	S50R	agc/agg	g	R	Possibly damaging	AN

ADRB2	NM_000024.3	rs1042713	G16R	gga/aga	a	R	Possibly damaging	BE, IM

ALDH2	NM_000690.2	rs671	E504K	gaa/aaa	a	K	Possibly damaging	IM, PH

APOE	NM_000041.1	rs429358	C130R	tgc/cgc	c	R	Probably damaging	IM

AXIN2	NM_004655.1	rs2240308	P50S	cct/tct	t	S	Probably damaging	DE

C2	NM_000063.3	rs4151648	R734C	cgc/tgc	t	C	Possibly damaging	IM

CD2	NM_001767.2	rs699738	H266Q	cac/caa	a	Q	Probably damaging	AN, IM, MET

CDH12	NM_004061.2	rs4371716	V68M	gtg/atg	g	V	Probably damaging	IM

CHGA	NM_001275.2	rs729940	R399W	cgg/tgg	t	W	Probably damaging	IM

CHGA	NM_001275.2	rs9658667	G382S	ggc/agc	a	S	Possibly damaging	IM

CLU	NM_001831.1	rs9331936	N317H	aac/cac	c	H	Possibly damaging	IM

CSF1	NM_000757.3	rs2229165	G438R	ggg/agg	a	R	Probably damaging	IM

CSF3R	NM_000760.2	rs3917973	M231T	atg/acg	c	T	Probably damaging	IM

CSF3R	NM_000760.2	rs3917974	Q346R	cag/cgg	g	R	Possibly damaging	IM

CSF3R	NM_000760.2	rs3917991	D510H	gac/cac	c	H	Possibly damaging	IM

CYBA	NM_000101.1	rs4673	Y72H	tac/cac	c	H	Possibly damaging	IM

CYP11B1	NM_000497.2	rs4541	A386V	gcg/gtg	c	A	Possibly damaging	PH

CYP11B1	NM_000497.2	rs5287	M160I	atg/atc	c	I	Possibly damaging	PH

CYP11B1	NM_000497.2	rs5294	Y439H	tac/cac	t	Y	Probably damaging	PH

CYP11B1	NM_000497.2	rs5312	E383V	gag/gtg	t	V	Probably damaging	PH

CYP1B1	NM_000104.2	rs1800440	N453S	aac/agc	g	S	Possibly damaging	IM, PH

CYP2A6	NM_000762.4	rs1801272	L160H	ctc/cac	a	H	Probably damaging	IM, PH

CYP2B6	NM_000767.3	rs2279343	K262R	aag/agg	a	K	Possibly damaging	PH

CYP2C9	NM_000771.2	rs1799853	R144C	cgt/tgt	t	C	Probably damaging	IM, PH

DAG1	NM_004393.1	rs2131107	S14W	tcg/tgg	c	S	Probably damaging	IM

ENG	NM_000118.1	rs1800956	D366H	gac/cac	c	H	Possibly damaging	AN, DE, IM, MET

EPHX1	NM_000120.2	rs1051740	Y113H	tac/cac	c	H	Possibly damaging	IM, ME, PH

ERBB2	NM_004448.1	rs1058808	P1170A	ccc/gcc	g	A	Possibly damaging	IM, ST, TS/ON

F2R	NM_001992.2	rs2230849	Y187N	tac/aac	a	N	Probably damaging	IM

FPR1	NM_002029.3	rs867228	E346A	gag/gcg	c	A	Possibly damaging	IM

FUCA2	NM_032020.3	rs3762001	H371Y	cat/tat	t	Y	Possibly damaging	IM

GAA	NM_000152.2	rs1800307	G576S	ggc/agc	a	S	Possibly damaging	IM

GBP1	NM_002053.1	rs1048425	T349S	acc/agc	g	S	Possibly damaging	CS

GYS1	NM_002103.3	rs5453	P691A	cca/gca	g	A	Probably damaging	IM

GYS1	NM_002103.3	rs5456	K130E	aag/gag	g	E	Possibly damaging	IM

GYS1	NM_002103.3	rs5461	N283S	aat/agt	g	S	Possibly damaging	IM

HK2	NM_000189.4	rs2229629	R844K	agg/aag	g	R	Possibly damaging	IM, MIS

LIG4	NM_002312.2	rs1805388	T9I	act/att	t	I	Possibly damaging	DA, DD

MC1R	NM_002386.2	rs1805005	V60L	gtg/ttg	t	L	Possibly damaging	IM

MC1R	NM_002386.2	rs1805007	R151C	cgc/tgc	t	C	Probably damaging	IM

MC1R	NM_002386.2	rs3212366	F196L	ttc/ctc	c	L	Probably damaging	IM

MMP9	NM_004994.1	rs2250889	R574P	cgg/ccg	g	R	Possibly damaging	AN, IM

MMP9	NM_004994.1	rs3918252	N127K	aac/aag	g	K	Probably damaging	AN, IM

MNDA	NM_002432.1	rs2276403	H357Y	cac/tac	t	Y	Possibly damaging	GR, TR

MUC4	NM_004532.2	rs2259292	G88D	ggc/gac	g	G	Possibly damaging	IM

NFATC1	NM_006162.3	rs754093	C751G	tgt/ggt	g	G	Probably damaging	IM

NOTCH4	NM_004557.2	rs2071282	P203L	ccc/ctc	t	L	Probably damaging	IM, TS/ON

PGM3	NM_015599.1	rs473267	D466N	gat/aat	a	N	Possibly damaging	IM

PLAU	NM_002658.1	rs2227564	L141P	ctg/ccg	t	L	Possibly damaging	AN

PLAUR	NM_002659.1	rs4760	L317P	ctc/ccc	c	P	Possibly damaging	AN

PTGS2	NM_000963.1	rs5272	E488G	gag/ggg	g	G	Probably damaging	IM, MIS

PTPN3	NM_002829.2	rs3793524	A90P	gcc/ccc	g	A	Probably damaging	CC, CS

SLC1A5	NM_005628.1	rs3027956	P17A	ccc/gcc	g	A	Possibly damaging	IM

STAT2	NM_005419.2	rs2066816	Q66H	cag/cat	t	H	Possibly damaging	IM, ST

TBXAS1	NM_001061.2	rs5760	G390V	ggc/gtc	t	V	Probably damaging	IM

TBXAS1	NM_001061.2	rs5762	R425C	cgc/tgc	t	C	Probably damaging	IM

TBXAS1	NM_001061.2	rs5770	R261G	agg/ggg	g	G	Probably damaging	IM

TDG	NM_003211.2	rs4135113	G199S	ggc/agc	a	S	Possibly damaging	DD

TUBA1	NM_006000.1	rs3731891	R243C	cgc/tgc	t	C	Probably damaging	CS, MET

TYR	NM_000372.2	rs1042602	S192Y	tct/tat	a	Y	Possibly damaging	ME

VCAM1	NM_001078.2	rs3783613	G413A	ggt/gct	c	A	Possibly damaging	AN, CS, IM, MET

XRCC1	NM_006297.1	rs25489	R280H	cgt/cat	a	H	Possibly damaging	DD, DR, IM

XRCC1	NM_006297.1	rs1799782	R194W	cgg/tgg	t	W	Probably damaging	DD, DR, IM

Abbreviations: AN = angiogenesis; BE = behaviour; CC = cell cycle; CS = cell signalling; DA = DNA adduct; DD = DNA damage; DE = development; DR = DNA replication; GR = gene regulation; IM = immunology; ME = metabolism; MET = metastasis; MIS = miscellaneous; PH = pharmacology; ST = signal transduction; TS/ON = tumour suppressor/oncogene; TR = transcription.

All nsSNPs have a minor allele frequency of ≥ 5 per cent.

^a The gene symbols are as approved by the HUGO Gene Nomenclature Committee [67].

^b SNP identifiers (IDs) correspond to the dbSNP IDs (http://www.ncbi.nlm.nih.gov/SNP/) [31].

^c The position of the amino acid substitution and the amino acids specified by the major and minor SNP alleles are indicated.

^d The codons specified by the major and the minor SNP alleles are shown (major/minor).

^e One-letter codes for the amino acids that are predicted by PolyPhen to affect protein function.

^f The pathway(s) in which the proteins are implicated, as listed on the Cancer Genome Anatomy Project Genetic Annotation Initiative website (http://lpgws.nci.nih.gov/html-cgap/cgl/) [30].
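Each Table 2 row can be checked mechanically: the major and minor codons should encode the stated amino acids, differ at exactly one position, and the damaging allele should be the base that encodes the damaging amino acid. A sketch using the standard genetic code with lowercase codons, as in the table; the helper name is hypothetical.

```python
BASES = "tcag"
# Standard genetic code; codons ordered by first, second, third base (t, c, a, g).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def row_is_consistent(aa_change: str, codons: str,
                      damaging_allele: str, damaging_aa: str) -> bool:
    """Check one Table 2 row (hypothetical helper).

    The codons must encode the stated amino acids and differ at a single
    position, and the damaging allele must encode the damaging amino acid.
    """
    ref_aa, alt_aa = aa_change[0], aa_change[-1]   # e.g. "R386C" -> R, C
    major, minor = codons.split("/")
    if CODON_TABLE[major] != ref_aa or CODON_TABLE[minor] != alt_aa:
        return False
    diffs = [i for i in range(3) if major[i] != minor[i]]
    if len(diffs) != 1:
        return False
    pos = diffs[0]
    allele_to_aa = {major[pos]: ref_aa, minor[pos]: alt_aa}
    return allele_to_aa.get(damaging_allele) == damaging_aa


# A few rows copied from Table 2: (gene, change, codons, damaging allele, damaging aa).
rows = [
    ("ACY1", "R386C", "cgt/tgt", "t", "C"),
    ("CDH12", "V68M", "gtg/atg", "g", "V"),
    ("CYP11B1", "A386V", "gcg/gtg", "c", "A"),
    ("XRCC1", "R194W", "cgg/tgg", "t", "W"),
]
for gene, change, codons, allele, aa in rows:
    print(gene, change, row_is_consistent(change, codons, allele, aa))
```

The same loop can be run over every row of the table; note that the damaging allele may be either the major allele (as for CDH12-V68M) or the minor one.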

The proteins encoded by genes carrying commonly occurring functional nsSNPs are involved in one or more of the carcinogenesis-related biological pathways compiled by the CGAP-GAI[30] (Table 2). These included proteins involved in DNA repair (three genes, four nsSNPs), metastasis (four genes, four nsSNPs), angiogenesis (seven genes, eight nsSNPs), pharmacology (seven genes, ten nsSNPs) and, most frequently, immunology (38 genes, 51 nsSNPs).
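Because one nsSNP can be annotated with several pathways, per-pathway counts are obtained by crediting each row to every pathway it lists. The sketch below (hypothetical helper name) uses all Table 2 rows annotated with angiogenesis (AN) or metastasis (MET), so its output can be compared with the counts given above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# All Table 2 rows annotated with AN or MET: (gene, SNP ID, pathway codes).
ROWS: List[Tuple[str, str, List[str]]] = [
    ("ADM", "rs5005", ["AN"]),
    ("CD2", "rs699738", ["AN", "IM", "MET"]),
    ("ENG", "rs1800956", ["AN", "DE", "IM", "MET"]),
    ("MMP9", "rs2250889", ["AN", "IM"]),
    ("MMP9", "rs3918252", ["AN", "IM"]),
    ("PLAU", "rs2227564", ["AN"]),
    ("PLAUR", "rs4760", ["AN"]),
    ("TUBA1", "rs3731891", ["CS", "MET"]),
    ("VCAM1", "rs3783613", ["AN", "CS", "IM", "MET"]),
]


def count_by_pathway(rows: List[Tuple[str, str, List[str]]]) -> Dict[str, Tuple[int, int]]:
    """Return {pathway: (number of genes, number of nsSNPs)}.

    A row annotated with several pathways is counted once in each of them,
    so per-pathway totals can add up to more than the number of rows.
    """
    genes: Dict[str, Set[str]] = defaultdict(set)
    snps: Dict[str, Set[str]] = defaultdict(set)
    for gene, snp_id, pathways in rows:
        for pathway in pathways:
            genes[pathway].add(gene)
            snps[pathway].add(snp_id)
    return {p: (len(genes[p]), len(snps[p])) for p in genes}


counts = count_by_pathway(ROWS)
print(counts["AN"], counts["MET"])  # (7, 8) (4, 4)
```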
We also analysed the distribution of the commonly occurring functional nsSNPs across human populations. For simplicity, we grouped the frequency information obtained from different dbSNP entries into three major populations: African (African and African-American), Caucasian (Caucasian and European) and Asian (Chinese and East Asian). Minor allele frequencies were available for at least three populations for 30 of the 67 commonly occurring functional nsSNPs (Table 3). Fifteen of these nsSNPs were found in all populations analysed (three or four per nsSNP). Of the remaining 15 nsSNPs, five were found exclusively in one population (ADM-S50R and MMP9-N127K in African; ALDH2-E504K and MNDA-H357Y in Asian; MC1R-R151C in Caucasian), and three were found in the Caucasian, Asian and Hispanic samples but not in the African samples (CHGA-G382S, CYP1B1-N453S and CYP2C9-R144C). Moreover, for five nsSNPs, the allele that was major in one population was the minor allele in another (ADRB2-G16R, CDH12-V68M, ERBB2-P1170A, PGM3-D466N and SLC1A5-P17A).
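The population categories above can be derived from per-population allele frequencies. The sketch below treats an nsSNP as "found" in a population when more than one allele is observed there, and flags a major-allele difference when the most frequent allele is not the same in every population; both readings are assumptions. It also assumes alleles are reported on the same strand: some Table 3 entries (e.g. CYBA in Asian samples) use different allele letters and would need harmonising first. The frequencies are copied from three rows of Table 3.

```python
from typing import Dict, List, Tuple

Freqs = Dict[str, float]          # allele -> frequency within one population
Populations = Dict[str, Freqs]    # population -> allele frequencies

# Three rows copied from Table 3 (alleles as reported there).
TABLE3_SUBSET: Dict[str, Populations] = {
    "ADD1-G460W": {
        "African": {"G": 0.891, "T": 0.109},
        "Asian": {"G": 0.521, "T": 0.479},
        "Caucasian": {"G": 0.833, "T": 0.167},
    },
    "ADM-S50R": {
        "African": {"C": 0.957, "G": 0.043},
        "Asian": {"C": 1.000},
        "Caucasian": {"C": 1.000},
    },
    "ADRB2-G16R": {
        "African": {"G": 0.609, "A": 0.391},
        "Asian": {"A": 0.583, "G": 0.417},
        "Caucasian": {"G": 0.674, "A": 0.326},
    },
}


def polymorphic_in(pops: Populations) -> List[str]:
    """Populations in which more than one allele was observed."""
    return sorted(p for p, freqs in pops.items()
                  if sum(1 for f in freqs.values() if f > 0) > 1)


def major_alleles(pops: Populations) -> Dict[str, str]:
    """Most frequent allele in each population."""
    return {p: max(freqs, key=freqs.get) for p, freqs in pops.items()}


def classify(pops: Populations) -> Tuple[str, bool]:
    """Return (distribution label, whether the major allele differs between populations)."""
    present = polymorphic_in(pops)
    if len(present) == len(pops):
        label = "all populations"
    elif len(present) == 1:
        label = f"specific to {present[0]}"
    else:
        label = "found in: " + ", ".join(present)
    major_differs = len(set(major_alleles(pops).values())) > 1
    return label, major_differs


for snp, pops in TABLE3_SUBSET.items():
    print(snp, *classify(pops))
```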

Table 3

Functional and common non-synonymous single nucleotide polymorphisms (nsSNPs) with frequency information available for different human populations.

Gene^a	SNP ID^b	Amino acid change^c	African	Asian	Caucasian	Hispanic
ADD1	rs4961	G460W	46 chr. G = 0.891 T = 0.109	48 chr. G = 0.521 T = 0.479	48 chr. G = 0.833 T = 0.167	n/a

ADM	rs5005	S50R	46 chr. C = 0.957 G = 0.043	48 chr. C = 1.000	48 chr. C = 1.000	n/a

ADRB2	rs1042713	G16R	46 chr. G = 0.609 A = 0.391	48 chr. A = 0.583 G = 0.417	46 chr. G = 0.674 A = 0.326	n/a

ALDH2	rs671	E504K	48 chr. G = 1.000	48 chr. G = 0.771 A = 0.229	58 chr. G = 1.000	44 chr. G = 1.000

CDH12	rs4371716	V68M	46 chr. T = 0.674 C = 0.326	48 chr. C = 0.812 T = 0.188	48 chr. C = 0.729 T = 0.271	n/a

CHGA	rs729940	R399W	114 chr. C = 0.954 T = 0.046	88 chr. C = 0.715 T = 0.285	104 chr. C = 0.893 T = 0.107	56 chr. C = 0.769 T = 0.231

CHGA	rs9658667	G382S	114 chr. G = 1.000	88 chr. G = 0.982 A = 0.018	104 chr. G = 0.951 A = 0.049	56 chr. G = 0.941 A = 0.059

CSF3R	rs3917973	M231T	48 chr. T = 0.938 C = 0.062	48 chr. T = 1.000	58 chr. T = 0.983 C = 0.017	46 chr. T = 1.000

CSF3R	rs3917991	D510H	48 chr. G = 0.750 C = 0.250	48 chr. G = 1.000	58 chr. G = 1.000	46 chr. G = 0.935 C = 0.065

CYBA	rs4673	Y72H	48 chr. C = 0.542 T = 0.458	1480 chr. G = 0.907 A = 0.093	60 chr. C = 0.683 T = 0.317	46 chr. C = 0.783 T = 0.217

CYP1B1	rs1800440	N453S	48 chr. A = 1.000	48 chr. A = 0.958 G = 0.042	62 chr. A = 0.806 G = 0.194	46 chr. A = 0.761 G = 0.239

CYP2A6	rs1801272	L160H	46 chr. T = 1.000	46 chr. T = 1.000	60 chr. T = 0.900 A = 0.100	46 chr. T = 0.978 A = 0.022

CYP2C9	rs1799853	R144C	48 chr. C = 1.000	48 chr. C = 0.979 T = 0.021	62 chr. C = 0.871 T = 0.129	46 chr. C = 0.935 T = 0.065

ENG	rs1800956	D366H	46 chr. C = 0.978 G = 0.022	1480 chr. C = 0.942 G = 0.058	46 chr. C = 1.000	n/a

EPHX1	rs1051740	Y113H	48 chr. T = 0.917 C = 0.083	84 chr. T = 0.620 C = 0.380	62 chr. T = 0.613 C = 0.387	46 chr. T = 0.587 C = 0.413

ERBB2	rs1058808	P1170A	40 chr. C = 0.775 G = 0.225	1502 chr. G = 0.514 C = 0.486	48 chr. G = 0.646 C = 0.354	n/a

FPR1	rs867228	E346A	44 chr. G = 0.818 T = 0.182	46 chr. G = 0.761 T = 0.239	48 chr. G = 0.771 T = 0.229	n/a

FUCA2	rs3762001	H371Y	44 chr. G = 0.818 A = 0.182	1282 chr. G = 0.789 A = 0.211	44 chr. G = 0.795 A = 0.205	n/a

LIG4	rs1805388	T9I	48 chr. C = 0.979 T = 0.021	48 chr. G = 0.792 A = 0.208	62 chr. C = 0.871 T = 0.129	46 chr. C = 0.848 T = 0.152

MC1R	rs1805007	R151C	42 chr. C = 1.000	40 chr. C = 1.000	46 chr. C = 0.891 T = 0.109	n/a

MMP9	rs2250889	R574P	46 chr. C = 0.870 G = 0.130	1488 chr. C = 0.688 G = 0.312	48 chr. C = 0.896 G = 0.104	n/a

MMP9	rs3918252	N127K	48 chr. C = 0.938 G = 0.062	48 chr. C = 1.000	48 chr. C = 1.000	n/a

MNDA	rs2276403	H357Y	46 chr. C = 1.000	1484 chr. C = 0.944 T = 0.056	48 chr. C = 1.000	n/a

PGM3	rs473267	D466N	46 chr. T = 0.565 C = 0.435	84 chr. C = 0.750 T = 0.250	48 chr. C = 0.688 T = 0.312	n/a

PLAU	rs2227564	L141P	48 chr. C = 0.979 T = 0.021	1492 chr. G = 0.783 A = 0.217	44 chr. C = 0.659 T = 0.341	n/a

PTPN3	rs3793524	A90P	46 chr. G = 0.522 C = 0.478	1498 chr. G = 0.628 C = 0.372	46 chr. C = 0.717 G = 0.283	n/a

SLC1A5	rs3027956	P17A	46 chr. G = 0.957 C = 0.043	42 chr. G = 0.524 C = 0.476	146 chr. C = 0.710 G = 0.290	n/a

TYR	rs1042602	S192Y	46 chr. C = 0.957 A = 0.043	48 chr. C = 1.000	48 chr. C = 0.750 A = 0.250	n/a

VCAM1	rs3783613	G413A	48 chr. G = 0.938 C = 0.062	44 chr. G = 0.977 C = 0.023	48 chr. G = 1.000	n/a

XRCC1	rs25489	R280H	48 chr. G = 0.937 A = 0.063	84 chr. C = 1.000	62 chr. G = 0.968 A = 0.032	46 chr. G = 0.957 A = 0.043

Abbreviations: chr: chromosomes; n/a: not available.

^a The gene symbols are as approved by the HUGO Gene Nomenclature Committee [67].

^b SNP identifiers (IDs) correspond to the dbSNP IDs (http://www.ncbi.nlm.nih.gov/SNP/) [31].

^c The position of the amino acid substitution and the amino acids specified by the major and minor SNP alleles are indicated.

The frequency information is as in dbSNP build 123 and is based on ≥ 40 chromosomes. Samples annotated as African and African-American, Caucasian and European, and Chinese and East Asian are combined here and referred to as African, Caucasian and Asian, respectively. Where more than one entry was available for a group, only the entry with the highest number of chromosomes is included.
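The grouping and selection rule in this footnote can be sketched as a single pass over the dbSNP entries for one nsSNP. The label-to-group mapping follows the footnote; the entries and their numbers are made up for illustration, and the function name is hypothetical. Entries below 40 chromosomes are skipped, reflecting the "≥ 40 chromosomes" note.

```python
from typing import Dict, List, Tuple

Freqs = Dict[str, float]

# Label-to-group mapping described in the Table 3 footnote.
GROUP_OF = {
    "African": "African", "African-American": "African",
    "Caucasian": "Caucasian", "European": "Caucasian",
    "Chinese": "Asian", "East Asian": "Asian",
}


def largest_entry_per_group(
    entries: List[Tuple[str, int, Freqs]], min_chromosomes: int = 40
) -> Dict[str, Tuple[int, Freqs]]:
    """Keep, for each population group, the entry with the most chromosomes.

    entries are (population label, number of chromosomes, allele frequencies).
    Labels not in GROUP_OF (e.g. 'Hispanic') form their own group; entries
    below min_chromosomes are skipped; on a tie the first entry is kept.
    """
    best: Dict[str, Tuple[int, Freqs]] = {}
    for label, n_chr, freqs in entries:
        if n_chr < min_chromosomes:
            continue
        group = GROUP_OF.get(label, label)
        if group not in best or n_chr > best[group][0]:
            best[group] = (n_chr, freqs)
    return best


# Made-up entries for one nsSNP, for illustration only.
entries = [
    ("African-American", 46, {"G": 0.61, "A": 0.39}),
    ("African", 120, {"G": 0.58, "A": 0.42}),
    ("European", 60, {"G": 0.70, "A": 0.30}),
    ("Caucasian", 48, {"G": 0.67, "A": 0.33}),
    ("East Asian", 30, {"A": 0.60, "G": 0.40}),   # below 40 chromosomes: skipped
    ("Chinese", 48, {"A": 0.58, "G": 0.42}),
    ("Hispanic", 46, {"G": 0.65, "A": 0.35}),
]
for group, (n_chr, freqs) in sorted(largest_entry_per_group(entries).items()):
    print(group, n_chr, freqs)
```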


Discussion

A subset of SNPs is thought to contribute to the development of complex diseases [7,10-12]. SNPs in or around candidate genes might be directly linked to a disease; however, not all SNPs are expected to affect gene expression or function, so how to select those with potential effects is keenly debated [32]. Several studies have developed tools and/or systematically analysed nsSNPs to identify those that affect gene function, based on evolutionary conservation or structural parameters [16-18,33]. PolyPhen[18] is one such web-based tool for selecting nsSNPs that are likely to affect protein function; its predictions are based on protein alignments, structural parameters or sequence annotations. The sensitivity of PolyPhen has been reported to be approximately 82 per cent [18]. In this study, we hypothesised that systematically analysing candidate genes expressed in the affected tissue is likely to improve and enrich the identification of disease-susceptibility alleles. Accordingly, using a bioinformatics-based strategy, we identified functional nsSNPs in a large number of genes from carcinogenesis-related pathways (DNA repair, cell cycle, signal transduction, etc) that are expressed in breast tissue. We propose that these potentially functional nsSNPs can cause abnormalities at the protein level that are likely to affect the development, metabolism and homeostasis of breast tissue, and thus can contribute to breast cancer susceptibility. The genes with functional nsSNPs identified in this study belong to a variety of carcinogenesis-related cellular pathways, which suggests possible biological roles for these nsSNPs. For example, nsSNPs in angiogenesis- and metastasis-related proteins may have roles in tumour growth and the development of metastatic tumours [34,35].
Additionally, DNA repair nsSNPs may lead to the accumulation of somatic mutations and thus participate in cancer initiation and promotion [34-36]. Furthermore, together with the DNA repair nsSNPs, the nsSNPs in pharmacology genes may be good candidates for studies of the efficacy, differential response and adverse effects of chemo-/radiotherapy in breast cancer [37-39]. The majority of the nsSNPs were in genes related to immunological responses (51 of 67, 76.1 per cent), which can both suppress and promote tumorigenesis [34]. The large number of functional nsSNPs in immune system-related genes likely reflects the large proportion of immunology genes in the breast tissue-expressed gene set (60 per cent). A considerable number of the genes with functional nsSNPs have previously been linked to breast cancer aetiology: ADM,[40] ADRB2,[41] APOE,[42] CHGA,[43] CSF1,[44] CYP1B1,[45] DAG1,[46] ENG,[47] EPHX1,[48] ERBB2,[49] F2R,[50] MMP9,[51] MUC4,[52] NFATC1,[53] NOTCH4,[54] PLAU,[55] PLAUR,[55] PTGS2[56] and VCAM1 [57]. We therefore propose that the nsSNPs in Table 2 are excellent candidates as genetic factors involved in breast cancer initiation, promotion or progression, and that some of them may be critical for breast cancer treatment outcome. Analysis of the distribution of the commonly occurring functional nsSNPs revealed differences in the major alleles and allele frequencies across human populations: 15 commonly occurring nsSNPs were found in all populations, whereas another 15 were specific to particular population(s). These differences may reflect the age of the alleles, founder effects or differing selective pressures acting on different populations [58,59].
Most importantly, the data also indicate that a common nsSNP with a potential biological consequence in our set was equally likely to be prevalent across different human populations or limited to some populations. The latter observation suggests that population-specific functional nsSNPs may contribute to genetic predisposition in individuals of a specific background. This is consistent with previous studies in which genetic variations with significantly different allele frequencies among populations were associated with specific diseases or differential drug responses [60-65]. This information may help researchers to determine which nsSNPs are relevant for specific population-based studies. In addition, although further analyses are required, it is tempting to speculate that these nsSNPs may contribute to variability in the molecular basis of breast cancer predisposition and drug response among human populations. Data integration from several databases forms the basis of our strategy for determining functional SNPs in breast tissue-expressed genes, so the quality and quantity of the genomic data in each database influence the comprehensiveness of the combined data. The functional SNP list presented in this study results from integrating data from three databases: TissueInfo,[29] Ensembl[28] and dbSNP [31]. Non-matching data fields (e.g. transcript identifiers) between TissueInfo, Ensembl and dbSNP were the main source of missing data. For example, although BRCA1 was known to have a potentially functional SNP (predicted previously), this SNP was not captured because the transcript identifier information for BRCA1 did not match across the databases. Thus, incompatibility of data between databases was a limiting factor for the bioinformatics-based strategy presented here.
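Losses like the BRCA1 case are easier to spot when a join between resources reports what it failed to match. A minimal sketch, assuming both resources can be keyed by a transcript identifier; the identifiers are placeholders and the helper is hypothetical. An unmatched expressed transcript may simply carry no validated nsSNP, so the output is a list for manual review rather than a list of errors.

```python
from typing import Dict, List, Set, Tuple


def join_with_audit(
    expressed: Dict[str, str],          # transcript ID -> gene symbol
    variants: Dict[str, List[str]],     # transcript ID -> nsSNP IDs
) -> Tuple[Dict[str, Tuple[str, List[str]]], Set[str], Set[str]]:
    """Join two resources on transcript ID and report keys found on one side only."""
    shared = expressed.keys() & variants.keys()
    joined = {tx: (expressed[tx], variants[tx]) for tx in shared}
    only_expressed = set(expressed) - set(variants)
    only_variants = set(variants) - set(expressed)
    return joined, only_expressed, only_variants


# Placeholder identifiers: TX2 carries a different version suffix in each resource.
expressed = {"TX1.2": "GENE_A", "TX2.1": "GENE_B", "TX3.1": "GENE_C"}
variants = {"TX1.2": ["snp_1"], "TX2.3": ["snp_2", "snp_3"]}

joined, only_expressed, only_variants = join_with_audit(expressed, variants)
print(joined)                    # {'TX1.2': ('GENE_A', ['snp_1'])}
print(sorted(only_expressed))    # ['TX2.1', 'TX3.1']
print(sorted(only_variants))     # ['TX2.3']
```

Here GENE_B drops out of the join even though both resources describe it, and only the audit sets reveal this; GENE_C may simply have no validated nsSNP.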
Improvements in the quality and quantity of genomic data in these databases will benefit research into complex questions. Also, the genes presented in this paper were selected on the basis of expressed sequence tag (EST) information, which may under-represent rarely expressed genes [29,66]. Integrating data from other tissue expression databases is likely to improve the quality of the results. Nevertheless, although the SNPs presented here may not form the most comprehensive list, the SNPs identified using the proposed strategy represent a valuable resource for studying genetic predisposition to breast cancer.

Conclusion

In conclusion, we have designed a novel strategy to identify potentially functional variants of cancer-related genes expressed in breast tissue. Our results revealed 109 nsSNPs with a potential biological consequence, 67 of which were common (minor allele frequency ≥ 5 per cent) in human populations. We propose that, together with other genetic and environmental factors, these nsSNPs may be involved in breast cancer initiation and progression; they therefore represent prime candidates as genetic variants underlying breast cancer predisposition. We also suggest that a considerable fraction of these nsSNPs may, in fact, be population-specific genetic variants.

References

Cynthia C Morton. Gene discovery in the auditory system using a tissue specific approach. Am J Med Genet A. 2004-09-15.

Robert J Livingston, Andrew von Niederhausern, Anil G Jegga, Dana C Crawford, Christopher S Carlson, Mark J Rieder, Sivakumar Gowrisankar, Bruce J Aronow, Robert B Weiss, Deborah A Nickerson. Pattern of sequence variation across 213 environmental response genes. Genome Res. 2004-09-13.

N Risch, K Merikangas. The future of genetic studies of complex human diseases. Science. 1996-09-13.

R Wooster, G Bignell, J Lancaster, S Swift, S Seal, J Mangion, N Collins, S Gregory, C Gumbs, G Micklem. Identification of the breast cancer susceptibility gene BRCA2. Nature. 1995 Dec 21-28.

Y Miki, J Swensen, D Shattuck-Eidens, P A Futreal, K Harshman, S Tavtigian, Q Liu, C Cochran, L M Bennett, W Ding. A strong candidate for the breast and ovarian cancer susceptibility gene BRCA1. Science. 1994-10-07.

Richard S Houlston, Julian Peto. The search for low-penetrance cancer susceptibility alleles. Oncogene. 2004-08-23.

Shirley V Hodgson, Patrick J Morrison, Melita Irving. Breast cancer genetics: unsolved questions and open perspectives in an expanding clinical practice. Am J Med Genet C Semin Med Genet. 2004-08-15.

Katerina Politi, Nikki Feirt, Jan Kitajewski. Notch in mammary gland development and breast cancer. Semin Cancer Biol. 2004-10.

A Pagani, M Papotti, H Höfler, R Weiler, H Winkler, G Bussolati. Chromogranin A and B gene expression in carcinomas of the breast. Correlation of immunocytochemical, immunoblot, and hybridization analyses. Am J Pathol. 1990-02.

D Malkin, F P Li, L C Strong, J F Fraumeni, C E Nelson, D H Kim, J Kassel, M A Gryka, F Z Bischoff, M A Tainsky. Germ line p53 mutations in a familial syndrome of breast cancer, sarcomas, and other neoplasms. Science. 1990-11-30.
