
Worldwide variation of the COL14A1 gene is shaped by genetic drift rather than selective pressure.

Carla M Calò, Federico Onali, Renato Robledo, Laura Flore, Myosotis Massidda, Paolo Francalacci.

Abstract

BACKGROUND: The aim of this study is to analyze the worldwide distribution of SNP rs4870723 in the COL14A1 gene, to check whether there are significant genetic differences among populations, and to test whether the gene is under selection.
METHODS: Genomic DNA was extracted from 69 unrelated individuals from Sardinia and genotyped for SNP rs4870723. Data were compared with 26 populations, clustered in 5 super-populations, from the public 1000 Genomes database. Allele frequency and heterozygosity were calculated with Genepop. Hardy-Weinberg equilibrium and pairwise population differentiation through analysis of molecular variance (AMOVA FST) were determined with Arlequin.
RESULTS: Allele frequencies of COL14A1 rs4870723 were compared in 27 populations clustered in 5 super-populations. All populations were in Hardy-Weinberg equilibrium. In almost all populations, allele C was the most frequent allele, reaching the highest values in East Asia. The 27 populations showed an appreciable structure, with significant differences observed between European, African, and Asian populations.
CONCLUSION: Significant differences were observed in the distribution of SNP rs4870723 among the populations studied. However, we found no evidence of selective pressure. Rather, the differentiation among the populations is likely the result of founder effect, genetic drift, and cultural factors, all of which are known to establish and maintain genetic diversity between populations.
© 2021 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals LLC.

Keywords:  1000 genomes; SNPs; Sardinia; collagen; selection

Year:  2021        PMID: 33650783      PMCID: PMC8123734          DOI: 10.1002/mgg3.1629

Source DB:  PubMed          Journal:  Mol Genet Genomic Med        ISSN: 2324-9269            Impact factor:   2.183


INTRODUCTION

Type‐XIV collagen is a homotrimer belonging to a family of non‐fibrillar collagens known as fibril‐associated collagens with interrupted triple helices (FACITs). It forms interfibrillar connections and may influence fibril and matrix density, suggesting that it may also be involved in fibrillogenesis (Young et al., 2002). Its role in the contraction of collagen gels has also been demonstrated, suggesting that it can modulate tissue response to mechanical stress (Chiquet, 1999). Indeed, type‐XIV collagen is often present in areas of high mechanical stress, indicating a potential role in maintaining or modifying the mechanical properties of a tissue (Berthod et al., 1997). Because of these characteristics, its involvement in the onset of tendon pathologies has been suggested, on the hypothesis that a single nucleotide polymorphism (SNP) may predispose to injury through either reduced tensile strength of collagen or altered fibrillogenesis. In a whole‐exome sequencing study of twin siblings with an anterior cruciate ligament (ACL) rupture, single‐nucleotide variants (SNVs) in the COL5A2, COL5A3, COL14A1, and COL15A1 genes showed a damaging disease impact profile (Caso et al., 2016). Moreover, a positive association was found with pelvic organ prolapse (Li et al., 2020) and with ACL injuries (Massidda et al., 2018); however, no correlation was found when rupture of the Achilles tendon was investigated (September et al., 2008). These considerations led us to investigate the genetic variability of COL14A1 (OMIM no. 120324) among human populations, to assess whether it has been subject to natural selection. We chose COL14A1 over other COL genes because the data in the literature are scarce and contradictory. The COL14A1 gene is located on chromosome 8 (8q24.12) and codes for the alpha 1 chain of type‐XIV collagen.
Within the gene, 53,950 SNPs are recorded in dbSNP (National Center for Biotechnology Information, NCBI). We filtered for exonic missense SNPs with a minor allele frequency (MAF) > 0.01. Of the three SNPs obtained, rs4870723, a mutation at position 90 of exon 14 (c.1952C>A, p.H563N), had the highest MAF and was therefore considered the most informative. The aim of this study is to analyze the worldwide distribution of SNP rs4870723 in the COL14A1 gene, to determine whether there are significant genetic differences among ethnic groups, and to check whether any such difference is due to selective pressure or is the result of random genetic drift. Furthermore, we provide new data on rs4870723 for the Sardinian population (Italy). Sardinia is an interesting case study because evolutionary forces such as balancing selection due to formerly endemic malaria, combined with the effect of isolation and inbreeding on genetic drift, have shaped its genetic variation, making the Sardinian population an outlier in the European context.
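The SNP selection step described above (keep exonic missense variants with MAF > 0.01, then take the one with the highest MAF) can be sketched as follows. This is only an illustration: the `Snp` record, the `most_informative` helper, the placeholder identifiers, and all MAF values are assumptions invented for the example and are not taken from dbSNP or from the study.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snp:
    """Hypothetical, simplified SNP annotation record."""
    rsid: str
    region: str       # e.g. "exonic" or "intronic"
    consequence: str  # e.g. "missense" or "synonymous"
    maf: float        # minor allele frequency


def most_informative(snps: Iterable[Snp], min_maf: float = 0.01) -> Optional[Snp]:
    """Return the exonic missense SNP with the highest MAF above min_maf."""
    candidates = [
        s for s in snps
        if s.region == "exonic" and s.consequence == "missense" and s.maf > min_maf
    ]
    # max() on an empty list raises, so return None when nothing passes.
    return max(candidates, key=lambda s: s.maf) if candidates else None


# Illustrative records only: identifiers other than rs4870723 are
# placeholders, and every MAF value here is made up.
snps = [
    Snp("rs4870723", "exonic", "missense", 0.40),
    Snp("placeholder_1", "exonic", "missense", 0.05),
    Snp("placeholder_2", "exonic", "missense", 0.02),
    Snp("placeholder_3", "exonic", "synonymous", 0.45),
    Snp("placeholder_4", "intronic", "missense", 0.30),
    Snp("placeholder_5", "exonic", "missense", 0.001),
]

best = most_informative(snps)
print(best.rsid if best else "no SNP passed the filter")
```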

MATERIALS AND METHODS

Ethical Compliance: The study was approved by the Ethics Committee of the Azienda Ospedaliera Universitaria (AOU) of Cagliari University (Italy), and written informed consent was obtained from each participant.

A total of 69 individuals of both sexes (31 females and 38 males) from Sardinia (Italy) were analyzed. All selected individuals were unrelated, apparently healthy, and born and resident in the area for at least three generations. Genomic DNA was extracted from buccal swabs by the salting‐out method and amplified by standard PCR using the following primers: Forward 5’‐CTTTGCCAGAGTCACATGGT‐3’; Reverse 5’‐TGTCCCGGAACTTACCTCAT‐3’. PCR was performed in 25 μL volumes containing 200 ng genomic DNA, 20 pmol of each primer, and NZYTaq II 2× Master Mix (0.2 U/μL). Amplification consisted of denaturation at 94°C for 3 minutes, followed by 30 cycles of 94°C for 30 seconds, 54°C for 30 seconds, and 72°C for 1 minute, and a final extension at 72°C for 5 minutes. NcoI enzymatic digestion of the amplified products yielded either a single band of 530 bp (A allele) or two bands of 465 bp and 65 bp (C allele). Fragments were separated by 2% agarose gel electrophoresis and stained with SYBR Green.

The worldwide variation of the COL14A1 (GenBank accession number: NG_033107.1) polymorphism A > C (SNP rs4870723, Chr. 8:121228679, forward strand in GRCh37) was analyzed with data obtained from the public 1000 Genomes Phase 3 Browser (The 1000 Genomes Project Consortium, 2015) plus the Sardinian samples. The final dataset provided information on 27 populations clustered in 5 super‐populations (African, American, East Asian, European, and South Asian). Allele frequency and heterozygosity (gene diversity) for each population were calculated using the Genepop software (ver. 4.4).
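As an illustration of how the digestion patterns translate into the statistics reported here, the sketch below calls genotypes from band sizes, counts alleles, and computes the unbiased expected heterozygosity H = 2n/(2n − 1) × (1 − Σp²) (Nei's estimator). It is not the Genepop implementation. The heterozygote pattern (all three bands) is an assumption based on standard RFLP reasoning rather than something stated in the text, and the example gel lanes are invented.

```python
from collections import Counter

# Band sizes (bp) after NcoI digestion. The heterozygote pattern is an
# assumption: it should show the uncut and both cut fragments.
BANDS_TO_GENOTYPE = {
    frozenset({530}): "AA",
    frozenset({465, 65}): "CC",
    frozenset({530, 465, 65}): "AC",
}


def call_genotype(bands):
    """Map a set of observed band sizes to a genotype string."""
    return BANDS_TO_GENOTYPE[frozenset(bands)]


def allele_frequencies(genotypes):
    """Count alleles across diploid genotypes and return their frequencies."""
    counts = Counter("".join(genotypes))
    total = sum(counts.values())
    return {allele: counts[allele] / total for allele in ("A", "C")}


def unbiased_heterozygosity(freqs, n_individuals):
    """Unbiased expected heterozygosity for n diploid individuals."""
    two_n = 2 * n_individuals
    return two_n / (two_n - 1) * (1 - sum(p * p for p in freqs.values()))


# Invented gel lanes for four individuals.
lanes = [[530], [530, 465, 65], [465, 65], [530, 465, 65]]
genotypes = [call_genotype(lane) for lane in lanes]
freqs = allele_frequencies(genotypes)
h_nb = unbiased_heterozygosity(freqs, len(genotypes))
print(genotypes, freqs, round(h_nb, 4))
```

Applied to the rounded CEU frequencies in Table 1 (A = 0.535, C = 0.465, N = 99), the same function gives a value close to the tabulated 0.5000; small discrepancies for other rows are expected because the published frequencies are rounded.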
Hardy–Weinberg equilibrium, population relationships through pairwise differences, and hierarchical analysis of molecular variance (AMOVA FST) were determined with Arlequin v.3.5 (Excoffier et al., 2007) using all 27 populations. Finally, global relationships among the same 27 populations were examined for rs4870723 by multidimensional scaling (MDS) based on FST genetic distances, using the Statistica program (ver. 7). Selection signatures across the whole COL14A1 gene were evaluated through the 1000 Genomes Selection Browser 1.0 (Pybus et al., 2014), comparing data from CEU (European, N = 97), YRI (Sub‐Saharan, N = 88), and CHB (China, N = 85) for FST‐rank scores (data from the integrated Phase 1 variant set), and through PopHuman (Casillas et al., 2018), which uses data generated by the 1000GP Phase III.
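The Hardy–Weinberg check can be illustrated with a simple chi-square goodness-of-fit test on genotype counts. This is a stand-in chosen for brevity and is not necessarily the procedure Arlequin applies; the function name and the genotype counts below are hypothetical.

```python
import math


def hwe_chi_square(n_aa, n_ac, n_cc):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg proportions.

    Returns the chi-square statistic and its p-value.
    """
    n = n_aa + n_ac + n_cc
    p = (2 * n_aa + n_ac) / (2 * n)  # frequency of allele A
    q = 1 - p                        # frequency of allele C
    observed = (n_aa, n_ac, n_cc)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (only possible for a fixed allele).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the first sample matches HWE exactly, the second
# has a strong heterozygote deficit.
print(hwe_chi_square(25, 50, 25))
print(hwe_chi_square(40, 20, 40))
```

A p-value above 0.05 is read as no detectable departure from equilibrium, which is the criterion used for the populations in this study.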

RESULTS

Table 1 reports the allele frequencies of COL14A1 SNP rs4870723 in 27 populations distributed across five geographical areas. All populations fit Hardy–Weinberg equilibrium (p > 0.05). With the exception of Great Britain (GBR), Finland (FIN), Northwestern Europe (CEU), and Sardinia (SAR), allele C is the more frequent allele in all populations, ranging from 0.514 in Puerto Rico (PUR) to 0.715 in Chinese Dai (CDX). In the pairwise difference analysis, significant p‐values are observed when European populations are compared with African (p = 0.0233), East Asian (p = 0), and South Asian populations (p = 0.04080). Notably, the Sardinian population is significantly differentiated (p < 0.05) from all populations except Finland (p = 0.54955). The hierarchical structure of these groups, measured through AMOVA, indicates a low degree of differentiation that is attributable mainly to the European populations. Indeed, of a total FST genetic variance of 3.51% (p < 0.001) when the five groups are compared, the variance attributable to differences among groups (FCT) accounts for 2.67% (p < 0.001). When Europe is removed from the analysis, FST drops sharply to 0.21%.
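To give a feel for the scale of differentiation involved, the sketch below computes Wright's FST = (HT − HS)/HT for a single biallelic locus in two equally weighted populations, using the rounded allele A frequencies of CEU (0.535) and CDX (0.285) from Table 1. This is a simplified textbook estimator and not the AMOVA procedure run in Arlequin, so its output is not expected to reproduce the reported values; the function name is hypothetical.

```python
def wright_fst(p1, p2):
    """Wright's FST for one biallelic locus and two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)                # expected heterozygosity, pooled
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-population
    if h_total == 0:
        return 0.0  # locus is monomorphic in both populations
    return (h_total - h_within) / h_total


# Rounded allele A frequencies from Table 1.
fst_ceu_cdx = wright_fst(0.535, 0.285)
print(round(fst_ceu_cdx, 3))
```

Even for this pair, which sits near the two extremes of the observed frequency range, only a few percent of the variation lies between populations, in line with the low overall differentiation described above.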
TABLE 1

Frequencies of rs4870723 alleles in 27 populations clustered in five geographical areas

Population                      Code   Allele freq. A   Allele freq. C   H n.b.   N

Africa
Afrocaribbeans, Barbados        ACB    0.328            0.672            0.4444   97
Afroamerican, USA               ASW    0.443            0.557            0.4973   60
Esan, Nigeria                   ESN    0.359            0.641            0.4623   99
Luhya, Kenya                    LWK    0.394            0.606            0.4799   99
Gambian, Gambia                 GWD    0.341            0.659            0.4512   113
Mende, Sierra Leone             MSL    0.406            0.594            0.4851   85
Yoruba, Nigeria                 YRI    0.366            0.634            0.4661   108

America
Colombians, Colombia            CLM    0.415            0.585            0.4881   94
Mexicans, USA                   MXL    0.344            0.656            0.4547   64
Peruvians, Peru                 PEL    0.424            0.576            0.4912   85
Puerto Ricans, Puerto Rico      PUR    0.486            0.514            0.5020   104

East Asia
Dai, China                      CDX    0.285            0.715            0.4097   93
N Han, China                    CHB    0.330            0.670            0.4444   103
S Han, China                    CHS    0.371            0.629            0.4692   105
Japanese, Japan                 JPT    0.356            0.644            0.4606   104
Kinh, Vietnam                   KHV    0.308            0.692            0.4285   99

South Asia
Bengali, Bangladesh             BEB    0.448            0.552            0.4974   86
Gujarati Indian, USA            GIH    0.417            0.583            0.4888   103
Indian Telugu, UK               ITU    0.338            0.662            0.4499   102
Punjabi, Pakistan               PJL    0.411            0.589            0.4869   96
Sri Lankan Tamil, UK            STU    0.436            0.564            0.4943   102

Europe
NW Europeans, USA               CEU    0.535            0.465            0.5000   99
Finnish, Finland                FIN    0.646            0.354            0.4594   99
British, UK                     GBR    0.544            0.456            0.4989   91
Iberians, Spain                 IBS    0.453            0.547            0.4980   107
Tuscans, Italy                  TSI    0.481            0.519            0.5016   107
Sardinians, Italy*              SAR    0.481            0.519            0.5016   107

Data from: 1000 Genomes.

Abbreviations: H n.b., nonbiased expected heterozygosity; N, number of individuals.

* Present study.

Possible traces of selection across the whole COL14A1 gene were evaluated using FST‐rank scores for comparisons among CEU, YRI, and CHB. Significant probability values of FST‐rank scores (p < 0.01) are found in the comparisons CEU vs YRI and CEU vs CHB. In the first comparison, only five variants show significant values, and the corresponding region does not include the SNP under scrutiny. In contrast, when CEU and CHB are compared, 29 variants located within a 39.30 kb coding region at chr8:121223036–121262332 show significant values. This region includes COL14A1 SNP rs4870723 (Chr8: 121228679). The result was checked with PopHuman (https://pophuman.uab.cat/), which showed a weak sign of negative selection for 3 of the 10 parameters calculated to detect selective pressure (Tajima's D, DoS, and Ka/Ks), but in no case did the region involved include COL14A1 SNP rs4870723 (Chr8: 121228679).

The MDS shows an appreciable population structure among the 27 populations (Figure 1). Asian populations lie at the negative values of the first dimension, followed by African populations, while the European populations occupy the positive values. The American populations are rather scattered in the central part of the graph, possibly because of the heterogeneous composition of the sample. The Sardinian population is an outlier in the graph, lying at the extreme values of both dimensions.
FIGURE 1

MDS of the rs4870723. Abbreviations as in Table 1


DISCUSSION

In this study, the worldwide distribution of rs4870723 A/C in the COL14A1 gene was examined to determine whether there are significant genetic differences among ethnically different populations, and to understand whether its distribution has been shaped by prehistoric and more recent demographic events or is under selective pressure. The search for Darwinian selection in natural populations has been the focus of a multitude of studies over the last decades. Different selective forces can negatively or positively select SNPs that are associated with disadvantageous or advantageous traits, respectively. For example, while negative selection tends to decrease the level of population differentiation, positive selection tends to increase it (Barreiro et al., 2008). The data reported here show no evidence of selective pressure, since all samples are in Hardy–Weinberg equilibrium. Possible traces of selection for the region including rs4870723 are found only in the comparison CEU vs CHB, two rather heterogeneous populations, and this result was not confirmed with the PopHuman browser. The MDS points to a population structure consistent with a model of isolation by distance, with African populations in the central position and the Asian and European populations diverging in opposite directions. In particular, a strong effect of genetic drift is revealed by the eccentric placement of isolated populations such as the Dai (a population from the mountainous area of south China), the Finns (a separate, northernmost European population) and, above all, the Sardinians (a central Mediterranean island population).
Finland and Sardinia are well‐known outliers of European genetic variation because of a founder effect and subsequent geographic isolation (Anagnostou et al., 2017; Francalacci et al., 2010; Francalacci & Sanna, 2008; Palo et al., 2009), and the Dai turned out to be an ethnic minority related to the Lao‐Thai people, with peculiar genetic characteristics (Shi et al., 2010). In conclusion, although there are significant differences among populations for rs4870723, there is no evidence of selective pressure, and the observed genetic variation is therefore most probably due to random events such as genetic drift, founder effect, and other demographic events that the populations have gone through. Although rs4870723 has been associated with tendinopathies, its differing distribution does not seem to affect population fitness, that is, differential survival and reproductive ability. Finally, besides strengthening the idea that European genetic variation is associated with geographic barriers (Novembre et al., 2008) and prehistoric demographic events (Ammerman & Cavalli‐Sforza, 1984), our data highlight the presence of some isolated populations that increase Eurasian variability (Anagnostou et al., 2017; Capocasa et al., 2014).

CONFLICT OF INTEREST

The authors state that they have no conflict of interest.

AUTHORS’ CONTRIBUTION

Carla Maria Calò: Conceptualization. Federico Onali: data analysis. Renato Robledo: writing draft. Laura Flore: data analysis. Myosotis Massidda: writing review. Paolo Francalacci: critical revision.

REFERENCES (10 of 18 listed)

1.  The history and geography of the Y chromosome SNPs in Europe: an update.

Authors:  Paolo Francalacci; Laura Morelli; Antonella Useli; Daria Sanna
Journal:  J Anthropol Sci       Date:  2010

2.  The roles of types XII and XIV collagen in fibrillogenesis and matrix assembly in the developing cornea.

Authors:  Blanche B Young; Guiyun Zhang; Manuel Koch; David E Birk
Journal:  J Cell Biochem       Date:  2002       Impact factor: 4.429

3.  Whole-exome sequencing analysis in twin sibling males with an anterior cruciate ligament rupture.

Authors:  Enrique Caso; Antonio Maestro; Cristina C Sabiers; Manuel Godino; Zaira Caracuel; Joana Pons; F Jesus Gonzalez; Rocio Bautista; M Gonzalo Claros; Jaime Caso-Onzain; Elena Viejo-Allende; Peter V Giannoudis; Sara Alvarez; Paolo Maietta; Enrique Guerado
Journal:  Injury       Date:  2016-09       Impact factor: 2.586

4.  Regulation of extracellular matrix gene expression by mechanical stress.

Authors:  M Chiquet
Journal:  Matrix Biol       Date:  1999-10       Impact factor: 11.583

5.  History and geography of human Y-chromosome in Europe: a SNP perspective.

Authors:  Paolo Francalacci; Daria Sanna
Journal:  J Anthropol Sci       Date:  2008

6.  The COL12A1 and COL14A1 genes and Achilles tendon injuries.

Authors:  A V September; M Posthumus; L van der Merwe; M Schwellnus; T D Noakes; M Collins
Journal:  Int J Sports Med       Date:  2007-10-25       Impact factor: 3.118

7.  Natural selection has driven population differentiation in modern humans.

Authors:  Luis B Barreiro; Guillaume Laval; Hélène Quach; Etienne Patin; Lluís Quintana-Murci
Journal:  Nat Genet       Date:  2008-02-03       Impact factor: 38.330

8.  A global reference for human genetic variation.

Authors:  Adam Auton; Lisa D Brooks; Richard M Durbin; Erik P Garrison; Hyun Min Kang; Jan O Korbel; Jonathan L Marchini; Shane McCarthy; Gil A McVean; Gonçalo R Abecasis
Journal:  Nature       Date:  2015-10-01       Impact factor: 49.962

9.  Overcoming the dichotomy between open and isolated populations using genomic data from a large European dataset.

Authors:  Paolo Anagnostou; Valentina Dominici; Cinzia Battaggia; Luca Pagani; Miguel Vilar; R Spencer Wells; Davide Pettener; Stefania Sarno; Alessio Boattini; Paolo Francalacci; Vincenza Colonna; Giuseppe Vona; Carla Calò; Giovanni Destro Bisol; Sergio Tofanelli
Journal:  Sci Rep       Date:  2017-02-01       Impact factor: 4.379

10.  Genetic polymorphisms in collagen-related genes are associated with pelvic organ prolapse.

Authors:  Lei Li; Zhijing Sun; Juan Chen; Ye Zhang; Honghui Shi; Lan Zhu
Journal:  Menopause       Date:  2020-02       Impact factor: 2.953
