Mendelian breeding units versus standard sampling strategies: Mitochondrial DNA variation in southwest Sardinia.

Daria Sanna, Maria Pala, Piero Cossu, Gian Luca Dedola, Sonia Melis, Giovanni Fresu, Laura Morelli, Domenica Obinu, Giancarlo Tonolo, Giannina Secchi, Riccardo Triunfo, Joseph G Lorenz, Laura Scheinfeldt, Antonio Torroni, Renato Robledo, Paolo Francalacci

Abstract

We report a sampling strategy based on Mendelian Breeding Units (MBUs), each representing an interbreeding group of individuals sharing a common gene pool. The identification of MBUs is crucial for case-control experimental design in association studies. The aim of this work was to evaluate whether the severe sample selection required by the MBU approach biases genetic variability and haplogroup frequencies in the MBU sample. To this end, the MBU sampling strategy was compared to a standard selection of individuals according to their surname and place of birth. We analysed mitochondrial DNA variation (first hypervariable segment and coding region) in unrelated healthy subjects from two different areas of Sardinia: the area around the town of Cabras and the western Campidano area. No statistically significant differences were observed when the two sampling methods were compared, indicating that the stringent sample selection needed to establish an MBU does not alter original genetic variability and haplogroup distribution. Therefore, the MBU sampling strategy can be considered a useful tool in association studies of complex traits.

Keywords:  association studies; breeding units strategy; mtDNA haplogroup distribution

Year:  2011        PMID: 21734814      PMCID: PMC3115307          DOI: 10.1590/s1415-47572011000200003

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Introduction

Population definition, sample selection and choice of markers are crucial points in human population genetics studies, and the sampling strategy depends principally on the questions being asked. In addition to biological aspects, such studies should also take into account important socio-cultural parameters, such as language and religion, along with social and self-identity affiliation. If a human population is clearly ethnically identified and recent admixture is negligible, sampling strategies based only on surname (whenever distinctive) and place of birth are preferred, since they allow exclusion of recent immigrants, not yet blended into the gene pool, from the analysis. Moreover, surname and place of birth criteria can be extended from the DNA donors to their ancestors, provided that genealogical information is available. A more stringent sampling strategy is required in studies based on genome-wide association scans, which look for different allele distributions between individuals with (cases) or without (controls) a phenotype of interest. The case-control experimental design is expected to be appropriate in surveys on homogeneous populations, whereas both false-positive and false-negative results may occur in heterogeneous or substructured populations, if cases and controls are not carefully sampled according to their origin. This scenario is likely to occur in an island like Sardinia, where the majority of the present population is distributed among 363 isolated villages (Siniscalco ) which, while sharing common ancestry, might have diversified during many centuries of isolation. Therefore, it is important to identify true Mendelian Breeding Units (MBUs), i.e. interbreeding groups of individuals sharing a common ancestral gene pool. In Sardinia, the most practical way to define an MBU is to derive a direct estimate of the percentage of endogamous mating occurring in the last 200 years.
This information was obtained anonymously from municipal and ecclesiastical marriage registers (Siniscalco ). However, rigorous sample selection for reconstructing MBUs led to a conspicuous reduction in sample size, which might have significantly skewed haplotypic or allelic frequencies. In a previous paper (Siniscalco ), we reported a pilot study on 55 unrelated controls belonging to the MBU of Carloforte, who were genotyped at six markers. We showed there that the allele frequencies, and therefore the genomic profile, remained constant even when only a subset of 20 individuals was analysed. The main goal of this work was to evaluate the reliability of the MBU approach in describing genetic variation in human populations, particularly regarding its application to association studies of complex traits. We compared genetic variability in two sets of samples, which included different individuals recruited from the same areas, using two different sampling strategies. With the Standard (STD) Method, individuals unrelated for at least two generations were selected on the basis of the surname and place of birth of their grandparents, depicting present-day genetic variation, with the sole exclusion of the most recent immigrants. Using the MBU Method, the selected DNA donors were proven to be descendants of individuals present in the 17th century archives, with no common ancestors for at least five generations. This was ascertained by means of a complete check of their genealogical history, based on the official records made available to us by the City Halls. Samples collected using the latter method, being representative of population settlements before the migratory events of the last few centuries, allow an extension of the temporal resolution of genetic variability.
Therefore, comparison of the two sampling methods might also reveal possible occurrences of diachronic genetic variation in the analysed areas, due to micro-evolutionary dynamics such as drift or gene flow from neighbouring populations. The analysed samples belong to two different socio-cultural areas, Cabras and western Campidano, whose cultural traits differentiated around the second half of the 19th century: the former, and its neighbouring area, became a flourishing fishing centre, while the latter consists of rural villages whose economy is based on farming and sheep raising. We studied mitochondrial DNA (mtDNA), since it has been extensively used as a molecular marker during the past 20 years, is maternally inherited, does not recombine and is in a haploid state; thus it is more sensitive than nuclear DNA to the effects of genetic drift and gene flow, and any discrepancy between the two sampling methods is expected to be enhanced.

Materials and Methods

Sample selection

Using the MBU strategy, we analysed 85 unrelated healthy subjects from two areas located in southwestern Sardinia: 35 individuals from Cabras and 50 individuals from western Campidano (Figure 1). Using the STD strategy, we analysed 71 unrelated individuals coming from the same areas. Comparison was performed between 48 samples from Cabras and its neighbouring area (up to 50 km) and 23 samples from the western Campidano area.
Figure 1 - Map showing the distribution of the two areas analysed in southwest Sardinia. Cas: Cabras. W Camp: western Campidano.

mtDNA analysis

Whole genomic DNA was extracted using standard procedures. For each individual, mitochondrial haplogroup affiliation was determined both by sequencing the first hypervariable segment (HVS-I) of the control region, from position 15997 to position 16399 (Anderson ), and by RFLP (Restriction Fragment Length Polymorphism) analysis of the coding region for the presence/absence of haplogroup diagnostic markers (see Table 1 for details).
Table 1 - Oligonucleotide pairs used and polymorphic sites investigated to classify mitochondrial coding regions into haplogroups H, V and U/K.

| Haplogroup | Primer sequences | Polymorphic site | Enzyme |
|---|---|---|---|
| H | L: aagcaatatgaaatgatctgc; H: gcgtaggtttggtctag | –7025 | AluI |
| V | L: gagcttaaacccccttattt; H: gtattgattggtagtattggttatggttca | –4577 | NlaIII |
| U/K | L: ctcaaccccgacatcattacc; H: attacttttatttggagttgcaccaagatt | +12308 | HinfI |
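
As a rough illustration of how the three diagnostic sites in Table 1 map onto haplogroups, the following Python sketch encodes each marker with its diagnostic state (site gain, +, or site loss, –). The function name `classify_rflp` and the dictionary-based input are hypothetical, and the sketch deliberately ignores the HVS-I sequence information that was also used for haplogroup assignment.

```python
# Diagnostic coding-region markers taken from Table 1.
# True  = restriction site present (+), False = restriction site absent (-).
DIAGNOSTIC_MARKERS = {
    "H": ("7025 AluI", False),
    "V": ("4577 NlaIII", False),
    "U/K": ("12308 HinfI", True),
}


def classify_rflp(site_states):
    """Return the haplogroups whose diagnostic state matches the observed sites.

    site_states maps a marker name such as "7025 AluI" to True (site present)
    or False (site absent). Markers that were not typed are skipped.
    """
    matches = []
    for haplogroup, (marker, diagnostic_state) in DIAGNOSTIC_MARKERS.items():
        if marker in site_states and site_states[marker] == diagnostic_state:
            matches.append(haplogroup)
    return matches


# Hypothetical sample: AluI site at 7025 lost, the other two sites unchanged.
sample = {"7025 AluI": False, "4577 NlaIII": True, "12308 HinfI": False}
print(classify_rflp(sample))
```

A sample matching none of the three markers returns an empty list, which would correspond to the remaining haplogroups that have to be resolved from the control-region sequence.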

Data analysis

BioEdit software 7.0.5.2 (Hall, 1999) was used to align the sequences obtained. To characterise genetic variation among sampling sites, estimates of the number of polymorphic sites (S), the number of haplotypes (h), the nucleotide diversity (Pi), and the haplotype diversity (Hd) were obtained using the DnaSP 4.10 software (Rozas and Rozas, 1999). Pearson chi-square (χ2) values (Pearson, 1900) were calculated in order to assess whether there was any difference between the haplotype frequency distributions obtained for the same areas by means of different sampling strategies (MBU and STD). Principal Coordinate Analysis (PCoA) was carried out on the matrix of DNA pairwise differences, using the Genalex 6.3 software (Peakall and Smouse, 2006). The method based on the covariance matrix with data standardisation was applied. In order to assess the occurrence of significant genetic structuring among samples, analysis of molecular variance (AMOVA) was performed on the matrix of pairwise DNA distances among haplotypes, using the Arlequin 3.1 computer package (Excoffier ). Furthermore, genetic differentiation between pairs of samples was estimated by pairwise Φ values, computed from the matrix of haplotype DNA pairwise differences. The significance of variance components and F-statistic was assessed by a random permutation test (10,000 replicates). A Median-Joining network was drawn for each sampling strategy using Network 4.2.0.1 software (http://www.fluxus-engineering.com).
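
For readers who want to check the diversity estimates by hand, haplotype diversity is commonly computed with Nei's unbiased estimator, Hd = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of haplotype i among n sequences. The sketch below assumes this estimator (the text only states that DnaSP was used) and applies it to haplotype counts for the Cabras MBU sample, back-calculated from the percentages in Table 4 (for example, 17.1% of 35 individuals is 6). The function name is hypothetical.

```python
def haplotype_diversity(counts):
    """Nei's unbiased haplotype diversity from per-haplotype counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sequences are required")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq_freq)


# Cabras MBU sample (N = 35): counts back-calculated from Table 4.
# Four haplotypes occur more than once (6, 5, 4 and 2 individuals);
# the remaining 18 haplotypes are singletons.
cas_b_counts = [6, 5, 4, 2] + [1] * 18

print(round(haplotype_diversity(cas_b_counts), 3))
```

Under these assumptions the result rounds to 0.946, the Hd value reported for the Cabras MBU sample in Table 3.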

Results

Nucleotide sequence analysis of HVS-I (GenBank accession numbers: HM584611-HM584695 for MBU samples, and HM594952-HM595022 for STD samples) combined with RFLP analysis allowed the clustering of samples from both MBU and STD strategies into nine main haplogroups. They increased to eleven when sub-haplogroups K and U5b3 were also considered (Table 2). Haplogroup H, which includes the Cambridge Reference Sequence (CRS) (Anderson ), proved to be the most common. Haplogroup U5b3, reported as Sardinian-specific (Fraumene ; Pala ), was found in Cabras MBU, western Campidano MBU and Cabras STD, and was missing only from western Campidano STD. The values of genetic diversity, calculated for the HVS-I dataset, were similar for all regions and sampling strategies considered, showing a high level of variability (Table 3). Furthermore, we found a total of 82 different haplotypes. Those whose occurrence was detected by both sampling methods (MBU and STD) showed comparable relative frequency distributions, with no significant Pearson chi-square values (Table 4).
Table 2 - MtDNA haplogroup distribution obtained using the two sampling methods (values are expressed as relative distribution frequencies). Cas: Cabras; W Camp: western Campidano; B: MBU method; S: STD method.

| Haplogroup | Cas-B | Cas-S | W Camp-B | W Camp-S |
|---|---|---|---|---|
| V | 2.9 | 6.3 | 10.0 | 4.4 |
| H tot | 37.1 | 29 | 50.0 | 56.5 |
| T tot | 17.1 | 14.6 | 6.0 | 21.7 |
| J tot | 5.7 | 16.7 | 14.0 | 8.7 |
| U¹ (x U5b3, K) | 17.1 | 16.7 | 8.0 | 8.7 |
| U5b3 | 5.7 | 8.3 | 10.0 | - |
| K | 2.9 | - | - | - |
| I | 11.4 | - | - | - |
| W | - | 4.2 | - | - |
| X | - | 4.2 | - | - |
| M tot | - | - | 2.0 | - |

¹ This haplogroup does not include the sub-haplogroups U5b3 and K.

Table 3 - Estimates of genetic diversity among samples analysed. S: segregating sites; h: number of haplotypes; Hd: haplotype diversity; Pi: nucleotide diversity; Uh: frequency of unique haplotypes per total number of haplotypes; Sh: frequency of individuals with shared haplotypes; Rh: frequency of haplotypes observed in more than one individual.

| Sample | S | h | Hd | Pi | Uh | Sh | Rh |
|---|---|---|---|---|---|---|---|
| Cas-B | 38 | 22 | 0.946 | 0.019 | 0.629 | 0.486 | 0.114 |
| Cas-S | 45 | 37 | 0.976 | 0.021 | 0.771 | 0.333 | 0.104 |
| W Camp-B | 55 | 32 | 0.937 | 0.018 | 0.640 | 0.460 | 0.100 |
| W Camp-S | 35 | 16 | 0.913 | 0.018 | 0.696 | 0.391 | 0.087 |

Cas: Cabras; W Camp: western Campidano; B: MBU method; S: STD method.
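
The three haplotype-sharing indices in Table 3 appear to be simple ratios over the sample size N. The sketch below is an interpretation inferred from the reported values, not a formula given in the text: Uh is taken as h/N, Sh as the fraction of individuals carrying a haplotype seen more than once, and Rh as the number of such repeated haplotypes divided by N. The counts for the Cabras MBU sample are back-calculated from the percentages in Table 4, and the function name is hypothetical.

```python
def sharing_indices(counts):
    """Return (Uh, Sh, Rh) from per-haplotype counts, each divided by N."""
    n = sum(counts)
    repeated = [c for c in counts if c > 1]  # haplotypes seen more than once
    uh = len(counts) / n      # distinct haplotypes per individual
    sh = sum(repeated) / n    # individuals carrying a repeated haplotype
    rh = len(repeated) / n    # repeated haplotypes per individual
    return uh, sh, rh


# Cabras MBU sample (N = 35): four repeated haplotypes and 18 singletons.
cas_b_counts = [6, 5, 4, 2] + [1] * 18
uh, sh, rh = sharing_indices(cas_b_counts)
print(f"Uh={uh:.3f} Sh={sh:.3f} Rh={rh:.3f}")
```

With this reading, the Cabras MBU counts give 0.629, 0.486 and 0.114, in line with the Cas-B row of Table 3.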

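Table 4 - MtDNA haplotype distribution obtained using the two sampling methods (values are expressed as relative distribution frequencies). N: number of individuals; Cas: Cabras; W Camp: western Campidano; B: MBU method; S: STD method; χ2: values of Pearson chi-square (significance level with p values ≤0.05).

| Haplotype | Haplogroup | Cas-B (N = 35) | Cas-S (N = 48) | χ2 | W Camp-B (N = 50) | W Camp-S (N = 23) | χ2 | GenBank accession n° |
|---|---|---|---|---|---|---|---|---|
| Hap1 | J | - | 4.2 | 1.46 | 2.0 | - | 0.46 | HM594952 |
| Hap2 | X | - | 2.1 | 0.73 | - | - | - | HM594953 |
| Hap3 | V | 2.9 | 2.1 | 0.05 | 2.0 | - | 0.46 | HM594954 |
| Hap4 | U | - | 2.1 | 0.73 | - | - | - | HM594955 |
| Hap5 | J | - | 2.1 | 0.73 | - | - | - | HM594956 |
| Hap6 | U5b3 | 2.9 | 6.2 | 0.48 | 8.0 | - | 1.84 | HM594957 |
| Hap7 | W | - | 2.1 | 0.73 | - | - | - | HM594958 |
| Hap8 | U | - | 2.1 | 0.73 | - | - | - | HM594960 |
| Hap9 | H | 17.1 | 14.6 | 0.08 | 24.0 | 30.4 | 0.25 | HM594961 |
| Hap10 | H | - | 2.1 | 0.73 | - | - | - | HM594962 |
| Hap11 | X | - | 2.1 | 0.73 | - | - | - | HM594963 |
| Hap12 | T | 14.3 | 2.1 | 0.30 | - | - | - | HM594964 |
| Hap13 | U5b3 | - | 2.1 | 0.73 | - | - | - | HM594965 |
| Hap14 | T | - | 2.1 | 0.73 | - | - | - | HM594966 |
| Hap15 | H | - | 2.1 | 0.73 | - | - | - | HM594967 |
| Hap16 | U | - | 4.2 | 1.46 | - | - | - | HM594969 |
| Hap17 | H | 2.9 | 4.2 | 0.10 | - | 4.3 | 2.17 | HM594972 |
| Hap18 | H | - | 2.1 | 0.73 | - | - | - | HM594973 |
| Hap19 | U | - | 2.1 | 0.73 | - | - | - | HM594974 |
| Hap20 | T | - | 2.1 | 0.73 | - | - | - | HM594976 |
| Hap21 | U | - | 2.1 | 0.73 | - | 4.3 | 2.17 | HM594978 |
| Hap22 | J | 2.9 | 2.1 | 0.05 | 6.0 | - | 1.38 | HM594979 |
| Hap23 | T | - | 2.1 | 0.73 | - | - | - | HM594980 |
| Hap24 | J | - | 2.1 | 0.73 | 2.0 | 4.3 | 0.32 | HM594981 |
| Hap25 | V | - | 2.1 | 0.73 | - | - | - | HM594982 |
| Hap26 | V | - | 2.1 | 0.73 | - | - | - | HM594984 |
| Hap27 | T | 2.9 | 2.1 | 0.05 | - | 4.3 | 2.17 | HM594986 |
| Hap28 | W | - | 2.1 | 0.73 | - | - | - | HM594987 |
| Hap29 | T | - | 2.1 | 0.73 | - | - | - | HM594988 |
| Hap30 | U | - | 2.1 | 0.73 | - | - | - | HM594989 |
| Hap31 | J | - | 2.1 | 0.73 | 2.0 | - | 0.46 | HM594990 |
| Hap32 | U | - | 2.1 | 0.73 | - | - | - | HM594991 |
| Hap33 | T | - | 2.1 | 0.73 | 2.0 | 4.3 | 0.32 | HM594993 |
| Hap34 | J | - | 2.1 | 0.73 | - | - | - | HM594994 |
| Hap35 | J | - | 2.1 | 0.73 | - | - | - | HM594997 |
| Hap36 | H | - | 2.1 | 0.73 | 4.0 | 4.3 | 0.005 | HM594998 |
| Hap37 | H | - | 2.1 | 0.73 | - | 4.3 | 2.17 | HM594999 |
| Hap38 | H | - | - | - | - | 4.3 | 2.17 | HM595001 |
| Hap39 | U | - | - | - | - | 4.3 | 2.17 | HM595002 |
| Hap40 | T | - | - | - | - | 8.7 | 4.35¹ | HM595003 |
| Hap41 | V | - | - | - | - | 4.3 | 2.17 | HM595005 |
| Hap42 | J | - | - | - | - | 4.3 | 2.17 | HM595010 |
| Hap43 | H | - | - | - | - | 4.3 | 2.17 | HM595014 |
| Hap44 | H | - | - | - | - | 4.3 | 2.17 | HM595015 |
| Hap45 | T | - | - | - | - | 4.3 | 2.17 | HM595018 |
| Hap46 | U | 5.7 | - | 2.74 | 4.0 | - | 0.92 | HM584612 |
| Hap47 | M | - | - | - | 2.0 | - | 0.46 | HM584613 |
| Hap48 | H | - | - | - | 2.0 | - | 0.46 | HM584615 |
| Hap49 | H | - | - | - | 2.0 | - | 0.46 | HM584621 |
| Hap50 | J | - | - | - | 2.0 | - | 0.46 | HM584622 |
| Hap51 | V | - | - | - | 2.0 | - | 0.46 | HM584623 |
| Hap52 | H | - | - | - | 2.0 | - | 0.46 | HM584624 |
| Hap53 | V | - | - | - | 2.0 | - | 0.46 | HM584626 |
| Hap54 | U | - | - | - | 2.0 | - | 0.46 | HM584629 |
| Hap55 | T | - | - | - | 2.0 | - | 0.46 | HM584631 |
| Hap56 | H | - | - | - | 2.0 | - | 0.46 | HM584632 |
| Hap57 | H | - | - | - | 2.0 | - | 0.46 | HM584633 |
| Hap58 | U | - | - | - | 2.0 | - | 0.46 | HM584634 |
| Hap59 | U5b3 | - | - | - | 2.0 | - | 0.46 | HM584635 |
| Hap60 | J | - | - | - | 2.0 | - | 0.46 | HM584639 |
| Hap61 | V | - | - | - | 2.0 | - | 0.46 | HM584644 |
| Hap62 | H | - | - | - | 2.0 | - | 0.46 | HM584646 |
| Hap63 | H | - | - | - | 2.0 | - | 0.46 | HM584650 |
| Hap64 | V | - | - | - | 2.0 | - | 0.46 | HM584653 |
| Hap65 | H | - | - | - | 2.0 | - | 0.46 | HM584654 |
| Hap66 | T | - | - | - | 2.0 | - | 0.46 | HM584656 |
| Hap67 | H | - | - | - | 2.0 | - | 0.46 | HM584657 |
| Hap68 | H | - | - | - | 2.0 | - | 0.46 | HM584660 |
| Hap69 | H | 2.9 | - | 1.37 | - | - | - | HM584667 |
| Hap70 | I | 11.4 | - | 5.49¹ | - | - | - | HM584668 |
| Hap71 | U | 2.9 | - | 1.37 | - | - | - | HM584669 |
| Hap72 | U | 2.9 | - | 1.37 | - | - | - | HM584674 |
| Hap73 | U | 2.9 | - | 1.37 | - | - | - | HM584676 |
| Hap74 | H | 2.9 | - | 1.37 | - | - | - | HM584677 |
| Hap75 | U | 2.9 | - | 1.37 | - | - | - | HM584679 |
| Hap76 | H | 2.9 | - | 1.37 | - | - | - | HM584683 |
| Hap77 | J | 2.9 | - | 1.37 | - | - | - | HM584685 |
| Hap78 | H | 2.9 | - | 1.37 | - | - | - | HM584686 |
| Hap79 | K | 2.9 | - | 1.37 | - | - | - | HM584687 |
| Hap80 | H | 2.9 | - | 1.37 | - | - | - | HM584690 |
| Hap81 | U5b3 | 2.9 | - | 1.37 | - | - | - | HM584692 |
| Hap82 | H | 2.9 | - | 1.37 | - | - | - | HM584694 |

¹ Significant values of χ2.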
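
Many of the χ2 values in Table 4 can be approximated with a one-degree-of-freedom goodness-of-fit test in which the individuals carrying a haplotype are expected to split between the two samples in proportion to the sample sizes. This reading is an assumption inferred from the reported numbers, since the text does not spell out the formula, and the function name is hypothetical. The threshold 3.841 is the standard chi-square critical value for p = 0.05 with one degree of freedom.

```python
CRITICAL_VALUE_DF1 = 3.841  # chi-square threshold for p = 0.05, 1 d.f.


def haplotype_chi_square(count_a, count_b, n_a, n_b):
    """Goodness-of-fit chi-square for one haplotype observed in two samples.

    count_a, count_b: carriers of the haplotype in samples A and B.
    n_a, n_b: total number of individuals in samples A and B.
    """
    carriers = count_a + count_b
    if carriers == 0:
        raise ValueError("haplotype absent from both samples")
    # Expected carriers if the haplotype were equally frequent in both samples.
    expected_a = carriers * n_a / (n_a + n_b)
    expected_b = carriers * n_b / (n_a + n_b)
    return ((count_a - expected_a) ** 2 / expected_a
            + (count_b - expected_b) ** 2 / expected_b)


# Hap70: 4 of 35 Cabras MBU individuals (11.4%), none of 48 Cabras STD.
hap70 = haplotype_chi_square(4, 0, 35, 48)
# Hap9: 6 of 35 Cabras MBU (17.1%) and 7 of 48 Cabras STD (14.6%).
hap9 = haplotype_chi_square(6, 7, 35, 48)

print(round(hap70, 2), hap70 > CRITICAL_VALUE_DF1)
print(round(hap9, 2), hap9 > CRITICAL_VALUE_DF1)
```

Under this assumption Hap70 gives 5.49 (above the threshold) and Hap9 gives 0.08 (below it), in agreement with the corresponding entries in the table.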

Nucleotide sequences from the control region were combined with RFLP data on the coding region to obtain a single dataset for the following analysis. The first two coordinates of PCoA, which account for 62.39% of the total variability, identify two main groups of haplotypes. However, haplotypes were not grouped either according to the geographic area of origin (Cabras or western Campidano) or to the sampling strategy adopted (MBU versus STD) (Figure 2).
Figure 2 - Principal Coordinate Analysis (PCoA) plot: the first PC accounts for 37.82% of variance, while the second PC accounts for 25.07% (Cas: Cabras; W Camp: western Campidano; B: MBU method; S: STD method).

Accordingly, the analysis of molecular variance (AMOVA) did not indicate significant genetic differentiation among samples (Φ = 0.0096, p > 0.05). Indeed, nearly all variance was found within samples (99.04%), whereas differences among samples accounted for only 0.96% of the total variation. These results were further confirmed by the pairwise comparison of samples, which did not show any significant genetic differentiation (Table 5).
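
The two variance percentages follow directly from the Φ statistic. In a single-level AMOVA, Φ is the share of the total molecular variance attributable to differences among samples; the symbols σ_a² (among samples) and σ_w² (within samples) are notation introduced here for illustration only.

```latex
\Phi = \frac{\sigma_a^2}{\sigma_a^2 + \sigma_w^2} = 0.0096
\quad\Longrightarrow\quad
\frac{\sigma_a^2}{\sigma_a^2 + \sigma_w^2} \times 100 = 0.96\%,
\qquad
\frac{\sigma_w^2}{\sigma_a^2 + \sigma_w^2} \times 100 = 100\% - 0.96\% = 99.04\%.
```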
Table 5 - Population pairwise Φ values among samples obtained from MBU and STD strategies. Population codes are reported as in Table 2. Conventional Φ values are shown below the diagonal and corresponding P values with significance level ≤0.05 are shown above the diagonal.

| Sample | Cas-B | Cas-S | W Camp-B | W Camp-S |
|---|---|---|---|---|
| Cas-B | - | 0.4150 | 0.0940 | 0.2286 |
| Cas-S | –0.0003 | - | 0.1552 | 0.1082 |
| W Camp-B | 0.0150 | 0.0087 | - | 0.2020 |
| W Camp-S | 0.0096 | 0.0181 | 0.0098 | - |

Furthermore, network analysis showed similar relationships among haplogroups without geographical structuring when the two sampling methods were compared (Figure 3).
Figure 3 - Networks obtained from combined dataset (control region and coding region) for MBU (A and C) and STD (B and D) strategies. A and B: phylogenetic relationships among mitochondrial haplogroups; C and D: geographic distribution of mitochondrial haplogroups. Cas: Cabras; W Camp: western Campidano; B: MBU method; S: STD method.

Discussion

Estimates of genetic diversity (Table 3) obtained for the two sampling strategies were compatible with no occurrence of high levels of repeated haplotypes in the STD strategy, as could be expected. This finding supports the possible occurrence of a homogeneous population shared by both the western Campidano and Cabras areas, with a constant high level of genetic variability in the samples obtained by the two sampling methods and low levels of stochastic forces. The similarity of genetic diversity values between areas and sampling strategies may be explained by the lack of diachronic divergence between the present and past genetic settlement of the western Campidano and Cabras areas. Furthermore, this finding is attributable to the absence of genetic drift in the analysed areas. Indeed, this stochastic force, if present, could lead to genetic heterogeneity due to random loss of haplogroups and alteration of their frequencies. The absence of higher levels of identical haplotypes among the STD samples suggests that no significant founder effects affected the population recently. Consistently, PCoA applied to the combined dataset (control region + coding region) (Figure 2) grouped MBU and STD samples together without genetic structuring. Such similarity was also confirmed by the corresponding non-significant P values of Φ. Network analysis was also consistent with the results above. The two sampling strategies displayed similar global relationships among mitochondrial haplogroups without geographical structuring, showing that the mtDNA haplogroup frequencies and distribution obtained by the MBU method were not skewed by the severe sample selection that the method requires. Overall, these results suggest a lack of genetic differentiation in southwest Sardinia, probably due to a continuous gene flow between the areas, either in the past or more recently, which may have counterbalanced the development of microheterogeneity due to genetic drift.
Previous studies carried out on the paternal unilinear marker Y-chromosome pointed out a similar trend for the entire Sardinian population, suggesting an initial settlement of a relatively large number of individuals with a common origin (Contu ) and conspicuous genetic variability. The presence of genetic structuring is the major obstacle in association studies based on genome-wide scans searching for linkage disequilibrium (LD) between patients and controls (Risch and Botstein, 1996; Terwilliger and Weiss, 1998), even in isolated populations like Finns and Sardinians (Eaves ; Taillon-Miller ). Pooling individuals belonging to different breeding units may merge alleles that might have different frequencies in different villages, as we have previously reported for some common polymorphisms in Sardinian villages (Robledo ). As previously shown, in a well-defined breeding unit, a small sample was sufficient to describe the genomic profile of the population, which was not affected by severe reduction of sample size (Siniscalco ). More importantly, the repeated application of our strategy in different MBUs offers the advantage of reducing the risk of false-positive results due to population stratification, since obtaining similar artifactual results in different MBUs is not anticipated. In conclusion, the comparison of the variability detected by means of the MBU and STD sampling methods points to a diachronic continuity of the genetic structure of southwestern Sardinia. The benefit of the MBU sampling strategy lies in the possibility of: i) selecting the original population on the basis of written documents and not by inferring surname monophyletism, and ii) not excluding from the analysis unrelated individuals with polyphyletic surnames, when present, in the founder families. 
Our results confirm that the MBU sampling strategy, despite the drastic reduction in sample size, does not introduce deviations in gene frequencies, even if haploid markers such as mtDNA are used. Therefore it can be considered a useful tool in association studies of complex traits, making it possible to infer the genetic settlement of the population, recovering the deepest branches of a genealogy and avoiding the recent contribution of foreign peopling.
References

1.  DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis.

Authors:  J Rozas; R Rozas
Journal:  Bioinformatics       Date:  1999-02       Impact factor: 6.937

2.  Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28.

Authors:  P Taillon-Miller; I Bauer-Sardiña; N L Saccone; J Putzel; T Laitinen; A Cao; J Kere; G Pilia; J P Rice; P Y Kwok
Journal:  Nat Genet       Date:  2000-07       Impact factor: 38.330

3.  The genetically isolated populations of Finland and sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes.

Authors:  I A Eaves; T R Merriman; R A Barber; S Nutland; E Tuomilehto-Wolf; J Tuomilehto; F Cucca; J A Todd
Journal:  Nat Genet       Date:  2000-07       Impact factor: 38.330

4.  A 9.1-kb gap in the genome reference map is shown to be a stable deletion/insertion polymorphism of ancestral origin.

Authors:  Renato Robledo; Sandro Orru; Antonella Sidoti; Rosella Muresu; Diane Esposito; Marie Claude Grimaldi; Carlo Carcassi; Antoniettina Rinaldi; Luigi Bernini; Licinio Contu; Massimo Romani; Bruce Roe; Marcello Siniscalco
Journal:  Genomics       Date:  2002-12       Impact factor: 5.736

5.  A manic depressive history.

Authors:  N Risch; D Botstein
Journal:  Nat Genet       Date:  1996-04       Impact factor: 38.330

6.  Population genomics in Sardinia: a novel approach to hunt for genomic combinations underlying complex traits and diseases.

Authors:  M Siniscalco; R Robledo; P K Bender; C Carcassi; L Contu; J C Beck
Journal:  Cytogenet Cell Genet       Date:  1999

7.  Sequence and organization of the human mitochondrial genome.

Authors:  S Anderson; A T Bankier; B G Barrell; M H de Bruijn; A R Coulson; J Drouin; I C Eperon; D P Nierlich; B A Roe; F Sanger; P H Schreier; A J Smith; R Staden; I G Young
Journal:  Nature       Date:  1981-04-09       Impact factor: 49.962

8.  Mitochondrial haplogroup U5b3: a distant echo of the epipaleolithic in Italy and the legacy of the early Sardinians.

Authors:  Maria Pala; Alessandro Achilli; Anna Olivieri; Baharak Hooshiar Kashani; Ugo A Perego; Daria Sanna; Ene Metspalu; Kristiina Tambets; Erika Tamm; Matteo Accetturo; Valeria Carossa; Hovirag Lancioni; Fausto Panara; Bettina Zimmermann; Gabriela Huber; Nadia Al-Zahery; Francesca Brisighelli; Scott R Woodward; Paolo Francalacci; Walther Parson; Antonio Salas; Doron M Behar; Richard Villems; Ornella Semino; Hans-Jürgen Bandelt; Antonio Torroni
Journal:  Am J Hum Genet       Date:  2009-06-04       Impact factor: 11.025

9.  GenAlEx 6.5: genetic analysis in Excel. Population genetic software for teaching and research--an update.

Authors:  Rod Peakall; Peter E Smouse
Journal:  Bioinformatics       Date:  2012-07-20       Impact factor: 6.937

10.  Y-chromosome based evidence for pre-neolithic origin of the genetically homogeneous but diverse Sardinian population: inference for association scans.

Authors:  Daniela Contu; Laura Morelli; Federico Santoni; Jamie W Foster; Paolo Francalacci; Francesco Cucca
Journal:  PLoS One       Date:  2008-01-09       Impact factor: 3.240
