
Sequence polymorphism and haplogroup data of the hypervariable regions on mtDNA in Semoq Beri population.

Muhamad Aidil Zahidin1, Wan Bayani Wan Omar2, Wan Rohani Wan Taib3, Jeffrine Rovie Ryan Japning4, Mohd Tajuddin Abdullah1,2.   

Abstract

Orang Asli are the aboriginal peoples of Peninsular Malaysia; they are recognized as indigenous to the country and still practise a traditional lifestyle. Molecular interest in the Orang Asli stems from their link to the earliest prehistoric migrations, which began approximately 200 kya and reached Peninsular Malaysia in stages around 50 kya. There are three groups of Orang Asli in Peninsular Malaysia: the Negrito (also known as Semang), the Senoi and the Proto-Malays. To our knowledge, no study has examined mtDNA variation in the Semoq Beri, one of the tribes of the Senoi group. In this report, mtDNA variation was analysed in the Semoq Beri population of Hulu Terengganu as an initial effort to characterise its genetics and to elucidate the history of Orang Asli expansion in Peninsular Malaysia. A range of mtDNA parameters was estimated, and the polymorphisms observed relative to the revised Cambridge Reference Sequence (rCRS) were identified and assigned to their respective haplogroups. The DNA sequences are registered in NCBI under accession numbers KY853670-KY853753.


Year:  2018        PMID: 30761343      PMCID: PMC6288409          DOI: 10.1016/j.dib.2018.10.158

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

There are presently 533 Semoq Beri in Hulu Terengganu, and the population is likely to be threatened by cultural assimilation and intermarriage [3], [4], [5], [6].

The data provide baseline information for future genetic and evolutionary studies based on the mtDNA control region.

The data will enhance the DNA database of the Semoq Beri population and help elucidate the history of Orang Asli expansion in Peninsular Malaysia.

The data allow other researchers studying this population to begin genome-wide analyses.

Data

The data in this article were obtained by sequencing Hypervariable Segments I (HVSI) and II (HVSII) of mtDNA from blood samples of unrelated individuals, using the primers listed in Table 1. Each sequence was processed with Sequencher 5.4 software (https://genecodes.com), ClustalW2 MUSCLE (https://www.ebi.ac.uk) and MEGA 7 software [1] to identify the sequence polymorphisms (Table S1), the C-stretch (Table 2) and the nucleotide composition (Table 3). Haplotype data (Table 4) were obtained with DnaSP 5.1 software [2]. Haplogroups were classified from the HVSI sequences with the Haplogroup software (https://dna.jameslick.com). Schematic diagrams of the two major haplogroups, M and N, were drawn (Figs. 1 and 2).
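DnaSP groups identical sequences into haplotypes before their frequencies are reported. The minimal Python sketch below illustrates only that grouping idea, under simplifying assumptions: the sequences are already aligned to the same length, and gaps or ambiguous bases are compared as ordinary characters (DnaSP itself offers options for handling these). The function name `group_haplotypes` and the toy samples are hypothetical and are not part of DnaSP.

```python
def group_haplotypes(sequences: dict[str, str]) -> list[tuple[str, list[str]]]:
    """Collapse identical aligned sequences into numbered haplotypes.

    `sequences` maps a sample ID to its aligned sequence. Haplotypes are
    numbered in order of first appearance (dicts keep insertion order).
    Hypothetical helper: gaps and ambiguity codes count as ordinary characters.
    """
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) > 1:
        raise ValueError("sequences must be aligned to the same length")
    groups: dict[str, list[str]] = {}
    for sample, seq in sequences.items():
        # Case-insensitive comparison: 'acgt' and 'ACGT' are the same haplotype.
        groups.setdefault(seq.upper(), []).append(sample)
    return [(f"Hap {i}", samples) for i, samples in enumerate(groups.values(), start=1)]


# Hypothetical toy alignment, not Semoq Beri data.
toy = {"S1": "ACGT-C", "S2": "acgt-c", "S3": "ACGTTC"}
for label, samples in group_haplotypes(toy):
    print(label, samples)
```

Here S1 and S2 collapse into one haplotype and S3 forms a second one.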
Table 1

Details of the primers used for PCR amplification [3].

mtDNA region | Nucleotide position | Primer | Primer sequence (5′-3′) | Size (bp)
HVI | 16,024–16,569 | conL1 (F) | TCAAGCTTACACCAGTCTTGTAAACC | 600
HVI | 16,024–16,569 | conH1 (R) | CCTGAAGTAGGAACCAGATG | 600
HVII | 0–576 | conL4 (F) | GGTCTATCACCCTATTAACCAC | 600
HVII | 0–576 | conH4 (R) | CTGTTAAAAGTGCATACCGCCA | 600
Table 2

C-stretch of the HVII region between nucleotide positions 233 (*233C) and 250 (250C).

Sequence | ns | Bases at positions 233–250 | n
rCRS | – | NNNCCCCCCCTCCCCCGC | –
Semoq Beri | 24 | CCCCCCCC | 8
Semoq Beri | 13 | CCCCCCC | 7
Semoq Beri | 6 | CCCCCCCCC | 9
Semoq Beri | 1 | CCCCCCCCCCCCCCCC | 16
Semoq Beri | 1 | CCCCCCCCCCCCCCCCC | 17

N, deleted base; ns, number of sequences; n, number of bases in the unbroken C series.
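The value n in Table 2 is the length of the unbroken C series. A minimal sketch of that count follows, assuming (as in the Table 2 legend) that 'N' marks a deleted base at an aligned position; '-' is also treated as a gap here. Gap symbols are removed before counting because a deleted base is simply absent from the sequence and should not split a run. The helper name `longest_c_run` is hypothetical.

```python
import re


def longest_c_run(segment: str, gap_symbols: str = "N-") -> int:
    """Return the length of the longest unbroken run of C in an aligned segment.

    Assumption: characters in `gap_symbols` mark deleted bases (the Table 2
    legend uses 'N'), so they are dropped before runs are measured.
    """
    bases = "".join(b for b in segment.upper() if b not in gap_symbols)
    runs = re.findall(r"C+", bases)
    return max((len(run) for run in runs), default=0)


rcrs_segment = "NNNCCCCCCCTCCCCCGC"  # rCRS row of Table 2 (positions 233-250)
print(longest_c_run(rcrs_segment))  # 7
```

For the rCRS row, the T at the eleventh aligned position breaks the series, so the longest run is 7; if that T were a C, the two runs would merge into 13.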

Table 3

Sequence variation in the HVI and HVII regions.

Variation index | HVI region | HVII region
Nucleotide positions (%) | 16,024–16,504 (88%) | 72–351 (49%)
Length (bp) | 481 | 280
No. of polymorphic sites | 18 | 26
No. of observed transitions | 16 | 17
No. of observed transversions | 2 | 9
No. of indels | – | 5
Nucleotide composition, C (%) | 31.17 | 27.66
Nucleotide composition, T (%) | 23.74 | 27.39
Nucleotide composition, A (%) | 31.20 | 28.75
Nucleotide composition, G (%) | 13.89 | 16.20
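The positional and compositional indices in Table 3 reduce to simple arithmetic. The sketch below reproduces the region lengths (481 bp and 280 bp) from the inclusive nucleotide positions. Under our assumption that the percentages express the analysed span as a share of the amplified region in Table 1 (16,024–16,569 and 0–576), it also reproduces 88% and 49%. The `base_composition` helper pools A, C, G and T counts over all sequences; this is an illustrative choice and may differ slightly from how MEGA 7 averages composition. All function names are hypothetical.

```python
from collections import Counter


def span_length(start: int, end: int) -> int:
    """Number of bases in an inclusive position interval."""
    return end - start + 1


def coverage_percent(analysed: tuple[int, int], amplified: tuple[int, int]) -> float:
    """Analysed span as a percentage of the amplified span (assumed definition)."""
    return 100 * span_length(*analysed) / span_length(*amplified)


def base_composition(sequences: list[str]) -> dict[str, float]:
    """Percentages of C, T, A and G pooled over all sequences; other symbols are ignored."""
    counts = Counter(b for seq in sequences for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no A, C, G or T bases found")
    return {base: round(100 * counts[base] / total, 2) for base in "CTAG"}


print(span_length(16024, 16504), round(coverage_percent((16024, 16504), (16024, 16569))))  # 481 88
print(span_length(72, 351), round(coverage_percent((72, 351), (0, 576))))  # 280 49
```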
Table 4

Frequency distribution of the mtDNA haplotypes.

Region | Haplotype | N | Samples | Frequency
HVS-I | Hap 1 | 1 | Semaq Beri 19 | 0.025
HVS-I | Hap 2 | 2 | Semaq Beri 3, 43 | 0.050
HVS-I | Hap 3 | 1 | Semaq Beri 45 | 0.025
HVS-I | Hap 4 | 18 | Semaq Beri 1, 5, 6, 8, 11, 12, 17, 18, 21, 24, 29, 32, 33, 35, 39, 40, 42, 47 | 0.450
HVS-I | Hap 5 | 7 | Semaq Beri 7, 13, 23, 30, 36, 37, 49 | 0.175
HVS-I | Hap 6 | 4 | Semaq Beri 20, 27, 28, 34 | 0.100
HVS-I | Hap 7 | 7 | Semaq Beri 2, 9, 14, 22, 31, 38, 48 | 0.175
HVS-II | Hap 8 | 1 | Semaq Beri 44 | 0.023
HVS-II | Hap 9 | 1 | Semaq Beri 46 | 0.023
HVS-II | Hap 10 | 1 | Semaq Beri 21 | 0.023
HVS-II | Hap 11 | 1 | Semaq Beri 36 | 0.023
HVS-II | Hap 12 | 1 | Semaq Beri 35 | 0.023
HVS-II | Hap 13 | 10 | Semaq Beri 2, 9, 14, 20, 22, 27, 28, 31, 38, 48 | 0.227
HVS-II | Hap 14 | 1 | Semaq Beri 3 | 0.023
HVS-II | Hap 15 | 1 | Semaq Beri 25 | 0.023
HVS-II | Hap 16 | 1 | Semaq Beri 26 | 0.023
HVS-II | Hap 17 | 3 | Semaq Beri 10, 15, 41 | 0.068
HVS-II | Hap 18 | 2 | Semaq Beri 19, 50 | 0.045
HVS-II | Hap 19 | 14 | Semaq Beri 1, 5, 6, 8, 11, 12, 17, 18, 24, 29, 32, 39, 40, 47 | 0.318
HVS-II | Hap 20 | 7 | Semaq Beri 4, 13, 16, 23, 30, 43, 49 | 0.159

N, number of sequences sharing the haplotype.
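Each frequency in Table 4 is the haplotype count N divided by the number of sequences typed for that segment, which is 40 for HVS-I and 44 for HVS-II (the sums of the N column). The short sketch below recomputes the published frequencies from the N values; the function name `haplotype_frequencies` is hypothetical.

```python
def haplotype_frequencies(counts: dict[str, int], ndigits: int = 3) -> dict[str, float]:
    """Relative frequency of each haplotype: its count divided by the total count."""
    total = sum(counts.values())
    return {hap: round(n / total, ndigits) for hap, n in counts.items()}


# N values transcribed from Table 4.
hvs1 = {"Hap 1": 1, "Hap 2": 2, "Hap 3": 1, "Hap 4": 18, "Hap 5": 7, "Hap 6": 4, "Hap 7": 7}
hvs2 = {"Hap 8": 1, "Hap 9": 1, "Hap 10": 1, "Hap 11": 1, "Hap 12": 1, "Hap 13": 10,
        "Hap 14": 1, "Hap 15": 1, "Hap 16": 1, "Hap 17": 3, "Hap 18": 2, "Hap 19": 14, "Hap 20": 7}

print(sum(hvs1.values()), haplotype_frequencies(hvs1)["Hap 4"])  # 40 0.45
print(sum(hvs2.values()), haplotype_frequencies(hvs2)["Hap 19"])  # 44 0.318
```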

Fig. 1

Current Asian and Pacific mtDNA lineages within the Manju clan (haplogroup M). The tree was reconstructed based on [11]. Uppercase letters denote geographical locations: E, East; N, North; S, South; NA, North Asia; EA, East Asia; SEA, Southeast Asia; PM, Peninsular Malaysia.

Fig. 2

Current Asian and Pacific mtDNA lineages within the Nasreen clan (haplogroup N). The tree was reconstructed based on [11]. Uppercase letters denote geographical locations: E, East; N, North; S, South; NA, North Asia; EA, East Asia; SEA, Southeast Asia; PM, Peninsular Malaysia.


Experimental design, materials, and methods

Sample collection and genomic DNA extraction

All sequence data were generated from DNA samples collected with informed written consent, under approval from the Universiti Sultan Zainal Abidin (UniSZA) Human Research Ethics Committee, Malaysia. Blood samples were collected from unrelated Semoq Beri individuals in Kampung Sungai Berua, Hulu Terengganu, Malaysia. Genomic DNA was extracted from the blood samples using the PureLink™ Genomic DNA Mini Kit (Invitrogen, USA) following the manufacturer's protocol.

PCR amplification, DNA purification and sequencing

The isolated genomic DNA was amplified using the forward and reverse primer pairs for HVI and HVII (Table 1) [7]. Negative, amplification and reagent blank controls were included to monitor for contamination at every stage of the laboratory work. PCR amplification was carried out in a final volume of 25 μl (Table S2) on an Arktik Thermal Cycler (Thermo Scientific, USA); the PCR profile is given in Table S3. The amplified PCR products were purified using the QIAquick Purification Kit (QIAGEN AG, Germany). The DNA products were visualized by electrophoresis on a 1% agarose gel to check the size of the amplified products. Sequencing was carried out at First Base Laboratories Sdn Bhd (Malaysia) on an ABI PRISM® 377 DNA Sequencer with the BigDye® Terminator 3.0 Cycle Sequencing Kit.

Statistical sequence analyses

The fluorescent base calls of the sequenced DNA fragments were visualized and read using Sequencher 5.4 (https://genecodes.com). The sequences were aligned against the revised Cambridge Reference Sequence (rCRS) [8], [9] using ClustalW2 MUSCLE (Multiple Sequence Comparison by Log-Expectation) (https://www.ebi.ac.uk). The C-stretch of each sequence was checked and its length counted (Table 2). Nucleotide composition was calculated in MEGA 7 [1] (Table 3). Haplotype data in Arlequin format were generated using DnaSP 5.1 [2] (Table 4). Haplogroups were classified using the online Haplogroup software (https://dna.jameslick.com), whose haplogroup assignments are compatible with PhyloTree Build 17 [10]. The schematic diagrams were drawn based on [10] and [11] (Figs. 1 and 2). GenBank accession numbers and haplogroup assignments for the HVI and HVII sequences of the Semoq Beri population are provided in Table S1.
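Table 3 separates transitions (purine-to-purine or pyrimidine-to-pyrimidine changes) from transversions (purine-to-pyrimidine changes) and also counts indels. The sketch below classifies the differences between one aligned sequence and the rCRS position by position. It is an illustration under assumptions, not the procedure used by MEGA or Sequencher: both strings must already be aligned, '-' marks a gap, each gapped position is counted as one indel, and ambiguity codes are skipped. Note that Table 3 counts polymorphic sites across the whole population, whereas this helper counts differences for a single sequence. The function name `classify_differences` and the toy sequences are hypothetical.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def classify_differences(reference: str, query: str) -> dict[str, int]:
    """Count transitions, transversions and indels of an aligned query against a reference."""
    if len(reference) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"transitions": 0, "transversions": 0, "indels": 0}
    for ref_base, query_base in zip(reference.upper(), query.upper()):
        if ref_base == query_base:
            continue
        if "-" in (ref_base, query_base):
            counts["indels"] += 1  # each gapped position counts once
        elif {ref_base, query_base} <= PURINES or {ref_base, query_base} <= PYRIMIDINES:
            counts["transitions"] += 1  # A<->G or C<->T
        elif ref_base in "ACGT" and query_base in "ACGT":
            counts["transversions"] += 1  # purine <-> pyrimidine
        # Any other symbol (e.g. ambiguity codes such as N or Y) is ignored in this sketch.
    return counts


# Hypothetical toy alignment: one transition, one deletion, one transversion.
print(classify_differences("ACGTACGT", "GCGTAC-A"))
```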
Specifications table

Subject area | Forensic science
More specific subject area | Forensic genetics
Type of data | Tables and figures
How data were acquired | Data were acquired by extracting, amplifying, purifying, sequencing and analysing the target mtDNA regions using the PureLink Genomic DNA Mini Kit (Invitrogen, USA), QIAquick Purification Kit (QIAGEN AG, Germany), a DNA sequencer (First Base Laboratories, Malaysia), Sequencher 5.4 software (https://genecodes.com), ClustalW2 MUSCLE (https://www.ebi.ac.uk), MEGA 7 software [1], DnaSP 5.1 software [2] and Haplogroup software (https://dna.jameslick.com)
Data format | Raw and analysed
Experimental factors | Blood sample collection, DNA extraction, PCR amplification, DNA purification, sequencing and data interpretation
Experimental features | Sequence analysis followed by haplogroup identification
Data source location | Kampung Sungai Berua, Hulu Terengganu, Terengganu, Malaysia
Data accessibility | The mtDNA sequences are registered in NCBI under accession numbers KY853670-KY853753 (Table S1)
Related research article | Zahidin [3]