Lucy Crooks1, Johnathan Cooper-Knock2, Paul R Heath2, Ahmed Bouhouche3, Mostafa Elfahime4, Mimoun Azzouz2, Youssef Bakri5, Mohammed Adnaoui6, Azeddine Ibrahimi7, Saaïd Amzazi5, Rachid Tazi-Ahnini8,9.
Abstract
BACKGROUND: Large-scale human sequencing projects have described around a hundred million single nucleotide variants (SNVs). These studies have predominantly involved individuals of European ancestry, even though genetic diversity is expected to be highest in Africa, where Homo sapiens evolved and has maintained a large population for the longest time. The African Genome Variation Project examined several African populations, but these were all located south of the Sahara. Morocco is on the northwest coast of Africa and mostly lies north of the Sahara, which makes it very attractive for studying genetic diversity. The ancestry of present-day Moroccans is unknown and may be substantially different from that of Africans living south of the Sahara. Recent genomic data from 15,000-year-old modern humans found at Taforalt in eastern Morocco suggested that North African individuals may be genetically distinct from previously studied African populations.
Keywords: Africa; Population genomics; SNVs; Whole genome sequencing
Year: 2020 PMID: 32957965 PMCID: PMC7507649 DOI: 10.1186/s12863-020-00917-4
Source DB: PubMed Journal: BMC Genet ISSN: 1471-2156 Impact factor: 2.797
Fig. 1 Distribution of SNV sites across the genome. Number of SNVs is shown for 10 Mb blocks of each chromosome. Lines indicate chromosome boundaries. Shaded area is chromosome 5. a All SNV sites identified in the Moroccan individuals. b SNV sites with alleles not found in 1000G. c SNV sites with alleles not found in gnomAD. The value for 30-40 Mb on chromosome 6 was highest, at 4094, but is omitted to better represent the majority of the genome. No values are shown for the Y chromosome because gnomAD did not make calls for this chromosome.
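The per-block counts plotted in Fig. 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `count_snvs_per_block`, the input format (pairs of chromosome name and position) and the example sites are all hypothetical, and positions are assumed to be 1-based, as in VCF files.

```python
from collections import Counter

BLOCK_SIZE = 10_000_000  # 10 Mb, the block width used in Fig. 1


def count_snvs_per_block(sites, block_size=BLOCK_SIZE):
    """Count SNV sites per (chromosome, block index).

    `sites` is an iterable of (chromosome, position) pairs with
    1-based positions (an assumption; the caption does not say).
    Block 0 covers positions 1..block_size, block 1 the next
    block_size positions, and so on.
    """
    counts = Counter()
    for chrom, pos in sites:
        counts[(chrom, (pos - 1) // block_size)] += 1
    return counts


# Hypothetical sites, for illustration only (not data from the study).
example_sites = [
    ("chr6", 31_000_000),
    ("chr6", 39_999_999),
    ("chr6", 40_000_001),
    ("chr5", 5),
]
block_counts = count_snvs_per_block(example_sites)
```

Here `block_counts[("chr6", 3)]` holds the number of example sites in the 30-40 Mb block of chromosome 6. Restricting `sites` to alleles absent from a reference panel before counting would give the analogue of panels b and c.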
Fig. 2 Properties of discovered SNVs. Results are shown for all SNV alleles, alleles not found in 1000G and alleles not found in gnomAD. Light grey bars are percentage of all alleles in each class, black are percentage of alleles novel to 1000G in each class, and dark grey are percentage of alleles novel to gnomAD in each class. a Distribution of number of copies of the alternative allele in the three Moroccan individuals. b Distribution of alleles across different types of sequence. c For exonic alleles, distribution of functional effects. For (b) and (c), the Y axis is log10 scaled and annotation was performed in ANNOVAR version 16-04-18 with the RefSeq gene model.
Fig. 3 Inferred shared ancestry of Moroccan individuals to global populations. The first (PC1) and second (PC2) principal components of individuals from 1000G and the Moroccan individuals are shown. Principal components were calculated from the genotype matrix of high-quality, LD-pruned SNPs from 1000G. The Moroccan individuals were genotyped at the same sites and values predicted with the same model. Three-letter codes in the legend are from 1000G. The continental areas of the populations are as follows: ACB, ASW, ESN, GWD, LWK, MSL and YRI - Africa; CLM, MXL, PEL and PUR - the Americas; CDX, CHB, CHS, JPT and KHV - East Asia; CEU, FIN, GBR, IBS and TSI - Europe; BEB, GIH, ITU, PJL and STU - South Asi