Elizabeth G Atkinson, Shareefa Dalvie, Yakov Pichkar, Allan Kalungi, Lerato Majara, Anne Stevenson, Tamrat Abebe, Dickens Akena, Melkam Alemayehu, Fred K Ashaba, Lukoye Atwoli, Mark Baker, Lori B Chibnik, Nicole Creanza, Mark J Daly, Abebaw Fekadu, Bizu Gelaye, Stella Gichuru, Wilfred E Injera, Roxanne James, Symon M Kariuki, Gabriel Kigen, Nastassja Koen, Karestan C Koenen, Zan Koenig, Edith Kwobah, Joseph Kyebuzibwa, Henry Musinguzi, Rehema M Mwema, Benjamin M Neale, Carter P Newman, Charles R J C Newton, Linnet Ongeri, Sohini Ramachandran, Raj Ramesar, Welelta Shiferaw, Dan J Stein, Rocky E Stroud, Solomon Teferra, Mary T Yohannes, Zukiswa Zingela, Alicia R Martin.
Abstract
African populations are the most diverse in the world yet are sorely underrepresented in medical genetics research. Here, we examine the structure of African populations using genetic and comprehensive multi-generational ethnolinguistic data from the Neuropsychiatric Genetics of African Populations-Psychosis study (NeuroGAP-Psychosis) consisting of 900 individuals from Ethiopia, Kenya, South Africa, and Uganda. We find that self-reported language classifications meaningfully tag underlying genetic variation that would be missed with consideration of geography alone, highlighting the importance of culture in shaping genetic diversity. Leveraging our uniquely rich multi-generational ethnolinguistic metadata, we track language transmission through the pedigree, observing the disappearance of several languages in our cohort as well as notable shifts in frequency over three generations. We find suggestive evidence for the rate of language transmission in matrilineal groups having been higher than that for patrilineal ones. We highlight both the diversity of variation within Africa as well as how within-Africa variation can be informative for broader variant interpretation; many variants that are rare elsewhere are common in parts of Africa. The work presented here improves the understanding of the spectrum of genetic variation in African populations and highlights the enormous and complex genetic and ethnolinguistic diversity across Africa.
Keywords: Africa; diverse populations; genotypes; linguistics; population genetics; population structure
Year: 2022 PMID: 36055213 PMCID: PMC9502052 DOI: 10.1016/j.ajhg.2022.07.013
Source DB: PubMed Journal: Am J Hum Genet ISSN: 0002-9297 Impact factor: 11.043
Figure 1. Genetic and admixture composition of the NeuroGAP-Psychosis samples against a global reference
(A) First two principal components showing NeuroGAP-Psychosis samples as projected onto global variation of the full 1000 Genomes, HGDP, and AGVP. While most samples fall on a cline of African genetic variation, some South African samples exhibit high amounts of admixture and European genetic ancestry. Color scheme for global PCA plot: Latin American, yellow; East Asian, dark orange; European, tan; South Asian, fuchsia; West African, green/blue; East African, red/orange; South African, purple; NeuroGAP-Psychosis collections, gray.
(B) ADMIXTURE plot at best fit k (k = 10) of all African samples as well as three representative non-African populations from the 1000 Genomes Project. The GIH, CHB, and GBR were included to capture South Asian, East Asian, and European admixture, respectively. Individuals are represented as bar charts sorted by population, and ancestry components for each person are visualized with different colors. A key describing the country of origin for all populations can be found in Table S1.
Figure 2. Genetic composition of subcontinental African structure in the NeuroGAP-Psychosis samples
PCA plots for PCs 1–8 with an African reference panel. A map of collection locations is shown to the left of PCA plots. Points are colored by region to assist in interpretation: green, west; blue, west central/central; red, east; orange, Ethiopia; purple, south. See Figures S2–S6 for plots highlighting each cohort individually.
Figure 3. Primary self-reported language shifts over three generations
(A) Individual languages were re-classified into broader language families for comparable granularity. Note that while all languages in the legend are represented in the plot, not all are visible due to being at low frequency in the data.
(B) All languages reported with at least 3% frequency in any generation are shown across the generations. Note the increase in endorsement of English and drop in Oromiffa/Oromigna in the present generation.
(C) Primary language reported by the individuals within each NeuroGAP-Psychosis study country.
Figure 4. Procrustes correlations between genetics, geography, and language
Procrustes correlations (all p < 5E−5) are shown between geography and genetics (A and B), geography and language (C and D), and genetics and language (E and F). The left column includes results for the entire NeuroGAP-Psychosis collection. The right column contains results subset to the four cohorts in East Africa. For linguistic analyses, linguistic variation is measured by the first three PCs of phoneme inventories from languages reported by individuals as spoken by themselves and their relatives. Matrilineal relatives include the mother and maternal grandmother. Patrilineal relatives include the father and paternal grandfather. Familial refers to a weighted average of all reported family members. Note that Y-axis labels vary between plots.
Procrustes correlation between genetics, geography, and language
| | Genetic | Geography | Self | Maternal languages | Paternal languages |
|---|---|---|---|---|---|
| All individuals | autosomal | 0.5426 | 0.6223 | 0.5935 | 0.6078 |
| All individuals | X chrom. | 0.5231 | 0.6078 | 0.5988 | 0.6082 |
| East African cohorts | autosomal | 0.7868 | 0.6815 | 0.6856 | 0.6924 |
| East African cohorts | X chrom. | 0.6170 | 0.6103 | 0.6178 | 0.6200 |
All p < 5E−5. The first three PCs of autosomal and X chromosome variation were used for comparisons. Linguistic variation was calculated as a function of mean phoneme presence across all languages reported by the individual across their pedigree. Maternal language contains results from the languages spoken by the participants’ mother and maternal grandmother; paternal contains results from their father and paternal grandfather.
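As a rough illustration of how a Procrustes correlation of this kind can be computed, the sketch below uses NumPy to compare two sets of per-individual coordinates (for example, genetic PCs against geographic or linguistic coordinates) and estimates a permutation p-value. It is not the authors' pipeline: the function names, the default number of permutations, and the use of a symmetric Procrustes statistic are assumptions made for illustration only.

```python
import numpy as np


def procrustes_correlation(x, y):
    """Symmetric Procrustes correlation between two row-matched configurations.

    x and y hold one row per individual; the column counts may differ
    (e.g. 3 genetic PCs vs. 2 geographic coordinates).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape[0] != y.shape[0]:
        raise ValueError("x and y must have the same number of rows")
    # Center each configuration and scale it to unit Frobenius norm.
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    x /= np.linalg.norm(x)
    y /= np.linalg.norm(y)
    # Under the optimal rotation, the correlation equals the sum of the
    # singular values of x^T y (equivalently sqrt(1 - residual sum of squares)).
    singular_values = np.linalg.svd(x.T @ y, compute_uv=False)
    return float(min(singular_values.sum(), 1.0))


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Observed correlation and a permutation p-value (hypothetical helper).

    Rows of y are shuffled to break the pairing between individuals.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = procrustes_correlation(x, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = y[rng.permutation(len(y))]
        if procrustes_correlation(x, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With this estimator the smallest attainable p-value is 1 / (n_perm + 1), so resolving a threshold such as the p < 5E−5 reported here would require at least 20,000 permutations.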
Language transmission rates from relatives
| Family member | Overall | Patrilineal | Patrilineal (downsampled) | Matrilineal |
|---|---|---|---|---|
| Father | 0.810 | 0.837 | 0.901 | 0.871 |
| Mother | 0.802 | 0.811 | 0.837 | 0.800 |
| Paternal grandfathers | 0.778 | 0.726 | 0.775 | 0.926 |
| Paternal grandmothers | 0.773 | 0.738 | 0.779 | 0.939 |
| Maternal grandfathers | 0.762 | 0.708 | 0.736 | 0.903 |
| Maternal grandmothers | 0.758 | 0.726 | 0.750 | 0.812 |
Frequency of a participant’s reported primary language matching one of the top three reported languages spoken by relatives. Rates were calculated excluding English. Given that all but one of the NeuroGAP-Psychosis populations with linguistic data were collected in East Africa, we conducted an additional suite of analyses zooming into this region to examine transmission in this part of the continent. In East Africa, individuals were thus additionally partitioned by their affiliation with ethnic groups with either a matrilineal or patrilineal transmission of movable property. Patrilineal languages were run in their entirety as well as downsampled to 105 to match the sample size available for matrilineal languages.
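The transmission rate described above can be sketched in a few lines of Python. This is a minimal illustration, not the study's code: the record layout (a `primary` field plus one list of languages per relative), the way English is excluded, and the helper names are all assumptions.

```python
import random


def transmission_rate(records, relative, exclude=("English",)):
    """Fraction of participants whose primary language matches one of the
    top three languages reported for the given relative.

    Assumed record layout: {"primary": str, "<relative>": [str, ...]}.
    Excluded languages are dropped from both sides (an assumption about
    how "excluding English" is applied).
    """
    matches = 0
    total = 0
    for rec in records:
        primary = rec["primary"]
        relative_langs = [
            lang for lang in rec.get(relative, [])[:3] if lang not in exclude
        ]
        if primary in exclude or not relative_langs:
            continue  # nothing comparable for this participant
        total += 1
        if primary in relative_langs:
            matches += 1
    return matches / total if total else float("nan")


def downsample(records, n, seed=0):
    """Random subset of at most n records, e.g. to match a smaller group."""
    rng = random.Random(seed)
    return rng.sample(list(records), min(n, len(records)))
```

Under these assumptions, the downsampled patrilineal column would correspond to calling `transmission_rate(downsample(patrilineal_records, 105), "father")`, where `patrilineal_records` is a hypothetical list of participants from patrilineal groups.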