Dang Liu1,2, Benjamin M Peter1, Wulf Schiefenhövel3, Manfred Kayser4, Mark Stoneking1,5. 1. Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany. 2. Human Evolutionary Genetics Unit, Institut Pasteur, UMR 2000, CNRS, Paris, France. 3. Human Ethology Group, Max Planck Institute for Ornithology, Seewiesen, Germany. 4. Department of Genetic Identification, Erasmus MC, University Medical Center Rotterdam, Rotterdam, The Netherlands. 5. CNRS, Laboratoire de Biométrie et Biologie Evolutive, UMR 5558, Université Lyon 1, Villeurbanne, France.
Abstract
The Massim, a cultural region that includes the southeastern tip of mainland Papua New Guinea (PNG) and nearby PNG offshore islands, is renowned for a trading network called Kula, in which different valuable items circulate in different directions among some of the islands. Although the Massim has been a focus of anthropological investigation since the pioneering work of Malinowski in 1922, the genetic background of its inhabitants remains relatively unexplored. To characterize the Massim genomically, we generated genome-wide SNP data from 192 individuals from 15 groups spanning the entire region. Analyzing these together with comparative data, we found that all Massim individuals have variable Papuan-related (indigenous) and Austronesian-related (arriving ∼3,000 years ago) ancestries. Individuals from Rossel Island in southern Massim, speaking an isolate Papuan language, have the highest amount of a distinct Papuan ancestry. We also investigated the recent contact via sharing of identical by descent (IBD) genomic segments and found that Austronesian-related IBD tracts are widely distributed geographically, but Papuan-related tracts are shared exclusively between the PNG mainland and Massim, and between the Bismarck and Solomon Archipelagoes. Moreover, the Kula-practicing groups of the Massim show higher IBD sharing among themselves than do groups that do not participate in Kula. This higher sharing predates the formation of Kula, suggesting that extensive contact between these groups since the Austronesian settlement may have facilitated the formation of Kula. Our study provides the first comprehensive genome-wide assessment of Massim inhabitants and new insights into the fascinating Kula system.
The Massim, a cultural region that includes the southeastern tip of mainland Papua New Guinea (PNG) and nearby PNG offshore islands, is renowned for a trading network called Kula, in which different valuable items circulate in different directions among some of the islands. Although the Massim has been a focus of anthropological investigation since the pioneering work of Malinowski in 1922, the genetic background of its inhabitants remains relatively unexplored. To characterize the Massim genomically, we generated genome-wide SNP data from 192 individuals from 15 groups spanning the entire region. Analyzing these together with comparative data, we found that all Massim individuals have variable Papuan-related (indigenous) and Austronesian-related (arriving ∼3,000 years ago) ancestries. Individuals from Rossel Island in southern Massim, speaking an isolate Papuan language, have the highest amount of a distinct Papuan ancestry. We also investigated the recent contact via sharing of identical by descent (IBD) genomic segments and found that Austronesian-related IBD tracts are widely distributed geographically, but Papuan-related tracts are shared exclusively between the PNG mainland and Massim, and between the Bismarck and Solomon Archipelagoes. Moreover, the Kula-practicing groups of the Massim show higher IBD sharing among themselves than do groups that do not participate in Kula. This higher sharing predates the formation of Kula, suggesting that extensive contact between these groups since the Austronesian settlement may have facilitated the formation of Kula. Our study provides the first comprehensive genome-wide assessment of Massim inhabitants and new insights into the fascinating Kula system.
New Guinea has a long history of human occupation and harbors extensive ethnolinguistic diversity. Modern humans first colonized Sahul (the connected Australia–New Guinea land mass) at least 47 thousand years ago (kya), and perhaps as long ago as 65 kya (Roberts et al. 1990; Summerhayes et al. 2010; O’Connell and Allen 2015; Clarkson et al. 2017), while ∼3 kya, the Austronesian expansion/settlement brought a second wave of human migration into the region (Bellwood 2007; Kirch 2017). This long-term isolation of human occupation and different episodes of human migration promoted diverse regional culture developments, including an independent invention of farming in the New Guinea highlands at least 4–5 kya (Denham 2004; Shaw, Field, et al. 2020) and the Lapita culture, associated with the expansion of Austronesians from Near Oceania into the Pacific (Spriggs 1995; Kirch 2017).Geographically, the Massim encompasses the southeastern tip of mainland Papua New Guinea (PNG) and nearby offshore islands (fig. 1) and is considered by anthropologists to be a culturally defined region (Seligmann 1910; Shaw 2019). The Massim region is well known for the Kula ring tradition (or Kula), a network trading system in which two types of unique necklaces circulate among the islands in opposite directions, facilitating inter-island social and economic relationships (Malinowski 1922; Ziegler 2017; Irwin et al. 2018; Kuehling 2021). The Kula connects the islands of the northern and the western Massim (including the eastern tip of Mainland PNG) and also includes Misima, the most northern island of the southern Massim, while other islands of the southern Massim are not involved in the Kula (fig. 1). Due to seasonal wind conditions, traders on Kula voyages often spend long periods of time on different islands (Malinowski 1922; Irwin et al. 2018), which might further facilitate interactions, including genetic (Schiefenhövel 2004), between the groups participating in Kula.
Fig. 1.
Map showing the location of the newly studied Massim groups and groups in the reference data set and PCA of the East Asian and Oceanian groups. (A) The main map shows the location of Massim groups, color-coded according to subregion, while the inset map shows the location of reference groups. Dot shapes indicate the corresponding publication. The solid arrows connect groups frequently involved in a Kula relationship, while the dashed arrows connect groups that marginally involved in a Kula relationship. The numbers indicate distances (in kilometers) between islands. (B) Plot of PC1 versus PC2 of the median position of the East Asian and Oceanian groups from supplementary fig. S2, Supplementary Material online, colored according to region; Massim regions are further indicated. The eigenvalues from PC1 to PC10 are shown on the top left.
Map showing the location of the newly studied Massim groups and groups in the reference data set and PCA of the East Asian and Oceanian groups. (A) The main map shows the location of Massim groups, color-coded according to subregion, while the inset map shows the location of reference groups. Dot shapes indicate the corresponding publication. The solid arrows connect groups frequently involved in a Kula relationship, while the dashed arrows connect groups that marginally involved in a Kula relationship. The numbers indicate distances (in kilometers) between islands. (B) Plot of PC1 versus PC2 of the median position of the East Asian and Oceanian groups from supplementary fig. S2, Supplementary Material online, colored according to region; Massim regions are further indicated. The eigenvalues from PC1 to PC10 are shown on the top left.Since the initial description of Kula by Malinowski in 1922 (Malinowski 1922), the region has attracted the attention of anthropologists, archeologists, and linguists, and chronologies of Massim have been proposed (Shaw 2016; Shaw, Coxe, Kewibu, et al. 2020). The earliest evidence for human occupation of the Massim is dated to 17.3 kya, on Paneati Island in the Louisiades Archipelago (Shaw, Coxe, Haro, et al. 2020); undated obsidian and stone artifacts that are similar to those dated to the Late Pleistocene or Early Holocene elsewhere in New Guinea further support human occupation around this time, when lowered sea levels connected much of the Massim to the PNG mainland (Shaw et al. 2016; Shaw 2017). After settlement by Lapita people ∼2.5–3 kya, the northern and southern Massim exhibited separate cultural developments, until ∼500–200 years ago with the formation of Kula (Shaw 2016). Linguistically, almost all Massim inhabitants speak languages belonging to the Austronesian language family, except for a few groups living on mainland PNG who speak Papuan languages belonging to the Trans-New Guinea language family (Eberhard et al. 2021), and the Rossel Islanders on the most eastern tip of the southern Massim. Rossel Islanders speak a Papuan language that has been classified either as a language isolate (Levinson and Majid 2013; Stebbins et al. 2017) or as belonging to the Yele-West New Britain language family, linking Rossel linguistically with the Bismarck Archipelago (Ross 2005; Eberhard et al. 2021).While the Massim region has been well studied by archeologists, anthropologists, and linguists, genetic variation in the Massim remains largely unexplored, even though it occupies a key position in connecting the northern and southern coasts of New Guinea as well as connecting New Guinea with the neighboring Solomon Islands and other regions of Oceania. The most comprehensive human genetic study of the Massim to date analyzed mitochondrial DNA (mtDNA) and Y-chromosome variation, and found regional genetic-geographic population structure for mtDNA but not for the Y-chromosome (van Oven et al. 2014). This was interpreted as a potential signature of the Kula, as the travel between islands to perform the trading of the goods is mostly mediated by males, which could reduce inter-island genetic differences for the Y-chromosome but not for mtDNA. However, studies of genome-wide data provide much richer information concerning admixture and population history, as shown by two recent genome-wide studies of PNG (Bergström et al. 2017; Brucato et al. 2021), but genome-wide studies of the Massim are lacking as of yet. Here, we report the results of comprehensive genome-wide analyses of the Massim region with resulting insights into the genetic history of Oceania in general and implications for the Kula tradition in particular.
Results
Overall Genetic Variation of Austronesian and Papuan Ancestries in the Massim
We generated genome-wide single nucleotide polymorphism (SNP) array data on the Illumina Infinium Multi-Ethnic Global Array (MEGA; ∼1.5 million SNPs) for 192 individuals from 15 groups spanning the entire Massim region including northern, southern, and western Massim as well as the mainland parts from Collingwood Bay (fig. 1). We merged our data with published array and whole-genome sequencing (WGS) data encompassing East Asia and Oceania and additionally included Africans and Europeans as more distant comparative groups (fig. 1; supplementary fig. S1 and table S1, Supplementary Material online). To obtain an overview of the genetic variation and genetic-geographic population structure revealed with our compiled genomic data set, we first performed principal component analysis (PCA). In a PCA plot focusing on the East Asian and Oceanian groups (fig. 1; supplementary fig. S2, Supplementary Material online), we observed a striking cline with East Asians at one pole and PNG highlanders at the other; the Massim groups fall in between together with groups from the Central Province of PNG, the Bismarck Archipelago (in short, Bismarcks), and the Solomon Archipelago (i.e., Bougainville and the Solomon Islands; in short, Solomons). Groups from the PNG lowlands are placed close to PNG highlanders, except for the Central group, which comprises more Austronesian speakers. Bellona, Renell, and Tikopia are Polynesian outliers in the Solomons, that is, groups that migrated back to the Solomons from Polynesia, which explains their placement further toward East Asians than any other Near Oceanian group studied. The northern Massim groups are closer to the East Asian pole, while the southern Massim group from Rossel is closest to the PNG highlander pole, and other southern Massim as well as all western Massim and Collingwood Bay groups fall in between. Notably, there is a division in the PCA within southern Massim, with Rossel being closest to the PNG mainland and highland groups and Sudest, which geographically is located next to Rossel, being close to Rossel, while the other southern Massim groups fall together with northern and western Massim groups and those from Collingwood Bay.Since the PCA suggests mixed East Asian-Papuan ancestry in the Massim groups, we explored this further in an ADMIXTURE analysis. At K = 2, one component (pink) is most enriched in the East Asian Austronesian groups while the other component (blue) is enriched in the Papuan highlanders (supplementary fig. S3, Supplementary Material online); all other Oceanian groups (including the Massim) exhibit a mixture of these two genetic ancestry components, albeit in different proportions. The northern Massim individuals, who speak Austronesian languages, have the highest amount of Austronesian-related ancestry (∼52% pink), while the Papuan-speaking Rossel islanders from southern Massim have the lowest amount (∼20% pink), and hence the highest amount of Papuan (∼80% blue) ancestry (supplementary table S2, Supplementary Material online). However, several Austronesian-speaking groups in the Massim, including those from Sudest, the western Massim, and Collingwood Bay, also have more Papuan than Austronesian ancestry, suggesting substantial contact between Papuan and Austronesian groups.This large range in variation of Austronesian (∼20–52%) versus Papuan genomic ancestry in the Massim exceeds that of nearby, larger island regions: Austronesian ancestry varies between ∼25 and 36% in the Bismarcks and between ∼38 and 54% in the Solomons (excluding Santa Cruz and Polynesian outliers) (supplementary fig. S3, Supplementary Material online). We therefore investigated the Austronesian versus Papuan ancestry in the Massim in more detail via outgroup f3 statistics. A plot of the f3 values measuring shared drift between Oceanian populations and East Asian Austronesians, versus the f3 values measuring shared drift between Oceanian populations and PNG highlanders (supplementary fig. S4, Supplementary Material online), gives results similar to the PCA plot (fig. 1), namely that Massim groups harbor both Austronesian and Papuan ancestry with different proportions depending on their locations within the Massim region. As before, northern Massim groups have more Austronesian ancestry while Rossel Islanders from southern Massim have more Papuan ancestry. As in the PCA plot, the Massim groups are distributed along a linear cline between PNG highlanders with high amounts of Papuan ancestry at one pole, and Polynesian outliers in the Solomons with high amounts of Austronesian ancestry at the other pole. Moreover, the shape of the f3 plot suggests a single major admixture episode rather than continuous gene flow or a tree-like history (Bergstrom et al. 2020).In addition to the drift/allele-sharing analyses, a haplotype-based method (GLOBETROTTER) also suggests a single pulse of admixture for all the Massim groups except for Rossel (supplementary fig. S5, Supplementary Material online); Rossel exhibits an unclear admixture signal probably due to the low amount of Austronesian ancestry. Moreover, it is likely that the incoming population was already admixed (supplementary fig. S5, Supplementary Material online), that is, had both Austronesian and Papuan ancestry. Consistently, the percentage of modeled Austronesian-related ancestry is highest in the northern Massim groups (∼42%), and the Papuan-related ancestry is highest in the Rossel group from southern Massim (∼73%; supplementary table S2, Supplementary Material online). GLOBETROTTER also infers admixture dates of ∼1–3 kya for the Massim groups, and similar dates are inferred with another method, MALDER (supplementary fig. S6, Supplementary Material online); additionally, MALDER does not find evidence for more than one admixture event. The inferred dates are not correlated with the amount of Austronesian ancestry (supplementary fig. S7, Supplementary Material online), suggesting that the higher amounts of Austronesian ancestry in some Massim groups cannot be explained by a longer period of contact, but instead must reflect other social circumstances.
Austronesian and Papuan Ancestries Specific to the Massim
In the ADMIXTURE analysis, the lowest cross-validation error (and hence, the best-supported number of ancestry components) occurred at K = 8 (supplementary fig. S8, Supplementary Material online). Six ancestry components were found in the Oceanians (fig. 2); the other two were restricted to Africans, and to Europeans and East Asians, respectively. The ancestry components found in Oceanians include:
Fig. 2.
ADMIXTURE analyses focusing on Oceanian structure. ADMIXTURE results for K = 8 (the optimal K with the lowest cross-validation error; supplementary fig. S8, Supplementary Material online) on the top, plotted on a map at the bottom. Color bars on the top indicate the region (RG), language family (LF) for the Oceanian groups, and Massim region (MR) for the Massim groups. The size of the circle for each group on the map is proportional to the sample size.
a light green component at the highest frequency in Chimbu and Madang highlanders and also enriched in Western and Eastern highlanders; this component is present at uniformly low frequencies across the Massim.a blue component at the highest frequency in Southern PNG highlanders and Enga and enriched in other mainland PNG groups; this component is present up to medium frequency in the Massim, Bismarcks, and Santa Cruz.a pink component enriched in Central PNG that exists in low frequency in a few other lowland PNG groups, the Solomons (including the Polynesian outlier Tikopia, but not the other Polynesian outlier, Bellona_Rennell), and the Bismarcks; and is the major component in all Massim groups except Rossel and neighboring Sudest.a black component that is at the highest frequency in the Rossel group, second highest in the neighboring group from Sudest, and at lower frequency in several other groups from the western and southern Massim.a dark green component enriched in Bougainville, the Bismarcks and the Solomons (including the Polynesian outlier Tikopia, but not the other Polynesian outlier, Bellona_Rennell); this component is present at very low frequency in some PNG mainland groups but notably absent from the Massim.a peach component that is the only component in the Polynesian outlier Bellona_Rennell, and is found in high frequency in Tikopia and low frequency in other Solomons and Bismarck groups; this component is not found in the Massim.ADMIXTURE analyses focusing on Oceanian structure. ADMIXTURE results for K = 8 (the optimal K with the lowest cross-validation error; supplementary fig. S8, Supplementary Material online) on the top, plotted on a map at the bottom. Color bars on the top indicate the region (RG), language family (LF) for the Oceanian groups, and Massim region (MR) for the Massim groups. The size of the circle for each group on the map is proportional to the sample size.The blue and pink components detected at K = 8 (fig. 2) overlap largely, but not completely, with the results for the Oceanian groups observed at K = 2 (supplementary fig. S3, Supplementary Material online). All of the groups exhibiting any of the four components newly detected with K = 8 (light green, black, dark green, and peach) in the Oceanians have more Papuan than Austronesian ancestry detected at K = 2 (supplementary fig. S3, Supplementary Material online) except for the Polynesian outliers (peach), which suggests substantial variation in Papuan ancestry as well as a drifted Austronesian ancestry in the Polynesian outliers. Similar results were obtained with a more strict threshold for kinship quality control, suggesting that this observed pattern is not due to cryptically related individuals (supplementary figs. S8and S9, Supplementary Material online). Hence, we further investigated the Papuan and Austronesian ancestry-specific variation in the Massim.We first calculated f4-statistics of the form f4(Test groups, East Asian Austronesians; Chimbu/Bougainville/Rossel, Southern Highlanders) to test for different affinities of each test group (all other Oceanian groups) with Chimbu, Bougainville, or Rossel-related ancestry when compared with Southern Highlanders. The results of this f4 statistic will be influenced by both differences in Papuan-related ancestry and different amounts of Austronesian-related ancestry in the test groups. Therefore, to control for the latter, we plotted this f4 statistic against an f4 statistic of the form f4(Test groups, Southern Highlanders; East Asian Austronesians, Australians). This f4 statistic was previously shown mathematically to provide a measure of Austronesian-related ancestry in the test groups (Lipson et al. 2020). Groups exhibiting significant deviations from the regression line in each plot would indicate differential Papuan ancestries in those groups. The results for the comparison of Chimbu versus Southern Highlanders show no significant deviations from the regression line for any Oceanian group (supplementary fig. S10, Supplementary Material online), indicating that all of the groups tested have equal affinities with Chimbu and Southern Highlanders ancestry. In the comparison between Bougainville and Southern Highlanders (supplementary fig. S10, Supplementary Material online), several of the groups from the Bismarcks and the Solomons exhibit stronger affinities with Bougainville, as noted previously (Pugach et al. 2018), while several PNG groups exhibit stronger affinities with the Southern Highlanders; however, none of the Massim groups show any preferential affinities. In the comparison between Rossel and Southern Highlanders (supplementary fig. S10, Supplementary Material online), again several PNG groups exhibit stronger affinities with Southern Highlanders, along with the Polynesian Outliers from the Solomons. Interestingly, there is variation in genomic ancestry among the Massim groups: southern Massim groups share more ancestry with Rossel; Collingwood Bay groups (from mainland PNG) share more ancestry with the Southern Highlanders; and the remaining Massim groups do not share excess ancestry with either Rossell or Southern Highlanders.We then carried out a haplotype-based method of analysis (ChromoPainter) in which we restricted Massim groups to be recipients and not donors (with the exception of Rossel, in order to have a source of the putative distinct Papuan ancestry in the southern Massim). The results (supplementary fig. S11, Supplementary Material online) are consistent with the f4 analyses and strongly support the existence of more than one distinct Papuan-related ancestry in the Massim groups (one related to Rossel, and at least one more related to other Papuan groups).Switching Australians to non-Austronesian East Asians in the f4(Test groups, Southern Highlanders; East Asian Austronesians, Australians) further confirmed that the non-Papuan ancestry in the Massim groups is associated with Austronesian ancestry (supplementary fig. S10, Supplementary Material online). We next used f4-statistics to investigate variation in Austronesian-related ancestry in the Massim groups and found that Massim groups are equally related to the most differentiated Austronesian-related ancestries, East Asian Austronesians versus Polynesian outliers (supplementary fig. S12, Supplementary Material online), suggesting no significant differences in the source(s) of Austronesian-related ancestry among the Massim groups.To study the ancestry-specific variation we detected in the Massim in more detail, we applied local ancestry inference via RFMix and extracted Austronesian and Papuan ancestry-specific segments. The amount of Austronesian-related ancestry inferred by RFMix is slightly lower than that inferred by the ADMIXTURE and GLOBETROTTER analyses, while similar proportions of Papuan ancestries are inferred by all three methods (supplementary table S2, Supplementary Material online). This suggests that the Papuan ancestry-specific segments are identified with high confidence, but there is more uncertainty in inferring Austronesian-related ancestry segments, probably due to a poorer proxy for this source. A PCA based on Papuan ancestry-specific segments shows three poles, consisting of the Bismarck and Solomon groups, Rossel, and Papuan highlanders (fig. 3; supplementary fig. S13, Supplementary Material online), suggesting that the Papuan ancestry associated with Rossel probably reflects strong genetic drift, as further evidenced by the high degree of sharing of identity-by-descent (IBD) fragments and runs of homozygosity (ROH) within Rossel (supplementary fig. S14, Supplementary Material online). The Bismarck and Solomon groups are separated from Rossel and Papuan highlanders at PC1, suggesting the Papuan ancestry in Rossel is closer to Papuan highlanders than to the Papuan ancestry of Bismarck and Solomon groups. This is further supported by Papuan ancestry-specific ADMIXTURE and TreeMix results (supplementary figs. S15 and S16, Supplementary Material online). In contrast to the Papuan ancestry-specific PCA, in the Austronesian ancestry-specific PCA all of the Oceanian groups fall together in a cluster quite distinct from East Asian groups (fig. 3 and supplementary fig. S13, Supplementary Material online) with Polynesian outliers at the extreme pole; this lack of distinction in the Austronesian-related ancestry of Massim groups is consistent with the results from the allele-sharing statistics (f3 and f4) and further supported by an Austronesian ancestry-specific ADMIXTURE analysis (supplementary fig. S17, Supplementary Material online).
Fig. 3.
Papuan/Austronesian ancestry-specific PCA of Oceanian (and East Asian) groups. Plot of the median position of PC1 versus PC2 for individuals from Oceanian groups (see supplementary fig. S13, Supplementary Material online for the PCA plot of all Oceanian individuals) and using only (A) Papuan or (B) Austronesian segments identified by RFMix, colored according to region; Massim regions are further highlighted.
Papuan/Austronesian ancestry-specific PCA of Oceanian (and East Asian) groups. Plot of the median position of PC1 versus PC2 for individuals from Oceanian groups (see supplementary fig. S13, Supplementary Material online for the PCA plot of all Oceanian individuals) and using only (A) Papuan or (B) Austronesian segments identified by RFMix, colored according to region; Massim regions are further highlighted.
Recent Contact Involving Massim Groups and Implications for the Kula Tradition
To investigate in more detail genetic contacts involving the Massim groups in the recent past, we analyzed sharing of IBD segments. Different size ranges of IBD segments are informative about contact at different times, extending back to ∼4 kya (Ralph and Coop 2013; Al-Asadi et al. 2019). In the range of 1–5 cM, which roughly corresponds to an average of ∼2.7 kya, that is, around the time of the arrival of Austronesians in the region (Spriggs 1995; Shaw 2016), there is strong IBD sharing among Oceanian groups, in particular among the Massim, the Bismarcks, and Solomons, and among the Papuan highlanders (fig. 4; supplementary fig. S18, Supplementary Material online). In the range of 5–10 cM, roughly corresponding to ∼675 ya, the strong connection between mainland PNG and the offshore islands in Massim is no longer there, but the connection between northern and southern Massim remains. In the range of over 10 cM or roughly 225 ya, strong sharing is only seen within some parts of the Papuan highlands, northern Massim, and southern Massim, respectively. To further test the idea that the extensive IBD sharing in the range of 1–5 cM reflects the Austronesian settlement, we extracted and analyzed Austronesian ancestry-specific and Papuan ancestry-specific IBD segments. The results show that in the range of 1–5 cM, Austronesian ancestry-specific IBD segments account for most of the sharing between regions, while Papuan ancestry-specific IBD sharing shows a clear distinction between the PNG mainland and Massim groups, versus the Bismarcks and Solomons (fig. 4 and ; supplementary figs. S19 and S20, Supplementary Material online). In particular, the strong sharing between many Massim groups and the Central Province group in the ChromoPainter analysis (supplementary fig. S11, Supplementary Material online) can be attributed to Austronesian-related ancestry, as this accounts for the IBD sharing between them (supplementary figs. S18 and S19, Supplementary Material online). And, the closer relationship in Papuan-related ancestry between the PNG mainland and Massim groups, versus either of these and the Bismarck/Solomon groups, is consistent with the PCA/ADMIXTURE/TreeMix based on Papuan-related ancestry (fig. 3; supplementary figs. S15 and S16, Supplementary Material online).
Fig. 4.
IBD sharing network for Oceanians. Network visualizations of the mean summed length of (A) all shared IBD blocks, (B) shared Austronesian ancestry-specific IBD blocks; and (C) shared Papuan ancestry-specific IBD blocks. Identified IBD blocks are in the range of 1–5, 5–10 cM, and over 10 cM, with the mean number of shared IBD blocks equal to 0.5 or more. The group points are colored according to region. The heat plot segments are proportional to the average of the summed IBD length (cM).
IBD sharing network for Oceanians. Network visualizations of the mean summed length of (A) all shared IBD blocks, (B) shared Austronesian ancestry-specific IBD blocks; and (C) shared Papuan ancestry-specific IBD blocks. Identified IBD blocks are in the range of 1–5, 5–10 cM, and over 10 cM, with the mean number of shared IBD blocks equal to 0.5 or more. The group points are colored according to region. The heat plot segments are proportional to the average of the summed IBD length (cM).Finally, to investigate the potential relationship between the Kula tradition and genomic variation, we compared the IBD sharing between Kula-practicing versus non-Kula-practicing groups (in short, Kula vs. non-Kula). We measured IBD similarities among Kula/non-Kula taking the differences in IBD sharing within groups into account by dividing the mean IBD sharing between groups by the mean IBD sharing within groups. A plot of IBD similarity versus geographic distance, for different lengths of IBD segments (corresponding to different time periods) shows that Kula groups exhibit higher IBD similarities than non-Kula groups separated by the same geographic distance (supplementary fig. S21, Supplementary Material online). To test for statistical significance and further controlling for background sharing through time by additionally including groups outside Massim, we then computed a relative IBD similarity statistic that takes into account the overall IBD sharing among all Oceanians for different IBD segment size bins. We find that relative IBD similarity statistics are significantly positive for all Massim groups (i.e., both Kula and non-Kula groups), indicating that the Massim region overall shows evidence of more contact among groups than does the rest of Oceania. Moreover, Kula groups show constantly and significantly higher relative IBD similarities than non-Kula groups (fig. 5; supplementary fig. S22, Supplementary Material online), even before the formation of the Kula ring tradition ∼500 ya (Shaw 2016; Irwin et al. 2018).
Fig. 5.
Relative IBD similarities of Kula versus non-Kula groups. The lower and upper bound lengths of the IBD blocks used in the calculations are indicated at the top and the bottom of the plot, respectively. These intervals roughly correspond to ∼2.7, ∼1.5, ∼1.1, ∼0.8, ∼0.7, ∼0.6, and ∼0.2 thousand year ago (kya). The distance between the two horizontal lines on each point indicate ±3 standard errors.
Relative IBD similarities of Kula versus non-Kula groups. The lower and upper bound lengths of the IBD blocks used in the calculations are indicated at the top and the bottom of the plot, respectively. These intervals roughly correspond to ∼2.7, ∼1.5, ∼1.1, ∼0.8, ∼0.7, ∼0.6, and ∼0.2 thousand year ago (kya). The distance between the two horizontal lines on each point indicate ±3 standard errors.
Discussion
Our analyses of genome-wide data from an extensive sampling of individuals from across the Massim, together with data from other regions, indicate that all Massim groups share both Austronesian-related and Papuan-related ancestry, in agreement with the previous study of uniparental markers (van Oven et al. 2014). However, we also find important regional distinctions within the Massim with respect to various aspects of these ancestries.First, there is more Austronesian-related ancestry in the northern Massim (average based on ADMIXTURE of 52%; supplementary fig. S3 and table S2, Supplementary Material online) than in the other regions (average of 34%, with Rossel having the lowest amount, 20%). In general, there is also more Austronesian-related ancestry for the uniparental markers in the northern Massim than in the other regions, although Rossel does not have a particularly low amount (van Oven et al. 2014). While the higher amount of Austronesian-related ancestry may reflect additional pulses of admixture or more prolonged contact with Austronesians in the northern Massim, the estimated admixture dates do not correlate with the amount of Austronesian ancestry (supplementary fig. S7, Supplementary Material online). Moreover, the GLOBETROTTER results suggest single pulses of admixture (supplementary fig. S5, Supplementary Material online), as does the linear relationship in the f3 values between the Massim and Papuans versus those between the Massim and Austronesians (supplementary fig. S4, Supplementary Material online) (Bergstrom et al. 2020). We also do not see any differences in the Austronesian-related ancestry of different Massim groups (fig. 3; supplementary figs. S12 and S17, Supplementary Material online). These results therefore suggest differences in the contact relationships between the indigenous Papuan groups and the incoming Austronesians in the northern Massim versus elsewhere in the Massim, with Rossel being the least-impacted by the Austronesians, possibly due to its more remote location. It is also notable that while all of the Massim groups studied (with the exception of Rossel and 5 individuals from groups on mainland PNG) speak Austronesian languages, Austronesian ancestry is in the minority in most of the groups, and reaches a maximum of 52% in the northern Massim.Second, the Papuan-related ancestry of Papuan-speaking Rossel Islanders is distinct from Papuan-related ancestries identified previously and elsewhere in Near Oceania. Previous studies of genome-wide data have shown a major distinction between Papuan ancestries in the PNG Highlands versus Bougainville (Pugach et al. 2018), and in the western versus eastern regions of the PNG Highlands (Bergström et al. 2017; Brucato et al. 2021). We find support for both of these distinctions, as well as for a distinct Papuan-related ancestry on Rossel (figs. 2 and 3; supplementary figs. S10and S15, Supplementary Material online) that is also present in lower amounts in other Massim groups, especially from the southern Massim. In addition to this Rossel-related Papuan ancestry, Massim groups (except Rossel) also have, at frequencies of 1–63%, another Papuan-related ancestry that is at the highest frequency in the western PNG Highlands and the southern PNG lowlands (fig. 2; supplementary fig. S15, Supplementary Material online). Whether this ancestry predates, is associated with, or postdates the arrival of the Austronesians cannot be determined from our data. The existence of a distinct Papuan-related ancestry on Rossel could reflect initial colonization by people with this ancestry, subsequent isolation and genetic drift, or both. Further attesting to its isolation is the fact that the unique Papuan-related (non-Austronesian) language, Yélî Dnye, is spoken on Rossel; Yélî Dnye has been variously classified as either a language isolate (Levinson and Majid 2013; Stebbins et al. 2017) or possibly related to Anêm and Ata, two languages of West New Britain in the Bismarck Archipelago (Ross 2005; Eberhard et al. 2021). Moreover, pottery was introduced relatively late on Rossel, around 500–550 years ago, compared with around 2.8 kya elsewhere in the southern Massim (Shaw et al. 2016). The very high amount of sharing of IBD segments within Rossel (supplementary fig. S15, Supplementary Material online) further supports isolation and genetic drift as responsible for the development of the distinct Papuan-related ancestry on Rossel. We further note that the genetic isolation of Rossel in comparison to other Massim groups was not as apparent in a previous study of mtDNA and Y-chromosome variation in these same samples (van Oven et al. 2014), attesting to the value of genome-wide data for studies of human population history.Third, there is a striking contrast in patterns of sharing of IBD segments of Austronesian versus Papuan ancestry between groups. There is extensive sharing of short IBD segments (1–5 cM) across the studied Massim region (fig. 4; supplementary fig. S18, Supplementary Material online), and Austronesian segments are widely shared among the PNG lowland and island groups but not with the highlands (fig. 4), in keeping with previous observations of a lack of Austronesian-associated ancestry in the PNG highlands (Stoneking et al. 1990; Bergström et al. 2017; Pugach et al. 2018; Brucato et al. 2021). However, sharing of Papuan segments is strictly either among mainland PNG and Massim groups or among groups from the Bismarcks and Solomons; remarkably, there is no sharing of Papuan-related IBD segments between any mainland PNG or Massim group and any group from the Bismarcks or Solomons (fig. 4). The Papuan ancestry-specific PCA/ADMIXTURE/TreeMix results also suggest a closer relationship between the mainland PNG and Massim groups compared with the Bismarcks and Solomons (fig. 3; supplementary figs. S15 and S16, Supplementary Material online). The extensive sharing of Austronesian-related IBD segments of 1–5 cM, which roughly corresponds to a time of ∼2.7 kya (Al-Asadi et al. 2019), is in keeping with the large impact of the Austronesian expansion, which spread rapidly across the lowland and island regions of Near Oceania shortly after its arrival around 3 kya (Spriggs 1995; Shaw 2016; Kirch 2017). The distinct geographic pattern in the sharing of Papuan-related ancestry—which, to the best of our knowledge, has not been noted previously—must have a different explanation; it cannot reflect the spread of people with both Austronesian and Papuan ancestry as then there should not be a distinction between patterns of IBD sharing for Austronesian-related versus Papuan-related segments (although we cannot rule out that the spread of Austronesian-related ancestry included a small amount of Papuan-related ancestry). We speculate that the arrival of the Austronesians may have impacted the indigenous Papuan societies, as documented in a recent archeological study (Shaw et al. 2022), resulting in enhanced movement of Papuans within these two geographic regions (mainland/Massim, and Bismarck/Solomon Archipelagos). However, IBD sharing of segments of 1–5 cM reflects a time span of a few thousand years, and so the results in figure 4 could also reflect the movement of people prior to the arrival of the Austronesians, or movement unrelated to the arrival of the Austronesians (Brucato et al. 2021). For example, the flooding of the shallow continental shelf east of New Guinea, following the Last Glacial Maximum, appears to have led to depopulation of the current islands of the Massim region, with subsequent recolonization after ∼5 kya (Shaw, Coxe, Haro, et al. 2020); perhaps the IBD sharing results reflect these movements.Finally, we found increased sharing of IBD segments between Massim groups practicing Kula versus Massim groups that do not participate in this specific trading exchange ritual in the Massim (fig. 5; supplementary figs. S21 and S22, Supplementary Material online). Kula, made famous by Malinowski in his classic work Argonauts of the Western Pacific (1922), is an example of “… gift exchange with delayed reciprocity. Two kinds of objects are passed between a chain of partners in a large maritime region (the ‘Massim’), providing strong networks of support, a competitive element between the participants and the thrill of adventure” (Kuehling 2021). The objects in question consist of decorated shell armbands (mwali) and decorated shell necklaces (bagi or souvlava) which travel in opposite directions through the participating islands (Malinowski 1922; Kuehling 2021); long-distance travel between islands is largely restricted to Kula-related voyages, leading to the expectation that Kula would have an impact on patterns of gene flow between islands. Indeed, a previous study of mtDNA and Y-chromosome variation in the same samples studied here found less genetic structure between islands for the Y-chromosome than for mtDNA, suggesting a potential impact of male-mediated Kula voyages (van Oven et al. 2014). The increased sharing of IBD segments for Kula versus non-Kula groups in the Massim is further evidence for the impact of this cultural practice on patterns of gene flow. However, we note that the increased sharing of IBD segments for Kula versus non-Kula groups occurs across all size classes of IBD segments (fig. 5), and thus has an approximate associated time depth of a few thousand years (Ralph and Coop 2013; Al-Asadi et al. 2019), whereas archeological evidence suggests a time depth for Kula of ∼500 years (Shaw 2016; Irwin et al. 2018), as well as the existence of other inter-island exchange networks that preceded the formation of Kula (Shaw 2016; Shaw and Langley 2017) and from which Kula may have developed. Thus, both archeological and genetic evidence suggest that Kula should be viewed as arising out of a previous history of enhanced contact among the islands involved, rather than necessarily introducing novel avenues of contact that did not exist before. We observed a decline in the relative amount of IBD sharing with longer IBD segments (fig. 5; supplementary figs. S21 and S22, Supplementary Material online) and no apparent change in IBD sharing at the time of Kula ring formation. This may reflect either that the development of Kula was not associated with any increase in the intensity of contact between participating islands, or a lack of power to detect any such increase due to the nature of IBD sharing being restricted to fewer individuals in more recent times (Ralph and Coop 2013; Al-Asadi et al. 2019). Nonetheless, we identified some strong sharing of long IBD segments within northern and southern Massim, suggesting extensive regional interactions in the last few hundred years.In conclusion, our results concerning the Massim region of PNG fill an important lacuna in genetic studies of Near Oceania. Austronesian-related ancestry varies across the region but overall is in the minority, even though Austronesian languages are in the majority. We demonstrate the existence of a distinct Papuan-related ancestry that is associated with Rossel Island and probably arose as a consequence of its isolation. In addition to the expected signal of a rapid and widespread dispersion of Austronesian-related ancestry across coastal and island Near Oceania, we also found an unexpected signal of substantial movement of people with Papuan-related ancestry that was geographically restricted, occurring exclusively between mainland PNG and the Massim region, and between the Bismarck and Solomon Archipelagoes. We speculate that these movements of people with Papuan-related ancestry may reflect a disruptive impact of the arrival of Austronesian people. Finally, we document the effect of a cultural trait, the Kula, on relative amounts of IBD sharing among participating versus nonparticipating groups; the Kula thus joins other examples of cultural traits, such as residence pattern (Seielstad et al. 1998; Oota et al. 2001) and social stratification (Bamshad et al. 1998), that can influence human genetic diversity.
Materials and Methods
Sample and Data Information
Saliva samples were collected in 2001 with the approval of the Medical Board of PNG and with support from the Diocese of Alotau, PNG (Missionaries of the Sacred Heart, M.S.C.), particularly Fr Joe Ensing (M.S.C.) and then Bishop Desmond Charles Moore (M.S.C.). Written informed consent was obtained from each donor, after the project was explained and all questions answered to the satisfaction of the donor. Genetic work within this study was additionally approved by the Ethics Commission of the University of Leipzig Medical Faculty. MtDNA and Y-chromosome data were published in a previous study (van Oven et al. 2014). Here, we generated genome-wide data (∼1.6 million SNPs) for 255 (192 from Massim, 33 from Gulf Province, and 30 from Central Province) individuals on the Illumina Infinium Multi-Ethnic Global Array (MEGA); genotyping was carried out by the Laboratorio de Servicios Genómicos at the Laboratorio Nacional de Genómica para la Biodiversidad, Irapuato, Mexico, on our request. We first merged our newly generated data with published data from Papuan populations on the same SNP array (Bergström et al. 2017), and then with published WGS data from Oceanians, East Asians, Europeans, and Africans (Mallick et al. 2016; Vernot et al. 2016; Choin et al. 2021). For the WGS data, we obtained the jointly re-called genotypes from Choin et al. (2021). To avoid batch effects between the array and WGS data, we extracted the overlapping sites (including both monomorphic and polymorphic sites) and removed sites whose reference versus alternative alleles were inconsistent between the array and WGS data. For sites with more than two alleles, we first flipped the WGS data and then removed those flipped sites that still had more than two alleles. Merging was done using PLINK v1.9 (Purcell et al. 2007). For quality control, we first excluded sites with more than 5% missing data in the entire data set and sites with more than 50% missing data and/or Hardy–Weinberg equilibrium P values below 0.00005 within a population group (except for groups with only one individual). Then, we removed individuals with more than 5% missing data or with parents speaking different languages or coming from different locations. We also filtered out individuals to exclude up to first-degree kinship pairs. Data missingness and Hardy–Weinberg equilibrium were calculated using PLINK v1.9 while the individuals to be removed to avoid first-degree relatedness (kinship coefficient ≥0.177) were inferred using KING (Manichaikul et al. 2010), as implemented in PLINK v2 (Chang et al. 2015). We used KING to remove second-degree relatedness (kinship coefficient ≥0.0884) to further confirm that the components seen in ADMIXTURE results were not affected by cryptic kinship. There are 776 individuals and 1,408,767 SNPs remaining after quality control (supplementary table S1, Supplementary Material online indicates the individuals that were removed and why).The sampled Massim individuals were assigned to one of 14 groups according to parental geographic origins as described in the previous study (van Oven et al. 2014): (1) Trobriand; (2) Gawa; (3) Woodlark (also known as Muyuw); (4) Laughlan (also known as Budibudi); (5) Fergusson (also known as Moratau), including a few individuals from nearby Dobu (also known as Watoa) and Goodenough (also known as Nidula); (6) Normanby (also known as Duau); (7) Milne Bay mainland eastern tip (Mainland eastern tip); (8) Misima, including some individuals from nearby Paneati, Panapompom, and Kimuta; (9) Western Calvados (including Motorina, Bagaman, Utian or Brooker Island, and Panaumala); (10) Eastern Calvados (including: Dadahai, Kuanak or Abaga Gaheia Island, Nimoa, Panatinane or Joannet Island, Panawina, Sabarl, and Wanim or Grass Island); (11) Sudest (also known as Vanatinai or Tagula); (12) Rossel (also known as Yela); (13) Wanigela (and nearby settlements); and (14) Airara (and nearby settlements). Together with a previously studied group from the region (Northern, from North Collingwood Bay) from Bergström et al. (2017), there are data from 15 groups over the entire Massim region including mainland and island parts. Subregion groups were defined according to geography as in the previous study (van Oven et al. 2014): Collingwood Bay (Northern, Wanigela, Airara), western Massim (Mainland eastern tip, Normanby, Fergusson), northern Massim (Trobriand, Gawa, Woodlark, Laughlan), and southern Massim (Misima, Western Calvados, Eastern Calvados, Sudest, Rossel). We assigned Massim groups as participating in the Kula tradition or as nonparticipants according to previous studies (van Oven et al. 2014; Irwin et al. 2018). The locations of the newly studied and the reference groups are shown in figure 1 (and supplementary fig. S1, Supplementary Material online).
Population Structure Analyses
For population structure analyses, variants were pruned beforehand for linkage disequilibrium using PLINK v1.9, excluding one variant from pairs with r2 > 0.4 within windows of 200 variants and a step size of 25 variants. There were 366,193 SNPs left after pruning. PCA was done with smartpca v16,000 (Patterson et al. 2006). Two individuals (papuan6278328 and papuan6278360) from Bergström et al. (2017) were identified as PCA outliers and excluded from this study. We then ran ADMIXTURE v1.3.0 (Alexander et al. 2009) for K = 2 to K = 15 with 100 replicates for each K with random seeds. We used pong v1.4.7 (Behr et al. 2016) to visualize the 20 ADMIXTURE replicates with the highest likelihoods for the major mode at each K. We also plotted K = 2 (to visualize the variation of East Asian- and Papuan-related ancestries) and K = 8 (the K with the lowest cross-validation error; supplementary fig. S8, Supplementary Material online) on a map in R v4.0.3. To investigate if the results might be due to cryptic kinship, we further analyzed results of 20 independent ADMIXTURE replicates for each K from K = 2 to K = 11 (the lowest cross-validation error still occurred at K = 8; supplementary fig. S8, Supplementary Material online) using data with second-degree kinship excluded.
F3 and F4-Statistics
We used admixr v0.9.1 (Petr et al. 2019) from ADMIXTOOLS v7.0.2 (Patterson et al. 2012) to compute f3- and f4-statistics, with significance assessed through block jackknife resampling across the genome. For f3- and f4-statistics, the African groups Mbuti and Yoruba were used together as the outgroup.
Data Phasing
We used consHap (Al Bkhetan et al. 2019) to obtain a consensus of phasing from the results of SHAPEIT v2 (Delaneau et al. 2013), BEAGLE v5.1 (Browning, Zhou, et al. 2018), and EAGLE v2 (Loh et al. 2016) using the genetic map from the 1000 Genomes Project Phase3 (Genomes Project Consortium et al. 2015). In general, phasing accuracy can be increased by increasing the number of iterations and conditioning states on which haplotype estimation is based (Browning and Browning 2011). We therefore ran SHAPEIT v2 with options -burn 10,-prune 10, and -main 30 for iteration number with 500 conditioning states, leaving other parameters as default; BEAGLE v5.1 with options burnin = 12, iterations = 24, phase-states = 24, and ne = 800,000 (this is smaller than the default value, but is recommended by the authors for populations with a smaller effective population size, as expected for island populations in Oceania); EAGLE v2 with -Kpbwt 40,000 (which determines the number of conditioning haplotypes). We did not use a reference panel for phasing as 1,000 Genomes or other available WGS data sets lack representative cohorts for Oceanian populations.
IBD Analyses
We identified shared IBD blocks between each pair of individuals and homozygous-by-descent (HBD; same as ROH) blocks within each individual using refinedIBD (Browning and Browning 2013). Both identified IBD and HBD blocks are considered as IBD blocks in our analyses, which is analogous to pairwise shared coalescence (PSC) segments in a previous study (Al-Asadi et al. 2019). The IBD blocks within a 0.6 cM gap were merged using the program merge-ibd-segments from BEAGLE utilities (https://faculty.washington.edu/browning/refined-ibd), allowing only one inconsistent genotype between the gap and block regions. We used IBD blocks at least 2 cM in length shared by individuals within a population to investigate the demography of each population group. Then, we used IBD blocks in 1–5, 5–10 cM, and over 10 cM to investigate the sharing between individuals from different populations for different time periods (Ralph and Coop 2013; Al-Asadi et al. 2019). For network visualization of the sharing between populations, the pairs with average sharing ≥ 0.5 (i.e., on average at least half of the pairs share IBD blocks) were kept to reduce noise and false positives. We summarize the patterns of shared IBD length within a population by averaging over all comparisons between individuals, that is, we define the average of summed IBD length L aswhere n is the number of individuals in population X and ibd(X) is the length of IBD shared between individuals X and X. For the number of blocks, an analogous equation applies. Similarly, for sharing between two populations X, Y, we define the average of summed IBD length L aswhere m is the number of individuals in population Y. In order to compare IBD sharing between Massim groups (Kula versus non-Kula), we use a statistic motivated by FST (Bhatia et al. 2013). In particular, we define the similarity statistic S asThis statistic will be zero if there is no IBD sharing between X and Y (since L(X, Y) is zero), and it will be one if IBD sharing is independent of population structure, that is, L(X) = L(Y) = L(X, Y). For our analysis, we compute a matrix of pairwise S-statistics for all pairs of populations. In order to test whether there is more recent gene flow within the Kula/non-Kula group relative to the background, we use the relative similarity statistic R aswhere Swithin and Sall are the average pairwise S-statistics within Kula/non-Kula populations and all Oceanian populations, respectively. If the IBD sharing among the Kula/non-Kula groups and among all Oceanians were equal then R will be zero, and the more shared migration the Kula/non-Kula groups have, the larger R will be. We calculate the standard error of the R-statistic by jackknife resampling the chromosomes. The significance of the R-statistic between Kula- versus non-Kula groups was estimated by comparing the observed value to the distribution of simulated values generated by randomly assigning Massim groups to either Kula- or non-Kula groups and calculating the difference in relative IBD similarity. Scripts made for analyzing IBD results are available from https://github.com/dangliu/Massim_project.
ChromoPainter and GLOBETROTTER Analyses
To study haplotyple sharing, ChromoPainter v2 (Lawson et al. 2012) was run on the phased data set with sample sizes for each group randomly down-sampled to 5 (all individuals were used for groups with size <5). We began with 10 iterations of the EM (expectation maximization) process to estimate the switch rate and global mutation probability, using chromosomes 1, 5, 10, 15, and 20. With the estimated switch and global mutation rates, we ran the chromosomal painting process for all chromosomes, which then gave the output for downstream analyses. We first attempted to paint the Massim chromosomes using them only as recipients and all of the other individuals as both donors and recipients. The EM estimation of switch rate and global mutation probability were ∼142.27 and ∼0.0002, respectively, which were then used as the starting values for these parameters for all donors in the painting process. To investigate differential affinities of Massim groups to PNG highlanders versus Rossel, we also performed another run using the Massim (except for Rossel) only as recipients and all of the other individuals (including Rossel) as both donors and recipients. The EM estimation of switch rate and global mutation probability for this analysis were ∼142.21 and ∼0.0002, respectively.To investigate the admixture of Austronesian- and Papuan-related ancestries, GLOBETROTTER (Hellenthal et al. 2014) was run on the ChromoPainter output with all Massim as recipients and East Asians (including both Austronesian and non-Austronesians to increase power) and Papuan Highlanders (who showed no more than 0.0001 East Asian-related ancestry in the ADMIXTURE results for K = 2) as surrogates. We first tested the certainty and potential waves of admixture events, and then estimated the major and minor sources as well as the dates of admixture. The Rossel group showed an unclear signal for admixture inference, while a single pulse of admixture was inferred for the other groups. The distributions of admixture dates were estimated via 100 bootstraps.
MALDER Admixture Dating
We used MALDER (https://github.com/joepickrell/malder; last accessed on May 25, 2022) with default settings, using East Asian Austronesian and Southern Highland groups as reference sources, to date the time of admixture of Austronesian and Papuan ancestries in Massim groups.
Local Ancestry Inference and Ancestry-Specific Analyses
We ran the PopPhased mode of RFMix v1.5.4 (Maples et al. 2013) to infer local ancestry across the genome for each individual in our data set with options −e 3 (three EM iterations), −G 85 (85 generations since the admixture event; as suggested in a previous study (Bergström et al. 2017)), −n 5 (minimum five reference haplotypes per tree node; as suggested by the authors of RFMix), –use-reference-panels-in-EM (the reference samples were used in EM iterations), and the East Asians (including both Austronesian and non-Austronesians to increase power) and Papuan highlanders (who showed no more than 0.0001 East Asian-related ancestry in the ADMIXTURE results for K = 2) as reference panels. The Papuan highlanders were randomly down-sampled to 50 individuals, to be the same as the sample size of East Asians. We used 0.95 as the cutoff of forward–backward probability (i.e., the posterior probability of an inferred ancestry at an SNP in a haplotype). We interpreted the inferred East Asian-related segments as Austronesian-related segments, as indicated in f4 results (supplementary fig. S10, Supplementary Material online). Global ancestry for each Massim group was calculated from the local ancestry profiles of the corresponding individuals. Papuan-/Austronesian-related ancestry-specific PCA was performed with PCAmask (Moreno-Estrada et al. 2013). Analyses of local ancestry inference, global ancestry calculation, and ancestry-specific PCA were done using a pipeline from a previous study (Martin et al. 2017); scripts for preparing input files for these analyses were modified from the pipeline (available from https://github.com/dangliu/Massim_project). We generated Papuan-/Austronesian-related ancestry masked genomes using our own script (available from https://github.com/dangliu/Massim_project) for ancestry-specific ADMIXTURE and TreeMix analyses. We ran ADMIXTURE v1.3.0 for K = 2 to K = 7 and for K = 2 to K = 6 with 20 replicates for each K on Austronesian and Papuan ancestry masked data, respectively. We ran TreeMix v1.12 (Pickrell and Pritchard 2012) only for Austronesian ancestry masked data (as there were too few SNPs in the Papuan ancestry masked data for this analysis) with 0–2 migration events and ten independent runs, and selected the topology with the highest likelihood for further investigation. Papuan/Austronesian ancestry-specific PCA, ADMIXTURE, and TreeMix analyses were all performed on East Asian and Oceanian individuals with at least 33% Papuan-/Austronesian-related ancestry based on the ADMIXTURE results for K = 2, respectively. The same local ancestry-inferred results were applied to IBD results to study Papuan-/Austronesian-related ancestry-specific IBD. Ancestry-specific IBD analyses were carried out by following the pipeline described in a previous study (Browning, Browning, et al. 2018).Click here for additional data file.
Authors: Alicia R Martin; Christopher R Gignoux; Raymond K Walters; Genevieve L Wojcik; Benjamin M Neale; Simon Gravel; Mark J Daly; Carlos D Bustamante; Eimear E Kenny Journal: Am J Hum Genet Date: 2017-03-30 Impact factor: 11.025
Authors: Benjamin Vernot; Serena Tucci; Janet Kelso; Joshua G Schraiber; Aaron B Wolf; Rachel M Gittelman; Michael Dannemann; Steffi Grote; Rajiv C McCoy; Heather Norton; Laura B Scheinfeldt; David A Merriwether; George Koki; Jonathan S Friedlaender; Jon Wakefield; Svante Pääbo; Joshua M Akey Journal: Science Date: 2016-03-17 Impact factor: 47.728
Authors: Chris Clarkson; Zenobia Jacobs; Ben Marwick; Richard Fullagar; Lynley Wallis; Mike Smith; Richard G Roberts; Elspeth Hayes; Kelsey Lowe; Xavier Carah; S Anna Florin; Jessica McNeil; Delyth Cox; Lee J Arnold; Quan Hua; Jillian Huntley; Helen E A Brand; Tiina Manne; Andrew Fairbairn; James Shulmeister; Lindsey Lyle; Makiah Salinas; Mara Page; Kate Connell; Gayoung Park; Kasih Norman; Tessa Murphy; Colin Pardoe Journal: Nature Date: 2017-07-19 Impact factor: 49.962
Authors: Daniel Falush; Simon Myers; Garrett Hellenthal; George B J Busby; Gavin Band; James F Wilson; Cristian Capelli Journal: Science Date: 2014-02-14 Impact factor: 47.728
Authors: Anders Bergström; Stephen J Oppenheimer; Alexander J Mentzer; Kathryn Auckland; Kathryn Robson; Robert Attenborough; Michael P Alpers; George Koki; William Pomat; Peter Siba; Yali Xue; Manjinder S Sandhu; Chris Tyler-Smith Journal: Science Date: 2017-09-15 Impact factor: 47.728
Authors: Po-Ru Loh; Petr Danecek; Pier Francesco Palamara; Christian Fuchsberger; Yakir A Reshef; Hilary K Finucane; Sebastian Schoenherr; Lukas Forer; Shane McCarthy; Goncalo R Abecasis; Richard Durbin; Alkes L Price Journal: Nat Genet Date: 2016-10-03 Impact factor: 38.330