
Potential risks and solutions for sharing genome summary data from African populations.

Abstract

Genome data from African populations can substantially assist the global effort to identify aetiological genetic variants, but open access to aggregated genomic data from these populations poses significant risks of community- and population-level harms. A recent amendment to National Institutes of Health policy, following various engagements with predominantly North American scientists, requires that genomic summary results be made openly available on the internet without access oversight or controls. The policy does recognise that some sensitive, identifiable population groups might be harmed by such exposure of their data, and allows for exemption in these cases. African populations have a very wide and complex genomic landscape, and because of this diversity, individual African populations may be uniquely re-identified by their genomic profiles and genome summary data. Given this identifiability, combined with additional vulnerabilities such as poor access to health care, socioeconomic challenges and the risk of ethnic discrimination, it would be prudent for the National Institutes of Health to recognise that their current policy has the potential to cause community harms to Africans, and to exempt all African populations as sensitive or vulnerable populations with regard to the unregulated exposure of their genome summary data online. Three risk-mitigating mechanisms for sharing genome summary results from African populations to inform global genomic health research are proposed here: use of the Beacon Protocol developed by the Global Alliance for Genomics and Health, user access control through the planned African Genome Variation Database, and regional aggregation of population data to protect individual African populations from re-identification and associated harms.

Keywords: African diversity; African genomes; Community harms; Genome summary results

Year: 2019 PMID: 31684976 PMCID: PMC6830006 DOI: 10.1186/s12920-019-0604-6

Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063

Background

Because of the complexity and depth of African genomes compared with rest-of-world populations, genome summary data that include population allele frequencies from African populations can greatly enhance the identification of disease-causing and other variants in both African and rest-of-world research, and advances in health genomics research on the African continent can contribute meaningfully to biomedical research globally [1]. From 2008, genomic summary results (GSR) were archived in controlled-access portions of NIH-designated data repositories because of concerns that an individual’s inclusion in a study group could be ascertained from those results, given that individual’s whole genome data [2]. In November 2018, the National Institutes of Health (NIH) released a statement updating its policy on management of access to GSR, based on recent workshops and various engagement mechanisms undertaken in the USA to explore access options for sharing GSR. The NIH concluded that respondents in general believed that the benefits of open access to GSR outweigh the risks. This informed the subsequent NIH requirement that GSR generated with NIH funding should be made freely available on the internet with no access restriction, with the caveat that some sensitive population groups could be exempt from this requirement because of a risk of stigmatisation of specific communities or populations. This amended policy also applies to NIH-funded research programs in Africa, and it is important to review how the policy might affect the protection of African study participants and their communities, particularly as there appears to have been no documented engagement with African stakeholders when the amendment was considered. According to documented elements of the public engagement process, the NHGRI Workshop on Aggregate Genomic Data (May 2016) had predominantly North American attendees and no registered African representatives [3].
A “Request for Information” call in 2017 [4] recorded responses from 109 parties (37 of whom appeared to be users of the ExAC and gnomAD databases who had been solicited to respond using standardised text); 79% of respondents were scientific researchers [5] and none were African [6]. Finally, the GSR access policy was discussed at a Genomic Variation Program Workshop on Establishing a Central Resource of Data from Genome Sequencing Projects (June 2012) [7], which also had no African representation among speakers or in scheduled content, although the participant list for this workshop is not available to confirm whether Africans were present.

Main text

It is, however, important to consider the genomic depth and breadth of African genomes and the consequent ability to genetically distinguish small populations and communities from each other, often approximating ethnicity or ancestral lineage [8, 9]. This inherent genomic complexity of African populations is often disregarded in Caucasian-centric policies and recommendations, and community- or population-level risks may be overlooked because such re-identification of specific Caucasian communities using genomic data is unlikely. Current national boundaries in Africa were arbitrarily defined during colonisation, and multiple African populations may coexist in a single nation, which in some cases has resulted in tensions between population groups. The ability to fine-map population-level genomic data to specific communities comes with inherent community-level risks that have already been experienced by minority Indigenous populations, both on other continents and within Africa: examples include the experiences of the Havasupai Native Americans [10, 11] and the negative implications of genomic research for the San population in Southern Africa [12, 13]. History is littered with examples of opaque, invasive and often poor-quality research that has damaged vulnerable communities [14], such as the Xavante and Yanomami populations in Brazil [15, 16] or the Indigenous populations of Australia and New Zealand [17, 18]. It is notable that three of the respondents to the NIH “Request for Information” represented Native American Tribes, namely the Sault Ste. Marie Tribe of Chippewa Indians (submitted by Larry Jacques), the Southcentral Foundation (submitted by Denise A Dillard), and the United South and Eastern Tribes Sovereignty Protection Fund (submitted by Liz Malerba). These commentaries all included strong recommendations that any genomic information should be reviewed by Tribal review boards and/or community representatives before release.
Concerns were expressed that unlimited and indefinite use of genomic summary data without oversight is dangerous to the ongoing trust relationship between Tribal populations and the NIH; that the ongoing and future determination of harms from genomic information collected from Tribal populations must be facilitated; and that NIH program officers and scientific reviewers might push widespread data sharing in direct contradiction to the Tribes’ requirements as sovereign nations [6]. As with Indigenous populations in North America and Australasia, and other sensitive populations across the globe, full genomic summary data for identifiable African populations or communities published online without any oversight could expose these people to a high risk of discrimination or stigmatisation. As further variant-phenotype associations are discovered, allele frequencies for these variants can be looked up for different populations in GSR, and predictions made about trait prevalence in those populations. The genomic diversity and distance between different African populations are sufficiently large, even on a local scale, that genomic summary data can uniquely identify individual communities [8, 9] who can be geographically located, and associated phenotypes can then be ascribed to those specific communities based on their aggregated allele frequencies. Given known examples of ethnic discrimination, violence and xenophobia within Africa [19], as well as unfortunate historical and ongoing misappropriation of genetic data to publicly denigrate African populations [20, 21], the open availability of summary genome data for distinct African ethnic groups may be unethical because of the untenable risk of harm to those populations.
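To make the prevalence step concrete, the short sketch below converts a published allele frequency into an expected trait prevalence. It is a hypothetical illustration rather than a method from this article: it assumes Hardy-Weinberg equilibrium and a single fully penetrant variant, and the function name `expected_prevalence` and the example frequencies are invented for illustration.

```python
"""Hypothetical sketch: expected trait prevalence from a published allele frequency.

Assumes Hardy-Weinberg equilibrium and a single fully penetrant variant,
simplifications that real populations rarely satisfy exactly.
"""


def expected_prevalence(allele_freq: float, mode: str) -> float:
    """Return the expected fraction of individuals who show the trait.

    allele_freq: frequency q of the risk allele in one population (0 <= q <= 1).
    mode: "recessive" (two copies needed) or "dominant" (one copy suffices).
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    q = allele_freq
    if mode == "recessive":
        # Under Hardy-Weinberg equilibrium, homozygous carriers occur at frequency q^2.
        return q * q
    if mode == "dominant":
        # Everyone except non-carrier homozygotes, at frequency (1 - q)^2, shows the trait.
        return 1.0 - (1.0 - q) ** 2
    raise ValueError("mode must be 'recessive' or 'dominant'")


# Two hypothetical communities with different published frequencies for one variant.
for community, q in [("community A", 0.02), ("community B", 0.20)]:
    print(community, round(expected_prevalence(q, "recessive"), 4))
```

Under these assumptions a tenfold difference in allele frequency between two communities becomes a hundredfold difference in the expected prevalence of a recessive trait (0.0004 versus 0.04), which is the kind of community-level inference that openly published, population-specific GSR make possible.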
As such, all African populations should be regarded as ‘special populations’ for the purposes of the new NIH policy to ensure they are protected from such harms, in line with the conclusion drawn in that policy that privacy risks related to broad access to GSR may be heightened for some study populations. Furthermore, participants who have provided DNA samples to date are unlikely to have consented to having their data shared openly without Access Committee oversight; specific consent for aggregate data sharing, with full information for participants about potential harms, is needed from individuals as well as from generally accepted representative community organisations before these aggregated data are shared further. Here, we propose a framework for the use of GSR from African populations that could greatly reduce the risk to African participants, whilst still facilitating the general use of African summary genomic data to inform and advance global research to identify aetiological variants and to contribute to health research. This framework has three components that provide options for appropriate levels of summary data use.

Use of GA4GH beacons

In this use case, a researcher seeking to prioritise candidate disease-causing variants in another population could check whether candidate variants have been identified in African populations and, if so, at what frequency, and/or whether they have been associated with a specific disease in African populations. The Global Alliance for Genomics and Health (GA4GH) Beacon protocol [22, 23] allows researchers to make limited queries as to whether a particular variant has been seen in a particular dataset, thus encouraging sharing of information without compromising privacy, and extensions have been proposed to support queries of variant-phenotype associations through direct online access. Query rate limits can be used to prevent abuse of the system by “walking” across the genome with thousands of queries of the same aggregated dataset, without restricting ease of access for bona fide research.
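The rate-limiting idea can be sketched as follows, assuming a beacon that answers only whether a variant is present in one aggregated dataset. The class `RateLimitedBeacon`, its parameters and the variant tuple layout are hypothetical and do not follow the GA4GH Beacon API specification; the sketch models only the yes/no answer and a per-user sliding-window query limit.

```python
"""Minimal sketch of a beacon-style presence query with a per-user rate limit.

Hypothetical: names are illustrative and do not follow the GA4GH Beacon API
specification; only the yes/no answer and rate limiting are modelled.
"""
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, reference, alternate)


class RateLimitedBeacon:
    def __init__(self, variants: Set[Variant], max_queries: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._variants = variants
        self._max_queries = max_queries
        self._window = window_seconds
        self._clock = clock
        # Timestamps of recent queries, kept separately for each user.
        self._history: Dict[str, Deque[float]] = defaultdict(deque)

    def query(self, user: str, variant: Variant) -> bool:
        """Answer only whether the variant exists in the dataset (no counts, no samples)."""
        now = self._clock()
        recent = self._history[user]
        # Drop timestamps that have fallen outside the sliding window.
        while recent and now - recent[0] >= self._window:
            recent.popleft()
        if len(recent) >= self._max_queries:
            raise PermissionError(f"rate limit exceeded for {user}")
        recent.append(now)
        return variant in self._variants


beacon = RateLimitedBeacon({("1", 12345, "A", "G")}, max_queries=2, window_seconds=60.0)
print(beacon.query("alice", ("1", 12345, "A", "G")))  # True: variant present
print(beacon.query("alice", ("2", 999, "C", "T")))    # False: variant absent
try:
    beacon.query("alice", ("1", 12345, "A", "G"))      # third query inside 60 s
except PermissionError as err:
    print(err)
```

A sliding window caps how many queries one user can make in a given period, which slows systematic walking across the genome while leaving occasional lookups unaffected. A deployed service would likely also need user authentication, since a per-user limit can be sidestepped by spreading queries across many identities.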

Registered user access through the African Genome Variation Database

The African Genome Variation Database (AGVD) is under development as a project of the H3Africa Informatics Network [24, 25], and aims to be a resource for exploring African variation data, available to registered users. Regionally aggregated genomic data summaries (for example, for North, South, West and East Africa) can be made available to bona fide researchers who are reviewed as part of the general AGVD administration for registered users. Such summaries are likely to provide valuable allele frequency data for regional groupings without exposing communities or populations to potential harms, and a genetic diversity metric such as Fst [26, 27] could be used to determine an aggregation level that provides some granularity without exposing individual populations or communities.
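A minimal sketch of the two calculations this paragraph relies on: pooling population-level allele counts into a single regional frequency, and measuring how differentiated the pooled populations are. It assumes a hypothetical data layout (for each population, a list of alternate allele counts and allele numbers in a shared variant order) and uses a simple Nei-style Fst without sample-size correction rather than a bias-corrected estimator. How an Fst value would translate into an acceptable aggregation level is left open, as in the text.

```python
"""Sketch: pool population allele counts into one regional frequency per variant,
and measure how differentiated the constituent populations are (Nei-style Fst).

Hypothetical data layout: for each population, a list of (alternate allele count,
total alleles called) pairs, one pair per variant, in the same variant order.
Allele numbers are assumed to be greater than zero.
"""
from typing import Dict, List, Tuple

Counts = List[Tuple[int, int]]  # per variant: (alt allele count, allele number)


def regional_frequencies(populations: Dict[str, Counts]) -> List[float]:
    """Pool counts across populations so that only a regional frequency is released."""
    freqs = []
    # zip(*...) groups each variant's (alt, total) pairs across all populations.
    for counts in zip(*populations.values()):
        alt = sum(a for a, _ in counts)
        total = sum(n for _, n in counts)
        if total == 0:
            raise ValueError("no alleles called for this variant")
        freqs.append(alt / total)
    return freqs


def fst(populations: Dict[str, Counts]) -> float:
    """Nei-style Fst, (H_T - H_S) / H_T, combined across variants as a ratio of sums.

    No sample-size correction is applied, so this is only a rough diversity measure.
    """
    numerator = 0.0
    denominator = 0.0
    for counts in zip(*populations.values()):
        p = [a / n for a, n in counts]
        # Mean expected heterozygosity within the individual populations.
        h_s = sum(2 * x * (1 - x) for x in p) / len(p)
        # Expected heterozygosity at the unweighted mean frequency.
        p_bar = sum(p) / len(p)
        h_t = 2 * p_bar * (1 - p_bar)
        numerator += h_t - h_s
        denominator += h_t
    return numerator / denominator if denominator > 0 else 0.0


region = {
    "population_1": [(10, 100), (50, 100)],
    "population_2": [(30, 100), (50, 100)],
}
print(regional_frequencies(region))  # [0.2, 0.5]
print(round(fst(region), 4))         # 0.0244
```

Only the pooled frequencies would be released to registered users in this sketch; the per-population counts stay with the data custodian, who could use the Fst value when deciding how coarse a regional grouping needs to be.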

Access to study population pre-calculated genomic summary data through applications reviewed by an access committee

Where requests for summary data cannot be met by the processes outlined above, applications for population-specific summary data could be made through an appropriately constituted Access Committee, which would normally already be in place to administer access to individual-level genotype data where secondary data use consents exist. This detailed review is likely to be required in only a small subset of cases, as beacons and regional summaries should answer many of the use cases for external researchers. Should the number of requests become unmanageable for an existing access committee, a subcommittee of individuals qualified specifically to review these requests could be constituted under the oversight of the main committee. Where genotype data are submitted to central repositories such as the European Genome-Phenome Archive (EGA) [28], access to African genome summary data might be managed in the same way as whole-dataset requests, in cases where beacons or regional aggregated data do not suffice.

Conclusions

In conclusion, genome summary data from studies of African populations can substantially enhance ongoing health research in African and rest-of-world populations, and ethical and responsible sharing of these data should be supported. Open and unregulated online exposure of genome summary data from African populations or communities, however, may expose these populations to unacceptable risks and potential harms such as those experienced by Indigenous and/or minority population groups to date. Outlined here are three levels of controlled access to genome summary data from African populations and communities that can harness the benefits of these data for global and local health research, whilst mitigating the risks and potential harms for the African participants and communities who provide their samples and data for genomic research.

References

1. Studies on the Xavante Indians of the Brazilian Mato Grosso.

Authors: J V Neel; F M Salzano; P C Junqueira; F Keiter; D Maybury-Lewis
Journal: Am J Hum Genet Date: 1964-03 Impact factor: 11.025

2. Aboriginal genome analysis comes to grips with ethics.

Authors: Ewen Callaway
Journal: Nature Date: 2011-09-28 Impact factor: 49.962

3. After Havasupai litigation, Native Americans wary of genetic research.

Journal: Am J Med Genet A Date: 2010-07 Impact factor: 2.802

4. Genetic research among the Havasupai--a cautionary tale.

Authors: Robyn L Sterling
Journal: Virtual Mentor Date: 2011-02-01

5. GENOMICS. A federated ecosystem for sharing genomic, clinical data.

Journal: Science Date: 2016-06-10 Impact factor: 47.728

6. The genomic landscape of African populations in health and disease.

Authors: Charles N Rotimi; Amy R Bentley; Ayo P Doumatey; Guanjie Chen; Daniel Shriner; Adebowale Adeyemo
Journal: Hum Mol Genet Date: 2017-10-01 Impact factor: 6.150

7. A method for quantifying differentiation between populations at multi-allelic loci and its implications for investigating identity and paternity.

Authors: D J Balding; R A Nichols
Journal: Genetica Date: 1995 Impact factor: 1.082

8. Complete Khoisan and Bantu genomes from southern Africa.

Authors: Stephan C Schuster; Webb Miller; Aakrosh Ratan; Lynn P Tomsho; Belinda Giardine; Lindsay R Kasson; Robert S Harris; Desiree C Petersen; Fangqing Zhao; Ji Qi; Can Alkan; Jeffrey M Kidd; Yazhou Sun; Daniela I Drautz; Pascal Bouffard; Donna M Muzny; Jeffrey G Reid; Lynne V Nazareth; Qingyu Wang; Richard Burhans; Cathy Riemer; Nicola E Wittekindt; Priya Moorjani; Elizabeth A Tindall; Charles G Danko; Wee Siang Teo; Anne M Buboltz; Zhenhai Zhang; Qianyi Ma; Arno Oosthuysen; Abraham W Steenkamp; Hermann Oostuisen; Philippus Venter; John Gajewski; Yu Zhang; B Franklin Pugh; Kateryna D Makova; Anton Nekrutenko; Elaine R Mardis; Nick Patterson; Tom H Pringle; Francesca Chiaromonte; James C Mullikin; Evan E Eichler; Ross C Hardison; Richard A Gibbs; Timothy T Harkins; Vanessa M Hayes
Journal: Nature Date: 2010-02-18 Impact factor: 49.962

9. Estimating and interpreting FST: the impact of rare variants.

Authors: Gaurav Bhatia; Nick Patterson; Sriram Sankararaman; Alkes L Price
Journal: Genome Res Date: 2013-07-16 Impact factor: 9.043

10. H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa.

Authors: Nicola J Mulder; Ezekiel Adebiyi; Raouf Alami; Alia Benkahla; James Brandful; Seydou Doumbia; Dean Everett; Faisal M Fadlelmola; Fatima Gaboun; Simani Gaseitsiwe; Hassan Ghazal; Scott Hazelhurst; Winston Hide; Azeddine Ibrahimi; Yasmina Jaufeerally Fakim; C Victor Jongeneel; Fourie Joubert; Samar Kassim; Jonathan Kayondo; Judit Kumuthini; Sylvester Lyantagaye; Julie Makani; Ahmed Mansour Alzohairy; Daniel Masiga; Ahmed Moussa; Oyekanmi Nash; Odile Ouwe Missi Oukem-Boyer; Ellis Owusu-Dabo; Sumir Panji; Hugh Patterton; Fouzia Radouani; Khalid Sadki; Fouad Seghrouchni; Özlem Tastan Bishop; Nicki Tiffin; Nzovu Ulenga
Journal: Genome Res Date: 2015-12-01 Impact factor: 9.438
