The Pioneer Advantage: Filling the blank spots on the map of genome diversity in Europe.

Taras K Oleksyk¹,², Walter W Wolfsberger², Khrystyna Schubelka², Serghei Mangul³, Stephen J O'Brien⁴.

Abstract

Documenting genome diversity is important for local biomedical communities and instrumental in developing precision and personalized medicine. Currently, tens of thousands of whole-genome sequences from Europe are publicly available, but most of them represent populations of the developed countries of Europe. The uneven distribution of the available data is further compounded by the lack of data sharing. Recent whole-genome studies in Eastern Europe, one in Ukraine and one in Russia, demonstrated that local genome diversity and population structure in Eastern Europe had historically not been fully represented. The unexpected wealth of genomic variation uncovered in these studies was not so much a consequence of high variation within these populations but rather of the "pioneer advantage." We discovered more variants because we were the first to prospect in the Eastern European genome pool. This simple comparison underscores the importance of removing the remaining geographic genome deserts from the world map of human genome diversity.

Keywords: Europe; Russia; Ukraine; genome deserts; genome diversity; genome project; genomes; genotyping; variants; whole-genome sequencing

Year: 2022 PMID: 36085557 PMCID: PMC9463063 DOI: 10.1093/gigascience/giac081

Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 7.658

It has been more than two decades since the data of the first Human Genome Project (HGP) were publicly released [1, 2]. That release launched a revolution in biomedical research. Evaluating the torrent of sequencing data enabled a genomics-based approach to studying human health, disease, and natural history in an evolutionary context.

After the HGP established a baseline for understanding common genetic variation, researchers analyzed genomic diversity across species, among individuals, and across diverse populations worldwide. These analyses led to the annotation of clinically relevant variants. Such variants are essential for understanding disease origins, health risks, and drug sensitivity, and for realizing the promise of personalized medicine.

In this first stage of mapping global genome diversity, efforts were led by international consortia of scientists who collaborated to discover and classify genome variation across the globe. The Human Genome Diversity Panel (HGDP) and the 1,000 Genomes (G1K) project [3, 4] represented a monumental effort by the international community to create a comprehensive map of human genetic diversity. However, after this initial success, the concerted effort appears to have dissipated. It left many blank spots on the map and missed many local and rare variants that are critical for characterizing human diversity.

In the second stage of mapping worldwide genome diversity, national projects replaced global surveys. They became the major reference resources for human genetic variation and the source of locally based annotation of disease variants. National genome projects have been supported by national governments [5, 6], international collaborations [7], and/or groups of enthusiasts [8]. These efforts have continued to contribute, but without a unified, systematic global strategy. As a result, the projects provide uneven geographic and population coverage and thus a fractured picture of genome diversity across the continents.
Geographic genome surveys across populations supported earlier conclusions about population structure that date back to the 1990s. At that time, Luca Cavalli-Sforza identified 5 major clinal patterns across Europe [9]. The exact distribution of these clines continues to be debated and redrawn. Nevertheless, the population genetic structure of Europe is undoubtedly real, and similar patterns continue to be found in more recent studies. Studies using increasing numbers of autosomal single-nucleotide polymorphisms (SNPs; from 9,000 to 300,000) affirmed a strong continent-wide correlation between geography and genetic distance [10-12].

Early surveys did not include Northeastern and Eastern Europe. Later, studies from Finland, Estonia, and the Komi Republic (Russia) revealed distinct genetic diversity in Northeastern Europe associated with the Uralic language family [13, 14]. Analyses of European populations eventually revealed phylogeographic patterning, further underscoring the importance of local genome variation for biomedical studies [12, 15].

As of July 2022, at least 3,089 whole-genome sequences from continental European populations had become publicly available. These are in addition to the 2,638 genomes sequenced and publicly released by research groups in Iceland [16] and 204,109 in the United Kingdom [17] (Fig. 1; Table 1; Supplementary Table S1). The distribution of the available genomes still partly reflects the initial efforts of the HGDP and G1K projects [3, 4]. However, a quick look at the map of Europe is enough to see that most of the data in these genome projects, as in the genotyping projects before them, represent the populations of the United Kingdom and the technologically advanced countries of the European Union (Fig. 1). Meanwhile, the diversity within many Eastern European countries is represented by only a handful of genomes each.
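The counts quoted above can be cross-checked against the Total row of Table 1 with simple arithmetic. The three groups of genomes add up to the grand total in the last column, and the United Kingdom alone accounts for roughly 97% of it:

```latex
\underbrace{3{,}089}_{\text{continental Europe}}
+ \underbrace{2{,}638}_{\text{Iceland}}
+ \underbrace{204{,}109}_{\text{United Kingdom}}
= 209{,}836,
\qquad
\frac{204{,}109}{209{,}836} \approx 0.973
```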
The HGDP and G1K projects followed sampling schemes that were biased by the geographic composition of the consortia and by sample availability. As a result, they left many important regions of genetic diversity unexplored [18]. Some of these regions will eventually be addressed by the Genome of Europe initiative. It aims to build a European network of national genomic reference cohorts comprising at least 500,000 European citizens selected to be representative of the European population [16]. This ambitious and worthwhile initiative is welcome, but as of 2022 it had not yet released data. Other projects, such as the Personal Genome Project, may also be useful, given that their genomes carry geographic context [19]. However, as genome sequencing shifted from international consortia to national projects, the intrinsic bias in the distribution of human genome data available from Europe remained [20]. Table 1 summarizes the current status of geographically referenced whole-genome data in Europe.

Figure 1.

Table 1.

Sequences of individual genomes available in Europe. The datasets are classified by source country, and the total number of samples per country is given. Within each country, genomes may be derived from several independent studies. These studies represent the population of the country, geographic subpopulations, and ethnic minorities. The number of individuals for each subpopulation/study is listed in the last column. Links to the open data for each study summarized in this table, including references, databases, and links to the studies, are provided in Supplementary Table S1. Ethnic populations and subpopulations within each country are listed on the indented rows beneath that country.

Country (population)	Total # samples per country	# Populations/studies per country	Population or subpopulation names	# Subpopulations/studies	# Samples per subpopulation/study
Albania	4	2
			Albanians	2	4
Azerbaijan	2	1
			Azeri	1	2
Belarus	4	1
			Belarusians	1	4
Bosnia-Herzegovina	7	2
			Croats	1	4
			Roma	1	3
Bulgaria	2	1
			Bulgarians	1	2
Czechia	1	1
			Czechs	1	1
Denmark	150	1
			Danes	1	150
Estonia	8	2
			Estonians	2	8
Finland	113	4
			Finnish	4	113
France	51	4
			Basques	2	24
			French	2	27
Georgia	9	4
			Abkhazians	2	5
			Georgians	2	4
Germany	3	1
			Germans	1	3
Greece	254	3
			Cretans	2	252
			Mainland Greeks	1	2
Hungary	3	2
			Hungarians	2	3
Iceland	2,638	2
			Icelanders	2	2,638
Italy	159	7
			Italians	5	132
			Sardinians	2	27
Latvia	3	1
			Latvians	1	3
Lithuania	4	2
			Lithuanians	2	4
Moldova	2	1
			Moldovans	1	2
Netherlands	769	1
			Dutch	1	769
Norway	4	2
			Finnish	1	3
			Norwegians	1	1
Orkney Islands	15	2
			Orcadians	2	15
Poland	5	2
			Poles	2	5
Russia*	178	34
			Adygeis	2	17
			Avars	1	3
			Azerbaijanis	1	1
			Balkars	1	3
			Bashkirs	1	5
			Chechen	1	1
			Circassians	1	3
			Ingrians	1	3
			Kabardins	1	4
			Karelians	1	3
			Khantys	1	3
			Komis	1	2
			Kryashen-Tatars	1	3
			Kuban-Cossacks	1	2
			Kumyks	1	3
			Lezgins	2	6
			Mansis	2	5
			Maris	1	4
			Mishar-Tatars	1	1
			Mordvins	1	3
			North-Ossetians	2	4
			Russians*	7	92
			Tabasarans	1	3
			Vepsas	1	4
Spain	164	2
			Spanish	2	164
Sweden	1,002	2
			Swedes	2	1,002
Turkey	2	1
			Turks	1	2
Ukraine	257	7
			Cossacks	1	2
			Hungarians	1	1
			Ukrainians	5	254
United Kingdom	204,109	3
			British	3	204,107
			English	1	2
Total	9,917	96			209,836

*Only the populations native to the European part of the Russian Federation are represented in this survey.

Figure 1. Public availability of whole-genome sequences in Europe. Table 1 gives the total sample sizes for each country, geographic subpopulation, and ethnic minority, together with the number of individuals for each subpopulation/study. Links to the open data for each study are provided in Supplementary Table S1.

Multiple types of efforts to aggregate currently known variation are under way. Some aimed to cover the largest number of samples and to include projects with varying data release strategies. To do so, first the Exome Aggregation Consortium (ExAC) and then the Genome Aggregation Database (gnomAD) released only aggregated variant summaries and omitted individual-level information. Today, gnomAD provides publicly accessible variation summaries from as many as 76,156 individual genomes from projects across the globe [21, 22]. These datasets are easy to handle and interpret. They are useful for comparative analysis and for identifying novel variation, but they are not well suited to population analysis at finer scales.
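The limitation of aggregated summaries can be illustrated with a toy example. The genotypes below are hypothetical and do not reflect real gnomAD data or its file format. Two cohorts can share identical per-variant allele and genotype counts while differing in how alleles co-occur within individuals across SNPs. That cross-SNP signal is what population-structure analyses depend on.

```python
from collections import Counter

# Hypothetical genotypes for 4 individuals at 2 SNPs, coded 0/1/2
# (number of alternate alleles carried). Not real data.
cohort_a = [(2, 2), (2, 2), (0, 0), (0, 0)]  # SNPs co-vary: two clear subgroups
cohort_b = [(2, 2), (0, 0), (2, 0), (0, 2)]  # same per-SNP counts, no co-variation


def per_variant_summary(cohort, snp):
    """Aggregate view of one SNP: alternate-allele count, total allele
    number, and genotype counts. Individual identities are discarded."""
    column = [person[snp] for person in cohort]
    return sum(column), 2 * len(column), Counter(column)


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


for snp in (0, 1):
    # The aggregate summaries are identical for both cohorts...
    assert per_variant_summary(cohort_a, snp) == per_variant_summary(cohort_b, snp)

# ...but cross-SNP structure, visible only with individual-level data, differs.
for name, cohort in [("A", cohort_a), ("B", cohort_b)]:
    snp1 = [p[0] for p in cohort]
    snp2 = [p[1] for p in cohort]
    print(name, pearson(snp1, snp2))  # prints "A 1.0", then "B 0.0"
```

The choice of two SNPs and four people is arbitrary. The same reasoning applies at genome scale, where the co-occurrence of alleles within individuals drives clustering and principal component analyses.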
On the other hand, the Database of Genotypes and Phenotypes provides individual-level variant data and extensive phenotypic information per sample, but working with these resources requires additional bioinformatic expertise [23]. Because of the nature of these data, access is often controlled and requires an application with appropriate justification. To ensure data security, specialized software and access tokens are often needed to reach the data. These necessary precautions can delay access to the data. They can also prevent it outright for research groups that lack the technical expertise.

Recently, analyses of data from 2 whole-genome studies, one in Ukraine [8] and one in Russia [24], clearly demonstrated that the intrinsic genomic diversity of Eastern Europe had been poorly represented. The SNP allele frequency differences were large enough to reveal previously unknown dimensions of population structure. For instance, Zhernakova et al. [24] identified 5 distinct phylogeographic population partitions from east to west across the 11 time zones of the Russian Federation. They also reported important genetic differences between ethnic Russian populations and their neighbors. The study compared Russian subpopulations from Pskov and Novgorod in the European part of Russia with other populations of Europe and Asia (264 study participants). Principal component analysis showed genetic distinctiveness as great as or greater than the differences among the neighboring Finns, Swedes, and Estonians [24]. A subsequent analysis of Ukrainian genomes showed that the Ukrainian cluster was distinct from other publicly available European populations. It was also distinct from publicly available genomes of ethnic Russian subpopulations in the European part of Russia [3, 8].
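The principal component analyses cited above project individuals onto axes of maximal genotype variance. Below is a minimal sketch of that idea on simulated data. It assumes a plain genotype matrix (rows are individuals, columns are SNPs coded 0/1/2) and computes PCA by SVD after mean-centring. The simulation and its parameters are arbitrary. Published analyses typically add quality control, linkage pruning, and per-SNP standardization, all of which are omitted here.

```python
import numpy as np


def simulate_two_populations(n_per_pop=20, n_snps=1000, seed=0):
    """Hypothetical toy data: two populations whose allele frequencies
    differ across SNPs; genotypes are 0/1/2 alternate-allele counts."""
    rng = np.random.default_rng(seed)
    p_a = rng.uniform(0.05, 0.95, n_snps)
    # Population B's frequencies are perturbed copies of population A's.
    p_b = np.clip(p_a + rng.normal(0.0, 0.3, n_snps), 0.01, 0.99)
    g_a = rng.binomial(2, p_a, size=(n_per_pop, n_snps))
    g_b = rng.binomial(2, p_b, size=(n_per_pop, n_snps))
    return np.vstack([g_a, g_b]).astype(float)


def pca_scores(genotypes, n_components=2):
    """PCA via SVD of the column-centred genotype matrix (rows = individuals)."""
    centred = genotypes - genotypes.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


genotypes = simulate_two_populations()
scores = pca_scores(genotypes)
pc1 = scores[:, 0]
print("population A, mean PC1:", round(float(pc1[:20].mean()), 2))
print("population B, mean PC1:", round(float(pc1[20:].mean()), 2))
```

With allele frequencies this strongly differentiated, the two simulated groups fall on opposite sides of PC1. The sign of a principal component is arbitrary, so only the separation is meaningful.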
These 2 studies discovered millions of variants, many of which had not been described before. These included 478,000 in Ukraine [8] and approximately 300,000 in 2 populations from European Russia [24]. The studies also reported major differences in the frequencies of medically relevant alleles between Eastern Europe and the rest of the continent. The wealth of variation uncovered in Ukraine and, to a lesser extent, in Russia had a simple cause. It did not reflect particularly high variation within the Ukrainian population, but rather the absence of sampling in the countries around Ukraine. The effect reflects what is known in business and marketing as a “first-mover” or “pioneer advantage” [25]. The Ukrainian population was surrounded by unexplored genome deserts. Because of this, the study of fewer than 100 Ukrainian genomes uncovered 478,000 novel SNPs that had never been registered in the Genome Aggregation Database [26]. This number is huge, even in comparison with the most genetically diverse populations in sub-Saharan Africa. There, adding new genomes from 426 people across 50 ethnolinguistic groups to the existing databases revealed approximately 3 million variants [27].

This simple comparison underscores the importance of complete information about the global extent of genome variation and of removing the remaining genome deserts from the world map. Understanding all geographic dimensions of genome diversity in Europe is crucial. It allows local biomedical communities to use local genomic data instead of extrapolating results from genome projects in other countries. The importance of sequence data from multiple populations cannot be overstated, given their unique histories of drift, selection, migration, admixture, and socioeconomic structure. Therefore, we suggest that every country needs its own national genome database to inform regionally relevant and objective public health policies.
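Novelty is always measured against an existing catalogue. The count of "novel" variants therefore depends on what was sampled before, not only on a population's own diversity. Below is a toy sketch of this pioneer effect using set differences over hypothetical variant keys (chromosome, position, reference allele, alternate allele). A real comparison against gnomAD would also require variant normalization and matching on the same reference genome build.

```python
# Hypothetical variant key: (chromosome, position, ref allele, alt allele).
Variant = tuple[str, int, str, str]


def novel_variants(study: set[Variant], catalogue: set[Variant]) -> set[Variant]:
    """Variants observed in a study but absent from the reference catalogue."""
    return study - catalogue


# All variant sets below are invented for illustration.
known = {("1", 1000, "A", "G"), ("1", 2000, "C", "T")}
pioneer = {("1", 1000, "A", "G"), ("1", 3000, "G", "A"),
           ("2", 500, "T", "C"), ("2", 900, "C", "G")}
neighbour = {("1", 3000, "G", "A"), ("2", 500, "T", "C"), ("3", 42, "A", "T")}

alone = novel_variants(neighbour, known)             # neighbour studied first
first = novel_variants(pioneer, known)               # pioneer studied first
after = novel_variants(neighbour, known | pioneer)   # neighbour after pioneer is deposited
print(len(alone), len(first), len(after))  # prints "3 3 1"
```

The neighbouring population carries the same variants in both scenarios. Only the order of study changes how many of them are reported as novel.

For scale, using the figures quoted above: 478,000 novel SNPs from fewer than 100 Ukrainian genomes is more than 4,780 per genome. This compares with roughly 3,000,000 / 426 ≈ 7,000 variants per genome for the sub-Saharan African panel.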
There is still a lot of important variation to be discovered. It needs to be made public to provide the informational framework for the biomedical research to follow. Currently, the largest genome representation in Europe is in the United Kingdom [5]. This is owing to a well-funded national project, with more than 200,000 genomes publicly available for analysis. Countries such as Iceland (2,638), the Netherlands (769), Italy (159), Spain (164), and Sweden (1,002) have also shared large sequence databases with the scientific community (Table 1). Still, many countries in Europe remain underrepresented. Aside from Belgium, Portugal, and Switzerland, most of these underrepresented countries are in Eastern Europe. Poland has recently started its national genome project, which will soon contribute thousands of genomes to the public domain [28]. Austria, Croatia, Montenegro, North Macedonia, Romania, Serbia, and Slovakia have no representation in public genome databases yet. Several other countries have only a handful of genomes included in international projects [4, 7, 29]: Bosnia-Herzegovina, Bulgaria, Belarus, Czechia, Hungary, Lithuania, Latvia, and Moldova.

Funding for national genome projects has come from various sources and often combines private and public funds. Unfortunately, successful genomic initiatives rarely come from countries with smaller economies. Ukraine is an outlier in this category: an Eastern European country with a relatively small economy and 254 genomes available, owing to a successful collaboration strategy. Without support from the Ukrainian government, samples became available through international collaborations with BGI (formerly the Beijing Genomics Institute) and the National Institutes of Health (USA). These collaborations were specifically intended to help fill in the genome desert in that country [8]. This approach can be replicated in countries with smaller economies where the principal roadblock to a national genome project is the lack of local funding.
National genomic projects offer an effective platform for training genome scientists and bioinformaticians at the national level. Research teams involved in national genomic projects should be multidisciplinary and interinstitutional. They should include policy makers, lawyers, data scientists, and human geneticists. There is a critical need to involve experts in the humanities, especially those who understand the relevant ethical and social issues. Given this complexity, international collaborations can be very helpful and can be defined from the beginning of the project. They are useful not only for writing collection protocols and providing sequencing platforms but also for serving the ultimate objective: improving public health in each country.

Politics matters in these efforts, and not always for the good. The Genome Russia project was conceived as a platform that would bring together scientists from across the Russian Federation with international collaborators from across the globe (http://genomerussia.spbu.ru/) [6, 18]. The hope was that this project would include a program to train bioinformatics experts who could carry the torch for the next generation of genome studies. The consortium was built successfully and produced a first analysis/data publication [24]. Public access to the genome data was initially approved by NCBI. After the paper was published, however, access was abruptly retracted by direct order of the Russian authorities. The 2022 Russian invasion of Ukraine bodes poorly for an immediate remedy of these issues. The publication of the Ukrainian genomes was met with hostility by the Russian authorities. They demanded retraction of the manuscript, first of the preprint from bioRxiv and then of the paper from GigaScience [30]. The Ukrainian genome diversity paper was ultimately published. All the data (except the Genome Russia genomes) were released publicly despite Russian objections and pressure.
As stipulated in the original proposal and development documents for Genome Russia and in the cooperative agreements for the Genome Diversity in Ukraine project, the principal goal of these efforts was open access. All sequence data, SNP annotations, and other genome features were to be released openly, so that they could join the data releases of other international genome sequencing consortia [6, 8, 18, 24]. The design and even the informed consent protocols of both projects were modeled on those of the 1,000 Genomes Project. Those protocols had been vetted by the world's experts on human genome ethics. The original intention and promise was to join and augment the catalogue of genomes in the 1,000 Genomes Project. The projects would also contribute SNP annotations to complement ExAC [22], gnomAD [21], the Human Gene Mutation Database (HGMD), and HGMD-DM (disease-causing mutations) [31]. The political shutdown of open release by the Russian authorities would ultimately deprive all of these important resources of the data until release was restored. This ill-advised suspension of open release was further exacerbated by the 24 February 2022 invasion of Ukraine ordered by Russian President Putin.

Researchers can still retrieve ethnic Russian genomes scattered in small batches across other publications. They can also retrieve genomes from the modest representation of multiple non-Russian ethnic minorities. These minorities make up almost half of the genomes still available from the European part of the Russian Federation. From the European part of Russia, genomes are available from 23 ethnically distinct indigenous populations, such as Ossetians, Tatars, Chechens, Komis, Bashkirs, Mordvins, and others. These are in addition to 7 subpopulations of ethnic Russians (Table 1). The uneven distribution of genome data in Europe is further exacerbated by a widespread lack of data sharing in the scientific community.
Commercial companies compile massive datasets but restrict them to themselves. As a result, genome-wide sequence data from ancestry testing and diagnostic sequencing [32] are difficult or impossible to retrieve [24, 33]. This is usually justified by the possibility of re-identification: even when databases are completely anonymized, it is technically possible to identify participating individuals using additional genotype information [34]. Therefore, publishing individual genome data requires appropriate levels of informed consent. Consent involves a combination of technical and societal stipulations within the context in which the data are released. There remains a stunning lack of consensus or compelling legal precedent regarding ownership or open release of biomedical materials and derivative data [35]. Published studies (genomes, genome-wide association studies [GWAS], and others) have been collected under widely differing levels of informed consent agreed to by study participants. All this ambiguity compels many in management and privacy positions simply to deny access to both samples and data. This is a serious problem that may be getting worse.

We emphasize the importance of public access to data that participants have consented to release openly. The best practice was pioneered by the G1K consortium [4]. The collected genome sequences, informed consent details, and accompanying data were deposited in an international database. That database could serve as a valuable resource for researchers worldwide. At the same time, it provided security and protected the interests of participating individuals and the communities they represent. Publicly available genome data generated from the general population of a country play a vital role in unlocking genomics-based personalized medicine for the residents of that country, and they benefit everyone.
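The re-identification risk mentioned above [34] can be illustrated with genotype concordance matching. This is the same check routinely used to detect sample swaps. The toy sketch below assumes hypothetical genotype vectors for 50 people on a shared panel of 200 SNPs, all with an allele frequency of 0.3. It is not a reconstruction of any specific published attack.

```python
import numpy as np


def best_match(query, reference):
    """Return (name, concordance) for the reference genotype vector that
    agrees with `query` at the largest fraction of SNPs."""
    scores = {name: float(np.mean(g == query)) for name, g in reference.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


rng = np.random.default_rng(1)
n_snps = 200
# Hypothetical reference panel: 50 people, genotypes coded 0/1/2.
reference = {f"person_{i}": rng.binomial(2, 0.3, n_snps) for i in range(50)}

# Hypothetical "anonymized" record: person_17's genotypes, with about 5% of
# calls replaced by random values to mimic genotyping error.
query = reference["person_17"].copy()
errors = rng.random(n_snps) < 0.05
query[errors] = rng.integers(0, 3, errors.sum())
print(best_match(query, reference))
```

Unrelated people in this simulation agree at roughly 40-50% of SNPs by chance. The true source agrees at well over 90% despite the injected errors, which is why removing names alone does not anonymize genotype data.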

Abbreviations

ExAC: Exome Aggregation Consortium; G1K: 1,000 Genomes Project; gnomAD: Genome Aggregation Database; HGDP: Human Genome Diversity Panel; HGP: Human Genome Project; SNP: single-nucleotide polymorphism.

Funding

TKO, WW and KS were supported in part by 2SOFT/1.2/48 project "Partnership for Genomic Research in Ukraine and Romania" by the Joint Operational Programme Romania-Ukraine, through the European Neighbourhood Instrument (ENI).

Data Availability

Links to all the data mentioned in this article are available in Supplementary Table S1.

Competing Interests

The authors declare that they have no competing interests.

Authors' Contributions

T.K.O. wrote the first draft. W.W.W. and K.S. prepared and analyzed the data. S.M. and S.J.O. contributed to the original ideas, the writing, and the final editing of the manuscript.

Reviewer report: Aniko Sabo, reviewed 7/18/2022.

References (28 in total; 10 are listed below)

1. Initial sequencing and analysis of the human genome.

Authors: E S Lander; L M Linton; B Birren; C Nusbaum; M C Zody; J Baldwin; K Devon; K Dewar; M Doyle; W FitzHugh; R Funke; D Gage; K Harris; A Heaford; J Howland; L Kann; J Lehoczky; R LeVine; P McEwan; K McKernan; J Meldrim; J P Mesirov; C Miranda; W Morris; J Naylor; C Raymond; M Rosetti; R Santos; A Sheridan; C Sougnez; Y Stange-Thomann; N Stojanovic; A Subramanian; D Wyman; J Rogers; J Sulston; R Ainscough; S Beck; D Bentley; J Burton; C Clee; N Carter; A Coulson; R Deadman; P Deloukas; A Dunham; I Dunham; R Durbin; L French; D Grafham; S Gregory; T Hubbard; S Humphray; A Hunt; M Jones; C Lloyd; A McMurray; L Matthews; S Mercer; S Milne; J C Mullikin; A Mungall; R Plumb; M Ross; R Shownkeen; S Sims; R H Waterston; R K Wilson; L W Hillier; J D McPherson; M A Marra; E R Mardis; L A Fulton; A T Chinwalla; K H Pepin; W R Gish; S L Chissoe; M C Wendl; K D Delehaunty; T L Miner; A Delehaunty; J B Kramer; L L Cook; R S Fulton; D L Johnson; P J Minx; S W Clifton; T Hawkins; E Branscomb; P Predki; P Richardson; S Wenning; T Slezak; N Doggett; J F Cheng; A Olsen; S Lucas; C Elkin; E Uberbacher; M Frazier; R A Gibbs; D M Muzny; S E Scherer; J B Bouck; E J Sodergren; K C Worley; C M Rives; J H Gorrell; M L Metzker; S L Naylor; R S Kucherlapati; D L Nelson; G M Weinstock; Y Sakaki; A Fujiyama; M Hattori; T Yada; A Toyoda; T Itoh; C Kawagoe; H Watanabe; Y Totoki; T Taylor; J Weissenbach; R Heilig; W Saurin; F Artiguenave; P Brottier; T Bruls; E Pelletier; C Robert; P Wincker; D R Smith; L Doucette-Stamm; M Rubenfield; K Weinstock; H M Lee; J Dubois; A Rosenthal; M Platzer; G Nyakatura; S Taudien; A Rump; H Yang; J Yu; J Wang; G Huang; J Gu; L Hood; L Rowen; A Madan; S Qin; R W Davis; N A Federspiel; A P Abola; M J Proctor; R M Myers; J Schmutz; M Dickson; J Grimwood; D R Cox; M V Olson; R Kaul; C Raymond; N Shimizu; K Kawasaki; S Minoshima; G A Evans; M Athanasiou; R Schultz; B A Roe; F Chen; H Pan; J Ramser; H Lehrach; R Reinhardt; W R McCombie; M de la Bastide; N Dedhia; H Blöcker; K Hornischer; G Nordsiek; R Agarwala; L Aravind; J A Bailey; A Bateman; S Batzoglou; E Birney; P Bork; D G Brown; C B Burge; L Cerutti; H C Chen; D Church; M Clamp; R R Copley; T Doerks; S R Eddy; E E Eichler; T S Furey; J Galagan; J G Gilbert; C Harmon; Y Hayashizaki; D Haussler; H Hermjakob; K Hokamp; W Jang; L S Johnson; T A Jones; S Kasif; A Kaspryzk; S Kennedy; W J Kent; P Kitts; E V Koonin; I Korf; D Kulp; D Lancet; T M Lowe; A McLysaght; T Mikkelsen; J V Moran; N Mulder; V J Pollara; C P Ponting; G Schuler; J Schultz; G Slater; A F Smit; E Stupka; J Szustakowki; D Thierry-Mieg; J Thierry-Mieg; L Wagner; J Wallis; R Wheeler; A Williams; Y I Wolf; K H Wolfe; S P Yang; R F Yeh; F Collins; M S Guyer; J Peterson; A Felsenfeld; K A Wetterstrand; A Patrinos; M J Morgan; P de Jong; J J Catanese; K Osoegawa; H Shizuya; S Choi; Y J Chen; J Szustakowki
Journal: Nature Date: 2001-02-15 Impact factor: 49.962

2. Measuring European population stratification with microarray genotype data.

Authors: Marc Bauchet; Brian McEvoy; Laurel N Pearson; Ellen E Quillen; Tamara Sarkisian; Kristine Hovhannesyan; Ranjan Deka; Daniel G Bradley; Mark D Shriver
Journal: Am J Hum Genet Date: 2007-03-22 Impact factor: 11.025

3. Privacy challenges and research opportunities for genomic data sharing. [Review]

Authors: Luca Bonomi; Yingxiang Huang; Lucila Ohno-Machado
Journal: Nat Genet Date: 2020-06-29 Impact factor: 38.330

4. The sequence of the human genome.

Authors: J C Venter; M D Adams; E W Myers; P W Li; R J Mural; G G Sutton; H O Smith; M Yandell; C A Evans; R A Holt; J D Gocayne; P Amanatides; R M Ballew; D H Huson; J R Wortman; Q Zhang; C D Kodira; X H Zheng; L Chen; M Skupski; G Subramanian; P D Thomas; J Zhang; G L Gabor Miklos; C Nelson; S Broder; A G Clark; J Nadeau; V A McKusick; N Zinder; A J Levine; R J Roberts; M Simon; C Slayman; M Hunkapiller; R Bolanos; A Delcher; I Dew; D Fasulo; M Flanigan; L Florea; A Halpern; S Hannenhalli; S Kravitz; S Levy; C Mobarry; K Reinert; K Remington; J Abu-Threideh; E Beasley; K Biddick; V Bonazzi; R Brandon; M Cargill; I Chandramouliswaran; R Charlab; K Chaturvedi; Z Deng; V Di Francesco; P Dunn; K Eilbeck; C Evangelista; A E Gabrielian; W Gan; W Ge; F Gong; Z Gu; P Guan; T J Heiman; M E Higgins; R R Ji; Z Ke; K A Ketchum; Z Lai; Y Lei; Z Li; J Li; Y Liang; X Lin; F Lu; G V Merkulov; N Milshina; H M Moore; A K Naik; V A Narayan; B Neelam; D Nusskern; D B Rusch; S Salzberg; W Shao; B Shue; J Sun; Z Wang; A Wang; X Wang; J Wang; M Wei; R Wides; C Xiao; C Yan; A Yao; J Ye; M Zhan; W Zhang; H Zhang; Q Zhao; L Zheng; F Zhong; W Zhong; S Zhu; S Zhao; D Gilbert; S Baumhueter; G Spier; C Carter; A Cravchik; T Woodage; F Ali; H An; A Awe; D Baldwin; H Baden; M Barnstead; I Barrow; K Beeson; D Busam; A Carver; A Center; M L Cheng; L Curry; S Danaher; L Davenport; R Desilets; S Dietz; K Dodson; L Doup; S Ferriera; N Garg; A Gluecksmann; B Hart; J Haynes; C Haynes; C Heiner; S Hladun; D Hostin; J Houck; T Howland; C Ibegwam; J Johnson; F Kalush; L Kline; S Koduru; A Love; F Mann; D May; S McCawley; T McIntosh; I McMullen; M Moy; L Moy; B Murphy; K Nelson; C Pfannkoch; E Pratts; V Puri; H Qureshi; M Reardon; R Rodriguez; Y H Rogers; D Romblad; B Ruhfel; R Scott; C Sitter; M Smallwood; E Stewart; R Strong; E Suh; R Thomas; N N Tint; S Tse; C Vech; G Wang; J Wetter; S Williams; M Williams; S Windsor; E Winn-Deen; K Wolfe; J Zaveri; K Zaveri; J F Abril; R Guigó; M J Campbell; K V Sjolander; B Karlak; A Kejariwal; H Mi; B Lazareva; T Hatton; A Narechania; K Diemer; A Muruganujan; N Guo; S Sato; V Bafna; S Istrail; R Lippert; R Schwartz; B Walenz; S Yooseph; D Allen; A Basu; J Baxendale; L Blick; M Caminha; J Carnes-Stine; P Caulk; Y H Chiang; M Coyne; C Dahlke; A Deslattes Mays; M Dombroski; M Donnelly; D Ely; S Esparham; C Fosler; H Gire; S Glanowski; K Glasser; A Glodek; M Gorokhov; K Graham; B Gropman; M Harris; J Heil; S Henderson; J Hoover; D Jennings; C Jordan; J Jordan; J Kasha; L Kagan; C Kraft; A Levitsky; M Lewis; X Liu; J Lopez; D Ma; W Majoros; J McDaniel; S Murphy; M Newman; T Nguyen; N Nguyen; M Nodell; S Pan; J Peck; M Peterson; W Rowe; R Sanders; J Scott; M Simpson; T Smith; A Sprague; T Stockwell; R Turner; E Venter; M Wang; M Wen; D Wu; M Wu; A Xia; A Zandieh; X Zhu
Journal: Science Date: 2001-02-16 Impact factor: 47.728

5. The African Genome Variation Project shapes medical genetics in Africa.

Authors: Deepti Gurdasani; Tommy Carstensen; Fasil Tekola-Ayele; Luca Pagani; Ioanna Tachmazidou; Konstantinos Hatzikotoulas; Savita Karthikeyan; Louise Iles; Martin O Pollard; Ananyo Choudhury; Graham R S Ritchie; Yali Xue; Jennifer Asimit; Rebecca N Nsubuga; Elizabeth H Young; Cristina Pomilla; Katja Kivinen; Kirk Rockett; Anatoli Kamali; Ayo P Doumatey; Gershim Asiki; Janet Seeley; Fatoumatta Sisay-Joof; Muminatou Jallow; Stephen Tollman; Ephrem Mekonnen; Rosemary Ekong; Tamiru Oljira; Neil Bradman; Kalifa Bojang; Michele Ramsay; Adebowale Adeyemo; Endashaw Bekele; Ayesha Motala; Shane A Norris; Fraser Pirie; Pontiano Kaleebu; Dominic Kwiatkowski; Chris Tyler-Smith; Charles Rotimi; Eleftheria Zeggini; Manjinder S Sandhu
Journal: Nature Date: 2014-12-03 Impact factor: 49.962

6. The Qatar genome: a population-specific tool for precision medicine in the Middle East.

Authors: Khalid A Fakhro; Michelle R Staudt; Monica Denise Ramstetter; Amal Robay; Joel A Malek; Ramin Badii; Ajayeb Al-Nabet Al-Marri; Charbel Abi Khalil; Alya Al-Shakaki; Omar Chidiac; Dora Stadler; Mahmoud Zirie; Amin Jayyousi; Jacqueline Salit; Jason G Mezey; Ronald G Crystal; Juan L Rodriguez-Flores
Journal: Hum Genome Var Date: 2016-06-30

7. A genome-wide analysis of populations from European Russia reveals a new pole of genetic diversity in northern Europe.

Authors: Andrey V Khrunin; Denis V Khokhrin; Irina N Filippova; Tõnu Esko; Mari Nelis; Natalia A Bebyakova; Natalia L Bolotova; Janis Klovins; Liene Nikitina-Zake; Karola Rehnström; Samuli Ripatti; Stefan Schreiber; Andre Franke; Milan Macek; Veronika Krulišová; Jan Lubinski; Andres Metspalu; Svetlana A Limborska
Journal: PLoS One Date: 2013-03-07 Impact factor: 3.240

8. The Simons Genome Diversity Project: 300 genomes from 142 diverse populations.

Authors: Swapan Mallick; Heng Li; Mark Lipson; Iain Mathieson; Melissa Gymrek; Fernando Racimo; Mengyao Zhao; Niru Chennagiri; Susanne Nordenfelt; Arti Tandon; Pontus Skoglund; Iosif Lazaridis; Sriram Sankararaman; Qiaomei Fu; Nadin Rohland; Gabriel Renaud; Yaniv Erlich; Thomas Willems; Carla Gallo; Jeffrey P Spence; Yun S Song; Giovanni Poletti; Francois Balloux; George van Driem; Peter de Knijff; Irene Gallego Romero; Aashish R Jha; Doron M Behar; Claudio M Bravi; Cristian Capelli; Tor Hervig; Andres Moreno-Estrada; Olga L Posukh; Elena Balanovska; Oleg Balanovsky; Sena Karachanak-Yankova; Hovhannes Sahakyan; Draga Toncheva; Levon Yepiskoposyan; Chris Tyler-Smith; Yali Xue; M Syafiq Abdullah; Andres Ruiz-Linares; Cynthia M Beall; Anna Di Rienzo; Choongwon Jeong; Elena B Starikovskaya; Ene Metspalu; Jüri Parik; Richard Villems; Brenna M Henn; Ugur Hodoglugil; Robert Mahley; Antti Sajantila; George Stamatoyannopoulos; Joseph T S Wee; Rita Khusainova; Elza Khusnutdinova; Sergey Litvinov; George Ayodo; David Comas; Michael F Hammer; Toomas Kivisild; William Klitz; Cheryl A Winkler; Damian Labuda; Michael Bamshad; Lynn B Jorde; Sarah A Tishkoff; W Scott Watkins; Mait Metspalu; Stanislav Dryomov; Rem Sukernik; Lalji Singh; Kumarasamy Thangaraj; Svante Pääbo; Janet Kelso; Nick Patterson; David Reich
Journal: Nature Date: 2016-09-21 Impact factor: 49.962

9. Genes reveal traces of common recent demographic history for most of the Uralic-speaking populations.

Authors: Kristiina Tambets; Bayazit Yunusbayev; Georgi Hudjashov; Anne-Mai Ilumäe; Siiri Rootsi; Terhi Honkola; Outi Vesakoski; Quentin Atkinson; Pontus Skoglund; Alena Kushniarevich; Sergey Litvinov; Maere Reidla; Ene Metspalu; Lehti Saag; Timo Rantanen; Monika Karmin; Jüri Parik; Sergey I Zhadanov; Marina Gubina; Larisa D Damba; Marina Bermisheva; Tuuli Reisberg; Khadizhat Dibirova; Irina Evseeva; Mari Nelis; Janis Klovins; Andres Metspalu; Tõnu Esko; Oleg Balanovsky; Elena Balanovska; Elza K Khusnutdinova; Ludmila P Osipova; Mikhail Voevoda; Richard Villems; Toomas Kivisild; Mait Metspalu
Journal: Genome Biol Date: 2018-09-21 Impact factor: 13.583

10. National Genome Initiatives in Europe and the United Kingdom in the Era of Whole-Genome Sequencing: A Comprehensive Review. [Review]

Authors: Jan Smetana; Petr Brož
Journal: Genes (Basel) Date: 2022-03-21 Impact factor: 4.096