Laksshman Sundaram1,2,3, Hong Gao1, Samskruthi Reddy Padigepati1,3, Jeremy F McRae1, Yanjun Li3, Jack A Kosmicki1,4, Nondas Fritzilas1, Jörg Hakenberg1, Anindita Dutta1, John Shon1, Jinbo Xu5, Serafim Batzoglou1, Xiaolin Li3, Kyle Kai-How Farh6.
1. Illumina Artificial Intelligence Laboratory, Illumina Inc, San Diego, CA, USA.
2. Department of Computer Science, Stanford University, Stanford, CA, USA.
3. National Science Foundation Center for Big Learning, University of Florida, Gainesville, FL, USA.
4. Analytic and Translational Genetics Unit (ATGU), Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA.
5. Toyota Technological Institute at Chicago, Chicago, IL, USA.
6. Illumina Artificial Intelligence Laboratory, Illumina Inc, San Diego, CA, USA. kfarh@illumina.com.
Abstract
Millions of human genomes and exomes have been sequenced, but their clinical applications remain limited due to the difficulty of distinguishing disease-causing mutations from benign genetic variation. Here we demonstrate that common missense variants in other primate species are largely clinically benign in human, enabling pathogenic mutations to be systematically identified by the process of elimination. Using hundreds of thousands of common variants from population sequencing of six non-human primate species, we train a deep neural network that identifies pathogenic mutations in rare disease patients with 88% accuracy and enables the discovery of 14 new candidate genes in intellectual disability at genome-wide significance. Cataloging common variation from additional primate species would improve interpretation for millions of variants of uncertain significance, further advancing the clinical utility of human genome sequencing.
The clinical actionability of diagnostic sequencing is limited by the
difficulty of interpreting rare genetic variants in human populations and inferring
their impact on disease risk[1,2]. Because of their deleterious
effects on fitness, clinically significant genetic variants tend to be extremely
rare in the population, and for the vast majority, their effects on human health
have not been determined[3]. The
large number and rarity of these variants of uncertain clinical significance present
a formidable obstacle to the adoption of sequencing for individualized medicine and
population-wide health screening[4].
Most penetrant Mendelian diseases have very low prevalence in the population,
hence the observation of a variant at high frequencies in the population is strong
evidence in favor of benign consequence[5]. Assaying common variation across diverse human populations is
an effective strategy for cataloguing benign variants[6], but the total amount of common variation in
present day humans is limited due to bottleneck events in our species’
recent history, during which a large fraction of ancestral diversity was
lost[7]. Population studies
of present day humans show a remarkable inflation from an effective population size
(N) of less than 10,000 individuals within the
last 15,000–65,000 years, and the small pool of common polymorphisms traces
back to the limited capacitance for variation in a population of this size[8]. Out of more than 70 million
potential protein-altering missense substitutions in the reference genome, only
roughly 1 in 1000 are present at greater than 0.1% overall population allele
frequency[6,9].
Outside of modern human populations, chimpanzees comprise the next closest
extant species, and share 99.4% amino acid sequence identity[10]. The near-identity of
protein-coding sequence in humans and chimpanzees suggests that purifying selection
operating on chimpanzee protein-coding variants might also model the consequences on
fitness of human mutations that are identical-by-state. Because the mean time for
neutral polymorphisms to persist in the ancestral human lineage
(~4N generations) is a fraction of the
species’ divergence time (~6 mya)[11], naturally occurring chimpanzee variation explores
mutational space that is largely non-overlapping except by chance, aside from rare
instances of haplotypes maintained by balancing selection[12,13].
If polymorphisms that are identical-by-state similarly affect fitness in the two
species, the presence of a variant at high allele frequencies in chimpanzee
populations should indicate benign consequence in human, expanding the catalog of
known variants whose benign consequence has been established by purifying
selection.
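The scale arguments above can be checked with a short back-of-envelope calculation. This is only a sketch: the generation time of 25 years is an assumption not given in the text, and the effective population size is taken at the quoted upper bound of 10,000.

```python
# Back-of-envelope check of the scale arguments in the introduction.
# GENERATION_YEARS is an assumed value; the other figures are quoted in the text.
EFFECTIVE_POP_SIZE = 10_000      # upper bound on ancestral human effective size
GENERATION_YEARS = 25            # ASSUMPTION: human generation time in years
DIVERGENCE_YEARS = 6_000_000     # ~6 mya human-chimpanzee divergence
POSSIBLE_MISSENSE = 70_000_000   # potential missense substitutions
COMMON_FRACTION = 1 / 1000       # share present above 0.1% allele frequency


def persistence_years(effective_size, generation_years):
    """Mean persistence of a neutral polymorphism, taken as ~4N generations."""
    return 4 * effective_size * generation_years


persist = persistence_years(EFFECTIVE_POP_SIZE, GENERATION_YEARS)
fraction_of_divergence = persist / DIVERGENCE_YEARS
common_missense = POSSIBLE_MISSENSE * COMMON_FRACTION

print(f"persistence: {persist:,} years")
print(f"fraction of divergence time: {fraction_of_divergence:.2f}")
print(f"common missense variants: about {round(common_missense):,}")
```

Under these assumptions a neutral polymorphism persists for roughly one sixth of the divergence time, which is why shared human and chimpanzee polymorphisms are expected to arise mostly by independent recurrence.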
Results
Common variants in other primates are largely benign in human
The recent availability of aggregated exome data, comprising 123,136
humans collected in the Exome Aggregation Consortium (ExAC) and Genome
Aggregation Database (gnomAD), allows us to measure the impact of natural
selection on missense and synonymous mutations across the allele frequency
spectrum[6]. Rare
singleton variants that are observed only once in the cohort closely match the
expected 2.2:1 missense:synonymous ratio predicted by de novo
mutation after adjusting for the effects of trinucleotide context on mutational
rate (Fig. 1a and Supplementary Fig. 1, 2)[14], but at higher allele
frequencies the number of observed missense variants decreases due to the
purging of deleterious mutations by natural selection. The gradual decrease of
missense:synonymous ratios with increasing allele frequency is consistent with a
substantial fraction of missense variants of population frequency <
0.1% having mildly deleterious consequence despite being observed in
healthy individuals[15]. These
findings support the widespread empirical practice by diagnostic labs of
filtering out variants with greater than 0.1%–1% allele
frequency as likely benign for penetrant genetic disease, aside from a handful
of well-documented exceptions due to balancing selection and founder
effects[16,17].
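As an illustration of the comparison described above, the following sketch computes the missense:synonymous ratio in each allele frequency category and the share of missense variants missing relative to the 2.2:1 de novo expectation. The counts and category labels are hypothetical, and the sketch omits the trinucleotide mutation-rate adjustment applied in the actual analysis.

```python
# Illustrative only: the counts below are hypothetical, not ExAC/gnomAD data.
EXPECTED_RATIO = 2.2  # missense:synonymous ratio expected from de novo mutation


def ms_ratio(missense, synonymous):
    """Missense:synonymous ratio for one allele frequency category."""
    return missense / synonymous


def fraction_removed(observed_ratio, expected_ratio=EXPECTED_RATIO):
    """Share of missense variants missing relative to the de novo expectation."""
    return 1 - observed_ratio / expected_ratio


# (missense, synonymous) counts per category; labels and values are made up.
counts_by_category = {
    "singleton": (2200, 1000),
    "rare (<0.1%)": (1800, 1000),
    "common (>0.1%)": (1100, 1000),
}

summary = {
    name: (ms_ratio(m, s), fraction_removed(ms_ratio(m, s)))
    for name, (m, s) in counts_by_category.items()
}

for name, (ratio, removed) in summary.items():
    print(f"{name}: ratio {ratio:.2f}, removed {removed:.0%}")
```

With these made-up counts the common category has a ratio of 1.1, meaning half of the expected missense variants are missing. That is the same order as the ~50% depletion reported for human missense variants in general.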
Figure 1
Missense: synonymous ratios across the human allele frequency
spectrum
a, All missense and synonymous variants observed in 123,136 humans
from the ExAC/gnomAD database were divided into 4 categories by allele
frequency. Shaded grey bars represent counts of synonymous variants in each
category; dark green bars represent missense variants. The height of each bar is
scaled to the number of synonymous variants in each allele frequency category
and the missense: synonymous counts and ratios are displayed after adjusting for
mutation rate. b, c, Allele frequency spectrum for
human missense and synonymous variants that are identical-by-state (IBS) with
(b) chimpanzee common and (c) chimpanzee singleton
variants. The depletion of chimpanzee missense variants at common human allele
frequencies (>0.1%) compared to rare human allele frequencies (<
0.1%) is indicated by the red box, along with accompanying
χ² test p-values. d, As in (b)
and (c), but using human variants that are observed in at least one
of the non-human primate species. e, Counts of benign and
pathogenic missense variants in the overall ClinVar database (top row), compared
to ClinVar variants in a cohort of 30 humans sampled from ExAC/gnomAD allele
frequencies (middle row), compared to variants observed in primates (bottom
row). Conflicting benign and pathogenic assertions and variants annotated only
with uncertain significance were excluded.
We identified common chimpanzee variants that were sampled two or more
times in a cohort of 24 unrelated individuals[18]; we estimate that 99.8% of these
variants are common in the general chimpanzee population (allele frequency (AF)
> 0.1%), indicating that these variants have already passed through
the sieve of purifying selection (see Methods). We examined the human allele
frequency spectrum for the corresponding identical-by-state human variants
(Fig. 1b), excluding the extended major
histocompatibility complex region as a known region of balancing
selection[19], along
with variants lacking a one-to-one mapping in the multiple sequence alignment.
For human variants that are identical-by-state with common chimpanzee variants,
the missense:synonymous ratio is largely constant across the human allele
frequency spectrum (P > 0.5 by χ² test),
which is consistent with absence of negative selection against common chimpanzee
variants in the human population and concordant selection coefficients on
missense variants in the two species. The low missense:synonymous ratio observed
in human variants that are identical-by-state with common chimpanzee variants is
consistent with the larger effective population size in chimpanzee
(N ~ 73,000), which enables more efficient
filtering of mildly deleterious variation[20,21].
In contrast, for singleton chimpanzee variants (sampled only once in the
cohort), we observe a significant decrease in the missense:synonymous ratio at
common allele frequencies (P < 5.8 × 10⁻⁶; Fig. 1c),
indicating that 24% of singleton chimpanzee missense
variants would be filtered by purifying selection in human populations at allele
frequencies greater than 0.1%. This depletion indicates that a
significant fraction of the chimpanzee singleton variants are rare deleterious
mutations whose damaging effects on fitness have prevented them from reaching
common allele frequencies in either species. We estimate that only 69%
of singleton variants are common (AF > 0.1%) in the general
chimpanzee population (see Methods).
We next identified human variants that are identical-by-state with
variation observed in at least one of six non-human primate species. Variation
in each of the six species was ascertained from either the great ape genome
project (chimp, bonobo, gorilla, orangutan)[18] or dbSNP submissions from the primate genome
projects (rhesus, marmoset)[22-25], and
largely represent common variants based on the limited number of individuals
sequenced and the low missense:synonymous ratios observed for each species
(Supplementary Table
1). Similar to chimpanzee, we find that the missense:synonymous
ratios for variants from the six non-human primate species are roughly equal
across the human allele frequency spectrum, other than a mild depletion of
missense variation at common allele frequencies (Fig. 1d, Supplementary Fig. 3 and Supplementary Data File 1), which
is expected due to the inclusion of a minority of rare variants (~16%
with under 0.1% allele frequency in chimpanzee, and less in other
species due to fewer individuals sequenced; see Methods and Supplementary Note). These results
suggest that the selection coefficients on identical-by-state missense variants
are concordant within the primate lineage at least out to new world monkeys,
which are estimated to have diverged from the human ancestral lineage ~35
million years ago[26].
We find that human missense variants that are identical-by-state with
observed primate variants are strongly enriched for benign consequence in the
ClinVar database[27]. After
excluding variants of uncertain significance and those with conflicting
annotations, ClinVar variants that are present in at least one non-human primate
species are annotated as Benign or Likely Benign on average 90% of the
time, compared to 35% for ClinVar missense variants in general
(P < 10⁻⁴⁰; Fig. 1e). The pathogenicity of ClinVar annotations for
primate variants is slightly greater than that observed from sampling a
similarly sized cohort of healthy humans (~95% Benign or Likely Benign
consequence, P = 0.07; see Methods and Supplementary Note) excluding human
variants with greater than 1% allele frequency to reduce curation
bias.
The field of human genetics has long relied upon model organisms to
infer the clinical impact of human mutations[28,29], but the long
evolutionary distance to most genetically tractable animal models raises
concerns about the extent to which findings on model organisms are generalizable
back to human[30]. We extended
our analysis beyond the primate lineage to include largely common variation from
four additional mammalian species (mouse, pig, goat, cow) and two species of
more distant vertebrates (chicken, zebrafish). We selected species with
sufficient genome-wide ascertainment of variation in dbSNP, and confirmed that
these are largely common variants, based on missense:synonymous ratios being
much lower than 2.2:1 (see Methods and Supplementary Note). In contrast to
our primate analyses, human missense mutations that are identical-by-state with
variation in more distant species are markedly depleted at common allele
frequencies (Fig. 2a), and the magnitude of
this depletion increases at longer evolutionary distances (Fig. 2b and Supplementary Tables 2 and 3).
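The depletion tests used throughout this section can be sketched as a 2×2 comparison of missense and synonymous counts at rare versus common human allele frequencies. The sketch below assumes that depletion is measured as one minus the ratio of the two missense:synonymous ratios, and uses a Pearson χ² test without continuity correction; the Methods may define either differently. All counts are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (1 d.f., no continuity correction) on [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For one degree of freedom the upper tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def missense_depletion(rare_mis, rare_syn, common_mis, common_syn):
    """ASSUMED definition: 1 - (common missense:synonymous) / (rare missense:synonymous)."""
    return 1 - (common_mis / common_syn) / (rare_mis / rare_syn)


# Hypothetical (missense, synonymous) counts for variants shared with another species.
rare = (1000, 1000)    # human allele frequency < 0.1%
common = (760, 1000)   # human allele frequency > 0.1%

depletion = missense_depletion(*rare, *common)
chi2_stat, p_value = chi2_2x2(*rare, *common)
print(f"depletion {depletion:.0%}, chi2 {chi2_stat:.1f}, P {p_value:.1e}")
```

A flat missense:synonymous ratio across the frequency spectrum gives a χ² statistic of zero, which corresponds to the pattern reported for common primate variants.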
Figure 2
Purifying selection on missense variants identical-by-state with other
species
a, Allele frequency spectrum for human missense and synonymous
variants that are identical-by-state with variants present in four non-primate
mammalian species (mouse, pig, goat, cow). The depletion of missense variants at
common human allele frequencies (>0.1%) is indicated by the red box,
along with the accompanying χ² test p-value. b,
Scatter plot showing the depletion of missense variants observed in other
species at common human allele frequencies (>0.1%) versus the
species’ evolutionary distance from human, expressed in units of branch
length (mean number of substitutions per nucleotide position). The number
appearing in parentheses beside each species’ name indicates the total branch
length between that species and human. Depletion values for singleton and
common variants are shown for species
where variant frequencies were available, with the exception of gorilla, which
contained related individuals. c, Counts of benign and pathogenic
missense variants in a cohort of 30 humans sampled from ExAC/gnomAD allele
frequencies (top row), compared to variants observed in primates (middle row),
and compared to variants observed in mouse, pig, goat, and cow (bottom row).
Conflicting benign and pathogenic assertions and variants annotated only with
uncertain significance were excluded. d, Scatter plot showing the
depletion of fixed missense substitutions observed in pairs of closely related
species at common human allele frequencies (>0.1%) versus the
species’ evolutionary distance from human (expressed in units of mean
branch length).
The missense mutations that are deleterious in human, yet tolerated at
high allele frequencies in more distant species, indicate that the coefficients
of selection for identical-by-state missense mutations have diverged
substantially between human and more distant species. Nonetheless, the presence
of a missense variant in more distant mammals still increases the likelihood of
benign consequence, as the fraction of missense variants depleted by natural
selection at common allele frequencies is less than the ~50% depletion
observed for human missense variants in general (Fig. 1a). Consistent with these results, we find that ClinVar
missense variants that have been observed in mouse, pig, goat, and cow are
73% likely to be annotated with Benign or Likely Benign consequence,
compared to 90% for primate variation (P < 2 × 10⁻⁸; Fig. 2c), and 35% for the
ClinVar database overall.
To confirm that evolutionary distance, and not domestication artifact,
is the primary driving force for the divergence of the selection coefficients,
we repeated the analysis using fixed substitutions between pairs of closely
related species in lieu of intra-species polymorphisms across a broad range of
evolutionary distances (Fig. 2d, Supplementary Table 4 and
Supplementary Data File
2). We find that the depletion of human missense variants that are
identical-by-state with inter-species fixed substitutions increases with
evolutionary branch length, with no discernible difference for wild species
compared to those exposed to domestication. This concurs with earlier work in
fly and yeast[31], which found
that the number of identical-by-state fixed missense substitutions was lower
than expected by chance in divergent lineages.
A deep learning network for variant pathogenicity classification
The importance of variant classification for clinical applications has
inspired numerous attempts to use supervised machine learning to address the
problem, but these efforts have been hindered by the lack of an adequately-sized
truth dataset containing confidently labeled benign and pathogenic variants for
training[32-42]. Existing databases of human
expert curated variants do not represent the entire genome, with ~50% of
the variants in the ClinVar database coming from only 200 genes (~1% of
human protein-coding genes). Moreover, systematic studies reveal that many human
expert annotations have questionable supporting evidence[6,43], underscoring the difficulty of interpreting rare variants that
may be observed in only a single patient. Although human expert interpretation
has become increasingly rigorous[1,5], classification
guidelines are largely formulated around consensus practices, and are at risk of
reinforcing existing tendencies. To reduce human interpretation biases, recent
classifiers have been trained on common human polymorphisms or fixed
human-chimpanzee substitutions[44-47], but
these classifiers also use as their input the prediction scores of earlier
classifiers that were trained on human curated databases. Objective benchmarking
of the performance of these various methods has been elusive in the absence of
an independent, bias-free truth dataset[48].
Variation from the six non-human primates (chimpanzee, bonobo, gorilla,
orangutan, rhesus, and marmoset) contributes over 300,000 unique missense
variants that are non-overlapping with common human variation, and largely
represent common variants of benign consequence that have been through the sieve
of purifying selection, greatly enlarging the training dataset available for
machine learning approaches. On average, each primate species contributes more
variants than the whole of the ClinVar database (~42,000 missense variants as of
Nov 2017, after excluding variants of uncertain significance and those with
conflicting annotations). Additionally, this content is free from biases in
human interpretation.
Using a dataset consisting of common human variants (AF >
0.1%) and primate variation (Supplementary Table 5), we trained
a novel deep residual network, PrimateAI, which takes as input the amino acid
sequence flanking the variant of interest and the orthologous sequence
alignments in other species (Fig. 3a and
Supplementary Fig.
4)[49]. Unlike
existing classifiers which employ human-engineered features, our deep learning
network learns to extract features directly from primary sequence. To
incorporate information about protein structure, we trained separate networks to
predict secondary structure and solvent accessibility from sequence
alone[50,51], and then included these as sub-networks
in the full model (Fig. 3b and Supplementary Fig. 5).
Given the small number of human proteins that have been successfully
crystallized, inferring structure from primary sequence has the advantage of
avoiding biases due to incomplete protein structure and functional domain
annotation. The total depth of the network, with protein structure included, was
36 layers of convolutions, consisting of roughly 400,000 trainable
parameters.
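To make the sequence input concrete, the following minimal sketch encodes the reference and alternate amino acid windows for a single variant. The 51-residue window comes from the description of the network input; the one-hot encoding, the zero padding at protein ends, and the names `encode_variant` and `one_hot` are illustrative assumptions rather than the published implementation. The conservation profiles and structure sub-networks are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues
WINDOW = 51                            # window length centered at the variant
FLANK = WINDOW // 2                    # 25 residues on each side


def one_hot(residue):
    """20-long indicator vector; anything unrecognised (e.g. padding) is all zeros."""
    return [1 if residue == aa else 0 for aa in AMINO_ACIDS]


def encode_variant(protein, position, alt):
    """Return (reference, alternate) 51 x 20 matrices centered on `position` (0-based)."""
    if alt not in AMINO_ACIDS or protein[position] == alt:
        raise ValueError("alt must be a standard residue that differs from the reference")
    window = [
        protein[i] if 0 <= i < len(protein) else "-"  # "-" pads past the protein ends
        for i in range(position - FLANK, position + FLANK + 1)
    ]
    alt_window = window[:FLANK] + [alt] + window[FLANK + 1:]
    return [one_hot(r) for r in window], [one_hot(r) for r in alt_window]


# Hypothetical 60-residue protein with an M -> V substitution at position 10.
protein = "ACDEFGHIKL" + "M" + "NPQRSTVWY" * 5 + "ACDE"
ref_matrix, alt_matrix = encode_variant(protein, 10, "V")
print(len(ref_matrix), len(ref_matrix[0]))
```

The two matrices differ only in the central row, so a downstream network sees both the sequence context and the specific substitution.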
Figure 3
Deep learning network for classification of missense variants
a, Architecture of the deep learning network for pathogenicity
prediction, PrimateAI. Predicted pathogenicity is on a scale from 0 (benign) to
1 (pathogenic). The network takes as input the human amino acid (AA) reference
and alternate sequence (51 AAs) centered at the variant, the position weight
matrix (PWM) conservation profiles calculated from 99 vertebrate species, and
b, the outputs of secondary structure and solvent accessibility
prediction deep learning networks, which predict three-state protein secondary
structure (helix—H, beta sheet—B, and coil—C) and
three-state solvent accessibility (buried—B, intermediate—I, and
exposed—E). c, Predicted pathogenicity score at each amino
acid position in the SCN2A gene, annotated for key functional domains. Plotted
along the gene is the average PrimateAI score for missense substitutions at each
amino acid position. d, Comparison of classifiers at predicting
benign consequence for a test set of 10,000 common primate variants that were
withheld from training. The y-axis represents the percentage of primate variants
correctly classified as benign, after normalizing the threshold of each
classifier to its 50th percentile score on a set of 10,000 random
variants that were matched for mutational rate. e, Distributions of
PrimateAI prediction scores for de novo missense variants
occurring in DDD patients compared to unaffected siblings, with corresponding
Wilcoxon rank-sum p-value. f, Comparison of classifiers at
separating de novo missense variants in DDD cases versus
controls. Wilcoxon rank-sum test p-values are shown for each classifier.
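The benchmark in panel d can be sketched as follows: each classifier's threshold is set to its median score on mutation-rate-matched random variants, and accuracy is the fraction of withheld primate variants scoring below that threshold. The sketch assumes higher scores mean more pathogenic (scores from classifiers with the opposite orientation would need to be flipped first), and all scores shown are hypothetical.

```python
import statistics


def benign_accuracy(primate_scores, matched_random_scores):
    """Fraction of primate variants scored below the classifier's 50th-percentile threshold.

    The threshold is the median score on matched random variants, so every
    classifier is compared at the same operating point.
    """
    threshold = statistics.median(matched_random_scores)
    called_benign = sum(1 for score in primate_scores if score < threshold)
    return called_benign / len(primate_scores)


# Hypothetical scores on a 0 (benign) to 1 (pathogenic) scale.
withheld_primate = [0.1, 0.2, 0.3, 0.9]
matched_random = [0.2, 0.4, 0.6, 0.8]

accuracy = benign_accuracy(withheld_primate, matched_random)
print(f"benign accuracy at the median threshold: {accuracy:.0%}")
```

A classifier with no information would place about half of the primate variants below the median threshold, so 50% is the baseline for this benchmark.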
To train a classifier using only variants with benign labels, we framed
the prediction problem as whether a given mutation is likely to be observed as a
common variant in the population. Several factors influence the probability of
observing a variant at high allele frequencies, of which we are interested only
in deleteriousness; other factors include mutation rate, technical artifacts
such as sequencing coverage, and factors impacting neutral genetic drift such as
gene conversion[52]. We matched
each variant in the benign training set with a missense mutation that was absent
in 123,136 exomes from the ExAC database, controlling for each of these
confounding factors, and trained the deep learning network to distinguish
between benign variants and matched controls (Supplementary Fig. 6)[14]. As the number of unlabeled
variants greatly exceeds the size of the labeled benign training dataset, we
trained eight networks in parallel, each using a different set of unlabeled
variants matched to the benign training dataset, to obtain a consensus
prediction.
Using only primary amino acid sequence as its input, the deep learning
network accurately assigns high pathogenicity scores to residues at critical
protein functional domains, as shown for the voltage-gated sodium channel SCN2A
(Fig. 3c), a major disease gene in
epilepsy, autism, and intellectual disability. The structure of SCN2A
consists of four homologous repeats, each containing six transmembrane helices
(S1–S6)[53,54]. Upon membrane depolarization,
the positively-charged S4 transmembrane helix moves towards the extracellular
side of the membrane, causing the S5/S6 pore-forming domains to open via the
S4–S5 linker. Mutations in the S4, S4–S5 linker, and S5 domains,
which are clinically associated with early onset epileptic encephalopathy[55], are
predicted by the network to have the highest pathogenicity scores in the gene,
and are depleted for variants in the healthy population (Supplementary Table 6). We also
find that the network recognizes important amino acid positions within domains,
and assigns the highest pathogenicity scores to mutations at these positions,
such as the DNA-contacting residues of transcription factors and the catalytic
residues of enzymes (Supplementary Fig. 7). To better understand how the deep learning
network derives insights into protein structure and function from primary
sequence, we visualized the trainable parameters from the first three layers of
the network. Within these layers, we observe that the network learns
correlations between the weights of different amino acids which approximate
existing measurements of amino acid distance such as Grantham score (Supplementary Fig.
8)[56-58]. The outputs of these initial
layers become the inputs for later layers, enabling the deep learning network to
construct progressively higher order representations of the data[59].
We compared the performance of our network with existing classification
algorithms, using 10,000 common primate variants that were withheld from
training (Supplemental Data
File 3). Because ~50% of all newly arising human missense
variants are filtered by purifying selection at common allele frequencies (Fig. 1a), we determined the 50th-percentile
score for each classifier using randomly selected variants that were matched to
the 10,000 common primate variants by mutational rate and sequencing coverage,
and evaluated the accuracy of each classifier at that threshold (Fig. 3d, Supplementary Fig. 9a and Supplemental Data File
4). Our deep learning network (91% accuracy) surpassed the
performance of other classifiers (80% accuracy for the next best model)
at assigning benign consequence to the 10,000 withheld common primate variants.
Roughly half the improvement over existing methods comes from using the deep
learning network, and half comes from augmenting the training dataset with
primate variation, as compared to the accuracy of the network trained with human
variation data only (Fig. 3d).
To test classification of variants of uncertain significance in a
clinical scenario, we evaluated the ability of the deep learning network to
distinguish between de novo mutations occurring in patients
with neurodevelopmental disorders versus healthy controls. By prevalence,
neurodevelopmental disorders constitute one of the largest categories of rare
genetic diseases[60], and recent
trio sequencing studies have implicated the central role of de
novo missense and protein truncating mutations[61-64]. We classified each confidently called de
novo missense variant in 4,293 affected individuals from the
Deciphering Developmental Disorders cohort (DDD)[65], versus de novo
missense variants from 2,517 unaffected siblings in the Simons Simplex
Collection cohort (SSC)[66], and
assessed the difference in prediction scores between the two distributions with
the Wilcoxon rank-sum test (Fig. 3e and
Supplementary Fig.
10). The deep learning network clearly outperforms other classifiers
on this task (P < 10⁻²⁸; Fig. 3f and Supplementary Fig. 9b). Moreover,
the performance of the various classifiers on the withheld primate variant
dataset and the DDD cases vs controls dataset were correlated (Spearman
ρ = 0.57, P < 0.01),
indicating good agreement between the two datasets for evaluating pathogenicity,
despite using entirely different sources and methodologies (Supplementary Fig. 11a).
We next sought to estimate the accuracy of the deep learning network at
classifying benign versus pathogenic mutations within the same gene. Given that
the DDD population largely consists of index cases of affected children without
affected first degree relatives, it is essential to show that the classifier has
not inflated its accuracy by favoring pathogenicity in genes with de
novo dominant modes of inheritance. We restricted the analysis to
605 genes that were nominally significant for disease association in the DDD
study, calculated from protein-truncating variation only (P
< 0.05)[65]. Within these
genes, de novo missense mutations are enriched 3:1 compared to
expectation (Fig. 4a), indicating that
~67% are pathogenic. The deep learning network was able to discriminate
pathogenic and benign de novo variants within the same set of
genes (P < 10⁻¹⁵; Fig. 4b), outperforming other methods by a large
margin (Fig. 4c and Supplementary Fig. 9c). At a binary
cutoff of ≥ 0.803 (Fig. 4d and
Supplementary Fig.
11b), 65% of de novo missense mutations in
cases are classified by the deep learning network as pathogenic, compared to
14% of de novo missense mutations in controls,
corresponding to a classification accuracy of 88% (Fig. 4e and Supplementary Fig. 11c). Given
frequent incomplete penetrance and variable expressivity in neurodevelopmental
disorders[67], this
figure likely underestimates the accuracy of our classifier due to the inclusion
of partially penetrant pathogenic variants in controls. We caution that data
from a greater diversity of disease genes are needed before generalizing these
conclusions out to all Mendelian disorders.
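The reported 88% figure can be reproduced approximately from the numbers above with a simple mixture calculation. This is a sketch under stated assumptions, not necessarily the authors' exact procedure: it treats case variants as a mixture of pathogenic variants and background variants, assumes background variants in cases are called pathogenic at the same rate as control variants, and takes accuracy as the mean of the true positive and true negative rates.

```python
def pathogenic_fraction(enrichment):
    """Share of case variants in excess of expectation (3:1 enrichment gives ~67%)."""
    return 1 - 1 / enrichment


def balanced_accuracy(case_called, control_called, enrichment):
    """Return (TPR, TNR, mean of the two) from the fractions called pathogenic.

    ASSUMPTION: background variants in cases are called pathogenic at the
    control rate, so case_called = f * TPR + (1 - f) * control_called.
    """
    frac_path = pathogenic_fraction(enrichment)
    tnr = 1 - control_called
    tpr = (case_called - (1 - frac_path) * control_called) / frac_path
    return tpr, tnr, (tpr + tnr) / 2


# Values quoted in the text: 65% of case and 14% of control variants called pathogenic.
tpr, tnr, accuracy = balanced_accuracy(case_called=0.65, control_called=0.14, enrichment=3.0)
print(f"TPR {tpr:.3f}, TNR {tnr:.3f}, accuracy {accuracy:.2f}")
```

Under these assumptions the true positive rate is about 0.905 and the true negative rate 0.86, giving an accuracy that rounds to 0.88.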
Figure 4
Classification accuracy within 605 DDD genes with P < 0.05
a, Enrichment of de novo missense mutations over
expectation in affected individuals from the DDD cohort within 605 associated
genes that were significant for de novo protein truncating
variation (p<0.05). b, Distributions of PrimateAI prediction
scores for de novo missense variants occurring in DDD patients
vs unaffected siblings within the 605 associated genes, with corresponding
Wilcoxon rank-sum p-value. c, Comparison of various classifiers at
separating de novo missense variants in cases vs controls
within the 605 genes. The y-axis shows the p-values of the Wilcoxon rank-sum
test for each classifier. d, Comparison of various classifiers,
shown on a Receiver Operating Characteristic (ROC) curve, with area under the
curve (AUC) indicated for each classifier. e, Classification
accuracy and AUC for each classifier. The classification accuracy shown is the
average of the true positive and true negative rates, using the threshold
where the classifier would predict the same number of pathogenic and benign
variants as expected based on the enrichment in Fig. 4a. To take into account
that 33% of the DDD de novo missense variants represent
background, the maximum achievable AUC for a perfect classifier is indicated
with a dotted line.
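The AUC ceiling mentioned in the caption follows from the background fraction. In the sketch below, AUC is computed as the probability that a randomly chosen case variant outscores a randomly chosen control variant, with ties counted as one half; the win count is the Mann-Whitney U statistic that underlies the Wilcoxon rank-sum test. The ceiling formula assumes background variants in cases are indistinguishable from control variants, and the toy scores are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties count 0.5)."""
    wins = 0.0  # accumulates the Mann-Whitney U statistic for the cases
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def max_auc(background_fraction):
    """AUC of a perfect classifier when some case variants are benign background.

    Pathogenic case variants always outrank controls (contributing 1), while
    background case variants tie with controls on average (contributing 0.5).
    """
    return (1 - background_fraction) + 0.5 * background_fraction


# Toy data: a perfect classifier scores pathogenic variants 1 and benign variants 0.
# One third of the case variants are background.
toy_cases = [1.0, 1.0, 0.0]
toy_controls = [0.0, 0.0, 0.0]

print(f"toy AUC {auc(toy_cases, toy_controls):.3f}")
print(f"ceiling with 33% background {max_auc(0.33):.3f}")
```

With 33% background the ceiling is 0.835, so AUC values in this setting should be read against that bound, not against 1.0.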
Novel candidate gene discovery
Applying a threshold of ≥ 0.803 to stratify pathogenic missense
mutations increases the enrichment of de novo missense
mutations in DDD patients from 1.5-fold to 2.2-fold, close to protein-truncating
mutations (2.5-fold), while relinquishing less than one third of the total
number of variants enriched above expectation. This substantially improves
statistical power, enabling discovery of 14 additional candidate genes in
intellectual disability, which had previously not reached the genome-wide
significance threshold in the original DDD study (Table 1). Additional clinical validation will be
necessary to confirm these candidates and understand the spectrum of their
genotype-phenotype relationships.
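The gain in statistical power can be illustrated with a one-sided Poisson tail probability, a common choice for testing de novo enrichment against a mutation-rate expectation. The use of a Poisson test here is an assumption, since this section does not state the test. The observed and expected counts are hypothetical, chosen only to mirror the 1.5-fold and roughly 2.2-fold enrichments described above.

```python
import math


def poisson_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    cdf_below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1 - cdf_below


# Hypothetical counts for one set of genes.
# All missense de novo mutations: 30 observed vs 20 expected (1.5-fold).
p_all = poisson_tail(observed=30, expected=20.0)

# Missense mutations passing the score threshold: 13 observed vs 6 expected
# (about 2.2-fold), retaining 7 of the 10 excess variants.
p_filtered = poisson_tail(observed=13, expected=6.0)

print(f"all missense: P = {p_all:.3g}")
print(f"score-filtered missense: P = {p_filtered:.3g}")
```

In this toy example the filtered set gives the smaller tail probability despite containing fewer variants, because most of the removed variants belong to the expected background and not to the excess.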
Table 1
Additional genes achieving genome-wide significance in intellectual
disability when considering only missense de novo mutations
(DNMs) with PrimateAI scores ≥ 0.803
Counts of protein truncating and missense DNMs are provided. P-values for gene
enrichment are shown when the statistical test was run only with missense
mutations with PrimateAI score ≥ 0.803, and when it was repeated for all
missense mutations.
HGNC symbol | Protein-truncating variants | Missense | P-value | Phenotypic abnormalities observed in multiple individuals
We examined the performance of various classifiers on recent human
expert-curated variants from the ClinVar database, but find that the performance
of classifiers on the ClinVar dataset was not significantly correlated with
either the withheld primate variant dataset or the DDD case vs control dataset
(P = 0.12 and P = 0.34,
respectively) (Supplementary
Fig. 12). We hypothesize that existing classifiers have biases from
human expert curation, and while these human heuristics tend to be in the right
direction, they may not be optimal. One example is the mean difference in
Grantham score between pathogenic and benign variants in ClinVar, which is twice
as large as the difference between de novo variants in DDD
cases versus controls within the 605 disease-associated genes (Table 2). In comparison, human expert curation
appears to underutilize protein structure, especially the importance of the
residue being exposed at the surface where it can be available to interact with
other molecules. We observe that both ClinVar pathogenic mutations and DDD
de novo mutations are associated with predicted
solvent-exposed residues, but that the difference in solvent accessibility
between benign and pathogenic ClinVar variants is only half that seen for DDD
cases versus controls. These findings are suggestive of ascertainment bias in
favor of factors that are more straightforward for a human expert to interpret,
such as Grantham score and conservation. Machine learning classifiers trained on
human curated databases would be expected to reinforce these tendencies.
Table 2
Comparison of the difference in Grantham score, Protein surface-exposure, and
Amino acid sequence conservation between human expert annotated variants in
ClinVar and de novo variants in DDD cases vs controls
Mean scores are shown for missense mutations with non-conflicting annotations in
the ClinVar database, and for de novo variants present in DDD
cases vs controls within 605 disease-associated genes. Protein surface-exposure
reflects the fraction of amino acids predicted as exposed residues by the
solvent accessibility neural network, and sequence conservation shows the
fraction of amino acids with sequence identity in the 100-vertebrate
alignment.
                                                   Grantham score   Protein surface-exposed   Sequence conservation
ClinVar Pathogenic variants                        91.1             .53                       .87
ClinVar Benign variants                            67.4             .41                       .54
Difference in human-expert annotations             +23.7            +.12                      +.33
de novo variants in DDD patients                   84.9             .51                       .90
de novo variants in healthy controls               72.7             .29                       .73
Difference in affected vs unaffected individuals   +12.2            +.22                      +.17
Discussion
Our results suggest that systematic primate population sequencing is an
effective strategy to classify the millions of human variants of uncertain
significance that currently limit clinical genome interpretation. The accuracy of
our deep learning network on both withheld common primate variants and clinical
variants increases with the number of benign variants used to train the network
(Fig. 5a). Moreover, training on variants
from each of the six non-human primate species independently contributes to
increasing the performance of the network (Fig. 5b,
c), whereas training on variants from more distant mammals negatively
impacts the performance of the network. These results support the assertion that
common primate variants are largely benign in human with respect to penetrant
Mendelian disease, while the same cannot be said of variation in more distant
species.
Figure 5
Impact of data used for training on classification accuracy
a, Deep learning networks trained with increasing numbers of primate
and human common variants up to the full dataset (385,236 variants).
Classification performance for each of the networks is benchmarked on accuracy
for the 10,000 withheld primate variants (as in Fig. 3d) and de novo variants in DDD cases vs
controls (as in Fig. 3f).
b–c, Performance of networks trained using datasets
consisting of 83,546 human common variants plus 23,380 variants from a single
primate or mammal species. Results are shown for each network trained with
different sources of common variation, b, benchmarked on 10,000
withheld primate variants, and c, on de novo
missense variants in DDD cases vs controls. d, Expected saturation
of all possible human benign missense positions by identical-by-state common
variants (> 0.1%) in the 504 extant primate species. The y-axis shows
the fraction of human missense variants observed in at least one primate
species, with CpG missense variants indicated in red, and all missense variants
indicated in blue. To simulate the common variants in each primate species, we
sampled from the set of all possible single nucleotide substitutions with
replacement, matching the trinucleotide context distribution observed for common
human variants (> 0.1% allele frequency) in ExAC.
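The sampling-with-replacement idea behind panel d can be sketched in a few lines. This is a simplified illustration, not the simulation used for the figure: it assumes every site is equally likely to be drawn, whereas the caption describes matching the trinucleotide context distribution. The function names and the toy numbers in the usage block are hypothetical.

```python
import random


def expected_coverage(n_sites, variants_per_species, n_species):
    """Expected fraction of sites hit at least once under uniform sampling
    with replacement (simplifying assumption: all sites equally likely)."""
    draws = variants_per_species * n_species
    return 1.0 - (1.0 - 1.0 / n_sites) ** draws


def simulate_coverage(n_sites, variants_per_species, n_species, seed=0):
    """Monte Carlo version of the same quantity, one species at a time."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_species):
        for _ in range(variants_per_species):
            seen.add(rng.randrange(n_sites))
    return len(seen) / n_sites


if __name__ == "__main__":
    # Toy sizes chosen only to keep the example fast.
    print(expected_coverage(1000, 100, 10))
    print(simulate_coverage(1000, 100, 10))
```

Coverage grows quickly at first and then flattens, because later draws increasingly land on sites that have already been observed.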
Although the number of non-human primate genomes examined in this study is
small compared to the number of human genomes and exomes that have been sequenced,
it is important to note that these additional primates contribute a disproportionate
amount of information about common benign variation. Simulations with ExAC show that
discovery of common human variants (>0.1% allele frequency) plateaus
quickly after only a few hundred individuals (Supplementary Fig. 13), and further
healthy population sequencing into the millions mainly contributes additional rare
variants. Unlike common variants, which are known to be largely clinically benign
based on allele frequency, rare variants in healthy populations may cause recessive
genetic diseases or dominant genetic diseases with incomplete penetrance. Because
each primate species carries a different pool of common variants, sequencing several
dozen members of each species is an effective strategy to systematically catalog
benign missense variation in the primate lineage. Indeed, the 134 individuals from
six non-human primate species examined in this study contribute nearly four times as
many common missense variants as the 123,136 humans from the ExAC study (Supplementary Table 5).
Primate population sequencing studies involving hundreds of individuals may be
practical even with the relatively small numbers of unrelated individuals residing
in wildlife sanctuaries and zoos, thus minimizing the disturbance to wild
populations, which is important from the standpoint of conservation and ethical
treatment of non-human primates.
Present-day human populations carry much lower genetic diversity than most
non-human primate species[68], with
roughly half the number of single nucleotide variants per individual as chimpanzee,
gorilla, and gibbon, and 1/3 as many variants per individual as orangutan[18]. Although genetic diversity levels
for the majority of non-human primate species are not known, the large number of
extant non-human primate species allows us to extrapolate that the majority of
possible benign human missense positions are likely to be covered by a common
variant in at least one primate species, enabling pathogenic variants to be
systematically identified by process of elimination (Fig. 5d). Even with only a subset of these species sequenced, increasing
the training data size will enable more accurate prediction of missense consequence
with machine learning. Finally, while our findings in this paper focus on missense
variation, this strategy may also be applicable for inferring the consequences of
noncoding variation, particularly in conserved regulatory regions where there is
sufficient alignment between human and primate genomes to unambiguously determine
whether a variant is identical-by-state.
Of the 504 known non-human primate species, roughly 60% face
extinction due to poaching and widespread habitat loss[69]. The reduction in population size and
potential extinction of these species represents an irreplaceable loss in genetic
diversity, motivating urgency for a worldwide conservation effort that would benefit
both these unique and irreplaceable species and our own.
Online Methods
Data generation and alignment
Coordinates in the paper refer to human genome build UCSC hg19/GRCh37,
including the coordinates for variants in other species mapped to hg19 using
multiple sequence alignments. Canonical transcripts for protein-coding DNA
sequence and multiple sequence alignments of 99 vertebrate genomes and branch
length were downloaded from the UCSC genome browser[70,71] (see URLs).
We obtained human exome polymorphism data from the Exome Aggregation
Consortium (ExAC)/genome Aggregation Database (gnomAD exomes) v2.0[6] (see URLs). We obtained primate
variation data from the great ape genome sequencing project[18], which consisted of whole genome
sequencing data and genotypes for 24 chimpanzees, 13 bonobos, 27 gorillas and 10
orangutans. We also included variation from 35 chimpanzees from a separate study
of chimpanzees and bonobos[21],
but due to differences in variant calling methodology, we excluded these from
the population analysis, and used them only for training the deep learning
model. In addition, 16 rhesus individuals and 9 marmoset individuals were used
to assay variation in the original genome projects for these species, but
individual-level information was not available[23,24]. We obtained variation data for rhesus, marmoset, pig, cow,
goat, mouse, chicken, and zebrafish from dbSNP[25]. dbSNP also included additional
orangutan variants, which we only used for training the deep learning model,
since individual genotype information was not available for the population
analysis. To avoid effects due to balancing selection, we also excluded variants
from within the extended MHC region (chr6: 28,477,797–33,448,354) for
the population analysis.
We used the multiple species alignment of 99 vertebrates to ensure
orthologous 1:1 mapping to human protein-coding regions and prevent mapping to
pseudogenes. We accepted variants as identical-by-state if they occurred in
either reference/alternative orientation. To ensure that the variant had the
same predicted protein-coding consequence in both human and the other species,
we required that the other two nucleotides in the codon are identical between
the species, for both missense and synonymous variants. Polymorphisms from each
species included in the analysis are listed in Supplementary Data File 1 and
detailed metrics are shown in Supplementary Table 1.
For each of the four allele frequency categories (Fig. 1a), we used intronic sequence to estimate the
expected number of synonymous and missense variants in each of 96 possible
tri-nucleotide contexts and correct for mutational rate (Supplementary Fig. 1 and Supplementary Tables
7,8). We also separately analyzed identical-by-state CpG and non-CpG
variants, and verified that the missense: synonymous ratio was flat across the
allele frequency spectrum for both classes, indicating that our analysis holds
for both CpG and non-CpG variants, despite the large difference in their
mutation rate (Supplementary
Fig. 2 and Supplementary Note).
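The identical-by-state criteria described in this section (either reference/alternative orientation accepted, and the other two nucleotides of the codon required to match between species) can be expressed as a small check. This is a sketch under assumptions: each variant is represented as a hypothetical pair of aligned codons `(ref_codon, alt_codon)`, and the orthologous 1:1 alignment is taken as already done.

```python
def variant_position(ref_codon, alt_codon):
    """Index (0-2) of the single differing base, or None if the codons do
    not differ at exactly one position."""
    diffs = [i for i in range(3) if ref_codon[i] != alt_codon[i]]
    return diffs[0] if len(diffs) == 1 else None


def identical_by_state(human, other):
    """human and other are (ref_codon, alt_codon) pairs at aligned codons."""
    h_ref, h_alt = human
    o_ref, o_alt = other
    pos = variant_position(h_ref, h_alt)
    if pos is None or pos != variant_position(o_ref, o_alt):
        return False
    # The two flanking bases within the codon must match between species.
    flank = [i for i in range(3) if i != pos]
    same_flank = all(h_ref[i] == o_ref[i] for i in flank)
    # Same pair of alleles, regardless of which one is called the reference.
    same_alleles = {h_ref[pos], h_alt[pos]} == {o_ref[pos], o_alt[pos]}
    return same_flank and same_alleles


if __name__ == "__main__":
    print(identical_by_state(("GCT", "GTT"), ("GTT", "GCT")))
```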
Depletion of human missense variants that are identical-by-state with
polymorphisms in other species
To evaluate whether variants present in other species would be tolerated
at common allele frequencies (> 0.1%) in human, we identified human
variants that were identical-by-state with variation in the other species. For
each of the variants, we assigned them to one of the four categories based on
their allele frequencies in human populations (singleton, more than
singleton~0.01%, 0.01%~0.1%, > 0.1%), and
estimated the decrease in missense: synonymous ratios (MSR) between the rare
(< 0.1%) and common (> 0.1%) variants. The depletion of
identical-by-state missense variants at common human allele frequencies (>
0.1%) indicates the fraction of variants from the other species that are
sufficiently deleterious that they would be filtered out by natural selection at
common allele frequencies in human.
The missense: synonymous ratios and the percentages of depletion were
computed per species and are shown in Fig.
2b and Supplementary Table 2. In addition, for chimpanzee common variants
(Fig. 1b), chimpanzee singleton
variants (Fig. 1c), and mammal variants
(Fig. 2a), we performed the
χ² test of homogeneity on the 2×2 contingency
table to test if the differences in missense: synonymous ratios between rare and
common variants were significant.
Because sequencing was only performed on limited numbers of individuals
from the great ape genome project, we used the human allele frequency spectrum
from ExAC to estimate the fraction of sampled variants which were rare (<
0.1%) or common (> 0.1%) in the general chimpanzee
population. We sampled a cohort of 24 humans based on the ExAC allele
frequencies, and identified missense variants that were observed either once, or
more than once, in this cohort. Variants that were observed more than once had a
99.8% chance of being common (> 0.1%) in the general
population, whereas variants that were observed only once in the cohort had a
69% chance of being common in the general population.
To verify that the observed depletion for missense variants in more
distant mammals was not due to a confounding effect of genes that are better
conserved, and hence more accurately aligned, we repeated the above analysis,
restricting only to genes with > 50% average nucleotide identity in
the multiple sequence alignment of 11 primates and 50 mammals compared with
human (see Supplementary Table
3). This removed ~7% of human protein-coding genes from the
analysis, without substantially affecting the results. Additionally, to ensure
that our results were not affected by issues with variant calling, or
domestication artifacts (since most of the species selected from dbSNP were
domesticated), we repeated the analyses using fixed substitutions from pairs of
closely-related species in lieu of intra-species polymorphisms (Fig. 2d, Supplementary Table 4, Supplementary Note, and
Supplementary Data File
2).
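The depletion statistic and the 2×2 test of homogeneity described in this section can be sketched as follows. The assumptions are stated plainly: depletion is taken here as the relative drop in missense:synonymous ratio from rare to common variants, the test is the plain Pearson χ² without continuity correction, and the counts in the usage block are invented for illustration rather than taken from the study.

```python
import math


def msr(missense, synonymous):
    """Missense:synonymous ratio."""
    return missense / synonymous


def depletion(rare, common):
    """Relative decrease in MSR from rare to common variants.

    rare and common are (missense_count, synonymous_count) tuples.
    """
    return 1.0 - msr(*common) / msr(*rare)


def chi2_homogeneity_2x2(rare, common):
    """Pearson chi-square statistic and p-value (1 degree of freedom)."""
    a, b = rare
    c, d = common
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of chi-square with 1 df: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    rare, common = (2200, 1000), (1100, 1000)  # hypothetical counts
    print(depletion(rare, common))
    print(chi2_homogeneity_2x2(rare, common))
```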
ClinVar analysis of polymorphism data for human, primates, mammals, and other
vertebrates
To examine the clinical impact of variants that are identical-by-state
with other species, we downloaded the ClinVar database (see URLs)[27], excluding variants that
had conflicting annotations of pathogenicity, or were only labeled as variants
of uncertain significance. Following the filtering steps shown in Supplementary Table 9,
there are a total of 24,853 missense variants in the pathogenic category and
17,775 missense variants in the benign category.
We counted the number of pathogenic and benign ClinVar variants that
were identical-by-state with variation in humans, non-human primates, mammals
and other vertebrates. For human, we simulated a cohort of 30 humans, sampled
from ExAC allele frequencies. The numbers of benign and pathogenic variants for
each species are shown in Supplementary Table 10.
Generation of benign and unlabeled variants for model training
We constructed a benign training dataset of largely common benign
missense variants from human and non-human primates for machine learning. The
dataset consisted of common human variants (> 0.1% allele frequency;
83,546 variants), and variants from chimpanzee, bonobo, gorilla, orangutan,
rhesus, and marmoset (301,690 unique primate variants). The number of benign
training variants contributed by each source is shown in Supplementary Table 5.
We trained the deep learning network to discriminate between a set of
labeled benign variants and an unlabeled set of variants that were matched to
control for trinucleotide context, sequencing coverage, and alignability between
the species and human. To obtain an unlabeled training dataset, all possible
missense variants were generated from each base position of canonical coding
regions by substituting the nucleotide at the position to the other three
nucleotides. We excluded variants that were observed in the 123,136 exomes from
ExAC, and variants in start or stop codons. In total, 68,258,623 unlabeled
missense variants were generated. This was filtered to correct for regions of
poor sequencing coverage, and regions where there was not a one-to-one alignment
between human and primate genomes when selecting matched unlabeled variants for
the primate variants. We obtained a consensus prediction by training eight
models that use the same set of labeled benign variants and eight randomly
sampled sets of unlabeled variants and taking the average of their predictions.
We also set aside two randomly sampled sets of 10,000 primate variants for
validation and testing, which we withheld from training (Supplementary Data File 3). For
each of these sets, we sampled 10,000 unlabeled variants that were matched by
trinucleotide context, which we used to normalize the threshold of each
classifier when comparing between different classification algorithms (Supplementary Data File
4).
We assessed the classification accuracy of two versions of the deep
learning network, one trained with common human variants only, and one trained
with the full benign labeled dataset including both common human variants and
primate variants.
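The enumeration of all possible missense variants described in this section (substituting each coding base with the other three nucleotides, skipping start and stop codons) can be sketched as below. This is an illustrative version only: it assumes the input is a single in-frame coding sequence beginning with a start codon and ending with a stop codon, uses the standard genetic code, and does not apply the ExAC, coverage, or alignability filters. The function name `possible_missense` is hypothetical.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def possible_missense(cds):
    """Return (position, ref_base, alt_base, ref_aa, alt_aa) tuples for all
    single-nucleotide substitutions that change the amino acid without
    creating a stop codon. Positions are 0-based within the sequence."""
    variants = []
    n_codons = len(cds) // 3
    for codon_idx in range(1, n_codons - 1):  # skip start and stop codons
        start = 3 * codon_idx
        codon = cds[start:start + 3]
        ref_aa = CODON_TABLE[codon]
        for offset in range(3):
            for alt in "ACGT":
                if alt == codon[offset]:
                    continue
                alt_codon = codon[:offset] + alt + codon[offset + 1:]
                alt_aa = CODON_TABLE[alt_codon]
                if alt_aa != ref_aa and alt_aa != "*":
                    variants.append((start + offset, codon[offset], alt, ref_aa, alt_aa))
    return variants


if __name__ == "__main__":
    for variant in possible_missense("ATGGCTTAA"):
        print(variant)
```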
Architecture of the deep learning network
For each variant, the pathogenicity prediction network takes as input
the 51-length amino acid sequence centered at the variant of interest, and the
outputs of the secondary structure and solvent accessibility networks (Fig. 3a and Supplementary Fig. 4). To represent
the variant, the network receives both the 51-length reference amino acid
sequence and the alternative 51-length amino acid sequence with the missense
variant substituted in at the central position. Three 51-length position
frequency matrices (PFMs) are generated from multiple sequence alignments of 99
vertebrates, including one for 11 primates, one for 50 mammals excluding
primates, and one for 38 vertebrates excluding primates and mammals.
The secondary structure deep learning network predicts 3-state secondary
structure at each amino acid position: alpha helix (H), beta sheet (B), and
coils (C) (Supplementary Table
11). The solvent accessibility network predicts 3-state solvent
accessibility at each amino acid position: buried (B), intermediate (I), and
exposed (E) (Supplementary
Table 12). Both networks only take the flanking amino acid sequence
as their inputs, and were trained using labels from known non-redundant crystal
structures in the Protein DataBank (Supplementary Note and Supplementary Table 13).
For the input to the pre-trained 3-state secondary structure and 3-state solvent
accessibility networks, we used a single PFM matrix generated from the multiple
sequence alignments for all 99 vertebrates, also with length 51 and depth 20.
After pre-training the networks on known crystal structures from the Protein
DataBank, the final two layers for the secondary structure and solvent models
were removed and the output of the network was directly connected to the input
of the pathogenicity model. The best testing accuracy achieved for the 3-state
secondary structure prediction model is 79.86 % (Supplementary Table 14). There was
no substantial difference when comparing the predictions of the neural network
when using DSSP-annotated[72,73] structure labels for the
approximately 4,000 human proteins that had crystal structures, versus using
predicted structure labels only (Supplementary Table 15).
Both our deep learning network for pathogenicity prediction (PrimateAI)
and deep learning networks for predicting secondary structure and solvent
accessibility adopted the architecture of residual blocks[49,74]. The detailed architecture for PrimateAI is described in
Supplementary Fig.
4 and Supplementary
Table 16. The detailed architecture for the networks for predicting
secondary structure and solvent accessibility is described in Supplementary Fig. 5 and Supplementary Tables 11 and
12.
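The reference and alternative sequence inputs described in this section (a 51-residue window centered on the variant, with the substitution applied at the central position) can be sketched as follows. Several details are assumptions for illustration: the one-hot encoding over 20 amino acids, the `X` padding near protein termini, and the function names are not taken from the paper, and the conservation matrices and structure-network outputs are omitted.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
WINDOW = 51
HALF = WINDOW // 2


def centered_window(protein, pos, pad="X"):
    """51-residue window centered at 0-based position pos, padded with a
    placeholder character where the window runs past either terminus."""
    padded = pad * HALF + protein + pad * HALF
    return padded[pos:pos + WINDOW]


def one_hot(seq):
    """Encode a sequence as a list of 20-element rows; padding rows are all zero."""
    return [[1 if aa == ref else 0 for ref in AMINO_ACIDS] for aa in seq]


def encode_variant(protein, pos, alt_aa):
    """Return one-hot matrices for the reference and alternative windows."""
    ref_win = centered_window(protein, pos)
    alt_win = ref_win[:HALF] + alt_aa + ref_win[HALF + 1:]
    return one_hot(ref_win), one_hot(alt_win)


if __name__ == "__main__":
    ref, alt = encode_variant("MKT" * 30, 40, "W")
    print(len(ref), len(ref[0]))
```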
Benchmarking of classifier performance on a withheld test set of 10,000
primate variants
We used the 10,000 withheld primate variants in the test dataset to
benchmark the deep learning network as well as the other 20 previously published
classifiers[32-39,41,42,44,46,47,75-79], for which we obtained prediction scores from
dbNSFP[80] (see URLs).
The performance for each of the classifiers on the 10,000 withheld primate
variant test set is provided in Supplementary Fig. 9a. Because the
different classifiers had widely varying score distributions, we used 10,000
randomly selected unlabeled variants that were matched to the test set by
trinucleotide context to identify the 50th percentile threshold for
each classifier. We benchmarked each classifier on the fraction of variants in
the 10,000 withheld primate variant test set that were classified as benign at
the 50th percentile threshold for that classifier, to ensure fair
comparison between the methods.
For each of the classifiers, the fraction of withheld primate test
variants predicted as benign using the 50th percentile threshold is
shown (Supplementary Fig.
9a and Supplementary Table 17). We also show that the performance of
PrimateAI is robust with respect to the number of aligned species at the variant
position, and generally performs well as long as sufficient conservation
information from mammals is available, which is true for most protein-coding
sequence (Supplementary Fig.
14).
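The threshold normalization used for this benchmark can be illustrated with a short sketch. It assumes that higher scores mean more pathogenic (classifiers with the opposite convention would need their scores negated first) and that variants scoring below the median of the matched unlabeled set are called benign; the function name and the toy scores are hypothetical.

```python
import statistics


def benign_fraction_at_median(unlabeled_scores, primate_scores):
    """Fraction of withheld primate variants scored below the 50th percentile
    of the matched unlabeled variants for the same classifier."""
    threshold = statistics.median(unlabeled_scores)
    n_benign = sum(1 for score in primate_scores if score < threshold)
    return n_benign / len(primate_scores)


if __name__ == "__main__":
    unlabeled = [0.1, 0.3, 0.5, 0.7, 0.9]  # toy scores
    primate = [0.05, 0.2, 0.4, 0.6]        # toy scores
    print(benign_fraction_at_median(unlabeled, primate))
```

Because each classifier is thresholded at its own median on the same unlabeled set, the resulting fractions are comparable across methods even when their raw score ranges differ.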
Analysis of de novo variants from the DDD study
We obtained published de novo variants from the
Deciphering Developmental Disorders (DDD) study[64,65], and de novo variants from the healthy
sibling controls in the Simons Simplex Collection (SSC) autism study[66]. The DDD study provides a
confidence level for de novo variants, and we excluded variants
from the DDD dataset with a threshold of < 0.1 as potential false positives
due to variant calling errors. In total, we had 3,512 missense de
novo variants from DDD affected individuals and 1,208 missense
de novo variants from healthy controls. The canonical
transcript annotations used by UCSC for the 99-vertebrate multiple-sequence
alignment differed slightly from the transcript annotations used by DDD,
resulting in a small difference in the total counts of missense variants. We
evaluated the classification methods on their ability to discriminate between
de novo missense variants in the DDD affected individuals,
versus de novo missense variants in unaffected sibling controls
from the autism studies. For each classifier, we reported the p-value from the
Wilcoxon rank-sum test of the difference between the prediction scores for the
two distributions (Supplementary Fig. 9b, c and Supplementary Table 17).
To measure the accuracy of various classifiers at distinguishing benign
and pathogenic variation within the same disease gene, we repeated the analysis
on only a set of 605 genes that were enriched for de novo
protein-truncating variation in the DDD cohort (p<0.05, Poisson exact test)
(Supplementary Table
18). Within these 605 genes, we estimated that 2/3 of the de
novo variants in the DDD dataset were pathogenic and 1/3 were
benign, based on the 3:1 enrichment of de novo missense
mutations over expectation. We assumed minimal incomplete penetrance and that
the de novo missense mutations in the healthy controls were
benign. To estimate the accuracy of each classifier on the de
novo mutations in the DDD and healthy control datasets, we
identified the threshold that produced the same number of benign or pathogenic
predictions as the empirical proportions observed in these datasets, and used
this threshold as a binary cutoff to estimate the accuracy of each classifier at
distinguishing de novo mutations in cases versus controls.
To construct a receiver operating characteristic curve, we treated
pathogenic classification of de novo DDD variants as true
positive calls, and treated pathogenic classification of de novo variants
in healthy controls as false positive calls. Because the DDD
dataset contains 1/3 benign de novo variants, the area under
the curve (AUC) for a theoretically perfect classifier is less than
one[81]. Hence, a
classifier with perfect separation of benign and pathogenic variants would
classify 67% of de novo variants in the DDD patients as
true positives, 33% of de novo variants in the DDD patients as false negatives, and 100% of de novo
variants in controls as true negatives, yielding a maximum possible AUC of 0.837
(Supplementary Fig.
10, Supplementary
Table 19, and Supplementary Note).
We tested enrichment of de novo mutations in genes by
comparing the observed number of de novo mutations to the
number expected under a null mutation model[14]. We repeated the enrichment analysis performed in the
DDD study, and report genes that are newly genome-wide significant when only
counting de novo missense mutations with a PrimateAI score of
> 0.803. We adjusted the genome-wide expectation for de novo
damaging missense variation by the fraction of missense variants that meet the
PrimateAI threshold of > 0.803 (roughly 1/5th of all possible missense
mutations genome-wide). As per the DDD study, each gene required four tests, one
testing protein truncating enrichment, and one testing enrichment of
protein-altering de novo mutations, both tested for just the
DDD cohort[65], and for a larger
meta-analysis of neurodevelopmental trio sequencing cohorts[62,63,66,82-89]. The enrichment of protein-altering de
novo mutations was combined by Fisher’s method with a test
of the clustering of missense de novo mutations within the
coding sequence (Supplementary
Tables 20, 21). The p-value for each gene was taken from the minimum
of the four tests, and genome-wide significance was determined as
P < 6.757 × 10⁻⁷
(α=0.05, 18,500 genes with four tests).
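The arithmetic behind the significance threshold, and the combination of two p-values by Fisher's method, can be checked with a short sketch. The closed form used for Fisher's method is the standard result for exactly two p-values (a χ² statistic with 4 degrees of freedom); the function names and the example p-values are hypothetical, and the enrichment and clustering tests themselves are not implemented here.

```python
import math


def bonferroni_threshold(alpha, n_genes, tests_per_gene):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / (n_genes * tests_per_gene)


def fisher_combine_two(p1, p2):
    """Fisher's method for two independent p-values.

    X = -2 (ln p1 + ln p2) follows chi-square with 4 df under the null,
    whose survival function is exp(-X / 2) * (1 + X / 2).
    """
    product = p1 * p2
    return product * (1.0 - math.log(product))


def gene_p_value(test_p_values):
    """Minimum p-value across the tests run for one gene."""
    return min(test_p_values)


if __name__ == "__main__":
    print(bonferroni_threshold(0.05, 18500, 4))
    print(fisher_combine_two(0.05, 0.05))
```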
ClinVar classification accuracy
Since most of the existing classifiers are either trained directly or
indirectly on ClinVar content, such as using prediction scores from classifiers
that are trained on ClinVar, we limited analysis of the ClinVar dataset to only
use ClinVar variants that were added since 2017. There was substantial overlap
among the recent ClinVar variants and other databases, and hence we further
filtered to remove variants found at common allele frequencies (> 0.1%) in
ExAC, or present in HGMD, LSDB, or Uniprot[90-92].
After excluding variants annotated only as uncertain significance and those with
conflicting annotations, we were left with 177 missense variants with benign
annotation and 969 missense variants with pathogenic annotation. We scored these
ClinVar variants using both the deep learning network and the other
classification methods. For each classifier, we identified the threshold that
produced the same number of benign or pathogenic predictions as the empirical
proportions observed in these datasets, and used this threshold as a binary
cutoff to estimate the accuracy of each classifier (Supplementary Fig. 12).
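The thresholding rule described here, choosing the cutoff so that the number of pathogenic predictions matches the number of pathogenic labels, can be sketched as follows. It assumes higher scores mean more pathogenic; ties at the cutoff are broken by sort order, which is a simplification, and the function name and toy scores are hypothetical.

```python
def accuracy_at_empirical_proportion(pathogenic_scores, benign_scores):
    """Accuracy when the top-scoring N variants are called pathogenic,
    where N is the number of variants labeled pathogenic."""
    labeled = [(s, 1) for s in pathogenic_scores] + [(s, 0) for s in benign_scores]
    labeled.sort(key=lambda item: item[0], reverse=True)
    n_pathogenic = len(pathogenic_scores)
    true_pos = sum(label for _, label in labeled[:n_pathogenic])
    true_neg = sum(1 - label for _, label in labeled[n_pathogenic:])
    return (true_pos + true_neg) / len(labeled)


if __name__ == "__main__":
    # Toy scores for illustration only.
    print(accuracy_at_empirical_proportion([0.9, 0.8, 0.4], [0.5, 0.1]))
```

Fixing the predicted proportions to the observed ones removes the need to pick a classifier-specific score cutoff, so methods with different score scales are compared on the same footing.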
Impact of increasing training data size and using different sources of
training data
To evaluate the impact of training data size on the performance of the
deep learning network, we randomly sampled a subset of variants from the labeled
benign training set of 385,236 primate and common human variants, and kept the
underlying deep learning network architecture the same. To show that variants
from each individual primate species contribute to classification accuracy
whereas variants from each individual mammal species lower classification
accuracy, we trained deep learning networks using a training dataset consisting
of 83,546 human variants plus a constant number of randomly selected variants
for each species, again keeping the underlying network architecture the same.
The constant number of variants we added to the training set (23,380) is the
total number of variants available in the species with the lowest number of
missense variants, i.e. bonobo. We repeated the training procedures five times
to get the median performance of each classifier.
Saturation of all possible human missense mutations with increasing number of
primate populations sequenced
We investigated the expected saturation of all ~70M possible human
missense mutations by common variants present in the 504 extant primate species,
by simulating variants based on the trinucleotide context of human common
missense variants (> 0.1% allele frequency) observed in ExAC. For
each primate species, we simulated 4 times the number of common missense
variants observed in human (~83,500 missense variants with allele frequency >
0.1%), because humans have roughly half the number of variants per
individual as other primate species[13], and about 50% of human missense variants have
been filtered out by purifying selection at > 0.1% allele frequency
(Fig. 1a and Supplementary Note).
To model the fraction of human common missense variants (>
0.1% allele frequency) discovered with increasing size of human cohorts
surveyed (Supplementary Fig.
13), we sampled genotypes according to ExAC allele frequencies and
report the fraction of common variants that were observed at least once in these
simulated cohorts.
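The plateau in common-variant discovery with cohort size can be approximated analytically rather than by sampling genotypes. This sketch assumes each of the 2n chromosomes in a cohort of n diploid individuals carries the allele independently with probability equal to its allele frequency; it is an expectation under that assumption, not the simulation procedure used in the study, and the names and frequencies shown are hypothetical.

```python
def discovery_probability(allele_freq, n_individuals):
    """Probability that an allele is seen at least once among 2n chromosomes."""
    return 1.0 - (1.0 - allele_freq) ** (2 * n_individuals)


def expected_common_fraction_discovered(allele_freqs, n_individuals, common_cutoff=0.001):
    """Expected fraction of common variants (frequency above the cutoff)
    observed at least once in a cohort of the given size."""
    common = [f for f in allele_freqs if f > common_cutoff]
    return sum(discovery_probability(f, n_individuals) for f in common) / len(common)


if __name__ == "__main__":
    freqs = [0.3, 0.05, 0.01, 0.002, 0.0001]  # toy allele frequencies
    for n in (10, 100, 1000):
        print(n, round(expected_common_fraction_discovered(freqs, n), 3))
```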
URLs
Data downloaded from UCSC genome browser: http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/alignments/knownCanonical.exonNuc.fa.gz,
http://hgdownload.soe.ucsc.edu/goldenPath/hg19/multiz100way/hg19.100way.commonNames.nh;
ExAC/gnomAD data: http://gnomad.broadinstitute.org/; ClinVar database released on
02-Nov-2017: ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/clinvar_20171029.vcf.gz;
dbNSFP: https://sites.google.com/site/jpopgen/dbNSFP; PrimateAI scores
of 70M variants: https://basespace.illumina.com/s/cPgCSmecvhb4; Life Sciences
Reporting Summary: https://www.nature.com/authors/policies/ReportingSummary.pdf
Data and code availability
Prediction scores for all 70M human missense variants on the hg19/GRCh37
genome build with the human+primate deep learning network (PrimateAI)
are publicly hosted (see URLs). For practical application of PrimateAI scores,
we recommend a threshold of > 0.8 for likely pathogenic classification, <
0.6 for likely benign, and 0.6–0.8 as intermediate, based on the
enrichment of de novo variants in cases compared to controls
(Fig. 3d).
To reduce problems with circularity that have become a concern for the
field, the authors explicitly request that the prediction scores from the method
not be incorporated as a component of other classifiers, and instead ask that
interested parties employ the provided source code and data to directly train
and improve upon their own deep learning models. Similarly, the authors request
that the 10,000 withheld primate variants (Supplementary Data File 3) not be
used for training future classifiers, in order to provide the community with an
independent truth dataset for benchmarking.
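The score thresholds recommended in this section can be applied with a small helper. This is a minimal sketch: the function name and label strings are hypothetical, and scores of exactly 0.6 or 0.8 are assigned to the intermediate category, following the strict inequalities given in the text.

```python
def classify_primateai(score):
    """Map a PrimateAI score to the three categories recommended in the text."""
    if score > 0.8:
        return "likely pathogenic"
    if score < 0.6:
        return "likely benign"
    return "intermediate"


if __name__ == "__main__":
    for value in (0.95, 0.7, 0.3):
        print(value, classify_primateai(value))
```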