Structural variation analysis of 6,500 whole genome sequences in amyotrophic lateral sclerosis.

Ahmad Al Khleifat1, Alfredo Iacoangeli1,2, Joke J F A van Vugt3, Harry Bowles1, Matthieu Moisse4, Ramona A J Zwamborn3, Rick A A van der Spek3, Aleksey Shatunov1, Johnathan Cooper-Knock5, Simon Topp1, Ross Byrne6, Cinzia Gellera7, Victoria López7, Ashley R Jones1, Sarah Opie-Martin1, Atay Vural8, Yolanda Campos9, Wouter van Rheenen3, Brendan Kenna3, Kristel R Van Eijk3, Kevin Kenna3, Markus Weber10, Bradley Smith1, Isabella Fogh1, Vincenzo Silani7, Karen E Morrison11, Richard Dobson2,12, Michael A van Es3, Russell L McLaughlin6, Patrick Vourc'h13, Adriano Chio14,15, Philippe Corcia13,16, Mamede de Carvalho17, Marc Gotkine18, Monica P Panades19, Jesus S Mora20, Pamela J Shaw5, John E Landers21, Jonathan D Glass22, Christopher E Shaw1,23, Nazli Basak8, Orla Hardiman24,25, Wim Robberecht4,26, Philip Van Damme4,26, Leonard H van den Berg3, Jan H Veldink3, Ammar Al-Chalabi27,28.   

Abstract

There is a strong genetic contribution to amyotrophic lateral sclerosis (ALS) risk, with heritability estimates of up to 60%. Both Mendelian and small-effect variants have been identified, but, in common with other conditions, such variants explain only a small part of the heritability. Genomic structural variation might account for some of this otherwise unexplained heritability. We therefore investigated the association of structural variation in a set of 25 ALS genes with ALS risk and phenotype. As expected, the repeat expansion in the C9orf72 gene was identified as associated with ALS. Two other ALS-associated structural variants were identified: inversion in the VCP gene and insertion in the ERBB4 gene. All three variants were associated both with increased risk of ALS and with specific phenotypic patterns of disease expression. More than 70% of people with respiratory onset ALS harboured an ERBB4 insertion compared with 25% of the general population, suggesting respiratory onset ALS may be a distinct genetic subtype.
© 2022. The Author(s).

Year:  2022        PMID: 35091648      PMCID: PMC8799638          DOI: 10.1038/s41525-021-00267-9

Source DB:  PubMed          Journal:  NPJ Genom Med        ISSN: 2056-7944            Impact factor:   8.617


Introduction

Amyotrophic lateral sclerosis (ALS) is a neurodegenerative disease predominantly of motor neurons, characterized by progressive weakness of the limbs, trunk, diaphragm, and bulbar musculature, with death occurring from respiratory failure, typically within 3 years of onset. Despite the poor prognosis, there is considerable variation in the survival rate, and up to 10% of people with ALS live more than 8 years from first symptoms[1]. In about 25% of people, the first symptom is difficulty with speaking or swallowing, and in nearly all the rest, it is limb weakness. However, about 1% to 2% of people experience onset with diaphragmatic weakness and early respiratory failure[2,3]. No gene variant has been found to predispose to a specific site of onset without also predisposing to greater risk of ALS. For example, pathological hexanucleotide expansion in the C9orf72 gene, a cause of ALS, increases the risk of bulbar onset[4]. The possibility that respiratory onset ALS represents a distinct subgroup is supported by the observation that despite early diaphragm involvement, disease progression is in some cases surprisingly slow[5]. Genome-wide association studies have identified ALS risk variants that are relatively common in the population, but such alleles tend to have small effect sizes and can explain only a small proportion of heritability[6,7]. The remaining heritability is presumed to lie in other genomic variation, including rare variants, repeat sequences and structural variants, not easily tagged by SNPs. Structural variants comprise various forms of genomic imbalance such as insertions, deletions, inversions, duplications and inter-chromosomal translocations[8]. Such variants have been associated with various neurological and psychiatric diseases including Charcot-Marie-Tooth neuropathy[9], schizophrenia[10] and autism[11,12]. 
Attempts to understand the relationship of structural variation with ALS have been limited by sequencing technology, computational burden, and small sample sizes[13,14]. Measuring the intensity of signals derived from a genotyping array is the most widely used method for detecting copy number variants[15,16], but advances in sequencing technology and increased computing power have now made it feasible to study structural variation by more direct means[17]. Here, we report an analysis of structural variation in known ALS genes, and of genotype-phenotype correlations, in 6,580 whole genome sequences from the Project MinE whole genome sequencing and deep phenotype dataset[18].

Results

Sample characteristics

There were 6,580 whole genome sequences, reducing to 6,195 samples (4,315 from people with ALS and 1,880 controls) after quality control, with minimum ~25× coverage across each sample. Of those with ALS, 4,236 had apparently sporadic ALS and 79 had familial ALS. The male-female ratio was 2:1. Overall, 31 had cognitive impairment, 20 had ALS-frontotemporal dementia (ALS-FTD) and 63 had respiratory onset ALS. There were 4,287 people sequenced using the HiSeqX Illumina platform, and 1,908 sequenced using the HiSeq2000 platform (Table 1).
Table 1

Demographic features of the study population.

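| Cohort | Sample | Case | Control | Female | Male |
|---|---|---|---|---|---|
| Belgium | 548 | 368 | 180 | 209 | 339 |
| Ireland | 403 | 267 | 136 | 161 | 242 |
| Netherlands | 2894 | 1859 | 1035 | 1182 | 1712 |
| Spain | 338 | 233 | 105 | 145 | 193 |
| Turkey | 223 | 148 | 75 | 87 | 136 |
| United Kingdom | 1402 | 1124 | 278 | 603 | 799 |
| United States | 387 | 316 | 71 | 153 | 234 |
| Total | 6195 | 4315 | 1880 | 2540 | 3655 |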

Detailed demographic features of the study population.


Association analyses

In three of the 25 genes, structural variation was associated with ALS: C9orf72 gene hexanucleotide repeat expansion (odds ratio 28.1, 95% CI (10.45, 75.61), p = 2 × 10−16), inversion in the VCP gene (odds ratio 2.33, 95% CI (2.09, 2.61), p = 2 × 10−5) and insertion in the ERBB4 gene (odds ratio 2.55, 95% CI (2.26, 2.88), p = 3 × 10−5) (Table 2, Supplementary appendix Tables 2–6). All passed the multiple testing correction threshold (p = 0.0005). Inspection of the sequences showed that there were no rare missense or loss of function variants in those with ERBB4 insertion. In two people (0.1%) with VCP inversion, such variants were found. Inspection of BAM files showed that structural variation calls in the VCF files had a corresponding appropriate change in the BAM file. Inversions and insertions were not identical between people. The p-value reported for C9orf72 (p = 2 × 10−16) is the minimum p-value that R will report to the console.
Table 2

Structural variation in sporadic ALS.

| Gene | p-value | SV type | Cases (freq) | Controls (freq) | Odds ratio (95% CI) |
|---|---|---|---|---|---|
| C9orf72 | 2 × 10−16 | Expansion | 244 (0.06) | 4 (0.002) | 28.1 (10.45, 75.61) |
| VCP | 2 × 10−5 | Inversion | 2430 (0.56) | 669 (0.36) | 2.33 (2.09, 2.61) |
| ERBB4 | 3 × 10−5 | Insertion | 2001 (0.46) | 476 (0.25) | 2.55 (2.26, 2.88) |

There were three genes in which structural variation was associated with ALS: C9orf72, VCP, and ERBB4. Odds ratio is calculated from the exponential of the beta from the regression model including principal components of ancestry and other confounders. SV structural variation, freq frequency.
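
As a rough cross-check on Table 2, a crude odds ratio with a Woolf (log-scale) 95% confidence interval can be computed directly from the carrier counts. This is only a sketch and not the regression model described in the table note: it ignores the covariates (principal components of ancestry, platform, centre, age and sex), and it assumes that the denominators are the 4,315 cases and 1,880 controls that passed quality control. The function and variable names are illustrative.

```python
from math import exp, log, sqrt


def crude_odds_ratio(case_pos, case_total, ctrl_pos, ctrl_total, z=1.96):
    """Unadjusted odds ratio with a Woolf confidence interval from a 2x2 table."""
    a, b = case_pos, case_total - case_pos   # cases with / without the variant
    c, d = ctrl_pos, ctrl_total - ctrl_pos   # controls with / without the variant
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Carrier counts from Table 2; totals are an assumption (post-QC sample sizes).
N_CASES, N_CONTROLS = 4315, 1880
TABLE2_COUNTS = {
    "C9orf72 expansion": (244, 4),
    "VCP inversion": (2430, 669),
    "ERBB4 insertion": (2001, 476),
}

for name, (cases, controls) in TABLE2_COUNTS.items():
    or_, lo, hi = crude_odds_ratio(cases, N_CASES, controls, N_CONTROLS)
    print(f"{name}: OR {or_:.2f} (95% CI {lo:.2f}, {hi:.2f})")
```

Under these assumptions the crude estimates land close to the values reported in Table 2. An adjusted model fitted to individual-level data need not agree this closely in general.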

In the 200 samples that we tested for validation, the VCP inversion was detected by Manta alone in 180 samples, Pindel alone in 170, both Manta and Pindel in 165 and by neither in 35. The ERBB4 insertion was detected by Manta alone in 120 samples, Pindel alone in 130, both Manta and Pindel in 113, and by neither in 87. Comparison of Manta 0.23.1 results and the more recent version of Manta, 1.6.0, showed no difference in the number of samples showing inversion in the VCP gene identified by either version. The same was true for insertion in the ERBB4 gene.

Age of onset and age of death analyses

The mean age of onset for all people with apparently sporadic ALS was 60.7 years (SD 11.84) and the mean age at death was 65.3 years (SD 10.61). The Kolmogorov-Smirnov test showed non-normal distributions for both datasets (p < 0.001). The test for skewness showed −0.33 for age of onset and −0.48 for age of death, indicating an approximately symmetric distribution. The mean age of onset in people with C9orf72 gene expansion was 2.7 years younger than in those with no C9orf72 gene expansion (p = 8.8 × 10−8, 95% CI for the difference 1.2 to 4.2 years). The mean age of onset in people with VCP gene inversion was 3 years younger than in people with no VCP gene inversion (p = 4.2 × 10−13, 95% CI for the difference 2.2 to 3.7 years). Additionally, the mean age of onset in those with ERBB4 gene insertion was one year younger than in those with no ERBB4 insertion (p = 0.003, 95% CI for the difference 0.25 to 1.72 years). The mean age of onset in people with VCP inversion, ERBB4 insertion and C9orf72 gene expansion was 3.5 years younger than in those with no reported structural variation in these genes (p = 0.001, 95% CI for the difference 1.3 to 5.6 years) (Table 3).
Table 3

Structural variation burden for age of onset.

| Gene | Age of onset, SV absent (years) | Age of onset, SV present (years) | p-value | Difference in years (95% CI) |
|---|---|---|---|---|
| C9orf72 | 62.0 | 58.8 | 8.8 × 10−8 | 3.2 (1.96–4.31) |
| VCP | 62.6 | 59.7 | 4.2 × 10−13 | 2.97 (2.22–3.72) |
| ERBB4 | 61.2 | 60.2 | 0.003 | 1.00 (0.25–1.72) |
| Combined group | 62.6 | 59.3 | 0.001 | 3.5 (1.3–5.6) |

SV structural variation.

People with ALS and C9orf72 gene expansion died on average 3.8 years younger than people with ALS and no C9orf72 gene expansion (p = 2.3 × 10−9, 95% CI for the difference 2.6 to 5.1 years). People with ALS and VCP gene inversion died on average 1.8 years younger than those with ALS and no VCP gene inversion (p = 1.4 × 10−5, 95% CI for the difference 1.0 to 2.5 years). No difference in age at death was observed between people with ALS and ERBB4 gene insertion and those with ALS and no ERBB4 gene insertion (p = 0.1). People with ALS and VCP inversion, ERBB4 insertion and C9orf72 gene expansion died on average 4.8 years younger than those with no reported structural variations in those genes (p = 5.0 × 10−4, 95% CI for the difference 1.9 to 6.7 years) (Table 4).
Table 4

Structural variation burden for age of death.

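| Gene | Age of death, SV absent (years) | Age of death, SV present (years) | p-value | Difference in years (95% CI) |
|---|---|---|---|---|
| C9orf72 | 66.0 | 62.2 | 2.3 × 10−9 | 3.8 (2.64–5.10) |
| VCP | 66.7 | 64.8 | 1.4 × 10−5 | 1.8 (1.04–2.58) |
| ERBB4 | 65.9 | 65.0 | 0.1 | NA |
| Combined group | 66.5 | 62.6 | 5.0 × 10−4 | 4.8 (1.9–6.7) |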

SV structural variation.

A family history of ALS was associated with a younger age at onset (4 years, p = 0.02, 95% CI for the difference 0.37 to 5.96 years) and death (4.5 years, p = 0.01, 95% CI for the difference 1.1 to 7.8 years), when compared with those with no family history. However, no difference in age of onset or death was observed when those with a family history were compared against those with no family history and carrying structural variation in the C9orf72, VCP or ERBB4 genes, suggesting these genetic variations are themselves reducing the age of onset and death.

Survival analyses

Cox survival analysis showed that people with ALS and C9orf72 gene expansion had worse survival (p = 3.0 × 10−6) than people with ALS with no C9orf72 gene expansion (Supplementary Fig. 2), while people with ALS and VCP gene inversion had longer survival than those with ALS and no VCP gene inversion (p = 0.002, Supplementary Fig. 3). No difference in survival was observed between people with ALS and ERBB4 gene insertion and those with ALS and no ERBB4 gene insertion (p = 0.9) (Supplementary Fig. 4). People with C9orf72 gene expansion, VCP gene inversion, and ERBB4 gene insertion had worse survival (p = 6.7 × 10−5) than people with ALS with no overlapping structural variation in C9orf72, VCP, and ERBB4 genes (Supplementary Fig. 7).

Site of onset analyses

Multivariable linear regression showed an association between C9orf72 repeat expansion and bulbar site of onset (p = 0.01), confirming previous findings. Inversion in the VCP gene was associated with bulbar onset (p = 3.5 × 10−12) and frontotemporal dementia (p = 1.1 × 10−4). ERBB4 insertion increased the risk of ALS, and also increased the risk of respiratory onset. ERBB4 insertion was seen in 45 of the 63 people (71.4%) with respiratory onset ALS (Table 5). The odds ratio of respiratory compared with non-respiratory onset was 2.9 (95% CI 1.69, 5.08; p = 6.2 × 10−5), but compared with controls, the odds ratio was 7.37 (95% CI 4.23, 12.86; p = 4.4 × 10−16). Kaplan–Meier survival analysis showed that people with respiratory onset ALS had worse survival than those with spinal onset ALS, and better survival than those with bulbar onset ALS (log rank p = 6.6 × 10−34) (Supplementary Fig. 5), but in the subset with ERBB4 insertion there was no difference in survival (log rank p = 0.15) (Supplementary Fig. 6). ERBB4 insertion was seen in 20 of the 31 people with cognitive impairment (odds ratio 2.3, 95% CI 1.09, 4.98; p = 1.3 × 10−4). We could not determine whether the cognitive changes were a result of respiratory failure, frontotemporal impairment, or some other cause. Moreover, individuals who harboured multiple types of structural variation were more likely to develop FTD and cognitive changes (p = 0.001), but none had respiratory onset ALS.
Table 5

ERBB4 insertion in respiratory onset ALS.

| ERBB4 insertion | Respiratory onset (freq) | Non-respiratory onset (freq) | Controls (freq) |
|---|---|---|---|
| Present | 45 (0.71) | 1956 (0.46) | 476 (0.25) |
| Absent | 18 (0.29) | 2296 (0.53) | 1404 (0.75) |
| Total | 63 | 4252 | 1880 |

ERBB4 insertion in respiratory onset ALS compared with non-respiratory onset ALS and controls. Freq frequency.


Discussion

We have shown that genomic structural variants in the C9orf72, VCP, and ERBB4 genes are variously associated with ALS risk, younger age of onset, earlier age at death, specific sites of onset, and survival, highlighting the importance of structural variation events in ALS. Earlier studies, using smaller sample sizes and attempting to impute structural variation from SNP microarray data, found no evidence of a difference in global structural variation burden between ALS and controls[13,14,19]. Our study has the advantage of directly sequenced data, giving a high degree of confidence for structural variant calling, and a larger sample size, giving a higher degree of confidence for statistical analyses, although even larger studies would be ideal. As calling structural variants is dependent on the quality of sequencing data, we applied stringent quality control measures, excluding 385 samples. The final number of samples passing quality control was 6195 whole genome sequences, each representing one individual, making this one of the largest such datasets in the world. In keeping with previous findings, we have found genotype-phenotype correlations in risk genes for ALS, the most striking of which is the finding of insertion in the ERBB4 gene in 71.4% of people with respiratory onset ALS compared with 46.4% of those with non-respiratory onset, and just 25.3% of the general population. This is the largest genetic study of respiratory onset ALS, but because the frequency of respiratory onset in ALS is only about 1 to 2%[2,3,20], the absolute numbers are still small. Nevertheless, the finding is possible because such a large proportion of affected people have the same genetic variation. The odds ratio of more than seven means this is a moderately large effect, and much larger than is typically seen in association studies. 
Interestingly, the original ERBB4 report has a pedigree in which affected individuals had a similar mean age of onset and with one in five also having respiratory onset[21]. We also found that insertion in the ERBB4 gene was associated with cognitive change. Multiple previous studies have linked ERBB4 gene variation with FTD and cognitive or behavioural changes[22-26]. As expected, we confirmed the worse prognosis conferred by C9orf72 expansion mutation, but other phenotypic associations of C9orf72 are less well understood. Previous studies of the C9orf72 repeat expansion and onset age have led to conflicting results[27-30], and the correlation between repeat size and diagnosis is poorly understood in apparently sporadic ALS, as most studies have been in familial ALS[31-33]. We found that familial ALS is associated with a younger age of onset, consistent with previous studies, and that this is also true for those with C9orf72 expansion mutation, regardless of family history[34-36]. Furthermore, our results support previous studies finding that the frequency of the C9orf72 expansion mutation in the general population is about 0.2%[37]. Previous independent research has shown C9orf72 repeat expansion in healthy individuals in whom the expansion was confirmed by standard laboratory methods[37]. To confirm that the expansion can be seen in unaffected individuals, we have calculated the number of controls with C9orf72 repeat expansion in new data in the Project MinE dataset and found 10 more with this expansion. While it might seem strange that a major ALS risk gene should be seen in unaffected control individuals, it is in fact expected, since there is age-dependent penetrance, penetrance is incomplete, and the effects of the expansion mutation are pleiotropic, increasing risk for several conditions other than ALS. The rate we observe is similar to that seen in other studies and in public databases. 
VCP inversion is associated with longer survival as well as younger age of onset. Our findings also suggest that VCP structural variation might be a marker for cognitive impairment and ALS-FTD, supporting previous work showing an association of common variation in the VCP gene with FTD and cognitive impairment[26,38,39]. Although age of onset can be a good predictor of disease course, it does not determine the age at which death occurs. We have shown that age of onset, age at death and disease duration are all highly variable between individuals and genetically influenced. The genetic associations we have found in apparently sporadic ALS are in genes previously identified from family-based studies (C9orf72, ERBB4 and VCP), supporting the notion that familial and sporadic ALS are not mutually exclusive categories but rather a spectrum[36,40-42]. Understanding the involvement of SVs in VCP and ERBB4 might therefore help in understanding disease trajectories in ALS and potentially in selection for clinical trials. Moreover, understanding trajectories of illness is useful for planning clinical care. Interestingly, those who harboured multiple types of structural variation were found to have a younger age of onset, younger age of death and worse survival than people with, for example, C9orf72 expansion alone, implying that people with multiple mutations of large effect in ALS driver genes might need fewer than six molecular steps to develop ALS[40,41]. Given the relative frequencies of the variants, screening for VCP inversion in people with C9orf72 expansion might therefore be helpful in estimating prognosis. This study has several limitations. We have analysed whole genome sequencing data generated using two sequencing platforms, the HiSeqX Illumina platform and the HiSeq2000 platform, which increases the possibility of a batch effect. 
However, cases and controls are similarly distributed between the platforms (HiSeq2000 66% cases, 34% controls and HiSeqX 70% cases and 30% controls) (p = 0.54). To overcome this potential weakness, all samples were sequenced at the same Illumina laboratory on these two industry-leading sequencing platforms, the study was designed to minimize batch effects by having cases and controls share the same sequencing plate, and sequencing platform was taken into account as a covariate in our analyses. Furthermore, although we have assessed reported ALS-associated rare missense and loss of function variants in linkage disequilibrium with the structural variants, we cannot exclude the possibility that the differential risk and phenotypes observed could be modulated by small common single-nucleotide variants or indels in linkage disequilibrium with the structural variants. Using the GeneVar SV data browser[43], the estimated frequency of VCP inversion is 0.0005 and that of ERBB4 insertion is 0.82. However, allele frequencies tend to vary across human populations, and different SV callers may give varying results between datasets. Full comparison between studies therefore requires that the sequencing platform, population tested and SV callers be identical[44]. Another limitation is that we restricted the analysis of structural variation to known ALS genes. Extending the analysis to the entire genome would give a comprehensive view of this type of genomic variation in ALS, but with current technology is extremely resource intensive. Finally, ALS is a disease of the central nervous system, but our WGS data are derived from leukocyte DNA, since our DNA source was whole blood, and somatic mutation affecting the nervous system cannot therefore be assayed with our method. However, our findings have the advantage of a large sample size of more than 4300 cases. Our analysis was restricted to a statistical genomics approach. 
Although the sample size is large, replicating our results in other ALS datasets and performing wet lab confirmation will be needed to validate the findings. We are reassured by the observation that VCF calls were matched with raw BAM file reads when tested. Analysis of structural variation shows that such genetic variations influencing site of onset also modify risk, as is true for single nucleotide variations. Our finding that 71.4% of people with respiratory onset ALS have insertion in the ERBB4 gene is an important clue to disease mechanism and factors that determine which group of motor neurons are most vulnerable at disease onset, a key issue in neurodegenerative disease research. Although the number of people with ALS with respiratory onset in our study is small compared with that for other phenotypes, this is the largest genetic study of respiratory onset ALS thus far. The finding of association is possible because of the homogeneity of the cause, corresponding to a large effect of the genetic variation identified. In this large study of structural variation in ALS using whole genome sequence data, we find a number of risk variants for ALS as well as structural variants corresponding to specific ALS phenotypes. Further work is needed to understand the mechanisms and pathways underlying these relationships.

Methods

Data sources

Samples were from the international Project MinE whole genome sequencing consortium and derived from seven countries: the USA, Ireland, Belgium, the Netherlands, Spain, Turkey, and the United Kingdom[18].

Ethical approval

Informed consent for genetic research was obtained from all participants. The study was approved by the Trent Research Ethics Committee (08/H0405/60).

Phenotyping

Clinical information including sex, age at first symptoms, age at onset, site of onset, survival status, and disease duration, was obtained from the patient record according to standard definitions as defined by the SOPHIA standard operating procedures[45].

Whole-genome sequencing

DNA was isolated from venous blood using standard methods. DNA concentration was set at 100 ng/µl as measured by fluorimeter with the PicoGreen® dsDNA quantitation assay. DNA integrity was assessed using gel electrophoresis. All samples were sequenced using Illumina’s FastTrack services (San Diego, CA, USA) on the Illumina HiSeq 2000 (100 bp paired-end reads) and HiSeqX platforms (150 bp paired end reads)[46], using PCR-free library preparations. Binary sequence alignment/map formats (BAM) were generated for each individual. The Project MinE genomes were aligned with Isaac (Illumina) to hg19. The details of the Isaac alignment and variant calling pipelines are discussed in Project MinE design[18] and Isaac protocol[47].

Determination of pathogenic ALS gene variants

A panel of 25 ALS genes was tested (ALS2, ANG, ATXN2, C9orf72, CHCHD10, DAO, ERBB4, FUS, HNRNPA1, MOBP, NEK1, OPTN, PFN1, SCFD1, SETX, SOD1, SPG11, SQSTM1, TARDBP, TBK1, TUBA4A, UBQLN2, UNC13A, VAPB, and VCP)[1,22,48] (Supplementary appendix Table 1), selected for harbouring large-effect, rare, Mendelian ALS gene variants or common variants showing well-replicated association. The SMN1 gene is being assessed independently within the Project MinE consortium and was therefore not included in this study. Manta 0.23.1[49] was used for variant assembly, variant extraction, and genomic quality scoring. A VCF was then generated for each participant. As the calls in this study were made using Manta 0.23.1, we repeated the test in a subset of 100 samples using the most recent version, Manta 1.6.0. For validation of Manta 0.23.1 calls, we tested the main SVs using a second tool, Pindel, in 200 randomly selected samples. To calculate the number of structural variation types in each gene, an in-house pipeline was used to filter the variants according to quality score, size, and type of structural variant. Insertions smaller than 200 bp were excluded as recommended by the Manta protocol. Repeat primed PCR or Expansion Hunter v2.5.1[50] was used to assay the hexanucleotide repeat expansion in the C9orf72 gene. In individuals with structural variation, sequences were inspected for rare missense or loss of function variants known to be associated with ALS to exclude linkage disequilibrium of the structural variant with the rare variant as an explanation of association.

Statistical analysis

The effect of structural variation on ALS risk in each gene was examined independently, assessed using multivariable linear regression after correcting for different sequencing platforms and population stratification, principal components, centre, age and sex. To test gene-gene interaction effects between the identified structural variation groups, a combined group was created for any type of structural variation to compare against individuals with no structural variation in the genes examined. For age of onset data and age of death, a test of normality was conducted using the Kolmogorov-Smirnov test of normality. Skewness values were also obtained. As the age of onset and the age of death were not normally distributed, the median age of onset and age of death between people with sporadic ALS with structural variation, people with sporadic ALS without structural variation, and people with familial ALS, was compared with the non-parametric Mann-Whitney U test with 0.95 confidence level. To estimate the size of any ascertainment bias observed, the median time between symptom onset and diagnosis was compared between those with familial ALS and those with apparently sporadic ALS in a Mann-Whitney U test. Genotype-phenotype association for site of symptom onset (bulbar muscles, limb, respiratory) and presence of cognitive impairment for each gene was examined independently, assessed using multivariable linear regression after correcting for different sequencing platforms and population stratification, principal components, centre, age and sex. To assess the effect of structural variation on survival, we used Cox regression, controlling for age of onset, sex, C9orf72 expansion status, principal components, centre and technology platform and site of disease onset (bulbar muscles, limb or respiratory, supplementary appendix Table 7). To assess survival in respiratory onset ALS we also used Kaplan-Meier survival analysis. 
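
To make the non-parametric group comparison concrete, the following is a minimal sketch of a Mann-Whitney U test using average ranks for ties and a normal approximation for the two-sided p-value. It is not the SPSS or R routine used in the study; the normal approximation here omits the tie and continuity corrections, and the function name is illustrative.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Return (U_x, U_y, approximate two-sided p) for two independent samples."""
    # Pool the observations, remembering which group (0 = x, 1 = y) each came from.
    pooled = sorted((value, group) for group, values in enumerate((x, y)) for value in values)
    rank_sums = [0.0, 0.0]
    i, n = 0, len(pooled)
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += average_rank
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sums[0] - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    # Normal approximation (no tie or continuity correction in this sketch).
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_two_sided = erfc(abs(z) / sqrt(2))
    return u_x, u_y, p_two_sided


# Made-up ages of onset, for illustration only (not study data).
carriers = [52, 55, 58, 60, 61]
non_carriers = [59, 62, 63, 66, 70]
print(mann_whitney_u(carriers, non_carriers))
```

For samples as large as those in this study the normal approximation is usually adequate, but a production analysis would normally rely on a tested statistical package.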
Statistical tests were performed using IBM SPSS Statistics 24.0 (SPSS Inc., Illinois) and R 3.4.1 (R Foundation for Statistical Computing) in RStudio. We tested four structural variation categories (deletion, insertion, inversion, and duplication) in 25 genes. Therefore, we used 0.0005 [0.05/(25*4)] as the Bonferroni-corrected threshold for multiple testing correction.
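
The Bonferroni arithmetic can be written out as a short sketch. The constants come from the text above and the p-values from Table 2; the variable names are illustrative.

```python
import math

ALPHA = 0.05       # family-wise error rate
N_GENES = 25       # genes in the ALS panel
N_SV_TYPES = 4     # deletion, insertion, inversion, duplication

# One test per gene and SV category gives 100 tests in total.
threshold = ALPHA / (N_GENES * N_SV_TYPES)

# p-values as reported in Table 2.
reported_p = {
    "C9orf72 expansion": 2e-16,
    "VCP inversion": 2e-5,
    "ERBB4 insertion": 3e-5,
}
passes_correction = {name: p < threshold for name, p in reported_p.items()}

print(f"threshold = {threshold:.4f}")
print(passes_correction)
```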

Quality control

There were 6,195 samples (4,315 from people with ALS and 1,880 controls) passing quality control from a total of 6,580 whole genome sequences. Quality control was performed separately on genotyped data of each population according to the Project MinE methods published previously[51]. Sample mismatch was tested using sex checks, where genetic sex was compared to reported gender. After quality control, the full set of genomic Variant Call Format files (gVCFs) was merged by first converting the gVCFs to Plink format and then merging all files together. This generated a single dataset containing all variant sites across all individuals. Non-autosomal chromosomes and multi-allelic variants were excluded from pilot analyses. Sample and SNP quality control were performed using Plink[51,52] and VCFtools[53]. To begin sample quality control, missingness by sample was calculated on a per-chromosome basis. All other sample quality control steps were performed on a set of high-quality biallelic SNPs that had minor allele frequency of at least 10%, missingness < 0.1%, were linkage disequilibrium pruned at an r² threshold of 0.2, were not A/T or C/G SNPs, did not lie in the major histocompatibility complex or lactase gene locus, and did not occur in the inversions on chromosome 8 or chromosome 17. The ~30,000 SNPs overlapping this set of SNPs and HapMap 3 were used to calculate principal components, projecting the ALS cases and controls onto the HapMap 3 samples. Samples of non-European ancestry, defined as further than 10 standard deviations from the European-ancestry population principal components in HapMap 3 (CEU, people of Northern and Western European ancestry living in Utah; TSI, Tuscans in Italy), were excluded from analysis to ensure an ancestrally homogeneous group of samples for association testing. Samples with an inbreeding coefficient >3 standard deviations from the mean of the distribution were excluded, as were unexpectedly related samples. 
Genotypes available from genotyping on the Illumina Omni 2.5 M array were compared to sequencing genotypes, and samples with < 95% concordance were dropped from the analysis. For variant quality control, variants with missingness >5% were removed, as were variants out of Hardy-Weinberg equilibrium in controls (p < 1 × 10−6). Differential missingness between cases and controls was checked and variants with p < 1 × 10−6 were removed. Variants with extreme low or extreme high depth of coverage (> 6 standard deviations from the mean of the total depth distribution) were also excluded. Finally, the mitochondrial, X and Y chromosomes were excluded from analysis (but will be included in later analyses as sample sizes in Project MinE continue to grow). Approximately 10 million sites were lost during variant quality control. For identity-by-descent analysis, all non-singleton variants were phased using SHAPEIT2[54]. Subsequently BEAGLE 4.0[55] was used to detect likely runs of identity by descent between individuals. The hg19 recombination map obtained from the 1000 Genomes Project was used to transform genetic positions from basepairs to centimorgans (cM). Presumed identity by descent segments shorter than one cM were excluded and regions with excessive identity by descent were excluded after visual inspection.
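
The ancestry filter in this section (excluding samples further than 10 standard deviations from the European-ancestry reference on the principal components) could be sketched as below. The text does not say how distance was measured, so this sketch assumes a per-component rule: a sample is flagged if it lies more than 10 reference standard deviations from the reference mean on any component. The function name, data layout and example values are hypothetical.

```python
from statistics import mean, stdev


def ancestry_outliers(reference_pcs, sample_pcs, max_sd=10.0):
    """Flag samples lying more than max_sd reference SDs from the reference mean on any PC.

    reference_pcs: list of PC coordinate lists for the reference panel (e.g. CEU and TSI).
    sample_pcs: dict mapping sample ID to that sample's projected PC coordinates.
    """
    n_pcs = len(reference_pcs[0])
    centres = [mean([row[k] for row in reference_pcs]) for k in range(n_pcs)]
    spreads = [stdev([row[k] for row in reference_pcs]) for k in range(n_pcs)]
    flagged = []
    for sample_id, pcs in sample_pcs.items():
        too_far = any(abs(pcs[k] - centres[k]) > max_sd * spreads[k] for k in range(n_pcs))
        if too_far:
            flagged.append(sample_id)
    return flagged


# Toy coordinates for illustration only.
reference = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [0.5, -0.5]]
samples = {"sample_A": [0.2, 0.1], "sample_B": [20.0, 0.0]}
print(ancestry_outliers(reference, samples))
```

A per-component threshold is one simple reading of the rule; a multivariate distance would be an equally plausible interpretation.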

Structural variation calling and quality control

To count the structural variants of each type in each gene, an in-house pipeline was used to filter the variants according to quality score, size, and type of structural variant. An in-house coverage analysis determined that 92% of the target regions were covered by at least 5 reads. Variants called with 10–20 reads were flagged for visual inspection to remove false positives. Furthermore, we excluded variants with poor genotyping quality, defined as a sequencing quality score of less than 20 (out of 100), as well as variants with minimal read depth (less than 5X). In the pipeline, counting was limited to one structural variant per position to avoid counting the same variant multiple times. Manta cannot detect small variants, dispersed duplications, or expansions of a reference tandem repeat such as those in C9orf72 and ATXN2, because the power to assemble variants to break-end resolution falls to zero as the break-end repeat length approaches the read size. Furthermore, the power to detect any break-end falls to almost zero as the break-end repeat length approaches the fragment size. Therefore, we used Expansion Hunter and data obtained from real-time PCR to confirm C9orf72 expansion status. Following the Expansion Hunter instructions, the genome coordinates used to confirm C9orf72 expansion status were chr9:27573527-27573544, with the motif GGCCCC[50]. Additionally, 29 off-target regions were included to determine the C9orf72 repeat size (see the Expansion Hunter GitHub page, https://github.com/Illumina/ExpansionHunter, for the exact coordinates of the 29 off-target regions). An allele was considered expanded if 30 or more repeats were reported[50]. Furthermore, Manta is unable to detect inversions smaller than about 200 bases. The actual limiting size was not tested; we therefore used 200 bp as the threshold in the in-house pipeline and called only inversions larger than 200 bp.
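The expansion rule stated above (an allele is expanded if 30 or more repeats are reported) can be written as a short Python sketch. The function names and the tuple of per-allele repeat counts are hypothetical; the sketch does not parse Expansion Hunter output and does not model the real-time PCR confirmation.

```python
EXPANSION_THRESHOLD = 30  # repeats; an allele with this many or more is expanded


def allele_is_expanded(repeat_count, threshold=EXPANSION_THRESHOLD):
    """Apply the 30-repeat rule to a single allele."""
    return repeat_count >= threshold


def sample_carries_expansion(allele_repeat_counts, threshold=EXPANSION_THRESHOLD):
    """Return True if any allele of the sample meets the threshold.

    allele_repeat_counts is a hypothetical tuple of estimated repeat sizes,
    one per allele, e.g. (2, 8).
    """
    return any(allele_is_expanded(n, threshold) for n in allele_repeat_counts)
```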
Manta also cannot detect fully assembled large insertions. Thus, the pipeline included a cut-off of 100,000 bp, as the tool was not tested beyond this size. As the exact coordinates of inversions and insertions can differ between people, sequence overlap was required for the coding sequence to be counted. BAM files from a random selection of 30 sequences were manually inspected to ensure that the structural variant calls in the VCF files were supported by the corresponding reads in the BAM files. A few representative IGV screenshots of the SVs are included in the supplementary appendix (Supplementary Data Fig. 1).
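The filtering and counting rules described in this section can be summarised in a minimal Python sketch. It is not the in-house pipeline: the `SVCall` record, the function names, the use of the start coordinate as "position", the application of the 100,000 bp cut-off to insertions, and the reading of "sequence overlap" as overlap with a coding region are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass

MIN_QUALITY = 20            # quality score out of 100
MIN_READ_DEPTH = 5          # minimal read depth (5X)
MIN_INVERSION_BP = 200      # only inversions larger than this are called
MAX_INSERTION_BP = 100_000  # cut-off assumed here to apply to insertions


@dataclass(frozen=True)
class SVCall:
    """One structural variant call (hypothetical record layout)."""
    chrom: str
    start: int
    end: int
    sv_type: str     # e.g. "INV" or "INS"
    length: int      # size of the variant in bp
    quality: float
    read_depth: int


def passes_filters(call):
    """Apply the quality, depth, and size thresholds from the text."""
    if call.quality < MIN_QUALITY or call.read_depth < MIN_READ_DEPTH:
        return False
    if call.sv_type == "INV" and call.length <= MIN_INVERSION_BP:
        return False
    if call.sv_type == "INS" and call.length > MAX_INSERTION_BP:
        return False
    return True


def needs_visual_inspection(call):
    """Calls supported by 10-20 reads are flagged for manual review."""
    return 10 <= call.read_depth <= 20


def overlaps(call, region):
    """Closed-interval overlap between a call and a (chrom, start, end) region."""
    chrom, start, end = region
    return call.chrom == chrom and call.start <= end and call.end >= start


def count_by_type(calls, coding_regions):
    """Count passing calls per SV type, at most one call per position."""
    seen_positions = set()
    counts = Counter()
    for call in calls:
        if not passes_filters(call):
            continue
        if not any(overlaps(call, region) for region in coding_regions):
            continue
        position = (call.chrom, call.start)
        if position in seen_positions:
            continue  # avoid counting the same variant twice
        seen_positions.add(position)
        counts[call.sv_type] += 1
    return dict(counts)
```

Keeping the visual-inspection flag separate from the pass/fail filter reflects the text, in which calls with 10–20 reads are reviewed rather than removed outright.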