
Genotyping of familial Mediterranean fever gene (MEFV)-Single nucleotide polymorphism-Comparison of Nanopore with conventional Sanger sequencing.

Jonas Schmidt^1,2,3, Sandro Berghaus^1, Frithjof Blessing^1,2, Holger Herbeck^1, Josef Blessing^1, Peter Schierack^3,4, Stefan Rödiger^3,4, Dirk Roggenbuck^3,4, Folker Wenzel^2.

Abstract

BACKGROUND: Through continuous innovation and improvement, Nanopore sequencing has become a powerful technology. Because of its fast processing time, low cost, and ability to generate long reads, this sequencing technique would be particularly suitable for clinical diagnostics. However, its raw data accuracy is inferior to that of other sequencing technologies. This constraint still limits the use of Nanopore sequencing in clinical diagnostics and calls for further validation and IVD certification.
METHODS: We evaluated the performance of the latest Nanopore sequencing in combination with a dedicated data-analysis pipeline for single nucleotide polymorphism (SNP) genotyping of the familial Mediterranean fever gene (MEFV) by amplicon sequencing of 47 clinical samples. Mutations in MEFV are associated with Mediterranean fever, a hereditary periodic fever syndrome. Conventional Sanger sequencing, which is commonly applied in clinical genetic diagnostics, was used as a reference method.
RESULTS: Nanopore sequencing enabled the sequencing of 10 target regions within MEFV with high read depth (median read depth 7565x) in all samples and identified a total of 435 SNPs in the whole sample collective, of which 29 were unique. Comparison of both sequencing workflows showed near-perfect agreement with no false negative calls. Precision, Recall, and F1-Score of the Nanopore sequencing workflow were each > 0.99.
CONCLUSIONS: These results demonstrated the great potential of current Nanopore sequencing for application in clinical diagnostics, at least for SNP genotyping by amplicon sequencing. Other more complex applications, especially structural variant identification, require further in-depth clinical validation.

Substances:
MEFV protein, human
Pyrin

Year: 2022 PMID: 35298548 PMCID: PMC8929590 DOI: 10.1371/journal.pone.0265622

Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240

1. Introduction

Since its first description in 1996, nanopore-based deoxyribonucleic acid (DNA) sequencing has developed into one of the most powerful sequencing technologies thanks to continuous innovation and improvements [1,2]. Nowadays, different sequencing devices and protocols are commercially available, rendering this technique attractive for various areas of molecular biological research and diagnostics, including metagenomics, bacterial and viral infectiology, human genomics, and cancer research [3-11]. The core components of current Nanopore sequencing devices are protein nanopores contained in a membrane [12,13]. As single DNA molecules are passed through these pores, the resulting changes in an ionic current across the membrane are used to infer the sequence of nucleic acids [11-13]. This sequencing approach offers the advantages of real-time sequencing, ultra-long read length (average read length up to 10 kb), high throughput and the possibility of base modification detection as well as native ribonucleic acid (RNA) sequencing [1,13,14]. However, a major drawback compared to other next-generation sequencing (NGS) techniques has been the comparatively high error rate [13]. Although this is a heterogeneous measure, which is influenced by different parameters including sequencing instrument, sequencing protocol and sample type, Nanopore sequencing shows a distinctly higher error rate (~6%) compared to PacBio sequencing (~1.5%), Illumina sequencing (~0.5%) and conventional Sanger sequencing (~0.001%) [15-19]. This is especially critical for medical applications such as single nucleotide polymorphism (SNP) genotyping, which require high sequencing accuracy to achieve reliable results [13].
Although the accuracy of Nanopore sequencing has improved considerably by optimization of the underlying sequencing chemistry and bioinformatic analysis tools, it is important to validate the technique against established gold standard methods such as Sanger sequencing to assess a possible application in medical diagnostics [13,20]. A common monogenetic autoinflammatory disease is Familial Mediterranean fever (FMF), which shows a high prevalence among Turkish, Armenian, Jewish and Arabic communities from the eastern Mediterranean region [21,22]. The disease is diagnosed clinically and is mainly characterized by recurrent fever and serositis, with amyloidosis being a severe complication in untreated individuals [22-24]. FMF is considered to be inherited in an autosomal recessive manner and is associated with point mutations (single substitutions) in the Mediterranean Fever (MEFV) gene [22,24]. This gene consists of 10 exons and is located on the short arm of chromosome 16 in minus strand orientation [22]. It encodes a protein of 781 amino acids called pyrin, which plays a key role in apoptosis and inflammatory pathways. It is mainly expressed in neutrophils, eosinophils, dendritic cells and fibroblasts [21-23]. Mutated pyrin is thought to cause an excessive inflammatory response through uncontrolled interleukin-1 (IL-1) secretion [21,25]. After clinical diagnosis, the disease is generally treated with colchicine, and IL-1 blockade is suggested in refractory cases [21]. Genetic testing is employed to aid in the clinical diagnosis of FMF and to screen relatives at risk [23]. This can be done either by testing for the most common mutations (targeted mutation analysis) or by sequencing of selected exons [23]. According to expert consensus guidelines for the genetic diagnosis of hereditary recurrent fevers, a minimum diagnostic screen should include clearly pathogenic variants which are frequently identified in patients [26].
For FMF this incorporates the exons 2, 3, 5 and 10 of MEFV or a set of nine variants [26]. While DNA sequencing is used in most laboratories for variant analysis, targeted approaches can also be applied by using PCR-based or reverse-hybridization-based assays [26]. However, these targeted approaches as well as conventional Sanger sequencing suffer from the technological limitation that only a comparatively small genetic target range can be covered within a single run. To overcome this limitation, NGS can be applied to sequence gene panels including not only MEFV for the diagnosis of FMF but also genes which are associated with other periodic fever syndromes like mevalonate kinase deficiency (MKD, gene MVK), tumor necrosis factor receptor-associated periodic syndrome (TRAPS, gene TNFRSF1A) and cryopyrin-associated periodic syndrome (CAPS, gene NLRP3) [26,27]. In this study, to evaluate the clinical performance of current Nanopore sequencing, we applied this sequencing technique in combination with a dedicated data analysis pipeline for SNP genotyping of selected regions of MEFV in 47 patients and validated the results against diagnostic Sanger sequencing as the gold standard method.

2. Material and methods

2.1 Clinical samples

Samples from 25 female and 22 male patients that were drawn for routine MEFV assessment were included in this study after routine testing by Sanger sequencing was performed. Median age was 12.1 years (interquartile range [IQR] 12.9). Primary blood samples were collected in EDTA collection tubes by venipuncture and stored at 4°C until further processing. The routine diagnostic workflow includes DNA isolation, polymerase chain reaction (PCR) amplification of selected targets within MEFV and Sanger sequencing as described below. Subsequent to routine Sanger sequencing, the amplicons obtained from the amplification step were pooled per sample and Nanopore sequencing was performed. All included individuals gave their written informed consent. For minor patients, written informed consent was obtained from the parents. The study followed all relevant national regulations and institutional policies, has been approved by the ethics committee of the Landesärztekammer Baden-Württemberg (F-2018-089) and complies with the World Medical Association Declaration of Helsinki regarding ethical conduct of research involving human subjects and/or animals.

2.2 DNA isolation and PCR amplification

DNA isolation from EDTA whole blood samples was performed on chemagic Prepito-D instruments (PerkinElmer, Waltham, USA) using Prepito NA Body Fluid kits (PerkinElmer) (expected yield: ~2.5 μg). PCR amplification of the MEFV target regions was performed stepwise in eight different PCR reactions using target specific primers (Biomers, Ulm, Germany), Q-Solution (Qiagen, Hilden, Germany), and the AmpliTaq Gold 360 Master Mix (ThermoFisher Scientific, Waltham, USA). The amplicons were designed to span MEFV exon 1, exon 2, exon 3, exon 4, exon 5, exon 6, exon 7/8, and exon 9/10 (S1 Table). PCR reactions were performed on an Applied Biosystems Veriti thermal cycler (ThermoFisher Scientific) (S2 and S3 Tables). Nuclease free water was included in all runs as a no template control.

2.3 Sanger sequencing

Prior to sequencing, a clean-up of the amplicons was performed by using ExoSAP-IT clean-up kits (ThermoFisher Scientific). Briefly, 7 μL PCR product were mixed with 1 μL clean-up reagent by pipetting. This reaction mix was incubated for 15 min at 37°C followed by 15 min at 80°C. Sanger sequencing of the purified amplicons was performed using the BigDye Terminator Version 3.1 kit (ThermoFisher Scientific) on an Applied Biosystems 3500 Dx Series Genetic Analyzer (ThermoFisher Scientific) according to the manufacturer’s protocol. Briefly, sequencing reactions were set up using target specific sequencing primers (Biomers) (S4 Table). After incubation on a thermal cycler, the reaction mix was cleaned by precipitation with ethanol/EDTA/sodium acetate and loaded on the instrument for capillary electrophoresis after resuspending in injection buffer. Sequencing was performed using POP-6 Polymer (ThermoFisher Scientific).

2.4 Nanopore sequencing

Prior to Nanopore sequencing, equal volumes (10 μL) of the amplicons from the target amplification step were pooled for each individual sample. DNA concentration of the pooled samples was measured on a Qubit 4 fluorometer (ThermoFisher Scientific) using the 1x dsDNA HS assay (ThermoFisher Scientific) (S5 Table). Afterwards, a 1.8x AMPure XP bead clean-up was performed according to the manufacturer’s protocol (Beckman Coulter, Brea, USA). Sequencing libraries were prepared according to the manufacturer’s protocol using native barcoding kits (EXP-NBD104, EXP-NBD114) in combination with ligation sequencing kits (SQK-LSK109) (Oxford Nanopore Technologies (ONT), Oxford, UK). The libraries were prepared with a total of 12 samples per library for each run to ensure a sufficient read count per sample and that the relative proportion of a single sample is comparable (S5 Table). DNA input per sample was 200 fmol, and 12.5 fmol of each barcoded sample were pooled prior to sequencing. Sequencing was performed on a MinION sequencing device (ONT) for 6 h using R9.4.1 flow cells (ONT). All samples were sequenced in four different runs using two flow cells. Prior to reuse, the flow cells were purged according to the manufacturer’s protocol using flow cell wash kits (EXP-WSH003) (ONT).
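The molar input quoted above (200 fmol per sample) has to be derived from a mass concentration such as a Qubit reading. The following minimal sketch shows one common way to do that conversion, assuming an average mass of about 660 g/mol per base pair of double-stranded DNA. The function names and the example concentration and fragment length are hypothetical and are not taken from the study; for a pool of amplicons of different lengths, a mean length is only an approximation.

```python
def dsdna_fmol_per_ul(conc_ng_per_ul: float, length_bp: float) -> float:
    """Approximate molar concentration (fmol/uL) of double-stranded DNA.

    Assumes ~660 g/mol per base pair, a widely used rule of thumb.
    """
    if conc_ng_per_ul < 0 or length_bp <= 0:
        raise ValueError("concentration must be >= 0 and length must be > 0")
    # ng -> g is 1e-9, mol -> fmol is 1e15, so the combined factor is 1e6.
    return conc_ng_per_ul * 1e6 / (660.0 * length_bp)


def volume_for_input_ul(target_fmol: float, conc_ng_per_ul: float,
                        length_bp: float) -> float:
    """Volume (uL) of sample needed to reach the target molar input."""
    molar = dsdna_fmol_per_ul(conc_ng_per_ul, length_bp)
    if molar == 0:
        raise ValueError("sample contains no measurable DNA")
    return target_fmol / molar


if __name__ == "__main__":
    # Hypothetical pooled sample: 20 ng/uL with a mean amplicon length of 600 bp.
    vol = volume_for_input_ul(200.0, 20.0, 600.0)
    print(f"Volume for 200 fmol: {vol:.2f} uL")
```

The same conversion could be applied per amplicon before pooling if equimolar pooling of individual amplicons is desired.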

2.5 Sequencing data analysis

Sanger sequencing data was analyzed using SEQUENCE Pilot Software [v 3.4.2] (JSI medical systems GmbH, Ettenheim, Germany). Variants were called against the MEFV reference (ENSEMBL gene: ENSG00000103313; transcript: ENST00000219596). Identified variants were manually inspected and exported to a comma-separated values (csv) file for comparison with the Nanopore sequencing results. To analyze the Nanopore sequencing data, we established a dedicated data analysis pipeline and implemented it in a bash shell script for automation (Fig 1). Raw data in FAST5 file format was basecalled and demultiplexed using the Guppy Basecalling Software [v 5.0.11+2b6dbffa5] (ONT). Basecalling was performed using the “super-accurate” basecalling model (dna_r9.4.1_450bps_sup.cfg). Basic run quality control was performed by applying pycoQC [v 2.5.2] (github.com/tleonardi/pycoQC). To remove chimeric and low-quality reads, read filtering was done with NanoFilt [v 2.7.1] (github.com/wdecoster/nanofilt). The filter was set to keep only reads with a read length between 250 and 1200 bases and a quality score of at least 15. After filtering, the reads were aligned to chromosome 16 of the hg19 reference genome (NC_000016.9) using minimap2 [v 2.20-r1061] (github.com/lh3/minimap2). The resulting Sequence Alignment Map (SAM) files were sorted and indexed with Samtools [v 1.7] (github.com/samtools/samtools). Afterwards, bcftools [v 1.13] (github.com/samtools/bcftools) was used for variant calling. The tool was set to include only SNPs and skip insertions and deletions. Variant filtering was performed by applying bedtools [v 2.30.0] (github.com/arq5x/bedtools2). Only calls in MEFV regions covered by the amplicons were included in the final data set. Finally, the identified variants were annotated using ANNOVAR [v 2018-04-16] [28].
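To make the read-filtering criteria concrete, the sketch below applies the same thresholds (read length between 250 and 1200 bases, quality score of at least 15) to FASTQ-style records in plain Python. It is an illustration only and not the NanoFilt implementation used in the study. It assumes that the length bounds are inclusive, that qualities are Phred+33 encoded, and that the per-read quality is the mean error probability converted back to a Phred score; all function names are hypothetical.

```python
import math
from typing import Iterable, Iterator, Tuple

# A read is represented as (header, sequence, quality string).
FastqRecord = Tuple[str, str, str]


def mean_phred(quality: str, offset: int = 33) -> float:
    """Mean read quality, averaged on the error-probability scale."""
    if not quality:
        return 0.0
    error_probs = [10 ** (-(ord(char) - offset) / 10) for char in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def keep_read(sequence: str, quality: str, min_len: int = 250,
              max_len: int = 1200, min_quality: float = 15.0) -> bool:
    """Return True if the read passes the length and quality thresholds."""
    length_ok = min_len <= len(sequence) <= max_len
    return length_ok and mean_phred(quality) >= min_quality


def filter_reads(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield only the reads that pass the filter."""
    for header, sequence, quality in records:
        if keep_read(sequence, quality):
            yield header, sequence, quality


if __name__ == "__main__":
    # Toy reads: "5" encodes Q20 and "+" encodes Q10 in Phred+33.
    reads = [
        ("@good", "A" * 300, "5" * 300),
        ("@too_short", "A" * 100, "5" * 100),
        ("@low_quality", "A" * 300, "+" * 300),
    ]
    for header, _, _ in filter_reads(reads):
        print(header)
```

Averaging on the error-probability scale, instead of averaging Phred scores directly, prevents a few very high-quality bases from masking many poor ones.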

Fig 1

Data analysis pipeline applied for the assessment of the Nanopore sequencing data.

Tools used for the different tasks are shown. Steps 1 to 7 were implemented in a bash shell script for automation purposes. SNP, single nucleotide polymorphism.

Once the automated data analysis pipeline was complete, the results for each individual sample were manually reviewed using the Integrative Genomics Viewer [v2.10.3] (github.com/igvteam/igv).

2.6 Results comparison

Method comparison was done in R [v 3.6.3] (R Foundation for Statistical Computing, Vienna, Austria) [29]. After importing the data sets, Nanopore sequencing variant calls were compared to the Sanger sequencing reference for genomic position, nucleotide change, zygosity, amino acid position, and amino acid change. Nanopore sequencing calls were only classified as true positive (TP) if all five criteria matched the corresponding Sanger sequencing reference. Variants without a complete match as well as variants which were missed by Nanopore sequencing were classified as false negative (FN), and variants which were solely identified by Nanopore sequencing were classified as false positive (FP). Based on these classifications, comparative measures including Precision (TP/(TP + FP)), Recall (TP/(FN + TP)) and F1-Score (2 * (Precision * Recall)/(Precision + Recall)) were calculated [30]. Data visualization was also performed in R using the packages ggVennDiagram, ggplot2, gggenes, and ggpubr. Sequencing depth information was extracted from the SAM files prior to visualization using Samtools.
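The classification rule described above can be sketched in a few lines. The authors performed this step in R; the Python below is only an illustrative re-statement, with hypothetical names and toy data. It assumes that calls are keyed by sample and genomic position, that a Nanopore call at a position with a Sanger reference call but a mismatch in any other criterion counts as FN only, and that a Nanopore call at a position without a Sanger call counts as FP.

```python
from typing import Dict, NamedTuple, Tuple


class Call(NamedTuple):
    """The four criteria compared in addition to the genomic position."""
    nucleotide_change: str
    zygosity: str
    aa_position: str
    aa_change: str


# Calls are keyed by (sample id, genomic position).
Key = Tuple[str, int]


def classify(sanger: Dict[Key, Call], nanopore: Dict[Key, Call]) -> Dict[str, int]:
    """Count TP, FP and FN using Sanger sequencing as the reference."""
    tp = sum(1 for key, ref in sanger.items() if nanopore.get(key) == ref)
    fn = len(sanger) - tp  # missed or incompletely matching reference calls
    fp = sum(1 for key in nanopore if key not in sanger)
    return {"TP": tp, "FP": fp, "FN": fn}


def metrics(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, Recall and F1-Score as defined in the text."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy data, not taken from the study.
    sanger = {
        ("s1", 100): Call("G>A", "het", "10", "R>Q"),
        ("s1", 200): Call("C>T", "hom", "20", "P>S"),
    }
    nanopore = {
        ("s1", 100): Call("G>A", "het", "10", "R>Q"),  # full match
        ("s1", 200): Call("C>T", "het", "20", "P>S"),  # zygosity mismatch
        ("s1", 300): Call("G>T", "het", "-", "-"),     # Nanopore only
    }
    counts = classify(sanger, nanopore)
    print(counts, metrics(counts["TP"], counts["FP"], counts["FN"]))
```

Requiring all criteria to match makes the comparison strict: a correct position with a wrong zygosity is still scored against the Nanopore workflow.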

3. Results

To evaluate the performance of Nanopore sequencing for SNP genotyping, we performed amplicon sequencing of selected MEFV regions in 47 clinical samples using a MinION sequencing device and compared the results to conventional Sanger sequencing. By using Nanopore sequencing in combination with a dedicated data analysis pipeline, it was possible to sequence the eight amplicons covering the relevant MEFV regions of all 10 exons with a median read depth of 7565x (IQR 4025) over all 47 samples (Fig 2B). A reduced read depth was observed at the edges of individual amplicons (minimum 13x). Furthermore, differences in the median read depth between different amplicons were observed (Fig 2A). Overall, amplicons 1, 2, and 8 showed a lower median read depth than the remaining amplicons.

Fig 2

Visualization of the read depth distribution achieved by Nanopore sequencing.

(A) Median read depth achieved by amplicon sequencing of selected regions in the MEFV gene in 47 clinical samples using a MinION sequencing device. The target regions cover the relevant regions of all 10 exons of this gene. (B) Read depth distribution in the target regions over all 47 samples. A median read depth of 7565x (IQR 4025) was achieved. Outliers with a reduced sequencing depth were observed at the edges of individual amplicons.

In total, 433 SNPs were identified in the investigated sample collective by Sanger sequencing (284 heterozygous and 149 homozygous). They include 28 unique variants of which 13 are non-synonymous (Table 1). The most common non-synonymous variants include p.E148Q (40.4%), p.R202Q (34.0%), p.M694V (25.5%), p.P369S (12.8%) and p.R408Q (12.8%). In addition, the most common synonymous variants were p.R314R (76.6%), p.E474E (70.2%), p.Q476Q (70.2%), p.D510D (70.2%), and p.P588P (68.1%).

Table 1

Unique MEFV variants identified in 47 patients.

Genomic position^a	cDNA^b	Protein^c	Region	Exon^c	Count (%)	Function^d	Agreement^e
3299749	c.942C>T	p.R314R	exonic	3	36 (76.6)	S	yes
3298865	rs224212	-	intronic	-	33 (70.2)	-	yes
3297181	c.1422G>A	p.E474E	exonic	5	33 (70.2)	S	yes
3297175	c.1428A>G	p.Q476Q	exonic	5	33 (70.2)	S	yes
3297073	c.1530T>C	p.D510D	exonic	5	33 (70.2)	S	yes
3293888	c.1764G>A	p.P588P	exonic	9	32 (68.1)	S	yes
3293922	rs1231123	-	intronic	-	30 (63.8)	-	yes
3296616	rs224205	-	intronic	-	29 (61.7)	-	yes
3296429	rs224204	-	intronic	-	29 (61.7)	-	yes
3304762	c.306T>C	p.D102D	exonic	2	21 (44.7)	S	yes
3304654	c.414A>G	p.G138G	exonic	2	21 (44.7)	S	yes
3304573	c.495C>A	p.A165A	exonic	2	21 (44.7)	S	yes
3304626	c.442G>C	p.E148Q	exonic	2	19 (40.4)	NS	yes
3304463	c.605G>A	p.R202Q	exonic	2	16 (34.0)	NS	yes
3293407	c.2080A>G	p.M694V	exonic	10	12 (25.5)	NS	yes
3299586	c.1105C>T	p.P369S	exonic	3	6 (12.8)	NS	yes
3299468	c.1223G>A	p.R408Q	exonic	3	6 (12.8)	NS	yes
3293310	c.2177T>C	p.V726A	exonic	10	4 (8.5)	NS	yes
3297100	c.1503C>T	p.R501R	exonic	5	3 (6.4)	S	yes
3294246	rs77380520	-	intronic	-	3 (6.4)	-	yes
3293257	c.2230G>T	p.A744S	exonic	10	3 (6.4)	NS	yes
3293205	c.2282G>A	p.R761H	exonic	10	3 (6.4)	NS	yes
3293403	c.2084A>G	p.K695R	exonic	10	2 (4.3)	NS	yes
3293090	-	-	UTR3	-	2 (4.3)	-	no
3304380	c.688G>A	p.E230K	exonic	2	1 (2.1)	NS	yes
3304317	c.751G>A	p.E251K	exonic	2	1 (2.1)	NS	yes
3304158	c.910G>A	p.G304R	exonic	2	1 (2.1)	NS	yes
3293447	c.2040G>C	p.M680I	exonic	10	1 (2.1)	NS	yes
3293369	c.2118G>A	p.P706P	exonic	10	1 (2.1)	S	yes

^a Genomic position on the hg19 reference genome (NC_000016.9).

^b dbSNP identifiers are shown for variants in non-coding regions.

^c Amino acid information and exon number are only shown for variants in exonic regions.

^d S = synonymous; NS = non-synonymous.

^e Agreement between Nanopore sequencing and initial Sanger sequencing results.

Variant frequency in the sample collective under investigation is shown. One variant in two patients was only identified by Nanopore sequencing and could not be confirmed by initial Sanger sequencing.

All 433 SNPs confirmed by Sanger sequencing in the sample collective were also identified by Nanopore sequencing with matching genomic position, nucleotide change, zygosity, amino acid position, and amino acid change (Fig 3). Additionally, the Nanopore sequencing results showed a transversion from guanine (G) to thymine (T) in the 3’ untranslated region (UTR) at genomic position 3293090 in two patients, which had not been identified by initial Sanger sequencing (Figs 3 and S1). Read depth at this genomic position was >7000x in both cases. A database search, including ClinVar and dbSNP, did not reveal any further information on this SNP. Remarkably, both individuals in whom this SNP was identified were related. By sequencing an additional amplicon spanning this region, it was possible to confirm the transversion in both samples by Sanger sequencing as well (S2 Fig).

Fig 3

Genetic variants which were identified in selected regions of MEFV.

(A) Frequency of single nucleotide polymorphisms (SNPs) identified in 47 clinical samples by Sanger and Nanopore sequencing. cDNA labels or dbSNP references are given for the most common variants. Variants with a complete agreement between Sanger and Nanopore sequencing in all 47 clinical samples are coloured in blue and differing variants are coloured in orange. (B) Gene map of MEFV and the amplicons used to sequence selected regions of this gene (S1 Table). Genomic positions on the hg19 reference genome (NC_000016.9) are shown in minus strand orientation.

For further method comparison, performance parameters such as Precision, Recall, and F1-Score were calculated from the results. The SNP which was only identified by Nanopore sequencing was treated as false positive, since it was not identified during the initial diagnostic Sanger sequencing runs. Based on this assumption, the Nanopore sequencing method in comparison to Sanger sequencing showed a Precision of 0.995, a Recall of 1 and an F1-Score of 0.998.
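The reported values can be retraced from the counts given in the text. The worked calculation below assumes TP = 433 (all SNPs confirmed by Sanger sequencing), FN = 0, and FP = 2, that is, the 3’ UTR variant counted once for each of the two patients. The FP count is not stated explicitly; it is inferred here from the difference between the 435 SNPs called by Nanopore sequencing and the 433 called by Sanger sequencing.

```latex
\begin{align*}
\text{Precision} &= \frac{TP}{TP + FP} = \frac{433}{433 + 2} = \frac{433}{435} \approx 0.995 \\
\text{Recall}    &= \frac{TP}{TP + FN} = \frac{433}{433 + 0} = 1 \\
F_1 &= \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
     = \frac{2\,TP}{2\,TP + FP + FN} = \frac{866}{868} \approx 0.998
\end{align*}
```

Under these assumed counts the rounded values agree with the Precision, Recall, and F1-Score reported above.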

4. Discussion

To evaluate the performance of Nanopore sequencing for SNP genotyping by amplicon sequencing, we performed a comprehensive method comparison with conventional Sanger sequencing using 47 clinical samples from patients with suspicion of FMF. The number of studies comparing Nanopore and Sanger sequencing in diagnostics has been limited [31-33]. Routine diagnostics using Sanger sequencing, the current gold standard for point-mutation detection, revealed the presence of various SNPs, including the non-synonymous variants p.E148Q, p.R202Q, p.M694V, p.P369S and p.R408Q in this sample collective [34]. All of these mutations have been previously described in FMF patients [22,35]. By performing Nanopore sequencing on a MinION sequencing device in combination with a dedicated data analysis pipeline, it was possible to sequence the relevant regions of all MEFV exons with a very high read depth. All variants previously identified by diagnostic Sanger sequencing were also accurately detected. Furthermore, Nanopore sequencing revealed only one additional SNP, present in two related patients, that had not been identified during initial Sanger sequencing. This SNP was located in the 3’ UTR at the edge of the amplicon covering this region. Since current Sanger sequencing is based on PCR amplification and capillary electrophoresis, poor sequence quality due to primer binding and insufficient base resolution is a very common problem at the beginning and end of an individual read [36]. Therefore, low-quality regions are trimmed prior to data analysis. For this reason, the diverging SNP is located in a region of amplicon 8 which cannot be properly sequenced by Sanger sequencing on either the forward or reverse strand. In Nanopore sequencing, a similar problem does not occur since the sequencing adapters are ligated to the ends of the PCR products during library preparation [37].
By sequencing an additional amplicon spanning the relevant region of the 3’ UTR, we were able to confirm the transversion in both patients by Sanger sequencing as well. Taking these additional results into account, our data show a complete agreement between Nanopore and Sanger sequencing. Nevertheless, a comprehensive database search did not reveal any information about the clinical relevance of this transversion. Since the initial diagnostic Sanger sequencing runs did not identify this variant, the corresponding variant calls were treated as false positive in the calculation of performance measures. The obtained Precision, Recall, and F1-Score of > 0.99 each demonstrate the excellent agreement between Nanopore and Sanger sequencing for SNP genotyping in our study [38]. This is consistent with other studies that also reported a high degree of agreement for various applications, especially in microbiology and cancer genomics [31,39-41]. The limitations of our study were the small sample size and the focus on targeted SNP genotyping alone. By using targeted amplicon sequencing on the MinION, we were able to sequence the relevant regions of the MEFV exons at a high read depth (median read depth 7565x). However, there is a substantial amount of variation in read depth between different amplicons within one sample and between different samples. This was due to the varying DNA input and varying efficiency of the eight PCR reactions used to amplify the MEFV target regions. A more homogeneous read depth distribution could be achieved by determining the concentration of the individual amplicons prior to pooling and subsequent pooling of equimolar amounts. Although this would increase the complexity of the protocol, it would contribute to more homogeneous results and probably facilitate a higher degree of multiplexing. Multiplexing of different clinical samples is a key factor in diagnostic NGS as it significantly improves cost efficiency (Table 2) [31].
According to Leija-Salazar et al. a read depth of >100x could be sufficient for accurate variant identification by Nanopore sequencing [10]. Such a threshold would remarkably increase the possible degree of multiplexing in our experimental design. However, due to the inhomogeneous read depth distribution between different amplicons we were not able to evaluate this accurately by subsampling of the data.

Table 2

Comparison of Nanopore and Sanger sequencing based on various aspects relevant for use in clinical diagnostics.

Aspect	Sanger sequencing	Nanopore sequencing
Capital costs (Instrument, Computing unit, Software)^a	High (~200000 €)	Low (~3500 €)
Price per MEFV sample [€]^b	160	75
Time to result [workdays]^c	3	3
Multiplexing	No	Yes
Data analysis	Simple	Complex
Application in clinical genetics	Reference method	Validation needed

^a Based on current list prices.

^b Approximate price per sample. To achieve the highest diagnostic accuracy, 11 sequencing reactions must be performed to sequence all target regions with Sanger sequencing, since amplicons 2 and 8 are sequenced in three and two sequencing reactions, respectively. For Nanopore sequencing, the price decreases with increasing degree of multiplexing.

^c Includes DNA isolation, PCR amplification, sequencing and data analysis.

Due to the high read depth achieved by amplicon sequencing, we were able to use bcftools for accurate variant calling. This tool employs Bayesian statistics to determine the most likely genotype [38,42]. However, modern diagnostic NGS applications mainly involve gene panel sequencing, whole exome sequencing, and whole genome sequencing [38]. Due to the obviously larger target space, the median read depth in such applications is normally much lower than in amplicon sequencing. Therefore, under these circumstances, it may be necessary to apply more modern tools for accurate variant calling, such as Nanopolish and Medaka (github.com/nanoporetech/medaka), that can handle the unique Nanopore sequencing error profile even at low read depth [43]. Further, structural variant calling including deletions, inversions, tandem duplications, insertions, transpositions, and translocations from Nanopore sequencing data also requires specialised tools [44]. Another important limitation of our study is that we did not utilize the full potential of Nanopore sequencing regarding long read sequencing. By using long reads and tiling amplicon sequencing, it should be possible to sequence the whole gene without the need to amplify individual exons. While providing the same diagnostic information, this approach would simplify the protocol and reduce the variability in read depth distribution. Further, prior to clinical application a standardized workflow for sample processing is required.
In the future, in addition to modern bioinformatic data analysis tools, recently announced innovations in nanopores and sequencing chemistry (R10.4 flow cells and Q20+ sequencing chemistry) that increase raw read accuracy may further improve the performance of Nanopore sequencing for variant identification [45]. Furthermore, they may enable competitive use compared to other NGS technologies. As mentioned earlier, Nanopore sequencing is especially attractive compared to other technologies like Illumina sequencing, Ion Torrent sequencing or PacBio sequencing due to its fast processing time, lower costs, and ability to generate long reads [45,46]. In summary, the results of our study show that state-of-the-art Nanopore sequencing in combination with a dedicated data analysis pipeline has a comparable performance to conventional Sanger sequencing for diagnostic SNP genotyping by amplicon sequencing in a clinical setting. Given continuous technological improvements, and after further in-depth clinical validation, this sequencing technique could be applied in clinical genomics and simplify diagnostic workflows in the future.

S1 Fig. Screenshot from IGV showing the MEFV region in which the SNP was solely identified by Nanopore sequencing (red box) in two clinical samples.

As Sanger sequencing shows a poor sequence quality at the start and end of a read, this region cannot be sequenced properly by using the routine Sanger sequencing workflow (MF-9-3 = forward sequencing primer Exon 9/10; MF-10-6 = reverse sequencing primer Exon 9/10). By sequencing an additional amplicon, which spans the region containing the variant, it was possible to confirm the transversion in both samples by Sanger sequencing as well (MF-10-2 = forward sequencing primer 3’ UTR). (TIF)

S2 Fig. Electropherograms from the Sanger sequencing runs of an amplicon spanning the relevant region of the 3’ UTR.

The transversion from guanine to thymine at genomic position 3293090 is clearly visible in sample 25 (A) and sample 26 (B). (TIFF)

S1 Table. Specific primers used for the amplification of the targets within the MEFV gene.

(DOCX)

S2 Table. PCR reaction mixes used for amplification of the targets within MEFV.

(DOCX)

S3 Table. PCR reaction programs used for the amplification of the targets within the MEFV gene.

(DOCX)

S4 Table. Sequencing primers which were used to sequence the individual amplicons by Sanger sequencing.

The final concentration in the reaction mix was 5 μM. (DOCX)

S5 Table. Overview of the barcode assignment which was used to sequence the clinical samples on a MinION sequencing device.

47 samples were sequenced in four individual runs applying two R9.4.1 flow cells. (DOCX)

20 Jan 2022

PONE-D-21-38344

Genotyping of familial Mediterranean fever gene (MEFV)- single nucleotide polymorphism - comparison of Nanopore with conventional Sanger sequencing

PLOS ONE

Dear Dr. Roggenbuck,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process. Please submit your revised manuscript by Mar 06 2022 11:59PM.

Kind regards,
J Francis Borgio, Ph.D., Academic Editor, PLOS ONE

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.

2. You indicated that you had ethical approval for your study. In your Methods section, please ensure you have also stated whether you obtained consent from parents or guardians of the minors included in the study or whether the research ethics committee or IRB specifically waived the need for their consent.

3. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

Additional Editor Comments: Well organized MS, Kindly make sure the consistency in the methods.

Reviewers' comments:

1. Is the manuscript technically sound, and do the data support the conclusions? Reviewer #1: Yes. Reviewer #2: Partly.

2. Has the statistical analysis been performed appropriately and rigorously? Reviewer #1: Yes. Reviewer #2: Yes.

3. Have the authors made all data underlying the findings in their manuscript fully available?
The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified. Reviewer #1: Yes Reviewer #2: Yes ********** 4. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here. Reviewer #1: Yes Reviewer #2: Yes ********** 5. Review Comments to the Author Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters) Reviewer #1: Entitled: Genotyping of familial Mediterranean fever gene (MEFV) - single nucleotide polymorphism - comparison of Nanopore with conventional Sanger sequencing Summary: The manuscript is well written; good structured and presents the data in a comprehensive manner. It is a short, straightforward, convincing paper that provides the empirical data to move forward with using portable MinION nanopore sequencer to accurately SNP genotype Familial Mediterranean fever gene (MEFV) by amplicon sequencing using a dedicated and tested bioinformatic and data/analysis pipeline. The approach will be useful for those in the field to begin to adopt MinION sequencing. Overall Impact: This paper provides the 'bridge' to go from current practice to reliable SNP genotyping by amplicon sequencing using nanopore sequencing technology. 
Major Strengths: It is simple in approach and scope, and convincing. It also provides the workflow and software needed to reproduce their analysis.

Major Weaknesses: No major weaknesses.

Line 64. It would be useful to add the value of the error rate and a suitable citation, as it may help readers who are not familiar with how high the Nanopore sequencing error rate is. It might be even better if it is compared to Sanger sequencing, Illumina, and PacBio sequencing platforms. Adding one to two sentences would be sufficient.

Line 135-137. Please state how you pooled your samples, how many runs you had and on how many MinION flow cells.

Line 145. The quality and resolution of Figure 1 should be improved. It would be visually nicer to have the same size of boxes for all steps.

Line 186. Figure 2 can be improved; please improve cDNA labels, they should be uniformly positioned, either in a vertical or skewed fashion. Now they are overlapping and in some places it is not clear which cDNA label comes first (i.e. c.495C>A and c.442G>C). It is also not clear in this Figure how you compare Sanger and Nanopore data. Figure 2 would better suit at Line 202 instead of Figure 4 and can completely be removed from line 186. In that case, you will need to renumber/reorder all figures.

Line 202. Figure 4 is not really informative. It is already clear from the text that both Sanger and Nanopore sequencing identified all 433 SNPs. The graph is not showing or clarifying anything else. Not sure if this Figure is really needed in the main text.

Line 236. I think it should be d) and not e)?

Line 282. Add approximate value for capital costs in Table 2. Here you have a good opportunity to help readers and show the big difference in costs between Sanger and Nanopore instruments.

Line 299. What other technologies, be more specific (e.g. sequencing technologies, NGS technologies, or similar). Perhaps be specific again about exactly which technologies you are comparing the nanopore sequencer to.
Reviewer #2: General commentary: Summary: This paper assesses the accuracy of SNP genotyping in the coding region of the MEFV gene using the hand-held MinION device. Amplicon sequencing data for patient samples produced via nanopore sequencing were compared to the gold-standard for SNP genotyping (i.e., Sanger sequencing). The resultant data and subsequent analyses provide a bridge between current clinical practices and the accurate detection of disease-associated SNPs throughout the coding region of the MEFV gene using the MinION device to ultimately increase throughput and decrease cost. Major strengths: This manuscript is straightforward, well-structured and presents the data in a comprehensive manner. Moreover, the data analysis is simple in both approach and scope. If the automated pipeline developed herein is made accessible, this could be easily repeated in a clinical setting (by individuals that may not have extensive knowledge about nanopore data analysis). Major weaknesses: My main concern is that despite the clinical applicability of the research at hand, the significance of the results obtained are dampened by lack of consistency in the methods utilized (e.g., DNA input for PCR, equimolar amplicon pooling). Given that this study is largely focused on concordance of Sanger and nanopore sequencing, it is critical that the authors confirm the SNP in question using the current gold-standard. Line-by-line commentary Line 59: “This sequencing approach offers the advantages of real-time sequencing, ultra-long read length (average read length up to 10 kb), high throughput and low material requirements...” The authors mention low material requirement as an advantage of ONT sequencing but 7uL of each amplicon were used for Sanger (Line 116) and 10uL for nanopore (Line 127). This is counter to their claim here; can the authors please elaborate? 
Line 70: “This can be done either by testing for the most common mutations (targeted mutation analysis) or by sequencing of selected exons.” It would be helpful if the authors expanded upon the discussion surrounding current diagnostic techniques (e.g., method, number of SNPs assessed in a clinical setting) as well as the technological limitations in the introduction. Where does NGS (e.g., Illumina) fit in? Has it been used for the diagnosis of MEFV in a clinical setting? Line 74: “FMF is inherited autosomal recessive and results from point mutations (single substitutions) in the Mediterranean Fever (MEFV) gene.” Are all SNPs associated with FMF located within the exons of this gene? If not, please provide an explanation as to why the authors chose to focus exclusively on the exons (i.e., diagnostic value). Line 107: “PCR amplification of the MEFV target regions was performed stepwise in eight different PCR reactions...” How much DNA was used per reaction? Were samples quantified before amplification? Why or why not? This information is critical to the repeatability and reproducibility of this work. Lines 115 & 130: Question- Why were 2 different PCR purification methods used prior to Sanger (ExoSAP-IT) and nanopore sequencing (AMpure XP beads)? Line 134: “The libraries were prepared with an identical number of samples...” How many samples per flow cell? How did you pick this number? What was the DNA input per sample for library prep and how much of this was pooled prior to sequencing? Again, this information is critical for repeatability and reproducibility. Line 161: “Once the automated data analysis pipeline was complete...” Is the automated data analysis pipeline developed here available on github? Figure 2 & Table S1: Comment: It would be beneficial to include amplicon length. Table 2: 11 sequencing reactions must be performed to sequence all target regions with Sanger sequencing. 
Why are 11 sequencing reactions required when only 8 amplicons were generated via PCR? Line 249: “...it was possible to sequence the relevant regions of all MEFV exons with a very high read depth.” What do the results suggest about read depth requirements for accurate SNP genotyping? It is important to mention that this could be achieved by increasing sample per flow cell which would also decrease price per sample (maybe a future direction). However, the authors should also provide the number of samples were multiplexed in the first place. Line 263: “Nevertheless, because a comprehensive data-base research did not reveal any information about this transversion and we could not confirm its presence by Sanger sequencing...” While this is a valid explanation, supporting data is required. Would it be possible to confirm via Sanger sequencing using a different set of primers to produce an amplicon centered on the SNP in these two samples? This would significantly strengthen the results obtained and further highlight the advantages of nanopore sequencing. Line 295: Question: What benefits do R10.4 flowcells offer the region of interest? Line 271: Comment: A major flaw of this work is failure to harness the long-read capabilities of ONT sequencing platforms. Previous studies have demonstrated that amplification of the entire gene is in fact feasible. This would also bypass the need for multiple PCR reactions, reduce variability in amplicon coverage, and enabling phasing of the SNPs. Would this information provide valuable diagnostic and therapeutic value in a clinical setting? If so, these researchers should have used primers to generate overlapping amplicons that spanned the entire gene in PCR reactions containing the same amount of DNA at minimum. Because long reads are mentioned as an advantage of nanopore sequencing throughout, these points should be included as a limitation in the discussion section. 
The development of a standardized workflow for sample processing should also be mentioned as a requirement prior to clinical applicability to bypass these limitations.

********** 6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public. Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy. Reviewer #1: No Reviewer #2: No

Submitted filename: reviewer_comments_1-19-22.pdf

8 Feb 2022

Journal Requirements: J1: Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming.
The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf Reply J1: We checked the style requirements and adjusted the file naming accordingly. J2: You indicated that you had ethical approval for your study. In your Methods section, please ensure you have also stated whether you obtained consent from parents or guardians of the minors included in the study or whether the research ethics committee or IRB specifically waived the need for their consent. Reply J2: We thank the editors for this helpful comment and added the required statement to the Material & methods section. “For minor patients, written informed consent was obtained from the parents.” J3: We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide. Reply J3: Our data analysis pipeline as well as the results which were obtained from diagnostic Sanger sequencing are already publicly available from github (https://github.com/j4yo/MEFV-SNP-Genotyping-Pipeline). The Nanopore sequencing data is already uploaded to the European Nucleotide Archive and we can directly make it publicly available under accession number PRJEB49157 upon acceptance. Reviewer #1: R1-1: Line 64. It would be useful to add the value of the error rate and a suitable citation, as it may help readers who are not familiar with how high is the Nanopore sequencing error rate. 
It might be even better if it is compared to Sanger sequencing, Illumina, PacBio sequencing platforms. Adding one to two sentences would be sufficient. Reply 1-1: We thank the Reviewer for this helpful comment and added the following sentence to the Introduction after performing a comprehensive literature research on the error rates of different sequencing technologies: “Although this is a heterogenous measure, which is influenced by different parameters including sequencing instrument, sequencing protocol and sample type, Nanopore sequencing shows a distinct higher error rate (~6%) compared to PacBio sequencing (~1.5%), Illumina sequencing (~0.5%) and conventional Sanger sequencing (~0.001%) [15-19].” R1-2: Line 135-137. Please state how you pooled your samples, how many runs you had and on how many MinION flow cells. Reply 1-2: We appreciate the comment of the reviewer and added the missing information as follows. Further, we added an additional supplementary table (Tab. S5) to clarify the barcode assignment in each run. “The libraries were prepared with a total of 12 samples per library for each run to ensure a sufficient read count per sample and that the relative proportion of a single sample is comparable (S5 Table). DNA input per sample was 200fmol and 12.5fmol of each barcoded sample were pooled prior to sequencing. Sequencing was performed on a MinION sequencing device (ONT) for 6h using R9.4.1 flow cells (ONT). All samples were sequenced in four different runs using two flow cells. Prior to reuse, the flow cells were purged according to the manufacturer’s protocol using flow cell wash kits (EXP-WSH003) (ONT).” R1-3: Line 145. The quality and resolution of Figure 1 should be improved. It would be visually nicer to have the same size of boxes for all steps. Reply 1-3: We apologize for the poor figure quality. To solve this issue, we increased figure resolution and adjusted the size of the boxes. R1-4: Line 186. 
Figure 2 can be improved; please improve cDNA labels, they should be uniformly positioned, either in a vertical or skewed fashion. Now they are overlapping and in some places it is not clear which cDNA label comes first (i.e. c.495C>A and c.442G>C). It is also not clear in this Figure how you compare Sanger and Nanopore data. Figure 2 would better suit at Line 202 instead of Figure 4 and can completely be removed from line 186. In that case, you will need to renumber/reorder all figures.

Reply 1-4: We thank the reviewer for this helpful comment. To improve the appearance of Figure 2 (now Figure 3) we completely revised the cDNA labels and added coloring to clarify the comparison between Sanger and Nanopore data. Further, we changed the anchoring in the text and reordered the figures.

R1-5: Line 202. Figure 4 is not really informative. It is already clear from the text that both Sanger and Nanopore sequencing identified all 433 SNPs. The graph is not showing or clarifying anything else. Not sure if this Figure is really needed in the main text.

Reply 1-5: We appreciate the comment of the reviewer and removed Figure 4 to improve the clarity and structure of the manuscript.

R1-6: Line 236. I think it should be d) and not e)?

Reply 1-6: We thank the reviewer for this fine observation and corrected the table caption.

R1-7: Line 282. Add approximate value for capital costs in Table 2. Here you have a good opportunity to help readers and show the big difference in costs between Sanger and Nanopore instruments.

Reply 1-7: We thank the reviewer for this helpful comment and added the approximate value for capital costs, which include the sequencing device itself, computing units for data analysis as well as software based on available list prices.

R1-8: Line 299. What other technologies, be more specific (e.g. sequencing technologies, NGS technologies, or similar). Perhaps be specific again about exactly which technologies you are comparing the nanopore sequencer to.
Reply 1-8: We appreciate the comment of the reviewer and added Illumina sequencing, Ion Torrent sequencing and PacBio sequencing as comparators as well as a suitable citation. “As mentioned earlier, Nanopore sequencing is especially attractive compared to other technologies like Illumina sequencing, Ion Torrent sequencing or PacBio sequencing due to its fast processing time, lower costs, and ability to generate long reads. [44,45]” Reviewer #2: R2-1: Line 59: “This sequencing approach offers the advantages of real-time sequencing, ultra-long read length (average read length up to 10 kb), high throughput and low material requirements...” The authors mention low material requirement as an advantage of ONT sequencing but 7uL of each amplicon were used for Sanger (Line 116) and 10uL for nanopore (Line 127). This is counter to their claim here; can the authors please elaborate? Reply2-1: We appreciate the helpful comment of the reviewer. The 10µl input of each amplicon was chosen to keep the pooling procedure of the amplicons prior to barcoding as simple as possible. In general, the Nanopore sequencing protocol from Oxford Nanopore Technologies for native barcoding requires 100 - 200 fmol of DNA input. Considering a mean amplicon length of 567 bp this equals ~35 - ~70 ng DNA mass which we interpreted as low material requirement. However, to our knowledge for an amplicon length from 500 - 1000 bp Sanger sequencing requires only around 20ng of template. Therefore, to prevent misinterpretation, we removed the low material requirement statement from the introduction. R2-2: Line 70: “This can be done either by testing for the most common mutations (targeted mutation analysis) or by sequencing of selected exons.” It would be helpful if the authors expanded upon the discussion surrounding current diagnostic techniques (e.g., method, number of SNPs assessed in a clinical setting) as well as the technological limitations in the introduction. 
Where does NGS (e.g., Illumina) fit in? Has it been used for the diagnosis of MEFV in a clinical setting? Reply2-2: We thank the reviewer for this helpful recommendation and add the following text to the introduction: “According to expert consensus guidelines for the genetic diagnosis of hereditary recurrent fevers a minimum diagnostic screen should include clearly pathogenic variants which are frequently identified in patients [25]. For FMF this incorporates the exons 2, 3, 5 and 10 of MEFV or a set of nine variants [25]. While DNA sequencing is used in most laboratories for variant analysis, targeted approaches can also be applied by using PCR based or reverse-hybridization based assays [25]. However, these targeted approaches as well as conventional Sanger sequencing suffer from the technological limitation that only a comparably small genetic target range can be covered within a single run. To overcome this limitation, NGS can be applied to sequence gene panels including not only MEFV for the diagnosis of FMF but also genes which are associated with other periodic fever syndromes like mevalonate kinase deficiency (MKD, gene MVK), tumor necrosis factor receptor-associated periodic syndrome (TRAPS, gene TNFRSF1A) and cryopyrin-associated periodic syndrome (CAPS, gene NLRP3) [25,26].” R2-3: Line 74: “FMF is inherited autosomal recessive and results from point mutations (single substitutions) in the Mediterranean Fever (MEFV) gene.” Are all SNPs associated with FMF located within the exons of this gene? If not, please provide an explanation as to why the authors chose to focus exclusively on the exons (i.e., diagnostic value). Reply2-3: We thank the reviewer for this helpful comment. According to the current literature, so far MEFV is the only gene which is known to be associated with FMF. However, the complexity of the FMF genetic background can not be described by a single-gene recessive model. 
Therefore, it can not be excluded that multiple genes and environmental factors are involved. (Ozdogan et al., Presse Med. 2019 Feb;48:e61-e76.) We focused exclusively on the exons since this is a very well established procedure which is recommended by expert committee guidelines for the genetic diagnosis of FMF. (Shinar et al., Ann Rheum Dis. 2012 Oct;71(10):1599-605) To clarify this, we corrected the introduction as follows: “The disease is a clinical diagnosis and mainly characterized by recurrent fever and serositis, with amyloidosis being a severe complication in untreated individuals [22-24]. FMF is considered to be inherited autosomal recessive and is associated with point mutations (single substitutions) in the Mediterranean Fever (MEFV) gene [22,23]. “In this study, to evaluate the clinical performance of current Nanopore sequencing, we applied this sequencing technique in combination with a dedicated data analysis pipeline for SNP genotyping of selected regions of MEFV in 47 patients and validated the results against diagnostic Sanger sequencing as the gold standard method.” R2-4: Line 107: “PCR amplification of the MEFV target regions was performed stepwise in eight different PCR reactions...” How much DNA was used per reaction? Were samples quantified before amplification? Why or why not? This information is critical to the repeatability and reproducibility of this work. Reply2-4: We appreciate this helpful comment of the reviewer. The Prepito NA Body Fluid kit, which was used for DNA isolation from whole blood samples, is very well established in our lab for routine diagnostic protocols. Since whole blood samples show a high yield in DNA isolation from our experience and Sanger sequencing is only a qualitative technique, no DNA quantification step is performed prior to amplification during this protocol. However, based on our experience from previous method validation experiments, the typical yield is around 2.5µg (25 ng/µl in 100µl elution buffer). 
As stated in supplementary table S2 2.5µl template are used for the PCR reactions which is equivalent to around 60ng DNA per reaction. To clarify this, we added the expected yield after DNA isolation to the Material & methods section as well as the template mass to supplementary Table S2. R2-5: Lines 115 & 130: Question- Why were 2 different PCR purification methods used prior to Sanger (ExoSAP-IT) and nanopore sequencing (AMpure XP beads)? Reply2-5: We thank the reviewer for this thoughtful question. Due to German regulations, research lab areas are spatially and logistically separated from areas used for routine diagnostics in our institute. The ExoSAP-IT kits are well established and routinely used for our diagnostic Sanger sequencing runs. AMpure XP beads on the other side are recommended by Oxford Nanopore Technologies for Nanopore sequencing experiments. Since we received the unpurified amplicons as surplus material from routine diagnostics, we used an AMPure XP bead clean-up after pooling because this was the purification method available. R2-6: Line 134: “The libraries were prepared with an identical number of samples...” How many samples per flow cell? How did you pick this number? What was the DNA input per sample for library prep and how much of this was pooled prior to sequencing? Again, this information is critical for repeatability and reproducibility. Reply2-6: We appreciate the comment of the reviewer. One library was composed of 12 samples. We picked this number to achieve sufficient reads for each sample in any case (Based on a total read count of around 3 million reads per 6h run we calculated 250000 reads per sample). We have planned with this high read count per sample to allow extensive quality filtering during data analysis if needed. The DNA input per sample was 200fmol and 12.5fmol were pooled prior to sequencing. 
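The quantities quoted in Replies 2-1, 2-4 and 2-6 follow from simple arithmetic, which the short sketch below reproduces. It is an illustration only and not part of the authors' pipeline: the function names are hypothetical, and the molar-mass formula for double-stranded DNA (about 617.96 g/mol per base pair plus 36.04 g/mol) is a commonly used approximation that the authors do not state explicitly.

```python
# Illustrative checks of the figures quoted in the replies (not the authors' code).

# Assumption: approximate molar mass of double-stranded DNA,
# ~617.96 g/mol per base pair plus 36.04 g/mol for the ends.
BP_MOLAR_MASS = 617.96
END_MOLAR_MASS = 36.04


def fmol_to_ng(amount_fmol: float, length_bp: int) -> float:
    """Convert an amount of dsDNA (fmol) of a given length to a mass in ng."""
    molar_mass = length_bp * BP_MOLAR_MASS + END_MOLAR_MASS  # g/mol
    return amount_fmol * 1e-15 * molar_mass * 1e9  # mol * g/mol = g, then g -> ng


def reads_per_sample(total_reads: int, samples_per_library: int) -> float:
    """Expected reads per sample if all barcodes are equally represented."""
    return total_reads / samples_per_library


def template_mass_ng(conc_ng_per_ul: float, volume_ul: float) -> float:
    """DNA mass added to a PCR reaction from concentration and volume."""
    return conc_ng_per_ul * volume_ul


if __name__ == "__main__":
    # Reply 2-1: 100-200 fmol of an amplicon with a mean length of 567 bp
    print(round(fmol_to_ng(100, 567), 1), round(fmol_to_ng(200, 567), 1))
    # Reply 2-6: about 3 million reads per 6 h run, 12 samples per library
    print(reads_per_sample(3_000_000, 12))
    # Reply 2-4: 2.5 µl template at a typical 25 ng/µl
    print(template_mass_ng(25, 2.5))
```

Under these assumptions, 100 to 200 fmol of a 567 bp amplicon corresponds to roughly 35 to 70 ng, 3 million reads spread over 12 samples gives 250,000 reads per sample, and 2.5 µl of template at 25 ng/µl gives 62.5 ng, which the authors round to about 60 ng.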
To clarify this, we added the following to the Material & methods section: “The libraries were prepared with a total of 12 samples per library for each run to ensure a sufficient read count per sample and that the relative proportion of a single sample is comparable (S5 Table). DNA input per sample was 200fmol and 12.5fmol of each barcoded sample were pooled prior to sequencing.” R2-7: Line 161: “Once the automated data analysis pipeline was complete...” Is the automated data analysis pipeline developed here available on github? Reply2-7: As stated in the data availability statement the data analysis pipeline is available from github as a Shell script, which can be applied by experienced users (https://github.com/j4yo/MEFV-SNP-Genotyping-Pipeline). R2-8: Figure 2 & Table S1: Comment: It would be beneficial to include amplicon length. Reply2-8: We thank the reviewer for this helpful comment and included the amplicon length in Table S1 as well as a reference in the figure caption of Figure 3 (previously Figure 2). R2-9: Table 2: 11 sequencing reactions must be performed to sequence all target regions with Sanger sequencing. Why are 11 sequencing reactions required when only 8 amplicons were generated via PCR? Reply2-9: We thank the reviewer for this question. To completely resolve the relevant regions and acquire highest diagnostic accuracy the amplicon spanning exon 2 is sequenced in three different sequencing reactions using three different sequencing primers. Further, for the amplicon spanning the relevant regions of exon 9/10 it is necessary to perform two different sequencing reactions with two different primers. To clarify this, we added the following comment to Table 2. 
“To achieve the highest diagnostic accuracy, 11 sequencing reactions must be performed to sequence all target regions with Sanger sequencing, since amplicons 2 and 8 are sequenced in three and two sequencing reactions, respectively.”

R2-10: Line 249: “...it was possible to sequence the relevant regions of all MEFV exons with a very high read depth.” What do the results suggest about read depth requirements for accurate SNP genotyping? It is important to mention that this could be achieved by increasing sample per flow cell which would also decrease price per sample (maybe a future direction). However, the authors should also provide the number of samples were multiplexed in the first place.

Reply2-10: We thank the reviewer for this helpful comment. Unfortunately, based on our data it is difficult to make a statement on the minimum read depth required for accurate SNP genotyping. During initial data analysis we tried subsampling to address this question. However, due to the inhomogeneous read depth distribution over the different amplicons this did not lead to meaningful results. According to the literature, a read depth of >100x could be sufficient to accurately call SNPs (Leija-Salazar et al., Mol Genet Genomic Med. 2019 Mar;7(3):e564). In our case this would allow a much higher degree of multiplexing. To clarify this in the manuscript, we added the following text to the Discussion: “According to Leija-Salazar et al. a read depth of >100x could be sufficient for accurate variant identification by Nanopore sequencing [10]. Such a threshold would remarkably increase the possible degree of multiplexing in our experimental design.
However, due to the inhomogeneous read depth distribution between different amplicons we were not able to evaluate this accurately by subsampling of the data.”

R2-11: Line 263: “Nevertheless, because a comprehensive data-base research did not reveal any information about this transversion and we could not confirm its presence by Sanger sequencing...” While this is a valid explanation, supporting data is required. Would it be possible to confirm via Sanger sequencing using a different set of primers to produce an amplicon centered on the SNP in these two samples? This would significantly strengthen the results obtained and further highlight the advantages of nanopore sequencing.

Reply2-11: We thank the reviewer for this major comment. To confirm the transversion, we designed an additional set of primers to produce an amplicon that spans the region of interest and sequenced it by Sanger sequencing. We could thereby confirm the transversion in both samples. The primer sequences as well as the PCR protocols were added to supplementary Tables S1, S2, S3 and S4. Further, the electropherograms generated by Sanger sequencing were added to the supplementary material (Figure S2) and we updated supplementary Figure S1. We added the following text to the Results and Discussion sections, respectively:

“By sequencing an additional amplicon, spanning this region, it was possible to confirm the transversion in both samples also by Sanger sequencing (S2 Fig).”

“By sequencing an additional amplicon, spanning the relevant region of the 3’ UTR, we were able to confirm the transversion in both patients also by Sanger sequencing. Taking these additional results into account, our data show a complete agreement between Nanopore and Sanger sequencing. Nevertheless, a comprehensive database search did not reveal any information about the clinical relevance of this transversion.
Since the initial diagnostic Sanger sequencing runs did not identify this variant, the corresponding variant calls were treated as false positive in the calculation of performance measures.”

R2-12: Line 295: Question: What benefits do R10.4 flow cells offer for the region of interest?

Reply2-12: R10 nanopores are designed to improve homopolymer performance and thus the consensus accuracy. We assume that they would not offer a large benefit for our region of interest. Further, at the moment this new pore chemistry has the disadvantages of a higher required input and a lower output compared to conventional R9.4.1 flow cells. However, in our opinion it is an important development which might increase the overall performance of Nanopore sequencing in the future.

R2-13: Line 271: Comment: A major flaw of this work is the failure to harness the long-read capabilities of ONT sequencing platforms. Previous studies have demonstrated that amplification of the entire gene is in fact feasible. This would also bypass the need for multiple PCR reactions, reduce variability in amplicon coverage, and enable phasing of the SNPs. Would this information provide valuable diagnostic and therapeutic value in a clinical setting? If so, these researchers should have used primers to generate overlapping amplicons that spanned the entire gene in PCR reactions containing the same amount of DNA at minimum. Because long reads are mentioned as an advantage of nanopore sequencing throughout, these points should be included as a limitation in the discussion section. The development of a standardized workflow for sample processing should also be mentioned as a requirement prior to clinical applicability to bypass these limitations.

Reply2-13: We thank the reviewer for this major comment. As mentioned by the reviewer, sequencing of the whole gene by using long reads would simplify the protocol because a single amplification/enrichment step should be sufficient.
However, since FMF is a clinical diagnosis that is corroborated by sequencing, and only certain exons should be covered during genetic screening as proposed by the expert committee, in our opinion this would not provide a major diagnostic benefit in a clinical setting. Furthermore, when we designed the study we were not sure whether the accuracy of current Nanopore sequencing would be sufficient for accurate SNP genotyping at all. We therefore decided to use a very well characterized reference to evaluate the performance of Nanopore sequencing in a clinical setting. We added the following section to the Discussion of the manuscript: “Another important limitation of our study is that we did not utilize the full potential of Nanopore sequencing regarding long read sequencing. By using long reads and tiling amplicon sequencing, it should be possible to sequence the whole gene without the need to amplify individual exons. While providing the same diagnostic information, this approach would simplify the protocol and reduce the variability in read depth distribution. Further, prior to clinical application a standardized workflow for sample processing is required.”

Submitted filename: Response_to_Reviewers.docx

7 Mar 2022

Genotyping of familial Mediterranean fever gene (MEFV)- single nucleotide polymorphism - Comparison of Nanopore with conventional Sanger sequencing

PONE-D-21-38344R1

Dear Dr. Roggenbuck,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication. An invoice for payment will follow shortly after the formal acceptance.
To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible, no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,
J Francis Borgio, Ph.D.,
Academic Editor
PLOS ONE

Additional Editor Comments (optional): Revised MS can be accepted

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed
Reviewer #2: All comments have been addressed

2. Is the manuscript technically sound, and do the data support the conclusions? The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes

3. Has the statistical analysis been performed appropriately and rigorously?
Reviewer #1: Yes
Reviewer #2: Yes

4. Have the authors made all data underlying the findings in their manuscript fully available? The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes

5. Is the manuscript presented in an intelligible fashion and written in standard English? PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes

6. Review Comments to the Author. Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: Thank you for thoroughly addressing the review comments; I look forward to seeing this work published.
Reviewer #2: (No Response)

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files. If you choose “no”, your identity will remain anonymous but your review may still be made public.
Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No

9 Mar 2022

PONE-D-21-38344R1

Genotyping of familial Mediterranean fever gene (MEFV)- single nucleotide polymorphism - Comparison of Nanopore with conventional Sanger sequencing

Dear Dr. Roggenbuck:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Dr. J Francis Borgio
Academic Editor
PLOS ONE
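
Two quantitative points in the review exchange above can be made concrete with a little arithmetic: the performance measures in which the two initially unconfirmed calls were counted as false positives (Reply2-11), and the multiplexing headroom implied by a >100x read depth threshold (Reply2-10). The following Python sketch is illustrative only. The counts of 433 true positives and 2 false positives are an assumption derived from the 435 reported SNP calls and the two-sample transversion, not figures stated by the authors. The headroom estimate naively assumes that read depth scales evenly across samples and amplicons, which the authors explicitly note is not the case. All function and class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CallCounts:
    """Variant-call counts of a test workflow compared to a reference method."""
    true_positive: int
    false_positive: int
    false_negative: int


def precision(c: CallCounts) -> float:
    """Fraction of reported calls that the reference confirms."""
    return c.true_positive / (c.true_positive + c.false_positive)


def recall(c: CallCounts) -> float:
    """Fraction of reference variants that the test workflow reports."""
    return c.true_positive / (c.true_positive + c.false_negative)


def f1_score(c: CallCounts) -> float:
    """Harmonic mean of precision and recall."""
    p, r = precision(c), recall(c)
    return 2 * p * r / (p + r)


def max_samples_per_run(samples_per_run: int, median_depth: int, min_depth: int) -> int:
    """Naive upper bound on multiplexing if depth were spread evenly.

    Ignores uneven amplicon coverage and barcode-kit limits.
    """
    return (samples_per_run * median_depth) // min_depth


# ASSUMPTION: illustrative counts, not taken from the manuscript's tables.
counts = CallCounts(true_positive=433, false_positive=2, false_negative=0)

print(f"precision = {precision(counts):.4f}")
print(f"recall    = {recall(counts):.4f}")
print(f"F1        = {f1_score(counts):.4f}")

# 12 samples per library and a median depth of 7565x are reported values;
# 100x is the threshold cited from Leija-Salazar et al.
print("naive sample limit:", max_samples_per_run(12, 7565, 100))
```

Under these assumed counts all three measures stay above 0.99, in line with the values reported in the abstract. The sample limit is a theoretical ceiling rather than a recommendation, since the amplicon with the lowest coverage, not the median, determines the usable depth.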

45 in total

1. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data.

Authors: Heng Li
Journal: Bioinformatics Date: 2011-09-08 Impact factor: 6.937

Review 2. Nanopore sequencing: Review of potential applications in functional genomics.

Authors: Nobuaki Kono; Kazuharu Arakawa
Journal: Dev Growth Differ Date: 2019-04-29 Impact factor: 2.053

3. Genetic panel screening in patients with clinically unclassified systemic autoinflammatory diseases.

Authors: Ferhat Demir; Özlem Akgün Doğan; Yasemin Kendir Demirkol; Kübra Ermiş Tekkuş; Sezin Canbek; Şerife Gül Karadağ; Hafize Emine Sönmez; Nuray Aktay Ayaz; Hamdi Levent Doğanay; Betül Sözeri
Journal: Clin Rheumatol Date: 2020-05-26 Impact factor: 2.980

4. Clinical evaluation of R202Q alteration of MEFV genes in Turkish children.

Authors: Elif Comak; Sema Akman; Mustafa Koyun; Cagla Serpil Dogan; Arife Uslu Gokceoglu; Yunus Arikan; Ibrahim Keser
Journal: Clin Rheumatol Date: 2014-04-10 Impact factor: 2.980

Review 5. The regulation of MEFV expression and its role in health and familial Mediterranean fever.

Authors: S Grandemange; I Aksentijevich; I Jeru; A Gul; I Touitou
Journal: Genes Immun Date: 2011-07-21 Impact factor: 2.676

6. NanoVar: accurate characterization of patients' genomic structural variants using low-depth nanopore sequencing.

Authors: Cheng Yong Tham; Roberto Tirado-Magallanes; Yufen Goh; Melissa J Fullwood; Bryan T H Koh; Wilson Wang; Chin Hin Ng; Wee Joo Chng; Alexandre Thiery; Daniel G Tenen; Touati Benoukraf
Journal: Genome Biol Date: 2020-03-03 Impact factor: 13.583

7. Sanger sequencing is no longer always necessary based on a single-center validation of 1109 NGS variants in 825 clinical exomes.

Authors: A Arteche-López; A Ávila-Fernández; R Romero; R Riveiro-Álvarez; M A López-Martínez; A Giménez-Pardo; C Vélez-Monsalve; J Gallego-Merlo; I García-Vara; Berta Almoguera; A Bustamante-Aragonés; F Blanco-Kelly; S Tahsin-Swafiri; E Rodríguez-Pinilla; P Minguez; I Lorda; M J Trujillo-Tiebas; C Ayuso
Journal: Sci Rep Date: 2021-03-11 Impact factor: 4.379

8. Sanger Validation of High-Throughput Sequencing in Genetic Diagnosis: Still the Best Practice?

Authors: Rosina De Cario; Ada Kura; Samuele Suraci; Alberto Magi; Andrea Volta; Rossella Marcucci; Anna Maria Gori; Guglielmina Pepe; Betti Giusti; Elena Sticchi
Journal: Front Genet Date: 2020-12-02 Impact factor: 4.599

9. Estimation of sequencing error rates in short reads.

Authors: Xin Victoria Wang; Natalie Blades; Jie Ding; Razvan Sultana; Giovanni Parmigiani
Journal: BMC Bioinformatics Date: 2012-07-30 Impact factor: 3.169

10. A Sample-to-Report Solution for Taxonomic Identification of Cultured Bacteria in the Clinical Setting Based on Nanopore Sequencing.

Authors: Stefan Moritz Neuenschwander; Miguel Angel Terrazos Miani; Heiko Amlang; Carmen Perroulaz; Pascal Bittel; Carlo Casanova; Sara Droz; Jean-Pierre Flandrois; Stephen L Leib; Franziska Suter-Riniker; Alban Ramette
Journal: J Clin Microbiol Date: 2020-05-26 Impact factor: 5.948