
The Brazilian Rare Genomes Project: Validation of Whole Genome Sequencing for Rare Diseases Diagnosis.

Antonio Victor Campos Coelho¹, Bruna Mascaro-Cordeiro¹, Danielle Ribeiro Lucon¹, Maria Soares Nóbrega¹, Rodrigo de Souza Reis¹, Rodrigo Bertollo de Alexandre¹, Livia Maria Silva Moura¹, Gustavo Santos de Oliveira¹, Rafael Lucas Muniz Guedes¹, Marcel Pinheiro Caraciolo¹, Nuria Bengala Zurro¹, Murilo Castro Cervato¹, João Bosco Oliveira¹.

Abstract

Rare diseases affect up to 13.2 million individuals in Brazil. The Brazilian Rare Genomes Project aims to further the implementation of genomic medicine in the Brazilian public healthcare system. Here we report the validation results of a whole genome sequencing (WGS) procedure for implementation in clinical laboratories. In addition, we report data quality for the first 1,200 real-world patients sequenced. We sequenced a well-characterized group of 76 samples, including seven gold standard genomes, using a PCR-free WGS protocol on Illumina NovaSeq 6000 equipment. We compared the observed variant calls with their expected calls, observing good concordance for single nucleotide variants (SNVs; mean F-measure = 99.82%) and indels (mean F-measure = 99.57%). Detection performance for copy number variants and structural variants was as expected (F-measures 96.6% and 90.3%, respectively). Our WGS protocol presented excellent intra-assay reproducibility (coefficients of variation ranging between 0.03% and 0.20%) and inter-assay reproducibility (coefficients of variation ranging between 0.02% and 0.09%). Limitations of the WGS protocol include the inability to confidently detect variants such as uniparental disomy, balanced translocations, repeat expansion variants, and low-level mosaicism. In summary, the observed performance of the WGS protocol was in accordance with that seen in the best centers worldwide. The Rare Genomes Project is an important initiative to bring pivotal improvements to the quality of life of the affected individuals.


Keywords: genetic diagnostic test; genomics; precision medicine; rare diseases; whole genome sequencing

Year: 2022 PMID: 35586190 PMCID: PMC9108541 DOI: 10.3389/fmolb.2022.821582

Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X

Introduction

Rare diseases represent a group of over 9,000 disorders affecting an estimated 114 to 470 million patients globally (1.5%–6.2% of the global population) (Ferreira, 2019). Rare diseases with genetic etiology are the leading cause of death in children, and the diagnosis is challenging (Lionel et al., 2018). Early genetic testing leads to clear benefits by reducing the time until diagnosis, leading to a better choice of therapeutic interventions, improving couples’ confidence in having children again, and reducing healthcare costs (Lionel et al., 2018). The human genome was first mapped through the Human Genome Project (HGP), an extensive international collaboration over 13 years (Lander et al., 2001; Venter et al., 2001). Essential advances in sequencing technology, such as the development of next-generation sequencing (NGS), have enabled the sequencing of a complete genome within hours, at a fraction of the initial cost, which resulted in the generation of a large amount of data and a widespread application for diagnosis and research (Wetterstrand, 2020). NGS encompasses several approaches: whole genome (WGS), whole exome (WES), and targeted (panel) sequencing. With WGS, it is possible to read nearly all of the approximately three billion base pairs of the human genome (Nagarajan and Pop, 2013). The falling cost, increasing ease of application, and comprehensive nature of WGS make it the ideal tool for routine use in rare disease diagnosis. WGS can overcome many of the technical limitations of other NGS approaches, including uneven coverage and low sensitivity for the detection of copy number, structural, and repeat expansion variants (Belkadi et al., 2015). In addition, it enables the identification of noncoding and mitochondrial variants (Bick et al., 2019). In fact, many studies have shown that WGS has a high diagnostic yield and that early molecular diagnosis improves outcomes and reduces healthcare costs (Vissers et al., 2017; Howell et al., 2018).
The WGS workflow can be divided into three major steps: wet laboratory sample processing, bioinformatics analyses for variant calling and annotation, and correlation of the clinical and molecular findings, resulting in a medical report. The implementation of WGS in clinical laboratories thus requires critical assay design, validation, and implementation of quality control measures according to specific guideline recommendations to ensure adequate performance before use in diagnostic routine (Barra et al., 2018; Marshall et al., 2020). The Brazilian Rare Genomes Project aims to further the implementation of genomic medicine in the Brazilian national public healthcare system (SUS), complementing current policies and significantly improving the diagnostic capacity for rare disorders. Moreover, as Brazilian populations have high genetic diversity and are underrepresented in ancestry and human genetic variation databases such as 1000 Genomes (The 1000 Genomes Project Consortium, 2015), a secondary objective of the Brazilian Rare Genomes Project is to assess the ancestry of the participants, improving precision medicine in the country. Here we report the results of the development and validation of a PCR-free WGS protocol for clinical use in the project, including wet-lab workflow and bioinformatics pipelines. In addition, we document the protocol performance in the first 1,200 samples sequenced.

Material and Methods

Sample Selection and Test Scope

Our validation dataset was composed of 76 samples (Supplementary Table S1). Among them, 22 were international reference samples purchased from Coriell Life Sciences (Philadelphia, PA, United States) for benchmarking and validation, including seven reference samples from Genome in a Bottle Consortium (GiaB) (12). The remaining 54 were samples previously characterized by other methodologies: conventional Sanger sequencing, single nucleotide polymorphism (SNP) array, array comparative genomic hybridization (aCGH), conventional karyotyping, or fluorescence in-situ hybridization (FISH). We intended to detect and report single nucleotide variants (SNVs), insertions/deletions (indels), copy number variants (CNVs; large deletions and duplications, chromosomal aneuploidy), and structural variants (SVs; inversions, translocations), as well as mitochondrial SNVs. Repeat expansions and mosaicism were not included in the scope of this first phase of validation. Samples were sequenced across three independent workflows (library preparation → sequencing → data analysis). We selected two benchmark samples to assess reproducibility: the reference sample NA24385 was run in replicate within one workflow for intra-assay reproducibility evaluation, whereas NA24694 was included in all three workflows for inter-assay reproducibility evaluation. Different operators independently performed the workflows. All methodological procedures were performed in the Hospital Israelita Albert Einstein CLIA/CAP-accredited laboratories (Aziz et al., 2015).

Research Ethics Statement

This study adhered to the Declaration of Helsinki principles for research in human beings and was approved by the Hospital Israelita Albert Einstein’s Research Ethics Committee (São Paulo, Brazil. Protocol number: CAAE 29567220.4.1001.0071). All individuals provided written consent for WGS testing and use in research, since the Brazilian Rare Genomes Project will make variant and summary-level data available for public use through periodic submissions to databases such as ClinVar and Matchmaker Exchange.

DNA Extraction, Quantification, and Fragmentation

DNA was extracted from whole blood samples using the QIAsymphony DNA Mini Kit on the QIAsymphony automated system (both Qiagen, Valencia, CA, United States). The extracted DNA was eluted into a final volume of 90 µl with an elution buffer. Genomic DNA purity was evaluated using NanoDrop 2000 (thresholds: 260/280 ratio ≈1.8 and 260/230 ratio between 1.8 and 2.2). DNA quantification was performed with the Qubit® 4 fluorometer using the Qubit® dsDNA HS assay (both Life Technologies, Carlsbad, CA, United States). Samples that did not reach the desired quality were rejected, and a new sample collection was requested. Genomic DNA was fragmented into 350 bp inserts with a Covaris ME220 ultrasonicator (Covaris, Woburn, MA, United States), using the following treatment settings: DNA input, 1 µg (final volume of 55 µl, completed with resuspension buffer); peak incident power, 50 W; duty factor, 20%; cycles per burst, 200; duration, 65 s; and temperature set point, 20 °C. The identity of all specimens remained unknown to wet lab staff throughout the workflow.

Whole Genome Sequencing Library Preparation

The paired-end sequencing libraries were prepared from 50 µl of the fragmented DNA solution (1 µg DNA final) with the Illumina TruSeq® DNA PCR-Free Library Prep HS reagent kit for whole genome sequencing (Illumina Inc., San Diego, CA, United States), following the manufacturer’s instructions. Briefly, the protocol steps were: 1) cleanup of fragmented DNA, 2) end repair and library size selection, 3) removal of large DNA fragments, 4) removal of small DNA fragments, 5) 3′-end adenylation, 6) adapter ligation, and 7) cleanup of non-ligated fragments.

Library Quality Control

For quality control of adapter-ligated fragment sizes, libraries were diluted 1:5 with water, and 2 µl were evaluated in the automated electrophoresis analysis TapeStation System, with D1000 High Screen Tape (Agilent Technologies, Santa Clara, CA, United States). High-quality (ideal) libraries displayed only one peak around 900 bp (equivalent to ≈470 bp fragments due to the forked structures of adapter-ligated fragments) and peak molarity ≥300 pM. Good-quality libraries had peak molarity between 200 and 300 pM. Libraries with peak molarity ≤200 pM were rejected, and preparation was repeated.
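The molarity cut-offs above amount to a three-way decision rule. The following is a minimal sketch of that rule, assuming the peak molarity is given in picomolar; the function name and the category labels are hypothetical and are not part of the published protocol.

```python
def classify_library(peak_molarity_pm: float) -> str:
    """Classify a library by its TapeStation peak molarity (picomolar).

    Thresholds follow the text: >=300 is ideal, between 200 and 300 is
    good, and <=200 is rejected (library preparation is repeated).
    """
    if peak_molarity_pm >= 300.0:
        return "ideal"
    if peak_molarity_pm > 200.0:
        return "good"
    return "rejected"


# Hypothetical readings for three libraries.
for reading in (350.0, 250.0, 180.0):
    print(reading, classify_library(reading))
```

The single-peak criterion around 900 bp is not modelled here, since it depends on the electropherogram rather than on one number.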

Library Pooling and Quantification

We optimized the protocol for pooling a maximum of 28 sample libraries for sequencing on each NovaSeq® 6000 S4 flow cell. Briefly, each library was quantified with the Qubit® 4 fluorometer and then normalized to 7 nM in a final volume of 11 µl. Then, we pooled the 28 libraries into a final volume of 308 µl (28 × 11 = 308 µl). Next, starting with 5 µl of the pooled solution as input, we performed two serial dilutions in resuspension buffer (1:10 followed by 1:100, reaching a final 1:1,000 dilution). Four µl of the diluted pooled solution were used for real-time quantitative polymerase chain reaction (qPCR) on the ABI 7500 real-time platform (Thermo Fisher Scientific, Waltham, MA, United States) using the KAPA Library Quantification Kit (Roche, Pleasanton, CA, United States). The qPCR was performed in triplicate. In each qPCR run, six KAPA DNA standards with defined concentrations were included to produce a standard quantification curve. With the mean cycle threshold (CT) of the diluted samples, we calculated the concentration of the pooled library solutions via linear regression while correcting for the size difference between the KAPA standards and the adapter-ligated fragments (452 bp versus 470 bp). Each pooled library was then normalized to a final concentration of 3 nM. The pooled libraries were then spiked with 1.9 µl of 2.5 nM PhiX Control v3 reagent (Illumina Inc., San Diego, CA, United States). The pooled libraries were then denatured with 77 µl of fresh 0.2 N NaOH solution, followed by homogenization by vortex (1800 RPM for 1 min), centrifugation at 280 × g for 1 min, and incubation at room temperature for 8 min. Then, 78 µl of 400 mM Tris-HCl (pH 8.0) solution was added to the library pool to neutralize the NaOH. Once again, the pooled library solution was homogenized by vortex (1800 RPM for 1 min) and centrifuged at 280 × g for 1 min.
The total volume (466.9 µl) of the PhiX-spiked denatured library pool solution was then transferred to a NovaSeq® 6000 Reagent Kit tube for sequencing.
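The qPCR quantification described above can be sketched as a standard-curve calculation. This is an illustration under stated assumptions, not the laboratory's actual calculation: the standard concentrations and CT values below are synthetic placeholders, the function names are hypothetical, and the curve is assumed to be linear in log10 concentration. Only the 1:1,000 dilution and the 452 bp versus 470 bp size correction come from the text.

```python
import math


def fit_standard_curve(standard_concs_pm, standard_cts):
    """Least-squares fit of CT = slope * log10(concentration) + intercept."""
    xs = [math.log10(c) for c in standard_concs_pm]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(standard_cts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, standard_cts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pool_concentration_nm(mean_ct, slope, intercept,
                          dilution_factor=1000.0,
                          standard_bp=452.0, library_bp=470.0):
    """Convert the mean CT of the diluted pool into the undiluted pool
    concentration (nM), correcting for standard versus library size."""
    diluted_pm = 10 ** ((mean_ct - intercept) / slope)
    size_adjusted_pm = diluted_pm * (standard_bp / library_bp)
    undiluted_pm = size_adjusted_pm * dilution_factor
    return undiluted_pm / 1000.0  # pM to nM


# Synthetic placeholders (assumptions): six ten-fold standards and an
# idealized curve with slope -3.3 and intercept 12.
standards_pm = [20.0, 2.0, 0.2, 0.02, 0.002, 0.0002]
standard_cts = [12.0 - 3.3 * math.log10(c) for c in standards_pm]
slope, intercept = fit_standard_curve(standards_pm, standard_cts)

# A diluted pool reading that corresponds to 7 pM before size correction.
sample_ct = 12.0 - 3.3 * math.log10(7.0)
pool_nm = pool_concentration_nm(sample_ct, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, pool={pool_nm:.3f} nM")
```

In practice the mean CT would be the average of the triplicate wells, and the fitted concentration would then drive the normalization of the pool to 3 nM.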

Sequencing

We performed sequencing on the NovaSeq® 6000 platform using S4 flow cells with 300 cycles (150 for forward reads and 150 for reverse reads). Usually, each flow cell carried 28 pooled samples as described above, and both available flow cells were used in each sequencing round (56 samples per run in total). Desirable sequencing quality metrics were clusters passing filter >70% and flow cell occupancy >70%.

Bioinformatics Pipeline and Quality Control Metrics

The raw sequencing files (base call file, BCL format) were converted to FASTQ format and demultiplexed in a single step using Illumina’s bcl2fastq program (Illumina Inc., 2019). Illumina’s DRAGEN pipeline version 3.6.3 was used to perform all alignment and variant calling (SNVs, indels, CNVs, SVs) steps. Quality control metrics are provided during each DRAGEN run. Desirable alignment quality metrics were: percentage of bases with a quality score of Q30 or higher >90%, minimum coverage of 20X for both the whole genome and the autosomes, uniformity of coverage ≥80%, median insert size >300 bp, percentage of mapped reads >98%, percentage of chimeric (supplementary) reads <5%, DNA contamination ≤2%, and percentage of mapped reads marked as duplicates <10%. Some of these thresholds were adopted from recommendations published elsewhere (Marshall et al., 2020). The DRAGEN-generated Variant Call Format (VCF) files were validated to ensure they had the correct format, and sample- and variant-specific quality metrics were also calculated. Each sample was assessed to ensure that the percent autosomal callability was >95%, as suggested elsewhere (Marshall et al., 2020). High-quality variants were those that passed the Variant Quality Score Recalibration (VQSR) filter; had read depth (RD) ≥ 10 and genotype quality (GQ) ≥ 20 in at least 80% of the individuals in the sample; had their alternative alleles present in at least one individual with RD ≥ 10 and GQ ≥ 20; and were not located in regions with high multiallelic variation (more than four alleles, including the non-pseudoautosomal regions of the X and Y chromosomes). Functional annotation of the variants was performed with a proprietary tool, Varstation (https://varsomics.com/varstation/), developed by Hospital Israelita Albert Einstein (HIAE). The VCF files were uploaded into the service, whose workflow is based on ANNOVAR (Wang et al., 2010).
The variants were then classified according to international good practices for genetic variant analysis and the guidelines from the American College of Medical Genetics (ACMG) (Richards et al., 2015) and the Association for Molecular Pathology (AMP) (Li et al., 2017).
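The sample-level thresholds listed in this section can be expressed as a small lookup table and checked programmatically. The sketch below is only illustrative: the metric keys are hypothetical names chosen for readability, not DRAGEN output fields, and the example values are invented.

```python
import operator

# (hypothetical metric key, comparison, threshold), following the
# desirable values stated in the text.
QC_THRESHOLDS = [
    ("pct_bases_q30", operator.gt, 90.0),
    ("genome_coverage_x", operator.ge, 20.0),
    ("autosome_coverage_x", operator.ge, 20.0),
    ("uniformity_pct", operator.ge, 80.0),
    ("median_insert_size_bp", operator.gt, 300.0),
    ("pct_mapped_reads", operator.gt, 98.0),
    ("pct_chimeric_reads", operator.lt, 5.0),
    ("contamination_pct", operator.le, 2.0),
    ("pct_duplicate_reads", operator.lt, 10.0),
    ("autosomal_callability_pct", operator.gt, 95.0),
]


def failed_qc(metrics: dict) -> list:
    """Return the names of metrics that are missing or miss their threshold."""
    failures = []
    for name, compare, threshold in QC_THRESHOLDS:
        value = metrics.get(name)
        if value is None or not compare(value, threshold):
            failures.append(name)
    return failures


# Invented example values for one sample.
example = {
    "pct_bases_q30": 92.6,
    "genome_coverage_x": 38.9,
    "autosome_coverage_x": 39.2,
    "uniformity_pct": 96.3,
    "median_insert_size_bp": 410.0,
    "pct_mapped_reads": 99.1,
    "pct_chimeric_reads": 1.2,
    "contamination_pct": 0.01,
    "pct_duplicate_reads": 6.5,
    "autosomal_callability_pct": 96.3,
}
print(failed_qc(example))  # an empty list means every threshold was met
```

Treating a missing metric as a failure is a design choice of this sketch, made so that an incomplete report cannot pass silently.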

Data Analysis

The 76 samples were separated into two different analytical groups. The first group included seven sequencing libraries corresponding to GiaB benchmark samples (NA12878, NA24385, NA24149, NA24143, NA24631, NA24694, and NA24695). The second group included the remaining 69 samples, i.e., the remaining 15 Coriell reference samples and the 54 in-house characterized samples. The first group was analyzed by comparing the VCF files generated by our bioinformatics pipeline with the reference VCF files provided by GiaB (version NISTv3.3.2). Each sample had an accompanying BED file with the coordinates of the high-confidence regions. We performed the comparison with the vcfeval software (Real Time Genomics, Hamilton, New Zealand) (Cleary et al., 2015). Briefly, vcfeval quantifies the number of true positives (the variant call is present in both the reference file and our file), false positives (the variant call is absent in the reference file but present in our file), and false negatives (the variant call is present in the reference file but absent in our file). We then calculated the precision, sensitivity (recall), and F-measure from those numbers. Additionally, we stratified each file by SNV and indel coordinates. In this step, we calculated the mean of each metric mentioned above alongside 95% confidence intervals (95% CI). The samples in the second analytical group were evaluated manually to assess detection performance not only for SNVs but also for CNVs and SVs, by comparing the pipeline output with the in-house annotation or the reference annotation, depending on the sample origin. Supplementary Table S1 contains a list of the samples used, quality metrics, and a summary of expected and observed variant calls.
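The three metrics follow directly from the true positive, false positive, and false negative counts. The sketch below uses the standard definitions; the function name is hypothetical, and the example counts are taken from the deletions row of Table 3 (60 true positives and 5 false negatives, with zero false positives implied by the reported precision of 1.0000).

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, sensitivity (recall), and F-measure from call counts.

    Precision or sensitivity is None when its denominator is zero,
    mirroring the "Not calculated" entries in Table 3.
    """
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    sensitivity = tp / (tp + fn) if (tp + fn) > 0 else None
    # Harmonic mean of precision and sensitivity, written in terms of
    # counts so that it stays defined when precision is not.
    total = 2 * tp + fp + fn
    f_measure = 2 * tp / total if total > 0 else None
    return {
        "precision": precision,
        "sensitivity": sensitivity,
        "f_measure": f_measure,
    }


deletions = benchmark_metrics(tp=60, fp=0, fn=5)
print({name: round(value, 4) for name, value in deletions.items()})
```

Writing the F-measure as 2TP / (2TP + FP + FN) is algebraically the same as the harmonic mean of precision and sensitivity, and it returns 0 for an event type with no true positives, as in the trisomy 15 row.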

Results

Considering all workflows, the mean sequencing yield was 2.84 TB of data per S4 flow cell. The mean %Q30 score was 92.60% ± 1.36%, the mean genomic coverage was 38.96X ± 10.37X, and the mean uniformity was 96.31% ± 0.25%. Mean mitochondrial coverage was 7,650.97X ± 5,559.1X. Variant calls from our WGS procedure showed very high concordance with the reference samples. For SNVs, the mean F-measure (n = 7 reference GiaB samples) was 99.82% (95% CI = 99.44%–100.0%), whereas for indels of any length it was 99.57% (95% CI = 99.29%–99.85%) (Table 1, Supplementary Table S2, Supplementary Figure S1).
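The text does not state how the 95% intervals were derived. As an observation rather than a confirmed method, the bounds tabulated in Table 1 appear numerically consistent with the mean ± 1.96 × standard deviation, truncated to the [0, 1] range. The sketch below makes that assumption explicit; the function name is hypothetical, and last-digit differences against the table are expected because the tabulated means and standard deviations are rounded.

```python
def interval_from_sd(mean: float, sd: float, z: float = 1.96) -> tuple:
    """Return (lower, upper) as mean +/- z * sd, truncated to [0, 1].

    Assumption: this reproduces the Table 1 bounds numerically; the
    source does not confirm that this formula was used.
    """
    half_width = z * sd
    lower = max(0.0, mean - half_width)
    upper = min(1.0, mean + half_width)
    return lower, upper


# SNV sensitivity row of Table 1: mean 0.9979, standard deviation 0.0029.
lower, upper = interval_from_sd(0.9979, 0.0029)
print(round(lower, 4), round(upper, 4))
```

Note that an interval built from the standard deviation describes the spread of individual samples; an interval for the mean itself would divide the standard deviation by the square root of the number of samples.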

TABLE 1

Target	Metric	Mean	Standard deviation	95% CI lower bound	95% CI upper bound
SNVs	Precision	0.9986	0.0011	0.9965	1.0000
SNVs	Sensitivity	0.9979	0.0029	0.9922	1.0000
SNVs	F-measure	0.9982	0.0020	0.9944	1.0000
Indels, overall	Precision	0.9961	0.0008	0.9944	0.9977
Indels, overall	Sensitivity	0.9954	0.0021	0.9912	0.9995
Indels, overall	F-measure	0.9957	0.0014	0.9929	0.9985
Indels, 1 to 5 bp	Precision	0.9965	0.0008	0.9949	0.9981
Indels, 1 to 5 bp	Sensitivity	0.9961	0.0019	0.9923	0.9998
Indels, 1 to 5 bp	F-measure	0.9963	0.0013	0.9936	0.9989
Indels, 6 to 15 bp	Precision	0.9939	0.0015	0.9910	0.9968
Indels, 6 to 15 bp	Sensitivity	0.9916	0.0028	0.9861	0.9971
Indels, 6 to 15 bp	F-measure	0.9927	0.0021	0.9887	0.9968
Indels, ≥ 16 bp	Precision	0.9832	0.0055	0.9725	0.9939
Indels, ≥ 16 bp	Sensitivity	0.9795	0.0082	0.9634	0.9955
Indels, ≥ 16 bp	F-measure	0.9813	0.0036	0.9744	0.9883

Quality metrics. Seven Genome in a Bottle Consortium gold standard samples were whole-genome sequenced, and variant call was performed with our bioinformatics pipeline. The variant call files were then compared with the gold standard files using the vcfeval software. Precision, Sensitivity, and F-measure are displayed.

Our procedure worked best for small indels with length between one and five base-pairs (bp) (mean F-measure = 99.63%, 95% CI = 99.36%–99.89%). Six to 15 bp indels yielded mean F-measure = 99.27%, 95% CI = 98.87%–99.68% and 16-bp or more indels yielded mean F-measure = 98.13%, 95% CI = 97.44%–98.83% (Table 1). Our optimized WGS protocol presented excellent intra- and inter-assay reproducibility. Regarding SNVs, the intra-assay coefficient of variation (CV) of the F-measures was 0.04%, whereas the inter-assay was 0.03%. Regarding indels, the intra-assay F-measures CV was 0.16% whereas the inter-assay CV was 0.07% (Table 2).
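The coefficient of variation is the standard deviation expressed as a percentage of the mean. The sketch below assumes the sample standard deviation (n − 1 denominator), which is not stated in the text; the function name is hypothetical. The example uses the three inter-assay indel F-measures of NA24694 from Table 2, so the result only approximates the tabulated 0.0726% because those inputs are rounded to four decimals.

```python
import statistics


def coefficient_of_variation(values) -> float:
    """CV (%) = sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Inter-assay indel F-measures for NA24694 (Table 2).
inter_assay_indel_f = [0.9974, 0.9980, 0.9966]
cv_percent = coefficient_of_variation(inter_assay_indel_f)
print(f"{cv_percent:.4f}%")
```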

TABLE 2

Reproducibility	Sample	SNV precision	SNV sensitivity	SNV F-measure	Indel precision	Indel sensitivity	Indel F-measure
Intra-assay	NA24385	0.9970	0.9918	0.9944	0.9969	0.9948	0.9959
Intra-assay	NA24385-2	0.9963	0.9914	0.9939	0.9952	0.9920	0.9936
Intra-assay	Mean	0.9967	0.9916	0.9941	0.9961	0.9934	0.9947
Intra-assay	SD	0.0005	0.0003	0.0004	0.0012	0.0020	0.0016
Intra-assay	CV (%)	0.0501	0.0322	0.0411	0.1221	0.1991	0.1607
Inter-assay	NA24694	0.9992	0.9993	0.9993	0.9968	0.9980	0.9974
Inter-assay	NA24694-2	0.9994	0.9993	0.9993	0.9974	0.9986	0.9980
Inter-assay	NA24694-3	0.9988	0.9989	0.9988	0.9963	0.9968	0.9966
Inter-assay	Mean	0.9991	0.9991	0.9991	0.9968	0.9978	0.9973
Inter-assay	SD	0.0003	0.0002	0.0003	0.0005	0.0009	0.0007
Inter-assay	CV (%)	0.0312	0.0235	0.0271	0.0551	0.0909	0.0726

SD, standard deviation; CV, coefficient of variation.

Reproducibility. The benchmark sample NA24385 was selected for intra-assay reproducibility evaluation, whereas NA24694 was included in all three independent workflows for inter-assay reproducibility evaluation. Coefficients of variation (CV) of quality metrics are reported.

The pathogenic/likely pathogenic variant profile of the 54 in-house characterized samples included: 12 SNVs (eight missense, two nonsense, one splicing acceptor, and one splicing donor); 65 large deletions (lengths ranging between 538 bp and 53,247,491 bp), including 29 loss of heterozygosity (LOH) regions identified by SNP array (lengths ranging between 812,863 bp and 72,740,279 bp); 22 large duplications (ranging between 6,147 bp and 95,325,642 bp); three events of trisomy (chromosomes 13, 15, or 21); one insertion; four inversions; ten translocations; two Robertsonian translocations; and a single occurrence of uniparental disomy, totaling 120 events. All SNVs were correctly detected by our variant calling procedure (F-measure = 100.0%). The detection of the single event of uniparental disomy failed (Table 3). The CNV and SV detection performances were good overall (F-measures 96.6% and 90.3%, respectively).

TABLE 3

Variant	True positives (TP)	False negatives (FN)	Precision	Sensitivity	F-measure
SNVs
Missense	8	0	1.0000	1.0000	1.0000
Nonsense	2	0	1.0000	1.0000	1.0000
Splicing acceptor	1	0	1.0000	1.0000	1.0000
Splicing donor	1	0	1.0000	1.0000	1.0000
Overall	12	0	1.0000	1.0000	1.0000
CNVs
Deletions	60	5	1.0000	0.9231	0.9600
Duplications	22	0	1.0000	1.0000	1.0000
Trisomy 13	1	0	1.0000	1.0000	1.0000
Trisomy 15	0	1	Not calculated	0.0000	0.0000
Trisomy 21	1	0	1.0000	1.0000	1.0000
Overall	84	6	1.0000	0.9333	0.9655
SVs
Insertions	1	0	1.0000	1.0000	1.0000
Inversions	4	0	1.0000	1.0000	1.0000
Robertsonian translocations	0	2	Not calculated	0.0000	0.0000
Translocations	9	1	1.0000	0.9000	0.9474
Overall	14	3	1.0000	0.8235	0.9032
Uniparental disomy (UPD)	0	1	Not calculated	0.0000	0.0000
SNVs + CNVs + SVs + UPD	110	10	1.0000	0.9167	0.9565

Quality metrics of the variant calling procedure for 69 samples, including 54 samples previously characterized in-house by other methodologies. The seven GiaB gold-standard samples are not considered here; see Table 1. Also, see Supplementary Table S1 for a breakdown of expected and observed variant calls (analysis group 2 rows).

To date, we have sequenced over 2,000 of the 3,000 enrolled patients with rare diseases or hereditary cancer syndromes with our optimized WGS protocol. Sequencing and alignment metrics are available for about 1,200 samples and have been consistently of high quality, compatible with a clinical diagnostic workflow (Supplementary Table S3). For example, cross-individual contamination is virtually non-existent (mean 0.008% ± 0.11), the mean uniformity of coverage is 96.4% ± 0.26%, the median genome coverage is 36.5X, the mean percentage of bases with quality score Q30 or more is 91.3% ± 3.6%, and the mean genome callability is 96.3% ± 1.2%. Of those, over 300 patients have received a diagnostic report, with approximately 37% presenting a definitive molecular diagnosis, i.e., the detection of a pathogenic/likely pathogenic variant compatible with the patient’s phenotype.

Discussion

The diagnosis of patients with rare disorders is currently a lengthy process, taking four or more years on average. Early adoption of WGS could be beneficial, shortening the diagnostic odyssey (Wu et al., 2020; Rehm, 2022). A recent meta-analysis of 37 studies involving children with genetic diseases showed that WGS testing had higher clinical utility (ACMG Board of Directors, 2015) than chromosomal microarray. An accompanying meta-regression showed that the odds of diagnosis through WGS increased by 16% each year, possibly due to methodological improvements (the meta-analysis included WGS studies published between 2015 and 2017) (Clark et al., 2018). Other studies reported the clinical utility of rapid WGS for children undergoing intensive care (Sanford et al., 2019; Costain et al., 2020). A recent application of WGS to rare disease diagnosis in a national context (the United Kingdom 100K Genomes Project) revealed a remarkable benefit to routine healthcare (Turro et al., 2020). A meta-analysis of psychological outcomes suggested no harm following WGS result disclosure and even an overall trend for a decrease in anxiety (Robinson et al., 2019). Thus, it is becoming increasingly clear that genomic medicine can revolutionize the healthcare of an individual with a rare disease or cancer by offering prompt and accurate diagnosis, risk stratification based upon genotype, and the possibility of personalized treatments. Brazil is the only country with a population larger than 100 million people that has a public, universal, and free-of-charge healthcare system (Castro et al., 2019). Thus, provisioning a cost-effective genomic testing strategy within a national healthcare service of this magnitude, so as to deliver equity of access, is challenging (Berg et al., 2017).
To further our progress in the area, we are conducting a pilot project for the use of WGS for the diagnosis of rare diseases in Brazil (The Rare Genomes Project - www.genomasraros.com), which will sequence over 9,000 individuals by the end of 2023. To this end, we developed and validated a comprehensive WGS workflow with an optimized laboratory turnaround time coupled with a cutting-edge bioinformatics pipeline for variant calling, functional annotation, and classification. Our procedure followed important benchmarking guidelines (Krusche et al., 2018; Koboldt, 2020) and yielded excellent performance. One critical step for robust validation is careful sample selection. It is of paramount importance to use a set composed of reference benchmark samples, which have millions of completely validated variants of different types, together with in-house characterized or purchased samples for more complex variants such as structural, mitochondrial, and LOH events. In addition, validating the detection of hard-to-detect variant types, such as repeat expansions, variants in genes with pseudogenes or homologous genes, and low-level mosaicism, requires even further steps, including additional samples, possibly on a gene-by-gene basis (Marshall et al., 2020). Assessing and interpreting variants is challenging, and we acknowledge some limitations of our protocol. For example, we did not evaluate repeat expansion variants, tandem duplications, mitochondrial genome heteroplasmy, mosaicism, and processed pseudogene insertions. Moreover, only CNVs larger than 500 bp were detected using our pipeline. Therefore, the detection sensitivity for CNVs smaller than 538 bp, the shortest event in our validation set, may differ from the one reported here. We plan to validate the detection of some of these variant types soon to improve the test robustness, sensitivity, specificity, and detection limits.
Moreover, we are currently developing ancestry analysis pipelines to describe and quantify ancestry in the Brazilian Rare Genomes Project participants. Population substructure and genetic ancestry are fundamental issues to consider when assessing rare diseases. Brazilian populations are admixed, with each individual having a substantial genetic contribution from European, African, and Amerindian ancestral populations. In general, the European genomic contribution is the most represented, followed by the African and then the Amerindian contribution (Pena et al., 2009).

Conclusion

Large-scale WGS projects are important initiatives to expand the population’s access to these robust genomic technologies. The validation of our WGS workflow is the first step toward this goal. It has the potential to reduce the time until diagnosis for patients with rare diseases, improving the quality of life of affected individuals and their families. Also, considering the high diversity of our population, the Rare Genomes Project is fundamental for creating a database of disease-related variants, contributing to the future of precision medicine in this country.

References

1. Ending the Diagnostic Odyssey-Is Whole-Genome Sequencing the Answer?

Authors: Ann Chen Wu; Pamela McMahon; Christine Lu
Journal: JAMA Pediatr Date: 2020-09-01 Impact factor: 16.193

2. The burden of rare diseases.

Authors: Carlos R Ferreira
Journal: Am J Med Genet A Date: 2019-03-18 Impact factor: 2.802

3. Sequence assembly demystified.

Authors: Niranjan Nagarajan; Mihai Pop
Journal: Nat Rev Genet Date: 2013-01-29 Impact factor: 53.242

4. Best practices for benchmarking germline small-variant calls in human genomes.

Authors: Peter Krusche; Len Trigg; Paul C Boutros; Christopher E Mason; Francisco M De La Vega; Benjamin L Moore; Mar Gonzalez-Porta; Michael A Eberle; Zivana Tezak; Samir Lababidi; Rebecca Truty; George Asimenos; Birgit Funke; Mark Fleharty; Brad A Chapman; Marc Salit; Justin M Zook
Journal: Nat Biotechnol Date: 2019-03-11 Impact factor: 54.908

5. Rapid Whole Genome Sequencing Has Clinical Utility in Children in the PICU.

Authors: Erica F Sanford; Michelle M Clark; Lauge Farnaes; Matthew R Williams; James C Perry; Elizabeth G Ingulli; Nathaly M Sweeney; Ami Doshi; Jeffrey J Gold; Benjamin Briggs; Matthew N Bainbridge; Michele Feddock; Kelly Watkins; Shimul Chowdhury; Shareef A Nahas; David P Dimmock; Stephen F Kingsmore; Nicole G Coufal
Journal: Pediatr Crit Care Med Date: 2019-11 Impact factor: 3.624

6. A population-based cost-effectiveness study of early genetic testing in severe epilepsies of infancy.

Authors: Katherine B Howell; Stefanie Eggers; Kim Dalziel; Jessica Riseley; Simone Mandelstam; Candace T Myers; Jacinta M McMahon; Amy Schneider; Gemma L Carvill; Heather C Mefford; Ingrid E Scheffer; A Simon Harvey
Journal: Epilepsia Date: 2018-05-11 Impact factor: 5.864

7. Clinical utility of genetic and genomic services: a position statement of the American College of Medical Genetics and Genomics.

Authors: ACMG Board of Directors
Journal: Genet Med Date: 2015-03-12 Impact factor: 8.822

8. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology.

Authors: Sue Richards; Nazneen Aziz; Sherri Bale; David Bick; Soma Das; Julie Gastier-Foster; Wayne W Grody; Madhuri Hegde; Elaine Lyon; Elaine Spector; Karl Voelkerding; Heidi L Rehm
Journal: Genet Med Date: 2015-03-05 Impact factor: 8.822

9. Case for genome sequencing in infants and children with rare, undiagnosed or genetic diseases.

Authors: David Bick; Marilyn Jones; Stacie L Taylor; Ryan J Taft; John Belmont
Journal: J Med Genet Date: 2019-04-25 Impact factor: 6.318

10. Best practices for variant calling in clinical sequencing.

Authors: Daniel C Koboldt
Journal: Genome Med Date: 2020-10-26 Impact factor: 11.117
