A practical method to detect SNVs and indels from whole genome and exome sequencing data.

Daichi Shigemizu, Akihiro Fujimoto, Shintaro Akiyama, Tetsuo Abe, Kaoru Nakano, Keith A Boroevich, Yujiro Yamamoto, Mayuko Furuta, Michiaki Kubo, Hidewaki Nakagawa, Tatsuhiko Tsunoda.

Abstract

The recent development of massively parallel sequencing technology has allowed the creation of comprehensive catalogs of genetic variation. However, due to the relatively high sequencing error rate for short read sequence data, sophisticated analysis methods are required to obtain high-quality variant calls. Here, we developed a probabilistic multinomial method for the detection of single nucleotide variants (SNVs) as well as short insertions and deletions (indels) in whole genome sequencing (WGS) and whole exome sequencing (WES) data for single sample calling. Evaluation with DNA genotyping arrays revealed a concordance rate of 99.98% for WGS calls and 99.99% for WES calls. Sanger sequencing of the discordant calls determined the false positive and false negative rates for the WGS (0.0068% and 0.17%) and WES (0.0036% and 0.0084%) datasets. Furthermore, short indels were identified with high accuracy (WGS: 94.7%, WES: 97.3%). We believe our method can contribute to the greater understanding of human diseases.

Year: 2013 PMID: 23831772 PMCID: PMC3703611 DOI: 10.1038/srep02161

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Since genetic variation plays a key role in human disease and, in particular, rare Mendelian disorders, one of the most important goals of genetic studies is to identify genetic variants in individuals [1,2]. Next-generation sequencing (NGS) technology [3,4,5] has made whole genome sequencing (WGS) possible at an individual level. WGS has revealed numerous single nucleotide variations (SNVs), de novo mutations [6] and somatic mutations in cancer genomes [7,8,9,10] that had not been previously reported. Whole exome sequencing (WES), which captures and sequences only the coding exons of the genome, is also used to identify genetic variations in the coding regions [11]. WES is more cost-effective for exonic regions than WGS, and can obtain a greater depth of coverage in the target region. WES has been used to successfully identify causal mutations of Mendelian diseases [2,12] and driver mutations in tumors [13,14,15,16]. The data produced by WGS and WES are composed of numerous short read sequences ranging in length from 50 to 150 bp [17]. So far, many methods have been developed for short read alignment and variant calling. However, accurate detection of SNVs and indels remains difficult and is a critical issue. For example, even at a false positive rate of only 0.1%, three million false positive SNVs (3 × 10⁹ bp × 0.001 = 3,000,000) would be identified in the entire human genome. The false negative rate is another critical issue, especially for genetic diagnosis and Mendelian disease studies. For the future of personalized medicine and genetic diagnosis, highly accurate variant calling is still one of the most important problems. Here, we provide a tool package for our previously reported method that detects SNVs and short insertions and deletions (indels) [18], and show evaluation and comparison with other available methods for both WGS and WES of new test samples generated on the Illumina DNA-sequencing platform. 
A high concordance rate with the SNP genotyping arrays was observed for both WGS calls (homozygous concordance rate: 99.99%, heterozygous concordance rate: 99.83%) and WES calls (heterozygous and homozygous concordance rate: 99.99%). Compared to other available methods, our method suppressed both false positive and false negative rates (WGS: 0.0068% and 0.17%, WES: 0.0037% and 0.0084%). Furthermore, we identified short indels with high accuracy (WGS: 94.7%, WES: 97.3%). We believe our method is a useful tool for understanding human diseases through WGS and WES analysis. This program, “Variant Caller with Multinomial probabilistic Model (VCMM)”, is publicly available at http://emu.src.riken.jp/VCMM/.

Results

Sequencing and mapping

WGS and WES data were generated on the Illumina HiSeq2000 platform with library sizes of 500 bp and 150–200 bp, respectively. We obtained 101 Gbp and 7 Gbp of short read sequences for WGS and WES, respectively. Mapping was performed using the short read mapping algorithm BWA [19]; 96.1% of WGS reads and 98.5% of WES reads were mapped to the human reference genome. PCR duplicates were identified using SAMtools [20]. The PCR duplication rates were 3.5% and 5.0% in the WGS and WES data, respectively. After removal of PCR duplicates, the remaining 93.4 Gbp of WGS and 6.5 Gbp of WES data provided sufficient coverage to perform variant calling. A total of 63.7% of the WES reads were mapped to on-target regions.

Distribution of depth of coverage

In the WGS data, the average read depth was 28.4, and 4.3% of genomic positions had low read depth (read depth < 5). In the WES data, the average read depth of on-target regions was 70.7, with 4.1% of genomic positions showing low read depth (read depth < 10). We also compared the distribution of the depth of coverage between WGS and WES. The shapes of the coverage distributions were quite different (Figure 1a). The distribution for WGS was bell shaped, though approximately 3% of the genome had zero coverage, likely due to repetitive sequence. In contrast, the distribution for WES was wider: the peak of the distribution was lower than the average read depth, and the proportion of positions covered by at least one uniquely mapped read was much higher than that for WGS. This comparison suggests that the skew of the WES distribution is a result of the capturing process (Figure 1b). Improvement of the exon capture method would reduce the proportion of low coverage regions.
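The depth statistics above (average depth, fraction of low-depth positions, fraction with zero coverage) can be illustrated with a minimal sketch. It assumes the per-position read depths have already been extracted into a list of integers, for example from a pileup file; the function name `summarize_depth` and the toy depths are hypothetical and not part of VCMM.

```python
from statistics import mean


def summarize_depth(depths, low_threshold):
    """Return (mean depth, fraction of positions below low_threshold,
    fraction of positions with zero depth)."""
    if not depths:
        raise ValueError("depths must not be empty")
    n = len(depths)
    low = sum(1 for d in depths if d < low_threshold)
    zero = sum(1 for d in depths if d == 0)
    return mean(depths), low / n, zero / n


# Hypothetical toy data: ten positions, WGS-style low-depth threshold of 5.
toy_depths = [0, 3, 12, 28, 30, 31, 29, 27, 4, 35]
avg, frac_low, frac_zero = summarize_depth(toy_depths, low_threshold=5)
print(f"mean depth={avg:.1f}, low-depth fraction={frac_low:.2f}, "
      f"zero-depth fraction={frac_zero:.2f}")
```

For WES the same function would be called with `low_threshold=10` and restricted to on-target positions, matching the thresholds quoted in the text.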

Figure 1

Read depth per nucleotide and GC content.

(a) Distribution of read depth in WGS and WES on-target regions. (b) Distribution of GC content of WES on-target regions.

Evaluation of SNV and indel calls

SNV calls

We performed SNV and indel calling using our method, VCMM. The algorithm was originally developed with data from a Japanese male individual (HapMap NA18943) [18]. In this study, we evaluated the accuracy of SNV calling using two independent samples. We compared our results to array-based genotype calls, using the Illumina Human OmniExpress BeadChip for WGS and the Illumina Human Exome BeadChip for WES. The number of SNPs available for verification was 644,167 for WGS and 193,280 for WES. Since standard NGS analysis aligns reads to a reference sequence and identifies non-reference alleles as variants, we can assume that all non-reported positions were homozygous for the reference allele. We classified variants based on the genotype from the genotyping array (homozygous or heterozygous) and on the reference allele (homozygous for the reference allele or not). We then classified the discordant SNPs as either false positive or false negative candidates (Table 1).

Table 1

Estimation of the accuracy of VCMM using SNP genotyping platforms

			WGS		WES
	Genotyping array†	WGS or WES†	Before Sanger sequencing validation	After Sanger sequencing validation*	Before Sanger sequencing validation	After Sanger sequencing validation*
Not analyzed	-	-	1,126	-	3,083	-
Concordance	No-Ref-Ho	-	137,786	-	2,893	-
	Ref-Ho	-	326,125	-	183,411	-
	Ht	-	177,949	-	3,843	-
	Total	-	641,860	-	190,147	-
False positive	Ref-Ho	No-Ref-Ho	5	2 (2)	0	0
	Ref-Ho	Ht	37	16 (15)	11	6 (3)
	No-Ref-Ho	Ht	31	17 (16)	1	1 (0)
	Ht	Ht (Different genotype)	25	9 (9)	2	0 (0)
	Total		98	44 (42)	14	7 (3)
False negative	Ht	Ho	850	850	23	13 (8)
	No-Ref-Ho	Ref-Ho	233	233	13	3 (1)
	Total		1,083	1,083	36	16 (9)

†: No-Ref-Ho, non-reference homozygous genotype; Ref-Ho, reference homozygous genotype; Ht, heterozygous genotype.

*: The numbers in parentheses represent the number of SNPs that could not be amplified by PCR.

In the WGS analysis, 0.17% (n = 1,126 in OmniExpress BeadChip) of the SNPs were not analyzed due to insufficient depth of coverage (read depth < 5) (Table 1). The total number of false positive and false negative candidates was 98 and 1,083, respectively (Table 1). We performed Sanger sequencing verification for the false positive candidates. Of the 98 false positive candidates, 42 SNPs could not be amplified by PCR. Attempts to amplify these with a lower annealing temperature also failed. Of the 56 amplified false positive candidates, 54 were consistent with our variant calls. The two remaining false positives were located in tandem repeat regions, suggesting that mapping error, rather than sequencing error, was the cause. The proportions of false positive and false negative SNPs were 0.0068% (44/643,041) and 0.17% (1,083/643,041), respectively (Table 1). Note that the false positive rate is a conservative estimate, because SNPs that could not be amplified by PCR were also counted as false positives. In the WES analysis, 1.6% (n = 3,083 in Exome BeadChip) of the SNPs were not analyzed due to insufficient depth of coverage (read depth < 10) (Table 1). The total number of false positive and false negative candidates was 14 and 36, respectively. Sanger sequencing verification revealed that 7 out of the 14 false positive candidates and 20 out of the 36 false negative candidates were consistent with our variant calls, though 3 out of the 14 false positive and 9 out of the 36 false negative candidates could not be amplified by PCR. The false positive and false negative rates were estimated to be 0.0036% (7/190,197) and 0.0084% (16/190,197), respectively (Table 1). Note that the reported false positive and false negative rates are conservative estimates, because SNPs that could not be amplified by PCR were counted as false positives and false negatives. We also conducted the same examination on additional samples, and equivalent results were obtained (data not shown). 
In both the WGS and WES analysis results, the number of SNVs identified using VCMM was similar to that of previous studies [21] (Table 2). The transition/transversion (Ts/Tv) ratios were 2.08 in WGS and 2.39 in WES.
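The error rates derived from Table 1 can be re-derived with a few lines of arithmetic. The sketch below is only a worked check of the published counts: the denominator is taken as concordant SNPs plus all false positive and false negative candidates, and the numerators are the counts remaining after Sanger sequencing (with unamplified SNPs conservatively kept as errors). The dictionary layout and function names are illustrative assumptions.

```python
def percent(count, total):
    """Express count/total as a percentage."""
    return 100.0 * count / total


# Counts copied from Table 1 ("fp_final"/"fn_final" are the post-Sanger counts).
TABLE1 = {
    "WGS": {"concordant": 641_860, "fp_candidates": 98, "fn_candidates": 1_083,
            "fp_final": 44, "fn_final": 1_083},
    "WES": {"concordant": 190_147, "fp_candidates": 14, "fn_candidates": 36,
            "fp_final": 7, "fn_final": 16},
}


def error_rates(row):
    """Return (evaluated SNPs, false positive %, false negative %)."""
    evaluated = row["concordant"] + row["fp_candidates"] + row["fn_candidates"]
    return (evaluated,
            percent(row["fp_final"], evaluated),
            percent(row["fn_final"], evaluated))


for name, row in TABLE1.items():
    evaluated, fp, fn = error_rates(row)
    print(f"{name}: evaluated={evaluated:,}  FP={fp:.4f}%  FN={fn:.4f}%")
```

The WES false positive rate evaluates to about 0.00368%, which is why it appears as 0.0036% in some places in the text and 0.0037% in others.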

Table 2

Number of identified SNVs and indels

Number	WGS	WES
Total SNVs	3,406,875	79,060
Total indels *	763,944 (106,732)	10,999
Total SNVs in splice sites	105	56
Total SNVs in coding region	20,314	19,861
Missense	9,502	9,360
Nonsense	109	83
Synonymous	10,703	10,418
Total indels in coding region	461	509

*: For WGS, the numbers of indels in all regions and, in parentheses, in non-repeat regions are shown.

Indel calls

Indels are difficult to detect from short read sequence data, but they are likely to be functionally important, particularly when they cause frameshifts. VCMM identified 763,944 (106,732 in non-repeat regions) and 10,999 indels in the WGS and WES data, respectively. The proportions of frameshift indels were 47.9% (221/461) in WGS and 51.1% (260/509) in WES. There was no significant difference between these proportions (P-value = 0.33). Since we could not evaluate all the indels identified, we estimated the concordance rate using PCR and Sanger sequencing verification for a randomly selected subset of indels. In the WGS analysis, we randomly selected 96 indels from the 461 coding indels detected for validation. Of the 75 successful assays, 71 indels were verified as correct, and 4 were false positives (5.3%). In the WES analysis, we randomly selected 47 indels from the 509 coding indels detected for validation. Of the 37 successful assays, 36 indels were verified, and 1 was a false positive (2.7%). In both the WGS and WES analysis results, the number of indels identified using VCMM was similar to that of previous studies [21] (Table 2). Furthermore, the ratios of indels to SNVs were 0.22 in WGS and 0.14 in WES, similar to that reported by 1000 Genomes [22] (0.19). Note that we used all indels identified for the calculation in the WGS.
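The comparison of frameshift proportions can be checked from the counts 221/461 (WGS) and 260/509 (WES). The text does not say which statistical test was used; the sketch below assumes a pooled two-proportion z-test with a two-sided P-value, which gives a value close to the reported 0.33. The function name is hypothetical.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p1 - p2) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Frameshift indels among coding indels: WGS 221/461, WES 260/509.
z, p_value = two_proportion_z_test(221, 461, 260, 509)
print(f"WGS frameshift fraction = {221 / 461:.3f}")
print(f"WES frameshift fraction = {260 / 509:.3f}")
print(f"z = {z:.3f}, two-sided P = {p_value:.2f}")
```

An exact test or a continuity-corrected chi-square test would give a somewhat different P-value, so this is a plausibility check rather than a reproduction of the authors' calculation.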

Comparison with other call methods for SNV

We compared VCMM's SNV calling to that of two popular alternative methods, GATK [23] and SAMtools [20] (Figure 2a and c). In both WGS and WES, we observed a large number of common SNVs identified by all three methods, and a similar number of SNVs uniquely identified by VCMM and by GATK (Figure 2a and c). The proportion of SNVs identified by both VCMM and SAMtools was larger in WGS than in WES (Figure 2a and c). We further compared the genotype array concordance rate of our method to that of the two alternative programs. The concordance rates of our method were higher than those of GATK and SAMtools for both WGS and WES (Table 3). Our method also achieved acceptable levels for both false positive and false negative rates (false positive rates: 0.015% in WGS and 0.007% in WES; false negative rates: 0.17% in WGS and 0.02% in WES). These results demonstrate that our method is an efficient framework for detecting SNVs (Table 3). These comparisons were also performed using three published datasets in the HapMap database for WES and another dataset for WGS, and similar results were obtained (see Supplementary Table S1 online).

Figure 2

Common SNVs and indels identified by VCMM, GATK and SAMtools.

(a) SNV in WGS. SNVs in repeat regions and unknown contigs were not used for the comparison. (b) Indel in WGS. Indels in repeat regions and unknown contigs were not used for the comparison. (c) SNV in WES. (d) Coding indel in WES.

Table 3

Comparison of VCMM with other methods using SNP genotyping platforms

			Number			Proportion (%)
	Chip		VCMM	GATK	SAMtools	VCMM	GATK	SAMtools
WGS	OmniExpress BeadChip	Concordant	641,860	641,538	639,112	99.816	99.766	99.389
		FN	1,083	1,366	3,832	0.168	0.212	0.595
		FP	98	137	97	0.015	0.021	0.015
WES	Exome BeadChip	Concordant	190,147	190,137	189,825	99.974	99.968	99.804
		FN	36	46	361	0.019	0.024	0.190
		FP	14	14	11	0.007	0.007	0.006

Comparison with other call methods for indel

We also compared the indel calling results of each of the three methods (Figure 2b and d). The number of indels identified by GATK was smaller than those by VCMM and SAMtools in both WGS and WES. SAMtools identified a larger number of indels than the other methods. In the WES, most indels detected by GATK were included in those detected by VCMM (Figure 2d). These results suggest that GATK is more conservative than the other methods. To examine the accuracy of indel calling, we performed Sanger sequencing verification. In the WGS, we sequenced 22 indels in total: 14 were identified by all three methods, four by SAMtools only, two by VCMM and SAMtools, one by GATK only, and one by GATK and SAMtools. Among the examined candidates, one indel identified solely by SAMtools and one identified by GATK and SAMtools were false positives (see Supplementary Table S2 online). The genotype of the indel identified solely by SAMtools was discordant with that from Sanger sequencing verification. For WES, we verified 24 indels in total: 12 identified by VCMM and SAMtools, 11 by only SAMtools, and one by only GATK. Among the examined candidates, seven out of eleven indels identified solely by SAMtools were false positives (see Supplementary Table S2 online). The genotypes of one indel identified by GATK, four by SAMtools and one by VCMM and SAMtools were discordant with those from Sanger sequencing verification. To compare the sensitivity of indel calling, we counted the indels in common between those identified by each of the three methods and the Mills Indels [22]: VCMM: 202/509 (40%), GATK: 185/427 (43%), SAMtools: 228/1,764 (13%) in WES; VCMM: 55,810/106,732 (52%), GATK: 54,713/101,267 (54%), SAMtools: 59,356/275,112 (22%) in WGS. The sensitivity of GATK and VCMM was similar. For SAMtools, the number of common indels was higher than for the other two methods, but the proportion of common indels was much lower, indicating that the false negative rate was lower while the false positive rate was higher. 
Appropriate filtering is required for accurate indel calling with SAMtools.
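The overlap with a known indel catalogue, as in the Mills Indels comparison above, amounts to a set intersection. The following is a minimal sketch assuming each indel is represented as a `(chromosome, position, ref, alt)` tuple that has already been normalized identically in both sets (for example, left-aligned); the function name and the toy records are hypothetical.

```python
def overlap_with_catalogue(calls, catalogue):
    """Return (number of calls found in the catalogue, fraction of calls found)."""
    call_set = set(calls)
    shared = call_set & set(catalogue)
    fraction = len(shared) / len(call_set) if call_set else 0.0
    return len(shared), fraction


# Hypothetical toy records, not real variants.
toy_calls = [("chr1", 1000, "A", "AT"), ("chr1", 2000, "GCC", "G"),
             ("chr2", 500, "T", "TA")]
toy_catalogue = [("chr1", 1000, "A", "AT"), ("chr2", 500, "T", "TA"),
                 ("chr3", 42, "C", "CG")]

shared, fraction = overlap_with_catalogue(toy_calls, toy_catalogue)
print(f"{shared} of {len(toy_calls)} calls are in the catalogue ({fraction:.0%})")

# The published WES figure for VCMM follows the same arithmetic: 202 of 509 calls.
print(f"VCMM WES: {100 * 202 / 509:.0f}%")
```

Because the fraction is taken over each caller's own call set, a caller that reports many more candidates (as SAMtools does here) can share more indels with the catalogue in absolute terms while having a much lower proportion.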

Computational performance

VCMM takes a pileup file, as generated by SAMtools, as input. Using one CPU core (2.67 GHz Intel Xeon Processor) on a computational cluster, variant and indel calling for the largest contig (NT_032977.9: 2 GB pileup-formatted file) in WES took approximately one and a half hours with GATK, compared with 3 minutes for pileup file generation by SAMtools plus variant calling by VCMM. VCMM is written in the C programming language. The supporting programs, BWA [19] and SAMtools [20], are also required.

Discussion

In the human genome, SNVs and indels are the most abundant type of genetic variation. Accurate identification of SNVs and indels is one of the most important problems in genome analysis. Several variant calling methods and programs have been developed and used for both WGS and WES analysis [20,23]. However, reducing the false positive and false negative rates is still one of the most important challenges in sequencing analysis. In this study, we introduce an SNV and indel calling method. Our method is based on a multinomial probabilistic model as previously described [18]. Since the base quality score (the probability that the called base in the read is the true sequenced base) reflects the sequencing error rate, as shown in previous studies [24,25,26], a multinomial probabilistic model with quality scores can be applied to identify SNVs with high accuracy. Additionally, we applied a strand bias filter (see the Methods) [27]. The two false positive SNV calls in the WGS analysis were suspected to be caused by mapping error, indicating that improvement in short read alignment methods should decrease false positives. Most false negative calls were identified in regions of low read coverage, and thus could be corrected by increasing the total coverage depth. This is also evident in the lower false negative rate observed in WES analysis as compared to WGS analysis. We compared the performance of VCMM to that of existing methods, GATK and SAMtools, for SNV and indel calling. For SNV calling, the results from SAMtools had a larger number of false negatives than VCMM and GATK, suggesting that the variant calling of SAMtools is conservative. Although the false positive and false negative rates of VCMM were lower than those of GATK, the difference was not significant and we consider VCMM and GATK to be comparable in SNV calling. 
For indel calling, the proportion of indels commonly identified by the three methods was smaller than that of SNVs, suggesting greater difficulty in indel calling than in SNV calling. Sanger sequencing verification revealed that coding indels detected by all three methods, or by only VCMM and SAMtools, were of higher quality than other indels. Indel verification also suggested that indel calling by GATK is more conservative than that by VCMM and SAMtools. Furthermore, although the false positive rate for indels detected only by SAMtools was higher than that of other methods, the false negative rate was lower, suggesting that further filtering is necessary for indel calling. Although our method showed high concordance with DNA genotyping arrays, most of the SNPs present on the DNA genotyping array are located in uniquely mappable regions. It is unknown whether the observed false positive and false negative rates are applicable to all genomic regions. For variant calling, filtering with Hardy-Weinberg equilibrium, discarding variants in tandem repeat regions and local realignment around multiple indels should be necessary for accurate variant detection [18,28]. Our method incorporates a probabilistic error model, base quality filtering, and a strand bias filter. Although our method can improve genotyping accuracy, it is still difficult to identify several types of variants, such as long insertions and deletions, as well as variations in highly repetitive regions. Continued advancement in sequencing technology, such as longer sequence reads and improvements in sequencing accuracy and mapping algorithms, can be expected to further reduce the false positive and false negative rates of our method. Additionally, while the target of the current version was deeply sequenced single samples, we believe that our likelihood function can be extended to multi-sample calling by considering population frequency as the prior probability.

Methods

DNA sample

Samples, RK001 (WES) and RK130 (WGS), were obtained with consent and institutional ethics approval from RIKEN. High molecular weight genomic DNA was extracted from a human blood sample. All groups participating in this study approved this work. WES data of three samples were downloaded from NCBI FTP site (ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/) and used for the comparison.

Whole exome and whole genome sequencing

Exome capture was performed using the Agilent SureSelect Human All Exon V4 kit according to the manufacturer's instructions. This kit captures genomic DNA by in-solution hybridization with RNA oligonucleotides, enabling specific targeting of approximately 51 Mb of the human genome, contained in 185,636 capture regions, as large target regions were composed of several capture regions. The captured DNA was sequenced using the Illumina HiSeq2000 platform with paired-end reads of 101 bp for insert libraries of 150–200 bp according to the manufacturer's instructions. For whole genome sequencing, we prepared a 500 bp insert library. The sample was sequenced using the Illumina HiSeq2000 platform with paired-end reads of 101 bp according to the manufacturer's instructions.

Read mapping

Read sequences were mapped by the Burrows-Wheeler Aligner (BWA: version 0.6.1) [19] to the human reference genome (GRCh37.p5). Paired reads were mapped by considering not only whether the mapping distance between the paired reads was within the average ±3 s.d., but also the mapping uniqueness and the orientation. The removal of possible PCR duplicate reads and the conversion of the mapping results into pileup-formatted files were conducted using SAMtools (v.0.1.8) [20]. Note that the pileup-formatted files were generated by the ‘pileup’ command of SAMtools (v.0.1.8), because several parameters required for SNV and indel calls with VCMM are no longer reported by the ‘mpileup’ command in SAMtools (v0.1.17 (r973) and later).

SNV and indel calls

We performed SNV calling at each nucleotide site using reads with a BWA mapping quality score ≥ 20 and a depth of 5 or more in WGS and 10 or more in WES. Both the major and minor alleles were required to have at least one read with a base quality score ≥ 30, as calculated by the Illumina pipeline software. SNV calls were distinguished by the ratio of two probabilities, one that the minor allele at a nucleotide site is an error and one for the major allele, as described previously [18]. The two probabilities were calculated from the following quantities: the probability that the minor allele is an error; M and m, the numbers of major and minor alleles with a base quality score i; and n, the total read depth (n = M + m). If the ratio passed the cut-off value C, we detected an SNV at that nucleotide site. C was set to 5,000. We also considered strand bias. If a minor allele was supported by over 30 reads in only one orientation, that minor allele was treated as an error. For WGS analysis, we discarded SNVs located within 5 bp of indels, SNVs in short repeat regions (segmental duplications, simple repeat regions detected by tandem repeat finder and microsatellites), and SNVs in regions where more than three SNVs occurred within 10 bp, as described [18]. This probabilistic model was applied only to SNV calling, not to indel calling. Short indels (<50 bp) were identified on the basis of the gaps within read sequences from BWA. We performed indel calls at nucleotide sites where a fraction of more than 0.15 of the total read sequences were mapped as indels after removing reads with mapping quality less than 20. The indels identified by frequency were then filtered by the following criteria: (1) average base quality of the base preceding the indel < 10, and (2) SAMtools consensus quality < 60 or SAMtools SNP quality < 60, if three or fewer reads supported the indel. These parameter sets were determined by Sanger sequencing of indels, which was independent of the verification in this study. 
For WGS, we discarded all indels and SNVs in simple repeat regions as defined by tandem repeat finder [29].
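To make the idea of a quality-aware likelihood ratio with a cut-off concrete, the sketch below compares two simplified models for one site. It is not the VCMM formula, which is not reproduced in this text: it assumes that under the "error" model every minor-allele read is a sequencing error with probability 10^(-Q/10), that under the alternative model each read shows either allele with probability 0.5, and that a site is called when the ratio exceeds C = 5,000. All function names are hypothetical, and the strand bias and repeat filters are omitted.

```python
import math


def phred_to_error(q, max_error=0.75):
    """Phred quality -> error probability, capped below 1 (the cap is an assumption)."""
    return min(10 ** (-q / 10), max_error)


def log_lik_error(major_quals, minor_quals):
    """Log-likelihood if major reads are correct and minor reads are all errors."""
    ll = sum(math.log1p(-phred_to_error(q)) for q in major_quals)
    ll += sum(math.log(phred_to_error(q)) for q in minor_quals)
    return ll


def log_lik_het(n_reads):
    """Log-likelihood under an assumed heterozygous model (each allele with p = 0.5)."""
    return n_reads * math.log(0.5)


def call_snv(major_quals, minor_quals, min_depth=5, cutoff=5000.0, min_best_quality=30):
    """Return True if the site passes the depth/quality filters and the ratio cut-off."""
    depth = len(major_quals) + len(minor_quals)
    if depth < min_depth or not major_quals or not minor_quals:
        return False
    # Both alleles need at least one read with base quality >= 30, as in the text.
    if max(major_quals) < min_best_quality or max(minor_quals) < min_best_quality:
        return False
    # The multinomial coefficient is identical in both models and cancels in the ratio.
    log_ratio = log_lik_het(depth) - log_lik_error(major_quals, minor_quals)
    return log_ratio > math.log(cutoff)


print(call_snv([30] * 20, [30] * 10))  # balanced alleles at depth 30
print(call_snv([30] * 29, [30] * 1))   # a single minor-allele read at depth 30
```

Working in log space avoids underflow at high depth; for WES the call would use `min_depth=10`, following the thresholds stated above.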

Accuracy evaluation

To evaluate the accuracy of our calling method, we compared our SNV call results with the genotypes from SNP typing platforms: Illumina Human OmniExpress BeadChip and Illumina Human Exome BeadChip. We estimated concordance of genotype calls separately for homozygous and heterozygous SNPs. In total, 644,167 autosomal SNPs on the Illumina Human OmniExpress BeadChip for WGS and 193,280 autosomal SNPs on the Illumina Human Exome BeadChip in on-target regions were used for the estimation of the concordance rate in WGS and WES. The accuracy of our indel calls was estimated by random selection of 96 indels and 47 indels in WGS and WES, respectively, and subsequent validation using Sanger sequencing.

SNV calls using other methods

We performed SNV calls using GATK (v.1.6.13) [23] and SAMtools (v.0.1.8) [20] with the following parameters: minimum base quality > 30 and minimum mapping quality > 60 for GATK, and consensus quality ≥ 20 and root mean square (RMS) ≥ 25 [30] for SAMtools. The commands for SAMtools and GATK were “samtools pileup -s -cf reference.fa bam_file” and “java -jar GenomeAnalysisTK.jar -l INFO -T UnifiedGenotyper -R reference.fa -mbq 30 --read_filter MappingQuality --min_mapping_quality_score 60 -I bam_file”, respectively. For indel calling in SAMtools, we examined candidate indels with depth ≥ 5 for WGS and depth ≥ 10 for WES, and indels with a “*/*” genotype were excluded. Indels from GATK were identified with the following command: “java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R reference.fa -I bam_file --out output -glm INDEL”.

Author Contributions

D.S. and A.F. analyzed the data and wrote the manuscript; D.S. and A.F. contributed equally as first authors; M.K. performed the experiments of DNA genotyping arrays; K.N. and H.N. performed the experiments of next generation sequencing; K.N., M.F., U.Y. and H.N. performed Sanger sequencing verification; S.A., T.A. and K.A.B. provided the technical assistance; T.T. organized this work and wrote the manuscript. All authors contributed to and approved the final manuscript.

30 in total

1. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors: Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal: Genome Res Date: 2010-07-19 Impact factor: 9.043

2. Whole-genome sequencing and comprehensive variant analysis of a Japanese individual using massively parallel sequencing.

Authors: Akihiro Fujimoto; Hidewaki Nakagawa; Naoya Hosono; Kaoru Nakano; Tetsuo Abe; Keith A Boroevich; Masao Nagasaki; Rui Yamaguchi; Tetsuo Shibuya; Michiaki Kubo; Satoru Miyano; Yusuke Nakamura; Tatsuhiko Tsunoda
Journal: Nat Genet Date: 2010-10-24 Impact factor: 38.330

3. Quality scores and SNP detection in sequencing-by-synthesis systems.

Authors: William Brockman; Pablo Alvarez; Sarah Young; Manuel Garber; Georgia Giannoukos; William L Lee; Carsten Russ; Eric S Lander; Chad Nusbaum; David B Jaffe
Journal: Genome Res Date: 2008-01-22 Impact factor: 9.043

Review 4. Next-generation DNA sequencing methods.

Authors: Elaine R Mardis
Journal: Annu Rev Genomics Hum Genet Date: 2008 Impact factor: 8.929

5. Tandem repeats finder: a program to analyze DNA sequences.

Authors: G Benson
Journal: Nucleic Acids Res Date: 1999-01-15 Impact factor: 16.971

Review 6. Sequencing technologies - the next generation.

Authors: Michael L Metzker
Journal: Nat Rev Genet Date: 2009-12-08 Impact factor: 53.242

7. A comprehensive catalogue of somatic mutations from a human cancer genome.

Authors: Erin D Pleasance; R Keira Cheetham; Philip J Stephens; David J McBride; Sean J Humphray; Chris D Greenman; Ignacio Varela; Meng-Lay Lin; Gonzalo R Ordóñez; Graham R Bignell; Kai Ye; Julie Alipaz; Markus J Bauer; David Beare; Adam Butler; Richard J Carter; Lina Chen; Anthony J Cox; Sarah Edkins; Paula I Kokko-Gonzales; Niall A Gormley; Russell J Grocock; Christian D Haudenschild; Matthew M Hims; Terena James; Mingming Jia; Zoya Kingsbury; Catherine Leroy; John Marshall; Andrew Menzies; Laura J Mudie; Zemin Ning; Tom Royce; Ole B Schulz-Trieglaff; Anastassia Spiridou; Lucy A Stebbings; Lukasz Szajkowski; Jon Teague; David Williamson; Lynda Chin; Mark T Ross; Peter J Campbell; David R Bentley; P Andrew Futreal; Michael R Stratton
Journal: Nature Date: 2009-12-16 Impact factor: 49.962

8. Natural genetic variation caused by small insertions and deletions in the human genome.

Authors: Ryan E Mills; W Stephen Pittard; Julienne M Mullaney; Umar Farooq; Todd H Creasy; Anup A Mahurkar; David M Kemeza; Daniel S Strassler; Chris P Ponting; Caleb Webber; Scott E Devine
Journal: Genome Res Date: 2011-04-01 Impact factor: 9.043

9. Inference of human population history from individual whole-genome sequences.

Authors: Heng Li; Richard Durbin
Journal: Nature Date: 2011-07-13 Impact factor: 49.962

10. Next generation sequence analysis and computational genomics using graphical pipeline workflows.

Authors: Federica Torri; Ivo D Dinov; Alen Zamanyan; Sam Hobel; Alex Genco; Petros Petrosyan; Andrew P Clark; Zhizhong Liu; Paul Eggert; Jonathan Pierce; James A Knowles; Joseph Ames; Carl Kesselman; Arthur W Toga; Steven G Potkin; Marquis P Vawter; Fabio Macciardi
Journal: Genes (Basel) Date: 2012-08-30 Impact factor: 4.096
