
SNPer: an R library for quantitative variant analysis on single nucleotide polymorphisms among influenza virus populations.

Unitsa Sangket1, Sukanya Vijasika1, Hasnee Noh1, Wasun Chantratita2, Chonticha Klungthong3, In Kyu Yoon3, Stefan Fernandez3, Wiriya Rutvisuttinunt3.   

Abstract

Influenza virus (IFV) can evolve rapidly leading to genetic drifts and shifts resulting in human and animal influenza epidemics and pandemics. The genetic shift that gave rise to the 2009 influenza A/H1N1 pandemic originated from a triple gene reassortment of avian, swine and human IFVs. More minor genetic alterations in genetic drift can lead to influenza drug resistance such as the H274Y mutation associated with oseltamivir resistance. Hence, a rapid tool to detect IFV mutations and the potential emergence of new virulent strains can better prepare us for seasonal influenza outbreaks as well as potential pandemics. Furthermore, identification of specific mutations by closely examining single nucleotide polymorphisms (SNPs) in IFV sequences is essential to classify potential genetic markers associated with potentially dangerous IFV phenotypes. In this study, we developed a novel R library called "SNPer" to analyze quantitative variants in SNPs among IFV subpopulations. The computational SNPer program was applied to three different subpopulations of published IFV genomic information. SNPer queried SNPs data and grouped the SNPs into (1) universal SNPs, (2) likely common SNPs, and (3) unique SNPs. SNPer outperformed manual visualization in terms of time and labor. SNPer took only three seconds with no errors in SNP comparison events compared with 40 hours with errors using manual visualization. The SNPer tool can accelerate the capacity to capture new and potentially dangerous IFV strains to mitigate future influenza outbreaks.

Year:  2015        PMID: 25876137      PMCID: PMC4395159          DOI: 10.1371/journal.pone.0122812

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Influenza virus (IFV), a rapidly evolving virus in the Orthomyxoviridae family, causes frequent epidemics and occasional pandemics. The diversity of the IFV genome generates mixtures of viral subpopulations, which can subsequently lead to the emergence of new virulent strains. Genetic divergence of IFV sequences can be driven by pressure from host immunity and host cell factors [1]. IFV evolves through several mechanisms including RNA recombination, point mutation or antigenic drift, and gene reassortment or genetic shift [2]. There are three types of IFVs (influenza A, B and C), with human disease most commonly caused by influenza A/H3N2, A/H1N1 and B. The full-length RNA genome of influenza A is approximately 13.6 kb; that of influenza B is about 14.6 kb. The full genomic structure is composed of eight fragments with approximate lengths of 2341 nucleotides (nt) for the RNA polymerase PB1 unit; 2300 nt for RNA polymerase PB2; 2233 nt for RNA polymerase PA; 1765 nt for hemagglutinin (HA); 1565 nt for nucleoprotein (NP); 1413 nt for neuraminidase (NA), with an additional NB protein in influenza B; 1027 nt for matrix (M); and 890 nt for nonstructural protein (NS), including the NS1 and NS2 proteins. Close monitoring of currently circulating strains is crucial for evaluating IFV evolution and for detecting novel virulent and drug-resistant viral strains. This information is essential for determining seasonal influenza vaccine design and composition. Therefore, a technique to identify the signature sequences or single nucleotide polymorphisms (SNPs) of viruses from the large amount of sequence information typically generated by next-generation sequencing (NGS) is essential for evaluating the sequence signatures in viral subpopulations associated with virulent or drug-resistant phenotypes.
NGS [3], also known as high-throughput sequencing, is a powerful sequencing technique often used to obtain large amounts of genomic sequence to investigate specific questions about an organism’s genetic information. Particularly in viruses, this methodology has been adopted for diagnosis and advanced investigations to detect novel mutations and evolving quasispecies [4]. Analyzing SNPs by manual visualization is time consuming and resource intensive, so a computational program to compare viral SNPs would be more efficient than manual analysis. R [5] is an open source statistical programming language and environment (http://www.r-project.org/). A wide variety of packages are available for R, especially bioinformatics packages such as Bioconductor [6] (www.bioconductor.org), GenABEL [7] and ParallABEL [8] (http://www.genabel.org/packages). MySQL (http://www.mysql.com/) is a well-known free database management system useful for storing and retrieving SNPs data using SQL commands [9]. MySQL can be manipulated from R using RMySQL [10], a database interface and MySQL driver for R. In this article, we present the development of the “SNPer” library, a new R library for identification of IFV mutations by differentiation of SNPs among viral subpopulations. “SNPer” is named to denote the action of searching for SNPs in a MySQL database.

Methods

Operating System and Computer Software for Input Data Preparation

A personal computer running CentOS (Community Enterprise Operating System) version 6.3 (http://www.centos.org/download/) was used for data analysis. The computer had an Intel Core i5-2400 (3.10 GHz) processor and 4 GB RAM, and ran R version 3.0.2, RMySQL library version 0.9-3 and MySQL 5.1.67, which are the components used by the SNPer library. SNPs data of three different influenza A/H3N2 viral subpopulations obtained by NGS and published in the Sequence Read Archive (SRA) and GenBank [11-12] were used to create an input data file to measure the performance of SNPer. The IFV genomic information of the NGS sequence read data, described in detail in Rutvisuttinunt et al. 2014 [12], including viral subpopulations from samples VIROAF1 (GenBank accession numbers KJ577146-KJ577153), VIROAF2 (KJ577154-KJ577161) and VIROAF6 (KJ577186-KJ577193), was downloaded from the databases (http://www.ncbi.nlm.nih.gov/sra and http://www.ncbi.nlm.nih.gov/nuccore/). Each subpopulation had eight fragments: PB2, PB1, PA, HA, NP, NA, M and NEP. The SNPs data of the three IFV subpopulations were created as csv files from the raw fastq files by Burrows-Wheeler Aligner (BWA) [13] and Genome Analysis Toolkit (GATK) [14-15]. Table 1 illustrates an example of SNPs data of the HA gene fragment from VIROAF1 contained in the “AF1_HA.csv” file. Each *.csv file contains the SNPs information of the sequence reads of interest after alignment with the reference genome; the fields of each line are separated by commas and enclosed within double quotation marks. The specific “AF1_HA.csv” file was created by MiSeq Reporter by aligning the sequence reads [13] of influenza A/H3N2 subpopulation 1 against the HA gene of the influenza A/H3N2 reference genome (GenBank CY121792). The Call column presents the alleles of the SNPs. For example, the first SNP, at position 51, is A51G (VIROAF1 contains allele G while the reference contains allele A at position 51 of the HA gene fragment).
Table 1

Detected SNPs in HA gene fragment from sample #VIROAF1 (“AF1_HA.csv”).

#   Sample ID  Sample Name  Chr                   Position  Score  Variant Type  Call      Frequency  Depth  Filter
1   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  51        13099  SNP           A->[G/G]  1          426    LowGQ
2   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  72        16707  SNP           C->[T/T]  1          567    LowGQ
3   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  146       23020  SNP           A->[G/G]  1          864    LowGQ
4   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  182       23194  SNP           G->[A/A]  1          872    LowGQ
5   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  191       23154  SNP           C->[T/T]  1          852    LowGQ
6   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  285       17047  SNP           C->[T/T]  1          620    LowGQ
7   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  308       16337  SNP           T->[A/A]  1          580    LowGQ
8   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  405       21306  SNP           A->[G/G]  1          791    LowGQ
9   VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  413       20528  SNP           G->[A/A]  1          760    LowGQ
10  VIROAF1    VIROAF1      H3N2_CY121792_HA.seq  456       20446  SNP           C->[T/T]  1          768    LowGQ

Each line describes the SNPs at each position in the HA gene fragment of VIROAF1. For instance, position 51 of the HA gene in VIROAF1 is G while the reference allele is A.
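The relationship between the Call column and the compact notation used in the text (for example A51G) can be sketched in a few lines of Python. This is only an illustration, not part of SNPer: the function name `compact_snp` and the regular expression are assumptions based on the call strings shown in Table 1.

```python
import re

# Pattern for Call strings as they appear in Table 1, e.g. "A->[G/G]".
# Assumption: calls are single-nucleotide and use the letters A, C, G, T.
CALL_PATTERN = re.compile(r"^([ACGT])->\[([ACGT])/([ACGT])\]$")


def compact_snp(position, call):
    """Turn a position and a Call string into compact notation such as 'A51G'."""
    match = CALL_PATTERN.match(call.strip())
    if match is None:
        raise ValueError(f"unrecognized call format: {call!r}")
    reference, first, second = match.groups()
    # When both called alleles agree (as in every row of Table 1) report one
    # letter; otherwise keep both alleles separated by a slash.
    variant = first if first == second else f"{first}/{second}"
    return f"{reference}{position}{variant}"


first_snp = compact_snp(51, "A->[G/G]")
```

Here `first_snp` corresponds to the first row of Table 1.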

Executing SNPer

After installing the SNPer package, users can compare IFV SNPs by executing the SNPer function. An example of SNPer usage is shown in Fig 1. The user can change the variables input_files_list (a file containing the list of SNPs files), mysql_user (a user name in MySQL), mysql_password (the password for that user name), mysql_host (the host name of the MySQL server), and db_name (the name of the database for storing SNPs data). The SNPer database must be created by the user before executing SNPer.
Fig 1

An example of SNPer usage.

The user loads the SNPer library to current environment before using the SNPer function.

The workflow for SNPs comparison is presented in Fig 2. The SNPs data in the input files are processed by the SNPer library running under R. SNPer uses the RMySQL library and SQL commands to create three tables in the SNPer database: sp1, sp2 and sp3 (Table 2). These tables store the SNPs data of each viral subpopulation. For instance, the sp1 table stores the SNPs data of the HA fragment of the first IFV subpopulation (VIROAF1) contained in the “AF1_HA.csv” file, the sp2 table stores the SNPs data of the HA fragment of the second IFV subpopulation (VIROAF2) contained in the “AF2_HA.csv” file, and the sp3 table stores the SNPs data of the HA fragment of the third IFV subpopulation (VIROAF6) contained in the “AF6_HA.csv” file. SNPer processes the data in the order given by the list of subpopulation files illustrated in Table 2. The SNPs data from “AF1_HA.csv”, “AF2_HA.csv” and “AF6_HA.csv” are then pulled from the SNPer database for comparison by SNPer.
Fig 2

SNPs comparison workflow.

SNPer, an R library, analyzes SNPs data using RMySQL and MySQL producing SNPs comparison data as its output.

Table 2

List of SNP input files ("input_files_list.csv") to be compared by SNPer.

sp1          sp2          sp3
AF1_HA.csv   AF2_HA.csv   AF6_HA.csv
AF1_M.csv    AF2_M.csv    AF6_M.csv
AF1_NA.csv   AF2_NA.csv   AF6_NA.csv
AF1_NEP.csv  AF2_NEP.csv  AF6_NEP.csv
AF1_NP.csv   AF2_NP.csv   AF6_NP.csv
AF1_PA.csv   AF2_PA.csv   AF6_PA.csv
AF1_PB1.csv  AF2_PB1.csv  AF6_PB1.csv
AF1_PB2.csv  AF2_PB2.csv  AF6_PB2.csv

SNPs from eight fragments of sp1, sp2 and sp3 were compared by SNPer. Each computational SNPs comparison among the three viral subpopulations was conducted according to the names of the files listed in each row. For instance, for row #1, SNPer compared the SNPs data in the “AF1_HA.csv” file from population 1 (sp1), the “AF2_HA.csv” file from population 2 (sp2) and the “AF6_HA.csv” file from population 3 (sp3).
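As a rough illustration of how a file list like Table 2 drives the row-by-row comparison, the Python sketch below reads such a list and derives one label per comparison in the style of the samples_table column of the SNPer summary output. It assumes the list is a comma-separated file with a header row naming sp1, sp2 and sp3; the helper names `read_file_list` and `comparison_label` are hypothetical and not part of SNPer.

```python
import csv
import io
import os

SUBPOPULATIONS = ("sp1", "sp2", "sp3")


def read_file_list(handle):
    """Yield one (sp1_file, sp2_file, sp3_file) tuple per row of the list."""
    for row in csv.DictReader(handle):
        yield tuple(row[name].strip() for name in SUBPOPULATIONS)


def comparison_label(files):
    """Join the file names without extension, e.g. 'AF1_HA_vs_AF2_HA_vs_AF6_HA'."""
    return "_vs_".join(os.path.splitext(name)[0] for name in files)


# First two rows of Table 2, written as an in-memory csv file (assumed layout).
example = io.StringIO(
    "sp1,sp2,sp3\n"
    "AF1_HA.csv,AF2_HA.csv,AF6_HA.csv\n"
    "AF1_M.csv,AF2_M.csv,AF6_M.csv\n"
)
triplets = list(read_file_list(example))
labels = [comparison_label(files) for files in triplets]
```

Each tuple in `triplets` names the three files compared together, which mirrors the one-row-per-comparison rule stated above.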

In addition, if different datasets are used for SNPs comparison, the RMySQL table structure for the input data must be set as illustrated in Table 3. The input data (as seen in Table 1) contain information organized into eight fields, arranged in columns in the csv file. The assigned primary key, the “id” column, must be unique and not null. SNPer performs the SNPs comparison on the tables using RMySQL with SQL commands, comparing the SNPs of the three viral subpopulations in the order given by the list of SNPs files shown in Table 2. The SNPs comparison outputs from RMySQL are then processed by SNPer and written to the output files.
Table 3

The required table structure of the input data for SNPer computational analysis based on MySQL.

Field         Type       Null  Key  Default  Extra
id            int(5)     NO    PRI  NULL
sample_id     char(20)   YES        NULL
sample_name   char(20)   YES        NULL
chr           char(100)  YES        NULL
position      int(20)    YES        NULL
score         int(20)    YES        NULL
variant_type  char(20)   YES        NULL
call_         char(8)    YES        NULL
frequency     int(5)     YES        NULL
depth         int(10)    YES        NULL
filter        char(20)   YES        NULL

The table structure of the input data in the SNPer database can be retrieved with the SQL DESCRIBE command.
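To make the table layout and the SQL-based comparison more concrete, the following Python sketch builds three tables with the columns of Table 3 and finds the SNPs present in all of them with a join. It is an illustration under several assumptions: an in-memory SQLite database stands in for MySQL, two SNPs are treated as the same when position and call_ both match, and the helper `build_database` and the query text are hypothetical rather than taken from SNPer. The few rows used are drawn from Tables 1 and 5 and are not the full data.

```python
import sqlite3

# Columns following Table 3. SQLite accepts these type names but does not
# enforce the declared lengths; "filter" is quoted to avoid a keyword clash.
SNP_COLUMNS = """
    id INTEGER PRIMARY KEY NOT NULL,
    sample_id CHAR(20),
    sample_name CHAR(20),
    chr CHAR(100),
    position INT,
    score INT,
    variant_type CHAR(20),
    call_ CHAR(8),
    frequency INT,
    depth INT,
    "filter" CHAR(20)
"""

# Assumed matching rule: same position and same call in all three tables.
UNIVERSAL_QUERY = """
    SELECT a.position, a.call_
    FROM sp1 AS a
    JOIN sp2 AS b ON b.position = a.position AND b.call_ = a.call_
    JOIN sp3 AS c ON c.position = a.position AND c.call_ = a.call_
    ORDER BY a.position
"""


def build_database(rows_by_table):
    """Create one table per subpopulation and load (id, position, call_) rows."""
    conn = sqlite3.connect(":memory:")
    for table, rows in rows_by_table.items():
        conn.execute(f"CREATE TABLE {table} ({SNP_COLUMNS})")
        conn.executemany(
            f"INSERT INTO {table} (id, position, call_) VALUES (?, ?, ?)", rows
        )
    return conn


# A small subset of HA rows, for illustration only.
conn = build_database({
    "sp1": [(1, 51, "A->[G/G]"), (8, 405, "A->[G/G]"), (9, 413, "G->[A/A]")],
    "sp2": [(5, 405, "A->[G/G]"), (6, 413, "G->[A/A]")],
    "sp3": [(6, 405, "A->[G/G]"), (7, 413, "G->[A/A]")],
})
universal = conn.execute(UNIVERSAL_QUERY).fetchall()
```

With this subset, `universal` holds the SNPs at positions 405 and 413, while the SNP at position 51 is excluded because it was loaded only into sp1.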

Expected Output Data for SNPs comparison

SNPer analyzes the variant output data from NGS performed on the Illumina MiSeq platform and groups the SNPs of three different IFV subpopulations into (1) universal SNPs (shared by all viral subpopulations), (2) likely common SNPs (shared by almost all viral subpopulations), and (3) unique SNPs (not shared with other viral subpopulations). Fig 3 shows an example of SNPs comparison of three different viral subpopulations of influenza A/H3N2: viral subpopulation 1 (sp1), viral subpopulation 2 (sp2) and viral subpopulation 3 (sp3). Universal SNPs are located in area A. Likely common SNPs are presented in area B (in sp1 and sp2 but not in sp3), area C (in sp1 and sp3 but not in sp2) and area D (in sp2 and sp3 but not in sp1). Unique SNPs are indicated in area E (only in sp1), area F (only in sp2) and area G (only in sp3).
Fig 3

Comparison of SNPs from three different IFV subpopulations.

Area A contains universal SNPs. Areas B, C and D consist of likely common SNPs. Areas E, F and G contain unique SNPs.
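The seven areas of Fig 3 correspond to simple set operations. The Python sketch below is a minimal illustration of that partition, not SNPer's own implementation (which works through SQL); the function name `partition_snps` is hypothetical, the group names follow the column names of the SNPer summary output, and the single-letter SNP labels are invented placeholders.

```python
def partition_snps(sp1, sp2, sp3):
    """Split three SNP collections into the seven areas A-G of Fig 3."""
    sp1, sp2, sp3 = set(sp1), set(sp2), set(sp3)
    return {
        "all": sp1 & sp2 & sp3,             # area A: universal SNPs
        "only_sp1_sp2": (sp1 & sp2) - sp3,  # area B: likely common
        "only_sp3_sp1": (sp1 & sp3) - sp2,  # area C: likely common
        "only_sp2_sp3": (sp2 & sp3) - sp1,  # area D: likely common
        "only_sp1": sp1 - sp2 - sp3,        # area E: unique to sp1
        "only_sp2": sp2 - sp1 - sp3,        # area F: unique to sp2
        "only_sp3": sp3 - sp1 - sp2,        # area G: unique to sp3
    }


# Placeholder SNP labels, for illustration only.
groups = partition_snps(
    sp1={"a", "b", "c", "d"},
    sp2={"a", "b", "e"},
    sp3={"a", "c", "f"},
)
```

Every SNP falls into exactly one group, so the group sizes can be cross-checked against the per-subpopulation totals.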


Results

SNPer took three seconds to compare the SNPs of all eight IFV genomic fragments across the three viral subpopulations, as shown in Table 4. Each row displays the distinct groups of SNPs: unique (only_spX), likely common (only_spX_spY) and universal (all). The HA fragment has the highest number of universal SNPs (all = 12) and unique SNPs (only_spX = 6 + 10 + 8 = 24), as visualized in Fig 4. The comparison results from SNPer were checked by adding the groups of SNPs according to formula (1); no errors were found. The summation of the SNPs of all groups from SNPer was equal to the summation of the total SNPs numbers in the fragments of the three subpopulations.
Table 4

The outputs of SNPer for the eight fragments of the three different IFVs (“summary.csv”).

samples_table                   sp1  sp2  sp3  all  only_sp1  only_sp2  only_sp3  only_sp1_sp2  only_sp2_sp3  only_sp3_sp1
AF1_HA_vs_AF2_HA_vs_AF6_HA      27   22   29   12   6         10        8         0             0             9
AF1_M_vs_AF2_M_vs_AF6_M         8    6    7    4    1         2         0         0             0             3
AF1_NA_vs_AF2_NA_vs_AF6_NA      13   12   16   5    3         6         5         0             1             5
AF1_NEP_vs_AF2_NEP_vs_AF6_NEP   5    4    5    2    0         2         0         0             0             3
AF1_NP_vs_AF2_NP_vs_AF6_NP      11   9    10   3    3         6         2         0             0             5
AF1_PA_vs_AF2_PA_vs_AF6_PA      15   8    18   3    3         5         6         0             0             9
AF1_PB1_vs_AF2_PB1_vs_AF6_PB1   18   21   21   9    3         11        5         0             1             6
AF1_PB2_vs_AF2_PB2_vs_AF6_PB2   31   16   26   11   7         5         2         0             0             13

Each row shows the number of SNPs of VIROAF1 (sp1), VIROAF2 (sp2) and VIROAF6 (sp3) for each fragment after comparison by SNPer. For example, in the first row, the HA fragment contains 27 SNPs in VIROAF1, 22 SNPs in VIROAF2 and 29 SNPs in VIROAF6. Twelve universal SNPs are in VIROAF1, VIROAF2, and VIROAF6. There are six SNPs in only VIROAF1, 10 SNPs in only VIROAF2 and eight SNPs in only VIROAF6. Only nine SNPs exist in both VIROAF6 and VIROAF1.

Fig 4

The SNPs composition output chart of three HA sequences from three IFV subpopulations.

Each circle represents the number of SNPs of VIROAF1 (sp1), VIROAF2 (sp2) and VIROAF6 (sp3) for the HA fragment; universal SNPs; unique SNPs; and likely common SNPs.

The groups of SNPs were checked against the per-subpopulation totals with the following formula:

$$N_{sp1} + N_{sp2} + N_{sp3} = \sum_{X} N_{only\_spX} + 2 \sum_{X<Y} N_{only\_spX\_spY} + 3\, N_{all} \qquad (1)$$

where $N_{only\_spX}$ is the number of SNPs only detected in the X subpopulation, $N_{only\_spX\_spY}$ is the number of SNPs shared between the X and Y but not the Z subpopulation, and $N_{all}$ is the number of SNPs shared by all subpopulations (X, Y and Z). For example, the results from SNPer for the HA fragment of viral subpopulations VIROAF1, VIROAF2, and VIROAF6 were validated with the numbers of SNPs from Table 4. The summation of the number of SNPs in the HA fragment of the three subpopulations is 78 (27+22+29). The summation of SNPs from each group for the HA fragment of the three subpopulations is also 78 [6+10+8+(0+0+9)*2+(12*3)]. Therefore, SNPer correctly produced the outputs for the HA fragment of the three subpopulations. The complete list of SNPs for each group is provided in an output folder. An example of the list is shown in Table 5. The position and call (e.g., A405G) identify the universal SNPs from the three subpopulations.
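The same consistency check can be applied to every fragment. The Python sketch below is an illustration only: it uses the counts as read from Table 4 and a hypothetical helper `is_consistent` to test whether the three per-subpopulation totals equal the unique counts, plus twice the pairwise-shared counts, plus three times the universal count.

```python
# Counts as read from Table 4, in column order:
# sp1, sp2, sp3, all, only_sp1, only_sp2, only_sp3,
# only_sp1_sp2, only_sp2_sp3, only_sp3_sp1
TABLE4 = {
    "HA":  (27, 22, 29, 12, 6, 10, 8, 0, 0, 9),
    "M":   (8, 6, 7, 4, 1, 2, 0, 0, 0, 3),
    "NA":  (13, 12, 16, 5, 3, 6, 5, 0, 1, 5),
    "NEP": (5, 4, 5, 2, 0, 2, 0, 0, 0, 3),
    "NP":  (11, 9, 10, 3, 3, 6, 2, 0, 0, 5),
    "PA":  (15, 8, 18, 3, 3, 5, 6, 0, 0, 9),
    "PB1": (18, 21, 21, 9, 3, 11, 5, 0, 1, 6),
    "PB2": (31, 16, 26, 11, 7, 5, 2, 0, 0, 13),
}


def is_consistent(row):
    """Check totals == unique + 2 * pairwise-shared + 3 * universal."""
    sp1, sp2, sp3, universal, o1, o2, o3, o12, o23, o31 = row
    total = sp1 + sp2 + sp3
    grouped = (o1 + o2 + o3) + 2 * (o12 + o23 + o31) + 3 * universal
    return total == grouped


checks = {fragment: is_consistent(row) for fragment, row in TABLE4.items()}
ha_total = sum(TABLE4["HA"][:3])
```

A pairwise-shared SNP is counted twice and a universal SNP three times because each appears in that many per-subpopulation totals.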
Table 5

The allelic list of the universal SNPs found in HA fragment (“AF1_HA_vs_AF2_HA_vs_AF3_HA_sim_all.csv”).

id  sample_id  sample_name  id  sample_id  sample_name  id  sample_id  sample_name  position  call_
8   VIROAF1    VIROAF1      5   VIROAF2    VIROAF2      6   VIROAF6    VIROAF6      405       A->[G/G]
9   VIROAF1    VIROAF1      6   VIROAF2    VIROAF2      7   VIROAF6    VIROAF6      413       G->[A/A]
11  VIROAF1    VIROAF1      9   VIROAF2    VIROAF2      11  VIROAF6    VIROAF6      482       A->[G/G]
13  VIROAF1    VIROAF1      10  VIROAF2    VIROAF2      14  VIROAF6    VIROAF6      629       C->[T/T]
14  VIROAF1    VIROAF1      11  VIROAF2    VIROAF2      15  VIROAF6    VIROAF6      640       G->[T/T]
17  VIROAF1    VIROAF1      13  VIROAF2    VIROAF2      16  VIROAF6    VIROAF6      715       G->[A/A]
19  VIROAF1    VIROAF1      16  VIROAF2    VIROAF2      21  VIROAF6    VIROAF6      973       A->[G/G]
21  VIROAF1    VIROAF1      17  VIROAF2    VIROAF2      22  VIROAF6    VIROAF6      1195      A->[C/C]
23  VIROAF1    VIROAF1      18  VIROAF2    VIROAF2      25  VIROAF6    VIROAF6      1323      A->[G/G]
24  VIROAF1    VIROAF1      19  VIROAF2    VIROAF2      26  VIROAF6    VIROAF6      1341      T->[G/G]
26  VIROAF1    VIROAF1      21  VIROAF2    VIROAF2      28  VIROAF6    VIROAF6      1606      C->[T/T]
27  VIROAF1    VIROAF1      22  VIROAF2    VIROAF2      29  VIROAF6    VIROAF6      1671      A->[G/G]

The twelve universal SNPs detected among the three subpopulations in the HA fragment are A405G, G413A, A482G, C629T, G640T, G715A, A973G, A1195C, A1323G, T1341G, C1606T and A1671G when compared to the influenza A/H3N2 reference (GenBank CY121792). Each row displays one SNP position.

Discussion and Conclusion

The SNPer library utilizes RMySQL to compare IFV SNPs stored in MySQL. SNPer was efficiently executed on Linux and Microsoft Windows operating systems. In contrast, manual visualization of the same set of SNPs data by qualified performers under non-distracting conditions generally required more than 40 hours (data not shown). Although similar software packages already exist, SNPer has certain advantages over them. For instance, VCFtools [16] can compare two subpopulations (two files) whereas SNPer can compare three subpopulations. Although the VCFtools user can merge the SNPs data of two subpopulations to compare with a third subpopulation, this is a cumbersome and time-consuming process, especially when running many subpopulation fragments. A second software package, VariantToolChest [17], requires reference genomes in fasta format and is time and memory consuming. In contrast, SNPer does not need any reference genome during comparison and takes less time and memory than VariantToolChest. Moreover, the VariantToolChest user needs to create many different sets of commands to compare the fragments from three subpopulations. For example, to compare the SNPs of eight gene fragments from three subpopulations, VariantToolChest requires 56 different commands, while SNPer requires one command. Therefore, SNPer is more efficient than VariantToolChest, especially during analysis of multiple genomic fragments. In addition, users must validate the outcomes of VariantToolChest and VCFtools themselves, whereas the outcomes of SNPer are validated automatically by SQL commands. In terms of running time and the resources needed to analyze our test dataset, SNPer requires the least, followed by VCFtools and VariantToolChest. In conclusion, SNPer is a rapid and efficient tool to detect SNPs to monitor IFV evolution, and its efficiency advantage grows with higher numbers of SNPs.
SNPer could be used to analyze quantitative variants of SNPs among not only IFV subpopulations [12] but also other pathogens such as human immunodeficiency virus (HIV). SNPer has the potential to improve our ability to understand evolving populations of viruses and other pathogens, particularly for identifying novel universal SNPs associated with specific traits (e.g., drug resistance, virulence, etc.) which can emerge under selective pressure. This tool could allow for more timely response to these newly emerging pathogens.

Software Availability

The SNPer package and its manual are available at http://www.mbb.psu.ac.th/SNPer/index.html

1.  The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data.

Authors:  Aaron McKenna; Matthew Hanna; Eric Banks; Andrey Sivachenko; Kristian Cibulskis; Andrew Kernytsky; Kiran Garimella; David Altshuler; Stacey Gabriel; Mark Daly; Mark A DePristo
Journal:  Genome Res       Date:  2010-07-19       Impact factor: 9.043

Review 2.  Bioconductor: an open source framework for bioinformatics and computational biology.

Authors:  Mark Reimers; Vincent J Carey
Journal:  Methods Enzymol       Date:  2006       Impact factor: 1.600

3.  GenABEL: an R library for genome-wide association analysis.

Authors:  Yurii S Aulchenko; Stephan Ripke; Aaron Isaacs; Cornelia M van Duijn
Journal:  Bioinformatics       Date:  2007-03-23       Impact factor: 6.937

4.  SNPtoGO: characterizing SNPs by enriched GO terms.

Authors:  Daniel F Schwarz; Oliver Hädicke; Jeanette Erdmann; Andreas Ziegler; Daniel Bayer; Steffen Möller
Journal:  Bioinformatics       Date:  2007-11-17       Impact factor: 6.937

Review 5.  Evolution and ecology of influenza A viruses.

Authors:  R G Webster; W J Bean; O T Gorman; T M Chambers; Y Kawaoka
Journal:  Microbiol Rev       Date:  1992-03

6.  ParallABEL: an R library for generalized parallelization of genome-wide association studies.

Authors:  Unitsa Sangket; Surakameth Mahasirimongkol; Wasun Chantratita; Pichaya Tandayya; Yurii S Aulchenko
Journal:  BMC Bioinformatics       Date:  2010-04-29       Impact factor: 3.169

7.  Structural and evolutionary characteristics of HA, NA, NS and M genes of clinical influenza A/H3N2 viruses passaged in human and canine cells.

Authors:  O P Zhirnov; I V Vorobjeva; O A Saphonova; S V Poyarkov; A V Ovcharenko; D Anhlan; N A Malyshev
Journal:  J Clin Virol       Date:  2009-08       Impact factor: 3.168

8.  A framework for variation discovery and genotyping using next-generation DNA sequencing data.

Authors:  Mark A DePristo; Eric Banks; Ryan Poplin; Kiran V Garimella; Jared R Maguire; Christopher Hartl; Anthony A Philippakis; Guillermo del Angel; Manuel A Rivas; Matt Hanna; Aaron McKenna; Tim J Fennell; Andrew M Kernytsky; Andrey Y Sivachenko; Kristian Cibulskis; Stacey B Gabriel; David Altshuler; Mark J Daly
Journal:  Nat Genet       Date:  2011-04-10       Impact factor: 38.330

9.  The variant call format and VCFtools.

Authors:  Petr Danecek; Adam Auton; Goncalo Abecasis; Cornelis A Albers; Eric Banks; Mark A DePristo; Robert E Handsaker; Gerton Lunter; Gabor T Marth; Stephen T Sherry; Gilean McVean; Richard Durbin
Journal:  Bioinformatics       Date:  2011-06-07       Impact factor: 6.937

10.  Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors:  Heng Li; Richard Durbin
Journal:  Bioinformatics       Date:  2009-05-18       Impact factor: 6.937
