A new genotype imputation method with tolerance to high missing rate and rare variants.

Yumei Yang1, Qishan Wang1, Qiang Chen1, Rongrong Liao1, Xiangzhe Zhang1, Hongjie Yang2, Youmin Zheng2, Zhiwu Zhang3, Yuchun Pan4.   

Abstract

We report a novel algorithm, iBLUP, that imputes missing genotypes by simultaneously and comprehensively using identity by descent and linkage disequilibrium information. Simulation studies showed that the algorithm tolerated high missing rates markedly better than other common imputation methods, e.g. BEAGLE and fastPHASE, especially for rare variants. At a missing rate of 70%, the accuracy of BEAGLE and fastPHASE dropped to 0.82 and 0.74, respectively, while iBLUP retained an accuracy of 0.95. For minor allele genotypes, the accuracy of BEAGLE and fastPHASE decreased to −0.1 and 0.03, while iBLUP still had an accuracy of 0.61. We implemented the algorithm in a publicly available software package, also named iBLUP. The application of iBLUP to real sequencing data from an outbred pig population is demonstrated.

Year:  2014        PMID: 24972110      PMCID: PMC4074155          DOI: 10.1371/journal.pone.0101025

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Benefiting from advances in sequencing technologies, genome-wide association studies (GWAS) have revealed a substantial number of genetic loci controlling human diseases and agriculturally important traits [1]–[3]. However, the identified loci collectively explain only a small proportion of the total variation [4]–[7]. Beyond the hypothesis that common diseases are caused by common variants, the hypothesis that common diseases are caused by rare variants offers new hope for a better understanding of complex traits [8]. Multiplexing is one of the advances that revolutionized high-throughput genotyping by sequencing (GBS). Samples are individually tagged and pooled into a single lane of a flow cell, which greatly increases the number of samples analyzed in a single run without dramatically increasing cost and time [9]. Recently, several GBS methods for both inbred and outbred populations have been developed [10], [11]. The challenge is that the sequencing data contain many missing genotypes. Imputing missing genotypes at a high missing rate is hard, and imputing rare variants is extremely hard, especially in general outbred populations, because of the high degree of heterogeneity and phase ambiguity in the haplotypes [12].
The information used for imputation falls into two categories. One is linkage disequilibrium (LD) among genetic loci; the other is the relationship among individuals, termed identity by descent (IBD) [13]. Imputation methods have been developed to use either or both, with different degrees of complexity. These methods include allele frequency-based methods (PLINK, SNPMSTAT, UNPHASED and TUNA), hidden Markov model-based methods (IMPUTE, MACH and fastPHASE), mixed model-based methods (S-MM and M-MM) and a graph theory-based method (BEAGLE) [14]–[22]. A clear linkage phase, such as a haplotype, is the most desirable situation for most of these algorithms [13].
However, phasing becomes extremely difficult with low-coverage GBS data with a high missing rate, especially in outbred populations such as human, maize landrace, dog, cattle and pig, where heterogeneity is high [23]. To our knowledge, none of the existing methods works well for low-coverage GBS data with a high missing rate in outbred populations, and no convenient software can impute missing genotypes from this kind of data. The objective of this study was to make full use of LD and IBD simultaneously and to develop a genotype imputation algorithm and software that tolerate high missing rates, especially for rare variants.

Methods

Approval by the Institutional Animal Care and Use Committee of Shanghai Jiao Tong University (contract no. 2011–0033) was given for all experimental procedures involving pigs in the present study. All 72 sequenced pigs were housed at Shanghai Xiangxin Livestock Ltd. Co., Shanghai, China, and were raised according to the company's standard practice for housing and care (http://www.shxxgx.com/sygl.htm). Additional information on the sequenced pigs is provided in the supplementary file.

1 iBLUP method

The chromosomes were divided into a large number of blocks on the basis of the extent of LD, such that the LD between any two markers in a block exceeds a chosen threshold. A marker whose LD falls below the threshold is removed from the block and imputed by a single-variable BLUP model. All the markers in one block were modeled jointly using multivariate BLUP, and their missing genotypes were predicted simultaneously. The imputation model for each marker in the block was:

$$y_i = X_i b_i + Z_i a_i + e_i \qquad (1)$$

where $y_i$ is the observed genotype vector for marker $i$, coded as the number of copies of the minor allele (0, 1 or 2), and its length equals the number of individuals; $b_i$ is the fixed effect and $X_i$ is the design matrix for $b_i$; $a_i$ is the effect underlying the observed genotype, $Z_i$ is the design matrix for $a_i$, and $e_i$ is the residual. The vectors $b_i$ and $a_i$ have the same length as $y_i$. Assuming that there are $m$ markers in one block, with $i$ from 1 to $m$, we set $y = (y_1', \dots, y_m')'$, and likewise for $b$, $a$ and $e$, with $X$ and $Z$ the corresponding block-diagonal design matrices. The multivariate BLUP equations are:

$$\begin{bmatrix} X'R^{-1}X & X'R^{-1}Z \\ Z'R^{-1}X & Z'R^{-1}Z + G^{-1} \otimes K^{-1} \end{bmatrix} \begin{bmatrix} \hat{b} \\ \hat{a} \end{bmatrix} = \begin{bmatrix} X'R^{-1}y \\ Z'R^{-1}y \end{bmatrix} \qquad (2)$$

where $R$ is the residual variance–covariance matrix, set following Gengler et al. (2007). $G$ is the $m \times m$ genetic variance–covariance matrix, whose off-diagonal element $(i, j)$ is determined by $r_{ij}$, the correlation between markers $i$ and $j$. When the value of $r_{ij}$ was >0.95 (or <−0.95), it was set to 0.95 (or −0.95) to avoid a singular matrix [22]. $K$ is a marker-based kinship matrix [24]; we developed an iterative kinship algorithm, described in a later section, to construct this matrix in the presence of missing data. The symbol $\otimes$ represents the Kronecker product.
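The multivariate BLUP step for one LD block can be sketched as follows. This is a minimal illustration and not the iBLUP implementation: it assumes one mean per marker as the only fixed effect, a residual matrix R proportional to the identity, and a hypothetical variance ratio controlled by `h2`; the function name `blup_impute_block` is also an assumption.

```python
import numpy as np

def blup_impute_block(Y, K, G, h2=0.99):
    """Impute NaN entries of Y (individuals x markers) for one LD block.

    K: individuals x individuals kinship matrix (must be invertible).
    G: markers x markers covariance matrix of the block (must be invertible).
    h2: assumed share of variance captured by the effects a (hypothetical default).
    Every marker needs at least one observed genotype.
    """
    n, m = Y.shape
    y = Y.T.reshape(-1)                            # stack marker by marker
    obs = ~np.isnan(y)                             # observed records
    X_full = np.kron(np.eye(m), np.ones((n, 1)))   # one mean per marker
    X = X_full[obs]
    Z = np.eye(n * m)[obs]                         # record -> effect incidence
    lam = (1.0 - h2) / h2                          # residual/genetic variance ratio
    A_inv = np.kron(np.linalg.inv(G), np.linalg.inv(K))   # G^-1 (x) K^-1
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * A_inv]])
    rhs = np.concatenate([X.T @ y[obs], Z.T @ y[obs]])
    sol = np.linalg.solve(lhs, rhs)                # solve the mixed model equations
    b_hat, a_hat = sol[:m], sol[m:]
    pred = np.clip(X_full @ b_hat + a_hat, 0.0, 2.0)      # keep genotypes in [0, 2]
    out = y.copy()
    out[~obs] = pred[~obs]                         # observed genotypes stay untouched
    return out.reshape(m, n).T
```

With R proportional to the identity, multiplying the mixed model equations by the residual variance leaves only the ratio `lam` in front of the Kronecker term, which is why no explicit R appears in the sketch. The dense solve is only practical for small blocks.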

2 Iterative kinship algorithm

In the model, K was calculated with an iterative algorithm because of the high missing rate of the genotype data. The initial K was calculated using only the observed genotypes, and each element was standardized by dividing by the number of loci typed in both individuals. Each subsequent K was calculated from the genotypes imputed by BLUP. The iteration stops when the difference between the correlation coefficients of two successive K matrices is <0.001. To ensure that K converged, imputed genotypes from the multivariate BLUP were forced to 0 (or 2) if they were <0 (or >2). To improve computation speed, Equation (2) was solved with an LU factorization method based on subroutines from the Intel® Math Kernel Library, using parallel execution on Linux workstations.
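The iteration described above can be sketched in a few lines. The kinship formula used here (a cross-product of centered genotypes) is an illustrative assumption, since the text only cites [24] for it; the stopping rule is read as "one minus the correlation between successive K matrices falls below 0.001", which is also an interpretation. The `impute` argument is a hypothetical callback standing in for the BLUP step.

```python
import numpy as np

def kinship_pairwise(Y):
    """Kinship from observed genotypes only (NaN = missing).

    Each pair's cross-product is divided by the number of loci typed in both.
    """
    obs = ~np.isnan(Y)
    p = np.nanmean(Y, axis=0) / 2.0            # allele frequency per marker
    W = np.where(obs, Y - 2.0 * p, 0.0)        # centered genotypes, 0 if missing
    shared = obs.astype(float) @ obs.astype(float).T
    return (W @ W.T) / np.maximum(shared, 1.0)

def iterative_kinship(Y, impute, tol=1e-3, max_iter=20):
    """Alternate between imputing genotypes and recomputing K until K stabilizes."""
    K = kinship_pairwise(Y)
    for _ in range(max_iter):
        Y_hat = np.clip(impute(Y, K), 0.0, 2.0)    # force imputed values into [0, 2]
        K_new = kinship_pairwise(Y_hat)
        r = np.corrcoef(K.ravel(), K_new.ravel())[0, 1]
        K = K_new
        if 1.0 - r < tol:                          # successive K matrices agree
            break
    return K, Y_hat
```

Any function that takes the genotype matrix and the current K and returns a completed matrix can be passed as `impute`, for example a wrapper that applies a block-wise BLUP solver.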

3 iBLUP pipeline

The iBLUP pipeline can handle both SNP array and sequencing data. The analysis steps depend on the kind of input data. If SNP array data are the input, only step 5 needs to be performed. If the user wants to impute sequencing data directly, all 5 steps can be run automatically. We introduce the analysis steps briefly as follows; the details can be found in the online manual (http://klab.sjtu.edu.cn/iBLUP/).
Step 1. Assign raw sequencing reads to individuals.
Step 2. Filter reads on quality.
Step 3. Map qualified reads to the reference genome. The qualified reads are clustered by aligning them with the reference genome using BWA [25], as follows. We first mapped all filtered reads to the reference panel. Because of uncertain ligation products, such as adapter-DNA-DNA-adapter, we then attempted to divide each remaining read into two or three shorter reads at the enzyme cutting sites and to align the pieces individually with the reference genome. Finally, we used a sliding window method to query the reads still unaligned, so that an incomplete reference panel could still be used. The rule of the sliding window is that the first 25 uninterrupted bases from the 5′ end of a read are aligned with the reference genome, and a single base is added at each alignment until the maximum aligned sequence is reached. If the first 25 bp at the 5′ end are not aligned successfully, the next 25 bp, i.e. bases 2–26, are aligned with the reference genome, and so on.
Step 4. Call the genotype for each marker and individual. Reads that aligned with the reference panel were stored as “sam” files. iBLUP applies SAMtools to discover SNPs [26].
Step 5. Impute missing genotypes by iBLUP.
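The sliding window rule in step 3 can be illustrated with a toy sketch. It replaces the BWA alignment with an exact substring search, which is a deliberate simplification, and the function name `sliding_window_align` is hypothetical.

```python
def sliding_window_align(read, reference, seed_len=25):
    """Toy version of the sliding window rule.

    Tries the seed starting at base 1, then base 2, and so on. The first seed
    found in the reference (exact match, standing in for a BWA alignment) is
    extended one base at a time until the next base no longer matches.
    Returns (read_start, reference_start, aligned_length), or None.
    """
    for start in range(len(read) - seed_len + 1):
        seed = read[start:start + seed_len]
        pos = reference.find(seed)
        if pos == -1:
            continue                      # slide the window by one base
        length = seed_len
        while (start + length < len(read)
               and pos + length < len(reference)
               and read[start + length] == reference[pos + length]):
            length += 1                   # add a single base per step
        return start, pos, length
    return None
```

A read whose first two bases do not belong to the reference is therefore aligned from its third base onward, and the aligned segment grows until the first mismatch.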

4 Simulation data

The data set of the 15th QTL-MAS workshop contained 3220 individuals genotyped on 9990 markers [27]. The 9990 SNP markers were evenly distributed on 5 chromosomes. Each chromosome had a size of 1 Morgan and carried 1998 equally spaced SNPs (1 SNP every 0.05 cM). The 3220 individuals were from two generations: 220 individuals (200 females and 20 males) were parents, and the remaining 3000 individuals were offspring in 200 full-sib families of 15 progeny per dam, generated by regular crossing of the male and female parents (see panel a of the sampling-scheme figure). All the genotypes of the 9990 SNPs on the 3220 individuals are known. Subsets of individuals were sampled from the workshop data under two sampling schemes. 1) Half-sib scheme. The sampled data included all the parents (20 males and 200 females) and two progeny selected randomly from each full-sib family. This scheme sampled more families with a smaller family size (see panel b). 2) Full-sib scheme. The sampled data included 5 males, the corresponding mates and eleven progeny from each full-sib family. This scheme sampled fewer families with a larger family size (see panel c). A subset of markers was sampled from all 9990 genetic markers to investigate the effect of marker density: one marker was selected from every five adjacent markers, so the sampled marker data set contained 1998 markers. The known genotype data were randomly masked as missing at specific proportions, ranging from 10% to 80%. Accuracy was calculated as the Pearson correlation coefficient between the known and imputed genotypes.
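The masking and accuracy calculation can be sketched as below. The sketch assumes that each genotype is masked independently with the given probability and that the correlation is taken over the masked entries only; neither detail is stated explicitly in the text, and the function names are hypothetical.

```python
import numpy as np

def mask_genotypes(Y, rate, rng):
    """Mask each known genotype as missing (NaN) with probability `rate`."""
    mask = rng.random(Y.shape) < rate
    Y_masked = Y.astype(float).copy()
    Y_masked[mask] = np.nan
    return Y_masked, mask

def imputation_accuracy(Y_true, Y_imputed, mask):
    """Pearson correlation between known and imputed genotypes at masked entries."""
    return float(np.corrcoef(Y_true[mask], Y_imputed[mask])[0, 1])

def thin_markers(Y, step=5):
    """Keep one marker out of every `step` adjacent markers (columns)."""
    return Y[:, ::step]
```

Applied to a matrix with 9990 marker columns, `thin_markers` with the default step returns 1998 columns, matching the low-density marker set.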

5 Real sequencing data

The data were generated on an Illumina HiSeq 2000 sequencer. One flow-cell lane was used to sequence 72 pigs (36 Landrace pigs and 36 Large White pigs) with a DNA barcoding and genome reducing protocol (http://klab.sjtu.edu.cn/GGRS/). There were 380,971,530 raw reads. The number of reads per individual ranged from 1,570,923 to 10,077,526, and the average was 5,022,387.

Results

We were motivated to develop a non-phasing algorithm [28] within a multivariate mixed model (M-MM) [22]. To take full advantage of an M-MM and incorporate both LD and IBD simultaneously, we made two major changes that enhance how marker IBD information is represented in the relationship matrix (K) among individuals and how marker LD information is represented in the covariance matrix (G) of the underlying variables (see Figure 1). First, we replaced the overall IBD derived from the pedigree with the IBD derived from the markers, and we developed an iterative algorithm that derives an IBD robust to missing genotypes. Second, instead of using an arbitrary fixed LD block size (e.g. two megabase pairs), we implemented an optimization process that determines the LD threshold and hence a variable LD block size. The value of LD was represented as the squared correlation coefficient (r²) calculated for the markers in the LD block. Our improved method of imputation by Best Linear Unbiased Prediction (iBLUP) had markedly higher accuracy than the conventional M-MM method, and was even more accurate than the sophisticated graphic phasing method (BEAGLE), especially at high missing rates.
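How an LD threshold turns into variable-size blocks can be illustrated with a short sketch. The greedy left-to-right grouping below is an assumption chosen for simplicity and is not necessarily the partition used by iBLUP; it only enforces the stated property that every pair of markers within a block has r² at or above the threshold. The function names are hypothetical, and `ld_r2` assumes a genotype matrix without missing values.

```python
import numpy as np

def ld_r2(Y):
    """Pairwise squared correlation between marker columns (no missing values)."""
    return np.corrcoef(Y, rowvar=False) ** 2

def ld_blocks(r2, threshold=0.1):
    """Greedy partition of adjacent markers into LD blocks.

    Marker j joins the current block only if its r^2 with every marker
    already in the block is at least `threshold`; otherwise a new block starts.
    """
    blocks, current = [], [0]
    for j in range(1, r2.shape[0]):
        if all(r2[i, j] >= threshold for i in current):
            current.append(j)
        else:
            blocks.append(current)
            current = [j]
    blocks.append(current)
    return blocks
```

Raising the threshold yields more and smaller blocks. Markers left alone in a block correspond to the case that the Methods handle with a single-variable BLUP model.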
Figure 1

The mechanism and performance of iBLUP.

The figure illustrates how an observed genotype with missing values is imputed by Best Linear Unbiased Prediction (BLUP). The imputation uses both the relationship among markers, represented by linkage disequilibrium (LD), and the relationship among individuals, represented as identity by descent (IBD). G and K are the genetic variance–covariance matrix and the marker-based kinship matrix, respectively, and the symbol ⊗ represents the Kronecker product.

The performance of iBLUP was compared with that of three other commonly used methods, M-MM, BEAGLE and fastPHASE, on a data set from the 15th QTLMAS workshop. iBLUP outperformed M-MM across the whole range of missing rates. When the missing rate was 40% or lower, iBLUP had similar accuracy to BEAGLE and fastPHASE. At higher missing rates, iBLUP markedly outperformed BEAGLE and fastPHASE. At a missing rate of 50%, the accuracy of fastPHASE dropped to 0.79 while iBLUP retained an accuracy of 0.98. At a missing rate of 70%, the accuracy of BEAGLE fell to 0.82 while iBLUP held an accuracy of 0.95 (Table 1).
Table 1

The comparison of four genotype imputation methods: iBLUP, BEAGLE, M-MM and fastPHASE.

Method / missing rate  10%   20%   30%   40%   50%   60%   70%   80%
iBLUP                  0.99  0.99  0.99  0.99  0.98  0.97  0.95  0.92
BEAGLE                 0.99  0.99  0.99  0.99  0.99  0.95  0.82  0.76
M-MM                   0.90  0.89  0.89  0.87  0.87  0.85  0.82  0.71
fastPHASE              0.99  0.99  0.99  0.99  0.79  0.76  0.74  0.69

It is critical to dissect the overall accuracy across all genotypes into the accuracy for major and minor allele genotypes. For rare variants, the major allele genotypes can be imputed accurately even when the accuracy for the minor allele is ignored, so a high overall accuracy can hide poor imputation of the minor allele. The iBLUP method is superior to BEAGLE and fastPHASE not only in overall accuracy but also for minor allele genotypes. At a missing rate of 60%, the accuracy of fastPHASE dropped to −0.01, whereas iBLUP kept an accuracy of 0.72 for minor allele genotypes. At a missing rate of 70%, the accuracy of BEAGLE dropped to −0.1, while iBLUP still retained an accuracy of 0.61 for minor allele genotypes (Table 2).
We expanded our examination over a variety of circumstances. First, we examined how imputation accuracy responds to the level of kinship among individuals. Two subsets of the QTLMAS 15th dataset were used for this examination. Both contained all the available markers, with an average LD value of 0.137, but differed in family structure. The first dataset consisted of two full-sib individuals sampled from each family, and the second consisted of parents and their eleven progeny. The average kinship coefficients were 0.0073 and 0.048 for the first and second family structures, respectively. In both cases, iBLUP had better imputation accuracy than BEAGLE and fastPHASE at missing rates from 60% to 80% (Table 3).
Table 3

Responses of imputation accuracy on marker density and individual relationship*.

Missing rate      60%                                    70%                                    80%
Factor      Level iBLUP        BEAGLE       fastPHASE    iBLUP        BEAGLE       fastPHASE    iBLUP        BEAGLE       fastPHASE
Sibs (a)    Half  0.97±0.0006  0.95±0.0002  0.76±0.0163  0.95±0.0005  0.82±0.0004  0.74±0.0081  0.92±0.0002  0.76±0.0002  0.69±0.0041
            Full  0.97±0.0003  0.96±0.0002  0.79±0.0113  0.96±0.0006  0.83±0.0004  0.75±0.0076  0.94±0.0007  0.77±0.0002  0.72±0.0046
Density (b) High  0.97±0.0006  0.95±0.0002  0.76±0.0163  0.95±0.0005  0.82±0.0004  0.74±0.0081  0.92±0.0002  0.76±0.0002  0.69±0.0041
            Low   0.90±0.0006  0.85±0.0005  0.76±0.0122  0.87±0.0004  0.78±0.0003  0.73±0.0093  0.83±0.0007  0.75±0.0006  0.71±0.0062

*The full dataset from the 15th QTL-MAS workshop was sampled on individual relationship and marker density. The full dataset contains 3220 individuals genotyped with 9990 markers. The 3220 individuals include 20 sires, 200 dams (10 dams per sire) and 3000 progeny (15 progeny per dam), as displayed in panel a of the sampling-scheme figure. The full population was sampled to form two subpopulations, one with individuals more related to each other (full sibs, see panel c) and the other with individuals less related to each other (half sibs, see panel b). The known genotypes were randomly masked as missing at three different rates: 60%, 70% and 80%. Three imputation methods (BEAGLE, fastPHASE and iBLUP) were used to impute the masked genotypes. Accuracy was calculated as the Pearson correlation coefficient between the known and imputed genotypes. The sampling of missing genotypes was repeated ten times. The average and standard error of imputation accuracy are reported in the table.

(a) All the genetic markers were used to evaluate the response of imputation accuracy to individual relationship, i.e. half-sib vs. full-sib population.

(b) The half-sib population was used to evaluate the response of imputation accuracy to marker density. Two levels of marker density were examined. The high density contained all the available markers (9990). The low density contained one fifth of the available markers, sampled evenly (choosing one out of every five adjacent markers).

Second, we examined the effect of marker density on imputation accuracy. The half-sib family structure described above was used with two sets of markers. One set contained all the available markers (9990 SNPs), with an average LD of 0.137, and the other contained one fifth of the markers (1998 SNPs), with an average LD of 0.092. Compared with BEAGLE and fastPHASE, iBLUP achieved higher imputation accuracy at missing rates from 60% to 80% in both cases (Table 3).
We implemented the iBLUP algorithm in a publicly available pipeline, also named iBLUP. The imputation step can be used independently for any genotype data, including data from DNA chips. It can also be applied to raw sequencing data after the four preceding steps of the iBLUP pipeline (Figure 2).
Figure 2

Diagram of iBLUP pipeline.

(1) Blended raw data were generated from the same flow cell lane. (2) Raw data were assigned to individuals according to the barcode. (3) Assigned reads were filtered for high quality reads according to several rules, including trimming the barcode and the last low quality base etc. (4) Filtered reads were aligned with the reference sequence. (5) SNP calling and genotyping were done according to the mapping results. (6) Missing genotypes were imputed by the iBLUP algorithm.

The iBLUP pipeline gives users the option to optimize the LD threshold that determines the extent of the LD block. We examined the imputation accuracy with LD thresholds of 0.05, 0.1 and 0.2 on the QTLMAS 15th dataset at a missing rate of 30% (see the supplementary figure on the LD threshold). The analysis showed that an LD threshold of 0.1 achieved the highest imputation accuracy. Interestingly, this threshold was also the optimum for the pig sequencing data. This observation might help narrow the optimization range for LD and so reduce computing cost in other experiments.
We applied the iBLUP pipeline to sequencing data from an outbred pig population for high-density SNP discovery and genotyping. The sequences were collected in one lane of a single flow cell at 72-plex by a genome reducing and sequencing protocol (http://klab.sjtu.edu.cn/GGRS/). Of the genotypes at the 403,928 SNPs called, 36% were missing. The imputation accuracy was 0.97 for iBLUP and 0.92 for BEAGLE. To compare the imputation accuracy of iBLUP and BEAGLE further, known genotypes were masked as missing at four other rates: 50%, 60%, 70% and 80%. The imputation accuracy decreased as the missing rate increased for both methods. iBLUP outperformed BEAGLE at all missing rates for the real sequencing data (Table 4).
Table 4

Imputation accuracy of real pig sequencing data.

Method / missing rate  36%   50%   60%   70%   80%
iBLUP                  0.97  0.97  0.96  0.96  0.95
BEAGLE                 0.92  0.92  0.91  0.91  0.91

Discussion

Missing genotype imputation is a critical step between sequencing and the use of the data for GWAS and genomic prediction [29]–[31]. Imputation accuracy relies on how well LD and IBD information are incorporated. IBD information is widely used in population and quantitative genetics. It is traditionally calculated from the pedigree [32]. An alternative is to estimate IBD coefficients from genetic markers [33]. This marker-based IBD specifies the actual difference between individuals more accurately and is superior to pedigree kinship for genome-wide association studies [34]. A similar advantage was brought to genotype imputation in this study.
The accuracy improvement of iBLUP also relates to the optimization that determines the LD block. We demonstrated that an appropriate LD block is critical for imputation. LD blocks that are too broad or too narrow lead to information dilution (see the supplementary figure on the LD threshold). The best LD block size can be determined by the optimization process in iBLUP. The suggested LD threshold of 0.1 can be used to save computing time or as a starting value for the optimization.
The tolerance of iBLUP to high missing rates makes it possible to obtain markers at high density. For example, haplotype blocks within pig breeds are about 10 kb long [35], so GWAS or genomic selection (GS) studies need markers covering at least about 300,000 genomic locations (one SNP per haplotype block). However, the commonly used pig DNA chip (PorcineSNP60) only contains 60,000 SNPs [36]. In the present pig sequencing experiment, we used only one flow-cell lane for 72-plex sequencing. After imputation of the 36% of missing genotypes, we obtained 403,928 SNPs, which gives much better coverage than the commonly used chip.
One limitation of iBLUP is its computing speed for large sample sizes. When the sample size is medium (<300), the computing speed of iBLUP is comparable to that of BEAGLE. For the real sequencing data of 72 pigs (403,928 SNPs), for example, imputation takes about 20 minutes with either iBLUP or BEAGLE and 18 hours with fastPHASE. When the sample size is large, iBLUP takes more time than BEAGLE. To improve the computation speed, the iBLUP software can be run in parallel. Recently, factored spectrally transformed linear mixed models have been developed to improve the computing speed of genome-wide association studies [37], [38]. The idea can be applied to the iBLUP algorithm to improve its computing speed in the future.
A comprehensive package is provided at the iBLUP website, including executable programs for multiple computing platforms (Linux and Windows) and demonstration data. The use of iBLUP should boost imputation accuracy, especially for data with high missing rates and for rare variants. Consequently, it should lead to a better understanding of the genetic architecture of complex traits in multiple organisms.

The scheme of sampling individuals. The top panel (a) is the complete pedigree of the 15th QTLMAS workshop data [27], with 20 sires. Each sire (S) mated with 10 dams (D). Each dam produced 15 progeny (P). All individuals are numbered sequentially. The first subscript indicates the sire, the second the dam and the third the progeny. The total numbers of individuals within each category are labeled in the right column. The middle panel (b) keeps all the sires and dams. The difference (highlighted in red) is that each sire-dam family keeps only the first two progeny. This scheme has more families (all of them) and fewer progeny within each family. As half sib is the major relationship among individuals, this scheme is named the half-sib scheme. The bottom panel (c) keeps the first 5 sires and their mates from panel a. Each sire-dam family keeps eleven progeny. This scheme has fewer families but more progeny within each family. As full sib is the major relationship among individuals, this scheme is named the full-sib scheme. (TIF)

Impact of linkage disequilibrium threshold. The linkage disequilibrium (LD) was calculated as the squared correlation coefficient. Adjacent markers with LD above the threshold were considered as one LD block for imputation. The evaluation was performed on a subset of the 15th QTLMAS common dataset by using the half-sib sampling scheme described in the sampling-scheme figure. A total of 100 replications were conducted, and the imputation accuracy is the average of the 100 replications. (TIF)

Additional information on the 72 pigs that were sequenced. (DOCX)
Table 2

Imputation accuracy characterized by minor allele*.

Missing rate      60%                                       70%                                       80%
Genotype  MAF     iBLUP         BEAGLE        fastPHASE     iBLUP         BEAGLE        fastPHASE     iBLUP         BEAGLE        fastPHASE
Major     <5%     1.00±0.0002   1.00±0.0000   0.99±0.009    1.00±0.0001   0.99±0.0000   0.99±0.0005   1.00±0.0002   0.99±0.0000   0.99±0.0003
          5–25%   0.96±0.0005   0.94±0.0002   0.76±0.0157   0.94±0.0002   0.88±0.0001   0.73±0.0112   0.91±0.0003   0.87±0.0000   0.70±0.0081
          >25%    0.89±0.0011   0.88±0.0004   0.42±0.0311   0.85±0.0011   0.68±0.0008   0.37±0.0221   0.76±0.0009   0.64±0.001    0.30±0.014
          All     0.97±0.0003   0.96±0.0001   0.79±0.0134   0.96±0.0002   0.89±0.0002   0.77±0.0096   0.93±0.0002   0.87±0.0002   0.74±0.0063
Minor     <5%     0.10±0.0013   −0.07±0.0027  −0.04±0.0095  0.05±0.0015   −0.10±0.0008  −0.05±0.0074  0.00±0.0018   −0.11±0.0008  −0.06±0.0062
          5–25%   0.52±0.0029   0.27±0.0018   −0.04±0.0273  0.38±0.0017   −0.24±0.0011  −0.08±0.0194  0.20±0.0014   −0.30±0.0001  −0.13±0.0113
          >25%    0.77±0.0019   0.7±0.0013    0.11±0.0389   0.67±0.0018   −0.07±0.003   0.06±0.0288   0.51±0.0016   −0.41±0.0017  0.01±0.0195
          All     0.72±0.0021   0.59±0.0011   −0.01±0.0171  0.61±0.0017   −0.1±0.0024   0.03±0.026    0.44±0.0015   −0.38±0.0013  −0.01±0.0171
All       All     0.97±0.0003   0.95±0.0002   0.76±0.0164   0.95±0.0003   0.82±0.0004   0.74±0.0119   0.92±0.0002   0.76±0.0000   0.69±0.0079

*Genetic markers were classified into three categories based on minor allele frequency (MAF). The MAF cutoffs were 5% and 25%. Known genotypes were masked as missing at three different rates: 60%, 70% and 80%. Three imputation methods (BEAGLE, fastPHASE and iBLUP) were used to impute the masked genotypes. Accuracy was calculated as the Pearson correlation coefficient between the known and imputed genotypes. Three subsets of genotypes were examined: 1) genotypes with the major allele (Major), including major allele homozygotes and heterozygotes; 2) genotypes with the minor allele (Minor), including minor allele homozygotes and heterozygotes; 3) both homozygotes and heterozygotes (All).
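The stratification described in this footnote can be sketched as follows. The sketch assumes that genotypes are coded as copies of the minor allele, that MAF is computed from the known genotypes, and that the 5% and 25% boundaries fall in the middle class; these details and the function names are assumptions. It returns every combination of genotype subset and MAF class, which is more than the rows reported in the table.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation, or NaN when it is undefined."""
    if x.size < 2 or np.std(x) == 0 or np.std(y) == 0:
        return float("nan")
    return float(np.corrcoef(x, y)[0, 1])

def stratified_accuracy(Y_true, Y_imp, mask, cuts=(0.05, 0.25)):
    """Accuracy at masked entries, split by genotype subset and MAF class."""
    maf = Y_true.mean(axis=0) / 2.0            # per-marker minor allele frequency
    lo, hi = cuts
    maf_bins = {
        "<5%": maf < lo,
        "5-25%": (maf >= lo) & (maf <= hi),
        ">25%": maf > hi,
        "All": np.ones_like(maf, dtype=bool),
    }
    subsets = {
        "Major": Y_true <= 1,                  # major homozygotes and heterozygotes
        "Minor": Y_true >= 1,                  # heterozygotes and minor homozygotes
        "All": np.ones_like(Y_true, dtype=bool),
    }
    result = {}
    for s_name, s_sel in subsets.items():
        for b_name, b_sel in maf_bins.items():
            sel = mask & s_sel & b_sel[np.newaxis, :]
            result[(s_name, b_name)] = pearson(Y_true[sel], Y_imp[sel])
    return result
```

Because each subset contains only two genotype classes, its correlation can be low or negative even when the overall correlation is high, which is the pattern visible in the Minor rows of the table.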

  35 in total

1.  The mystery of missing heritability: Genetic interactions create phantom heritability.

Authors:  Or Zuk; Eliana Hechter; Shamil R Sunyaev; Eric S Lander
Journal:  Proc Natl Acad Sci U S A       Date:  2012-01-05       Impact factor: 11.205

2.  Improved linear mixed models for genome-wide association studies.

Authors:  Jennifer Listgarten; Christoph Lippert; Carl M Kadie; Robert I Davidson; Eleazar Eskin; David Heckerman
Journal:  Nat Methods       Date:  2012-05-30       Impact factor: 28.547

3.  Simple and efficient analysis of disease association with missing genotype data.

Authors:  D Y Lin; Y Hu; B E Huang
Journal:  Am J Hum Genet       Date:  2008-02       Impact factor: 11.025

4.  Increased accuracy of artificial selection by using the realized relationship matrix.

Authors:  B J Hayes; P M Visscher; M E Goddard
Journal:  Genet Res (Camb)       Date:  2009-02       Impact factor: 1.588

5.  Personal genomes: The case of the missing heritability.

Authors:  Brendan Maher
Journal:  Nature       Date:  2008-11-06       Impact factor: 49.962

6.  Multiplex DNA sequencing.

Authors:  G M Church; S Kieffer-Higgins
Journal:  Science       Date:  1988-04-08       Impact factor: 47.728

7.  Common SNPs explain a large proportion of the heritability for human height.

Authors:  Jian Yang; Beben Benyamin; Brian P McEvoy; Scott Gordon; Anjali K Henders; Dale R Nyholt; Pamela A Madden; Andrew C Heath; Nicholas G Martin; Grant W Montgomery; Michael E Goddard; Peter M Visscher
Journal:  Nat Genet       Date:  2010-06-20       Impact factor: 38.330

8.  A second generation human haplotype map of over 3.1 million SNPs.

Authors:  Kelly A Frazer; Dennis G Ballinger; David R Cox; David A Hinds; Laura L Stuve; Richard A Gibbs; John W Belmont; Andrew Boudreau; Paul Hardenbol; Suzanne M Leal; Shiran Pasternak; David A Wheeler; Thomas D Willis; Fuli Yu; Huanming Yang; Changqing Zeng; Yang Gao; Haoran Hu; Weitao Hu; Chaohua Li; Wei Lin; Siqi Liu; Hao Pan; Xiaoli Tang; Jian Wang; Wei Wang; Jun Yu; Bo Zhang; Qingrun Zhang; Hongbin Zhao; Hui Zhao; Jun Zhou; Stacey B Gabriel; Rachel Barry; Brendan Blumenstiel; Amy Camargo; Matthew Defelice; Maura Faggart; Mary Goyette; Supriya Gupta; Jamie Moore; Huy Nguyen; Robert C Onofrio; Melissa Parkin; Jessica Roy; Erich Stahl; Ellen Winchester; Liuda Ziaugra; David Altshuler; Yan Shen; Zhijian Yao; Wei Huang; Xun Chu; Yungang He; Li Jin; Yangfan Liu; Yayun Shen; Weiwei Sun; Haifeng Wang; Yi Wang; Ying Wang; Xiaoyan Xiong; Liang Xu; Mary M Y Waye; Stephen K W Tsui; Hong Xue; J Tze-Fei Wong; Luana M Galver; Jian-Bing Fan; Kevin Gunderson; Sarah S Murray; Arnold R Oliphant; Mark S Chee; Alexandre Montpetit; Fanny Chagnon; Vincent Ferretti; Martin Leboeuf; Jean-François Olivier; Michael S Phillips; Stéphanie Roumy; Clémentine Sallée; Andrei Verner; Thomas J Hudson; Pui-Yan Kwok; Dongmei Cai; Daniel C Koboldt; Raymond D Miller; Ludmila Pawlikowska; Patricia Taillon-Miller; Ming Xiao; Lap-Chee Tsui; William Mak; You Qiang Song; Paul K H Tam; Yusuke Nakamura; Takahisa Kawaguchi; Takuya Kitamoto; Takashi Morizono; Atsushi Nagashima; Yozo Ohnishi; Akihiro Sekine; Toshihiro Tanaka; Tatsuhiko Tsunoda; Panos Deloukas; Christine P Bird; Marcos Delgado; Emmanouil T Dermitzakis; Rhian Gwilliam; Sarah Hunt; Jonathan Morrison; Don Powell; Barbara E Stranger; Pamela Whittaker; David R Bentley; Mark J Daly; Paul I W de Bakker; Jeff Barrett; Yves R Chretien; Julian Maller; Steve McCarroll; Nick Patterson; Itsik Pe'er; Alkes Price; Shaun Purcell; Daniel J Richter; Pardis Sabeti; Richa Saxena; Stephen F Schaffner; Pak C Sham; Patrick Varilly; David Altshuler; Lincoln D Stein; Lalitha Krishnan; Albert Vernon Smith; Marcela K Tello-Ruiz; Gudmundur A Thorisson; Aravinda Chakravarti; Peter E Chen; David J Cutler; Carl S Kashuk; Shin Lin; Gonçalo R Abecasis; Weihua Guan; Yun Li; Heather M Munro; Zhaohui Steve Qin; Daryl J Thomas; Gilean McVean; Adam Auton; Leonardo Bottolo; Niall Cardin; Susana Eyheramendy; Colin Freeman; Jonathan Marchini; Simon Myers; Chris Spencer; Matthew Stephens; Peter Donnelly; Lon R Cardon; Geraldine Clarke; David M Evans; Andrew P Morris; Bruce S Weir; Tatsuhiko Tsunoda; James C Mullikin; Stephen T Sherry; Michael Feolo; Andrew Skol; Houcan Zhang; Changqing Zeng; Hui Zhao; Ichiro Matsuda; Yoshimitsu Fukushima; Darryl R Macer; Eiko Suda; Charles N Rotimi; Clement A Adebamowo; Ike Ajayi; Toyin Aniagwu; Patricia A Marshall; Chibuzor Nkwodimmah; Charmaine D M Royal; Mark F Leppert; Missy Dixon; Andy Peiffer; Renzong Qiu; Alastair Kent; Kazuto Kato; Norio Niikawa; Isaac F Adewole; Bartha M Knoppers; Morris W Foster; Ellen Wright Clayton; Jessica Watkin; Richard A Gibbs; John W Belmont; Donna Muzny; Lynne Nazareth; Erica Sodergren; George M Weinstock; David A Wheeler; Imtaz Yakub; Stacey B Gabriel; Robert C Onofrio; Daniel J Richter; Liuda Ziaugra; Bruce W Birren; Mark J Daly; David Altshuler; Richard K Wilson; Lucinda L Fulton; Jane Rogers; John Burton; Nigel P Carter; Christopher M Clee; Mark Griffiths; Matthew C Jones; Kirsten McLay; Robert W Plumb; Mark T Ross; Sarah K Sims; David L Willey; Zhu Chen; Hua Han; Le Kang; Martin Godbout; John C Wallenburg; Paul L'Archevêque; Guy Bellemare; Koji Saeki; Hongguang Wang; Daochang An; Hongbo Fu; Qing Li; Zhen Wang; Renwu Wang; Arthur L Holden; Lisa D Brooks; Jean E McEwen; Mark S Guyer; Vivian Ota Wang; Jane L Peterson; Michael Shi; Jack Spiegel; Lawrence M Sung; Lynn F Zacharia; Francis S Collins; Karen Kennedy; Ruth Jamieson; John Stewart
Journal:  Nature       Date:  2007-10-18       Impact factor: 49.962

Review 9.  Invited review: Genomic selection in dairy cattle: progress and challenges.

Authors:  B J Hayes; P J Bowman; A J Chamberlain; M E Goddard
Journal:  J Dairy Sci       Date:  2009-02       Impact factor: 4.034

10.  A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species.

Authors:  Robert J Elshire; Jeffrey C Glaubitz; Qi Sun; Jesse A Poland; Ken Kawamoto; Edward S Buckler; Sharon E Mitchell
Journal:  PLoS One       Date:  2011-05-04       Impact factor: 3.240


1.  An Efficient Genotyping Method in Chicken Based on Genome Reducing and Sequencing.

Authors:  Rongrong Liao; Zhen Wang; Qiang Chen; Yingying Tu; Zhenliang Chen; Qishan Wang; Changsuo Yang; Xiangzhe Zhang; Yuchun Pan
Journal:  PLoS One       Date:  2015-08-27       Impact factor: 3.240

2.  A genome-wide scan for selection signatures in Yorkshire and Landrace pigs based on sequencing data.

Authors:  Zhen Wang; Qiang Chen; Yumei Yang; Hongjie Yang; Pengfei He; Zhe Zhang; Zhenliang Chen; Rongrong Liao; Yingying Tu; Xiangzhe Zhang; Qishan Wang; Yuchun Pan
Journal:  Anim Genet       Date:  2014-10-19       Impact factor: 3.169

3.  Construction of relatedness matrices using genotyping-by-sequencing data.

Authors:  Ken G Dodds; John C McEwan; Rudiger Brauning; Rayna M Anderson; Tracey C van Stijn; Theodor Kristjánsson; Shannon M Clarke
Journal:  BMC Genomics       Date:  2015-12-09       Impact factor: 3.969

4.  Genotyping-by-sequencing provides the discriminating power to investigate the subspecies of Daucus carota (Apiaceae).

Authors:  Carlos I Arbizu; Shelby L Ellison; Douglas Senalik; Philipp W Simon; David M Spooner
Journal:  BMC Evol Biol       Date:  2016-10-28       Impact factor: 3.260

5.  Genetic diversity and population structure of six Chinese indigenous pig breeds in the Taihu Lake region revealed by sequencing data.

Authors:  Z Wang; Q Chen; Y Yang; R Liao; J Zhao; Z Zhang; Z Chen; X Zhang; M Xue; H Yang; Y Zheng; Q Wang; Y Pan
Journal:  Anim Genet       Date:  2015-09-16       Impact factor: 3.169

6.  Haplotype-based genome-wide association study identifies loci and candidate genes for milk yield in Holsteins.

Authors:  Zhenliang Chen; Yunqiu Yao; Peipei Ma; Qishan Wang; Yuchun Pan
Journal:  PLoS One       Date:  2018-02-15       Impact factor: 3.240

7.  Identifying Genetic Differences Between Dongxiang Blue-Shelled and White Leghorn Chickens Using Sequencing Data.

Authors:  Qing-Bo Zhao; Rong-Rong Liao; Hao Sun; Zhe Zhang; Qi-Shan Wang; Chang-Suo Yang; Xiang-Zhe Zhang; Yu-Chun Pan
Journal:  G3 (Bethesda)       Date:  2018-02-02       Impact factor: 3.154

8.  Genomic Prediction and Association Mapping of Curd-Related Traits in Gene Bank Accessions of Cauliflower.

Authors:  Patrick Thorwarth; Eltohamy A A Yousef; Karl J Schmid
Journal:  G3 (Bethesda)       Date:  2018-02-02       Impact factor: 3.154

9.  Exploring the Structure of Haplotype Blocks and Genetic Diversity in Chinese Indigenous Pig Populations for Conservation Purpose.

Authors:  Qing-Bo Zhao; Hao Sun; Zhe Zhang; Zhong Xu; Babatunde Shittu Olasege; Pei-Pei Ma; Xiang-Zhe Zhang; Qi-Shan Wang; Yu-Chun Pan
Journal:  Evol Bioinform Online       Date:  2019-01-23       Impact factor: 1.625

10.  Impact of imputation methods on the amount of genetic variation captured by a single-nucleotide polymorphism panel in soybeans.

Authors:  A Xavier; William M Muir; Katy M Rainey
Journal:  BMC Bioinformatics       Date:  2016-02-02       Impact factor: 3.169
