Daria V Zhernakova1,2, Sergei Kliver1, Nikolay Cherkasov1, Gaik Tamazian1, Mikhail Rotkevich1, Ksenia Krasheninnikova1, Igor Evsyukov1, Sviatoslav Sidorov1, Pavel Dobrynin1, Andrey A Yurchenko1, Valentin Shimansky1, Irina V Shcherbakova3, Andrey S Glotov3, David L Valle4, Minzhong Tang5, Emilia Shin6, Kathleen B Schwarz6, Stephen J O'Brien1,7.
Abstract
A comparative analysis of whole genome sequencing (WGS) and genotype calling was initiated for ten human genome samples sequenced by the St. Petersburg State University Peterhof Sequencing Center and by three commercial sequencing centers outside of Russia. Sequence quality and the efficiency of DNA variant and genotype calling were compared among the centers and with DNA microarrays for each of the ten study subjects. We assessed calling of SNPs, indels, and copy number variation, as well as the promised speed of WGS throughput. Twenty separate QC analyses showed high similarity in sequence quality and called genotypes. The ten genomes tested by the centers included eight American patients afflicted with autoimmune hepatitis (AIH), plus one case's unaffected parents, as a prelude to discovering genetic influences in this rare disease of unknown etiology. The detailed internal replication and parallel analyses showed that two of the eight AIH cases carry a rare allele genotype for a previously described AIH-associated gene (FTCD), and revealed multiple occurrences of known HLA-DRB1 alleles associated with AIH (HLA-DRB1-03:01:01, 13:01:01 and 07:01:01). We also list putative SNVs in other genes that may influence AIH.
Year: 2018 PMID: 29995946 PMCID: PMC6040705 DOI: 10.1371/journal.pone.0200423
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Sample description.
| Sample | Diagnosis | Gender | Ethnicity | Age at biopsy/diagnosis | Sequenced by M | Sequenced by I | Sequenced by P |
|---|---|---|---|---|---|---|---|
| trio_mother | Healthy | F | EA | NA | + | + | + |
| trio_father | Healthy | M | EA | NA | + | + | + |
| trio_case1 | AIH-type II | F | EA | 19 months | + | + | + |
| case2 | AIH-type I | F | EA | 6 years | + | + | - |
| case3 | AIH-type I | F | EA | 20 months | + | + | - |
| case4 | AIH-type I | F | IA | 11 years | + | + | + |
| case5 | AIH-type I | F | EA | 15 years | + | + | + |
| case6 | AIH-type I | M | AA | 8 years | + | + | + |
| case7 | AIH-type I | M | EA | 17 years | + | + | - |
| case8 | AIH-type I | M | EA | 12 years | + | + | - |
Phenotype information for the 10 samples under study. The last three columns show whether a sample was sequenced at the corresponding sequencing center (+) or not (-).
*EA—European American; IA—Native American; AA—African American
Sequencing centers:
**M—Macrogen-X10; I—Illumina-X10; P—Peterhof-HiSeq4000
Fig 1. Raw read quality control parameters.
Raw sequence read QC parameters are shown for three sequencing centers (colored differently).
Comparison of sequencing results (N = 17 parameters).
| Category | Parameter | Macrogen-X10 | Illumina-X10 | Peterhof-HiSeq4000 |
|---|---|---|---|---|
| Sequencing strategy | Library preparation kit | Illumina TruSeq DNA PCR-Free | Illumina TruSeq DNA PCR-Free | Illumina TruSeq DNA PCR-Free |
| Sequencing strategy | Insert size | 300–400 bp | 450 bp | 400 bp |
| Sequencing strategy | Read length | 151 bp, paired-end | 151 bp, paired-end | 150 bp, paired-end |
| Raw read QC | Estimated mean coverage | 31.685 | 36 | 32 |
| Raw read QC | Variance coefficient of coverage | 0.245 | 0.28 | 0.27 |
| Raw read QC | Fraction of read pairs with both reads retained after filtration | 0.989 | 0.986 | 0.981 |
| Raw read QC | Fraction of kmers with errors | 0.076 | 0.068 | 0.069 |
| Raw read QC | Fraction of read pairs without adapters or Ns | 0.994 | 0.994 | 0.998 |
| Mapping QC | Reads before mapping | 812,203,657 | 834,018,799 | 912,695,503 |
| Mapping QC | Percentage of mapped reads | 97.85% | 97.14% | 97.43% |
| Variant QC | Number of SNVs | 3956042 | 3971375 | 3552604 |
| Variant QC | % of novel SNVs | 2.01% | 2.05% | 1.64% |
| Variant QC | Number of indels | 459983 | 708225 | 335164 |
| Variant QC | # Multiallelic sites | 30180 | 122066 | 14031 |
| Variant QC | Mendel errors | 0.58% | 0.30% | 0.27% |
| Variant QC | Genotype concordance with microarray | 96.80% | 96.88% | 96.67% |
Main parameters used for comparison of sequencing centers are presented in this table. These and additional parameters can also be found in S1–S3 Tables. All sequenced samples were used in this comparison.
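The "Mendel errors" row reports the share of trio sites where the child's genotype cannot be explained by the parents' genotypes. The record does not say how this figure was computed, so the following is only a minimal sketch of one common definition, assuming diploid autosomal sites and ignoring de novo mutations; the function names and the input layout are hypothetical.

```python
def is_mendel_consistent(child, mother, father):
    """Return True if the child's two alleles can be split so that one
    comes from the mother and the other from the father.

    Each genotype is a pair of allele codes, e.g. (0, 1).
    Assumption: diploid autosomal site, no de novo mutations.
    """
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendel_error_rate(trio_sites):
    """Fraction of sites violating Mendelian inheritance.

    trio_sites is an iterable of (child, mother, father) genotype tuples.
    """
    total = 0
    errors = 0
    for child, mother, father in trio_sites:
        total += 1
        if not is_mendel_consistent(child, mother, father):
            errors += 1
    if total == 0:
        raise ValueError("no trio sites supplied")
    return errors / total


# Illustrative, made-up genotypes (not data from the study).
example_sites = [
    ((0, 1), (0, 0), (1, 1)),  # consistent: 0 from mother, 1 from father
    ((1, 1), (0, 0), (0, 1)),  # error: mother cannot transmit allele 1
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 1), (0, 0), (0, 0)),  # error: neither parent carries allele 1
]
example_rate = mendel_error_rate(example_sites)
```

In practice such a check would run over the trio's called variants after filtering, and the resulting fraction would be expressed as a percentage as in the table.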
Fig 2. Genotype comparison.
(A) Concordance of WGS genotypes with microarray genotypes. The concordance was estimated based on the trio data as the ratio of microarray SNPs with identical genotypes in WGS results. (B) Comparison of the three WGS datasets between each other in terms of precision, sensitivity and F-measure for pairwise comparisons. Color legend is given on the top right. (C) Concordance of genotypes in the three WGS datasets for all variants, SNPs and indels. Color legend is given on the top right.
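The quantities named in this caption can be sketched in a few lines. Concordance follows the caption's definition (the share of microarray SNPs with an identical WGS genotype); precision, sensitivity and F-measure use their standard definitions for comparing one call set against another. The function names, the site keys, and the choice to count sites missing from WGS as discordant are assumptions made for illustration, not details taken from the study.

```python
def genotype_concordance(array_calls, wgs_calls):
    """Fraction of microarray SNPs whose WGS genotype is identical.

    Both arguments map a site key (e.g. "chr1:12345") to an unordered
    pair of alleles such as ("A", "G"). Sites absent from wgs_calls are
    counted as discordant (an assumption of this sketch).
    """
    if not array_calls:
        raise ValueError("no microarray genotypes supplied")
    identical = sum(
        1
        for site, genotype in array_calls.items()
        if site in wgs_calls and sorted(wgs_calls[site]) == sorted(genotype)
    )
    return identical / len(array_calls)


def pairwise_metrics(test_variants, reference_variants):
    """Precision, sensitivity and F-measure of one call set against another.

    Each argument is an iterable of hashable variant identifiers.
    """
    test, reference = set(test_variants), set(reference_variants)
    shared = len(test & reference)
    precision = shared / len(test) if test else 0.0
    sensitivity = shared / len(reference) if reference else 0.0
    denominator = precision + sensitivity
    f_measure = (
        2 * precision * sensitivity / denominator if denominator else 0.0
    )
    return precision, sensitivity, f_measure


# Illustrative, made-up calls (not data from the study).
array_example = {"s1": ("A", "G"), "s2": ("C", "C"), "s3": ("T", "T"), "s4": ("G", "A")}
wgs_example = {"s1": ("G", "A"), "s2": ("C", "T"), "s3": ("T", "T")}
concordance_example = genotype_concordance(array_example, wgs_example)
metrics_example = pairwise_metrics({"a", "b", "c"}, {"b", "c", "d", "e"})
```

Because the pairwise comparison has no ground truth, swapping which dataset is treated as the reference swaps precision and sensitivity while leaving the F-measure unchanged.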
HLA-DRB1 and FTCD genotypes.
| Sample ID | HLA-DRB1 allele 1 | HLA-DRB1 allele 2 | FTCD G>GC insertion genotype |
|---|---|---|---|
| trio_case1 | 12:01:01 | | 2/2 |
| trio_father | 10:01:01 | 12:01:01 | 1/1 |
| trio_mother | 08:01:01 | | 1/2 |
| case2 | | | 1/2 |
| case3 | 15:01:01 | | 1/1 |
| case4 | 15:01:01 | | 1/1 |
| case5 | 11:01:02 | 15:01:01 | 1/1 |
| case6 | 04:05:01 | 15:03:01 | 1/1 |
| case7 | | | 1/1 |
| case8 | 15:01:01 | 14:54:01 | 1/1 |
HLA-DRB1 and FTCD G>GC insertion genotypes are shown for all samples. HLA-DRB1 alleles are given based on molecular typing, or on Illumina-X10 data when molecular typing results were not available. Alleles associated with AIH are shown in bold.