Yu S Huang, Vasily Ramensky, Susan K Service, Anna J Jasinska, Yoon Jung, Oi-Wa Choi, Rita M Cantor, Nikoleta Juretic, Jessica Wasserscheid, Jay R Kaplan, Matthew J Jorgensen, Thomas D Dyer, Ken Dewar, John Blangero, Richard K Wilson, Wesley Warren, George M Weinstock, Nelson B Freimer.
Abstract
BACKGROUND: We report here the first genome-wide high-resolution polymorphism resource for non-human primate (NHP) association and linkage studies, constructed for the Caribbean-origin vervet monkey, or African green monkey (Chlorocebus aethiops sabaeus), one of the most widely used NHPs in biomedical research. We generated this resource by whole genome sequencing (WGS) of monkeys from the Vervet Research Colony (VRC), an NIH-supported research resource for which extensive phenotypic data are available.
Year: 2015 PMID: 26092298 PMCID: PMC4494155 DOI: 10.1186/s12915-015-0152-2
Source DB: PubMed Journal: BMC Biol ISSN: 1741-7007 Impact factor: 7.431
Fig. 1 Using the variant data from the WGS of the trio shown on the left, we evaluated three different down-sampling schemes, drawn on the right, to determine a pedigree-wide strategy for selecting monkeys for medium (4X) or low (1X) sequencing coverage. (a) The frequency of Mendelian errors in a trio increases in all three down-sampling experiments compared to the original data; however, the increase in error rate is greatest when both parents are low coverage and the child is medium coverage. (b) The percentage of concordant genotype calls between original data and down-sampled data is lowest when both parents are low coverage and the child is medium coverage. The percentages shown for both the rate of Mendelian inconsistency and for genotype concordance represent averages over three down-sampling experiments.
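The two quantities compared in Fig. 1 can be illustrated with a minimal Python sketch. It assumes biallelic sites with genotypes coded as alternate-allele counts (0, 1, 2) and `None` for missing calls; the function names and this encoding are illustrative assumptions, not the authors' pipeline.

```python
from itertools import product

# Alleles (0 = ref, 1 = alt) that a parent with a given genotype can transmit.
GAMETES = {0: {0}, 1: {0, 1}, 2: {1}}

def is_mendelian_consistent(father, mother, child):
    """True if the child's genotype can arise from the parents' genotypes."""
    possible = {f + m for f, m in product(GAMETES[father], GAMETES[mother])}
    return child in possible

def mendelian_error_rate(trio_calls):
    """Fraction of fully called (father, mother, child) sites that are inconsistent."""
    complete = [t for t in trio_calls if None not in t]
    if not complete:
        return float("nan")
    errors = sum(not is_mendelian_consistent(*t) for t in complete)
    return errors / len(complete)

def genotype_concordance(original, downsampled):
    """Fraction of sites called in both data sets where the genotypes agree."""
    pairs = [(o, d) for o, d in zip(original, downsampled)
             if o is not None and d is not None]
    if not pairs:
        return float("nan")
    return sum(o == d for o, d in pairs) / len(pairs)
```

Under these assumptions, each down-sampling scheme would be scored by calling `mendelian_error_rate` on the down-sampled trio and `genotype_concordance` against the original calls, then averaging over replicates.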
Fig. 2 Boxplot of actual sequencing depth versus planned sequencing depth for the 725 monkeys in the WGS study.
Filtering and QC procedures in Stage 1: identifying unequivocal segregating sites. Stage 1 started with 13,550,322 sites and after QC ended with 4,235,761 sites
| QC filtering procedure | Number of variants removed |
|---|---|
| Multi-allelic or multi-nucleotide | 1,110,071 |
| Cumulative coverage outside of twofold range of global median coverage | 1,158,822 |
| MAF in 17 monkeys <25 % | 6,859,481 |
| >0 % missing data | 164,781 |
| Within 5 bp of another site | 21,406 |
| Total | 9,314,561 |
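The Stage 1 bookkeeping can be checked with a few lines of Python: the per-filter counts should sum to the stated total, and subtracting that total from the starting count should give the number of sites carried into Stage 2. The dictionary keys below are shorthand labels chosen for this sketch.

```python
# Variants removed by each Stage 1 filter, copied from the table above.
stage1_removed = {
    "multi_allelic_or_multi_nucleotide": 1_110_071,
    "coverage_outside_twofold_range": 1_158_822,
    "maf_below_25pct_in_17_monkeys": 6_859_481,
    "any_missing_data": 164_781,
    "within_5bp_of_another_site": 21_406,
}

stage1_start = 13_550_322
total_removed = sum(stage1_removed.values())
stage1_end = stage1_start - total_removed

print(total_removed, stage1_end)  # prints: 9314561 4235761
```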
Filtering and QC procedures in Stage 2: calling genotypes in all 725 monkeys at the unequivocal segregating sites identified in Stage 1. Stage 2 started with 4,235,761 sites and ended with 3,369,989 sites
| QC filtering procedure | Number of variants removed |
|---|---|
| Not passing SAMtools filters (“mpileup -S -D -q 30 -Q 20”, “vcfutils.pl varFilter -w 10 -d 3 -D 12740 -e 0 -2 0”) | 209,826 |
| Cumulative coverage outside of twofold range of global median coverage | 20,843 |
| MAF in 723 monkeys <10 % | 10,766 |
| Missing >50 % of data | 105 |
| Too few (<3) loci in a 3 Mb region for TrioCaller to work | 1,360 |
| Loci unmapped or not mapped uniquely during LiftOver | 32,419 |
| Filtered out by GATK’s FilterLiftedVariants | 4,094 |
| Whole contig removed for contigs with >1 chromosome switching events per 100 loci | 6,208 |
| LiftOver MapScore <0.5 | 61,721 |
| Loci mapped to the same coordinate in the new reference genome | 4 |
| Alignment: regions of poor alignment (mapping quality <2- or coverage >2-fold range of global median depth) were identified and their genotypes masked as missing; sites with >50 % missing data in monkeys sequenced at 4X or above were then removed | 438,423 |
| Sex chromosome SNPs | 65,271 |
| >=5 Mendel errors in parent–child comparisons | 8,563 |
| >60 % heterozygous calls | 6,201 |
| Total | 865,772 |
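Three of the Stage 2 filters (minor allele frequency, missingness, and excess heterozygosity) are simple per-site summaries. The sketch below shows one way such thresholds could be applied, assuming genotypes coded as alternate-allele counts with `None` for missing calls. The function names are hypothetical, and computing MAF and heterozygosity over called genotypes only is an assumption of this sketch, not a documented detail of the study's pipeline.

```python
def site_summary(genotypes):
    """Return (missing_fraction, maf, het_fraction) for one biallelic site."""
    n = len(genotypes)
    called = [g for g in genotypes if g is not None]
    missing = (n - len(called)) / n if n else 1.0
    if not called:
        return missing, float("nan"), float("nan")
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1 - alt_freq)
    het = sum(g == 1 for g in called) / len(called)
    return missing, maf, het

def passes_site_filters(genotypes, min_maf=0.10, max_missing=0.50, max_het=0.60):
    """Apply the table's thresholds: MAF <10 %, missing >50 %, heterozygous >60 %."""
    missing, maf, het = site_summary(genotypes)
    if missing > max_missing:
        return False
    return maf >= min_maf and het <= max_het
```

For example, `passes_site_filters([1, 1, 1, 1, 0])` is rejected because 80 % of the calls are heterozygous, which exceeds the 60 % threshold.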
Characteristics of the two mapping sets of markers derived in Stage 4
| | Approx. 500K mapping set | | | | | Approx. 150K mapping set | | | | |
|---|---|---|---|---|---|---|---|---|---|---|
| CHR | N | SNP/Mbᵃ | Max gapᵇ (bp) | Mean gapᵇ (bp) | Mean R²ᶜ | N | SNP/Mbᵃ | Max gapᵇ (bp) | Mean gapᵇ (bp) | Mean R²ᶜ |
| 1 | 25863 | 205.2 | 1074224 | 4873 | 0.38 | 7695 | 61.1 | 1112341 | 16380 | 0.31 |
| 2 | 16774 | 185.7 | 3730217 | 5385 | 0.40 | 4403 | 48.8 | 3731818 | 20515 | 0.34 |
| 3 | 18390 | 199.7 | 1144078 | 5007 | 0.38 | 5653 | 61.4 | 1176302 | 16291 | 0.33 |
| 4 | 15560 | 209.3 | 1663269 | 4777 | 0.39 | 5476 | 73.7 | 1689958 | 13576 | 0.34 |
| 5 | 15940 | 211.4 | 1150416 | 4730 | 0.38 | 4674 | 62.0 | 1275087 | 16132 | 0.32 |
| 6 | 10112 | 198.7 | 1020214 | 5032 | 0.36 | 2792 | 54.9 | 1030612 | 18227 | 0.29 |
| 7 | 24288 | 178.9 | 1518445 | 5590 | 0.40 | 7026 | 51.8 | 1631219 | 19310 | 0.35 |
| 8 | 28272 | 220.8 | 2004329 | 4530 | 0.37 | 9192 | 71.8 | 2598031 | 13933 | 0.33 |
| 9 | 21550 | 171.6 | 1066472 | 5828 | 0.38 | 6456 | 51.4 | 1089666 | 19453 | 0.33 |
| 10 | 20112 | 156.4 | 1043042 | 6393 | 0.39 | 6104 | 47.5 | 1050307 | 21054 | 0.37 |
| 11 | 25512 | 198.5 | 2199143 | 5038 | 0.39 | 7530 | 58.6 | 2424911 | 17072 | 0.32 |
| 12 | 17587 | 162.0 | 1420166 | 6171 | 0.39 | 5615 | 51.7 | 1520086 | 19331 | 0.34 |
| 13 | 18794 | 191.1 | 1516380 | 5234 | 0.38 | 5934 | 60.4 | 1518768 | 16573 | 0.33 |
| 14 | 19105 | 177.6 | 1817866 | 5632 | 0.44 | 5943 | 55.3 | 1830044 | 18099 | 0.39 |
| 15 | 18395 | 200.5 | 1542801 | 4988 | 0.39 | 5110 | 55.7 | 1545377 | 17952 | 0.32 |
| 16 | 14040 | 186.9 | 1023476 | 5350 | 0.36 | 3698 | 49.3 | 1026661 | 20304 | 0.30 |
| 17 | 15640 | 217.3 | 1093222 | 4602 | 0.39 | 4292 | 59.7 | 1122085 | 16754 | 0.34 |
| 18 | 15291 | 211.6 | 1330661 | 4726 | 0.41 | 4623 | 64.0 | 1542198 | 15620 | 0.35 |
| 19 | 5596 | 173.5 | 476395 | 5765 | 0.41 | 1416 | 43.9 | 554050 | 22793 | 0.35 |
| 20 | 27113 | 209.3 | 459070 | 4778 | 0.35 | 8034 | 62.1 | 711003 | 16094 | 0.28 |
| 21 | 18878 | 149.6 | 1477644 | 6685 | 0.41 | 6175 | 48.9 | 1502779 | 20435 | 0.37 |
| 22 | 15859 | 158.4 | 258174 | 6314 | 0.43 | 5064 | 50.6 | 422458 | 19777 | 0.40 |
| 23 | 16564 | 202.4 | 391245 | 4940 | 0.38 | 5118 | 62.6 | 732327 | 15985 | 0.32 |
| 24 | 18746 | 223.4 | 225900 | 4476 | 0.37 | 4889 | 58.3 | 325648 | 17162 | 0.32 |
| 25 | 18731 | 221.2 | 238967 | 4522 | 0.37 | 5563 | 65.7 | 421038 | 15228 | 0.30 |
| 26 | 12435 | 217.8 | 215585 | 4593 | 0.38 | 3324 | 58.3 | 270040 | 17166 | 0.31 |
| 27 | 12055 | 259.1 | 284037 | 3860 | 0.37 | 3470 | 74.8 | 425185 | 13379 | 0.31 |
| 28 | 4122 | 207.4 | 199874 | 4822 | 0.43 | 1046 | 52.7 | 379794 | 19003 | 0.37 |
| 29 | 5839 | 252.6 | 112574 | 3960 | 0.35 | 1652 | 71.5 | 211468 | 13998 | 0.29 |
ᵃ Number of SNPs per Mb
ᵇ Maximum and mean distance between consecutive SNPs, in base pairs
ᶜ Average of pairwise estimates in windows of five markers
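The density and gap columns are closely related: in the table, the mean gap is approximately 10^6 divided by the SNP density (for chromosome 1 in the 500K set, 10^6 / 205.2 ≈ 4873 bp). A minimal Python sketch of how these summaries could be computed from marker positions follows. The function name is hypothetical, and the chromosome length used as the density denominator is an assumption supplied by the caller, since the source does not state how it was defined.

```python
def marker_spacing(positions, chrom_length_bp):
    """Summarise marker density and spacing for one chromosome."""
    if len(positions) < 2:
        raise ValueError("need at least two markers to compute gaps")
    pos = sorted(positions)
    # Distance between each pair of consecutive markers, in base pairs.
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return {
        "n": len(pos),
        "snp_per_mb": len(pos) * 1e6 / chrom_length_bp,
        "max_gap_bp": max(gaps),
        "mean_gap_bp": sum(gaps) / len(gaps),
    }

# Toy example with four markers on a 10 kb segment.
summary = marker_spacing([100, 600, 1600, 4100], chrom_length_bp=10_000)
```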