Gloria L Fawcett, Muthuswamy Raveendran, David Rio Deiros, David Chen, Fuli Yu, Ronald Alan Harris, Yanru Ren, Donna M Muzny, Jeffrey G Reid, David A Wheeler, Kimberly C Worley, Steven E Shelton, Ned H Kalin, Aleksandar Milosavljevic, Richard Gibbs, Jeffrey Rogers.
Abstract
BACKGROUND: Rhesus macaques are the most widely utilized nonhuman primate model in biomedical research. Previous efforts have validated fewer than 900 single nucleotide polymorphisms (SNPs) in this species, which limits opportunities for genetic studies related to health and disease. Extensive information about SNPs and other genetic variation in rhesus macaques would facilitate valuable genetic analyses, as well as provide markers for genome-wide linkage analysis and the genetic management of captive breeding colonies.
Year: 2011 PMID: 21668978 PMCID: PMC3141668 DOI: 10.1186/1471-2164-12-311
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Bioinformatically validated SNPs
| | 17573 Sanger | 17573 SOLiD | r1766 | r02120 | Sub-species comparison | MamuSNP | ENCODE | dbSNP |
|---|---|---|---|---|---|---|---|---|
| 17573 Sanger | | | | | | | | |
| 17573 SOLiD | 1,056,266 | | | | | | | |
| r1766 | 22,665 | 412,920 | | | | | | |
| r02120 | 5,734 | 103,323 | 328,910 | | | | | |
| Sub-species comparison | 4,185 | 6,519 | 1,070 | 226 | | | | |
| MamuSNP | 389 | 781 | 159 | 33 | 19 | | | |
| ENCODE | 29 | 76 | 14 | 7 | 0 | 1 | | |
| dbSNP | 11 | 12 | 5 | 0 | 0 | 0 | 0 | |
| egeno.assembly.hets | -- | 421,234 | 64,584 | 41,607 | 34,339 | -- | -- | -- |
| egeno.17573.SOLiD | -- | -- | 109,485 | 61,941 | 9 | -- | -- | -- |
| egeno.r1766.frag | -- | 155,454 | -- | 93,016 | 17 | -- | -- | -- |
| egeno.r02120.frag | -- | 41,836 | 69,424 | -- | 3 | -- | -- | -- |
| egeno.sub-species comp | -- | 677 | 743 | 443 | -- | -- | -- | -- |
Unique non-overlapping validated SNPs identified in each pair-wise comparison are displayed. Validated SNPs were determined on the basis of both alleles at a genomic location being identified in at least two data sets and one of either multiple animals or multiple chemistries. Validated SNPs that fit into multiple cells were represented in the left-most appropriate cell.
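The validation rule in this caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Call` record, its field names, and the choice to match calls on chromosome, position, and allele pair are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Call(NamedTuple):
    """One SNP call (hypothetical record layout, not from the paper)."""
    dataset: str        # e.g. "17573 SOLiD"
    animal: str         # e.g. "17573", or a label for a pooled sample
    chemistry: str      # e.g. "Sanger", "SOLiD", "454"
    chrom: str
    pos: int
    alleles: frozenset  # both alleles observed at the site


def validated_sites(calls):
    """Return the (chrom, pos, alleles) keys that satisfy the caption's rule.

    A site passes when the same allele pair is seen in at least two data
    sets AND those calls span multiple animals or multiple chemistries.
    """
    by_site = defaultdict(list)
    for call in calls:
        by_site[(call.chrom, call.pos, call.alleles)].append(call)

    validated = set()
    for key, group in by_site.items():
        datasets = {c.dataset for c in group}
        animals = {c.animal for c in group}
        chemistries = {c.chemistry for c in group}
        if len(datasets) >= 2 and (len(animals) >= 2 or len(chemistries) >= 2):
            validated.add(key)
    return validated
```

Under this reading, a site called in two SOLiD libraries from the same animal would not count as validated, because it has neither a second animal nor a second chemistry.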
Figure 1. Validation by category, direct comparison examples. Only the largest data sets are shown. (A) Validation in a single animal with multiple chemistries. All data for the reference animal 17573 are displayed by chemistry. SNPs falling into the overlap region are considered to be validated. (B) Validation in multiple animals with a single chemistry and with multiple chemistries. SOLiD data from re-sequencing efforts were compared to the unrelated animals r1766 and r02120. A large number (603,463) of additional SNPs not identified in the reference animal were identified and validated by comparing r02120 and r1766. We observed roughly one-third the number of validated SNPs from r02120 that were obtained using the sequence data from r1766, due to reduced uniquely mapping sequence coverage in r02120 (Table 1).
Figure 2. Genomic distribution of SNPs. (A) The total number of SNPs identified on each chromosome. (B) The density of validated SNPs, in SNPs/Mb, for each chromosome. Only chromosome X (**) displays a SNP density that deviates by more than one standard deviation (195.3 SNPs/Mb) from the mean (1057.1 SNPs/Mb).
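The per-chromosome comparison described for panel B can be sketched as follows. This is an illustration only: the function name and input dictionaries are hypothetical, and the use of the sample standard deviation is an assumption, since the caption does not say which estimator was used.

```python
from statistics import mean, stdev


def density_outliers(snp_counts, lengths_mb, n_sd=1.0):
    """Compute SNPs/Mb per chromosome and flag those far from the mean.

    snp_counts: dict mapping chromosome name -> validated SNP count
    lengths_mb: dict mapping chromosome name -> length in megabases
    n_sd:       number of standard deviations used as the cutoff
    """
    density = {c: snp_counts[c] / lengths_mb[c] for c in snp_counts}
    mu = mean(density.values())
    sd = stdev(density.values())  # sample SD; an assumption, see lead-in
    outliers = sorted(c for c, d in density.items() if abs(d - mu) > n_sd * sd)
    return density, mu, sd, outliers
```

Applied to real per-chromosome counts and lengths, the returned `outliers` list would contain the chromosomes marked in the figure for a given cutoff.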
Figure 3. SNP annotation. Genomic context of annotated validated SNPs. All annotations were obtained from Ensembl build 57 (released March 2010). The number of SNPs that fit into each category represented by a bar is listed over the respective bar. The bar for "Additional transcripts" represents only those SNPs where the specific annotation placement information was different between different transcripts for the same SNP.
Diseases associated with genes containing nsSNPs
| General Category | # Associated Deleterious SNPs | Gene names | Diseases | | |
|---|---|---|---|---|---|
| Apoptosis | 8 | | Huntington's disease | DNA damage | Alzheimer's disease |
| | | | Thrombocytopenia | Bipolar | Rheumatoid arthritis |
| | | | Diabetes mellitus type I | Renal cell carcinoma | Myeloproliferative disorders |
| | | | Leukemia | Diabetes mellitus type II | Amyotrophic lateral sclerosis |
| | | | Pancreatitis | Carcinoma (multiple types) | Multiple sclerosis |
| Cell cycle | 20 | | Spontaneous abortion | Genomic instability | Drug toxicity |
| | | | Endometriosis | Anemia | Li-Fraumeni syndrome |
| | | | Aneuploidy | Amyotrophic lateral sclerosis | Diabetes mellitus type I |
| | | | Carcinoma (multiple types) | Diabetes mellitus type II | Werner syndrome |
| | | | Male infertility | Rheumatoid arthritis | Osteoporosis |
| Development | 11 | | Carcinoma (multiple types) | Endometriosis | Asthma |
| | | | Deglutition disorders | Male infertility | Coronary disease |
| | | | Oculopharyngeal muscular dystrophy | Li-Fraumeni syndrome | Idiopathic pulmonary fibrosis |
| DNA repair | 9 | | Genomic instability | Werner syndrome | Fanconi anemia |
| | | | Carcinoma (multiple types) | Diabetes mellitus type I | Drug toxicity |
| Inflammation | 22 | | Huntington's disease | Severe combined immunodeficiency | Stomach ulcer |
| | | | Thrombocytopenia | Multiple sclerosis | Ataxia |
| | | | Asthma | Rheumatoid arthritis | Leukemia |
| | | | Diabetes mellitus type I | Myeloproliferative disorders | Sarcoidosis |
| | | | Diabetic nephropathies | Crohn's disease | Carcinoma (multiple types) |
| | | | Autosomal dominant polycystic kidney | Sjogren's syndrome | Gout |
| | | | Rhinitis | Nasal polyps | Telangiectasis |
| | | | Lesch-Nyhan syndrome | Arteriosclerosis | Genome instability |
| Metabolism | 4 | | Anoxia | Encephalitis | Rheumatoid arthritis |
| | | | Carcinoma (multiple types) | Diabetes mellitus type II | Glomerulonephritis |
| Nervous System | 17 | | Microcephaly | Lipodystrophy | Rheumatoid arthritis |
| | | | Anoxia | Mucolipidoses | Fragile X syndrome |
| | | | Schizophrenia | Pain | Liver cirrhosis |
| | | | Parkinson's disease | Asthma | Diabetes mellitus type I |
| | | | Glioma | Acquired immunodeficiency | |
| Protein folding | 4 | | Aneurysm | Pulmonary fibrosis | Diabetes mellitus type I |
| | | | Carcinoma (multiple types) | Graft vs. host disease | Paralysis |
| Reproduction | 13 | | Gastrointestinal diseases | Hyperaldosteronism | Genomic instability |
| | | | Carcinoma (multiple types) | Diabetes mellitus type II | Oligospermia |
| | | | Anemia | | |
| Signal transduction | 19 | | Spontaneous abortion | Arteriosclerosis | Dilated cardiomyopathy |
| | | | Carcinoma (multiple types) | Hypertension | Asthma |
| | | | Huntington's disease | Severe combined immunodeficiency | Sarcoidosis |
| | | | Thrombocytopenia | Multiple sclerosis | Diabetes mellitus type I |
| | | | Rheumatoid arthritis | Alzheimer's disease | Arrhythmogenic right ventricular dysplasia |
| Transcription | 8 | | Retinoblastoma | Oculopharyngeal muscular dystrophies | Rheumatoid arthritis |
| | | | Spontaneous abortion | Aneuploidy | Deglutition disorders |
| | | | Carcinoma (multiple types) | Infertility | |
| Transport | 11 | | Anemia | Genomic instability | Liver cirrhosis |
| | | | Alzheimer's disease | Crohn's disease | Obesity |
| | | | Epilepsy | Diabetes mellitus type II | Barrett esophagus |
| | | | Carcinoma (multiple types) | Gastritis | Progeria |
A list of genes containing probably or possibly damaging nsSNPs as predicted by PolyPhen-2 was submitted for network analysis in GeneGo. Characterization of the genes containing nsSNPs is achieved by listing the general cellular process (column 1), the total number of deleterious nsSNPs in the genes that GeneGo annotated (column 2), the genes associated with the cellular process (column 3), and the specific diseases that have been associated by GeneGo with the genes in question (columns 4-6).
Figure 4. Low stringency vs. high stringency SNP call validation effectiveness. Validation efficiencies cannot be determined using existing validated SNP data sets, as only 777 SNPs are currently available for rhesus macaque in dbSNP, most of which are polymorphic between subspecies (Chinese to Indian) rather than within subspecies (i.e., Indian to Indian). We observed improved validation efficiency using low-stringency SNP calls rather than high-stringency calls. Both high and low stringency SNP calls were obtained for the reference animal (17573) mate-pair sequence data. The percentage of total SNPs validated in the low-stringency SNP set was lower (33.8%) than that observed in the high-stringency SNP set (45.7%). In absolute numbers, however, there were 1.8 times more SNPs validated from the low-stringency SNP calls than from the high-stringency SNP calls.
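The two figures quoted in this caption (validated percentage and absolute ratio) fit together only if the low-stringency call set is considerably larger than the high-stringency one. A short worked calculation, using only the numbers given in the caption, shows the implied size ratio; the function name is hypothetical and the total call counts themselves are not stated here.

```python
def implied_call_ratio(frac_low, frac_high, validated_ratio):
    """Ratio of total low- to high-stringency calls implied by the caption.

    validated_low  = frac_low  * total_low
    validated_high = frac_high * total_high
    validated_low / validated_high = validated_ratio
    => total_low / total_high = validated_ratio * frac_high / frac_low
    """
    return validated_ratio * frac_high / frac_low


# Values taken from the caption: 33.8%, 45.7%, and a 1.8-fold difference.
ratio = implied_call_ratio(0.338, 0.457, 1.8)
print(f"{ratio:.2f}")  # prints 2.43
```

In other words, the low-stringency set would contain roughly 2.4 times as many total calls, which is how a lower validated fraction can still yield more validated SNPs.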
Data sources
| Chemistry | Name | DNA source | SNP calling software | References |
|---|---|---|---|---|
| Sanger | Assembly | Animal 17573 ♀ | SNPdetector | Gibbs et al. (2007) |
| Sanger | ENCODE | 47 pooled individuals | SNPdetector | Hernandez et al. (2007) |
| SOLiD | 17573 fragment | Animal 17573 ♀ | Corona_lite | |
| SOLiD | 17573 mate-pair | Animal 17573 ♀ | Corona_lite | |
| SOLiD | r1766 | Animal r1766 ♀ | Corona_lite | |
| SOLiD | r02120 | Animal r02120 ♂ | Corona_lite | |
| 454 | Sub-species comparison | 16 pooled individuals | AtlasSNP | Gibbs et al. (2007) |
| 454 | MamuSNP | ~7 pooled individuals | Unknown | Malhi et al. (2007) |
| Unknown | dbSNP/MonkeySNP | Unknown, >30 estimated | Unknown | |
Each data source is displayed with the appropriate chemistry used, DNA source (if known) and SNP calling software.