Nicholas Kinney, Kyle Titus-Glover, Jonathan D Wren, Robin T Varghese, Pawel Michalak, Han Liao, Ramu Anandakrishnan, Arichanah Pulenthiran, Lin Kang, Harold R Garner.
Abstract
The human genome harbors an abundance of repetitive DNA; however, its function continues to be debated. Microsatellites, a class of short tandem repeat, are established as an important source of genetic variation. Array length variants are common among microsatellites and affect gene expression, but efforts to understand the role and diversity of microsatellite variation have been hampered by several challenges. Without adequate depth, both long-read and short-read sequencing may fail to detect the variants present in a sample; additionally, large sample sizes are needed to reveal the degree of population-level polymorphism. To address these challenges we present the Comparative Analysis of Germline Microsatellites (CAGm): a database of germline microsatellites from 2529 individuals in the 1000 Genomes Project. A key novelty of CAGm is the ability to aggregate microsatellite variation by population, ethnicity (super population) and gender. The database provides advanced searching for microsatellites embedded in genes and functional elements. All data can be downloaded as Microsoft Excel spreadsheets. Two use-case scenarios are presented to demonstrate its utility: a mononucleotide (A) microsatellite at the BAT-26 locus and a dinucleotide (CA) microsatellite in the coding region of FGFRL1. CAGm is freely available at http://www.cagmdb.org/.
Year: 2019 PMID: 30329086 PMCID: PMC6323891 DOI: 10.1093/nar/gky969
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of samples in the CAGm database
| Ethnicity | Gender | Samples |
|---|---|---|
| AFR | M | 318 |
| AFR | F | 349 |
| EUR | M | 239 |
| EUR | F | 263 |
| AMR | M | 171 |
| AMR | F | 181 |
| EAS | M | 250 |
| EAS | F | 264 |
| SAS | M | 263 |
| SAS | F | 231 |
Five ethnicities (super populations) are shown: African (AFR), American (AMR), European (EUR), East Asian (EAS) and South Asian (SAS). Each ethnicity draws from four to seven populations (not shown).
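As a quick consistency check, the per-group sample counts in the table above can be summed and compared with the 2529 individuals stated in the abstract. The following is a minimal sketch: the dictionary layout and the helper name `totals_by` are illustrative assumptions and are not part of CAGm.

```python
from collections import defaultdict

# Sample counts copied from the table above, keyed by (ethnicity, gender).
SAMPLES = {
    ("AFR", "M"): 318, ("AFR", "F"): 349,
    ("EUR", "M"): 239, ("EUR", "F"): 263,
    ("AMR", "M"): 171, ("AMR", "F"): 181,
    ("EAS", "M"): 250, ("EAS", "F"): 264,
    ("SAS", "M"): 263, ("SAS", "F"): 231,
}


def totals_by(samples, position):
    """Sum sample counts over one element of the key.

    position 0 groups by ethnicity, position 1 groups by gender.
    (Hypothetical helper for illustration only.)
    """
    totals = defaultdict(int)
    for key, count in samples.items():
        totals[key[position]] += count
    return dict(totals)


by_ethnicity = totals_by(SAMPLES, 0)
by_gender = totals_by(SAMPLES, 1)
grand_total = sum(SAMPLES.values())

print(by_ethnicity)
print(by_gender)
print(grand_total)
```

The grand total agrees with the number of individuals reported in the abstract.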
Figure 1. A map of the main pages (top row) and a partial list of the subordinate pages (bottom row) on the CAGm website.
A key novelty of CAGm is its tables, which aggregate microsatellite variation by population, ethnicity (super population) and gender
| Genotype | AFR | AMR | EUR | EAS | SAS |
|---|---|---|---|---|---|
| 17\|21 | 1 | 1 | 0 | 0 | 2 |
| 19\|21 | 33 | 11 | 12 | 11 | 30 |
| 21\|21 | 403 | 164 | 238 | 198 | 297 |
| 21\|23 | 5 | 2 | 0 | 0 | 4 |
| 21\|25 | 0 | 0 | 0 | 0 | 1 |
Variants can be aggregated by genotype (shown), allele and zygosity. The table above shows genotypes by ethnicity (super population) for a CA dinucleotide repeat embedded in exon 6 of the FGFRL1 gene (chr4:1025267-1025287). Five ethnicities are shown: African (AFR), American (AMR), European (EUR), East Asian (EAS) and South Asian (SAS).
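To illustrate how the genotype view relates to the allele and zygosity views, the sketch below re-aggregates the genotype counts from the table above. It is not CAGm's implementation: the function names and data layout are assumptions made for illustration, and each genotype label is simply read as a pair of alleles, each counted once per individual.

```python
from collections import Counter

# Column order and genotype counts copied from the table above.
SUPER_POPULATIONS = ["AFR", "AMR", "EUR", "EAS", "SAS"]
GENOTYPE_COUNTS = {
    (17, 21): [1, 1, 0, 0, 2],
    (19, 21): [33, 11, 12, 11, 30],
    (21, 21): [403, 164, 238, 198, 297],
    (21, 23): [5, 2, 0, 0, 4],
    (21, 25): [0, 0, 0, 0, 1],
}


def aggregate_by_allele(genotype_counts, populations):
    """Count alleles per population; each individual contributes two alleles."""
    totals = {pop: Counter() for pop in populations}
    for (first, second), counts in genotype_counts.items():
        for pop, n in zip(populations, counts):
            totals[pop][first] += n
            totals[pop][second] += n
    return totals


def aggregate_by_zygosity(genotype_counts, populations):
    """Count homozygous and heterozygous individuals per population."""
    totals = {pop: Counter() for pop in populations}
    for (first, second), counts in genotype_counts.items():
        label = "homozygous" if first == second else "heterozygous"
        for pop, n in zip(populations, counts):
            totals[pop][label] += n
    return totals


alleles = aggregate_by_allele(GENOTYPE_COUNTS, SUPER_POPULATIONS)
zygosity = aggregate_by_zygosity(GENOTYPE_COUNTS, SUPER_POPULATIONS)

for pop in SUPER_POPULATIONS:
    print(pop, dict(alleles[pop]), dict(zygosity[pop]))
```

Under this reading, the 21 allele dominates in every super population, and most individuals listed in the table are homozygous for it.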