Roel Hermsen, Joep de Ligt, Wim Spee, Francis Blokzijl, Sebastian Schäfer, Eleonora Adami, Sander Boymans, Stephen Flink, Ruben van Boxtel, Robin H van der Weide, Tim Aitman, Norbert Hübner, Marieke Simonis, Boris Tabakoff, Victor Guryev, Edwin Cuppen.
Abstract
BACKGROUND: Since the completion of the rat reference genome in 2003, whole-genome sequencing data from more than 40 rat strains have become available. These data represent the broad range of strains used in rat research, including commonly used substrains. Currently, this wealth of information cannot be used to its full extent, because the different variant calling algorithms employed by different groups impair comparison between strains. In addition, all rat whole-genome sequencing studies to date used an outdated reference genome for analysis (RGSC3.4, released in 2004).
Year: 2015 PMID: 25943489 PMCID: PMC4422378 DOI: 10.1186/s12864-015-1594-1
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Sequence variation in 40 + 1 rat strains
| Strain | Publication | PMID | Sequencing platform | | | |
|---|---|---|---|---|---|---|
| ACI/EurMcwi | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,539,775 | 1,651,251 | 7,259 |
| ACI/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 3,125,523 | 1,382,793 | 19,541 |
| BBDP/Wor | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,279,444 | 1,526,223 | 3,678 |
| BN/SsN | Baud et al. | 23708188 | SOLiD 4 and 5500 | 59,402 | 660,918 | 14,126 |
| BN- | Simonis et al.; Atanur et al. | 22541052; 23890820 | SOLiD 2,3 and 4 | 102,359 | 627,056 | 13,391 |
| BN- | Hermsen et al. | na | Illumina HiSeq2000 | 140,376 | 420,433 | 13,410 |
| BUF/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 2,848,992 | 1,302,710 | 18,481 |
| DA/BklArbNsi | Guo et al. | 23695301 | Illumina HiSeq2000 | 3,368,008 | 1,567,160 | 4,184 |
| F344/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 2,947,509 | 1,342,709 | 20,881 |
| F344/NCrl | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,369,205 | 1,579,418 | 3,492 |
| F344/NHsd | Guo et al. | 23695301 | Illumina HiSeq2000 | 3,367,166 | 1,573,573 | 3,950 |
| FHH/EurMcwi | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,389,304 | 1,592,915 | 3,011 |
| FHL/EurMcwi | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,361,824 | 1,586,543 | 8,504 |
| GK/Ox | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,549,952 | 1,575,619 | 4,241 |
| LE/Stm (Illumina) | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,412,610 | 1,578,099 | 2,598 |
| LE/Stm (SOLiD) | Baud et al. | 23708188 | SOLiD 4 and 5500 | 2,949,814 | 1,359,947 | 21,038 |
| LEW/Crl | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 2,884,477 | 1,409,659 | 3,642 |
| LEW/NCrl | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 2,884,763 | 1,402,459 | 3,996 |
| LH/MavRrrc | Atanur et al.; Ma et al. | 23890820; 24628878 | Illumina HiSeq2X00 | 3,369,852 | 1,584,236 | 2,891 |
| LL/MavRrrc | Atanur et al.; Ma et al. | 23890820; 24628878 | Illumina HiSeq2X00 | 3,329,343 | 1,565,343 | 3,070 |
| LN/MavRrrc | Atanur et al.; Ma et al. | 23890820; 24628878 | Illumina HiSeq2X00 | 3,319,381 | 1,562,698 | 2,952 |
| M520/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 2,896,825 | 1,321,431 | 19,308 |
| MHS/Gib | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,183,312 | 1,513,330 | 2,917 |
| MNS/Gib | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,168,796 | 1,538,413 | 3,105 |
| MR/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 2,878,806 | 1,350,411 | 18,001 |
| SBH/Ygl | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,393,610 | 1,617,252 | 14,787 |
| SBN/Ygl | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,300,171 | 1,592,247 | 15,216 |
| SHR/NCrlPrin | Hermsen et al. | na | Illumina HiSeq2000 | 3,736,435 | 1,694,012 | 14,179 |
| SHR/NHsd | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,756,155 | 1,705,126 | 3,950 |
| SHR/OlaIpcv | Simonis et al.; Atanur et al. | 22541052; 23890820 | Illumina Genome Analyser 2 | 3,747,579 | 1,706,963 | 4,066 |
| SHR/OlaIpcvPrin | Hermsen et al. | na | Illumina HiSeq2000 | 3,709,362 | 1,689,758 | 14,069 |
| SHRSP/Gla | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,700,495 | 1,723,961 | 2,301 |
| SR/Jr | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,353,579 | 1,568,778 | 3,699 |
| SS/Jr | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,311,117 | 1,553,050 | 3,685 |
| SS/JrHsdMcwi | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,310,209 | 1,595,799 | 7,938 |
| SUO_F344 | Hermsen et al. | na | Illumina HiSeq2000 | 3,349,024 | 1,549,272 | 11,864 |
| WAG/Rij | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,092,505 | 1,485,673 | 3,650 |
| WKY/Gla | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,777,400 | 1,725,868 | 3,292 |
| WKY/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 3,213,913 | 1,419,460 | 21,832 |
| WKY/NCrl | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,502,459 | 1,700,646 | 3,630 |
| WKY/NHsd | Atanur et al. | 23890820 | Illumina HiSeq2X00 | 3,682,736 | 1,665,949 | 4,691 |
| WN/N | Baud et al. | 23708188 | SOLiD 4 and 5500 | 2,899,096 | 1,323,116 | 18,995 |
Sequence information from 40 known strains was used, and the unknown SUO_F344 strain was also included in the analysis. In addition, LE/Stm was sequenced on two separate platforms, Illumina and SOLiD, and these two datasets were treated as separate samples. In total, this table therefore contains variant information for 42 samples from 40 + 1 rat strains.
Prediction of the functional consequences of the SNVs
| Effect | Impact | SNVs | Percentage | Total per impact class |
|---|---|---|---|---|
| Stop gained | High | 285 | 0.0% | 696 |
| Splice site donor | High | 209 | 0.0% | |
| Splice site acceptor | High | 158 | 0.0% | |
| Start lost | High | 26 | 0.0% | |
| Stop lost | High | 18 | 0.0% | |
| Non synonymous coding | Moderate | 26,239 | 0.3% | 26,239 |
| Synonymous coding | Low | 42,182 | 0.4% | 42,947 |
| Start gained | Low | 725 | 0.0% | |
| Synonymous stop | Low | 35 | 0.0% | |
| Non synonymous start | Low | 5 | 0.0% | |
| Intergenic | Modifier | 6,509,332 | 62.2% | 10,394,771 |
| Intron | Modifier | 2,991,180 | 28.6% | |
| Downstream | Modifier | 430,875 | 4.1% | |
| Upstream | Modifier | 427,613 | 4.1% | |
| UTR 3 Prime | Modifier | 27,145 | 0.3% | |
| UTR 5 Prime | Modifier | 4,357 | 0.0% | |
| Exon | Modifier | 4,269 | 0.0% | |
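The per-class totals and percentages in this table can be reproduced from the individual effect counts. The sketch below is illustrative only: the function names are hypothetical, and it assumes that each percentage is taken relative to the sum of all listed effect counts, which matches the printed values but is not stated in the source.

```python
from collections import defaultdict

# (effect, impact class, SNV count), copied from the table above.
EFFECTS = [
    ("Stop gained", "High", 285),
    ("Splice site donor", "High", 209),
    ("Splice site acceptor", "High", 158),
    ("Start lost", "High", 26),
    ("Stop lost", "High", 18),
    ("Non synonymous coding", "Moderate", 26_239),
    ("Synonymous coding", "Low", 42_182),
    ("Start gained", "Low", 725),
    ("Synonymous stop", "Low", 35),
    ("Non synonymous start", "Low", 5),
    ("Intergenic", "Modifier", 6_509_332),
    ("Intron", "Modifier", 2_991_180),
    ("Downstream", "Modifier", 430_875),
    ("Upstream", "Modifier", 427_613),
    ("UTR 3 Prime", "Modifier", 27_145),
    ("UTR 5 Prime", "Modifier", 4_357),
    ("Exon", "Modifier", 4_269),
]


def totals_by_impact(rows):
    """Sum the SNV counts within each impact class."""
    totals = defaultdict(int)
    for _effect, impact, count in rows:
        totals[impact] += count
    return dict(totals)


def percentage(count, grand_total):
    """Share of one effect among all listed effects, rounded to one decimal."""
    return round(100.0 * count / grand_total, 1)


totals = totals_by_impact(EFFECTS)
# Assumed denominator: the sum of every effect count in the table.
grand_total = sum(totals.values())

for effect, impact, count in EFFECTS:
    print(f"{effect:24s} {impact:9s} {count:>10,d} {percentage(count, grand_total):5.1f}%")
```

Summing per class gives the totals in the last column (for example, 696 high impact SNVs), which is a quick consistency check on the reconstructed table.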
Figure 1. ‘Repression’ of genes and exons containing high impact SNVs. (a) Genome-wide average FPKM ± SEM across all tissues compared to the average FPKM of genes containing high impact SNVs for 12 tissues. Genes containing high impact SNVs are expressed at significantly lower levels (non-parametric ANOVA; p < 0.0001). (b) The average Percentage Spliced In (PSI) ± SEM across the transcriptome compared to the average PSI of exons containing high impact SNVs for 12 tissues. Exons containing high impact SNVs are significantly more often spliced out or not used (non-parametric ANOVA; p < 0.0001).
Figure 2. Cross-species comparison of SNV densities. (a) An example of a locus (black rectangle) on mouse chromosome 9 with the lowest SNV density in five domesticated species. (b) An example of a locus (black rectangle) on mouse chromosome 4 with the highest SNV density in five domesticated species.
Figure 3. ‘Population’ structure of 40 + 1 rat strains. (a) Per strain, the contribution from the 9 different clusters is plotted as a percentage of the genome. Each cluster is represented by a separate color. The cluster designated with an ‘m’ represents the strains that have membership from multiple clusters. (b) Per strain, the genomic distribution along rat chromosome 1 is plotted as an example. The colors match the cluster colors from (a).
Strains and substrains included in the substrain variability analysis
| Strain | Substrain | Substrain variants |
|---|---|---|
| ACI | ACI/N | 3,432 |
| ACI | ACI/EurMcwi | |
| BN | BN- | 2,291 |
| BN | BN- | |
| BN | BN/SsN | |
| F344 | F344/N | 5,854 |
| F344 | F344/NHsd | |
| F344 | F344/NCrl | |
| F344 | SUO_F344 | |
| LEW | LEW/Crl | 1,046 |
| LEW | LEW/NCrlBR | |
| SHR | SHR/OlaIpcv | 2,950 |
| SHR | SHR/NCrlPrin | |
| SHR | SHR/NHsd | |
| SHR | SHR/OlaIpcvPrin | |
| SS | SS/Jr | 2,495 |
| SS | SS/JrHsdMcwi | |
| WKY | WKY/N | 10,250 |
| WKY | WKY/NCrl | |
| WKY | WKY/NHsd | |
| Total | | 28,318 |
Figure 4. Genomic distribution of substrain variants per strain. For each strain the distance between two consecutive SNVs (y-axis) is plotted along the genomic position (x-axis). The windows on the x-axis represent the different chromosomes. Loci with a high density of substrain SNVs can be observed as clusters that drop down from the average genome-wide pattern.
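The quantity plotted in this figure, the distance between two consecutive SNVs, can be computed as in the minimal sketch below. The function name and the example coordinates are hypothetical and not taken from the study. SNVs are grouped per chromosome, sorted by position, and each SNV is paired with its distance to the preceding one, so dense clusters appear as runs of small distances.

```python
def inter_snv_distances(snvs):
    """Return, per chromosome, (position, distance to previous SNV) pairs.

    `snvs` is an iterable of (chromosome, position) tuples in any order.
    The first SNV on each chromosome has no predecessor and is skipped.
    """
    by_chrom = {}
    for chrom, pos in snvs:
        by_chrom.setdefault(chrom, []).append(pos)

    distances = {}
    for chrom, positions in by_chrom.items():
        positions.sort()
        distances[chrom] = [
            (cur, cur - prev) for prev, cur in zip(positions, positions[1:])
        ]
    return distances


# Hypothetical coordinates, for illustration only.
example = [("chr1", 100), ("chr1", 1_000), ("chr1", 150), ("chr2", 5)]
print(inter_snv_distances(example))
```

Plotting the distance on a logarithmic y-axis against position is a common way to make such clusters visible, although the source does not state which axis scaling was used.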
Figure 5. Substrain variant characteristics. (a) Bar plots showing the contribution of each nucleotide change for all substrain variants (observed) versus the control variants (expected). Error bars represent the 95% confidence interval. (b) Bar plot showing the Ka/Ks ratio of the substrain variants versus the control variants. (c) Bar plot showing the average phastCons score for each substrain variant compared to the control variants. Substrain variants affect nucleotides with a significantly higher phastCons score (Student’s t-test; p < 2.2e-16). Error bars represent the SEM.
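A nucleotide change spectrum such as the one in panel (a) could be tabulated along the following lines. This is a sketch under stated assumptions: the source does not say how the 95% confidence intervals were derived, so a normal approximation to the binomial proportion is used here, and the function name and input data are hypothetical.

```python
import math
from collections import Counter


def change_spectrum(changes, z=1.96):
    """Proportion of each nucleotide change with an approximate 95% CI.

    `changes` is an iterable of (ref, alt) base pairs. The interval uses the
    normal approximation p +/- z * sqrt(p * (1 - p) / n), an assumption made
    for this sketch rather than the method reported in the source.
    """
    counts = Counter(f"{ref}>{alt}" for ref, alt in changes)
    n = sum(counts.values())
    spectrum = {}
    for change, k in sorted(counts.items()):
        p = k / n
        half_width = z * math.sqrt(p * (1 - p) / n)
        spectrum[change] = (p, max(0.0, p - half_width), min(1.0, p + half_width))
    return spectrum


# Hypothetical variants, for illustration only.
observed = [("C", "T")] * 6 + [("A", "G")] * 3 + [("G", "T")]
for change, (p, low, high) in change_spectrum(observed).items():
    print(f"{change}: {p:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Running the same function on the control variants would give the expected spectrum to compare against the observed one.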