Pin Tong, James G D Prendergast, Amanda J Lohan, Susan M Farrington, Simon Cronin, Nial Friel, Dan G Bradley, Orla Hardiman, Alex Evans, James F Wilson, Brendan Loftus.
Abstract
BACKGROUND: Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.
Year: 2010 PMID: 20822512 PMCID: PMC2965383 DOI: 10.1186/gb-2010-11-9-r91
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Read information
| Data type | Number of libraries | Number of reads | Number of mapped reads | Total bases (Gb) | Mapped bases (Gb) | Effective depth |
|---|---|---|---|---|---|---|
| Single-end read | 4 | 155,704,190 | 142,333,466 | 9.7 | 9.1 | 3.2 |
| Paired-end read | 5 | 324,936,690 | 297,787,256 | 23.2 | 21.2 | 7.4 |
| Total | 9 | 480,640,880 | 440,120,722 | 32.9 | 30.3 | 10.6 |
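The figures in the read table can be cross-checked with a short script. The sketch below copies the table values, confirms that the two read types sum to the totals, computes the fraction of reads mapped, and back-calculates the genome length implied by dividing mapped bases by effective depth. That effective depth was derived this way is an assumption; the table does not say so, and the dictionary keys are illustrative names only.

```python
# Values copied from the read information table.
rows = {
    "single_end": {"reads": 155_704_190, "mapped": 142_333_466,
                   "mapped_gb": 9.1, "depth": 3.2},
    "paired_end": {"reads": 324_936_690, "mapped": 297_787_256,
                   "mapped_gb": 21.2, "depth": 7.4},
    "total":      {"reads": 480_640_880, "mapped": 440_120_722,
                   "mapped_gb": 30.3, "depth": 10.6},
}


def mapping_rate(row):
    """Fraction of reads that were mapped."""
    return row["mapped"] / row["reads"]


def implied_genome_gb(row):
    """Genome length (Gb) implied if depth = mapped bases / genome length.

    This relationship is an assumption, not stated in the table.
    """
    return row["mapped_gb"] / row["depth"]


# The per-type rows should add up to the "Total" row.
reads_sum = rows["single_end"]["reads"] + rows["paired_end"]["reads"]
mapped_sum = rows["single_end"]["mapped"] + rows["paired_end"]["mapped"]

for name, row in rows.items():
    print(f"{name:<11} mapped fraction {mapping_rate(row):.4f}, "
          f"implied genome length {implied_genome_gb(row):.2f} Gb")
```

All three rows give a mapped fraction between 0.91 and 0.92 and an implied genome length between 2.8 and 2.9 Gb, so the table is internally consistent under that assumption.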
Figure 1. Comparison of detected SNPs and indels to dbSNP130. The dbSNP alleles were separated into validated and non-validated, and the detected variations that were not present in dbSNP were classified as novel.
Types of SNPs found
| Consequence | Number of SNPs | % of SNPs |
|---|---|---|
| Essential_splice_site | 135 | 0.0043 |
| Stop_gained | 107 | 0.0034 |
| Stop_lost | 23 | 0.0007 |
| Non_synonymous_coding | 10,201 | 0.3263 |
| Splice_site | 2,002 | 0.0640 |
| Synonymous_coding | 9,781 | 0.3129 |
| Within_mature_mirna | 30 | 0.0010 |
| Within_non_coding_gene | 16,512 | 0.5282 |
| 5prime_utr | 4,599 | 0.1471 |
| 3prime_utr | 19,639 | 0.6283 |
| Intronic | 1,083,616 | 34.6666 |
| Other | 1,979,180 | 63.3170 |
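The percentage column can be reproduced from the counts alone. The sketch below assumes each percentage is the row count divided by the sum of all listed rows; the table does not state its denominator, so this is an inference from the numbers rather than the authors' documented method.

```python
# Counts copied from the "Types of SNPs found" table.
snp_counts = {
    "Essential_splice_site": 135,
    "Stop_gained": 107,
    "Stop_lost": 23,
    "Non_synonymous_coding": 10_201,
    "Splice_site": 2_002,
    "Synonymous_coding": 9_781,
    "Within_mature_mirna": 30,
    "Within_non_coding_gene": 16_512,
    "5prime_utr": 4_599,
    "3prime_utr": 19_639,
    "Intronic": 1_083_616,
    "Other": 1_979_180,
}

# Assumed denominator: the sum of every row in the table.
total = sum(snp_counts.values())

# Percentage of the total for each consequence class.
percent = {name: 100 * n / total for name, n in snp_counts.items()}

for name, pct in percent.items():
    print(f"{name:<24}{snp_counts[name]:>12,}{pct:>10.4f}")
```

Under this assumption the computed values agree with the published column to within rounding, which suggests the listed rows partition the full SNP set.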
Figure 2. The linkage disequilibrium structure in the immediate region of the . Red boxes indicate SNPs in high LD. rs3197999, which has previously been associated with inflammatory bowel disease, and our novel nonsense SNP are highlighted in blue.
Figure 3. Multidimensional scaling plot illustrating the Irish individual's relationship to the CEU HapMap individuals and other previously sequenced genomes.
Figure 4. Improved SNP calling using haplotype data. SNP calling performance on chromosome 20 at various read depths with and without the inclusion of haplotype or genotype frequency data.
Regions of high positive selection, in close proximity to genes, identified in the analysis of Williamson et al. [41]
| Chr | Position (hg18), Williamson et al. | Nearest gene, Williamson et al. | Position (hg18), corresponding region of low Tajima's D in this analysis | Nearest gene, this analysis | Tajima's D |
|---|---|---|---|---|---|
| 1 | 113519196 | LRIG2 (50 kb) | 113505001-113515000 | - | -1.72 |
| 1 | 155990832 | FCRL2 (0) | 155990001-156000000 | FCRL2 (0 kb) | |
| 1 | 212654925 | PTPN14 (0) | 212595001-212605000 | - | -1.09 |
| 2 | 140931201 | LRP1B (0) | 140930001-140940000 | LRP1B (0 kb) | |
| 2 | 201548002 | MGC39518 (3 kb) | 201455001-201465000 | - | -1.73 |
| 3 | 29922879 | RBMS3 (0) | 29915001-29925000 | RBMS3 (0 kb) | |
| 3 | 43338322 | SNRK (0) | 43385001-43395000 | - | -1.30 |
| 3 | 145075381 | SLC9A9 (26 kb) | 145090001-145100000 | - | -1.71 |
| 4 | 71744283 | IGJ (0) | 71740001-71750000 | IGJ (0 kb) | |
| 4 | 169386385 | FLJ20035 (0) | 169395001-169405000 | FLJ20035/DDX60 (0 kb) | |
| 5 | 15527762 | FBXL7 (26 kb) | 15535001-15545000 | FBXL7 (8.3 kb) | |
| 6 | 128662923 | PTPRK (0) | 128655001-128665000 | PTPRK (0 kb) | |
| 8 | 57165523 | RPS20 (16 kb) | 57200001-57210000 | PLAG1 (26 kb) | |
| 10 | 45498260 | ANUBL1 (10 kb) | 45495001-45505000 | FAM21C (0 kb) | |
| 12 | 81525433 | DKFZp762A217 (79 kb) | 81520001-81530000 | DKFZp762A217 (75 kb) | |
| 13 | 37806830 | UFM1 (15 kb) | 37805001-37815000 | - | -1.38 |
| 15 | 37639096 | THBS1 (21 kb) | 37640001-37650000 | - | -1.95 |
| 15 | 89644996 | SV2B (5 kb) | 89640001-89650000 | SV2B (0 kb) | |
| 16 | 80605406 | HSPC105 (3 kb) | 80595001-80605000 | - | -1.87 |
| 18 | 30388871 | DTNA (0) | 30380001-30390000 | DTNA (0 kb) | |
| 18 | 44274281 | KIAA0427 (45 kb) | 44365001-44375000 | KIAA0427 (0 kb) | |
Regions in this analysis with a Tajima's D value of less than -2 within 100 kb of the corresponding region from Williamson et al. [41] are highlighted in bold. (Selection of 21 random positions in the genome 1,000 times never produced as many within close proximity to a window whose Tajima's D was less than -2.)
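The randomisation check described in the note can be sketched as follows. This is a minimal illustration on toy data, not the authors' code: the function names, the length-weighted sampling of chromosomes, and the example windows are all assumptions introduced here. It draws 21 random positions 1,000 times and reports how often a random draw lands near at least as many low-D windows as the observed set.

```python
import random


def distance_to_window(pos, start, end):
    """Distance in bp from a position to a window (0 if inside it)."""
    if pos < start:
        return start - pos
    if pos > end:
        return pos - end
    return 0


def count_near(positions, low_windows, max_dist=100_000):
    """Count positions within max_dist of any low-D window on the same chromosome."""
    n = 0
    for chrom, pos in positions:
        windows = low_windows.get(chrom, [])
        if any(distance_to_window(pos, s, e) <= max_dist for s, e in windows):
            n += 1
    return n


def random_positions(chrom_lengths, k, rng):
    """Draw k positions, picking chromosomes in proportion to their length (assumed scheme)."""
    chroms = list(chrom_lengths)
    weights = [chrom_lengths[c] for c in chroms]
    picks = rng.choices(chroms, weights=weights, k=k)
    return [(c, rng.randint(1, chrom_lengths[c])) for c in picks]


def permutation_p(observed, chrom_lengths, low_windows, k=21, n_perm=1000, seed=0):
    """Fraction of random draws with at least `observed` positions near a low-D window."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        draw = random_positions(chrom_lengths, k, rng)
        if count_near(draw, low_windows) >= observed:
            hits += 1
    return hits / n_perm


# Toy data (hypothetical): two 1 Mb chromosomes, one window with D < -2.
toy_lengths = {"1": 1_000_000, "2": 1_000_000}
toy_windows = {"1": [(500_001, 510_000)]}
toy_positions = [("1", 505_000), ("1", 420_000), ("1", 650_000), ("2", 505_000)]
print(count_near(toy_positions, toy_windows))
print(permutation_p(2, toy_lengths, toy_windows))
```

A result of 0.0 from `permutation_p` corresponds to the note's statement that no random draw produced as many nearby windows as the observed set.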
Figure 5. Tajima's D values for paralogs arising from gene duplications of different ages. Mean Tajima's D values for genes involved in duplication events of differing ages. The horizontal dotted line indicates the median Tajima's D value of all genes in the human genome. Genes involved in a recent duplication event in general show lower values of D than the genome-wide average, with genes involved in a duplication event specific to humans, as a group, showing the lowest values of D (Kruskal-Wallis P < 2.2 × 10^-16).
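For readers unfamiliar with the statistic, the sketch below computes Tajima's D using the standard estimator from Tajima (1989). The record does not describe how D was calculated per window or per gene in this study, so this should be read as a generic illustration; the function names and the toy sequences are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pi_and_s(seqs):
    """Return (mean pairwise differences, number of segregating sites)
    for a list of equal-length aligned sequences."""
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    seg_sites = sum(len(set(column)) > 1 for column in zip(*seqs))
    return diffs / len(pairs), seg_sites


def tajimas_d(n, seg_sites, pi):
    """Tajima's D for n sequences, seg_sites segregating sites and
    mean pairwise difference pi."""
    if n < 4:
        # For n < 4 the variance term is zero, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    if seg_sites == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


# Toy alignment (hypothetical) of four sequences.
toy_seqs = ["AAAA", "AAAT", "AATT", "ATTT"]
toy_pi, toy_s = pi_and_s(toy_seqs)
print(tajimas_d(len(toy_seqs), toy_s, toy_pi))
```

Negative values arise when pairwise diversity is low relative to the number of segregating sites, that is, when there is an excess of rare variants. This pattern is consistent with positive selection or population expansion, which is why low D is used as a signal in the table and figure above.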