Giulio Genovese, Robert E Handsaker, Heng Li, Nicolas Altemose, Amelia M Lindgren, Kimberly Chambert, Bogdan Pasaniuc, Alkes L Price, David Reich, Cynthia C Morton, Martin R Pollak, James G Wilson, Steven A McCarroll.
Abstract
Tens of millions of base pairs of euchromatic human genome sequence, including many protein-coding genes, have no known location in the human genome. We describe an approach for localizing the human genome's missing pieces using the patterns of genome sequence variation created by population admixture. We mapped the locations of 70 scaffolds spanning 4 million base pairs of the human genome's unplaced euchromatic sequence, including more than a dozen protein-coding genes, and identified 8 new large interchromosomal segmental duplications. We find that most of these sequences are hidden in the genome's heterochromatin, particularly its pericentromeric regions. Many cryptic, pericentromeric genes are expressed at the RNA level and have been maintained intact for millions of years while their expression patterns diverged from those of paralogous genes elsewhere in the genome. We describe how knowledge of the locations of these sequences can inform disease association and genome biology studies.
Year: 2013 PMID: 23435088 PMCID: PMC3683849 DOI: 10.1038/ng.2565
Source DB: PubMed Journal: Nat Genet ISSN: 1061-4036 Impact factor: 38.330
Segmental duplications localized by admixture mapping.
| CHR | FROM | TO | BAND | GENE | SIZE | CHR’ | FROM’ | TO’ | BAND’ | SCAFFOLD | DIV | CARE | ICDB | HAPMAP | FISH |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| chr1 | 83,598,160 | 83,955,427 | 1p31.1 | | ~400kbp | chr7 | 76,182,346 | 76,575,579 | 7q11.23 | NA | ~1.4% | 6 | 1 | yes | no |
| chr1 | 206,072,708 | 206,558,788 | 1q32.1 | | ~240kbp | chr1 | 143,880,004 | 144,095,783 | 1q21.1 | NA | ~0.6% | 3 | 0 | no | no |
| chr2 | 37,958,019 | 38,003,219 | 2p22.2 | NA | ~45kbp | chr22 | NA | NA | 22q11.1 | SCAF_1103279187616 | ~4.0% | 3 | 0 | yes | no |
| chr2 | 91,737,476 | 91,880,745 | 2p11.1 | | ~140kbp | chr1 | NA | NA | 1q21.1 | RP11-247L13 | ~1.2% | 2 | 0 | yes | no |
| chr2 | 133,005,020 | 133,120,083 | 2q21.2 | NA | ~115kbp | chr20 | NA | NA | 20q11.21 | RP11-462H3 | >2.0% | 1 | 1 | yes | yes |
| chr3 | 612,223 | 663,367 | 3p26.3 | NA | ~50kbp | chr22 | NA | NA | 22q11.1 | GL000217 | ~2.0% | 1 | 0 | no | yes |
| chr3 | 75,761,051 | 75,871,577 | 3p12.3 | | >110kbp | chr21 | NA | NA | 21q11.2 | RP4-813B7 | >5.0% | 1 | 0 | no | no |
| chr4 | 25,709 | 68,702 | 4p16.3 | | ~40kbp | chr22 | NA | NA | 22q11.1 | RP11-85C8 | ~0.5% | 1 | 0 | no | no |
| chr4 | 3,536,207 | 3,636,136 | 4p16.3 | | ~100kbp | chr9 | NA | NA | 9p11.2 | SCAF_1103279188214 | ~3.0% | 1 | 0 | yes | yes |
| chr4 | 190,470,115 | 190,684,480 | 4q35.2 | NA | ~215kbp | chr21 | NA | NA | 21q11.2 | GL000193 | >2.0% | 2 | 0 | no | no |
| chr5 | 21,506,326 | 21,573,437 | 5p14.3 | NA | ~65kbp | chr6 | 58,137,660 | 58,139,549 | 6p11.2 | CH17-92N24 | ~1.5% | 0 | 0 | yes | yes |
| chr6 | 256,518 | 382,461 | 6p25.3 | | ~125kbp | chr16 | NA | NA | 16p11.2 | NA | ~0.1% | 0 | 1 | no | no |
| chr6 | 57,204,729 | 57,435,462 | 6p11.2 | | ~230kbp | chr6 | NA | NA | 6p11.2 | SCAF_1103279188350 | ~2.0% | 0 | 0 | no | yes |
| chr6 | 57,204,729 | 57,608,453 | 6p11.2 | | ~400kbp | chr6 | NA | NA | 6q11.1 | SCAF_1103279188263 | ~2.0% | 0 | 0 | no | yes |
| chr6 | 57,369,236 | 57,608,453 | 6p11.2 | | ~240kbp | chr3 | NA | NA | 3p11.1 | SCAF_1103279180085 | ~2.0% | 3 | 0 | yes | yes |
| chr6 | 57,401,565 | 57,570,618 | 6p11.2 | | >170kbp | chr3 | NA | NA | 3p11.1 | RP1-216J23 | ~2.0% | 3 | 0 | yes | yes |
| chr6 | 57,447,574 | 57,575,919 | 6p11.2 | | ~130kbp | chr6 | NA | NA | 6p11.2 | SCAF_1103279188406 | ~2.0% | 0 | 0 | no | yes |
| chr12 | 147,380 | 188,194 | 12p13.33 | | >40kbp | chr20 | 62,947,067 | 62,965,512 | 20q13.33 | SCAF_1103279187960 | ~1.2% | 1 | 0 | no | no |
| chr13 | 19,020,001 | 19,167,977 | 13q11 | | ~200kbp | chr21 | 14,447,204 | 14,594,419 | 21q11.2 | NA | ~0.8% | 3 | 0 | yes | no |
| chr14 | 19,817,857 | 20,194,548 | 14q11.2 | | ~400kbp | chr22 | 16,085,071 | 16,459,525 | 22q11.1 | NA | ~0.6% | 8 | 0 | yes | no |
| chr16 | 70,845,287 | 71,202,573 | 16q22.2 | | ~360kbp | chr1 | 146,341,167 | 146,400,000 | 1q21.1 | GL000192 | ~0.6% | 58 | 8 | yes | no |
| chr21 | 10,971,951 | 11,032,242 | 21p11.1 | | >60kbp | chr13 | NA | NA | 13q11 | RP5-1039L24 | ~0.2% | 1 | 1 | no | no |
| chr21 | 11,083,847 | 11,156,072 | 21p11.1 | | >80kbp | chr13 | NA | NA | 13q11 | NA | NA | 2 | 0 | no | no |
CHR, FROM, TO, BAND: chromosome, hg19 coordinates, and cytogenetic band of the ancestral copy of the duplication; GENE: protein-coding gene(s) overlapping the duplication; SIZE: estimated size of the duplication; CHR’, FROM’, TO’, BAND’: chromosome, hg19 coordinates, and cytogenetic band of the derived copy of the duplication; SCAFFOLD: genomic scaffold containing the sequence of the derived copy of the duplication; DIV: estimated sequence divergence between the ancestral and derived copies of the duplication; CARE: number of Affymetrix 6.0 SNPs re-mapped in the CARe dataset; ICDB: number of Illumina SNPs re-mapped in the ICDB dataset; HAPMAP: whether the cryptic duplication was independently confirmed by interchromosomal LD in HapMap genotypes; FISH: whether a FISH experiment was performed to validate the duplication.
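
For readers who want to work with the table programmatically, the following is a minimal Python sketch of how one row could be parsed into typed fields according to the legend above. It is an illustration only, not part of the published analysis: the function names (`parse_coord`, `parse_size_bp`, `parse_row`) and the ASCII keys used for the primed columns (`DERIVED_CHR` and so on) are hypothetical choices made here, and the sketch assumes that `NA` or an empty cell means "no value" and that every SIZE entry has the form `~Nkbp` or `>Nkbp`.

```python
import re
from typing import Optional, Tuple

# Column order taken from the table header; the primed columns (CHR', FROM', ...)
# are given hypothetical ASCII names for use as dictionary keys.
COLUMNS = [
    "CHR", "FROM", "TO", "BAND", "GENE", "SIZE",
    "DERIVED_CHR", "DERIVED_FROM", "DERIVED_TO", "DERIVED_BAND",
    "SCAFFOLD", "DIV", "CARE", "ICDB", "HAPMAP", "FISH",
]


def parse_coord(text: str) -> Optional[int]:
    """Convert '37,958,019' to 37958019; 'NA' or an empty cell becomes None."""
    text = text.strip()
    if text in ("", "NA"):
        return None
    return int(text.replace(",", ""))


def parse_size_bp(text: str) -> Optional[Tuple[str, int]]:
    """Convert '~45kbp' to ('~', 45000) and '>110kbp' to ('>', 110000)."""
    match = re.fullmatch(r"([~>]?)(\d+(?:\.\d+)?)kbp", text.strip())
    if match is None:
        return None
    return match.group(1), int(float(match.group(2)) * 1000)


def parse_row(line: str) -> dict:
    """Split one pipe-delimited table row into a dictionary of typed fields."""
    cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
    if len(cells) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} cells, got {len(cells)}")
    row = dict(zip(COLUMNS, cells))
    # Coordinates become integers (or None when not available).
    for key in ("FROM", "TO", "DERIVED_FROM", "DERIVED_TO"):
        row[key] = parse_coord(row[key])
    row["SIZE"] = parse_size_bp(row["SIZE"])
    # SNP counts are plain integers; validation flags are yes/no.
    row["CARE"] = int(row["CARE"])
    row["ICDB"] = int(row["ICDB"])
    row["HAPMAP"] = row["HAPMAP"] == "yes"
    row["FISH"] = row["FISH"] == "yes"
    return row


# The chr2 / 2p22.2 row, copied from the table.
EXAMPLE_LINE = (
    "| chr2 | 37,958,019 | 38,003,219 | 2p22.2 | NA | ~45kbp | chr22 | NA | NA "
    "| 22q11.1 | SCAF_1103279187616 | ~4.0% | 3 | 0 | yes | no |"
)

example = parse_row(EXAMPLE_LINE)
ancestral_span = example["TO"] - example["FROM"]
print(example["CHR"], example["BAND"], ancestral_span, example["SIZE"])
```

For this row the ancestral coordinate span is 38,003,219 − 37,958,019 = 45,200 bp, which agrees with the listed SIZE of ~45kbp. The derived-copy coordinates parse to `None` because the table gives only a band and a scaffold for that copy.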