Majesta O'Bleness, Veronica B Searles, C Michael Dickens, David Astling, Derek Albracht, Angel C Y Mak, Yvonne Y Y Lai, Chin Lin, Catherine Chu, Tina Graves, Pui-Yan Kwok, Richard K Wilson, James M Sikela.
Abstract
BACKGROUND: Although the reference human genome sequence was declared finished in 2003, some regions of the genome remain incomplete due to their complex architecture. One such region, 1q21.1-q21.2, is of increasing interest due to its relevance to human disease and evolution. Elucidation of the exact variants behind these associations has been hampered by the repetitive nature of the region and its incomplete assembly. This region also contains 238 of the 270 human DUF1220 protein domains, which are implicated in human brain evolution and neurodevelopment. Additionally, examinations of this protein domain have been challenging due to the incomplete 1q21 build. To address these problems, a single-haplotype hydatidiform mole BAC library (CHORI-17) was used to produce the first complete sequence of the 1q21.1-q21.2 region.
Year: 2014 PMID: 24885025 PMCID: PMC4053653 DOI: 10.1186/1471-2164-15-387
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Comparison of GRCh37/hg19 assembly (left) with the WUSTL CHM1 assembly (right). NBPF genes are indicated in red, all other genes are in blue. Black boxes on the GRCh37 map denote gaps. The vertical bar to the right of the CHM1 map denotes the novel inversion spanning multiple genes discussed in the text.
Table 1. Copy number differences in 1q21 between the GRCh37 build and the CHM1 assembly
| Gene name | GRCh37/hg19 | CHM1 assembly |
|---|---|---|
| | 1 | 2 |
| | 1 | 2 |
| | 0 | 1 |
| | 1 | 0 |
| | 1 | 0 |
| | 1 | 0 |
| | 0 | 1 |
| | 1 | 3 |
| | 1 | 3 |
| | 3 | 2 |
| | 1 | 3 |
| | 1 | 2 |
| Total 1q21 DUF1220 | 242 | 238 |
| DUF1220 CON1 | 22 | 17 |
| DUF1220 CON2 | 13 | 11 |
| DUF1220 CON3 | 12 | 11 |
| DUF1220 HLS1 | 60 | 62 |
| DUF1220 HLS2 | 68 | 69 |
| DUF1220 HLS3 | 64 | 66 |
| DUF1220 Triplets | 51 | 59 |
Table 1 describes changes in copy number between the GRCh37 and new CHM1 assemblies. The majority of non-DUF1220 changes were gains in copy number, with 8 genes previously underrepresented in GRCh37. Three NBPF genes are no longer present, although the total DUF1220 count remained nearly the same, with DUF1220 copies mostly redistributed among the remaining NBPF genes. This may indicate that the gene loss is an artifact of misassembly rather than a true difference in gene copy number. Numbers do not include the additional DUF1220 domains and NBPF genes added to the 1p11.2 region as a result of the human-specific 1q21.2 segmental duplication described in the text.
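The per-clade shifts in Table 1 can be read off with a short calculation. The following is a minimal sketch, not part of the original analysis: the counts are copied from the DUF1220 clade rows of Table 1, and the names `CLADE_COUNTS` and `clade_deltas` are hypothetical helpers introduced only for illustration.

```python
# Per-clade DUF1220 counts copied from Table 1 as (GRCh37/hg19, CHM1).
# The dictionary and function names are illustrative, not from the paper.
CLADE_COUNTS = {
    "CON1": (22, 17),
    "CON2": (13, 11),
    "CON3": (12, 11),
    "HLS1": (60, 62),
    "HLS2": (68, 69),
    "HLS3": (64, 66),
}


def clade_deltas(counts):
    """Return {clade: CHM1 count minus GRCh37 count}."""
    return {clade: chm1 - grch37 for clade, (grch37, chm1) in counts.items()}


DELTAS = clade_deltas(CLADE_COUNTS)
for clade, delta in DELTAS.items():
    print(f"{clade}: {delta:+d}")
```

With these values, each CON clade has fewer copies in the CHM1 assembly and each HLS clade has more.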
Figure 2. Organization of the DUF1220 domain and gene families in the 1q21.1-q21.2 region in the GRCh37/hg19 assembly (black) and new CHM1 assembly (red). Three NBPF genes have been lost in the CHM1 assembly, and were likely artifacts of misassembly rather than true differences between the two. Six NBPF genes show different DUF1220 copy numbers between builds. The 6 different DUF1220 clades are denoted by colored boxes and DUF1220 triplets are underlined.
Table 2. Description of NBPF genes in GRCh38 assembly
| Name | Location | No. of DUF1220 | No. of DUF1220 triplets |
|---|---|---|---|
| | 1p36.13 | 7 | 0 |
| | 1p36.12 | 3 | 0 |
| | 1p36.12 | 5 | 0 |
| | 1p13.3 | 4 | 0 |
| | 1p13.3 | 2 | 0 |
| | 1p13.3 | 4 | 0 |
| | 1p12 | 2 | 0 |
| | 1p11.2 | 8 | 1 |
| | 1p11.2 | 13 | 1 |
| | 1q21.1 | 0 | 0 |
| | 1q21.1 | 6 | 0 |
| | 1q21.1 | 6 | 1 |
| | 1q21.1 | 67 | 20 |
| | 1q21.1 | 6 | 1 |
| | 1q21.1 | 42 | 12 |
| | 1q21.1 | 11 | 2 |
| | 1q21.1 | 5 | 0 |
| | 1q21.2 | 7 | 0 |
| | 1q21.2 | 32 | 7 |
| | 1q21.2 | 9 | 1 |
| | 1q21.2 | 45 | 14 |
| | 1q21.3 | 0 | 0 |
| | 3p22.2 | 1 | 0 |
| | 5q14.3 | 2 | 0 |
Table 2 displays a summary of all annotated NBPF regions in the GRCh38 build. There are 23 NBPF-like regions, with 14 NBPF genes and 9 pseudogenes.
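To see how the DUF1220 domains and triplets in Table 2 are distributed across cytobands, the rows can be grouped by location. This is a minimal sketch under stated assumptions: the tuples are transcribed from the Location, DUF1220, and triplet columns of Table 2, and `TABLE2_ROWS` and `totals_by_band` are hypothetical names used only here.

```python
from collections import defaultdict

# (cytoband, DUF1220 domains, DUF1220 triplets), transcribed row by row
# from Table 2. Names in this block are illustrative, not from the paper.
TABLE2_ROWS = [
    ("1p36.13", 7, 0), ("1p36.12", 3, 0), ("1p36.12", 5, 0),
    ("1p13.3", 4, 0), ("1p13.3", 2, 0), ("1p13.3", 4, 0),
    ("1p12", 2, 0), ("1p11.2", 8, 1), ("1p11.2", 13, 1),
    ("1q21.1", 0, 0), ("1q21.1", 6, 0), ("1q21.1", 6, 1),
    ("1q21.1", 67, 20), ("1q21.1", 6, 1), ("1q21.1", 42, 12),
    ("1q21.1", 11, 2), ("1q21.1", 5, 0),
    ("1q21.2", 7, 0), ("1q21.2", 32, 7), ("1q21.2", 9, 1),
    ("1q21.2", 45, 14),
    ("1q21.3", 0, 0), ("3p22.2", 1, 0), ("5q14.3", 2, 0),
]


def totals_by_band(rows):
    """Sum domain and triplet counts for each cytoband."""
    totals = defaultdict(lambda: [0, 0])
    for band, domains, triplets in rows:
        totals[band][0] += domains
        totals[band][1] += triplets
    return {band: tuple(pair) for band, pair in totals.items()}


BAND_TOTALS = totals_by_band(TABLE2_ROWS)
for band, (domains, triplets) in BAND_TOTALS.items():
    print(f"{band}: {domains} domains, {triplets} triplets")
```

Grouping by the exact band string keeps 1q21.1 and 1q21.2 separate; summing over a band prefix instead would give totals for the whole 1q21 region.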
Figure 3. Comparison of 1q21 to 1p11.2 showing two separate duplication events between the two regions: 1) a segmental duplication between 1q21.2 and 1p11.2 containing 11 genes, including 2 NBPF genes, and 2) a smaller duplication from 1q21.1 to 1p11.2 [21].
Figure 4. Single-molecule genome maps (orange) from three hydatidiform mole BACs were assembled into consensus genome maps (blue). One of the assembled consensus genome maps is shown here (BAC CH17-112A12, blue) and is aligned to an in silico map based on the 1q21 sequence assembly described in this paper (green). Locations of the NBPF12 and NBPF13 genes on the 1q21 sequence assembly are marked in red. Segment lengths between labels in the NBPF12 gene are consistent across in silico and de novo maps.
Figure 5. Phylogeny of DUF1220 triplets in the CHM1 assembly.
Figure 6. Comparison of arrayCGH profiles of patients with 1q21 deletions and duplications between the GRCh37/hg19 assembly and the CHM1 assembly. Samples with known duplications are represented in pink, Type I deletions in blue, and Type II deletions, which are larger than Type I deletions and include the thrombocytopenia-absent radius (TAR) region, in black. Gray vertical regions in the GRCh37/hg19 assembly represent gaps that were eliminated in the CHM1 assembly. Green bars above the GRCh37 map and below the CHM1 map indicate the approximate location of the Type I deletion in each assembly. Note that the inverted gene segment in the CHM1 assembly requires a two-deletion event rather than a single-deletion event to explain the Type I deletion mapping pattern. Tick marks at the bottom of the figure are separated by 2 Mb; the GRCh37 assembly starts at 142,000,000 and the CHM1 assembly starts at 0.