Haojing Shao, Chenxi Zhou, Minh Duc Cao, Lachlan J M Coin.
Abstract
The majority of human chromosome ends remain incompletely assembled due to their highly repetitive structure. In this study, we use BioNano data to anchor and extend chromosome ends from two European trios as well as two unrelated Asian genomes. At least 11 BioNano-assembled chromosome ends are structurally divergent from the reference genome, including both missing sequence and extensions. These extensions are heritable and in some cases divergent between Asian and European samples. Six out of nine predicted extension sequences from NA12878 can be confirmed and filled by nanopore data. We identify two multi-kilobase sequence families, both enriched more than 100-fold in extension sequence (p-values < 1e-5), whose origins can be traced to interstitial sequence on ancestral primate chromosome 7. Extensive sub-telomeric duplication of these families has occurred in the human lineage subsequent to divergence from chimpanzees.
Year: 2018 PMID: 30413723 PMCID: PMC6226469 DOI: 10.1038/s41598-018-34774-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Paralogy map for chromosome tip regions. (a) 9q; (b) 15q. These chromosome tips were assembled from BioNano data from two trios and two individuals. Enzyme recognition sites (labels) are marked as black bars, and grey connecting lines indicate alignment between samples. Homologous sequence is indicated by color blocks: blue indicates homology to the human reference sequence at the given chromosome; grey indicates unknown reference sequence; purple, cyan and pink indicate homology to 19p, 16q and 1p, respectively; yellow and bright green indicate homology with family A and family B, respectively. The remaining unaligned regions are colored red. Overlapping colors indicate shared homology to multiple sources.
Figure 2. Validation of six termini by nanopore reads. (a) 3q; (b) 6p; (c) 9q; (d) 15q; (e) 20p; (f) 20q. The reference, the BioNano assembly in NA12878 and the predicted extension sequence (see Methods) are shown as coloured rectangles in the middle. In silico BioNano enzyme recognition sites (labels) are shown as vertical black lines. Grey lines between labels indicate matched labels. Dotplots of nanopore reads against the extension sequence are shown at the bottom. Green and red indicate forward and reverse alignments, respectively. The nanopore read names are listed in Table S10.
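The in silico labels mentioned in the Figure 2 caption can be illustrated with a minimal sketch that scans a sequence for an enzyme recognition motif on both strands. The function names are hypothetical, and the default motif (GCTCTTC, the recognition sequence of Nt.BspQI) is an assumption: the caption does not state which enzyme was used, and this is not the authors' pipeline.

```python
from typing import List

# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_label_sites(seq: str, motif: str = "GCTCTTC") -> List[int]:
    """Return sorted 0-based start positions of the motif on either strand.

    The default motif is an assumption (Nt.BspQI); replace it with the
    recognition sequence of the enzyme actually used.
    """
    seq = seq.upper()
    motif = motif.upper()
    sites = set()
    # Search the forward motif and its reverse complement, which marks
    # occurrences of the motif on the opposite strand.
    for pattern in {motif, reverse_complement(motif)}:
        start = seq.find(pattern)
        while start != -1:
            sites.add(start)
            start = seq.find(pattern, start + 1)
    return sorted(sites)


def label_spacings(sites: List[int]) -> List[int]:
    """Distances between consecutive labels, the pattern compared across maps."""
    return [b - a for a, b in zip(sites, sites[1:])]


if __name__ == "__main__":
    toy = "AAGCTCTTCAAAGAAGAGCTT"  # toy sequence, not real data
    sites = find_label_sites(toy)
    print(sites, label_spacings(sites))
```

Comparing the spacing pattern of predicted labels with the spacing observed in an optical map is the general idea behind matching labels between a sequence and a BioNano assembly; real alignment tools also model sizing error and missing or extra labels, which this sketch ignores.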
Figure 3. Phylogenetic trees for two sequence families identified in human extension sequences. (A) Family A (9 kb); (B) Family B (8 kb). Phylogenetic trees are generated from their homologous sequences. Dark purple, light purple, red and blue indicate the intra-chromosome duplication group, inter-chromosome duplication group, chimpanzee subtelomeric duplication group and human subtelomeric duplication group, respectively. H for Human, C for Chimpanzee, G for Gorilla and O for Orangutan. p for p arm, q for q arm, Un for unplaced contig, un for unlocalized within a chromosome and i for interstitial genome regions. ‘+’ for forward aligned and ‘−’ for reverse aligned. ‘.A’, ‘.B’, ‘.C’ and ‘.D’ are appended to distinguish different copies in the same region. Bootstrap values above 95 are shown. Genomic locations are shown as shapes on each chromosome on the right. The human ancestral copies of the intra-chromosome duplication group, inter-chromosome duplication group and subtelomeric group are indicated by a diamond, a triangle and a square, respectively; all other copies are shown as circles. Top and bottom indicate different orientations. The genes FAM157A, FAM157B and FAM157C contain both families A and B at H3q, H9q and H16q, respectively. Silhouette images are from PhyloPic. Homo sapiens (“http://phylopic.org/image/c089caae-43ef-4e4e-bf26-973dd4cb65c5”) and Gorilla (“http://phylopic.org/image/d9af529d-e426-4c7a-922a-562d57a7872e/”) are both licensed under the Public Domain Dedication 1.0 license. Pan troglodytes (“http://phylopic.org/image/2f7da8c8-897a-445e-b003-b3955ad08850/”) by T. Michael Keesey (vectorization) and Tony Hisgett (photography), and Pongo abelii (“http://phylopic.org/image/67144c22-93c2-4dc0-ba13-9f9dd2d223b9/”) by Gareth Monger, are both licensed under the Creative Commons Attribution 3.0 Unported license (“http://creativecommons.org/licenses/by/3.0/”).
Figure 4. Gene innovation for two sequence families identified in human extension sequences. (a) Overview of genes inside the ancestral copy. Family A contains the last exon of GTF2IP13-202 (red), a pseudogene CICP20-201 (blue) and an unannotated pseudogene (green). Family B contains exons 10–13 of GTF2IP13-202. (b) Putative gene fusion in family A. The unannotated pseudogene (green) is homologous to the last exon of SEPT14-201. AC096582.1-201 is a combination of part of CICP20-201 and part of the last exon of SEPT14-201. AC096582.1-201 later combines with sequences from both interstitial and subtelomeric regions into LINC00266-4P-202 (first two exons), AL627309.1-202 (first exon) and AC069287.1-202 (first two exons). Derived* indicates that it is inferred from the phylogenetic tree in Fig. 3. (c) Gene modification in family B. Each exon of GTF2IP13-202 is shown in a different color. Boxes and links are drawn for duplication pairs (inferred from database [20]). FAM157C-204 exons 1–3, 5–6 and 7 are homologous to GTF2IP13-202 exons 11–13, 16–17 and 15, respectively. Notably, GTF2IP13-202 exon 15 (brown) is downstream of GTF2IP13-202 exons 16–17 (grey, light blue) in FAM157C-204. FAM157C-204 exon 4 is homologous to GTF2IP13-202 intron sequence. The last exon of FAM157C-204 (exon 8) is homologous to the last exon of SEPT14-201.