Lukas F K Kuderna, Esther Lizano, Eva Julià, Jessica Gomez-Garrido, Aitor Serres-Armero, Martin Kuhlwilm, Regina Antoni Alandes, Marina Alvarez-Estape, David Juan, Heath Simon, Tyler Alioto, Marta Gut, Ivo Gut, Mikkel Heide Schierup, Oscar Fornas, Tomas Marques-Bonet.
Abstract
Mammalian Y chromosomes are often neglected in genomic analyses. Because of their inherent assembly difficulties, high repeat content, and large ampliconic regions, only a handful of species have a properly characterized Y chromosome. To date, just a single human reference-quality Y chromosome, of European ancestry, is available, owing to a lack of accessible methodology. To facilitate the assembly of such complicated genomic territory, we developed a novel strategy to sequence native, unamplified, flow-sorted DNA on a MinION nanopore sequencing device. Our approach yields a highly continuous assembly of the first human Y chromosome of African origin. It constitutes a significant improvement over comparable previous methods, increasing continuity by more than 800%. Sequencing native DNA also makes it possible to use the nanopore signal data to detect epigenetic modifications in situ. This approach is in theory generalizable to any species, simplifying the assembly of extremely large and repetitive genomes.
Year: 2019 PMID: 30602775 PMCID: PMC6315018 DOI: 10.1038/s41467-018-07885-5
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1 Flow-sorting and sequencing specificity. a Flow karyogram of a human genome. The different clusters correspond to different chromosomes. The red circle delimits the cluster corresponding to the Y chromosome used for this project. b Enrichment specificity of the sequencing data. Sequences on the Y chromosome are ~110-fold enriched compared with WGS. Chromosome 22 partially co-sorts with Y. All other chromosomes are depleted. c Read length (log10 scale) distribution of the four runs. d N50 values for all four runs and the combined dataset. Colors in panels c and d correspond to the different runs. Source data are provided as a Source Data file.
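Panel d reports N50 values. As a reminder of what that statistic measures, the following is a minimal sketch of the usual definition: the length at which reads (or contigs) of that size or longer contain at least half of all bases. The function name `n50` and the example lengths are hypothetical and are not taken from the paper's data or pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of read or contig lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of the total bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("n50() needs at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical read lengths in bases, for illustration only.
example_lengths = [2, 3, 4, 5, 6]
example_n50 = n50(example_lengths)
```

The same function applies to assembly contigs, which is how contiguity comparisons between assemblies are commonly summarized.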
Fig. 2 Chromosome Y assembly overview and comparisons. a Dot plot comparing the resolved MSY of GRCh38 with HG02982. The reconstruction is highly continuous along most sequence classes, with ampliconic regions showing a higher degree of fragmentation. Seg. Dup. (intra) refers to intra-chromosomal segmental duplications; Seg. Dup. (inter) refers to inter-chromosomal segmental duplications. Altogether, ~50% of the resolved sequence space of the Y chromosome in GRCh38 (> 13 Mb) is annotated as segmental duplications. b Treemap comparing the contiguity of HG02982 chrY with GRCh38 chrY and the gorilla Y chromosome by Tomaszkiewicz et al. The size of each rectangle corresponds to the size of a contig within each of the assemblies. Neighboring rectangles are colored differently as a visual aid. c Repeat landscape of common, interspersed repeats annotated equally in GRCh38 and HG02982. Common repeats, including very recent ones, are well resolved in HG02982. The exceptions are satellite sequences and a population of somewhat divergent (~20%) LTR elements, which are absent in HG02982 (see Supplementary Figures 7-9). Source data are provided as a Source Data file.
Assembly statistics overview
| Seq. class | Aln. HG02982 (bp) | ID HG02982, SNP (%) | ID HG02982, SNP + InDel (%) | Rec. HG02982 (%) | Aln. NA24385 (bp) | Rec. NA24385 (%) | Len. w/o gaps (bp) |
|---|---|---|---|---|---|---|---|
| Ampliconic | 6,146,087 | 99.91 | 99.67 | 62.67 | 5,242,461 | 53.46 | 9,807,089 |
| Heterochromatic | 543,005 | 99.66 | 99.31 | 32.77 | 171,045 | 10.32 | 1,656,797 |
| Others | 295,160 | 99.47 | 99.18 | 385.59 | 63,973 | 83.57 | 76,547 |
| Pseudo-autosomal | 2,219,743 | 99.58 | 99.13 | 78.02 | 117,626 | 4.13 | 2,844,939 |
| X-degenerate | 8,537,493 | 99.95 | 99.81 | 98.94 | 8,238,733 | 95.48 | 8,628,904 |
| X-transposed | 3,374,011 | 99.94 | 99.81 | 99.21 | 1,474,610 | 43.36 | 3,400,750 |
Summary of sequence class coverage of HG02982 versus GRCh38, as well as of the contigs from NA24385 identified as derived from the Y chromosome. The proportion of recovered sequences and % identity are calculated over the resolved sequences in GRCh38, excluding gaps. There are currently 30.8 Mb of unresolved sequence (represented by the ambiguous base N) in the reference Y chromosome of GRCh38, the vast majority of which belongs to heterochromatin on the q arm.
Aln.: bases aligned to GRCh38; ID: percent identical bases relative to GRCh38; Rec.: proportion of GRCh38 recovered; Len.: length in GRCh38.
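The recovered proportions in the table can be reproduced from its other columns. The sketch below assumes that "Rec." is simply aligned bases divided by the ungapped GRCh38 length of the sequence class, which matches the legend's statement that recovery is calculated over resolved sequence excluding gaps. The numbers are copied from the table; the names `TABLE` and `recovered_pct` are illustrative and are not part of the authors' pipeline.

```python
# Per sequence class: (bases aligned in HG02982,
#                      bases aligned in NA24385,
#                      GRCh38 length without gaps)
TABLE = {
    "Ampliconic":       (6_146_087, 5_242_461, 9_807_089),
    "Heterochromatic":  (543_005,   171_045,   1_656_797),
    "Others":           (295_160,   63_973,    76_547),
    "Pseudo-autosomal": (2_219_743, 117_626,   2_844_939),
    "X-degenerate":     (8_537_493, 8_238_733, 8_628_904),
    "X-transposed":     (3_374_011, 1_474_610, 3_400_750),
}


def recovered_pct(aligned_bases, ungapped_length):
    """Aligned bases as a percentage of the ungapped reference length."""
    return 100.0 * aligned_bases / ungapped_length


# Recompute both "Rec." columns for every sequence class.
recovery = {
    seq_class: (recovered_pct(hg, length), recovered_pct(na, length))
    for seq_class, (hg, na, length) in TABLE.items()
}

for seq_class, (hg_pct, na_pct) in recovery.items():
    print(f"{seq_class:<17} HG02982 {hg_pct:7.2f}%   NA24385 {na_pct:6.2f}%")
```

Under this definition a value can exceed 100% whenever more bases align than the reference class contains, as in the "Others" row for HG02982.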