GiWon Shin, Stephanie U Greer, Li C Xia, HoJoon Lee, Jun Zhou, T Christian Boles, Hanlee P Ji.
Abstract
The human genome is composed of two haplotypes, otherwise called diplotypes, which denote phased polymorphisms and structural variations (SVs) that are derived from both parents. Diplotypes place genetic variants in the context of cis-related variants from a diploid genome. As a result, they provide valuable information about hereditary transmission, context of SV, regulation of gene expression and other features which are informative for understanding human genetics. Successful diplotyping with short read whole genome sequencing generally requires either a large population or parent-child trio samples. To overcome these limitations, we developed a targeted sequencing method for generating megabase (Mb)-scale haplotypes with short reads. One selects specific 0.1-0.2 Mb high molecular weight DNA targets with custom-designed Cas9-guide RNA complexes followed by sequencing with barcoded linked reads. To test this approach, we designed three assays, targeting the BRCA1 gene, the entire 4-Mb major histocompatibility complex locus and 18 well-characterized SVs, respectively. Using an integrated alignment- and assembly-based approach, we generated comprehensive variant diplotypes spanning the entirety of the targeted loci and characterized SVs with exact breakpoints. Our results were comparable in quality to long read sequencing.
Year: 2019 PMID: 31350896 PMCID: PMC6821272 DOI: 10.1093/nar/gkz661
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. CATCH targeting and linked read sequencing of HMW DNA. (A) Overview of the process. First, guide RNAs target and cut multiple genomic regions of interest. Second, target HMW DNA within the specific size range is isolated by an electrophoresis-based process. Finally, the target DNA is used for linked read library preparation and sequencing. The alignment of barcode-linked reads shows how sequence coverage is increased across the target segment. In the alignment plot, the X-axis indicates the reference coordinates and the Y-axis shows different barcodes representing individual HMW molecules. Dashed vertical lines indicate Cas9-gRNA cut sites. (B) Sequencing coverage for the target regions is shown for the three assays. For Assays 1 and 2, the BRCA1-R2 and MHC-30 libraries are shown. For Assay 3, an example of a homozygous deletion (SV1) is shown. Black bars indicate the target regions. Blue and green areas in the plots indicate coverage for forward and reverse reads, respectively.
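The barcode-versus-coordinate layout described for the alignment plot can be illustrated with a minimal Python sketch. It assumes the simplification that every barcode corresponds to exactly one HMW molecule (in practice a barcode may tag more than one molecule), and the function name `molecule_spans`, the tuple layout and the toy reads are hypothetical, not taken from the paper's pipeline.

```python
def molecule_spans(reads):
    """Summarize, per barcode, the reference span covered by its reads.

    reads: iterable of (barcode, start, end) tuples in reference coordinates.
    Returns a dict mapping barcode -> (leftmost start, rightmost end, read count).
    Assumption: one barcode is treated as one HMW molecule.
    """
    spans = {}
    for barcode, start, end in reads:
        if barcode not in spans:
            spans[barcode] = (start, end, 1)
        else:
            lo, hi, n = spans[barcode]
            spans[barcode] = (min(lo, start), max(hi, end), n + 1)
    return spans


# Toy alignments (hypothetical coordinates), one row per aligned read.
reads = [
    ("BC01", 1000, 1150),
    ("BC02", 5000, 5150),
    ("BC01", 90000, 90150),
    ("BC02", 120000, 120150),
    ("BC01", 40000, 40150),
]

spans = molecule_spans(reads)
for barcode, (lo, hi, n) in sorted(spans.items()):
    print(barcode, lo, hi, n)
```

Each printed row corresponds to one horizontal track of such a plot: short reads that share a barcode are scattered across a span far longer than any single read.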
Figure 2. Assembly results from assays targeting a single continuous region. (A) The 0.2 Mb BRCA1 assembly (BRCA1-R1) was compared with other long read assemblies. The X-axis indicates the coordinates of each NA12878 assembly across different platforms. The Y-axis indicates the corresponding segment from the GRCh38 reference. Each point indicates where a CRISPR-linked read aligned to the reference versus where it aligned to the NA12878 assemblies. (B) The assembled MHC scaffolds with assigned haplotype blocks, where red and blue indicate the parental haplotype. The X-axis represents the GRCh38 reference genome on which the assembly scaffold is aligned. The HLA genes are indicated below the scaffolds, with labels only for the major class I (HLA-A, HLA-B and HLA-C) and class II (HLA-DRB1, HLA-DQA1 and HLA-DQB1) genes. All the NA12878 genotypes of these genes from our assembly are available in Table 1. For HLA-A, alignments of all the alleles coding the same protein sequence [A*11:01:XX(:XX)] are shown. The red dots indicate bases that mismatch the allele in Haplotype 1 (A*11:01:01:01).
Table 1. Alignment of reported HLA gene alleles to NA12878 assemblies
| HLA gene | CRISPR-linked read, Haplotype 1: aligned allele | CRISPR-linked read, Haplotype 1: edit distance (b) | CRISPR-linked read, Haplotype 2: aligned allele | CRISPR-linked read, Haplotype 2: edit distance (b) | Oxford, Haplotype 1: aligned allele | Oxford, Haplotype 1: edit distance (b) | Oxford, Haplotype 1: allele prediction (a) by Jain | Oxford, Haplotype 2: aligned allele | Oxford, Haplotype 2: edit distance (b) | Oxford, Haplotype 2: allele prediction (a) by Jain |
|---|---|---|---|---|---|---|---|---|---|---|
| A | 11:01:01:01 | 0 | 01:01:01:01 | 0 | n.a. | n.a. | 11:01:01G | n.a. | n.a. | 01:01:01G |
| B | 56:01:01:01 + 2 alleles | 1 | 08:01:01:01 | 0 | n.a. | n.a. | 56:01:01G | 08:177, 08:182 | 9 | 08:01:01G |
| C | 01:02:01:02 + 28 alleles | 1 | 07:01:01:01 | 0 | 01:148 | 16 | 01:02:01G | 07:01:01:14Q | 10 | 07:01:01G |
| DQA1 | 01:01:01:03 | 2 | 05:01:01:02 | 0 | 01:16N | 82 | 01:01:01G | 05:03:01:01 | 129 | 05:01:01G |
| DQB1 | 05:01:01:03 | 0 | 02:01:01 | 0 | 05:01:01:03 | 4 | 05:01:01G | 02:02:01:01 | 45 | 02:01:01G |
| DRB1 | 01:01:01 | 2 | 03:01:01:01 | 0 | n.a. | n.a. | 01:01:01G | n.a. | n.a. | 03:01:01G |
Only the six major HLA genes are shown. All alignments for both assemblies are available in Supplementary Tables S10 and S11.
(a) The prediction is based on exons 2 and 3 for MHC class I genes and exon 2 for MHC class II genes; information regarding the ‘G’ group is available at ‘http://hla.alleles.org/alleles/g_groups.html’.
(b) The edit distance is measured between the assembly and the aligned allele.
n.a.: no alignment having a percent match >90%.
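As an illustration of the edit distance column, the following is a minimal Python sketch of the classic Levenshtein distance (substitutions, insertions and deletions each cost 1). The source does not state which aligner or scoring produced the reported values, so treating them as Levenshtein distances is an assumption, and the toy sequences are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two sequences, using two rolling rows."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


# Hypothetical toy sequences standing in for an assembly segment and an allele.
assembly_segment = "ACGTTGCA"
reported_allele = "ACGTAGCA"
print(edit_distance(assembly_segment, reported_allele))
```

Under this reading, a value of 0 means the assembled haplotype reproduces the reported allele sequence exactly, while larger values count the number of single-base edits separating them.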
Figure 3. Assembly results from the multiplex SV assay. (A) The CRISPR-linked read assembly for SV1 in Assay 3 was aligned to the reference genome and compared with other long read assemblies. Red, green and blue bars indicate the portion of the assembly that aligns to the reference, while a gray gap indicates no alignment. Although the gray regions have some similarity to the reference, they generally have too many homopolymer errors to align successfully. The fraction of aligned bases in the assembly is indicated at the end of the bars. (B) Two different sets of deletion breakpoints were determined for SV5. The CRISPR-linked read SV assay captured the two SV alleles with different deletion sizes. For (A and B), the X- and Y-axes indicate the reference coordinates and the alignment of barcoded linked reads, respectively. Dashed vertical lines indicate Cas9-gRNA cut sites. (C) Illustration of how the breakpoints are determined in segmental duplications. Duplicated copies from the GRCh38 reference genome were aligned to CRISPR-linked read assemblies. Breakpoint ranges in the reference duplicates were determined by alignment and mismatches. The example shown here is from our SV17 assembly. The two 15-kb segments have 93% similarity.
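The fraction of aligned bases mentioned for panel (A) can be sketched in Python as follows. This assumes the aligned portions are available as half-open intervals in assembly coordinates and that overlapping intervals should be counted once; the function name `aligned_fraction` and the example intervals are hypothetical and do not come from the paper.

```python
def aligned_fraction(intervals, assembly_length):
    """Fraction of assembly bases covered by at least one aligned interval.

    intervals: iterable of (start, end) half-open intervals on the assembly.
    assembly_length: total assembly length in bases (must be positive).
    """
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # A gap before this interval: close the previous merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching interval: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / assembly_length


# Hypothetical example: two overlapping aligned blocks, an unaligned gap,
# then a third aligned block, on a 1,000-base assembly.
fraction = aligned_fraction([(0, 400), (300, 600), (800, 1000)], 1000)
print(fraction)
```

Merging before summing keeps a base that falls inside two overlapping alignments from being counted twice, so the result stays between 0 and 1.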