Timothy H Webster, Madeline Couse, Bruno M Grande, Eric Karlins, Tanya N Phung, Phillip A Richmond, Whitney Whitford, Melissa A Wilson.
Abstract
BACKGROUND: Mammalian X and Y chromosomes share a common evolutionary origin and retain regions of high sequence similarity. Similar sequence content can confound the mapping of short next-generation sequencing reads to a reference genome. It is therefore possible that the presence of both sex chromosomes in a reference genome can cause technical artifacts in genomic data and affect downstream analyses and applications. Understanding this problem is critical for medical genomics and population genomic inference.
Keywords: X chromosome; Y chromosome; aneuploidy; genomics; mapping; ploidy; variant calling
Year: 2019 PMID: 31289836 PMCID: PMC6615978 DOI: 10.1093/gigascience/giz074
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: Sequencing depth on chromosome X before and after XYalign. Mean sequencing depth for the Dataset 1 XX individual in 5-kb windows across the X chromosome before (A) and after (B) XYalign processing. Changes in depth (D) are presented as the sign of the difference times the absolute value of the log10 difference, where the difference is depth after XYalign minus depth before XYalign. The chromosome map (C) presents the location of X chromosome genomic features depicted in the legend. X chromosome coordinates are identical in all plots.
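The transform described for panel D of Figure 1 can be sketched as follows. This is one literal reading of the caption, not XYalign's own code: the function name `signed_log10_change` is hypothetical, the log is assumed to be taken of the absolute difference (a negative difference has no real log), and a zero difference is assumed to map to zero.

```python
import math


def signed_log10_change(depth_before, depth_after):
    """Return sign(diff) * |log10(|diff|)|, where diff = after - before.

    Assumption: a window with no change in depth is reported as 0.0,
    because log10(0) is undefined.
    """
    diff = depth_after - depth_before
    if diff == 0:
        return 0.0
    magnitude = abs(math.log10(abs(diff)))
    # copysign attaches the sign of the raw difference to the magnitude.
    return math.copysign(magnitude, diff)


# Hypothetical window depths: a gain of 100x and a loss of 100x.
print(signed_log10_change(10.0, 110.0))
print(signed_log10_change(110.0, 10.0))
```

Under this reading, differences smaller than 1 in absolute value also yield a positive magnitude, because the absolute value of the log is taken; whether the published figure handles that case the same way is not stated in the caption.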
Figure 2: Mapping quality on chromosome X before and after XYalign. Mean mapping quality (MAPQ) for the Dataset 1 XX individual in 5-kb windows across the X chromosome before (A) and after (B) XYalign processing. Changes in MAPQ (D) are presented as the difference in MAPQ after XYalign minus MAPQ before XYalign. The chromosome map (C) presents the location of X chromosome genomic features depicted in the legend. X chromosome coordinates are identical in all plots.
Figure 3: Y chromosome sequencing depth and quality. Mean sequencing depth (A) and mapping quality (MAPQ; C) for the Dataset 1 XY individual in 5-kb windows across the Y chromosome. The chromosome map (B) presents the location of Y chromosome genomic features depicted in the legend. Y chromosome coordinates are identical in all plots.
The effect of sex chromosome homology on variant calling on the X chromosome[a]
| Region[b] | Length (bp)[c] | Before Only (per Mb)[d] | After Only (per Mb)[e] |
|---|---|---|---|
| PAR1 | 2,589,520 | 0 (0) | 7,563 (2,920.6) |
| PAR2 | 329,516 | 0 (0) | 633 (1,921) |
| XTR | 4,287,237 | 40 (9.3) | 366 (85.4) |
| XAR | 55,982,492 | 299 (5.3) | 400 (7.2) |
| XCR | 89,011,795 | 610 (6.9) | 523 (5.9) |
| Total | 152,250,560 | 949 (6.2) | 9,485 (62.3) |
[a] High-coverage whole-genome data from the XX individual in Dataset 1.
[b] PAR1: pseudoautosomal region 1; PAR2: pseudoautosomal region 2; XTR: X-transposed region; XAR: X-added region; XCR: X-conserved region.
[c] Total sequence length of the region in base pairs.
[d] Total number of variants, after filtering, present before but not after Y chromosome masking. Variants per Mb of sequence are presented in parentheses.
[e] Total number of variants, after filtering, present after but not before Y chromosome masking. Variants per Mb of sequence are presented in parentheses.
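The per-Mb values in parentheses follow from dividing a variant count by the region length in megabases. A minimal sketch, assuming 1 Mb = 1,000,000 bp; the function name `variants_per_mb` is hypothetical, and the inputs are the PAR1 and XTR rows of the table above:

```python
def variants_per_mb(variant_count, length_bp):
    """Variant density: count divided by region length in megabases."""
    if length_bp <= 0:
        raise ValueError("region length must be positive")
    return variant_count / (length_bp / 1_000_000)


# PAR1, variants present only after Y chromosome masking.
print(round(variants_per_mb(7563, 2_589_520), 1))  # 2920.6
# XTR, variants present only after Y chromosome masking.
print(round(variants_per_mb(366, 4_287_237), 1))   # 85.4
```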
Figure 4: Read balance in XY and XX samples. Histograms of read balance for an XY sample (left column; A, C, and E) and XX sample (right column; B and D) from Dataset 1 across chromosome 19 (A and B), chromosome X (C and D), and chromosome Y (E). Read balance at a given site is defined as the number of reads containing a non-reference allele divided by the total number of reads mapped to a site. Read balances between 0.05 and 1.0, non-inclusive, are presented to highlight “heterozygous” read balances. Full distributions, including fixed sites, are presented in Supplemental Fig. 1.
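The read-balance definition in the Figure 4 caption, together with the non-inclusive 0.05 to 1.0 display range, can be sketched as below. The function names and the example counts are hypothetical; only the ratio and the two bounds come from the caption.

```python
def read_balance(non_ref_reads, total_reads):
    """Fraction of reads at a site that carry a non-reference allele."""
    if total_reads <= 0:
        raise ValueError("a site needs at least one mapped read")
    return non_ref_reads / total_reads


def heterozygous_like(balances, low=0.05, high=1.0):
    """Keep balances strictly between the bounds (both ends excluded)."""
    return [b for b in balances if low < b < high]


# Hypothetical (non-reference reads, total reads) pairs for four sites.
sites = [(0, 40), (2, 40), (20, 40), (40, 40)]
balances = [read_balance(alt, total) for alt, total in sites]
print(heterozygous_like(balances))  # [0.5]
```

Excluding both ends drops sites fixed for the alternate allele (balance 1.0) as well as sites at or below 0.05, which is what the caption means by highlighting “heterozygous” read balances.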
Figure 5: Relative sequencing depth and mapping quality on the X and Y chromosomes across different sequencing strategies. Values of relative (A) sequencing depth and (B) mapping quality come from exome (circles), low-coverage whole-genome sequencing (WGS; squares), and high-coverage WGS (triangles) for a single male (green) and female (blue) individual. Mean depth and MAPQ on chromosome 19 were used to normalize the sex chromosomes.
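The normalization in Figure 5 amounts to dividing a sex chromosome's mean depth (or MAPQ) by the chromosome 19 mean. A minimal sketch, assuming the inputs are per-window values of equal weight; the function name `relative_value` and the example depths are hypothetical, not taken from the paper:

```python
from statistics import fmean


def relative_value(sex_chrom_windows, chr19_windows):
    """Mean of the sex-chromosome windows divided by the chromosome 19 mean."""
    reference = fmean(chr19_windows)
    if reference == 0:
        raise ValueError("chromosome 19 mean is zero; cannot normalize")
    return fmean(sex_chrom_windows) / reference


# Hypothetical depths: a single-copy X against a diploid autosome.
x_depth = [14.0, 16.0]
chr19_depth = [29.0, 31.0]
print(relative_value(x_depth, chr19_depth))  # 0.5
```

The same function applies to MAPQ by passing per-window mapping qualities instead of depths.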