Davide Bolognini (1, 2), Alberto Magi (3), Vladimir Benes (2), Jan O. Korbel (4), Tobias Rausch (2, 4)
Abstract
BACKGROUND: Tandem repeat sequences are widespread in the human genome, and their expansions cause multiple repeat-mediated disorders. Genome-wide discovery approaches are needed to fully elucidate their roles in health and disease, but resolving tandem repeat variation accurately remains a challenging task. While traditional mapping-based approaches using short-read data have severe limitations in the size and type of tandem repeats they can resolve, recent third-generation sequencing technologies exhibit substantially higher sequencing error rates, which complicates repeat resolution.
Keywords: bioinformatics software; long-read sequencing; tandem repeat variation
Year: 2020 PMID: 33034633 PMCID: PMC7539535 DOI: 10.1093/gigascience/giaa101
Source DB: PubMed Journal: Gigascience ISSN: 2047-217X Impact factor: 6.524
Figure 1: TRiCoLOR’s precision (P, x-axis), recall (R, y-axis), and F1 score (dashed lines) on synthetic tandem repeat (TR) contractions (A) and expansions (B). Oxford Nanopore (ONT) and PacBio (PB) reads with variable error rates (accuracy ∼0.85, red; ∼0.90, blue; ∼0.95, green) were simulated at variable haplotype-specific depths of coverage. P, R, and F1 were calculated allowing 0 (circles), 1 (triangles), or 2 (rhombi) motif discrepancies between the number of repeated motifs predicted by TRiCoLOR and the number in the ground truth.
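The tolerance-based scoring described in Figure 1 can be sketched as follows. This is a minimal illustration, not TRiCoLOR's evaluation code. It assumes that each tandem repeat is keyed by a region identifier, that a call counts as a true positive when its predicted motif count lies within `max_motif_diff` of the ground truth (0, 1, or 2 in the figure), and that unmatched calls and unmatched ground-truth entries are false positives and false negatives, respectively. The function and parameter names are hypothetical.

```python
from typing import Dict, Tuple


def precision_recall_f1(
    truth: Dict[str, int],
    predicted: Dict[str, int],
    max_motif_diff: int = 0,
) -> Tuple[float, float, float]:
    """Score motif-count calls against a ground truth (hypothetical matching rule).

    truth / predicted map a region identifier to its number of repeated motifs.
    A predicted call is a true positive if the same region exists in the truth
    and the motif counts differ by at most max_motif_diff.
    """
    tp = sum(
        1
        for region, count in predicted.items()
        if region in truth and abs(count - truth[region]) <= max_motif_diff
    )
    fp = len(predicted) - tp  # calls that did not match within the tolerance
    fn = len(truth) - tp      # ground-truth repeats without a matching call
    precision = tp / (tp + fp) if predicted else 0.0
    recall = tp / (tp + fn) if truth else 0.0
    # F1 is the harmonic mean of P and R (the dashed iso-F1 curves in the figure)
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    truth = {"tr1": 10, "tr2": 7, "tr3": 12, "tr4": 5}
    calls = {"tr1": 10, "tr2": 8, "tr3": 14, "tr5": 9}
    for k in (0, 1, 2):
        print(k, precision_recall_f1(truth, calls, k))
```

Under this rule, raising the tolerance can only turn mismatched calls into matches, so P and R never decrease as the allowed number of motif discrepancies grows.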
Figure 2: Correlation between the number of repeated motifs in the ground truth (x-axis) and the number of repeated motifs predicted by TRiCoLOR and NCRF (y-axis) for synthetic TR contractions (A) and expansions (B). Each dot represents the synthetic contraction or expansion of a single TR. R is the Pearson correlation coefficient, p is the P-value of the linear regression analysis, m is the slope of the regression line, and the dashed line is the bisector of the first quadrant (y = x), which marks perfect correspondence between the expected and predicted numbers of repeated motifs.
Comparison between TRiCoLOR’s mapping-based calls and the Human Genome Structural Variation Consortium (HGSVC) assembly-based calls for Mendelian-consistent long TRs identified by TRiCoLOR in the PacBio (PB) data of individual HG00733
| Chromosome | Start | End | HGSVC assembly | TRiCoLOR call |
|---|---|---|---|---|
| chr1 | 23703657 | 23703893 | DEL;INS | DEL;INS |
| chr1 | 223672571 | 223672681 | INS;INS | INS;INS |
| chr10 | 69539376 | 69539572 | INS;INS | INS;INS |
| chr11 | 79190887 | 79191145 | | |
| chr11 | 128436913 | 128437081 | INS;INS | INS;INS |
| chr14 | 84276747 | 84276903 | | |
| chr15 | 70364402 | 70364587 | INS; | INS; |
| chr16 | 3529535 | 3529854 | REF; | LC; |
| chr17 | 27525992 | 27526118 | INS;INS | INS;INS |
| chr18 | 44544809 | 44545037 | INS;INS | INS;INS |
| chr18 | 59081301 | 59081379 | INS; | INS; |
| chr18 | 71198388 | 71198450 | REF; | REF; |
| chr2 | 160426201 | 160426342 | INS;INS | INS;INS |
| chr2 | 211860947 | 211861156 | DEL; | DEL; |
| chr21 | 35063465 | 35063588 | INS;INS | INS;INS |
| chr22 | 46174187 | 46174274 | REF;INS | REF;INS |
| chr3 | 13856835 | 13857013 | DEL;INS | DEL;INS |
| chr4 | 13807826 | 13807982 | REF; | REF; |
| chr4 | 18837113 | 18837320 | INS;DEL | INS;DEL |
| chr4 | 81637241 | 81637408 | DEL;DEL | DEL;DEL |
| chr5 | 54513584 | 54513735 | | |
| chr6 | 25450910 | 25450975 | REF; | REF; |
| chr6 | 55543085 | 55543393 | INS;INS | INS;INS |
| chr6 | 106945844 | 106946002 | DEL; | DEL; |
| chr7 | 38610247 | 38610412 | | |
| chr7 | 71847696 | 71847865 | INS;INS | INS;INS |
| chr7 | 109663557 | 109663744 | INS;DEL | INS;DEL |
| chr7 | 131933466 | 131933651 | INS;INS | INS;INS |
| chr9 | 82850174 | 82850347 | DEL;DEL | DEL;DEL |
| chr9 | 91622218 | 91622365 | | |
| chr9 | 91634814 | 91634973 | | |
| chr9 | 116632126 | 116632280 | INS;INS | INS;INS |
DEL: deletion; INS: insertion; REF: reference allele; NA: region is not covered by the assembly or mis-assembled; LC: TRiCoLOR could not generate a consensus sequence for the allele owing to the low coverage in the region. The 2 alleles are separated by a semicolon.
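The two-allele notation in the table lends itself to a simple concordance count. The sketch below is an illustrative assumption rather than the procedure used in the study. It splits each call on the semicolon (keeping an empty second field as an empty string), skips rows with a missing call, and treats two calls as concordant when they carry the same allele labels regardless of order, since the table does not state whether alleles are phased. Under this rule an LC allele never matches an assembly allele. The function names are hypothetical; the example rows are copied from the table.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, HGSVC assembly call, TRiCoLOR call); None marks a missing cell
Row = Tuple[str, int, int, Optional[str], Optional[str]]


def parse_alleles(call: str) -> Tuple[str, ...]:
    """Split an 'ALLELE1;ALLELE2' string; an empty field after ';' is kept as ''."""
    return tuple(allele.strip() for allele in call.split(";"))


def concordance(rows: List[Row]) -> Tuple[int, int]:
    """Return (matched, compared) over rows where both calls are present.

    Allele order is ignored (assumption: alleles are not treated as phased).
    """
    compared = matched = 0
    for _chrom, _start, _end, assembly, tricolor in rows:
        if not assembly or not tricolor:
            continue  # missing cell: nothing to compare
        compared += 1
        if sorted(parse_alleles(assembly)) == sorted(parse_alleles(tricolor)):
            matched += 1
    return matched, compared


if __name__ == "__main__":
    rows: List[Row] = [
        ("chr1", 23703657, 23703893, "DEL;INS", "DEL;INS"),
        ("chr16", 3529535, 3529854, "REF;", "LC;"),
        ("chr4", 18837113, 18837320, "INS;DEL", "INS;DEL"),
        ("chr11", 79190887, 79191145, None, None),
    ]
    print(concordance(rows))
```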