| Literature DB >> 25651527 |
Deanna M Church, Valerie A Schneider, Karyn Meltz Steinberg, Michael C Schatz, Aaron R Quinlan, Chen-Shan Chin, Paul A Kitts, Bronwen Aken, Gabor T Marth, Michael M Hoffman, Javier Herrero, M Lisandra Zepeda Mendoza, Richard Durbin, Paul Flicek.
Abstract
The human genome reference assembly is crucial for aligning and analyzing sequence data, and for genome annotation, among other roles. However, the models and analysis assumptions that underlie the current assembly need revising to fully represent human sequence diversity. Improved analysis tools and updated data reporting formats are also required.Entities:
Mesh:
Year: 2015 PMID: 25651527 PMCID: PMC4305238 DOI: 10.1186/s13059-015-0587-3
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Examples of regions with alternative loci, sequences within these regions and genes unique to them
|
|
|
|
|
|---|---|---|---|
| APOBEC | GL383583.2 | 22 |
|
| MHC | GL000250.2 | 6 |
|
| GL000251.2 |
| ||
| GL000252.2 |
| ||
| GL000253.2 |
| ||
| GL000254.2 | |||
| GL000255.2 | |||
| GL000256.2 | |||
| KI270758.1 | |||
| CCL5_TBC1D3 | KI270909.1 | 17 |
|
|
| |||
|
| |||
| CYP2D6 | KB663069.1 | 22 |
|
| LRC_KIR | KI270938.1, | 19 |
|
| GL949747.2 | |||
| GL000209.2, |
| ||
| GL949747.2 | |||
| KI270882.1, | |||
| KI270887.1, | |||
| KI270890.1, | |||
| KI270916.1, | |||
| KI270921.1 | |||
| TRB | KI270803.1 | 7 |
|
Figure 1Histogram displaying unique sequence per alternative locus. While the vast majority of alternative loci contribute less than 50 kb of unique sequence, some contribute well over 100 kb of novel sequence.
Figure 2(a) Alignment of decoy-specific reads to GRCh38. To identify decoy-specific reads, the following read sets were used: /panfs/traces01/compress/1KG.p2/NA19200.mapped.ILLUMINA.bwa.YRI.low_coverage.20120522.8levels.csra and /panfs/traces01/compress/1KG.p2/NA12878.mapped.ILLUMINA.bwa.CEU.low_coverage.20120522.8levels.csra. Using SRPRISM [26] (with parameters [p] (force paired/unpaired search): false; [n] (maximum number of allowed errors): 6; [M] (maximum allowed memory usage): 2048.) reads were extracted with MAPQ >20 that uniquely aligned to SN:hs37d5 (the decoy). As a control, these reads were aligned to a target set comprising the GRCh37 primary assembly unit (GCF_000001305.13), chr. MT (GCF_000006015.1) and the decoy. Collectively, this target set is referred to as GRCh37pmd. To assess capture of the decoy sequence in the updated assembly, the reads were aligned to a target set comprising the GRCh38 primary assembly (GCF_000001305.14) and chr. MT (GCF_000006015.1) (referred to as GRCh38pm) or the full GRCh38 assembly (GCF_000001405.26). A captured read was defined as one that aligned only to the decoy (by SRPRISM) when using GRCh37pmd as the target set, and also aligned to GRCh38pm or full. (b) Incorporation of decoy sequences in GRCh38. Upper panel: graphical view of NT_187681.1, a GRCh38 alternative loci scaffold. Blue bars are assembly components; gene annotations are shown in green. Two components (AC226006.2 and AC208587.4, highlighted in red) in this alt scaffold are fosmids from which decoy sequences were derived (AC208587.4:7500–12511, AC226006.2:5581–8770, AC226006.2:20483–21626). The decoy sequences represent variants from the chromosome. The alignment of the decoy fragments and GRCh38 chr. 11 to the alternative scaffold are shown as gray bars (red tick, mismatch; blue tick, deletion; thin red line, insertion). Note that the decoy fragments align to regions that are missing from the reference chromosome. Lower panel: graphical view of NC_000011.10, GRCh38 chr.11. An assembly component (AC239832.2, highlighted in red) added to this region for GRCh38 is a fosmid from which decoy sequence was derived (AC239832.3:6602–10993, AC239832.3:24434–27364). In this case, the decoy adds exonic sequence from MUC2 that is missing from component AC139749.4. The alignments of the decoy fragments and NT_187681.1 to the alternative scaffold are shown as gray bars (red tick, mismatch; blue tick, deletion; thin red line, insertion).