Amina Khan, Eric J Belfield, Nicholas P Harberd, Aziz Mithani.
Abstract
Characterization of homoeallelic base-identity in allopolyploids is difficult because homeologous subgenomes are closely related, and it becomes more challenging still when diploid-progenitor data are missing. We present HANDS2, a next-generation sequencing-based tool that enables highly accurate (>90%) genome-wide discovery of homeolog-specific base-identity in allopolyploids even in the absence of a diploid-progenitor. We applied HANDS2 to the transcriptomes of various cruciferous plants belonging to the genus Brassica. Our results suggest that the three C genomes in Brassica are more similar to each other than the three A genomes, and provide important insights into the relationships between various Brassica tetraploids and their diploid-progenitors at single-base resolution.
Year: 2016 PMID: 27378447 PMCID: PMC4932600 DOI: 10.1038/srep29234
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Characterization of homoeallelic base-identities using HANDS2.
(a) Illustration of Homeolog-Specific Polymorphisms (HSPs) in the B. napus genome using RNA-seq data. RNA-seq reads from B. napus (allotetraploid) and its two diploid-progenitors (B. rapa and B. oleracea) were aligned against the B. rapa transcriptomic reference sequence. Bases that match the reference sequence are shown in grey and base substitutions (versus the reference sequence) are shown in other colours. HSP positions are marked with arrows. (b) An example of using HANDS2 to assign homoeallelic base-identities at HSP positions. As input, HANDS2 takes a Sequence Alignment/Map (SAM) file of the polyploid, the start and end coordinates of genes/contigs, a list of HSPs in the polyploid, the lists of single base substitutions (SBSs) in the diploid-progenitors, and optional coverage files (for the validation of HSPs and SBSs). It then assigns homoeallelic base-identities in the polyploid using a six-step algorithm: creation of base patterns from aligned reads; filtering of potential sequencing errors and embedded base patterns; iterative merging of overlapping base patterns; assignment of base patterns to subgenomes; assignment of bases to subgenomes using the assigned base patterns; and finalization of base assignments to subgenomes. A green star indicates a new step introduced in HANDS2 whereas a blue star indicates an improvement in HANDS2 over HANDS (see text for details). (c) Iterative merging of overlapping base patterns. The base patterns are depicted as grey lines with colours indicating the overlap between different patterns. The patterns are iteratively merged such that the two patterns with the longest overlap are merged first, followed by the pair with the second longest overlap, and so on.
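The iterative merging shown in panel (c) can be illustrated with a minimal Python sketch. It is not the HANDS2 implementation: the representation of a base pattern as a dictionary from HSP position to base, the rule that two patterns overlap only when they agree at every shared position, and the function names `overlap` and `merge_patterns` are all assumptions made for illustration.

```python
from itertools import combinations

def overlap(p, q):
    """Number of shared positions if p and q agree on all of them, else 0.

    Assumption: a base pattern is a dict {position: base}.
    """
    shared = p.keys() & q.keys()
    if shared and all(p[pos] == q[pos] for pos in shared):
        return len(shared)
    return 0

def merge_patterns(patterns):
    """Repeatedly merge the pair of patterns with the longest overlap."""
    patterns = [dict(p) for p in patterns]  # work on copies
    while True:
        # Score every pair; (0, -1, -1) is used when fewer than two patterns remain.
        score, i, j = max(
            ((overlap(p, q), i, j)
             for (i, p), (j, q) in combinations(enumerate(patterns), 2)),
            default=(0, -1, -1),
        )
        if score == 0:
            return patterns  # no compatible overlapping pair is left
        merged = {**patterns[i], **patterns[j]}
        patterns = [p for k, p in enumerate(patterns) if k not in (i, j)]
        patterns.append(merged)

# Hypothetical patterns: three compatible fragments and one conflicting pattern.
reads = [
    {100: "A", 105: "C"},
    {105: "C", 110: "G", 120: "T"},
    {110: "G", 120: "T", 130: "A"},
    {100: "T", 105: "G"},
]
result = merge_patterns(reads)
```

In this toy input the second and third patterns share two positions and are merged first; the first pattern is then merged into the result through its single shared position, while the conflicting fourth pattern stays separate.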
Performance comparison of HANDS2 versus HANDS using T. aestivum data.
| Chromosome | Missing Diploid | Subgenome | HANDS: Positions Assigned | HANDS: Filtered Positions* | HANDS: Correct Assignments§ | HANDS2: Positions Assigned | HANDS2: Filtered Positions* | HANDS2: Correct Assignments§ |
|---|---|---|---|---|---|---|---|---|
| Chr 1 | None | A | 28,324 | 26,478 (93.48%) | 26,005 (98.21%) | 34,154 | 32,756 (95.91%) | 32,079 (97.93%) |
| Chr 1 | None | B | 27,453 | 25,555 (93.09%) | 24,714 (96.71%) | 32,943 | 31,456 (95.49%) | 30,408 (96.67%) |
| Chr 1 | None | D | 28,788 | 26,592 (92.37%) | 26,287 (98.85%) | 34,727 | 33,220 (95.66%) | 32,690 (98.40%) |
| Chr 1 | | A | Option not available | Option not available | Option not available | 32,989 | 32,005 (97.02%) | 31,237 (97.60%) |
| Chr 1 | | B | Option not available | Option not available | Option not available | 33,492 | 29,273 (87.40%) | 28,171 (96.24%) |
| Chr 1 | | D | Option not available | Option not available | Option not available | 33,439 | 32,499 (97.19%) | 31,958 (98.34%) |
| Chr 5 | None | A | 34,553 | 32,166 (93.09%) | 31,677 (98.48%) | 42,209 | 40,339 (95.57%) | 39,539 (98.02%) |
| Chr 5 | None | B | 33,499 | 31,140 (92.96%) | 30,141 (96.79%) | 40,825 | 38,970 (95.46%) | 37,560 (96.38%) |
| Chr 5 | None | D | 36,363 | 33,435 (91.95%) | 33,053 (98.86%) | 44,406 | 42,545 (95.81%) | 41,823 (98.30%) |
| Chr 5 | | A | Option not available | Option not available | Option not available | 40,904 | 39,522 (96.62%) | 38,762 (98.08%) |
| Chr 5 | | B | Option not available | Option not available | Option not available | 42,802 | 37,262 (87.06%) | 35,913 (96.38%) |
| Chr 5 | | D | Option not available | Option not available | Option not available | 43,043 | 41,714 (96.91%) | 41,031 (98.36%) |
*Positions where all HSP bases were assigned to the three sub-genomes, the genome was not silenced in the hexaploid and the diploid had an unambiguous base with read coverage ≥3.
§Assignments were evaluated against the assignments made using nullisomic-tetrasomic lines. See text for details.
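The percentages in the table above appear to follow a simple chain: filtered positions are reported as a share of positions assigned, and correct assignments as a share of filtered positions. The short Python check below reproduces the HANDS2 values for Chr 1, subgenome A, with no missing diploid; the helper name `pct` is introduced here for illustration only.

```python
def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to two decimals."""
    return round(100 * part / whole, 2)

# HANDS2, Chr 1, subgenome A, no missing diploid (values from the table).
assigned, filtered, correct = 34_154, 32_756, 32_079

filtered_pct = pct(filtered, assigned)  # 95.91
correct_pct = pct(correct, filtered)    # 97.93
```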
Figure 2. Characterization of homoeallelic base-identities in Brassica.
Diploid species are depicted using single circles whereas tetraploids are shown with double circles, with colours corresponding to their diploid-progenitors. The arrows represent the relationships between different tetraploid subgenomes and their corresponding diploid-progenitors whereas the dashed lines represent the relationships between corresponding subgenomes in different tetraploids. The first number along these lines represents the number of shared bases between a subgenome and its diploid-progenitor (arrow), or between the two subgenomes (dashed line), whereas the second number represents the number of shared positions between them. The number inside the dotted triangle is the number of HSP positions shared between all three tetraploids, and the numbers in colours along the dotted lines represent the number of shared bases in the corresponding subgenomes (A: red, B: blue, and C: green) at these common positions. The coloured numbers along the solid lines represent the number of shared bases between the two subgenomes and the corresponding diploid-progenitor.
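The distinction between shared positions and shared bases in this caption can be made concrete with a small Python sketch. It assumes, purely for illustration, that each subgenome or diploid is summarised as a dictionary from position to assigned base; the function name `shared_counts` and the toy data are hypothetical and are not taken from HANDS2.

```python
def shared_counts(a, b):
    """Return (shared bases, shared positions) for two {position: base} maps.

    Shared positions are those present in both maps; shared bases are the
    subset of those positions at which the two maps carry the same base.
    """
    common = a.keys() & b.keys()
    same = sum(1 for pos in common if a[pos] == b[pos])
    return same, len(common)

# Toy data: two positions are common to both maps, one of which has the same base.
subgenome = {1: "A", 2: "C", 3: "G"}
progenitor = {2: "C", 3: "T", 4: "A"}
counts = shared_counts(subgenome, progenitor)  # (1, 2)
```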
HSP characterization in B. napus using HANDS2.
| Reference | Missing Diploid | Subgenome | Positions Assigned | Filtered Positions* | Correct Assignments§ |
|---|---|---|---|---|---|
| | None | A/C | 495,164 | 448,972 (90.67%) | – |
| | | A/C | 495,164 | 467,321 (94.38%) | 430,030 (92.02%) |
| | | A/C | 495,164 | 461,528 (93.21%) | 424,215 (91.92%) |
| | None | A/C | 502,716 | 458,244 (91.15%) | – |
| | | A/C | 502,716 | 467,382 (92.97%) | 432,353 (92.51%) |
| | | A/C | 502,716 | 474,560 (94.40%) | 439,153 (92.54%) |
*Positions where all HSP bases were assigned to the two sub-genomes. These positions were used for further analysis.
§Compared to complete dataset.