Vincent Ranwez, Yan Holtz, Gautier Sarah, Morgane Ardisson, Sylvain Santoni, Sylvain Glémin, Muriel Tavaud-Pirra, Jacques David.
Abstract
BACKGROUND: With Next Generation Sequencing, SNP discovery is relatively easy in diploid species but is still hampered in polyploid species by the confusion caused by homeology. We develop HomeoSplitter, a fast and effective solution to split original contigs obtained by RNAseq into two homeologous sequences. It uses the differential expression of the two homeologous genes in the RNA. We verify that the new sequences are closer to the diploid progenitors of the allopolyploid species than the original contig. By remapping the original reads onto these new sequences, we also verify that the number of valuable detected SNPs has significantly increased.
Year: 2013 PMID: 24564644 PMCID: PMC3851826 DOI: 10.1186/1471-2105-14-S15-S15
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Overview of the SNP identification pipeline.
Figure 2. Notations and key principles of HomeoSplitter. Given a mapping on the contig (fragment) "ACCTGCT", one can count the nucleotides observed at each site for each accession (A). Questionable sites are those at which an excess of heterozygotes is observed (red arrows in A). Restricting the nucleotide counts to those questionable sites leads to the array Nb represented in (B). Even though the homeologous copies are highly differentially expressed in each accession, considering all accessions at once blurs the signal. Indeed, at the second questionable site almost the same numbers of A (43) and G (44) are observed (C). To handle this problem, HomeoSplitter uses a specific expression bias for each accession; for instance, considering the split defined by C = [2,1] (i.e., pattern "CA"), the estimated proportion of C will be ~1/4 for the first accession (average of 5/20 and 10/42) and ~4/5 for the second one.
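The arithmetic in this caption can be reproduced with a short Python sketch. It is only an illustration of the per-accession averaging described above, not HomeoSplitter's actual code. The figures 5/20, 10/42, and the pooled 43 A versus 44 G come from the caption; the alternative nucleotide "T" at the first site, all counts for the second accession, and the names `counts` and `split_proportion` are hypothetical, chosen so that the totals agree with the caption.

```python
from statistics import mean

# Read counts at the two questionable sites, per accession.
# Only 5/20, 10/42 and the pooled 43 A / 44 G are taken from the caption;
# the "T" nucleotide and every accession_2 count are hypothetical.
counts = {
    "accession_1": [{"C": 5, "T": 15}, {"A": 10, "G": 32}],
    "accession_2": [{"C": 26, "T": 4}, {"A": 33, "G": 12}],
}

def split_proportion(site_counts, pattern):
    """Average, over questionable sites, of the share of reads that carry
    the nucleotide assigned to this split at each site."""
    return mean(
        site.get(nucleotide, 0) / sum(site.values())
        for site, nucleotide in zip(site_counts, pattern)
    )

# Per-accession estimate for the split with pattern "CA".
proportions = {acc: split_proportion(sites, "CA") for acc, sites in counts.items()}

# Pooling accessions at the second site hides the expression bias.
pooled_site_2 = {
    nucleotide: sum(sites[1].get(nucleotide, 0) for sites in counts.values())
    for nucleotide in ("A", "G")
}

print(proportions)
print(pooled_site_2)
```

With these numbers the first accession gives (5/20 + 10/42) / 2, close to 1/4, while the pooled counts at the second site are nearly balanced, which is why a per-accession bias is more informative than a single pooled ratio.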
Figure 3. Distribution of Fis. Fis values were calculated as the heterozygosity deficit relative to panmixia. Allelic frequencies were estimated from the genotypes called with high confidence values (see text). Negative Fis values suggest fixed divergence between homeologous/paralogous copies. Fis values around 0 indicate a possible mixture of homeology and intra-genome polymorphism. Fis values close to 1 indicate a priori intra-genome polymorphic sites. See Figure 1 for the pipeline leading to the detection of these three SNP sets.
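As a rough illustration of how such values behave, the Python sketch below uses the standard definition Fis = 1 - Ho/He, where Ho is the observed heterozygote frequency and He = 2pq is the frequency expected under panmixia. This is an assumption: the exact estimator used in the article is not given here, and the function name `fis` and the genotype counts are hypothetical.

```python
def fis(n_hom_ref, n_het, n_hom_alt):
    """Heterozygosity deficit relative to panmixia at a biallelic site,
    computed as 1 - Ho/He (standard definition, assumed here).
    Returns None when the site is monomorphic or has no genotypes."""
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        return None
    p = (2 * n_hom_ref + n_het) / (2 * n)   # reference allele frequency
    expected_het = 2 * p * (1 - p)          # He under panmixia
    if expected_het == 0:
        return None
    observed_het = n_het / n                # Ho
    return 1 - observed_het / expected_het

# Every accession looks heterozygous: typical of fixed homeologous divergence.
all_het = fis(0, 10, 0)
# No heterozygotes at all: typical of a true polymorphism within one genome.
no_het = fis(5, 0, 5)
# Hardy-Weinberg proportions.
panmictic = fis(25, 50, 25)

print(all_het, no_het, panmictic)
```

The three calls correspond to the three regimes named in the caption: strongly negative, close to 1, and around 0.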
Figure 4. Phylogenetic tree inferred on homologous contigs from cluster 6960. The original contig obtained by de novo assembly (contig_de_novo_durum) was more similar to the speltoides contig (contig_speltoides) than to the urartu one (contig_urartu). After using HomeoSplitter, we obtained the most expressed split contig (contig_HomoeoSplitter_1), which is still similar to the speltoides contig, and a less expressed contig (contig_HomoeoSplitter_2), which is highly similar to the urartu one.