Shimna Sudheesh, Preeti Verma, John W Forster, Noel O I Cogan, Sukhjiwan Kaur.
Abstract
RNA-Seq using second-generation sequencing technologies permits generation of a reference unigene set for a given species, in the absence of a well-annotated genome sequence, supporting functional genomics studies, gene characterisation and detailed expression analysis for specific morphophysiological or environmental stress response traits. A reference unigene set for lentil has been developed, consisting of 58,986 contigs and scaffolds with an N50 length of 1719 bp. Comparison to gene complements from related species, reference protein databases, previously published lentil transcriptomes and a draft genome sequence validated the current dataset in terms of degree of completeness and utility. A large proportion (98%) of unigenes were expressed in more than one tissue, at varying levels. Candidate genes associated with mechanisms of tolerance to both boron toxicity and time of flowering were identified, which can eventually be used for the development of gene-based markers. This study has provided a comprehensive, assembled and annotated reference gene set for lentil that can be used for multiple applications, permitting identification of genes for pathway-specific expression analysis, genetic modification approaches, development of resources for genotypic analysis, and assistance in the annotation of a future lentil genome sequence.
Keywords: Illumina; de novo assembly; legume; pulse; sequence annotation; tissue-specific gene expression
Year: 2016 PMID: 27845747 PMCID: PMC5133886 DOI: 10.3390/ijms17111887
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Overview of sequencing outputs and assembly.
| Statistic | Value |
|---|---|
| Primary assembly (SOAPdenovo-Trans) | |
| Total number of filtered reads | 660,842,789 |
| Total number of reads in contig assembly | 553,644,566 |
| Total number of scaffolds and contigs | 77,778 |
| N50 (bp) | 1731 |
| Total base pairs | 76,992,636 |
| Total base pairs without 'N' | 75,665,777 |
| Secondary assembly (CAP3) | |
| Total number of scaffolds and contigs | 58,994 |
| N50 (bp) | 1719 |
| Total base pairs | 66,767,914 |
| Total base pairs without 'N' | 65,746,675 |
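The N50 values reported for the two assemblies follow the usual definition: the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the total assembled bases. The following is a minimal sketch of that calculation, not the authors' pipeline; the function name `n50` and the toy length list are illustrative assumptions, and real input would be the contig and scaffold lengths read from the assembly FASTA file.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)  # longest sequences first
    half_total = sum(ordered) / 2            # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        # The first length at which the running sum reaches half of the
        # total is the N50.
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


# Toy example (hypothetical lengths, not data from the study).
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))  # prints 400
```

In the toy example the total is 1500 bp, so the two longest sequences (500 + 400 = 900 bp) are the first to reach the 750 bp halfway point, and the N50 is 400 bp.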
Figure 1. Sequence conservation of the Cassab-derived reference unigene transcriptome in comparison to sequences from other species: (A) percentage of sequence similarity of Cassab-derived reference transcripts with sequences from other plant species in the Nr and UniRef100 databases; (B) Venn diagram summarising the distribution of BLAST matches between the Cassab-derived reference unigene transcriptome and sequences from three other legume genomes and two databases. Numbers within the Venn diagram indicate the number of sequences sharing similarity using BLAST.
Figure 2. Functional annotation of assembled Cassab-derived reference transcripts based on gene ontology (GO) categorisation: GO analysis was performed at level 2 for three main categories (biological process, molecular function, cellular component).
Overview of different assembly statistics and BLAST analysis.
| Assembly | Number of Transcripts | N50 (bp) | Average Transcript Length (bp) | Number of Transcripts with BLAST Hit to Reference Assembly | Number of Unique Reference Transcripts Having Hits |
|---|---|---|---|---|---|
| Kaur et al. (2011) assembly | 84,069 | 349 | 360 | 75,747 | 23,417 |
| Sharpe et al. (2013) assembly | 50,146 | 530 | 501 | 48,013 | 16,905 |
| Reference assembly | 58,994 | 1719 | 1132 | - | - |
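The raw counts in this table are easier to compare as proportions. The sketch below takes the counts directly from the table and derives two percentages for each earlier assembly: the share of its transcripts with a BLAST hit to the reference assembly, and the share of the 58,994 reference transcripts that received at least one hit. The helper names `percentage` and `summarise` are illustrative assumptions, and the derived percentages are not figures reported in the source.

```python
# Counts copied from the assembly comparison table.
REFERENCE_TRANSCRIPTS = 58_994

previous_assemblies = {
    "Kaur et al. (2011)": {
        "transcripts": 84_069,     # transcripts in the earlier assembly
        "with_hit": 75_747,        # of those, with a BLAST hit to the reference
        "reference_hit": 23_417,   # unique reference transcripts having hits
    },
    "Sharpe et al. (2013)": {
        "transcripts": 50_146,
        "with_hit": 48_013,
        "reference_hit": 16_905,
    },
}


def percentage(part, whole):
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def summarise(assemblies, reference_total):
    """Map each assembly name to (query hit %, reference coverage %)."""
    summary = {}
    for name, counts in assemblies.items():
        query_pct = percentage(counts["with_hit"], counts["transcripts"])
        reference_pct = percentage(counts["reference_hit"], reference_total)
        summary[name] = (query_pct, reference_pct)
    return summary


for name, (query_pct, reference_pct) in summarise(
    previous_assemblies, REFERENCE_TRANSCRIPTS
).items():
    print(
        f"{name}: {query_pct:.1f}% of transcripts hit the reference; "
        f"{reference_pct:.1f}% of reference transcripts were hit"
    )
```

Read together, the two percentages separate how well each earlier assembly is represented in the reference from how much of the reference it covers.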
Figure 3. Expression patterns in different tissue samples: The percentage of transcripts expressed in each tissue sample.
Figure 4. A heat map of the 1000 most differently expressed transcripts showing the hierarchical clustering of different tissues: The colour key represents the normalised, log-transformed counts. Red indicates high expression, white indicates intermediate expression and blue indicates low expression.
Figure 5. Schematic depictions of comparisons between (A) the boron tolerance quantitative trait loci (QTL)-containing interval on linkage group (LG) Lc IV.2 of the Cassab × ILL2024 linkage map and both lentil genome assembly v1.2 and candidate Cassab transcripts with corresponding M. truncatula boron tolerance candidate gene sequences; (B) the flowering time QTL-containing interval on LG1 of the LR-18 linkage map and both lentil genome assembly v1.2 and candidate Cassab-derived transcripts with corresponding flowering time genes from M. truncatula. LGs or pseudomolecules are labelled accordingly. The names of genetic markers are shown to the left of each LG.