Piyarat Ponyared, Jiradej Ponsawat, Sissades Tongsima, Pusadee Seresangtakul, Chutipong Akkasaeng, Nathpapat Tantisuwichwong.
Abstract
BACKGROUND: Simple sequence repeats (SSRs) have become widely used as molecular markers in plant genetic studies because of their abundance, their high allelic variation at each locus and the ease of analyzing them by conventional PCR amplification. To study plants whose genome sequence is unknown, SSR markers must instead be derived from Expressed Sequence Tags (ESTs), which can be obtained from plant mRNA (converted to cDNA). With the advent of high-throughput sequencing technology, huge amounts of EST sequence data have been generated and are now accessible from many public databases. However, identifying SSR markers in a large in-house or public EST collection requires a computational pipeline that combines several standard bioinformatic tools to design high-quality EST-SSR primers. Some of these tools are not user-friendly and must be tightly integrated with reference genomic databases.
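As a rough illustration of the SSR-mining step, the sketch below finds perfect tandem repeats of 2 to 6 bp motifs in a single EST, in the spirit of MISA-style motif searches. It is not MISA's or ESAP Plus's code: the minimum repeat numbers in `MIN_REPEATS` are assumed values, mononucleotide repeats are skipped (the results table in this document lists only di- to hexa-nucleotide repeats), and compound or interrupted SSRs are not handled.

```python
import re

# Minimum number of tandem copies per motif length (bp). These thresholds are
# assumptions for illustration, not ESAP Plus or MISA settings.
MIN_REPEATS = {2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_compound_unit(motif: str) -> bool:
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT' = 'AT' * 2."""
    k = len(motif)
    return any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k))


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect 2-6 bp SSRs in seq."""
    seq = seq.upper()
    covered: set[int] = set()  # positions already assigned to an SSR
    found = []
    # Search shorter motifs first so 'ATATAT...' is reported as di-, not tetra-.
    for k, min_rep in sorted(MIN_REPEATS.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_compound_unit(motif):
                continue  # e.g. 'AA' (mono) or 'ATAT' (di): not a true motif of length k
            span = range(m.start(), m.end())
            if any(i in covered for i in span):
                continue  # overlaps an SSR already found with a shorter motif
            covered.update(span)
            found.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(found)
```

For example, `find_ssrs("AG" * 7 + "CC" + "ACT" * 5)` returns `[(0, 'AG', 7), (16, 'ACT', 5)]`. Scanning short motifs first is a simple way to stop a dinucleotide run from also being counted as a tetranucleotide repeat.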
Keywords: Bioinformatics; EST-SSR development pipeline; Expressed sequence tags (ESTs); Simple sequence repeats (SSRs)
Year: 2016 PMID: 28155670 PMCID: PMC5260030 DOI: 10.1186/s12864-016-3328-4
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 ESAP Plus workflow. The system is based on a three-tier architecture comprising the user interface, processing logic, and database tiers. Users interact with the pipeline through the web interface to submit input data and run the pipeline. The processing logic tier consists of multiple tasks that carry out the analysis of the input ESTs. All data and results from the pipeline are stored in the database tier. Users can view and download the stored data through the ESAP Plus web interface.
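The processing-logic and database tiers can be pictured as a chain of stages whose intermediate counts are written to a table that the user-interface tier could later query for its result pages. This is a hypothetical sketch using an in-memory SQLite database: `filter_high_n` mirrors the N > 5% pre-processing filter, `collapse_duplicates` is a toy stand-in for clustering (not CD-HIT-EST or TGICL), and the `stage_counts` schema is an assumption rather than ESAP Plus's actual database design.

```python
import sqlite3


def filter_high_n(seqs: dict[str, str], max_n_fraction: float = 0.05) -> dict[str, str]:
    """Pre-processing stage: drop ESTs whose fraction of N bases exceeds the threshold."""
    return {sid: s for sid, s in seqs.items()
            if s.upper().count("N") / max(len(s), 1) <= max_n_fraction}


def collapse_duplicates(seqs: dict[str, str]) -> dict[str, str]:
    """Toy stand-in for clustering: keep the first ID seen for each identical sequence."""
    first_id: dict[str, str] = {}
    for sid, s in seqs.items():
        first_id.setdefault(s.upper(), sid)
    return {sid: s for s, sid in first_id.items()}


def run_project(conn, project, seqs, stages):
    """Processing-logic tier: run stages in order and record counts in the database tier."""
    conn.execute("CREATE TABLE IF NOT EXISTS stage_counts "
                 "(project TEXT, stage TEXT, n_sequences INTEGER)")
    conn.execute("INSERT INTO stage_counts VALUES (?, ?, ?)", (project, "raw", len(seqs)))
    for name, stage in stages:
        seqs = stage(seqs)
        conn.execute("INSERT INTO stage_counts VALUES (?, ?, ?)",
                     (project, name, len(seqs)))
    conn.commit()
    return seqs


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    ests = {"e1": "ACGTACGTAC", "e2": "ACGTACGTAC", "e3": "NNNACGTACG"}
    run_project(conn, "demo", ests,
                [("n_filter", filter_high_n), ("dedup", collapse_duplicates)])
    # The kind of query a result page in the user-interface tier might run:
    for row in conn.execute("SELECT stage, n_sequences FROM stage_counts "
                            "WHERE project = ? ORDER BY rowid", ("demo",)):
        print(row)
```

Keeping each stage as a plain function from sequences to sequences makes it easy to swap tools per project while the database tier records the same per-stage statistics regardless of which tools ran.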
Fig. 2 A screenshot of the ESAP Plus interface. a Upload section: users upload input data into the pipeline. b Software parameters can be configured at every stage of the pipeline. At the clustering step, users can choose between two software tools for clustering and/or assembling ESTs, namely CD-HIT-EST and TGICL. Two alternatives are also offered for SSR mining: MISA and RepeatMasker. c The user page shows the status of all of a user's projects, such as jobs under configuration, running jobs, and completed jobs.
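The per-project tool choices in panel b can be modelled as a small validated configuration object. In this hypothetical sketch, `PipelineConfig` and its field names are illustrative; only the tool names come from the caption. Enumerating both clustering options against both SSR miners gives the four software combinations reported in the intermediate-results table.

```python
from dataclasses import dataclass
from itertools import product

# Tool choices named in the ESAP Plus description.
CLUSTERING_TOOLS = ("CD-HIT-EST", "TGICL")
SSR_MINERS = ("MISA", "RepeatMasker")


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical per-project configuration; field names are illustrative."""
    clustering: str = "CD-HIT-EST"
    ssr_miner: str = "MISA"

    def __post_init__(self):
        # Reject tools the pipeline does not offer at that step.
        if self.clustering not in CLUSTERING_TOOLS:
            raise ValueError(f"unknown clustering tool: {self.clustering}")
        if self.ssr_miner not in SSR_MINERS:
            raise ValueError(f"unknown SSR miner: {self.ssr_miner}")


def all_configurations() -> list[PipelineConfig]:
    """Every clustering x SSR-mining combination, clustering tool varying slowest."""
    return [PipelineConfig(c, m) for c, m in product(CLUSTERING_TOOLS, SSR_MINERS)]
```

Validating in `__post_init__` means an invalid combination fails when the project is configured rather than partway through a long run.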
Fig. 3 Overview of the ESAP Plus result page. ESAP Plus offers a graphical web interface that lets users visualize statistics gathered while the input ESTs are processed. a An overview of a project is shown as a pie chart. b Users can view detailed information such as sequence length and N detection. c SSR types, motifs, and numbers of repeats are reported together with details of each SSR primer pair, e.g., forward and reverse primers, position, and expected product size.
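Two of the fields reported in panel c are derived values: the SSR type follows from the motif length, and the expected product size follows from the primer coordinates. A minimal sketch, assuming 0-based forward-strand coordinates in which `rev_end` is the rightmost base covered by the reverse primer; the function names and this coordinate convention are assumptions, not ESAP Plus or BatchPrimer3 output fields.

```python
from collections import Counter

# Type labels used in the results table for motif lengths of 2-6 bp.
SSR_TYPE_NAMES = {2: "di-NTR", 3: "tri-NTR", 4: "tetra-NTR", 5: "penta-NTR", 6: "hexa-NTR"}


def ssr_type(motif: str) -> str:
    """Classify an SSR by motif length, e.g. 'AG' -> 'di-NTR'."""
    return SSR_TYPE_NAMES.get(len(motif), "other")


def count_types(motifs) -> Counter:
    """Tally SSR types across a collection of motifs."""
    return Counter(ssr_type(m) for m in motifs)


def expected_product_size(fwd_start: int, rev_end: int) -> int:
    """Length in bp of the amplicon spanned by a primer pair.

    Assumes 0-based forward-strand coordinates: fwd_start is the first base of the
    forward primer and rev_end is the last (rightmost) base covered by the reverse primer.
    """
    if rev_end < fwd_start:
        raise ValueError("reverse primer must lie downstream of the forward primer")
    return rev_end - fwd_start + 1
```

Under this convention, a forward primer starting at position 10 and a reverse primer ending at position 209 give `expected_product_size(10, 209) == 200`.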
Number of intermediate EST results at each major ESAP Plus stage

| Step 1. Pre-processing (all software options) | Number of ESTs |
|---|---|
| Raw ESTs | 232352 |
| N > 5% | 1322 |
| Vector trimming | 42594 |
| Vector-contaminated deletion | 62 |
| Low-complexity masking | 5194 |
| High-quality ESTs | 226078 |

| Steps 2 to 4 (clustering + SSR mining) | CD-HIT-EST + MISA | CD-HIT-EST + RepeatMasker | TGICL + MISA | TGICL + RepeatMasker |
|---|---|---|---|---|
| Step 2. Clustering and assembly | | | | |
| Number of NR clusters/ASs | 142788 | 142788 | 65713 | 65713 |
| Total length (bp) | 95445342 | 95445342 | 51256243 | 51256243 |
| Step 3. SSR identification | | | | |
| Number of NR clusters/ASs containing SSRs | 9327 | 25708 | 5412 | 14490 |
| Total number of identified SSRs | 10240 | 32021 | 5951 | 18526 |
| Number of SSRs by type | | | | |
| di-NTRs | 2446 | 1784 | 1203 | 884 |
| tri-NTRs | 7311 | 15410 | 4446 | 9291 |
| tetra-NTRs | 247 | 2419 | 159 | 1341 |
| penta-NTRs | 128 | 2524 | 77 | 1355 |
| hexa-NTRs | 108 | 9884 | 66 | 5655 |
| Average distance between SSRs (kbp per SSR) | 9.32 | 2.98 | 8.61 | 2.77 |
| Step 4. Primer design | | | | |
| Number of successfully designed SSR primer pairs | 7850 | 6272 | 4613 | 3783 |
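The SSR-frequency row can be reproduced from the other rows: dividing the total length of the NR clusters/ASs by the total number of identified SSRs gives the average number of kilobases per SSR (so 9.32 means roughly one SSR every 9.32 kbp, not 9.32 SSRs per kbp). The sketch below uses only values copied from the table and also checks that the per-type counts add up to each total.

```python
# Values copied from the intermediate-results table.
TOTAL_LENGTH_BP = {"CD-HIT-EST": 95445342, "TGICL": 51256243}

# (clustering tool, SSR miner) -> counts of di-, tri-, tetra-, penta-, hexa-NTRs
TYPE_COUNTS = {
    ("CD-HIT-EST", "MISA"): [2446, 7311, 247, 128, 108],
    ("CD-HIT-EST", "RepeatMasker"): [1784, 15410, 2419, 2524, 9884],
    ("TGICL", "MISA"): [1203, 4446, 159, 77, 66],
    ("TGICL", "RepeatMasker"): [884, 9291, 1341, 1355, 5655],
}

# (clustering tool, SSR miner) -> total number of identified SSRs
SSR_TOTALS = {
    ("CD-HIT-EST", "MISA"): 10240,
    ("CD-HIT-EST", "RepeatMasker"): 32021,
    ("TGICL", "MISA"): 5951,
    ("TGICL", "RepeatMasker"): 18526,
}


def kbp_per_ssr(total_length_bp: int, n_ssrs: int) -> float:
    """Average sequence length, in kbp, per identified SSR."""
    return total_length_bp / n_ssrs / 1000


if __name__ == "__main__":
    for (clusterer, miner), total in SSR_TOTALS.items():
        # The per-type counts should add up to the reported total.
        assert sum(TYPE_COUNTS[(clusterer, miner)]) == total
        distance = kbp_per_ssr(TOTAL_LENGTH_BP[clusterer], total)
        print(f"{clusterer} + {miner}: one SSR per {distance:.2f} kbp")
```

Run as a script, it prints 9.32, 2.98, 8.61 and 2.77 kbp per SSR for the four combinations, matching the reported row.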
Feature comparison of publicly available EST analysis pipelines

| Pipeline | Pre-processing | Clustering and assembly | SSR mining | Primer design | Website | Output |
|---|---|---|---|---|---|---|
| PESTAS | Phred, Cross_match, SeqClean, RepeatMasker | TGICL | - | - | JSP | View on website |
| ESTpass | Cross_match, RepeatMasker | CAP3, d2_cluster | - | - | HTML, Java | View on website, Download file |
| ESMP | Cross_match, Trimest | CAP3 | MISA | - | HTML, CSS, JavaScript, PHP | View on website, Download file (.rar) |
| ESAP Plus | Length_N.pl, SeqClean, RepeatMasker | CD-HIT-EST, TGICL | MISA, RepeatMasker | BatchPrimer3 | HTML, CSS, JavaScript, PHP | View on website, Download file (.zip) |