Richard M Sharpe, Tyson Koepke, Artemus Harper, John Grimes, Marco Galli, Mio Satoh-Cruz, Ananth Kalyanaraman, Katherine Evans, David Kramer, Amit Dhingra.
Abstract
High-throughput sequencing continues to produce an immense volume of information that is processed and assembled into mature sequence data. Data analysis tools are urgently needed that leverage the embedded DNA sequence polymorphisms and consequent changes to restriction sites or sequence motifs in a high-throughput manner to enable biological experimentation. CisSERS was developed as a standalone open source tool to analyze sequence datasets and provide biologists with individual or comparative genome organization information in terms of the presence and frequency of patterns or motifs such as restriction enzyme sites. Predicted agarose gel visualization of the custom analysis results was also integrated to enhance the usefulness of the software. CisSERS offers several novel functionalities, such as handling of large and multiple datasets in parallel, multiple restriction enzyme site detection, and custom motif detection, all of which are integrated with real-time agarose gel visualization. Using a simple FASTA-formatted file as input, CisSERS utilizes the REBASE enzyme database. Results from CisSERS enable the user to make decisions for designing genotyping-by-sequencing experiments, reduced representation sequencing, 3'UTR sequencing, and cleaved amplified polymorphic sequence (CAPS) molecular markers for large sample sets. CisSERS is a Java-based graphical user interface built around a Perl backbone. Several of the applications of CisSERS, including CAPS molecular marker development, were successfully validated using wet-lab experimentation. Here, we present the tool CisSERS and results from in silico and corresponding wet-lab analyses demonstrating that CisSERS is a technology platform solution that facilitates efficient data utilization in genomics and genetics studies.
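
The core idea behind a predicted gel lane can be illustrated with a minimal Python sketch. This is not the CisSERS implementation (which the abstract describes as Java around a Perl backbone); the function names `find_cut_positions` and `digest` and the toy sequence are hypothetical. It assumes a linear molecule, a complete digest, and a palindromic recognition site, so only one strand is scanned.

```python
def find_cut_positions(seq, site, cut_offset):
    """Return 0-based cut positions for every occurrence of `site` in `seq`.

    `cut_offset` is the number of bases from the start of the site to the cut.
    Overlapping occurrences are reported because the search resumes at start + 1.
    """
    seq, site = seq.upper(), site.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        start = seq.find(site, start + 1)
    return positions


def digest(seq, site, cut_offset):
    """Return fragment lengths, in sequence order, after a complete digest."""
    cuts = find_cut_positions(seq, site, cut_offset)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# EcoRI recognizes GAATTC and cuts after the first G (G^AATTC).
# The toy sequence below is made up for illustration only.
toy = "AAAA" + "GAATTC" + "CCCCCC" + "GAATTC" + "TT"
print(digest(toy, "GAATTC", 1))  # [5, 12, 7]
```

Sorting the returned lengths from largest to smallest gives the top-to-bottom band order expected on an agarose gel.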
Year: 2016 PMID: 27071032 PMCID: PMC4829253 DOI: 10.1371/journal.pone.0152404
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Restriction Analysis Program Comparison.
A comparison of some essential traits of CisSERS and other restriction site analysis programs highlights the advantages of CisSERS and some of the shared components with previously available tools. Many of these tools were designed for CAPS marker or derived CAPS (dCAPS) marker development and each has varying limitations.
| Program | Web-based | Automated decision making | Primer design | Data input type | Enzyme list | Primary Functions | Predicted gel image | Major limitation | Citation |
|---|---|---|---|---|---|---|---|---|---|
| CisSERS | No | No | No | FASTA or multi-FASTA | REBASE with customization | Multiple digest site analyses | Yes | Processing resource limited | |
| dCAPS Finder 2.0 | Yes | No | Yes | Requires 2 sequences | Preset database | CAPS or dCAPS design | No | 2–60 base sequences | |
| BlastDigester | Yes | Yes | Yes | Multi-FASTA | Unknown | CAPS design | No | Limited by BLAST | |
| SNP2CAPS | No | Partial | No | Alignment file | User input | CAPS design | No | Multiple alignments | |
| CapsID | Yes | Yes | Yes | Alignments | Unknown | CAPS design | Yes | Unknown | |
| SNP cutter | Yes | No | Yes | dbSNP or preformatted SNP file | Premade lists using REBASE | CAPS or dCAPS design | No | Format dependent | |
| SNP-RFLPing | Yes | Partial | Yes | SNPs | REBASE | CAPS or dCAPS design | No | Human and rat only | |
| NEBcutter | Yes | No | Yes | FASTA | REBASE | Comprehensive digest site analysis | Yes | Max file size 1 Mb, max sequence length 300 kb | |
Fig 1. CisSERS experimental flow chart.
A graphical depiction of the three phases of the CisSERS process and their subsections.
AT rich transcription initiation motifs, total motif sites and number of identified motifs associated with annotated gene transcription initiation areas.
| Motif Name | Motif | Total number of identified sites | Total number of gene-associated identified sites |
|---|---|---|---|
| weakTss-10-4N | | 4847 | 227 |
| weakTss-10-5N | | 4715 | 220 |
| weakTss-10-6N | | 4759 | 206 |
| weakTss-10-7N | | 4769 | 197 |
| weakTss-10-8N | | 4585 | 192 |
| weakTss-10-9N | | 4544 | 196 |
| weakTss-10-10N | | 4669 | 208 |
| weakTss-10-11N | | 4565 | 212 |
| weakTss-10-12N | | 4488 | 217 |
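
Custom motifs with variable-length `N` spacers, as the motif names above suggest, can be searched by translating IUPAC nucleotide codes into a regular expression. The sketch below is one possible approach in Python, not the method CisSERS uses; the helper names and the motif `TATAATNNNNA` are hypothetical, since the actual motif sequences are not given here.

```python
import re

# Standard IUPAC nucleotide ambiguity codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a lookahead regex so overlapping hits are found."""
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(seq, motif):
    """Return 0-based start positions of every (possibly overlapping) motif hit."""
    return [m.start() for m in motif_to_regex(motif).finditer(seq.upper())]


# Hypothetical motif: a TATAAT box, a 4-base N spacer, then an A.
print(find_motif("GGTATAATACGTAGG", "TATAATNNNNA"))  # [2]
print(len(find_motif("AAAA", "AA")))                 # 3 overlapping hits
```

The zero-width lookahead is a design choice: a plain pattern would skip hits that overlap a previous match, which would undercount sites in low-complexity AT-rich regions.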
CisSERS summary table of analysis of potential polyA initiation sites from 1,816,638 Arabidopsis cDNAs of the ATH_cDNA_EST_sequences_FASTA dataset from the ftp://ftp.arabidopsis.org/home/tair/Sequences/ website.
| PolyA Motif Name | Motif | Number of Motifs Found in the EST Dataset | Percentage of ESTs with Motif Found |
|---|---|---|---|
| PolyA_Init_1 | | 358,447 | 19.73% |
| PolyA_Init_10 | | 117,536 | 6.47% |
| PolyA_Init_11 | | 169,316 | 9.32% |
| PolyA_Init_2 | | 446,967 | 24.60% |
| PolyA_Init_3 | | 152,030 | 8.37% |
| PolyA_Init_4 | | 152,717 | 8.41% |
| PolyA_Init_5 | | 142,002 | 7.82% |
| PolyA_Init_6 | | 224,897 | 12.38% |
| PolyA_Init_7 | | 166,592 | 9.17% |
| PolyA_Init_8 | | 174,486 | 9.60% |
| PolyA_Init_9 | | 135,197 | 7.44% |
| PolyA_Init_canonical | | 246,069 | 13.55% |
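
The percentages in this table are consistent with dividing each count by the 1,816,638 cDNAs named in the caption. The short Python check below reproduces three of the rows under that assumption; `percent_of_total` is a hypothetical helper, and the reading that each count is compared against the full dataset size is inferred from the numbers rather than stated in the source.

```python
TOTAL_ESTS = 1_816_638  # dataset size given in the table caption

# Counts copied from three rows of the table above.
counts = {
    "PolyA_Init_1": 358_447,
    "PolyA_Init_2": 446_967,
    "PolyA_Init_canonical": 246_069,
}


def percent_of_total(count, total=TOTAL_ESTS):
    """Format count/total as a percentage with two decimal places."""
    return f"{100 * count / total:.2f}%"


for name, count in counts.items():
    print(name, percent_of_total(count))
# PolyA_Init_1 19.73%
# PolyA_Init_2 24.60%
# PolyA_Init_canonical 13.55%
```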
Fig 2. Demonstration case 1: CisSERS predicted gel image vs. wet-lab gel visualization test with the Arabidopsis ATPC1, cfq mutant sequences.
The two sequences were linked in CisSERS to create the predicted “F1 het” lane, while DNA from an F1 heterozygous plant was analyzed and labeled “F1 het” in the wet-lab validation image. The banding patterns of all three samples in the CisSERS prediction match the wet-lab validation, confirming that CisSERS can identify effective CAPS marker enzymes.
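
Linking two allele sequences into one predicted heterozygote lane can be thought of as taking the union of the band sizes from each allele's digest. The Python sketch below illustrates that idea with made-up 16 bp alleles, not the ATPC1 or cfq sequences; `digest` and `linked_lane` are hypothetical names, and the union rule is an assumption about how a linked lane is composed.

```python
def digest(seq, site, cut_offset):
    """Return fragment lengths after a complete digest of a linear sequence."""
    seq, site = seq.upper(), site.upper()
    bounds, start = [0], seq.find(site)
    while start != -1:
        bounds.append(start + cut_offset)
        start = seq.find(site, start + 1)
    bounds.append(len(seq))
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


def linked_lane(*lanes):
    """Merge band sizes from several digests into one lane, largest band first."""
    return sorted(set().union(*lanes), reverse=True)


# Toy alleles: a single C-to-T change abolishes the EcoRI site (GAATTC) in allele_b.
allele_a = "AAAA" + "GAATTC" + "CCCCCC"
allele_b = "AAAA" + "GAATTT" + "CCCCCC"

lane_a = digest(allele_a, "GAATTC", 1)   # cut allele: two bands
lane_b = digest(allele_b, "GAATTC", 1)   # uncut allele: one band
print(linked_lane(lane_a, lane_b))       # [16, 11, 5]
```

A CAPS marker is informative exactly when the two parental lanes differ, so the heterozygote lane shows both the uncut band and the cut fragments.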
Fig 3. Demonstration case 2: CisSERS predicted gel image vs. wet-lab gel visualization test with the Galli sequences.
A. CisSERS predicted gel image of 12 identified alleles from cDNA clones of 8 apple cultivars, and 2 linked gel images (Gold_Del and Red_Grav). B. Wet-lab electrophoresed gel image of amplified products (#a) and the corresponding restriction digests (#b): 1. ‘Macintosh’, 2. ‘Winesap’, 3. ‘Red Gravenstein’, 4. ‘Haralson’, 5. ‘Cox’s Orange Pippin’, 6. ‘Braeburn’, 7. ‘HoneyCrisp’, 8. ‘Golden Delicious’, MM = 100 bp DNA molecular marker. Analysis of the individual cultivars (A: Haralson 2 and B: 4b) suggests that ‘Haralson’ is homozygous for the sequenced allele; (A: Macintosh 9, Macintosh 2, Macintosh 5 and B: 1b) indicates that not every allele present in ‘Macintosh’ has been sequenced yet; and (A: Cox_Org 10, Cox_Org 5 and B: 5b) likewise indicates that not every allele of ‘Cox’s Orange Pippin’ has been sequenced yet.
Restriction enzymes identified with the highest predicted cut site percentages of the Arabidopsis cDNA dataset.
| Motif | Cut Seq | # Seqs Cut | # Total Cuts | Percent of Total Sequences Cut | Motif | Cut Seq | # Seqs Cut | # Total Cuts | Percent of Total Sequences Cut |
|---|---|---|---|---|---|---|---|---|---|
| AatII* | | 66,910 | 70,203 | 4.37% | BsgI | | 133,382 | 145,615 | 8.72% |
| AccI* | | 607,694 | 884,956 | 39.73% | BsiWI* | | 35,290 | 36,309 | 2.31% |
| AccIII* | | 90,096 | 95,917 | 5.89% | Bso31I* | | 258,799 | 300,090 | 16.92% |
| AclI* | | 128,216 | 137,480 | 8.38% | Bsp1286I* | | 745,627 | 1,233,399 | 48.74% |
| AcsI* | | 890,483 | 1,555,052 | 58.21% | Bsp1407I* | | 34,156 | 37,236 | 2.23% |
| AcuI* | | 330,051 | 398,938 | 21.58% | BspHI* | | 43,037 | 46,935 | 2.81% |
| AflII* | | 132,971 | 141,810 | 8.69% | BspLI* | | 336,757 | 762,139 | 22.01% |
| AflIII | | 460,914 | 617,710 | 30.13% | BssECI* | | 563,045 | 1,374,552 | 36.81% |
| ApaI* | | 21,606 | 23,212 | 1.41% | BssHII* | | 8,030 | 8,266 | 0.52% |
| ApaLI* | | 47,527 | 49,058 | 3.11% | Bst1107I* | | 54,764 | 56,807 | 3.58% |
| AvaI* | | 462,908 | 609,100 | 30.26% | Bst6I* | | 657,095 | 1,014,786 | 42.96% |
| BaeGI* | | 76,258 | 93,761 | 4.99% | BstBAI* | | 412,601 | 528,315 | 26.97% |
| BalI* | | 111,189 | 119,300 | 7.27% | BstC8I* | | 451,250 | 715,776 | 29.50% |
| BamHI | | 113,866 | 122,596 | 7.44% | BstDSI* | | 458,532 | 593,070 | 29.98% |
| BanI* | | 458,532 | 593,070 | 29.98% | BstMCI* | | 336,221 | 417,841 | 21.98% |
| BanII* | | 462,908 | 609,100 | 30.26% | BstSFI* | | 191,390 | 266,617 | 12.51% |
| Bbv12I* | | 565,452 | 815,716 | 36.96% | BstV2I* | | 412,497 | 525,230 | 26.97% |
| BclI* | | 205,509 | 231,475 | 13.43% | BstX2I* | | 665,518 | 1,048,893 | 43.51% |
| BfuAI* | | 74,558 | 81,931 | 4.87% | BsuI* | | 91,341 | 100,997 | 5.97% |
| BglII | | 235,829 | 269,983 | 15.42% | BtgZI | | 143,795 | 155,002 | 9.40% |
| BlnI* | | 36,333 | 37,306 | 2.38% | BtsI | | 170,608 | 190,109 | 11.15% |
| BmuI* | | 68,520 | 73,919 | 4.48% | Cfr10I* | | 447,942 | 605,631 | 29.28% |
| BpmI* | | 238,862 | 274,263 | 15.61% | ClaI* | | 147,263 | 161,798 | 9.63% |
| BpuEI | | 344,371 | 414,909 | 22.51% | DraI | | 221,889 | 257,617 | 14.51% |
| BsaWI | | 610,391 | 902,258 | 39.90% | EaeI* | | 447,942 | 605,631 | 29.28% |
| Bse3DI* | | 213,691 | 236,982 | 13.97% | EciI | | 150,946 | 170,687 | 9.87% |
| BseRI | | 479,092 | 689,960 | 31.32% | Eco47III* | | 33,781 | 35,254 | 2.21% |
| Eco52I* | | 39,284 | 42,514 | 2.57% | NspV* | | 93,554 | 100,814 | 6.12% |
| Eco57MI | | 495,072 | 672,679 | 32.36% | PciI* | | 132,704 | 143,845 | 8.68% |
| EcoRI | | 132,547 | 142,270 | 8.66% | PinAI* | | 109,189 | 118,721 | 7.14% |
| EcoRV* | | 140,416 | 152,012 | 9.18% | PsiI* | | 152,592 | 167,870 | 9.98% |
| Esp3I* | | 135,776 | 159,958 | 8.88% | PspCI* | | 42,489 | 46,326 | 2.78% |
| FspI* | | 33,983 | 34,697 | 2.22% | PstI | | 130,827 | 143,216 | 8.55% |
| HaeII* | | 261,757 | 314,036 | 17.11% | PvuI* | | 59,418 | 62,867 | 3.88% |
| Hin1I* | | 418,344 | 541,230 | 27.35% | PvuII | | 156,402 | 173,376 | 10.22% |
| HincII* | | 652,459 | 970,197 | 42.65% | SacI* | | 180,014 | 198,914 | 11.77% |
| HindIII | | 353,375 | 429,786 | 23.10% | SacII* | | 34,831 | 36,071 | 2.28% |
| HpaI* | | 103,809 | 109,093 | 6.79% | SalI | | 72,616 | 77,632 | 4.75% |
| Hpy166II* | | 660,473 | 2,352,103 | 43.18% | ScaI* | | 110,216 | 115,920 | 7.21% |
| Hpy188III | | 1,310,936 | 4,488,839 | 85.70% | SmaI* | | 37,962 | 39,858 | 2.48% |
| KpnI* | | 59,800 | 61,722 | 3.91% | SmlI* | | 779,347 | 1,353,965 | 50.95% |
| MluI | | 29,567 | 30,996 | 1.93% | SnaBI* | | 43,318 | 44,707 | 2.83% |
| MmeI | | 454,882 | 595,189 | 29.74% | SpeI* | | 63,608 | 66,363 | 4.16% |
| MspA1I | | 468,051 | 617,393 | 30.60% | SphI* | | 51,290 | 53,131 | 3.35% |
| MunI* | | 122,886 | 130,536 | 8.03% | SspI | | 162,718 | 182,345 | 10.64% |
| NaeI* | | 48,336 | 50,517 | 3.16% | StuI* | | 66,041 | 68,945 | 4.32% |
| NarI* | | 39,029 | 39,944 | 2.55% | StyI* | | 652,286 | 978,963 | 42.64% |
| NcoI* | | 180,000 | 196,511 | 11.77% | TatI | | 702,344 | 1,105,498 | 45.91% |
| NdeI* | | 107,887 | 114,961 | 7.05% | TsoI | | 439,177 | 562,079 | 28.71% |
| NheI* | | 61,076 | 63,117 | 3.99% | VspI* | | 131,420 | 145,945 | 8.59% |
| NmeAIII | | 146,659 | 161,103 | 9.59% | XbaI | | 109,179 | 116,409 | 7.14% |
| NruI* | | 40,330 | 42,131 | 2.64% | XhoI* | | 120,878 | 131,508 | 7.90% |
| NsiI* | | 119,796 | 128,076 | 7.83% | | | | | |
| NspI* | | 424,159 | 561,111 | 27.73% | | | | | |
Enzyme names ending in * represent multiple enzymes that all share the same recognition site.
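
The three numeric columns of this table (sequences cut, total cuts, and percent of sequences cut) can be derived from a multi-FASTA file with a few lines of code. The Python sketch below is an illustration under stated assumptions, not the CisSERS code: the function names and the toy FASTA records are hypothetical, the enzyme dictionary is hand-written rather than loaded from REBASE, and only one strand is scanned, which is sufficient for the palindromic sites used here.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a {name: sequence} dictionary."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def count_sites(seq, site):
    """Count occurrences of `site` in `seq`, including overlapping ones."""
    count, start = 0, seq.find(site)
    while start != -1:
        count += 1
        start = seq.find(site, start + 1)
    return count


def summarize(records, enzymes):
    """Return (enzyme, seqs_cut, total_cuts, percent_seqs_cut) for each enzyme."""
    rows = []
    for enzyme, site in enzymes.items():
        per_seq = [count_sites(seq, site) for seq in records.values()]
        seqs_cut = sum(1 for c in per_seq if c > 0)
        percent = 100 * seqs_cut / len(records) if records else 0.0
        rows.append((enzyme, seqs_cut, sum(per_seq), round(percent, 2)))
    return rows


# Toy input; sequences are invented for illustration.
toy_fasta = """>seq1
AAGAATTCGGGAATTCAA
>seq2
CCGGATCCTT
>seq3
ACGTACGT
>seq4
TTGAATTCAAGCTTGG
"""
enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

for row in summarize(parse_fasta(toy_fasta), enzymes):
    print(row)
# ('EcoRI', 2, 3, 50.0)
# ('BamHI', 1, 1, 25.0)
# ('HindIII', 1, 1, 25.0)
```

Reporting both the number of sequences cut and the total number of cuts matters because an enzyme such as Hpy188III in the table cuts many sequences several times, so the two columns can diverge widely.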