K V S S R Murthy, K V V Satyanarayana.
Abstract
Simple sequence repeats (SSRs) now play a dominant role in many areas of bioinformatics, including identification of new viruses, DNA fingerprinting, paternity and maternity identification, disease identification, and prediction of future disease risk. Because of their wide application and significance, SSRs have attracted the interest of many researchers. SSR extraction relies on retrieval algorithms, so improving the quality of the retrieval algorithm directly improves the relevance of the extracted results. This paper proposes a new retrieval mechanism that extracts MONO, DI and TRI patterns. To extract these patterns with the proposed mechanism, the DNA sequences of 1403 virus genome data sets were considered, and the different MONO, DI and TRI patterns were searched for in each genome sequence file. The proposed Next Generation Sequencing (NGS) retrieval mechanism extracted the MONO, DI and TRI patterns without missing any occurrence, and it was observed to reduce unnecessary comparisons. The extracted SSRs provide a useful, single-view resource for researchers.
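As a rough illustration of what extracting MONO, DI and TRI patterns means, the following Python sketch finds tandem repeats of 1-, 2- and 3-base motifs with a regular expression. It is not the retrieval mechanism proposed in the paper: the function name `find_ssrs`, the `min_repeats` threshold and the choice to skip homopolymer motifs for DI and TRI are assumptions made only for this example.

```python
import re

def find_ssrs(sequence, motif_len, min_repeats):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    Hypothetical helper: a regex stand-in, not the paper's NGS mechanism.
    """
    # ([ACGT]{k}) captures one motif; \1{m,} demands at least m more copies.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        motif = match.group(1)
        # Assumption: runs such as "TTTT" count as MONO, not as DI "TT".
        if motif_len > 1 and len(set(motif)) == 1:
            continue
        hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits

if __name__ == "__main__":
    seq = "GGACACACACTTTTTTGCAGCAGCAGG"
    for name, k, threshold in (("MONO", 1, 5), ("DI", 2, 3), ("TRI", 3, 3)):
        print(name, find_ssrs(seq, k, threshold))
```

The repeat thresholds here are arbitrary; a real study would fix them according to its own SSR definition.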
Year: 2017 PMID: 28653026 PMCID: PMC5476967 DOI: 10.1016/j.dib.2017.06.008
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. virus_category table actual data.
virus_category.
| Type | Collation |
|---|---|
| varchar(100) | |
| varchar(20) | |
| category1 | varchar(20) |
| category2 | varchar(20) |
| – | – |
virus_category.
| Type | Collation |
|---|---|
| varchar(100) | |
| varchar(20) | |
| varchar(20) | |
| varchar(20) | |
| varchar(20) | |
| varchar(20) | |
| int(15) | |
| varchar(20) |
Fig. 2. virus_acgt_count table actual data.
virus_ssrs.
| Type | Collation |
|---|---|
| varchar(100) | |
| varchar(20) | |
| varchar(20) | |
| int(10) | |
| int(10) |
Fig. 3. virus_ssrs table actual data.
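The virus_ssrs structure above lists five columns (varchar(100), varchar(20), varchar(20), int(10), int(10)), and the insert statement in the algorithm listing names five fields: virus_name, genome_id, P, tandem_repeat_count and i. The sketch below pairs them in that order, which is an assumption, and uses SQLite from the Python standard library rather than whatever database the authors used. The column names `pattern` and `position` are hypothetical stand-ins for P and i.

```python
import sqlite3

def create_schema(conn):
    """Create a virus_ssrs table; the name-to-type pairing is assumed."""
    conn.execute(
        """
        CREATE TABLE virus_ssrs (
            virus_name          VARCHAR(100),
            genome_id           VARCHAR(20),
            pattern             VARCHAR(20),  -- P in the listing
            tandem_repeat_count INT(10),
            position            INT(10)       -- i in the listing
        )
        """
    )

def insert_ssr(conn, virus_name, genome_id, pattern, repeat_count, position):
    """Insert one extracted SSR using a parameterized statement."""
    conn.execute(
        "INSERT INTO virus_ssrs VALUES (?, ?, ?, ?, ?)",
        (virus_name, genome_id, pattern, repeat_count, position),
    )

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    # Illustrative values only, not taken from the data set.
    insert_ssr(conn, "example_virus", "uid0", "AC", 4, 2)
    print(conn.execute("SELECT * FROM virus_ssrs").fetchall())
```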
Category-wise virus genome sequences.
| Category | Number of genome sequences |
|---|---|
| Amalgaviridae | 4 |
| Ampullaviridae | 1 |
| Anelloviridae | 6 |
| Aumaivirus. | 1 |
| Bacilladnavirus | 4 |
| Baculoviridae | 1 |
| Bicaudaviridae | 1 |
| Birnaviridae | 4 |
| Botybirnavirus. | 1 |
| Caudovirales | 14 |
| Caulimoviridae | 34 |
| Chrysoviridae | 2 |
| Circoviridae | 35 |
| Corticoviridae | 1 |
| Endornaviridae | 8 |
| Fuselloviridae | 4 |
| Geminiviridae | 141 |
| Hepadnaviridae | 10 |
| Herpesvirales | 2 |
| Hypoviridae | 3 |
| Inoviridae | 7 |
| Lavidaviridae | 1 |
| Ligamenvirales | 6 |
| Microviridae | 5 |
| Mimiviridae | 2 |
| Nanoviridae | 5 |
| Papanivirus. | 1 |
| Papillomaviridae | 85 |
| Partitiviridae | 21 |
| Parvoviridae | 40 |
| Polyomaviridae | 39 |
| Poxviridae | 1 |
| Reoviridae | 3 |
| Retroviridae | 42 |
| Salterprovirus | 2 |
| Satellite Nucleic Acids | 75 |
| Satellites | 4 |
| ssRNA negative-strand viruses | 151 |
| ssRNA positive-strand viruses, no DNA | 566 |
| Totiviridae | 26 |
| Turriviridae | 1 |
| unassigned ssRNA viruses | 1 |
| unclassified dsDNA phages. | 1 |
| unclassified dsDNA viruses. | 2 |
| unclassified Gemycircularvirus. | 7 |
| unclassified ssDNA viruses. | 30 |
| unclassified ssRNA viruses. | 2 |
| Total | 1403 |
Fig. 4. Category-wise virus count.
Virus genome overall frequency, MONO, DI and TRI frequencies.
| 1.2482250811894526 | |||
| 2.4448562907955393 | |||
| 1.0749041913092998 | |||
| 1.0247784693226274 | |||
Virus genome sizes and their classification based on different size ranges.
| Circoviridae | 1 |
| Nanoviridae | 3 |
| Papanivirus. | 1 |
| Partitiviridae | 2 |
| Satellite Nucleic Acids | 20 |
| ssRNA negative-strand viruses | 2 |
| ssRNA positive-strand viruses, no DNA | 2 |
| Aumaivirus. | 1 |
| Circoviridae | 27 |
| Nanoviridae | 2 |
| Partitiviridae | 12 |
| Reoviridae | 1 |
| Satellite Nucleic Acids | 55 |
| Satellites | 4 |
| ssRNA negative-strand viruses | 15 |
| ssRNA positive-strand viruses, no DNA | 12 |
| unclassified ssDNA viruses. | 6 |
| Ampullaviridae | 1 |
| Caudovirales | 8 |
| Endornaviridae | 7 |
| Fuselloviridae | 4 |
| Hypoviridae | 1 |
| Lavidaviridae | 1 |
| Ligamenvirales | 6 |
| Retroviridae | 8 |
| Salterprovirus | 2 |
| ssRNA negative-strand viruses | 81 |
| ssRNA positive-strand viruses, no DNA | 89 |
| Totiviridae | 1 |
| Turriviridae | 1 |
| unclassified dsDNA viruses. | 1 |
| unclassified ssDNA viruses. | 1 |
| Bicaudaviridae | 1 |
| Caudovirales | 2 |
| Baculoviridae | 1 |
| Caudovirales | 3 |
| Herpesvirales | 2 |
| Poxviridae | 1 |
| Mimiviridae | 2 |
Virus genome sizes of Mitochondria, category-wise.
| Category | Minimum size | Maximum size | Average size |
|---|---|---|---|
| Amalgaviridae | 3110 | 3387 | 3314.0000 |
| Ampullaviridae | 23471 | 23471 | 23,471.0000 |
| Anelloviridae | 2109 | 3720 | 2782.8333 |
| Aumaivirus. | 1151 | 1151 | 1151.0000 |
| Bacilladnavirus | 5472 | 5914 | 5668.2500 |
| Baculoviridae | 152844 | 152844 | 152,844.0000 |
| Bicaudaviridae | 61833 | 61833 | 61,833.0000 |
| Birnaviridae | 2744 | 3380 | 3203.5000 |
| Botybirnavirus. | 6126 | 6126 | 6126.0000 |
| Caudovirales | 7203 | 165318 | 58,854.2857 |
| Caulimoviridae | 6845 | 9073 | 7683.9706 |
| Chrysoviridae | 2860 | 3203 | 3031.5000 |
| Circoviridae | 846 | 2883 | 1920.8286 |
| Corticoviridae | 9935 | 9935 | 9935.0000 |
| Endornaviridae | 9620 | 17236 | 13,734.1250 |
| Fuselloviridae | 14634 | 23840 | 17,159.0000 |
| Geminiviridae | 2456 | 3588 | 2664.9504 |
| Hepadnaviridae | 2974 | 3328 | 3115.7000 |
| Herpesvirales | 131808 | 208496 | 170,152.0000 |
| Hypoviridae | 9406 | 12552 | 10,526.0000 |
| Inoviridae | 5721 | 8339 | 6957.4286 |
| Lavidaviridae | 17029 | 17029 | 17,029.0000 |
| Ligamenvirales | 24302 | 40582 | 36,293.8333 |
| Microviridae | 4070 | 6360 | 5200.4000 |
| Mimiviridae | 1006757 | 1241026 | 1,123,891.5000 |
| Nanoviridae | 965 | 1083 | 1010.2000 |
| null | 928 | 9877 | 4157.1429 |
| Papanivirus. | 814 | 814 | 814.0000 |
| Papillomaviridae | 6919 | 8484 | 7556.4353 |
| Partitiviridae | 303 | 2315 | 1730.7143 |
| Parvoviridae | 3726 | 6243 | 5048.6000 |
| Polyomaviridae | 4629 | 6130 | 5056.7692 |
| Poxviridae | 142509 | 142509 | 142,509.0000 |
| Reoviridae | 1646 | 2752 | 2333.0000 |
| Retroviridae | 3120 | 13056 | 8384.5238 |
| Salterprovirus | 14255 | 15837 | 15,046.0000 |
| Satellite Nucleic Acids | 216 | 1457 | 1127.6133 |
| Satellites | 1326 | 1342 | 1335.2500 |
| ssRNA negative-strand viruses | 800 | 18688 | 8945.1523 |
| ssRNA positive-strand viruses, no DNA | 944 | 19901 | 7476.2845 |
| Totiviridae | 2066 | 11394 | 5663.6538 |
| Turriviridae | 16382 | 16382 | 16,382.0000 |
| unassigned ssRNA viruses | 4312 | 4312 | 4312.0000 |
| unclassified dsDNA phages. | 8059 | 8059 | 8059.0000 |
| unclassified dsDNA viruses. | 7966 | 14914 | 11,440.0000 |
| unclassified Gemycircularvirus. | 2059 | 2218 | 2139.1429 |
| unclassified ssDNA viruses. | 1788 | 10503 | 3369.4333 |
| unclassified ssRNA viruses. | 5916 | 6195 | 6055.5000 |
Fig. 5. Average tract length analysis.
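The per-category sizes in the table above appear to be the minimum, maximum and arithmetic mean of the genome lengths in each category. A minimal sketch of that aggregation is given below; the function name `summarize_by_category` and the input layout are assumptions, and the lengths in the example are made up because the individual genome lengths are not listed here.

```python
from collections import defaultdict

def summarize_by_category(records):
    """Aggregate (category, genome_length) pairs into per-category stats."""
    lengths = defaultdict(list)
    for category, length in records:
        lengths[category].append(length)
    summary = {}
    for category in sorted(lengths):
        values = lengths[category]
        summary[category] = {
            "count": len(values),
            "min": min(values),
            "max": max(values),
            # Rounded to four decimals, as in the table.
            "avg": round(sum(values) / len(values), 4),
        }
    return summary

if __name__ == "__main__":
    # Hypothetical lengths for illustration only.
    demo = [("CategoryX", 1000), ("CategoryX", 2000), ("CategoryY", 500)]
    for category, stats in summarize_by_category(demo).items():
        print(category, stats)
```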
MONO SSRs.
| Feline_astrovirus_2_uid218014 | G | 99 | 1 | |
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | A | 9 | 3 | |
| Eupatorium_yellow_vein_virus_satellite_DNA_beta_ui… | A | 9 | 3 | |
| Hedyotis_uncinella_yellow_mosaic_betasatellite_uid… | A | 9 | 2 | |
| Honeysuckle_yellow_vein_mosaic_disease_associated_… | A | 9 | 2 | |
| Malvastrum_yellow_mosaic_virus_satellite_DNA_beta_… | A | 9 | 2 | |
| Mamestra_configurata_NPV_A_uid14168 | A | 9 | 4 | |
| Megavirus_chiliensis_uid74349 | A | 9 | 118 | |
| Moumouvirus_uid186430 | A | 9 | 71 | |
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | C | 9 | 2 | |
| Canine_papillomavirus___4_uid28243 | C | 9 | 2 | |
| Feline_leukemia_virus_uid14686 | C | 9 | 7 | |
| Potato_mop_top_virus_uid14789 | C | 9 | 3 | |
| Tolypocladium_cylindrosporum_virus_1_uid61451 | C | 9 | 2 | |
| Trichechus_manatus_latirostris_papillomavirus_2_ui… | C | 9 | 2 | |
| Trematomus_polyomavirus_1_uid282773 | T | 9 | 2 | |
| Canine_oral_papillomavirus_uid14326 | T | 9 | 2 | |
| Chaetoceros_lorenzianus_DNA_Virus_uid63565 | T | 9 | 2 | |
| Citrus_chlorotic_dwarf_associated_virus_uid170854 | T | 9 | 2 | |
| Ferret_papillomavirus_uid218024 | T | 9 | 2 | |
| Megavirus_chiliensis_uid74349 | T | 9 | 115 | |
| Mamestra_configurata_NPV_A_uid14168 | T | 9 | 4 | |
| Moumouvirus_uid186430 | T | 9 | 78 | |
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | T | 9 | 2 |
DI SSRs.
| AC | 9 | |||
| Sauropus_leaf_curl_disease_associated_DNA_beta_uid… | AC | 9 | ||
| AG | 7 | 1 | ||
| Vanilla_distortion_mosaic_virus_uid263828 | AG | 7 | 1 | |
| AT | 9 | 2 | ||
| Moumouvirus_uid186430 | AT | 9 | 2 | |
| Zalophus_californianus_papillomavirus_1_uid65277 | CG | 7 | 1 | |
| CT | 7 | 3 | ||
| Baboon_endogenous_virus_M7_uid222253 | CT | 7 | 2 | |
| Cowpea_mosaic_virus_uid15283 | CT | 7 | 1 | |
| CA | 9 | 1 | ||
| Sauropus_leaf_curl_disease_associated_DNA_beta_uid… | CA | 9 | 1 | |
| GT | 8 | 3 | ||
| Spleen_focus_forming_virus_uid14641 | GT | 8 | 1 | |
| Norway_rat_hepacivirus_1_uid267736 | GT | 8 | 1 | |
| Human_papillomavirus_type_26_uid15507 | GT | 8 | 1 | |
| GA | 6 | 2 | ||
| Vanilla_distortion_mosaic_virus_uid263828 | GA | 6 | 1 | |
| Oat_golden_stripe_virus_uid15093 | GA | 6 | 1 | |
| GC | 6 | 1 | ||
| Zalophus_californianus_papillomavirus_1_uid65277 | GC | 6 | 1 | |
| TA | 9 | 1 | ||
| Moumouvirus_uid186430 | TA | 9 | 1 | |
| TC | 7 | 1 | ||
| Cowpea_mosaic_virus_uid15283 | TC | 7 | 1 | |
| TG | NULL | NULL |
TRI SSRs.
| AAC | 7 | |||
| Penicillium_chrysogenum_virus_uid16141 | AAC | 7 | ||
| Santeuil_nodavirus_uid62547 | AAG | 6 | ||
| Mamestra_configurata_NPV_A_uid14168 | AAT | 7 | ||
| Penicillium_chrysogenum_virus_uid16141 | ACA | 7 | ||
| ACC | 4 | |||
| Zamilon_virophage_uid230580 | ACC | 4 | ||
| – | ||||
| Human_papillomavirus_type_49_uid15455 | ACC | 4 | ||
| Mamestra_configurata_NPV_A_uid14168 | ACG | 5 | ||
| Microviridae_phi_CA82_uid70009 | ACT | 6 | ||
| Santeuil_nodavirus_uid62547 | AGA | 7 | ||
| Ursus_maritimus_papillomavirus_1_uid29915 | AGC | 6 | ||
| AGG | 6 | |||
| Procyon_lotor_papillomavirus_1_uid15468 | AGG | 6 | ||
| – | ||||
| Epsilonpapillomavirus_1_uid14220 | AGG | 6 | ||
| AGT | 6 | |||
| Mamestra_configurata_NPV_A_uid14168 | AGT | 6 | ||
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | AGT | 6 | ||
| AGT | 6 | |||
| Mamestra_configurata_NPV_A_uid14168 | ATA | 6 | ||
| Himetobi_P_virus_uid14801 | ATA | 6 | ||
| Mamestra_configurata_NPV_A_uid14168 | ATC | 9 | ||
| ATG | 5 | |||
| Potato_yellow_dwarf_virus_uid74995 | ATG | 5 | ||
| Puumala_virus_uid14930 | ATG | 5 | ||
| ATT | 4 | |||
| Mamestra_configurata_NPV_A_uid14168 | ATT | 4 | ||
| – | ||||
| CAA | 6 | |||
| Penicillium_chrysogenum_virus_uid16141 | CAA | 6 | ||
| Cucumber_green_mottle_mosaic_virus_uid14681 | CAA | 6 | ||
| CAC | 4 | |||
| Zamilon_virophage_uid230580 | CAC | 4 | ||
| – | ||||
| Magnaporthe_oryzae_chrysovirus_1_uid51685 | CAC | 4 | ||
| CAG | 6 | |||
| Ursus_maritimus_papillomavirus_1_uid29915 | CAG | 6 | ||
| – | ||||
| Mamestra_configurata_NPV_A_uid14168 | CAG | 6 | ||
| Mamestra_configurata_NPV_A_uid14168 | CAT | 8 | ||
| Mamestra_configurata_NPV_A_uid14168 | CAT | 8 | ||
| CCA | 4 | |||
| Zamilon_virophage_uid230580 | CCA | 4 | ||
| – | ||||
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | CCA | 4 | ||
| CCG | 4 | |||
| Phlebiopsis_gigantea_mycovirus_dsRNA_1_uid46855 | CCG | 4 | ||
| – | ||||
| Halastavi_arva_RNA_virus_uid77939 | CCG | 4 | ||
| CCT | 6 | |||
| Curionopolis_virus_uid264939 | CCT | 6 | ||
| – | ||||
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | CCT | 6 | ||
| CGA | 5 | |||
| Mamestra_configurata_NPV_A_uid14168 | CGA | 5 | ||
| Human_papillomavirus_109_uid36519 | CGA | 5 | ||
| CGC | 4 | |||
| Phlebiopsis_gigantea_mycovirus_dsRNA_1_uid46855 | CGC | 4 | ||
| – | ||||
| Horseshoe_bat_hepatitis_B_virus_uid253463 | CGC | 4 | ||
| CGG | 4 | |||
| Woolly_monkey_sarcoma_virus_uid19547 | CGG | 4 | ||
| – | ||||
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | CGG | 4 | ||
| Mamestra_configurata_NPV_A_uid14168 | CGT | 6 | ||
| Microviridae_phi_CA82_uid70009 | CTA | 6 | ||
| Mamestra_configurata_NPV_A_uid14168 | CTC | 7 | ||
| CTG | 4 | |||
| Saguaro_cactus_virus_uid14981 | CTG | 4 | ||
| – | ||||
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | CTG | 4 | ||
| Abalone_herpesvirus_Victoria_AUS_2009_uid177933 | CTT | 7 | ||
| Santeuil_nodavirus_uid62547 | GAA | 6 | ||
| GAC | 5 | |||
| Mamestra_configurata_NPV_A_uid14168 | GAC | 5 | ||
| GAG | 7 | |||
| Procyon_lotor_papillomavirus_1_uid15468 | GAG | 7 | ||
| – | ||||
| Crocuta_papillomavirus_1_uid174774 | GAG | 7 | ||
| GAT | 5 | |||
| Puumala_virus_uid14930 | GAT | 5 | ||
| Acidianus_bottle_shaped_virus_uid19605 | GAT | 5 | ||
| Ursus_maritimus_papillomavirus_1_uid29915 | GCA | 7 | ||
| GCC | 4 | |||
| Raphanus_sativus_cryptic_virus_1_uid17127 | GCC | 4 | ||
| – | ||||
| Mycobacteriophage_Velveteen_uid215123 | GCC | 4 | ||
| Halorubrum_pleomorphic_virus_3_uid157259 | GCG | 5 | ||
| GCT | 5 | |||
| Saguaro_cactus_virus_uid14981 | GCT | 5 | ||
| – | ||||
| Mamestra_configurata_NPV_A_uid14168 | GCT | 5 | ||
| GGA | 6 | |||
| Procyon_lotor_papillomavirus_1_uid15468 | GGA | 6 | ||
| – | ||||
| Human_papillomavirus_type_103_uid17119 | GGA | 6 | ||
| Halorubrum_pleomorphic_virus_3_uid157259 | GGC | 4 | ||
| Mamestra_configurata_NPV_A_uid14168 | GGT | 5 | ||
| Mamestra_configurata_NPV_A_uid14168 | GTC | 6 | ||
| GTG | 4 | |||
| Periplaneta_fuliginosa_densovirus_uid14091 | GTG | 4 | ||
| – | ||||
| Mamestra_configurata_NPV_A_uid14168 | GTG | 4 | ||
| GTT | 5 | |||
| Cherry_rasp_leaf_virus_uid15131 | GTT | 5 | ||
| – | ||||
| Ovine_enzootic_nasal_tumour_virus_uid15410 | GTT | 5 | ||
| TAA | 6 | |||
| Mamestra_configurata_NPV_A_uid14168 | TAA | 6 | ||
| Himetobi_P_virus_uid14801 | TAA | 6 | ||
| Microviridae_phi_CA82_uid70009 | TAC | 6 | ||
| Mamestra_configurata_NPV_A_uid14168 | TAG | 5 | ||
| TAT | 4 | |||
| Yaba_like_disease_virus_uid14595 | TAT | 4 | ||
| – | ||||
| Human_papillomavirus_54_uid15466 | TAT | 4 | ||
| Mamestra_configurata_NPV_A_uid14168 | TCA | 8 | ||
| TCC | 6 | |||
| Curionopolis_virus_uid264939 | TCC | 6 | ||
| – | ||||
| Mamestra_configurata_NPV_A_uid14168 | TCC | 6 | ||
| Mamestra_configurata_NPV_A_uid14168 | TCG | 6 | ||
| TCT | 5 | |||
| Mamestra_configurata_NPV_A_uid14168 | TCT | 5 | ||
| – | ||||
| Nyamanini_virus_uid38109 | TCT | 5 | ||
| TGA | 5 | |||
| Puumala_virus_uid14930 | TGA | 5 | ||
| Cycas_necrotic_stunt_virus_uid15397 | TGA | 5 | ||
| TGC | 5 | |||
| Chicken_gallivirus_1_uid259980 | TGC | 5 | ||
| Mamestra_configurata_NPV_A_uid14168 | TGC | 5 | ||
| TGG | 4 | |||
| Peanut_clump_virus_uid14776 | TGG | 4 | ||
| – | ||||
| Acinetobacter_bacteriophage_AP22_uid167576 | TGG | 4 | ||
| TGT | 5 | |||
| Cherry_rasp_leaf_virus_uid15131 | TGT | 5 | ||
| Ovine_enzootic_nasal_tumour_virus_uid15410 | TGT | 5 | ||
| TTA | 5 | |||
| Walleye_dermal_sarcoma_virus_uid14718 | TTA | 5 | ||
| Mamestra_configurata_NPV_A_uid14168 | TTA | 5 | ||
| TTC | 5 | |||
| Squash_leaf_curl_China_virus____B__uid15591 | TTC | 5 | ||
| – | ||||
| Nyamanini_virus_uid38109 | TTC | 5 | ||
| TTG | NULL |
Fig. 6. MONO, DI & TRI extraction process.
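The DI table lists twelve motifs (AA, CC, GG and TT do not appear), which suggests that the searched patterns are all k-mers over A, C, G and T with single-base repeats left out for k > 1. Under that assumption, the candidate pattern sets can be enumerated as follows; the function name `motifs` is hypothetical.

```python
from itertools import product

def motifs(k, alphabet="ACGT"):
    """List all k-mers, dropping homopolymers such as "AA" when k > 1."""
    kmers = ("".join(bases) for bases in product(alphabet, repeat=k))
    return [m for m in kmers if k == 1 or len(set(m)) > 1]

if __name__ == "__main__":
    for name, k in (("MONO", 1), ("DI", 2), ("TRI", 3)):
        print(name, len(motifs(k)))
```

This yields 4 MONO, 12 DI and 60 TRI patterns.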
| Subject area | Bio-informatics |
| More specific subject area | Genomes of VIRUSES |
| Type of data | Tables, figures |
| How data was acquired | VIRUS SSR markers extraction with NGS string matching |
| Data format | Analyzed |
| Experimental factors | MONO, DI and TRI SSRs: |
| Experimental features | Each of the MONO, DI and TRI markers is extracted from virus genomes. All the SSRs showed 1-, 2- or 3-bp differences in allele size. These differences indicate polymorphism among the genomes in the number of SSR repeats. |
| Data source location | BHIMAVARAM, INDIA |
| Data accessibility | The data is provided with this article |

    n ← T.length, m ← P.length
    for each MONO, DI & TRI pattern
        for i ← 0 to n-m do
        begin
            count ← ngs_search(T, P, i, count);
            tandem_repeat_count ← check_for_tandem_repeat(T, P, i, count);
            ngs_database_insertion(P, i, tandem_repeat_count)
        end for
    end for

    /* … */
    …
    begin
        …
        while ( … )
        do
            …
        done;
        if ( … )
            count++;
        end if
        return count;

    /* … */
    …
    begin
        if (diff_of_two_repeats == -P.length)
            tandem_repeat_count++;
        else
            tandem_repeat_count = tandem_repeat_count;
        end if
        return tandem_repeat_count;

    /* … */
    …
    begin
        insert into virus_ssrs(virus_name, genome_id, P, tandem_repeat_count, i);
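The listing above can be approximated in Python as shown below. Because the body of ngs_search is not recoverable from the listing, this sketch replaces it with a plain substring comparison, and it returns a list of tuples instead of inserting into a database. The function names, the `min_repeats` parameter and the choice to jump past a detected run are all assumptions for illustration.

```python
def count_tandem_repeats(text, pattern, start):
    """Count back-to-back copies of pattern beginning at index start."""
    m = len(pattern)
    count = 0
    # str.startswith accepts a start offset and is False past the end.
    while text.startswith(pattern, start + count * m):
        count += 1
    return count

def extract_ssrs(text, patterns, min_repeats=2):
    """Scan text for each pattern; return (pattern, start, repeat_count)."""
    results = []
    for pattern in patterns:
        m = len(pattern)  # patterns are assumed to be non-empty
        i = 0
        while i <= len(text) - m:
            repeats = count_tandem_repeats(text, pattern, i)
            if repeats >= min_repeats:
                results.append((pattern, i, repeats))
                i += repeats * m  # skip the run to avoid re-reporting it
            else:
                i += 1
    return results

if __name__ == "__main__":
    print(extract_ssrs("TTACACACGGG", ["AC", "G"], min_repeats=3))
```

Skipping past a detected run is one simple way to avoid comparisons that cannot produce a new result; whether the proposed mechanism does the same is not stated in the listing.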