
In silico mining of putative microsatellite markers from whole genome sequence of water buffalo (Bubalus bubalis) and development of first BuffSatDB.

Vasu Arora, Mir Asif Iquebal, Anil Rai, Dinesh Kumar.   

Abstract

BACKGROUND: Although India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0; de novo chromosome-wise assembly therefore remains a major pending issue for the global community. The existing radiation hybrid map of buffalo and the STRs reported here can be used for final gap plugging and the "finishing" expected in de novo genome assembly. QTL and gene mapping require mining of putative STRs from the buffalo genome at equal intervals on every chromosome. Such markers have a potential role in improving desirable characteristics such as high milk yield, disease resistance and high growth rate. STR mining from the whole genome and development of a user-friendly database are yet to be done to reap the benefit of the whole genome sequence. DESCRIPTION: By in silico microsatellite mining of the whole genome, we have developed the first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database, http://cabindb.iasri.res.in/buffsatdb/), a web-based relational database of 910529 microsatellite markers developed using PHP and MySQL. The microsatellite markers were generated using the MIcroSAtellite tool. The database offers a simple and systematic web-based search for customised retrieval of chromosome-wise and genome-wide microsatellites. Search is enabled by chromosome, motif type (mono to hexa), repeat motif and repeat kind (simple and composite). The search may be customised by limiting the location of STRs on a chromosome as well as the number of markers in that range. This is a novel approach that has not been implemented in any existing marker database. The database is further coupled with Primer3 for primer design for the selected markers, enabling researchers to select markers of choice at a desired interval along the chromosome. A unique add-on for degenerate bases further helps in resolving the presence of degenerate bases in the current buffalo assembly.
CONCLUSION: As the first buffalo STR database in the world, BuffSatDb would not only pave the way for resolving the current assembly problem but should also be of immense use to the global community in QTL/gene mapping, which is critically required to increase knowledge in the endeavour to increase buffalo productivity, especially in third world countries where the rural economy depends significantly on buffalo productivity.

Year:  2013        PMID: 23336431      PMCID: PMC3563513          DOI: 10.1186/1471-2164-14-43

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Water buffalo (Bubalus bubalis) contributes immensely to the agricultural economy of the Indian subcontinent and South East Asian countries through milk, meat, hides, fertilizer, fuel and draught power. A larger part of the human population depends on this species than on any other livestock species in the world [1]. The world buffalo population is 188.3 million, contributing around 55–60% of total milk production [2]. Asia has nearly 97% of the world's buffaloes, which are an integral part of agriculture in India, China, Pakistan, Nepal, Bangladesh, Thailand, Myanmar and Malaysia. The productivity of buffaloes in these regions is higher than that of cattle [3]. Molecular markers can play a significant role in livestock improvement through conventional breeding strategies. Scientific resources are limited in many of the countries where buffaloes are economically important livestock and, as a consequence, genome research has not been supported at the level of some other species [4]. Compared with other farm animal genetic resources, few studies worldwide have explored genetic diversity in buffalo on a molecular genetic basis. Such work depends, in part, on knowledge of genetic structure based on molecular markers like microsatellites [5]. Microsatellites are sequences made up of a simple sequence motif, not more than six bases long, that is tandemly repeated and arranged head to tail without interruption by any other base or motif. Simple, tandemly repeated di- and tri-nucleotide sequences have been demonstrated to be polymorphic in length in a number of eukaryotic genomes [6]. The frequency with which they occur (once every 50,000–60,000 bp), their high degree of polymorphism and their random distribution across the genome [7] make them potentially very useful as DNA markers in gene mapping studies.
Furthermore, two or more microsatellites may be analyzed simultaneously [8,9], opening new opportunities for genetic analysis of large numbers of samples. To meet the need for microsatellites, especially for biodiversity analysis, cattle microsatellite markers have been used in heterologous mode in buffalo, and up to 56% of them have been found polymorphic [10]. Cattle microsatellite markers have many disadvantages in such diversity analysis, such as low polymorphism, loss of amplification due to null alleles, size bias, hitch-hiking and potential exclusion of abundant STRs in the gene pool [11]. There is also only limited work on STR mining using a partial enriched genomic library [12]. No thorough in silico STR marker mining of the buffalo genome has been done to represent the more holistic and cumulative variability of the genome for use in gene pool or biodiversity analysis and gene/QTL mapping. Although India has sequenced the water buffalo genome, its draft assembly is based on the cattle genome BTau 4.0; de novo chromosome-wise assembly therefore remains a major pending issue for the global community [3]. The existing radiation hybrid map of buffalo by Amaral et al. [13] and the STRs reported here can be used for final gap plugging and the “finishing” expected in de novo genome assembly. Such work needs extensive STR mining from the buffalo genome at equal intervals on every chromosome. To meet this urgent need in resolving assembly and mapping issues and in biodiversity analysis, we have developed the first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database), a web-based relational database of microsatellites.

Construction and content

Data collection and architecture

BuffSatDb is an online relational database that catalogues information about the microsatellite repeats of the recently sequenced water buffalo genome. All microsatellite markers were extracted from the buffalo genome using the MIcroSAtellite identification tool (MISA) [14]. The database has a three-tier architecture (Figure 1) with a client tier, a middle tier and a database tier. The user-friendly interface was developed in PHP (Hypertext Preprocessor), an open-source server-side scripting language. In the database tier, the STRs mined in silico with MISA are stored in a MySQL database. In the middle tier, provisions are made for customised queries based on user needs. For primer design, standalone Primer3 code computes primers on user request. The information delivered at the client tier is a list of multiple primers along with their respective melting temperature, GC content, start position and product (amplicon) size.
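As an illustration of how the database tier and a customised middle-tier query might fit together, the sketch below uses Python with an in-memory SQLite database standing in for MySQL and PHP. The table name `str_marker`, its columns, the function `search_markers` and the toy rows are all hypothetical; the actual BuffSatDb schema is not given in the text.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL tier.
# Table and column names are hypothetical, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE str_marker (
        id          INTEGER PRIMARY KEY,
        chromosome  TEXT NOT NULL,
        repeat_kind TEXT NOT NULL,
        motif_type  TEXT,
        motif       TEXT NOT NULL,
        copy_number INTEGER,
        start_pos   INTEGER NOT NULL,
        end_pos     INTEGER NOT NULL,
        gc_percent  REAL
    )
    """
)

# Invented toy rows, not taken from BuffSatDb.
toy_rows = [
    ("1", "simple", "di", "AC", 8, 1200, 1215, 50.0),
    ("1", "simple", "mono", "A", 12, 5000, 5011, 0.0),
    ("1", "simple", "di", "GT", 7, 9000, 9013, 50.0),
    ("2", "simple", "di", "AC", 6, 300, 311, 50.0),
]
conn.executemany(
    "INSERT INTO str_marker (chromosome, repeat_kind, motif_type, motif,"
    " copy_number, start_pos, end_pos, gc_percent) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    toy_rows,
)


def search_markers(conn, chromosomes, motif_type=None, region=None, max_markers=None):
    """Return markers on the given chromosomes, optionally filtered by motif
    type, restricted to a (start, end) region and capped at max_markers rows."""
    placeholders = ",".join("?" for _ in chromosomes)
    sql = (
        "SELECT chromosome, motif, copy_number, start_pos, end_pos "
        f"FROM str_marker WHERE chromosome IN ({placeholders})"
    )
    params = list(chromosomes)
    if motif_type is not None:
        sql += " AND motif_type = ?"
        params.append(motif_type)
    if region is not None:
        sql += " AND start_pos >= ? AND end_pos <= ?"
        params.extend(region)
    sql += " ORDER BY chromosome, start_pos"
    if max_markers is not None:
        sql += " LIMIT ?"
        params.append(max_markers)
    return conn.execute(sql, params).fetchall()


# One di-nucleotide marker on chromosome 1 between positions 1000 and 10000.
hits = search_markers(conn, ["1"], motif_type="di", region=(1000, 10000), max_markers=1)
print(hits)
```

Building the `WHERE` clause from optional filters with bound parameters keeps each search criterion independent, which is one simple way to support the kind of customised retrieval described here.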
Figure 1

Three-tier architecture of BuffSatDb.

BuffSatDb has eight tabs (Home, About, Database, Analysis, Tutorial, Links, Contact, Team). These present general information on the microsatellite database, information about water buffalo and microsatellite markers, and a comparative analysis of the buffalo genome. The tutorial contains guidelines for users and the terminology used in the database contents. BuffSatDb also provides other useful links and details of the team and contact persons.

In silico mining of microsatellite from whole genome of water buffalo

The Bubalus bubalis genome draft assembly version Bbu_2.0-alpha, which has 17X–19X depth and 91%–95% coverage, was published by a research group from India [3] and is available in the public domain at http://210.212.93.84/bbu_2.0alpha/; it was used for STR mining. All 27 available chromosomes (Chromosome 1–24, M, U and X) were split into manageable ranges using a PERL script. These were fed to the MIcroSAtellite identification tool, MISA, to identify and locate perfect and compound microsatellites. The STR number, motif, repeat number, length of the repeat, size of the repeat, repeat type, GC content, start and end position of the repeat, and STR sequence were compiled. A total of 910529 STRs were generated from the water buffalo genome, of which 830058 were simple and 80471 were compound STRs. BuffSatDb is a comprehensive and integrated resource for retrieval of this information for water buffalo. Figure 2 shows the database search in BuffSatDb.
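To make the mining step concrete, the following is a minimal Python sketch of detecting perfect (simple) microsatellites in a sequence. It is not MISA itself and does not detect compound repeats; the minimum repeat counts in `MIN_REPEATS` are assumptions for illustration, since the thresholds used to build BuffSatDb are not stated here, and the function name and toy sequence are hypothetical.

```python
# Minimum number of repeat units per motif length.
# These values are assumptions for illustration, not the BuffSatDb settings.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}
MOTIF_TYPE = {1: "mono", 2: "di", 3: "tri", 4: "tetra", 5: "penta", 6: "hexa"}


def is_redundant(motif):
    """True if the motif is itself a repetition of a shorter unit (e.g. 'ACAC')."""
    n = len(motif)
    return any(n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n))


def find_perfect_strs(sequence, min_repeats=MIN_REPEATS):
    """Scan a sequence for perfect tandem repeats of motifs 1 to 6 bases long.

    Returns a list of dicts with 1-based inclusive start and end positions.
    """
    seq = sequence.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        i = 0
        while i + unit_len <= len(seq):
            motif = seq[i:i + unit_len]
            # Skip ambiguous bases and motifs already covered by a shorter unit.
            if not set(motif) <= set("ACGT") or is_redundant(motif):
                i += 1
                continue
            j = i + unit_len
            while seq[j:j + unit_len] == motif:
                j += unit_len
            copies = (j - i) // unit_len
            if copies >= min_rep:
                hits.append({
                    "motif_type": MOTIF_TYPE[unit_len],
                    "motif": motif,
                    "copies": copies,
                    "start": i + 1,
                    "end": j,
                })
                i = j  # continue after the repeat just recorded
            else:
                i += 1
    return sorted(hits, key=lambda h: h["start"])


# Toy sequence: a mono (A x12), a di (AC x7) and a tri (AGC x5) repeat.
toy = "GCT" + "A" * 12 + "CGT" + "AC" * 7 + "GGT" + "AGC" * 5 + "TT"
strs = find_perfect_strs(toy)
for s in strs:
    print(s["motif_type"], s["motif"], s["copies"], s["start"], s["end"])
```

A chromosome split into ranges, as described above, could be processed by calling such a function on each chunk and adding the chunk offset to the reported positions; repeats that span a chunk boundary would need separate handling.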
Figure 2

The flow of database search in BuffSatDb.

The user can query for microsatellites chromosome-wise (1–24, X, M and U), and more than one chromosome may be selected at a time. These searches may be further customised by microsatellite characteristics such as motif type (mono, di, tri, tetra, penta, hexa), repeat motif and repeat kind (simple and composite). An advanced search allows the user to limit the location on the chromosome as well as the number of markers in that range. This is a novel approach and, to the best of our knowledge, it has not been implemented in any existing marker database; it may be useful for researchers. Identification of QTL and fine mapping of economically important genes based on LOD (Logarithm of the Odds) score also need STRs, preferably at equal intervals. Other parameters such as GC content, range of STR location and copy number may also be customised according to the requirements of researchers. The results are displayed in tabular format, giving chromosome number, motif type, motif, copy number, base pairs, and start and end position along with GC content. BuffSatDb is further coupled with the Primer3 tool [15]. The STRs returned by a query may be selected with a radio button for primer generation. Primers for a selected STR locus may be designed on a template of approximately 1000 base pairs by selecting up to 500 base pairs of each flanking region. This flexibility enables researchers to select markers of choice at a desired interval along the chromosomes. Further, each individual STR of a targeted chromosomal region can be used to narrow down the location of a gene of interest or a linked QTL. A novel add-on for degenerate bases has been incorporated in the database search, giving users the flexibility to replace degenerate bases with any of the alternative bases (A, T, G, C). This feature was added because some degenerate bases present in the current buffalo genome assembly would otherwise make primer design very difficult.
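The template preparation described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the STR coordinates are taken to be 1-based and inclusive, and a single user-chosen base replaces every degenerate symbol, which may be simpler than what the BuffSatDb add-on actually offers.

```python
def extract_template(chrom_seq, start, end, flank=500):
    """Cut out an STR with up to `flank` bases on each side.

    start and end are 1-based inclusive STR coordinates (an assumption here).
    Returns (template, offset, length), where offset is the 0-based position
    of the STR inside the template.
    """
    left = max(0, start - 1 - flank)
    right = min(len(chrom_seq), end + flank)
    template = chrom_seq[left:right]
    return template, start - 1 - left, end - start + 1


def replace_degenerate(template, base="A"):
    """Replace every symbol other than A, C, G, T with one chosen base."""
    if len(base) != 1 or base not in "ACGT":
        raise ValueError("replacement must be one of A, C, G, T")
    return "".join(ch if ch in "ACGT" else base for ch in template.upper())


# Toy chromosome: an (AC)8 repeat at positions 601-616 and one 'N' upstream.
chrom = "G" * 300 + "N" + "G" * 299 + "AC" * 8 + "T" * 600
template, offset, length = extract_template(chrom, start=601, end=616)
cleaned = replace_degenerate(template, base="G")

print(len(template), offset, length)
print("N" in template, "N" in cleaned)
```

The cleaned template together with the `(offset, length)` pair is the kind of input a primer design tool needs in order to place primers in the flanks on either side of the repeat.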

Genome analysis

The chromosome-wise distribution of STRs and their respective motif frequencies in the buffalo genome were analysed. Simple STRs were the most abundant, constituting 91.16% of the total STRs. The motif types (mono, di, tri, tetra, penta and hexa) were plotted to show their respective abundance across chromosomes. The mono type (64.52%) was more abundant than any other type, while the hexa type (0.02%) was the least frequent (Figure 3). The proportion of STRs with GC content in the range 0–10% was the highest (68.75%), followed by the range 41–50% (15.32%), while the lowest was in the range 81–90% (0.002%) (Figure 4). Figure 5 shows the distribution of microsatellite length in relation to GC percentage. No correlation was found between size and GC content. Table 1 gives the frequency of STRs by size. The largest number was in the size range 11–13, followed by 14–16. A comprehensive chromosome-wise STR profile by repeat type is given in Table 2.
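The percentages quoted here can be rechecked from the reported counts with a few lines of Python. The counts are copied from Table 1, the totals row of Table 2 and the simple/compound totals given in the text; the variable names are hypothetical. The calculation suggests that the 64.52% figure for mono repeats is relative to simple STRs rather than to all STRs.

```python
# Counts copied from Table 1 (size classes of STRs).
size_counts = {
    "<10": 153185,
    "11-13": 276065,
    "14-16": 207831,
    "17-25": 162575,
    ">25": 110873,
}
# Simple and compound totals as stated in the text.
simple_total, compound_total = 830058, 80471
# Mono and hexa totals from the last row of Table 2.
mono_total, hexa_total = 535569, 166


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


total = sum(size_counts.values())
size_shares = {label: pct(n, total) for label, n in size_counts.items()}

print(total)                                  # all STRs
print(size_shares)                            # share of each size class
print(pct(simple_total, total))               # simple STRs among all STRs
print(pct(mono_total, simple_total))          # mono among simple STRs
print(pct(mono_total, total))                 # mono among all STRs
print(pct(hexa_total, simple_total))          # hexa among simple STRs
```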
Figure 3

Graphical view of motif wise distribution of microsatellites in Buffalo genome.

Figure 4

Graphical view of proportion of GC content in STRs at various ranges.

Figure 5

Distribution of length of microsatellites in context to GC percentage.

Table 1

Frequencies of STRs based on their sizes

Size of STRs    Number of STRs    Contribution in percentage
<10             153185            16.82
11–13           276065            30.32
14–16           207831            22.83
17–25           162575            17.86
>25             110873            12.18
Table 2

Chromosome wise distribution of STRs

Columns Mono to Hexa are simple STRs; the last column is compound STRs.

Chromosomes      Mono      Di        Tri       Tetra    Penta    Hexa    Compound
Chromosome 1     38690     12112     7672      454      1005     10      5858
Chromosome 2     35354     10911     7062      419      877      10      4899
Chromosome 3     32231     10406     6376      408      702      6       4611
Chromosome 4     30416     9552      6308      404      734      13      4382
Chromosome 5     22149     7486      4568      314      489      6       3365
Chromosome 6     20993     6941      4525      263      532      8       3132
Chromosome 7     21961     6884      4553      291      616      7       3282
Chromosome 8     22977     7190      4787      252      512      7       3333
Chromosome 9     20647     6278      4079      260      440      3       2937
Chromosome 10    20550     6425      4156      228      530      4       3082
Chromosome 11    19223     5662      3852      206      393      8       2701
Chromosome 12    19447     6285      3748      238      441      6       2725
Chromosome 13    17237     5220      3285      199      362      3       2721
Chromosome 14    15197     4703      2841      197      313      3       2162
Chromosome 15    14760     4892      2988      149      342      4       2219
Chromosome 16    15128     4630      3152      185      359      6       2353
Chromosome 17    14489     4585      2721      194      270      2       2160
Chromosome 18    11619     3814      2129      160      181      3       1746
Chromosome 19    13496     4373      2804      155      358      5       2098
Chromosome 20    12474     3949      2465      159      274      3       1832
Chromosome 21    11830     3467      2106      124      203      5       1588
Chromosome 22    12015     3893      2424      117      278      4       1705
Chromosome 23    18764     5898      3686      180      380      2       2704
Chromosome 24    7742      2571      1467      124      136      3       1127
Chromosome M     1         0         0         0        0        0       0
Chromosome U     50940     15373     9458      790      1279     26      9045
Chromosome X     15239     5241      3287      224      382      9       2704
Total            535569    168741    106499    6694     12388    166     80471

STR validation

Two previously published sets of STR markers, heterologous [16] and homologous [17], were evaluated against the database using a PERL script. The validated STRs are presented as positive primers in Table 3.
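The text does not describe how the PERL script matches primers against the database, so the following Python sketch is only one plausible reading: a primer counts as positive if it occurs exactly in an STR-containing sequence, with the reverse primer searched as its reverse complement. The function names, primers and locus are invented for illustration.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, T."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def validate_primer_pair(forward, reverse, sequences):
    """Check by exact matching whether a primer pair is found in any sequence.

    Exact substring matching is an assumption of this sketch.
    Returns flags for forward, reverse, and both primers in one sequence.
    """
    fwd = forward.upper()
    rev_site = reverse_complement(reverse)
    result = {"forward": False, "reverse": False, "both": False}
    for seq in sequences:
        seq = seq.upper()
        has_fwd = fwd in seq
        has_rev = rev_site in seq
        result["forward"] = result["forward"] or has_fwd
        result["reverse"] = result["reverse"] or has_rev
        result["both"] = result["both"] or (has_fwd and has_rev)
    return result


# Invented locus: an (AC)10 repeat with short flanks on both sides.
locus = "TTGACCTGAAGCTTAGC" + "AC" * 10 + "GGATCCTTAGCAAGTCA"
forward_primer = "GACCTGAAGC"
reverse_primer = reverse_complement("CCTTAGCAAG")

ok = validate_primer_pair(forward_primer, reverse_primer, [locus])
partial = validate_primer_pair("GACCTGAAGG", reverse_primer, [locus])
print(ok)
print(partial)
```

Counting the three flags over all reported primer pairs would give forward, reverse and common tallies of the kind summarised in Table 3.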
Table 3

STRs validation result of homologous and heterologous primer pairs of water buffalo

Primer set                                                         Total no. of primer pairs reported    Positive primers (Forward)    Positive primers (Reverse)    Positive primers (common to both forward and reverse)
Heterologous: ISAG–FAO recommended STRs from cattle                30                                    7 (23.33%)                    11 (36.67%)                   3 (10.00%)
Heterologous: ISAG–FAO recommended STRs from buffalo               30                                    7 (23.33%)                    8 (26.67%)                    4 (13.33%)
Homologous: Nagarajan et al., monomorphic STRs from buffalo        7                                     2 (28.57%)                    4 (57.14%)                    2 (28.57%)
Homologous: Nagarajan et al., polymorphic STR loci from buffalo    107                                   49 (45.79%)                   37 (34.58%)                   26 (24.30%)

Discussion and utility

A total of 910529 microsatellite markers were identified by in silico mining. Simple STRs were the most abundant (91.16%). Microsatellite density has been found to be positively correlated with genome size [18-20]. Among fully sequenced eukaryotic genomes, microsatellite density is highest in mammals. In plants, however, microsatellite frequency is negatively correlated with genome size [21]. In the present study of water buffalo, the mono-nucleotide motif was the most abundant. The relative distribution of microsatellite motif length classes differs considerably from species to species [22]. In water buffalo, longer repeats were less abundant, as expected from various studies [23,24]. It was also observed that microsatellite frequency increases with size from 10 up to the 11–16 range and declines again beyond it (Table 1). This is due to the cyclical nature of microsatellite markers per se in the course of their evolution. A microsatellite is born as a simple repeat through an out-of-register loop during DNA replication, with a threshold size of 8 or more repeat units. Gradually, background mutation converts the simple repeat into a compound repeat. At the simple repeat stage the mutation rate is high and predominantly adds repeat units, so size increases. But once background mutation converts a simple repeat into a compound interrupted repeat, the smaller simple repeats of fewer than 8 units get pinched off in subsequent replications. This evolutionary constraint limits microsatellite size; otherwise microsatellites would keep increasing in length over the course of evolution. Thus individual microsatellite arrays have a “life cycle” of sorts: they are born, they grow and ultimately they perish. These events may stretch over tens or even hundreds of millions of years [25,26].
The water buffalo microsatellite profile exhibits a similar pattern. The relative abundance of repeat motifs was in the order mono, di, tri, penta, tetra and hexa (Table 2). Although di-nucleotide repeats are reported to be the most abundant in eukaryotic genomes [27,28], we found mononucleotide repeats to be the most abundant across all chromosomes. This relatively higher abundance of mono- over di-nucleotide repeats might be due to an inherent limitation of NGS technology, which can add extra mononucleotides as sequencing errors [29]. The longer the chromosome, the proportionately higher the total repeat content, as expected for ubiquitously distributed STR markers [30]. To validate previously reported STR markers, two sets were considered: heterologous (cattle as original species and buffalo as focal species) and homologous (developed from buffalo and validated in buffalo). The heterologous markers recommended by ISAG-FAO [16] and the homologous markers [17] were used. Both subsets of heterologous ISAG-FAO recommended primers for cattle and buffalo diversity analysis gave low validation results, i.e. 10% and 13.33% respectively, although some primers showed validation of up to 36.67% (Table 3). Cross-species amplifiability is due to conservation of cattle STRs and their flanking regions in other species [31]. In cross-species amplification among Bovidae species, such results are usually expected owing to null alleles and genomic changes during speciation [32]. In validation of homologous STRs, both subsets gave a higher percentage, for monomorphic (28.57%) and polymorphic (24.30%) loci. The validation results are limited because the first draft genome assembly of buffalo is based on cattle and is not completely finished. The findings of this study have limitations that need to be addressed.
As the water buffalo genome is only a draft assembly based on the cow assembly Btau 4.0, de novo assembly is needed to obtain a buffalo-specific chromosome-wise microsatellite profile. The current database is based on the chromosome numbering of cattle, which is certainly not the same in buffalo. For example, cattle chromosome 4 is actually buffalo chromosome 8. In fact, only five chromosome numbers are common between cattle and buffalo, viz. 1, 2, 17, 18 and X [3]. The splitting and translocation events underlying the syntenic relationship between these two species are well documented. Nevertheless, the microsatellites in our database, with the option of primer design at a desired place along a “chromosome”, will be of immense use, especially with the radiation hybrid of buffalo, in resolving the current issue of de novo assembly. These markers can also be used for QTL and gene mapping as well as for biodiversity analysis in setting conservation priorities. The markers in our database need further wet lab validation. As this is the first database of water buffalo microsatellites, at a juncture where de novo genome assembly is yet to be done, the use of these markers is highly warranted for “finishing” the water buffalo genome assembly. This will lead to the next version of the buffalo microsatellite database with proper buffalo-specific chromosome-wise data, which is hitherto missing but critically needed. Such an endeavour should bring not only an increase in buffalo productivity but also greater food security, especially in third and new world countries.

Conclusion

As the first buffalo STR database in the world, BuffSatDb would not only pave the way for resolving the current water buffalo genome assembly problem but should also be of immense use to the global community in QTL/gene mapping, which is critically required to increase knowledge in the endeavour to increase buffalo productivity, especially in third world countries where the rural economy depends significantly on buffalo productivity.

Availability and requirement

BuffSatDb, the buffalo microsatellite marker database, is freely accessible for research purposes to non-profit and academic organizations at http://cabindb.iasri.res.in/buffsatdb/.

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

DK and AR conceived this study. S, VA & MAI created the work-flow, database, web-tool and performed data analyses. MAI, S, DK and AR drafted the manuscript. All authors read and approved the manuscript.
  25 in total

1.  Primer3 on the WWW for general users and for biologist programmers.

Authors:  S Rozen; H Skaletsky
Journal:  Methods Mol Biol       Date:  2000

Review 2.  Mining microsatellites in eukaryotic genomes.

Authors:  Prakash C Sharma; Atul Grover; Günter Kahl
Journal:  Trends Biotechnol       Date:  2007-10-22       Impact factor: 19.536

3.  Patterns of molecular evolution in avian microsatellites.

Authors:  C R Primmer; H Ellegren
Journal:  Mol Biol Evol       Date:  1998-08       Impact factor: 16.240

Review 4.  Advances in livestock genomics: opening the barn door.

Authors:  James E Womack
Journal:  Genome Res       Date:  2005-12       Impact factor: 9.043

5.  Microsatellites in different eukaryotic genomes: survey and analysis.

Authors:  G Tóth; Z Gáspári; J Jurka
Journal:  Genome Res       Date:  2000-07       Impact factor: 9.043

6.  Differential distribution of simple sequence repeats in eukaryotic genome sequences.

Authors:  M V Katti; P K Ranjekar; V S Gupta
Journal:  Mol Biol Evol       Date:  2001-07       Impact factor: 16.240

7.  From RNA-seq to large-scale genotyping - genomics resources for rye (Secale cereale L.).

Authors:  Grit Haseneyer; Thomas Schmutzer; Michael Seidel; Ruonan Zhou; Martin Mascher; Chris-Carolin Schön; Stefan Taudien; Uwe Scholz; Nils Stein; Klaus Fx Mayer; Eva Bauer
Journal:  BMC Plant Biol       Date:  2011-09-28       Impact factor: 4.215

8.  Microsatellite markers of water buffalo, Bubalus bubalis--development, characterisation and linkage disequilibrium studies.

Authors:  Muniyandi Nagarajan; Niraj Kumar; Gopala Nishanth; Ramachandran Haribaskar; Karthikeyani Paranthaman; Jalaj Gupta; Manish Mishra; R Vaidhegi; Shantanu Kumar; Amresh K Ranjan; Satish Kumar
Journal:  BMC Genet       Date:  2009-10-21       Impact factor: 2.797

9.  A first generation whole genome RH map of the river buffalo with comparison to domestic cattle.

Authors:  M Elisabete J Amaral; Jason R Grant; Penny K Riggs; Nedenia B Stafuzza; Edson A Rodrigues Filho; Tom Goldammer; Rosemarie Weikard; Ronald M Brunner; Kelli J Kochan; Anthony J Greco; Jooha Jeong; Zhipeng Cai; Guohui Lin; Aparna Prasad; Satish Kumar; G Pardha Saradhi; Boby Mathew; M Aravind Kumar; Melissa N Miziara; Paola Mariani; Alexandre R Caetano; Stephan R Galvão; Madhu S Tantia; Ramesh K Vijh; Bina Mishra; S T Bharani Kumar; Vanderlei A Pelai; Andre M Santana; Larissa C Fornitano; Brittany C Jones; Humberto Tonhati; Stephen Moore; Paul Stothard; James E Womack
Journal:  BMC Genomics       Date:  2008-12-24       Impact factor: 3.969

10.  Genomic conservation of cattle microsatellite loci in wild gaur (Bos gaurus) and current genetic status of this species in Vietnam.

Authors:  Trung Thanh Nguyen; Sem Genini; Linh Chi Bui; Peter Voegeli; Gerald Stranzinger; Jean-Paul Renard; Jean-Charles Maillard; Bui Xuan Nguyen
Journal:  BMC Genet       Date:  2007-11-06       Impact factor: 2.797
