Abstract
BACKGROUND: Sequencing of the approximately 1.7 billion bases of the zebrafish genome is currently underway. To date, few high-resolution genetic maps exist for the zebrafish genome, and those that do are based mainly on single nucleotide polymorphisms (SNPs) and short microsatellite repeats. The goal of a higher-resolution genetic map led to the construction of a database of tandemly repeating elements within the zebrafish Zv8 assembly. DESCRIPTION: Exact tandem repeats with a repeat length of at least three bases and a copy number of at least 10 were reported. Repeats with a total length of 250 bases or fewer, together with their flanking regions, were masked for known vertebrate repeats. Optimal primer pairs were computationally designed in the regions flanking the detected repeats. This database of exact tandem repeats can serve as a resource for molecular biologists interested in experimentally testing variable number tandem repeats (VNTRs) within a zebrafish population.
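The DESCRIPTION criteria can be made concrete with a brute-force scanner. The sketch below is a minimal illustration, not the software used to build the database: it reports exact, uninterrupted runs of a unit of at least three bases with at least 10 copies, as stated above. The function names, the `max_unit=60` cap and the rule of keeping only primitive units (so that (CAG)20 is not also reported as (CAGCAG)10) are assumptions of this sketch, and it performs no repeat masking or primer design.

```python
def is_primitive(unit):
    """True if unit is not itself a repeat of a shorter unit (CAGCAG is not)."""
    return (unit + unit).find(unit, 1) == len(unit)


def find_exact_tandem_repeats(seq, min_unit=3, max_unit=60, min_copies=10):
    """Return (start, unit, copies) for each maximal exact tandem run.

    Hypothetical helper: scans every unit length from min_unit to max_unit
    and keeps primitive units with at least min_copies exact copies.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for k in range(min_unit, max_unit + 1):
        i = 0
        # A qualifying run starting at i needs at least k * min_copies bases.
        while i + k * min_copies <= n:
            unit = seq[i:i + k]
            copies = 1
            j = i + k
            # Extend the run while the next k bases repeat the unit exactly.
            while seq[j:j + k] == unit:
                copies += 1
                j += k
            if copies >= min_copies and "N" not in unit and is_primitive(unit):
                hits.append((i, unit, copies))
                i = j  # resume after the run so it is reported only once
            else:
                i += 1
    return hits
```

For example, `find_exact_tandem_repeats("GATTACA" + "CAG" * 12 + "TTGCA")` returns `[(7, "CAG", 12)]`, while nine copies of CAG fall below the copy-number threshold and yield an empty list. The scan is brute force and is not optimized for whole-genome use.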
Year: 2010 PMID: 20515480 PMCID: PMC2901318 DOI: 10.1186/1471-2164-11-347
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Entity-relationship model for the zebrafish repeats database.
Frequency of tandem repeats within the zebrafish genome Zv8 assembly by base length
| Unit length | Count | Unit length | Count | Unit length | Count |
|---|---|---|---|---|---|
| 1-mers | N/A | 19-mers | 5 | 37-mers | 3 |
| 2-mers | N/A | 20-mers | 6 | 38-mers | 1 |
| 3-mers | 37,383 | 21-mers | 7 | 39-mers | 2 |
| 4-mers | 67,313 | 22-mers | 9 | 40-mers | 0 |
| 5-mers | 11,767 | 23-mers | 7 | 41-mers | 2 |
| 6-mers | 93 | 24-mers | 18 | 42-mers | 0 |
| 7-mers | 1 | 25-mers | 4 | 43-mers | 1 |
| 8-mers | 10 | 26-mers | 7 | 44-mers | 2 |
| 9-mers | 1 | 27-mers | 40 | 45-mers | 2 |
| 10-mers | 5 | 28-mers | 5 | 46-mers | 5 |
| 11-mers | 5 | 29-mers | 4 | 47-mers | 0 |
| 12-mers | 6 | 30-mers | 117 | 48-mers | 1 |
| 13-mers | 3 | 31-mers | 10 | 49-mers | 1 |
| 14-mers | 16 | 32-mers | 5 | 50-mers | 1 |
| 15-mers | 6 | 33-mers | 1 | 51-mers | 1 |
| 16-mers | 16 | 34-mers | 2 | 52-mers | 0 |
| 17-mers | 5 | 35-mers | 2 | 53-mers | 1 |
| 18-mers | 21 | 36-mers | 9 | 54-mers | 1 |
Figure 2. Linkage group localization of repeats with a base length of 18, 24, 27 and 30.
Figure 3. Distribution of repeats with a base length of 18, 24, 27 and 30 among linkage groups.
Repeats of length 18 with each of the six possible protein translations
| GP | Repeat base | Translation 1 | Translation 2 | Translation 3 | Translation 4 | Translation 5 | Translation 6 | # |
|---|---|---|---|---|---|---|---|---|
| 1 | ACCCTCCAGAGCTGCCAG | PPELPD | VWQLWR | PSRAAR | LQSCQT | GLAALE | GSSGGS | 10 |
| 1 | AGCTCTGGAGGGTCTGGC | PPELPD | VWQLWR | PSRAAR | LQSCQT | GLAALE | GSSGGS | 3 |
| 1 | ACTGGAGGGTCTGGCAGC | PPVLPD | VWQHWR | PSSAAR | LQCCQT | ALEGLA | GSTGGS | 1 |
| 1 | AGCTCTGGCGGGTCTGGC | PPELPD | VWQLWR | PARAAR | RQSCQT | GLAALA | SGSSGG | 1 |
| 1 | ACCCTCCAGTGCTGCCAG | PPVLPD | VHQHWR | PSSAAR | LQCCQT | GLAALE | SGSTGG | 2 |
| 1 | ACCCGCCAGAGCTGCCAG | PPELPD | VWQLWR | PARAAR | RQSCQT | GLAALA | GSSGGS | 1 |
| 2 | ACGCCGCAGCCAGAGTCG | CGVDSG | AASTLA | RRRLWL | ARVDAA | QPESTP | SQSRRR | 1 |
| 3 | ATCGTGGCCCCCTCGTCC | RPSWPP | VHRGPL | SIVAPS | RGPRWT | RGGHDG | EGATMD | 1 |
| 4 | CCCTGTGGTGCTGTGTGT | CGAVCP | VVLCVP | WCCVSL | GTHSTT | QGHTAP | RDTQHH | 1 |
GP: grouping; #: number of instances found in the Zv8 assembly. Note: each repeat base is written as its lexicographically smallest rotation.
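Because 18, 24, 27 and 30 are all multiples of three, a tandem array of such a unit reads as a periodic peptide in each of the six reading frames, and shifting the frame by one base is equivalent to rotating the unit by one base. The sketch below assumes the standard genetic code (the source does not name a code table) and uses hypothetical function names. The tables may list each peptide in a different rotation and column order: for the first 18-mer, frame 1 of this sketch gives TLQSCQ, which appears in the table as its rotation LQSCQT.

```python
# Standard genetic code (NCBI table 1); codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons; '*' marks a stop codon."""
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))


def six_frame_repeat_peptides(unit):
    """Peptide encoded by one period of a tandem repeat in each of six frames.

    Frames 1-3 read the given strand at offsets 0, 1, 2; frames 4-6 read the
    reverse complement at offsets 0, 1, 2. Requires len(unit) % 3 == 0.
    """
    if len(unit) % 3:
        raise ValueError("unit length must be a multiple of 3")
    peptides = []
    for strand in (unit, reverse_complement(unit)):
        for shift in range(3):
            rotated = strand[shift:] + strand[:shift]  # frame shift == rotation
            peptides.append(translate(rotated))
    return peptides
```

For the first 18-mer, `six_frame_repeat_peptides("ACCCTCCAGAGCTGCCAG")` returns `["TLQSCQ", "PSRAAR", "PPELPD", "LAALEG", "WQLWRV", "GSSGGS"]`; each of the six peptides in that table row is one of these up to rotation.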
Repeats of length 24 with each of the six possible protein translations
| GP | Repeat base | Translation 1 | Translation 2 | Translation 3 | Translation 4 | Translation 5 | Translation 6 | # |
|---|---|---|---|---|---|---|---|---|
| 1 | ACGCTCCAGGCCCTCCGCAGCTCC | PGPPQLHA | AWSCGGPG | SRPSAAPR | QALRSSTL | ERGAAEGL | SVELRRAW | 6 |
| 1 | AGCGTGGAGCTGCGGAGGGCCTGG | PGPPQLHA | AWSCGGPG | SRPSAAPR | QALRSSTL | ERGAAEGL | SVELRRAW | 4 |
| 1 | ACGCCCCAGGCCCTCCGCAGCTCC | PGPPQLHA | AWSCGGPG | PRPSAAPR | QALRSSTP | GRGAAEGL | GVELRRAW | 1 |
| 1 | AGCTGCGGAGGGCCTGGGGCGTGG | PGPPQLHA | AWSCGGPG | PRPSAAPR | QALRSSTP | LRGAAELG | GVELRRAW | 1 |
| 1 | ACGCTCCAGGCCCTCGGCAGCTGC | PGPRQLHA | TLQALGSC | SRPSAAAR | GACSCRGP | ERAAAELG | SVQLPRAW | 1 |
| 2 | AAGCCCGAGGCGACGCCATTGGAG | GLLQWRRL | EATPLEKP | RRRHWRSP | GDAIGEAR | SGFSNGVA | RASPMASP | 1 |
| 2 | AAGGCCGAGGCGACGCCATTGGAG | GLLQWRRL | EATPLEKA | RRRHWRRP | GDAIGEGR | SAFSNGVA | MASPRPSP | 1 |
| 3 | AAGCGGATTTTTGACGCGCGAGTG | *SGFLTRE | EADF*RAS | KRIFDARV | LARQKSAS | HSRVKNPL | TRASKIRF | 1 |
| 4 | AAGCGCCGGTGAGCCCTCGCCCTC | ALEGEGSP | RLRARAHR | A*GRGLTG | R*ALALKR | AGEPSPSS | PVSPRPQA | 1 |
| 5 | AAGCTCAGGCGGCGGCCATTCAGG | GGGHSGSS | AAAIQEAQ | RRPFRKLR | *AS*MAAA | PELPEWPP | LSFLNGRR | 1 |
GP: grouping; #: number of instances found in the Zv8 assembly. Note: each repeat base is written as its lexicographically smallest rotation.
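The lexicographic reordering mentioned in the table footnotes is consistent with writing each repeat base as its lexicographically smallest rotation, since a tandem repeat has no fixed starting base. The sketch below assumes that convention; the helper names are hypothetical. It also shows that some rows within a group are the same repeat read on opposite strands: the reverse complement of the first group-1 18-mer, rotated canonically, is exactly the second group-1 18-mer, which is consistent with the two rows sharing the same six translations.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def smallest_rotation(unit):
    """Lexicographically smallest rotation of a repeat unit (naive, O(k^2))."""
    return min(unit[i:] + unit[:i] for i in range(len(unit)))


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def opposite_strand_unit(unit):
    """The same tandem repeat read on the other strand, in canonical rotation."""
    return smallest_rotation(reverse_complement(unit))


# Any rotation of a unit maps to the same canonical form.
print(smallest_rotation("CCCTCCAGAGCTGCCAGA"))  # ACCCTCCAGAGCTGCCAG
print(opposite_strand_unit("ACCCTCCAGAGCTGCCAG"))  # AGCTCTGGAGGGTCTGGC
```

The naive `min` over all rotations is adequate for units of at most a few dozen bases; Booth's algorithm finds the same rotation in linear time if that ever matters.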
Repeats of length 27 with each of the six possible protein translations
| GP | Repeat base | Translation 1 | Translation 2 | Translation 3 | Translation 4 | Translation 5 | Translation 6 | # |
|---|---|---|---|---|---|---|---|---|
| 1 | AACACAGCTGATAAGAACTCGCGCGAC | VLISCVVAR | RATTQLIRT | SRDNTADKN | FLSAVLSRE | SSYQLCCRA | LARQHS**E | 2 |
| 1 | AACACAGCTGATCAGTACGCGCGCGAC | VLISCVVAR | RATTQLIST | ARDNTADQY | Y*SAVLSRA | RTDQLCCRA | RARQHS*SV | 1 |
| 1 | AGCTGTGTTGTCGCGCGCGTTATGATC | VMISCVVAR | RATTQLIIT | ARDNTADHN | L*SAVLSRA | RYDQLCCRA | RARQHS*S* | 2 |
| 1 | AGCTGTGTTGTCGCGCGCGTTCTGATC | VLISCVVAR | RATTQLIRT | ARDNTADQN | F*SAVLSRA | RSDQLCCRA | RARQHS*SE | 1 |
| 1 | AATAGCGGTGTTGTCGCGCGCGTTCTG | VLNSGVVAR | RATTPLFRT | ARDNTAIQN | F*IAVLSRA | RSE*RCCRA | RARQHRYSE | 1 |
| 1 | AACACCGCTCTTCAGTACGCGCGCGAC | VLKSGVVAR | RATTPLFST | ARDNTALQY | Y*RAVLSRA | RTEERCCRA | RARQHRSSV | 1 |
| 1 | AGCTGTGTTGTCGCGCGAGTTCTTATC | VLISCVVAR | RATTQLIRT | SRDNTADKN | FLSAVLSRE | SSYQLCCRA | LARQHS**E | 3 |
| 1 | AACACAGCTGATAAGAACGCGCGCGAC | VLISCVVAR | RATTQLIRT | ARDNTADKN | FLSAVLSRA | RSYQLCCRA | RARQHS**E | 1 |
| 1 | AACACCGCTACTCAGTACGCGCGCGAC | VLSSGVVAR | RATTPLLST | ARDNTATQY | Y*VAVLSRA | RTE*RCCRA | RARQHRYSV | 1 |
| 1 | AACACAGCTGATCAGAACGCGCGCGAC | VLISCVVAR | RATTQLIRT | ARDNTADQN | VLSRAF*SA | RSDQLCCRA | RARQHS*SE | 1 |
| 1 | AACACAGCTGACAAGAACTCGCGCGAC | VLVSCVVAR | RATTQLTRT | SRDNTADKN | FLSAVLSRE | SSCQLCCRA | LARQHS*QE | 1 |
| 2 | ACCCAGGCTCCTCGCCCTGCCGGCGCC | LALPAPPRL | SPCRRHPGS | RPAGATQAP | RSLGGAGRA | GAWVAPAGR | EPGWRRQGE | 1 |
| 2 | ACCCAGACGTCTCGCCCTGCCGGCGCC | LALPAPPRR | SPCRRHPDV | RPAGATQTS | RRLGGAGRA | DVWVAPAGR | TSGWRRQGE | 1 |
| 2 | ACGTCTGGGTGGCGCCGGCAGGGCGAG | LALPAPPRR | SPCRRHPDV | RPAGATQTS | RRLGGAGRA | DVWVAPAGR | TSGWRRQGE | 1 |
| 2 | ACGTCTGGGTGGCGCCGGCTGGGCGAG | LAQPAPPRR | SPSRRHPDV | RPAGATQTS | RRLGGAGWA | DVWVAPAGR | TSGWRRLGE | 1 |
| 2 | ACAGGCCTCCAGCCCAGCCGGCTCCCC | PAQPAPHRP | GSRLGWRPV | SPAGSPQAS | GGLWGAGWA | LGWRPVGSR | PAGLEACGE | 1 |
| 3 | AATGGCCGCCGCCTCCTGAGCTTCCTG | LPEWPPPPE | SSGGGGHSG | LRRRRPFRK | S*MAAAS*A | AQEAAAIQE | FLNGRRLLS | 1 |
| 3 | AAGCTCAGGAGGCGGCGGCCATTCAGG | LPEWPPPPE | SSGGGGHSG | LRRRRPFRK | S*MAAAS*A | AQEAAAIQE | FLNGRRLLS | 2 |
| 3 | AGCTCAGGCGGCGGCGGCCATTCAGGG | LPEWPPPPE | SSGGGGHSG | LRRRRPFRE | P*MAAAA*A | AQAAAAIQG | SLNGRRRLS | 1 |
| 3 | AGCGAGCTCGGGAGGCGGCGGCCATTC | LAEWPPPPE | SSGGGGHSA | LGRRRPFSE | R*MAAASRA | AREAAAIQR | SLNGRRLPS | 1 |
| 3 | AATGGCCGCCGCCGCCTGAGCTTCCTG | LPEWPPPPE | SSGGGGHSG | LRRRRPFRK | S*MAAAA*A | AQAAAAIQE | FLNGRRRLS | 3 |
| 3 | AAGCTCAGGAGGCGGCGGCCGTTCAGG | LPERPPPPE | SSGGGGRSG | LRRRRPFRK | S*TAAAS*A | AQEAAAVQE | FLNGRRLLS | 3 |
| 3 | AATGGCCGCCGCCGCCTGAGCTCCCTG | LPEWPPPPE | SSGGGGHSG | LRRRRPFRE | P*MAAAA*A | AQAAAAIQG | LNGRRRLSS | 1 |
| 3 | AACGGCCGCCGCCTCCTGAACTCCCTG | LPERPPPPE | SSGGGGRSG | FRRRRPFRE | P*TAAAS*T | VQEAAAVQG | LNGRRLLNS | 2 |
| 4 | AAGACCAGAGGGGAGCCGGCGGGGCTG | G*RPEGSRR | AGGAEDQRG | LKTRGEPAG | PRRLPSGLQ | PAGSPLVFS | PPAPLWSSA | 1 |
| 4 | CCCCTCTGGTCTCCTGCCCTGCCGGCT | GRRPEGSRQ | AGRAGDQRG | QETRGEPAG | PCRLPSGLL | PAGSPLVSC | LPAPLWSPA | 1 |
| 5 | AACTCTATTGAGTGTCAGGTCATCTCC | SSPTLLSVR | HLQLY*VSG | ISNSIECQV | DLTNRVGD | T*HSIELEM | PDTQ*SWR* | 1 |
| 6 | AAAGCAACAACTCCACCAACATCAGCT | QLHQHQLKQ | NSTNIS*SN | TPPTSAKAT | CCFS*CWWS | VALADVGGV | LL*LMLVEL | 1 |
| 7 | AACAGCAGAGCCATTCACTGACCCCAG | H*PQNSRAI | TDPRTAEPF | LTPEQQSHS | *MALLFWGQ | EWLCCSGVS | NGSAVLGSV | 1 |
| 8 | AAGGAGCCGGGCTGGGGCCGGCAGGGC | AGRARSRAG | PAGQGAGLG | RQGKEPGWG | APARLLALP | PQPGSLPCR | PSPAPCPAG | 1 |
GP: grouping; #: number of instances found in the Zv8 assembly. Note: each repeat base is written as its lexicographically smallest rotation.
Repeats of length 30 with each of the six possible protein translations
| GP | Repeat base | Translation 1 | Translation 2 | Translation 3 | Translation 4 | Translation 5 | Translation 6 | # |
|---|---|---|---|---|---|---|---|---|
| 1 | AGCCCCTGAGCGCCCTCCAGTGTCGGCTCC | APAPERPPVS | DTGGRSGAGA | RHWRALRGWS | RLQPLSALQC | GSSP*APSSV | TLEGAQGLEP | 32 |
| 1 | AGAGCGCCCGCCAGTGTCGGCTCCAGCCCC | APAPERPPVS | DTGGRSGAGA | RHWRALWGWS | RLQPQSARQC | GSSPRAPASV | TLAGALGLEP | 12 |
| 1 | ACACTGGAGGGCGCTCAGGGGCTGGAGCCG | APAPERPPVS | DTGGRSGAGA | RHWRALRGWS | RLQPLSALQC | GSSP*APSSV | TLEGAQGLEP | 16 |
| 1 | ACACTGGCGGGCGCTCTGGGGCTGGAGCCG | APAPERPPVS | DTGGRSGAGA | RHWRALWGWS | RLQPQSARQC | GSSPRAPASV | TLAGALGLEP | 12 |
| 1 | ACACTGGAGGGCGCTCTGGGGCTGGAGCCG | APAPERPPVS | DTGGRSGAGA | RHWRALWGWS | RLQPQSALQC | GSSPRAPSSV | TLEGALGLEP | 15 |
| 1 | AGAGCGCCCTCCAGTGTCGGCTCCAGCCCC | APAPERPPVS | DTGGRSGAGA | RHWRALWGWS | RLQPQSALQC | GSSPRAPSSV | TLEGALGLEP | 14 |
| 1 | ACACTGGCGGGCGCTCGGGGGCTGGAGCCG | APAPERPPVS | DTGGRSGAGA | RHWRALGGWS | RLQPPSARQC | GSSPRAPASV | TLAGARGLEP | 2 |
| 2 | ACTGGAGGCAGCTGGACTGGAGCCGGCGGG | APVQLPPVPP | AGGTGGSWTG | PAGLEAAGLE | RRDWRQLDWS | LQSSCLQSRR | SSPAASSPAG | 2 |
| 2 | ACAGGAGGCAGCTGGGCTGGAGCCGGCGGG | APAQLPPVPP | AGGTGGSWAG | PAGQEAAGLE | RRDRRQLGWS | LQPSCLLSRR | SSPAASCPAG | 1 |
| 2 | AGCTGCCTCCAGTCCCGCCGGCTCCTGCCC | APAQLPPVPP | AGGTGGSWAG | PAGLEAAGQE | RRDWRQLGRS | LLPSCLQSRR | SSPAGSCPAA | 1 |
| 2 | ACTGGAGGCAGCTGGGCAGGAGCCGGCGGG | APAQLPPVPP | AGGTGGSWAG | PAGLEAAGQE | RRDWRQLGRS | LLPSCLQSRR | SSPAGSCPAA | 1 |
| 3 | AAGATGGCCGACTCCAGTCCTCCAGCTCAC | QLTRWPTPVL | SSQDGRLQSS | AHKMADSSP | WRTGVGHLVS | GGLESAIL*A | EDWSRPSCEL | 7 |
| 3 | ACTGGAGTCGGCCATCTTGTGAGCTGGAGG | QLTRWPTPVL | SSQDGRLQSS | AHKMADSSP | WRTGVGHLVS | GGLESAIL*A | EDWSRPSCEL | 1 |
| 4 | AAACTGCCGCAAGGCTCCAAATACTTCTCC | KLPQGSKYFS | NCRKAPNTSP | TAARLQILLQ | LEKYLEPCGS | WRSIWSLAAV | GEVFGALRQF | 1 |
GP: grouping; #: number of instances found in the Zv8 assembly. Note: each repeat base is written as its lexicographically smallest rotation.
Figure 4. Distribution of repeats across linkage groups. The x-axis divides each linkage group into bins of 5% of its length; the y-axis gives the percentage of all repeats falling within each bin.
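The binning described in the Figure 4 caption can be sketched as follows: 20 equal-width bins give the 5% increments, and each repeat's start position is expressed relative to the length of its linkage group. Pooling repeats from all linkage groups into a single histogram is an assumption of this sketch, as are the function name and the example positions, which are hypothetical.

```python
def relative_position_histogram(repeats, n_bins=20):
    """Percentage of repeats in each equal-width bin along its linkage group.

    repeats: iterable of (start, linkage_group_length) pairs, in bases.
    With n_bins=20 each bin spans 5% of a linkage group's length, and
    repeats from all linkage groups are pooled into one histogram.
    """
    counts = [0] * n_bins
    total = 0
    for start, lg_length in repeats:
        # Integer arithmetic avoids float edge effects at bin boundaries;
        # min() keeps a repeat at the very end inside the last bin.
        b = min(start * n_bins // lg_length, n_bins - 1)
        counts[b] += 1
        total += 1
    if total == 0:
        return [0.0] * n_bins
    return [100.0 * c / total for c in counts]


# Hypothetical repeats on a 1,000-base linkage group.
hist = relative_position_histogram([(0, 1000), (49, 1000), (50, 1000), (999, 1000)])
print(hist[0], hist[1], hist[19])  # 50.0 25.0 25.0
```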
Comparison of exact tandem repeats within various eukaryotic genomes
| Genome | Build | Genome size (Mb) | All | 3-mers | 4-mers | 5-mers | 6-mers | 18-, 24-, 27- and 30-mers | Long | Frequency |
|---|---|---|---|---|---|---|---|---|---|---|
| zebrafish | Zv8 | 1480 | 116,915 | 37,383 | 67,313 | 11,767 | 93 | 191 | 54 | 1/12,659 |
| mouse | Mm8 | 2600 | 57,145 | 16,022 | 33,430 | 5,186 | 2,066 | 17 | 68 | 1/45,498 |
| rat | Rn4 | 2800 | 41,422 | 16,746 | 22,077 | 1,213 | 930 | 35 | 70 | 1/67,597 |
| fugu | Fr1 | 395 | 3,785 | 997 | 2,375 | 366 | 33 | 0 | 48 | 1/104,359 |
| tetraodon | TetNig1 | 350 | 1,808 | 1,287 | 419 | 82 | 14 | 0 | 12 | 1/193,584 |
Long: length, in bases, of the longest repeat unit detected. Frequency: one repeat per N bases, where N is the genome size in bases divided by the total number of repeats (All).
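The Frequency column can be reproduced from the genome size and the total repeat count, assuming 1 Mb = 10^6 bases and rounding to the nearest whole base. The dictionary values below are copied from the table; the function name is hypothetical.

```python
# Genome size (Mb) and total exact tandem repeats ("All"), copied from the table.
GENOMES = {
    "zebrafish": (1480, 116_915),
    "mouse": (2600, 57_145),
    "rat": (2800, 41_422),
    "fugu": (395, 3_785),
    "tetraodon": (350, 1_808),
}


def bases_per_repeat(genome_size_mb, n_repeats):
    """N in '1/N': genome size in bases divided by the repeat count, rounded."""
    return round(genome_size_mb * 1_000_000 / n_repeats)


for name, (size_mb, n_repeats) in GENOMES.items():
    print(f"{name}: 1/{bases_per_repeat(size_mb, n_repeats):,}")
```

Running the loop prints 1/12,659 for zebrafish through 1/193,584 for tetraodon, matching the Frequency column, so zebrafish Zv8 carries roughly 3.6 times as many exact tandem repeats per base as mouse Mm8 under these criteria.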