Wenlei Fan, Lingyang Xu, Hong Cheng, Ming Li, Hehe Liu, Yong Jiang, Yuming Guo, Zhengkui Zhou, Shuisheng Hou.
Abstract
Short tandem repeats (STRs) are often associated with genetic diseases and gene regulatory functions, and are also important genetic markers for evolutionary, genetic diversity, and forensic analyses. However, for the majority of STRs in the duck genome, population genetic properties and functional impacts remain poorly defined. The recent advent of next generation sequencing (NGS) has offered an opportunity to profile large numbers of polymorphic STRs. Here, we report a population-scale analysis of STR variation using genome resequencing in mallard and Pekin duck. Our analysis provided the first genome-wide duck STR reference, comprising 198,022 STR loci with motif sizes of 2-6 base pairs. We observed a relatively uneven distribution of STRs across genomic regions, which indicates that the occurrence of STRs in the duck genome is not random but is subject to directional selection pressure. Using genome resequencing data from 23 mallards and 26 Pekin ducks, we identified 89,891 polymorphic STR loci. Intensive analysis of this dataset suggested that a shorter repeat motif, a longer reference tract length, higher purity, and location outside a coding region are all associated with increased STR variability. STR genotypes were used for population genetic analysis, and the results showed that population structure and divergence patterns among population groups can be efficiently captured. In addition, comparison between Pekin duck and mallard identified 3,122 STRs with extremely divergent allele frequencies, which overlapped with a set of genes related to the nervous system, energy metabolism, and behavior. The evolutionary analysis revealed that genes containing divergent STRs may play important roles in phenotypic changes during duck domestication. This population-scale analysis of STR variation provides a valuable resource for future studies of genetic diversity and genome evolution in duck.
Keywords: duck; population genetics; short tandem repeat; variation; whole genome resequencing
Year: 2018 PMID: 30425731 PMCID: PMC6218588 DOI: 10.3389/fgene.2018.00520
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Distribution and composition of the duck STR reference with respect to motif length.
| Motif length (bp) | Number of loci | % in reference | Abundance (No./Mb) | Common motifs (% in each category) |
|---|---|---|---|---|
| Di- | 54,347 | 27.44 | 49.95 | AT(44.81), AC(23.13), CA(14.09) |
| Tri- | 31,711 | 16.01 | 29.15 | AAT(26.95), AAC(23.40), ATT(9.03) |
| Tetra- | 76,100 | 38.43 | 69.95 | AAAC(26.91), AAAT(22.52), ATTT(8.21) |
| Penta- | 25,750 | 13.00 | 23.67 | AAAAC(23.62), AAAAT(10.76), ACAAA(5.98) |
| Hexa- | 10,114 | 5.11 | 9.30 | AAAAAT(10.61), AAAAAC(7.39), AAAACA(3.64) |
| Total | 198,022 | 100.00 | 182.02 | |
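The two derived columns of this table can be reproduced from the locus counts alone. The sketch below is illustrative rather than the authors' procedure: the assembly length is not stated in this section, so it is back-calculated from the rounded total abundance, and the variable names are hypothetical.

```python
# Locus counts per motif class, copied from the table above.
loci = {"Di-": 54_347, "Tri-": 31_711, "Tetra-": 76_100,
        "Penta-": 25_750, "Hexa-": 10_114}
total_loci = sum(loci.values())

# Share of the reference contributed by each motif class, in percent.
percent = {motif: 100 * n / total_loci for motif, n in loci.items()}

# ASSUMPTION: the assembly length is not given in this section, so it is
# back-calculated from the rounded "Total" abundance of 182.02 loci per Mb.
implied_genome_mb = total_loci / 182.02

# Abundance (loci per Mb) for each motif class under that assumption.
abundance = {motif: n / implied_genome_mb for motif, n in loci.items()}

for motif in loci:
    print(f"{motif:7s} {percent[motif]:6.2f} % {abundance[motif]:7.2f} per Mb")
```

Because the total abundance is itself rounded, the recomputed per-class abundances can differ from the tabulated values in the last decimal place.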
FIGURE 1. Genomic landscape of STRs in the duck genome. Tracks from outside to inside are: I, chromosomes in different colors; II-VI, STR density for di-, tri-, tetra-, penta-, and hexa-nucleotide repeats. STR density in nonoverlapping 1 Mb windows is color coded from red to blue, with deeper blue representing higher STR density. VII and VIII, heterozygosity of STRs in Pekin duck and mallard, respectively, averaged within 500 kb sliding windows with a 50 kb step.
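The windowed averaging described for tracks VII and VIII can be sketched as follows. This is a minimal illustration assuming 0-based locus positions on a single chromosome and per-locus heterozygosity values already computed; the function name, the truncation of windows at the chromosome end, and the skipping of empty windows are assumptions, not details from the paper.

```python
def sliding_window_means(positions, values, chrom_length,
                         window=500_000, step=50_000):
    """Return (start, end, mean) for every window that holds at least one locus.

    positions : 0-based locus coordinates on one chromosome (assumed)
    values    : per-locus heterozygosity, same order as positions
    """
    results = []
    start = 0
    while start < chrom_length:
        # Windows near the chromosome end are truncated (an assumption).
        end = min(start + window, chrom_length)
        in_window = [v for p, v in zip(positions, values) if start <= p < end]
        if in_window:  # empty windows are skipped rather than reported as 0
            results.append((start, end, sum(in_window) / len(in_window)))
        start += step
    return results


# Toy data: three loci on a hypothetical 600 kb chromosome.
toy_windows = sliding_window_means(
    positions=[100_000, 520_000, 560_000],
    values=[0.2, 0.4, 0.6],
    chrom_length=600_000,
)
```

The linear scan per window keeps the sketch short; for genome-scale data a sorted position array with bisection would be the more practical choice.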
Distribution of duck STRs in different genomic regions.
| Motif length (bp) | Intergenic | Intronic | 3′-UTR | 5′-UTR | Exonic | Noncoding exon |
|---|---|---|---|---|---|---|
| Di- | 32,136 | 20,409 | 962 | 226 | 189 | 425 |
| Tri- | 19,339 | 10,355 | 405 | 198 | 1,113 | 301 |
| Tetra- | 48,620 | 25,894 | 709 | 200 | 121 | 556 |
| Penta- | 16,560 | 8,503 | 328 | 103 | 52 | 204 |
| Hexa- | 6,332 | 3,296 | 116 | 46 | 235 | 89 |
| Total | 122,987 | 68,457 | 2,520 | 773 | 1,710 | 1,575 |
FIGURE 2. Distribution of STR loci in different genomic regions. (A) Percentage of STRs with various motif sizes. (B) Distribution of tri-nucleotide repeats with various GC content.
Overview of STRs genotyped in the duck population.
| Motif length (bp) | Number genotyped | Number passed filter | Number polymorphic |
|---|---|---|---|
| Di- | 52,164 | 38,519 | 25,624 |
| Tri- | 30,394 | 22,948 | 14,944 |
| Tetra- | 73,503 | 58,206 | 36,302 |
| Penta- | 22,881 | 16,333 | 10,271 |
| Hexa- | 8,932 | 5,283 | 2,750 |
| Total | 187,874 | 141,289 | 89,891 |
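As a quick reading aid, the share of filtered loci that turned out to be polymorphic can be computed per motif class from the last two columns. The snippet only rearranges the tabulated counts; the variable names are hypothetical and no statistical test is implied.

```python
# Counts copied from the "Number passed filter" and "Number polymorphic" columns.
passed = {"Di-": 38_519, "Tri-": 22_948, "Tetra-": 58_206,
          "Penta-": 16_333, "Hexa-": 5_283}
polymorphic = {"Di-": 25_624, "Tri-": 14_944, "Tetra-": 36_302,
               "Penta-": 10_271, "Hexa-": 2_750}

# Fraction of loci passing the filter that are polymorphic, per motif class.
poly_fraction = {motif: polymorphic[motif] / passed[motif] for motif in passed}

# The same fraction over all motif classes combined.
overall_fraction = sum(polymorphic.values()) / sum(passed.values())

for motif, frac in poly_fraction.items():
    print(f"{motif:7s} {frac:.3f}")
print(f"{'All':7s} {overall_fraction:.3f}")
```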
FIGURE 3. Patterns of STR variation. (A) Frequency distribution of the common allele number per locus. (B) Frequency distribution of the common allele number per locus, stratified by motif length. (C,D) STR variability is positively correlated with sequence purity and reference tract length, and negatively correlated with motif length. The curves were smoothed by averaging the data points over a sliding window of ±5 bp. (E) Heterozygosity distribution of STRs located in different genomic regions. The red interior point indicates the median heterozygosity and the blue interior point indicates the average heterozygosity.
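This section does not state how per-locus heterozygosity was calculated. As a hedged illustration, the sketch below shows two standard definitions for a diploid STR locus: expected heterozygosity, 1 minus the sum of squared allele frequencies, and observed heterozygosity, the fraction of heterozygous individuals. The function names and the genotype encoding as pairs of repeat lengths are assumptions.

```python
from collections import Counter


def expected_heterozygosity(genotypes):
    """Expected heterozygosity, 1 - sum(p_i^2), at one locus.

    genotypes : list of (allele_a, allele_b) pairs, one per diploid
                individual, with alleles encoded as repeat lengths (assumed).
    """
    alleles = [allele for pair in genotypes for allele in pair]
    n = len(alleles)
    counts = Counter(alleles)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


# Toy locus: four individuals, two alleles (10 and 12 repeats).
toy_genotypes = [(10, 10), (10, 12), (12, 12), (10, 12)]
toy_he = expected_heterozygosity(toy_genotypes)
toy_ho = observed_heterozygosity(toy_genotypes)
```

In the toy locus both alleles have frequency 0.5 and two of four individuals are heterozygous, so the two measures coincide; in real data they generally differ.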
FIGURE 4. Evaluation of genetic diversity with polymorphic STRs. (A) Genetic diversity of the 10% most heterozygous autosomal loci in Pekin ducks and mallards. The box extends from the lower to the upper quartile of the heterozygosity distribution, and the interior line indicates the median. (B) The first two principal components based on an analysis of genetic variation at 43,939 autosomal STR loci. (C) The first two principal components based on autosomal SNPs.