M Humberto Reyes-Valdés, Amalio Santacruz-Varela, Octavio Martínez, June Simpson, Corina Hayano-Kanashiro, Celso Cortés-Romero.
Abstract
The strategy of bulk DNA sampling has been a valuable method for studying large numbers of individuals through genetic markers. The application of this strategy for discrimination among germplasm sources was analyzed through information theory, considering the case of polymorphic alleles scored binarily for their presence or absence in DNA pools. We defined the informativeness of a set of marker loci in bulks as the mutual information between genotype and population identity, composed of two terms: diversity and noise. The diversity term is the entropy of bulk genotypes, whereas the noise term is the conditional entropy of bulk genotypes given germplasm sources. Thus, optimizing marker information implies increasing diversity and reducing noise. Simple formulas were devised to estimate marker information per allele from a set of estimated allele frequencies across populations. As an example, they allowed optimization of bulk size for SSR genotyping in maize, from allele frequencies estimated in a sample of 56 maize populations. It was found that a sample of 30 plants from a random mating population is adequate for maize germplasm SSR characterization. We analyzed the use of divided bulks to overcome the allele dilution problem in DNA pools, and concluded that samples of 30 plants divided into three bulks of 10 plants are efficient for characterizing maize germplasm sources through SSR, with good control of the dilution problem. We estimated the informativeness of 30 SSR loci from the estimated allele frequencies in maize populations, and found a wide variation in marker informativeness, which correlated positively with the number of alleles per locus.
Year: 2013 PMID: 24260321 PMCID: PMC3833943 DOI: 10.1371/journal.pone.0079936
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Plots of information parameters for several bulk sizes, based on estimated allele frequencies in maize germplasm sources.
Average values for information parameters in bulks of 10 to 100 SSR alleles in maize.
| Parameter | 10 | 20 | 30 | 40 | 50 | 60 | 70 | 80 | 90 | 100 |
| Diversity | 0.538 | 0.560 | 0.569 | 0.574 | 0.576 | 0.577 | 0.578 | 0.579 | 0.579 | 0.579 |
| Noise | 0.086 | 0.052 | 0.033 | 0.022 | 0.014 | 0.010 | 0.006 | 0.004 | 0.003 | 0.002 |
| Information | 0.452 | 0.509 | 0.536 | 0.552 | 0.562 | 0.568 | 0.572 | 0.574 | 0.576 | 0.577 |
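In the table above, each information value equals diversity minus noise, up to rounding. The excerpt does not reproduce the paper's formulas, so the following is only a minimal sketch of how the three quantities could be computed for one allele. It assumes equiprobable populations, random sampling of allele copies, and perfect detection (no dilution threshold); the function names and the example frequencies are hypothetical.

```python
import math


def binary_entropy(q: float) -> float:
    """Shannon entropy (bits) of a presence/absence variable with P(present) = q."""
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -(q * math.log2(q) + (1.0 - q) * math.log2(1.0 - q))


def allele_information(freqs, n_alleles):
    """Return (diversity, noise, information) in bits for one allele.

    freqs: allele frequency in each population (hypothetical input).
    n_alleles: number of allele copies sampled into the bulk.
    Assumes equiprobable populations, random sampling and perfect detection.
    """
    # Probability that the allele is present in a bulk drawn from each population.
    presence = [1.0 - (1.0 - p) ** n_alleles for p in freqs]
    # Diversity: entropy of the bulk genotype pooled over all populations.
    diversity = binary_entropy(sum(presence) / len(presence))
    # Noise: average entropy of the bulk genotype within a single population.
    noise = sum(binary_entropy(q) for q in presence) / len(presence)
    return diversity, noise, diversity - noise


# Made-up frequencies of one allele in four populations, for illustration only.
example_freqs = [0.0, 0.02, 0.30, 0.0]
for size in (10, 60):
    d, n, i = allele_information(example_freqs, size)
    print(f"bulk of {size} alleles: diversity={d:.3f} noise={n:.3f} information={i:.3f}")
```

Under these assumptions the noise term comes from sampling alone: an allele that is rare in a population may or may not make it into the bulk, so two bulks from the same source can differ.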
Figure 2. Probability of an allele being undetected for different numbers of fractions of a 60-allele sample, under several dilution thresholds and population allele frequencies.
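The kind of calculation summarized in this figure can be sketched as follows. This is an illustrative model rather than the paper's own derivation. It assumes binomial sampling of allele copies, equal-sized and independent sub-bulks, and that an allele is detected in a sub-bulk only when its proportion there reaches the dilution threshold. The function name and parameters are hypothetical.

```python
from math import ceil, comb


def prob_undetected(p, total_alleles, fractions, threshold):
    """Probability that an allele goes undetected in every sub-bulk.

    p: population allele frequency.
    total_alleles: allele copies in the whole sample (60 in the figure).
    fractions: number of equal sub-bulks the sample is divided into.
    threshold: minimum within-bulk proportion needed for detection (assumed model).
    """
    m = total_alleles // fractions        # allele copies per sub-bulk
    # Fewest copies that reach the threshold; the small offset guards against
    # floating-point overshoot such as 0.05 * 60 evaluating slightly above 3.
    k_min = max(ceil(threshold * m - 1e-9), 1)
    # Binomial probability of fewer than k_min copies in one sub-bulk.
    miss_one = sum(comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(k_min))
    # Sub-bulks are treated as independent draws from the population.
    return miss_one ** fractions


# Compare an undivided 60-allele sample with the same sample split in three.
for k in (1, 3):
    print(k, prob_undetected(p=0.02, total_alleles=60, fractions=k, threshold=0.05))
```

In this model, splitting the sample lowers the number of copies needed to reach the threshold within each sub-bulk, which is why division helps against dilution. With a threshold of zero, dividing the sample makes no difference.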
Analysis of three SSR loci for their amplification under allele dilution.
| Marker | Bulk | NP | Alleles | DR | Detection (%) | DU |
| PHI093 | PL092-1 | 10 | 3 | 0.10-0.70 | 66 | 0.10 |
| | PL092-2 | 10 | 3 | 0.15-0.30 | 66 | 0.15 |
| | PL092-3 | 9 | 3 | 0.11-0.65 | 66 | 0.11 |
| | **PL092** | **29** | **3** | — | **100** | — |
| PHI072 | PL077-1 | 10 | 6 | 0.05-0.52 | 100 | — |
| | PL077-2 | 10 | 5 | 0.05-0.55 | 100 | — |
| | PL077-3 | 10 | 6 | 0.05-0.43 | 100 | — |
| | **PL077** | **30** | **9** | — | **100** | — |
| PHI064 | PL077-1 | 10 | 7 | 0.05-0.15 | 100 | — |
| | PL077-2 | 10 | 8 | 0.05-0.20 | 100 | — |
| | PL077-3 | 10 | 9 | 0.05-0.20 | 100 | — |
| | **PL077** | **30** | **9** | — | **100** | — |
| | PL149-1 | 9 | 9 | 0.06-0.22 | 100 | — |
| | PL149-2 | 9 | 7 | 0.06-0.22 | 100 | — |
| | PL149-3 | 4 | 5 | 0.13-0.5 | 100 | — |
| | **PL149** | **22** | **10** | — | **100** | — |
NP = number of plants, DR = dilution range, DU = dilution in undetected cases.
The global result for each combined bulk is given in bold face characters.
Informativeness of SSR loci for bulk DNA genotyping.
| SSR | Alleles | Information |
| phi127 | 6 | 3.429 |
| phi051 | 6 | 2.677 |
| phi115 | 3 | 1.490 |
| phi015 | 6 | 3.143 |
| phi033 | 6 | 2.848 |
| phi053 | 8 | 3.693 |
| phi072 | 7 | 3.044 |
| phi093 | 5 | 3.609 |
| phi024 | 7 | 3.820 |
| phi085 | 8 | 2.693 |
| phi034 | 8 | 3.904 |
| phi121 | 3 | 0.667 |
| phi056 | 8 | 3.758 |
| phi064 | 10 | 4.951 |
| phi050 | 3 | 1.627 |
| phi96100 | 11 | 4.130 |
| phi101249 | 8 | 4.128 |
| phi109188 | 9 | 3.662 |
| phi073 | 5 | 3.378 |
| phi96342 | 5 | 2.390 |
| phi109275 | 6 | 3.910 |
| phi427913 | 9 | 3.409 |
| phi265454 | 8 | 3.640 |
| phi402893 | 9 | 3.288 |
| phi346482 | 5 | 2.058 |
| phi308090 | 5 | 2.544 |
| phi330507 | 6 | 1.796 |
| phi213984 | 3 | 0.795 |
| phi339017 | 3 | 1.664 |
| phi159819 | 5 | 3.578 |
Figure 3. Informativeness of SSRs.
Distribution of informativeness in 30 SSR loci measured in bits (a). Association between SSR marker informativeness and number of alleles per locus (b).
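The association in panel (b) can be checked directly from the table of 30 SSR loci above. The sketch below copies the Alleles and Information columns in row order and computes a plain Pearson correlation coefficient. The excerpt does not say which statistic the authors used, so Pearson is an assumption here, and the variable and function names are hypothetical.

```python
import math

# Alleles and Information columns from the SSR table, in row order (phi127 ... phi159819).
alleles = [6, 6, 3, 6, 6, 8, 7, 5, 7, 8, 8, 3, 8, 10, 3,
           11, 8, 9, 5, 5, 6, 9, 8, 9, 5, 5, 6, 3, 3, 5]
information = [3.429, 2.677, 1.490, 3.143, 2.848, 3.693, 3.044, 3.609, 3.820, 2.693,
               3.904, 0.667, 3.758, 4.951, 1.627, 4.130, 4.128, 3.662, 3.378, 2.390,
               3.910, 3.409, 3.640, 3.288, 2.058, 2.544, 1.796, 0.795, 1.664, 3.578]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(f"Pearson r between alleles per locus and information: {pearson_r(alleles, information):.3f}")
```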