Shi-Meng Ai, Jian-Jun Gao, Shu-Qun Liu, Yun-Xin Fu.
Abstract
Mutation primarily occurs when cells divide, and it is highly desirable to know the mutation rate at each cell division during individual development. Recently, recessive lethal or nearly lethal mutations observed in a large mutation accumulation experiment using Drosophila melanogaster suggested that mutation rates vary significantly during male germline development. The analysis of those data combined the maximum likelihood framework with numerical assistance from a newly developed coalescent algorithm. Although powerful, the likelihood-based framework is computationally highly demanding, which limited the scope of the inference. This paper presents a new estimation approach based on minimizing chi-square statistics, which is asymptotically consistent with the maximum likelihood method. When at most one mutation per family is considered, the minimization of chi-square simplifies to a constrained weighted least squares problem, which can be solved easily using optimization theory. The new method effectively eliminates the computational bottleneck of the likelihood approach. Reanalysis of the published Drosophila melanogaster mutation data yields similar estimates of mutation rates. The new method is also expected to be applicable to mutation data generated by next-generation sequencing technology.
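The reduction to a constrained weighted least squares problem can be illustrated with a minimal sketch. It assumes that the expected count in each class is linear in the mutation rates (a simplification made here for illustration, not the paper's exact model) and minimizes Neyman's χ² with non-negative rates by cyclic coordinate descent. The function name `neyman_nnls`, the matrix `A`, and the toy counts are all hypothetical.

```python
def neyman_nnls(A, observed, n_sweeps=200):
    """Minimize sum_i (o_i - sum_j A[i][j] * u_j)^2 / max(o_i, 1), subject to u_j >= 0.

    A[i][j] is an assumed expected count in class i per unit rate in interval j.
    Observed zeros are replaced by 1 in the weights, as in Neyman's estimate.
    """
    n_rows, n_cols = len(A), len(A[0])
    weights = [1.0 / max(o, 1.0) for o in observed]
    u = [0.0] * n_cols
    for _ in range(n_sweeps):
        for j in range(n_cols):
            num = 0.0
            den = 0.0
            for i in range(n_rows):
                # Contribution of all other rates to the expected count in class i.
                partial = sum(A[i][k] * u[k] for k in range(n_cols) if k != j)
                num += weights[i] * A[i][j] * (observed[i] - partial)
                den += weights[i] * A[i][j] ** 2
            # One-dimensional weighted least squares update, clipped at zero.
            u[j] = max(0.0, num / den) if den > 0 else 0.0
    return u


# Toy data generated exactly from u = [3, 2], so the fit should recover it.
A = [[2.0, 0.0], [1.0, 1.0], [0.0, 3.0]]
observed = [6.0, 5.0, 6.0]
u_hat = neyman_nnls(A, observed)

# A case where the unconstrained solution would be negative in the second rate.
u_clipped = neyman_nnls([[1.0, 1.0], [1.0, 0.0]], [1.0, 3.0])
```

Because Neyman's χ² divides by the observed counts rather than the expected ones, the objective is quadratic in the rates, which is why a plain least squares update suffices here. Pearson's χ² divides by the expected counts and would need a general nonlinear optimizer instead.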
Year: 2015 PMID: 26266814 PMCID: PMC4534375 DOI: 10.1371/journal.pone.0135398
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. A genealogy of five lines.
A family of male descendants (sperm) that come from an ancestral male Drosophila melanogaster (ancestor sperm) can be represented as a genealogy. The numbers at the bottom of the figure denote the 5 lines (offspring, or the sample in terms of coalescent theory) of the family descending from the Most Recent Common Ancestor (MRCA); the order of the numbers does not matter. Solid lines denote branches and filled dots denote mutations. l_j (j = 1, 2, 3) denotes the number of cell divisions of a single line of the genealogy in interval j; here l_1 = 1, l_2 = 2, and l_3 = 2. (Coalescent times are combined into fewer intervals because of the computational burden, so l_2 and l_3 each include two coalescent times.) Branches are classified by the number of descendants in the sample: a branch is of size i if it has i descendants in the sample. For example, branch AB is of size 2 and branch AC is of size 1. A mutation is of size i if it occurred on a branch of size i, so mutations a, b, and c are of size 3, 2, and 1, respectively. Let t_j (j = 1, 2, 3) represent the number of cell divisions of all lines of the genealogy in interval j, so t_1 = 1, t_2 = 5, and t_3 = 9. The vector t = (t_1, t_2, t_3) represents the total number of cell divisions in the genealogy.
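The caption's values t_1 = 1, t_2 = 5, and t_3 = 9 can be reproduced with a short sketch. It assumes the genealogy carries 1, 2, 3, 4, and 5 lineages in its five successive cell divisions. That lineage profile is inferred here from the stated totals, not given in the caption, and the helper name `interval_totals` is hypothetical.

```python
def interval_totals(lineages_per_division, interval_lengths):
    """Sum the lineage counts over the cell divisions that fall in each interval."""
    if sum(interval_lengths) != len(lineages_per_division):
        raise ValueError("interval lengths must cover every cell division")
    totals = []
    start = 0
    for length in interval_lengths:
        totals.append(sum(lineages_per_division[start:start + length]))
        start += length
    return totals


# Assumed lineage counts: one extra lineage appears after each cell division.
lineages = [1, 2, 3, 4, 5]
l = [1, 2, 2]  # l_1, l_2, l_3 from the caption
t = interval_totals(lineages, l)
print(t)  # [1, 5, 9]
```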
Table 1. Simulated values for 20 lines per family, 5 intervals (see note a), 2 million simulations, and 38 cell divisions.
| Size i | Interval 1 | Interval 2 | Interval 3 | Interval 4 | Interval 5 |
|---|---|---|---|---|---|
| 1 | 0.05028237 | 0.18482122 | 24.27570070 | 96.32421066 | 93.06592231 |
| 2 | 0.06784213 | 0.23011006 | 20.89096763 | 61.20499303 | 3.23522610 |
| 3 | 0.07823357 | 0.24062849 | 13.38313297 | 27.64166534 | 0.14672610 |
| 4 | 0.08673008 | 0.23875896 | 7.85545120 | 9.61194771 | 0.00563645 |
| 5 | 0.09415637 | 0.23191086 | 4.76238446 | 2.77231275 | 0.00017430 |
| 6 | 0.10076345 | 0.22031624 | 3.05434761 | 0.71554283 | 0.00000498 |
| 7 | 0.10566135 | 0.20469970 | 2.04687450 | 0.17854980 | 0.00000000 |
| 8 | 0.10885757 | 0.18783914 | 1.40841434 | 0.04467530 | 0.00000000 |
| 9 | 0.11161006 | 0.17002888 | 0.98125598 | 0.01066335 | 0.00000000 |
| 10 | 0.11211454 | 0.15119572 | 0.68993426 | 0.00270020 | 0.00000000 |
| 11 | 0.11161006 | 0.13237102 | 0.48290488 | 0.00067580 | 0.00000000 |
| 12 | 0.10885757 | 0.11362600 | 0.33614343 | 0.00012699 | 0.00000000 |
| 13 | 0.10566135 | 0.09733566 | 0.23387500 | 0.00003635 | 0.00000000 |
| 14 | 0.10076345 | 0.08119920 | 0.15908367 | 0.00001145 | 0.00000000 |
| 15 | 0.09415637 | 0.06621165 | 0.10514392 | 0.00000647 | 0.00000000 |
| 16 | 0.08673008 | 0.05296912 | 0.06897908 | 0.00000000 | 0.00000000 |
| 17 | 0.07823357 | 0.04101992 | 0.04241335 | 0.00000000 | 0.00000000 |
| 18 | 0.06784213 | 0.03062948 | 0.02500149 | 0.00000000 | 0.00000000 |
| 19 | 0.05028237 | 0.01984811 | 0.01326843 | 0.00000000 | 0.00000000 |
| 20 | 0.13980578 | 0.02652739 | 0.00949353 | 0.00000000 | 0.00000000 |
a The 5 intervals are: [1,1], [2,2], [3,14], [15,33] and [34,38].
Estimates of the mutation rate obtained by minimizing Neyman's and Pearson's χ² for 5 intervals (the same intervals as in Table 1) and family size 20. The number of cases is 1000. (All simulated data used in this paper include the mask effect.)
| Case | Method | Statistic | u_1 | u_2 | u_3 | u_4 | u_5 |
|---|---|---|---|---|---|---|---|
| a | min χ² | mean | 0.00035706 | 0.00036172 | 0.00056025 | 0.00061634 | 0.00058917 |
| | | std | 0.00037065 | 0.00061824 | 0.00012403 | 0.00006603 | 0.00006398 |
| | | MSE | 0.00000016 | 0.00000040 | 0.00000002 | 0.00000002 | 0.00000001 |
| | min χ² | mean | 0.00057439 | 0.00084619 | 0.00057393 | 0.00060853 | 0.00059269 |
| | | std | 0.00056114 | 0.00095418 | 0.00013580 | 0.00006797 | 0.00006374 |
| | | MSE | 0.00000032 | 0.00000103 | 0.00000002 | 0.00000002 | 0.00000001 |
| | min χ² | mean | 0.00037155 | 0.00034983 | 0.00048377 | 0.00051788 | 0.00049005 |
| | | std | 0.00037983 | 0.00060305 | 0.00011947 | 0.00006156 | 0.00005939 |
| | | MSE | 0.00000016 | 0.00000039 | 0.00000001 | 0.00000000 | 0.00000000 |
| | min χ² | mean | 0.00057937 | 0.00080615 | 0.00050008 | 0.00050922 | 0.00049384 |
| | | std | 0.00055569 | 0.00090617 | 0.00012783 | 0.00006269 | 0.00005908 |
| | | MSE | 0.00000032 | 0.00000091 | 0.00000002 | 0.00000000 | 0.00000000 |
| | ML | mean | 0.00046238 | 0.00061622 | 0.00047761 | 0.00050787 | 0.00049595 |
| | | std | 0.00039839 | 0.00052960 | 0.00009202 | 0.00005047 | 0.00005230 |
| | | MSE | 0.00000016 | 0.00000029 | 0.00000001 | 0.00000000 | 0.00000000 |
| b | min χ² | mean | 0.00066175 | 0.00026745 | 0.00007761 | 0.00011884 | 0.00053327 |
| | | std | 0.00039476 | 0.00042318 | 0.00005613 | 0.00002884 | 0.00003697 |
| | | MSE | 0.00000027 | 0.00000021 | 0.00000000 | 0.00000000 | 0.00000000 |
| | min χ² | mean | 0.00088555 | 0.00051624 | 0.00008421 | 0.00011553 | 0.00053454 |
| | | std | 0.00058761 | 0.00062980 | 0.00005852 | 0.00002904 | 0.00003678 |
| | | MSE | 0.00000036 | 0.00000057 | 0.00000000 | 0.00000000 | 0.00000000 |
| | min χ² | mean | 0.00065994 | 0.00025198 | 0.00007253 | 0.00011103 | 0.00049531 |
| | | std | 0.00038850 | 0.00041138 | 0.00005418 | 0.00002785 | 0.00003540 |
| | | MSE | 0.00000027 | 0.00000019 | 0.00000000 | 0.00000000 | 0.00000000 |
| | min χ² | mean | 0.00088518 | 0.00048659 | 0.00007905 | 0.00010788 | 0.00049645 |
| | | std | 0.00057854 | 0.00060725 | 0.00005675 | 0.00002826 | 0.00003524 |
| | | MSE | 0.00000035 | 0.00000052 | 0.00000000 | 0.00000000 | 0.00000000 |
| | ML | mean | 0.00080209 | 0.00038219 | 0.00008847 | 0.00010310 | 0.00049756 |
| | | std | 0.00049468 | 0.00045320 | 0.00004117 | 0.00002249 | 0.00003426 |
| | | MSE | 0.00000028 | 0.00000029 | 0.00000000 | 0.00000000 | 0.00000000 |
a The true mutation rate u = [0.0005, 0.0005, 0.0005, 0.0005, 0.0005].
b The true mutation rate u = [0.001, 0.0001, 0.0001, 0.0001, 0.0005].
c Minimizing χ² by using all families and decomposing families with more than one mutation into several families. If the observed value is 0, replace it with 1 in Neyman's estimate.
d Minimizing χ² by using only families with no more than one mutation. If the observed value is 0, replace it with 1 in Neyman's estimate.
e Estimation using the maximum likelihood method as in Gao et al. [1].
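The three statistics reported per method are linked by the identity MSE = bias² + variance, which gives a quick consistency check on the table. The sketch below assumes the standard deviation is computed with a 1/n divisor; the source does not say which divisor was used, and with 1000 cases the difference is negligible. The function names are hypothetical, and the check uses the first mean/std entry of case a for the first interval (mean 0.00035706, std 0.00037065, true rate 0.0005).

```python
import math


def summarize(estimates, true_value):
    """Return (mean, std, MSE) of replicate estimates, using 1/n divisors."""
    n = len(estimates)
    mean = sum(estimates) / n
    variance = sum((x - mean) ** 2 for x in estimates) / n
    mse = sum((x - true_value) ** 2 for x in estimates) / n
    return mean, math.sqrt(variance), mse


def mse_from_mean_std(mean, std, true_value):
    """Rebuild the MSE from a reported mean and std via bias^2 + variance."""
    return (mean - true_value) ** 2 + std ** 2


# Consistency check against the reported MSE of 0.00000016.
mse_check = mse_from_mean_std(0.00035706, 0.00037065, 0.0005)
print(f"{mse_check:.8f}")  # 0.00000016

# The identity holds exactly for any sample when 1/n divisors are used.
sample = [0.0002, 0.0004, 0.0009]
m, s, mse = summarize(sample, 0.0005)
```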
Estimates of the mutation rate obtained by minimizing Neyman's and Pearson's χ² for varying family size (from 2 to 35) and 5 intervals (the same intervals as in Table 1). The number of cases is 2000.
| Case | Method | Statistic | u_1 | u_2 | u_3 | u_4 | u_5 |
|---|---|---|---|---|---|---|---|
| a | min χ² | mean | 0.00005718 | 0.00050103 | 0.00005192 | 0.00008297 | 0.00009956 |
| | | std | 0.00013024 | 0.00040519 | 0.00004820 | 0.00002207 | 0.00001609 |
| | | MSE | 0.00000002 | 0.00000033 | 0.00000000 | 0.00000000 | 0.00000000 |
| | min χ² | mean | 0.00009090 | 0.00008922 | 0.00008901 | 0.00008911 | 0.00008906 |
| | | std | 0.00014880 | 0.00014909 | 0.00014720 | 0.00014715 | 0.00014714 |
| | | MSE | 0.00000002 | 0.00000002 | 0.00000002 | 0.00000002 | 0.00000002 |
| b | min χ² | mean | 0.00024827 | 0.00233125 | 0.00018928 | 0.00053522 | 0.00048934 |
| | | std | 0.00046132 | 0.00126335 | 0.00013253 | 0.00005343 | 0.00003758 |
| | | MSE | 0.00000028 | 0.00000495 | 0.00000011 | 0.00000000 | 0.00000000 |
| | min χ² | mean | 0.00060545 | 0.00078802 | 0.00051082 | 0.00050401 | 0.00049697 |
| | | std | 0.00055825 | 0.00086289 | 0.00010808 | 0.00004779 | 0.00003650 |
| | | MSE | 0.00000032 | 0.00000083 | 0.00000001 | 0.00000000 | 0.00000000 |
| c | min χ² | mean | 0.00098438 | 0.00361194 | 0.00050110 | 0.00109199 | 0.00096821 |
| | | std | 0.00110979 | 0.00230323 | 0.00023313 | 0.00009138 | 0.00005871 |
| | | MSE | 0.00000123 | 0.00001213 | 0.00000030 | 0.00000002 | 0.00000000 |
| | min χ² | mean | 0.00161448 | 0.00161003 | 0.00109328 | 0.00099999 | 0.00098870 |
| | | std | 0.00111462 | 0.00176815 | 0.00019171 | 0.00008068 | 0.00005699 |
| | | MSE | 0.00000162 | 0.00000350 | 0.00000005 | 0.00000001 | 0.00000000 |
a The true mutation rate u = [0.0001, 0.0001, 0.0001, 0.0001, 0.0001].
b The true mutation rate u = [0.0005, 0.0005, 0.0005, 0.0005, 0.0005].
c The true mutation rate u = [0.001, 0.001, 0.001, 0.001, 0.001].
d Minimizing χ² by using only families with zero or one mutation. If the observed value is 0, replace it with 1 in Neyman's estimate.
Estimates of the mutation rate, weighting all 34 family sizes (2–35), using the data of Gao et al. [3]. The 6 intervals in Gao et al. [3] are [1, 1], [2, 2], [3, 3], [4, 14], [15, 33], and [34, 38].
| Lethality | Method | u_1 | u_2 | u_3 | u_4 | u_5 | u_6 | Total |
|---|---|---|---|---|---|---|---|---|
| [99%,100%) | min χ² | 0.002488 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000747 | 0.006225 |
| | min χ² | 0.006234 | 0.000000 | 0.000000 | 0.000000 | 0.000040 | 0.000802 | 0.010998 |
| | ML | 0.004054 | 0.000000 | 0.000000 | 0.000001 | 0.000028 | 0.000850 | 0.008847 |
| [98%,99%) | min χ² | 0.011047 | 0.000000 | 0.000000 | 0.000000 | 0.000031 | 0.000500 | 0.014131 |
| | min χ² | 0.043618 | 0.000000 | 0.000000 | 0.000000 | 0.000034 | 0.000542 | 0.046974 |
| | ML | 0.024460 | 0.000337 | 0.000006 | 0.000010 | 0.000067 | 0.000551 | 0.028941 |
| [97%,98%) | min χ² | 0.014709 | 0.000000 | 0.000000 | 0.000000 | 0.000052 | 0.000375 | 0.017568 |
| | min χ² | 0.078955 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000424 | 0.081075 |
| | ML | 0.039577 | 0.000605 | 0.000002 | 0.000002 | 0.000077 | 0.000398 | 0.043659 |
| ≥98% | min χ² | 0.011186 | 0.000000 | 0.000000 | 0.000000 | 0.000026 | 0.001273 | 0.018038 |
| | min χ² | 0.051330 | 0.000000 | 0.000000 | 0.000000 | 0.000067 | 0.001355 | 0.059378 |
| | ML | 0.028660 | 0.000272 | 0.000006 | 0.000018 | 0.000089 | 0.001443 | 0.038042 |
| ≥97% | min χ² | 0.019509 | 0.000000 | 0.000000 | 0.000000 | 0.000046 | 0.001705 | 0.028916 |
| | min χ² | 0.151034 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.001900 | 0.160532 |
| | ML | 0.067371 | 0.001258 | 0.000021 | 0.000031 | 0.000177 | 0.001954 | 0.082124 |
a Minimizing χ² by using only families with zero or one mutation. If the observed value is 0, replace it with 1 for Neyman's estimate.
b Gao et al.'s [3] Maximum Likelihood estimate.
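The last value in each row of the table above appears to equal the per-division rates weighted by the number of cell divisions in each interval, that is, the sum over the 6 intervals of length times rate. This reading is inferred from the arithmetic and is not stated in the source. The sketch below checks it on the ML row for lethality [99%,100%); the function name `total_rate` is hypothetical.

```python
# The 6 intervals of cell divisions given in the table caption.
intervals = [(1, 1), (2, 2), (3, 3), (4, 14), (15, 33), (34, 38)]

# Number of cell divisions in each interval: [1, 1, 1, 11, 19, 5].
lengths = [hi - lo + 1 for lo, hi in intervals]


def total_rate(rates, lengths):
    """Sum the per-division rates over all cell divisions of all intervals."""
    return sum(u * n for u, n in zip(rates, lengths))


# ML estimates for lethality [99%,100%), copied from the table.
ml_rates = [0.004054, 0.000000, 0.000000, 0.000001, 0.000028, 0.000850]
total = total_rate(ml_rates, lengths)
print(f"{total:.6f}")  # 0.008847
```

The minimum χ² rows agree only to within rounding of the six-decimal rates (for example 0.006223 computed against 0.006225 reported), which is consistent with the totals having been computed from unrounded estimates.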