Egor Lappo, Noah A Rosenberg.
Abstract
Properties of gene genealogies such as tree height ($H$), total branch length ($L$), total lengths of external ($E$) and internal ($I$) branches, mean length of basal branches ($B$), and the underlying coalescence times ($T$) can be used to study population-genetic processes and to develop statistical tests of population-genetic models. Uses of tree features in statistical tests often rely on predictions that depend on pairwise relationships among such features. For genealogies under the coalescent, we provide exact expressions for Taylor approximations to expected values and variances of ratios $X_n/Y_n$ for all 15 pairs among the variables $\{H_n, L_n, E_n, I_n, B_n, T_k\}$, considering $n$ leaves and $2 \le k \le n$. For expected values of the ratios, the approximations match empirical simulation-based values closely. The approximations to the variances are less accurate, but they generally follow the simulated trends as $n$ increases. Although $E_n$ has expectation 2 for every $n$ and the expectation of $H_n$ approaches 2 as $n \to \infty$, the approximation to the limiting expectation of $E_n/H_n$ is not 1; instead, it equals $\pi^2/3 - 2 \approx 1.28987$. The new approximations augment fundamental results in coalescent theory on the shapes of genealogical trees.
Keywords: coalescent theory; external branches; internal branches; time to the most recent common ancestor
Year: 2022 PMID: 35951748 PMCID: PMC9526068 DOI: 10.1093/g3journal/jkac205
Source DB: PubMed Journal: G3 (Bethesda) ISSN: 2160-1836 Impact factor: 3.542
Definitions of random variables associated with various tree summaries.
| Variable | Definition |
|---|---|
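| $H_n$ | Tree height: $H_n = \sum_{k=2}^{n} T_k$ |
| $L_n$ | Total branch length: $L_n = \sum_{k=2}^{n} k\,T_k$ |
| $E_n$ | Total external branch length: $E_n = \sum_{i=1}^{n} E_{n,i}$ |
| $I_n$ | Total internal branch length: $I_n = L_n - E_n$ |
| $B_n$ | Mean length of basal branches |
| $T_k$ | Coalescence time from $k$ to $k-1$ lineages |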
Here, $T_k$ is the random variable representing the coalescence time from $k$ to $k-1$ lineages, and $E_{n,i}$ is the (random) length of the $i$th external branch of a tree with $n$ leaves. We define H, L, and E for , I for , and B for . The expression for B incorporates terms associated with all of its contributing branches, following p. 1400 of Uyenoyama (1997) and Section 2.6 of Alimpiev and Rosenberg (2022), and it can be simplified to .
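The definitions above can be made concrete with a small Monte Carlo sketch of the kind that produces simulation-based values. It assumes time is measured in coalescent units, so that $T_k$ is exponential with rate $k(k-1)/2$, and it assumes that B is the mean length of the two branches descending from the root. The function name `simulate_genealogy` and its return format are illustrative, not code from the paper.

```python
import random


def simulate_genealogy(n, rng):
    """Simulate one genealogy with n leaves under the standard coalescent.

    Assumes coalescent time units: T_k ~ Exponential(rate = k(k-1)/2).
    Assumes B is the mean length of the two root (basal) branches.
    Returns H, L, E, I, B and a dict mapping k -> T_k.
    """
    if n < 2:
        raise ValueError("need at least two leaves")
    birth = [0.0] * n      # time at which each current lineage was created (0 for leaves)
    is_leaf = [True] * n   # True while a lineage still ends at a sampled leaf
    t = 0.0                # time elapsed going back from the present
    T = {}
    L = E = B = 0.0
    for k in range(n, 1, -1):
        T[k] = rng.expovariate(k * (k - 1) / 2)
        t += T[k]
        L += k * T[k]      # all k lineages persist through the interval T_k
        if k == 2:
            # the two remaining lineages are the basal branches; B is their mean length
            B = sum(t - b for b in birth) / 2
        # every pair of the k lineages is equally likely to coalesce
        i, j = rng.sample(range(k), 2)
        for idx in (i, j):
            if is_leaf[idx]:
                E += t - birth[idx]  # an external branch ends at its first coalescence
        for idx in sorted((i, j), reverse=True):
            del birth[idx]
            del is_leaf[idx]
        birth.append(t)        # the merged lineage is an internal branch born at time t
        is_leaf.append(False)
    H = t
    return {"H": H, "L": L, "E": E, "I": L - E, "B": B, "T": T}
```

With a single `rng = random.Random(1)` reused across draws, averaging `r["E"]` over many calls of `simulate_genealogy(10, rng)` should come out close to 2, and averaging `r["E"] / r["H"]` gives a simulated counterpart to the approximated expectation of $E_n/H_n$.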
Expectations and variances of properties of tree branch lengths.
These expressions can be found in Alimpiev and Rosenberg (2022). Note that for L and I, although the limiting variance is finite, the limiting expectation is infinite (Tavaré ; Wakeley 2009, p. 76).
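The contrast between a divergent expectation and a convergent variance can be checked directly for L. The sketch below assumes the standard coalescent with independent $T_k \sim \text{Exponential}(k(k-1)/2)$ and computes exact moments of $H_n$ and $L_n$ with rational arithmetic: $E[L_n] = 2\sum_{j=1}^{n-1} 1/j$ grows like $2\ln n$, whereas $\mathrm{Var}[L_n] = 4\sum_{j=1}^{n-1} 1/j^2$ increases toward $2\pi^2/3$. Because $I_n = L_n - E_n$ and $E[E_n] = 2$, the expectation of $I_n$ diverges in the same way. The names `moments_H_L` and `LIMIT_VAR_L` are hypothetical.

```python
import math
from fractions import Fraction

# Limit of Var[L_n] as n -> infinity: 4 * (pi^2 / 6)
LIMIT_VAR_L = 2 * math.pi ** 2 / 3


def moments_H_L(n):
    """Exact (E[H_n], Var[H_n], E[L_n], Var[L_n]) under the standard coalescent.

    Assumes independent T_k ~ Exponential(k(k-1)/2), with
    H_n = sum_{k=2}^n T_k and L_n = sum_{k=2}^n k T_k.
    """
    EH = VH = EL = VL = Fraction(0)
    for k in range(2, n + 1):
        mean_tk = Fraction(2, k * (k - 1))  # E[T_k]
        var_tk = mean_tk ** 2               # Var[T_k] = E[T_k]^2 for an exponential
        EH += mean_tk
        VH += var_tk
        EL += k * mean_tk
        VL += k * k * var_tk
    return EH, VH, EL, VL
```

Evaluating `float(moments_H_L(n)[2])` for increasing `n` shows $E[L_n]$ growing without bound, while `float(moments_H_L(n)[3])` stays below `LIMIT_VAR_L`.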
Fig. 1. Properties of genealogical trees. The tree height is H. The sum of the lengths of all branches is L. External branches have total length E (green). Internal branches have total length I (orange). Basal branches have mean length B (blue).
Covariances of pairs of variables that summarize genealogical trees.
For pairs involving E or I, expressions apply for ; expressions involving B apply for . The expressions can be found in Alimpiev and Rosenberg (2022).
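For pairs that involve only H, L, and the $T_k$, the covariances follow from the independence of the coalescence times and can be written down directly. The sketch below assumes the standard coalescent with $T_k \sim \text{Exponential}(k(k-1)/2)$, so $\mathrm{Var}[T_k] = (2/(k(k-1)))^2$, and encodes $\mathrm{Cov}(H_n, L_n) = \sum_{k=2}^n k\,\mathrm{Var}[T_k]$, $\mathrm{Cov}(H_n, T_k) = \mathrm{Var}[T_k]$, and $\mathrm{Cov}(L_n, T_k) = k\,\mathrm{Var}[T_k]$. The helper names are hypothetical; pairs involving E, I, or B require the more detailed expressions cited above.

```python
from fractions import Fraction


def var_T(k):
    """Var[T_k] = (2 / (k(k-1)))^2 for T_k ~ Exponential(k(k-1)/2)."""
    return Fraction(2, k * (k - 1)) ** 2


def cov_H_L(n):
    """Cov(H_n, L_n) = sum_k Cov(T_k, k T_k) = sum_k k Var[T_k] (T_k independent)."""
    return sum((k * var_T(k) for k in range(2, n + 1)), Fraction(0))


def cov_H_T(n, k):
    """Cov(H_n, T_k) = Var[T_k]; only the k-th term of H_n is correlated with T_k."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    return var_T(k)


def cov_L_T(n, k):
    """Cov(L_n, T_k) = k Var[T_k]; T_k enters L_n with coefficient k."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    return k * var_T(k)
```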
Approximations to expectations of ratios of pairs of variables.
Expressions involving E or I apply for ; expressions involving B apply for . The value for (H, L) follows equation 15 of Arbisser . The expressions are obtained using equation 3 and Tables 2 and 3.
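The sketch below shows how an entry of this kind can be assembled from expectations, variances, and covariances. It assumes that equation 3 is the standard second-order Taylor (delta-method) expansion $E[X/Y] \approx \mu_X/\mu_Y - \mathrm{Cov}(X,Y)/\mu_Y^2 + \mu_X\,\mathrm{Var}(Y)/\mu_Y^3$, and applies it to (H, L) using exact moments under the standard coalescent. The function names are hypothetical, and this is an illustration rather than the paper's own code.

```python
from fractions import Fraction


def taylor_mean_ratio(mx, my, vy, cxy):
    """Second-order Taylor approximation to E[X/Y] around (E[X], E[Y]).

    E[X/Y] ~ mx/my - Cov(X, Y)/my^2 + mx Var(Y)/my^3
    """
    return mx / my - cxy / my**2 + mx * vy / my**3


def h_l_moments(n):
    """Exact E[H_n], E[L_n], Var[L_n], Cov(H_n, L_n), assuming T_k ~ Exponential(k(k-1)/2)."""
    EH = EL = VL = CHL = Fraction(0)
    for k in range(2, n + 1):
        m = Fraction(2, k * (k - 1))  # E[T_k]; Var[T_k] = m^2
        EH += m
        EL += k * m
        VL += k * k * m * m
        CHL += k * m * m
    return EH, EL, VL, CHL


def approx_mean_H_over_L(n):
    """Approximate E[H_n / L_n] by plugging exact moments into the Taylor formula."""
    EH, EL, VL, CHL = h_l_moments(n)
    return taylor_mean_ratio(EH, EL, VL, CHL)
```

For n = 2 the ratio is exactly 1/2, because $L_2 = 2T_2 = 2H_2$, and `approx_mean_H_over_L(2)` also returns exactly 1/2, which is a convenient sanity check.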
Approximations to variances of ratios of pairs of variables.
Expressions involving E or I apply for ; expressions involving B apply for . The value for (H, L) follows equation 18 of Arbisser . The expressions are obtained using equation 4 and Tables 2 and 3.
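A matching sketch for variances follows. It assumes that equation 4 is the standard first-order expansion $\mathrm{Var}[X/Y] \approx \mathrm{Var}(X)/\mu_Y^2 - 2\mu_X\,\mathrm{Cov}(X,Y)/\mu_Y^3 + \mu_X^2\,\mathrm{Var}(Y)/\mu_Y^4$, again applied to (H, L) with exact moments under the standard coalescent. The function names are hypothetical.

```python
from fractions import Fraction


def taylor_var_ratio(mx, my, vx, vy, cxy):
    """First-order Taylor approximation to Var[X/Y] around (E[X], E[Y])."""
    return vx / my**2 - 2 * mx * cxy / my**3 + mx**2 * vy / my**4


def approx_var_H_over_L(n):
    """Approximate Var[H_n / L_n], assuming independent T_k ~ Exponential(k(k-1)/2)."""
    EH = VH = EL = VL = CHL = Fraction(0)
    for k in range(2, n + 1):
        m = Fraction(2, k * (k - 1))  # E[T_k]; Var[T_k] = m^2 for an exponential
        EH += m
        VH += m * m
        EL += k * m
        VL += k * k * m * m
        CHL += k * m * m              # Cov(T_k, k T_k)
    return taylor_var_ratio(EH, EL, VH, VL, CHL)
```

For n = 2, H/L is the constant 1/2 and `approx_var_H_over_L(2)` returns exactly 0. Because only first-order terms are kept, larger-n values are expected to be rougher than the corresponding expectation approximations.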
Fig. 2. Simulated values and theoretical approximations of expectations of ratios of pairs of variables, plotted as functions of sample size n. Expressions for theoretical values are taken from Table 4.
Fig. 3. Theoretical approximations for variables X in , plotted as functions of k for n = 10, n = 20, and n = 50. The expressions plotted are taken from Table 4.
Fig. 4. Simulated values and theoretical approximations of variances of ratios of pairs of variables, plotted as functions of sample size n. Expressions for theoretical values are taken from Table 5.
Fig. 5. Theoretical approximations for variables X in , plotted as functions of k for n = 10, n = 20, and n = 50. The expressions plotted are taken from Table 5.