Abstract
BACKGROUND: Parentage verification by molecular markers is mainly based on short tandem repeat markers. Single nucleotide polymorphisms (SNPs), as bi-allelic markers, have become the markers of choice for genotyping projects, so the next step is to use SNP genotypes for parentage verification as well. Recently developed algorithms, such as those that evaluate opposing homozygous SNP genotypes, have drawbacks, for example the inability to reject all animals in a sample of potential parents. This paper describes an algorithm for parentage verification by constrained regression which overcomes the latter limitation and proves to be very fast and accurate even when the number of SNPs is as low as 50. The algorithm was tested on a sample of 14,816 animals with 50, 100 and 500 SNP genotypes randomly selected from 40k genotypes. The samples of putative parents of these animals contained either five random animals, or four random animals and the true sire. Parentage assignment was performed by ranking of regression coefficients, or by setting a minimum threshold for regression coefficients. The assignment quality was evaluated by the power of assignment (Pa) and the power of exclusion (Pe).
Year: 2017 PMID: 28619083 PMCID: PMC5472000 DOI: 10.1186/s12711-017-0324-3
Source DB: PubMed Journal: Genet Sel Evol ISSN: 0999-193X Impact factor: 4.297
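The constrained regression described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the constraints used here (non-negative coefficients that sum to one), the projected-gradient solver, the function names and the simulated genotypes are all assumptions made for illustration. The offspring genotype is regressed on the genotypes of the putative parents and on a vector of expected gene contents (taken here as twice the allele frequency), and the candidate with the largest coefficient is the one a ranking rule would assign.

```python
import random


def project_to_simplex(v):
    """Euclidean projection of v onto {b : b_i >= 0, sum(b) = 1}."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # the last index passing the test fixes the shift
    return [max(x - theta, 0.0) for x in v]


def constrained_regression(y, columns, iterations=5000):
    """Least squares of y on the given columns, coefficients kept on the simplex.

    Assumed constraints (not taken from the source): b >= 0 and sum(b) = 1.
    Solved by projected gradient descent on the normal equations.
    """
    p = len(columns)
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    step = 1.0 / sum(gram[i][i] for i in range(p))  # trace bounds the largest eigenvalue
    b = [1.0 / p] * p
    for _ in range(iterations):
        grad = [sum(gram[i][j] * b[j] for j in range(p)) - xty[i] for i in range(p)]
        b = project_to_simplex([b[i] - step * grad[i] for i in range(p)])
    return b


def simulate_genotype(freqs, rng):
    """Genotype (0, 1 or 2 copies of the counted allele) of an unrelated animal."""
    return [(rng.random() < f) + (rng.random() < f) for f in freqs]


def simulate_offspring(sire, dam, rng):
    """Each parent transmits the counted allele with probability genotype / 2."""
    return [(rng.random() < s / 2) + (rng.random() < d / 2) for s, d in zip(sire, dam)]


def demo(n_snps=500, seed=1):
    """Hypothetical scenario: true sire, four random animals, expected gene contents."""
    rng = random.Random(seed)
    freqs = [rng.uniform(0.2, 0.8) for _ in range(n_snps)]  # simulated allele frequencies
    sire = simulate_genotype(freqs, rng)
    dam = simulate_genotype(freqs, rng)  # the dam is not offered as a candidate
    offspring = simulate_offspring(sire, dam, rng)
    randoms = [simulate_genotype(freqs, rng) for _ in range(4)]
    expected = [2 * f for f in freqs]  # vector of expected gene contents
    names = ["sire", "ran1", "ran2", "ran3", "ran4", "mean"]
    b = constrained_regression(offspring, [sire] + randoms + [expected])
    return dict(zip(names, b))


if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the sire coefficient should come out well above those of the random animals, and the expected-gene-content vector should absorb most of the remaining weight, which stands in for the unobserved dam. A threshold rule would instead compare each candidate's coefficient with a fixed minimum, which is what allows every candidate to be rejected when the true sire is absent.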
Mean, standard deviation (s), minimum (min) and maximum (max) of the regression coefficients

| SNPs | Coefficient | Mean (with sire) | s (with sire) | Min (with sire) | Max (with sire) | Mean (without sire) | s (without sire) | Min (without sire) | Max (without sire) |
|---|---|---|---|---|---|---|---|---|---|
| 500 | Sire | 0.492 | 0.065 | 0.000 | 0.764 | – | – | – | – |
| 500 | Ran | 0.018 | 0.028 | 0.000 | 0.337 | 0.022 | 0.035 | 0.000 | 0.413 |
| 500 | Mean | 0.435 | 0.085 | 0.034 | 1.000 | 0.891 | 0.077 | 0.389 | 1.000 |
| 100 | Sire | 0.492 | 0.091 | 0.000 | 0.831 | – | – | – | – |
| 100 | Ran | 0.036 | 0.054 | 0.000 | 0.478 | 0.042 | 0.063 | 0.000 | 0.535 |
| 100 | Mean | 0.363 | 0.139 | 0.000 | 1.000 | 0.789 | 0.141 | 0.000 | 1.000 |
| 50 | Sire | 0.492 | 0.119 | 0.000 | 0.904 | – | – | – | – |
| 50 | Ran | 0.050 | 0.074 | 0.000 | 0.548 | 0.059 | 0.087 | 0.000 | 0.773 |
| 50 | Mean | 0.308 | 0.177 | 0.000 | 1.000 | 0.704 | 0.197 | 0.000 | 1.000 |

SNPs: number of SNPs used as genotypes (500, 100 or 50). With sire: the sample of putative parents contained the true sire, four randomly selected animals and the vector of expected gene contents. Without sire: the sample of putative parents contained five randomly selected animals and the vector of expected gene contents. Sire: statistics for the coefficients regressing the focused animal on the genotype of the true sire. Mean: statistics for the coefficients regressing the focused animal on the vector of expected gene contents. Ran: statistics for the coefficients regressing the focused animal on randomly selected animals. The number of random animals was 4 when the sample of putative parents contained the true sire, and 5 otherwise.
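A quick arithmetic check on the reported means suggests how the regression is constrained. The sketch below adds the mean sire coefficient, the mean random-animal coefficient multiplied by the number of random animals (4 or 5), and the mean coefficient on the expected gene contents. The reading that the coefficients of one regression sum to one is an inference from these numbers, not a statement found in this excerpt; the dictionary layout and the name `coefficient_total` are illustrative.

```python
# Mean coefficients copied from the table; the sire mean is 0.0 where no sire was offered.
# Keys: (number of SNPs, scenario). "n_random" is the number of random animals in the sample.
mean_coefficients = {
    (500, "with sire"): {"sire": 0.492, "ran": 0.018, "mean": 0.435, "n_random": 4},
    (100, "with sire"): {"sire": 0.492, "ran": 0.036, "mean": 0.363, "n_random": 4},
    (50, "with sire"): {"sire": 0.492, "ran": 0.050, "mean": 0.308, "n_random": 4},
    (500, "without sire"): {"sire": 0.0, "ran": 0.022, "mean": 0.891, "n_random": 5},
    (100, "without sire"): {"sire": 0.0, "ran": 0.042, "mean": 0.789, "n_random": 5},
    (50, "without sire"): {"sire": 0.0, "ran": 0.059, "mean": 0.704, "n_random": 5},
}


def coefficient_total(row):
    """Sum of the mean coefficients over all regressors of one scenario."""
    return row["sire"] + row["n_random"] * row["ran"] + row["mean"]


if __name__ == "__main__":
    for key, row in mean_coefficients.items():
        print(key, round(coefficient_total(row), 3))
```

Every total lands within rounding error of 1, and no minimum in the table is below 0 or maximum above 1, which is consistent with non-negative coefficients that sum to one.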
Power of assignment (Pa) and power of exclusion (Pe)

| SNPs | Algorithm | Pa (with sire) | Pe (with sire) | Pa (without sire) | Pe (without sire) |
|---|---|---|---|---|---|
| 500 | | 0.990 (14,664) | 1.000 (1) | – | 1 (4) |
| 500 | | 0.994 (14,730) | 0.997 (86) | – | – |
| 500 | OHL counting | 0.993 (14,717) | 0.997 (99) | – | – |
| 500 | LH | 0.995 (14,744) | 1.000 (5) | – | 1 (0) |
| 100 | | 0.969 (14,361) | 0.999 (15) | – | 0.997 (96) |
| 100 | | 0.993 (14,711) | 0.996 (105) | – | – |
| 100 | OHL counting | 0.992 (14,699) | 0.996 (117) | – | – |
| 100 | LH | 0.995 (14,735) | 0.993 (213) | – | 1.000 (0) |
| 50 | | 0.918 (13,607) | 0.991 (252) | – | 0.968 (960) |
| 50 | | 0.983 (14,570) | 0.992 (246) | – | – |
| 50 | OHL counting | 0.978 (14,489) | 0.989 (327) | – | – |
| 50 | LH | 0.988 (14,639) | 0.954 (1,373) | – | 0.999 (40) |

The numerator of the related equations is given in brackets. SNPs: number of SNPs used as genotypes (500, 100 or 50). With sire: the sample of putative parents contained the true sire, four randomly selected animals and the vector of expected gene contents. Without sire: the sample of putative parents contained five randomly selected animals and the vector of expected gene contents. Pa: probability of assigning the right parent if the sample of putative parents contained the true sire. Pe: probability of rejecting the wrong parent in favour of the right parent or the vector of expected gene contents. SNPs were randomly selected from 40k genotypes with the sample space for the 100 and 50 sets restricted to those SNPs with a minor allele frequency
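The bracketed numerators can be related to the reported Pa values with a short check. The sketch assumes that the denominator of Pa is the number of tested animals (14,816, from the abstract); that assumption is not stated in this excerpt, but it reproduces every reported Pa to three decimals. Rows whose algorithm label is missing from this record are called "unlabelled 1" and "unlabelled 2" here, which are placeholder names only.

```python
N_ANIMALS = 14_816  # animals tested, as stated in the abstract

# (number of SNPs, algorithm label) -> (reported Pa, bracketed numerator)
reported_pa = {
    (500, "unlabelled 1"): (0.990, 14_664),
    (500, "unlabelled 2"): (0.994, 14_730),
    (500, "OHL counting"): (0.993, 14_717),
    (500, "LH"): (0.995, 14_744),
    (100, "unlabelled 1"): (0.969, 14_361),
    (100, "unlabelled 2"): (0.993, 14_711),
    (100, "OHL counting"): (0.992, 14_699),
    (100, "LH"): (0.995, 14_735),
    (50, "unlabelled 1"): (0.918, 13_607),
    (50, "unlabelled 2"): (0.983, 14_570),
    (50, "OHL counting"): (0.978, 14_489),
    (50, "LH"): (0.988, 14_639),
}


def power_of_assignment(correct, n_animals=N_ANIMALS):
    """Share of animals whose true sire was assigned (assumed definition of Pa)."""
    return correct / n_animals


def mismatches(table, tolerance=0.0005):
    """Rows whose recomputed Pa differs from the reported value by more than rounding."""
    return [
        key
        for key, (pa, numerator) in table.items()
        if abs(power_of_assignment(numerator) - pa) > tolerance
    ]


if __name__ == "__main__":
    for key, (pa, numerator) in reported_pa.items():
        print(key, pa, f"{power_of_assignment(numerator):.4f}")
    print("rows that do not match:", mismatches(reported_pa))
```

The same check is not attempted for Pe, because its denominator cannot be read from this excerpt.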