Shun-ichiro Fukuyama, Hiroyuki Morino, Hiroshi Miyazawa, Tomoaki Tanaka, Tomoko Suzuki, Masakazu Kohda, Hideshi Kawakami, Yasushi Okazaki, Kuniaki Seyama, Koichi Hagiwara.
Abstract
Homozygosity mapping is a powerful procedure that can detect recessive disease-causing genes using only a few patients from families with a history of inbreeding. We report here a homozygosity mapping algorithm for high-density single nucleotide polymorphism (SNP) arrays that is able to (i) correct genotyping errors, (ii) search for autozygous segments genome-wide through regions with runs of homozygous SNPs (RHSs), (iii) check the validity of the inbreeding history, and (iv) calculate the probability of the disease-causing gene being located in the regions identified. The genotyping error correction restored an average of 94.2% of the total length of all RHSs, and 99.9% of the total length of RHSs longer than 2 cM. At the end of the analysis, we know the probability that the regions identified contain a disease-causing gene and can determine how much effort should be devoted to scrutinizing them. We confirmed the power of this algorithm using 6 patients with Siiyama-type α1-antitrypsin deficiency, a rare autosomal recessive disease in Japan. Our procedure will accelerate the identification of disease-causing genes using high-density SNP array data.
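Step (ii) of the abstract can be illustrated with a minimal sketch. It assumes SNPs on one chromosome sorted by genetic position, a boolean flag per SNP marking heterozygous calls, and a length cutoff in cM. The function name `find_rhs` and the sample values are hypothetical; this is not the authors' implementation.

```python
from typing import List, Tuple


def find_rhs(positions_cm: List[float], is_het: List[bool],
             cutoff_cm: float) -> List[Tuple[float, float]]:
    """Return (start, end) in cM for runs of homozygous SNPs longer than cutoff_cm.

    Assumption: positions_cm is sorted and covers a single chromosome.
    """
    segments = []
    run_start = None  # position of the first homozygous SNP in the current run
    run_end = None    # position of the last homozygous SNP seen in the run
    for pos, het in zip(positions_cm, is_het):
        if het:
            # A heterozygous call closes the current run, if any.
            if run_start is not None and run_end - run_start > cutoff_cm:
                segments.append((run_start, run_end))
            run_start = None
        else:
            if run_start is None:
                run_start = pos
            run_end = pos
    # Close a run that extends to the end of the chromosome.
    if run_start is not None and run_end - run_start > cutoff_cm:
        segments.append((run_start, run_end))
    return segments


# Hypothetical toy data: one long run (0.5 to 4.0 cM) and one short run.
positions = [0.0, 0.5, 1.0, 1.5, 4.0, 4.2, 4.4, 5.0]
hets = [True, False, False, False, False, True, False, False]
print(find_rhs(positions, hets, cutoff_cm=2.0))
```

With a 2 cM cutoff, only the first run is reported; the short run after the second heterozygous call is discarded.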
Year: 2010 PMID: 21106127 PMCID: PMC2957688 DOI: 10.1186/1471-2105-11-S7-S5
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Connections between AS, RHS, false negative, type A false positive, and type B false positive values. (A) In a family with a consanguineous marriage, a loop is formed in the pedigree (bold lines). A chromosomal segment that is separately inherited through both sides of the loop becomes homozygous in an offspring and forms an autozygous segment (AS). (B) (i) A chromosomal region with 2 ASs (dark gray boxes). (ii) An RHS is a region with a run of homozygous SNPs whose genetic length is greater than the cutoff value. (iii) Relationship between an RHS and an AS. ASs are shown by dark gray boxes, and RHSs are shown by light gray boxes. Three types of errors are defined: false negative, type A false positive, and type B false positive. (C) Principle used for the genotyping error correction. If a homozygous SNP in an RHS is mistyped and becomes heterozygous, it is likely to have a greater distance (i.e. x + y) from the adjacent heterozygous SNPs than a heterozygous SNP that exists in another part of the autosomes. Therefore, heterozygous SNPs with a large x + y are likely to be mistyped.
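The principle in panel (C) can be sketched as follows. For each heterozygous SNP, x is the distance to the previous heterozygous SNP and y the distance to the next one; calls with a large x + y are flagged as likely genotyping errors. The threshold `min_span_cm`, the function name, and the handling of the first and last heterozygous SNP (the chromosome ends stand in for the missing neighbor) are assumptions made for illustration, not details taken from the paper.

```python
from typing import List


def flag_suspect_hets(positions_cm: List[float], is_het: List[bool],
                      min_span_cm: float) -> List[int]:
    """Return indices of heterozygous SNPs whose x + y exceeds min_span_cm.

    Assumption: positions_cm is sorted and covers a single chromosome.
    """
    het_idx = [i for i, het in enumerate(is_het) if het]
    suspects = []
    for k, i in enumerate(het_idx):
        # Assumption: use the first/last SNP position when no neighbor exists.
        left = positions_cm[het_idx[k - 1]] if k > 0 else positions_cm[0]
        right = (positions_cm[het_idx[k + 1]]
                 if k + 1 < len(het_idx) else positions_cm[-1])
        x = positions_cm[i] - left   # distance to the previous heterozygous SNP
        y = right - positions_cm[i]  # distance to the next heterozygous SNP
        if x + y > min_span_cm:
            suspects.append(i)
    return suspects


# Hypothetical toy data: the call at index 3 sits alone inside a long homozygous stretch.
positions = [0.0, 0.1, 0.2, 3.0, 6.0, 6.1, 6.2]
hets = [True, True, False, True, False, True, True]
print(flag_suspect_hets(positions, hets, min_span_cm=5.0))
```

A flagged call could then be treated as homozygous before RHS detection is rerun, which is how an interrupted RHS would be restored under this sketch.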
Figure 2. Determination of the RHS cutoff and the probability that the disease-causing gene is contained in RHSs. (A) The false negative rate (R) and the false positive rate (R) were calculated from equations 3 and 7 using the genotyping data for 5 α1-antitrypsin deficiency patients. The false negative rate shown is for a child from a first-cousin marriage (m + n = 6). (B) The probability that RHSs contain the disease gene (P), calculated for a child from a first-cousin marriage. The coefficient of consanguinity (F) used was 1/16, which was calculated according to Wright (Wright S. Systems of mating. V. General considerations. Genetics 1921, 6:167-178). For the actual calculation, F can be more precisely calculated as the total length of RHSs divided by the total length of the autosomes (equation 9). P varies depending on the frequency of the gene in the population.
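The value F = 1/16 and the empirical estimate of F mentioned in the caption can be checked with a short sketch. It assumes that m + n counts the meioses around the pedigree loop, that a first-cousin marriage has two common ancestors, and that those ancestors are not themselves inbred. The function names and the autosome length in the example are hypothetical.

```python
from fractions import Fraction
from typing import List, Tuple


def pedigree_f(loop_meioses: int, common_ancestors: int = 2) -> Fraction:
    """Expected F from the pedigree, assuming non-inbred common ancestors.

    Each common ancestor carries 2 alleles, and a given allele is passed down
    both sides of the loop with probability (1/2) ** loop_meioses.
    """
    return common_ancestors * 2 * Fraction(1, 2) ** loop_meioses


def empirical_f(rhs_segments: List[Tuple[float, float]],
                autosome_length_cm: float) -> float:
    """F estimated as total RHS length divided by total autosome length."""
    total_rhs = sum(end - start for start, end in rhs_segments)
    return total_rhs / autosome_length_cm


# Child of a first-cousin marriage: 6 meioses in the loop, 2 common ancestors.
print(pedigree_f(6))

# Hypothetical RHS coordinates and autosome length, for illustration only.
print(empirical_f([(0.0, 30.0), (50.0, 80.0)], autosome_length_cm=3000.0))
```

Comparing the two values for a given patient is one way to judge whether the reported inbreeding history is consistent with the genotype data.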
Figure 3. Genotyping error corrections. (A) RHSs for NA18987. (B) RHSs detected after introducing genotyping errors to 2,105 SNPs. (C) RHSs after the genotyping error correction algorithm was applied.
Figure 4. RHSs obtained for 5 patients with Siiyama-type α1-antitrypsin deficiency and the distribution of the longest AS obtained by a Monte Carlo simulation. (A) - (E) RHSs for each patient. (F) The distribution of the length of the longest AS obtained by a Monte Carlo simulation. The distribution for 86 HapMap JPT patients is also shown on the right side.
Size of the longest RHS for each patient
| Patient | Length of the longest RHS (cM) |
|---|---|
| Patient 1 | 36.2 |
| Patient 2 | 39.6 |
| Patient 3 | 22.1 |
| Patient 4 | 40.3 |
| Patient 5 | 30.2 |
Figure 5. Case-control analysis. (A) The overlaps of RHSs for Patients 1-5. (B) The probability that the disease-causing gene is contained in the overlap (P). The probability was calculated by multiplying P for Patients 1-5. F for each patient was calculated as the total length of RHSs divided by the total length of the autosomes. (C) -log10(P) value obtained by a case-control analysis. The region indicated by an arrow attained the maximal value of 16.47.
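The overlap in panel (A) amounts to intersecting the RHS intervals of all patients. The sketch below covers only that step and assumes each patient's RHSs on one chromosome are given as sorted, non-overlapping (start, end) pairs in cM. The function names and coordinates are hypothetical, and the probability calculation in panel (B) is not reproduced here.

```python
from functools import reduce
from typing import List, Tuple

Segment = Tuple[float, float]


def intersect_two(a: List[Segment], b: List[Segment]) -> List[Segment]:
    """Intersect two sorted, non-overlapping interval lists."""
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out


def shared_rhs(per_patient: List[List[Segment]]) -> List[Segment]:
    """Regions covered by an RHS in every patient."""
    return reduce(intersect_two, per_patient)


# Hypothetical RHS coordinates for three patients on one chromosome.
patients = [
    [(10.0, 40.0), (60.0, 90.0)],
    [(20.0, 70.0)],
    [(30.0, 35.0), (65.0, 100.0)],
]
print(shared_rhs(patients))
```

Each additional patient can only shrink the shared regions, which matches the narrowing of candidate regions as more patients are added.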
Genes present in the candidate RHS overlap
| Gene |
|---|
| chromosome 14 open reading frame 48 |
| OTU domain, ubiquitin aldehyde binding 2 |
| DEAD (Asp-Glu-Ala-Asp) box polypeptide 24 |
| interferon, alpha-inducible protein 27-like 1 |
| interferon, alpha-inducible protein 27 |
| interferon, alpha-inducible protein 27-like 2 |
| protein phosphatase 4, regulatory subunit 4 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 10 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 6 |
| hypothetical protein LOC100287997 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 2 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 1 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 11 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 9 |
| serpin peptidase inhibitor, clade A (alpha-1 antiproteinase, antitrypsin), member 12 |
Figure 6. Subject without a family history of inbreeding. (A) RHSs obtained for a patient without a family history of inbreeding (Patient 6). (B) RHS overlaps for Patients 1-6. Adding the data for Patient 6 further narrowed the overlapping regions (compare with Figure 5). The disease-causing gene was contained in the region indicated by a white arrow. (C) -log10(P) value obtained by a case-control analysis. The region indicated by an arrow attained the maximal value of 17.29.