
Studying the effects of haplotype partitioning methods on the RA-associated genomic results from the North American Rheumatoid Arthritis Consortium (NARAC) dataset.

Mohamed N Saad, Mai S Mabrouk, Ayman M Eldeib, Olfat G Shaker.

Abstract

The human genome, which includes thousands of genes, represents a big data challenge. Rheumatoid arthritis (RA) is a complex autoimmune disease with a genetic basis. Many single-nucleotide polymorphism (SNP) association methods partition a genome into haplotype blocks. The aim of this genome-wide association study (GWAS) was to select the most appropriate haplotype block partitioning method for the North American Rheumatoid Arthritis Consortium (NARAC) dataset. The methods applied to the NARAC dataset were the individual SNP approach and the following haplotype block methods: the four-gamete test (FGT), confidence interval test (CIT), and solid spine of linkage disequilibrium (SSLD). The strength of the association between each biomarker and RA was measured by the P-value after Bonferroni correction, and additional parameters were used to compare the output of the haplotype block methods. This work compares the individual SNP approach and the three haplotype block methods to identify a method that can detect all the significant SNPs when applied alone. The GWAS results from the NARAC dataset obtained with the different methods are presented. The individual SNP, CIT, FGT, and SSLD methods detected 541, 1516, 1551, and 1831 RA-associated SNPs, respectively; of these, 65, 159, 156, and 450 SNPs, respectively, were not detected by any other method. Three hundred eighty-three SNPs were discovered by all four methods (the three haplotype block methods and the individual SNP approach), while 1021 SNPs were discovered by all three haplotype block methods. The 383 SNPs detected by all the methods are promising candidates for studying RA susceptibility. A hybrid technique involving all four methods should be applied to detect the significant SNPs associated with RA in the NARAC dataset, but the SSLD method may be preferred because of its advantages when only one method is used.


Keywords:  Confidence interval test; Four-gamete test; Genome-wide association study; NARAC; Rheumatoid arthritis; Solid spine of linkage disequilibrium

Year:  2019        PMID: 30891314      PMCID: PMC6403413          DOI: 10.1016/j.jare.2019.01.006

Source DB:  PubMed          Journal:  J Adv Res        ISSN: 2090-1224            Impact factor:   10.479


Introduction

RA, a chronic autoimmune disease that affects the body’s joints and bones, is considered to have a genetic basis. Genetic association studies are used to detect RA biomarkers, and SNPs are used as biomarkers for detecting RA. The number of these polymorphisms is larger in RA patients than in healthy controls. These SNPs are in or near genes that commonly play a role in immunity. Most of these genes are linked to RA pathogenesis [1], [2], [3], [4]. The rapid progress in genotyping technologies has resulted in an ever-increasing volume of genotyped SNPs, which has led to advances in the understanding of complex diseases (such as RA) and represents a challenge for the future [5]. Single SNP methods are the main techniques used to identify RA biomarkers. Recently, the ability to obtain a high genomic density of SNPs (representing big data) has led to the application of haplotype block methods. These methods are applied to discover RA associations with a block rather than with a single SNP. A haplotype block consists of nearby SNPs that are strongly correlated with one another. The parameter that quantifies this correlation is the linkage disequilibrium (LD) [6], [7], [8]. The objective of the present work was to apply the individual SNP approach and three haplotype block methods to the NARAC dataset to identify RA biomarkers through a GWAS [9]. GWAS results represent a domain of big data with millions of SNPs tested against many phenotypes. These results have become a burden for bioinformaticians in terms of processing time and real-time visualization [10], [11]. The applied haplotype block methods were CIT, FGT, and SSLD. P-values were calculated to measure the strength of association between the genetic variants and RA susceptibility, and were judged against a stringent Bonferroni-corrected threshold for multiple comparisons (0.05 divided by the number of comparisons) [12].
In addition, the block size (in base pairs (bp) and in number of SNPs), the number of blocks, the percentage of SNPs not covered by the block method, the percentage of significant blocks in the total number of blocks, and the numbers of significant haplotypes and SNPs were compared among the three haplotype block methods.

Material and methods

Study population

The NARAC dataset consisted of 2062 participants (1493 female and 569 male), grouped into 868 RA patients and 1194 healthy controls. All cases and controls were Caucasian [13]. The genotyped variants comprised 545,080 SNPs across the whole genome. Because allosomes (sex chromosomes (Chrs)) were outside the focus of this research, 531,689 autosomal SNPs were retained for the study. A further 22,276 SNPs were removed because they met at least one of the following exclusion criteria: a genotype match of less than 75% [14], a Hardy-Weinberg equilibrium (HWE) P-value of less than 0.001 [15], or a minor allele frequency (MAF) of less than 0.001 in the total sample [16]. This left 509,413 SNPs for further analysis. The NARAC dataset represents a big data challenge because of its size and complexity. One way to handle such a challenge is to place the raw GWAS data for every Chr into a separate file. Each file is then processed using GWAS software, and finally the results for all the Chrs are merged. A snapshot of the NARAC (raw) dataset is shown in Fig. 1.
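The exclusion step described above can be sketched in a few lines of Python. This is only an illustration and not the authors' pipeline: the `SnpQc` record, its field names, and the example values are hypothetical, and treating the genotype match as a fraction is an assumption. The thresholds and the SNP counts are the ones quoted in the text.

```python
from dataclasses import dataclass

# Exclusion thresholds quoted in the text.
MIN_GENOTYPE_MATCH = 0.75
MIN_HWE_P = 0.001
MIN_MAF = 0.001


@dataclass(frozen=True)
class SnpQc:
    """Hypothetical per-SNP summary; the field names are assumptions."""
    snp_id: str
    genotype_match: float  # genotype match, expressed as a fraction
    hwe_p: float           # Hardy-Weinberg equilibrium P-value
    maf: float             # minor allele frequency in the total sample


def passes_qc(snp: SnpQc) -> bool:
    """Keep a SNP only if it meets none of the three exclusion criteria."""
    return (
        snp.genotype_match >= MIN_GENOTYPE_MATCH
        and snp.hwe_p >= MIN_HWE_P
        and snp.maf >= MIN_MAF
    )


# Made-up records: one that passes and one for each failure mode.
example_snps = [
    SnpQc("snp_ok", 0.99, 0.40, 0.210),
    SnpQc("snp_low_match", 0.70, 0.40, 0.210),
    SnpQc("snp_hwe_fail", 0.99, 0.0001, 0.210),
    SnpQc("snp_rare", 0.99, 0.40, 0.0005),
]
kept_ids = [s.snp_id for s in example_snps if passes_qc(s)]

# Bookkeeping reported in the text: autosomal SNPs minus excluded SNPs.
AUTOSOMAL_SNPS = 531_689
EXCLUDED_SNPS = 22_276
remaining_snps = AUTOSOMAL_SNPS - EXCLUDED_SNPS

print(kept_ids, remaining_snps)
```

The subtraction reproduces the 509,413 SNPs carried forward to the association analysis.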
Fig. 1

Snapshot of the NARAC dataset showing 10 samples with their corresponding 3 SNPs. The first column represents the individuals’ IDs. The second column refers to the affection status (0: case, 1: control). The third column shows the sex (F: female, M: male). The next columns correspond to the SNPs, with the first row providing the SNP ID. In each SNP cell, two identical alleles represent a homozygote, whereas two different alleles represent a heterozygote.


Material

For the NARAC dataset, each Chr data file was extracted from the NARAC data file using the programming language Perl. All Chr data files were reformatted, using the statistical package R 3.1.0, for processing by the program PLINK. The R language was also used to extract the map file of each Chr from the NARAC map file (SNP ID, physical position, and Chr number). The reformatted data and map files of each Chr were processed by PLINK 1.07 and gPLINK 2.05 in preparation for processing by the program Haploview 4.2 [17]. Haploview 4.2 was used to partition all the Chrs into successive blocks using the CIT, FGT, and SSLD methods; to calculate the corresponding P-values for each haplotype in each block; to apply the individual SNP approach; to calculate the corresponding P-value for each SNP; and to display the LD results [18]. The default parameters for the three haplotype block methods were used. The RA-associated SNPs determined by the individual SNP approach were highlighted on a Manhattan plot generated using R [19]. The significant blocks and the associated SNPs were selected using MATLAB release 2010a. Fig. 2 shows a block diagram of the entire association analysis. The DAVID (database for annotation, visualization and integrated discovery) bioinformatics resources 6.8 were used to perform a functional pathway analysis and a disease enrichment analysis [20], [21].
Fig. 2

Summary of the proposed system for the NARAC dataset.


Testing for associations with RA susceptibility

Both individual SNP associations and haplotype associations were measured with the aid of P-values. SNPs were considered statistically significant when their P-values fell below the stringent Bonferroni-corrected threshold for multiple comparisons (0.05 divided by the number of comparisons).
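As a minimal sketch of this rule, the Python snippet below computes a Bonferroni threshold and applies it to a P-value. Using the 509,413 retained SNPs as the number of comparisons is an assumption made for illustration; the thresholds actually used in the study are those reported in its supplementary material. The function names are hypothetical.

```python
def bonferroni_threshold(n_comparisons: int, alpha: float = 0.05) -> float:
    """Return the per-test significance level: alpha / number of comparisons."""
    if n_comparisons <= 0:
        raise ValueError("n_comparisons must be positive")
    return alpha / n_comparisons


def is_significant(p_value: float, n_comparisons: int, alpha: float = 0.05) -> bool:
    """A test is significant when its P-value is below the corrected level."""
    return p_value < bonferroni_threshold(n_comparisons, alpha)


# Assumption for illustration: one test per retained SNP.
N_RETAINED_SNPS = 509_413
threshold = bonferroni_threshold(N_RETAINED_SNPS)
print(f"Bonferroni threshold for {N_RETAINED_SNPS} tests: {threshold:.3e}")
```

Because the threshold shrinks as the number of tests grows, a method that performs fewer tests (for example, one test per haplotype rather than per SNP) faces a less severe correction.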

Results

Four methods were applied to the NARAC dataset: the individual SNP approach and three haplotype block methods. The three block methods were FGT, CIT, and SSLD. The measured parameter was the P-value after Bonferroni correction. The three haplotype block methods were compared on the basis of the block size (in bp or number of SNPs), number of blocks, percentage of uncovered SNPs, percentage of significant blocks, percentage of significant haplotypes, and number of associated SNPs. The test algorithms were run on an Intel Core i7-4720HQ 2.6 GHz system with 16 GB of RAM. Table S1 lists the processing time for each program. The total working time for all Chrs was 3353 min (approximately 56 h). Table S2 shows the significance level after Bonferroni correction for multiple comparisons (0.05/total number of comparisons). The results related to the haplotype block methods are shown in Tables S3–S24. FGT partitioned the twenty-two Chrs into more blocks (99,856 blocks) than CIT (93,422 blocks) and SSLD (86,179 blocks). On average, the SSLD blocks included more SNPs per Chr (5 SNPs) than the FGT (4 SNPs) and CIT (3 SNPs) blocks. As shown in Table 1, the median block size per Chr was larger for SSLD (12,046 bp) than for FGT (8328 bp) and CIT (7368 bp), confirming the greater genomic coverage by SSLD blocks. These results were checked for significance using the Kruskal–Wallis test by ranks, which showed a statistically significant difference in the distribution of the median block size among the three methods (P-value = 1.39 × 10⁻⁹). Using the Wilcoxon rank sum test, the differences between FGT and SSLD, between CIT and SSLD, and between CIT and FGT were statistically significant (P-values = 1.986 × 10⁻⁷, 1.515 × 10⁻⁸, and 0.009, respectively).
Table 1

Results of the median block size (in bp) by all three block methods for the general blocks and the significantly associated blocks with RA.

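| Chr no. | CIT (General) | FGT (General) | SSLD (General) | CIT (Significant) | FGT (Significant) | SSLD (Significant) |
|---|---|---|---|---|---|---|
| 1 | 8489 | 9547 | 13,549 | 64,634 | 47,700 | 34,467 |
| 2 | 8495 | 9645 | 14,342 | 24,123 | 11,756 | 23,312 |
| 3 | 7938 | 9240 | 13,544 | 7513 | 11,854 | 13,800 |
| 4 | 9947 | 11,083 | 13,544 | 3279 | 3279 | 0 |
| 5 | 8641 | 9697 | 14,102 | 22,052 | 15,381 | 18,456 |
| 6 | 8457 | 9583 | 13,944 | 8672 | 7448 | 10,123 |
| 7 | 8235 | 9008 | 13,869 | 27,949 | 4326 | 32,616 |
| 8 | 7149 | 7971 | 12,262 | 15,280 | 14,404 | 10,115 |
| 9 | 6324 | 7166 | 10,297 | 10,662 | 15,473 | 13,315 |
| 10 | 7464 | 8392 | 12,231 | 2462 | 669 | 9719 |
| 11 | 7764 | 8634 | 12,455 | 9746 | 9504 | 0 |
| 12 | 8043 | 8898 | 13,281 | 5705 | 5705 | 10,091 |
| 13 | 8346 | 9134 | 13,410 | 9913 | 4663 | 32,705 |
| 14 | 7458 | 8443 | 12,747 | 18,225 | 12,316 | 18,225 |
| 15 | 6151 | 7336 | 10,451 | 9321 | 11,213 | 14,822 |
| 16 | 4912 | 5562 | 8984 | 24,155 | 6893 | 64,712 |
| 17 | 6263 | 7535 | 9997 | 12,690 | 57,213 | 18,594 |
| 18 | 6811 | 7962 | 11,379 | 0 | 8210 | 11,265 |
| 19 | 6760 | 7930 | 10,833 | 9571 | 10,633 | 18,621 |
| 20 | 6413 | 6933 | 10,563 | 7448 | 6133 | 21,323 |
| 21 | 6784 | 7552 | 10,871 | 13,020 | 11,817 | 4704 |
| 22 | 5272 | 5986 | 8381 | 9298 | 10,650 | 24,936 |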
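The Kruskal–Wallis comparison of the three "General" columns of Table 1 can be sketched with the Python standard library alone. This is a textbook implementation written for illustration, not the software used in the study, so its output need not match the reported P-value to the last digit. It uses average ranks for ties, the usual tie correction, and the fact that for three groups the chi-squared reference distribution has two degrees of freedom, whose upper tail is exp(−H/2). The per-Chr values are those read from Table 1; the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal-Wallis H statistic for two or more groups."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def p_value_three_groups(h):
    """Upper tail of the chi-squared distribution with 2 degrees of freedom."""
    return math.exp(-h / 2)


# Median block size (bp) per Chr for all blocks, Chr 1 to 22, from Table 1.
CIT_GENERAL = [8489, 8495, 7938, 9947, 8641, 8457, 8235, 7149, 6324, 7464, 7764,
               8043, 8346, 7458, 6151, 4912, 6263, 6811, 6760, 6413, 6784, 5272]
FGT_GENERAL = [9547, 9645, 9240, 11083, 9697, 9583, 9008, 7971, 7166, 8392, 8634,
               8898, 9134, 8443, 7336, 5562, 7535, 7962, 7930, 6933, 7552, 5986]
SSLD_GENERAL = [13549, 14342, 13544, 13544, 14102, 13944, 13869, 12262, 10297,
                12231, 12455, 13281, 13410, 12747, 10451, 8984, 9997, 11379,
                10833, 10563, 10871, 8381]

h_general = kruskal_wallis_h(CIT_GENERAL, FGT_GENERAL, SSLD_GENERAL)
p_general = p_value_three_groups(h_general)
print(f"H = {h_general:.2f}, P = {p_general:.3g}")
```

A rank-based test is a natural choice here because the per-Chr medians are few (22 per method) and need not be normally distributed.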
Although SSLD produced the lowest number of blocks, its larger median block size and median number of SNPs per block meant that 95.68% of the genotyped SNPs were covered by SSLD blocks, compared to 87.74% with FGT and 77.88% with CIT. Accordingly, the density of the genotyped SNPs was sufficient for haplotype association mapping. The lowest number of studied SNPs needed for GWASs is 100,000 [15], which was attained by the four methods. Considerable variation in the haplotype block structure across the twenty-two Chrs was uncovered, with block sizes ranging from 2 bp (for the three methods) to 498,545 bp for FGT, 498,091 bp for SSLD, and 499,937 bp for CIT. FGT generated more significant haplotypes (437 haplotypes) than CIT (396 haplotypes) and SSLD (383 haplotypes) for the twenty-two Chrs. As shown in Tables S3–S24, the average percentage of significant blocks in the total number of blocks per Chr was higher for FGT (0.248%) than for CIT (0.241%) and SSLD (0.226%). Fig. 3 shows the significant blocks obtained with the three haplotype block methods for the twenty-two Chrs. For each Chr, the total number of significant blocks, the total number of associated SNPs, and the total sizes of the significant blocks (in bp) are shown in Fig. 3a–c, respectively.
Fig. 3

Comparison of the RA-associated results obtained by the three haplotype block partitioning methods. (a) The total number of significant blocks for each Chr. (b) The total number of associated SNPs for each Chr. (c) The total significant blocks size in bp for each Chr.

On average, the significant SSLD blocks included more SNPs per Chr (6 SNPs) than the significant FGT (4 SNPs) and CIT (4 SNPs) blocks. The median significant block size for the twenty-two Chrs was larger for SSLD (32,550 bp) than for CIT (14,350 bp) and FGT (13,055 bp). These results were checked for significance using the Kruskal–Wallis test by ranks. The difference among the three groups was not statistically significant (P-value = 0.077). The minimum significant block size for the twenty-two Chrs was larger for SSLD (52 bp for Chr 8) than for FGT (26 bp for Chr 6) and CIT (15 bp for Chr 11). The maximum significant block size was larger for SSLD (344,667 bp for Chr 1) than for FGT (318,113 bp for Chr 3) and CIT (209,237 bp for Chr 6). The significant SSLD blocks included more associated SNPs (1831 SNPs) than the significant FGT (1551 SNPs) and CIT (1516 SNPs) blocks. In addition, the number of associated SNPs determined by the individual SNP approach was 541, as shown in Table 2. The number of significant SNPs discovered only by the SSLD method (450 SNPs) was greater than the number discovered only by the CIT (159 SNPs), FGT (156 SNPs), or individual SNP (65 SNPs) method, as shown in Fig. 4.
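The "detected only by one method" and "detected by all methods" counts are set operations on the lists of significant SNPs. The sketch below shows the idea in Python on toy data; the SNP labels (`s1`, `s2`, ...) and the helper names are hypothetical and do not come from the NARAC results.

```python
def exclusive_hits(hits_by_method):
    """For each method, return the SNPs that no other method detected."""
    result = {}
    for method, hits in hits_by_method.items():
        others = set().union(
            *(h for name, h in hits_by_method.items() if name != method)
        )
        result[method] = hits - others
    return result


def shared_hits(hits_by_method, methods):
    """Return the SNPs detected by every one of the named methods."""
    return set.intersection(*(hits_by_method[m] for m in methods))


# Toy example with made-up SNP labels.
toy_hits = {
    "individual SNP": {"s1", "s2", "s3"},
    "CIT": {"s2", "s3", "s4", "s8"},
    "FGT": {"s2", "s3", "s5", "s8"},
    "SSLD": {"s2", "s3", "s5", "s6", "s7", "s8"},
}

only = exclusive_hits(toy_hits)
all_block_methods = shared_hits(toy_hits, ["CIT", "FGT", "SSLD"])
all_four_methods = shared_hits(toy_hits, list(toy_hits))

print(only)
print(sorted(all_block_methods), sorted(all_four_methods))
```

In the toy data, `s5` is found by both FGT and SSLD, so it counts as exclusive to neither, while `s8` is shared by the three block methods but missed by the individual SNP approach.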
Table 2

Results of the individual SNP approach compared to all three block methods.

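| Chr no. | Total no. of significant SNPs obtained by the individual SNP method | No. of significant SNPs obtained by only the individual SNP method | No. of significant SNPs obtained by all three block methods | No. of significant SNPs obtained by all four methods |
|---|---|---|---|---|
| 1 | 4 | 3 | 8 | 1 |
| 2 | 2 | 2 | 0 | 0 |
| 3 | 5 | 3 | 7 | 0 |
| 4 | 5 | 2 | 0 | 0 |
| 5 | 6 | 4 | 8 | 2 |
| 6 | 432 | 12 | 916 | 367 |
| 7 | 7 | 3 | 2 | 0 |
| 8 | 11 | 3 | 14 | 1 |
| 9 | 11 | 4 | 16 | 7 |
| 10 | 5 | 2 | 0 | 0 |
| 11 | 2 | 1 | 0 | 0 |
| 12 | 3 | 1 | 6 | 0 |
| 13 | 0 | 0 | 0 | 0 |
| 14 | 5 | 2 | 11 | 1 |
| 15 | 3 | 3 | 5 | 0 |
| 16 | 7 | 5 | 0 | 0 |
| 17 | 4 | 2 | 11 | 0 |
| 18 | 3 | 1 | 0 | 0 |
| 19 | 5 | 2 | 13 | 2 |
| 20 | 8 | 5 | 3 | 1 |
| 21 | 7 | 2 | 0 | 0 |
| 22 | 6 | 3 | 1 | 1 |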
Fig. 4

Number of RA biomarkers detected by each method – “all” biomarkers detected by the method or detected “only” by one method.

Fig. 5 shows the associations across the entire genome, illustrating the big data challenge. The alternating colours (blue and red) distinguish between the end of one Chr and the start of the next Chr. The lower horizontal line in Fig. 5 represents the threshold for suggestive associations (−log₁₀(10⁻⁵)), while the higher line represents the genome-wide significance threshold (−log₁₀(5 × 10⁻⁸)). The associated SNPs are highlighted in green. As expected, most of the associated SNPs on Chr 6 showed highly significant associations with RA susceptibility (P-values < 0.0001). In contrast, none of the SNPs on Chr 13 showed any association with RA. Chr 6 contained most of the known genetic biomarkers for RA. The top SNP (rs660895) in the human leukocyte antigen (HLA) region (32,685,358 bp), representing HLA-DRB1/HLA-DQA1, had the lowest P-value (1.03 × 10⁻¹¹³), as previously reported [22], [23], [24], [25].
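The heights of the two horizontal lines follow directly from the −log10 transform used on the vertical axis of a Manhattan plot. The short Python sketch below computes them and classifies a P-value against the two thresholds; the function names and the use of a strict "less than" comparison are assumptions made for illustration.

```python
import math

SUGGESTIVE_P = 1e-5    # threshold for suggestive associations
GENOME_WIDE_P = 5e-8   # genome-wide significance threshold


def neg_log10(p_value: float) -> float:
    """Height of a P-value on the Manhattan plot's vertical axis."""
    return -math.log10(p_value)


def classify(p_value: float) -> str:
    """Label a P-value relative to the two plotted thresholds."""
    if p_value < GENOME_WIDE_P:
        return "genome-wide significant"
    if p_value < SUGGESTIVE_P:
        return "suggestive"
    return "not significant"


print(f"suggestive line at {neg_log10(SUGGESTIVE_P):.2f}")
print(f"genome-wide line at {neg_log10(GENOME_WIDE_P):.2f}")
print(classify(1.03e-113))  # the P-value reported for rs660895
```

On this scale the suggestive line sits at 5 and the genome-wide line at about 7.3, so a P-value as small as the one reported for rs660895 lies far above both.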
Fig. 5

Manhattan plot showing the associations between the whole NARAC SNPs and RA susceptibility using the individual SNP approach. The genes with P-values lower than the genome-wide significance threshold are shown above the plot area.


Discussion

In this study, 509,413 SNPs were used to test the association with RA susceptibility in the NARAC dataset. The examined SNPs belonged to twenty-two autosomes, providing a large data domain. The surveyed SNPs of the NARAC dataset were dense enough for examination by haplotype block methods. Four methods were applied to assign the associations (CIT, FGT, SSLD, and the individual SNP approach). The aim was to test the NARAC dataset to determine whether haplotype block methods or a single-locus approach alone can sufficiently identify the significant biomarkers associated with RA. This study could not single out a best method because each method produced significant findings that were not detected by any of the other methods. The individual SNP, CIT, FGT, and SSLD methods exclusively detected 65, 159, 156, and 450 SNPs, respectively. Table S25 shows the SNP IDs that were uniquely identified by each method. These findings agree with the conclusion of Shim et al. (who did not test the SSLD method) that both the individual SNP approach and the haplotype block methods should be applied to discover valuable associations in the NARAC dataset [16]. As shown in Table 2, the 383 SNPs that were determined to be significantly associated with RA susceptibility by both the individual SNP approach and the haplotype block methods represent good candidates for further investigation. In addition, 1021 RA-associated SNPs were detected by all three haplotype block methods and deserve greater attention. The SSLD method detected more significant SNPs (1831 SNPs) than the FGT (1551 SNPs), CIT (1516 SNPs), and individual SNP (541 SNPs) methods, potentially because SSLD does not consider the LD between intermediate SNPs. Therefore, the SSLD method is the least conservative at including SNPs inside the haplotype blocks. The biomarkers identified by the individual SNP approach with P-values lower than the genome-wide significance threshold (shown in Fig. 5) are given in Table 3 with their corresponding haplotype blocks. Three hundred and twenty biomarkers from Chr 6 passed the genome-wide significance threshold (data not shown). The SNPs from Chrs 11, 13, 15, 19, and 21 failed to pass the genome-wide significance threshold. Five of the seven biomarkers from Chr 9 were members of a block that was detected by all three block methods. This finding emphasizes the association of the PHF19-TRAF1-C5 region with RA [26].
Table 3

The highly significant SNPs (with P-values lower than the genome-wide significance threshold) discovered by the individual SNP approach with the corresponding haplotype blocks.

| SNP ID | Chr | Position (bp) | Assoc. Allele^a | AAF^b (Case, Control) | P-value^c | Gene/Nearest Genes | Haplotype Block (Method, P-value^c, No. of SNPs in Block) | Haplotype Block Position (bp) (Start, End, Size) | Previously Studied in |
|---|---|---|---|---|---|---|---|---|---|
| rs2493291 | 1 | 3,352,541 | G | 0.956, 0.881 | 1.56 E-14 | PRDM16 | Not detected by any method | | [28] |
| rs2476601 | 1 | 114,089,610 | A | 0.155, 0.084 | 1.12 E-12 | PTPN22 | FGT, 8.5 E-13, 8 | 114075501, 114132504, 57,004 | [22], [24], [25], [29], [30], [31], [32], [33] |
| | | | | | | | CIT, 1.01 E-11, 10 | 114050631, 114141503, 90,873 | |
| | | | | | | | SSLD, 1.03 E-10, 33 | 113787838, 114132504, 344,667 | |
| rs12467084 | 2 | 37,860,221 | G | 0.994, 0.964 | 1.12 E-09 | CDC42EP3/FAM82A1 | Not detected by any method | | |
| rs6752643 | 2 | 198,949,233 | G | 0.989, 0.956 | 2.94 E-09 | PLCL1/SATB2 | Not detected by any method | | |
| rs11915402 | 3 | 58,957,115 | G | 0.995, 0.956 | 8.43 E-13 | C3orf67 | FGT, 1.51 E-07, 20 | 58754521, 59072633, 318,113 | |
| | | | | | | | SSLD, 2.51 E-11, 9 | 58957115, 59057595, 100,481 | |
| rs512244 | 4 | 12,775,151 | G | 0.195, 0.125 | 3.7 E-09 | HS3ST1/HSP90AB2P | Not detected by any method | | [22], [31] |
| rs17604670 | 4 | 113,564,881 | G | 0.966, 0.923 | 3.84 E-08 | TIFA | Not detected by any method | | |
| rs2278600 | 5 | 71,792,426 | G | 0.930, 0.865 | 3.22 E-10 | ZNF366 | Not detected by any method | | |
| rs6596147 | 5 | 133,075,674 | G | 0.820, 0.738 | 1.77 E-09 | FSTL4/C5orf15 | FGT, 3.51 E-06, 9 | 133065358, 133094704, 29,347 | [32], [33], [34], [35] |
| | | | | | | | CIT, 2.95 E-06, 9 | 133057095, 133094704, 37,610 | |
| | | | | | | | SSLD, 2.1 E-07, 6 | 133075674, 133094129, 18,456 | |
| rs2306848 | 7 | 129,556,365 | G | 0.990, 0.948 | 5.95 E-12 | CPA4 | Not detected by any method | | |
| rs1830035 | 7 | 63,170,795 | A | 0.996, 0.963 | 1.47 E-11 | ZNF679 | SSLD, 3.6 E-11, 4 | 63138417, 63170795, 32,379 | |
| rs10275421 | 7 | 100,536,496 | G | 0.991, 0.960 | 8.12 E-09 | FIS1/RABL5 | SSLD, 7.17 E-08, 2 | 100522057, 100536496, 14,440 | |
| rs11785995 | 8 | 131,021,293 | G | 0.982, 0.938 | 2.18 E-10 | FAM49B | Not detected by any method | | |
| rs9785133 | 8 | 20,402,898 | G | 0.916, 0.860 | 3.9 E-08 | LZTS1/LOC286114 | FGT, 1.21 E-07, 6 | 20385189, 20404428, 19,240 | [34] |
| rs872863 | 9 | 123,233,908 | G | 0.993, 0.940 | 2.25 E-16 | DENND1A | Not detected by any method | | [36] |
| rs7854383 | 9 | 81,666,969 | G | 0.959, 0.906 | 1.42 E-09 | TLE1/FAM75D5 | FGT, 1.69 E-08, 2 | 81666969, 81670581, 3613 | [37] |
| | | | | | | | CIT, 1.08 E-07, 2 | 81662684, 81666969, 4286 | |
| | | | | | | | SSLD, 1.21 E-07, 3 | 81662684, 81670581, 7898 | |
| rs2900180 | 9 | 120,785,936 | A | 0.390, 0.303 | 6.24 E-09 | TRAF1/C5 | FGT, 4.66 E-08, 14 | 120720054, 120810962, 90,909 | [26], [34], [36], [38], [39], [40], [41], [42], [43], [44] |
| | | | | | | | CIT, 8.03 E-08, 8 | 120720054, 120807548, 87,495 | |
| | | | | | | | SSLD, 4.5 E-08, 12 | 120720054, 120807548, 87,495 | |
| rs3761847 | 9 | 120,769,793 | G | 0.468, 0.380 | 1.24 E-08 | TRAF1 | FGT, 4.66 E-08, 14 | 120720054, 120810962, 90,909 | [26], [34], [40], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51] |
| | | | | | | | CIT, 8.03 E-08, 8 | 120720054, 120807548, 87,495 | |
| | | | | | | | SSLD, 4.5 E-08, 12 | 120720054, 120807548, 87,495 | |
| rs881375 | 9 | 120,732,452 | A | 0.388, 0.304 | 2.27 E-08 | PHF19/TRAF1 | FGT, 4.66 E-08, 14 | 120720054, 120810962, 90,909 | [34], [36], [49], [52], [43], [53], [54] |
| | | | | | | | CIT, 8.03 E-08, 8 | 120720054, 120807548, 87,495 | |
| | | | | | | | SSLD, 4.5 E-08, 12 | 120720054, 120807548, 87,495 | |
| rs1953126 | 9 | 120,720,054 | A | 0.387, 0.304 | 2.76 E-08 | PHF19 | FGT, 4.66 E-08, 14 | 120720054, 120810962, 90,909 | [34], [36], [43], [44], [48], [53], [54] |
| | | | | | | | CIT, 8.03 E-08, 8 | 120720054, 120807548, 87,495 | |
| | | | | | | | SSLD, 4.5 E-08, 12 | 120720054, 120807548, 87,495 | |
| rs10760130 | 9 | 120,781,544 | G | 0.475, 0.389 | 3.78 E-08 | TRAF1/C5 | FGT, 4.66 E-08, 14 | 120720054, 120810962, 90,909 | [34], [36], [39], [40], [43], [44], [49], [53], [54], [55] |
| | | | | | | | CIT, 8.03 E-08, 8 | 120720054, 120807548, 87,495 | |
| | | | | | | | SSLD, 4.5 E-08, 12 | 120720054, 120807548, 87,495 | |
| rs4918037 | 10 | 105,403,030 | G | 0.958, 0.897 | 6.12 E-11 | SH3PXD2A | Not detected by any method | | |
| rs2671692 | 10 | 49,767,825 | A | 0.677, 0.592 | 2.66 E-08 | WDFY4 | SSLD, 4.84 E-08, 6 | 49767825, 49777543, 9719 | [34], [35], [51], [53] |
| rs10999147 | 10 | 71,550,864 | A | 0.976, 0.939 | 4.16 E-08 | AIFM2 | FGT, 1.91 E-06, 2 | 71550196, 71550864, 669 | |
| rs4760609 | 12 | 46,702,024 | C | 0.907, 0.819 | 3 E-12 | COL2A1/SENP1 | FGT, 1.23 E-07, 3 | 46700325, 46703575, 3251 | |
| rs757123 | 12 | 119,263,543 | G | 0.943, 0.888 | 1.72 E-08 | MSI1 | Not detected by any method | | |
| rs4264325 | 14 | 104,050,531 | G | 0.997, 0.973 | 1.94 E-08 | KIF26A/C14orf180 | FGT, 5.69 E-06, 8 | 104045894, 104062173, 16,280 | |
| rs2292327 | 16 | 82,588,153 | G | 0.516, 0.405 | 1.16 E-09 | NECAB2 | Not detected by any method | | |
| rs2745106 | 16 | 1,481,462 | G | 0.954, 0.904 | 1.77 E-08 | PTX4/TELO2 | Not detected by any method | | |
| rs11868709 | 17 | 73,740,166 | C | 0.817, 0.714 | 7.38 E-11 | TMEM235 | Not detected by any method | | |
| rs8087252 | 18 | 44,295,753 | G | 0.924, 0.865 | 7.13 E-09 | ZBTB7C/CTIF | Not detected by any method | | |
| rs6018432 | 20 | 35,485,260 | G | 0.956, 0.888 | 3.55 E-13 | SRC/BLCAP | Not detected by any method | | [56] |
| rs1182531 | 20 | 57,826,397 | C | 0.852, 0.779 | 6.53 E-09 | PHACTR3 | FGT, 1 E-08, 2 | 57826397, 57832814, 6418 | [22], [31], [34], [35], [57] |
| | | | | | | | SSLD, 1 E-08, 2 | 57826397, 57832814, 6418 | |
| rs13054355 | 22 | 20,321,624 | G | 0.930, 0.854 | 6.04 E-12 | SDF2L1 | FGT, 5.08 E-08, 7 | 20264229, 20321624, 57,396 | |
| | | | | | | | CIT, 1.09 E-08, 3 | 20313153, 20321624, 8472 | |
| | | | | | | | SSLD, 1.09 E-06, 3 | 20321624, 20346559, 24,936 | |
| rs1005133 | 22 | 18,112,909 | G | 0.844, 0.767 | 4.08 E-08 | SEPT5-GP1BB/TBX1 | FGT, 1.02 E-05, 2 | 18112175, 18112909, 735 | |
| | | | | | | | CIT, 1.02 E-05, 2 | 18112175, 18112909, 735 | |

a Assoc. Allele: Associated Allele.

b AAF: Associated Allele Frequency.

c P-values are calculated based on the chi-squared test.

In Table 3, the block sizes (in bp) determined using the SSLD and CIT methods for the five biomarkers detected in the PHF19-TRAF1-C5 region were the same. However, the SSLD block included more associated SNPs (12) than the CIT block (8), as depicted in Fig. 6. Further investigation of this block showed that the four SNPs excluded by the CIT method had MAFs of less than 0.05 (a default condition in Haploview for the CIT method).
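The block sizes listed in Table 3 are consistent with counting both end positions, that is, size = end − start + 1. The Python sketch below applies this to the PHF19-TRAF1-C5 blocks and then mimics the MAF condition on a toy block. The MAF values are hypothetical (chosen so that four of twelve fall below 0.05), and the helper name is an assumption; the block coordinates and the 0.05 cut-off come from the text.

```python
def block_size(start_bp: int, end_bp: int) -> int:
    """Block length in bp, counting both the first and the last position."""
    return end_bp - start_bp + 1


# Start and end positions of the PHF19-TRAF1-C5 blocks from Table 3.
CIT_BLOCK = (120720054, 120807548)
SSLD_BLOCK = (120720054, 120807548)
FGT_BLOCK = (120720054, 120810962)

print(block_size(*CIT_BLOCK), block_size(*SSLD_BLOCK), block_size(*FGT_BLOCK))

# Haploview's default MAF condition for the CIT method, as stated in the text.
CIT_MIN_MAF = 0.05

# Hypothetical MAFs for the twelve SNPs of a block (not the real values).
hypothetical_mafs = [0.30, 0.39, 0.02, 0.47, 0.01, 0.39,
                     0.04, 0.30, 0.48, 0.03, 0.25, 0.39]
retained = [maf for maf in hypothetical_mafs if maf >= CIT_MIN_MAF]
print(len(hypothetical_mafs), "SNPs in the span,", len(retained), "pass the MAF filter")
```

Two blocks can therefore share the same start, end, and size while differing in SNP count, because the count depends on which SNPs inside the span each method admits.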
Fig. 6

Comparison for the CIT and SSLD methods on the same significant haplotype block in the PHF19-TRAF1-C5 region. (a) LD plot showing CIT block comprising eight biomarkers. (b) LD plot for SSLD block including twelve biomarkers.

For the non-Chr 6 biomarkers shown in Table 3, these results were in line with those obtained by Eyre et al. [27], who verified the association of PTPN22 (rs2476601, P-value = 1.12 × 10⁻¹²) with RA for populations of European ancestry. Moreover, the two studies confirm the association of TRAF1 with RA, but for different SNPs. The biomarker detected in the present study was rs3761847 (P-value = 1.24 × 10⁻⁸), while rs10739580 (P-value = 1.7 × 10⁻⁶) was identified by Eyre et al. These two biomarkers are 163,211 bp apart. A closer look was taken at the genes of the “never been reported” biomarkers in Table 3. Table 4 was constructed using DAVID 6.8 to relate these genes to RA pathology and to link gene-disease associations. Ten genes were found to play a role in RA pathology.
Table 4

Disease enrichment analysis for the genes of the “never been reported” biomarkers.

| Gene name | Region | Functional pathway related to RA | Diseases affected by the gene |
|---|---|---|---|
| CDC42EP3 | 2p21 | Induces pseudopodia formation in fibroblasts | Schizophrenia [59] |
| FAM82A1 | 2p22.2 | | Lung cancer [60] |
| PLCL1 | 2q33 | Affects the bone density and the level of osteocalcin | Osteoporosis, hip bone size variation in females [61], intracranial aneurysm [62] |
| SATB2 | 2q33 | Affects the activity of osteoblasts and the differentiation of immunocytes, plays a role in immune regulation, and elevations in the level of alkaline phosphatase | Cleft palate [63], [64], microdeletion syndrome [65], head and neck squamous cell carcinoma [66], colorectal carcinoma [67], laryngeal carcinoma [68], osteosarcoma [69], pancreatic cancer [70], esophageal carcinoma [71], hepatocellular carcinoma [72], HIV/AIDS infection [73], renal cell carcinoma [74], neuroendocrine tumors [75] |
| C3orf67 | 3p14.2 | | |
| TIFA | 4q25 | Plays a role in the activation of IL-1, TRAF6, and IKK, affects the activation of NF-kappa-B | |
| ZNF366 | 5q13.2 | Plays a role in regulating the expression of genes in response to estrogen, affects the differentiation of dendritic cells and the production of IL-4, IL-10, IL-12, and NF-kappa-B | Osteoporosis [76], breast cancer [77], prostate cancer [78] |
| CPA4 | 7q32 | | Benign hypertrophic prostate, prostate cancer [79] |
| ZNF679 | 7q11.21 | | |
| FIS1 | 7q22.1 | | Alzheimer's disease [80], leukemia [81], thyroid tumors [82] |
| RABL5 | 7q22.1 | | |
| FAM49B | 8q24.21 | | Endometriosis [83] |
| SH3PXD2A | 10q24.33 | Affects the activity of osteoclast | Breast cancer, melanoma [84], glioma [85], pre-eclampsia [86], lung adenocarcinoma [87], prostate cancer [88], colon cancer [89] |
| AIFM2 | 10q22.1 | | Ovarian cancer, retinoblastoma [90] |
| COL2A1 | 12q13.11 | Plays a role in the activation of IL-6, Osteoarthritis, chondrodysplasia, epiphyseal dysplasia, joint deformity, spondyloepiphyseal dysplasia | Stickler and Wagner syndromes [91], chondrosarcomas [92], osteonecrosis of the femoral head [93], pathological myopia [94], congenital toxoplasmosis [95], Czech dysplasia [96], Legg-Calvé-Perthes [97] |
| SENP1 | 12q13.1 | Plays a role in the activation of IL-6 | Prostate cancer [98], leukemia, hepatoma [99] |
| MSI1 | 12q24.1-q24.31 | | Liver cancer, hepatoma, glioma and melanoma [100], neurodegenerative disorders [101], Helicobacter pylori infection [102], cervical carcinoma [103], endometriosis and endometrial carcinoma [104], medulloblastoma [105] |
| KIF26A | 14q32.33 | | |
| C14orf180 | 14q32.33 | | |
| NECAB2 | 16q23.3 | | |
| PTX4 | 16p13.3 | | |
| TELO2 | 16p13.3 | | Glioma [106], intellectual disability [107], You-Hoover-Fong syndrome [108] |
| TMEM235 | 17q25.3 | | Cataract [109] |
| ZBTB7C | 18q21.1 | | Sepsis [110], kidney cancer [111], cerebral ischemia [112] |
| CTIF | 18q21.1 | | Hearing function [113] |
| SDF2L1 | 22q11.21 | | Insulinoma [114] |
| SEPT5 | 22q11.21 | Involved in cytokinesis | Juvenile parkinsonism [115], pancreatic neoplasm [116], vitreoretinopathy [117], Parkinson's disease [118] |
| GP1BB | 22q11.21-q11.23 | | Bernard-Soulier syndrome [119], Velocardiofacial syndrome [120], developmental delay, cardiac defects, dysmorphic facial features, palatal anomalies, hypocalcemia, and immune deficiency [121] |
| TBX1 | 22q11.21 | Expands T lymphocytes activity, affects the activity of fibroblastic growth factor | DiGeorge syndrome, pharyngeal and aortic arch defects [122], Velocardiofacial syndrome [123], psychiatric disorders [124], lung tumor [125], Tetralogy of Fallot [126], Conotruncal heart defects [127], ventricular septal defect [128], renal malformations [129], adenoid cystic carcinoma [130], cleft palate [131], indirect inguinal hernia [132], prostate cancer [133] |
As shown in Table 4, TBX1 plays a role in RA pathology through its immunological function. A meta-analysis by Meziani et al. confirmed the association of TBX1 (rs4819522, P-value = 0.0014) with RA in both Japanese and European populations [58]. The SNP identified in the present study (rs1005133, P-value = 4.08 × 10⁻⁸) was in close proximity to the SNP reported by Meziani et al. (28,427 bp away). As shown in Table 3, rs1005133 was in a block with another SNP (rs5993820) detected by the CIT and FGT methods. An LD plot was generated for the region containing these two SNPs to uncover other associations in that region of Chr 22. As depicted in Fig. 7, rs4819522 was in strong LD with neither rs1005133 (D′ = 0.2, r² = 0.035) nor rs5993820 (D′ = 0.411, r² = 0.021).
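For readers unfamiliar with the two LD measures quoted here, the following Python sketch computes D, D′, and r² for a pair of biallelic SNPs from the standard textbook definitions. It is not the Haploview implementation, and the frequencies in the example are hypothetical; they are chosen only to show how a moderate D′ can coexist with a very small r² when allele frequencies differ.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab is the frequency of the haplotype carrying allele A at the first
    locus and allele B at the second; p_a and p_b are the allele frequencies.
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    r_squared = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: a common allele paired with a rarer one.
d, d_prime, r_squared = ld_measures(p_ab=0.07, p_a=0.5, p_b=0.1)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

Low values of both measures, as reported for rs4819522 against the two SNPs of this study, indicate that the SNPs are largely uncorrelated and may therefore represent separate association signals.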
Fig. 7

LD plot for the TBX1 region showing a biomarker in this study (rs1005133) and a previously detected biomarker (rs4819522).

The block similarity among the three haplotype block partitioning methods is shown in Table 5. The similarity measure is the number of SNPs detected by both methods in question divided by the total number of SNPs detected by the two methods. The highest block similarity was between CIT and FGT (mean ± SD = 0.464 ± 0.286). The block similarity between FGT and SSLD (mean ± SD = 0.21 ± 0.216) was nearly equal to that between CIT and SSLD (mean ± SD = 0.205 ± 0.193). The significance of these similarities was checked using one-way ANOVA with a post hoc t-test. The significance level for the three methods after Bonferroni correction was 0.0167 (0.05/3). The difference between (FGT and SSLD) and (CIT and SSLD) was not statistically significant (P-value = 0.936). The differences between (CIT and FGT) and (CIT and SSLD) and between (FGT and SSLD) and (FGT and CIT) were statistically significant (P-values = 0.001 and 0.002, respectively).
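Read literally, this similarity measure is the Jaccard index of the two sets of detected SNPs. The Python sketch below implements that reading; interpreting "the total SNPs detected by the two methods" as the union of the two sets is an assumption, as is returning 0 when neither method detects anything. The SNP labels are made up.

```python
def block_similarity(snps_a: set, snps_b: set) -> float:
    """Shared SNPs divided by all SNPs detected by either method (Jaccard index)."""
    union = snps_a | snps_b
    if not union:
        return 0.0  # convention assumed here for two empty result sets
    return len(snps_a & snps_b) / len(union)


# Toy example with made-up SNP labels for two methods on one Chr.
cit_hits = {"s1", "s2", "s3"}
fgt_hits = {"s2", "s3", "s4"}

print(f"{block_similarity(cit_hits, fgt_hits):.0%}")
```

Under this definition a value of 100% means that the two methods flagged exactly the same SNPs on that Chr, and 0% means that they shared none.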
Table 5

Block similarity among the haplotype block methods for the twenty-two Chrs.

Chr no.   CIT vs FGT   FGT vs SSLD   SSLD vs CIT
1         88%          21%           23%
2         39%          0%            0%
3         34%          45%           20%
4         100%         0%            0%
5         40%          21%           30%
6         76%          74%           71%
7         9%           32%           6%
8         39%          30%           34%
9         49%          29%           25%
10        0%           0%            0%
11        53%          0%            0%
12        74%          18%           21%
13        71%          0%            0%
14        17%          36%           24%
15        39%          33%           23%
16        0%           0%            54%
17        52%          51%           35%
18        0%           0%            0%
19        64%          52%           43%
20        50%          18%           27%
21        75%          0%            11%
22        53%          2%            4%
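
The block similarity measure and the summary statistics quoted for Table 5 can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes that "the total SNPs detected by the two methods" means the union of the two SNP sets (the Jaccard index), that an empty union is scored as 0, and that the reported SD is the sample SD. The function name and the rs-style identifiers are hypothetical; the percentage lists are the Table 5 columns.

```python
from statistics import mean, stdev


def block_similarity(snps_a, snps_b):
    """Shared SNPs divided by all SNPs found by either method (Jaccard index)."""
    a, b = set(snps_a), set(snps_b)
    union = a | b
    # Assumption: two empty sets are scored 0 rather than left undefined.
    return len(a & b) / len(union) if union else 0.0


# Hypothetical SNP identifiers, for illustration only.
cit_hits = {"rsA", "rsB", "rsC"}
fgt_hits = {"rsB", "rsC", "rsD", "rsE"}
example = block_similarity(cit_hits, fgt_hits)  # 2 shared / 5 in the union

# Per-chromosome similarities from Table 5, in percent (Chr 1 to 22).
table5 = {
    "CIT vs FGT": [88, 39, 34, 100, 40, 76, 9, 39, 49, 0, 53,
                   74, 71, 17, 39, 0, 52, 0, 64, 50, 75, 53],
    "FGT vs SSLD": [21, 0, 45, 0, 21, 74, 32, 30, 29, 0, 0,
                    18, 0, 36, 33, 0, 51, 0, 52, 18, 0, 2],
    "SSLD vs CIT": [23, 0, 20, 0, 30, 71, 6, 34, 25, 0, 0,
                    21, 0, 24, 23, 54, 35, 0, 43, 27, 11, 4],
}

# Mean and sample SD (n - 1 denominator), converted from percent to fractions.
summary = {pair: (mean(v) / 100, stdev(v) / 100) for pair, v in table5.items()}
for pair, (m, sd) in summary.items():
    print(f"{pair}: {m:.3f} ± {sd:.3f}")
```

Because Table 5 lists rounded percentages, the recomputed means and SDs agree with the reported values only to within rounding.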
As shown in Table 6, the SSLD method provided the best coverage of the hits obtained with the individual SNP approach, capturing 444 of the 541 SNPs. The FGT method captured 432 SNPs, and the CIT method captured 415 SNPs. However, after excluding the hits on Chr 6, the FGT method was the best, capturing 45 of 109 SNPs, and the CIT method (34 SNPs) performed better than the SSLD method (29 SNPs). The significance of the differences in coverage among the three block methods was checked using one-way ANOVA with a post hoc t-test. The mean ± SD numbers of hits per chromosome for the CIT, FGT, and SSLD methods were 18.864 ± 80.909, 19.636 ± 82.071, and 20.182 ± 88.199, respectively. The significance level for the three methods after Bonferroni correction was 0.0167 (0.05/3). The difference among the three groups determined using ANOVA was not statistically significant (P-value = 0.999).
Table 6

The ability of each haplotype block method to capture the significant SNPs determined with the individual SNP approach.

Chr no.   Individual SNP   CIT   FGT   SSLD
1         4                1     1     1
2         2                0     0     0
3         5                1     2     1
4         5                3     3     0
5         6                2     2     2
6         432              381   387   415
7         7                0     2     3
8         11               6     6     2
9         11               7     7     7
10        5                0     1     2
11        2                1     1     0
12        3                0     1     1
13        0                0     0     0
14        5                2     2     1
15        3                0     0     0
16        7                0     1     1
17        4                1     2     1
18        3                0     2     0
19        5                2     2     3
20        8                1     3     3
21        7                5     4     0
22        6                2     3     1
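
The coverage totals and the near-zero ANOVA effect reported for Table 6 can be re-derived from its rows. The sketch below is an illustration under stated assumptions rather than the authors' analysis: the helper `one_way_anova_f` is a hypothetical name, it computes only the F statistic with the standard between-group over within-group mean-square formula, and the P-value (which needs the F distribution, for example from a statistics package) is left out.

```python
from statistics import mean

# Table 6 rows for Chr 1 to 22: (individual SNP, CIT, FGT, SSLD).
table6 = [
    (4, 1, 1, 1), (2, 0, 0, 0), (5, 1, 2, 1), (5, 3, 3, 0),
    (6, 2, 2, 2), (432, 381, 387, 415), (7, 0, 2, 3), (11, 6, 6, 2),
    (11, 7, 7, 7), (5, 0, 1, 2), (2, 1, 1, 0), (3, 0, 1, 1),
    (0, 0, 0, 0), (5, 2, 2, 1), (3, 0, 0, 0), (7, 0, 1, 1),
    (4, 1, 2, 1), (3, 0, 2, 0), (5, 2, 2, 3), (8, 1, 3, 3),
    (7, 5, 4, 0), (6, 2, 3, 1),
]

# Column totals, with and without Chr 6 (list index 5).
totals = tuple(sum(col) for col in zip(*table6))
without_chr6 = tuple(t - c for t, c in zip(totals, table6[5]))


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA: between-group MS / within-group MS."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((x - mean(g)) ** 2 for x in g) for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Per-chromosome hit counts for the three block methods.
cit, fgt, ssld = (list(col) for col in list(zip(*table6))[1:])
f_stat = one_way_anova_f([cit, fgt, ssld])
print(totals, without_chr6, round(f_stat, 4))
```

The F statistic is tiny because the Chr 6 counts dominate the within-method variance, which is consistent with the non-significant ANOVA result reported in the text.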
Most of the haplotype blocks that showed a strong association with RA were in or near (+3 Mb) the major histocompatibility complex (MHC) region. Most of the 1021 SNPs detected by the three block methods were in the MHC region. These outcomes confirmed the firm association between the MHC region and RA susceptibility.
Some associated SNPs were detected by all the methods, but others were detected by only one method. These differences could have several causes. For the associations observed using only the individual SNP approach, only one SNP may be in strong LD with the causal SNP. In that case, studying haplotypes could decrease the power of association because haplotypes consist of several SNPs. For the associations observed using only the haplotype block methods, the individual SNP approach required approximately 81.71% more tests than the block methods. Consequently, the Bonferroni correction was more severe for the individual SNP approach. The block methods were also able to detect interactions among many causal SNPs. In addition, haplotypes could capture rare alleles that may not be reflected by individual SNPs. The reason for this difference could be that the power to observe associations is maximized when the frequencies of the studied biomarker and the causal SNP are similar. Some associations were observed using one haplotype block method but not the others because the methods differ greatly in how they define a haplotype block.
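The effect of the number of tests on the Bonferroni threshold can be illustrated with a short sketch. The test counts below are hypothetical (the actual numbers of tests are not restated here), and reading "81.71% more tests" as a factor of 1.8171 is an assumption of this example.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after Bonferroni correction."""
    return alpha / n_tests


# The three-way comparison used in this section: 0.05 / 3.
three_way = bonferroni_threshold(0.05, 3)

# Hypothetical test counts: block methods vs. individual SNP approach.
n_block_tests = 100_000
n_snp_tests = round(n_block_tests * 1.8171)  # assumed reading of "81.71% more"

block_thr = bonferroni_threshold(0.05, n_block_tests)
snp_thr = bonferroni_threshold(0.05, n_snp_tests)
print(f"three-way: {three_way:.4f}")
print(f"block threshold / SNP threshold: {block_thr / snp_thr:.4f}")
```

Under this reading, a P-value must be about 1.8 times smaller to pass the individual SNP threshold than to pass the block threshold, whatever the absolute test counts are.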
The limitations of this study are as follows: (a) the effects of population stratification were not accounted for; (b) a replication study in other datasets was not performed; and (c) other haplotype block methods, such as those based on hidden Markov models [134], [135], dynamic programming-based algorithms [136], [137], [138], [139], [140], wavelet decomposition [141], greedy algorithms [142], the minimum description length principle [143], [144], spatial correlation of SNPs [145], sequence kernel association tests [146], and block entropy [147], were not included.

Conclusions

Applying the individual SNP approach together with the three block methods to the NARAC dataset maximizes the ability to discover crucial associations. If a single method must be selected, SSLD would be the most appropriate for the NARAC dataset. The SSLD method has valuable advantages, such as the highest genomic coverage; the largest minimum, median, and maximum significant block sizes; the highest number of significant SNPs included in blocks; and the highest number of associated SNPs discovered exclusively by a single method. In total, 355 SNPs showed a P-value lower than the genome-wide significance threshold. Among them, after excluding the Chr 6 results (320 SNPs), 20 SNPs corresponding to 29 genes had not previously been reported for RA susceptibility. A review of the literature indicated that 10 of these 29 genes, namely, CDC42EP3, PLCL1, SATB2, TIFA, ZNF366, SH3PXD2A, COL2A1, SENP1, SEPT5, and TBX1, play a role in RA pathogenesis. As a future perspective, a replication study should be conducted to confirm the GWAS findings.

Conflict of interest

The authors have declared no conflict of interest.

Compliance with Ethics Requirements

This article does not contain any studies with human or animal subjects.