Linnea Thörnqvist, Mats Ohlin.
Abstract
The highly variable complementarity determining region 3 (CDR3) of antibodies is generated through recombination of immunoglobulin heavy chain variable (IGHV), diversity, and joining genes. The codons encoding the first residues of CDR3 may be derived directly from the IGHV germline gene, but they may also be generated as part of the rearrangement process. This article presents data on the nucleotide composition of these codons in rearranged genes, an indicator of the degree to which the IGHV gene contributes to CDR3 diversity. Analyzed data are presented for two unrelated sets of raw sequence data. The raw data sets consisted of sequences of antibody heavy chain-encoding transcripts from six allergic subjects (European Nucleotide Archive accession number PRJEB18926), and paired antibody heavy and light chain variable region-encoding transcripts from memory B cells of three subjects (European Nucleotide Archive accession numbers SRX709625, SRX709626, and SRX709627). The nucleotide compositions of the corresponding 5'-ends of sequences encoding the CDR3 are presented for transcripts originating from 47 different IGHV alleles. These data have been used (Thörnqvist and Ohlin, 2018) [1] to demonstrate the extent to which the 3'-most bases of IGHV germline genes are incorporated into rearranged immunoglobulin-encoding sequences, and the extent to which any difference in incorporation affects the specificity of inference of the 3'-end of IGHV genes from immunoglobulin-encoding transcripts. They have also been used to assess the effect of observed gene differences on the composition of the ascending strand of CDR3 of antibodies originating from different IGHV genes (Thörnqvist and Ohlin, 2018) [1].
Year: 2018 PMID: 29892656 PMCID: PMC5992955 DOI: 10.1016/j.dib.2018.04.125
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Distribution of bases in the first three codons of 47 genes/alleles encoding CDR3 of antibody heavy chains in the main examined data set [2], [3] and in an unrelated data set [4]. For the latter data set, only transcripts that were exclusively inferred to one germline gene/allele were used. IGHV1–2*02 T163C (†) would be inferred as either IGHV1–2*02 or IGHV1–2*05, and could thus not be evaluated with the method used. IGHV3–30*03 (¶) and IGHV3–30*18 are identical in the part of the sequence that is inferred by the approach used, but differ in codon 106, where they carry an AGA and an AAA trimer, respectively. Hence, transcripts that have been inferred here as derived from IGHV3–30*03 more likely originate from IGHV3–30*18, since they predominantly incorporated an AAA trimer in codon 106. The number of subjects used for analysis varies between 3 and 6 in the main data set and between 0 and 3 in the unrelated data set (Table 1).
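The kind of per-position base distribution summarized in Fig. 1 can be illustrated with a minimal sketch. It assumes that the CDR3-encoding nucleotide sequences for one gene/allele have already been extracted and are in frame; the function name `base_distribution` and the example sequences are hypothetical and are not taken from the article or its data sets.

```python
from collections import Counter

BASES = "ACGT"


def base_distribution(cdr3_sequences, n_codons=3):
    """Return, for each of the first 3 * n_codons nucleotide positions,
    the fraction of sequences carrying A, C, G, or T at that position."""
    length = 3 * n_codons
    counts = [Counter() for _ in range(length)]
    for seq in cdr3_sequences:
        seq = seq.upper()
        if len(seq) < length:
            continue  # too short to cover the codons of interest
        for pos in range(length):
            if seq[pos] in BASES:  # ignore ambiguous calls such as N
                counts[pos][seq[pos]] += 1
    distribution = []
    for counter in counts:
        total = sum(counter.values())
        distribution.append(
            {base: (counter[base] / total if total else 0.0) for base in BASES}
        )
    return distribution


# Hypothetical input: 5'-ends of CDR3-encoding sequences (illustration only).
example = ["GCGAGAGAT", "GCGAAAGGG", "GCGAGATAC", "GCAAGAGAT"]
dist = base_distribution(example)
for position, fractions in enumerate(dist, start=1):
    print(position, fractions)
```

Positions dominated by a single base would be consistent with a germline-derived nucleotide, whereas mixed positions would suggest that the base was often generated during rearrangement; the sketch only tabulates frequencies and does not perform any germline inference.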
Fig. 2. Examples of the position of, and the potential polar interactions made by, the side chain of H chain V domain residue 107. Carbon atoms of the side chain of residue 107 are highlighted in yellow and those of the side chains of other residues are highlighted in green. The backbone of H chain CDR3 is shown in light blue.
Table 1. Number of subjects in which the number of transcript entries exceeded the cut-off value.
| Main data set: number of subjects with > 500 entries | Unrelated data set: number of subjects with > 250 entries | |
|---|---|---|
| 5 | 3 | |
| 3 | Not evaluated | |
| 4 | 2 | |
| 6 | 2 | |
| 6 | 3 | |
| 4 | 3 | |
| 6 | 0 | |
| 6 | 0 | |
| 3 | 0 | |
| 3 | 0 | |
| 3 | 1 | |
| 6 | 1 | |
| 3 | 0 | |
| 6 | 3 | |
| 3 | 2 | |
| 6 | 3 | |
| 6 | 2 | |
| 3 | 1 | |
| 6 | 3 | |
| 6 | 3 | |
| 6 | 0 | |
| 6 | 0 | |
| 5 | 1 | |
| 6 | 0 | |
| 4 | 2 | |
| 4 | 2 | |
| 3 | 1 | |
| 5 | 2 | |
| 5 | 2 | |
| 3 | 0 | |
| 3 | 1 | |
| 6 | 3 | |
| 6 | 3 | |
| 4 | 2 | |
| 4 | 2 | |
| 6 | 0 | |
| 6 | 0 | |
| 6 | 3 | |
| 3 | 0 | |
| 5 | 3 | |
| 6 | 3 | |
| 3 | 1 | |
| 6 | 2 | |
| 3 | 1 | |
| 6 | 2 | |
| 5 | 2 | |
| 3 | 1 |
The cut-off value was set to 500 entries for the main data set [2], [3] and to 250 entries for the unrelated data set [4]. For the latter, only transcripts that were exclusively inferred to a single germline allele were used.
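The cut-off rule described above can be sketched as follows. The sketch assumes that the number of transcript entries per subject is already known for a given allele; the function name `subjects_above_cutoff`, the subject labels, and the counts are hypothetical. A strict "greater than" comparison is used to match the "> 500" and "> 250" column headings.

```python
def subjects_above_cutoff(entries_per_subject, cutoff):
    """Count the subjects whose number of transcript entries exceeds the cut-off."""
    return sum(1 for n in entries_per_subject.values() if n > cutoff)


# Hypothetical entry counts for one allele (illustration only).
main_counts = {"S1": 812, "S2": 430, "S3": 501, "S4": 500}
unrelated_counts = {"D1": 251, "D2": 90, "D3": 250}

print(subjects_above_cutoff(main_counts, cutoff=500))
print(subjects_above_cutoff(unrelated_counts, cutoff=250))
```

Applying the function once per allele and data set would give one row of a table like Table 1.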
| Subject area | Biology |
| More specific subject area | Immunobiology |
| Type of data | Figures, table |
| How data was acquired | Next generation sequencing (MiSeq, Illumina) |
| Data format | Analyzed |
| Experimental factors | Extraction of peripheral blood mononuclear cell RNA, construction of libraries encoding antibody heavy chain variable domains |
| Experimental features | Analysis of the nucleotide composition in the three most 5′ codons of the CDR3 of immunoglobulin heavy chain |
| Data source location | Lund, Sweden |
| Data accessibility | Analyzed data are available within this article. Raw data generated by us |