Konrad Krawczyk, Andrew Buchanan, Paolo Marcatili.
Abstract
The patent literature should reflect the past 30 years of engineering efforts directed toward developing monoclonal antibody therapeutics. Such information is potentially valuable for rational antibody design. Patents, however, are designed not to convey scientific knowledge, but to provide legal protection. It is not obvious whether antibody information from patent documents, such as antibody sequences, is useful in conveying engineering know-how, rather than as a legal reference only. To assess the utility of patent data for therapeutic antibody engineering, we quantified the number of antibody sequences in patents intended for medicinal purposes and how well they reflect the primary sequences of therapeutic antibodies in clinical use. We identified 16,526 patent families covering major jurisdictions (e.g., US Patent and Trademark Office (USPTO) and World Intellectual Property Organization (WIPO)) that contained antibody sequences. These families held 245,109 unique antibody chains (135,397 heavy chains and 109,712 light chains) that we compiled in our Patented Antibody Database (PAD, http://naturalantibody.com/pad). We find that antibodies make up a non-trivial proportion of all patent amino acid sequence depositions (e.g., 11% of the USPTO Full Text database). Our analysis of the 16,526 families demonstrates that the volume of patent documents with antibody sequences is growing, with the majority of documents classified as containing antibodies for medicinal purposes. We further studied the 245,109 antibody chains from the patent literature and found that they closely reflect the primary sequences of antibody therapeutics in clinical use. This suggests that the patent literature could serve as a reference for previous engineering efforts to improve rational antibody design.
Keywords: patents; data mining; therapeutic antibodies
Year: 2021 PMID: 33722161 PMCID: PMC7971238 DOI: 10.1080/19420862.2021.1892366
Source DB: PubMed Journal: MAbs ISSN: 1942-0862 Impact factor: 5.857
Published biological sequences and proportion thereof identified as antibody chains. We extracted raw sequences from USPTO (divided between the full text, FT, and long listing repository PSIPS), DDBJ, WIPO and EBI. The total number of raw sequences is given in column Total Raw. Of these we show how many were identified by ANARCI as containing an antibody chain (column Ab-identified). In the column “% Total” we report the proportion of identified antibody sequences out of the total of raw sequences. Both Total Raw and Ab-identified columns report the redundant number of sequences so as to exemplify the volume of antibody depositions in patent sequences – we report the number of unique heavy (H) and light (L) chains in the parentheses in column “Ab-identified”
| Source | Sequence Type | Total Raw | Ab-identified | % Total |
|---|---|---|---|---|
| USPTO FT | Amino Acid | 5,534,127 | 606,036 | 10.95 |
| | Nucleotide | 7,068,248 | 229,547 | 3.24 |
| USPTO PSIPS | Amino Acid | 25,527,942 | 470,317 | 1.84 |
| | Nucleotide | 176,840,912 | 376,567 | 0.21 |
| DDBJ | Amino Acid | 4,412,209 | 533,762 | 12.09 |
| | Nucleotide | 44,968,142 | 413,485 | 0.91 |
| WIPO | Amino Acid | 10,275,174 | 435,218 | 4.23 |
| | Nucleotide | 13,490,560 | 160,542 | 1.19 |
| EBI | Amino Acid | 10,368,431 | 713,620 | 6.88 |
| | Nucleotide | 12,349,772 | 38,366 | 0.31 |
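The "% Total" column can be recomputed from the two count columns. The following is a minimal sketch, not the authors' code; the function name `percent_truncated` is hypothetical. It assumes truncation to two decimals, because the published values match truncation rather than rounding (for example, 229,547 / 7,068,248 is about 3.2476%, reported as 3.24). The source does not state which convention was used.

```python
def percent_truncated(part: int, total: int) -> float:
    """Percentage of `part` in `total`, truncated (not rounded) to two decimals.

    Truncation is an assumption inferred from the published table values.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    # Integer arithmetic keeps the truncation exact: hundredths of a percent.
    hundredths = part * 10_000 // total
    return hundredths / 100


# (Ab-identified, Total Raw) pairs copied from the table above.
rows = {
    ("USPTO FT", "Amino Acid"): (606_036, 5_534_127),
    ("USPTO FT", "Nucleotide"): (229_547, 7_068_248),
    ("DDBJ", "Amino Acid"): (533_762, 4_412_209),
}

for (source, seq_type), (ab, total) in rows.items():
    print(f"{source:10s} {seq_type:11s} {percent_truncated(ab, total):6.2f}")
```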
Subclasses of the patent classifications. Most common subclasses associated with patents including antibody sequences according to the Cooperative Patent Classification (CPC, https://www.cooperativepatentclassification.org/). There were 15,951 patents containing antibodies with CPC classification and the percentage of families in each class is expressed as a proportion of this number
| Class | Total families (%) | Description |
|---|---|---|
| C07K16 | 13,790 (86.4) | Immunoglobulins [IGs], e.g. monoclonal or polyclonal antibodies (antibodies with enzymatic activity, e.g. abzymes |
| C07K2317 | 12,001 (75.2) | Immunoglobulins specific features |
| A61K39 | 9,459 (59.3) | Medicinal preparations containing antigens or antibodies |
| C07K2319 | 3,451 (21.6) | Fusion polypeptide |
| G01N33 | 3,105 (19.4) | Investigating or analyzing materials by specific methods not covered by groups G01N1/00 – G01N31/00 |
| C07K14 | 3,037 (19.0) | Peptides having more than 20 amino acids; Gastrins; Somatostatins; Melanotropins; Derivatives thereof |
| A61K47 | 2,392 (14.9) | Medicinal preparations characterized by the non-active ingredients used, e.g. carriers or inert additives; Targeting or modifying agents chemically bound to the active ingredient |
| A61K38 | 2,058 (12.9) | Medicinal preparations containing peptides |
| C12N15 | 1,972 (12.3) | Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor |
| A61P35 | 1,900 (11.9) | Specific therapeutic activity of chemical compounds or medicinal preparations |
| A61K45 | 1,671 (10.4) | Medicinal preparations containing active ingredients not provided for in groups |
| A61K31 | 1,415 (8.8) | Medicinal preparations containing organic active ingredients |
| G01N2333 | 1,329 (8.3) | Assays involving biological materials from specific organisms or of a specific nature |
| C12N5 | 816 (5.1) | Undifferentiated human, animal or plant cells, e.g. cell lines; Tissues; Cultivation or maintenance thereof; Culture media therefor |
Top 30 targets in patent documents. We extracted the targets of the antibodies in patent documents and present the top 30, ranked by the number of families in which they were mentioned. Targets with the same number of families share a rank, marked with "=". For each target, we show the number of patent families mentioning the target (#Families), the number of therapeutics on the market or in the clinic against it (#Therapeutics) and the cumulative number of therapeutics covered by the top targets (#Therapeutics cumulative)
| Rank | Target | #Families | #Therapeutics | #Therapeutics (Cumulative) |
|---|---|---|---|---|
| 1 | pd1 | 284 | 20 | 20 |
| 2 | cd3 | 221 | 20 | 40 |
| 3 | her1 | 190 | 17 | 57 |
| 4 | pdl1 | 189 | 12 | 69 |
| 5 | tnfa | 185 | 6 | 75 |
| 6 | her2 | 175 | 9 | 84 |
| 7 | cd20 | 169 | 14 | 98 |
| 8 | influenza | 151 | 5 | 103 |
| 9 | cmet | 136 | 4 | 107 |
| 10 | vegfa | 135 | 7 | 114 |
| 11 | amyloid beta | 129 | 8 | 122 |
| 12 | hiv | 123 | 2 | 124 |
| =13 | il6 | 102 | 7 | 131 |
| =13 | cd40 | 102 | 9 | 140 |
| =13 | cd19 | 102 | 6 | 146 |
| 14 | ctla4 | 97 | 6 | 152 |
| =15 | il17 | 92 | 8 | 160 |
| =15 | igf1r | 92 | 6 | 166 |
| 16 | pcsk9 | 89 | 8 | 174 |
| 17 | her3 | 87 | 6 | 180 |
| =18 | rsv | 73 | 5 | 185 |
| =18 | cd38 | 73 | 5 | 190 |
| =19 | tau | 68 | 5 | 195 |
| =19 | lag3 | 68 | 7 | 202 |
| 20 | ox40 | 65 | 6 | 208 |
| 21 | bcma | 64 | 1 | 209 |
| =22 | il23 | 63 | 5 | 214 |
| =22 | cd47 | 63 | 2 | 216 |
| 23 | ang2 | 62 | 2 | 218 |
| 24 | vegfr2 | 56 | 5 | 223 |
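The rank labels and the cumulative column in the table above can be reproduced from the two count columns. This is a minimal sketch under the assumption that ties share a dense rank and are marked with "="; the helper name `dense_rank_labels` is hypothetical and not from the source.

```python
from collections import Counter
from itertools import accumulate


def dense_rank_labels(counts):
    """Dense rank labels for counts sorted in descending order.

    Tied values share one rank and are prefixed with "=".
    """
    freq = Counter(counts)
    labels, rank, prev = [], 0, None
    for c in counts:
        if c != prev:
            rank += 1
            prev = c
        labels.append(f"={rank}" if freq[c] > 1 else str(rank))
    return labels


# #Families and #Therapeutics columns, copied in table order.
families = [284, 221, 190, 189, 185, 175, 169, 151, 136, 135, 129, 123,
            102, 102, 102, 97, 92, 92, 89, 87, 73, 73, 68, 68, 65, 64,
            63, 63, 62, 56]
therapeutics = [20, 20, 17, 12, 6, 9, 14, 5, 4, 7, 8, 2, 7, 9, 6, 6, 8,
                6, 8, 6, 5, 5, 5, 7, 6, 1, 5, 2, 2, 5]

labels = dense_rank_labels(families)
cumulative = list(accumulate(therapeutics))  # running total of therapeutics
```

With these inputs, the last label is `24` and the last cumulative value is 223, matching the final row of the table.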
Figure 1. Target usage in patent documents reporting antibody sequences. (a) Relationship between number of patent families per target and the earliest mention of the target in patent documents containing antibodies. For each target, we noted the earliest date among patent documents citing it and grouped these into 4-year intervals. Within each interval we noted the total number of patent families for a given target and plotted the aggregate for each time interval. (b) For each 4-year interval, we plot the number of new target names that were first introduced in a patent document at that time
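The binning described in the Figure 1 caption can be sketched as follows. This is an illustration only: the input format (tuples of target, family identifier and year), the function names and the interval origin of 1980 are all assumptions, since the source does not specify where the 4-year intervals start.

```python
from collections import defaultdict


def interval_start(year, origin=1980, width=4):
    """First year of the `width`-year interval containing `year`."""
    return origin + ((year - origin) // width) * width


def summarize_targets(mentions, origin=1980, width=4):
    """Aggregate per-target data by the interval of the target's first mention.

    `mentions` is an iterable of (target, family_id, year) tuples.
    Returns (families per interval, new targets per interval).
    """
    earliest = {}
    families = defaultdict(set)
    for target, family_id, year in mentions:
        earliest[target] = min(year, earliest.get(target, year))
        families[target].add(family_id)

    families_per_interval = defaultdict(int)
    new_targets_per_interval = defaultdict(int)
    for target, year in earliest.items():
        start = interval_start(year, origin, width)
        families_per_interval[start] += len(families[target])  # panel (a)
        new_targets_per_interval[start] += 1                   # panel (b)
    return dict(families_per_interval), dict(new_targets_per_interval)


# Toy data, invented for illustration.
toy = [("pd1", "F1", 2005), ("pd1", "F2", 2010), ("cd3", "F3", 1990),
       ("cd3", "F1", 2006), ("lag3", "F4", 2006)]
fam_counts, new_counts = summarize_targets(toy)
```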
Figure 2. The volume of patent family documents listing antibody sequences per year. For each patent family we noted the earliest and most recent dates of any documents associated with it and the aggregate numbers of these are given by red and blue bars, respectively. The apparent low activity in 2020 can be attributed to the fact that data contributed in 2020 only account for January that year
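The per-year counts described in the Figure 2 caption amount to taking the minimum and maximum document year per family and tallying each. A minimal sketch, assuming a hypothetical mapping from family identifier to the years of its documents:

```python
from collections import Counter


def yearly_activity(family_docs):
    """Count families by earliest and by most recent document year.

    `family_docs` maps a family id to a list of document years.
    Returns (earliest-year counts, most-recent-year counts).
    """
    non_empty = [years for years in family_docs.values() if years]
    first = Counter(min(years) for years in non_empty)
    last = Counter(max(years) for years in non_empty)
    return first, last


# Toy data, invented for illustration.
docs = {"F1": [2001, 2005, 2010], "F2": [2005], "F3": [2005, 2010]}
first_seen, last_seen = yearly_activity(docs)
```

A family with a single document contributes the same year to both tallies, as `F2` does here.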
Most common species of V-region germline genes to which antibodies from patents aligned. Antibodies from patent documents intended for medicinal indications (MedPatAb) were aligned to IMGT-derived[25] V-region germlines from 15 species: human, mouse, alpaca, rhesus, rabbit, rat, pig, cow, macaque, zebrafish, trout, salmon, dog, horse and chicken. We noted the number of patent sequences that aligned to the given species germline (#Unique Sequences) and the number of patent families (#Patent Families) these originated from
| Chain | Organism (per sequence) | #Unique Sequences | Percentage | Organism (per family) | #Patent Families | Percentage |
|---|---|---|---|---|---|---|
| Heavy | Human | 67754 | 72.80 | Human | 7070 | 75.69 |
| Heavy | Mouse | 14326 | 15.39 | Mouse | 4874 | 52.18 |
| Heavy | Alpaca | 7047 | 7.57 | Macaque | 485 | 5.19 |
| Heavy | Rabbit | 1313 | 1.41 | Horse | 473 | 5.06 |
| Heavy | Macaque | 1035 | 1.11 | Alpaca | 403 | 4.31 |
| Heavy | Horse | 799 | 0.85 | Rabbit | 256 | 2.74 |
| Heavy | Chicken | 417 | 0.44 | Chicken | 46 | 0.49 |
| Heavy | Dog | 291 | 0.31 | Dog | 28 | 0.29 |
| Heavy | Rhesus | 43 | 0.04 | Rhesus | 23 | 0.24 |
| Heavy | Cow | 30 | 0.03 | Cow | 11 | 0.11 |
| Heavy | Pig | 9 | ~0 | Pig | 9 | 0.09 |
| Heavy | Rat | 2 | ~0 | Rat | 3 | 0.03 |
| Heavy | Salmon | 1 | ~0 | Salmon | 2 | 0.02 |
| Light | Human | 45828 | 67.72 | Human | 6312 | 69.76 |
| Light | Mouse | 13320 | 19.68 | Mouse | 4880 | 53.93 |
| Light | Rhesus | 5333 | 7.88 | Rhesus | 2238 | 24.73 |
| Light | Rabbit | 1438 | 2.12 | Rat | 361 | 3.98 |
| Light | Rat | 778 | 1.14 | Rabbit | 240 | 2.65 |
| Light | Chicken | 505 | 0.74 | Chicken | 46 | 0.5 |
| Light | Dog | 220 | 0.32 | Cow | 31 | 0.34 |
| Light | Cow | 213 | 0.31 | Dog | 21 | 0.23 |
| Light | Pig | 17 | 0.02 | Pig | 7 | 0.07 |
| Light | Horse | 15 | 0.02 | Horse | 7 | 0.07 |
Top 20 most common human V-region genes to which antibodies from patents aligned. For each patent antibody sequence for medicinal applications (MedPatAb) that aligned to human germline V-regions, we noted the IMGT V-region gene. We show the number of unique sequences that aligned to a given human V-region gene (Per Sequence) and the number of patent families these originated from (Per Family). We also show the number of therapeutic antibody sequences in clinical use that align to the given V-region gene (Per Therapeutic)
| Chain | Gene (per sequence) | #Sequences | Percentage | Gene (per family) | #Families | Percentage | Gene (per therapeutic) | #Sequences | Percentage |
|---|---|---|---|---|---|---|---|---|---|
| Heavy | IGHV3-23 | 17140 | 25.29 | IGHV3-23 | 2572 | 15.56 | IGHV3-23 | 77 | 16.38 |
| Heavy | IGHV1-2 | 6206 | 9.15 | IGHV1-69 | 1369 | 8.28 | IGHV1-69 | 39 | 8.29 |
| Heavy | IGHV1-69 | 5334 | 7.87 | IGHV3-30 | 1311 | 7.93 | IGHV1-46 | 38 | 8.08 |
| Heavy | IGHV3-30 | 4501 | 6.64 | IGHV1-46 | 1136 | 6.87 | IGHV3-33 | 26 | 5.53 |
| Heavy | IGHV1-46 | 3840 | 5.66 | IGHV1-2 | 1076 | 6.51 | IGHV3-48 | 21 | 4.46 |
| Heavy | IGHV3-33 | 2508 | 3.7 | IGHV3-33 | 959 | 5.8 | IGHV3-30 | 21 | 4.46 |
| Heavy | IGHV1-18 | 2445 | 3.6 | IGHV3-66 | 945 | 5.71 | IGHV1-2 | 21 | 4.46 |
| Heavy | IGHV1-3 | 1774 | 2.61 | IGHV1-18 | 801 | 4.84 | IGHV1-18 | 19 | 4.04 |
| Heavy | IGHV3-66 | 1725 | 2.54 | IGHV1-3 | 770 | 4.65 | IGHV3-66 | 18 | 3.82 |
| Heavy | IGHV5-51 | 1590 | 2.34 | IGHV4-59 | 762 | 4.61 | IGHV1-3 | 18 | 3.82 |
| Heavy | IGHV4-59 | 1553 | 2.29 | IGHV3-7 | 696 | 4.21 | IGHV3-7 | 14 | 2.97 |
| Heavy | IGHV3-48 | 1356 | 2 | IGHV3-48 | 679 | 4.1 | IGHV5-51 | 13 | 2.76 |
| Heavy | IGHV4-4 | 1260 | 1.85 | IGHV5-51 | 640 | 3.87 | IGHV3-74 | 13 | 2.76 |
| Heavy | IGHV7-4-1 | 1185 | 1.74 | IGHV3-9 | 548 | 3.31 | IGHV4-59 | 12 | 2.55 |
| Heavy | IGHV3-7 | 1136 | 1.67 | IGHV3-21 | 519 | 3.14 | IGHV7-4-1 | 10 | 2.12 |
| Heavy | IGHV3-21 | 1104 | 1.62 | IGHV4-4 | 499 | 3.01 | IGHV3-9 | 10 | 2.12 |
| Heavy | IGHV3-9 | 1060 | 1.56 | IGHV4-34 | 423 | 2.55 | IGHV4-4 | 9 | 1.91 |
| Heavy | IGHV3-15 | 1011 | 1.49 | IGHV3-74 | 397 | 2.4 | IGHV4-39 | 8 | 1.7 |
| Heavy | IGHV3-11 | 917 | 1.35 | IGHV3-11 | 392 | 2.37 | IGHV4-34 | 8 | 1.7 |
| Heavy | IGHV4-31 | 894 | 1.31 | IGHV7-4-1 | 357 | 2.16 | IGHV2-70 | 8 | 1.7 |
| Light | IGKV1-39 | 6709 | 14.63 | IGKV1-39 | 2123 | 12.84 | IGKV1-39 | 70 | 18.42 |
| Light | IGKV3-20 | 3882 | 8.47 | IGKV3-11 | 1504 | 9.1 | IGKV3-11 | 48 | 12.63 |
| Light | IGLV1-51 | 2997 | 6.53 | IGKV3-20 | 1335 | 8.07 | IGKV3-20 | 35 | 9.21 |
| Light | IGKV3-11 | 2753 | 6 | IGKV4-1 | 1069 | 6.46 | IGKV4-1 | 23 | 6.05 |
| Light | IGKV4-1 | 2484 | 5.42 | IGKV1-33 | 789 | 4.77 | IGKV1-16 | 19 | 5 |
| Light | IGKV3-15 | 1811 | 3.95 | IGKV2-28 | 777 | 4.7 | IGKV1-33 | 18 | 4.73 |
| Light | IGKV1-5 | 1722 | 3.75 | IGKV1-16 | 690 | 4.17 | IGKV3-15 | 15 | 3.94 |
| Light | IGKV1-12 | 1627 | 3.55 | IGKV1-5 | 669 | 4.04 | IGKV1-12 | 12 | 3.15 |
| Light | IGKV1-33 | 1532 | 3.34 | IGKV1-12 | 669 | 4.04 | IGKV1-5 | 11 | 2.89 |
| Light | IGLV3-19 | 1479 | 3.22 | IGKV3-15 | 633 | 3.83 | IGLV1-40 | 10 | 2.63 |
| Light | IGKV2-28 | 1427 | 3.11 | IGLV2-14 | 561 | 3.39 | IGKV2-30 | 9 | 2.36 |
| Light | IGLV1-47 | 1377 | 3 | IGKV1-27 | 558 | 3.37 | IGKV2-29 | 9 | 2.36 |
| Light | IGLV1-44 | 1367 | 2.98 | IGLV3-1 | 454 | 2.74 | IGKV1-13 | 9 | 2.36 |
| Light | IGLV2-14 | 1310 | 2.85 | IGLV1-44 | 427 | 2.58 | IGLV3-21 | 8 | 2.1 |
| Light | IGLV3-1 | 1264 | 2.75 | IGKV2-30 | 418 | 2.52 | IGKV2-28 | 8 | 2.1 |
| Light | IGKV1-17 | 1123 | 2.45 | IGLV3-21 | 412 | 2.49 | IGKV1-27 | 7 | 1.84 |
| Light | IGLV1-40 | 1089 | 2.37 | IGLV1-47 | 409 | 2.47 | IGKV1-17 | 7 | 1.84 |
| Light | IGKV1-16 | 1076 | 2.34 | IGLV1-40 | 407 | 2.46 | IGLV1-47 | 6 | 1.57 |
| Light | IGLV3-21 | 1007 | 2.19 | IGLV3-19 | 371 | 2.24 | IGKV1-NL1 | 6 | 1.57 |
| Light | IGKV2-30 | 812 | 1.77 | IGKV1-17 | 370 | 2.23 | IGLV3-19 | 5 | 1.31 |
Figure 3. Closest matches of antibody sequences from patents to therapeutic antibodies. For each sequence in AllPatAb and AllPatMed we noted the closest IMGT sequence identity to a therapeutic antibody. (a) Distribution of heavy chain sequence identities to closest therapeutic heavy chain. (b) Distribution of light chain sequence identities to closest therapeutic light chain
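The "closest IMGT sequence identity" in the Figure 3 caption can be illustrated with a small sketch. The exact identity definition is not given in this excerpt, so the one used here is an assumption: sequences are represented as dictionaries from IMGT position labels to residues, and identity is the fraction of positions (taken over the union of both sequences) that carry the same residue. The function names are hypothetical.

```python
def imgt_identity(a, b):
    """Fraction of IMGT positions with identical residues.

    `a` and `b` map IMGT position labels to one-letter residues.
    A position present in only one sequence counts as a mismatch
    (an assumption; other conventions exist).
    """
    positions = set(a) | set(b)
    if not positions:
        return 0.0
    matches = sum(1 for p in positions if a.get(p) == b.get(p))
    return matches / len(positions)


def closest_therapeutic(query, therapeutics):
    """Name and identity of the therapeutic chain most similar to `query`."""
    return max(
        ((name, imgt_identity(query, seq)) for name, seq in therapeutics.items()),
        key=lambda pair: pair[1],
    )


# Toy four- and five-position "chains", invented for illustration.
query = {"1": "E", "2": "V", "3": "Q", "4": "L"}
therapeutics = {
    "t1": {"1": "E", "2": "V", "3": "K", "4": "L"},
    "t2": {"1": "Q", "2": "V", "3": "Q", "4": "L", "5": "V"},
}
best = closest_therapeutic(query, therapeutics)
```

Comparing residues position by position only makes sense once both chains are numbered in the same scheme, which is why the sketch takes already-numbered input rather than raw strings.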
Figure 4. Distribution of CDR-H3 lengths. We plotted the distribution of CDR-H3 lengths from therapeutic antibodies (Antibody Therapeutics) and all antibodies from patents in PAD (Patents, AllPatAb)
Figure 5. Patents including single domain antibody sequences over time. For each of the 586 patent families in PAD identified as having sdAbs and classified as containing antibodies for medicinal purposes, we noted the earliest and most recent dates, given as red and blue bars, respectively
Volume of patent families corresponding to different engineering properties of antibodies. Each engineering property of antibodies is associated with CPC patent classes and keywords
| Property | #Patent families (%) | Keywords | Classes |
|---|---|---|---|
| HUMANIZATION | 28% | Humaniz* | A01K2207/15: Humanized animals |
| | | | C07K2317/24: Containing regions, domains or residues from different species, e.g. chimeric, humanized or veneered |
| GENERAL | 24% | n/a | C07K2317/35: Valency |
| | | | C07K2317/94: Stability, e.g. half-life, pH, temperature or enzyme-resistance |
| | | | C07K2317/41: Glycosylation, sialylation, or fucosylation |
| | | | C07K2317/732: Antibody-dependent cellular cytotoxicity [ADCC] |
| | | | C07K2317/734: Complement-dependent cytotoxicity [CDC] |
| | | | C07K2317/33: Crossreactivity, e.g. for species or epitope, or lack of said crossreactivity |
| SCFV | 17% | scfv, single chain variable | C07K2317/622: Single chain antibody (scFv) |
| FC-ENGINEERING | 15% | Fc, Fragment crystallizable | C07K2317/72: Increased effector function due to an Fc-modification |
| | | | C07K2317/52: Constant Fc region, isotype |
| | | | C07K2319/30: Non-immunoglobulin-derived peptide or protein having an immunoglobulin constant or Fc region, or a fragment thereof, attached thereto |
| | | | C07K2317/526: CH3 domain |
| | | | C07K2317/64: Comprising a combination of variable region and constant region components |
| | | | C07K2317/53: Hinge |
| MULTISPECIFICS | 13% | Bispecific, trispecific | C07K16/468: Immunoglobulins having two or more different antigen binding sites, e.g. multifunctional antibodies |
| | | | C07K2317/31: Multispecific |
| FRAGMENTS | 12% | Fab,Fab( | C07K2317/55: Fab or Fab’ |
| | | | C07K2317/54: F(ab)2 |
| FUSION POLYPEPTIDE | 10% | Fusion polypeptide | C07K2319/00: Fusion polypeptide |
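The association of engineering properties with keywords and CPC classes can be sketched as a simple rule lookup. This is an illustration under stated assumptions, not the authors' pipeline: it covers only three of the properties above, treats `Humaniz*` as a prefix wildcard, matches keywords case-insensitively on word boundaries, and assigns a property when either a keyword or a class matches. The source does not say how keyword and class evidence were combined, and the names `RULES` and `engineering_properties` are hypothetical.

```python
import re

# Subset of the table above; keyword entries are regular expressions.
RULES = {
    "HUMANIZATION": {
        "keywords": [r"humaniz\w*"],
        "classes": {"A01K2207/15", "C07K2317/24"},
    },
    "SCFV": {
        "keywords": [r"scfv", r"single chain variable"],
        "classes": {"C07K2317/622"},
    },
    "MULTISPECIFICS": {
        "keywords": [r"bispecific", r"trispecific"],
        "classes": {"C07K16/468", "C07K2317/31"},
    },
}


def engineering_properties(text, cpc_codes):
    """Properties whose keywords occur in `text` or whose classes are in `cpc_codes`."""
    codes = set(cpc_codes)
    found = set()
    for name, rule in RULES.items():
        keyword_hit = any(
            re.search(rf"\b{pattern}\b", text, re.IGNORECASE)
            for pattern in rule["keywords"]
        )
        class_hit = bool(codes & rule["classes"])
        if keyword_hit or class_hit:
            found.add(name)
    return found
```

For example, `engineering_properties("An scFv fragment", ["C07K2317/31"])` returns both `SCFV` (keyword match) and `MULTISPECIFICS` (class match). Because one family can match several rules, the percentages in the table need not sum to 100%.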