Tianyi Qiu, Jingxuan Qiu, Yiyan Yang, Lu Zhang, Tiantian Mao, Xiaoyan Zhang, Jianqing Xu, Zhiwei Cao.
Abstract
Antigenicity measurement plays a fundamental role in vaccine design, which requires antigen selection from a large number of mutants. To augment traditional cross-reactivity experiments, computational approaches for predicting the antigenic distance between multiple protein antigens are highly valuable. The performance of in silico models relies heavily on large-scale benchmark datasets, which are scattered among public databases and published articles or reports. Here, we present the first benchmark dataset of protein antigens with experimental evidence to guide in silico antigenicity calculations. This dataset includes (1) standard haemagglutination-inhibition (HI) tests for 3,867 influenza A/H3N2 strain pairs, (2) standard HI tests for 559 influenza virus B strain pairs, and (3) neutralization titres derived from 1,073 Dengue virus strain pairs. All of these datasets were collated and annotated with experimentally validated antigenicity relationships as well as sequence information for the corresponding protein antigens. We anticipate that this work will provide a benchmark dataset for in silico antigenicity prediction that could be further used to assist in epidemic surveillance and therapeutic vaccine design for viruses with variable antigenicity.
Year: 2020 PMID: 32632108 PMCID: PMC7338539 DOI: 10.1038/s41597-020-0555-y
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Illustration of benchmark data collection. (a) Benchmark data for influenza virus. The HI-test data for both IAV A/H3N2 and IBV were collected from reports of international organizations and published articles with pre-processed antigenic distances. The sequence data of HA proteins were collected from multiple virus databases. (b) Benchmark data of DENV. Antisera data were collected from African green monkeys, and envelope protein sequences were collected from NCBI virus databases.
Fig. 2 Data records of the benchmark dataset. (a) Data records of the HI values and haemagglutinin sequences of IAV A/H3N2. (b) Data records of the HI values and haemagglutinin sequences of IBV. (c) Data records of the neutralization titre values and E protein sequences of DENV.
Fig. 3 Antigenic clustering over the past four decades (1968–2014). The X-axis illustrates different years, while the Y-axis illustrates the predicted antigenic distance. Each spot represents the dominant strain of the circulating year, whose size is proportional to the logarithm of the strain numbers in that year. Strains with similar antigenicity are grouped into one antigenic cluster and named according to the first dominant strain in the first year of the cluster. Within each cluster, the antigenic distance was calculated between the dominant strain of each year and the representative strain of the cluster, whereas the antigenic distance between the two neighbouring clusters was calculated based on the representative strain.
Fig. 4 Vaccine coverage in the Northern Hemisphere from 2001 to 2017. The X-axis represents years from 2001 to 2017, and the Y-axis represents the antigenic coverage of vaccine strains in each year. Each line refers to a vaccine strain, from the year before it was proposed as the vaccine strain to the year after it was replaced by an updated vaccine strain. Stars indicate the years in which each vaccine strain was recommended.
| Measurement(s) | antigenic distance • antigen |
| Technology Type(s) | antiserum titration value • titration |
| Factor Type(s) | strain |
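
The record above refers to antigenic distances pre-processed from HI titres but does not state the formula used. As an illustration only, the sketch below uses one widely cited definition, the Archetti-Horsfall distance, which combines the two homologous titres and the two cross-reactive titres of a strain pair. The function name, argument names, and the titre values are hypothetical and are not taken from the dataset.

```python
import math


def archetti_horsfall_distance(h_ii, h_jj, h_ij, h_ji):
    """Antigenic distance between strains i and j from four HI titres.

    Assumed convention (not stated in the source record):
      h_ii, h_jj : homologous titres (each strain against its own antiserum)
      h_ij, h_ji : heterologous titres (each strain against the other's antiserum)

    d_ij = sqrt((h_ii * h_jj) / (h_ij * h_ji))
    """
    titres = (h_ii, h_jj, h_ij, h_ji)
    if any(t <= 0 for t in titres):
        raise ValueError("HI titres must be positive")
    return math.sqrt((h_ii * h_jj) / (h_ij * h_ji))


# Hypothetical titres for one strain pair (illustrative values only).
distance = archetti_horsfall_distance(h_ii=1280, h_jj=640, h_ij=160, h_ji=320)
print(distance)
```

Under this definition, a pair whose cross-reactive titres equal its homologous titres has a distance of 1, and larger values indicate weaker cross-reactivity. Which threshold separates antigenically similar from distinct strains is a modelling choice that this record does not specify.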