
[Methods in Molecular Biology] In Silico Models for Drug Discovery Volume 993 || Databases and In Silico Tools for Vaccine Design



Sandhya Kortagere (ed.), In Silico Models for Drug Discovery, Methods in Molecular Biology, vol. 993, DOI 10.1007/978-1-62703-342-8_8, © Springer Science+Business Media, LLC 2013

Chapter 8

Databases and In Silico Tools for Vaccine Design

Yongqun He and Zuoshuang Xiang

Abstract

In vaccine design, databases and in silico tools play different but complementary roles. Databases collect experimentally verified vaccines and vaccine components, and in silico tools provide computational methods to predict and design new vaccines and vaccine components. Vaccine-related databases include databases of vaccines and vaccine components. In the USA, the Food and Drug Administration (FDA) maintains a database of licensed human vaccines, and the US Department of Agriculture keeps a database of licensed animal vaccines. Databases of vaccine clinical trials and vaccines in research also exist. The important vaccine components include vaccine antigens, vaccine adjuvants, vaccine vectors, and vaccine preservatives. The vaccine antigens can be whole proteins or immune epitopes. Various in silico vaccine design tools are also available. The Vaccine Investigation and Online Information Network (VIOLIN; http://www.violinet.org) is a comprehensive vaccine database and analysis system. The VIOLIN database includes various types of vaccines and vaccine components. VIOLIN also includes Vaxign, a Web-based in silico vaccine design program based on the reverse vaccinology strategy. Vaccine information and resources can be integrated with Vaccine Ontology (VO). This chapter introduces databases and in silico tools that facilitate vaccine design, especially those in the VIOLIN system.

Key words: Vaccine, Vaccine design, Vaccine database, VIOLIN vaccine database and analysis system, Vaxign, Reverse vaccinology, Immune epitope, Immunoinformatics, Vaccine adjuvant

1 Introduction

Vaccine research entered a new era with the advent of computer-assisted informatics and high-throughput technologies. Rational vaccine design is more feasible now than it was many years ago, and it has benefited from two areas of progress: vaccine databases and in silico vaccine design tools.

Vaccine-related databases have emerged as a way to digest the ever-increasing volume of data associated with vaccines and vaccination. Advanced DNA sequencing and molecular, cellular, and immunological methods have provided a huge amount of vaccine-related data. For example, high-throughput “-omics” technologies (genomics, transcriptomics, proteomics, and metabolomics) have led to the generation of DNA, RNA, protein, and metabolite data in different cells under different experimental conditions. Advanced bioinformatics tools have been developed to permit efficient use of these data. As a result, the publication of vaccine-associated papers has increased exponentially. In the first half of the twentieth century, 1,210 vaccine-related papers were published, according to PubMed. Between 2002 and 2011, however, more than 100,000 vaccine-related papers were published. Databases have been developed to store, reorganize, and categorize these data. Advanced literature-mining approaches are used to retrieve and discover new knowledge, which in turn supports more efficient accumulation of vaccine data in databases.

Various in silico vaccine design tools have been developed and are available online. For example, many immunoinformatics tools have been developed since the 1980s to predict T- and B-cell immune epitopes (1). Many predicted T- and B-cell immune epitopes are possible epitope vaccine targets. The availability of hundreds of pathogen genomes allows us to implement genome-wide scanning for the most promising vaccine targets. The analysis of high-throughput omics data enables the testing and screening of millions of possible vaccine targets in real time. The vaccine development strategy that starts with such scanning has been named reverse vaccinology (2). Mathematical modeling also supports vaccine design (3).

Databases and in silico tools play different but complementary roles in vaccine design. Databases collect experimentally verified vaccines and vaccine components. These classified, well-curated data provide the standards as well as the training and testing data for future in silico prediction programs. In contrast, in silico tools provide computational methods to predict and design new vaccines and vaccine components. Predictions can be verified experimentally, and the verified results can in turn be stored in databases. The Vaccine Investigation and Online Information Network (VIOLIN; http://www.violinet.org) is a comprehensive Web-based vaccine database and analysis system intended primarily for vaccine researchers (4). VIOLIN contains databases of vaccines and vaccine components. In addition, VIOLIN includes Vaxign, a Web-based, genome-based vaccine design program based on reverse vaccinology and immunoinformatics strategies (5). This chapter summarizes databases and in silico tools that can be used for vaccine design, focusing on VIOLIN components.

2 Databases of Vaccines and Vaccine Components

Vaccine-related databases can be split into two groups: whole vaccine databases and databases of vaccine components. This section introduces these two groups of databases individually.

2.1 Databases of Vaccines

A vaccine database is a compilation of information about a list of vaccines, in the form of a relational database (Table 1) or a collection of HTML and PDF files. Vaccine databases generated by government agencies or international organizations often focus on licensed vaccines and are presented to the public in the form of HTML and PDF files. For example, information on the vaccines licensed in the US market is provided by the US Food and Drug Administration (FDA) (http://www.fda.gov/cber/vaccines.htm). The US Centers for Disease Control and Prevention (CDC) Vaccine Information Statements (VISs) system provides information sheets that explain to vaccine recipients, their relatives, and legal representatives both the benefits and risks of a vaccine (http://www.cdc.gov/vaccines/pubs/vis/). Federal law in the USA requires that VISs be handed out for all vaccines before they are administered. The PATH Vaccine Resource Library (VRL; http://www.path.org/vaccineresources/) offers high-quality, scientifically accurate documents and links to specific diseases and topics in immunization. The Global Alliance for Vaccines and Immunization (GAVI) is an effort to support global collaboration for vaccines and immunization with a goal of saving children’s lives (http://www.gavialliance.org).

With its major focus on vaccine research, VIOLIN stores and analyzes research data concerning commercial vaccines and vaccines that are being assessed in clinical trials or are in early stages of development (4). Constructed using the MySQL database management system, the VIOLIN vaccine database currently contains more than 3,000 vaccines or vaccine candidates for more than 170 pathogens, manually curated from more than 1,600 peer-reviewed papers and government Web sites and databases. For each vaccine collected in the VIOLIN database, data include vaccine preparation, pathogen genes used and gene engineering, vaccine adjuvants and vectors, vaccine-induced host immune responses (including vaccine-related adverse events), and vaccine efficacy as a result of controlled experiments or clinical trials.

Table 1 Databases of vaccines and vaccine components

| Resource name | Website URL | Comment |
|---|---|---|
| Vaccines | | |
| US CDC-listed vaccines | www.cdc.gov/vaccines/vpd-vac/vaccines-list.htm | Vaccines used in USA |
| US FDA-licensed vaccines | www.fda.gov/cber/vaccines.htm | Licensed vaccines used in USA |
| Public Health Agency of Canada licensed vaccines | www.phac-aspc.gc.ca/dpg-eng.php#vaccines | Licensed vaccines used in Canada |
| UK Department of Health | www.dh.gov.uk/en/Publichealth/Immunisation | Official UK vaccination site |
| PATH Vaccine Resource Library | www.path.org/vaccineresources | Collection of vaccine resources |
| GAVI Alliance | www.gavialliance.org | Goal: save children’s lives |
| VIOLIN | www.violinet.org | Comprehensive vaccine data |
| Huvax | www.violinet.org/huvax | Licensed human vaccines |
| Vevax | www.violinet.org/vevax | Licensed veterinary vaccines |
| Vaccine antigens | | |
| AntigenDB | www.imtech.res.in/raghava/antigendb | A database of antigens |
| Protegen | www.violinet.org/protegen | Protective protein antigens |
| IEDB (Immune Epitope Database and Analysis Resource) | www.immuneepitope.org | Immune epitope database |
| “Virmugens” for generating live attenuated vaccines | | |
| VirmugenDB | www.violinet.org/virmugendb | See text for details (a) |
| Vaccine adjuvants | | |
| Vaxjo | www.violinet.org/vaxjo | Adjuvants and related vaccines |
| Vaccine vectors | | |
| Vaxvec | www.violinet.org/vaxvec | Vaccine vectors database |
| Vaccine-induced animal responses | | |
| Vaxar | www.violinet.org/vaxar | Vaccine-induced animal responses |
| Host–vaccine interactions | | |
| Vaxism | www.violinet.org/vaxism | Protective immunity |
| Vaccine manufacturers | | |
| Vaxcom | www.violinet.org/vaxcom | Commercial vaccine companies |
| Vaccine experts | | |
| Vaxperts | www.violinet.org/vaxperts | Vaccine experts with publications |

Abbreviations: CDC, US Centers for Disease Control and Prevention; FDA, US Food and Drug Administration.
(a) Virmugens are pathogen genes that encode virulence factors; their mutation in virulent pathogens allows for the generation of effective live attenuated vaccines.


Because licensed vaccines are the focus of much attention among the public as well as in the vaccine research community, VIOLIN has two subdatabases that focus on licensed human and animal vaccines. The Huvax subdatabase (http://www.violinet.org/huvax) stores all licensed human vaccines in the USA and Canada. The data for these licensed human vaccines are annotated through manual curation and a rigorous review process. Curated fields include manufacturer, trade name, storage information, age at which the vaccine should be administered, and other relevant information. These vaccines are also listed by the CDC CVX codes (codes that identify the vaccine product administered), which are used for tracking vaccination records, and Huvax provides cross-references based on these codes. Huvax lets users search, compare, and analyze licensed human vaccines: various criteria can be used for querying, and different vaccines can be compared side by side. Vevax is a VIOLIN database of licensed veterinary vaccines (http://www.violinet.org/vevax).

2.2 Databases of Vaccine Components

As a manufactured product, a vaccine is composed of several components. An antigen is a required component of any vaccine. Such a vaccine antigen can be part of a whole organism (e.g., in a live attenuated vaccine), a protective protein antigen, or an immune epitope. A live attenuated vaccine usually does not require a vaccine adjuvant, which functions as an immune stimulant. However, killed whole-organism or subunit vaccines usually need vaccine adjuvants; otherwise, a weak or nonexistent immune response will be induced. Other vaccine components include vaccine vectors and vaccine preservatives. Databases of these vaccine components are briefly introduced here.

2.2.1 Databases of Vaccine Antigens

There are several types of vaccine antigens. The most common antigens used for vaccine development are protein antigens. Immune epitopes may also be manufactured and used for vaccine development. Other molecules can also be used as vaccine antigens (e.g., lipopolysaccharide) (6). The commonly used databases for protein and epitope antigens are introduced here.

AntigenDB (http://www.imtech.res.in/raghava/antigendb/) is an immunoinformatics database of pathogen antigens that stores sequences, structures, origins, and epitopes (7). However, although a protein antigen can induce a strong immune response, the induced immunity may not be protective against virulent pathogen challenge. Protegen (http://www.violinet.org/protegen) is a database of protective antigens in VIOLIN (8). Protective antigens are antigens that can induce protection against virulent pathogen infection. A protective antigen is the core part of a vaccine. Protegen curates more than 600 protective antigens. A customized Protegen BLAST program has been developed for searching similarities between a query sequence and the sequences of all Protegen proteins at the DNA or amino acid level.

At the epitope level, the Immune Epitope Database and Analysis Resource (IEDB) is the largest immune epitope database so far (http://www.immuneepitope.org/). IEDB has stored more than 65,000 antibody and T-cell epitopes since the database was established in 2004 (9). These immune epitopes cover various species (e.g., humans, nonhuman primates, and rodents) that have been well studied in the areas of infectious diseases (9). Because all stored epitopes in IEDB are manually curated from experimental studies, the data in IEDB have often been used as the gold standard for evaluating in silico epitope prediction tools (10).

2.2.2 Databases of Vaccine Adjuvants

Vaccine adjuvants enhance host immune responses to coadministered antigens in vaccines. Vaxjo (http://www.violinet.org/vaxjo/) is a Web-based central database and analysis system that curates, stores, and analyzes vaccine adjuvants and the vaccines that use them. The adjuvant information stored in Vaxjo includes adjuvant names, components, structure, appearance, storage, preparation, function, safety, and vaccines that use each adjuvant. All data are manually curated and are cited from reliable references including peer-reviewed publications, company Web sites, and government agencies. Bioinformatics scripts have been developed to link vaccine adjuvants to adjuvanted vaccines stored in the general VIOLIN vaccine database. Presently, more than 100 vaccine adjuvants have been curated in Vaxjo. These adjuvants have been used in more than 380 vaccines stored in VIOLIN against more than 81 pathogens, cancers, or allergies. A user-friendly Web query and visualization interface permits interactive vaccine adjuvant search.

2.2.3 Databases of Other Vaccine Components

The VIOLIN database also includes databases for many other vaccine components. For example, VIOLIN includes a vaccine vector database, Vaxvec (http://www.violinet.org/vaxvec/). A vaccine vector is a weakened or killed version of a virus or bacterium that cannot cause disease in hosts (e.g., humans); it is used in genetically engineered vaccines to transport genes coding for antigens into the body to induce an immune response. VirmugenDB (http://www.violinet.org/virmugendb) is a unique VIOLIN database of pathogen genes that encode virulence factors and whose mutation in virulent pathogens allows for the generation of effective live attenuated vaccines. We coined the term “virmugen” for these genes. Currently VirmugenDB includes more than 200 virmugens.

Vaxar (http://www.violinet.org/vaxar) is a VIOLIN program that focuses on annotation and analysis of vaccine-induced animal responses. Vaxism (http://www.violinet.org/vaxism/) is another program that targets vaccine-related host–pathogen interaction mechanisms (e.g., pathogenesis and protective immunity). VIOLIN also includes databases for vaccine experts, vaccine manufacturers, etc. (Table 1).

3 In Silico Tools for Vaccine Design

Two major in silico tools in vaccine design are based on two related but distinct strategies: immunoinformatics-based immune epitope prediction and reverse vaccinology-based protective protein antigen screening.

3.1 Immune Epitope Prediction Tools

Many immunoinformatics algorithms and resources are available for predicting T- and B-cell immune epitopes. T-cell epitopes are bound in a linear form to major histocompatibility complex (MHC) class I or class II molecules, which are called human leukocyte antigen (HLA) molecules when the host is human. The interface between ligands and T cells can now be modeled with high accuracy. However, more than 90% of B-cell epitopes are nonlinear and require knowledge of the native three-dimensional protein structure. It remains a challenge to predict B-cell immune epitopes in silico.

3.1.1 T-Cell Immune Epitopes and Their In Silico Prediction

T cells are activated by direct interaction with MHC class I or class II molecules on the surface of antigen-presenting cells (APCs). T-cell receptors can interact with peptides derived from endogenous and exogenous proteins bound in the cleft of MHC class I or class II molecules. MHC class I molecules present peptides that are eight to ten amino acids in length, leading to the activation of CD8+ cytotoxic T lymphocytes (CTLs). MHC class I molecules typically present peptides derived from proteolytic digestion of endogenously synthesized proteins. In contrast, peptides presented by MHC class II molecules are longer, more variable in size, and have more complex anchor motifs than peptides presented by MHC class I molecules (11–13). MHC class II molecules bind peptides consisting of 11–25 amino acids and are recognized by CD4+ T-helper cells. MHC class II molecules generally bind peptides derived from the cell membrane or exogenous proteins internalized by APCs.

As described above, peptide epitopes bound to MHC class I or II molecules are able to stimulate CD8+ and CD4+ T cells, respectively. Both CD4+ T-helper cells and CD8+ CTLs are critical to protective immunity and to vaccine efficacy. CD4+ T-helper cells play an important role in the development of memory B-cell (antibody) and memory CTL (cytotoxic T-cell) responses. CD4+ T-helper cells are also active against pathogens on their own. Therefore, CD4+ T-helper cells have been called the “conductors of the immune system orchestra” (14). CD8+ CTLs are able to directly kill infected target cells and thus are critical in the containment of viral and bacterial infection (15). Therefore, the prediction of peptide epitopes bound to MHC class I or II molecules is critical for effective vaccine design.
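As a rough illustration of the peptide length ranges described above, the following Python sketch enumerates candidate peptides from a protein sequence: 8 to 10 residues for MHC class I and 11 to 25 residues for MHC class II. The function name `candidate_peptides` and the example sequence are invented for illustration. Real prediction tools go on to score each candidate for binding, which this sketch does not attempt.

```python
def candidate_peptides(sequence, min_len, max_len):
    """Return every (start, peptide) window whose length lies in [min_len, max_len]."""
    peptides = []
    for length in range(min_len, max_len + 1):
        # range() is empty when the window is longer than the sequence.
        for start in range(len(sequence) - length + 1):
            peptides.append((start, sequence[start:start + length]))
    return peptides


# Hypothetical 30-residue sequence, used only to exercise the function.
protein = "MKTAYIAKQRQISFVKSHFSRQLEERLGLI"

class_i = candidate_peptides(protein, 8, 10)    # MHC class I length range from the text
class_ii = candidate_peptides(protein, 11, 25)  # MHC class II length range from the text
print(len(class_i), "class I candidates;", len(class_ii), "class II candidates")
```

Even for a short protein the class II candidate set is several times larger than the class I set, which hints at why class II prediction has to deal with a much wider search space.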

A deep understanding of the MHC antigen presentation pathways provides additional methods for T-cell immune epitope predictions. For example, the generation of an MHC class I epitope starts with the degradation of endogenous proteins into oligomeric fragments by cytosolic proteases, mainly the proteasome. These oligomeric fragments may escape attack by aminopeptidases by entering the endoplasmic reticulum through the transporter associated with antigen presentation (TAP) (16). Therefore, besides the prediction of MHC class I binding affinity, the predictions of the MHC class I pathway can be improved by predictions of proteasomal cleavage and TAP transport efficiency (17–19).
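One way to picture how the three pathway steps can be combined is a simple weighted sum of per-step scores. The sketch below is an illustration under stated assumptions, not the method of any cited tool: the peptide strings, the score values, the equal default weights, and the names `PeptideScores` and `pathway_score` are all hypothetical, and every score is assumed to be scaled so that higher is better.

```python
from dataclasses import dataclass


@dataclass
class PeptideScores:
    peptide: str
    cleavage: float  # hypothetical proteasomal cleavage score (higher = more likely)
    tap: float       # hypothetical TAP transport score (higher = more efficient)
    binding: float   # hypothetical MHC class I binding score (higher = stronger)


def pathway_score(p, w_cleavage=1.0, w_tap=1.0, w_binding=1.0):
    """Weighted sum of the three step scores; the weights are illustrative only."""
    return w_cleavage * p.cleavage + w_tap * p.tap + w_binding * p.binding


# Made-up candidates: the strongest binder is poorly cleaved and transported.
candidates = [
    PeptideScores("ACDEFGHIK", cleavage=0.2, tap=0.1, binding=0.9),
    PeptideScores("LMNPQRSTV", cleavage=0.8, tap=0.7, binding=0.8),
    PeptideScores("WYACDEFGH", cleavage=0.9, tap=0.9, binding=0.1),
]

ranked = sorted(candidates, key=pathway_score, reverse=True)
best_binder = max(candidates, key=lambda p: p.binding)
print([p.peptide for p in ranked], best_binder.peptide)
```

In this toy data the peptide with the highest binding score falls to the bottom of the combined ranking, which is the kind of reordering that cleavage and transport predictions are meant to contribute.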

Many tools for MHC class I and II epitope prediction have been developed (3). Examples include the SYFPEITHI website (20), BIMAS for predicting HLA binding (21), the proprietary algorithm EpiMatrix (22), Vaxitope in the VIOLIN Vaxign vaccine design system (5), and many tools available through the IEDB system (23). Each of these tools has been described and validated (10). Although no two of these tools yield exactly the same predictions, all can achieve a high degree of accuracy (often in the range of 90–95% positive predictive value for well-studied MHC class I or II alleles) (10, 22).
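Positive predictive value (PPV) is the fraction of predicted binders that are confirmed experimentally, PPV = TP / (TP + FP). The short Python sketch below computes it for one prediction run. The peptide identifiers and the function name `positive_predictive_value` are placeholders chosen for this example and do not come from any cited benchmark.

```python
def positive_predictive_value(predicted, experimental):
    """Fraction of predicted binders confirmed experimentally: TP / (TP + FP)."""
    predicted = set(predicted)
    if not predicted:
        raise ValueError("no peptides were predicted as binders")
    true_positives = len(predicted & set(experimental))
    return true_positives / len(predicted)


# Placeholder identifiers: ten predicted binders, nine of them confirmed.
predicted_binders = [f"pep{i}" for i in range(1, 11)]
confirmed_binders = [f"pep{i}" for i in range(1, 10)] + ["pep11"]

ppv = positive_predictive_value(predicted_binders, confirmed_binders)
print(f"PPV = {ppv:.2f}")
```

Note that PPV says nothing about binders the tool missed (here `pep11`); sensitivity would be needed to capture that.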

In general, good performance has been achieved for MHC class I predictions. However, success is still limited for prediction of epitopes for MHC class II (24, 25). The low prediction accuracy of HLA-II binding peptides results from many factors, including insufficient or low-quality training data; limited stringency of binding due to relative permissiveness of the binding groove of HLA-II molecules for peptide binding; difficulty in identifying 9-mer binding cores within longer peptides; and the lack of consideration of the influence of flanking residues (24).
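The difficulty of locating the 9-mer binding core can be made concrete with a small sketch. The Python function below lists every possible 9-residue core inside a longer peptide together with its N- and C-terminal flanking residues. The 15-residue peptide and the function name `binding_cores` are hypothetical, and choosing among the cores would require a scoring model that is not shown here.

```python
def binding_cores(peptide, core_len=9):
    """List every (n_flank, core, c_flank) split with a core of length core_len."""
    cores = []
    for start in range(len(peptide) - core_len + 1):
        core = peptide[start:start + core_len]
        cores.append((peptide[:start], core, peptide[start + core_len:]))
    return cores


# Hypothetical 15-residue peptide, used only for illustration.
peptide_15mer = "ACDEFGHIKLMNPQR"

for n_flank, core, c_flank in binding_cores(peptide_15mer):
    print(f"{n_flank:>6} [{core}] {c_flank}")
```

A single 15-mer already admits seven alternative cores, each with different flanks, so a class II predictor has to solve an alignment problem that does not arise for fixed-length class I peptides.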

3.1.2 B-Cell Immune Epitopes and Their In Silico Prediction

Antibody response represents the first line of defense against most viral and bacterial pathogens. B-cell epitopes are separated into two groups: linear and discontinuous epitopes. A linear (also called sequential) epitope is a single continuous stretch of amino acids within a protein sequence. A discontinuous (also called conformational) epitope is an epitope whose residues are distantly separated in the sequence but are brought into physical proximity by three-dimensional protein folding. Approximately 90% of B-cell epitopes are nonlinear epitopes, which makes the prediction of B-cell epitopes very difficult.

Owing to the difficulty of evaluating discontinuous epitopes, most experimentally detected epitopes are linear epitopes. Tools for prediction of B-cell epitopes include 3DEX, CEP, and Pepito (26–29). IEDB has collected a list of Web tools for B-cell epitope prediction (http://tools.immuneepitope.org/main/html/bcell_tools.html). In general, these tools have limited predictive power and need to be refined (30, 31). The huge complexity of predicting B-cell epitopes results partly from the inherent flexibility in the complementarity-determining regions of the antibody and partly from posttranslational modifications such as glycosylation. These factors can result in modification of B-cell epitopes.
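The distinction between linear and discontinuous epitopes drawn earlier in this section reduces to a simple test on residue positions: an epitope is linear when its residues form one unbroken run in the sequence. The Python sketch below applies that test. The function name `classify_epitope` and the example positions are made up for illustration, and the test deliberately ignores three-dimensional structure.

```python
def classify_epitope(positions):
    """Return 'linear' if the residue positions form one unbroken run, else 'discontinuous'."""
    ordered = sorted(set(positions))
    if not ordered:
        raise ValueError("an epitope needs at least one residue position")
    # A gap-free run spans exactly as many positions as it has residues.
    contiguous = ordered[-1] - ordered[0] + 1 == len(ordered)
    return "linear" if contiguous else "discontinuous"


# Hypothetical epitopes given as 1-based residue positions.
print(classify_epitope(range(45, 53)))               # residues 45 to 52
print(classify_epitope([12, 13, 14, 88, 90, 131]))   # residues scattered in sequence
```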

To counter the low success rate of B-cell epitope prediction, MHC class II epitopes for CD4+ T cells may be emphasized when antibody response is key in vaccine design. CD4+ T-helper cells are critical to induce the activation of B cells that produce antibodies. B-cell antigens that contain significant MHC class II epitopes may outperform B-cell antigens without cognate help. An identified T-cell epitope may sometimes contain a B-cell epitope. In addition, B-cell epitopes may colocalize near or overlap MHC class II epitopes (32, 33).

3.2 Reverse Vaccinology for Vaccine Prediction

Since the complete genome of Haemophilus influenzae (a pathogenic bacterium) was published (34), thousands of pathogen genomes have been sequenced. Reverse vaccinology is a new strategy for vaccine development that starts with predicting putative vaccine candidates by in silico genomics analysis based on various criteria. Reverse vaccinology was first applied to the development of vaccines against serogroup B Neisseria meningitidis (35). Host protection against N. meningitidis infection mainly requires antibody response. The surface proteins and outer membrane proteins of N. meningitidis are the most likely antigens to stimulate antibody responses and interact with the antibodies. Therefore, the prediction of bacterial surface or secreted proteins became the main criterion in the first application of reverse vaccinology to N. meningitidis vaccine targets. After candidate proteins are predicted, a high-throughput experimental screening approach can be used to evaluate the vaccine protection efficacy (35).

Since the first report of the application of reverse vaccinology, in 2000, many more criteria have been developed. For example, based on the observation that outer membrane proteins containing more than one transmembrane helix are often difficult to clone and purify (35), the number of transmembrane domains of a protein is often chosen as another filtering criterion. Bacterial adhesins are proteins responsible for adherence, colonization, and invasion of microbial pathogens to host cells (36). Therefore, whether or not a candidate is an adhesin is now considered an important criterion. Because multiple genomes have often been sequenced for a given pathogen, it is possible to select conserved proteins that can be used to induce protection against multiple pathogenic strains. It is also important to screen for sequence similarities between pathogen proteins (vaccine candidates) and the host (e.g., human) proteome. If a pathogen protein is homologous to a host protein, when administered in vivo, this pathogen protein is likely to induce one of two extreme reactions: an autoimmune disease or immune tolerance. Furthermore, immune epitope prediction based on amino acid sequences is also considered as a criterion for the application of reverse vaccinology in rational vaccine design (5).

Vaxign is the first Web-based, genome-wide vaccine target prediction program that uses a reverse vaccinology strategy and immunoinformatics-based prediction of MHC class I and II epitopes (http://www.violinet.org/vaxign) (5). Overall, the predicted features of Vaxign include:

● Protein subcellular location.
● Transmembrane helices.
● Adhesin probability.
● Conservation to human or mouse proteins.
● Sequence exclusion from genome(s) of nonpathogenic strain(s).
● Epitope binding to MHC class I and class II.

Vaxign uses a module-based design, i.e., each feature is implemented using an independent program and a user can choose which features to include in final result generation. To avoid reinventing the wheel, Vaxign uses existing open-source software programs as much as possible.
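The filtering logic behind such a module-based design can be sketched in a few lines of Python. This is a minimal illustration that assumes the per-protein features have already been computed by separate programs; it is not Vaxign's implementation. The record type `ProteinFeatures`, the filter names, the cutoff values, and the example proteins are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProteinFeatures:
    name: str
    location: str                   # predicted subcellular location
    transmembrane_helices: int      # predicted number of transmembrane helices
    adhesin_probability: float      # predicted probability of being an adhesin
    similar_to_host: bool           # similar to a human or mouse protein
    in_nonpathogenic_strain: bool   # also present in a nonpathogenic strain


# One independent predicate per feature; the cutoffs are illustrative assumptions.
FILTERS = {
    "location": lambda p: p.location in {"outer membrane", "extracellular"},
    "helices": lambda p: p.transmembrane_helices <= 1,
    "adhesin": lambda p: p.adhesin_probability >= 0.5,
    "host": lambda p: not p.similar_to_host,
    "exclusion": lambda p: not p.in_nonpathogenic_strain,
}


def select_candidates(proteins, enabled):
    """Keep the proteins that pass every filter whose name is listed in `enabled`."""
    return [p for p in proteins if all(FILTERS[name](p) for name in enabled)]


# Made-up proteins that each fail a different criterion (protA passes all).
proteins = [
    ProteinFeatures("protA", "outer membrane", 0, 0.8, False, False),
    ProteinFeatures("protB", "cytoplasmic", 0, 0.9, False, False),
    ProteinFeatures("protC", "extracellular", 3, 0.7, False, False),
    ProteinFeatures("protD", "outer membrane", 1, 0.6, True, False),
]

all_filters = select_candidates(proteins, list(FILTERS))
location_only = select_candidates(proteins, ["location"])
print([p.name for p in all_filters], [p.name for p in location_only])
```

Keeping each criterion as a separate predicate lets a user switch features on or off without touching the others, which is the property the module-based design is meant to provide.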

Vaxign has been demonstrated to successfully predict vaccine targets for many pathogens, including Brucella spp. ( 37, 38 ) and uropathogenic Escherichia coli ( 5 ) . Currently more than 250 genomes have been precomputed using the Vaxign pipeline and are available for query at the Vaxign Web site. In addition, Vaxign allows a user to provide his or her own input sequences (up to 500 sequences at a time) and performs dynamic vaccine target prediction based on the input sequences.

4 Vaccine Ontology and its Application in Vaccine Design

Vaccine databases provide help with various aspects of vaccine knowledge and research. However, if a database is a collection of HTML or PDF files, such files are difficult to parse and analyze. If a database is a relational database, the database schema is useful for internal data exchange, but external data access and processing are usually not feasible. Therefore, how to integrate the large volume of vaccine information is a challenge. A biomedical ontology is a consensus-based controlled vocabulary of terms and relations, with associated definitions that are logically formulated in such a way as to promote automated reasoning (39). Biomedical ontologies structure complex biomedical knowledge in specific domains in such a way as to permit shared understanding of biomedical concepts and data. The Vaccine Ontology (VO; http://www.violinet.org/vaccineontology) is a biomedical ontology in the domain of vaccines and vaccination (40). The collaborative, community-based VO development aims to promote vaccine data standardization, integration, and computer-assisted reasoning.

The VO database can be used for vaccine data integration. As of January 2013, VO contains more than 5,000 terms, including more than 1,000 vaccines and vaccine candidates that are logically defined and represented. These vaccines or vaccine candidates target 70 pathogens. VO also stores data for vaccine components, including protective antigens, vaccine adjuvants, and vaccine vectors. The other data stored in VO include vaccine-induced immune responses, vaccine adverse events, and protection efficacy. Each VO term is logically linked with one or more terms in the same ontology. Many terms in VO are imported from other ontologies. The information is provided in a computer-understandable language called Web Ontology Language (OWL). Therefore, computer programs can be developed to parse and use the information in VO for appropriate vaccine design. An example of VO-supported automated reasoning is its application to vaccine literature mining. Compared with a direct PubMed search, the application of VO in searching the PubMed literature database for “live attenuated Brucella vaccine” increased the number of hits by more than tenfold with high specificity (41, 42). The application of VO also enhances the discovery of interferon-γ- and vaccine-associated gene networks (43). The role of VO in rational vaccine design lies in its application in computer-assisted automated reasoning.
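A simplified picture of how an ontology hierarchy can broaden a literature query is given in the Python sketch below. It is an assumption-laden illustration and not the method used in the cited studies: the small parent-to-children dictionary is hand-written for this example rather than extracted from VO, the term labels are illustrative, and the functions `descendants` and `expanded_query` are hypothetical. A real application would read the hierarchy from the OWL file.

```python
# Toy hierarchy (parent -> subclasses), hand-written for illustration; not VO content.
SUBCLASSES = {
    "live attenuated Brucella vaccine": [
        "Brucella abortus vaccine RB51",
        "Brucella abortus vaccine strain 19",
        "Brucella melitensis vaccine Rev.1",
    ],
}


def descendants(term, hierarchy):
    """Return the term followed by all of its subclasses (assumes no cycles)."""
    found = [term]
    for child in hierarchy.get(term, []):
        found.extend(descendants(child, hierarchy))
    return found


def expanded_query(term, hierarchy):
    """Join the term and its subclasses as quoted phrases separated by OR."""
    return " OR ".join(f'"{t}"' for t in descendants(term, hierarchy))


query = expanded_query("live attenuated Brucella vaccine", SUBCLASSES)
print(query)
```

A paper that names only a specific vaccine strain would be missed by the literal phrase search but matched by the expanded query, which is one plausible reason an ontology-backed search can return many more relevant hits.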

This chapter introduces the following topics:

1. Commonly used vaccine-related databases.

2. In silico tools for vaccine design.

3. Vaccine Ontology and its application in vaccine design.

Many other methods that are not introduced in this chapter can be used to improve vaccine design. For example, besides genomics-based vaccine target prediction in the scope of reverse vaccinology, high-throughput transcriptomics and proteomics technologies (e.g., microarray) can be used to analyze expression patterns of thousands of genes in a genome. Upregulated genes may encode candidate protein antigens. High-throughput screening of human or animal serum samples for antibodies provides a more direct way to identify antigenic vaccine targets because antibodies are the products of immune responses (44). The availability of three-dimensional structures of proteins may facilitate epitope prediction and antigen discovery (45, 46). Mathematical modeling can also be used to simulate host–pathogen interactions and can contribute substantially to the understanding of fundamental protective immunity. Mathematical modeling approaches have also supported the optimization of vaccination procedures (3).
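The selection of upregulated genes mentioned above can be sketched as a simple fold-change filter. The following Python example is a minimal illustration under stated assumptions: the gene names, the expression values, the twofold threshold (log2 fold change of at least 1), and the pseudocount are all hypothetical, and a real analysis would also require normalization and statistical testing across replicates.

```python
import math


def upregulated_genes(control, treated, min_log2_fc=1.0, pseudocount=1.0):
    """Return (gene, log2 fold change) pairs at or above the threshold.

    control and treated map gene names to expression values. The
    pseudocount avoids division by zero for unexpressed genes. Both the
    threshold and the pseudocount are illustrative assumptions.
    """
    hits = []
    for gene, base in control.items():
        if gene not in treated:
            continue  # compare only genes measured in both conditions
        ratio = (treated[gene] + pseudocount) / (base + pseudocount)
        log2_fc = math.log2(ratio)
        if log2_fc >= min_log2_fc:
            hits.append((gene, log2_fc))
    # Strongest upregulation first
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical expression values for three hypothetical genes
control = {"geneA": 10.0, "geneB": 50.0, "geneC": 7.0}
treated = {"geneA": 87.0, "geneB": 55.0, "geneC": 31.0}

for gene, log2_fc in upregulated_genes(control, treated):
    print(f"{gene}\t{log2_fc:.2f}")
```

Genes that pass such a filter are only candidates; whether their products are protective antigens must still be tested experimentally.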

5 Notes


Despite extensive efforts, many infectious diseases, including AIDS, tuberculosis, and malaria, still lack effective and safe vaccines. Many noninfectious diseases, including cancer, autoimmune diseases, and allergy, also lack safe and effective vaccines. One reason for this lack is that the protective immune mechanisms and immunological correlates of protection have not been precisely defined. These challenges can only be addressed with continuous productive research. Better rational vaccine design methods can only be developed after fundamental protective immunity mechanisms are better understood.

Acknowledgments

This work has been supported by grant R01AI081062 from the US National Institute of Allergy and Infectious Diseases (NIAID).

References

1. De Groot AS, Sbai H, Aubin CS et al (2002) Immuno-informatics: mining genomes for vaccine components. Immunol Cell Biol 80:255–269

2. Rappuoli R (2000) Reverse vaccinology. Curr Opin Microbiol 3:445–450

3. He Y, Rappuoli R, De Groot AS, Chen RT (2010) Emerging vaccine informatics. J Biomed Biotechnol 2010:218590

4. Xiang Z, Todd T, Ku KP et al (2008) VIOLIN: vaccine investigation and online information network. Nucleic Acids Res 36:D923–D928

5. He Y, Xiang Z, Mobley HL (2010) Vaxign: the first web-based vaccine design program for reverse vaccinology and applications for vaccine development. J Biomed Biotechnol 2010:297505

6. Bhattacharjee AK, Izadjoo MJ, Zollinger WD et al (2006) Comparison of protective efficacy of subcutaneous versus intranasal immunization of mice with a Brucella melitensis lipopolysaccharide subunit vaccine. Infect Immun 74:5820–5825

7. Ansari HR, Flower DR, Raghava GP (2010) AntigenDB: an immunoinformatics database of pathogen antigens. Nucleic Acids Res 38:D847–D853

8. Yang B, Sayers S, Xiang Z, He Y (2011) Protegen: a web-based protective antigen database and analysis system. Nucleic Acids Res 39:D1073–D1078 (database issue)

9. Peters B, Sidney J, Bourne P et al (2005) The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol 3:e91

10. Wang P, Sidney J, Dow C et al (2008) A systematic assessment of MHC class II peptide binding predictions and evaluation of a consensus approach. PLoS Comput Biol 4:e1000048

11. Unanue ER (1992) Cellular studies on antigen presentation by class II MHC molecules. Curr Opin Immunol 4:63–69

12. Brown JH, Jardetzky TS, Gorga JC et al (1993) Three-dimensional structure of the human class II histocompatibility antigen HLA-DR1. Nature 364:33–39

13. Chicz RM, Urban RG, Gorga JC et al (1993) Specificity and promiscuity among naturally processed peptides bound to HLA-DR alleles. J Exp Med 178:27–47

14. Ahlers JD, Belyakov IM, Thomas EK, Berzofsky JA (2001) High-affinity T helper epitope induces complementary helper and APC polarization, increased CTL, and protection against viral infection. J Clin Invest 108:1677–1685

15. Plotnicky H, Cyblat-Chanal D, Aubry JP et al (2003) The immunodominant influenza matrix T cell epitope recognized in human induces influenza protection in HLA-A2/K(b) transgenic mice. Virology 309:320–329

16. Bulik S, Peters B, Ebeling C, Holzhutter H (2004) Cytosolic processing of proteasomal cleavage products can enhance the presentation efficiency of MHC-1 epitopes. Genome Inform 15:24–34

17. Peters B, Bulik S, Tampe R et al (2003) Identifying MHC class I epitopes by predicting the TAP transport efficiency of epitope precursors. J Immunol 171:1741–1749

18. Nielsen M, Lundegaard C, Lund O, Kesmir C (2005) The role of the proteasome in generating cytotoxic T-cell epitopes: insights obtained from improved predictions of proteasomal cleavage. Immunogenetics 57:33–41


19. Stranzl T, Larsen MV, Lundegaard C, Nielsen M (2010) NetCTLpan: pan-specific MHC class I pathway epitope predictions. Immunogenetics 62:357–368

20. Rammensee H, Bachmann J, Emmerich NP et al (1999) SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics 50:213–219

21. Parker KC, Bednarek MA, Coligan JE (1994) Scheme for ranking potential HLA-A2 binding peptides based on independent binding of individual peptide side-chains. J Immunol 152:163–175

22. De Groot AS, Martin W (2009) Reducing risk, improving outcomes: bioengineering less immunogenic protein therapeutics. Clin Immunol 131:189–201

23. Zhang Q, Wang P, Kim Y et al (2008) Immune epitope database analysis resource (IEDB-AR). Nucleic Acids Res 36:W513–W518

24. Lin HH, Zhang GL, Tongchusak S et al (2008) Evaluation of MHC-II peptide binding prediction servers: applications for vaccine research. BMC Bioinformatics 9(Suppl 12):S22

25. Gowthaman U, Agrewala JN (2008) In silico tools for predicting peptides binding to HLA-class II molecules: more confusion than conclusion. J Proteome Res 7:154–163

26. Enshell-Seijffers D, Denisov D, Groisman B et al (2003) The mapping and reconstitution of a conformational discontinuous B-cell epitope of HIV-1. J Mol Biol 334:87–101

27. Schreiber A, Humbert M, Benz A, Dietrich U (2005) 3D-Epitope-Explorer (3DEX): localization of conformational epitopes within three-dimensional structures of proteins. J Comput Chem 26:879–887

28. Kulkarni-Kale U, Bhosle S, Kolaskar AS (2005) CEP: a conformational epitope prediction server. Nucleic Acids Res 33:W168–W171

29. Sweredoski MJ, Baldi P (2008) PEPITO: improved discontinuous B-cell epitope prediction using multiple distance thresholds and half sphere exposure. Bioinformatics 24:1459–1460

30. Blythe MJ, Flower DR (2005) Benchmarking B cell epitope prediction: underperformance of existing methods. Protein Sci 14:246–248

31. Reimer U (2009) Prediction of linear B-cell epitopes. Methods Mol Biol 524:335–344

32. Rajnavolgyi E, Nagy N, Thuresson B et al (2000) A repetitive sequence of Epstein-Barr virus nuclear antigen 6 comprises overlapping T cell epitopes which induce HLA-DR-restricted CD4(+) T lymphocytes. Int Immunol 12:281–293

33. Graham CM, Barnett BC, Hartlmayr I et al (1989) The structural requirements for class II (I-Ad)-restricted T cell recognition of influenza hemagglutinin: B cell epitopes define T cell epitopes. Eur J Immunol 19:523–528

34. Fleischmann RD, Adams MD, White O et al (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science 269:496–512

35. Pizza M, Scarlato V, Masignani V et al (2000) Identification of vaccine candidates against serogroup B meningococcus by whole-genome sequencing. Science 287:1816–1820

36. Sachdeva G, Kumar K, Jain P, Ramachandran S (2005) SPAAN: a software program for prediction of adhesins and adhesin-like proteins using neural networks. Bioinformatics 21:483–491

37. He Y, Xiang Z (2010) Bioinformatics analysis of Brucella vaccines and vaccine targets using VIOLIN. Immunome Res 6(Suppl 1):S5

38. Xiang Z, He Y (2009) Vaxign: a web-based vaccine target design program for reverse vaccinology. Procedia Vaccinol 1(1):23–29

39. Xiang Z, Courtot M, Brinkman RR et al (2010) OntoFox: web-based support for ontology reuse. BMC Res Notes 3:175

40. He Y, Cowell L, Diehl AD et al (2009) VO: Vaccine ontology. The 1st international conference on biomedical ontology (ICBO 2009). 24-26 July 2009, Buffalo, NY. Nature Precedings. http://precedings.nature.com/documents/3553/version/1. doi: 10.1038/npre.2009.3553.1. Accessed 23 Apr 2012

41. Xiang Z, He Y (2009) Improvement of PubMed literature searching using biomedical ontology. The 1st international conference on biomedical ontology (ICBO 2009). 24-26 July 2009, Buffalo, NY. Nature Precedings. http://precedings.nature.com/documents/3491/version/1. doi: 10.1038/npre.2009.3491.1. Accessed 23 Apr 2012

42. Hur J, Xiang Z, Feldman EL, He Y (2011) Ontology-based Brucella vaccine literature indexing and systematic analysis of gene-vaccine association network. BMC Immunol 12:49

43. Özgür A, Xiang Z, Radev DR, He Y (2011) Mining of vaccine-associated IFN-γ gene interaction networks using the vaccine ontology. J Biomed Semantics 2(Suppl 2):S8. doi: 10.1186/2041-1480-2-S2-S8

44. Banai M, He Y (2012) Systems biology and bioinformatics help decipher Brucella antigens involved in clinical manifestation of the disease. Front Cell Infect Microbiol 2:34. doi: 10.3389/fcimb.2012.00034

45. Serruto D, Rappuoli R (2006) Post-genomic vaccine development. FEBS Lett 580:2985–2992

46. Mora M, Donati C, Medini D et al (2006) Microbial genomes and vaccine design: refinements to the classical reverse vaccinology approach. Curr Opin Microbiol 9:532–536