Drug Design Group

EpiDOCK: Molecular docking-based tool for MHC class II binding prediction

A T-cell epitope is a continuous sequence of variable length. It is a fragment of a larger foreign protein that has been digested inside the cell. To be recognized by the components of the immune system, the T-cell epitope must be presented on the surface of the antigen-presenting cell by a specialized family of proteins named the Major Histocompatibility Complex (MHC). The main prerequisite for a peptide to act as a T-cell epitope is that it binds to an MHC protein. Stable binding of the peptide to the MHC molecule is thought to be the major bottleneck in the complicated pathway of antigen presentation.

Human MHCs, named HLA (Human Leukocyte Antigens), are extremely polymorphic and polygenic. For HLA class II alone, the IMGT/HLA database lists more than 1,600 proteins. Because of the large number of MHC proteins and antigenic peptides, experimental identification of binders is prohibitively time-consuming and expensive. The only tractable alternative is MHC binding prediction using computational methods. A sufficiently accurate algorithm capable of predicting the best-binding peptides for the MHC molecules is therefore much needed. Depending on the available input data, the techniques range from plain sequence-based approaches to more detailed structure-based models.

HLA class II comprises three loci, DR, DQ and DP, which contribute to different extents to the immune response. Although a vast amount of binding affinity data is available for some alleles, other alleles remain poorly studied, and this significantly affects the predictive quality of some of the available models. The method implemented in EpiDOCK is structure-based and combines homology modeling and molecular docking to derive quantitative models for MHC class II binding prediction [Patronov et al., 2011]. Such an approach is less dependent on experimentally determined binding affinities and relies essentially on the availability of X-ray structures. Additionally, models of homologous proteins can be built successfully when no X-ray data exist.

Our website provides access to a collection of accurate MHC class II binding prediction models derived via molecular docking. The models are represented by docking score-based quantitative matrices (DS-QM) and were tested on external test sets of known binders.

EpiDOCK takes as input the target protein sequence in single-letter code. It converts the input sequence into a collection of overlapping nonamers. Every nonamer is assessed against the models for each allele and assigned a score. Docking score-based models for 23 alleles are currently included.
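
The scanning procedure described above can be illustrated with a minimal Python sketch. It is not the EpiDOCK implementation. It assumes that a quantitative matrix assigns one contribution per residue and peptide position, and that a nonamer's score is the sum of those contributions, which is how such matrices are commonly applied. The function names, the allele label `example_allele` and the matrix values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of a quantitative matrix: one dict per
# peptide position (9 in total), mapping a residue to its contribution.
QuantMatrix = List[Dict[str, float]]


def overlapping_nonamers(sequence: str, length: int = 9) -> List[Tuple[int, str]]:
    """Return (1-based start position, peptide) for every overlapping window."""
    seq = "".join(sequence.split()).upper()  # drop whitespace and line breaks
    return [(i + 1, seq[i:i + length]) for i in range(len(seq) - length + 1)]


def score_nonamer(peptide: str, matrix: QuantMatrix) -> float:
    """Assumed additive scoring: sum the per-position residue contributions.

    Residues missing from the matrix contribute 0.0 in this sketch.
    """
    return sum(matrix[pos].get(res, 0.0) for pos, res in enumerate(peptide))


def scan_protein(
    sequence: str, models: Dict[str, QuantMatrix]
) -> List[Tuple[str, int, str, float]]:
    """Score every nonamer of the sequence against every allele model."""
    results = []
    for start, peptide in overlapping_nonamers(sequence):
        for allele, matrix in models.items():
            results.append((allele, start, peptide, score_nonamer(peptide, matrix)))
    return results


# Toy model with made-up values; real DS-QM values are not reproduced here.
toy_matrix: QuantMatrix = [{} for _ in range(9)]
toy_matrix[0] = {"A": 1.0}  # made-up contribution at peptide position 1
toy_matrix[1] = {"F": 0.5}  # made-up contribution at peptide position 2
models = {"example_allele": toy_matrix}

for allele, start, peptide, score in scan_protein("ACDEFGHIKLMN", models):
    print(allele, start, peptide, score)
```

A protein of N residues yields N - 8 overlapping nonamers, so the 12-residue toy sequence produces four peptides, each scored once per allele model. How scores are interpreted, for example which direction or threshold indicates a binder, depends on the actual models and is not assumed here.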