Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach

M Nielsen, C Lundegaard, P Worning, CS Hvid, K Lamberth, S Buus, S Brunak, O Lund
Bioinformatics, 2004 - academic.oup.com
Abstract
Motivation: Prediction of which peptides will bind a specific major histocompatibility complex (MHC) constitutes an important step in identifying potential T-cell epitopes suitable as vaccine candidates. MHC class II binding peptides have a broad length distribution, which complicates such predictions. Thus, identifying the correct alignment is a crucial part of identifying the core of an MHC class II binding motif. In this context, we describe a novel Gibbs motif sampler method ideally suited for recognizing such weak sequence motifs. The method is based on the Gibbs sampling method, and it incorporates novel features optimized for the task of recognizing the binding motif of MHC classes I and II. The method locates the binding motif in a set of sequences and characterizes the motif in terms of a weight-matrix. Subsequently, the weight-matrix can be applied to effectively identify potential MHC binding peptides and to guide the process of rational vaccine design.
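To make the idea of locating a motif and summarizing it as a weight-matrix concrete, the following is a minimal sketch of a classic Gibbs site sampler, not the method of the paper. It assumes a fixed core width of 9, a uniform amino acid background, a flat pseudocount, and at least two input peptides no shorter than the core; the novel features mentioned in the abstract are not reproduced. All function names (build_log_odds, score_core, gibbs_sample, best_core) are hypothetical.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_log_odds(cores, pseudocount=1.0):
    """Log-odds weight matrix from aligned cores (assumed uniform background)."""
    background = 1.0 / len(AMINO_ACIDS)
    matrix = []
    for pos in range(len(cores[0])):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for core in cores:
            counts[core[pos]] += 1
        total = sum(counts.values())
        matrix.append(
            {aa: math.log(counts[aa] / total / background) for aa in AMINO_ACIDS}
        )
    return matrix


def score_core(matrix, core):
    """Sum of per-position log-odds for one candidate core."""
    return sum(column[aa] for column, aa in zip(matrix, core))


def gibbs_sample(peptides, width=9, sweeps=200, seed=0):
    """Resample each peptide's core offset given the cores of all other peptides."""
    rng = random.Random(seed)
    starts = [rng.randrange(len(p) - width + 1) for p in peptides]
    for _ in range(sweeps):
        for i, peptide in enumerate(peptides):
            # Build the matrix from every peptide except the one being resampled.
            others = [
                p[s:s + width]
                for j, (p, s) in enumerate(zip(peptides, starts))
                if j != i
            ]
            matrix = build_log_odds(others)
            offsets = list(range(len(peptide) - width + 1))
            # Sample a new offset with probability proportional to exp(score).
            weights = [
                math.exp(score_core(matrix, peptide[o:o + width])) for o in offsets
            ]
            starts[i] = rng.choices(offsets, weights=weights)[0]
    cores = [p[s:s + width] for p, s in zip(peptides, starts)]
    return starts, build_log_odds(cores)


def best_core(matrix, peptide):
    """Highest-scoring window of a peptide as (score, offset)."""
    width = len(matrix)
    return max(
        (score_core(matrix, peptide[o:o + width]), o)
        for o in range(len(peptide) - width + 1)
    )
```

Under these assumptions, a peptide of arbitrary length would be scored by its best 9-mer window via best_core, which is one simple way to handle the broad length distribution of class II binders.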
Results: We apply the motif sampler method to the complex problem of MHC class II binding. The input to the method is amino acid peptide sequences extracted from the public databases of SYFPEITHI and MHCPEP and known to bind to the MHC class II complex HLA-DR4(B1*0401). Prior identification of information-rich (anchor) positions in the binding motif is shown to improve the predictive performance of the Gibbs sampler. Similarly, a consensus solution obtained from an ensemble average over suboptimal solutions is shown to outperform the use of a single optimal solution. In a large-scale benchmark calculation, the performance is quantified using relative operating characteristics curve (ROC) plots and we make a detailed comparison of the performance with that of both the TEPITOPE method and a weight-matrix derived using the conventional alignment algorithm of ClustalW. The calculation demonstrates that the predictive performance of the Gibbs sampler is higher than that of ClustalW and in most cases also higher than that of the TEPITOPE method.
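The two evaluation ideas in the Results, a consensus from several solutions and ROC-based scoring, can be illustrated with a short sketch. It assumes that the consensus is a plain element-wise mean of weight matrices (the abstract does not specify how the ensemble average is formed) and computes the area under the ROC curve as the fraction of binder/non-binder pairs ranked correctly. The function names average_matrices and roc_auc are hypothetical.

```python
def average_matrices(matrices):
    """Element-wise mean of weight matrices with equal width and alphabet.

    Each matrix is a list of per-position dicts mapping amino acid -> weight.
    """
    width = len(matrices[0])
    return [
        {
            aa: sum(m[pos][aa] for m in matrices) / len(matrices)
            for aa in matrices[0][pos]
        }
        for pos in range(width)
    ]


def roc_auc(binder_scores, nonbinder_scores):
    """Area under the ROC curve via pairwise comparison (ties count one half)."""
    wins = 0.0
    for b in binder_scores:
        for n in nonbinder_scores:
            if b > n:
                wins += 1.0
            elif b == n:
                wins += 0.5
    return wins / (len(binder_scores) * len(nonbinder_scores))
```

An area of 1.0 means every binder outscores every non-binder, while 0.5 corresponds to random ranking; the pairwise loop is quadratic and is meant for clarity rather than large benchmarks.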
Oxford University Press