Understanding how T cell receptors (TCRs) recognize disease-derived epitopes could serve as a stepping stone toward the development of efficient immunotherapies and vaccines. While a plethora of sequence-based methods for predicting TCR-epitope binding exists, their pre-trained models have not been comparatively evaluated. To address this shortcoming, we integrated 21 TCR-epitope prediction models into the immune-prediction framework ePytope, offering interoperable interfaces with standard TCR repertoire data formats. We showcase the applicability of this extension, ePytope-TCR, by evaluating the performance of these publicly available prediction models on two challenging datasets. While novel predictors successfully predicted binding to frequently observed epitopes, all methods failed for less frequently observed epitopes. Furthermore, we detected a strong bias in the prediction scores between different epitope classes. We envision this benchmark guiding researchers in their choice of a predictor and accelerating method development by defining standardized evaluation settings.