TY  - JOUR
AB  - Detecting depression is a critical component of mental health diagnosis, and accurate assessment is essential for effective treatment. This study introduces a novel, fully automated approach to predicting depression severity using the E-DAIC dataset. We employ Large Language Models (LLMs) to extract depression-related indicators from interview transcripts, utilizing the Patient Health Questionnaire-8 (PHQ-8) score to train the prediction model. Additionally, facial data extracted from video frames is integrated with textual data to create a multimodal model for depression severity prediction. We evaluate three approaches: text-based features, facial features, and a combination of both. Our findings show the best results are achieved by enhancing text data with speech quality assessment, with a mean absolute error of 2.85 and root mean square error of 4.02. This study underscores the potential of automated depression detection, showing text-only models as robust and effective while paving the way for multimodal analysis.
AU  - Sadeghi, M.*
AU  - Richer, R.*
AU  - Egger, B.*
AU  - Schindler-Gmelch, L.*
AU  - Rupp, L.H.*
AU  - Rahimi, F.*
AU  - Berking, M.*
AU  - Eskofier, B.M.
C1  - 72890
C2  - 56776
CY  - Campus, 4 Crinan St, London, N1 9XW, England
TI  - Harnessing multimodal approaches for depression detection using large language models and facial expressions.
JO  - Npj Ment. Health Res.
VL  - 3
IS  - 1
PB  - Springer Nature
PY  - 2024
SN  - 2731-4251
ER  -