Withers, C.A.* ; Rufai, A.M.* ; Venkatesan, A.* ; Tirunagari, S.* ; Lobentanzer, S. ; Harrison, M.* ; Zdrazil, B.*
Natural language processing in drug discovery: Bridging the gap between text and therapeutics with artificial intelligence.
Expert Opin. Drug Discov. 20, 765-783 (2025)
INTRODUCTION: The field of Natural Language Processing (NLP) within the life sciences has exploded in its capacity to aid the extraction and analysis of data from scientific texts in recent years through the advancement of Artificial Intelligence (AI). Drug discovery pipelines have been innovated and accelerated by the uptake of AI/Machine Learning (ML) techniques.
AREAS COVERED: The authors provide background on Named Entity Recognition (NER) in text - from tagging terms in text using ontologies to entity identification via ML models. They also explore the use of Knowledge Graphs (KGs) in biological data ingestion, manipulation and extraction, leading into the modern age of Large Language Models (LLMs) and their ability to maneuver complex and abundant data. The authors also cover the main strengths and weaknesses of the many methods available when undertaking NLP tasks in drug discovery. Literature was derived from searches utilizing Europe PMC, ResearchRabbit and SciSpace.
EXPERT OPINION: The mass of scientific data that is now produced each year is both a huge positive for potential innovation in drug discovery and a new hurdle for researchers to overcome. Notably, methods should be selected to fit a use case and the data available, as each method performs optimally under different conditions.
Publication type
Article: Journal article
Document type
Review
Keywords
Drug discovery; Natural language processing; Named entity recognition; Large language model; Knowledge graph; Machine learning; Deep learning; Ontology; Tool
Language
English
Publication Year
2025
HGF-reported in Year
2025
ISSN (print) / ISBN
1746-0441
e-ISSN
1746-045X
Source details
Volume: 20
Issue: 6
Pages: 765-783
Publisher
Informa Healthcare
Publishing Place
London
Reviewing status
Peer reviewed
POF-Topic(s)
30205 - Bioengineering and Digital Health
Research field(s)
Enabling and Novel Technologies
PSP Element(s)
G-503800-001
Grants
European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL-EBI)
Date of record entry
2025-05-11