The openOCHEM consensus model is the best-performing open-source predictive model in the First EUOS/SLAS Joint Compound Solubility Challenge.
    
    
        
    
    
        
        SLAS Discov. 29:100144 (2024)
    
    
    
		
		
			
				The EUOS/SLAS challenge aimed to facilitate the development of reliable algorithms for predicting the aqueous solubility of small molecules using experimental data from 100K compounds. In total, one hundred teams took part in the challenge to predict low-, medium- and high-solubility compounds as measured by the nephelometry assay. This article describes the winning model, which was developed using the publicly available Online CHEmical database and Modeling environment (OCHEM) and is available at https://ochem.eu/article/27. We describe in detail the assumptions and steps used to select the methods, descriptors and strategy that contributed to the winning solution. In particular, we show that a consensus of 28 models built with descriptor-based and representation-learning methods achieved the best score, which was higher than the scores of individual approaches or of consensus models developed using each individual approach. Combining diverse models decreased both the bias and the variance of the individual models and yielded the highest score. The model based on Transformer CNN contributed the best individual score, highlighting the power of natural language processing (NLP) methods. Including information about aleatoric uncertainty would be important for helping contestants better understand and use the challenge data.
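				The variance-reduction argument in the abstract can be illustrated with a minimal Python sketch. It is not the OCHEM implementation: the three toy "models", their predictions, the unweighted averaging, and the use of mean squared error as the score are all assumptions made for illustration, and the helper names `consensus` and `mse` are hypothetical.

```python
from statistics import mean


def consensus(predictions):
    """Unweighted consensus: average each compound's prediction across models."""
    n_models = len(predictions)
    n_compounds = len(predictions[0])
    return [
        sum(model[i] for model in predictions) / n_models
        for i in range(n_compounds)
    ]


def mse(pred, truth):
    """Mean squared error between predicted and reference values."""
    return mean((p - t) ** 2 for p, t in zip(pred, truth))


# Invented toy data: four compounds, three models whose errors only partly agree.
truth = [0.0, 1.0, 2.0, 1.0]
model_a = [0.5, 1.5, 1.5, 0.5]
model_b = [-0.5, 0.5, 2.5, 1.5]
model_c = [0.5, 0.5, 2.5, 0.5]

combined = consensus([model_a, model_b, model_c])

for name, pred in [("a", model_a), ("b", model_b), ("c", model_c),
                   ("consensus", combined)]:
    print(f"{name}: MSE = {mse(pred, truth):.4f}")
```

				In this toy setting each model is off by 0.5 on every compound, but the errors point in different directions, so averaging cancels part of them and the consensus error is smaller than that of any single model. The benefit shrinks when the models make the same mistakes, which is one reason to combine diverse approaches.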
			
			
				
			
		 
		
			
				
					
					
				 
				
			 
		 
		
     
    
        Publication type
        Article: Journal article
    
 
    
        Document type
        Scientific article
    
 
    
    
    
        Keywords
        Kaggle Challenge; OCHEM; Transformer CNN; Consensus; Descriptor-Based Models; Graph Neural Networks; Representation Learning; Solubility Prediction; Molecular Descriptors; Neural Networks; In Vitro; Coefficient; Agreement
    
 
    
    
 
    
    
        Language
        English
    
 
    
        Publication year
        2024
    
 
    
    
 
    
        HGF reporting year
        2024
    
 
    
    
        ISSN (print) / ISBN
        2472-5552
    
 
    
        e-ISSN
        2472-5560
    
 
    
 
     
		
    
        Source details
        Volume: 29
        Issue: 2
        Article number: 100144
	
    
 
  
        
        
 
        
            Publisher
            Sage
        
 
        
            Place of publication
            Thousand Oaks, Calif.
        
 
	
        
        
 
    
        Review status
        Peer reviewed
    
 
     
    
        POF Topic(s)
        30203 - Molecular Targets and Therapies
    
 
    
        Research field(s)
        Enabling and Novel Technologies
    
 
    
        PSP element(s)
        G-503000-001
        G-503093-001
    
 
    
        Funding
        Ministry of Education, Youth and Sports of the Czech Republic
        European Union
    
 
    
    
 	
    
    
    
    
        Record entry date
        2024-02-06