Publications

Investigating extrapolation and low-data challenges via contrastive learning of chemical compositions

Federico Ottomano, Giovanni De Felice, Rahul Savani, Vladimir Gusev, Matthew Rosseinsky

Tags: materials-informatics
Venue: AI4Mat NeurIPS 2023

Practical applications of machine learning for materials discovery remain severely limited by the quantity and quality of the available data. Furthermore, little is known about the ability of machine learning models to extrapolate outside of the training distribution, which is essential for the discovery of compounds with extraordinary properties. To address these challenges, we develop a novel deep representation learning framework for chemical compositions. The proposed model, named COmpositional eMBedding NETwork (CombNet), combines recent developments in graph-based encoding of chemical compositions with a supervised contrastive learning approach. This is motivated by the observation that contrastive learning can produce a regularized representation space from raw data, offering empirical benefits for extrapolation in low-data scenarios. Moreover, our method relies exclusively on the chemical composition of the underlying materials, since crystal structures are generally unavailable before a material is discovered. We demonstrate the effectiveness of CombNet over state-of-the-art methods under a bespoke evaluation scheme that simulates a realistic materials discovery scenario with experimental data.
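The abstract does not spell out the training objective, so purely as an illustration, a generic supervised contrastive (SupCon-style) loss applied to composition embeddings might look like the sketch below, assuming continuous property values are discretized into bins so that compositions falling in the same bin act as positives. The function name, the binning scheme, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings: torch.Tensor,
                                labels: torch.Tensor,
                                temperature: float = 0.1) -> torch.Tensor:
    """SupCon-style loss over a batch of composition embeddings (illustrative sketch).

    embeddings: (N, d) encoder outputs; labels: (N,) integer ids, e.g. property bins.
    """
    z = F.normalize(embeddings, dim=1)                  # project onto the unit sphere
    sim = z @ z.T / temperature                         # pairwise cosine similarities
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask

    sim = sim.masked_fill(self_mask, float("-inf"))     # exclude self-pairs from the softmax
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)

    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0                              # anchors with at least one positive
    mean_log_prob_pos = (log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1)[valid]
                         / pos_counts[valid])
    return -mean_log_prob_pos.mean()


# Illustrative usage with random embeddings and hypothetical binned property labels
emb = torch.randn(16, 64)
bins = torch.randint(0, 4, (16,))
loss = supervised_contrastive_loss(emb, bins)
```

In this kind of setup, the pull toward shared-bin neighbours and the push away from other bins is what regularizes the embedding space; how CombNet actually defines positives and negatives is described in the paper itself.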
