Developing machine learning tools for reactive atomistic material modelling

Machine learning force fields (MLFFs):

- Generation of a reference ab initio database of energy/force/stress data and/or data mining thereof and/or combination of existing datasets via, e.g., multi-task learning (as in https://doi.org/10.26434/chemrxiv-2023-8n737)
- Training of machine learning force fields using deep neural networks (e.g., equivariant message passing networks such as MACE)
- Fine-tuning of the machine learning force fields via various active-learning methods (query-by-committee, Gaussian mixture models, uncertainty-driven enhanced sampling as in https://arxiv.org/abs/2402.03753, or uncertainty attribution as in https://www.nature.com/articles/s41467-024-50407-9)
- Transfer learning (e.g., via the conformational sampling proposed in doi: 10.1021/acs.jctc.4c00643) to extend the applicability of MLFFs to similar systems, and delta-learning to improve the accuracy of the baseline MLFFs for the case/reaction of interest
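
As an illustration of the query-by-committee idea listed above, the following is a minimal NumPy sketch, not the method of any of the cited works. It assumes a committee of already trained force fields whose force predictions have been stacked into one array. The function names `committee_force_uncertainty` and `select_for_labelling`, the threshold value, and the random arrays standing in for model predictions are all hypothetical.

```python
import numpy as np


def committee_force_uncertainty(forces):
    """Per-structure disagreement of a committee of force fields.

    forces: array of shape (n_models, n_structures, n_atoms, 3).
    Returns an array of shape (n_structures,) holding, for each structure,
    the largest per-atom committee standard deviation of the force vector.
    """
    mean = forces.mean(axis=0)                      # (S, A, 3)
    dev = forces - mean                             # (M, S, A, 3)
    # Per-atom variance: committee average of the squared deviation norm.
    var = (dev ** 2).sum(axis=-1).mean(axis=0)      # (S, A)
    return np.sqrt(var).max(axis=-1)                # (S,)


def select_for_labelling(forces, threshold, max_new):
    """Indices of structures above the threshold, most uncertain first."""
    sigma = committee_force_uncertainty(forces)
    candidates = np.flatnonzero(sigma > threshold)
    order = candidates[np.argsort(sigma[candidates])[::-1]]
    return order[:max_new]


# Hypothetical stand-in for committee predictions: the models agree closely
# on structures 0-3 and disagree strongly on structures 4 and 5.
rng = np.random.default_rng(0)
n_models, n_structures, n_atoms = 4, 6, 5
base = rng.normal(size=(1, n_structures, n_atoms, 3))
noise_scale = np.array([0.01, 0.01, 0.01, 0.01, 0.5, 1.0]).reshape(1, -1, 1, 1)
noise = rng.normal(size=(n_models, n_structures, n_atoms, 3))
forces = base + noise_scale * noise

selected = select_for_labelling(forces, threshold=0.1, max_new=2)
print("structures to recompute ab initio:", selected)
```

In an active-learning loop, the selected structures would be recomputed at the reference ab initio level, added to the training set, and the committee retrained. The maximum over atoms is one possible choice of aggregate; a mean or a percentile would be equally valid assumptions.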

Machine learning driven (tensorial) property prediction:

- Generation of a reference ab initio database of tensorial properties (NMR tensors, Born effective charge tensors for IR) and/or data mining thereof
- Training of ML-based predictors using deep neural networks and/or kernel ridge regression (Gaussian process regression)
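
The kernel ridge regression route mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration on a toy scalar target with a one-dimensional stand-in descriptor; the function names, the Gaussian kernel choice, and the hyperparameter values are assumptions made for the example. The predictor coincides with the posterior mean of Gaussian process regression when the regularizer `reg` is read as the noise variance.

```python
import numpy as np


def rbf_kernel(a, b, length_scale):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale ** 2)


def krr_fit(x_train, y_train, length_scale, reg):
    """Solve (K + reg * I) alpha = y for the regression weights alpha."""
    k = rbf_kernel(x_train, x_train, length_scale)
    return np.linalg.solve(k + reg * np.eye(len(x_train)), y_train)


def krr_predict(x_new, x_train, alpha, length_scale):
    """Prediction is a kernel-weighted sum over the training points."""
    return rbf_kernel(x_new, x_train, length_scale) @ alpha


# Toy data: a 1D descriptor with target sin(x), standing in for a scalar
# property such as the isotropic part of a tensor.
x_train = np.linspace(0.0, 2.0 * np.pi, 25)[:, None]
y_train = np.sin(x_train[:, 0])
alpha = krr_fit(x_train, y_train, length_scale=1.0, reg=1e-6)

x_test = np.array([[1.0], [2.5], [4.0]])
y_pred = krr_predict(x_test, x_train, alpha, length_scale=1.0)
print("max abs error:", np.abs(y_pred - np.sin(x_test[:, 0])).max())
```

Because `np.linalg.solve` accepts a two-dimensional right-hand side, passing `y_train` with shape `(n_samples, n_targets)` fits several targets at once. Note that regressing tensor components independently with an invariant kernel does not by itself guarantee the correct transformation under rotation; that requires an equivariant model or kernel, which this sketch does not attempt.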

Advanced (ML-accelerated) structure and reaction sampling:

- Development and deployment of tools for global structure search (e.g., ML-accelerated via Gaussian process regression, as in doi: 10.1103/PhysRevLett.124.086102)
- Acceleration and improvement of methods for reaction network search/mapping (e.g., via the OPTIM package, https://www-wales.ch.cam.ac.uk/OPTIM/)
- Development and deployment of ML-based collective variables (e.g., using an autoencoder combined with a message passing neural network as in doi: 10.1021/acs.jctc.2c00729, or as in PINES, doi: 10.1021/acs.jctc.3c00923) for enhanced sampling molecular dynamics simulations of: i) specific reactions, ii) complex reactions in (reactive) solvents, and iii) phase transitions
- Development and deployment of advanced (biased) molecular dynamics and (hybrid) Monte Carlo (MC) schemes (e.g., OPES, the on-the-fly probability enhanced sampling method; kinetic MC for reaction network modelling; transition matrix MC for adsorption modelling)
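
To make the kinetic MC item above concrete, here is a minimal sketch of a rejection-free kinetic Monte Carlo walk over a discrete reaction network. The three-state network, its rate constants, and the function name `kmc_trajectory` are hypothetical and chosen only for illustration; in practice the rates would come from computed barriers (e.g., via transition state theory).

```python
import numpy as np


def kmc_trajectory(rates, start, n_steps, rng):
    """Rejection-free kinetic Monte Carlo on a discrete state network.

    rates[i][j] is the rate constant of the jump i -> j (diagonal ignored).
    Returns the visited states and the times at which each was entered.
    """
    rates = np.array(rates, dtype=float)
    np.fill_diagonal(rates, 0.0)
    state, time = start, 0.0
    states, times = [state], [time]
    for _ in range(n_steps):
        out = rates[state]
        total = out.sum()
        if total == 0.0:
            break  # absorbing state: no escape channel left
        # Waiting time is exponential with the total escape rate.
        time += rng.exponential(1.0 / total)
        # Pick the jump with probability proportional to its rate.
        state = int(rng.choice(len(out), p=out / total))
        states.append(state)
        times.append(time)
    return np.array(states), np.array(times)


# Hypothetical network A <-> B -> C with arbitrary rate constants;
# C (index 2) has no outgoing rates and is therefore absorbing.
rates = [[0.0, 1.0, 0.0],
         [0.5, 0.0, 0.2],
         [0.0, 0.0, 0.0]]
rng = np.random.default_rng(1)
states, times = kmc_trajectory(rates, start=0, n_steps=1000, rng=rng)
print("final state:", states[-1], "reached at time:", times[-1])
```

Times are in the inverse units of the rate constants. Averaging over many such trajectories gives estimates of quantities such as the mean first-passage time to a product state.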

Test systems and beyond:

- An extensive library of application systems thoroughly investigated within the group (zeolites, oxidic surfaces, embedded metal clusters, etc.) for testing/fine-tuning the methods
- An option to deploy the toolkit implemented/developed here on a grand application challenge: modelling the synthesis of zeolitic materials

Networking and collaboration:

MLFFs & ML-based property predictors:
R. Gomez-Bombarelli (MIT, US); A. M. Elena (STFC, UK); A. Fortunelli (U Pisa, IT); A. Bartok-Partay (U Warwick, UK)
ML-accelerated structure and reaction sampling:
D. Wales (U Cambridge, UK); V. van Speybroeck (U Gent, BE); L. Bonati (IIT Genoa, IT); T. Bucko (U Komenskeho, SK)

Relevant Publications:

1. Nature Communications, 2024, doi: 10.1038/s41467-024-48609-2
2. npj Computational Materials, 2022, doi: 10.1038/s41524-022-00865-w
3. Digital Discovery, 2025, doi: 10.1039/D4DD00306C
4. Journal of Chemical Theory and Computation, 2023, doi: 10.1021/acs.jctc.2c00729
5. Proceedings of Machine Learning Research, 2023, doi: 10.48550/arXiv.2301.03480