Using machine learning to discover CVs for enhanced sampling
A key challenge in enhanced sampling simulations is identifying collective variables (CVs) that enable efficient sampling of rare events. I developed data-driven approaches that automate this process using machine learning techniques. Notably, I proposed a method to build CVs from metastable states alone via neural networks optimized with Fisher’s discriminant (Bonati et al., 2020) and a deep learning framework to extract slow modes from biased simulations, improving rare-event sampling in diverse applications (Bonati et al., 2021).
Figure. Left: DeepLDA. Right: DeepTICA.
Recent advances include a descriptor-free approach leveraging geometric graph neural networks for symmetry-invariant CVs (Zhang et al., 2024) and a multitask approach that can learn CVs from transition path sampling simulations while simultaneously optimizing shooting efficiency (Zhang et al., 2024).
Code. All these techniques are implemented in mlcolvar, a Python library I developed which integrates machine learning-based CVs into enhanced sampling workflows (Bonati et al., 2023).
Molecular dynamics simulations hold great promise for providing insight into the microscopic behavior of complex molecular systems. However, their effectiveness is often constrained by long timescales associated with rare events. Enhanced sampling methods have been developed to address these challenges, and recent years have seen a growing integration with machine learning techniques. This review provides a comprehensive overview of how they are reshaping the field, with a particular focus on the data-driven construction of collective variables. Furthermore, these techniques have also improved biasing schemes and unlocked novel strategies via reinforcement learning and generative approaches. In addition to methodological advances, we highlight applications spanning different areas such as biomolecular processes, ligand binding, catalytic reactions, and phase transitions. We conclude by outlining future directions aimed at enabling more automated strategies for rare-event sampling.
@article{Zhu2025EnhancedApplications,author={Zhu, Kai and Trizio, Enrico and Zhang, Jintu and Hu, Renling and Jiang, Linlong and Hou, Tingjun and Bonati, Luigi},journal={Chemical Reviews},doi={10.1021/acs.chemrev.5c00700},month=oct,title={Enhanced Sampling in the Age of Machine Learning: Algorithms and Applications},year={2025},}
2024
JCTC
Descriptor-Free Collective Variables from Geometric Graph Neural Networks
Jintu Zhang, Luigi Bonati, Enrico Trizio, Odin Zhang, Yu Kang, Ting Jun Hou, and Michele Parrinello
Journal of Chemical Theory and Computation, Dec 2024
Enhanced sampling simulations make the computational study of rare events feasible. A large family of such methods crucially depends on the definition of some collective variables (CVs) that could ...
@article{Zhang2024DescriptorFreeNetworks,author={Zhang, Jintu and Bonati, Luigi and Trizio, Enrico and Zhang, Odin and Kang, Yu and Hou, Ting Jun and Parrinello, Michele},doi={10.1021/ACS.JCTC.4C01197},issn={15499626},issue={24},journal={Journal of Chemical Theory and Computation},month=dec,pages={10787-10797},pmid={39665183},publisher={American Chemical Society},title={Descriptor-Free Collective Variables from Geometric Graph Neural Networks},volume={20},url={https://doi.org/10.1021/acs.jctc.4c01197},year={2024},}
JCTC
Combining Transition Path Sampling with Data-Driven Collective Variables through a Reactivity-Biased Shooting Algorithm
Jintu Zhang, Odin Zhang, Luigi Bonati, and Ting Jun Hou
Journal of Chemical Theory and Computation, Jun 2024
Rare event sampling is a central problem in modern computational chemistry research. Among the existing methods, transition path sampling (TPS) can generate unbiased representations of reaction pro...
@article{Zhang2024CombiningAlgorithm,author={Zhang, Jintu and Zhang, Odin and Bonati, Luigi and Hou, Ting Jun},doi={10.1021/ACS.JCTC.4C00423},issn={15499626},issue={11},journal={Journal of Chemical Theory and Computation},month=jun,pages={4523-4532},pmid={38801759},publisher={American Chemical Society},title={Combining Transition Path Sampling with Data-Driven Collective Variables through a Reactivity-Biased Shooting Algorithm},volume={20},year={2024},}
arXiv
Advanced simulations with PLUMED: OPES and Machine Learning Collective Variables
Enrico Trizio, Andrea Rizzi, Pablo M. Piaggi, Michele Invernizzi, and Luigi Bonati
Many biological processes occur on time scales longer than those accessible to molecular dynamics simulations. Identifying collective variables (CVs) and introducing an external potential to accelerate them is a popular approach to address this problem. In particular, PLUMED is a community-developed library that implements several methods for CV-based enhanced sampling. This chapter discusses two recent developments that have gained popularity in recent years. The first is the On-the-fly Probability Enhanced Sampling (OPES) method as a biasing scheme. This provides a unified approach to enhanced sampling able to cover many different scenarios: from free energy convergence to the discovery of metastable states, from rate calculation to generalized ensemble simulation. The second development concerns the use of machine learning (ML) approaches to determine CVs by learning the relevant variables directly from simulation data. The construction of these variables is facilitated by the mlcolvar library, which allows them to be optimized in Python and then used to enhance sampling thanks to a native interface inside PLUMED. For each of these methods, in addition to a brief introduction, we provide guidelines, practical suggestions and point to examples from the literature to facilitate their use in the study of the process of interest.
@article{Trizio2024AdvancedVariables,author={Trizio, Enrico and Rizzi, Andrea and Piaggi, Pablo M. and Invernizzi, Michele and Bonati, Luigi},eprint={2410.18019},archiveprefix={arXiv},journal={arXiv},keywords={Enhanced sampling,OPES,PLUMED,collective variables,machine learning,mlcolvar},month=oct,title={Advanced simulations with PLUMED: OPES and Machine Learning Collective Variables},url={http://arxiv.org/abs/2410.18019},year={2024},}
2023
JCP
A unified framework for machine learning collective variables for enhanced sampling simulations: mlcolvar
Luigi Bonati, Enrico Trizio, Andrea Rizzi, and Michele Parrinello
Identifying a reduced set of collective variables is critical for understanding atomistic simulations and accelerating them through enhanced sampling techniques. Recently, several methods have been proposed to learn these variables directly from atomistic data. Depending on the type of data available, the learning process can be framed as dimensionality reduction, classification of metastable states, or identification of slow modes. Here, we present mlcolvar, a Python library that simplifies the construction of these variables and their use in the context of enhanced sampling through a contributed interface to the PLUMED software. The library is organized modularly to facilitate the extension and cross-contamination of these methodologies. In this spirit, we developed a general multi-task learning framework in which multiple objective functions and data from different simulations can be combined to improve the collective variables. The library’s versatility is demonstrated through simple examples that are prototypical of realistic scenarios.
@article{Bonati2023Amlcolvar,author={Bonati, Luigi and Trizio, Enrico and Rizzi, Andrea and Parrinello, Michele},doi={10.1063/5.0156343},issn={0021-9606},issue={1},journal={The Journal of Chemical Physics},month=jul,pmid={37409767},publisher={AIP Publishing},title={A unified framework for machine learning collective variables for enhanced sampling simulations: {\texttt{mlcolvar}}},volume={159},url={https://pubs.aip.org/jcp/article/159/1/014801/2901354/A-unified-framework-for-machine-learning},year={2023},}
2021
PNAS
Deep learning the slow modes for rare events sampling
Luigi Bonati, GiovanniMaria Piccini, and Michele Parrinello
Proceedings of the National Academy of Sciences, 2021
The development of enhanced sampling methods has greatly extended the scope of atomistic simulations, allowing long-time phenomena to be studied with accessible computational resources. Many such methods rely on the identification of an appropriate set of collective variables. These are meant to describe the system’s modes that most slowly approach equilibrium. Once identified, the equilibration of these modes is accelerated by the enhanced sampling method of choice. An attractive way of determining the collective variables is to relate them to the eigenfunctions and eigenvalues of the transfer operator. Unfortunately, this requires knowing the long-term dynamics of the system beforehand, which is generally not available. However, we have recently shown that it is indeed possible to determine efficient collective variables starting from biased simulations. In this paper, we bring the power of machine learning and the efficiency of the recently developed on-the-fly probability enhanced sampling method to bear on this approach. The result is a powerful and robust algorithm that, given an initial enhanced sampling simulation performed with trial collective variables or generalized ensembles, extracts transfer operator eigenfunctions using a neural network ansatz and then accelerates them to promote sampling of rare events. To illustrate the generality of this approach we apply it to several systems, ranging from the conformational transition of a small molecule to the folding of a mini-protein and the study of materials crystallization.
@article{Bonati2021DeepSampling,title={Deep learning the slow modes for rare events sampling},author={Bonati, Luigi and Piccini, GiovanniMaria and Parrinello, Michele},journal={Proceedings of the National Academy of Sciences},doi={10.1073/pnas.2113533118},volume={118},number={44},pages={e2113533118},year={2021},publisher={National Academy of Sciences},}
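As a rough illustration of what "slow modes" means in the abstract above, the sketch below scores two candidate one-dimensional CVs by their normalized time-lagged autocorrelation: a variable that decorrelates slowly scores close to 1. This is only a toy under stated assumptions, not the method of the paper. It uses synthetic sine signals in place of simulation data, assumes an unbiased and equally weighted trajectory (so the reweighting needed for biased simulations is omitted), and contains no neural network. The function name `lagged_autocorrelation` and all parameter values are hypothetical.

```python
import math
from statistics import fmean


def lagged_autocorrelation(series, lag):
    """Normalized autocorrelation of a 1D time series at a given lag.

    Assumes an unbiased, equally weighted trajectory (no reweighting).
    """
    mean = fmean(series)
    centered = [x - mean for x in series]
    n_pairs = len(centered) - lag
    # Time-lagged covariance, averaged over all available pairs
    cov_lag = sum(centered[t] * centered[t + lag] for t in range(n_pairs)) / n_pairs
    # Instantaneous variance, used for normalization
    variance = sum(c * c for c in centered) / len(centered)
    return cov_lag / variance


# Synthetic stand-ins for two candidate CVs (illustrative only)
n_steps = 400
slow_cv = [math.sin(2 * math.pi * t / 200) for t in range(n_steps)]  # long period
fast_cv = [math.sin(2 * math.pi * t / 8) for t in range(n_steps)]    # short period

lag = 4
slow_score = lagged_autocorrelation(slow_cv, lag)
fast_score = lagged_autocorrelation(fast_cv, lag)
print("slow CV retains more memory:", slow_score > fast_score)
```

In this toy setting the slowly varying signal keeps a high autocorrelation at the chosen lag, while the fast one has already decorrelated, which is the kind of ranking a slow-mode criterion is meant to produce.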
2020
J. Phys. Chem. Lett.
Data-Driven Collective Variables for Enhanced Sampling
Luigi Bonati, Valerio Rizzi, and Michele Parrinello
Designing an appropriate set of collective variables is crucial to the success of several enhanced sampling methods. Here we focus on how to obtain such variables from information limited to the metastable states. We characterize these states by a large set of descriptors and employ neural networks to compress this information in a lower-dimensional space, using Fisher’s linear discriminant as an objective function to maximize the discriminative power of the network. We test this method on alanine dipeptide, using the nonlinearly separable data set composed by atomic distances. We then study an intermolecular aldol reaction characterized by a concerted mechanism. The resulting variables are able to promote sampling by drawing nonlinear paths in the physical space connecting the fluctuations between metastable basins. Lastly, we interpret the behavior of the neural network by studying its relation to the physical variables. Through the identification of its most relevant features, we are able to gain chemical insight into the process.
@article{Bonati2020DataDrivenSampling,author={Bonati, Luigi and Rizzi, Valerio and Parrinello, Michele},doi={10.1021/acs.jpclett.0c00535},issn={19487185},issue={8},journal={Journal of Physical Chemistry Letters},pages={2998-3004},pmid={32239945},title={Data-Driven Collective Variables for Enhanced Sampling},volume={11},url={https://dx.doi.org/10.1021/acs.jpclett.0c00535},year={2020},}
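The Fisher criterion mentioned in the abstract above can be illustrated for the simplest case of a one-dimensional CV and two metastable states: it is the squared distance between the state means divided by the sum of the within-state variances. The sketch below is a minimal example under stated assumptions, not the implementation from the paper. It evaluates the ratio on hand-made CV values rather than on neural-network outputs, and the function name `fisher_ratio` and the toy numbers are hypothetical.

```python
from statistics import fmean, pvariance


def fisher_ratio(cv_state_a, cv_state_b):
    """Fisher's criterion for a 1D CV evaluated in two metastable states."""
    mean_a, mean_b = fmean(cv_state_a), fmean(cv_state_b)
    # Between-state scatter: squared separation of the two state means
    between = (mean_a - mean_b) ** 2
    # Within-state scatter: sum of the fluctuations inside each state
    within = pvariance(cv_state_a) + pvariance(cv_state_b)
    return between / within


# Toy CV values (made up for illustration)
good_cv_a = [-1.1, -0.9, -1.0, -1.05, -0.95]  # state A, well separated from B
good_cv_b = [0.9, 1.1, 1.0, 0.95, 1.05]
poor_cv_a = [-0.5, 0.6, 0.1, -0.2, 0.4]       # states A and B overlap
poor_cv_b = [0.3, -0.4, 0.5, 0.0, -0.1]

good_score = fisher_ratio(good_cv_a, good_cv_b)
poor_score = fisher_ratio(poor_cv_a, poor_cv_b)
print("well-separated CV scores higher:", good_score > poor_score)
```

A CV that separates the two states cleanly yields a large ratio, while one whose values overlap between states yields a ratio near zero. Using this quantity as a training objective rewards networks whose output discriminates the metastable states.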