W24-11 Training and Deploying Physics-Based and Machine Learning Interatomic Potentials for Advanced Materials Applications

Instructors: Ilia Nikiforov, University of Minnesota; Amit Gupta, University of Minnesota


IN A NUTSHELL: This short course will introduce students to practical methods and tools for training physics-based and machine learning interatomic potentials (IPs) for multicomponent material systems, testing their performance in predicting basic material properties, and deploying them within state-of-the-art molecular simulation software packages, including ASE and LAMMPS. The short course will be structured as a competition in which participants compete to train the best possible IP within the allotted time.

REQUIRED SKILLS AND RESOURCES: Basic familiarity with a Unix-based computing environment and with Python programming is advised. Participants must bring a laptop to access the online resources used during the course.

DETAILED OVERVIEW: Molecular simulations play a major role in materials science and chemistry. The ability to explicitly model microscopic behavior enables the study of materials microstructure and defects, plasticity, elasticity, chemical reactions, drug design, and more. A necessary part of any molecular simulation is the interatomic potential (IP), which predicts the energy of each atom and the force acting on it from its local atomic environment. For small systems, the total energy and forces can be calculated ab initio ("from the beginning", i.e., from quantum-mechanical first principles) using methods such as density functional theory (DFT). For many practical problems, however, this is computationally infeasible, because the cost of conventional DFT grows roughly with the cube of the system size, and empirical IPs are required.

An empirical IP is an approximate function that estimates the energy of an atom in an atomic configuration from the relative positions and species of its neighbors; the total energy of the configuration is the sum of these atomic energies. The function varies in complexity, but it is always considerably faster to evaluate than ab initio methods. The force on each atom is the negative gradient of the total energy with respect to that atom's position. So-called "physics-based" IPs have functional forms based on a physical understanding of chemical bonding, and their history predates electronic computers and extends to the present day. More recently, machine learning IPs (MLIPs) have been developed that use flexible, generic functional forms which, in principle, can approximate any energy landscape as the model size increases. All IPs have adjustable parameters that are fitted either to reproduce experimentally measured properties of the materials or molecules they are designed to represent, or to reproduce energies and forces predicted by ab initio methods.
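To make the energy-to-force relationship concrete, the sketch below implements one of the oldest physics-based IPs, the Lennard-Jones pair potential, E_pair(r) = 4 epsilon [(sigma/r)^12 - (sigma/r)^6], in NumPy. It sums pair energies for neighbors inside a cutoff, computes forces analytically as the negative gradient of the total energy, and checks them against a central finite-difference derivative of the energy. It assumes a single species, reduced units (epsilon = sigma = 1), no periodic boundaries, and an energy that is not shifted at the cutoff. The function names are hypothetical illustrations, not KLIFF or KIM API code.

```python
import numpy as np


def lj_energy_forces(positions, epsilon=1.0, sigma=1.0, cutoff=2.5):
    """Total energy and per-atom forces (F = -dE/dr) for a Lennard-Jones pair potential.

    positions: (N, 3) array of Cartesian coordinates; no periodic boundaries.
    """
    positions = np.asarray(positions, dtype=float)
    n = len(positions)
    energy = 0.0
    forces = np.zeros_like(positions)
    for i in range(n):
        for j in range(i + 1, n):
            rij = positions[j] - positions[i]  # vector from atom i to atom j
            r = np.linalg.norm(rij)
            if r >= cutoff:
                continue  # atoms beyond the cutoff are not neighbors
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # dE/dr for this pair: 4*eps*(-12 sigma^12/r^13 + 6 sigma^6/r^7)
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            # Chain rule: dr/dr_j = rij/r and dr/dr_i = -rij/r
            f_j = -de_dr * rij / r
            forces[j] += f_j
            forces[i] -= f_j  # equal and opposite force on atom i
    return energy, forces


def numerical_forces(positions, h=1e-6, **kwargs):
    """Forces from a central finite difference of the energy, for checking."""
    positions = np.asarray(positions, dtype=float)
    forces = np.zeros_like(positions)
    for i in range(positions.shape[0]):
        for k in range(3):
            plus = positions.copy()
            minus = positions.copy()
            plus[i, k] += h
            minus[i, k] -= h
            e_plus, _ = lj_energy_forces(plus, **kwargs)
            e_minus, _ = lj_energy_forces(minus, **kwargs)
            forces[i, k] = -(e_plus - e_minus) / (2.0 * h)
    return forces


# Usage: four atoms in reduced units, all pairs inside the cutoff.
pos = np.array([[0.0, 0.0, 0.0], [1.1, 0.0, 0.0], [0.0, 1.2, 0.1], [1.0, 1.0, 0.9]])
energy, forces = lj_energy_forces(pos)
max_err = np.max(np.abs(forces - numerical_forces(pos)))
print(f"E = {energy:.6f}, max |F_analytic - F_numerical| = {max_err:.2e}")
```

The finite-difference comparison is a cheap sanity check that applies to any IP, physics-based or machine-learned: if the forces are not the negative gradient of the energy, molecular dynamics will not conserve energy.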

Present-day molecular simulations are integrated within a robust software and cyberinfrastructure ecosystem. DFT databases provide both predictions of the material properties that DFT can access and fitting data for training IPs with a variety of available fitting packages. Once an IP is ready for use, it is best practice to archive it in a public repository so that it is accessible and the results obtained with it are reproducible. In addition, IP repositories test the models they archive, and in the case of the OpenKIM repository (openkim.org), they provide the KIM API, which allows the models to be used seamlessly within popular molecular simulation packages to model equilibrium or nonequilibrium microscopic behavior.

This short course will instruct students in a vertical slice of IP development and application, focusing on MLIPs and cyberinfrastructure. Projects under the KIM Initiative (kim-initiative.org) will be used because they form a complete, well-integrated infrastructure for IP fitting and use.

After a short introduction to IPs and molecular simulation, the course will review several types of physics-based IPs and MLIPs. The majority of the course will consist of a hands-on exercise in which students will use fitting data from the ColabFit Exchange, a DFT database geared specifically toward IP fitting, to fit an IP of their choosing for a multicomponent material system. KLIFF (KIM-based Learning-Integrated Fitting Framework), a Python package for fitting IPs, will be demonstrated and used for this purpose. KLIFF can automatically fetch training data from ColabFit and fit many types of physics-based and machine learning IPs, producing a package that is compatible with the KIM API. Students will participate in a competition in which the IPs they fit are scored on several criteria, including the accuracy of predicted forces on a test dataset, the accuracy of predicted basic material properties, and stability under finite-temperature dynamics. As part of the testing process, students will deploy their IPs with the LAMMPS and ASE simulation packages integrated with the KIM API. Students will gain both a theoretical and a practical understanding of IP development and use, and because all of the software can be installed straightforwardly through conda-forge, the methods learned will transfer directly to their home lab environments and to high-performance computing (HPC) platforms.
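One of the scoring criteria above is force accuracy on a test dataset. A common way to quantify it is the component-wise root-mean-square error (RMSE) and mean absolute error (MAE) between predicted and reference forces, pooled over all atoms and Cartesian components. The sketch below assumes both sets of forces are already available as NumPy arrays in the same units (e.g., eV/Å). The helper name `force_errors` is hypothetical; it is not part of KLIFF, and the competition's actual metrics and weights are not specified here. The toy arrays in the usage lines are made-up values for illustration only.

```python
import numpy as np


def force_errors(predicted, reference):
    """Component-wise force RMSE and MAE over a test set (hypothetical helper).

    predicted, reference: sequences of (N_k, 3) arrays, one per configuration;
    configurations may contain different numbers of atoms.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must list the same configurations")
    diffs = []
    for p, r in zip(predicted, reference):
        p = np.asarray(p, dtype=float)
        r = np.asarray(r, dtype=float)
        if p.shape != r.shape:
            raise ValueError(f"shape mismatch: {p.shape} vs {r.shape}")
        # Flatten so every Cartesian component of every atom counts once
        diffs.append((p - r).ravel())
    d = np.concatenate(diffs)
    rmse = float(np.sqrt(np.mean(d ** 2)))
    mae = float(np.mean(np.abs(d)))
    return rmse, mae


# Toy example with made-up values: two configurations (2 atoms and 1 atom).
reference = [np.zeros((2, 3)), np.zeros((1, 3))]
predicted = [np.array([[0.1, 0.0, 0.0], [0.0, -0.1, 0.0]]), np.array([[0.0, 0.0, 0.2]])]
rmse, mae = force_errors(predicted, reference)
print(f"force RMSE = {rmse:.4f}, MAE = {mae:.4f}")  # prints: force RMSE = 0.0816, MAE = 0.0444
```

Pooling components across configurations weights each atom equally, so large configurations dominate the score; a per-configuration average is an equally reasonable design choice when test structures differ greatly in size.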