NIH QM Datasets Research

Goals

This page documents progress towards the goals of the NIH OMSF subaward with regard to QM datasets:

OMSF will lead in the generation, curation and management of quantum chemistry datasets
via MolSSI QCArchive for the use of OpenFF force field parameterization efforts and other ML
force field efforts; this may include some contributions to QCArchive infrastructure development
and maintenance.

Personnel

  • Primary personnel: @Marcus Wieder

  • Primary supervisor: @John Chodera

  • OMSF approver: @Lily Wang

Current overview

SPICE 2.0 quantum chemical dataset coordination:

  • Taking over coordination of new datasets going into SPICE 2.0 that will be useful for constructing and assessing ML and MM potentials

  • Curating and preparing datasets

  • Coordinating QCFractal generation of datasets with OpenFF-compatible levels of theory

  • Coordinating Exscientia/Prescient contributions to SPICE 2.0, which may include higher levels of theory as well

Metalloprotein quantum chemical dataset coordination:

Developing an ML potential training and assessment framework suitable for producing next-generation potentials

Over the next few weeks, he is also working on the following items, for which he can likely cite his OpenFF funding source and to which we might therefore be able to claim contributions in a progress report:

  • A LiveCoMS best practices paper for construction and assessment of ML potentials with CECAM folks

  • A manuscript with the Boresch and Exscientia groups assessing the ability of different force fields to provide MM -> ML/MM accuracy improvements in hydration free energy calculations

  • An open-source release and corresponding manuscript for the Exscientia physics-ml package, which includes both trained ML potentials and evaluation tools. It is unclear how this will be harmonized with our modelforge, but this was something we managed to persuade the Exscientia folks to do.

In future:

  • Lead the targeted generation of QC datasets needed for OpenFF nucleic acids, the PDB Chemical Component Dictionary, Enamine REAL Space, etc., as well as other needs identified by OpenFF

  • Lead the development of a JAX-based molecular modeling and simulation engine

 

Updates