Optimization strategies

Driver

Approver

Contributors

Stakeholder

@Pavan Behara @Simon Boothroyd @Hyesu Jang

Objective

Test force field optimization strategies

Due date

Sep 30, 2021 

Key outcomes

  • Identify any deficiencies in the current setup that may drive the optimization into a bad region of parameter space

  • Train a better force field

Status

in progress

Problem Statement

Apart from the selection of training data, which is critical to force field optimization, there are many empirical fitting parameters that must be tuned so that the optimizer samples the intended region of parameter space and yields physically reasonable parameter values. However, because of the curse of dimensionality, the existence of multiple minima, and near-linear dependencies between parameters, it is not always possible to end up with the best parameter set. There have been instances where the optimization produced bad parameters, for example for sulfonamides, which required manual debugging and the addition of guard rails in the form of canary tests. Much of the work on this front was done by Lee-Ping Wang and Hyesu Jang while building the current force fields. This project aims to document that existing knowledge and to try out further variations with the new automated fitting infrastructure by Josh Horton and Simon Boothroyd (qcsubmit + bespoke workflows), which allows rapid prototyping with minimal setup.

The FF optimization can be tuned in the following ways (some of which have already been studied):

  • priors on the physical parameters

  • weights on targets; the current values are:

    • optgeo = 0.1

    • vibfreq = 1

    • torsion profile = 1

  • scaling factors internal to each target

    • optgeo - scaling factors of

      • 0.05 Å for bond RMSD

      • 8° for angle RMSD

      • 20° for improper RMSD

      • dihedral RMSD is not included

    • vibfreq - a scaling factor of 200 cm⁻¹

    • torsions - a ramping weight function on the relative QM energy: full weight for 0 - 1 kcal/mol, a decreasing weight for 1 - 5 kcal/mol, and zero weight (points cut off completely) above 5 kcal/mol, so that the low-energy region is given more weight in fitting

  • training a specific set of targets at a time in a well-defined sequence

    • optgeo+vibfreq first, and then TD (TorsionDrive torsion profiles), or

    • optgeo+TD first, and then vibfreq

  • training isolated subsets of parameters in sequence

    • bonds + angles first, and torsions later

  • training torsions on simple molecules without strong steric effects (not strictly an optimization setting; this falls more under training data selection)

  • a better starting point for the bond and angle parameters from the modified Seminario method
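
To make the interplay between target weights, internal scaling factors and priors concrete, here is a minimal Python sketch of how those pieces could combine into a single objective for one molecule per target. The target weights (0.1, 1, 1), the optgeo and vibfreq denominators, and the 1 and 5 kcal/mol torsion cutoffs are the values quoted above (the improper denominator is assumed to be in degrees). Everything else is an assumption: the function names are hypothetical, and the decay form 1/sqrt(1 + (E - 1)^2) between 1 and 5 kcal/mol, the weighted offset alignment of MM and QM torsion energies, and the L2 prior penalty are simplifications, not a transcription of ForceBalance's implementation.

```python
import numpy as np

# Target weights quoted in the text above.
TARGET_WEIGHTS = {"optgeo": 0.1, "vibfreq": 1.0, "torsionprofile": 1.0}

# Internal scaling factors quoted in the text above.
BOND_DENOM = 0.05      # Angstrom
ANGLE_DENOM = 8.0      # degrees
IMPROPER_DENOM = 20.0  # assumed to be degrees
VIBFREQ_DENOM = 200.0  # cm^-1


def optgeo_term(bond_rmsd, angle_rmsd, improper_rmsd):
    """Scaled squared RMSDs for one molecule; dihedral RMSD is excluded."""
    return ((bond_rmsd / BOND_DENOM) ** 2
            + (angle_rmsd / ANGLE_DENOM) ** 2
            + (improper_rmsd / IMPROPER_DENOM) ** 2)


def vibfreq_term(freq_mm, freq_qm):
    """Mean squared frequency error measured in units of VIBFREQ_DENOM."""
    diff = (np.asarray(freq_mm) - np.asarray(freq_qm)) / VIBFREQ_DENOM
    return float(np.mean(diff ** 2))


def torsion_weights(e_qm, lower=1.0, upper=5.0):
    """Ramped weights on relative QM energies in kcal/mol.

    Constant below `lower`, decaying between `lower` and `upper`, zero above
    `upper`. The decay form 1 / sqrt(lower**2 + (E - lower)**2) is an assumption.
    """
    e_rel = np.asarray(e_qm, dtype=float) - np.min(e_qm)
    ramp = np.where(e_rel < lower, 1.0 / lower,
                    1.0 / np.sqrt(lower ** 2 + (e_rel - lower) ** 2))
    return np.where(e_rel > upper, 0.0, ramp)


def torsion_term(e_mm, e_qm):
    """Weighted mean squared energy error along one torsion scan."""
    w = torsion_weights(e_qm)
    diff = np.asarray(e_mm, dtype=float) - np.asarray(e_qm, dtype=float)
    # Only relative energies matter, so remove the best constant MM/QM offset.
    diff -= np.sum(w * diff) / np.sum(w)
    return float(np.sum(w * diff ** 2) / np.sum(w))


def prior_penalty(params, initial, priors):
    """L2 penalty on deviations from the initial values, in units of the prior widths."""
    z = (np.asarray(params) - np.asarray(initial)) / np.asarray(priors)
    return float(np.sum(z ** 2))


def total_objective(terms, penalty):
    """Weighted sum of per-target contributions plus the prior penalty."""
    return sum(TARGET_WEIGHTS[name] * value for name, value in terms.items()) + penalty


# Hypothetical example values for a single molecule.
terms = {
    "optgeo": optgeo_term(0.02, 3.0, 10.0),
    "vibfreq": vibfreq_term([1010.0, 1680.0], [1000.0, 1700.0]),
    "torsionprofile": torsion_term([0.2, 1.4, 3.1, 6.5], [0.0, 1.0, 3.5, 8.0]),
}
print(total_objective(terms, prior_penalty([1.10], [1.00], [0.25])))
```

Dividing each residual by its denominator before squaring makes the contributions dimensionless, so the target weights compare like with like; changing a denominator therefore shifts the balance between targets just as changing a weight would.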
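
Both sequential strategies in the list (staging targets and staging parameter groups) amount to block-wise fitting: each stage frees some parameters and/or uses some observables while everything else stays frozen. The sketch below illustrates that idea on a hypothetical linear toy model (observables = design @ params, solved with NumPy least squares), not on a real force field. The helper names, the column grouping standing in for "bonds + angles" versus "torsions", and the injected correlation between columns are illustrative assumptions.

```python
import numpy as np


def fit_subset(design, target, params, free):
    """Least-squares update of the parameters selected by the boolean mask `free`,
    holding all other parameters at their current values (linear toy model)."""
    params = params.copy()
    fixed = ~free
    # Move the contribution of the frozen parameters to the right-hand side.
    rhs = target - design[:, fixed] @ params[fixed]
    params[free] = np.linalg.lstsq(design[:, free], rhs, rcond=None)[0]
    return params


def staged_fit(design, target, initial, stages):
    """Run each (free_params, rows) stage in order, starting from `initial`.

    `free_params` masks the parameters trained in that stage (e.g. bonds + angles);
    `rows` masks the observables used (e.g. only optgeo + vibfreq data).
    """
    params = np.asarray(initial, dtype=float)
    for free, rows in stages:
        free, rows = np.asarray(free, dtype=bool), np.asarray(rows, dtype=bool)
        params = fit_subset(design[rows], target[rows], params, free)
    return params


def residual(design, target, params):
    """Sum of squared errors over all observables."""
    return float(np.sum((design @ params - target) ** 2))


# Hypothetical toy problem: 20 observables, 4 parameters.
rng = np.random.default_rng(0)
design = rng.normal(size=(20, 4))
design[:, 2] += 0.9 * design[:, 0]  # near-linear dependency between the groups
target = design @ np.array([1.0, -0.5, 0.3, 2.0]) + 0.1 * rng.normal(size=20)

valence = np.array([True, True, False, False])  # stands in for bonds + angles
everything = np.ones(20, dtype=bool)
staged = staged_fit(design, target, np.zeros(4), [(valence, everything), (~valence, everything)])
joint = staged_fit(design, target, np.zeros(4), [(np.ones(4, dtype=bool), everything)])
print(residual(design, target, staged), residual(design, target, joint))
```

Restricting `rows` to the observables of particular targets gives the target-sequencing variant, and swapping the order of the stages gives the alternative sequence. The staged fit can never beat the joint least-squares residual, and with the injected correlation it generally ends above it; this is one way near-linear dependencies make the ordering of stages matter.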