Optimization strategies
@Pavan Behara @Simon Boothroyd @Hyesu Jang 



Objective: Test force field optimization strategies
Due date: Sep 30, 2021
Key outcomes:
Status: in progress
Problem Statement
Apart from the selection of training data, which is critical to force field optimization, there are many empirical settings that must be chosen so that the optimizer samples the intended region of parameter space and returns physically reasonable parameter values. However, because of the curse of dimensionality, multiple minima, and near-linear dependencies between parameters, it is not always possible to end up with the best parameter set. In some cases the fit has produced bad parameters, for example for sulfonamides, which required manual debugging and the addition of guard rails in the form of canary tests. Lee-Ping Wang and Hyesu Jang have done a lot of work on this front while building the current force fields. The aim of this work is to document the existing knowledge and to try out other possible variations using the new automated fitting infrastructure developed by Josh Horton and Simon Boothroyd (qcsubmit + bespoke workflows), which allows rapid prototyping with minimal setup.
Below is a list of the ways in which FF optimization can be tuned (some of which have already been studied):
- priors on the physical parameters
- weights on targets; currently we use
  - optgeo = 0.1
  - vibfreq = 1
  - torsion profile = 1
- scaling factors internal to each target
  - optgeo: scaling factors of
    - 0.05 Å for bond RMSD
    - 8 deg for angle RMSD
    - 20 deg for improper RMSD
    - dihedral RMSD is not included
  - vibfreq: a scaling factor of 200 cm⁻¹
  - torsions: a ramping function over relative energies of 0-1 kcal/mol and 1-5 kcal/mol, with points above 5 kcal/mol cut off entirely, so that the minimum-energy region is given more weight in the fit
- training a specific set of targets at a time in a well-defined sequence
  - optgeo+vibfreq first, and then TD, or
  - optgeo+TD first, and then vibfreq
- training an isolated set of parameters in sequence
  - bonds + angles first, and torsions later
- training torsions on simple molecules without strong steric effects (not exactly a tuning parameter; more in the area of training data selection)
- a better starting point for the parameters from the modified Seminario method
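To make the interplay of target weights, internal scaling factors and priors concrete, here is a minimal Python sketch of how they might combine into a single objective. The numbers (weights 0.1/1/1; scales of 0.05 Å, 8 deg, 20 deg and 200 cm⁻¹) are the ones listed above. The function names, the squared-and-summed functional forms, and the L2 prior penalty with a `penalty_strength` factor are assumptions for illustration, not ForceBalance's exact implementation.

```python
# Sketch only: TARGET_WEIGHTS, OPTGEO_SCALES and VIBFREQ_SCALE hold the values
# listed above; the function names and functional forms are assumptions.

TARGET_WEIGHTS = {"optgeo": 0.1, "vibfreq": 1.0, "torsion": 1.0}

# Denominators that make each internal-coordinate RMSD dimensionless.
OPTGEO_SCALES = {"bond": 0.05, "angle": 8.0, "improper": 20.0}  # Å, deg, deg

VIBFREQ_SCALE = 200.0  # cm^-1


def optgeo_term(rmsd):
    """Sum of squared, scaled RMSDs per coordinate type (dihedrals left out)."""
    return sum((rmsd[kind] / scale) ** 2 for kind, scale in OPTGEO_SCALES.items())


def vibfreq_term(mm_freqs, qm_freqs):
    """Mean squared MM-QM frequency error in units of VIBFREQ_SCALE."""
    diffs = [(mm - qm) / VIBFREQ_SCALE for mm, qm in zip(mm_freqs, qm_freqs)]
    return sum(d * d for d in diffs) / len(diffs)


def prior_penalty(values, initial, priors):
    """Hypothetical L2 penalty: displacement from the initial value in prior widths."""
    return sum(((k - k0) / width) ** 2 for k, k0, width in zip(values, initial, priors))


def total_objective(target_terms, values, initial, priors, penalty_strength=1.0):
    """Weighted sum of per-target terms plus the (assumed) prior penalty."""
    weighted = sum(TARGET_WEIGHTS[name] * term for name, term in target_terms.items())
    return weighted + penalty_strength * prior_penalty(values, initial, priors)
```

With these numbers, a bond RMSD of 0.05 Å costs as much as an angle RMSD of 8 deg before the 0.1 optgeo weight is applied, and a parameter that moves by one prior width adds 1 to the objective at `penalty_strength=1`. Widening a prior therefore lets that parameter move further at the same cost.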
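The torsion-profile ramp can be written as a per-conformer weight on the QM energy relative to the scan minimum. This is a minimal sketch. The constant weight below 1 kcal/mol and the zero weight above 5 kcal/mol follow the description above, while the 1/sqrt(1 + (E - 1)²) decay in between is our assumption of the ramp's shape; `torsion_weights` is a hypothetical name.

```python
import math


def torsion_weights(qm_energies, denom=1.0, upper=5.0):
    """Per-conformer weights from QM energies (kcal/mol) relative to the scan minimum.

    Below `denom` the weight is the constant 1/denom; between `denom` and `upper`
    it decays as 1/sqrt(denom**2 + (E - denom)**2) (assumed form); above `upper`
    the conformer is dropped from the fit (weight 0).
    """
    e_min = min(qm_energies)
    weights = []
    for energy in qm_energies:
        rel = energy - e_min  # energy above the lowest point of the scan
        if rel > upper:
            weights.append(0.0)
        elif rel < denom:
            weights.append(1.0 / denom)
        else:
            weights.append(1.0 / math.sqrt(denom ** 2 + (rel - denom) ** 2))
    return weights
```

The weights are continuous at 1 kcal/mol and drop to zero just above 5 kcal/mol, so high-energy conformers do not pull the fit away from the minimum-energy region.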
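The sequential strategies can be expressed as ordered lists of stages. Each stage names the active targets and the parameter types allowed to move, and each stage starts from the parameters produced by the previous one. Everything here (`Stage`, `PLANS`, `run_plan`, and the `fit` callable, which would wrap an actual optimizer run) is hypothetical and only illustrates the bookkeeping. Letting all parameter types move in the target-sequence plans, and keeping all targets active in the parameter-sequence plan, are also assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    """One fitting stage: active targets and the parameter types allowed to move."""
    targets: Tuple[str, ...]
    parameter_types: Tuple[str, ...]


ALL_TARGETS = ("optgeo", "vibfreq", "torsion_profile")
ALL_PARAMETERS = ("bonds", "angles", "torsions")

# Hypothetical plans mirroring the sequencing options listed above.
PLANS: Dict[str, List[Stage]] = {
    "optgeo+vibfreq, then TD": [
        Stage(("optgeo", "vibfreq"), ALL_PARAMETERS),
        Stage(("torsion_profile",), ALL_PARAMETERS),
    ],
    "optgeo+TD, then vibfreq": [
        Stage(("optgeo", "torsion_profile"), ALL_PARAMETERS),
        Stage(("vibfreq",), ALL_PARAMETERS),
    ],
    "bonds+angles, then torsions": [
        Stage(ALL_TARGETS, ("bonds", "angles")),
        Stage(ALL_TARGETS, ("torsions",)),
    ],
}


def run_plan(plan: List[Stage], fit: Callable[[dict, Stage], dict], params: dict) -> dict:
    """Run the stages in order, seeding each stage with the previous result."""
    for stage in plan:
        params = fit(params, stage)
    return params
```

Keeping the plan as data makes it cheap to compare orderings: the same `fit` wrapper is reused, and only the list of stages changes between experiments.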