Driver

Approver

Contributors

Stakeholder

Simon Boothroyd

Owen Madin

David Mobley, Michael Shirts, Michael Gilson, Lee-Ping Wang, John Chodera, Karmen Condic-Jurkic

Objective

Develop a plan for a set of studies exploring which data should go into the next main OpenFF refit.

Due date

N / A

Status

Project Timeline

The tentative timeline for refitting is to spend:

Brainstorming

All rough ideas for the properties to train or test against should be sketched out on the Brainstorming page before being translated into a feasibility study in the table below, with a corresponding project page that serves as that study's plan.

Feasibility Studies

High Priority:

Study

Status

Project Page

Optimization with / without $H_{vap}$ while including mixture data.

Binary Mixture Data Feasibility Study

Optimization against $V_{excess}(x)$ or $\rho(x)$.

Binary Mixture Data Feasibility Study

How much pure data do we need to include to get a good parameterization? Can mixture data alone get the energetics right?

Binary Mixture Data Feasibility Study

For mixture properties, does including aqueous mixture data in addition to non-aqueous mixture data improve overall force field performance? In other words, should we include only non-aqueous data to avoid trying to correct for the water model?

TIP3P vs TIP3P-FB

How strongly are low-representation molecules affected by the amount of data? Do we need some form of regularization?

How reproducible are ForceBalance optimizations with physical properties as targets? If there is an effect, does this effect scale with the number of targets?
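
For the $V_{excess}(x)$ versus $\rho(x)$ study, it may help to recall that the two targets carry overlapping information: the excess molar volume of a binary mixture follows from the mixture density and the two pure-component densities. A minimal sketch of that standard relation is given below. The function name, argument names, and units (g/mol and g/mL) are illustrative assumptions and are not taken from any OpenFF tooling.

```python
def excess_molar_volume(x1, molar_mass_1, molar_mass_2, rho_mix, rho_1, rho_2):
    """Return the excess molar volume (mL/mol) of a binary mixture.

    Uses the standard relation
        V_excess = (x1*M1 + x2*M2)/rho_mix - x1*M1/rho_1 - x2*M2/rho_2
    with molar masses in g/mol and densities in g/mL (assumed units).
    """
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("x1 must be a mole fraction between 0 and 1")
    x2 = 1.0 - x1

    # Molar volume of the real mixture, from its measured or simulated density.
    mixture_volume = (x1 * molar_mass_1 + x2 * molar_mass_2) / rho_mix

    # Molar volume of the ideal mixture, from the pure-component densities.
    ideal_volume = x1 * molar_mass_1 / rho_1 + x2 * molar_mass_2 / rho_2

    return mixture_volume - ideal_volume
```

Because $V_{excess}$ is a difference between two similar molar volumes, its relative uncertainty may be larger than that of $\rho(x)$ itself, which is one reason the choice of target is worth testing.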

Medium Priority:

Study

Status

Project Page

How do density and $H_{vap}$ individually constrain the parameters?

Do OpenEye and RDKit produce AM1-BCC charges which are identical / close enough to not drastically change the FF performance?
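
One way to quantify "identical / close enough" in the charge comparison above is to compute per-molecule deviation metrics between the two charge sets. The sketch below is a hypothetical helper, not part of any toolkit: it assumes the charges have already been generated by each backend and that both lists use the same atom ordering. What counts as an acceptable deviation is left open, since that is the question the study itself would answer.

```python
import math


def compare_charge_sets(charges_a, charges_b):
    """Return (rmsd, max_abs_diff) between two per-atom partial charge lists.

    Both lists are assumed to describe the same molecule with identical
    atom ordering; charges are in units of the elementary charge.
    """
    if len(charges_a) != len(charges_b):
        raise ValueError("charge lists must have the same number of atoms")
    if not charges_a:
        raise ValueError("charge lists must not be empty")

    # Per-atom differences between the two backends.
    diffs = [a - b for a, b in zip(charges_a, charges_b)]

    # Root-mean-square deviation over all atoms.
    rmsd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    # Largest single-atom deviation, which flags localized disagreements.
    max_abs_diff = max(abs(d) for d in diffs)

    return rmsd, max_abs_diff
```

Reporting both numbers is a deliberate choice: a small RMSD can hide a single atom whose charge differs substantially between the two backends.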

Low Priority:

Study

Status

Project Page

What is the difference in fitting between a set where some molecules have only $H_{vap}$ and others have only density, versus a set where all molecules have both $H_{vap}$ and density?

Covariance of parameters?

A trial run with a sharply reduced set of LJ types? This could reduce the amount of experimental data needed, and Michael Schauperl's results indicate that a smaller set of types can still perform quite well.


Explore the change in performance with a split parameter ([#1:1]-[#6X4]-[#6X3]=[#7,#8]), which was identified as a target for splitting in the Binary Mixture Data Feasibility Study.

Actions

Assignee

Simon Boothroyd