Purpose: This page is for brainstorming which data should be included in the training and/or benchmarking of force fields against physical property data.
All ideas on this page that people wish to see acted upon should be translated into feasibility studies on the main page, each with a corresponding project plan page that outlines the rationale for the study, the intended outcomes, the priority of the study, and who intends to conduct it (see the https://openforcefield.atlassian.net/wiki/spaces/FF/pages/74055857 page as an example).
https://openforcefield.atlassian.net/wiki/spaces/FF/pages/122454022
Reproducibility of ForceBalance runs with phys prop targets
Because of the uncertainty inherent in the equilibrium simulations used to compare against phys prop targets, there is a stochastic element in each ForceBalance optimization. Is this element significant enough to make the results of an LJ optimization less reproducible?
This effect may be more apparent in optimizations against more targets, since the total amount of uncertainty might be increased.
It also may be dependent on the properties that are used as targets, since mixture properties will be noisier than pure properties on average.
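One way a reproducibility study like this could be analysed is to repeat the same optimization several times and summarise the spread of each final parameter across the replicates. The following is a minimal sketch under that assumption and is not part of ForceBalance; the function name `replicate_spread`, the parameter labels, and the numbers in the usage example are hypothetical placeholders, not real results.

```python
import numpy as np


def replicate_spread(final_params):
    """Summarise the spread of optimized parameters across replicate runs.

    final_params maps a parameter label to the list of final values obtained
    from independent repeats of the same optimization (hypothetical input).
    """
    summary = {}
    for name, values in final_params.items():
        values = np.asarray(values, dtype=float)
        mean = float(values.mean())
        # Sample standard deviation (ddof=1) across the replicate runs.
        std = float(values.std(ddof=1))
        # Relative spread; undefined when the mean is zero.
        rel_std = std / abs(mean) if mean != 0.0 else float("nan")
        summary[name] = {"mean": mean, "std": std, "rel_std": rel_std}
    return summary


# Placeholder values for three replicate runs, for illustration only.
example = {
    "epsilon_a": [0.10, 0.11, 0.12],
    "rmin_half_a": [1.90, 1.90, 1.90],
}
for name, stats in replicate_spread(example).items():
    print(name, stats)
```

Comparing `rel_std` between optimizations with few and many targets, or between pure-only and mixture-containing target sets, would speak to the two questions raised above.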
A good place to start would be with the sets from https://openforcefield.atlassian.net/wiki/spaces/FF/pages/122454022
Reduce the number of VdW types in the force field:
Differentiate LJ types for ketone/amide/imine hydrogens
Pure Property Data which may be used during fitting:
Density is a safe option and likely should be included.
Hvap may still be included to ensure we retain some information about the cohesive energies.
Which and how much Hvap data should be included?
Only ‘non-polar’ molecules (need to define ‘non-polar’ metric + cutoff), the same amount of data as density or less?
Should we fit against a dataset that consists entirely of enthalpies of mixing, with no Hvap?
No surface tension as it is too much of an unknown to implement / test / include by May.
No dielectric constant data until we begin to re-fit the electrostatics.
Mixture Property Data which may be used during fitting:
The current target candidates are some combination of enthalpy of mixing, binary mass density and excess molar volumes.
Should we include aqueous + non-aqueous mixture data, or only non-aqueous (possible feasibility study)?
Should we use TIP3P or TIP3P-FB? Does changing the water model make a noticeable difference when fitting aqueous mixture data (possible feasibility study)?
What can we use for benchmarking?
What partitioning data is there, how is it validated, and is it open?
Activity coefficients, osmotic coefficients
Reduce the number of overly halogenated compounds.
More commonly used compounds such as benzene, ethane, ethanol, etc.?
Try to include as many common solvents from the GRAS list as possible.
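Of the mixture target candidates listed above, binary mass density and excess molar volume are closely related: the excess molar volume follows from the mixture density and the two pure-component densities at the same temperature and pressure, via the standard definition V^E = (x1 M1 + x2 M2) / rho_mix - x1 M1 / rho_1 - x2 M2 / rho_2. The sketch below illustrates that relation only; the function name and the unit choices (g/mol and g/cm^3) are assumptions made for this example.

```python
def excess_molar_volume(x1, m1, m2, rho_mix, rho1, rho2):
    """Return the excess molar volume of a binary mixture in cm^3/mol.

    x1      : mole fraction of component 1 (component 2 is 1 - x1)
    m1, m2  : molar masses in g/mol
    rho_mix : mass density of the mixture in g/cm^3
    rho1/2  : mass densities of the pure components in g/cm^3
    """
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("x1 must be a mole fraction between 0 and 1")
    x2 = 1.0 - x1
    # Molar volume of the real mixture.
    v_mix = (x1 * m1 + x2 * m2) / rho_mix
    # Molar volume of an ideal mixture (volumes are additive).
    v_ideal = x1 * m1 / rho1 + x2 * m2 / rho2
    return v_mix - v_ideal
```

Because the same pure and mixture densities enter both quantities, a target set containing binary mass density and excess molar volume for the same systems is not adding two fully independent pieces of information.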
Data which may be used for Benchmarking:
Rough targets for set sizes:
Properties
Surface tension data would be good (will need to figure out system setup issues)
Vapor pressure (can we do those calculations robustly?)
Host-Guest: Feasibility study?
Partitioning data (LogP, …) if we can find high-quality open data.
Endpoint heats of mixing/vapor pressure (i.e. properties at the mole fraction extremes).
No: Protein-ligand binding? Would be good to have as a more relevant benchmark, but could be expensive/time-consuming.
Dielectric constants (we mostly get these for free anyway from the pure data simulations)
Benchmark on systems where one component is a side chain analogue parameterised with Amber FF parameters (discuss @Simon Boothroyd @Michael Shirts @Owen Madin)
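As a rough illustration of why dielectric constants come almost for free from the pure simulations: the static dielectric constant can be estimated from fluctuations of the total box dipole moment in a trajectory that has already been generated for the density. The sketch below assumes conducting (tin-foil) boundary conditions and SI units; the function name and its inputs are hypothetical and not tied to any particular simulation package.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity, F/m
KB = 1.380649e-23        # Boltzmann constant, J/K


def static_dielectric(dipoles, volume, temperature):
    """Estimate the static dielectric constant from dipole fluctuations.

    dipoles     : array of shape (n_frames, 3), total box dipole in C*m
    volume      : average box volume in m^3
    temperature : temperature in K
    Assumes conducting (tin-foil) boundary conditions.
    """
    dipoles = np.asarray(dipoles, dtype=float)
    mean_dipole = dipoles.mean(axis=0)
    # <M.M> - <M>.<M>: variance of the total dipole moment.
    fluctuation = (dipoles ** 2).sum(axis=1).mean() - mean_dipole @ mean_dipole
    return 1.0 + fluctuation / (3.0 * EPS0 * volume * KB * temperature)
```

The estimate converges slowly for polar liquids, so the trajectory length needed for a stable value may exceed what the density alone requires.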