QM Data Selection

We’re planning a new round of QM dataset selection for Parsley minor releases and potentially for subsequent force fields. In particular, instead of simply fitting to the first or most appealing QM data we have on hand, we want our datasets to meet three goals, in order of priority:

  1. All parameters are used at least once (“any” coverage)

  2. All parameters are used at least five times (“reasonable” coverage)

  3. Parameters are used in diverse chemical environments

This means designing a systematic procedure for selecting molecules for QM optimization/scanning from the datasets available to us (and, in some cases, finding unusual chemistries outside the datasets we have on hand).
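As a rough illustration of what such a procedure could look like for the first two goals, here is a minimal sketch. It assumes we already have, for each candidate molecule, the list of parameter IDs it exercises. The molecule labels, parameter IDs, and function names are all hypothetical, and counting each parameter once per molecule is an assumption about what “used” means. The greedy selection is just one simple option, not a decided method.

```python
from collections import Counter


def parameter_usage(candidates):
    """Count how many candidate molecules exercise each parameter.

    `candidates` maps a molecule label to the parameter IDs it uses
    (hypothetical input; each parameter counts once per molecule).
    """
    usage = Counter()
    for params in candidates.values():
        usage.update(set(params))
    return usage


def coverage_report(all_params, usage, reasonable=5):
    """Return (uncovered, sparse): parameters never used, and parameters
    used at least once but fewer than `reasonable` times."""
    uncovered = sorted(p for p in all_params if usage[p] == 0)
    sparse = sorted(p for p in all_params if 0 < usage[p] < reasonable)
    return uncovered, sparse


def greedy_select(candidates, all_params, target=5):
    """Greedily pick molecules until every parameter is used `target`
    times or no remaining candidate improves coverage."""
    usage = Counter()
    remaining = dict(candidates)
    selected = []

    def gain(name):
        # Number of still-undercovered parameters this molecule would add.
        return sum(
            1
            for p in set(remaining[name])
            if p in all_params and usage[p] < target
        )

    while remaining:
        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=gain)
        if gain(best) == 0:
            break
        selected.append(best)
        usage.update(set(remaining.pop(best)))
    return selected, usage


if __name__ == "__main__":
    # Toy data with made-up labels, for illustration only.
    candidates = {
        "mol_a": ["b1", "a1"],
        "mol_b": ["b1"],
        "mol_c": ["t1"],
    }
    all_params = {"b1", "a1", "t1", "t2"}
    print(coverage_report(all_params, parameter_usage(candidates)))
    print(greedy_select(candidates, all_params, target=1)[0])
```

Parameters that remain in the uncovered list after selection are the ones that would need chemistries from outside the datasets on hand. The sketch does not address the third goal (diverse chemical environments), which would need some measure of environment diversity on top of raw counts.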

We met to plan this on Feb. 25, 2020; key decisions and notes are here: https://openforcefield.atlassian.net/l/c/wHiQgmWR

Detailed plans will be made in https://openforcefield.atlassian.net/wiki/pages/createpage.action?spaceKey=FF&title=QM%20Datasets&linkCreation=true&fromPageId=146997319 (Data space).

Key personnel:

  • @Jessica Maat

  • @Hyesu Jang

  • @Lee-Ping Wang

  • @David Mobley

Relevant meeting notes: