2020-03-05 Force Field Release meeting notes

Date

Mar 5, 2020

Time	Item	Notes
10 min	QM training set generation strategy	Points summarized under “Meeting Summary”

Tree fingerprints, a 2D molecular similarity measure, have been used for the current validation set;

CIB suggested LINGO, a method that computes the similarity between molecules directly from their SMILES strings;

CIB: One concern with graph-based methods is that they can be too localized. Different scoring methods may be needed.
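LINGO treats a SMILES string as a bag of overlapping fixed-length substrings and compares two molecules by the substrings they share, so no molecular graph has to be built. The sketch below is a minimal illustration, not the implementation discussed in the meeting. It assumes substrings of length 4, a simplified normalization (`Cl` becomes `L`, `Br` becomes `R`, every digit becomes `0`) and a multiset Tanimoto score. Published LINGO variants differ in both normalization and scoring. All function names are hypothetical, and the inputs are assumed to be canonical SMILES from a single toolkit, because the score depends on how the SMILES string was written.

```python
import re
from collections import Counter


def normalize_smiles(smiles: str) -> str:
    """Simplified LINGO-style normalization (an assumption of this sketch):
    two-letter halogens become one character and every digit becomes '0'."""
    s = smiles.replace("Cl", "L").replace("Br", "R")
    return re.sub(r"\d", "0", s)


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count all overlapping length-q substrings ("LINGOs") of the normalized SMILES."""
    s = normalize_smiles(smiles)
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def lingo_similarity(smiles_a: str, smiles_b: str, q: int = 4) -> float:
    """Multiset Tanimoto over LINGO counts: sum of per-LINGO minima / sum of maxima."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    keys = a.keys() | b.keys()
    total = sum(max(a[k], b[k]) for k in keys)
    if total == 0:
        return 0.0  # both strings are shorter than q, so there is nothing to compare
    return sum(min(a[k], b[k]) for k in keys) / total


# Phenol vs. toluene: 5 of the 7 count-weighted LINGOs are shared.
print(round(lingo_similarity("c1ccccc1O", "c1ccccc1C"), 3))  # 0.714
```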

2. Training set and validation set

A diverse training set will tell us how general the input typing is, and a diverse validation set will show how general our parameter set is;

While focusing on training set generation, we should also consider how the validation set will be generated;

Moving troublesome molecules from the validation set into the training set for the next iteration is one strategy we may consider.
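The two set-construction ideas above (keeping the sets diverse, and promoting troublesome validation molecules into the next training set) can be sketched in Python. This is an illustration under stated assumptions, not a procedure agreed on in the meeting. The diversity step uses greedy MaxMin picking with distance defined as 1 minus similarity. MaxMin is one common diversity-selection technique and was not named in the discussion. `char_jaccard` is a toy stand-in for a real similarity such as tree fingerprints or LINGO. The per-molecule error values and the `threshold` are hypothetical, and all function names are illustrative.

```python
from typing import Callable, Sequence


def maxmin_pick(items: Sequence[str], n_pick: int,
                similarity: Callable[[str, str], float],
                seed_index: int = 0) -> list[int]:
    """Greedy MaxMin diversity selection with distance = 1 - similarity.

    Starting from items[seed_index], repeatedly add the item whose nearest
    already-picked item is farthest away. Returns indices in pick order.
    """
    n_pick = min(n_pick, len(items))
    if n_pick <= 0:
        return []
    picked = [seed_index]
    # Distance from every item to its nearest picked item so far.
    nearest = [1.0 - similarity(item, items[seed_index]) for item in items]
    while len(picked) < n_pick:
        best = max((i for i in range(len(items)) if i not in picked),
                   key=lambda i: nearest[i])
        picked.append(best)
        nearest = [min(d, 1.0 - similarity(item, items[best]))
                   for d, item in zip(nearest, items)]
    return picked


def promote_troublesome(validation_errors: dict[str, float],
                        training_set: set[str],
                        threshold: float) -> tuple[set[str], set[str]]:
    """Move validation molecules whose error exceeds `threshold` into training."""
    troublesome = {mol for mol, err in validation_errors.items() if err > threshold}
    return training_set | troublesome, set(validation_errors) - troublesome


def char_jaccard(a: str, b: str) -> float:
    """Toy stand-in similarity (hypothetical): Jaccard over the characters of two SMILES."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0


smiles = ["CCO", "CCCO", "c1ccccc1", "CCN"]
print(maxmin_pick(smiles, 3, char_jaccard))  # [0, 2, 3]

errors = {"CCO": 0.2, "c1ccccc1O": 1.5, "CCN": 0.4}  # hypothetical error values
train, valid = promote_troublesome(errors, {"CCCC"}, threshold=1.0)
print(sorted(train), sorted(valid))  # ['CCCC', 'c1ccccc1O'] ['CCN', 'CCO']
```

The sketch removes promoted molecules from the validation set so that they are not later used to test parameters that were fitted to them. The validation set therefore shrinks with each iteration unless new molecules are added.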