2020-03-05 Force Field Release meeting notes

Date

Mar 5, 2020

Participants

  • @Hyesu Jang

  • @David Mobley

  • @Lee-Ping Wang

  • @Jessica Maat (Deactivated)

  • @Christopher Bayly

  • @Simon Boothroyd

  • @Owen Madin

  • @Daniel Smith (Deactivated)

Discussion topics

Time

Item

Notes

10 min

QM training set generation strategy

  • Points summarized under “Meeting Summary”

Meeting Summary

 

  1. Discussion about clustering methods

  • The Tree fingerprint, a 2D molecular similarity measure, has been used for the current validation set;

  • CIB suggested LINGO, a method that calculates intermolecular similarity directly from SMILES strings;

  • CIB: One concern with graph-based methods is that they can be too localized. Different scoring methods may be needed.
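
For reference, the LINGO idea mentioned above can be sketched in a few lines. This is a minimal illustration, not the implementation discussed in the meeting: it assumes substrings of length q = 4, replaces every digit with "0" as a rough stand-in for ring-closure normalization (this would also alter isotope and charge digits), and scores with a plain multiset Tanimoto (shared counts over combined counts), which is simpler than the normalization used in the published method. The names lingos and lingo_similarity are hypothetical.

```python
import re
from collections import Counter


def lingos(smiles: str, q: int = 4) -> Counter:
    """Count the overlapping length-q substrings of a SMILES string."""
    # Assumption: map every digit to "0" so that ring-closure numbering
    # does not change the substrings being compared.
    normalised = re.sub(r"\d", "0", smiles)
    return Counter(normalised[i:i + q] for i in range(len(normalised) - q + 1))


def lingo_similarity(smiles_a: str, smiles_b: str, q: int = 4) -> float:
    """Multiset Tanimoto over substring counts (simplified scoring)."""
    a, b = lingos(smiles_a, q), lingos(smiles_b, q)
    keys = a.keys() | b.keys()
    if not keys:
        # Both strings are shorter than q, so there is nothing to compare.
        return 0.0
    shared = sum(min(a[k], b[k]) for k in keys)
    combined = sum(max(a[k], b[k]) for k in keys)
    return shared / combined


if __name__ == "__main__":
    print(lingo_similarity("c1ccccc1", "c1ccccc1O"))
```

Because the comparison works on raw text, the result depends on how each SMILES string is written, so inputs would normally be canonicalized first.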

 

  2. Training set and validation set

  • A diverse training set will inform the generality of the input typing, and a diverse validation set will show how general our parameter set is;

  • While focusing on training set generation, we should also consider how to generate the validation set;

  • Adding troublesome molecules from the validation sets to the training set for the next iteration is one strategy we may consider.