Date

25 Feb 2020

Determine new procedure for selecting QM datasets for fitting (potentially for May meeting release, if ready in time)
Divide up work to accomplish that

Coverage of current QM dataset (Su)

...

Talk with Jeff: Class structure or script? Where does it live?

Hyesu Jang to create a Confluence page listing all available datasets (with Jessica Maat, enlisting David Mobley as needed)
Jessica Maat to develop a prototype notebook that takes a FF, a set of molecules, and a target parameter (ID), and picks the five most diverse molecules that use that parameter. It should also take an optional argument listing molecules to exclude (so that molecules already used in other sets can be skipped)
Jessica Maat and Hyesu Jang to reach out to Jeffrey Wagner to discuss the architecture of the tools to be constructed, and a plan for their sustainability and where they should live. [Meeting scheduled for Wednesday, March 4, 10 am. -JM]
Hyesu Jang to determine how to enumerate protonation states and tautomers without doing semiempirical calculations (to speed up set preparation), talking to Chaya Stern if needed; if it can't be done via that route, get back to David Mobley for help with ideas. [Update from Chaya: “@Hyesu Jang, the states module in fragmenter generates reasonable protonation / tautomer states. It uses quacpac and does not need AM1 calculations so is fast. https://github.com/openforcefield/fragmenter/blob/master/fragmenter/states.py”]
Jessica Maat and Hyesu Jang to come up with their goal timeline
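
A minimal sketch of the selection step in the prototype-notebook action item above, to make the inputs and outputs concrete. It is not the algorithm decided at the meeting. Assumptions: molecules are given as SMILES strings; the mapping from each molecule to the parameter IDs it uses (`parameters_by_molecule`) is computed beforehand by labeling the molecules with the FF; character n-grams of the SMILES stand in for a real chemical fingerprint; diversity is approximated with a greedy MaxMin pick on Tanimoto distance. All function and argument names are hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set


def ngram_fingerprint(smiles: str, n: int = 2) -> Set[str]:
    """Stand-in fingerprint (assumption): the set of character n-grams of a SMILES string."""
    if len(smiles) < n:
        return {smiles}
    return {smiles[i:i + n] for i in range(len(smiles) - n + 1)}


def tanimoto_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|: 0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pick_diverse(
    parameters_by_molecule: Dict[str, Set[str]],
    parameter_id: str,
    n_pick: int = 5,
    exclude: Optional[Iterable[str]] = None,
) -> List[str]:
    """Pick up to n_pick mutually dissimilar molecules that use parameter_id.

    parameters_by_molecule maps SMILES -> parameter IDs assigned by the FF
    (hypothetical input; how it is produced is outside this sketch).
    exclude lists molecules already used in other sets.
    """
    excluded = set(exclude or ())
    # Keep only molecules that use the target parameter and are not excluded.
    candidates = sorted(
        smi
        for smi, params in parameters_by_molecule.items()
        if parameter_id in params and smi not in excluded
    )
    if not candidates:
        return []

    fingerprints = {smi: ngram_fingerprint(smi) for smi in candidates}

    # Seed with the first candidate in sorted order so the result is deterministic.
    picked = [candidates[0]]
    remaining = candidates[1:]

    # Greedy MaxMin: repeatedly add the candidate whose nearest
    # already-picked molecule is farthest away.
    while remaining and len(picked) < n_pick:
        best = max(
            remaining,
            key=lambda smi: min(
                tanimoto_distance(fingerprints[smi], fingerprints[p]) for p in picked
            ),
        )
        picked.append(best)
        remaining.remove(best)
    return picked
```

Calling `pick_diverse(parameters_by_molecule, "t1", exclude=already_used)` returns at most five SMILES; fewer are returned when fewer than five eligible molecules use the parameter. Swapping in a real fingerprint only requires replacing `ngram_fingerprint`.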

Decided to develop a systematic approach for selecting molecules for QM data generation and fitting, given a target dataset; this will be applied dataset by dataset to select new molecules for use in fitting
Will attempt to select/redesign a new QM dataset for fitting rather than simply extending our prior QM dataset
Decided on tentative algorithm for molecule selection approach