Develop and improve tools for automated chemical perception.

Objectives for 2020:

  • Identify pathological cases in the fitting procedure and develop an automated protocol for detecting them

  • Automatically determine, in a data-driven manner, how many parameters/types (SMIRKS patterns) are needed


Project leader:

Team members: @Jessica Maat @Trevor Gokey @Hyesu Jang @Lee-Ping Wang @Chaya Stern @Michael Gilson @John Chodera @Josh Fass @David Mobley

Problem Space

Scientific questions

  • Diagnostic tools: when should a new parameter be assigned for bonds, angles and torsions (for example, when the distribution of values for a single SMIRKS pattern is multimodal)? How can we easily improve chemical perception? Can Bayesian sampling be used to explore the number of parameters and the associated value distributions?

  • How do we identify problems in particular parameters? How do we fix them?

  • How much coverage of each parameter type is needed to get a good force field (FF)?

  • How do we know when our typing is bad?

  • How can Wiberg bond orders (WBO) be used to improve accuracy while reducing the parameter space?

  • How can ChemPer be leveraged to address some of these questions?

  • How do we determine how many QM calculations we might need to achieve high-quality parameter coverage of a dataset like ChEMBL?

  • Can Hessians identify soft degrees of freedom that should be driven (e.g. impropers, rings)?

  • How can small molecule crystal structure data best be used by OpenFF?

  • How do we select the right molecules to train on?

    • Parameter overrepresentation? 

    • How do we identify problems?

    • How do we fix them?

  • Molecule set selection and expansion: 

    • Protonation / tautomers

    • Ions (carboxylates/salt bridges, monovalent ions): what data should we use?

    • Element expansion: Si, Br, P, B

Possible solutions?
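
One possible diagnostic for the multimodality question above, offered only as a minimal sketch and not as an agreed OpenFF method: for all values assigned to a single SMIRKS pattern (for example, optimized bond lengths), compare a one-Gaussian fit against a two-Gaussian mixture using the Bayesian information criterion (BIC), and flag the pattern as a candidate for splitting when the mixture is clearly preferred. The function names, the `margin` threshold, the variance floor and the synthetic bond lengths are all illustrative assumptions.

```python
import numpy as np


def gaussian_logpdf(x, mean, var):
    """Elementwise log-density of a normal distribution (broadcasts over inputs)."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def loglik_one_gaussian(x, var_floor):
    """Log-likelihood of a single Gaussian fitted to x."""
    return gaussian_logpdf(x, x.mean(), max(x.var(), var_floor)).sum()


def loglik_two_gaussians(x, var_floor, n_iter=200):
    """Log-likelihood of a two-component Gaussian mixture fitted with plain EM."""
    # Deterministic start: component means at the quartiles, shared variance.
    means = np.percentile(x, [25.0, 75.0])
    variances = np.full(2, max(x.var(), var_floor))
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        log_joint = np.log(weights) + gaussian_logpdf(x[:, None], means, variances)
        log_norm = np.logaddexp(log_joint[:, 0], log_joint[:, 1])
        resp = np.exp(log_joint - log_norm[:, None])
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12  # small offset keeps weights strictly positive
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = np.maximum(
            (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk, var_floor
        )
    log_joint = np.log(weights) + gaussian_logpdf(x[:, None], means, variances)
    return np.logaddexp(log_joint[:, 0], log_joint[:, 1]).sum()


def suggests_split(values, margin=10.0):
    """Return True if a two-mode model beats a one-mode model by `margin` in BIC.

    `margin` is an arbitrary illustrative threshold, not a validated setting.
    """
    x = np.asarray(values, dtype=float)
    n = x.size
    # Floor on the variance (an assumption) to stop a component collapsing.
    var_floor = max(1e-3 * x.var(), 1e-12)
    # BIC = k * ln(n) - 2 * log-likelihood, with k free parameters (2 vs. 5).
    bic_one = 2 * np.log(n) - 2.0 * loglik_one_gaussian(x, var_floor)
    bic_two = 5 * np.log(n) - 2.0 * loglik_two_gaussians(x, var_floor)
    return bool(bic_one - bic_two > margin)


# Synthetic bond lengths in angstroms (made-up numbers, for illustration only).
rng = np.random.default_rng(0)
one_mode = rng.normal(1.09, 0.01, size=300)
two_modes = np.concatenate(
    [rng.normal(1.35, 0.01, size=150), rng.normal(1.45, 0.01, size=150)]
)
print(suggests_split(one_mode), suggests_split(two_modes))
```

A check like this only says that one SMIRKS pattern may be covering two chemically distinct populations; it does not say which new pattern would separate them, and it assumes each population is roughly Gaussian, which is unlikely to hold for periodic quantities such as torsions.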


Current research topics

Idea bucket

This is the space to collect scientific questions, ideas, solution proposals, etc. for future work in the chemical perception space.

