OpenFF Science Roadmap 2020

This scientific roadmap includes the next two planned force field releases and a list of scientific studies which need to be performed in 2020. Each study has a priority assigned to it. This roadmap will be continuously updated.

Labels

Category	Labels
Priority	\| \|
Effort	\| \|
Status	\| \|

Force Fields

Upcoming force field versions:

Version	Codename	Features	Expected release date	Release date
`openff-1.2.0`	Parsley	Redesigned QM dataset for parameterization with better/broader coverage; parameter fixes	May 2020	Jun 1, 2020
`openff-2.0.0`	Sage	LJ refit (based on the ongoing feasibility study); limited WBO torsion interpolation for systems for which data already exists (more torsional data needed for wide-ranging application)	Late 2020 (November)

Scientific studies

The list below covers the scientific studies that need to be performed in 2020. It will be updated every 3 months, as suggested in the science project management workflow. Each study should be linked to its Confluence page with more information about study design, execution, and results. The study design should be submitted before the study begins.

Where possible, estimate start and end dates before a study starts. Record the actual start and end dates for each study below the estimated dates.

Study	Priority	Effort	Science dependencies	Infrastructure dependencies	Comment	Start date	End date	Status	Driver/Team
Chemical perception
Addition of new parameters – manually fixing problems				Made easier by benchmarking dashboard (Optional)	Made easier by benchmarking dashboard (Optional)				Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated)
Automated typing inference from scratch					Organise a meeting to coordinate efforts.				Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Huefner may assist here. Owen Madin interested.
Mixture Properties
Binary Mixture Data Feasibility Study									Driver: Simon Boothroyd Team: Michael Shirts Owen Madin
Non-bonded optimization									Driver: Simon Boothroyd Team: Michael Shirts Owen Madin
Chemical potential-like properties			Non-bonded optimization	Implementation in `Evaluator`	Need to evaluate the data first (testing needed)				Simon Boothroyd Spinoff (student)
Octanol-water partition coefficients				Implementation in `Evaluator`	Data needed, harder problem				Simon Boothroyd spinoff (Student)
Data coverage and availability			Feasibility studies		Check the available data and identify missing data points. How to address the gaps will be decided later. We will use what we have for Sage.			Ongoing	Simon Boothroyd Owen Madin Michael Shirts
QM Data Generation
QM dataset selection (training data)					Need to expand to benchmarking set.				David Mobley Jessica Maat (Deactivated) Hyesu Jang
Benchmarking/re-evaluating our choice of QM theory				(Optional) QC Dataset submission infrastructure	Test of the whole torsiondrive. Keep within 10-50 torsiondrives. More is better.				Lead: Hyesu Jang, with Lee-Ping Wang. Hyesu Jang is also leading molecule set selection with help from Jessica Maat (Deactivated) and Victoria Lim (Deactivated)
Protomer/tautomer enumerated molecules			QM level of theory validation (QMLoTV)	Protonation/tautomer enumeration integration (Joshua Horton doing OE version in toolkit; there’s currently no good protonation state enumeration with RDKit – see )
Data on molecules with nonzero formal charges			QM level of theory validation (QMLoTV)	(Optional) QC Dataset submission infrastructure
Enamine REAL fragment coverage				Automated fragmentation integration (Joshua Horton)
Ligand Expo fragment coverage				Automated fragmentation integration (Joshua Horton)	Ligand Expo has higher priority than Enamine REAL.
Richer torsion data for WBO fitting				WBO torsion implementation					(person needed to continue work of Chaya Stern (Deactivated))
Biopolymer data selection (ensure sidechain data is available in QCA)						ASAP			David Cerutti (Deactivated)
Biopolymer data computation				(Optional) QC Dataset submission infrastructure					David Cerutti (Deactivated)
More efficient torsion sampling with fewer grid points during scan
Fitting
Addition of new parameters – manually fixing problems						Ongoing			Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated)
LJ refitting (Sage)			Non-bonded optimization						Simon Boothroyd and Owen Madin
WBO refitting (Sage)			More torsion data	WBO torsion implementation	Implement what Chaya has already done, as soon as infrastructure is ready.	After May meeting	Late 2020 (Sep 2020)		Jessica Maat (Deactivated) Hyesu Jang Someone else to continue where Chaya left off
BCC refitting			LJ refit; patterns for BCCs (could start with something simple like bond SMARTS).	ChargeIncrementModel implementation (early May)					Person needed () David Mobley can help
Study how to set prior widths and weights for different sorts of data during FF optimization									Lee-Ping Wang Hyesu Jang Spinoff?
Value of data generated “incidentally” during torsiondrive in fitting, e.g. optimization snapshots, gradients, energies (low control over these data points)				Some parts of Bespoke workflow					Joshua Horton
Benchmarking
Small reference system for fast testing of FE infrastructure – 5-10 small reference systems, possibly subset of SAMPL challenges, for comparison of different free energy methods to avoid using large P-L systems for test calculations					Should use SAMPLing challenge systems plus a couple more similar ones.		ASAP		David Mobley Michael Gilson John Chodera David Hahn – owner
Benchmarking/re-evaluating our choice of QM theory									Lee-Ping Wang
CCDC data selection/release
Create a list of tests to judge the “quality” of biopolymer FF with our scientific advisory board					Organise the meeting with our IAB, invite to May meeting		April / May		David Cerutti (Deactivated)
`openff-1.2.0` (Parsley) benchmarking			Minor release of Parsley	Benchmarking dashboard			Mid 2020
`openff-2.0.0` (Sage) benchmarking			Release of Sage	Benchmarking dashboard			Late 2020
Biopolymers
Which quantum method should we use for biopolymers (should it be the same as small molecules)?			QM benchmarking study						Lee-Ping Wang David Cerutti (Deactivated)
Feasibility/benchmarking studies of torsional CMAPs			After protein FF implementation	CMAP support in OFFTK					David Cerutti (Deactivated)
Feasibility/benchmarking studies of other cross-terms				Support for cross-terms in OFFTK
Charges
GCN charge model					In a few steps: (1) conda-installable tool to assign charges; (2) integration of the tool into OFFTK under the ChargeIncrementModel keyword (and exposure of relevant keywords)				John Chodera Yuanqing Wang
Off-site charge SMIRKS definition/fitting/benchmarking				VirtualSite support in OFFTK	Helpful discussion in Slack: https://openforcefieldgroup.slack.com/archives/C1907SGET/p1590251452068100				(but interface with David Cerutti (Deactivated) work?)
Bayesian inference and surrogate modeling
Testing Bayesian inference on an analytical model									Owen Madin
Generalizing analytical model for Bayesian inference and testing methods					We don’t need Bayesian framework to work immediately				Simon Boothroyd
Constructing full Bayesian architecture with reweighting and simulation to build surrogate models			Analytical Bayesian inference testing						Simon Boothroyd Owen Madin Matt Thompson ?
Automated typing inference from scratch									Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Wulsdorf may assist here.
Other
Water co-optimization planning study (to be executed later) – discuss with Lee-Ping Wang									spinoff
Thinking about metals / ions / salts / ionic liquids									Owen Madin Matt Thompson spinoff