Info
This scientific roadmap includes the next two planned force field releases and a list of scientific studies which need to be performed in 2020. Each study has a priority assigned to it. This roadmap can be continuously updated, but the overall status and priorities will be revised and updated in June 2020.

Force Fields

Upcoming force field versions:

Version

Codename

Features

Expected release date

Comment / Blocker

openff-1.2.0

Parsley

Redesigned QM dataset for parameterization with better/broader coverage
Parameter fixes

Expected: May 2020

Released: Jun 3, 2020

openff-1.3.0

Parsley

Improvements to priors by element, etc., allowing for more accurate fitting
Filtering of molecules used in fitting certain torsions (esp. for amides) to improve parameter quality (came up in the process of fixing a problem elsewhere)
Discussed in this release call: 2020-09-17 FF Release Meeting notes

When ready. September-December 2020.

openff-2.0.0

Sage

LJ refit (based on the ongoing feasibility study)
Limited WBO torsion interpolation for systems for which data already exists (more torsional data is needed for wide-ranging application)

Late 2020 (November)

Expected: Release date to be set by November 2020, with a three-month lead time. Likely release between December 2020 and February 2021.

How fast can we do WBO interpolations? (Pavan)
Simon Boothroyd needs to get in touch with David Hahn and folks from the Chodera lab to discuss some free energy benchmarking after LJ fitting
Late 2020/early 2021 still feasible

Scientific studies

This is the list of scientific studies that need to be performed in 2020. It will be updated every 3 months, as suggested in the science project management workflow. Each study should be linked to its Confluence page with more information about study design, execution and results. The study design should be submitted before the study begins.

Estimate start and end dates, when possible, before the study has started. Record the real start and end dates for each study below the estimated dates.

Labels

Category

Labels

Priority

Status

colour	Red
title	high

|

Status

colour	Yellow
title	MEDIUM

|

Status

colour	Green
title	LOW

Effort

Status

colour	Red
title	high

|

Status

colour	Yellow
title	MEDIUM

|

Status

colour	Green
title	LOW

Status

title	Not started

|

Status

colour	Yellow
title	In Progress

|

Status

colour	Blue
title	PROTOTYPE

|

Status

colour	Green
title	Completed

|

Status

colour	Red
title	BLOCKED

|

David Mobley can help

Study

Priority

Effort

Science dependencies

Infrastructure dependencies

Comment

Start date

End date

Status

Driver/Team

Chemical perception

Addition of new parameters – manually fixing problems

Status

colour	Red
title	high

Status

colour	Red
title	high

Made easier by benchmarking dashboard (Optional)

Made easier by benchmarking dashboard (Optional)

Status

colour	Yellow
title	in progress

Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated)

Automated typing inference from scratch

Status

colour	Yellow
title	Medium

Status

colour	Red
title	high

Organise a meeting to coordinate efforts.

Update: Tobias Huefner is doing some basic research, but we don’t have a timeline defined here. Perhaps a more specific study to look at typing issues similar to Schauperl’s work on LJ typing.

Status

colour	Yellow
title	in progress

Slowly

Full-time person needed – to be discussed further. Work of Josh Fass (Deactivated) and Tobias Huefner may assist here. Owen Madin interested. Trevor Gokey is also actively working in this area.

Mixture Properties

Binary Mixture Data Feasibility Study

Status

colour	Red
title	high

In the writing stage.

Status

colour	Green
title	COMPLETED

Driver: Simon Boothroyd
Team: Michael Shirts Owen Madin

Non-bonded optimization

Status

colour	Red
title	high

Status

colour	Red
title	high

Parent study; in a long-term progress stage.

Status

colour	Yellow
title	in progress

Driver: Simon Boothroyd
Team: Michael Shirts Owen Madin

Chemical potential-like properties

Status

colour	Yellow
title	MEDIUM

Non-bonded optimization

Implementation in Evaluator

Need to evaluate the data first (testing needed). Add Confluence page here.

Status

colour	Yellow
title	In Progress

Status

colour	Blue
title	PROTOTYPE

Simon Boothroyd Spinoff (student)

Status

colour	Purple
title	spinoff

Solvent-solvent partition coefficients

Status

colour	Yellow
title	MEDIUM

Implementation in Evaluator

Data needed, harder problem.
Update: Access to solubility phase data is less of a problem now (MNSOL).

Status

title	not started

Simon Boothroyd spinoff (Student)

Status

colour	Purple
title	spinoff

Data coverage and availability

Status

colour	Red
title	high

Feasibility studies

Check the available data and identify missing data points. We will decide later what to do about the gaps; for Sage, we will use what we have.

Ongoing

Simon Boothroyd Owen Madin Michael Shirts

QM Data Generation

QM dataset selection (training data) for OpenFF-1.2.0

Status

colour	Red
title	high

Need to expand to benchmarking set.

Status

colour	Green
title	completed

David Mobley Jessica Maat (Deactivated) Hyesu Jang

QM dataset selection for OpenFF-2.0.0

Status

colour	Red
title	high

Status

colour	Yellow
title	in progress

David Mobley Jessica Maat (Deactivated) Hyesu Jang
Lead: Hyesu Jang ??

Benchmarking/re-evaluating our choice of QM theory

Status

colour	Red
title	high

(Optional) QC Dataset submission infrastructure

Test of the whole torsiondrive. Keep within 10-50 torsiondrives. More is better.

Some datasets are ready, but analysis is still required (Hyesu Jang); Pavan might help with this.
No changes will be made to our fitting data until after the 2.0 release, but a decision may be made before that (e.g., while fitting).

Status

colour	Yellow
title	in progress

Hyesu Jang lead; Lee-Ping Wang Pavan Behara

Hyesu Jang also leading molecule set selection with help from Jessica Maat (Deactivated) and Victoria Lim (Deactivated)

Protomer/tautomer enumerated molecules

Status

colour	Red
title	high

QM level of theory validation (QMLoTV)

Protonation/tautomer enumeration integration (Joshua Horton doing OE version in toolkit; there’s currently no good protonation state enumeration with RDKit – see https://github.com/openforcefield/openforcefield/issues/526)

Enumeration currently works only with OpenEye.

Status

colour	Blue
title	PROTOTYPE

Joshua Horton

Data selection for ionic species

What kind of experimental data would we need to include charged molecules?

Status

title	Not started

Lead: Simon Boothroyd (oversight, to assemble team?)

Data on molecules with nonzero formal charges

Status

colour	Red
title	high

QM level of theory validation (QMLoTV)

(Optional) QC Dataset submission infrastructure

Couples to benchmarking

Status

title	Not started

Pavan Behara

Enamine REAL fragment coverage

Status

colour	Yellow
title	MEDIUM

Automated fragmentation integration (Joshua Horton)

Status

colour	Yellow
title	in progress

Trevor Gokey

Ligand Expo fragment coverage

Status

colour	Yellow
title	MEDIUM

Automated fragmentation integration (Joshua Horton)

Ligand Expo has higher priority than Enamine REAL.

Status

title	Not started

Richer torsion data for WBO fitting

Status

colour	Green
title	Low

WBO torsion implementation

What data to generate and

(Person needed to continue work of Chaya Stern (Deactivated); probably will be Pavan Behara with input from Jessica Maat (Deactivated), or vice versa. Overseen by Simon Boothroyd?, eventually coordinating with Simon Boothroyd as he moves to his new position.)

Biopolymer data selection (ensure sidechain data is available in QCA)

Status

colour	Red
title	high

ASAP

One dataset ready, but a lot more data needs to be generated if we want sidechain sampling

Status

colour	Yellow
title	in progress

David Cerutti (Deactivated)

Biopolymer data computation

Status

colour	Yellow
title	MEDIUM

(Optional) QC Dataset submission infrastructure

Status

colour	Yellow
title	in progress

David Cerutti (Deactivated) David Dotson

More efficient torsion sampling with fewer grid points during scan

Status

colour	Green
title	Low

Status

colour	Purple
title	spinoff

Fitting

Addition of new parameters – manually fixing problems

Status

colour	Red
title	high

Status

colour	Red
title	high

Ongoing

Status

colour	Yellow
title	in progress

Hyesu Jang David Mobley Jessica Maat (Deactivated) Victoria Lim (Deactivated)

LJ refitting (Sage)

Status

colour	Red
title	high

Non-bonded optimization

Status

colour	Yellow
title	in progress

Simon Boothroyd and Owen Madin

WBO refitting (Sage)

Status

colour	Red
title	high

More torsion data

WBO torsion implementation: Done.

Implement what Chaya has already done. As soon as infrastructure is ready. Infrastructure ready.

Pavan Behara and Jessica Maat (Deactivated) to divide up chemical space for fitting/test fixes. Hyesu Jang will do central fitting of production FF. Additional discussion: https://openforcefield.atlassian.net/wiki/spaces/OFFO/pages/670629936/90-day+plan+Onboarding+for+Pavan+Behara?accessType=view&grantAccess=true&username=5f592e4d0b6be2006ef56679&userFullName=Pavan%20Behara#Notes-from-9%2F16%2F20-planning-meeting-(feel-free-to-migrate)

After May meeting

Late 2020 (Sep 2020)

Status

colour	Yellow
title	in progress

Jessica Maat (Deactivated), Hyesu Jang, and Pavan Behara. Someone else to continue where Chaya left off. Pavan Behara will eventually oversee.

BCC refitting

Status

colour	Red
title	high

LJ refit

Patterns for BCCs; could start with something simple like bond SMARTS.

ChargeIncrementModel implementation (early May)

Status

colour	Yellow
title	in progress

Simon Boothroyd Owen Madin

Study how to set prior widths and weights for different sorts of data during FF optimization

Status

colour	Yellow
title	medium

Became higher priority due to need for fixes. Headed towards a 1.3 release late 2020.

Status

colour	Yellow
title	in progress

Lee-Ping Wang Hyesu Jang Spinoff?

Value of data generated “incidentally” during torsiondrive in fitting, e.g. optimization snapshots, gradients, energies (low control over these data points)

Status

colour	Green
title	Low

Some parts of Bespoke workflow OpenFF Bespokefit

Once we have more people working on fitting, someone can run this study

Joshua Horton

Status

colour	Purple
title	spinoff

Benchmarking

Small reference system for fast testing of FE infrastructure – 5-10 small reference systems, possibly subset of SAMPL challenges, for comparison of different free energy methods to avoid using large P-L systems for test calculations

Status

colour	Green
title	Low

Status

colour	Green
title	Low

Should use SAMPLing challenge systems plus a couple more similar ones.

ASAP

Status

title	Not started

David Mobley Michael Gilson John Chodera David Hahn – owner

Benchmarking/re-evaluating our choice of QM theory

Status

colour	Red
title	high

Status

colour	Yellow
title	medium

See above; to be done while fitting 2.0 (datasets ready). Hyesu Jang leading.

Status

title	Not started

Lee-Ping Wang Hyesu Jang Pavan Behara

CCDC data selection/release

Status

colour	Green
title	Low

Status

colour	Purple
title	spinoff

Create a list of tests to judge the “quality” of biopolymer FF with our scientific advisory board

Status

colour	Red
title	HIGH

Organise the meeting with our IAB, invite to May meeting. Done.

DC and MS will start conversations to get this going.

April / May

Status

colour	Yellow
title	in progress

David Cerutti (Deactivated) Michael Shirts

openff-1.2.0 (Parsley) benchmarking

Minor release of Parsley

Benchmarking dashboard

Done in preprint form, but no benchmarking dashboard. Still need torsion benchmarking; utilize work just done for OpenFF 1.0 paper.

JDC is trying to get a complete FE set run by D. Rufa.

Mid 2020

Done-ish

openff-2.0.0 (Sage) benchmarking

Release of Sage

Benchmarking dashboard

Late 2020

Status

title	Not started

Biopolymers

Which quantum method should we use for biopolymers (should it be the same as small molecules)?

Status

colour	Yellow
title	medium

QM benchmarking study

Short term – using the same method and same level of theory as ANI (wB97D)

Status

title	Not started

Lee-Ping Wang David Cerutti (Deactivated)

Feasibility/benchmarking studies of torsional CMAPs

Status

colour	Yellow
title	medium

After protein FF implementation

CMAP support in OFFTK

Status

title	Not started

David Cerutti (Deactivated)

Feasibility/benchmarking studies of other cross-terms

Status

colour	Green
title	Low

Support for cross-terms in OFFTK

MS – Importance of cross-terms will be related to a number of types

Status

title	Not started

Charges

GCN charge model

Status

colour	Red
title	high

In a few steps:

conda-installable tool to assign charges
integration of tool into OFFTK under ChargeIncrementModel keyword (and exposure of relevant keywords)

Status

colour	Yellow
title	in progress

John Chodera Yuanqing Wang Josh Fass (Deactivated) (maybe John Herr)

Off-site charge SMIRKS definition/fitting/benchmarking

Status

colour	Yellow
title	medium

Status

colour	Red
title	high

VirtualSite support in OFFTK

Helpful discussion in Slack: https://openforcefieldgroup.slack.com/archives/C1907SGET/p1590251452068100

Infrastructure expected in September 2020

Status

title	Not started

Status

colour	Purple
title	spinoff

(but interface with David Cerutti (Deactivated) work?)

Bayesian inference and surrogate modeling

Testing Bayesian inference on an analytical model

Status

colour	Yellow
title	medium

Status

colour	Green
title	Low

Nearing completion

Status

colour	Yellow
title	in progress

Owen Madin

Generalizing analytical model for Bayesian inference and testing methods

Status

colour	Red
title	LOW

Status

colour	Yellow
title	medium

We don’t need the Bayesian framework to work immediately. Proof-of-concept work to give us an analytical form for early testing.

Status

colour	Yellow
title	in progress

Simon Boothroyd

slower

Owen Madin (and a student)

Constructing full Bayesian architecture with reweighting and simulation to build surrogate models

Status

colour	Red
title	LOW

Status

colour	Red
title	high

Analytical Bayesian inference testing

ForceBalance → pytorch, torchMD (timemachine)

Status

title	Not started

Simon Boothroyd Owen Madin Matt Thompson John Herr (?)

Owen Madin (science, not software)

Automated typing inference from scratch

Status

colour	Red
title	high

Status

colour	Red
title	high

Full-time person needed – to be discussed further. Work of Bayesian-based typing (Josh Fass’s work)

Josh Fass (Deactivated) and Tobias Wulsdorf may assist here. → Tobias Huefner

Other

Water co-optimization planning study (to be executed later) – discuss with Lee-Ping Wang

Status

colour	Green
title	Low

Status

colour	Red
title	high

Lack of bandwidth, potentially Bill Swope could help advise with data selection.

Status

colour	Purple
title	spinoff

Thinking about metals / ions / salts / ionic liquids

Status

colour	Green
title	Low

Status

colour	Red
title	high

Biologically relevant, will become high priority at some point

Owen Madin Matt Thompson

Status

colour	Purple
title	spinoff

Thinking about ionic liquids

Alchemical force fields (for alchemical free energies)

Status

colour	Green
title	Low

/

Status

colour	Yellow
title	MEDIUM

Soft core potentials. JDC might have people in his lab working on it, MS is interested to join the effort.

Continuous (smearnoff) typing

ESPALOMA

Yuanqing Wang
