Objective

Develop a procedure to extend a general small molecule force field to also model proteins self-consistently.

Primary Driver

Approvers

Michael Gilson, Michael Shirts

Supporting Drivers

Stakeholders (these people will be tagged in project update notifications on Slack)

Jeffrey Wagner, Pavan Behara, Iván Pulido, Lily Wang, Joshua Horton, David Mobley

Project Manager

Diego Nolasco (Deactivated)

Page Owner (only this person can edit this page)

Chapin Cavender

Decision authority

Majority of Primary Driver and all Approvers, only in “Biopolymer FF call” meetings

Veto authority: Primary Driver, any Approver

Discussion/notification venue

Fortnightly “Biopolymer FF call” meetings (decision forum). NOT “FF release call” meetings.

#ff-biopolymers channel on the OpenFF Slack and “FF release call” meetings (notification and discussion only; no major decisions are made here). Meeting attendees are not assumed to have read the Slack discussions, so those discussions must be summarized during meetings to be considered in decisions.

Meeting notes

2022 Protein FF meeting notes

Due date

2022-01-01

Key outcomes

Extensible SMIRNOFF format for amino acid residues
Training datasets for 20 natural amino acids
Selection of FF model for proteins
One or more sets of OpenFF parameters for the 20 canonical amino acids
Identification of key benchmark systems

Status

PLANNING PHASE

Overview of strategy

Generate protein QC datasets for training and validation
Fit multiple models to the same training dataset
- Models vary in types of torsion parameters
- All models optimize valence parameters and torsion amplitudes
Benchmark protein models
- Tier 1 benchmarks for all models
- Tier 2 benchmarks for models that perform well in Tier 1. Specific failures in Tier 1 may motivate new models that address them.
- Tier 3 benchmarks for release candidate

SMIRNOFF format

We need SMARTS strings that can specify protein-specific terms for general amino acids. To summarize (the full discussion is in the 2022-06-30 Protein FF meeting note):

...

We envision several tiers of models, presented below in order of increasing number of parameter types. We will generate and benchmark simpler models first and use the benchmark results to decide how to prioritize fitting and benchmarking of more complex models. Benchmarking results for simpler models may also inspire new models, not listed below, that address specific benchmarking failures.

The same set of training datasets will be used for each model: Sage QC training dataset and protein QC datasets described above. Protein-specific datasets will be weighted equally for each of the twenty canonical sidechains (i.e. weights of all protomers for the same sidechain will sum to one), and the total weight for protein-specific datasets and small molecule datasets will be equal.
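The weighting rule above can be illustrated with a minimal sketch. It assumes that each protomer is a single fitting target and that the small-molecule weight is split evenly across small-molecule targets; the page only states that the two totals are equal, so the even split is an assumption. The function names and the target labels are hypothetical, and the Amber-style residue names (HID, HIE, HIP, ASH) are used only for illustration.

```python
from collections import Counter
from typing import Dict, List


def protein_weights(protomer_to_sidechain: Dict[str, str]) -> Dict[str, float]:
    """Split a total weight of 1 evenly over the protomers of each canonical sidechain."""
    # Count how many protomers share each canonical sidechain.
    n_protomers = Counter(protomer_to_sidechain.values())
    return {
        protomer: 1.0 / n_protomers[sidechain]
        for protomer, sidechain in protomer_to_sidechain.items()
    }


def small_molecule_weights(targets: List[str], protein_total: float) -> Dict[str, float]:
    """Assumption: give each small-molecule target an equal share so both totals match."""
    share = protein_total / len(targets)
    return {target: share for target in targets}


# Illustrative subset of protomers; the full set would cover all 20 sidechains,
# giving a protein total of 20.
protomers = {
    "HID": "HIS", "HIE": "HIS", "HIP": "HIS",  # three histidine protomers, 1/3 each
    "ASP": "ASP", "ASH": "ASP",                # two aspartate protomers, 1/2 each
    "ALA": "ALA",                              # a single protomer gets the full weight
}
protein = protein_weights(protomers)
protein_total = sum(protein.values())
small = small_molecule_weights(["sage-optimizations", "sage-torsiondrives"], protein_total)
```

Normalizing per sidechain rather than per protomer keeps residues with many protonation or tautomer states (e.g. His) from dominating the objective.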

...

The null model is that the small molecule force field already describes proteins well and needs no protein-specific parameters. Parsley was trained on compounds that resemble protein backbone and sidechain analogs, so these parameters are likely a good first pass at describing polypeptide chains.

Amber ff99SB typed model

Backbone torsions
- General backbone torsions for phi and psi
- Residue-specific backbone torsions for Gly
Sidechain torsions
- No protein-specific sidechain torsions
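In SMIRNOFF, when several parameters of the same type match the same atoms, the one that appears last in the parameter list is applied. A general phi torsion can therefore be listed first and a Gly-specific phi torsion after it, so that Gly residues pick up the specific term and all other residues fall back to the general one. The sketch below mimics that rule under stated assumptions: residue-name predicates stand in for real SMARTS patterns, and the parameter ids `phi-general` and `phi-gly` and the helper `assign` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class TorsionParameter:
    param_id: str
    # Stand-in for a SMARTS pattern: a predicate on the residue name of the
    # central CA atom. Real SMIRNOFF matching operates on the chemical graph.
    matches: Callable[[str], bool]


# Hypothetical ordering: general term first, residue-specific term after it.
PHI_TORSIONS = [
    TorsionParameter("phi-general", lambda residue: True),
    TorsionParameter("phi-gly", lambda residue: residue == "GLY"),
]


def assign(residue: str, parameters: Sequence[TorsionParameter]) -> Optional[str]:
    """Return the id of the last matching parameter (SMIRNOFF-style hierarchy)."""
    assigned = None
    for parameter in parameters:
        if parameter.matches(residue):
            assigned = parameter.param_id  # later matches override earlier ones
    return assigned
```

With this ordering, `assign("GLY", PHI_TORSIONS)` returns `"phi-gly"` and `assign("ALA", PHI_TORSIONS)` returns `"phi-general"`.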

...

Backbone torsions
- General backbone torsions for phi and psi
- Residue-specific backbone torsions for Gly
Sidechain torsions
- General sidechain torsions for chi1 and chi2
- Residue-specific sidechain torsions for beta-branched sidechains (Ile, Thr, and Val)

Amber ff14SB typed model

Backbone torsions
- General backbone torsions for phi and psi
- Residue-specific backbone torsions for Gly
Sidechain torsions
- General sidechain torsions for chi1 and chi2
- Residue-specific sidechain torsions for 11 groups of sidechains from Amber ff14SB

...

Backbone torsions
- General backbone torsions for phi and psi
- Residue-specific backbone torsions for Gly, Pro, and beta-branched sidechains (Ile, Thr, and Val)
Sidechain torsions
- General sidechain torsions for chi1 and chi2
- Residue-specific sidechain torsions for beta-branched sidechains (Ile, Thr, and Val)

Beta-branched backbones and Amber ff14SB typed sidechains model

Backbone torsions
- General backbone torsions for phi and psi
- Residue-specific backbone torsions for Gly, Pro, and beta-branched sidechains (Ile, Thr, and Val)
Sidechain torsions
- General sidechain torsions for chi1 and chi2
- Residue-specific sidechain torsions for 11 groups of sidechains from Amber ff14SB

...

Milestones

Generate ELF10 library charges
- Owner: Chapin Cavender
- Deadline: 2022-06-01
- Status: COMPLETED

Choose benchmark systems
- Owner: Biopolymer FF group
- Deadline: 2022-04-01
- Status: COMPLETED
- Notes: In parallel with LiveCoMS review

Generate protein QC datasets
- Owner: Chapin Cavender
- Deadline: 2022-04-01
- Status: COMPLETED

Fit parameters for at least two models (null and Amber ff99SB typed)
- Owner: Chapin Cavender
- Deadline: 2022-10-01
- Status: IN PROGRESS

Software for NMR observable benchmarks
- Owner: Chapin Cavender
- Deadline: 2022-10-01
- Status: IN PROGRESS

Tier 1 NMR benchmarks
- Owner: Chapin Cavender
- Deadline: 2022-11-01
- Status: NOT STARTED

Tier 2 NMR benchmarks
- Owner: Chapin Cavender
- Deadline: 2023-01-01
- Status: NOT STARTED

Tier 3 NMR benchmarks
- Owner: Chapin Cavender
- Deadline: 2023-07-01
- Status: NOT STARTED
